Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Markus Borg, Lund University

Upload: markus-borg

Post on 19-Jun-2015

DESCRIPTION

Issue management is a costly part of software development. In large projects, the continuous inflow of issue reports contributes to the information overload in a project, i.e., "a state where individuals do not have time or capacity to process all available information". In issue triaging, an initial step in issue management, a developer must be able to get an overview of existing issue reports and easily navigate the software engineering project landscape. In this presentation, we present support for two work tasks involved in issue management: 1) issue assignment and 2) change impact analysis. We use machine learning to harness the ever-growing number of issue reports by training recommendation systems on previous issues. Our industrial evaluations on 50,000+ issue reports in two large software development organizations indicate that automated issue assignment performs in line with current manual work. Moreover, we present how traceability from already resolved issue reports to various artifacts can be reused to jump-start change impact analyses for newly submitted issues. Finally, we speculate on future ways to tame information overload into helpful software engineering recommendations.

TRANSCRIPT

Page 1: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Automation in the Bug Flow - MACHINE LEARNING FOR TRIAGING AND TRACING

MARKUS BORG, LUND UNIVERSITY

Page 2: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Bug tracker

The number of incoming bug reports can be overwhelming…

Page 3: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Bug tracker

Machine Learning

Use machine learning to provide actionable recommendations based on previous patterns!

Page 4: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

- Final year PhD student

- MSc CS and engineering

- ABB software developer (3 years)

process automation

compilers and editors

Per Runeson

Page 5: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

The Challenge

The Solution

The Evaluation

Page 6: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Reqts. DB

Issue Repo

Code Repo

Test DB

Doc. DB.

Developers in large projects must navigate complex information landscapes that continuously change

Page 7: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

One bug is not much of a problem…

Page 8: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Bug tracker

But a large simultaneous inflow of bug reports can make the best bug tracking system sweat!

Page 9: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Making the wrong prioritizations might result in bugs reaching your customers

Page 10: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

This talk addresses two tasks involved in issue management:

1. Issue Assignment

2. Change Impact Analysis

In a safety-critical context

Page 11: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

By safety-critical we refer to document-driven development with a rigid process…

… prior to changing source code, a formal change impact analysis has to be conducted and reported according to an approved template.

Page 12: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

We want to increase the confidence at commit time even further!

Page 13: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

The Challenge

The Solution

The Evaluation

Page 14: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

We aim to harness the intrinsic navigational value of bug reports

Page 15: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Bug trackerMachine Learning

We leverage the number of bug reports in the projects…

Page 16: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Bug tracker

Machine Learning

The more bugs, the better the machine learning gets!

Page 17: Automation in the Bug Flow - Machine Learning for Triaging and Tracing
Page 18: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Automated Issue Assignment

• Goal:

Useful tool deployable with minimum configuration effort

• Approach:

Train classifiers on historical bug reports

Combine them using state-of-the-art ensemble learning

Joint work with Ericsson Research

Leif Jonsson
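
The approach above (train several classifiers on historical bug reports, then combine them) can be illustrated with a minimal sketch. The slide does not name the ensemble method, so plain majority voting is used here as a stand-in. The three base classifiers, the report fields `component` and `text`, and all sample data are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


class ComponentLookup:
    """Predicts the team that most often resolved issues for a component."""

    def fit(self, reports, teams):
        self.fallback = Counter(teams).most_common(1)[0][0]
        self.by_component = defaultdict(Counter)
        for report, team in zip(reports, teams):
            self.by_component[report["component"]][team] += 1
        return self

    def predict(self, report):
        counts = self.by_component.get(report["component"])
        return counts.most_common(1)[0][0] if counts else self.fallback


class TextNaiveBayes:
    """Multinomial naive Bayes over the words of the report text."""

    def fit(self, reports, teams):
        self.team_counts = Counter(teams)
        self.word_counts = defaultdict(Counter)
        for report, team in zip(reports, teams):
            self.word_counts[team].update(report["text"].lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def _log_score(self, team, words):
        total_words = sum(self.word_counts[team].values())
        score = math.log(self.team_counts[team] / sum(self.team_counts.values()))
        for word in words:
            # Laplace smoothing keeps unseen words from zeroing the score.
            count = self.word_counts[team][word] + 1
            score += math.log(count / (total_words + len(self.vocabulary)))
        return score

    def predict(self, report):
        words = report["text"].lower().split()
        return max(self.team_counts, key=lambda team: self._log_score(team, words))


class MajorityTeam:
    """Baseline that always predicts the most common team."""

    def fit(self, reports, teams):
        self.team = Counter(teams).most_common(1)[0][0]
        return self

    def predict(self, report):
        return self.team


def vote(classifiers, report):
    """Return the team predicted by most classifiers (first seen wins ties)."""
    return Counter(c.predict(report) for c in classifiers).most_common(1)[0][0]


# Hypothetical historical reports and the team that resolved each one.
history = [
    {"component": "radio", "text": "signal drops on handover"},
    {"component": "radio", "text": "handover fails under load"},
    {"component": "ui", "text": "button label truncated"},
    {"component": "ui", "text": "dialog layout broken"},
    {"component": "ui", "text": "menu label misaligned"},
]
teams = ["team_radio", "team_radio", "team_ui", "team_ui", "team_ui"]

ensemble = [cls().fit(history, teams)
            for cls in (ComponentLookup, TextNaiveBayes, MajorityTeam)]
predicted = vote(ensemble, {"component": "radio", "text": "handover signal lost"})
```

In this toy data the component lookup and the text classifier both pick the radio team and outvote the majority baseline. A production ensemble would more likely weight or stack the base classifiers instead of counting votes equally.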

Page 19: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

How to Represent a Bug Report?

• Component

• Severity

• System Version

• Submit Date

• Submitter Location

etc.

• … And the text! Title and description.
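
One possible way to turn the fields listed above into classifier input is sketched below. The field names follow the slide, but the types, the one-hot encoding of nominal fields, the month encoding of the date, and the word-count treatment of the text are all assumptions, not the representation used in the study.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict


@dataclass
class BugReport:
    # Field names mirror the slide; the types are assumed for illustration.
    component: str
    severity: str
    system_version: str
    submit_date: date
    submitter_location: str
    title: str
    description: str


def to_features(report: BugReport) -> Dict[str, float]:
    """Flatten one report into a sparse feature dictionary."""
    features: Dict[str, float] = {
        # Nominal fields become indicator features.
        f"component={report.component}": 1.0,
        f"severity={report.severity}": 1.0,
        f"version={report.system_version}": 1.0,
        f"location={report.submitter_location}": 1.0,
        # The date is reduced to a month number, one of many possible encodings.
        "submit_month": float(report.submit_date.month),
    }
    # Title and description are tokenized into lowercase word counts.
    text = f"{report.title} {report.description}".lower()
    for token in re.findall(r"[a-z0-9]+", text):
        key = f"word={token}"
        features[key] = features.get(key, 0.0) + 1.0
    return features


# Hypothetical report used only to show the resulting feature dictionary.
example = BugReport("io", "high", "2.1", date(2014, 3, 5), "Lund",
                    "Crash on start", "Start fails, crash in IO driver")
example_features = to_features(example)
```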

Page 20: Automation in the Bug Flow - Machine Learning for Triaging and Tracing
Page 21: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Automated Change Impact Analysis

• Goal:

Intuitive tool to jump start analyses based on historical data

Faster + more accurate analyses compared to fully manual work

• Approach

Step 1: Mine the history

Step 2: Recommend impact for new bug fix

Page 22: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Present recommendations Amazon-style: "Other developers working on this class also modified/tested…"

Page 23: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Construct network of previously reported impact

Index textual data with

Page 24: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Calculate centrality measures
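
The slides do not say which centrality measures are calculated. As a hedged illustration, the sketch below computes normalized degree centrality on a small network of trace links. The choice of degree centrality, the issue IDs, and the links are assumptions.

```python
from collections import defaultdict


def degree_centrality(links):
    """Return each node's degree divided by the number of other nodes."""
    neighbours = defaultdict(set)
    for source, target in links:
        # Trace links are treated as undirected for this measure.
        neighbours[source].add(target)
        neighbours[target].add(source)
    node_count = len(neighbours)
    if node_count < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (node_count - 1)
            for node, adjacent in neighbours.items()}


# Hypothetical network of previously reported impact.
impact_network = [
    ("ISSUE-1", "Req. X.Y"),
    ("ISSUE-1", "Test case UTC56"),
    ("ISSUE-2", "Req. X.Y"),
    ("ISSUE-3", "Req. X.Y"),
    ("ISSUE-3", "Design Doc. X.Y"),
]
centrality = degree_centrality(impact_network)
```

Artifacts that many past issues point to, such as `Req. X.Y` in this toy network, receive the highest score.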

Page 25: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Automated Impact Analysis

• Approach part 2: Recommend impact

Find similar bugs using Apache Lucene

Follow links to identify candidate impact set

Design Doc. X.Y

Req. X.Y

Test case UTC56

Req. Z.Y

Design Doc. X.Y

Page 26: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Automated Impact Analysis

• Approach part 2: Recommend impact

Find similar bugs using Apache Lucene

Follow links to identify candidate impact set

Use centrality measures to rank candidate impact

1. Requirement X.Y

2. Design Document X.Y

3. Test case UTC56

4. Design Document X.Y

5. Requirement Z.Y
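
The three steps (find similar bugs, follow links, rank by centrality) can be sketched end to end as follows. This is not ImpRec's implementation: a small TF-IDF cosine similarity stands in for Apache Lucene, the scoring formula (similarity multiplied by centrality, summed per artifact) is an assumption, and all issue texts, links, and centrality values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(documents):
    """Return ({doc_id: {term: weight}}, idf) for a dict of doc_id -> text."""
    tokens = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    idf = {term: math.log((1 + len(documents)) / (1 + df)) + 1.0
           for term, df in doc_freq.items()}
    vectors = {doc_id: {t: c * idf[t] for t, c in Counter(toks).items()}
               for doc_id, toks in tokens.items()}
    return vectors, idf


def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def recommend(new_text, past_issues, impact_links, centrality, top_issues=2):
    vectors, idf = tfidf_vectors(past_issues)
    query = {t: c * idf.get(t, 0.0)
             for t, c in Counter(new_text.lower().split()).items()}
    # Step 1: find the most similar past issues (stand-in for Lucene search).
    similar = sorted(((cosine(query, vec), issue)
                      for issue, vec in vectors.items()), reverse=True)[:top_issues]
    # Step 2: follow links from those issues to collect candidate impact.
    # Step 3: rank candidates by similarity weighted with centrality.
    scores = defaultdict(float)
    for similarity, issue in similar:
        for artifact in impact_links.get(issue, []):
            scores[artifact] += similarity * centrality.get(artifact, 0.0)
    return sorted(scores, key=lambda artifact: (-scores[artifact], artifact))


# Hypothetical history, trace links, and precomputed centrality values.
past_issues = {
    "ISSUE-1": "pump controller crashes at startup",
    "ISSUE-2": "pump valve timeout during startup",
    "ISSUE-3": "report header font is wrong",
}
impact_links = {
    "ISSUE-1": ["Req. X.Y", "Test case UTC56"],
    "ISSUE-2": ["Req. X.Y", "Design Doc. X.Y"],
    "ISSUE-3": ["Req. Z.Y"],
}
centrality = {"Req. X.Y": 0.6, "Design Doc. X.Y": 0.4,
              "Test case UTC56": 0.2, "Req. Z.Y": 0.2}

ranking = recommend("pump crashes during startup",
                    past_issues, impact_links, centrality)
```

Artifacts linked from several similar issues accumulate score, so a requirement touched by both matching issues ranks first in this example.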

Page 27: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Screen shot of prototype tool ImpRec

Page 28: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

The Challenge

The Solution

The Evaluation

Page 29: Automation in the Bug Flow - Machine Learning for Triaging and Tracing
Page 30: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Experiment: Issue Assignment

• Five large datasets from two companies

– Telecom and Automation

– 50,000+ issue reports

• 10-fold cross-validation and simulation ("replaying history")
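
The two evaluation designs differ in how they split the data, which the sketch below tries to make concrete. It assumes the reports are sorted by submit date. The sliding training window and the parameters `train_size` and `batch_size` are assumptions, since the slide only says that history is replayed.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for held_out, test in enumerate(folds):
        train = [index for number, fold in enumerate(folds)
                 if number != held_out for index in fold]
        yield train, test


def replay_history(n_items, train_size, batch_size):
    """Yield (train, test) index lists that never train on future reports.

    Each batch of new reports is predicted by a model trained on the
    train_size reports submitted immediately before it.
    """
    start = train_size
    while start < n_items:
        end = min(start + batch_size, n_items)
        yield list(range(start - train_size, start)), list(range(start, end))
        start = end


folds = list(k_fold_indices(20, k=10))
replay = list(replay_history(10, train_size=4, batch_size=3))
```

Cross-validation mixes old and new reports in the training folds, while the replay only ever trains on reports older than the ones it predicts.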

Page 31: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Experiment: Issue Assignment

• Prediction accuracy in line with human activity

• Numbers depict number of teams in the projects

67

1764

2836

Page 32: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Experiment: Issue Assignment

• Warning! Some systems need fresh training data

Page 33: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Experiment: Issue Assignment

But the decay is not always exponential…

Page 34: Automation in the Bug Flow - Machine Learning for Triaging and Tracing
Page 35: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Experiment: Change Impact Analysis

• Experiment with historical impact

– Training set: 8 years, Test set: 2 years

ImpRec presents 30% of past impact among the top-5 recommendations (40%@10, 50%@20)

But what does that mean? User study needed.
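
Figures such as "30% among the top-5" read as recall at a cutoff k: the share of the truly impacted artifacts that appear among the first k recommendations. A minimal sketch with hypothetical artifact names:

```python
def recall_at_k(recommended, actual, k):
    """Fraction of the actual impact found among the top-k recommendations."""
    actual_set = set(actual)
    if not actual_set:
        return 0.0
    return len(set(recommended[:k]) & actual_set) / len(actual_set)


# Hypothetical ranked recommendations and the impact reported afterwards.
recommended = ["Req. X.Y", "Design Doc. X.Y", "Test case UTC56",
               "Design Doc. A.B", "Req. Z.Y", "Test case UTC7"]
actual = ["Req. X.Y", "Req. Z.Y", "Test case UTC7", "Test case UTC99"]

recall_top_1 = recall_at_k(recommended, actual, 1)
recall_top_5 = recall_at_k(recommended, actual, 5)
recall_top_6 = recall_at_k(recommended, actual, 6)
```

Averaging this value over all issues in the test set gives one number per cutoff, which is how figures like 30%@5, 40%@10, and 50%@20 can be read.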

Page 36: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Case Study: Change Impact Analysis

• Industrial case study

– Two units of analysis: Team Sweden & Team India

– Tool deployed in March 2014 & August 2014

– Interviews and user log files

[Figure: Click Distribution, top-20 hits. Horizontal axis: rank 1 to 20; vertical axis: 0 to 0.4; series: IA, Google]

Initial result: Developers' interaction with impact recommendations similar to Google searches

Page 37: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Case Study: Change Impact Analysis

Some encouraging comments from developers…

• "Finding these past bugs was exactly what I was looking for actually"

- Developer, India

• "Directing attention to potential side-effects is very important"

- Project manager, Sweden

Page 38: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Conclusions

Page 39: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Automated Issue Assignment

• Automated assignment as accurate as current manual work

– But instantaneous!

• At least 2,000 bug reports needed for training

• Continuously monitor the accuracy

Favourable | Unfavourable

Static team structure | Dynamic team structure

Maintenance project | New development
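
The advice to continuously monitor the accuracy could be implemented as a rolling window over recent assignments, as sketched below. The window size, the threshold, and the retraining rule are assumptions and do not come from the study.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks prediction accuracy over the most recent assignments."""

    def __init__(self, window=200, threshold=0.5):
        # Both defaults are illustrative and should be tuned per project.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted_team, actual_team):
        self.outcomes.append(predicted_team == actual_team)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only judge once the window is full, to avoid noisy early alarms.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.5)
for predicted, actual in [("A", "A"), ("B", "B"), ("A", "A"), ("A", "B")]:
    monitor.record(predicted, actual)
accuracy_before = monitor.accuracy()
alarm_before = monitor.needs_retraining()

# Two further misses push older hits out of the window.
monitor.record("A", "C")
monitor.record("B", "C")
accuracy_after = monitor.accuracy()
alarm_after = monitor.needs_retraining()
```

Such a monitor is most useful in the unfavourable settings listed above, where changing team structures can make older training data stale.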

Page 40: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Automated Change Impact Analysis

• Recommendation system provides a useful starting point

• Recommending related issues is a popular feature

– Study previous issue resolutions

– Compare with previous impact analyses

• Recommendation recall for impact: 30-55%

– Reuse previous impact to jump-start analysis

– Provide warning if probable impact is missing

Page 41: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

Embrace your bugs!

Machine learning can guide maintenance work

Much potential:

- Severity prediction

- Resolution times

- "Noise" filtering

Page 42: Automation in the Bug Flow - Machine Learning for Triaging and Tracing

PHOTO CREDITS

Brown stink bug - Marlin E. Rice

Isopods - Omoshiro Aquarium - Flickr: littoraria, coda

Cubicles - Flickr: templetonelliot, ifl, danburgmurmur

Eightball girl - Flickr: mobilestreetlife

Evaluate - Flickr: theideadesk

My wife - My wife

Thank you!

[email protected]/markus_borg @_Troddel_

Page 43: Automation in the Bug Flow - Machine Learning for Triaging and Tracing