Automation in the Bug Flow: Machine Learning for Triaging and Tracing

Posted on 19-Jun-2015





DESCRIPTION

Issue management is a costly part of software development. In large projects, the continuous inflow of issue reports contributes to information overload, i.e., "a state where individuals do not have time or capacity to process all available information". In issue triaging, an initial step in issue management, a developer must be able to overview existing issue reports and easily navigate the software engineering project landscape. In this presentation, we describe support for two tasks involved in issue management: 1) issue assignment and 2) change impact analysis. We use machine learning to harness the ever-growing number of issue reports by training recommendation systems on previous issues. Our industrial evaluations on 50,000+ issue reports in two large software development organizations indicate that automated issue assignment performs in line with current manual work. Moreover, we show how traceability from already resolved issue reports to various artifacts can be reused to jump-start change impact analyses for newly submitted issues. Finally, we speculate on future ways to turn information overload into helpful software engineering recommendations.

TRANSCRIPT

Automation in the Bug Flow: Machine Learning for Triaging and Tracing

MARKUS BORG, LUND UNIVERSITY

Bug tracker

The number of incoming bug reports can be overwhelming…

Bug tracker

Machine Learning

Use machine learning to provide actionable recommendations based on previous patterns!

- Final-year PhD student
- MSc in computer science and engineering
- ABB software developer (3 years): process automation, compilers and editors

Per Runeson

The Challenge / The Solution / The Evaluation

[Figure: information landscape with Reqts. DB, Issue Repo, Code Repo, Test DB, and Doc. DB]

Developers in large projects must navigate complex information landscapes that continuously change

One bug is not much of a problem…

Bug tracker

But a large simultaneous inflow of bug reports can make the best bug tracking system sweat!

Wrong prioritization decisions might let bugs reach your customers

In a safety-critical context

This talk addresses two tasks involved in issue management:

1. Issue Assignment

2. Change Impact Analysis

By safety-critical we refer to document-driven development with a rigid process…

… prior to changing source code, a formal change impact analysis has to be conducted and reported according to an approved template.

We want to increase the confidence at commit time even further!

The Challenge / The Solution / The Evaluation

We aim to harness the intrinsic navigational value of bug reports

Bug tracker

Machine Learning

We leverage the large number of bug reports in the projects…

Bug tracker

Machine Learning

The more bugs, the better the machine learning gets!

Leif Jonsson

Automated Issue Assignment

• Goal:

Useful tool deployable with minimum configuration effort

• Approach:

Train classifiers on historical bug reports

Combine them using state-of-the-art ensemble learning

Joint work with Ericsson Research
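The slide does not say which classifiers or which ensemble method were used. As a rough illustration only, the sketch below trains two deliberately simple classifiers on a handful of hypothetical historical bug reports (a per-component majority vote and a naive Bayes text model) and combines their team probabilities by weighted voting. The report fields, team names, and weights are assumptions made for this example, and weighted voting is a simpler stand-in for the state-of-the-art ensemble learning mentioned above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical historical bug reports: component, free text, and the team that fixed it.
HISTORY = [
    {"component": "ui", "text": "button crash on click", "team": "Frontend"},
    {"component": "ui", "text": "layout broken in dialog", "team": "Frontend"},
    {"component": "db", "text": "query timeout on large table", "team": "Storage"},
    {"component": "db", "text": "index corruption after crash", "team": "Storage"},
    {"component": "net", "text": "socket timeout on reconnect", "team": "Network"},
]


class ComponentClassifier:
    """Scores teams by how often they handled bugs in the same component."""

    def fit(self, reports):
        self.counts = defaultdict(Counter)
        for r in reports:
            self.counts[r["component"]][r["team"]] += 1
        return self

    def scores(self, report):
        c = self.counts.get(report["component"], Counter())
        total = sum(c.values()) or 1
        return {team: n / total for team, n in c.items()}


class NaiveBayesText:
    """Multinomial naive Bayes over whitespace tokens, with Laplace smoothing."""

    def fit(self, reports):
        self.team_counts = Counter(r["team"] for r in reports)
        self.word_counts = defaultdict(Counter)
        for r in reports:
            self.word_counts[r["team"]].update(r["text"].lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def scores(self, report):
        words = report["text"].lower().split()
        n = sum(self.team_counts.values())
        logp = {}
        for team, tc in self.team_counts.items():
            wc = self.word_counts[team]
            denom = sum(wc.values()) + len(self.vocab)
            logp[team] = math.log(tc / n) + sum(math.log((wc[w] + 1) / denom) for w in words)
        # Turn log-probabilities into normalized probabilities.
        top = max(logp.values())
        exp = {t: math.exp(v - top) for t, v in logp.items()}
        z = sum(exp.values())
        return {t: v / z for t, v in exp.items()}


def ensemble_predict(classifiers, weights, report):
    """Weighted sum of each classifier's team probabilities; returns the best team."""
    combined = Counter()
    for clf, w in zip(classifiers, weights):
        for team, p in clf.scores(report).items():
            combined[team] += w * p
    return combined.most_common(1)[0][0]


classifiers = [ComponentClassifier().fit(HISTORY), NaiveBayesText().fit(HISTORY)]
weights = [0.4, 0.6]  # hypothetical; such weights could be tuned on held-out reports
new_bug = {"component": "db", "text": "timeout when running query"}
print(ensemble_predict(classifiers, weights, new_bug))
```

Weighted voting keeps the sketch short; a stacking approach would instead learn how to combine the classifiers' outputs from held-out data.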

How to Represent a Bug Report?

• Component

• Severity

• System Version

• Submit Date

• Submitter Location

etc.

• … And the text! Title and description.
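One way to turn the fields listed above into input for a classifier is a sparse feature dictionary: one-hot indicators for the categorical fields and term counts for the title and description. The sketch below assumes a bug report is a Python dict with the hypothetical keys shown; it is not the representation used in the original study.

```python
import re
from collections import Counter
from datetime import date


def bug_report_features(report):
    """Map a bug report (dict) to a sparse feature dict: feature name -> value."""
    features = {}
    # Categorical fields become one-hot indicator features (hypothetical key names).
    for field in ("component", "severity", "system_version", "submitter_location"):
        value = report.get(field)
        if value is not None:
            features[f"{field}={value}"] = 1.0
    # The submit date is reduced to a coarse categorical feature: year and month.
    submitted = report.get("submit_date")
    if submitted is not None:
        features[f"submit_month={submitted:%Y-%m}"] = 1.0
    # Title and description become bag-of-words term counts.
    text = f"{report.get('title', '')} {report.get('description', '')}".lower()
    for token, count in Counter(re.findall(r"[a-z0-9_]+", text)).items():
        features[f"word={token}"] = float(count)
    return features


example = {
    "component": "Compiler",
    "severity": "Major",
    "system_version": "4.2",
    "submitter_location": "Sweden",
    "submit_date": date(2014, 3, 17),
    "title": "Crash when compiling empty project",
    "description": "The compiler crash is reproducible on an empty project.",
}
print(bug_report_features(example))
```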

Automated Change Impact Analysis

• Goal:

Intuitive tool to jump start analyses based on historical data

Faster + more accurate analyses compared to fully manual work

• Approach:

Step 1: Mine the history

– Construct network of previously reported impact

– Index textual data with Apache Lucene

– Calculate centrality measures

Step 2: Recommend impact for new bug fix

Present recommendations Amazon-style: “Other developers working on this class also modified/tested…”
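Step 1 can be sketched as building a graph whose nodes are resolved issues and the artifacts they were reported to impact, then scoring each node with a centrality measure. The talk does not name the measures used, so degree centrality and PageRank below are assumed examples, and the trace data is hypothetical (the artifact names are borrowed from the slides).

```python
from collections import defaultdict

# Hypothetical mined history: each resolved issue with the artifacts reported as impacted.
HISTORY = {
    "BUG-1": ["Req. X.Y", "Design Doc. X.Y", "Test case UTC56"],
    "BUG-2": ["Req. X.Y", "Design Doc. X.Y"],
    "BUG-3": ["Req. Z.Y", "Test case UTC56"],
}


def build_network(history):
    """Undirected graph (adjacency sets) linking issues to the artifacts they impacted."""
    graph = defaultdict(set)
    for issue, artifacts in history.items():
        for artifact in artifacts:
            graph[issue].add(artifact)
            graph[artifact].add(issue)
    return dict(graph)


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes every node has at least one neighbour."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for node, nbrs in graph.items():
            share = damping * rank[node] / len(nbrs)
            for nbr in nbrs:
                new_rank[nbr] += share
        rank = new_rank
    return rank


graph = build_network(HISTORY)
pr = pagerank(graph)
degree = degree_centrality(graph)
artifacts = {a for impacted in HISTORY.values() for a in impacted}
for artifact in sorted(artifacts, key=pr.get, reverse=True):
    print(f"{artifact}: degree={degree[artifact]:.2f}, pagerank={pr[artifact]:.3f}")
```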

Automated Impact Analysis

• Approach part 2: Recommend impact

– Find similar bugs using Apache Lucene

– Follow links to identify candidate impact set

– Use centrality measures to rank candidate impact

[Figure: links followed from similar bugs to candidate impact: Design Doc. X.Y, Req. X.Y, Test case UTC56, Req. Z.Y]

Ranked candidate impact:

1. Requirement X.Y

2. Design Document X.Y

3. Test case UTC56

4. Design Document X.Y

5. Requirement Z.Y
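Part 2 can be approximated as follows: find the past bugs most similar to the new one, follow their trace links to collect candidate artifacts, and rank the candidates. The talk uses Apache Lucene for the similarity search; to keep this sketch dependency-free it substitutes a plain TF-IDF cosine similarity. The history, the centrality values, and the scoring rule `alpha * similarity + (1 - alpha) * centrality` are all hypothetical, not ImpRec's actual ranking function.

```python
import math
import re
from collections import Counter

# Hypothetical history: resolved bugs with their text and the artifacts they impacted.
HISTORY = {
    "BUG-1": ("crash when saving configuration file", ["Req. X.Y", "Design Doc. X.Y"]),
    "BUG-2": ("configuration file not saved after restart", ["Req. X.Y", "Test case UTC56"]),
    "BUG-3": ("alarm list shows wrong timestamp", ["Req. Z.Y"]),
}
# Hypothetical centrality values, e.g. computed from the mined impact network.
CENTRALITY = {"Req. X.Y": 0.5, "Design Doc. X.Y": 0.25, "Test case UTC56": 0.25, "Req. Z.Y": 0.1}


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """TF-IDF vectors for {doc_id: text}; a plain stand-in for a Lucene index."""
    df = Counter(t for text in docs.values() for t in set(tokens(text)))
    idf = {t: math.log(len(docs) / d) + 1.0 for t, d in df.items()}
    vectors = {doc_id: {t: c * idf[t] for t, c in Counter(tokens(text)).items()}
               for doc_id, text in docs.items()}
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_impact(new_bug_text, k_similar=2, alpha=0.7):
    vectors, idf = tfidf_vectors({bug: text for bug, (text, _) in HISTORY.items()})
    # Terms never seen in the history are ignored.
    query = {t: c * idf[t] for t, c in Counter(tokens(new_bug_text)).items() if t in idf}
    # 1. Find the k most similar past bugs.
    similar = sorted(((cosine(query, v), bug) for bug, v in vectors.items()), reverse=True)[:k_similar]
    # 2. Follow trace links from those bugs; accumulate similarity per candidate artifact.
    evidence = Counter()
    for sim, bug in similar:
        if sim > 0:
            for artifact in HISTORY[bug][1]:
                evidence[artifact] += sim
    # 3. Rank candidates by a hypothetical mix of textual evidence and centrality.
    scores = {a: alpha * s + (1 - alpha) * CENTRALITY.get(a, 0.0) for a, s in evidence.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(recommend_impact("configuration file lost when saving"))
```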

[Screenshot of the prototype tool ImpRec]

The Challenge / The Solution / The Evaluation

Experiment: Issue Assignment

• Five large datasets from two companies

– Telecom and Automation

– 50,000+ issue reports

• 10-fold cross-validation and simulation (“replaying history”)
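The two evaluation setups differ in how training and test data are split. Ten-fold cross-validation shuffles all reports into ten folds, so a model may be trained on reports submitted after the ones it is tested on; replaying history only ever trains on earlier reports. The helper names and the batch-based replay scheme below are assumptions for illustration.

```python
import random


def k_fold_splits(n_reports, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation on shuffled reports."""
    indices = list(range(n_reports))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def replay_history_splits(n_reports, initial_train, batch_size):
    """Yield (train, test) index lists that respect time order.

    Assumes reports are sorted by submission date: each batch is predicted by a
    model trained only on reports submitted before it.
    """
    start = initial_train
    while start < n_reports:
        yield list(range(start)), list(range(start, min(start + batch_size, n_reports)))
        start += batch_size


for train, test in replay_history_splits(10, initial_train=4, batch_size=3):
    print(f"train on {train}, test on {test}")
```

Replaying history is the more realistic setup for a deployed tool, since it also exposes the effect of aging training data.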

Experiment: Issue Assignment

• Prediction accuracy in line with manual assignment by humans

• Numbers in the figure depict the number of teams in each project

[Figure: prediction accuracy per dataset]

Experiment: Issue Assignment

• Warning! Some systems need fresh training data

Experiment: Issue Assignment

But the decay is not always exponential…

Experiment: Change Impact Analysis

• Experiment with historical impact

– Training set: 8 years, Test set: 2 years

ImpRec places 30% of the past impact among its top-5 recommendations (40% within the top 10, 50% within the top 20)

But what does that mean? User study needed.
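The figures above are recall at cut-off k: the share of the actually impacted artifacts that appear among the top k recommendations. A minimal sketch follows; whether the reported numbers average recall per issue (as `mean_recall_at_k` does) or pool all issues is not stated, so the averaging is an assumption, as are the artifact names in the example.

```python
def recall_at_k(ranked, actual, k):
    """Share of the actually impacted artifacts found in the top-k recommendations."""
    actual = set(actual)
    if not actual:
        return 0.0  # convention chosen for this sketch: no true impact, nothing to recall
    return len(set(ranked[:k]) & actual) / len(actual)


def mean_recall_at_k(cases, k):
    """Average recall@k over (ranked recommendations, actual impact) pairs."""
    return sum(recall_at_k(ranked, actual, k) for ranked, actual in cases) / len(cases)


# Hypothetical recommendations and true impact for one issue.
ranked = ["Req. X.Y", "Design Doc. X.Y", "Test case UTC56", "Req. Z.Y", "Req. A.B", "Test case UTC12"]
actual = {"Design Doc. X.Y", "Test case UTC12", "Req. Q.R"}
for k in (5, 10):
    print(k, recall_at_k(ranked, actual, k))
```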

Case Study: Change Impact Analysis

• Industrial case study

– Two units of analysis: Team Sweden & Team India

– Tool deployed in March 2014 & August 2014

– Interviews and user log files

[Figure: Click Distribution, top-20 hits; ranks 1 to 20 on the x-axis, values 0 to 0.4 on the y-axis; series: IA, Google]

IA Google

Initial result: developers’ interaction with impact recommendations is similar to their interaction with Google searches
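A click distribution like the one in the figure can be computed from usage logs by counting which result position each click landed on and normalizing by the total number of clicks. The sketch assumes the log has already been parsed into a list of clicked ranks (1-based); the log contents are made up.

```python
from collections import Counter


def click_distribution(clicked_ranks, max_rank=20):
    """Share of all clicks that landed on each result position 1..max_rank."""
    counts = Counter(r for r in clicked_ranks if 1 <= r <= max_rank)
    total = sum(counts.values())
    return [counts[r] / total if total else 0.0 for r in range(1, max_rank + 1)]


log = [1, 1, 2, 1, 3, 5, 2, 1, 8, 1]  # hypothetical clicked positions parsed from a usage log
print([round(p, 2) for p in click_distribution(log)[:5]])
```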

Case Study: Change Impact Analysis

• “Finding these past bugs was exactly what I was looking for actually”

- Developer, India

• “Directing attention to potential side-effects is very important”

- Project manager, Sweden

Some encouraging comments from developers…

Conclusions

Automated Issue Assignment

• Automated assignment as accurate as current manual work

– But instantaneous!

• At least 2,000 bug reports needed for training

• Continuously monitor the accuracy

Favourable conditions: static team structure, maintenance project

Unfavourable conditions: dynamic team structure, new development
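Continuous monitoring can be as simple as tracking the share of correct automatic assignments over the most recently resolved bug reports and flagging the model for retraining when that share drops. In the sketch below, the window size and threshold are hypothetical, the retraining rule is an assumption, and only the 2,000-report minimum comes from the conclusions above.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks correct automatic assignments over the last `window` resolved bug reports."""

    def __init__(self, window=200, threshold=0.5, min_training_size=2000):
        self.outcomes = deque(maxlen=window)  # True if the predicted team was correct
        self.threshold = threshold            # hypothetical alert level
        self.min_training_size = min_training_size

    def record(self, predicted_team, actual_team):
        self.outcomes.append(predicted_team == actual_team)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self):
        # Only judge once the window is full, then flag if accuracy fell below the threshold.
        acc = self.accuracy()
        full = len(self.outcomes) == self.outcomes.maxlen
        return acc is not None and full and acc < self.threshold

    def enough_training_data(self, n_reports):
        return n_reports >= self.min_training_size


monitor = AccuracyMonitor(window=4, threshold=0.5)
for predicted, actual in [("A", "A"), ("B", "C"), ("A", "B"), ("C", "A")]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_retraining())
```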

Automated Change Impact Analysis

• Recommendation system provides a useful starting point

• Recommending related issues is a popular feature

– Study previous issue resolutions

– Compare with previous impact analyses

• Recommendation recall for impact: 30-55%

– Reuse previous impact to jump-start analysis

– Provide warning if probable impact is missing
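The warning idea can be sketched as comparing the developer's impact analysis report against the top of the recommendation list and flagging any highly ranked artifact the report leaves out. The cut-off `top_n` and the example report are assumptions; the recommended artifacts reuse the ranked example from the earlier slides.

```python
def missing_impact_warnings(recommended, reported, top_n=5):
    """List highly ranked recommendations that the developer's impact report does not mention."""
    reported = set(reported)
    return [artifact for artifact in recommended[:top_n] if artifact not in reported]


recommended = ["Requirement X.Y", "Design Document X.Y", "Test case UTC56", "Requirement Z.Y"]
reported = ["Requirement X.Y", "Test case UTC56"]  # hypothetical impact analysis report
for artifact in missing_impact_warnings(recommended, reported, top_n=3):
    print(f"Warning: probable impact on {artifact} is not in the impact analysis report")
```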

Embrace your bugs!

Machine learning can guide maintenance work

Much potential:
- Severity prediction
- Resolution times
- “Noise” filtering

PHOTO CREDITS

- Brown stink bug: Marlin E. Rice
- Isopods: Omoshiro Aquarium; Flickr: littoraria, coda
- Cubicles: Flickr: templetonelliot, ifl, danburgmurmur
- Eightball girl: Flickr: mobilestreetlife
- Evaluate: Flickr: theideadesk
- My wife: My wife

Thank you!

markus.borg@cs.lth.se

cs.lth.se/markus_borg

@_Troddel_
