the adoption of machine learning techniques for software defect prediction: an initial industrial...


Upload: rakesh-rana

Post on 15-Aug-2015


Page 1: The adoption of machine learning techniques for software defect prediction: An initial industrial validation

Rakesh Rana1, Miroslaw Staron1, Jörgen Hansson1, Martin Nilsson2, Wilhelm Meding3

1Computer Science & Engineering, Chalmers | University of Gothenburg, Sweden

2Volvo Car Group, Gothenburg, Sweden

3Ericsson, Gothenburg, Sweden

[email protected]

The adoption of machine learning techniques for software defect prediction: An initial industrial validation

Page 2

Software Defect Prediction (SDP) methods

Image 1: https://www.reliablesoft.net/how-to-become-an-expert-in-your-niche-even-if-you-are-not/

Image 2: Fenton, Norman, et al. "Predicting software defects in varying development lifecycles using Bayesian nets." Information and Software Technology 49.1 (2007): 32-43.

Image 3: Kan, Stephen H. Metrics and models in software quality engineering. Addison-Wesley Longman Publishing Co., Inc., 2002.

Image 4: http://www.codeodor.com/index.cfm/2009/11/12/Its-Not-Your-Fault-Your-Software-Sucks/3058

Page 3

SDP: Methods based on Machine Learning

• Decision Trees (DTs)

• Support Vector Machines (SVMs)

• Artificial Neural Networks (ANNs)

• Bayesian Belief Networks (BBNs)


Page 4

Objective

“What are the factors that are important for companies to make an informed decision to adopt (or not adopt) ML algorithms for the purpose of software defect prediction (SDP)?”

Page 5

Research Process

Page 6

Framework for Adoption of ML Techniques in Industry

Page 7

ML characteristics that affect its acceptance for SDP

Page 8

Organizational characteristics that affect the acceptance of ML for the defect prediction task

Page 9

Study Design

| Unit of analysis (Domain) | Software development process | Current methods for SDP | Current state of adoption of ML for SDP |
|---|---|---|---|
| VCG (Automotive) | V-shaped software development | Focus on status visualization and analogy-based prediction | Considering evaluation |
| Ericsson (Telecom) | Lean and Agile development | Various modes of presenting current status and prediction methods | Considering evaluation |

Page 10

Study Design

The interviewees: VCG (QM), VCG (MetricsTL), Ericsson (QM), Ericsson (MetricsTL)

| Level | Need and importance (Table 2) | Level of satisfaction (Table 3) | Level of importance (Table 4) |
|---|---|---|---|
| Very Low (VL) | The information is not needed. | Not satisfactory, improvement is needed. | The attribute is not needed for analysis. |
| Low (L) | The information is desired, but not considered important. | Not satisfactory, improvement is desired. | The attribute can be considered but not required. |
| Medium (M) | The information is desired and is considered of value (if available). | Satisfactory, but could be improved. | The attribute is useful for making the analysis. |
| High (H) | The information is deemed as needed and is considered important. | Satisfaction is high. | The information on given attribute is needed for making the analysis. |
| Very High (VH) | The information is a must and should be provided with high accuracy. | Satisfaction is very high, with low scope for further improvement. | Cannot make a decision without information about this attribute. |
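The five levels form an ordinal scale, so answers can be ranked and summarized with a median but should not be averaged as if they were measurements. A minimal Python sketch of that idea follows. The names `SCALE`, `RANK`, `to_ranks`, and `median_rating` are illustrative assumptions, the example answers are hypothetical, and treating "-" as a missing answer is an assumption based on how the result tables mark empty cells.

```python
from statistics import median_low

# Five-point ordinal scale from the slide, lowest to highest.
SCALE = ["VL", "L", "M", "H", "VH"]
RANK = {label: i for i, label in enumerate(SCALE)}


def to_ranks(ratings):
    """Convert rating labels to integer ranks, skipping missing answers ("-")."""
    return [RANK[r] for r in ratings if r != "-"]


def median_rating(ratings):
    """Return the low median label of the answers, or None if all are missing."""
    ranks = to_ranks(ratings)
    if not ranks:
        return None
    # median_low always returns one of the observed ranks, so the result
    # maps back to a real label even for an even number of answers.
    return SCALE[median_low(ranks)]


# Hypothetical answers from four interviewees (one did not answer).
example = ["H", "M", "-", "VH"]
print(median_rating(example))  # prints "H"
```

The low median is used here so that the summary is always one of the five labels rather than a value between two levels.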

Page 11

Results: Information need and its importance for SDP

| Prediction needs w.r.t. software defects | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL) |
|---|---|---|---|---|
| Classification of defect prone files/modules | L | H | VH | VH |
| Expected number of defects in SW components | H | H | L | VH |
| Expected defect inflow for a project/release | H | H | L | VH |
| Release readiness/expected latent defects | H | VH | H | VH |
| Severity classification of defects | VH | M | H | H |

VCG, like most OEMs (Original Equipment Manufacturers) in the automotive domain, uses Model Based Development (MBD)

Assessing release readiness is important (rated High or Very High by all four interviewees) for both case units
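One way to read the ratings is to look for prediction needs that every interviewee rates High or above. The Python sketch below encodes the ratings from this slide by hand and applies that check. The names `SCALE`, `RANK`, `NEEDS`, and `rated_at_least` are illustrative assumptions and not part of the study, and the "High or above" threshold is just one possible reading of "important".

```python
# Five-point ordinal scale used in the study, lowest to highest.
SCALE = ["VL", "L", "M", "H", "VH"]
RANK = {label: i for i, label in enumerate(SCALE)}

# Ratings copied from the slide, in the order:
# VCG (QM), VCG (MetricsTL), Ericsson (QM), Ericsson (MetricsTL).
NEEDS = {
    "Classification of defect prone files/modules": ["L", "H", "VH", "VH"],
    "Expected number of defects in SW components": ["H", "H", "L", "VH"],
    "Expected defect inflow for a project/release": ["H", "H", "L", "VH"],
    "Release readiness/expected latent defects": ["H", "VH", "H", "VH"],
    "Severity classification of defects": ["VH", "M", "H", "H"],
}


def rated_at_least(ratings, threshold):
    """True if every rating is at or above the threshold label."""
    return all(RANK[r] >= RANK[threshold] for r in ratings)


# Needs that all four interviewees rate High or Very High.
consensus = [need for need, ratings in NEEDS.items()
             if rated_at_least(ratings, "H")]
print(consensus)  # prints ['Release readiness/expected latent defects']
```

Under this threshold, release readiness is the only need with agreement across both roles at both companies, which matches the note on the slide.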

Page 12

Results: Satisfaction with existing systems

| Factors: Satisfaction with existing systems | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL) |
|---|---|---|---|---|
| Status information | H | H | H | H |
| Trend visualization | H | M | M | H |
| Predictions accuracy | M | M | L | H |
| Cost (current costs are low) | VH | VH | - | VH |
| Reliability | VH | H | VH | M |

“Cost of obtaining results is very important factor and the current systems we use are very cheap to run and maintain” – QM at VCG.

Page 13

Results: Familiarity and competence with ML techniques

| Factors: Familiarity and competence with ML techniques | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL) |
|---|---|---|---|---|
| ML tried in previous project | L | L | - | M |
| Understanding of the technology | L | L | - | M |
| Ability to implement algorithms in-house | VL | M | - | M |
| Academic collaboration | M | H | - | M |
| Ability to interpret the results | H | H | - | M |
| Ability to assess quality of results | H | M | - | M |

Participating companies in the study show medium to high confidence with their ability to interpret the results from such analysis.

Page 14

Results: Perceived Benefits

| Factors: Perceived Benefits | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL) |
|---|---|---|---|---|
| Accuracy in predicting | H | H | VH | VH |
| Automation of pattern discovery | M | H | VH | VH |
| Adaptability to different data sets | M | H | VH | VH |
| Ability to handle large data | H | H | M | VH |
| Ability to generate new insights | H | M | H | H |

“When it comes to the benefits, accuracy and automation are the top priorities for us” – MetricsTL at Ericsson

Page 15

Results: Tool availability & External factors

| Factors: Tool availability & External factors | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL) |
|---|---|---|---|---|
| Compatibility with existing systems | M | L | H | VH |
| Availability of open source tools | L | H | M | VH |
| Low cost of obtaining results | VH | H | H | M |
| Support/consulting services | H | M | L | VL |
| Adoption by other industries | L | L | L | M |
| Use by competitors | H | M | L | M |

“Even if open source tools are available, we typically need a vendor in between to do tool integration, manage upgrades and do maintenance work – we do not have resources for that” – QM at VCG.

“We are not afraid of trying new things and being the first one, but if it is used in automotive sector and we have not tried it surely helps the case” – QM at VCG.

Page 16

Specific challenges in adopting ML techniques in industry for SDP

• Lack of information to make a strong business case

• Uncertainty about the applicability of ML when access to source code is not available

• How to adapt ML techniques for model-driven development

• How to effectively use text-based artefacts for SDP

• Uncertainty over where ML fits in the context of compliance with standards

Page 17

ML Adoption for SDP: Conclusions

ML-based techniques have high potential to aid companies in their SDP efforts

We identified a total of nine important factors and twenty-seven related attributes

The ML adoption framework helps increase our understanding of the factors and attributes relevant to industrial practitioners

The ML adoption framework will be useful for companies, researchers, and tool vendors

What are the factors that are important for companies to make an informed decision to adopt (or not adopt) ML algorithms for the purpose of software defect prediction (SDP)?
