Analysing Defect Data from an Iterative Development Process

Carina Andersson

Lund University

Outline

• Background and purpose

• Defect data collected

• Analysis and results

– Hypotheses

– Defect prediction models

• Conclusions

Introduction

• Purpose

– Quantified management decision support

– Understanding observed phenomena

• Methodology

– Case study with several cycles

– Flexible approach, scope adjusted continuously

[Figure: study cycle: Set goals and scope → Collect data → Filter data → Analyze data → Present analysis/feedback]

Metrics and models in software engineering

• How to measure software quality?

Quality = lack of “bugs”?

• Defect prediction models using

– Size and complexity

– Testing metrics (e.g. fault rate)

– Process quality data (e.g. CMM)

– Multivariate approaches

• Measurements: the key to achieving control of the development process

Data collection

• 3 projects

– Different views

• Complete project

• Function groups

• Iterative process

– Test activity

• Function test

• System test

• Operator acceptance

• Miscellaneous

• Defect data classification

– Date found

– Reporter

– Module

– Status
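The classification fields listed above could be held in a small record type, as in the following sketch. The field `test_activity`, its values, and all sample records are assumptions made for illustration; the slides do not show the actual defect report format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DefectReport:
    """One defect report with the classification fields listed on the slide."""
    date_found: date
    reporter: str
    module: str
    status: str
    test_activity: str  # assumption: one of "FT", "ST", "CAT", "Misc."


def faults_per_module(reports):
    """Count defect reports per module."""
    return Counter(r.module for r in reports)


def faults_per_activity(reports):
    """Count defect reports per test activity."""
    return Counter(r.test_activity for r in reports)


# Made-up records, for illustration only.
reports = [
    DefectReport(date(2003, 3, 3), "tester_1", "mod_a", "closed", "FT"),
    DefectReport(date(2003, 3, 5), "tester_2", "mod_a", "open", "ST"),
    DefectReport(date(2003, 3, 9), "tester_1", "mod_b", "open", "FT"),
]

print(faults_per_module(reports))
print(faults_per_activity(reports))
```

Counts per module and per test activity are the inputs that the later analyses (Pareto analysis, fault distribution per test activity) would need.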

[Figure: project timeline; activities Dev., FT, ST and CAT plotted over time; points marked Alpha, Ship, UR1 and UR2; activities divided into management activities and technical activities]

Hypotheses

1. Pareto distribution of faults

– Does the Pareto principle hold, i.e. do 20% of the software modules account for 80% of all defects detected?

2. Module size’s effect

– Is a high number of faults in a module related to a large module size?

3. Persistence of fault-proneness

– Does a high number of faults in function test imply a high number of faults in system test?

4. Fault densities in corresponding test activities

– Do the test activities have the same fault distributions for different projects?

Fault distribution

• Pareto analysis

– Identifying areas that cause most of the problems

Figure shows

• Percentage of modules vs. percentage of faults

• All faults

– Similar behaviour for all projects

[Figure: Pareto curve; x-axis: % of modules, y-axis: % of faults]
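The Pareto question can be checked numerically by sorting modules by fault count and summing the share held by the top fraction. The sketch below is one way to do that; the function name and the fault counts are hypothetical and not taken from the studied projects.

```python
def fault_share_of_top_modules(faults_per_module, module_fraction=0.20):
    """Share of all faults found in the most fault-prone fraction of modules."""
    counts = sorted(faults_per_module.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always include at least one module.
    top_n = max(1, round(module_fraction * len(counts)))
    return sum(counts[:top_n]) / total


# Hypothetical fault counts for ten modules (100 faults in total).
faults = {
    "m01": 40, "m02": 25, "m03": 8, "m04": 7, "m05": 6,
    "m06": 5, "m07": 4, "m08": 3, "m09": 1, "m10": 1,
}

share = fault_share_of_top_modules(faults, 0.20)
print(f"Top 20% of modules hold {share:.0%} of the faults")
```

With this made-up data the two most fault-prone modules hold 65 of the 100 faults, so a strict 20/80 split would not be met.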

Module size’s effect

Larger modules → more faults??

Scatter plot of number of faults against LOC per module

– Each dot represents a module

– Showing total LOC (alternative: changed LOC)

In this case:

– No strong evidence that module size has a strong impact on fault density

[Figure: scatter plot; x-axis: lines of code, y-axis: faults]

Module size’s effect

Larger modules → more faults??

LOC for ranking the most fault-prone modules?

Accumulated percentage of number of faults when modules are ordered with respect to LOC

In this case:

– LOC performs only moderately well at ranking the most fault-prone modules

[Figure: Alberg diagram; x-axis: % of modules, y-axis: % of accumulated faults; curves: all faults, LOC, changed LOC]
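How well LOC ranks modules by fault count can also be summarised in a single number. The sketch below uses Spearman's rank correlation, which is a swapped-in technique: the slides rely on a scatter plot and an Alberg diagram, not on this coefficient. All data values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical LOC and fault counts for six modules.
loc = [120, 450, 300, 80, 900, 610]
faults = [3, 2, 9, 1, 7, 4]
print(f"Spearman rank correlation: {spearman(loc, faults):.2f}")
```

A value close to 1 would mean that ordering modules by LOC reproduces the ordering by fault count; a value near 0 would mean LOC carries little ranking information.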

Persistence of fault-proneness

High number of faults in function test → high number of faults in system test??

Scatter plot of faults found by function test against faults found by system test

– Each dot represents a module

The scatter plot indicates that a relation could exist: the most fault-prone modules in function test tend also to be fault-prone in system test

[Figure: scatter plot of faults per module; axes: FT and ST]

Persistence of fault-proneness

Could fault-prone modules in system test be predicted in function test??

Alberg diagram: accumulated percentage of faults in ST when modules are ordered by their number of faults in ST and by their number of faults in FT

In this case:

– The 10% most fault-prone modules in FT are responsible for 40% of the faults in ST

– The 10% most fault-prone modules in ST are responsible for 58% of the faults in ST

[Figure: Alberg diagram; x-axis: % of modules, y-axis: % of accumulated faults in ST; curves: ST, FT]
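The Alberg diagram described above can be computed by ordering modules by a predictor (here FT faults) and accumulating the actual ST faults in that order. The following is a minimal sketch; the function name and the fault counts are hypothetical.

```python
def alberg_curve(predictor, actual):
    """Cumulative share of `actual` faults, modules ordered by descending `predictor`."""
    modules = sorted(actual, key=lambda m: predictor.get(m, 0), reverse=True)
    total = sum(actual.values())
    curve, running = [], 0
    for module in modules:
        running += actual[module]
        curve.append(running / total)
    return curve


# Hypothetical faults per module in function test (FT) and system test (ST).
ft_faults = {"a": 30, "b": 5, "c": 12, "d": 1, "e": 2}
st_faults = {"a": 10, "b": 8, "c": 4, "d": 2, "e": 1}

by_ft = alberg_curve(ft_faults, st_faults)   # prediction from FT
by_st = alberg_curve(st_faults, st_faults)   # best possible ordering

for n, (p, best) in enumerate(zip(by_ft, by_st), start=1):
    print(f"top {n} modules: FT ordering {p:.0%}, ST ordering {best:.0%}")
```

The curve ordered by ST faults is the upper bound; the closer the FT-ordered curve lies to it, the better function test results predict fault-prone modules in system test.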

Fault densities at corresponding test activities

Do the test activities have the same fault densities for different projects??

In this case: The development process has been shown to be stable with respect to fault densities

Percentage distribution of faults detected by each test activity, from project start until shipping:

Test activity   Project 1   Project 2   Project 3   Average
FT              67%         69%         62%         66%
ST              19%         25%         30%         25%
CAT             5%          2%          3%          3%
Misc.           9%          4%          5%          6%
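The averages in the table can be recomputed from the per-project percentages. The sketch below does this and adds the spread (maximum minus minimum) per test activity as a rough indicator of stability; the spread is an illustrative addition, not a measure used on the slides.

```python
# Percentages as reported in the table (Project 1, Project 2, Project 3).
fault_share = {
    "FT": [67, 69, 62],
    "ST": [19, 25, 30],
    "CAT": [5, 2, 3],
    "Misc.": [9, 4, 5],
}


def summarize(shares):
    """Rounded average and spread (max - min) per test activity."""
    summary = {}
    for activity, values in shares.items():
        summary[activity] = {
            "average": round(sum(values) / len(values)),
            "spread": max(values) - min(values),
        }
    return summary


summary = summarize(fault_share)
for activity, stats in summary.items():
    print(f"{activity:6s} average {stats['average']:3d}%  spread {stats['spread']:2d} points")
```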

Basic defect prediction model

• Assumptions: similar behaviour between projects (statistical significance)

– Fault densities of the test activities

– Share of function test defects detected at milestone Alpha: 80%

• Prediction model applied at Alpha to estimate total number of defects at ship

Test activity   Average
FT              66%
ST              25%
CAT             3%
Misc.           6%

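$$X_{Total,Ship} = X_{FT} \cdot \frac{1}{\mathit{Share}_{FT}} \cdot \frac{1}{\mathit{Distr}_{FT,Ship}} = 100 \cdot \frac{1}{0.80} \cdot \frac{1}{0.66} \approx 189$$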
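The basic prediction model amounts to two divisions: scale the function test defects seen at Alpha up to the expected function test total, then scale that up to all test activities. The sketch below uses the slide's figures (80% of FT defects found at Alpha, FT accounting for 66% of all defects at ship); the function and parameter names are hypothetical.

```python
def predict_total_defects_at_ship(ft_defects_at_alpha,
                                  share_ft_at_alpha=0.80,
                                  ft_share_of_total=0.66):
    """Estimate the total number of defects at ship from the FT defects found at Alpha."""
    # Step 1: expected number of FT defects over the whole project.
    expected_ft_total = ft_defects_at_alpha / share_ft_at_alpha
    # Step 2: scale from FT defects to defects from all test activities.
    return expected_ft_total / ft_share_of_total


estimate = predict_total_defects_at_ship(100)
print(f"Estimated defects at ship: {estimate:.0f}")
```

With 100 function test defects at Alpha the estimate is about 189 defects at ship. The estimate is only as good as the two shares, which is why the model requires a process that behaves similarly from project to project.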

Software reliability growth models

• Reliability – a quality attribute

– “The probability of failure-free operation of a software program for a specified time in a specified environment” (Musa, 1987)

• Reliability growth models

– To estimate the software failure rate from

observed failures

– Support stop-test decisions

Applying software reliability growth models

• Four models were applied to the failure data

– Concave: G-O, Yamada

– S-shaped: delayed S-shaped, Gompertz

– Basic model: (G-O)

• Selection criteria

– Stability: <5% from one week’s estimate to the next

– Curve fit: statistical test for goodness of fit (R²)

• Evaluation

– Predictive ability:

• Error (estimate – actual)

• Relative error (error/actual)

$$\mu(t) = a\,(1 - e^{-bt})$$
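The expected cumulative number of failures under three of the applied models can be written as short functions. The sketch below uses the commonly cited forms of the G-O, delayed S-shaped and Gompertz models, plus one common definition of R²; the exact parameterisation and goodness-of-fit test used in the study are not shown on the slides, and all parameter values here are hypothetical.

```python
import math


def goel_okumoto(t, a, b):
    """G-O (concave): a * (1 - exp(-b*t)); a = expected total failures."""
    return a * (1.0 - math.exp(-b * t))


def delayed_s_shaped(t, a, b):
    """Delayed S-shaped: a * (1 - (1 + b*t) * exp(-b*t))."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def gompertz(t, a, b, c):
    """Gompertz: a * b**(c**t), with 0 < b < 1 and 0 < c < 1."""
    return a * b ** (c ** t)


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical parameters: 200 expected failures in total.
weeks = range(0, 31, 5)
for t in weeks:
    print(t,
          round(goel_okumoto(t, 200, 0.15), 1),
          round(delayed_s_shaped(t, 200, 0.3), 1),
          round(gompertz(t, 200, 0.01, 0.8), 1))
```

In all three models the parameter `a` is the value the curve approaches as testing continues, i.e. the estimated total number of failures; fitting the parameters to observed weekly data would require a least-squares or maximum-likelihood routine, which is left out here.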

Software reliability growth models – Behaviour of the models

Applied to system test failures

            Delayed S-shaped   Gompertz
GOF (R²)    0.992              0.999

• Yamada: does not converge

• G-O: low R² value

[Figure: cumulative defects vs. weeks (0-30); curves: Gompertz, Delayed S-shaped, G-O]

Software reliability growth models – System test failures

• Support release decisions

•Estimates for each week

•Based on observed system test failures

•Best fit: S-shaped models

•Relative error (selected models): 3-12%
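The stability criterion and the relative error used to select and evaluate the models are simple to compute from a series of weekly estimates. The sketch below assumes that the 5% change is measured relative to the previous week's estimate; the function names and numbers are hypothetical.

```python
def is_stable(weekly_estimates, tolerance=0.05):
    """True if the latest estimate changed by less than `tolerance` from the previous week."""
    if len(weekly_estimates) < 2:
        return False
    previous, latest = weekly_estimates[-2], weekly_estimates[-1]
    return abs(latest - previous) / previous < tolerance


def relative_error(estimate, actual):
    """Relative error as defined on the slide: (estimate - actual) / actual."""
    return (estimate - actual) / actual


# Hypothetical weekly estimates of the total number of failures.
estimates = [150, 180, 195, 200]
actual_total = 194

print("stable:", is_stable(estimates))
print(f"relative error: {relative_error(estimates[-1], actual_total):.1%}")
```

A positive relative error means the model overestimated the number of failures, a negative one that it underestimated.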

Software reliability growth models – Function test failures

• Violates basic assumptions!

•Based on observed function test failures

•Best fit: S-shaped models

•Relative error (selected models): 0.2-3%

•Useful for predicting the expected number of failures to detect

[Figure legend: Gompertz, Delayed S-shaped]

Summary – Predictions

• Basic model

– Easy to use

– Applied early

– Requires stable

process

• Software reliability

growth models

– Stabilizes rather late

– Low relative error

Conclusions

• Motivation for collecting various fault data

– Quality control and planning

– To build fault profiles to get fault and failure predictions

– Enable us to evaluate the effectiveness of different testing strategies

• Results for the hypotheses: case study evidence?

– To shed light on a number of issues
