Analysing Defect Data from
an Iterative Development
Process
Carina Andersson
Lund University
Outline
• Background and purpose
• Defect data collected
• Analysis and results
– Hypotheses
– Defect prediction models
• Conclusions
Introduction
• Purpose
– Quantified management decision support
– Understanding observed phenomena
• Methodology
– Case study with several cycles
– Flexible approach, scope adjusted continuously
[Figure: Set goals and scope → Collect data → Filter data → Analyze data → Present analysis/feedback]
Metrics and models in
software engineering
• How to measure software quality?
Quality = lack of “bugs”?
• Defect prediction models using
– Size and complexity
– Testing metrics (e.g. fault rate)
– Process quality data (e.g. CMM)
– Multivariate approaches
• Measurements: the key to achieving control of the development process
Data collection
• 3 projects
– Different views
• Complete project
• Function groups
• Iterative process
– Test activity
• Function test
• System test
• Operator acceptance
• Miscellaneous
• Defect data classification
– Date found
– Reporter
– Module
– Status
[Figure: timeline of technical activities (Dev., FT, ST, CAT) and management activities over time, with Alpha, Ship, UR1 and UR2 marked]
Hypotheses
1. Pareto distribution of faults
– Does the Pareto principle hold? I.e., do 20% of the software modules account for 80% of all defects detected?
2. Effect of module size
– Is a high number of faults in a module related to a large module size?
3. Persistence of fault-proneness
– Does a high number of faults in function test imply a high number of faults in system test?
4. Fault densities in corresponding test activities
– Do the test activities have the same fault distributions in different projects?
Fault distribution
• Pareto analysis
– Identifying areas that cause most of the problems
Figure shows
• Percentage of modules vs. percentage of faults
• All faults
– Similar behaviour for all projects
[Figure: Pareto diagram; x-axis: % of modules, y-axis: % of faults]
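The Pareto check described on this slide can be sketched in a few lines. This is a minimal illustration, not the analysis used in the case study: the function name and the fault counts are made up for the example.

```python
def fault_share_of_top_modules(faults_per_module, top_fraction=0.20):
    """Return the fraction of all faults found in the most fault-prone modules."""
    if not faults_per_module:
        raise ValueError("need at least one module")
    # Rank modules from most to least fault-prone.
    ranked = sorted(faults_per_module, reverse=True)
    # Number of modules in the top group (at least one).
    n_top = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:n_top]) / total if total else 0.0


# Made-up fault counts for ten hypothetical modules (not data from the case study).
example_faults = [40, 25, 8, 6, 5, 5, 4, 3, 2, 2]
share = fault_share_of_top_modules(example_faults)
print(f"Top 20% of modules hold {share:.0%} of the faults")
```

Calling the function over a range of `top_fraction` values gives the points of a Pareto diagram like the one on the slide.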
Module size’s effect
Larger modules → more faults?
Scatter plot of number of faults against LOC per module
– Each dot represents a module
– Showing total LOC (alternative: changed LOC)
In this case:
– No strong evidence that module size has a strong impact on fault density
[Figure: scatter plot; axes: Lines of code, Faults]
Module size’s effect
Larger modules → more faults?
LOC for ranking the most fault-prone modules?
Accumulated percentage of number of faults when modules are ordered with respect to LOC
In this case:
– LOC performs only moderately well at ranking the most fault-prone modules
[Figure: accumulated % of faults vs. % of modules; curves: all faults, LOC, changed LOC]
Persistence of fault-proneness
High number of faults in function test → high number of faults in system test?
Scatter plot of faults found by function test against faults found by system test
– Each dot represents a module
The scatter plot indicates that a relation could exist: the most fault-prone modules in function test also tend to be fault-prone in system test
[Figure: scatter plot; axes: FT, ST]
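The slide judges the FT/ST relation visually from a scatter plot. One way to put a number on such a relation is a rank correlation. The sketch below uses Spearman's coefficient, which is a swapped-in technique and not something the slides report; the fault counts are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-module fault counts (not data from the case study).
ft_faults = [60, 35, 20, 12, 8, 5, 3, 1]
st_faults = [30, 22, 5, 9, 4, 6, 0, 1]
rho = spearman(ft_faults, st_faults)
print(f"Spearman rank correlation FT vs. ST: {rho:.2f}")
```

A value close to 1 would support the persistence hypothesis; a value near 0 would not.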
Persistence of fault-proneness
Could fault-prone modules in system test be predicted from function test?
Alberg diagram of the accumulated percentage of faults in ST, with modules ordered by number of faults in ST vs. in FT
In this case:
– The 10% most fault-prone modules in FT are responsible for 40% of the faults in ST
– The 10% most fault-prone modules in ST are responsible for 58% of the faults in ST
[Figure: Alberg diagram; x-axis: % of modules, y-axis: % of accumulated faults in ST; curves: ST, FT]
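The two curves of an Alberg diagram can be computed as follows. This is a minimal sketch: `alberg_curve` is a hypothetical helper name and the fault counts are made up, so the percentages differ from the 40% and 58% reported above.

```python
def alberg_curve(order_by, st_faults):
    """Cumulative share of ST faults when modules are sorted by `order_by`, descending."""
    total = sum(st_faults)
    ranked = sorted(zip(order_by, st_faults), key=lambda pair: pair[0], reverse=True)
    curve, running = [], 0
    for _, st in ranked:
        running += st
        curve.append(running / total)
    return curve


# Made-up fault counts for ten hypothetical modules (not data from the case study).
ft = [50, 30, 20, 15, 10, 8, 5, 4, 2, 1]
st = [20, 35, 10, 5, 8, 6, 4, 2, 6, 4]

by_ft = alberg_curve(ft, st)  # modules ranked by FT faults (the predictor)
by_st = alberg_curve(st, st)  # modules ranked by ST faults (the best possible ranking)

# With ten modules, index 0 corresponds to the top 10% of modules.
print(f"Top 10% by FT: {by_ft[0]:.0%} of ST faults")
print(f"Top 10% by ST: {by_st[0]:.0%} of ST faults")
```

The curve ordered by ST is an upper bound: the closer the FT-ordered curve comes to it, the better FT faults rank the modules that turn out fault-prone in ST.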
Fault densities at
corresponding test activities
Do the test activities have the same fault densities in different projects?
In this case: the development process has proven stable with respect to fault densities
Percentage distribution of faults detected by each test activity, from project start until shipping:
Test activity   Project 1   Project 2   Project 3   Average
FT              67%         69%         62%         66%
ST              19%         25%         30%         25%
CAT             5%          2%          3%          3%
Misc.           9%          4%          5%          6%
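The averages in the table can be re-derived from the per-project percentages. The sketch below uses the figures from the slide; the spread (max minus min) is an added, informal way to look at stability and is not a measure the slides define.

```python
# Percentage of faults detected by each test activity, per project (from the table).
shares = {
    "FT": [67, 69, 62],
    "ST": [19, 25, 30],
    "CAT": [5, 2, 3],
    "Misc.": [9, 4, 5],
}

# Average share per activity, rounded to whole percent as in the table.
averages = {activity: round(sum(v) / len(v)) for activity, v in shares.items()}

# Spread across projects: an informal indicator of how stable each share is.
spread = {activity: max(v) - min(v) for activity, v in shares.items()}

# Each project's shares should add up to 100%.
project_totals = [sum(v[i] for v in shares.values()) for i in range(3)]

for activity in shares:
    print(f"{activity:6s} average {averages[activity]:3d}%  spread {spread[activity]:2d} points")
```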
Basic defect prediction model
• Assumptions: similar behaviour between projects (statistical significance)
– Fault densities of the test activities
– Share of function test defects detected at milestone Alpha: 80%
• Prediction model applied at Alpha to estimate total number of defects at ship
Test activity   Average
FT              66%
ST              25%
CAT             3%
Misc.           6%
$$X_{\text{Total,Ship}} = X_{\text{FT}} \cdot \frac{1}{\text{Share}_{\text{FT}}} \cdot \frac{1}{\text{Distr}_{\text{FT,Ship}}} = 100 \cdot \frac{1}{0.80} \cdot \frac{1}{0.66} \approx 189$$
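The arithmetic of the basic model can be sketched as a small function. The default values (80% of function test defects found by Alpha, 66% of all defects found in function test) are the ones given on the slide; the function and parameter names are made up for the example.

```python
def predict_total_defects_at_ship(ft_defects_at_alpha,
                                  share_ft_at_alpha=0.80,
                                  distr_ft_at_ship=0.66):
    """Estimate the total number of defects at ship from FT defects found at Alpha."""
    # Step 1: scale up to the expected number of FT defects at ship.
    ft_defects_at_ship = ft_defects_at_alpha / share_ft_at_alpha
    # Step 2: scale up from FT defects to defects from all test activities.
    return ft_defects_at_ship / distr_ft_at_ship


# Example from the slide: 100 function test defects found at Alpha.
estimate = predict_total_defects_at_ship(100)
print(f"Estimated total defects at ship: {estimate:.0f}")
```

The estimate is only as good as the two shares, which is why the model requires a process that behaves similarly from project to project.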
Software reliability growth models
• Reliability – a quality attribute
– “The probability of failure-free operation of a
software program for a specified time in a
specified environment” (Musa, 1987)
• Reliability growth models
– To estimate the software failure rate from
observed failures
– Support stop-test decisions
Applying
software reliability growth models
• Four models were applied to the failure data
– Concave: G-O, Yamada
– S-shaped: delayed S-shaped, Gompertz
– Basic model: (G-O)
• Selection criteria
– Stability: <5% from one week’s estimate to the next
– Curve fit: statistical test for goodness of fit (R2)
• Evaluation
– Predictive ability:
• Error (estimate – actual)
• Relative error (error/actual)
G-O mean value function: $\mu(t) = a\,(1 - e^{-bt})$
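The selection and evaluation criteria above can be sketched in code. Several things here are assumptions: the delayed S-shaped and Gompertz forms are the common textbook forms and are not confirmed by the slides, the stability check takes the 5% relative to the previous week's estimate, and all function names are made up. Fitting the parameters to failure data is left out.

```python
import math


def goel_okumoto(t, a, b):
    """G-O mean value function: a * (1 - exp(-b t)). Concave."""
    return a * (1.0 - math.exp(-b * t))


def delayed_s_shaped(t, a, b):
    """Delayed S-shaped model, textbook form (assumed): a * (1 - (1 + b t) exp(-b t))."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def gompertz(t, a, b, c):
    """Gompertz model, textbook form (assumed): a * b ** (c ** t), with 0 < b, c < 1."""
    return a * b ** (c ** t)


def r_squared(observed, fitted):
    """Goodness of fit: 1 - residual sum of squares / total sum of squares."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def is_stable(previous_estimate, current_estimate, tolerance=0.05):
    """Stability: week-to-week change below 5% (relative to last week; an assumption)."""
    return abs(current_estimate - previous_estimate) / previous_estimate < tolerance


def relative_error(estimate, actual):
    """Relative error as defined on the slide: (estimate - actual) / actual."""
    return (estimate - actual) / actual


# Illustrative values only (not data from the case study).
print(is_stable(100, 104))        # change of 4%
print(relative_error(110, 100))   # estimate 10% above the actual value
```

In the models, `a` is the expected total number of failures, so the weekly estimate checked by `is_stable` would be the fitted value of `a`.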
Software reliability growth models –
Behaviour of the models
Applied to system test failures
                 Delayed S-shaped   Gompertz
GOF (R2)         0.992              0.999
• Yamada: does not converge
• G-O: low R2 value
[Figure: cumulative defects vs. weeks; curves: Gompertz, delayed S-shaped, G-O]
Software reliability growth models –
System test failures
• Support release decisions
• Estimates for each week
• Based on observed system test failures
• Best fit: S-shaped models
• Relative error (selected models): 3-12%
[Figure: curves: Gompertz, delayed S-shaped]
Software reliability growth models –
Function test failures
• Violates basic assumptions!
• Based on observed function test failures
• Best fit: S-shaped models
• Relative error (selected models): 0.2-3%
• Useful for predicting the expected number of failures to detect
[Figure: curves: Gompertz, delayed S-shaped]
Summary – Predictions
• Basic model
– Easy to use
– Applied early
– Requires stable
process
• Software reliability
growth models
– Stabilizes rather late
– Low relative error
Conclusions
• Motivation for collecting various fault data
– Quality control and planning
– To build fault profiles to get fault and failure
predictions
– Enable us to evaluate the effectiveness of different
testing strategies
• Results for the hypotheses: case study evidence?
– To shed light on a number of issues