Imperfect Debugging in Software Reliability: A Bayesian Approach
The Institute for Integrating Statistics in Decision Sciences
Technical Report TR-2011-1
January 24, 2011
Imperfect Debugging in Software Reliability: A Bayesian Approach
Tevfik Aktekin
Department of Decision Sciences
University of New Hampshire
Toros Caglar
Department of Decision Sciences
The George Washington University
Imperfect Debugging in Software Reliability: A
Bayesian Approach
Tevfik Aktekin
Department of Decision Sciences
The George Washington University
Toros Caglar
Department of Decision Sciences
The George Washington University
January 19, 2011
Abstract
The objective of studying software reliability is to assist software engineers in understanding
more of the probabilistic nature of software failures during the debugging stages and to construct
reliability models. In this paper, we consider Bayesian modeling of the inter-failure times whose
parameter space evolves stochastically over time, as is typically observed in software reliability
applications. In doing so, we focus on the modeling of relevant parameters such as the fault
detection rate per fault and the total number of faults that are both latent (unobservable)
factors. We consider a model which can take into account imperfect debugging type of behavior
under certain conditions. Furthermore, we investigate the existence of perfect vs. imperfect
debugging in the light of data. To show how the proposed model works, we use real data from
which we obtain the predictive reliability functions after each testing stage, carry out inference
on relevant model parameters, and present additional insights from the Bayesian analysis.
1 Introduction and Overview
Over the last two decades, more than a hundred software reliability models have been proposed
by researchers as pointed out by both Singpurwalla and Wilson (1994) and Kuo and Yang (1996).
Software reliability can be defined as the probability of not observing a failure for a specified
time interval under certain conditions. Here, not observing a failure represents the fact that
the software runs without any problems (which does not necessarily mean that the code does not
contain any more bugs). Software failure mainly happens due to problems in the code which can
be attributed to human error. In software reliability research, hard data are based on the failures,
not on the errors (bugs) left, since the former are observable quantities whereas the latter are not.
Before the final release, the software passes through several stages of testing in which debugging is
performed and its reliability is assessed after each stage. As pointed out by Singpurwalla and Wilson (1994), the testing stages during software development are considered to be the most expensive
step. When the software's reliability is found to be adequate, the software is released. The
question of when to release software given its reliability function is of utmost importance to software
practitioners and requires proper modeling of relevant uncertain quantities which give birth to the
software reliability function.
The purpose of software testing is to detect software faults (bugs) inherent in the software
code. A software fault is an error in the source-code, which can cause the software to fail when
the program is executed. The testing stage consists of several consecutive program executions.
Whenever a failure occurs, the software engineer attempts to fix the problem. When the problem
is fixed and no new errors are introduced during debugging, perfect debugging is said to have
occurred. When new errors are introduced during debugging, on the other hand, imperfect
debugging is said to have occurred. We note here that when imperfect debugging occurs, the
software reliability worsens, whereas with perfect debugging the software becomes more reliable,
since there are fewer bugs left in the software. There are several reliability models
that assume either perfect or imperfect debugging, which we briefly summarize in the sequel.
Most software reliability models can be classified into one of three categories: models that are
based on failure rates of the inter-failure times, models based on number of failures and models
based on the actual inter-failure times. The earliest software reliability model is due to Jelinski and
Moranda (1972) (JM) where the focus is on the modeling of failure rates. Although the model is
very simple and is based on several questionable assumptions, it is the building block for most of
the modern software reliability models in the literature. The JM model assumes that the software
contains an unknown number of bugs and upon software failure a bug is detected and corrected
entirely, i.e. perfect debugging. The JM model assumes that every fault contributes equally to the
failure rate at any stage of the testing and that each fault is removed permanently upon failure.
A straightforward maximum likelihood estimation is used to estimate model parameters. Another
model that is based on failure rates is the one introduced by Littlewood and Verrall (1973) (LV),
where faults do not contribute equally to the failure rate at a given stage of the software testing;
each fault is still assumed to be removed permanently upon failure. For parameter estimation, the
LV model uses a combination of maximum likelihood and Bayes estimation methods. Bayesian
extensions of both JM and LV models have been considered by Kuo and Yang (1995) and the LV
model by Soyer and Mazzuchi (1988). More recently, Washburn (2006) introduces a generalization
of the Jelinski-Moranda model by considering a negative binomial prior for the number of faults
left in the software, where perfect debugging is assumed.
The earliest failure count model is due to Goel and Okumoto (1979), where a non-homogeneous
Poisson process with intensity function λ(t) = a(1 − e^{−bt}) is introduced. This model is considered
to have given birth to most count models in software reliability, also referred to as NHPP models.
Later, Musa and Okumoto (1984) propose another failure count model where a logarithmic Poisson
execution time type of approach is considered. This model simply suggests that the rate at which
failures occur exponentially decreases with the expected number of failures. Bayesian analysis
of failure count models have been the subject of interest of many studies. Kuo and Yang (1996)
propose a unified approach to the non-homogeneous Poisson process models and carry out Bayesian
inference via Markov chain Monte Carlo methods. In doing so, they consider modeling the epochs
of failures via a general order statistics model or a record value statistics model. Campodonico
and Singpurwalla (1995) discuss the incorporation of expert knowledge into the count processes
typically used in software reliability, such as the non-homogeneous Poisson process.
Software reliability models based on the actual inter-failure times are scarcer in the literature.
One such model is due to Singpurwalla and Soyer (1992), where the relationship between the
inter-failure times is modeled via a power law, which boils down to a first order autoregressive linear
model in logarithms. Coupled with an evolution equation on the parameter space, the authors
obtain a Gaussian Kalman filter model. Dalal and Mallows (1988) investigate the optimal stopping
during the testing stage of software development and Morali and Soyer (2003) propose a Bayesian
state space model with analytically tractable properties for optimal stopping. There is also a
considerable amount of work done in software reliability from an operational profile perspective
which is defined as the set of all operations that a software is designed to perform and the occurrence
probabilities of these operations. Ozekici et al. (2000), Ozekici and Soyer (2001) and Ozekici and
Soyer (2003) study such operational profiles and develop optimal release strategies. In addition,
thorough reviews of most software reliability models can be found in Xie (1991), Singpurwalla
(1995) and Singpurwalla and Wilson (1999).
In this paper we consider Bayesian modeling of the inter-failure times whose parameter space
is evolving stochastically over testing stages. We define our uncertainty about these unknown
parameters via probabilities; therefore, a Bayesian approach is the natural choice. This method
allows us to sequentially update relevant model parameters as well as their predictive distributions
such as the predictive reliability function which can be used in determining the optimal time to
release software. In doing so we focus on the modeling of relevant parameters such as the fault
detection rate per fault and the total number of faults that are both latent (unobservable) factors.
Coupled with an exponential likelihood on the inter-failure times, we consider a model which can
take into account imperfect debugging type of behavior under certain conditions, a behavior
typically observed in software reliability problems but rarely addressed in the software reliability
literature; see Ruggeri et al. (2010) for a recent study. Furthermore, we investigate the existence of perfect vs.
imperfect debugging by comparing the in-sample fit of both models using methods which will be
detailed in the sequel. In estimating model parameters, we use Markov chain Monte Carlo methods
such as the Metropolis-Hastings and the Gibbs sampler or a combination of both, see Smith and
Gelman (1992) and Chib and Greenberg (1995) for a detailed review of these methods. To show
how the proposed model works, we use real software inter-failure time data, with which we
predict the reliability function after each testing stage and carry out inference on the number of
bugs left in the system and the fault detection rate per fault after each stage of testing.
There are many features of the Bayesian approach which would be of interest to software reli-
ability practitioners. As pointed out by Lindley (1990), the Bayesian approach provides a coherent
framework for making decisions under uncertainty and thus enables us to develop optimal strategies
in releasing software. One of the attractive features of the Bayesian approach is its ability to allow
the incorporation of expert knowledge via the prior distributions of relevant model parameters;
see Campodonico and Singpurwalla (1995) for instance. Another important property is the
straightforward manner in which it handles sequential updating of model parameters as one goes
through the testing stages in the light of new information regarding failures.
A synopsis of our study is as follows: In Section 2, we introduce a Bayesian model which can take
into account the imperfect debugging phenomenon under certain conditions and discuss a method to
compare models. An illustration of the proposed models is carried out in Section 3 via real software
failure data where we discuss in-sample fit issues, convergence issues of the estimation method and
the inference of relevant model parameters. Finally, in Section 4 we conclude with a summary of
our findings and suggestions for future work.
2 Proposed Model
The modeling approach that we will develop in the sequel is based on the Jelinski-Moranda (JM)
model, as is the case for most subsequent work in the software reliability literature. Two of the main
assumptions of the JM model are that every fault contributes equally to the failure rate at any stage
of the testing and that each fault is removed permanently upon failure. We will consider the case
where each fault is removed permanently upon failure, along with the possibility of introducing
new faults into the source code while conducting the debugging procedure upon failure. In other
words, we consider scenarios where the so called imperfect debugging can occur which is scarce in
the software reliability literature.
Let us first introduce a set of relevant parameters whose inferential implications will be discussed
in the sequel. Let λ_i for i = 1, . . . , N be the fault detection rate per fault during the ith stage of
testing, where N represents the last stage of testing. In addition, let θ_i for i = 1, . . . , N represent
the number of faults present in the software code during the ith stage of testing. Thus, the product
(λ_i θ_i) represents the failure rate of the software during the ith stage of testing. We note here
that both quantities are latent, i.e. unobservable, quantities whose uncertainty will be defined via
probability distributions. Both λ_i and θ_i are functions of i; in other words, they evolve stochastically
from testing stage to testing stage.
2.1 Likelihood
As is the case with most software reliability models, we assume that the inter-failure times, which
are observable quantities, t_i for i = 1, . . . , N, are exponentially distributed. Furthermore, given λ_i
and θ_i, the t_i are said to be conditionally independent. Therefore, the likelihood function can be
obtained as

L(λ, θ; D) = ∏_{i=1}^{N} λ_i θ_i exp{−t_i λ_i θ_i},   (2.1)

where λ = {λ_1, . . . , λ_N}, θ = {θ_1, . . . , θ_N} and D = {t_1, . . . , t_N}. Therefore, the joint posterior
of λ and θ would be given by

p(λ, θ|D) ∝ ∏_{i=1}^{N} λ_i θ_i exp{−t_i λ_i θ_i} p(λ) p(θ),   (2.2)

where (2.2) will not be analytically available for any reasonable joint prior choice of p(λ) and p(θ).
Therefore, one can use Markov chain Monte Carlo (MCMC) methods such as the Metropolis-
Hastings algorithm or the Gibbs sampler to obtain the posterior distributions of λ and θ. We
discuss the implementation and implications of such MCMC methods in our numerical example
section. Next we introduce modeling strategies for the λ_i's and θ_i's.
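The likelihood (2.1) is simple to evaluate numerically, which is all the MCMC schemes below need. The following sketch (Python, with purely hypothetical inter-failure times and parameter values, not taken from the paper) computes the log-likelihood, writing lam for the fault detection rates per fault and theta for the fault counts:

```python
import numpy as np

def log_likelihood(t, lam, theta):
    """Log of the exponential likelihood (2.1):
    sum_i [ log(lam_i * theta_i) - t_i * lam_i * theta_i ]."""
    t, lam, theta = map(np.asarray, (t, lam, theta))
    rate = lam * theta                      # stage-i failure rate
    return float(np.sum(np.log(rate) - t * rate))

# Hypothetical inter-failure times and parameter values (illustration only).
t = np.array([7.0, 11.0, 2.0])
lam = np.array([0.010, 0.012, 0.009])       # fault detection rates per fault
theta = np.array([30.0, 29.0, 29.0])        # fault counts
ll = log_likelihood(t, lam, theta)
```

In an MCMC implementation this function would be called once per posterior draw, with the λ and θ vectors supplied by the sampler.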
2.2 Modeling the fault detection rate per fault, λ_i
We believe that the fault detection rate per fault during the ith stage will be highly dependent on
what happened in the (i − 1)th debugging stage. Let the dependence structure of the λ_i's be given
by the following power law relationship

λ_i = λ_{i−1}^β ε_i, for i = 1, . . . , N,   (2.3)

where ε_i is lognormally distributed as ε_i ∼ LN(0, τ). By taking logarithms of (2.3), we can obtain
the following linear model in logarithms

log(λ_i) = β log(λ_{i−1}) + e_i, for i = 1, . . . , N,   (2.4)
where e_i = log(ε_i). (2.4) is a first order autoregressive process of the latent fault detection rates
per fault in the log scale. Therefore, using the Markov property, the conditional distributions of
the log(λ_i)'s can be written as

log(λ_i) | log(λ_{i−1}), β ∼ N(β log(λ_{i−1}), τ), for i = 1, . . . , N,   (2.5)

where β acts like a first order autoregressive coefficient and is assumed to be U(a, b). The
availability of the conditional distributions given by (2.5) is an attractive feature since, via the
chain rule, their product can be used in (2.2) in lieu of p(λ). The lognormal is a natural
choice due to its well known properties and the availability of the full conditionals (log(λ_i)|log(λ_{i−1}))
for i = 1, . . . , N.
The relationship implied by (2.3) also dictates the type of debugging that occurs during the ith
debugging stage. If, for instance, λ_i < λ_{i−1}, then perfect debugging is said to have occurred, whereas
when λ_i > λ_{i−1}, imperfect debugging is said to have occurred. In other words, when a failure is
detected at the (i − 1)th failure epoch, a fault has been detected and repaired, but a new fault
was introduced during the same debugging stage. A priori, we assume that the λ_i's are lognormally
distributed and their dependence structure is given via the power law introduced in (2.3), where β
determines on average how the fault detection rate per fault is changing from stage to stage. For
instance, when 0 < λ_{i−1} < 1 and β > 1, perfect debugging tends to occur; conversely, when
0 < β < 1, imperfect debugging tends to occur. Thus, inference on β and being able to make
probabilistic arguments about β would be of interest to software engineers in assessing the overall
performance of the testing stage. For instance, one can determine the probability of imperfect
debugging via the posterior distribution of β, P(0 < β < 1|D), when 0 < λ_{i−1} < 1. We discuss
implications of such probabilistic arguments on β in our numerical application.
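The power-law evolution above is easy to simulate forward, which also illustrates the role of β. The following sketch uses hypothetical values for the initial rate, β and the lognormal scale τ (treated here as the log-scale standard deviation), draws one trajectory of fault detection rates, and counts how often the rate moved down (perfect debugging) or up (imperfect debugging):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_lambda(lam0, beta, tau, n, rng):
    """Simulate the power-law evolution (2.3): lam_i = lam_{i-1}**beta * eps_i,
    with eps_i lognormal; tau is used as the log-scale standard deviation."""
    lam = [lam0]
    for _ in range(n):
        eps = rng.lognormal(mean=0.0, sigma=tau)
        lam.append(lam[-1] ** beta * eps)
    return np.array(lam)

# Hypothetical settings: with 0 < lam0 < 1, beta < 1 pushes the rate up on
# average (imperfect debugging), beta > 1 pushes it down (perfect debugging).
lam = simulate_lambda(lam0=0.01, beta=0.96, tau=0.05, n=31, rng=rng)
perfect = int(np.sum(lam[1:] < lam[:-1]))    # stages where the rate dropped
imperfect = int(np.sum(lam[1:] > lam[:-1]))  # stages where the rate rose
```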
2.3 Modeling the total number of faults, θ_i
Another quantity of interest is the inherent total number of faults during the ith stage of debugging,
denoted by θ_i for i = 1, . . . , N. We note here that the θ_i's are latent factors whose inference will be
conditional on the fault detection rate at each stage of testing. Given the dependence structure
previously introduced for the λ_i's, it is natural to assume that the total number of faults left in the
software code after the ith debugging, θ_i, will be dependent on the previous stage. Conditional
on whether perfect or imperfect debugging has occurred during the previous debugging stage, the
total number of faults left in the software code is assumed to have the following structure

θ_i = θ_{i−1} − δ_i, for i = 1, . . . , N,   (2.6)

where

δ_i = 1, with probability p(λ_i < λ_{i−1}),
δ_i = 0, with probability p(λ_i > λ_{i−1}), for i = 1, . . . , N.

In (2.6), δ_i is a Bernoulli process whose probability of success is determined via the probability
of perfect debugging, i.e. p(λ_i < λ_{i−1}). This structure makes sure that when perfect debugging
occurs during the ith debugging stage, θ_i goes down by one unit, since the fault that has
caused the failure has been found and fixed. When imperfect debugging occurs, θ_i stays
the same, since the fault that has caused the failure has been found and fixed, but a new fault
has been introduced while fixing the previous one. An important assumption made with regard
to the behavior of the θ_i's is that imperfect debugging introduces only a single fault and perfect
debugging removes only a single fault. That may or may not be the case in real life software
debugging: more than one fault can be introduced during an imperfect debugging, and more than
one fault can be repaired during a perfect debugging. Our model will assume that a single fault
can be repaired or introduced during a debugging operation. An alternative to handle such a case
is to assume a discrete Markov chain structure on θ_i. Such a setup will not be addressed in this
study, since it would further complicate the estimation procedure, and can be addressed in the
future as a new problem.
Furthermore, the initial number of inherent faults (denoted by θ_0) is also unknown and therefore
needs to be defined via a probability distribution. We assume that θ_0 follows a Poisson
distribution with mean Λ. In addition, we introduce an extra level of hierarchy on Λ, which can be
interpreted as the expected initial number of faults prior to testing. Typically, in software reliability
applications, the initial number of inherent faults is estimated via deterministic functions
that are based on the total number of lines of source code. Thus, based on the structures introduced
on the λ_i's and θ_i's and the data at hand, we learn more about Λ by carrying out inference via its posterior
distribution. We assume a Gamma prior on Λ and thus summarize the hierarchy on the inherent
number of faults prior to testing as follows

θ_0 ∼ Poisson(Λ)   (2.7)

and

Λ ∼ Gamma(c, d).   (2.8)

In our numerical example, we assume a flat Gamma prior for Λ and point out how sensitive the
model estimation is to changes in the hyper-parameters c and d.
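The hierarchy (2.6)-(2.8) can be sketched as a forward simulation. In the sketch below the hyper-parameter values and the perfect-debugging probability are hypothetical placeholders (in the model that probability comes from the posterior of the λ_i's), and the guard keeping the count non-negative is a practical addition rather than part of (2.6):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hierarchical prior on the initial fault count, as in (2.7)-(2.8):
# Lambda ~ Gamma(c, d) with shape c and rate d, then theta_0 ~ Poisson(Lambda).
c, d = 2.0, 0.1                       # hypothetical hyper-parameter values
Lam = rng.gamma(shape=c, scale=1.0 / d)
theta0 = int(rng.poisson(Lam))

def step_theta(theta_prev, p_perfect, rng):
    """One step of (2.6): theta_i = theta_{i-1} - delta_i, where delta_i is
    Bernoulli with success probability p(lam_i < lam_{i-1}).  The guard
    keeping the count non-negative is a practical addition, not part of (2.6)."""
    delta = (theta_prev > 0) and (rng.random() < p_perfect)
    return theta_prev - int(delta)

theta = [theta0]
for _ in range(31):
    # In the model, p_perfect comes from the posterior of the detection
    # rates; a fixed placeholder value is used in this sketch.
    theta.append(step_theta(theta[-1], p_perfect=0.6, rng=rng))
```

Each step removes at most one fault, mirroring the single-fault assumption discussed above.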
2.4 Predictive reliability function estimation
As pointed out previously, prior to the final release the software passes through several stages of
testing and its reliability is assessed after each stage. When the reliability of a piece of software
is found to be adequate, the software is released. Therefore, the optimal release of software is
determined via the estimated predictive reliability function after the (i − 1)th testing stage. Given
the model parameters λ_i and θ_i, the predictive reliability function during the ith testing
stage (given that the software has gone through (i − 1) stages of testing) can be obtained via

R(t_i|D^{(i−1)}) = ∫∫ R(t_i|λ_i, θ_i, D^{(i−1)}) p(λ_i, θ_i|D^{(i−1)}) dλ_i dθ_i,   (2.9)
where D^{(i−1)} = {t_1, . . . , t_{i−1}}. Since Markov chain Monte Carlo methods will be used to generate
samples of λ_i and θ_i, (2.9) can also be computed via the following

R(t_i|D^{(i−1)}) = 1 − (1/S) ∑_{j=1}^{S} F(t_i|λ_i^{(j)}, θ_i^{(j)}, D^{(i−1)}),   (2.10)

where S is the number of generated posterior samples and λ_i^{(j)} and θ_i^{(j)} are the jth posterior samples
given D^{(i−1)}. We note here that given D^{(i−1)}, only posterior samples λ_{i−1}^{(j)} and θ_{i−1}^{(j)} would be
available. However, one can use straight Monte Carlo in (2.3) to generate the λ_i^{(j)}'s and in (2.6)
to generate the θ_i^{(j)}'s. Once (2.10) is estimated, it can easily be used as part of a software reliability
optimal release scheme.
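Given posterior draws, (2.10) is a one-line Monte Carlo average; for the exponential likelihood, F(t|λ, θ) = 1 − exp{−t λ θ}, so the estimator reduces to averaging exp{−t_i λ_i^{(j)} θ_i^{(j)}}. A sketch with synthetic draws standing in for actual MCMC output (the sample distributions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def predictive_reliability(t, lam_samples, theta_samples):
    """Monte Carlo version of (2.10) for the exponential model: since
    F(t | lam, theta) = 1 - exp(-t * lam * theta), the estimator reduces
    to averaging exp(-t * lam_j * theta_j) over posterior samples."""
    lam_samples = np.asarray(lam_samples)
    theta_samples = np.asarray(theta_samples)
    return float(np.mean(np.exp(-t * lam_samples * theta_samples)))

# Synthetic draws standing in for MCMC output (hypothetical values).
S = 10_000
lam_s = rng.lognormal(mean=np.log(0.01), sigma=0.1, size=S)
theta_s = rng.poisson(lam=20, size=S).astype(float)
R = predictive_reliability(5.0, lam_s, theta_s)
```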
2.5 Model comparison of perfect vs. imperfect debugging models
In addition to the imperfect debugging model developed previously, using a similar structure we
have also estimated a model which assumes perfect debugging; namely, we have estimated a modified
Bayesian version of Jelinski and Moranda (1972), the JM model. In doing so, we have assumed
that all λ_i's follow a common Gamma prior and adjusted the structure on the θ_i's in (2.6) such that
the term δ_i is replaced by 1, i.e. θ_i = θ_{i−1} − 1, for i = 1, . . . , N. Thus, the perfect debugging model
can be used as a benchmark for comparison purposes.
In comparing the in-sample performance of the proposed models, we will use one of the most
commonly used Bayes factor approximations for models estimated via MCMC, also
referred to as the harmonic mean estimator. As summarized by Kass and Raftery (1995), the
harmonic mean estimator can be obtained as

p(D) = { (1/S) ∑_{j=1}^{S} p(D|Θ^{(j)})^{−1} }^{−1},   (2.11)

where S is the number of iterations and Θ^{(j)} is the jth generated posterior sample vector. (2.11) is the
harmonic mean of the likelihood values and can be used to compare the in-sample fit performance
of the proposed models. For the proposed models, (2.11) can be computed as follows

p(D) = { (1/S) ∑_{j=1}^{S} { ∏_{i=1}^{N} p(t_i|λ_i^{(j)}, θ_i^{(j)}) }^{−1} }^{−1},   (2.12)

where p(t_i|λ_i^{(j)}, θ_i^{(j)}) = λ_i^{(j)} θ_i^{(j)} exp{−t_i λ_i^{(j)} θ_i^{(j)}} as in (2.1). Therefore, in comparing two models,
a higher p(D) value indicates a better fit. In our numerical example, p(D) is computed in the
log scale.
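Computing (2.11) in the log scale avoids underflow of the likelihood values, since log p(D) = log S − logsumexp(−loglik) over the per-draw log-likelihoods. A minimal sketch using a numerically stable log-sum-exp:

```python
import numpy as np

def log_marginal_harmonic(loglik):
    """Harmonic mean estimator (2.11) in the log scale:
    log p(D) = log(S) - logsumexp(-loglik), where loglik[j] is the
    log-likelihood of the data at the jth posterior sample."""
    loglik = np.asarray(loglik, dtype=float)
    a = -loglik
    m = a.max()
    lse = m + np.log(np.sum(np.exp(a - m)))  # stable log-sum-exp of -loglik
    return float(np.log(loglik.size) - lse)

# With equal log-likelihood values the estimator returns that common value.
val = log_marginal_harmonic([-108.58] * 100)
```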
2.6 Summary of the imperfect debugging model
Below is a concise summary of the material developed in Sections 2.1, 2.2 and 2.3, which we refer
to as the imperfect debugging model in the sequel. For i = 1, . . . , N, the likelihood term for the
inter-failure times is given by

t_i ∼ Exp(λ_i θ_i).

The state evolution equation of the fault detection rate per fault, λ_i, is given by

λ_i = λ_{i−1}^β ε_i,

where ε_i ∼ LN(0, τ) and β ∼ U(a, b). The state evolution equation of the inherent number of
faults, θ_i, is given by

θ_i = θ_{i−1} − δ_i,

where

δ_i = 1, with probability p(λ_i < λ_{i−1}),
δ_i = 0, with probability p(λ_i > λ_{i−1}),

with θ_0 ∼ Poisson(Λ) and Λ ∼ Gamma(c, d).
3 Numerical Example
The degree to which our software reliability model performs with actual data is crucial in evaluating
its validity. The details of the dataset, the estimation implications, and a summary of our findings
are discussed in the sequel.
3.1 The dataset
The numerical application of our model is carried out on the well known dataset first reported in
Jelinski and Moranda (1972), referred to as the JM data in the sequel. The dataset consists of 31
software inter-failure times, 26 of which were obtained during the production stage of debugging
and the remaining 5 during the rest of the testing stage. In our example, all 31 inter-failure times
were used. Soyer and Mazzuchi (1988) and Kuo and Yang (1996) also use the same dataset to
illustrate their respective proposed models from a Bayesian point of view.
3.2 Markov chain Monte Carlo implementation and convergence
In order to obtain posterior samples of the model parameters, a combination of Markov chain Monte
Carlo (MCMC) methods, namely Gibbs sampling and Metropolis-Hastings, was used (see
Smith and Gelman (1992) and Chib and Greenberg (1995) for a summary of common practice
in MCMC). In order to generate posterior samples for the proposed models (both imperfect and
perfect debugging models), the WinBUGS software was used and the code is available via email
upon request from the authors. We have assumed flat priors for model parameters when required.
Three parallel chains with different initial points were used. The chains were run for 5,000 iterations
as the burn-in period, and 20,000 samples were collected with a thinning interval of 2. Below, we
present the posterior sample output for relevant parameters and discuss further implications of
findings. For the sake of preserving space we will omit a detailed summary of convergence for all
relevant model parameters of the imperfect debugging model. We only present results for some of
the parameters and note that similar results were obtained for the rest of the parameters.
Figure 1: Trace plots for β, Λ, λ_1 and θ_1 for the imperfect debugging model
In obtaining the posterior samples, we did not encounter a problem of convergence in either
model. This can informally be observed from the trace plots of the examples in Figure 1.
A more formal way of assessing convergence can be carried out using the Brooks and Gelman
plots and the shrink factor as discussed in Brooks and Gelman (1998). If the shrink factor, also
referred to as the scale reduction point estimate, is around 1 then convergence is said to have been
attained. The Brooks and Gelman plots are shown in Figure 2, where the shrink factor approaches
1 as the number of iterations increases for the parameters β, Λ, λ_1 and θ_1. The estimated shrink factor
was 1 for β and 1.01 for Λ. The individual shrink factors for the λ_i's were within a range of 1.00 to 1.07,
with a multivariate shrink factor of 1.02. Similarly, the θ_i's shrink factors were all around 1.02, with a
Figure 2: Brooks and Gelman plots for β, Λ, λ_1 and θ_1 for the imperfect debugging model
multivariate shrink factor of 1.01. Similar results were obtained for the other parameters; therefore,
their convergence details will be omitted from our discussion in order to preserve space. Thus, we
can conclude that we did not encounter any convergence issues with either model.
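The shrink factor reported above can be computed directly from the parallel chains. A minimal sketch of the potential scale reduction factor for a single scalar parameter, applied here to synthetic well-mixed chains rather than the paper's actual MCMC output:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction (shrink) factor for one scalar parameter;
    `chains` is an (m, n) array holding m parallel chains of length n."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(3)
# Three well-mixed chains drawn from the same distribution should give a
# shrink factor close to 1 (synthetic data, for illustration only).
chains = rng.normal(loc=0.0, scale=1.0, size=(3, 20_000))
rhat = gelman_rubin(chains)
```

Values well above 1 would indicate that the chains have not yet mixed over the same region of the parameter space.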
3.3 Summary of findings
Next we discuss the implications of the posterior distributions obtained for the relevant model
parameters. A boxplot based on the posterior samples of the fault detection rate per fault during
each debugging stage, i.e. the λ_i's, is shown in Figure 3. As introduced previously, when λ_i < λ_{i−1},
perfect debugging is said to have occurred. Figure 3 shows how the fault detection rate per fault
changes over the debugging stages. It can fairly be argued that during the first 15-16 testing stages
the λ_i's tend to go up, namely it becomes easier to find faults, and that towards the end of
the testing the λ_i's tend to go down; in other words, it becomes harder to find faults as more faults
are discovered. While this pattern unfolds, the fault detection rate per fault goes up/down as
imperfect/perfect debugging occurs over the testing horizon. Another finding is that the uncertainty
of the fault detection rate, based on the upper and lower quantiles of the λ_i's, seems to be higher
around the 15th and 16th testing stages and lower at the beginning and at the end of the testing.
One of the advantages of the Bayesian approach is that it allows probabilistic arguments to be made
about model parameters; for instance, based on Figure 3 it would be straightforward to compute
posterior probabilities such as P(λ_i < λ_{i−1}|D) for i = 2, . . . , 31, which indicate the probability of
perfect debugging from stage to stage.
The right panel of Figure 4 shows the posterior distribution of λ_1, the fault detection rate per fault
during the first debugging stage (similar distributions were obtained for λ_i for i = 2, . . . , 31). The left
Figure 3: Boxplots of λ_i for i = 1, . . . , 31
panel of Figure 4 shows the posterior distribution of β as introduced in (2.3), whose posterior mean
E(β|D) was 0.96, with a standard deviation of 0.028. This indicates that, on average, imperfect
debugging tends to occur during the 31 stages of testing.
The posterior probability of perfect debugging, P(λ_i < λ_{i−1}|D), determines the probability
of success of the Bernoulli process δ_i for i = 1, . . . , 31, as introduced in (2.6). Figure 5 shows
the behavior of the posterior mean of the δ_i's over the testing stages, i.e. the posterior probability of
perfect debugging. When the probability is greater than 0.5, perfect debugging is said to be more
likely to occur than imperfect debugging. In Figure 5, cases located above 0.5 can
be categorized as the most likely perfect debugging scenarios, whereas cases below 0.5 are
Figure 4: Posterior distribution plots for β (left) and λ_1 (right) for the imperfect debugging model
Figure 5: Probability of perfect debugging vs. debugging stages
the most likely imperfect debugging scenarios. This provides informal evidence in favor of potential
imperfect debugging behavior in the numerical example studied in this section. As a function of the
λ_i's, the number of errors in the system, the θ_i's, is determined according to the structure introduced
in (2.6). Figure 6 shows the boxplots of θ_i for i = 1, . . . , 31. The expected number of errors in the
system after the first debugging stage, E(θ_1|D), was estimated to be 18.142, and the expected
number of errors in the system after the thirty-first debugging stage, E(θ_31|D), was estimated to be 4.063.
This once more shows evidence in favor of the presence of imperfect debugging in our numerical
example, since if perfect debugging had occurred throughout the 31 stages of debugging, the expected
number of faults would have gone down by 31 units, which was not found to be the case in our example.
Figure 6: Boxplots of θ_i for i = 1, . . . , 31
The left panel of Figure 7 shows the posterior distribution of the latent factor introduced in
(2.8), i.e., the expected number of faults in the software prior to testing. Its posterior mean was
28.01, with a standard deviation of 9.023. In estimating this hyper-parameter, we used a flat Gamma
prior, with initial values of 50, 75, and 100 for the three chains of the MCMC algorithm. We note
here that the inference was not sensitive to the choice of initial values. The right panel of Figure
7 shows the posterior distribution of N_0, the inherent number of faults in the software prior to
testing, which is determined conditional on the hyper-parameter.
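A standard check for the three-chain setup described above is the potential scale reduction factor of Gelman and Rubin (see Brooks and Gelman, 1998). Below is a minimal sketch with simulated stand-in chains; the Gamma generator is an illustrative assumption roughly matched to the reported posterior mean 28.01 and standard deviation 9.023, not the paper's actual MCMC output.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in draws for the hyper-parameter from three chains (illustration only):
# here all chains are sampled from the same stationary distribution.
chains = [rng.gamma(shape=9.6, scale=2.9, size=2000) for _ in range(3)]

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for m chains of length n."""
    x = np.asarray(chains)
    m, n = x.shape
    w = x.var(axis=1, ddof=1).mean()    # within-chain variance
    b = n * x.mean(axis=1).var(ddof=1)  # between-chain variance
    var_hat = (n - 1) / n * w + b / n   # pooled variance estimate
    return np.sqrt(var_hat / w)

rhat = gelman_rubin(chains)
print(rhat)  # values near 1 indicate convergence
```

Running the diagnostic on each monitored parameter gives a quick sensitivity check across the three sets of initial values.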
Figure 7: Posterior distribution plots for the hyper-parameter (left) and N_0 (right) for the imperfect debugging model
3.4 Additional insights from Bayesian analysis
In inferring whether perfect or imperfect debugging occurs in our numerical example, we have used
the Bayes factor calculation introduced in (2.12). Table 1 shows the likelihood contributions on
the log scale for the two candidate models, the imperfect debugging model and the perfect debugging
model, as discussed in Section 2.5. The imperfect debugging model has the higher log-likelihood,
implying a Bayes factor BF = p(D|Imperfect Debug) / p(D|Perfect Debug) far greater than 100, which
according to Kass and Raftery (1995) shows decisive support in its favor. Thus, it can be argued
that imperfect debugging as described in this study occurs in the numerical example at hand.
              Perfect Debug    Imperfect Debug
log{p(D)}     -191.0651        -108.58

Table 1: log{p(D)} under each model
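Since Table 1 reports log marginal likelihoods, the Bayes factor is obtained by exponentiating their difference. A minimal sketch using the values in the table:

```python
import math

# Log marginal likelihoods from Table 1.
log_p_perfect = -191.0651
log_p_imperfect = -108.58

# Bayes factor for imperfect vs. perfect debugging, formed on the log scale
# to avoid underflow of the raw marginal likelihoods.
log_bf = log_p_imperfect - log_p_perfect
bf = math.exp(log_bf)

# On the Kass and Raftery (1995) scale, 2*log(BF) > 10 is "decisive" evidence.
print(log_bf, 2 * log_bf, bf > 100)
```

Working on the log scale matters here: the raw marginal likelihoods themselves would underflow double precision.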
Another attractive feature of the Bayesian approach is the computation of the predictive reliability
function. We have obtained the predictive reliability functions of the last three debugging stages
(29, 30, and 31), as shown in Figure 8. In doing so, we have used

R(t_i | D^(i-1)) = (1/S) * sum_{j=1}^{S} exp{-t_i * theta_i^(j) * N_i^(j)},   (3.1)

for i = 29, . . . , 31, where theta_i^(j) and N_i^(j) denote the j-th posterior samples of the fault
detection rate per fault and of the number of faults, obtained using straight Monte Carlo from the
evolution structures introduced in (2.3) and (2.6), respectively. These predictive reliability
distributions can easily be used as part of a larger software reliability optimal release
problem.
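Equation (3.1) is a plain Monte Carlo average over posterior draws. The sketch below assumes the draws of the per-fault detection rate and the fault count are available as arrays; the Gamma and Poisson generators and their parameters are illustrative stand-ins, not the paper's posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)
S = 5000  # number of posterior draws

# Stand-in posterior samples (illustrative assumptions only): in the paper these
# would be MCMC draws propagated through the evolution structures (2.3) and (2.6).
theta = rng.gamma(shape=2.0, scale=0.005, size=S)  # detection rate per fault
n_faults = rng.poisson(lam=5.0, size=S) + 1        # remaining number of faults

def predictive_reliability(t: float) -> float:
    """Monte Carlo estimate of R(t | D): average of exp(-t * theta_j * N_j)."""
    return float(np.mean(np.exp(-t * theta * n_faults)))

r10, r30, r50 = (predictive_reliability(t) for t in (10.0, 30.0, 50.0))
print(r10, r30, r50)  # reliability falls as the mission time t grows
```

Evaluating the function on a grid of t values reproduces curves of the kind shown in Figure 8, one per debugging stage.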
[Three panels: predictive reliability functions after 28, 29, and 30 stages of testing; t on the horizontal axis and R(t|D) on the vertical axis.]

Figure 8: Predictive reliability functions for i = 29, . . . , 31 for the imperfect debugging model
4 Concluding Remarks
In this study, we considered Bayesian analysis of the inter-failure times typically observed in
software reliability. In doing so, we described our uncertainty about the relevant model parameters
via probability distributions, thereby following a Bayesian point of view. This allowed us to
sequentially update the model parameters as well as relevant predictive distributions, such as the
predictive reliability function, which can be used in determining the optimal time to release
software. We focused on modeling latent factors such as the fault detection rate per fault and the
total number of faults, and introduced a model that can take imperfect debugging behavior into
account under certain conditions. This phenomenon is typically observed in software reliability
applications; however, its analysis is quite scarce in the literature. Furthermore, we investigated
the existence of perfect vs. imperfect debugging by comparing the in-sample fit of both models, and
found decisive evidence in favor of the imperfect debugging model.
The Bayesian approach has many attractive features for software reliability practitioners. It
provides a coherent framework for making decisions under uncertainty and thus can easily be
embedded in optimal software release strategies. Another important feature is its ability to
incorporate expert knowledge via the prior distributions of relevant model parameters, as
emphasized by Washburn (2006). In addition, the straightforward manner in which it handles
sequential updating of model parameters as one goes through the testing stages in the light of new information regarding software
failures would be of interest to software reliability practitioners.
We believe that there are a couple of areas in which future extensions of our proposed model are
possible. One of the main assumptions of our model is that only one fault can be removed or
introduced during a testing stage. This can be relaxed by introducing a Markov chain type of
structure on the number of bugs repaired or introduced during each debugging stage, instead of the
current Bernoulli setup. Unfortunately, such a structure would overcomplicate the estimation
procedure, and it can be considered in a companion study in the future. Another potential extension
is to introduce a state space evolution for the coefficient in the power law relationship between
the inter-failure times; in other words, a time-dependent dynamic structure that is sequentially
updated after each testing stage as new inter-failure time data are observed.
References
Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of iterative
simulations. Journal of Computational and Graphical Statistics, 7(4):434–455.

Campodonico, S. and Singpurwalla, N. D. (1995). Inference and predictions from Poisson point
processes incorporating expert knowledge. Journal of the American Statistical Association,
90(429):220–226.

Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American
Statistician, 49(4):327–335.

Dalal, S. R. and Mallows, C. L. (1988). When should one stop testing software? Journal of the
American Statistical Association, 83(403):872–879.

Goel, A. L. and Okumoto, K. (1979). Time-dependent error detection rate model for software
reliability and other performance measures. IEEE Transactions on Reliability, 28:206–211.

Jelinski, Z. and Moranda, P. (1972). Software reliability research. Statistical Computer
Performance Evaluation, pages 465–484.

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical
Association, 90(430):773–795.

Kuo, L. and Yang, T. Y. (1995). Bayesian computation of software reliability. Journal of
Computational and Graphical Statistics, 4(1):65–82.

Kuo, L. and Yang, T. Y. (1996). Bayesian computation for nonhomogeneous Poisson processes in
software reliability. Journal of the American Statistical Association, 91(434):763–773.

Lindley, D. V. (1990). The present position in Bayesian statistics. Statistical Science,
5(1):44–89.

Littlewood, B. and Verrall, J. L. (1973). A Bayesian reliability growth model for computer
software. Applied Statistics, 22(3):332–346.

Morali, N. and Soyer, R. (2003). Optimal stopping in software reliability. Naval Research
Logistics, 50(1):88–104.

Musa, J. D. and Okumoto, K. (1984). A logarithmic Poisson execution time model for software
reliability measurement. Proceedings of the 7th International Conference on Software Engineering,
Orlando, Florida, pages 230–238.

Ozekici, S., Altinel, I. K., and Ozcelikyurek, S. (2000). Testing software with an operational
profile. Naval Research Logistics, 47:620–634.

Ozekici, S. and Soyer, R. (2001). Bayesian testing strategies for software with an operational
profile. Naval Research Logistics, 48:747–763.

Ozekici, S. and Soyer, R. (2003). Reliability of software with an operational profile. European
Journal of Operational Research, 149:459–474.

Ruggeri, F., Pievatolo, A., and Soyer, R. (2010). A Bayesian hidden Markov model for imperfect
debugging. Under review.

Singpurwalla, N. (1995). Survival in dynamic environments. Statistical Science, 10(1):86–103.

Singpurwalla, N. and Soyer, R. (1992). Non-homogeneous autoregressive processes for tracking
(software) reliability growth, and their Bayesian analysis. Journal of the Royal Statistical
Society: Series B (Methodological), 54(1):145–156.

Singpurwalla, N. and Wilson, S. (1994). Software reliability modeling. International Statistical
Review, 62(3):289–317.

Singpurwalla, N. and Wilson, S. (1999). Statistical Methods in Software Engineering: Reliability
and Risk. Springer.

Smith, A. F. M. and Gelfand, A. E. (1992). Bayesian statistics without tears: A sampling-resampling
perspective. The American Statistician, 46(2):84–88.

Soyer, R. and Mazzuchi, T. A. (1988). A Bayes empirical Bayes model for software reliability. IEEE
Transactions on Reliability, 37(2):248–254.

Washburn, A. (2006). A sequential Bayesian generalization of the Jelinski-Moranda software
reliability model. Naval Research Logistics, 53(4):354–362.

Xie, M. (1991). Software Reliability Modeling. World Scientific Publisher, Singapore.