imperfect debugging in software reliability a bayesian approach

  • 7/27/2019 Imperfect Debugging in Software Reliability a Bayesian Approach

    The Institute for Integrating Statistics in Decision Sciences

    Technical Report TR-2011-1

    January 24, 2011

    Imperfect Debugging in Software Reliability: A Bayesian Approach

    Tevfik Aktekin

    Department of Decision Sciences

    University of New Hampshire

    Toros Caglar

    Department of Decision Sciences

    The George Washington University

    Imperfect Debugging in Software Reliability: A

    Bayesian Approach

    Tevfik Aktekin

    Department of Decision Sciences

    The George Washington University

    Toros Caglar

    Department of Decision Sciences

    The George Washington University

    January 19, 2011

    Abstract

    The objective of studying software reliability is to assist software engineers in understanding

    more of the probabilistic nature of software failures during the debugging stages and to construct

    reliability models. In this paper, we consider Bayesian modeling of the inter-failure times whose

    parameter space is evolving stochastically over time typically observed in software reliability

    applications. In doing so, we focus on the modeling of relevant parameters such as the fault

    detection rate per fault and the total number of faults that are both latent (unobservable)

    factors. We consider a model which can take into account imperfect debugging type of behavior

    under certain conditions. Furthermore, we investigate the existence of perfect vs. imperfect

    debugging in the light of data. In order to show how the proposed model works, we use real data

    based on which we obtain the predictive reliability functions after each testing stage, carry out

    inference on relevant model parameters and present additional insights from Bayesian analysis.

    1 Introduction and Overview

    Over the last two decades, more than a hundred software reliability models have been proposed

    by researchers as pointed out by both Singpurwalla and Wilson (1994) and Kuo and Yang (1996).

    Software reliability can be defined as the probability of not observing a failure for a specified

    time interval under certain conditions. Here, not observing a failure represents the fact that

    the software runs without any problems (which does not necessarily mean that the code does not

    contain any more bugs). Software failure mainly happens due to problems in the code which can

    be attributed to human error. In software reliability research, hard data are based on the failures,

    not on the errors left (bugs), since the former are observable quantities whereas the latter are not.

    Before the final release, the software passes through several stages of testing in which debugging is

    performed and its reliability is assessed after each stage. As pointed out by Singpurwalla and Wilson

    (1994), the testing stage of software development is considered to be the most expensive

    step. When the so-called software reliability is found to be adequate, the software is released. The

    question of when to release software given its reliability function is of utmost importance to software

    practitioners and requires proper modeling of relevant uncertain quantities which give birth to the

    software reliability function.

    The purpose of software testing is to detect software faults (bugs) inherent in the software

    code. A software fault is an error in the source-code, which can cause the software to fail when

    the program is executed. The testing stage consists of several consecutive program executions.

    Whenever a failure occurs, the software engineer attempts to fix the problem. When the problem

    is fixed and no new errors are introduced during debugging, then a perfect debugging is said to

    have occurred. Whereas, when new errors are introduced during debugging, then an imperfect

    debugging is said to have occurred. We note here that when imperfect debugging occurs, the

    software reliability is worsened whereas with perfect debugging the software becomes more reliable

    due to the fact that there are fewer bugs left in the software. There are several reliability models

    that assume either perfect or imperfect debugging, which we briefly summarize in the sequel.

    Most software reliability models can be classified into one of three categories: models that are

    based on failure rates of the inter-failure times, models based on number of failures and models

    based on the actual inter-failure times. The earliest software reliability model is due to Jelinski and

    Moranda (1972) (JM) where the focus is on the modeling of failure rates. Although the model is

    very simple and is based on several questionable assumptions, it is the building block for most of

    the modern software reliability models in the literature. The JM model assumes that the software

    contains an unknown number of bugs and upon software failure a bug is detected and corrected

    entirely, i.e. perfect debugging. The JM model assumes that every fault contributes equally to the

    failure rate at any stage of the testing and that each fault is removed permanently upon failure.

    Straightforward maximum likelihood estimation is used to estimate model parameters. Another

    model that is based on failure rates is the one introduced by Littlewood and Verrall (1973) (LV),

    where faults do not all contribute equally to the failure rate at any stage of the software testing, although

    each fault is still assumed to be removed permanently upon failure. For parameter estimation the

    LV model uses a combination of maximum likelihood and Bayes estimation methods. Bayesian

    extensions of both JM and LV models have been considered by Kuo and Yang (1995) and the LV

    model by Soyer and Mazzuchi (1988). More recently, Washburn (2006) introduces a generalization

    of the Jelinski-Moranda model by considering a negative binomial prior for the number of faults

    left in the software where perfect debugging is assumed.

    The earliest failure count model is due to Goel and Okumoto (1979) where a non-homogeneous

    Poisson process with mean value function $\Lambda(t) = a(1 - e^{-bt})$ is introduced. This model is considered

    to have given birth to most count models in software reliability, also referred to as NHPP models.

    Later, Musa and Okumoto (1984) propose another failure count model where a logarithmic Poisson

    execution time type of approach is considered. This model simply suggests that the rate at which

    failures occur exponentially decreases with the expected number of failures. Bayesian analysis

    of failure count models has been the subject of many studies. Kuo and Yang (1996)

    propose a unified approach to the non-homogeneous Poisson process models and carry out Bayesian

    inference via Markov chain Monte Carlo methods. In doing so, they consider modeling the epochs

    of failures via a general order statistics model or a record value statistics model. Campodonico

    and Singpurwalla (1995) discuss the incorporation of expert knowledge into the count processes

    typically used in software reliability such as the non-homogeneous Poisson process.

    Software reliability models based on the actual inter-failure times are scarcer in the literature.

    One such model is due to Singpurwalla and Soyer (1992), where the relationship between the inter-failure

    times is modeled via a power law, which reduces to a first order autoregressive linear

    model in logarithms. Coupled with an evolution equation on the parameter space, the authors

    obtain a Gaussian Kalman filter model. Dalal and Mallows (1988) investigate the optimal stopping

    during the testing stage of software development and Morali and Soyer (2003) propose a Bayesian

    state space model with analytically tractable properties for optimal stopping. There is also a

    considerable amount of work done in software reliability from an operational profile perspective,

    where an operational profile is defined as the set of all operations that a software is designed to perform and the occurrence

    probabilities of these operations. Ozekici et al. (2000), Ozekici and Soyer (2001) and Ozekici and

    Soyer (2003) study such operational profiles and develop optimal release strategies. In addition,

    thorough reviews of most software reliability models can be found in Xie (1991), Singpurwalla

    (1995) and Singpurwalla and Wilson (1999).

    In this paper we consider Bayesian modeling of the inter-failure times whose parameter space

    is evolving stochastically over testing stages. We define our uncertainty about these unknown

    parameters via probabilities; therefore, a Bayesian approach is the natural choice. This method

    allows us to sequentially update relevant model parameters as well as their predictive distributions

    such as the predictive reliability function which can be used in determining the optimal time to

    release software. In doing so we focus on the modeling of relevant parameters such as the fault

    detection rate per fault and the total number of faults that are both latent (unobservable) factors.

    Coupled with an exponential likelihood on the inter-failure times, we consider a model which can

    take into account imperfect debugging type of behavior under certain conditions typically observed

    in software reliability problems but quite scarce in the software reliability literature, see Ruggeri

    et al. (2010) for a recent study. Furthermore, we investigate the existence of perfect vs.

    imperfect debugging by comparing the in-sample fit of both models using methods which will be

    detailed in the sequel. In estimating model parameters, we use Markov chain Monte Carlo methods

    such as the Metropolis-Hastings and the Gibbs sampler or a combination of both, see Smith and

    Gelman (1992) and Chib and Greenberg (1995) for a detailed review of these methods. In order

    to show how the proposed model works, we use real software inter-failure time data, with which we

    predict the reliability function after each testing stage, carry out inference on the number of bugs

    left in the system and fault detection rate per fault after each stage of testing.

    There are many features of the Bayesian approach which would be of interest to software reliability

    practitioners. As pointed out by Lindley (1990), the Bayesian approach provides a coherent

    framework for making decisions under uncertainty and thus enables us to develop optimal strategies

    in releasing software. One of the attractive features of the Bayesian approach is its ability to allow

    the incorporation of expert knowledge via the prior distributions of relevant model parameters, see

    Campodonico and Singpurwalla (1995) for instance. Another important property is the straightforward

    manner in which it handles sequential updating of model parameters as one goes through

    the testing stages in the light of new information regarding failures.

    A synopsis of our study is as follows: In Section 2, we introduce a Bayesian model which can take

    into account the imperfect debugging phenomenon under certain conditions and discuss a method to

    compare models. An illustration of the proposed models is carried out in Section 3 via real software

    failure data where we discuss in-sample fit issues, convergence issues of the estimation method and

    the inference of relevant model parameters. Finally, in Section 4 we conclude with a summary of

    our findings and suggestions for future work.

    2 Proposed Model

    The modeling approach that we will develop in the sequel is based on the Jelinski-Moranda (JM)

    model, as is the case for most subsequent work in the software reliability literature. Two of the main

    assumptions of the JM model are that every fault contributes equally to the failure rate at any stage

    of the testing and that each fault is removed permanently upon failure. We will consider the case

    where each fault is removed permanently upon failure, along with the possibility of introducing

    new faults into the source code while conducting the debugging procedure upon failure. In other

    words, we consider scenarios where the so called imperfect debugging can occur which is scarce in

    the software reliability literature.

    Let us first introduce a set of relevant parameters whose inferential implications will be discussed

    in the sequel. Let $\lambda_i$ for $i = 1, \ldots, N$ be the fault detection rate per fault during the $i$th stage of

    testing, where $N$ represents the last stage of testing. In addition, let $\eta_i$ for $i = 1, \ldots, N$ represent the

    number of faults present in the software code during the $i$th stage of testing. Thus, the product

    $\lambda_i \eta_i$ represents the failure rate of the software during the $i$th stage of testing. We note here

    that both quantities are latent, i.e. unobservable quantities whose uncertainty will be defined via

    probability distributions. Both $\lambda_i$ and $\eta_i$ are functions of $i$; in other words, they evolve stochastically

    from testing stage to testing stage.

    2.1 Likelihood

    As is the case with most software reliability models, we assume that the inter-failure times that are

    observable quantities, $t_i$ for $i = 1, \ldots, N$, are exponentially distributed. Furthermore, given $\lambda_i$ and

    $\eta_i$, the $t_i$ are said to be conditionally independent. Therefore the likelihood function can be obtained

    as

    $$L(\lambda, \eta; D) = \prod_{i=1}^{N} \lambda_i \eta_i \exp\{-t_i \lambda_i \eta_i\}, \tag{2.1}$$

    where $\lambda = \{\lambda_1, \ldots, \lambda_N\}$, $\eta = \{\eta_1, \ldots, \eta_N\}$ and $D = \{t_1, \ldots, t_N\}$. Therefore the joint posterior of

    $\lambda$ and $\eta$ would be given by

    $$p(\lambda, \eta \mid D) \propto \prod_{i=1}^{N} \lambda_i \eta_i \exp\{-t_i \lambda_i \eta_i\}\, p(\lambda)\, p(\eta), \tag{2.2}$$

    where (2.2) will not be analytically available for any reasonable joint prior choice of $p(\lambda)$ and $p(\eta)$.

    Therefore, one can use Markov chain Monte Carlo (MCMC) methods such as the Metropolis-

    Hastings algorithm or the Gibbs sampler to obtain the posterior distributions of $\lambda$ and $\eta$. We

    discuss the implementation and implications of such MCMC methods in our numerical example

    section. Next we introduce modeling strategies for the $\lambda_i$'s and $\eta_i$'s.
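As a rough illustration of the exponential likelihood in (2.1), the following Python sketch evaluates its logarithm for given inter-failure times, fault detection rates per fault and fault counts. The function name `log_likelihood`, the argument names `lam` and `eta`, and the numerical values are hypothetical and are not taken from the report or from the JM data.

```python
import numpy as np

def log_likelihood(t, lam, eta):
    """Log of (2.1): sum over stages of log(lam_i * eta_i) - t_i * lam_i * eta_i."""
    t, lam, eta = (np.asarray(x, dtype=float) for x in (t, lam, eta))
    rate = lam * eta                      # failure rate of the software at each stage
    if np.any(rate <= 0):
        return -np.inf                    # outside the support of the exponential model
    return float(np.sum(np.log(rate) - t * rate))

# Hypothetical values for three testing stages (illustration only)
t = [9.0, 12.0, 11.0]          # inter-failure times
lam = [0.010, 0.009, 0.011]    # fault detection rate per fault
eta = [30, 29, 29]             # number of faults present
print(log_likelihood(t, lam, eta))
```

Working on the log scale avoids numerical underflow when the product in (2.1) runs over many stages.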

    2.2 Modeling the fault detection rate per fault, $\lambda_i$

    We believe that the fault detection rate per fault during the $i$th stage will be highly dependent on

    what happened during the $(i-1)$th debugging stage. Let the dependence structure of the $\lambda_i$'s be given

    by the following power law relationship

    $$\lambda_i = \lambda_{i-1}^{\beta}\, \epsilon_i, \quad \text{for } i = 1, \ldots, N, \tag{2.3}$$

    where $\epsilon_i$ is lognormally distributed as $\epsilon_i \sim LN(0, \tau)$. By taking the logarithms of (2.3), we can

    obtain the following linear model in logarithms

    $$\log(\lambda_i) = \beta \log(\lambda_{i-1}) + \nu_i, \quad \text{for } i = 1, \ldots, N, \tag{2.4}$$

    where $\nu_i = \log(\epsilon_i)$. (2.4) is a first order autoregressive process of the latent fault detection rates

    per fault in the log scale. Therefore using the Markov property, the conditional distributions of

    the $\log(\lambda_i)$'s can be written as

    $$\log(\lambda_i) \mid \log(\lambda_{i-1}), \beta \sim N(\beta \log(\lambda_{i-1}), \tau), \quad \text{for } i = 1, \ldots, N, \tag{2.5}$$

    where $\beta$ acts like a first order autoregressive coefficient and is assumed to be $U(a, b)$. The

    availability of the conditional distributions given by (2.5) is an attractive feature since, via the

    chain rule, they can be used as a product in (2.2) in lieu of $p(\lambda)$. The lognormal is a natural

    choice due to its well known properties and the availability of full conditionals ($\log(\lambda_i) \mid \log(\lambda_{i-1})$)

    for $i = 1, \ldots, N$.

    The relationship implied by (2.3) also dictates the type of debugging that occurs during the $i$th

    debugging stage. If for instance $\lambda_i < \lambda_{i-1}$, then perfect debugging is said to have occurred, whereas

    when $\lambda_i > \lambda_{i-1}$, imperfect debugging is said to have occurred. In other words, in the imperfect case, when a failure is

    detected at the $(i-1)$th failure epoch, a fault has been detected and repaired, however a new fault

    was introduced during the same debugging stage. A priori we assume that the $\lambda_i$'s are lognormally

    distributed and their dependence structure is given via the power law introduced in (2.3), where $\beta$

    determines on average how the fault detection rate per fault is changing from stage to stage. For

    instance, when $0 < \lambda_{i-1} < 1$ and $\beta > 1$ then perfect debugging tends to occur; conversely, when

    $0 < \beta < 1$ then imperfect debugging tends to occur. Thus, inference on $\beta$ and being able to make

    probabilistic arguments on $\beta$ would be of interest to software engineers in assessing the overall

    performance of the testing stage. For instance one can determine the probability of imperfect

    debugging via the posterior distribution of $\beta$, $P(0 < \beta < 1 \mid D)$, when $0 < \lambda_{i-1} < 1$. We discuss

    implications of such probabilistic arguments on $\beta$ in our numerical application.
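The role of the power-law exponent in (2.3) can be seen in a small simulation. The Python sketch below draws one path of fault detection rates and estimates the probability that the exponent lies in (0, 1) from a set of draws. It is only a sketch under assumptions: the names `simulate_rates` and `prob_imperfect` are hypothetical, the second lognormal parameter is treated here as the standard deviation `sigma` of the log-scale noise (the text does not say whether it is a variance or a precision), and the numbers are made up.

```python
import numpy as np

def simulate_rates(lam0, beta, sigma, n_stages, rng):
    """One path from (2.3): lam_i = lam_{i-1}**beta * eps_i, with log(eps_i) ~ N(0, sigma^2)."""
    log_lam = np.empty(n_stages + 1)
    log_lam[0] = np.log(lam0)
    for i in range(1, n_stages + 1):
        # log form (2.4): first order autoregression on the log rates
        log_lam[i] = beta * log_lam[i - 1] + rng.normal(0.0, sigma)
    return np.exp(log_lam)

def prob_imperfect(beta_samples):
    """Fraction of draws of the exponent that fall strictly between 0 and 1."""
    b = np.asarray(beta_samples, dtype=float)
    return float(np.mean((b > 0) & (b < 1)))

rng = np.random.default_rng(0)
# With 0 < lam0 < 1 and no noise, beta > 1 shrinks the rate and beta < 1 inflates it
print(simulate_rates(0.5, 2.0, 0.0, 3, rng))
print(simulate_rates(0.5, 0.5, 0.0, 3, rng))
print(prob_imperfect([0.90, 0.95, 1.02, 0.99]))
```

With the noise switched off, the two printed paths move in opposite directions, which is the perfect versus imperfect debugging distinction described above.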

    2.3 Modeling the total number of faults, $\eta_i$

    Another quantity of interest is the inherent total number of faults during the $i$th stage of debugging

    denoted by $\eta_i$ for $i = 1, \ldots, N$. We note here that the $\eta_i$'s are latent factors whose inference will be

    conditional on the fault detection rate at each stage of testing. Given the dependence structure

    previously introduced for the $\lambda_i$'s, it is natural to assume that the total number of faults left in the

    software code after the $i$th debugging, $\eta_i$, will be dependent on the previous stage. Conditional

    on whether perfect or imperfect debugging has occurred during the previous debugging stage, the

    total number of faults left in the software code is assumed to have the following structure

    $$\eta_i = \eta_{i-1} - \delta_i, \quad \text{for } i = 1, \ldots, N, \tag{2.6}$$

    where

    $$\delta_i = \begin{cases} 1, & \text{with probability } p(\lambda_i < \lambda_{i-1}) \\ 0, & \text{with probability } p(\lambda_i > \lambda_{i-1}) \end{cases} \quad \text{for } i = 1, \ldots, N.$$

    In (2.6), $\delta_i$ is a Bernoulli process whose probability of success is determined via the probability

    of perfect debugging, i.e. $p(\lambda_i < \lambda_{i-1})$. This structure ensures that when perfect debugging

    occurs during the $i$th debugging stage then $\eta_i$ goes down by one unit, since the fault that has

    caused the failure has been found and fixed. Whereas when imperfect debugging occurs $\eta_i$ stays

    the same, since the fault that has caused the failure has been found and fixed, however a new fault

    has been introduced while fixing the previous one. An important assumption that has been made

    with regard to the behavior of the $\eta_i$'s is that imperfect debugging introduces only a single fault

    and perfect debugging removes a single fault. That may or may not be the case in real life

    software debugging. That is, more than one fault can be introduced during an imperfect debugging

    and more than one fault can be repaired during a perfect debugging. Our model will assume that a

    single fault can be repaired or introduced during a debugging operation. An alternative to handle

    such a case is to assume a discrete Markov chain structure on $\eta_i$. Such a setup will not be addressed

    in this study since it will further complicate the estimation procedure and can be addressed in the

    future as a new problem.
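The fault-count evolution in (2.6) can be sketched in a few lines of Python. The example below first estimates the per-stage probability of perfect debugging as the share of draws in which the fault detection rate per fault decreased, then simulates one path of fault counts with Bernoulli decrements. The function names, the array layout and the numbers are hypothetical, and the sketch does not stop the count from falling below zero.

```python
import numpy as np

def perfect_debugging_prob(lam_samples):
    """lam_samples has shape (S, N + 1): S draws of the rates for stages 0..N.
    Returns, for stages 1..N, the share of draws with rate_i < rate_{i-1}."""
    lam_samples = np.asarray(lam_samples, dtype=float)
    return np.mean(lam_samples[:, 1:] < lam_samples[:, :-1], axis=0)

def simulate_fault_counts(eta0, p_perfect, rng):
    """One path from (2.6): the count drops by one whenever the Bernoulli draw succeeds."""
    p_perfect = np.asarray(p_perfect, dtype=float)
    delta = rng.random(p_perfect.size) < p_perfect      # Bernoulli(p_i) indicators
    return eta0 - np.concatenate(([0], np.cumsum(delta)))

rng = np.random.default_rng(0)
draws = [[1.0, 0.5, 0.7],      # hypothetical draws of the rates for stages 0, 1, 2
         [1.0, 0.8, 0.6]]
p = perfect_debugging_prob(draws)
print(p)
print(simulate_fault_counts(30, p, rng))
```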

    Furthermore, the initial number of inherent faults (denoted by $\eta_0$) is also unknown,

    and therefore needs to be defined via a probability distribution. We assume that $\eta_0$ follows a Poisson

    distribution with mean $\theta$. In addition, we introduce an extra level of hierarchy on $\theta$, which can be

    interpreted as the expected initial number of faults prior to testing. Typically, in software reliability

    applications the initial number of inherent faults is estimated via deterministic functions

    that are based on the total number of lines of source-code. Thus based on the structures introduced

    on the $\lambda_i$'s and $\eta_i$'s and the data at hand, we learn more about $\theta$ by carrying out inference via its posterior

    distribution. We assume a Gamma prior on $\theta$ and thus summarize the hierarchy on the inherent

    number of faults prior to testing as follows

    $$\eta_0 \sim \text{Poisson}(\theta) \tag{2.7}$$

    and

    $$\theta \sim \text{Gamma}(c, d). \tag{2.8}$$

    In our numerical example, we assume a flat Gamma prior for $\theta$ and point out how sensitive the

    model estimation is to changes in the hyper-parameters, $c$ and $d$.
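To get a feel for what the Poisson-Gamma hierarchy in (2.7) and (2.8) implies about the initial number of faults, one can simply draw from it. The Python sketch below assumes that the Gamma prior is parameterized by a shape `c` and a rate `d` (the text does not state the parameterization, so this is an assumption), and the hyper-parameter values are hypothetical.

```python
import numpy as np

def sample_initial_faults(c, d, size, rng):
    """Draw the Poisson mean from Gamma(shape=c, rate=d), then the initial fault count."""
    theta = rng.gamma(shape=c, scale=1.0 / d, size=size)   # rate d corresponds to scale 1/d
    return rng.poisson(theta)

rng = np.random.default_rng(0)
eta0 = sample_initial_faults(c=30.0, d=1.0, size=20000, rng=rng)
# Under this parameterization the prior mean is c/d and the prior variance is c/d + c/d**2
print(eta0.mean(), eta0.var())
```

Repeating the draw for several choices of `c` and `d` gives a quick, informal view of how strongly the hyper-parameters shape the prior on the initial fault count.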

    2.4 Predictive reliability function estimation

    As pointed out previously, prior to the final release the software passes through several stages of

    testing and its reliability is assessed after each stage. When the reliability of a piece of software

    is found to be adequate the software is released. Therefore, the optimal release of software is

    determined via the estimated predictive reliability function after the $(i-1)$th testing stage. Given

    the model parameters such as $\lambda_i$ and $\eta_i$, the predictive reliability function during the $i$th testing

    stage (given that the software has gone through $(i-1)$ stages of testing) can be obtained via

    $$R(t_i \mid D^{(i-1)}) = \int\!\!\int R(t_i \mid \lambda_i, \eta_i, D^{(i-1)})\, d\lambda_i\, d\eta_i, \tag{2.9}$$

    where $D^{(i-1)} = \{t_1, \ldots, t_{i-1}\}$ and the integration is understood to be with respect to the joint distribution of

    $\lambda_i$ and $\eta_i$ given $D^{(i-1)}$. Since Markov chain Monte Carlo methods will be used to generate

    samples of $\lambda_i$ and $\eta_i$, (2.9) can also be computed via the following

    $$R(t_i \mid D^{(i-1)}) \approx 1 - \frac{1}{S} \sum_{j=1}^{S} F(t_i \mid \lambda_i^{(j)}, \eta_i^{(j)}, D^{(i-1)}), \tag{2.10}$$

    where $S$ is the number of generated posterior samples and $\lambda_i^{(j)}$ and $\eta_i^{(j)}$ are the $j$th posterior samples

    given $D^{(i-1)}$. We note here that given $D^{(i-1)}$, only posterior samples $\lambda_{i-1}^{(j)}$ and $\eta_{i-1}^{(j)}$ would be

    available. However, one can use straight Monte Carlo in (2.3) in order to generate the $\lambda_i^{(j)}$'s and in (2.6)

    to generate the $\eta_i^{(j)}$'s. Once (2.10) is estimated it can easily be used as part of a software reliability

    optimal release scheme.
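The Monte Carlo average in (2.10) can be sketched as follows in Python. Starting from posterior draws for the previous stage, the sketch propagates each draw one stage forward through (2.3) and (2.6) and then averages the exponential survival probabilities. Several choices here are assumptions rather than the authors' implementation: the function and argument names are hypothetical, the lognormal noise is parameterized by a log-scale standard deviation `sigma`, the decrement in (2.6) is taken per draw as the indicator that the rate decreased, and the input draws are made up.

```python
import numpy as np

def predictive_reliability(t, lam_prev, eta_prev, beta, sigma, rng):
    """Approximate the predictive reliability at times t for the next stage, as in (2.10)."""
    lam_prev = np.asarray(lam_prev, dtype=float)
    eta_prev = np.asarray(eta_prev, dtype=float)
    beta = np.asarray(beta, dtype=float)
    eps = rng.lognormal(mean=0.0, sigma=sigma, size=lam_prev.shape)
    lam_next = lam_prev ** beta * eps                 # straight Monte Carlo in (2.3)
    delta = (lam_next < lam_prev).astype(float)       # 1 if this draw shows perfect debugging
    eta_next = eta_prev - delta                       # fault-count update (2.6)
    rate = lam_next * eta_next                        # failure rate for each draw
    t = np.atleast_1d(np.asarray(t, dtype=float))
    F = 1.0 - np.exp(-np.outer(t, rate))              # exponential cdf for each (t, draw) pair
    return 1.0 - F.mean(axis=1)                       # average over the S draws

rng = np.random.default_rng(1)
S = 5000
lam_prev = rng.uniform(0.005, 0.015, size=S)          # hypothetical posterior draws
eta_prev = rng.integers(10, 40, size=S)
beta = rng.uniform(0.90, 1.00, size=S)
print(predictive_reliability([1.0, 5.0, 10.0], lam_prev, eta_prev, beta, 0.1, rng))
```

The returned curve could then be compared against a target reliability level to decide whether another testing stage is needed.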

    2.5 Model comparison of perfect vs. imperfect debugging models

    In addition to the imperfect debugging model developed previously, using a similar structure we

    have also estimated a model which assumes perfect debugging, namely we have estimated a modified

    Bayesian version of Jelinski and Moranda (1972), the JM model. In doing so, we have assumed

    that all $\lambda_i$'s follow a common Gamma prior and adjusted the structure on the $\eta_i$'s as in (2.6) such that

    the term $\delta_i$ is replaced by 1, i.e. $\eta_i = \eta_{i-1} - 1$, for $i = 1, \ldots, N$. Thus, the perfect debugging model

    can be used as a benchmark for comparison purposes.

    In comparing the in-sample performance of the proposed models, we will use one of the most

    commonly used approximations of the Bayes factor for models estimated via MCMC. This measure is also

    referred to as the harmonic mean estimator. As summarized by Kass and Raftery (1995), the

    harmonic mean estimator can be obtained as

    $$p(D) = \left\{ \frac{1}{S} \sum_{j=1}^{S} p(D \mid \Theta^{(j)})^{-1} \right\}^{-1}, \tag{2.11}$$

    where $S$ is the number of iterations and $\Theta^{(j)}$ is the $j$th generated posterior sample vector. (2.11) is the

    harmonic mean of the likelihood values and can be used to compare the in-sample fit performance

    of the proposed models. For the proposed models, (2.11) can be computed as follows

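    $$p(D) = \left\{ \frac{1}{S} \sum_{j=1}^{S} \left\{ \prod_{i=1}^{N} p(t_i \mid \lambda_i^{(j)}, \eta_i^{(j)}) \right\}^{-1} \right\}^{-1}, \tag{2.12}$$

    where $p(t_i \mid \lambda_i^{(j)}, \eta_i^{(j)}) = \lambda_i^{(j)} \eta_i^{(j)} \exp\{-t_i \lambda_i^{(j)} \eta_i^{(j)}\}$ as in (2.1). Therefore, in comparing two models,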

    a higher p(D) value indicates a better fit. In our numerical example, p(D) is computed in the

    log-scale.
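Since the harmonic mean in (2.12) involves reciprocals of very small likelihood values, it is usually evaluated on the log scale. The Python sketch below shows one way this might be done with a log-sum-exp reduction. The function name `log_marginal_harmonic`, the array layout (one row per posterior draw, one column per stage) and the inputs are hypothetical.

```python
import numpy as np

def log_marginal_harmonic(t, lam_samples, eta_samples):
    """Log of the harmonic mean estimator (2.12) under the exponential likelihood (2.1)."""
    t = np.asarray(t, dtype=float)                                   # shape (N,)
    rate = np.asarray(lam_samples, float) * np.asarray(eta_samples, float)  # shape (S, N)
    loglik = np.sum(np.log(rate) - t * rate, axis=1)                 # log p(D | draw j)
    S = loglik.shape[0]
    # log p(D) = -log( (1/S) * sum_j exp(-loglik_j) ), computed stably
    return float(np.log(S) - np.logaddexp.reduce(-loglik))

rng = np.random.default_rng(2)
t = [9.0, 12.0, 11.0]                                   # hypothetical inter-failure times
lam = rng.uniform(0.005, 0.015, size=(1000, 3))          # hypothetical posterior draws
eta = rng.integers(10, 40, size=(1000, 3))
print(log_marginal_harmonic(t, lam, eta))
```

The difference between the values obtained for two models is then an estimate of the log Bayes factor, with the larger value indicating the better in-sample fit.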

    2.6 Summary of the imperfect debugging model

    Below is a concise summary of the material developed in Sections 2.1, 2.2 and 2.3 which we refer

    to as the imperfect debugging model in the sequel. For i = 1,...,N, the likelihood term for the

    inter-failure times is given by

    $$t_i \sim \text{Exp}(\lambda_i \eta_i).$$

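    The state evolution equation of the fault detection rate per fault, $\lambda_i$, is given by

    $$\lambda_i = \lambda_{i-1}^{\beta}\, \epsilon_i,$$

    where $\epsilon_i \sim LN(0, \tau)$ and $\beta \sim U(a, b)$.

    The state evolution equation of the inherent number of faults, $\eta_i$, is given by

    $$\eta_i = \eta_{i-1} - \delta_i,$$

    where

    $$\delta_i = \begin{cases} 1, & \text{with probability } p(\lambda_i < \lambda_{i-1}) \\ 0, & \text{with probability } p(\lambda_i > \lambda_{i-1}), \end{cases}$$

    with $\eta_0 \sim \text{Poisson}(\theta)$ and $\theta \sim \text{Gamma}(c, d)$.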

    3 Numerical Example

    The degree to which our software reliability model performs with actual data is crucial in evaluating

    its validity. The details of the dataset, the estimation implications, and a summary of our findings

    are discussed in the sequel.

    3.1 The dataset

    The numerical application of our model is carried out on the well known dataset first reported in

    Jelinski and Moranda (1972) referred to as the JM data in the sequel. The dataset consists of 31

    software inter-failure times, 26 of which were obtained during the production stage of debugging

    and the remaining 5 during the rest of the testing stage. In our example, all 31 inter-failure times

    were used. Soyer and Mazzuchi (1988) and Kuo and Yang (1996) also use the same dataset to

    illustrate their respective proposed models from a Bayesian point of view.

    3.2 Markov chain Monte Carlo implementation and convergence

    In order to obtain the posterior samples of model parameters, a combination of Markov chain Monte

    Carlo (MCMC) methods, namely Gibbs sampling and the Metropolis-Hastings algorithm, was used (see

    Smith and Gelman (1992) and Chib and Greenberg (1995) for a summary of most common practice

    in MCMC). In order to generate posterior samples for the proposed models (both imperfect and

    perfect debugging models), the WinBUGS software was used and the code is available via email

    upon request from the authors. We have assumed flat priors for model parameters when required.

    Three parallel chains were used with different initial points. The chains were run for 5,000 iterations

    as the burn-in period and 20,000 samples were collected with a thinning interval of 2. Below, we

    present the posterior sample output for relevant parameters and discuss further implications of

    findings. For the sake of preserving space we will omit a detailed summary of convergence for all

    relevant model parameters of the imperfect debugging model. We only present results for some of

    the parameters and note that similar results were obtained for the rest of the parameters.

    Figure 1: Trace plots (against iterations 5000 to 25000) for $\beta$, $\theta$, $\lambda_1$ and $\eta_1$ for the imperfect debugging model

    In obtaining the posterior samples, we did not encounter a problem of convergence in either

    model. This can informally be observed from the trace plots of the examples in Figure 1.

    A more formal way of assessing convergence can be carried out using the Brooks and Gelman

    plots and the shrink factor as discussed in Brooks and Gelman (1998). If the shrink factor, also

    referred to as the scale reduction point estimate, is around 1 then convergence is said to have been

    attained. The Brooks and Gelman plots are shown in Figure 2 where the shrink factor approaches

    1 as the number of iterations increases for the parameters $\beta$, $\theta$, $\lambda_1$ and $\eta_1$. The estimated shrink factor

    was 1 for $\beta$ and 1.01 for $\theta$. The individual shrink factors for the $\lambda_i$'s were within a range of 1.00 and 1.07

    with a multivariate shrink factor of 1.02. Similarly, the shrink factors of the $\eta_i$'s were all around 1.02 with a

    multivariate shrink factor of 1.01. Similar results were obtained for other parameters, therefore the

    convergence details of other parameters will be omitted from our discussion in order to preserve

    space. Thus, we can conclude that we did not encounter any convergence issues with either model.

    Figure 2: Brooks and Gelman plots (shrink factor, median and 97.5% quantile, against the last iteration in chain) for $\beta$, $\theta$, $\lambda_1$ and $\eta_1$ for the imperfect debugging model
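For readers who want to reproduce a shrink factor from raw chains, the Python sketch below computes the basic potential scale reduction factor for a single scalar parameter from several parallel chains. This is a simplified textbook form and omits the degrees-of-freedom correction of Brooks and Gelman (1998), so it should not be read as the exact quantity reported above. The function name `shrink_factor` and the simulated chains are hypothetical.

```python
import numpy as np

def shrink_factor(chains):
    """chains has shape (m, n): m parallel chains with n post-burn-in draws each."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()          # average within-chain variance
    B_over_n = chains.mean(axis=1).var(ddof=1)     # variance of the chain means (B / n)
    var_hat = (n - 1) / n * W + B_over_n           # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(3)
mixed = rng.normal(size=(3, 2000))                      # three chains exploring the same target
stuck = mixed + np.array([[0.0], [5.0], [10.0]])        # three chains stuck in different places
print(shrink_factor(mixed))    # close to 1 when the chains agree
print(shrink_factor(stuck))    # well above 1 when they do not
```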

    3.3 Summary of findings

    Next we discuss the implications of the posterior distributions obtained for the relevant model

    parameters. A boxplot based on the posterior samples of the fault detection rate per fault during

    each debugging stage, i.e. the $\lambda_i$'s, is shown in Figure 3. As introduced previously, when $\lambda_i < \lambda_{i-1}$

    perfect debugging is said to have occurred. Figure 3 shows how the fault detection rate per fault is

    changing over debugging stages. It can be fairly argued that during the first 15-16 testing stages the $\lambda_i$'s

    tend to go up, namely it becomes easier to find faults, and the $\lambda_i$'s tend to go down towards the end of

    the testing, in other words it becomes harder to find faults as more faults are discovered. However,

    while this pattern is occurring the fault detection rate per fault goes up/down as imperfect/perfect

    debugging occurs over the testing horizon. Another finding is that the uncertainty of the fault

    detection rate, based on the upper and lower quantiles of the $\lambda_i$'s, seems to be higher around the 15th

    and 16th testing stages and lower at the beginning and at the end of the testing. One of the

    advantages of the Bayesian approach is that it allows probabilistic arguments to be made about

    model parameters, for instance based on Figure 3 it would be straightforward to compute posterior

    probabilities such as $P(\lambda_i < \lambda_{i-1} \mid D)$ for $i = 2, \ldots, 31$, which indicate the probability of perfect

    debugging from stage to stage.

    The right panel of Figure 4 shows the posterior distribution of $\lambda_1$, the fault detection rate per fault

    during the first debugging stage (similar distributions were obtained for $\lambda_i$ for $i = 2, \ldots, 31$). The left

    panel of Figure 4 shows the posterior distribution of $\beta$ as introduced in (2.3) whose posterior mean,

    $E(\beta \mid D)$, was 0.96 with a standard deviation of 0.028. This indicates that on average imperfect

    debugging tends to occur during the 31 stages of testing.

    Figure 3: Boxplots of $\lambda_i$ for $i = 1, \ldots, 31$ (horizontal axis: debugging stages)

    Figure 4: Posterior distribution plots (densities) for the parameter introduced in (2.3) (left)
    and the fault detection rate per fault at the first stage (right) for the imperfect debugging
    model

    Figure 5: Probability of perfect debugging vs. debugging stages

    The posterior probability of perfect debugging at stage i, i.e. the posterior probability that
    the fault detection rate per fault at stage i is lower than at stage i-1, determines the
    probability of success of the Bernoulli process introduced in (2.6), for i = 1, . . . , 31.
    Figure 5 shows how the posterior mean of this Bernoulli indicator, i.e. the posterior probability
    of perfect debugging, behaves over the testing stages. When this probability is greater than 0.5,
    perfect debugging is said to be more likely than imperfect debugging. Accordingly, the stages
    located above 0.5 in Figure 5 can be categorized as the most likely perfect debugging scenarios,
    and those below 0.5 as the most likely imperfect debugging scenarios. This provides informal
    evidence of potential imperfect debugging behavior in the numerical example studied in this
    section. The number of errors in the system at each stage is in turn determined according to the
    structure introduced in (2.6). Figure 6 shows boxplots of the number of errors in the system at
    stage i, for i = 1, . . . , 31. The expected number of errors in the system after the first
    debugging stage was estimated to be 18.142, and the expected number after the thirty-first
    debugging stage was estimated to be 4.063. This is further evidence of imperfect debugging in our
    numerical example: if perfect debugging had occurred at every one of the 31 debugging stages, the
    expected number of faults would have gone down by 31 units, which was not found to be the case.
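The comparison in the preceding paragraph comes down to simple arithmetic on the two reported posterior means. The sketch below reproduces it; the variable names are illustrative, and the reading that one fault is removed at each perfectly debugged stage is an assumption taken from the 31-unit figure quoted in the text.

```python
# Posterior means reported in the text.
expected_faults_stage_1 = 18.142
expected_faults_stage_31 = 4.063

# Observed reduction in the expected number of faults between the two stages.
observed_reduction = expected_faults_stage_1 - expected_faults_stage_31

# Reduction quoted in the text for perfect debugging at all 31 stages
# (assumption: exactly one fault removed per perfectly debugged stage).
perfect_reduction = 31

share_of_perfect = observed_reduction / perfect_reduction

print(f"observed reduction: {observed_reduction:.3f} faults")
print(f"share of the all-perfect reduction: {share_of_perfect:.2f}")
```

The observed reduction is well under half of what debugging that was perfect at every stage would imply, which is the informal argument made in the text.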

    Figure 6: Boxplots of the number of errors in the system at stage i, for i = 1, . . . , 31
    (horizontal axis: debugging stages)

    The left panel of Figure 7 shows the posterior distribution of the latent hyper-parameter
    introduced in (2.8), i.e. the expected number of faults in the software prior to testing. Its
    posterior mean was 28.01, with a standard deviation of 9.023. In estimating this hyper-parameter,
    we used a flat Gamma prior, with initial values of 50, 75 and 100 for the three chains of the
    MCMC algorithm. We note that inference on the hyper-parameter was not sensitive to the choice of
    initial values. The right panel of Figure 7 shows the posterior distribution of the inherent
    number of faults in the software prior to testing, whose distribution is determined conditional
    on the hyper-parameter.

    Figure 7: Posterior distribution plots (densities) for the expected number of faults prior to
    testing (left) and the inherent number of faults prior to testing (right) for the imperfect
    debugging model

    3.4 Additional insights from Bayesian analysis

    In inferring whether perfect or imperfect debugging occurs in our numerical example, we have used
    the Bayes factor calculation introduced in (2.12). Table 1 shows the likelihood contributions on
    the log scale for the two candidate models, the imperfect debugging model and the perfect
    debugging model, as discussed in Section 2.5. The imperfect debugging model has the higher value
    of log{p(D)}, implying a Bayes factor, BF = p(D | Imperfect Debug) / p(D | Perfect Debug), well
    above 100, which according to Kass and Raftery (1995) shows decisive support in its favor. Thus,
    it can be argued that imperfect debugging as described in this study occurs in the numerical
    example at hand.

                     Perfect Debug    Imperfect Debug
    log{p(D)}        -191.0651        -108.58

    Table 1: log{p(D)} under each model
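The Bayes factor implied by Table 1 is easiest to work out on the log scale, since exponentiating each entry separately would underflow to numbers too small to compare reliably. The sketch below assumes that the tabulated values are natural logarithms and uses the threshold of 100 for decisive support quoted above; the variable names are illustrative.

```python
import math

# Values of log{p(D)} from Table 1 (assumption: natural logarithms).
log_p_perfect = -191.0651
log_p_imperfect = -108.58

# log BF = log p(D | Imperfect Debug) - log p(D | Perfect Debug)
log_bf = log_p_imperfect - log_p_perfect

# Express the Bayes factor on the base-10 scale to compare with 100 = 10^2.
log10_bf = log_bf / math.log(10)
decisive = log10_bf > 2

print(f"log BF = {log_bf:.4f}, log10 BF = {log10_bf:.2f}, decisive = {decisive}")
```

Working with the difference of logs keeps the calculation numerically stable even though the Bayes factor itself is astronomically large.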

    Another attractive feature of the Bayesian approach is the computation of the predictive
    reliability function. We have obtained the predictive reliability functions of the last three
    debugging stages (29, 30 and 31), shown in Figure 8, using

    \[
    R(t_i \mid D^{(i-1)}) = \frac{1}{S} \sum_{j=1}^{S}
        \exp\left\{ -t_i \cdot (\text{fault detection rate per fault})_i^{(j)}
                         \cdot (\text{number of faults})_i^{(j)} \right\}
    \tag{3.1}
    \]

    for i = 29, . . . , 31, where the posterior samples, indexed by j = 1, . . . , S, of the stage-i
    fault detection rate per fault and of the stage-i number of faults are obtained using straight
    Monte Carlo from the recursions (2.3) and (2.6), respectively. These predictive reliability
    distributions can easily be used as part of a larger software reliability optimal release
    problem.
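The Monte Carlo average in (3.1) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function name `predictive_reliability` and the synthetic draws are hypothetical, and in practice the two lists would hold the posterior draws generated from (2.3) and (2.6).

```python
import math
import random


def predictive_reliability(t, rate_draws, fault_draws):
    """Average exp(-t * rate * faults) over paired posterior draws, as in (3.1)."""
    total = 0.0
    for rate, faults in zip(rate_draws, fault_draws):
        total += math.exp(-t * rate * faults)
    return total / len(rate_draws)


# Synthetic stand-ins for S posterior draws (assumption: illustrative values only).
random.seed(1)
S = 5000
rate_draws = [random.gammavariate(4.0, 0.0025) for _ in range(S)]  # mean 0.01
fault_draws = [random.randint(1, 8) for _ in range(S)]

# Evaluate the predictive reliability function on a grid of mission times.
curve = [(t, predictive_reliability(t, rate_draws, fault_draws))
         for t in range(0, 51, 10)]
for t, r in curve:
    print(f"t = {t:2d}: R(t | D) = {r:.3f}")
```

Averaging over paired draws, rather than plugging in posterior means, carries the posterior uncertainty about both latent quantities into the reliability estimate.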

    Figure 8: Predictive reliability functions R(t|D) for i = 29, . . . , 31 for the imperfect
    debugging model (panels: after 28, 29 and 30 stages of testing; horizontal axis: t)

    4 Concluding Remarks

    In this study, we considered Bayesian analysis of the inter-failure times typically observed in
    software reliability. In doing so, we described our uncertainty about the relevant model
    parameters via probabilities and thus followed a Bayesian point of view. This allowed us to
    sequentially update model parameters as well as relevant predictive distributions, such as the
    predictive reliability function, which can be used in determining the optimal time to release
    software. We focused on the modeling of latent factors such as the fault detection rate per fault
    and the total number of faults, and introduced a model which can take into account imperfect
    debugging type of behavior under certain conditions. This phenomenon is typically observed in
    software reliability applications; however, its analysis is scarce in the literature. Furthermore,
    we investigated the existence of perfect vs. imperfect debugging by comparing the in-sample fit
    of both models, and found decisive evidence in favor of the imperfect debugging model.

    There are many attractive features of the Bayesian approach which would be of interest to
    software reliability practitioners. The Bayesian approach provides a coherent framework for
    making decisions under uncertainty and can thus easily be implemented in optimal software release
    strategies. Another important feature of the Bayesian approach is its ability to incorporate
    expert knowledge via the prior distributions of relevant model parameters, as emphasized by
    Washburn (2006). In addition, the straightforward manner in which it handles sequential updating
    of model parameters, as one goes through the testing stages in the light of new information
    regarding software failures, would be of interest to software reliability practitioners.

    We believe that there are a couple of areas in which future extensions of our proposed model are
    possible. One of the main assumptions of our model is that only one fault can be removed or
    introduced during a testing stage. This can be relaxed by introducing a Markov chain type of
    structure on the number of bugs repaired or introduced during the debugging stage instead of the
    current Bernoulli setup. Unfortunately, such a structure would overcomplicate the estimation
    procedure; it can be considered in a companion study in the future. Another potential extension
    is to introduce a state space evolution of the coefficient in the power law relationship between
    the inter-failure times, in other words to introduce a time dependent dynamic structure which is
    sequentially updated after each testing stage as new inter-failure time data are observed.


    References

    Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of iterative
    simulations. Journal of Computational and Graphical Statistics, 7(4):434-455.

    Campodonico, S. and Singpurwalla, N. D. (1995). Inference and predictions from Poisson point
    processes incorporating expert knowledge. Journal of the American Statistical Association,
    90(429):220-226.

    Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American
    Statistician, 49(4):327-335.

    Dalal, S. R. and Mallows, C. L. (1988). When should one stop testing software? Journal of the
    American Statistical Association, 83(403):872-879.

    Goel, A. L. and Okumoto, K. (1979). Time-dependent error detection rate model for software
    reliability and other performance measures. Reliability, IEEE Transactions on, 28:206-211.

    Jelinski, Z. and Moranda, P. (1972). Software reliability research. Statistical Computer
    Performance Evaluation, pages 465-484.

    Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical
    Association, 90(430):773-795.

    Kuo, L. and Yang, T. Y. (1995). Bayesian computation of software reliability. Journal of
    Computational and Graphical Statistics, 4(1):65-82.

    Kuo, L. and Yang, T. Y. (1996). Bayesian computation for nonhomogeneous Poisson processes in
    software reliability. Journal of the American Statistical Association, 91(434):763-773.

    Lindley, D. V. (1990). The present position in Bayesian statistics. Statistical Science,
    5(1):44-89.

    Littlewood, B. and Verrall, J. L. (1973). A Bayesian reliability growth model for computer
    software. Applied Statistics, 22(3):332-346.

    Morali, N. and Soyer, R. (2003). Optimal stopping in software reliability. Naval Research
    Logistics, 50(1):88-104.

    Musa, J. D. and Okumoto, K. (1984). A logarithmic Poisson execution time model for software
    reliability measurement. Proceedings of the 7th International Conference on Software Engineering;
    Orlando, Florida, pages 230-238.

    Ozekici, S., Altinel, I. K., and Ozcelikyurek, S. (2000). Testing software with an operational
    profile. Naval Research Logistics, 47:620-634.

    Ozekici, S. and Soyer, R. (2001). Bayesian testing strategies for software with an operational
    profile. Naval Research Logistics, 48:747-763.

    Ozekici, S. and Soyer, R. (2003). Reliability of software with an operational profile. European
    Journal of Operational Research, 149:459-474.

    Ruggeri, F., Pievatolo, A., and Soyer, R. (2010). A Bayesian hidden Markov model for imperfect
    debugging. Under review.

    Singpurwalla, N. (1995). Survival in dynamic environments. Statistical Science, 10(1):86-103.

    Singpurwalla, N. and Soyer, R. (1992). Non-homogeneous autoregressive processes for tracking
    (software) reliability growth, and their Bayesian analysis. Journal of the Royal Statistical
    Society: Series B (Methodological), 54(1):145-156.

    Singpurwalla, N. and Wilson, S. (1994). Software reliability modeling. International Statistical
    Review, 62(3):289-317.

    Singpurwalla, N. and Wilson, S. (1999). Statistical Methods in Software Engineering: Reliability
    and Risk. Springer.

    Smith, A. F. M. and Gelfand, A. E. (1992). Bayesian statistics without tears: A sampling
    perspective. The American Statistician, 46(2):84-88.

    Soyer, R. and Mazzuchi, T. A. (1988). A Bayes empirical Bayes model for software reliability.
    Reliability, IEEE Transactions on, 37(2):248-254.

    Washburn, A. (2006). A sequential Bayesian generalization of the Jelinski-Moranda software
    reliability model. Naval Research Logistics, 53(4):354-362.

    Xie, M. (1991). Software Reliability Modeling. World Scientific Publisher, Singapore.