
Effect of Fault Distribution and Execution Patterns on Fault Exposure in Software: A Simulation Study

A. von Mayrhauser (Department of Computer Science, Colorado State University, Fort Collins, CO 80523)
Dexing Chen (Open View Software Division, Hewlett Packard Company, Fort Collins, CO 80525)

Abstract

Execution patterns and fault distribution characteristics of a program will affect the failure process and thus reliability estimates. The failure process of a software system is influenced by many factors, and traditional software reliability engineering has found it difficult to isolate the effect of each individual factor. A simulation approach is used to investigate the effects of fault distribution, execution pattern, and program structure on software reliability estimates. A reliability simulation environment (RSIM) is extended by introducing variable fault distribution patterns in its code generation phase. Flow control points allow varying the execution frequency of different parts of a program. The simulation results show that fault distribution patterns and execution patterns have dramatic effects on fault exposure rate. If the fault distribution is non-uniform, a non-uniform code execution exposes faults more efficiently and effectively than uniform execution. Results also show that the structure of a program affects the fault exposure rate and the testing time required.

Keywords: Test execution patterns, fault exposure behavior, program simulation.


1 Introduction

As dependence on software grows, the need for reliable software and the need to be able to quantify this reliability are increasing. The reliability of a program influences software release decisions and is used as an indicator of software quality in operation [16]. Therefore, one must be able to accurately estimate its reliability. It is only a matter of time before certain software products will come with warranties instead of the current disclaimers. A quantification of software reliability is likely to be used in calculations that determine the conditions of the warranty. Thus, the ability to accurately quantify software reliability is and will remain an important issue.

Many models have been developed under a variety of assumptions [6]. However, the realism of many of the underlying assumptions and the applicability of these models for assessing software reliability continue to be questioned [6, 8, 22]. In addition, most of the existing models are tested against very limited data. Usually, software reliability growth models attempt to determine the shape of a mathematical function. This function represents reliability growth as observed, not the root causes responsible for the phenomenon it represents. A recent study [12] suggested combining the different predictions obtained from several models, since none of the current reliability growth models will always offer the most accurate estimation [14]. As Horgan et al. [8] point out, these efforts purely pursue a better fit of the shape of a mathematical function. They consider neither the structure of the code nor the execution patterns. Unfortunately, software reliability models are sometimes applied by practitioners even when their assumptions are not met. In particular, it is not clear how execution patterns and fault distribution characteristics of a program will affect the failure process and the resulting reliability estimate.

Software failure processes are complicated and influenced by many factors. Some of those factors may prohibit mathematical solutions for reliability growth models. Simulation models, on the other hand, are able to isolate key factors and overcome the simplified assumptions of traditional software reliability engineering [26]. Simulation thus presents an attractive alternative for investigating the effectiveness of different testing strategies on reliability growth. This paper investigates that direction with respect to the effect of execution patterns and varying fault distribution characteristics on software failures. To this end, the capability of the reliability simulation environment (RSIM) [25, 26] to simulate different testing strategies and fault distribution patterns was extended, making it possible to investigate the effects of execution and fault distribution patterns on reliability estimates.

Section 2 presents background on common assumptions made in software reliability modeling and known results on testing effectiveness for testing strategies that meet the assumptions made in these models. Section 3 describes the software reliability simulation environment RSIM [25, 26]. RSIM is used to investigate the effect of program structure, fault distribution, and execution


characteristics on fault exposure. To do this, RSIM had to be extended; this is described in Section 4. Section 5 introduces the design of the two sets of experiments. The first keeps program characteristics constant but varies fault distribution and execution patterns. The second investigates the effect of different program characteristics on fault exposure, in addition to the other factors. Section 6 reports the results of both experiments. Finally, Section 7 summarizes the findings.

2 Background

Software reliability growth models (SRGMs) have been intensively researched [13]. Software reliability can be represented either by the ratio of cumulative (detected) defects to the total number of defects of a program, or by the fault exposure rate over time [16]. The fault exposure rate is the number of exposed faults divided by the total number of faults. SRGM parameters are estimated using failure data collected during testing. Reliability is estimated from a fitted model. In applying these models, it is assumed that the program is executed using an operational profile. According to [16], an operational profile consists of a set of runs where each run is associated with a frequency of occurrence. Runs should model the execution of a user function. Their execution also causes a certain execution pattern at the statement level. This is why operational profile assumptions are ultimately related to corresponding assumptions about execution patterns of the software. It is the latter we are most interested in. Similarly, execution of test cases, no matter what testing strategy is followed, also results in (measurable) execution patterns. In other words, execution patterns are caused by executing test cases, whether the test cases are derived from an operational profile or by other approaches.

The predictability of software reliability models has been investigated by comparing their predictions with sets of failure data; no single model works best for all data sets [4, 12]. Many common assumptions made in these models cannot be strictly adhered to in practical applications [6, 17]. Specific assumptions that limit the applicability and the effectiveness of those SRGMs are:

1. Times between failures are independent. This implies random testing. In practice, however, other approaches, especially functional testing and clustered testing, are used extensively in testing of many products. Such testing is no longer random.

2. Homogeneous distribution of faults and equal likelihood of fault exposure. By contrast, some studies have shown that the defect distribution is generally uneven [18, 19]. Selby and Basili [19] found that interfaces appear to be most error prone, regardless of the module type. They reported that


initialization errors are about 11%, control errors 16%, interface errors 39%, data errors 14%, and computation errors 20%.

3. No new faults are introduced. In practice, appropriate actions are taken to remove faults. This process may incorporate design modifications and introduce new faults.

Horgan et al. [8] provide a detailed critique of operational profile based reliability estimation. They suggest that existing methods for software reliability estimation are fraught with risk due to difficulties in estimating accurate operational profiles and due to invalid assumptions. However, it is not clear how reliability estimation will be affected by inaccurate operational profile estimates (which also result in inaccurate assumptions about execution patterns of the underlying code).

Like operational profiles, testing strategies select a subset of inputs from the input space of the software. Testing strategies differ in how they select this subset of inputs [24]. During test execution, different testing strategies may therefore show different patterns in how the code is executed.

Given that a fault cannot be exposed without executing the statement containing it, and that it may take different ways of executing a statement containing a fault to expose it, execution patterns should show some influence on fault exposure.

While there are no known studies that evaluate the effect of varying execution patterns on fault exposure, several studies evaluated testing strategies in terms of fault exposure [2, 11, 24]. Under certain circumstances, random testing performs better, although this seems counter-intuitive, since random testing does not use any information about the function or structure of the software. Li and Malaiya [11] showed that the optimal partition of the input space depends on the operational profile and defect detectability profiles of the program. Hamlet and Taylor [7] showed that random testing was comparable to partition testing with respect to deriving the operational reliability of the program.

There is significant interest in applying reliability analysis or SRGMs to guide resource allocation in software development. Doing so requires an estimate of the software reliability at early stages of development. However, SRGMs are parameterized based on system testing data. To resolve this conflict, researchers have been trying to use static metrics of the software to estimate reliability, e.g., [20]. Malaiya et al. [15] proposed estimating the parameters of a reliability growth model using easily obtainable initial fault counts and fault densities. Their model relates the fault exposure ratio to the initial number of faults and the fault density of a program. Von Mayrhauser and Keables [25] proposed that the fault exposure ratio might depend on the structure of a program, specifically its loopiness and branchiness. Loopiness is defined in [25] as the average number of executions per instruction (how many times the program iterates over the same instruction). Branchiness is a function of the number of branch statements and their nesting level. For a detailed definition see [25]. However, it is unknown


[Figure 1 diagram: RSIM components. Code Structure Parameters, Execution Pattern Parameters, and Fault Distribution Parameters drive the Program Generator, which produces Program Text; a Static Metrics Tool computes Static Metrics, and the Execution Harness produces Failure Data; Reliability Models and the Analyzer/Reports component produce Reliability Estimates. (Modified from von Mayrhauser et al., 1993.)]

Figure 1: RSIM: Software Reliability Simulation Environment

how testing strategies will affect the relationships between the fault exposure ratio and the structure and fault density of a program.

To fully understand the failure process requires isolating the effects of various driving factors (e.g., structural characteristics of code, execution pattern, or testing strategy) and exploring a variety of scenarios while controlling those factors [26]. Obviously, no real software project can afford to do the same project several times while varying the factors of interest. Even if it were possible to collect a large amount of data from different projects with the desired properties, these data may not have been collected in a controlled environment. Von Mayrhauser et al. [25, 26] suggested the use of simulation in software reliability modeling to overcome the above problems. A simulation environment called RSIM was developed and applied to investigate the effect of software structure on the fault exposure ratio [25]. Simulators of the software reliability process can be used to supply carefully controlled, homogeneous data for use in evaluating various existing analytical reliability models and in forecasting reliability growth [21, 26].

3 RSIM: A Reliability Simulation Environment

RSIM is a software reliability simulation environment. It has been used to investigate the relationship between program structure and dynamic models of


software reliability [25, 26]. RSIM (cf. Figure 1) consists of a code generator, an execution harness, a debugger simulator, a static data collector, and a reliability analyzer. The code generator simulates the generation of programs based on the distribution of the different types of statements and faults that are to be included in the programs. Statements and faults are generated as values whose frequencies conform to given distributions.

RSIM is driven by the following user-controlled parameters:

- Statement Type Frequencies. Programs consist of a collection of program statements. Programs generated in the simulation environment have four types of program statements: assignment statements, loop statements, if statements, and subprogram calls. The structure of the generated programs varies with the relative frequencies of these four types of statements. For example, a program might contain 30% if statements, 5% loop statements, 10% subroutine calls, and the remaining 55% assignment statements.

- Average and Maximum Nesting Level. The nesting of program statements is important in relating program structure to execution time behavior, i.e., execution patterns.

- Program Size. This is used to generate programs of various sizes.

- Number and Size of Subprograms. These parameters specify the number of subprograms and the average size of each subprogram.

- Fault Density. This parameter specifies the number of faults per KLOC and is used to determine whether a newly generated statement will contain a fault.

The execution harness and debugger compile the generated program, execute it, collect times between failures in an output log, and remove faults as they are exposed. When a program executes under UNIX, a new process is spawned for that program. In RSIM's case, a driver program executes and then spawns a "child" process to execute the generated program. The simulator models a fault as a division with a fixed probability of generating a failure (a divide-by-zero). When a failure occurs, it generates an exception. Occurrences of this exception can be detected using the signal capabilities of UNIX. The driver program logs the failure, removes the fault that caused the failure, and recompiles the program. Accounting data is collected on a per-process basis, enabling a separation of the execution time of the driver from the execution time of the generated program. It is also convenient to have the child process be the one that fails, with the parent remaining to clean up afterward.

The reliability analyzer uses the failure data obtained from the test harness/debugger to assess the reliability of the simulated code. It uses the Basic Execution Time Model [16], one of many SRGMs.
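The essence of this driver loop can be sketched in a few lines of Python (RSIM itself generates and recompiles C programs; the parameter values and the single fixed failure probability here are illustrative assumptions, not RSIM's actual defaults):

```python
import random

def simulate_failure_log(n_statements=1000, n_faults=7, fail_prob=0.05, seed=1):
    """Toy model of RSIM's driver loop: execute statements uniformly at
    random; a statement containing a fault fails with a fixed probability,
    at which point the failure time is logged and the fault is removed."""
    rng = random.Random(seed)
    faulty = set(rng.sample(range(n_statements), n_faults))
    log, t = [], 0
    while faulty:
        t += 1                              # one statement execution = one time unit
        stmt = rng.randrange(n_statements)  # uniform execution pattern
        if stmt in faulty and rng.random() < fail_prob:
            log.append(t)                   # log the failure time
            faulty.remove(stmt)             # "repair" the fault, as RSIM's driver does
    return log

log = simulate_failure_log()
print(len(log))            # 7: every fault is eventually exposed
print(log == sorted(log))  # True: failure times are increasing
```

The log of failure times is exactly the kind of data the reliability analyzer fits an SRGM to.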


The static data collector is used to collect metrics from C programs on a per-file basis. The metrics calculated include the number of branch decisions, the number of loop decisions, and their ratio; the number of executable source statements and non-commentary source lines; the maximum and average nesting levels; and the number of function declarations. This enables validation of the RSIM-generated synthetic code against the user-specified code characteristics that drove its generation.

The implementation of the original version of RSIM assumed that faults are of only one type and severity, are distributed uniformly throughout the program, and have a constant likelihood of producing a failure when execution encounters a statement containing a fault. The execution pattern is simulated as a uniformly distributed random process [25, 26]. Extensions to the simulator enable a large variety of fault distributions and execution patterns. This made it possible to evaluate the effects of execution characteristics and different fault distributions.

4 Extensions To RSIM

The objective of the RSIM extensions was to provide the ability to simulate a larger variety of fault distribution patterns and to create more diverse execution patterns. The reliability analyzer was also adapted to report fault exposure over time.

4.1 Fault Distribution Patterns

Real faults in a program are more likely distributed non-uniformly over the program. Here, the focus is on run-time rather than compile-time faults. Existing evidence suggests that fault distribution depends upon the functionality and structure of the code [1, 19].
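Statement-type-linked fault insertion, the kind of non-uniform distribution this subsection introduces, can be sketched as follows (a simplified stand-in for RSIM's code generator; the statement mix and per-type densities are illustrative assumptions):

```python
import random

# Illustrative mapping of fault densities (faults/KLOC) to the statement
# types they attach to, loosely following a non-uniform pattern in which
# interfaces (calls) are the most error-prone.
FAULT_DENSITY = {
    "assignment": 5,       # computing/data faults
    "call": 15,            # interface faults
    "if": 5, "loop": 5,    # control faults
}

def generate_program(n_statements, stmt_mix, rng):
    """Generate a synthetic program as a list of (statement_type, has_fault).
    A statement of a given type receives a fault with probability
    density/1000, i.e. the per-KLOC density applied per line."""
    kinds, weights = zip(*stmt_mix.items())
    program = []
    for _ in range(n_statements):
        kind = rng.choices(kinds, weights=weights)[0]
        has_fault = rng.random() < FAULT_DENSITY[kind] / 1000.0
        program.append((kind, has_fault))
    return program

rng = random.Random(42)
mix = {"assignment": 55, "if": 30, "loop": 5, "call": 10}  # example mix from Section 3
prog = generate_program(10_000, mix, rng)
faults = sum(f for _, f in prog)
print(faults)  # on the order of dozens of faults in a 10 KLOC program
```

Varying the density table and the statement mix yields different fault distribution patterns, which is the mechanism the extension exposes through its input parameter file.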
Like [18], faults are classified into three major types: (1) computing and data faults, associated with computation and initialization; (2) control faults, associated with data flow control statements; and (3) interface faults, related to interfaces (procedure or function calls and parameter passing).

To simulate different fault distribution patterns, the fault insertion module of RSIM was extended to allow more than one fault density and fault type. The user controls how many fault types to model. Fault types are linked to different statement types. RSIM first reads the specific fault densities and their distributions from the input parameter file. Then the code generator inserts the faults based on sampling the associated distribution. If the sample falls in the specified region of the fault distribution, the statement generated contains a fault of the appropriate type. It is easy to generate programs with different fault distributions and fault densities by varying the input fault distribution parameters of the code generator. As in the original RSIM implementation, a failure causes an exception and the driver program takes action. The only difference is that the type of fault is also communicated, so both the occurrence and the type of fault can be logged. This implementation is consistent with the structure of the original RSIM and provides more flexibility [25].

4.2 Execution Pattern

As explained before, different testing strategies can result in widely different execution patterns for a program. For example, one testing strategy may cause a loop to be exercised many more times than another strategy would. This execution pattern can be described by associating a high probability with the condition in the while statement being true and a low one with it being false.

Yet another testing strategy may cause a decision point to be mostly true (say 90% of the time), rather than false. In the simulator this type of behavior is controllable with user-definable frequencies associated with if, case, while, and for statements in the code. These frequencies determine how often each branch or loop is executed, according to the input execution pattern parameters. This scheme is analogous to the fault distribution pattern control. Varying combinations of the execution pattern parameters are used to simulate a variety of execution patterns at the code execution level.

4.3 Reliability Analyzer

The reliability analyzer had to be changed as well. First, the SRGM upon which the reliability analyzer in the original RSIM is based makes assumptions that are too restrictive.
Second, since the focus is on comparing the fault exposure capability of various execution patterns, the measure of interest is the fault exposure rate: the number of exposed faults divided by the total number of faults. Insofar as SRGMs show cumulative failure behavior over time, and it is possible to express fault exposure over time as well, it is appropriate to consider fault exposure over time a type of reliability measure, although the focus of this paper is measuring the fault exposure capability of various execution patterns.

As in the original RSIM, performance was very good. Generation of the simulated code is almost instantaneous. The simulation time itself depends more on the structure of the code (how deeply nested, for example) and the execution patterns chosen than on its size.

5 Simulation Experiments

Adding multiple fault distributions and execution pattern control to RSIM makes it possible to investigate the effects of different fault distributions and execution patterns on software fault exposure. The hypothesis is that fault distribution and execution patterns affect the failure process of a program and hence the exposure of faults, and that the structural characteristics of a program, specifically

Page 9: Coping with Panic Attacks and Agoraphobia - Ed Beckham, Ph.D

its branchiness, and its fault distribution affect the fault exposure rate. To test these hypotheses, two experiments were carried out.

5.1 Design of Experiments

The first experiment (Exp 1) has 18 treatments = 3 fault density distribution patterns × 6 execution patterns. Table 1 shows the six execution patterns. Execution patterns are uniform or biased. Bias is introduced by favoring one branch at a decision point over another, and by biasing loop execution towards a large number of iterations (high looping) versus executing loops only a few times (low looping).

Table 2 shows the three fault distribution patterns. Fault densities (faults/KLOC) are associated with three types of errors: computational (Dcomputing), interface errors (Dinterface), and errors in control statements (Dcontrol), respectively. Fault type non-uniform A represents a situation where interfaces are error-prone and have a higher fault density than other components of a program. Fault type uniform is uniformly distributed; this is what most SRGMs assume. Type non-uniform B is a variant of type non-uniform A with a higher proportion of control defects. The values of these parameters were selected to reflect the ranges found in empirical studies [19], and the average fault densities over a whole program are the same (seven faults/KLOC) among treatments.[1]

The second experiment (Exp 2) is designed to investigate five factors related to the structure of a program and its fault densities (Table 3) and their effect on fault exposure. The program structure variables describe

1. the proportion of statements that are procedure calls (procedure calls are also used to evaluate the potential impact of interface errors), and

2. the branchiness versus loopiness of a program, as a ratio between branch and loop decisions (B/L ratio).

The three remaining factors represent fault densities for computational, interface, and control faults. For each of these five factors, two values were chosen to represent low and high levels. The values were selected based on empirical data for program structure characteristics [25] and for fault densities [19]. RSIM generated programs with a size of 10,000 lines of code and 20 subroutines. This size was chosen to show that the simulator is able to generate and simulate non-trivial programs. The synthetic programs were executed and failure data was logged using the execution harness of the extended RSIM. The fault exposure, or reliability, is represented by the fraction of total faults exposed as a function of execution time.

Results are reported as the proportion of faults detected and the execution time to detect them (cf. Figures 2-9). Further, the comparison measure for

[1] Note: averaging the fault densities in Table 2 will not produce 7 faults/KLOC, since the percentages of code constructs with such errors are not equal.


Table 1: Experiment 1: Execution Patterns

  Execution Pattern                                       Treatment
  Uniform Execution (UE)                                  Branch and loop decisions have an equal probability of execution.
  Branch Biased Execution (BR Biased)                     One branch in a decision point is executed more times than the other.
  High Looping Biased Execution (high LP)                 Code execution is biased towards a high number of loop iterations.
  Branch & High Looping Biased Execution (BR & high LP)   Code execution is biased towards a high number of loop iterations; execution also favors some branches over others.
  Low Looping Biased Execution (low LP)                   Code execution is biased towards a low number of loop iterations.
  Branch & Low Looping Biased Execution (BR & low LP)     Code execution is biased towards a low number of loop iterations; execution also favors some branches over others.

Table 2: Experiment 1: Fault Distribution Patterns

  Fault Distribution Pattern   <Dcomputing, Dinterface, Dcontrol>
  Non-uniform A                <5, 15, 5>
  Uniform                      <7, 7, 7>
  Non-uniform B                <3, 15, 11>

all execution patterns is how long it takes to expose 80% of all faults. This is only one of the many possible points from the result figures. While, admittedly, the choice of 80% fault exposure is arbitrary, this number represents a test yield at which finding faults has not saturated (any of the execution strategies is still effective, but some are more efficient).

Each treatment was repeated three times. The average of the three runs was used in the analysis and is presented for each treatment. All simulation runs were executed on a dedicated machine (HP 700 series) to reduce workload variation. Since the data measured in the three repetitions varied by less than 10% for any of the treatments, no more than three repeats of each treatment were necessary. In both experiments, the null hypothesis is that changes in factor levels do not affect fault exposure.
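The comparison measure, the time needed to expose 80% of all faults, can be illustrated with a toy simulation (the weight vectors, fault placement, and failure probability are illustrative assumptions; RSIM measures actual CPU time of generated programs). Statements are drawn according to an execution-pattern weight vector, so biasing execution towards a fault-bearing region reaches the 80% yield sooner:

```python
import random

def time_to_expose(frac, weights, fault_idx, fail_prob=0.1, seed=0):
    """Draw statements with the given probability weights; a faulty
    statement fails (and is fixed) with a fixed probability. Return the
    number of statement executions needed to expose `frac` of all faults."""
    rng = random.Random(seed)
    faulty = set(fault_idx)
    target = len(faulty) - int(len(faulty) * frac)  # faults allowed to remain
    population = range(len(weights))
    t = 0
    while len(faulty) > target:
        t += 1
        stmt = rng.choices(population, weights=weights)[0]
        if stmt in faulty and rng.random() < fail_prob:
            faulty.remove(stmt)
    return t

n = 200
faults  = list(range(0, n, 10))                            # 20 faults, clustered on every 10th statement
uniform = [1.0] * n                                        # uniform execution pattern
biased  = [5.0 if i % 10 == 0 else 1.0 for i in range(n)]  # biased towards the faulty region

t_uniform = time_to_expose(0.8, uniform, faults)
t_biased  = time_to_expose(0.8, biased, faults)
print(t_biased < t_uniform)  # biased execution reaches 80% exposure sooner here
```

This mirrors the experiments' comparison: the same program and fault set, different execution-pattern parameters, compared at the 80% exposure point.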


Table 3: Experiment 2: Factor Levels for Program Structure and Fault Distribution

         Calls (%)   B/L   Dcomputing (1/KLOC)   Dinterface (1/KLOC)   Dcontrol (1/KLOC)
  Low    10          3     2                     5                     5
  High   20          9     4                     15                    10

5.2 Validity

The internal validity of an experiment addresses whether differences in treatments are the reason for differences in observed results. A number of confounding factors can affect the internal validity of an experiment [3]. The simulation environment ran on a stable, dedicated platform to generate and execute programs. This guarantees internal validity.

Whereas the internal validity of the experiment is concerned with whether the results make sense in the context of the experiment, external validity addresses the generalizability of the results. Here, one must determine whether the synthetic programs and their faults are representative of (some) actual programs. First, consider the size and structure of the programs. Obviously, with the variability of software, it is impossible to have a synthetic program that is representative of all programs. Rather, one must determine whether the characteristics chosen for the synthetic programs in this experiment occur in actual software. Keables [10] measured the structural characteristics of 44 actual programs from industry, academia, and software archives available via anonymous ftp. These programs totaled several million lines of code and varied widely in size and application domain. The size and structural characteristics of the synthetic programs generated in the current experiments fit well within the bounds of the data observed by [10]. Thus the structural characteristics of these synthetic programs are representative of actual programs.

Similarly, the types of faults and the fault distributions are based on the empirical data reported in [18]. Thus it is fair to say that the synthetic programs compare to actual programs in industry with respect to their structural and fault characteristics.

However, there are clearly many more possible fault distributions and structural characteristics that occur in actual programs. This work is one step towards exploring questions about the relationship between execution patterns, fault characteristics, program structure, and fault exposure. A comprehensive answer to these questions will have to be assembled from many future studies similar to ours.
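Two of the structural factors of Table 3 can be computed directly from a synthetic program's statement mix, much as the static data collector validates generated code. A minimal sketch (the tag-list representation of a program is an illustrative assumption):

```python
from collections import Counter

def structure_metrics(program):
    """Compute two of Experiment 2's structural factors from a synthetic
    program given as a list of statement-type tags: the percentage of
    subprogram calls and the branch/loop (B/L) decision ratio."""
    counts = Counter(program)
    calls_pct = 100.0 * counts["call"] / len(program)
    bl_ratio = counts["if"] / counts["loop"]  # branch decisions per loop decision
    return calls_pct, bl_ratio

# A mix matching Experiment 2's "low" levels: 10% calls, B/L = 3.
program = ["call"] * 10 + ["if"] * 15 + ["loop"] * 5 + ["assignment"] * 70
calls_pct, bl = structure_metrics(program)
print(calls_pct, bl)  # 10.0 3.0
```

Checking such metrics against the factor levels is how one confirms that the generated programs actually realize the intended treatment.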


[Figure 2 plot: cumulative failures / total faults (0 to 1) versus CPU time (0 to 0.045 hours), one curve per fault distribution: Non-Uniform A, Uniform, Non-Uniform B.]

Figure 2: Fault exposure for three fault distributions and uniform execution

6 Results

The results report the effect of fault distribution patterns, execution patterns, and program structure on fault exposure. They show that all three factors influence the rate at which faults are exposed.

6.1 Effect of fault distribution pattern on fault exposure

Given a uniform execution of a program, the faults are exposed fastest for programs with a uniform fault distribution, and slowest for the non-uniform A fault distribution type (Fig. 2). This is the situation with a relatively high interface fault density and a lower fault density for computational and control faults (their fault densities are equal). It takes the least amount of time to expose


[Figure 3 plot: cumulative failures / total faults (0 to 1) versus CPU time (0 to 0.08 hours), one curve per fault distribution: Non-Uniform A, Uniform, Non-Uniform B.]
Figure 3: Fault exposure for three fault distributions and high looping execution

80% of all faults in a program with a uniform fault distribution, and the longest time with a non-uniform A fault distribution pattern. Uniform execution is clearly best for uniform fault distributions, but yield saturates for the non-uniform A and B distributions.

However, under non-uniform execution (high-looping biased), a program with a non-uniform A or B fault distribution has a higher fault exposure rate than a program with a uniform fault distribution (Fig. 3).

By contrast, Figure 4 shows that under branch-biased (BR) execution, the fault exposure rate is lowest with a fault distribution of type non-uniform B; the other two are similar.

Thus, testing strategies with a high looping bias are preferable when the fault distribution is not uniform. However, when there is reason to suspect that the code contains a significant proportion of control faults (as in type non-


[Figure 4 plot: cumulative failures / total faults (0 to 1) versus CPU time (0 to 0.12 hours), one curve per fault distribution: Non-Uniform A, Uniform, Non-Uniform B.]
Figure 4: Fault exposure for three fault distributions under branch-biased execution

uniform B vs. A), branch-biased execution will be worse.

This finding is reasonable, since the execution of the code is focused on a portion of the conditional paths, and faults associated with the neglected branches are never executed. In general, these results confirm the intuition that for programs with a uniform fault distribution, uniform testing is relatively more efficient than non-uniform testing in detecting faults. On the other hand, non-uniform testing will be better for programs with non-uniform fault distributions.


[Figure 5 plot: cumulative failures / total faults (0 to 1) versus CPU time (0 to 0.03 hours), one curve per execution pattern: uniform exe., BR biased, high_LP, low_LP, BR & high_LP, BR & low_LP.]
Figure 5: Fault exposure for all six execution patterns and non-uniform A fault distribution

6.2 Effect of execution pattern on fault exposure

To investigate the effect of fault distribution and execution strategy on fault exposure further, the next step is to compare fault exposure results for each fault distribution type. Figure 5 shows that, given a non-uniform A fault distribution, non-uniform executions are more effective in exposing faults. Specifically, high LP, BR & high LP, and BR & low LP are much more effective than the more uniform execution patterns. All but the high LP strategy are strategies combining loop bias with branch bias. Up to the 80% exposure level, the most efficient is a combination of branch and high looping bias. However, yield

Page 16: Coping with Panic Attacks and Agoraphobia - Ed Beckham, Ph.D

saturates more quickly than high LP alone or BR & low LP.
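The 80%-exposure times reported in this section can be read off a cumulative-exposure curve mechanically. The helper below linearly interpolates the first crossing of the target fraction; the sample curve is made up for illustration and is not data from the study.

```python
def time_to_fraction(times, fractions, target=0.8):
    """CPU time at which the cumulative failures/total-faults curve
    first reaches `target`, by linear interpolation between samples."""
    for i in range(1, len(times)):
        if fractions[i - 1] <= target <= fractions[i]:
            t0, t1 = times[i - 1], times[i]
            f0, f1 = fractions[i - 1], fractions[i]
            return t0 + (target - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve never reaches the target fraction")

# Made-up sample points for illustration (not data from the study):
times = [0.000, 0.005, 0.010, 0.015, 0.020]   # CPU hours
fracs = [0.00, 0.40, 0.70, 0.85, 0.90]        # cumulative failures / total faults
t80 = time_to_fraction(times, fracs)
print(round(t80, 4))  # → 0.0133
```

The same computation, applied to each simulated curve, yields the CPU-time comparisons tabulated later in this section.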

[Figure 6 plot: cumulative failures/total faults vs. CPU time (hours); legend: uniform exe., BR biased, high_LP, low_LP, BR & high_LP, BR & low_LP]

Figure 6: Fault exposure for all six execution patterns and uniform fault distribution

Given a uniform fault distribution, uniform execution takes the least time to expose 80% of the faults (Fig. 6). Execution with a high-iteration looping bias takes the longest time. The execution strategy that was best for non-uniform A fault distributions performs worst here. The one that is fairly good for either non-uniform A or B fault distributions is the combination of branch bias and low looping bias.

Consider the non-uniform B fault distribution next (Fig. 7). A non-uniform B fault distribution has relatively large fault densities for interface and control faults, and a much smaller fault density for computational faults. Here, uniform execution is still among the best, together with low LP and BR & low LP. Any of the high looping strategies should be avoided.

[Figure 7 plot: cumulative failures/total faults vs. CPU time (hours); legend: uniform exe., BR biased, high_LP, low_LP, BR & high_LP, BR & low_LP]

Figure 7: Fault exposure for all six execution patterns and non-uniform B fault distribution

Now consider an evaluation of efficiency for exposing 80% of the faults for each type of fault distribution and each execution strategy. Table 4 presents the CPU time required to detect 80% of all faults in a program for the three fault distribution patterns and six execution patterns. Overall, uniform execution of code with a uniform fault distribution takes less CPU time to expose 80% of the faults; biased executions take less CPU time for programs with non-uniform fault distributions. These results indicate that uniform executions will be more efficient than non-uniform executions when the faults are distributed uniformly across the code. This is consistent with [7]. However, this is not the case for the two non-uniform fault distributions.

Table 4: CPU time (hours) to detect 80% of total faults.

                         non-uniform A   uniform    non-uniform B
  Uniform Exec.          0.015526        0.001567   0.008276
  Branch Biased          0.016037        0.010522   0.039973
  High Looping           0.00582         0.019358   0.008123
  Branch&High Looping    0.003845        0.017337   0.009367
  Low Looping            0.01478         0.006505   0.009679
  Branch&Low Looping     0.004688        0.003612   0.003688

First, both types of non-uniformly distributed faults will take longer to find. Second, the best strategy for non-uniform A is different from the one for non-uniform B. This appears to indicate that test execution patterns are best when tailored to the expected fault types.
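The per-distribution winners in Table 4 can be checked mechanically. The sketch below transcribes the table into a Python dict and picks the minimum-time strategy for each fault distribution:

```python
# Table 4 transcribed: CPU hours to expose 80% of all faults,
# by execution strategy (rows) and fault distribution (columns).
table4 = {
    "Uniform Exec.":       {"non-uniform A": 0.015526, "uniform": 0.001567, "non-uniform B": 0.008276},
    "Branch Biased":       {"non-uniform A": 0.016037, "uniform": 0.010522, "non-uniform B": 0.039973},
    "High Looping":        {"non-uniform A": 0.00582,  "uniform": 0.019358, "non-uniform B": 0.008123},
    "Branch&High Looping": {"non-uniform A": 0.003845, "uniform": 0.017337, "non-uniform B": 0.009367},
    "Low Looping":         {"non-uniform A": 0.01478,  "uniform": 0.006505, "non-uniform B": 0.009679},
    "Branch&Low Looping":  {"non-uniform A": 0.004688, "uniform": 0.003612, "non-uniform B": 0.003688},
}

# Fastest strategy (smallest CPU time) per fault distribution.
best = {dist: min(table4, key=lambda s: table4[s][dist])
        for dist in ["non-uniform A", "uniform", "non-uniform B"]}
for dist, strategy in best.items():
    print(f"{dist}: {strategy}")
```

It selects Branch&High Looping for non-uniform A, Uniform Exec. for uniform, and Branch&Low Looping for non-uniform B, matching the discussion above.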


6.3 Effect of program structure on fault exposure

[Figure 8 plot: cumulative failures/total faults vs. CPU time (hours); legend: B/L=9, Subs=20%; B/L=3, Subs=20%; B/L=9, Subs=10%; B/L=3, Subs=10%. Title: Fault exposure of different structural programs with the same fault density distribution and uniform execution; Dcontrol = 5/KLOC, Dinterface = 5/KLOC, Dcomputing = 2/KLOC]

Figure 8: Fault exposure of programs with varying structure, and low fault density

The next step investigates the effect of program structure on fault exposure, keeping the execution strategy constant. Recall that there are two factor levels for the program structure variables and for the fault densities. Figures 8 and 9 show the results for a lower and a higher overall fault density. For the lower fault density (cf. Figure 8), a higher level of branchiness versus loopiness (a higher B/L ratio) requires more test time to expose faults, especially when combined with the smaller proportion of subroutine calls.

The results are not much different for the higher fault density (cf. Figure 9), except that the effect of the percentage of subprogram calls on fault exposure seems fault density dependent. These results are consistent with previous results [25] that program structure affects the fault exposure ratio.

[Figure 9 plot: cumulative failures/total faults vs. CPU time (hours); legend: B/L=9, Subs=20%; B/L=3, Subs=20%; B/L=9, Subs=10%; B/L=3, Subs=10%. Title: Fault exposure of different structural programs with the same fault density distribution and uniform execution; Dcontrol = 10/KLOC, Dinterface = 15/KLOC, Dcomputing = 2/KLOC]

Figure 9: Fault exposure of programs with varying structure, and high fault density

Table 5 presents the CPU time needed to expose 80% of all faults for all factor levels related to program structure and fault distributions under the same execution strategy. The first column in Table 5 describes the different factor levels used in the experiment (see also Table 3 in the experimental design). They describe the structure of the program in terms of its organization into functions, and the fault densities for computational, interface and control faults: <Subcalls (%), Dcomputing, Dinterface, Dcontrol>. The second and third columns show the CPU time required to detect 80% of all faults for a low branch/loop ratio (B/L=3, column 2) and a high one (B/L=9, column 3).

The average CPU time is 0.0107975 and 0.022323 hours, respectively, for programs with low and high B/L ratios. The next question is whether there is a statistically significant difference in fault exposure due to variations in program structure and fault distribution. In other words, do program structure and fault distribution matter? A χ² test reveals that the CPU time needed to detect 80% of all faults differs significantly (p<0.001) for different program structure characteristics and fault distributions. This implies that code requires more testing time if it has more branch decisions than loop decisions.

Table 5: CPU time (hours) to detect 80% of all faults.

  (%subcalls, Dcomputing, Dinterface, Dcontrol)    B/L=3       B/L=9
  <10%, 5, 2, 5>                                   0.003344    0.048303
  <20%, 5, 2, 5>                                   0.010736    0.014847
  <10%, 5, 2, 10>                                  0.002724    0.0349846
  <20%, 5, 2, 10>                                  0.002852    0.001211
  <10%, 15, 2, 5>                                  0.001978    0.027186
  <20%, 15, 2, 5>                                  0.02821     0.014064
  <10%, 15, 2, 10>                                 0.008285    0.013797
  <20%, 15, 2, 10>                                 0.007894    0.020759
  <10%, 5, 4, 5>                                   0.011723    0.056777
  <20%, 5, 4, 5>                                   0.007112    0.007273
  <10%, 5, 4, 10>                                  0.023257    0.023273
  <20%, 5, 4, 10>                                  0.015864    0.01526
  <10%, 15, 4, 5>                                  0.007025    0.020407
  <20%, 15, 4, 5>                                  0.00361     0.004423
  <10%, 15, 4, 10>                                 0.021678    0.03718
  <20%, 15, 4, 10>                                 0.016499    0.017501
  Average                                          0.0107975   0.022323

7 Summary

The failure process of a software system is complicated and influenced by many factors. This work applied a simulation approach to investigate the effects of fault distribution, execution pattern and program structure on fault exposure. The reliability simulator RSIM was extended [25, 26] to allow variable fault distribution patterns in the code generation phase and to provide execution
pattern control so that the execution frequency of different parts of a program could be varied. This extension made it possible to investigate the effectiveness of different execution strategies in detecting faults under different scenarios. The simulation results show that fault distribution patterns and execution patterns have significant effects on fault exposure rate. In general, if the fault distribution is non-uniform, a non-uniform execution of the code is more efficient in terms of fault exposure than a uniform execution. This implies that when considering a testing strategy for a product, one should take into account the fault distribution patterns. The results also show that the structure of a program affects the fault exposure rate and the testing time required, and that the failure process is affected by the fault distributions and testing strategies, as suggested in other empirical studies [22, 23]. The findings of this study can be interpreted in two ways.

- They help to determine a desirable execution pattern and related testing strategy based on knowledge of program structure and estimated fault density of various types of faults. In order to make use of the results in this paper, a tester needs three types of information:

  1. Code structure, as expressed in the metrics used in this paper.

  2. Fault density estimates for each type of fault. Such estimates usually rely on historical data. Programmers who follow the Personal Software Process [9] should have such data readily available. Many organizations collect such data as part of their process assessment and improvement efforts.

  3. Execution patterns related to the current testing strategy. These can be measured as tests are executed and compared to desirable execution patterns based on this study. Then, if a given testing strategy does not show enough loop or branch bias, test cases can be generated that come closer to the more effective execution pattern. For example, a need for higher loop bias may lead to tests that process larger lists, files, or problems, if that is what the loops in the code do.

- The differences in fault exposure over time of uniform versus non-uniform execution patterns can be interpreted as an indicator of how far off an SRGM with uniformity assumptions would be. Fault exposure rate as a function of time (and assumptions relating to its functional form) is an important descriptor of SRGMs. The study found some large differences in fault exposure between uniform and non-uniform execution patterns. The results also showed dramatically different fault exposure rates (and the reliability estimates derivable from them) for programs with the same code size and fault density but different code structure and execution patterns. This illustrates that one of the reasons for poor reliability predictions in
traditional SRGMs is their inability to model the effects of code structure and execution patterns. There is a risk in using software reliability growth models based on random testing assumptions. New approaches are needed for estimating software reliability. They should take into account the effects of testing strategies, fault distribution and the structure of a software system.

Acknowledgements

We would like to thank the reviewers and Lee White, the editor, for their helpful suggestions in improving the paper.

References

[1] V.R. Basili, B.T. Perricone, "Software errors and complexity: an empirical investigation", Communications of the ACM, vol. 27, no. 1, p. 42-52, 1984.

[2] M.-H. Chen, A.P. Mathur, V.J. Rego, "Effect of testing techniques on software reliability estimates obtained using a time-domain model", IEEE Trans. on Reliability, vol. 44, no. 1, p. 97-103, 1995.

[3] S.D. Conte, H.E. Dunsmore, V.Y. Shen, Software Engineering: Metrics and Models, Benjamin Cummings, Menlo Park, CA, 1986.

[4] W. Farr, "Software reliability modeling survey", In: Handbook of Software Reliability Engineering, M.R. Lyu (Ed.), McGraw-Hill, p. 71-117, 1995.

[5] P. Frankl, E.J. Weyuker, "An applicable family of data flow testing criteria", IEEE Trans. on Software Eng., vol. 14, no. 10, p. 1483-1498, 1988.

[6] A.L. Goel, "Software reliability models: assumptions, limitations, and applicability", IEEE Trans. on Software Eng., vol. 11, no. 12, p. 1411-1423, Dec. 1985.

[7] D. Hamlet, R. Taylor, "Partition testing does not inspire confidence", IEEE Trans. on Software Eng., vol. 16, no. 12, p. 1402-1411, 1990.

[8] J.R. Horgan, A.P. Mathur, A. Pasquini, V.J. Rego, "Perils of software reliability modelling", Technical Report, Purdue University, February 1995.

[9] W.S. Humphrey, A Discipline for Software Engineering, Addison Wesley, Reading, MA, 1995.

[10] J.M. Keables, Program Structure and Dynamic Models of Software Reliability: Investigation in a Simulation Environment, PhD thesis, Illinois Institute of Technology, Chicago, IL, Dec. 1991.

[11] N. Li, Y.K. Malaiya, "On input profile selection for software testing", Technical Report CS-94-100, Colorado State University, March 1994.

[12] M. Lu, S. Brocklehurst, B. Littlewood, "Combination of predictions obtained from different software reliability growth models", Journal of Computer and Software Engineering, vol. 1, no. 4, p. 303-323, 1993.

[13] Y.K. Malaiya, P.K. Srimani, Software Reliability Models: Theoretical Development, Evaluation and Applications, IEEE Computer Society Press, 1990.

[14] Y.K. Malaiya, N. Karunanithi, P. Verma, "Predictability measures for software reliability models", IEEE Trans. on Reliability, vol. 41, p. 539-546, 1992.

[15] Y.K. Malaiya, A. von Mayrhauser, P.K. Srimani, "An examination of fault exposure ratio", IEEE Trans. on Software Eng., vol. 19, no. 11, p. 1087-1094, 1993.

[16] J.D. Musa, A. Iannino, K. Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw-Hill, New York, 1987.

[17] A.P. Nikora, M.R. Lyu, "Software reliability measurement experience", In: Handbook of Software Reliability Engineering, M.R. Lyu (Ed.), McGraw-Hill, p. 255-301, 1995.

[18] R.W. Selby, V.R. Basili, "Error localization during software maintenance: Generating hierarchical system descriptions from the source code alone", In: Proceedings of the Conference on Software Maintenance, p. 192-197, 1988.

[19] R.W. Selby, V.R. Basili, "Analyzing error-prone system structure", IEEE Trans. on Software Eng., vol. 17, no. 2, p. 141-152, 1991.

[20] V.Y. Shen, T.-J. Yu, S.M. Thebaut, L.R. Paulsen, "Identifying error-prone software: An empirical study", IEEE Trans. on Software Eng., vol. SE-11, no. 4, p. 317-323, 1985.

[21] R.C. Tausworthe, M.R. Lyu, "Software reliability simulation", In: Handbook of Software Reliability Engineering, M.R. Lyu (Ed.), McGraw-Hill, p. 661-697, 1995.

[22] J. Tian, P. Lu, J. Palma, "Testing-execution based reliability measurement and modeling for large commercial software", IEEE Trans. on Software Eng., vol. 21, no. 5, p. 405-414, May 1995.

[23] J. Troster, J. Tian, "Measurement and defect modeling for a legacy software system", Annals of Software Engineering, vol. 1, p. 95-118, Aug. 1995.

[24] M.Z. Tsoukalas, J.W. Duran, S.C. Ntafos, "On some reliability estimation problems in random and partition testing", IEEE Trans. on Software Eng., vol. 19, no. 7, p. 687-697, 1993.

[25] A. von Mayrhauser, J.M. Keables, "Program structure and dynamic models of software reliability: Investigation in a simulation environment", Journal of Computer and Software Engineering, vol. 1, no. 4, p. 349-366, 1993.

[26] A. von Mayrhauser, Y.K. Malaiya, P.K. Srimani, "On the need for simulation for better characterization of software reliability", In: Proc. 4th ISSRE, p. 264-272, Denver, 1993.
