Towards a Statistical Approach to Testing Object-Oriented Programs

Pascale Thévenod-Fosse and Hélène Waeselynck

LAAS-CNRS, 7 Avenue du Colonel Roche, 31077 Toulouse Cedex 4, France
E-mail: {thevenod, waeselyn}@laas.fr

Abstract

Statistical testing is based on a probabilistic generation of test data: structural or functional criteria serve as guides for defining an input profile and a test size. Previous work has confirmed the high fault revealing power of this approach for procedural programs; it is now investigated for object-oriented programs. A method for incremental statistical testing is defined at the cluster level, based on the class inheritance hierarchy. Starting from the root class of the program, descendant class(es) are gradually added and test data are designed for (i) structural testing of newly defined features and (ii) regression testing of inherited features. The feasibility of the method is exemplified by a small case study (a Travel Agency) implemented in Eiffel.

1: Introduction

Object-oriented (OO) design corresponds to a recent approach to software construction, the software system being viewed as a collection of entities that interact with each other. OO languages provide powerful mechanisms, like entity instantiation, inheritance and polymorphism, that have no equivalent in traditional procedural languages (C, Fortran, ...). Hence, proper verification approaches have to be developed that take these specificities into account.

Testing involves exercising the software by supplying it with input values. Since exhaustive testing is not tractable, the tester is faced with the problem of selecting a subset of the input domain that is well-suited for revealing the (unknown) faults. The selection is guided by test criteria that relate either to a model of the program structure or to a model of its functionality, and that specify a set of elements to be exercised during testing [2]. For example, for procedural programs, the control flow graph is a classical structural model, and branch testing is an example of a criterion related to this model. In the case of OO programs, further work is needed to determine to what extent such classical criteria may offer useful guidance. This paper belongs to on-going research aimed at adapting structural techniques to the testing of OO programs.

Given a criterion, the methods for generating test inputs proceed according to one of two principles: either deterministic or probabilistic. The deterministic principle consists in selecting a priori a set of test patterns such that each element is exercised at least once; and this set is most often built so that each element is exercised only once, in order to minimize the test size. But a major limitation is due to the imperfect connection of the criteria with the real faults. Because of the lack of an accurate model for software design faults, this problem is not likely to be solved soon, even in the case of traditional procedural programs. Exercising only once, or very few times, each element defined by such imperfect criteria is not enough to ensure a high fault exposure power. Obviously, the problem is expected to get even worse in the case of OO programs.

To make an attempt at improving current testing techniques, one can cope with imperfect criteria and compensate for their weakness by requiring that each element be exercised several times. This involves large sets of test patterns that may be tedious to derive manually; hence the need for an automatic generation of test data. This is the motivation of statistical testing designed according to a criterion: combine the information provided by imperfect criteria with a practical way of producing numerous input patterns, that is, a random generation. This approach should not be confused with blind random testing, which uses a uniform profile over the input domain. Previous work has shown the high fault revealing power of statistical testing for procedural programs (see e.g., [13, 14]). This paper investigates the feasibility of statistical structural testing for small OO subsystems (class clusters). Path selection techniques are combined with the consideration of one specific OO concept: inheritance. This concept should facilitate an incremental testing approach.


The paper is organized as follows. Section 2 recalls the theoretical framework of statistical testing. Section 3 discusses the new testing issues raised by the OO paradigm and reports on current work aiming to address them. Then, a method for designing incremental statistical testing is defined in Section 4. It is a structural class-based approach whose steps are determined according to the inheritance hierarchy. Its feasibility is exemplified by a small case study (a Travel Agency) implemented in Eiffel. Section 5 describes directions for further investigation.

2: Statistical testing

Given a criterion C, let Sc be the corresponding set of elements to be exercised during testing. To comply with finite test sets, Sc must contain a finite number of elements that can be exercised by at least one input pattern. For example, the structural criterion "All-Branches" requires that each program branch be executed: C = "Branches" ⇒ Sc = {executable program edges}.

When using the probabilistic method for generating test data, the number of times each element k of Sc is exercised is a random variable. Two factors play a part in ensuring that, on average, each k is exercised several times, whatever the particular test input values generated according to the test profile, and within a moderate test duration.

The first factor is the input probability distribution, which must allow us to increase the probability of exercising the least likely element of Sc. Two different ways of deriving such a proper test profile are possible: either analytical or empirical. The first way supposes that the activation conditions of the elements can be expressed as functions of the input parameters: their probabilities of occurrence are then functions of the input probabilities, facilitating the derivation of a profile that maximizes the frequency of the least likely element. The second way consists in instrumenting the software in order to collect statistics on the numbers of activations of the elements: starting from a large number of input data drawn from an initial distribution (e.g., the uniform one), the test profile is progressively refined until the frequency of each element is deemed sufficiently high.

The second factor is the test size N, which must be large enough to ensure that the least likely element is exercised several times under the test profile derived. The notion of test quality with respect to a criterion provides us with a theoretical framework to assess a test size.
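To illustrate the empirical way of deriving a profile, here is a small sketch (ours, not from the paper) of the refinement loop: the instrumentation harness run_and_observe, the subdomain objects with their sample() method, and the stopping threshold are all hypothetical placeholders.

```python
import random

def refine_profile(domains, run_and_observe, elements, rounds=20, batch=500):
    """Empirically refine a test profile (a distribution over input
    subdomains) so that the least likely element of Sc is activated
    more often. run_and_observe(x) is a hypothetical harness that
    executes the program on input x and returns the set of elements
    of Sc activated by that execution."""
    # Start from a uniform profile over the input subdomains.
    profile = {d: 1.0 / len(domains) for d in domains}
    for _ in range(rounds):
        counts = {e: 0 for e in elements}
        credit = {d: {e: 0 for e in elements} for d in domains}
        for _ in range(batch):
            # Draw a subdomain according to the current profile ...
            d = random.choices(list(profile), weights=list(profile.values()))[0]
            x = d.sample()  # ... then an input inside it (hypothetical sampler).
            for e in run_and_observe(x):
                counts[e] += 1
                credit[d][e] += 1
        rare = min(counts, key=counts.get)  # least activated element
        if counts[rare] >= 0.05 * batch:    # arbitrary "high enough" threshold
            break
        # Shift probability mass toward the subdomain that best
        # activates the rare element; total mass stays equal to 1.
        best = max(domains, key=lambda d: credit[d][rare])
        profile = {d: 0.9 * p for d, p in profile.items()}
        profile[best] += 0.1
    return profile
```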

Definition. A criterion C is covered with a probability qN if each element of Sc has a probability of at least qN of being exercised during N executions with random input patterns. qN is the test quality with respect to (wrt) C.

The quality qN is a measure of the test coverage wrt C. Let pk be the probability that a random pattern exercises the element k under the test profile retained, and Pc = min {pk, k ∈ Sc} be the occurrence probability per execution of the least likely element. Then the test quality and the test size N are linked by the equation: (1 − Pc)^N = 1 − qN. The result of this is that on average each element is exercised several times. More precisely, this equation establishes a link between qN and the expected number of times, denoted n, the least likely element is exercised: n ≈ −ln(1 − qN). For example, n ≈ 7 for qN = 0.999, and n ≈ 9 for qN = 0.9999. Thus, knowing the value of Pc for the test profile derived, if a test quality objective qN is required the minimum test size amounts to:
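The numbers above follow from a short derivation, added here for clarity; it only uses the definitions of this section, together with the standard approximation ln(1 − Pc) ≈ −Pc for a small Pc:

```latex
% An element with per-execution probability P_C is missed by all
% N independent random patterns with probability (1 - P_C)^N, so
q_N = 1 - (1 - P_C)^N
\quad\Longrightarrow\quad
N_{\min} = \frac{\ln(1 - q_N)}{\ln(1 - P_C)}   % relation (1) below
% The expected number of activations of that element is then
n = N \, P_C \approx \frac{\ln(1 - q_N)}{-P_C} \, P_C = -\ln(1 - q_N)
% e.g. n \approx 6.9 for q_N = 0.999 and n \approx 9.2 for q_N = 0.9999.
```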

Nmin = ln(1 − qN) / ln(1 − Pc)    (1)

The principle of the method for designing statistical testing according to a given criterion C involves two steps, the first of which is the cornerstone of the method. These steps are the following:
1. Search for an input distribution which is well-suited to rapidly exercise each element of Sc, in order to decrease the test size; equivalently, the distribution must accommodate the highest possible Pc value;
2. Assessment of the test size N required to reach a target test quality qN wrt C, given the value of Pc inferred from the first step; relation (1) yields the minimum test size.

Going back to the imperfect connection of the criteria with the real faults, it is worth noting that the criterion does not influence random data generation in the same way as in the deterministic approach: it serves as a guide for defining an input profile and a test size, but does not allow for the a priori selection of a (small) subset of input patterns. Indeed, the efficiency of the probabilistic approach relies mainly on the assumption that the information supplied by the criterion retained is relevant to derive a test profile that enhances the program failure probability. And there is a direct link between fault exposure power and random data: from relation (1), any fault involving a failure probability p ≥ Pc per execution according to the test profile has a probability of at least qN of being revealed by a set of Nmin random patterns. No such link is foreseeable as regards deterministic test data; and this link should carry more weight than a thorough deterministic selection (which provides, in essence, a perfect coverage) with respect to questionable criteria. The experimental results obtained for procedural programs (see e.g., [13, 14]) support this assumption.

(Since the test profile is derived from the criterion retained, it may have little connection with actual usage, i.e., the operational profile: the focus is bug-finding, not reliability assessment.)
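Relation (1) is easy to mechanize; a minimal sketch (the function name and rounding choice are ours):

```python
import math

def n_min(q_n: float, p_c: float) -> int:
    """Minimum test size for quality objective q_n, given the
    per-execution probability p_c of the least likely element
    (relation (1)): N_min = ln(1 - q_n) / ln(1 - p_c)."""
    return math.ceil(math.log(1 - q_n) / math.log(1 - p_c))

# Examples consistent with the figures used later in the paper:
print(n_min(0.9999, 1 / 17))  # 152: step 1 of the Travel Agency study
print(n_min(0.999, 0.01))     # a fault with p >= 0.01 needs ~688 runs
```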


3: OO paradigm from a tester's viewpoint

As an in-depth description of the OO paradigm would fall beyond the scope of this paper, we assume a basic knowledge of the underlying concepts (see e.g., [16]). The terms object, class, inheritance (which can be single, multiple, or repeated), polymorphism and dynamic binding are used with their usual meaning. We adopt the Eiffel [10] terminology to denote the features that can be invoked on one object: attributes represent data items associated with the object, while routines represent computations (either procedures or functions) applicable to it.

We adopt the view of testers to discuss the OO concepts. There is now an agreement among the testing community that these concepts raise a number of problems.

3.1: Structure versus behavior

As already mentioned, test criteria are related to a model either of the program structure (white box model) or of its functionality (black box model).

OO design and implementation is typically used in the case of open, interactive software systems. The black box description of such systems is discussed in [17]. It is argued that the observable behavior of objects cannot be modeled by algorithms, Turing machines or traditional mathematics: the adequate model of computation is the interaction machine. Interaction machines extend Turing machines with input actions supporting interaction with an external environment during computation. The richer expressive power of interaction machines has a cost: complete specification of OO system behavior is impossible. Only a subset of all possible computations can be modeled, by providing partial descriptions of interface behaviors. As will be shown below, the irreducibility of object behavior to that of algorithms also has dramatic consequences for structural models.

Structural analysis of procedural programs is usually based on the analysis of control flow graphs that may be supplemented with data flow information. The source code is divided into basic blocks, that is, sequences of instructions that cannot be interrupted. Then, the control flow graph is constructed by making each basic block a node and drawing an arc for each possible transfer of control from one basic block to another. Data-flow annotation may be added to mark the nodes where a variable is assigned a value and the ones where it is used. Such models are consistent with a view of programs acting as closed input/output processes: each input pattern predetermines the path in the graph to be executed; the output is produced as a result of the processing of basic blocks along the selected path. This emphasis on closed algorithms and their transformation effect is questionable in the case of OO programs.

Typically, the body of routines applicable to an object is of small size, but contains calls to routines of other objects, which in turn may invoke distant features. This phenomenon is described in [9]. Having analyzed classes of the Interviews library, the authors found 4,818 invocation chains, most of them involving between two and nine calls in sequence, the longest ones being of length fourteen. This complies with the interactive nature of objects: the structure responsible for the behavior of one routine is distributed among several objects(2). Moreover, this structure is not restricted to algorithms. Objects carry a state that remembers and influences the effect of routines. The state space can be very large, because of the unfolding of attributes associated with the objects through the invocation chains; it is not a static dimension since objects can be created or deleted during computation. Hence, the structure of OO programs cannot be readily analyzed in terms of algorithmic composition and transfer of control. It is worth noting that this problem is not specific to the OO paradigm: conventional structural analysis would exhibit limitations for any design characterized by highly decentralized flow of control, extensive use of shared data structures, and dynamic allocation of memory, whether or not the implementation language is object-oriented. However, the facilities offered by OO languages further compound the problem.

The class is the basic modular unit into which the source code of an OO program is organized. But a class is a purely static notion: only objects can be executed (and tested). This results in the syntactic constructs having a less straightforward operational interpretation than in the case of traditional procedural languages. For example, an attribute declaration within the scope of a class is not equivalent to a variable declaration (several instances may be present at some point of the computation). Owing to polymorphism, the dynamic type of an object variable may be different from the one with which it is declared; hence, the invocation of an object routine is not the direct transfer of control induced by a procedure or function call (the adequate version of the routine has to be selected in accordance with the dynamic type of the object). Another difficulty in analyzing source code arises through the incremental description of classes throughout the inheritance hierarchy. The code provided for every new class is an addendum to the parent classes, allowing the programmer to introduce new features, rename or redefine

(2) But the concept of interaction also goes beyond the boundary of the software system. The processing may be partly delegated to other systems (hardware device, human operator, ...) not encompassed within the structural model.


antidecomposition axiom, stating that adequately testing a program P (with respect to a criterion) does not imply that each component of P is adequately tested (with respect to the same criterion). The child class may provide new contexts of usage that were impossible in the original definition of the routine. Actually, if the design imposes no semantic constraints on the inheritance relation, the only case where retesting is not required is when there are no interactions between the new and inherited features.

A last result of [11] concerns regression testing. It may be thought that, thanks to encapsulation, changing the internal implementation of a class would not affect its clients provided that the interface is left the same. But integration testing is always needed: the anticomposition axiom reminds us that adequately testing all components of a program P does not imply that P is adequately tested.

To conclude, it is far from trivial to determine whether some piece of code needs retesting, and if so, to determine whether existing test data can be reused. Moreover, the analysis to be conducted will be different depending on the programming language: as pointed out in [12], the OO languages have a great variance in the facilities that they provide to the programmer. It must be added that common facilities may also have very different implementations. For example, the inheritance mechanisms supported by Eiffel are not the ones offered by C++.

3.4: Previous work on testing OO programs

The problems posed by OO programs are a growing concern of the testing community. We present below an overview of previous work in this area (see [4] for additional pointers). To our knowledge, none of the authors have investigated the issue of statistical testing yet.

Testing from formal specifications has been studied in the framework of algebraic abstract data types (ADT). This is not surprising since many OO concepts originate from ADTs: a class can be seen as the implementation of an ADT. In the ASTOOT approach [6], test cases consist of pairs of sequences of routine invocations that should put objects of the target class into the same abstract state. Each sequence in the pair is executed on a different object, and the paired objects are compared by means of a user-supplied routine that approximates an observational equivalence checker: if the check result is negative, a fault has been revealed. Thus ASTOOT is an approach to the testing of classes that form standalone units, like implementations of stacks and queues. The work initiated by [1] aims at dealing with larger subsystems involving several communicating entities. The specification is done in a language called CO-OPN2 (Concurrent Object-Oriented Petri Nets) based on algebraic nets. First investigations have consisted in adapting the theory of ADT testing [3]

to this new framework. Further work has to be conducted in order to obtain a practical testing strategy. To conclude on ADT testing approaches, a key advantage is that they provide an oracle procedure to determine the correctness of the test results, although there are unavoidably some limiting constraints due to observability problems.

The testing approach presented in [9] involves three complementary models obtained from the reverse engineering of source code:
- The object relation diagram (ORD) displays the inheritance, aggregation and association relationships between the classes;
- The block branch diagram (BBD) is a kind of control and data flow graph for each routine; it also provides information about the interface with other routines;
- The object state diagram (OSD) is a hierarchical, concurrent, communicating state machine that models the object state dependent behavior.
Based on these models, a global testing strategy is sketched. The ORD is used to determine a class test order according to the dependencies between the classes; the aim is to obtain an integration strategy that minimizes the number of test stubs. The BBDs are used to prepare conventional structural test data for the individual routines. The OSDs facilitate the sequential testing of object state behavior. The processing of models is partly supported by tools. The problem of regression testing is also addressed: a "firewall" tool automatically computes the classes affected by a change; however, since this tool is based on static analysis, polymorphism and dynamic binding are not taken into account.

The work of [8] is focused on the identification of integration levels. Emphasis is put on behavioral, rather than structural, considerations: grey box analysis is used to identify the most significant paths and interactions in the program. Two intermediate levels are defined in addition to routine and system levels:
- Method/Message Paths, or MM-Paths for short. This is related to the notion of message quiescence: an MM-Path starts with a routine (or method, in their terminology) and ends when it reaches a routine which does not issue any message of its own;
- Atomic System Functions, or ASF for short. This is related to the notion of event quiescence: an ASF is an input port event, followed by a set of MM-Paths, and terminated by an output port event.
The integration proceeds by testing the MM-Paths first and then testing their interactions in an ASF.

In [7], an incremental testing strategy based on the inheritance hierarchy is proposed. Each class has a testing history involving three integration levels: standard unit testing of individual routines, intra-class integration testing of routines, and inter-class integration testing; at each


level, structural and functional test sets are distinguished. The concern is to have histories be incrementally updated so that many test sets of a child class can be derived from the ones of its parent class. The approach considers only single inheritance, and is oriented toward C++. Depending on the type of child feature (new / virtual-new, recursive / virtual-recursive, redefined / virtual-redefined), an appropriate decision is taken at the various testing levels: design new test cases, incorporate parent test cases in the child testing history, rerun or not these incorporated test cases. The decision procedure is defined by explicitly referring to the work in [11] related to adequacy axioms (see Section 3.3).

A synthetic view of the problems addressed by the various authors is provided in Figure 1. For the purpose of comparison, we also show the specific contribution of this paper, to be detailed in the next section.

[1]  Models: algebraic nets (CO-OPN2). Contributions: oracle problem (evaluation of ground temporal logic formulas); theoretical investigation of test data selection.
[6]  Models: algebraic ADTs. Contributions: oracle problem (observational equivalence of objects).
[9]  Models: ORD, BBD, OSD from reverse engineering. Contributions: global integration strategy minimizing the number of stubs; firewall analysis for regression testing.
[8]  Models: MM-Paths, ASFs. Contributions: integration strategy based on behavioral dependence (message and event quiescence).
[7]  Models: -. Contributions: reuse of test data according to the inheritance relation.
This paper  Models: static & dynamic BON models. Contributions: statistical testing steps based on the inheritance relation (cluster level).

Fig. 1: Overview of the problems addressed by the various authors

4: Statistical testing based on inheritance hierarchy

As a first step, the work presented in this section has concentrated on the question of how to design statistical testing for clusters of directly instantiable classes (no parametrised or abstract classes). We assume two kinds of static relations: single inheritance and client relations.

4.1: Principle

Our goal is to define a compromise between two testing levels: class level (each class is studied in isolation), and cluster level (all the objects involved in a cluster are tested in a single step). The proposed strategy is made up of several testing steps, each one focusing on a subset of classes. Such a progressive testing process is aimed at (i) defining testing steps requiring no stub construction, and (ii) facilitating the diagnosis, which could become very tedious due to complex dependencies between classes, in particular in case of polymorphism and dynamic binding. To define the strategy, two questions must be answered:
1. How to divide a cluster into subsets of classes which form meaningful subsystems without stubs, that is, how to define the testing steps?
2. How to design test data specific to each testing step?

4.1.1: Definition of the testing steps. The method for defining testing steps is based on the analysis of the static dependencies between classes which are due to both the inheritance mechanism and client relations. Starting from the root class of the cluster (which may be a test driver), a subset denoted S1 is defined: it groups the root class, its server classes and their parent classes. By construction, SS1 = S1 forms a meaningful subsystem without requiring stubs. It will be the focus of the first testing step.

Then, a descendant of a class of SS1 is selected to define the classes grouped in a new subset S2 made up of: the selected descendant class, its server classes and all the parent classes which do not belong to SS1. By construction, SS2 = SS1 ∪ S2 forms a meaningful subsystem without requiring stubs. It will be used in the second step, which will focus on testing objects of the new S2 classes.

The process goes on in a similar way by progressively integrating the descendant classes. Let SSi denote the subsystem formed by the union of the successive sets S1, ..., Si: the set Si+1 contains a descendant class of an element of SSi, its server classes and the parent classes that do not belong to SSi. The subsystem SSi+1 = SSi ∪ Si+1 will be used in step (i+1), focused on testing objects of Si+1 classes.

4.1.2: Design of statistical test patterns. In our mind, structural analysis of OO programs has an important role to play in the design of test data (Section 3.1). At the cluster level, we recommend that one take advantage of structural information to define statistical test profiles, as long as the analysis of execution paths is not prohibited by the class dependency complexity. We are conscious that not every cluster is simple enough to enable the structural analysis of execution paths; yet, we believe that the method may apply to a large number of clusters, namely those involving reasonable numbers of calls in sequence (say, less than 8) with typical class routines of small size. Since the testing method involves several test steps, structural analysis is also conducted incrementally on the


expanded control flow graphs associated with the subsystems SSi. Starting from the control flow graph of the creation routine of the root class, these graphs are built as follows.

Step 1. The nodes of the control graph of the creation routine of the root class may contain a call to a local routine of the root class, or to a distant routine of a class of the set S1. Then, node expansion is performed, which consists in replacing the calls by partial control flow graphs of the corresponding routines: here, partial means that the only paths that are taken into account are those corresponding to the execution of SS1 objects. Paths involving objects not in SS1 classes are ignored because their activation is delayed until the inclusion of the classes in a subsystem SSi, i > 1. The partial graphs thus added may, in turn, contain routine calls, and the node expansion process goes on. The control graph G1 of SS1 is complete when all call nodes have been expanded.

Step (i+1). Starting from Gi, the control graph Gi+1 of SSi+1 is obtained by the addition of new execution paths. Partial control graphs obtained at step i are completed with the paths containing calls to routines of the Si+1 classes. The added parts may contain routine calls, and the node expansion process goes on by taking into account all the calls to routines of any of the SSi+1 classes.

Based on the graphs Gi, the coverage of the execution paths provides us with a test criterion that ensures a sound probe of the program structure. At each step i, the method for designing statistical test patterns (Section 2) may be applied using Sci = {execution paths of Gi} as the set of elements to be activated during testing. Nevertheless, such test patterns would be quite redundant from one step to the next, since Gi ⊃ Gi−1. An extreme solution could then be to execute only the new execution paths at each step i, that is, to use Sci = {execution paths of Gi that do not exist in Gi−1}. But this weak criterion could significantly reduce the test effectiveness, since the

execution paths of Gi−1 will not be tested in the new subsystem SSi: as was stated in Section 3.3, previous paths must be retested in the new context of execution.

Then, at each step i, the solution we propose is to take into account the whole set of execution paths of Gi, and to use the notion of test quality qN to require a less stringent coverage of the paths already executed in the previous steps. This can be done by setting different target test quality objectives, for example:
- qN = 0.9999 for the new paths, which must be strongly tested; this means that each such path will be executed 9 times on average;
- qN = 0.9 for the other (retested) paths, which means that each of them will be executed only 2 times on average.
How to define test profiles and assess test sizes which meet these requirements will be exemplified in Section 4.3 on the case study described below: the Travel Agency.
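As a rough illustration of the node-expansion process, the sketch below enumerates the execution paths of an expanded graph over a toy call-structure representation; the data structures (routine bodies as lists of call sequences) are our own simplification, and an acyclic call structure is assumed, so this is not the paper's actual graph format.

```python
def expand_paths(routine, bodies, owner, allowed):
    """Enumerate the execution paths of `routine`, expanding call nodes
    whose target belongs to a class of the current subsystem `allowed`
    (the classes of SSi). Paths referencing a class outside SSi are
    pruned, delaying their activation to a later step. `bodies` maps a
    routine name to its list of paths (each path a list of called
    routine names); `owner` maps a routine to its class. Calls are
    assumed acyclic."""
    paths = []
    for path in bodies[routine]:
        expanded = [[]]  # partial expansions of this path
        ok = True
        for call in path:
            if owner[call] not in allowed:
                ok = False  # references a class not yet integrated
                break
            subpaths = expand_paths(call, bodies, owner, allowed)
            expanded = [p + [call] + s for p in expanded for s in subpaths]
        if ok:
            paths.extend(expanded)
    return paths

# Toy cluster: the root's creation routine calls r1 (class A, in SS1)
# on one path and r2 (class B, not yet integrated) on the other.
bodies = {"make": [["r1"], ["r2"]], "r1": [[]], "r2": [[]]}
owner = {"make": "ROOT", "r1": "A", "r2": "B"}
print(expand_paths("make", bodies, owner, {"ROOT", "A"}))  # [['r1']]
```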

4.2: Description of the case study

The example studied corresponds to a cluster of classes which implements the management of a provision catalogue provided by a travel agency. The catalogue management consists in creating and updating provisions. A provision is defined by a destination, a departure date, an arrival date, and one or several services. Three types of services are available: normal, comfort, and luxury services. The program has to perform the following operations: creation of a new provision in the catalogue, update of a provision previously recorded, printing of a provision, printing of the whole catalogue. Updating a provision consists of (i) the modification of the destination, the departure date or the arrival date, or (ii) a service addition or cancellation. The program was developed using the BON (Business Object Notation) method [15] and implemented in Eiffel [10] (about 670 lines of code). The BON method produces static descriptions and dynamic descriptions of the system.

Static descriptions document the software structure, that is, what the components are and how they are related to each other. The design of the Travel Agency example has led to the definition of 8 classes (see Fig. 2). All of them are instantiable (no parametrised or abstract class). They are linked together by aggregation relationships, or single inheritance relations involving polymorphism (see Fig. 3).

Dynamic descriptions document how objects interact at execution time. First, for each event triggered by something in the external world of the software, a list of the objects that may become involved as part of the software response is identified. Starting from these lists, possible executions (called scenarios) are selected to illustrate the important aspects of the system behavior. For each scenario a dynamic diagram, which consists of a set of communicating objects, is produced. Figure 4 shows an example of a scenario and the associated dynamic diagram: a simple rectangle represents an object, a double rectangle refers to a set of objects; a message sent from one object to another is represented by a dashed arrow labeled by sequence numbers. In the dynamic diagrams, the message relations are always potential. For example, in an execution related to Scenario A: message 2 will be sent only if the provision to be updated has been found in the actual catalogue; only one message will be sent by the PROVISION object to create either a SERVICE object, or a C-SERVICE object, or a L-SERVICE object, depending on the type of the service to be added in the specific execution.

4.3: Definition of the testing steps

The Travel Agency allows us to illustrate some of the issues discussed in Section 3: class dependency, polymorphism, dynamic binding, and open interactive behavior.


SYSTEM: CATALOGUE-MANAGEMENT (creates and updates provisions)
TRAVEL-AGENCY   The root class (actually, a test driver)
PROVISION       Provision (destination, departure date, arrival date, one or several services)
SERVICE         Normal service (flight price, flight number, insurance company)
C-SERVICE       Comfort service (one or several stops in addition to a normal service)
L-SERVICE       Luxury service (one or several luxury stops in addition to a normal service)
STOP            Basic stop (town, hotel category)
C-STOP          Comfort stop (identical to a basic stop)
L-STOP          Luxury stop (guide, tour and two meals in addition to a basic stop)

Fig. 2: System chart for the Travel Agency

Fig. 3: Static architecture of the Travel Agency (class diagram, not reproduced)

Scenario A: add a service in a provision
1 Search for the provision to be updated
2 Request to create an additional service
3 Create a new service
4 Create stop(s)

Fig. 4: Example of scenario and associated dynamic diagram for the Travel Agency

4.3.1: Static dependency analysis. To define the successive testing steps, we examine the static architecture model (Fig. 3). Since inheritance and client relations induce dependencies between classes, the behavior of any single object would be meaningless. Moreover, we have polymorphism through the services aggregated in the PROVISION class. The analysis proceeds as follows:
1. Starting from the root class and retaining only the topmost classes linked together by client relationships provides us with a first subset S1 of three classes: S1 = {TRAVEL-AGENCY, PROVISION, SERVICE};
2. To progressively integrate descendants of the elements of S1, we can choose between two classes: C-SERVICE or L-SERVICE. The class description (Fig. 2) shows that the former class should involve fewer features than the latter: hence, we select C-SERVICE. Then, starting from this class, we proceed as in step 1 to define a second subset of classes: S2 = {C-SERVICE, C-STOP, STOP};
3. Finally, starting from the L-SERVICE class leads us to define a third subset which contains the two remaining classes: S3 = {L-SERVICE, L-STOP}.
Hence, three incremental steps are required to account for the polymorphism of the service features: SS1 = S1, SS2 = S1 ∪ S2 and SS3 = S1 ∪ S2 ∪ S3. SS1 corresponds to the minimum set of objects that are able to generate meaningful behavior patterns (no test stubs are required).
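The subset construction of Section 4.1.1 can be mechanized from the declared static relations. Below is a sketch over the Travel Agency cluster; the client/parent relations are transcribed from the case study description as we read it, since Fig. 3 itself is not reproduced.

```python
# Static relations of the Travel Agency cluster (transcribed from the
# case study; client = "uses as server", parent = inheritance).
clients = {
    "TRAVEL-AGENCY": {"PROVISION"},
    "PROVISION": {"SERVICE"},
    "C-SERVICE": {"C-STOP"},
    "L-SERVICE": {"L-STOP"},
}
parents = {
    "C-SERVICE": {"SERVICE"},
    "L-SERVICE": {"SERVICE"},
    "C-STOP": {"STOP"},
    "L-STOP": {"STOP"},
}

def closure(cls, done):
    """Group `cls` with its server classes and their parent classes,
    recursively, skipping classes already integrated in `done`."""
    subset, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c in done or c in subset:
            continue
        subset.add(c)
        todo += clients.get(c, set()) | parents.get(c, set())
    return subset

done = set()
s1 = closure("TRAVEL-AGENCY", done); done |= s1
s2 = closure("C-SERVICE", done); done |= s2
s3 = closure("L-SERVICE", done); done |= s3
print(s1)  # {'TRAVEL-AGENCY', 'PROVISION', 'SERVICE'}
print(s2)  # {'C-SERVICE', 'C-STOP', 'STOP'}
print(s3)  # {'L-SERVICE', 'L-STOP'}
```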

4.3.2: Structural analysis. The structural analysis of the routines actually implemented in the Eiffel program aims at providing us with detailed information on (i) all the possible sequences of calls and (ii) the structural elements (e.g., execution paths) of the routines to be activated in order to provide a complete coverage of the program. Tracing the paths may be difficult when invocation chains of routines traverse the class boundaries. Hence, it is often useful to refer to the dynamic model (Fig. 4), which provides a high level view of the dynamics. The various cases for dynamic binding of SERVICE objects have to be taken into account: in each case it is necessary to identify the active part of the software structure, leading to partial control graphs. The analysis is conducted in the following way.

Step 1: analysis of the subsystem SS1 = S1. The control graph of the creation procedure of the root class TRAVEL-AGENCY is first constructed. This graph is made up of 8 execution paths. The nodes of the graph may contain a call to a local routine of the root class, or to a distant routine of the PROVISION class. Then, node


expansion consists in replacing the calls by partial control graphs of the corresponding routines: the partial graphs are constructed by ignoring all paths that contain a reference to the C-SERVICE or L-SERVICE classes, since they do not have to be tested in subsystem SS1. Finally, calls to SERVICE class routines are expanded in turn. The control graph G1 of the subsystem SS1 is then complete: it is made up of 17 execution paths, each of them involving between one and five routine calls in sequence.

Step 2: analysis of SS2 = SS1 ∪ S2. Starting from G1, the control graph G2 of SS2 is obtained by the addition of new execution paths. Partial control graphs of the PROVISION class routines are augmented with the paths containing calls to routines of C-SERVICE. Calls to C-STOP routines are then expanded. This leads to 9 additional paths: G2 is made up of 26 execution paths, each of them involving between one and seven calls in sequence.

Step 3: analysis of the whole system SS3 = SS2 ∪ S3. In a similar way, starting from G2, we proceed by the addition of paths associated with calls to routines of the S3 classes. The graph G3 contains a total of 35 execution paths (9 new paths); the maximum number of routine calls in sequence remains unchanged (7).

4.3.3: Statistical test sets. The class cluster forms an open system that supports interaction with an external operator during computation. The number and type of requests/inputs to be supplied by the operator vary depending on the executed path. This complicates the design of statistical test sets and forces us to proceed in two stages: (i) random choice of a path number, and (ii) random choice of the actual test pattern according to the requests to be supplied when this path is executed. As a result, at each test step, the k execution paths of Gi are first numbered; then the method for designing statistical test patterns (Section 2) is applied to search for an input distribution over the set {1, ..., k} and to assess a test size Ni which gives the total amount of path numbers to be generated. Starting from a sequence of Ni path numbers thus generated, stage (ii) will consist in constructing an actual input pattern for each path number: this pattern is randomly chosen from the input subdomain associated with the path execution, according to a uniform distribution over this subdomain.
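A minimal sketch of this two-stage generation (ours, not the paper's tooling); subdomain_of, which maps a path number to a sampler over the associated input subdomain, is a hypothetical harness component.

```python
import random

def statistical_test_set(weights, subdomain_of, n):
    """Two-stage random generation: draw a path number according to the
    per-path probabilities `weights` (a dict {path number: pj}), then
    draw an actual input pattern uniformly from that path's subdomain."""
    paths = list(weights)
    probs = [weights[j] for j in paths]
    tests = []
    for _ in range(n):
        j = random.choices(paths, weights=probs)[0]   # stage (i)
        tests.append((j, subdomain_of(j).uniform()))  # stage (ii), hypothetical sampler
    return tests
```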

As said in Section 4.1.2, the test criteria retained are the coverage of the execution paths of the graphs Gi and, at each step, two different test quality objectives qN are required depending on whether or not the paths have already been executed during previous steps. The numerical results given below correspond to qN = 0.9999 for new paths, and qN = 0.9 for retested paths. Sci denotes the set of elements to be activated at step i, and pj the probability of the path numbered j.

Step 1: Subsystem SS1. The graph G1 contains 17 paths: Sc1 = {p1, ..., p17}. Each path must be strongly tested since it is the first step, and they can easily be made equally likely (i.e., pj = 1/17 ∀j) by choosing a uniform distribution over the set of numbers {1, ..., 17}. For Pc = pj = 1/17 and qN = 0.9999, relation (1) in Section 2 gives: N1 = Nmin = 152. Hence, the first testing step is made up of 152 program executions under a uniform distribution over the paths of G1.

Step 2: Subsystem SS2. G2 contains 9 new paths p18, ..., p26: Sc2 = {p1, ..., p26}. Let pnew be the probability of each new path, and pold be the probability of each retested path. Since the sum of the probabilities of the 26 paths must be equal to 1, we get:

17 · pold + 9 · pnew = 1    (2)

The requirements of qN = 0.9999 wrt new paths and qN = 0.9 wrt retested paths give respectively, from relation (1):

Nmin = ln(0.0001) / ln(1 − pnew) = ln(0.1) / ln(1 − pold)    (3)

Then, replacing pold in (3) by its value deduced from relation (2) gives:
ln(0.0001) / ln(1 − pnew) = ln(0.1) / ln(1 − (1 − 9 · pnew)/17),
from which we get: pnew = 4/53, pold = 1/53. Then, from (3): N2 = Nmin ≈ 120 test patterns generated according to the distribution defined over the 26 paths of G2.

Step 3: System SS3. G3 contains 9 new paths p27, ..., p35: Sc3 = {p1, ..., p35}. Relation (2) becomes:

26 · pold + 9 · pnew = 1    (2')

Replacing pold in (3) by its value deduced from (2') leads us to the following solution: pnew = 4/62, pold = 1/62. Then, from (3): N3 ≈ 140 test patterns generated according to the distribution defined over the 35 paths of G3.

In conclusion, testing of the Travel Agency according to the incremental strategy involves three steps. A complete test set is then divided into three series of test patterns which correspond to a total amount of 412 test patterns.
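These figures can be checked numerically. A small script (ours) solves constraints (2)/(2') with the simplification pnew = 4 · pold, which holds because ln(1 − 0.9999)/ln(1 − 0.9) = 4 and ln(1 − p) ≈ −p for small p:

```python
import math

def sizes(n_old, n_new, q_new=0.9999, q_old=0.9):
    """Solve n_old*p_old + n_new*p_new = 1 with p_new = 4*p_old,
    then apply relation (1) to both quality objectives and keep
    the larger (binding) test size."""
    p_old = 1 / (n_old + 4 * n_new)
    p_new = 4 * p_old
    n = max(math.log(1 - q_new) / math.log(1 - p_new),
            math.log(1 - q_old) / math.log(1 - p_old))
    return p_new, p_old, math.ceil(n)

print(sizes(17, 9))  # p_new = 4/53, p_old = 1/53, N = 121
print(sizes(26, 9))  # p_new = 4/62, p_old = 1/62, N = 142
```

The exact minima (121 and 142) sit slightly above the paper's rounded values of about 120 and 140, which is consistent with the approximation used for pnew and pold.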

5: Conclusion and future directions

This work is a modest attempt to address the challenging issues of testing OO programs. Having been working on statistical testing for several years, we adopt the view that it is a suitable means to compensate for the tricky link between current test criteria and software design faults. As a first step, we perform a feasibility study of statistical structural testing at a unit level. The example is a class cluster from a Travel Agency software system. This small case study manages to exhibit some of the problems arising from the OO paradigm. In particular, the body of one routine cannot be tested in isolation. This was not obvious from the literature: many authors recommend that each individual routine be tested as a first step of unit testing, using traditional functional and structural methods. Actually, due to the interactive nature of objects, the


smallest meaningful unit is likely to be a (minimal) cluster of classes. The structural analysis at this level requires both class dependency models and control flow models traversing the boundary of the classes; dynamic models helped us to understand the behavior carried by the structure. Assembling this information, it was possible to generate statistical test sets according to a series of input distributions ensuring a suitable probe of execution paths.

Contrary to previous experimental work for procedural programs, we were not able to assess the fault revealing power of the generated sets. The evaluation method that we previously adopted is mutation analysis: in [5], it is shown that, for procedural programs, simple syntactic mutations have the ability to produce complex run-time errors representative of the ones that would have been caused by real faults. Transferring this result to OO programs is questionable. Indeed, we have introduced mutations in the Travel Agency class routines, and they were all revealed by our statistical test sets. But we feel that those mutations are not representative of the problems arising from the misuse of some complex aspects of the OO languages.

The proposed statistical approach, clearly, does not cover all aspects of the OO testing problems. At the unit level, further work is needed to address more complex schemes of (multiple, repeated) inheritance, abstract and parametrised classes, and state dependency problems. At the integration level, practical methods for statistical testing are still to be defined. It is expected that, owing to the fact that the individual clusters have been thoroughly tested through statistical sets, it will be possible to base the analysis on more abstract models and less stringent criteria. For medium-scale procedural programs, retaining weak criteria and deriving several test profiles, each one focusing on the coverage of a subset of a hierarchy of models, was an efficient approach that allowed us to manage complexity [14]. For OO programs, our investigation will be supported by experimental work performed on a case study involving several class clusters.

Acknowledgments. This work is partially supported by the European Community (ESPRIT Project no. 20072: DeVa). We wish to thank Caroline Parent very much for her useful contribution to the Travel Agency case study, within the framework of a student project.

References

[1] S. Barbey, D. Buchs, C. Péraire, "A Theory of Specification-Based Testing for Object-Oriented Software", in Proc. 2nd European Dependable Computing Conf. (EDCC-2), Taormina, Italy, pp. 303-320, 1996.

[2] B. Beizer, Software Testing Techniques, 2nd Edition, Van Nostrand Reinhold, New York, 1990.
[3] G. Bernot, M-C. Gaudel, B. Marre, "Software Testing Based on Formal Specifications: A Theory and a Tool", IEE Software Engineering Journal, 6 (6), pp. 387-405, 1991.
[4] R. Binder, "Testing Object-Oriented Software: a Survey", Software Testing, Verification & Reliability, 6 (3/4), pp. 125-252, 1996.
[5] M. Daran, P. Thévenod-Fosse, "Software Error Analysis: A Real Case Study Involving Real Faults and Mutations", in Proc. ACM Int. Symp. on Software Testing and Analysis (ISSTA'96), San Diego, USA, pp. 158-171, 1996.
[6] R-K. Doong, P. Frankl, "The ASTOOT Approach to Testing Object-Oriented Programs", ACM Trans. on Software Engineering and Methodology, 3 (2), pp. 101-130, 1994.
[7] M-J. Harrold, J. McGregor, K. Fitzpatrick, "Incremental Testing of Object-Oriented Class Structures", in Proc. 14th IEEE Int. Conf. on Software Engineering (ICSE-14), Melbourne, Australia, pp. 68-80, 1992.
[8] P. Jorgensen, C. Erickson, "Object-Oriented Integration Testing", Communications of the ACM, 37 (9), pp. 30-38, 1994.
[9] D. Kung et al., "Developing an Object-Oriented Software Testing and Maintenance Environment", Communications of the ACM, 38 (10), pp. 75-87, 1995.
[10] B. Meyer, Eiffel: The Language, Prentice Hall International, Object-Oriented Series, 1992.
[11] D. E. Perry, G. E. Kaiser, "Adequate Testing and Object-Oriented Programming", Journal of Object-Oriented Programming, 2 (5), pp. 13-19, 1990.
[12] M. Smith, D. Robson, "Object-Oriented Programming: the Problems of Validation", in Proc. IEEE Conf. on Software Maintenance, New York, pp. 272-281, 1990.
[13] P. Thévenod-Fosse, H. Waeselynck, Y. Crouzet, "An Experimental Study of Software Structural Testing: Deterministic versus Random Input Generation", in Proc. 21st IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-21), Montréal, pp. 410-417, 1991.
[14] P. Thévenod-Fosse, H. Waeselynck, "Statemate Applied to Statistical Testing", in Proc. ACM Int. Symp. on Software Testing and Analysis (ISSTA'93), Cambridge, USA, pp. 99-109, 1993.
[15] K. Waldén, J-M. Nerson, Seamless Object-Oriented Software Architecture, Prentice Hall, The Object-Oriented Series, 1995.
[16] P. Wegner, "Dimensions of Object-Based Language Design", in Proc. ACM Conf. on Object-Oriented Programming, Systems and Languages (OOPSLA'87), New York, pp. 168-182, 1987.
[17] P. Wegner, "Interactive Foundations of Object-Based Programming", IEEE Computer, 28 (10), pp. 70-72, 1995.
[18] E. Weyuker, "Axiomatizing Software Test Data Adequacy", IEEE Trans. on Software Engineering, SE-12 (12), pp. 1128-1138, 1986.
