
Accepted Manuscript

Structural test data generation using a memetic ant colony optimization based on evolution strategies

Hossein Sharifipour, Mojtaba Shakeri, Hassan Haghighi

PII: S2210-6502(17)30337-1

DOI: 10.1016/j.swevo.2017.12.009

Reference: SWEVO 340

To appear in: Swarm and Evolutionary Computation

Received Date: 7 May 2017

Revised Date: 30 November 2017

Accepted Date: 17 December 2017

Please cite this article as: H. Sharifipour, M. Shakeri, H. Haghighi, Structural test data generation using a memetic ant colony optimization based on evolution strategies, Swarm and Evolutionary Computation (2018), doi: 10.1016/j.swevo.2017.12.009.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

https://doi.org/10.1016/j.swevo.2017.12.009


Structural Test Data Generation Using a Memetic Ant Colony Optimization Based on Evolution Strategies

Abstract

Test data generation is one of the key activities that has a significant impact on the efficiency and effectiveness of software testing. Since manual test data generation is quite inefficient and even impractical, automated test data generation has been developed to produce an appropriate subset of input data for carrying out effective software testing in a reasonable time. This paper presents a memetic ant colony optimization (ACO) algorithm for structural test data generation. The proposed approach incorporates (1+1)-evolution strategies (ES) to improve the search functionality of ants in local moves and enhance search exploitation. Moreover, we introduce a novel definition of the pheromone functionality that discourages ants from choosing mostly covered paths of the program, thereby reinforcing search exploration. Given that branch coverage is considered as the coverage criterion, two fitness functions are used for our proposed algorithm. The first fitness function is a Boolean function defined specifically to maximize branch coverage. It outputs one if a given solution succeeds in traversing at least one yet uncovered branch; otherwise, it returns zero. The second fitness function is formulated according to the complexity of the branches covered. The value of the second fitness function is not taken into account for solutions whose Boolean function value equals one. For these solutions, the decision-making process of ants is carried out solely on the basis of the first fitness function. The experimental results indicate the superiority of our memetic ACO algorithm relative to existing test data generation techniques in terms of both branch coverage and convergence speed.

Keywords: automated test data generation, branch coverage, ant colony optimization, evolution strategies, pheromone trail, fitness functions

1. Introduction

Software testing is an important activity in the software development life cycle. One of the challenges in software testing is to generate a set of test data, called a test suite, such that it satisfies a certain test criterion [1, 2, 3]. Unfortunately, this process is normally tedious and costly. According to [4, 5], up to 50% of software development costs are related to testing. Automated test data generation, on the other hand, has the potential to significantly reduce software testing time and costs [6].

Since the amount of data representing the input space of a program may approach infinity, methods are required that generate an appropriate test suite covering most parts of the program [7]. More precisely, assuming a program $p$, the input vector of $p$ is represented as $X = (x_1, x_2, \ldots, x_n)$, where $n$ is the number of inputs and $x_i$ is the $i$th input parameter. If the domain of input $x_i$ is $D_i$, the program input space equals $D = D_1 \times D_2 \times \ldots \times D_n$. Exhaustive testing of a program with so many inputs is clearly not affordable, so an efficient and effective test suite must be generated within the given domain. The generated test suite should cover as much of the program as possible (as a criterion of effectiveness) while containing few test data (as a criterion of efficiency), so that the test execution time stays low.
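To give a sense of scale, consider a purely illustrative case that is not taken from the paper: a program with three inputs, each assumed to be a 32-bit integer. The size of its input space is then

```latex
|D| = |D_1| \cdot |D_2| \cdot |D_3| = \left(2^{32}\right)^{3} = 2^{96} \approx 7.9 \times 10^{28}
```

Even at a hypothetical rate of $10^9$ executions per second, enumerating all inputs would take about $7.9 \times 10^{19}$ seconds, on the order of $10^{12}$ years.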

Test data generation methods can normally be divided into two distinct classes: functional and structural test data generation [8, 9, 10]. In the former, test data are chosen with reference only to the functional specification of the software under test (SUT), whereas in the latter, test data are selected based on structural elements of the SUT source code such as statements, branches, definition-usage pairs and paths. Compared with functional testing, structural test data generation is more cost-effective in detecting failures in programs and has thus been widely applied and studied [6].

Various coverage criteria have been proposed to evaluate a generated test suite. Some important ones include statement coverage [11], branch coverage [12, 13], path coverage [14], coverage of a particular path in the program [7] and coverage of a particular node in a particular path in the program [7]. These coverage criteria can be calculated over the control flow graph (CFG) of the program. Among them, branch coverage is the most cost-effective. Compared to branch coverage, complete path coverage is labor-intensive and is not feasible for CFGs with cycles [11]. Statement coverage, on the other hand, is simple but cannot be used to test the false outcome of a condition in the code. Accordingly, most studies adopt branch coverage as the coverage criterion [11].

With the growing size of software systems and the increase in their complexity, the use of traditional automated test data generation methods, such as symbolic execution [15] and random test data generation [10], is challenging and costly.

Preprint submitted to Swarm and Evolutionary Computation December 18, 2017


Alternatively, search-based software testing (SBST) has been widely used in the related literature [16]. The main idea behind SBST is to use a metaheuristic algorithm to generate test suites that meet a particular test criterion [10, 17]. The problem search space is defined as the SUT input space, from which a metaheuristic algorithm aims at finding the smallest test suite that can cover as many program structures as possible in a reasonably short time [18, 16]. This approach starts with a small test suite, which is then gradually enlarged during the algorithm execution until either 100% coverage (with respect to a selected test criterion) is achieved or the maximum iteration limit is reached.

A variety of metaheuristic algorithms, including genetic algorithms (GA) [1, 13, 19, 20, 21], simulated annealing (SA) [22], tabu search (TS) [17, 23], particle swarm optimization (PSO) [24] and ant colony optimization (ACO) [25], have so far been used for test data generation. In addition, hybrid metaheuristics that combine global search strategies like GA with local search operators, forming the so-called memetic algorithms (MA), have been shown to be effective for test data generation [5, 26, 27, 28, 29].

In this paper, we present a memetic ACO algorithm for structural test data generation that aims at covering most branches of the code. The proposed approach enhances the search exploitation of ants by incorporating (1+1)-evolution strategies (ES) in their local moves. Moreover, a novel definition of the pheromone functionality is introduced that discourages ants from choosing mostly covered paths of the program, which helps search exploration. We consider two fitness functions for our proposed algorithm. The first fitness function is a Boolean function targeted at maximizing branch coverage. The function outputs one if a given solution succeeds in traversing at least one yet uncovered branch; otherwise, it returns zero. The second fitness function is formulated according to the complexity of the branches covered. The complexity is defined based on the nested level and predicate type of branches, both of which are known to be good indicators of branch reachability [25]. We do not take into account the value of the second fitness function for solutions whose Boolean function value equals one. In this case, the decision-making process of ants with respect to the corresponding solution is based solely on the first fitness function. The algorithm terminates when either full branch coverage is achieved or a pre-specified number of iterations is exceeded. The experimental results indicate that the proposed memetic algorithm achieves considerably high coverage rates while also benefiting from good convergence speed by balancing global and local moves. The main contributions of our work can be summarized as follows:

• The incorporation of (1+1)-ES for local moves of ants rather than pursuing them randomly to reinforce search exploitation.

• The introduction of a novel functionality of pheromone trails to enhance search exploration.

• The employment of two complementary fitness functions for the purpose of maximizing branch coverage as well as expediting the convergence to full coverage.

The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 reviews the ACO algorithm proposed in [25] for test data generation, on which our memetic algorithm is based. The proposed method is described in Section 4. Section 5 presents the experimental results together with both empirical and statistical analyses and discussions. Four evaluation criteria are used for the assessment of both branch coverage and convergence speed. Finally, Section 6 concludes the paper and highlights some future research directions.

2. Related work

Numerous studies have proposed metaheuristic optimization algorithms for structural test data generation. Michael et al. [1] developed a GA-based testing tool called GADGET for automated test data generation. Two different implementations of GA were presented: standard, in which each input is represented as a string of bits, and differential, which is based on a real-valued input representation. The objective defined for both implementations was to find a set of tests that satisfy condition-decision coverage. To get 100% decision coverage, every decision in the program must take all possible outcomes at least once, whereas to get 100% condition coverage, every condition in a decision in the program must take all possible outcomes at least once. Since both types of coverage are important and one does not guarantee the other, in practice they are typically combined to form condition-decision coverage [9]. The two proposed GAs were compared with both random and gradient descent techniques. Random test data generation simply consists of generating inputs at random until a useful input is found. Gradient descent essentially works by making successive small changes to one input value to determine a good direction for making larger moves. When an appropriate direction is found, increasingly large steps are taken in that direction until no further improvement is obtained. In that case, a different input value is modified, and the process terminates when no more progress can be made for any input value. The experimental results indicated that the standard GA implementation outperformed the other three (i.e., differential GA, random and gradient descent).

Considering the branch coverage criterion, Fraser and Arcuri [21] developed an efficient testing tool called EvoSuite that uses GA to automatically generate test data for classes written in Java. Other studies that used GA for automated test data generation include [13, 19, 20, 22].


A few studies focused on developing single-solution-based metaheuristics for automated test data generation. Mansour and Salame [14] employed SA with the aim of maximizing path coverage. To define the fitness function, each branch predicate was first reformulated as an equality relation consisting of two operands. The closeness between the values of the two operands was then measured by a weighted Hamming distance between them. The experimental results demonstrated that the coverage rate of SA is slightly better than that of GA at the expense of longer execution times. Díaz et al. [17] employed TS for automated test data generation by considering the condition-decision coverage criterion. The proposed TS was compared with a random test data generation technique. The results showed the superiority of TS in terms of both coverage and test data generation time. Harman and McMinn [30] used hill climbing (HC) to generate test data by considering the branch coverage criterion. The fitness function combines two measures: the approach level and the branch distance. The approach level is the minimum number of branches required to reach the target branch from the path covered by the test data, and the branch distance for a given branch predicate is the deviation from its trueness when the input values are assigned to the variables. To avoid being trapped in local optima, HC was restarted periodically at a new randomly chosen starting point until a specified number of fitness evaluations was reached.

Some recent studies used computational swarm intelligence algorithms for automated test data generation. Windisch et al. [24] proposed a comprehensive learning PSO (CL-PSO) in which each particle learns from different neighbors for each dimension separately, depending on its assigned learning rate. If no further improvement was achieved for a specified number of iterations, the algorithm was restarted by reassigning the particles. The fitness function was the same as that of Harman and McMinn [30], combining the branch distance with the approach level. The superiority of PSO over GA in terms of branch coverage was demonstrated empirically.

Singal et al. [31] presented a hybrid approach based on PSO and GA to automate test data generation by considering two data-flow coverage criteria: "definition-computational use" and "definition-predicate use". To measure the closeness of test data to the target node in the corresponding data-flow graph, they added a value called the closeness level (CL) to the fitness values of test data. The proposed hybrid approach combined the power of PSO and GA, resulting in more branch coverage. Mao [3] developed a harmony search (HS) algorithm with the aim of maximizing branch coverage. The fitness function was constructed by considering branch distance and branch weight, where the latter was determined by the nested level of each branch and its predicate type. The branch distance for each type of predicate was determined and included in the fitness function formulation (see Table 1 in Section 3.1). By predicate type, we mean the kind of predicate to be evaluated, which is either Boolean or arithmetic [32]. Experimental results showed that the coverage of HS was higher than that of SA and GA.

Various studies have investigated the benefit of memetic algorithms for test data generation, both empirically and theoretically. Wang and Jeng [26] were the first to develop a memetic algorithm (MA) by combining GA with hill climbing (HC) to generate test data for procedural functions. They showed that MA performed better than GA and HC when employed individually. Arcuri and Yao [27] applied the same algorithms as those in [26] to test data generation for object-oriented software and reached similar conclusions from their experiments. Liaskos and Roper [28] combined GA with an artificial immune system (AIS) algorithm as the local search operator. The authors empirically demonstrated that the hybridized approach outperformed GA; there were even cases in which the combination with a simple local search operator was more effective than the more sophisticated AIS algorithm. Torres-Jimenez and Rodriguez-Tello [29] combined GA with an SA-based algorithm as the local search operator for combinatorial testing. Harman and McMinn [33] conducted a theoretical analysis based on Holland's well-known schema theory [34] to decide which type of search, global or local, is best for a given type of test data generation problem. They also carried out a large-scale empirical study to validate their theoretical predictions. Their findings led to the consideration of hybrid global and local techniques. Fraser et al. [5] indicated that, in order to form an effective test suite, small local changes should be applied to primitive values like integers or doubles such that some target structure is executed. They thus extended evolutionary testing into a memetic algorithm by employing several local search operators. The superiority of their proposed hybrid approach in maximizing branch coverage was demonstrated by extensive experiments over 12,000 Java classes.

The effectiveness of global exploration combined with local exploitation has been demonstrated not only in software test data generation but also in a variety of other applications, including machine learning and its associated problems [35, 36]. For example, Qian et al. [35] developed an evolutionary bi-objective optimization method combined with a local search operator for ensemble pruning, aiming at maximizing the generalization performance of an ensemble and minimizing the number of base learners selected for the prediction. From a theoretical point of view, it has been analytically shown that hybrid strategies are capable of approaching the global optimum in a dramatically shorter expected running time than a single pure search strategy [37].

One of the latest studies to exploit a memetic algorithm for automated test data generation was carried out by Mao et al. [25], who developed an ACO search-based algorithm equipped with a random local search operator. The work was an extension of the authors' prior work [3] and was compared with three metaheuristics: SA, GA and PSO. Experimental results showed that the memetic ACO was significantly better than SA and GA and was comparable to PSO in generating test data.


3. The basic ACO for test data generation

This study has been inspired mainly by the ACO-based approach of Mao et al. [25]. It is hence important to dedicate a separate section to describing their work. In [25], the authors reformed the basic ACO algorithm into a continuous version for structural test data generation in which each ant's position represents a test datum. The solutions corresponding to the entire population thus form a test suite. At the beginning, ants are randomly distributed in the search space. For each ant, a continuous neighborhood with a constant radius is defined. To improve the algorithm's searching ability and generate more diverse test inputs, three procedures (local transfer, global transfer and pheromone update) were defined and applied together with a fitness function for branch coverage.

In the local transfer, each ant moves randomly within a neighborhood of a fixed radius that is predetermined with respect to the complexity of the benchmarks. The ant first generates a neighbor solution randomly and then moves to this new position if its fitness is better than that of the old one; otherwise, it stays at its current position. This procedure is repeated a prespecified number of times. In the global transfer, ants whose fitness is lower than the average are randomly displaced to a new position within the entire search space, provided that a certain probability is met. Otherwise, ants undergo a search procedure called biased exploration. In this type of search, ants with larger volumes of pheromone have higher probabilities of attracting other ants to search within their neighborhoods. The authors defined the pheromone structure as an array of real values whose size equals the number of ants. In each iteration, the pheromone value of each ant is proportional to the quality of the solution it has produced and is updated accordingly. Finally, a fitness function was defined based on two criteria, the branch distance and the branch weight, which are described next.

3.1. Definition of fitness

The so-called branch distance for a given branch predicate is the deviation from its trueness when the input values are assigned to the variables. Mao et al. [25] measured the branch distance according to the definitions in [3, 8, 10]. Table 1 shows the value of the branch distance for each type of predicate. The value $\delta$ ($\delta > 0$) in the table is a constant that is always added if the term is not true.

Given a program with $s$ branches, Equation 1 gives the fitness function for a test input $X_k \in TS$ ($1 \le k \le m$), where $m$ is the number of test inputs in test suite $TS$:

$$\mathit{fitness}(X_k) = \frac{1}{\left[\theta + \sum_{i=1}^{s} \left(w_i \times f(bch_i, X_k)\right)\right]^2} \tag{1}$$

where $\theta$ is a small constant, which the authors set to 0.01 in their experiments; $f(bch_i, X_k)$ is the distance of branch $bch_i$ ($1 \le i \le s$) for test input $X_k$, and $w_i$ is the weight of branch $bch_i$. Mao et al. [25] developed an analytical approach to calculate the weight of a branch according to its reachability. This approach is described in the next section. It should be noted that $\sum_{i=1}^{s} w_i = 1$.
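The fitness of a single test input can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of [25]: the function name `fitness` and its list-based arguments are hypothetical, and the branch distances are assumed to have been computed beforehand.

```python
def fitness(weights, distances, theta=0.01):
    """Fitness of one test input in the form of Equation 1.

    weights   -- branch weights w_i, assumed to sum to 1
    distances -- branch distances f(bch_i, X_k), one per branch
    theta     -- small positive constant (0.01 in the cited experiments)
    """
    if len(weights) != len(distances):
        raise ValueError("one distance is required per branch weight")
    # Weighted sum of branch distances over all s branches.
    weighted = sum(w * d for w, d in zip(weights, distances))
    # A smaller weighted distance gives a larger fitness, bounded by 1/theta^2.
    return 1.0 / (theta + weighted) ** 2
```

When every branch distance is zero, the value reaches its upper bound $1/\theta^2$; any positive distance on a heavily weighted branch lowers it sharply.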

Finally, they calculated the fitness of the whole test suite $TS$, comprising $m$ test inputs, by using the following equation:

$$\mathit{fitness}_{TS} = \frac{1}{\left[\theta + \sum_{i=1}^{s} \left(w_i \times \min_{1 \le k \le m} f(bch_i, X_k)\right)\right]^2} \tag{2}$$

Mao et al. [25] argued that the best value of $\mathit{fitness}_{TS}$ at the end of the algorithm should be as close to $1/\theta^2$ as possible. In other words, the higher the branch coverage, the greater the fitness. They used $\mathit{fitness}_{TS}$ to guide test inputs to cover all branches in a program.
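As a worked illustration with assumed numbers (two branches with weights $w_1 = 0.4$ and $w_2 = 0.6$, and $\theta = 0.01$), compare a suite that covers both branches with one in which the smallest distance to the second branch is still 2.1:

```latex
\text{both covered:}\quad
\mathit{fitness}_{TS} = \frac{1}{[0.01 + 0.4 \cdot 0 + 0.6 \cdot 0]^2} = \frac{1}{\theta^2} = 10^{4}
\qquad
\text{second branch uncovered:}\quad
\mathit{fitness}_{TS} = \frac{1}{[0.01 + 0.4 \cdot 0 + 0.6 \cdot 2.1]^2} = \frac{1}{1.27^2} \approx 0.62
```

The gap of several orders of magnitude shows why a value near $1/\theta^2$ signals full branch coverage.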

Table 1: The branch distance for several kinds of branch predicates [3, 8, 10, 25].

| No | Predicate type | Branch distance |
|----|----------------|-----------------|
| 1 | Boolean | if true then 0 else δ |
| 2 | ¬a | Negation is propagated over a |
| 3 | a = b | if abs(a − b) = 0 then 0 else abs(a − b) + δ |
| 4 | a ≠ b | if abs(a − b) ≠ 0 then 0 else δ |
| 5 | a < b | if (a − b) < 0 then 0 else abs(a − b) + δ |
| 6 | a ≤ b | if (a − b) ≤ 0 then 0 else abs(a − b) + δ |
| 7 | a > b | if (a − b) > 0 then 0 else abs(a − b) + δ |
| 8 | a ≥ b | if (a − b) ≥ 0 then 0 else abs(a − b) + δ |
| 9 | a and b | f(a) + f(b) |
| 10 | a or b | min(f(a), f(b)) |
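The rules of Table 1 translate directly into small functions. The sketch below is illustrative only: the function names are hypothetical, and `DELTA = 1.0` is an assumed value, since the text only requires $\delta > 0$. Negation (row 2) is not a separate function; it is handled by rewriting the predicate, for example ¬(a < b) becomes a ≥ b.

```python
DELTA = 1.0  # assumed value; the source only states that delta > 0


def dist_bool(value, delta=DELTA):
    """Row 1: Boolean predicate."""
    return 0.0 if value else delta


def dist_eq(a, b, delta=DELTA):
    """Row 3: a = b."""
    return 0.0 if a == b else abs(a - b) + delta


def dist_ne(a, b, delta=DELTA):
    """Row 4: a != b."""
    return 0.0 if a != b else delta


def dist_lt(a, b, delta=DELTA):
    """Row 5: a < b."""
    return 0.0 if a < b else abs(a - b) + delta


def dist_le(a, b, delta=DELTA):
    """Row 6: a <= b."""
    return 0.0 if a <= b else abs(a - b) + delta


def dist_gt(a, b, delta=DELTA):
    """Row 7: a > b."""
    return 0.0 if a > b else abs(a - b) + delta


def dist_ge(a, b, delta=DELTA):
    """Row 8: a >= b."""
    return 0.0 if a >= b else abs(a - b) + delta


def dist_and(fa, fb):
    """Row 9: both operands must hold, so their distances add up."""
    return fa + fb


def dist_or(fa, fb):
    """Row 10: one operand suffices, so the closer one counts."""
    return min(fa, fb)
```

For instance, the predicate `x == 5 and y < 0` evaluated at `x = 3, y = 2` would be scored as `dist_and(dist_eq(3, 5), dist_lt(2, 0))`.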


3.2. Branch weight

Mao et al. [25] assigned each branch a weight according to its reachability degree and forced their ACO-based search algorithm to spend more effort on covering highly weighted branches. They expressed the reachability degree of a branch by two factors: nested weight and predicate weight. Generally speaking, the deeper the nested level of a branch, the harder it is to reach. Given a branch $bch_i$ ($1 \le i \le s$) whose nested level is $nl_i$, the nested weight of $bch_i$ was measured according to the following formula:

$$wn(bch_i) = \frac{nl_i - nl_{min} + 1}{nl_{max} - nl_{min} + 1} \tag{3}$$

where $nl_{max}$ and $nl_{min}$ are the maximum and minimum nested levels of all branches, respectively. The branch nested weight was then normalized according to Equation 4:

$$wn'(bch_i) = \frac{wn(bch_i)}{\sum_{j=1}^{s} wn(bch_j)} \tag{4}$$

In addition to the nested weight of a branch, the reachability degree of a branch (or, in other words, the difficulty of satisfying that branch) also depends on its predicate weight. According to the semantics of a predicate statement, four types of clauses were introduced. Table 2 shows a reference weight model for all kinds of clauses. The value of the reference weight for each clause type is proportional to the difficulty of satisfying it. The harder it is to make a condition true, the higher its assigned weight.

Since a branch predicate is normally a compound form (i.e., it contains several clauses) in practice, Mao et al. [25] assumed that the branch predicate $bch_i$ ($1 \le i \le s$) contains $h$ clauses. Then, for each clause $c_j$ ($1 \le j \le h$), they determined its reference weight $wr(c_j)$ according to the corresponding condition type. The formula in Equation 5 can then be used to measure the weight $wp(bch_i)$ of the branch predicate $bch_i$. According to this equation, if $bch_i$ is formed by combining $h$ conditions via the logical and operator, its predicate weight is the square root of the sum of $wr^2(c_j)$; if the operator is the logical or, the predicate weight is the minimum element in the set of $wr(c_j)$ ($1 \le j \le h$).

$$wp(bch_i) = \begin{cases} \sqrt{\sum_{j=1}^{h} wr^2(c_j)} & \text{if the clauses are combined by and} \\ \min_{1 \le j \le h} \{wr(c_j)\} & \text{if the clauses are combined by or} \end{cases} \tag{5}$$

Similar to the nested weight, the predicate weight was normalized according to the following equation:

$$wp'(bch_i) = \frac{wp(bch_i)}{\sum_{j=1}^{s} wp(bch_j)} \tag{6}$$

Finally, for branch $bch_i$ ($1 \le i \le s$), they defined its weight $w_i$ as the combination of $wn'(bch_i)$ and $wp'(bch_i)$ based on Equation 7.

$$w_i = \lambda \cdot wn'(bch_i) + (1 - \lambda) \cdot wp'(bch_i) \tag{7}$$

Where λ is a balance coefficient which was set to 0.5 in the experiments.
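Equations 3 to 7 can be combined into one short routine. The sketch below is an illustration under assumptions rather than the code of [25]: the function names, the string keys for clause types, and the `(clauses, connective)` representation of a branch predicate are all hypothetical, while the reference weights are those listed in Table 2.

```python
import math

# Reference weights from Table 2; the string keys are an assumed encoding.
REFERENCE_WEIGHT = {
    "==": 0.9,
    "<": 0.6, "<=": 0.6, ">": 0.6, ">=": 0.6,
    "bool": 0.5,
    "!=": 0.2,
}


def nested_weights(levels):
    """Equations 3 and 4: normalized nested weight of every branch."""
    lo, hi = min(levels), max(levels)
    raw = [(nl - lo + 1) / (hi - lo + 1) for nl in levels]
    total = sum(raw)
    return [w / total for w in raw]


def predicate_weight(clauses, connective="and"):
    """Equation 5: weight of one branch predicate made of several clauses."""
    refs = [REFERENCE_WEIGHT[c] for c in clauses]
    if connective == "and":
        return math.sqrt(sum(r * r for r in refs))
    return min(refs)  # clauses combined by "or"


def branch_weights(levels, predicates, lam=0.5):
    """Equations 6 and 7: final weight w_i of every branch.

    levels     -- nested level nl_i of each branch
    predicates -- one (clauses, connective) pair per branch
    lam        -- balance coefficient lambda (0.5 in the cited experiments)
    """
    wn = nested_weights(levels)
    raw = [predicate_weight(c, conn) for c, conn in predicates]
    total = sum(raw)
    wp = [w / total for w in raw]
    return [lam * a + (1 - lam) * b for a, b in zip(wn, wp)]
```

Because both normalized components sum to one over all branches, the resulting weights also sum to one, which matches the constraint stated for Equation 1.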

Table 2: The reference weights of different clause types [3].

| No | Clause type | Reference weight |
|----|-------------|------------------|
| 1 | == | 0.9 |
| 2 | <, ≤, >, ≥ | 0.6 |
| 3 | Boolean | 0.5 |
| 4 | ≠ | 0.2 |

4. The proposed memetic ACO

Given the ACO algorithm proposed by Mao et al. [25] as the base algorithm, our memetic ACO differs from it in three main aspects. First, we reinforce the sensitivity in the local transfer of ants by employing (1+1)-ES with a Gaussian mutation operator rather than generating random moves within a given neighborhood. Second, the definition and usage of the pheromone structure are entirely different: pheromone is no longer used in the global moves of ants to carry out the so-called biased exploration (see Section 3). The reason is that this type of search reinforces exploitation rather than exploration, so it is removed from the global search phase of our proposed ACO algorithm. Instead, the pheromone information is added to the fitness function to guide the construction of new solutions by discouraging ants from revisiting branches that were previously covered by other ants.

The third difference is the definition of an additional fitness function for the evaluation of candidate solutions. It is a Boolean function, denoted $f_1$ in this paper, which outputs one if a given solution succeeds in traversing at least one yet uncovered branch; otherwise, it returns zero. For solutions whose $f_1$ value equals one, the value of the other fitness function, denoted $f_2$ in this paper, is not considered by the ants in either their local or their global moves (note that $f_2$ differs slightly from the fitness function of [25], as it includes the pheromone information in addition to the two criteria related to branch reachability). In the local moves of ES, offspring whose $f_1$ value equals one replace their parents irrespective of their $f_2$ value. Global moves are likewise not considered for ants whose solution has an $f_1$ value of one, because these ants have already discovered a new region of the search space and no further global move is required.

The test data generation problem addressed in this paper can be formally described as follows. Suppose that a SUT $p$ has $n$ input variables represented by a vector $X = (x_1, x_2, \ldots, x_n)$, where variable $x_i$ ranges over a continuous domain $D_i$ ($1 \le i \le n$) in the Euclidean space. Thus, the input domain of the whole program can be expressed as $D = D_1 \times D_2 \times \ldots \times D_n$. Given the explanations presented in Section 1, branch coverage has been chosen as the coverage criterion. The problem objective is to generate a test suite $TS = \{X\}$ such that the branch coverage in the corresponding CFG is maximized and 100% coverage is reached faster.

Figure 1 depicts an overall view of the proposed approach. In this approach, the SUT is initially analyzed to extract static information, including the structure of the SUT source code (such as the number of program branches and the nested level of each branch) and the number of SUT input parameters and their domains. Next, the SUT input parameters are initialized with random values. The input parameters are encoded into the position vector of the ant colony. In each iteration, each ant generates a test datum to cover a subtree of the corresponding CFG (note that the set of test data generated by the entire population forms a test suite). The ant colony is evolved by conducting the following three procedures. First, each ant's solution undergoes a number of local moves through (1+1)-ES and then, if necessary, a global random move. Second, the test data generated by ants after their local and global transfers are executed dynamically to compute branch coverage, and a report is generated that includes the values of the two fitness functions for each solution. Third, the pheromone trail (which is used in the calculation of $f_2$) is updated by employing the evaporation and reinforcement procedures. Afterwards, each ant resumes the solution construction process for the next iteration. This procedure continues until full branch coverage is obtained or a maximum number of iterations is reached. The final output is a test suite containing the test data that have covered the maximum number of branches.

[Figure 1 is a flowchart. Box labels recoverable from the original: Source code; Extract the static information; Initialize input vector; Ant colony evolution; Local search; Global search; Dynamic execution; Pheromone update; Execution result; Branch coverage report; Pheromone trail; Fitness functions; Test suite.]

Figure 1: The test data generation process of our proposed approach.

At the beginning, $m$ ants are distributed randomly in the search space. The location of ant $k$ ($1 \le k \le m$) is represented by a continuous vector $X_k = (x_{k1}, x_{k2}, \ldots, x_{kn})$, which supplies the values of the SUT input variables. The neighborhood $N$ of ant $k$ contains the points $Y = (y_1, y_2, \ldots, y_n)$ that satisfy the following formula:

$$N(X_k) = \{\, Y \mid \lVert Y - X_k \rVert \le r_{max} \,\}, \quad \text{where } \lVert Y - X_k \rVert = \sqrt{(x_{k1} - y_1)^2 + (x_{k2} - y_2)^2 + \ldots + (x_{kn} - y_n)^2} \tag{8}$$

where $r_{max}$ is the maximum neighborhood radius, which is initialized with a prespecified value and then changes dynamically according to a monotonically non-increasing function during the algorithm execution.
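The neighborhood test of Equation 8 amounts to a Euclidean distance check. The helper below is a small sketch; the name `in_neighborhood` is hypothetical, and the schedule by which the radius shrinks is left out because it is not specified at this point in the text.

```python
import math


def in_neighborhood(x_k, y, r_max):
    """Return True if point y lies within Euclidean distance r_max of ant position x_k."""
    return math.dist(x_k, y) <= r_max
```

For example, with an ant at the origin of a two-input program, the point (3, 4) lies exactly on the boundary of a neighborhood of radius 5.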

Our memetic ACO is presented in the following four sections. Sections 4.1 and 4.2 describe the local and global transfer of ants, respectively. Section 4.3 describes the pheromone trail update. Finally, Section 4.4 introduces the two fitness functions applied in our proposed algorithm.

4.1. The proposed local transfer of ants

The local transfer of ants is carried out by employing evolution strategies (ES). ES is a subclass of evolutionary algorithms originally developed by Rechenberg and Schwefel [38, 39]. It is mostly applied to continuous optimization problems whose solution representation is based on a real-valued vector. The basic version of ES, (1+1)-ES, has a population of two individuals: the current point (parent) and the result of its mutation (offspring). In each iteration of ES, one parent is used to generate one offspring. The parent is replaced by its offspring if the offspring is better; otherwise, the offspring is discarded [7].

Here (1+1)-ES is used as a local search on the input data that each ant considers for the branch coverage. The algorithm exploits the neighborhood of each ant for a pre-determined number of iterations until it finds a neighbor solution whose f1 fitness function value is one or whose f2 value is higher than that of the current position. If a neighbor solution is found that satisfies either of these two conditions, the ant accepts the solution by moving to its position; otherwise, it stays in its current position. It should be noted that if the accepted solution has an f1 value of one, i.e., it covers at least one branch that has not been covered yet, the local search procedure stores its corresponding test datum in the best suite that the colony has found so far by replacing the current one.

Given Xi, an individual represented by a vector of input variables to the SUT, the offspring is produced by applying the following mutation operator for each variable:

$$X_i' = X_i + R \tag{9}$$

where R = (r1, r2, . . . , rn) is a vector, each element ri is a Gaussian random variable with mean 0 and variance rmax, and n is the number of the problem inputs. As already stated, rmax represents the maximum neighborhood radius and is monotonically decreased during the algorithm's execution, so that ants make distant local moves in the early search stages and, as the search converges, carry out exploitation within smaller neighborhoods.
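The mutation of Equation 9 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' MATLAB code; the function name `mutate` and the optional `rng` argument are assumptions. Note that the text specifies a variance of rmax, whereas `random.gauss` expects a standard deviation, hence the square root.

```python
import math
import random

def mutate(x, r_max, rng=random):
    """Return offspring X' = X + R, each r_i drawn from a Gaussian with mean 0 and variance r_max."""
    sigma = math.sqrt(r_max)  # convert the variance r_max to a standard deviation
    return [xi + rng.gauss(0.0, sigma) for xi in x]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.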

4.2. The proposed global transfer of ants

To reinforce exploration in our memetic ACO, we probabilistically apply a global random move at the end of each iteration for those ants satisfying the following two conditions. First, their corresponding f1 Boolean value is equal to zero and second, their f2 fitness value is not greater than the average. The global move is not applied to those ants whose corresponding f1 value is one. The reason is that they have traversed at least one yet uncovered branch and have already discovered a new region within the problem search space. The probability that the ant moves to a new randomly generated position is set by a prespecified threshold denoted as q0. For this purpose, a random number q in U[0, 1] is generated. If it is lower than q0 (i.e., q < q0), the ant moves to the new position.
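The decision rule above can be written as a single guarded move. The sketch below is illustrative only: the function name, the argument names, and the encoding of the input domain as a list of (low, high) pairs are assumptions, not details given in the text.

```python
import random

def maybe_global_move(position, f1_value, f2_value, f2_average, q0, domain, rng=random):
    """Return a new random position if the ant qualifies for a global move, else the old one.

    domain is assumed to be a list of (low, high) bounds, one pair per input variable.
    """
    # An ant qualifies only if it found no new branch (f1 = 0) and its f2 is not above average;
    # the move itself is then taken with probability q0 (q < q0 with q drawn from U[0, 1]).
    if f1_value == 0 and f2_value <= f2_average and rng.random() < q0:
        return [rng.uniform(low, high) for low, high in domain]
    return position
```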

4.3. The proposed pheromone usage

Unlike [25], which defines the pheromone structure as an array of real values the same size as the number of ants, the pheromone structure defined in our work is an array of real values whose length is equal to the number of edges of the corresponding CFG of a given SUT. The value of pheromone corresponding to each edge is proportional to the number of times its respective branch has been covered so far. The proposed definition, nonetheless, does not violate the general functionality of pheromone defined in the literature of ant colonies. According to Talbi [40], the role of the pheromone trails is to memorize the characteristics of “good” generated solutions, which will guide the construction of new solutions by the ants. It represents the memory of the whole ant search process. Similarly, in this paper, the pheromone trails have been used to guide the construction of new solutions by the ants via discouraging them from revisiting those branches which were previously covered by the other ants. We have thus inserted the value of pheromone in the denominator of f2 (see Equation 13) to assign lower selection probabilities to more frequently covered branches than to uncovered or barely visited ones. This can significantly enhance search exploration.

As already stated, the pheromone data structure is represented as an array whose length is equal to the number of the CFG edges. To each edge j (1 ≤ j ≤ s), where s is the total number of edges in the CFG, a weight τ(j) representing its corresponding pheromone is assigned. The initial pheromone value for all edges is set to 1.


After each ant has constructed a solution in an iteration, the pheromone update routine is invoked. This normally consists of two phases: evaporation and reinforcement. In the first phase the pheromone trail decreases automatically. Each pheromone value is reduced by a fixed proportion according to the following formula:

$$\tau(j) = (1 - \alpha)\,\tau(j), \quad 1 \le j \le s \tag{10}$$

where α ∈ (0, 1] represents the reduction rate of the pheromone. Evaporation is used to keep the entire population from premature convergence by encouraging diversification in the search space [40].

In the reinforcement phase the pheromone trail corresponding to those CFG edges covered by any ant k in the current iteration is updated by adding a positive ∆∗ value. In this paper ∆∗ is equal to the fitness of the best solution obtained in the current iteration.

$$\tau(j_k) = \tau(j_k) + \Delta^{*} \tag{11}$$

where jk refers to a branch covered by ant k in the current iteration.
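Equations 10 and 11 translate directly into a two-pass update. The following Python sketch is a hedged illustration; the function name, the 0-based edge indices, and the representation of each ant's coverage as a collection of edge indices are assumptions. Following the per-ant loop of Procedure 4, an edge covered by several ants is reinforced once per ant.

```python
def update_pheromone(tau, covered_by_ant, alpha, delta_star):
    """Evaporate every trail (Equation 10), then reinforce covered edges (Equation 11).

    tau: list of pheromone values, one per CFG edge (0-based here, 1-based in the text).
    covered_by_ant: one iterable of covered edge indices per ant (assumed encoding).
    """
    new_tau = [(1.0 - alpha) * t for t in tau]   # evaporation
    for covered in covered_by_ant:
        for j in covered:
            new_tau[j] += delta_star             # reinforcement
    return new_tau
```

For example, with three edges at τ = 1, α = 0.5, ∆∗ = 0.25, and two ants covering edges {0} and {0, 2}, the updated trail is [1.0, 0.5, 0.75].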

4.4. Fitness functions

Two fitness functions are used for our memetic ACO algorithm. The first is a Boolean function which is particularly defined to maximize branch coverage. Equation 12 represents our proposed Boolean function f1 for test input Xk ∈ TS (1 ≤ k ≤ m), where m is the number of test inputs in test suite TS:

$$f_1(X_k) = \begin{cases} 1 & \text{if at least one yet uncovered branch is traversed by } X_k \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

The second fitness function is formulated according to the complexity of branches covered. The complexity is characterized by the branch reachability and quantified using two criteria, namely the branch distance and the branch weight, as applied in [25]. Our proposed function, however, slightly differs from the one in [25] as it incorporates the pheromone corresponding to each branch of the CFG in the fitness calculation of each test input. This discourages ants from choosing the most frequently covered branches. Given a program of s branches, Equation 13 represents our proposed fitness function f2 for Xk ∈ TS (1 ≤ k ≤ m) as a test input:

$$f_2(X_k) = \frac{1}{\left[\theta + \sum_{i=1}^{s} \left( w_i \times f(bch_i, X_k) + c_{ki} \times \tau(i) \right)\right]^{2}} \tag{13}$$

where θ is a small constant which, similar to [25], has been set to 0.01 in our experiments; f(bchi, Xk) is the distance of branch bchi (1 ≤ i ≤ s) for Xk and is calculated according to the values listed in Table 1; wi is the weight of branch bchi and is computed by Equation 7. Lastly, cki is a Boolean flag indicating whether or not branch bchi is covered by Xk, and τ(i) is the pheromone assigned to branch bchi.

Finally, the fitness of the whole test suite is calculated according to Equation 2. Given the information provided in the preceding sections, we can now describe our proposed memetic algorithm.
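To make Equations 12 and 13 concrete, the Python sketch below evaluates both fitness functions for one test input. It assumes that the branch distances (Table 1) and branch weights (Equation 7) have already been computed elsewhere and are passed in as plain lists; the function and argument names are illustrative, not the authors' code.

```python
def f1(covered_by_input, already_covered):
    """Return 1 if the input traverses at least one branch not yet covered, else 0 (Equation 12)."""
    return 1 if set(covered_by_input) - set(already_covered) else 0

def f2(branch_distances, weights, covered, tau, theta=0.01):
    """Second fitness value of one test input, following Equation 13.

    All four sequences have one entry per branch; covered holds the Boolean flags c_ki.
    """
    total = sum(w * d + (1.0 if c else 0.0) * t
                for w, d, c, t in zip(weights, branch_distances, covered, tau))
    return 1.0 / (theta + total) ** 2
```

Because τ(i) sits in the denominator, raising the pheromone of a branch that the input covers lowers f2, which is the discouraging effect described in Section 4.3.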

4.5. The proposed algorithm

Algorithm 1 presents our proposed memetic ACO algorithm in pseudo code. The list of parameters used in the description of the algorithm and its procedures is given in Table 3. In the algorithm, the entire pheromone array is first initialized with a constant value, which has been set to one in this study (Lines 4-6). Second, a position vector in the problem search space is randomly assigned to each ant by initializing the values of the input vector X within domain D. The ant positions are then evaluated using both fitness functions, and the test data corresponding to the ant colony form the initial value of the best test suite (Lines 7-13).

In the iterative procedure of our algorithm (Lines 15-28), we conduct a local search routine using (1+1)-ES on the current location of each ant followed by a global move if applicable (Lines 16-18 and 19, respectively). Next, the pheromone trail is updated according to the branches covered by ants (Line 20). Finally, at the end of each iteration, the corresponding test suite composed of the solutions of all ants (each solution is decoded to a test datum) is evaluated according to the percentage of branch coverage (Lines 21-24). The new test suite then replaces the best found test suite if it acquires either a higher branch coverage or, in case of a tie, a greater total fitness (Lines 25-27). The iterations continue until either the maximum number of generations is reached or full coverage is achieved.


Table 3: The parameters defined for the Memetic ACO algorithm and its procedures.

Parameter      Description
X              SUT input vector
n              Number of SUT input parameters
D              SUT input domain
s              Number of SUT branches
m              Number of ants
ant[k].x[i]    ith input parameter of ant[k]
LS_itr         Maximum iterations of the local search procedure
rmax           Maximum neighborhood radius
gen            Current generation
maxGen         Maximum generation
τ              Pheromone trail
τ[j]           Pheromone corresponding to branch j
α              Pheromone evaporation ratio
q0             Control parameter for applying the global transfer
TS             Current test suite
bestTS         Best output test suite

Algorithm 1 Proposed memetic ACO algorithm

 1: Input:
      (1) The corresponding CFG of SUT.
      (2) SUT input parameters (according to Table 3).
      (3) ACO input parameters (according to Table 3).
 2: Output: best test suite bestTS satisfying branch coverage.
 3: Stage 1: Initialization
 4: for j = 1 : s do
 5:     τ[j] ⇐ 1;
 6: end for
 7: for k = 1 : m do
 8:     for i = 1 : n do
 9:         initialize randomly ant[k].x[i] for ant k;
10:     end for
11:     calculate both fitness values of ant k;
12:     decode position ant[k].x[1 . . . n] to a test datum tdk and add it to bestTS;
13: end for
14: Stage 2: Iterative Search
15: while gen ≤ maxGen and full coverage has not been achieved do
16:     for k = 1 : m do
17:         LocalSearch(k);
18:     end for
19:     GlobalSearch();
20:     UpdatePheromone();
21:     for k = 1 : m do
22:         decode position ant[k].x[1 . . . n] to a test datum tdk and add it to TS;
23:     end for
24:     collect the coverage information by executing SUT p using TS;
25:     if (TS.coverage > bestTS.coverage) or (TS.coverage = bestTS.coverage and TS.fitness > bestTS.fitness) then
26:         bestTS ⇐ TS
27:     end if
28: end while
29: return bestTS;


Procedure 2 describes the local search process in pseudo code. Each ant generates a new position by employing (1+1)-ES (Line 4) for a prespecified number of iterations until it finds a neighbor solution whose f1 value is one (Lines 5-8) or whose f2 value is higher than that of its current position (Lines 9-12). If a neighbor solution is found that satisfies either of these two conditions, the ant accepts the solution by moving to its position; otherwise, it stays in its current position. It is important to note that if the accepted solution has an f1 value of one, i.e., it covers at least one branch that has not been covered yet, the best suite is updated with its corresponding test datum (Line 8).

Procedure 2 LocalSearch(k)

 1: Flag = false;
 2: Count = 0;
 3: while (Flag = false) and (Count < LS_itr) do
 4:     ant[k] generates a new position by employing (1+1)-ES algorithm;
 5:     if f1(ant[k].new_x[1 . . . n]) = 1 then
 6:         Flag = true;
 7:         ant[k] moves to the new position;
 8:         decode position ant[k].x[1 . . . n] to a test datum tdk and update bestTS;
 9:     else if f2(ant[k].new_x[1 . . . n]) > f2(ant[k].x[1 . . . n]) then
10:         Flag = true;
11:         ant[k] moves to the new position;
12:     end if
13:     Count++;
14: end while
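Procedure 2 can be rendered as a short Python function. This is a sketch under stated assumptions: `f1` and `f2` are passed in as callables on a candidate position, the return value is a (position, found_new_branch) pair so that the caller can update bestTS, and the Gaussian step uses variance rmax as in Equation 9. None of these interface choices come from the paper.

```python
import math
import random

def local_search(x, f1, f2, r_max, ls_itr, rng=random):
    """Try up to ls_itr Gaussian mutations of x and accept the first useful one.

    Returns (position, found_new_branch). The candidate is accepted if it covers
    a yet uncovered branch (f1 == 1) or if it improves f2; otherwise x is kept.
    """
    sigma = math.sqrt(r_max)  # variance r_max converted to a standard deviation
    for _ in range(ls_itr):
        candidate = [xi + rng.gauss(0.0, sigma) for xi in x]
        if f1(candidate) == 1:
            return candidate, True    # caller would store this test datum in bestTS
        if f2(candidate) > f2(x):
            return candidate, False
    return x, False
```

Returning early plays the role of the Flag variable in the pseudo code, and the loop bound plays the role of Count.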

Procedure 3 describes the global search process in pseudo code. A random global move is applied, with probability q0, to any ant whose corresponding f1 value is zero and whose f2 value is not greater than the colony average (Lines 3-5). Finally, both fitness values of ant k are recomputed according to its current position (Line 6).

Procedure 3 GlobalSearch()

 1: calculate the second fitness average value sec_f_avg of ant colony;
 2: for k = 1 : m do
 3:     if f1(ant[k].x[1 . . . n]) = 0 and f2(ant[k].x[1 . . . n]) ≤ sec_f_avg then
 4:         if U(0, 1) < q0 then
 5:             randomly select a position in D for ant k;
 6:             recalculate both fitness values of ant k according to its current position;
 7:         end if
 8:     end if
 9: end for

Procedure 4 describes the pheromone update process in pseudo code. First, all pheromones are evaporated by a constant factor (Lines 1-3). Next, in the reinforcement phase, the pheromones corresponding to those CFG edges covered by ant k in the current iteration are updated (Lines 4-8).

Procedure 4 UpdatePheromone()

 1: for j = 1 : s do
 2:     τ(j) = (1 − α)τ(j) //evaporation
 3: end for
 4: for k = 1 : m do
 5:     for each branch jk covered by ant[k] in the current iteration do
 6:         τ(jk) = τ(jk) + ∆∗ //reinforcement
 7:     end for
 8: end for

5. Experimental results and analysis

The proposed algorithm was coded in MATLAB 2013 and run on a 64-bit Intel(R) Core(TM) i5 2.30GHz CPU with 4GB RAM. To comprehensively assess the performance of our extended ACO relative to the basic ACO in [25], three implementations of the proposed approach are presented, based on the following three main extensions:


• Incorporating (1+1)-ES for local moves of ants rather than pursuing them randomly.

• Introducing a different definition of pheromone trails and extending the fitness function of [25] by including the pheromone information.

• Defining an additional Boolean fitness function for the evaluation of candidate solutions.

The first implementation, which incorporates (1+1)-ES for local moves of ants in the basic ACO (rather than pursuing them randomly), is denoted as ACO-ES. The second implementation includes the second and third extensions but carries out the local moves of ants randomly. This is denoted as extended ACO or E-ACO in the experiments. Lastly, the third implementation comprises all three extensions and is denoted as extended ACO-ES or E-ACO-ES. These three implementations were compared with three other algorithms, namely GA [20], PSO [24] and ACO [25]. It is important to mention that GA, PSO, and ACO have been reimplemented in accordance with the descriptions provided in the respective papers.

5.1. Benchmark programs

Ten benchmark programs were used to evaluate our proposed algorithm. The details of these programs are presented in Table 4. The first column gives the name of the program while the second to the fourth columns indicate the number of input arguments (i.e., #Arg), the number of branches (i.e., #Br) and the maximum nested level of the program structure (i.e., NLmax), respectively. The fifth column shows the lines of code (LOC) in the program source code. The sixth column describes each program and the last column lists a few references which used the given program in their experiments.

The first program, TriangleType, receives three integer numbers as inputs and decides what kind of triangle they represent: equilateral, isosceles, scalene or no valid triangle. Program Line takes eight inputs, four of which represent the coordinates of a rectangle and the other four represent the coordinates of a line. The program determines the position of the line with respect to the position of the rectangle and generates one out of four possible outputs:

1) The line is completely inside the rectangle.

2) The line is completely outside the rectangle.

3) The line is partially covered by the rectangle.

4) Error: The input values do not define a line and/or a rectangle.

The third program, CalDay, computes the day of the week according to a given date defined as three integer input arguments representing the month, day and year. Program Complex gets six input arguments and calculates complex arithmetic functions consisting of complex predicate conditions with relational operators combined with complex AND and OR conditions. The next program, Remainder, calculates the remainder of two integers. Program PrintCalendar is used to print the standard calendar of a month in some specific year. The seventh program, Schedule, schedules multiple jobs with different priorities on a single processor. Program Mcknap solves different variants of the knapsack problem. The ninth program, Number, calculates the number of days lying between two dates. Lastly, program Bessj computes the Bessel function of general integer order.

It is important to note that the rationale behind choosing the benchmarks for this study is that they have deeper nested levels and complex branch predicate types, which according to [25] are good indicators of the branch reachability. This makes them suitable for the assessment of algorithmic approaches that adopt branch coverage as the coverage criterion.

Table 4: The benchmark programs used for the experimental analysis.

Program #Arg #Br NLmax LOC Description Source

TriangleType 3 15 4 52 Type classification for a triangle [3, 5, 6, 25, 41]

Line 8 36 12 92 Check if a line overlaps a rectangle [3, 5]

CalDay 3 11 8 72 Calculate the day of the week [3, 5, 6, 25, 41]

Complex 6 24 4 74 Calculate complex arithmetic functions [5, 7]

Remainder 2 18 6 49 Calculate the remainder of an integer division [3, 5, 25]

PrintCalendar 2 28 2 187 Print calendar according to the input of year and month [3, 25]

Schedule 3 52 3 412 Schedule multiple jobs [42]

Mcknap 4 200 6 1620 Solve the knapsack problem [43]

Number 6 78 10 265 Calculate the number of days between two dates [44, 45]

Bessj 2 36 4 245 Bessel Jn function [3, 25]


5.2. Evaluation metrics

Our assessment process is based on two general criteria: the quality of the test suite and the data generation speed. The following four metrics are defined for comparing the performance of the six aforementioned algorithms (i.e., GA, PSO, ACO, ACO-ES, E-ACO and E-ACO-ES):

(1) Branch coverage (BC). This metric is our main evaluation criterion; it is an indicator of the test suite quality by showing the percentage of branches covered by a given test suite.

(2) Average convergence speed (ACS). This metric is an indicator of the data generation speed. It is used when the compared algorithms are able to reach full branch coverage within the given maximum iterations. ACS is assessed according to the following formula:

$$ACS = \left(1 - \frac{\sum_{k=1}^{\eta} itr_k}{\eta \times itr_{max}}\right) \times 100 \tag{14}$$

where itrk is the number of iterations in which the algorithm reaches full coverage in its kth run, itrmax is the maximum number of iterations per run and η is the number of program runs.

(3) Success rate (SR). This metric is an indicator of the test suite quality and is defined according to [3]. SR indicates the percentage of runs in which all branches are covered by the generated test suite. Assume bbck to be a Boolean flag representing whether or not the algorithm has achieved full coverage in its kth run. The value of one indicates full coverage and the value of zero represents partial coverage. SR is assessed according to Equation 15 as follows:

$$SR = \frac{\sum_{k=1}^{\eta} bbc_k}{\eta} \times 100 \tag{15}$$

where η is the number of program runs [3].

(4) Average termination time (ATT). This metric is an indicator of the data generation speed by measuring the average termination time of the compared algorithms for all η runs. Here, the termination criterion is met when either the algorithm has achieved full coverage before the specified maximum number of iterations or the total number of iterations has reached the maximum value. Given ttk as the termination time of the algorithm in its kth run, ATT is measured according to the following formula:

$$ATT = \frac{\sum_{k=1}^{\eta} tt_k}{\eta} \tag{16}$$
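Equations 14 to 16 are simple averages over the η runs and can be checked with a few lines of Python. The function names below are illustrative assumptions; the per-run values are supplied as plain lists.

```python
def acs(iterations_to_full_coverage, itr_max):
    """Average convergence speed in percent (Equation 14)."""
    eta = len(iterations_to_full_coverage)
    return (1.0 - sum(iterations_to_full_coverage) / (eta * itr_max)) * 100.0

def success_rate(full_coverage_flags):
    """Percentage of runs that reached full coverage (Equation 15)."""
    hits = sum(1 for flag in full_coverage_flags if flag)
    return hits / len(full_coverage_flags) * 100.0

def att(termination_times):
    """Average termination time over all runs (Equation 16)."""
    return sum(termination_times) / len(termination_times)
```

As a worked example, two runs that reach full coverage after 100 and 300 iterations with itrmax = 1000 give ACS = (1 − 400/2000) × 100 = 80.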

5.3. Experimental setup

Table 5 shows the values of the input test parameters required for our experiments. In order to analyze the sensitivity of the branch coverage rate to the population size of the compared algorithms, we have given the number of individuals in four ascending values, chosen differently for each benchmark. Here, an individual refers to a chromosome in GA, a particle in PSO and an ant in ACO. The values were identified after conducting some preliminary experiments and, as can be noticed from the table, they are good indicators of the number and reachability of branches in the given benchmarks. The maximum number of iterations was set to 1000 for six programs. For the three programs PrintCalendar, Schedule and Bessj, and for program Mcknap, the number was reduced to 100 and 50, respectively, due to their high execution times. The total number of runs per benchmark was set to 10 for each population size.

Table 5: The input test parameters defined for the experimental setup.

                 Number of Individuals (m)
Program          m1    m2    m3    m4    itrmax*   η**
TriangleType     4     6     8     10    1000      10
Line             10    20    30    40    1000      10
CalDay           5     10    15    20    1000      10
Complex          5     10    15    20    1000      10
Remainder        3     5     8     10    1000      10
PrintCalendar    12    14    16    18    100       10
Schedule         7     10    15    20    100       10
Mcknap           3     5     8     10    50        10
Number           10    20    30    40    1000      10
Bessj            2     4     6     8     100       10

* itrmax: maximum number of iterations
** η: number of runs


Table 6: The parameter settings of the compared algorithms.

Algorithm          Parameter                                         Value
GA                 Selection strategy                                Gambling roulette
                   Crossover probability pc                          0.90
                   Mutation probability pm                           0.05
PSO                Inertia weight w                                  Monotonically reduces from 1 to 0.2
                   Acceleration constants c1 and c2                  c1 = c2 = 2.05
                   The maximum velocity Vmax                         Assigned according to the ranges of program inputs
ACO, ACO-ES        Maximum radius rmax                               Monotonically reduces from 20 to 6
                   Pheromone decay parameter α                       0.3
                   Pre-set slope T                                   1.0
                   Adjustment coef. ϕ                                0.5
                   Threshold of global random search control q0      0.5
                   Threshold of neighborhood transfer control ρ0     0.3
                   Maximum local search iteration LS_itr             10
E-ACO, E-ACO-ES    Maximum radius rmax                               Monotonically reduces from 20 to 6
                   Pheromone decay parameter α                       0.3
                   Threshold of global random search control q0      0.5
                   Maximum local search iteration LS_itr             10

Table 6 presents the list of parameters defined for each compared algorithm. The values were taken from [25], in which the authors performed a parameter sensitivity analysis to find the most proper parameter settings for their compared algorithms including GA, PSO, and ACO. Accordingly, we applied the same parameter settings as the basic ACO to ACO-ES, E-ACO and E-ACO-ES. Our preliminary experiments also verified the appropriateness of the parameter values relative to each benchmark. We, nevertheless, conducted independent experiments to tune the value of LS_itr, the maximum number of local search iterations, for our compared memetic ACO-based algorithms (i.e., ACO, ACO-ES, E-ACO and E-ACO-ES) and the value was set to 10. We also treated the maximum neighborhood radius rmax as variable during the execution and monotonically decreased it from 20 to 6. The six compared algorithms were then executed in identical conditions subject to the given parameter values. It should also be noted that the three compared algorithms GA, PSO, and ACO use the fitness function defined in [25] for the evaluation of candidate solutions; see Equation 1.

5.4. Results and discussion

Figure 2 presents the average best BC (in percentage) for the six algorithms after 10 runs on each of the ten benchmarks. The results are relative to the number of individuals defined in Table 5. It can be generally noticed from the figure that, as the population size increases, the final test suite produced by the algorithms covers more branches. This is expected since, by employing a higher number of solutions, more test data are generated to cover the branches. Nevertheless, it is readily observed from the figure that our proposed extensions of ACO excluding ACO-ES (i.e., E-ACO and E-ACO-ES) are more successful in directing the search behavior of the population (here, ants) in terms of exploitation and exploration, as both are able to reach full coverage using fewer ants for most benchmarks. For example, for program Complex, both E-ACO and E-ACO-ES could obtain 100% coverage with only 10 ants whereas the remaining four (including ACO-ES) failed to establish full coverage with the same number of individuals; the highest rate recorded belonged to ACO-ES, which achieved 86% coverage.

The reason why the performance of ACO-ES is not comparable to that of E-ACO and E-ACO-ES is that the former employs exactly the global search of the basic ACO and, unlike the other two, does not benefit from the strategies proposed for augmenting search exploration. Recall from Section 3 that in the global search of the basic ACO, ants carry out the so-called biased exploration in which those with larger volumes of pheromone have higher probabilities to attract other ants to search within their neighborhoods. This type of search, however, reinforces search exploitation rather than exploration and thus was not employed in the global search phase of our proposed memetic algorithm. Effective exploitation requires effective exploration of the search space. Put differently, making strongly-oriented local moves, as done by (1+1)-ES, from non-promising regions would result in entrapment in local optima. On the other hand, the co-employment of our proposed exploitation and exploration strategies in E-ACO-ES significantly affected branch coverage. As can be seen in Figure 2, E-ACO-ES ranked first at establishing full coverage or maximizing BC for all benchmarks compared to the other algorithms.

It is important to consider the performance of E-ACO and E-ACO-ES relative to each other. As Figure 2 indicates, E-ACO is comparable with E-ACO-ES for most of the programs (even so, for some benchmarks like PrintCalendar, E-ACO-ES is significantly better than E-ACO). This implies that the proposed usage of the pheromone information together with the employed Boolean fitness function is more influential in increasing branch coverage than using (1+1)-ES to carry out local moves.


Figure 2: The average best BC for GA, PSO, ACO, ACO-ES, E-ACO-ES, and E-ACO for ten benchmarks relative to the population size. [Plots not reproduced. Panels: (a) TriangleType, (b) Line, (c) CalDay, (d) Complex, (e) Remainder, (f) PrintCalendar, (g) Schedule, (h) Mcknap, (i) Number, (j) Bessj. Horizontal axis: population size (m1, m2, m3, m4); vertical axis: branch coverage percent (0 to 100).]

Figures 3 and 4 depict the average convergence speed (ACS) and success rate (SR) of the six compared algorithms relative to each benchmark, respectively. Note that the algorithms were executed 10 times per benchmark for each number of individuals defined in Table 5 (i.e., in total, 40 runs per benchmark). The average results were recorded afterwards just to indicate whether each algorithm has been able to achieve full coverage irrespective of the number of individuals. According to both figures, E-ACO-ES is the most successful as it achieves 100% coverage at least once for eight programs, i.e., all except Schedule and Mcknap. It should be recalled that both ACS and SR are counted as zero if the given algorithm cannot reach full coverage on any occasion. Furthermore, E-ACO-ES exhibits a significantly higher convergence speed relative to ACO-ES and E-ACO for the programs where all three reach 100% coverage. This implies that (1+1)-ES could reduce the number of local moves per iteration of the algorithm if accompanied by an effective global search. (1+1)-ES establishes strong locality around the current position of each ant by applying a Gaussian mutation operator rather than generating random moves within a given neighborhood. The main reason why none of the algorithms achieved full coverage for programs Schedule and Mcknap is that, in these programs, some branches are executed only when an exception like a memory allocation or an I/O error occurs during the execution. The generated test suites, however, did not produce such errors and full coverage was not obtained accordingly.


Figure 3: Average Convergence Speed (ACS) of GA, PSO, ACO, ACO-ES, E-ACO-ES, and E-ACO for the ten benchmarks. [Plot not reproduced. Horizontal axis: benchmark; vertical axis: average convergence speed percent (0 to 100).]

Figure 4: Success rate (SR) of GA, PSO, ACO, ACO-ES, E-ACO-ES, and E-ACO for the ten benchmarks. [Plot not reproduced. Horizontal axis: benchmark; vertical axis: success rate percent (0 to 100).]

Finally, Table 7 presents the average termination time of the six algorithms for each benchmark. According to the table, GA and PSO were the fastest ones even though they failed to reach full coverage for most of the programs (GA totally failed and PSO could achieve full coverage only for TriangleType and Remainder). They converge considerably fast because, in contrast to the four memetic ACO-based algorithms, they do not apply any local search on each individual of their population per iteration. In other words, they are purely global search methods. Comparing E-ACO-ES with the other three ACO-based algorithms, we notice that the former has shorter or comparable termination times except for those programs for which it failed to establish 100% coverage, i.e., Schedule and Mcknap. This is mainly due to the fact that E-ACO-ES reaches full coverage in fairly earlier iterations than the other three and, as the termination criterion is met, terminates its execution earlier. It is worth noting that the longest termination times for the Schedule and Mcknap programs belong to E-ACO and E-ACO-ES. This is because neither algorithm was successful in obtaining 100% coverage and, similar to the other four algorithms, they ran until the maximum number of iterations was met. As evaluating the Boolean fitness function incurs computational overhead on the two algorithms, their total termination time turned out to be longer than that of ACO and ACO-ES. Lastly, by comparing E-ACO and E-ACO-ES with each other, we observe that the latter usually converges faster. The main reason lies in the positive impact of (1+1)-ES on reducing the number of local moves, which we already discussed in the preceding paragraphs. In other words, (1+1)-ES has been successful in generating offspring whose corresponding test input could cover at least one yet uncovered branch for most E-ACO-ES iterations.

Table 7: Average termination time (in seconds) of GA, PSO, ACO, ACO-ES, E-ACO-ES, and E-ACO.

Program          GA          PSO         ACO         ACO-ES      E-ACO-ES    E-ACO
TriangleType     2.933051    2.176954    18.23634    17.27594    12.43399    15.51939
Line             6.913065    7.532995    110.6681    135.1519    80.90775    74.84737
CalDay           3.081341    3.03521     23.53711    30.90698    11.94048    16.04251
Complex          3.30275     4.050081    44.87702    46.99228    11.67405    11.87476
Remainder        2.437161    1.763959    10.52909    9.146827    4.871038    8.090522
PrintCalendar    14.87442    15.23126    504.4945    274.6504    90.89653    334.2278
Schedule         7.69451     19.28058    332.4113    353.3098    584.2267    545.9263
Mcknap           116.0104    146.9123    573.518     535.1421    779.9375    809.3036
Number           16.75473    17.39319    239.5524    367.7383    284.2515    322.8475
Bessj            0.4425      0.337911    1.502076    1.658515    1.307652    1.673308

5.5. Statistical analysis

To be statistically confident in the conclusions derived from the empirical results, we conducted an ANOVA (ANalysis Of VAriance) test on the results of BC (as the major evaluation criterion) obtained from the six compared algorithms. Table 8 presents the ANOVA test for our best proposed algorithm (i.e., E-ACO-ES) relative to the three existing algorithms GA, PSO and ACO, respectively. We can readily observe from this table that E-ACO-ES performs significantly better for all programs. The p-values are all less than 0.05. To further scrutinize the superiority of E-ACO-ES relative to the ACO algorithm developed in [25], we carried out another ANOVA test in terms of the number of ants employed for test generation. This is shown in Table 9. It is notable that E-ACO-ES performs significantly better than ACO for all benchmarks excluding Bessj when using the least number of ants (even so, E-ACO-ES outperforms ACO for Bessj if the number of ants increases from two to only six). As stated in Section 5.4, E-ACO-ES makes considerably better use of ants than the basic ACO when generating test suites for acquiring maximum coverage. Finally, Table 10 presents our last ANOVA test to statistically analyze the performance of the ACO-based algorithms with one another. According to the table, we can draw the following conclusions:

• Extending the basic ACO to ACO-ES and to E-ACO significantly improves its results for the majority of the programs.

• Incorporating the above two extensions simultaneously (i.e., ACO-ES and E-ACO) to form E-ACO-ES dramatically improves the results of the basic ACO for all benchmarks. This implies that our proposed hybrid exploitation and exploration strategies in the resulting memetic algorithm have been highly effective.

• Evolution strategies, being a vigorous local search operator, effectively orient the local movements of ants around their current position. Nevertheless, as discussed in Section 5.4, the lack of an effective exploration mechanism undermines a powerful exploitation. This is apparent when we compare ACO-ES with E-ACO-ES in Table 10. Owing to its better exploration strategy, E-ACO-ES outperforms ACO-ES for more than half of the test programs.
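The pairwise comparisons behind these conclusions can be illustrated with a small one-way ANOVA computation. The Python sketch below derives the mean difference x-y and the F statistic for two samples of branch coverage. The coverage values are hypothetical placeholders, not results from this study, and the helper name `one_way_anova_f` is an assumption. The p-value is not computed here; it follows from the F distribution with the returned degrees of freedom, and a library routine such as `scipy.stats.f_oneway` reports it directly.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical BC percentages over five runs (illustrative only).
bc_x = [100.0, 95.0, 100.0, 90.0, 100.0]  # stands in for algorithm x
bc_y = [80.0, 85.0, 75.0, 90.0, 80.0]     # stands in for algorithm y

mean_diff = mean(bc_x) - mean(bc_y)
f_stat, df_b, df_w = one_way_anova_f(bc_x, bc_y)
print(f"x-y = {mean_diff:.2f}%, F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F value relative to the F distribution with (df_between, df_within) degrees of freedom corresponds to a small p-value, which is how the entries below 0.05 in the tables should be read.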

The stability analysis of the compared algorithms in achieving full coverage over 40 executions is shown in Figure 5 (the algorithms were executed 10 times per benchmark for each number of individuals defined in Table 5, i.e., 40 runs per benchmark in total). According to the figure, E-ACO-ES is more robust in establishing full coverage; see the position of the median (i.e., the red line) in each box plot. We can also observe that the worst values of BC for E-ACO-ES are always better than those for GA, PSO and ACO for all benchmarks. The worst results of E-ACO-ES are even better than the median of GA, PSO and ACO for the eight programs other than Remainder and Bessj. Also, by comparing E-ACO-ES with ACO-ES and E-ACO, we notice that combining our proposed exploitation and exploration strategies has contributed to the robustness of E-ACO-ES.
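The quantities read off a box plot in this kind of stability analysis can be computed directly from the per-run coverage values. The following Python sketch summarizes a set of runs and checks whether the worst run of one algorithm still beats the median of another. The sample values and the helper names `box_summary` and `worst_beats_median` are hypothetical and serve only as an illustration.

```python
from statistics import median, quantiles


def box_summary(coverages):
    """Return the five values a box plot displays for a list of BC results."""
    q1, q2, q3 = quantiles(coverages, n=4, method="inclusive")
    return {
        "worst": min(coverages),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "best": max(coverages),
    }


def worst_beats_median(candidate, baseline):
    """True if the worst candidate run exceeds the baseline's median run."""
    return min(candidate) > median(baseline)


# Hypothetical BC percentages (illustrative only, not the paper's data).
runs_candidate = [100, 100, 95, 100, 90, 100, 100, 95]
runs_baseline = [80, 85, 90, 75, 85, 80, 95, 70]

summary = box_summary(runs_candidate)
print(summary)
print(worst_beats_median(runs_candidate, runs_baseline))
```

A narrow spread between `worst` and `best`, together with a median at or near 100, is what the text above refers to as robustness.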


Table 8: The ANOVA test on average branch coverage (BC) for E-ACO-ES vs. GA, PSO and ACO at the 0.05 significance level. In each comparison, x is E-ACO-ES and y is the compared algorithm.

| Program | vs. GA: x-y (%) | vs. GA: p-value | vs. PSO: x-y (%) | vs. PSO: p-value | vs. ACO: x-y (%) | vs. ACO: p-value |
|---|---|---|---|---|---|---|
| TriangleType | 38.999 | 3.849E-25 | 11.4985 | 0.001 | 10.16675 | 0.003 |
| Line | 40.90175 | 1.1421E-50 | 20.41575 | 6.9016E-19 | 24.02625 | 3.005E-24 |
| CalDay | 19.37475 | 2.2507E-15 | 16.04150 | 2.1481E-11 | 11.6650 | 6.4519E-7 |
| Complex | 35.5670 | 1.1992E-62 | 26.02175 | 1.0659E-42 | 20.4545 | 1.3209E-30 |
| Remainder | 20.83150 | 8.5779E-10 | 11.80425 | 0.000355 | 11.38750 | 0.001 |
| PrintCalendar | 13.0360 | 1.8032E-41 | 12.7680 | 2.4649E-40 | 8.39075 | 5.4311E-22 |
| Schedule | 17.1150 | 9.0764E-31 | 18.220 | 1.2145E-33 | 14.4205 | 6.9586E-24 |
| Mcknap | 3.2875 | 7.7703E-25 | 3.3125 | 1.4262E-25 | 1.6375 | 1.728E-8 |
| Number | 32.310 | 1.474E-31 | 27.1485 | 1.9973E-24 | 11.9240 | 9.6174E-7 |
| Bessj | 25.2080 | 6.8353E-17 | 8.40375 | 0.003 | 5.557 | 0.048 |

Table 9: The ANOVA test on average branch coverage (BC) for E-ACO-ES (x) vs. ACO (y) relative to the number of ants (m1 to m4) at the 0.05 significance level.

| Program | m1: x-y (%) | m1: p-value | m2: x-y (%) | m2: p-value | m3: x-y (%) | m3: p-value | m4: x-y (%) | m4: p-value |
|---|---|---|---|---|---|---|---|---|
| TriangleType | 10.6675 | 0.003 | 8.0010 | 0.104 | 10.666 | 0.048 | 8.00100 | 0.000358 |
| Line | 18.054 | 1.2914E-9 | 30.831 | 1.6733E-26 | 24.4440 | 6.7553E-14 | 22.7760 | 1.1643E-19 |
| CalDay | 14.9950 | 3.8438E-10 | 11.66900 | 3.456E-7 | 11.6660 | 5.9508E-9 | 8.3300 | 0.000001 |
| Complex | 20.4545 | 1.3209E-30 | 23.6370 | 1.024E-21 | 21.3650 | 1.5928E-18 | 20.0010 | 2.985E-16 |
| Remainder | 11.38750 | 0.001 | 18.8850 | 0.000003 | 15.001 | 0.000008 | 2.22400 | 0.280 |
| PrintCalendar | 8.39075 | 5.4311E-22 | 10.35400 | 2.4992E-12 | 7.8540 | 2.4505E-9 | 4.6410 | 0.000105 |
| Schedule | 14.4205 | 6.9586E-24 | 18.8430 | 1.0061E-12 | 11.3430 | 7.7615E-11 | 8.6510 | 4.5367E-17 |
| Mcknap | 1.6500 | 0.000162 | 2.0500 | 9.8291E-11 | 1.600 | 0.000004 | 1.250 | 0.003 |
| Number | 11.9240 | 9.6174E-7 | 9.745 | 0.026 | 11.280 | 0.000082 | 18.0780 | 1.0518E-15 |
| Bessj | 6.1140 | 0.155 | 4.7250 | 0.189 | 6.3870 | 0.000427 | 28.6120 | 9.6694E-9 |

Table 10: The ANOVA test on average branch coverage (BC) for ACO-based algorithms at the 0.05 significance level. In each comparison, x is the first algorithm named and y is the second.

| Program | ACO-ES vs. ACO: x-y (%) | ACO-ES vs. ACO: p-value | E-ACO vs. ACO: x-y (%) | E-ACO vs. ACO: p-value | E-ACO-ES vs. ACO-ES: x-y (%) | E-ACO-ES vs. ACO-ES: p-value | E-ACO-ES vs. E-ACO: x-y (%) | E-ACO-ES vs. E-ACO: p-value |
|---|---|---|---|---|---|---|---|---|
| TriangleType | 8.000 | 0.017 | 8.50025 | 0.012 | 2.16675 | 0.517 | 1.6665 | 0.618 |
| Line | 8.54300 | 0.000068 | 15.000 | 1.3163E-11 | 15.48325 | 3.3084E-12 | 9.02625 | 0.000027 |
| CalDay | 4.37625 | 0.056 | 9.79075 | 0.000026 | 7.28875 | 0.002 | 1.87425 | 0.412 |
| Complex | 5.9090 | 0.000148 | 20.22725 | 4.087E-30 | 14.5455 | 2.7093E-18 | 0.22725 | 0.882 |
| Remainder | 7.08275 | 0.031 | 7.63800 | 0.020 | 4.30475 | 0.188 | 3.74950 | 0.251 |
| PrintCalendar | 2.6785 | 0.001 | 1.96475 | 0.013 | 5.71225 | 5.0244E-12 | 6.4260 | 1.6961E-14 |
| Schedule | 2.2105 | 0.085 | 13.84425 | 1.893E-22 | 12.210 | 1.7168E-18 | 0.57625 | 0.652 |
| Mcknap | 0.6750 | 0.017 | 1.4625 | 4.012E-7 | 0.9625 | 0.001 | 0.1750 | 0.533 |
| Number | 8.9755 | 0.000193 | 10.03275 | 0.000033 | 2.9485 | 0.215 | 1.89125 | 0.425 |
| Bessj | 2.77725 | 0.321 | 4.02875 | 0.151 | 2.77975 | 0.321 | 1.52825 | 0.585 |
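As a quick cross-check of how Table 10 is read, the Python sketch below counts the programs for which the E-ACO-ES vs. ACO-ES comparison is significant at the 0.05 level. The p-values are copied from the corresponding column of Table 10; the variable and function names are illustrative assumptions.

```python
ALPHA = 0.05  # significance level used in Tables 8 to 10

# p-values of the "E-ACO-ES vs. ACO-ES" comparison, copied from Table 10.
p_values = {
    "TriangleType": 0.517,
    "Line": 3.3084e-12,
    "CalDay": 0.002,
    "Complex": 2.7093e-18,
    "Remainder": 0.188,
    "PrintCalendar": 5.0244e-12,
    "Schedule": 1.7168e-18,
    "Mcknap": 0.001,
    "Number": 0.215,
    "Bessj": 0.321,
}


def significant_programs(p_by_program, alpha=ALPHA):
    """Return the programs whose p-value falls below the significance level."""
    return [name for name, p in p_by_program.items() if p < alpha]


significant = significant_programs(p_values)
print(f"{len(significant)} of {len(p_values)} programs: {significant}")
```

Six of the ten programs fall below the threshold, which matches the statement that E-ACO-ES outperforms ACO-ES for more than half of the test programs.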


[Box plots, panels (a) to (j): branch coverage percent (y-axis, 0 to 100) per algorithm (x-axis: E-ACO, E-ACO-ES, ACO-ES, ACO, PSO, GA) for (a) TriangleType, (b) Line, (c) CalDay, (d) Complex, (e) Remainder, (f) PrintCalendar, (g) Schedule, (h) Mcknap, (i) Number, (j) Bessj.]

Figure 5: Stability analysis on the branch coverage.

6. Conclusions and future research directions

In this paper, we presented an efficient memetic ant colony optimization algorithm to generate effective test data. Three main innovations were proposed and developed in the paper. First, we employed (1+1)-ES to carry out the local moves of ants rather than performing them randomly. The experimental results demonstrated that this idea significantly expedited the convergence to full coverage. Second, we defined a different usage of the pheromone information and applied it in the calculation of the fitness of candidate solutions. Third, we introduced an additional Boolean fitness function mainly to maximize the branch coverage. The experimental results indicated that the last two ideas considerably improved the branch coverage and, for eight (out of 10) benchmarks, the memetic ACO could achieve full coverage. In order to carry out a careful experimental analysis, we introduced three implementations of our proposed approach, denoted as ACO-ES, E-ACO and E-ACO-ES, and compared them with three existing algorithms based on GA, PSO and ACO from the related work. We observed that E-ACO-ES exhibited the best performance and was more successful in directing the search behavior of the ant population in terms of exploitation and exploration, as it was capable of reaching full coverage using fewer ants for most benchmarks. We also conducted further statistical analysis to demonstrate 1) the superiority of our proposed memetic ACO relative to the compared algorithms in terms of branch coverage and 2) its robustness in acquiring high coverage rates.
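The Boolean fitness function summarized here returns one when a test input traverses at least one yet uncovered branch and zero otherwise, and the second fitness value is ignored for solutions whose Boolean value is one. A minimal Python sketch of that decision rule follows. The branch identifiers, the function names, and the assumption that a larger second fitness value is better are illustrative and not taken from the original implementation.

```python
def boolean_fitness(traversed_branches, covered_branches):
    """Return 1 if the input traverses at least one uncovered branch, else 0."""
    return 1 if set(traversed_branches) - set(covered_branches) else 0


def selection_key(traversed_branches, covered_branches, second_fitness_value):
    """Sort key for candidates: the Boolean fitness dominates.

    The second fitness value is only consulted when the Boolean value is 0
    (assumption for this sketch: larger second fitness values are better).
    """
    first = boolean_fitness(traversed_branches, covered_branches)
    return (first, 0.0 if first == 1 else second_fitness_value)


# Hypothetical branch identifiers covered so far by the test suite.
covered = {"b1", "b2"}

# Each candidate: (branches it traverses, hypothetical second fitness value).
candidates = [
    (["b1"], 0.9),        # no new branch, high second fitness
    (["b1", "b3"], 0.2),  # reaches the uncovered branch b3
    (["b2"], 0.4),        # no new branch, lower second fitness
]

ranked = sorted(
    candidates,
    key=lambda c: selection_key(c[0], covered, c[1]),
    reverse=True,
)
print(ranked[0])
```

In this toy ranking the candidate that reaches a new branch is preferred even though its second fitness value is the lowest, which mirrors the priority given to coverage gains.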

As future work, the effectiveness of the proposed approach in establishing full branch coverage should be evaluated for real-world programs. Furthermore, other criteria that better represent the dynamics of the SUT should be adopted, and efficient fitness functions should be developed accordingly. Lastly, and most importantly, our approach should be extended to support mixed data types, including String or Object.

7. References

[1] C. C. Michael, G. McGraw, M. A. Schatz, Generating software test data by evolution, IEEE Transactions on Software Engineering 27 (12) (2001) 1085–1110.


[2] P. R. Srivatsava, B. Mallikarjun, X.-S. Yang, Optimal test sequence generation using firefly algorithm, Swarm and Evolutionary Computation 8 (2013) 44–53.

[3] C. Mao, Harmony search-based test data generation for branch coverage in software structural testing, Neural Computing and Applications 25 (1) (2014) 199–216.

[4] G. Gay, M. Staats, M. Whalen, M. P. Heimdahl, The risks of coverage-directed test case generation, IEEE Transactions on Software Engineering 41 (8) (2015) 803–819.

[5] G. Fraser, A. Arcuri, P. McMinn, A memetic algorithm for whole test suite generation, Journal of Systems and Software 103 (2015) 311–327.

[6] D. Xu, W. Xu, M. Kent, L. Thomas, L. Wang, An automated test generation technique for software quality assurance, IEEE Transactions on Reliability 64 (1) (2015) 247–268.

[7] J. Wegener, A. Baresel, H. Sthamer, Evolutionary test environment for automatic structural testing, Information and Software Technology 43 (14) (2001) 841–854.

[8] N. Tracey, J. Clark, K. Mander, J. McDermid, An automated framework for structural test-data generation, in: Automated Software Engineering, 1998. Proceedings. 13th IEEE International Conference on, IEEE, 1998, pp. 285–288.

[9] J. J. Chilenski, S. P. Miller, Applicability of modified condition/decision coverage to software testing, Software Engineering Journal 9 (5) (1994) 193–200.

[10] P. Thévenod-Fosse, H. Waeselynck, STATEMATE applied to statistical software testing, in: ACM SIGSOFT Software Engineering Notes, Vol. 18, ACM, 1993, pp. 99–109.

[11] P. Ammann, J. Offutt, Introduction to software testing, Cambridge University Press, 2008.

[12] P. R. Srivastava, K. Baby, Automated software testing using metaheuristic technique based on an ant colony optimization, in: Electronic System Design (ISED), 2010 International Symposium on, IEEE, 2010, pp. 235–240.

[13] K. Ayari, S. Bouktif, G. Antoniol, Automatic mutation test input data generation via ant colony, in: Proceedings of the 9th annual conference on Genetic and evolutionary computation, ACM, 2007, pp. 1074–1081.

[14] N. Mansour, M. Salame, Data generation for path testing, Software Quality Journal 12 (2) (2004) 121–136.

[15] L. A. Clarke, A system to generate test data and symbolically execute programs, IEEE Transactions on Software Engineering (3) (1976) 215–222.

[16] S. Ali, L. C. Briand, H. Hemmati, R. K. Panesar-Walawege, A systematic review of the application and empirical investigation of search-based test case generation, IEEE Transactions on Software Engineering 36 (6) (2010) 742–762.

[17] E. Díaz, J. Tuya, R. Blanco, J. J. Dolado, A tabu search algorithm for structural software testing, Computers & Operations Research 35 (10) (2008) 3052–3072.

[18] M. Harman, P. McMinn, J. T. De Souza, S. Yoo, Search based software engineering: Techniques, taxonomy, tutorial, in: Empirical software engineering and verification, Springer, 2012, pp. 1–59.

[19] A. J. Simons, Jwalk: a tool for lazy, systematic testing of java classes by design introspection and user interaction, Automated Software Engineering 14 (4) (2007) 369–418.

[20] A. Gursaran, Program test data generation branch coverage with genetic algorithm: Comparative evaluation of a maximization and minimization approach, International Journal of Software Engineering and Applications 3 (1) (2012) 207–218.

[21] G. Fraser, A. Arcuri, Evosuite: automatic test suite generation for object-oriented software, in: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, ACM, 2011, pp. 416–419.

[22] R. P. Pargas, M. J. Harrold, R. R. Peck, Test-data generation using genetic algorithms, Software Testing Verification and Reliability 9 (4) (1999) 263–282.

[23] E. Díaz, J. Tuya, R. Blanco, Automated software testing using a metaheuristic technique based on tabu search, in: Automated Software Engineering, 2003. Proceedings. 18th IEEE International Conference on, IEEE, 2003, pp. 310–313.


[24] A. Windisch, S. Wappler, J. Wegener, Applying particle swarm optimization to software testing, in: Proceedings of the 9th annual conference on Genetic and evolutionary computation, ACM, 2007, pp. 1121–1128.

[25] C. Mao, L. Xiao, X. Yu, J. Chen, Adapting ant colony optimization to generate test data for software structural testing, Swarm and Evolutionary Computation 20 (2015) 23–36.

[26] H.-C. Wang, B. Jeng, C.-M. Chen, Structural testing using memetic algorithm, in: Proceedings of the Second Taiwan Conference on Software Engineering, 2006.

[27] A. Arcuri, X. Yao, A memetic algorithm for test data generation of object-oriented software, in: Evolutionary Computation, 2007. CEC 2007. IEEE Congress on, IEEE, 2007, pp. 2048–2055.

[28] K. Liaskos, M. Roper, Hybridizing evolutionary testing with artificial immune systems and local search, in: Software Testing Verification and Validation Workshop, 2008. ICSTW’08. IEEE International Conference on, IEEE, 2008, pp. 211–220.

[29] E. Rodriguez-Tello, J. Torres-Jimenez, Memetic algorithms for constructing binary covering arrays of strength three, in: Artificial Evolution, Springer, 2009, pp. 86–97.

[30] M. Harman, P. McMinn, A theoretical & empirical analysis of evolutionary testing and hill climbing for structural test data generation, in: Proceedings of the 2007 international symposium on Software testing and analysis, ACM, 2007, pp. 73–83.

[31] S. Singla, D. Kumar, H. Rai, P. Singla, A hybrid PSO approach to automate test data generation for data flow coverage with dominance concepts, International Journal of Advanced Science and Technology 37 (2011) 15–26.

[32] B. Korel, Automated test data generation for programs with procedures, in: ACM SIGSOFT Software Engineering Notes, Vol. 21, ACM, 1996, pp. 209–215.

[33] M. Harman, P. McMinn, A theoretical and empirical study of search-based testing: Local, global, and hybrid search, IEEE Transactions on Software Engineering 36 (2) (2010) 226–247.

[34] C. R. Reeves, J. E. Rowe, Genetic algorithms principles and presentation, a guide to GA theory (2002).

[35] C. Qian, Y. Yu, Z.-H. Zhou, Pareto ensemble pruning, in: AAAI, 2015, pp. 2935–2941.

[36] C. Qian, J.-C. Shi, K. Tang, Z.-H. Zhou, Constrained monotone k-submodular function maximization using multi-objective evolutionary algorithms with theoretical guarantee, IEEE Transactions on Evolutionary Computation.

[37] C. Qian, K. Tang, Z.-H. Zhou, Selection hyper-heuristics can provably be helpful in evolutionary multi-objective optimization, in: International Conference on Parallel Problem Solving from Nature, Springer, 2016, pp. 835–846.

[38] I. Rechenberg, Optimierung technischer Systeme nach Prinzipien der biologischen Evolution (The English translation of the title is Evolution strategy - Optimization of Technical Systems according to Principles of Biological Evolution), JSTOR, 1970.

[39] I. Rechenberg, Cybernetic solution path of an experimental problem, Evolutionary Computation: The Fossil Record (1998 (First published in 1964)) 301–310.

[40] E.-G. Talbi, Metaheuristics: from design to implementation, Vol. 74, John Wiley & Sons, 2009.

[41] E. Alba, J. F. Chicano, Software testing with evolutionary strategies, in: International Workshop on Rapid Integration of Software Engineering Techniques, Springer, 2005, pp. 50–65.

[42] T. Ostrand, Software-artifact infrastructure repository: Object downloads. URL http://sir.unl.edu/php/showfiles.php

[43] H. Liu, F.-C. Kuo, D. Towey, T. Y. Chen, How effectively does metamorphic testing alleviate the oracle problem?, IEEE Transactions on Software Engineering 40 (1) (2014) 4–22.

[44] J. Miller, M. Reformat, H. Zhang, Automatic test data generation using genetic algorithm and program dependence graphs, Information and Software Technology 48 (7) (2006) 586–605.

[45] R. J. Abbott, Program design by informal English descriptions, Communications of the ACM 26 (11) (1983) 882–894.
