Page 1: The Multi-Objective Next Release Problemcrest.cs.ucl.ac.uk/fileadmin/crest/sebasepaper/ZhangHM07.pdf · a weighted multi-objective search, in which several metrics that assess the

The Multi-Objective Next Release Problem

Yuanyuan Zhang
King’s College London
Strand, London WC2R 2LS, UK
[email protected]

Mark Harman
King’s College London
Strand, London WC2R 2LS, UK
[email protected]

S. Afshin Mansouri
King’s College London
Strand, London WC2R 2LS, UK
[email protected]

ABSTRACT

This paper is concerned with the Multi-Objective Next Release Problem (MONRP), a problem in search-based requirements engineering. Previous work has considered only single objective formulations. In the multi-objective formulation, there are at least two (possibly conflicting) objectives that the software engineer wishes to optimize. It is argued that the multi-objective formulation is more realistic, since requirements engineering is characterised by the presence of many complex and conflicting demands, for which the software engineer must find a suitable balance. The paper presents the results of an empirical study into the suitability of weighted and Pareto optimal genetic algorithms, together with the NSGA-II algorithm, presenting evidence to support the claim that NSGA-II is well suited to the MONRP. The paper also provides benchmark data to indicate the size above which the MONRP becomes non-trivial.

Categories and Subject Descriptors
D.2.1 [SOFTWARE ENGINEERING]: Requirements/Specifications—Methodologies

General Terms
Algorithms, Measurement, Performance, Experimentation.

Keywords
Pareto optimality, next release problem, multi-objective genetic algorithms

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GECCO’07, July 7–11, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-697-4/07/0007 ...$5.00.

1. INTRODUCTION

In this paper we address the Multi-Objective Next Release Problem (MONRP), in which a set of customers with varying requirements are targeted for the next release of an existing software system. Satisfying each requirement entails spending a certain amount of resources which can be translated into cost terms. In addition, satisfying each requirement provides some value to the software development company. The problem is to select the set of requirements that maximizes total value and minimizes required cost. These objectives are conflicting, so it would be difficult to assign weights to each. This makes it inappropriate to combine the two objectives into a single fitness function. Rather, a multi-objective formulation is better suited to this problem.

The single objective problem of maximizing the value, subject to a limited amount of resources (a fixed budget constraint), is an instance of the Knapsack Problem, which is NP-hard [22]. As a result, the multi-objective problem stated above is also NP-hard, and therefore cannot practically be solved using exact optimization techniques for large scale problem instances.

As explained in Section 2, single objective formulations of the Next Release Problem (NRP) have been considered in the literature. Approaches have also been considered in which multiple objectives are combined into a single objective using weights. However, the multi-objective version of the problem, in which there are two or more competing objectives (which may conflict), has not been considered. The multi-objective formulation is important because, in practice, a software engineer is more likely to have many conflicting objectives to address when determining the set of requirements to include in the next release of the software. As such, the MONRP is more likely to be appropriate than the single objective NRP.

In this paper we apply metaheuristic search techniques to find approximations of the Pareto-optimal set (or front) for the MONRP. This allows the decision maker to select the preferred solution from the Pareto-optimal set, according to their priorities. The Pareto front can also provide valuable insights into the outcome of the selected set of requirements to the software development company, because it captures the trade-offs between the two competing objectives. The results can also be used in ‘what if?’ analyses, for instance:

“what would we gain if we could securely assign 10% more resources to the project?”

Or

“what would we lose by reducing the project’s budget by 20%?”

In such situations, a Pareto front is more useful in exploring the outcome of these changes to the scenario, because it captures the entire range of trade-off decisions, rather than fixing on a single point. The single objective formulation of the problem lacks the ability to provide such decision aids.


The paper introduces a formal definition of the MONRP and develops four different multi-objective search-based solutions: Random Search, a Single-Objective (Weighted) Genetic Algorithm, a Pareto GA and the Non-dominated Sorting Genetic Algorithm II (NSGA-II) approach proposed by Deb et al. [8]. Random Search is included partly to provide a ‘sanity check’ that metaheuristic search techniques can outperform it (as would be expected) and partly to allow us to empirically identify the point at which the number of customers and requirements becomes sufficiently large to ensure that the problem is non-trivial. Results from applying the four algorithms are presented that compare their performance at finding approximations of the Pareto-optimal front.

The primary contributions of the paper are as follows:

1. The paper formulates the Next Release Problem as a Multi-Objective Optimization Problem, treating the ‘cost’ constraint as an objective and combining it with the ‘value’ objective. This is the first paper to generalise the NRP to the Multi-Objective NRP (MONRP).

2. The paper presents the results of an empirical study into the relative performance of four different multi-objective search techniques for solving the problem. The results show:

(a) That the weighted Single-Objective GA, Pareto GA and NSGA-II can all outperform Random Search (passing the ‘sanity check’);

(b) That NSGA-II outperforms the Pareto GA, both in terms of diversity of results and in terms of quality of the results (which dominate all results from the Pareto GA);

(c) That the weighted Single-Objective GA can be helpful in finding extreme points on the Pareto front, by a suitable choice of weights.

3. The paper presents the results of a second empirical study, aimed at determining the size of problem for which the MONRP becomes non-trivial. This reveals that the number of requirements must be larger than approximately 20, while the number of customers need only be larger than 2 in order to produce a non-trivial instance of the MONRP.

The rest of the paper is organised as follows: Section 2 describes the context of related work in which the current paper is located. Section 3 gives the definitions of the Multi-Objective Optimization Problem (MOOP) and the Pareto-optimal front. In Section 4 the research problem is defined formally, while Section 5 introduces the search algorithms studied and how they are tailored to the MONRP. Sections 6 and 7 present the results of the experiments and discuss the findings. Section 8 concludes.

2. RELATED WORK

The most closely related previous work to the content of the present paper is the work on the Next Release Problem (NRP) studied by several authors [2, 10, 14]. The term ‘Next Release Problem’ was coined by Bagnall et al. [2]. In the NRP, as formulated by Bagnall et al., the goal is to find the ideal set of requirements that balance customer requests within resource constraints, so the problem is a constrained single objective optimization problem. They applied a variety of techniques (including search based and non search based algorithms) to a set of synthetic data. Greer and Ruhe also studied the NRP, proposing a Genetic Algorithm-based approach for planning software releases [10].

The NRP is an example of a Feature Subset Selection (FSS) search problem. Other FSS problems in previous work on SBSE include the problem of determining good quality predictors in software project cost estimation, studied by Kirsopp et al. [15], and choosing components to include in different releases of a system, studied by Harman et al. [11].

Previous work, both on the NRP and other FSS problems in SBSE, has been solely concerned with single objective formulations of the problems concerned. The present paper is the first paper to generalise the NRP to the Multi-Objective NRP (MONRP). Indeed, much of the other existing work on SBSE has also tended to consider software engineering problems as single objective optimization problems. However, a recent trend appears to be developing, in which multiple objectives are considered. This would seem to be a natural and realistic extension of the initial work on SBSE, since so many software engineering problems are inherently multi-objective.

Other existing SBSE work that does consider multi-objective formulations of software engineering problems tends to use the weighted approach to combine fitness functions for each objective into a single objective function, using weighting coefficients to denote the relative importance of each individual fitness function. For example, in the seminal work of the Drexel group [9, 17, 18, 19] on search based clustering, the two objectives of cohesion and coupling are combined into a single objective by the Module Quality Metric (MQ), which has been widely studied, both by the Drexel group and also by other authors [5, 12, 16]. For search based refactoring, both Seng et al. [23] and O’Keeffe and O’Cinneide [20] use a weighted multi-objective search, in which several metrics that assess the quality of refactorings are combined into a single objective function. In the domain of search based project planning [1], Chicano and Alba [3] combined several project management metrics into a single objective function, using weighting to guide resource allocation.

The present paper is one of the first papers on SBSE to consider the set of objectives independently in order to explore the search space towards the Pareto-optimal front (rather than simply weighting each objective to form a unified objective function which, at best, is capable of directing the search to a single Pareto-optimal solution).

The Pareto approach is more suitable when it is difficult to combine fitness functions into a single overall objective function. Such a combination is difficult in many situations, for example, because the fitness functions measure fundamentally different properties (so combining them would be to seek to ‘compare apples and oranges’), because weights cannot be adequately determined or because the use of weights biases the search to a certain part of the solution space. As the results presented in the present paper indicate, for the MONRP, the weighting approach suffers from this problem; the weights force the search towards certain small areas of the Pareto front, so several runs with different weights are required to find diverse approximations of Pareto fronts.


3. PARETO-OPTIMAL FRONT

The Multi-Objective Optimization Problem (MOOP) can be defined as the problem of finding a vector of decision variables $\vec{x}$ which optimizes a vector of $M$ objective functions $f_i(\vec{x})$, where $i = 1, 2, \ldots, M$, subject to inequality constraints $g_j(\vec{x}) \geq 0$ and equality constraints $h_k(\vec{x}) = 0$, where $j = 1, 2, \ldots, J$ and $k = 1, 2, \ldots, K$. The objective functions form a mathematical description of performance criteria that are usually in conflict with each other [21].

Without loss of generality, a MOOP can be defined as follows:

\[ \text{Maximize } \{ f_1(\vec{x}), f_2(\vec{x}), \ldots, f_M(\vec{x}) \} \]

subject to:

\[ g_j(\vec{x}) \geq 0; \quad j = 1, 2, \ldots, J \]

and

\[ h_k(\vec{x}) = 0; \quad k = 1, 2, \ldots, K \]

where $\vec{x}$ is the vector of decision variables; $f_i(\vec{x})$ is the $i$-th objective function; and $g(\vec{x})$ and $h(\vec{x})$ are constraint vectors.

These objective functions constitute a multi-dimensional space in addition to the usual decision space. This additional space is called the objective space, $Z$. For each solution $\vec{x}$ in the decision variable space, there exists a point in the objective space:

\[ \vec{f}(\vec{x}) = Z = (z_1, z_2, \ldots, z_M)^T \]

In a Multi-Objective Optimization Problem, we wish to find a set of values for the decision variables that optimizes a set of objective functions. A decision vector $\vec{x}$ is said to dominate a decision vector $\vec{y}$ (also written as $\vec{x} \succ \vec{y}$) iff:

\[ f_i(\vec{x}) \geq f_i(\vec{y}) \quad \forall\, i \in \{1, 2, \ldots, M\} \]

and

\[ \exists\, i \in \{1, 2, \ldots, M\} \mid f_i(\vec{x}) > f_i(\vec{y}). \]

All decision vectors that are not dominated by any other decision vector are called non-dominated or Pareto-optimal and constitute the Pareto-optimal Front. These are solutions for which no objective can be improved without detracting from at least one other objective.
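The dominance relation and the resulting non-dominated set can be illustrated with a minimal Python sketch. It assumes all objectives are maximized, as in the definition above; the function names `dominates` and `non_dominated` and the sample objective vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """Return True if objective vector fx dominates fy (all objectives maximized)."""
    no_worse = all(a >= b for a, b in zip(fx, fy))
    strictly_better = any(a > b for a, b in zip(fx, fy))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (simple pairwise filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors of the form (value, -cost).
points = [(10, -5), (8, -3), (7, -4), (10, -6)]
front = non_dominated(points)
```

Here `(7, -4)` is dominated by `(8, -3)` and `(10, -6)` by `(10, -5)`, so only the first two points remain in `front`. The pairwise filter is quadratic in the number of points, which is adequate for illustration but not tuned for large sets.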

4. PROBLEM STATEMENT

This section gives definitions and characteristics of the MONRP as an extension of the traditional NRP model.

4.1 NRP Model

It is assumed that for an existing software system, there is a set of customers,

\[ C = \{c_1, \ldots, c_m\} \]

whose requirements are to be considered in the development of the next release.

The set of possible software requirements is denoted by:

\[ R = \{r_1, \ldots, r_n\} \]

It is assumed that all the requirements are independent. In order to satisfy each requirement, some resources need to be allocated. The resources needed to implement a particular requirement can be transformed into cost terms and considered to be the associated cost to fulfill the requirement. The resultant cost vector for the set of requirements $r_i$ $(1 \leq i \leq n)$ is denoted by:

\[ \mathit{Cost} = \{\mathit{cost}_1, \ldots, \mathit{cost}_n\} \]

Each customer has a degree of importance for the company that can be reflected by a weight factor. The set of relative weights associated with each customer $c_j$ $(1 \leq j \leq m)$ is denoted by:

\[ \mathit{Weight} = \{w_1, \ldots, w_m\} \]

where $w_j \in [0, 1]$ and $\sum_{j=1}^{m} w_j = 1$.

It is assumed that not all requirements are equally important for a given customer. The level of satisfaction for a given customer depends on the requirements that are satisfied in the next release of the software, which provide value to the company. Each customer $c_j$ $(1 \leq j \leq m)$ assigns a value to requirement $r_i$ $(1 \leq i \leq n)$ denoted by $\mathit{value}(r_i, c_j)$, where $\mathit{value}(r_i, c_j) > 0$ if customer $j$ has the requirement $i$ and 0 otherwise.

Based on the above, the overall score or importance of a given requirement $r_i$ $(1 \leq i \leq n)$ can be calculated by:

\[ \mathit{score}_i = \sum_{j=1}^{m} w_j \cdot \mathit{value}(r_i, c_j) \]

The ‘score’ of a given requirement represents its overall ‘value’ for the company.

The decision vector $\vec{x} = \{x_1, \ldots, x_n\} \in \{0, 1\}^n$ determines the requirements that are to be satisfied in the next release. In this vector, $x_i$ is 1 if requirement $i$ is selected and 0 otherwise.
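As a small worked illustration of the score calculation, the following Python sketch computes $\mathit{score}_i$ from customer weights and a value matrix. The function name `requirement_scores`, the row-per-requirement layout of `value`, and the numbers are assumptions made for this example, not data from the study.

```python
from typing import List


def requirement_scores(weights: List[float], value: List[List[float]]) -> List[float]:
    """score_i = sum_j w_j * value(r_i, c_j).

    value[i][j] holds the value customer j assigns to requirement i.
    """
    return [sum(w * v for w, v in zip(weights, row)) for row in value]


# Hypothetical data: three customers, two requirements.
weights = [0.5, 0.3, 0.2]   # customer weights, summing to 1
value = [
    [4, 0, 2],              # requirement 1: customer 2 does not want it
    [0, 5, 5],              # requirement 2: customer 1 does not want it
]
scores = requirement_scores(weights, value)
```

With these numbers the first requirement scores 0.5·4 + 0.2·2 = 2.4 and the second 0.3·5 + 0.2·5 = 2.5, up to floating-point rounding.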

4.2 MONRP Formulation

In the formulation of the MONRP, two objectives are taken into consideration in order to maximize customer satisfaction (or total value for the company) and minimize required cost. In our MONRP model, cost is treated, for the first time, as an objective rather than a constraint. The reason is to explore the whole set of points on the Pareto-optimal front; this is a valuable source of information for the decision maker to understand the trade-offs inherent in meeting the conflicting objectives.

The following objective function is considered for maximizing total value:

\[ \text{Maximize } \sum_{i=1}^{n} \mathit{score}_i \cdot x_i \]

The problem is to select a subset of the customers’ requirements which results in the maximum value for the company.

The second objective function is defined as follows to minimize the total cost required for the satisfaction of customer requirements:

\[ \text{Minimize } \sum_{i=1}^{n} \mathit{cost}_i \cdot x_i \]

In order to convert the second objective to a maximization problem in the MONRP, the total cost is multiplied by -1. Therefore, the MONRP model consisting of fitness functions can be represented as follows:


\[ \text{Maximize } f_1(\vec{x}) = \sum_{i=1}^{n} \mathit{score}_i \cdot x_i \]

\[ \text{Maximize } f_2(\vec{x}) = -\sum_{i=1}^{n} \mathit{cost}_i \cdot x_i \]
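The two fitness functions can be evaluated for a candidate selection as in the following minimal Python sketch. The function name `evaluate` and the score and cost figures are hypothetical; only the formulas for $f_1$ and $f_2$ are taken from the model above.

```python
from typing import List, Tuple


def evaluate(x: List[int], score: List[float], cost: List[float]) -> Tuple[float, float]:
    """Return (f1, f2) for a 0/1 decision vector x; both objectives are maximized."""
    f1 = sum(s * xi for s, xi in zip(score, x))    # total value of selected requirements
    f2 = -sum(c * xi for c, xi in zip(cost, x))    # negated total cost
    return f1, f2


# Hypothetical instance with three requirements.
score = [2.4, 2.5, 1.0]
cost = [3, 7, 2]
f1, f2 = evaluate([1, 0, 1], score, cost)          # select requirements 1 and 3
```

Selecting the first and third requirements gives a total value of 2.4 + 1.0 and a total cost of 3 + 2, so `f2` is the negated cost, -5.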

5. THE SOLUTION APPROACHES

This section describes the search algorithms used in this paper. As stated earlier, in the solution of MOOPs there exist multiple and possibly conflicting objectives to be optimized simultaneously. There are various approaches to solve MOOPs. Among the most widely adopted techniques are: sequential optimization, ε-constraint method, weighting method, goal programming, goal attainment, distance based method and direction based method. For a comprehensive study of these approaches, readers may refer to Szidarovsky et al. [25] and Collette and Siarry [6].

Among meta-heuristics, Evolutionary Algorithms (EAs) seem particularly desirable to solve MOOPs. EAs are used to solve problems of this nature mainly because their population based nature enables them to capture the dominance relations in the population as a vehicle to guide the search towards the Pareto-optimal front. They deal simultaneously with a set of possible solutions (the so-called population), which allows them, unlike traditional mathematical programming techniques, to find good approximations of the Pareto-optimal set in a single run. Additionally, EAs are less susceptible to the shape or continuity of the Pareto-optimal front, whereas these two issues are a real concern for mathematical programming techniques.

EAs usually contain several parameters that need to be ‘tuned’ for each particular application, which is, in many cases, highly time consuming. In addition, since EAs are stochastic optimizers, different runs tend to produce different results. Therefore, multiple runs of the same algorithm on a given problem are needed to statistically describe their performance on that problem. For a more detailed discussion on the application of EAs in multi-objective optimization see Coello et al. [4] and Deb [7].

To solve the MONRP, Multi-Objective EAs need to fulfill two major tasks:

1. Guiding the search towards the Pareto-optimal set to accomplish fitness assignment.

2. Maintaining a diverse population to achieve a well distributed non-dominated front.

We examine three search techniques, namely NSGA-II, a Pareto GA and a Single-Objective GA, for the solution of the MONRP. These techniques are also compared with Random Search to verify the viability of applying computationally expensive search techniques to the MONRP.

5.1 NSGA-II

Non-dominated Sorting Genetic Algorithm-II (NSGA-II), introduced by Deb et al. [8], is an extension to an earlier Multi-Objective EA called NSGA developed by Srinivas and Deb [24]. NSGA-II incorporates elitism to maintain the solutions of the best front found. The rank of each individual is based on the level of non-domination. NSGA-II is a computationally efficient algorithm whose complexity is $O(mN^2)$, compared to NSGA with the complexity $O(mN^3)$, where $m$ is the number of objectives and $N$ is the population size.

The population is sorted using the non-domination relation into several fronts. Each solution is assigned a fitness value according to its non-domination level. In this way, the solutions in better fronts are given higher fitness values. NSGA-II uses a measure of crowding distance to provide an estimation of the density of solutions belonging to the same front. This parameter is used to promote diversity within the population. Solutions with higher crowding distance are assigned a better fitness compared to those with lower crowding distance, thereby avoiding the use of the fitness sharing factor [13].
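The crowding distance can be sketched in Python as follows. This is a minimal illustration in the style usually attributed to NSGA-II (boundary solutions receive infinite distance, interior solutions accumulate the normalized gap between their neighbours in each objective); it is an assumption that the study's implementation follows exactly this form, and the function name and sample front are hypothetical.

```python
from typing import List, Sequence


def crowding_distance(front: List[Sequence[float]]) -> List[float]:
    """Return one crowding distance per solution in a single front."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):                       # one pass per objective
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")  # always keep the boundary points
        if hi == lo:
            continue                                     # objective is constant on this front
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


# Hypothetical front of (value, -cost) objective vectors.
front = [(1.0, -1.0), (2.0, -3.0), (4.0, -9.0)]
d = crowding_distance(front)
```

In this example the two end points receive infinite distance, and the middle point accumulates a normalized gap of 1.0 in each of the two objectives.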

Assume that every individual $i$ in the population has two attributes:

1. nondomination rank ($i_{rank}$);

2. crowding distance ($i_{distance}$).

We now define a partial order $\prec_n$ as

\[ i \prec_n j \quad \text{if } (i_{rank} < j_{rank}) \text{ or } ((i_{rank} = j_{rank}) \text{ and } (i_{distance} > j_{distance})) \]

That is, between two solutions with differing nondomination ranks, we prefer the solution with the lower (better) rank. Otherwise, if both solutions belong to the same front, then we prefer the solution that is located in a less crowded region [8].
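The crowded-comparison operator translates directly into a short predicate. In the Python sketch below, the `Individual` class and the name `crowded_less` are hypothetical; only the comparison rule is taken from the definition above.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    rank: int          # nondomination rank (lower is better)
    distance: float    # crowding distance (higher is better)


def crowded_less(i: Individual, j: Individual) -> bool:
    """True when i is preferred to j under the crowded-comparison operator."""
    if i.rank != j.rank:
        return i.rank < j.rank
    return i.distance > j.distance


a = Individual(rank=1, distance=0.2)
b = Individual(rank=2, distance=5.0)
c = Individual(rank=1, distance=0.9)
```

With these values `a` is preferred to `b` because of its lower rank, and `c` is preferred to `a` because both share a rank and `c` lies in a less crowded region.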

The algorithm can be described as follows. Initially, a random parent population $P_0$ is created. The population size is $N$. Tournament selection, crossover, and mutation operators are used to create a child population $Q_0$ of size $N$. The NSGA-II procedure executes the main loop described in Algorithm 1.

Algorithm 1: NSGA-II (main loop)

while not stopping rule do
    Let R_t = P_t ∪ Q_t;
    Let F = fast-non-dominated-sort(R_t);
    Let P_{t+1} = ∅ and i = 1;
    while |P_{t+1}| + |F_i| ≤ N do
        Apply crowding-distance-assignment(F_i);
        Let P_{t+1} = P_{t+1} ∪ F_i;
        Let i = i + 1;
    end
    Sort(F_i, ≺_n);
    Let P_{t+1} = P_{t+1} ∪ F_i[1 : (N − |P_{t+1}|)];
    Let Q_{t+1} = make-new-pop(P_{t+1});
    Let t = t + 1;
end
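The sorting step in Algorithm 1 partitions the combined population into fronts F_1, F_2, and so on. The Python sketch below illustrates that partition with a simple front-peeling loop; it is not the bookkeeping-based fast-non-dominated-sort of Deb et al. [8], although it yields the same fronts. The function names and sample objective vectors are hypothetical.

```python
from typing import List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """True if fx dominates fy when all objectives are maximized."""
    return (all(a >= b for a, b in zip(fx, fy))
            and any(a > b for a, b in zip(fx, fy)))


def sort_into_fronts(objs: List[Sequence[float]]) -> List[List[int]]:
    """Peel off successive non-dominated fronts; returns index lists into objs."""
    remaining = set(range(len(objs)))
    fronts: List[List[int]] = []
    while remaining:
        # A solution belongs to the current front if nothing left dominates it.
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i]) for j in remaining))
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical (value, -cost) vectors for four solutions.
objs = [(10, -5), (8, -3), (7, -4), (6, -6)]
fronts = sort_into_fronts(objs)
```

Solutions 0 and 1 form the first front, solution 2 the second, and solution 3 the third. The peeling loop is simpler to read than the fast variant but repeats dominance checks, so it is slower on large populations.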

5.2 Pareto GA

The Pareto GA algorithm used in this research is a variation of the simple genetic algorithm. A simple GA maintains a population of candidate solutions that are evaluated according to a single fitness value. In the MONRP there are at least two fitness values for each solution, which are used in tournament selection. The algorithm uses Pareto dominance relations between solutions to select candidates for the mating pool. The new population is created by recombining the selected solutions through crossover and mutation operators. The main procedure of the Pareto GA is described in Algorithm 2.

Algorithm 2: Pareto GA

t = 0;
Generate population P_0;
while not stopping rule do
    Evaluate objective functions for ∀i ∈ P_t;
    Let t = t + 1;
    Let i = 0;
    while i < N do
        Consider two solutions x and y ∈ P_t at random;
        Let selected solution = ∅;
        if x ≻ y then
            Let selected solution = x;
            Let i = i + 1;
        end
        else if y ≻ x then
            Let selected solution = y;
            Let i = i + 1;
        end
        Add selected solution to mating pool;
    end
    Apply genetic operators (crossover and mutation) to P_t;
end
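The selection loop of Algorithm 2 can be sketched in Python as below. This is an illustrative reading, not the authors' code: pairs where neither solution dominates are simply redrawn, and the `max_trials` cap is an added assumption that guards against an endless loop when the population is mutually non-dominated. All names and sample data are hypothetical.

```python
import random
from typing import List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """True if fx dominates fy when all objectives are maximized."""
    return (all(a >= b for a, b in zip(fx, fy))
            and any(a > b for a, b in zip(fx, fy)))


def pareto_mating_pool(objs: List[Sequence[float]], pool_size: int,
                       rng: random.Random, max_trials: int = 100_000) -> List[int]:
    """Fill a mating pool with indices of winners of random dominance contests."""
    pool: List[int] = []
    trials = 0
    while len(pool) < pool_size and trials < max_trials:
        trials += 1
        x, y = rng.sample(range(len(objs)), 2)   # two distinct solutions at random
        if dominates(objs[x], objs[y]):
            pool.append(x)
        elif dominates(objs[y], objs[x]):
            pool.append(y)
        # Otherwise neither dominates: draw another pair.
    return pool


objs = [(10, -5), (8, -3), (7, -4), (6, -6)]
pool = pareto_mating_pool(objs, pool_size=20, rng=random.Random(0))
```

In this sample, solution 3 is dominated by every other solution and dominates none, so it can never win a contest and never enters the pool.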

5.3 Single-Objective GA

To apply a Single-Objective GA to the MONRP, we use the weighting method that combines the objective functions into a single objective using a weight factor $\omega$ $(0 \leq \omega \leq 1)$. The fitness value of a given solution $\vec{x}$ in the Single-Objective GA is calculated as follows:

\[ F(\vec{x}) = (1 - \omega) \cdot f_1(\vec{x}) + \omega \cdot f_2(\vec{x}) \]

By changing the weight factor $\omega \in [0, 1]$, one can explore various regions of the Pareto-optimal front. This approach is employed to compare the performance of the Single-Objective GA with other search techniques.
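The weighted fitness is a one-line computation, sketched here in Python. The function name `weighted_fitness` and the sample objective values are hypothetical; `f2` is assumed to be the negated cost, as in the MONRP formulation.

```python
def weighted_fitness(f1: float, f2: float, omega: float) -> float:
    """F = (1 - omega) * f1 + omega * f2, with omega in [0, 1]."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return (1.0 - omega) * f1 + omega * f2


# Hypothetical solution with total value 10 and total cost 4 (so f2 = -4).
value_heavy = weighted_fitness(10.0, -4.0, omega=0.25)   # emphasises value
cost_heavy = weighted_fitness(10.0, -4.0, omega=0.75)    # emphasises cost
```

A small `omega` rewards high-value selections, while a large `omega` rewards cheap ones, which is why each choice of weight steers the search towards a different region of the front.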

5.4 Random Search

We also applied the Random Search technique to the MONRP. This is merely a ‘sanity check’; all metaheuristic algorithms should be capable of comfortably outperforming Random Search for a well-formulated optimization problem. Thus the Random Search technique was given the same number of fitness evaluations as the other algorithms to provide a useful lower bound benchmark for measuring the other algorithms’ performance.
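A Random Search baseline of this kind can be sketched in a few lines of Python. The sketch assumes each bit of the decision vector is drawn independently with probability 0.5, which the paper does not specify; the function name and the small instance are hypothetical.

```python
import random
from typing import List, Tuple


def random_search(score: List[float], cost: List[float], evaluations: int,
                  rng: random.Random) -> List[Tuple[float, float]]:
    """Evaluate `evaluations` uniformly random 0/1 vectors; return their (f1, f2)."""
    n = len(score)
    results = []
    for _ in range(evaluations):
        x = [rng.randint(0, 1) for _ in range(n)]    # assumed: independent fair bits
        f1 = sum(s * xi for s, xi in zip(score, x))
        f2 = -sum(c * xi for c, xi in zip(cost, x))
        results.append((f1, f2))
    return results


# Hypothetical three-requirement instance and a small evaluation budget.
results = random_search([2.0, 1.0, 3.0], [3, 7, 2], evaluations=50,
                        rng=random.Random(0))
```

Giving this routine the same evaluation budget as the genetic algorithms makes the comparison a matter of search quality rather than effort.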

6. EXPERIMENTAL SET UP

In this section, we describe the test problems used to compare the performance of NSGA-II with Pareto GA, Random Search and Single-Objective GA.

The test problems were created by assigning random choices for value and cost. Costs ranged from 1 to 9 inclusive (zero cost is not permitted). Values ranged from 0 to 5 inclusive (zero value is permitted, indicating that the customer places no value on, i.e. does not want, this requirement). This simulates the situation where a customer ranks the choice of requirements (for value) and the cost is estimated to fall in a range: very low, low, medium, high, very high. Each algorithm was executed 5 times for each data set.
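A test instance with these ranges could be generated as in the Python sketch below. The cost and value ranges follow the description above; how customer weights were drawn is not stated, so the normalized random weights here are purely an assumption, as are the function name and the seed.

```python
import random
from typing import List, Tuple


def make_instance(num_customers: int, num_requirements: int,
                  rng: random.Random) -> Tuple[List[int], List[List[int]], List[float]]:
    """Return (cost, value, weights) for a synthetic MONRP instance."""
    cost = [rng.randint(1, 9) for _ in range(num_requirements)]      # 1..9 inclusive
    value = [[rng.randint(0, 5) for _ in range(num_customers)]       # 0..5 inclusive
             for _ in range(num_requirements)]                       # value[i][j]
    raw = [rng.uniform(0.1, 1.0) for _ in range(num_customers)]      # assumed weight scheme
    total = sum(raw)
    weights = [w / total for w in raw]                               # normalize to sum to 1
    return cost, value, weights


# Sizes chosen to match the smallest scale discussed in the text.
cost, value, weights = make_instance(15, 40, random.Random(1))
```

Passing an explicit `random.Random` instance keeps the generated data reproducible across runs.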

The four algorithms were applied to two test problem sets, for two separate empirical study cases: Empirical Study 1 (ES1) and Empirical Study 2 (ES2). In ES1 we report results concerning the performance of the four algorithms for what might be considered ‘typical’ cases, with the number of customers ranging from 15 to 100 and the number of requirements ranging from 40 to 140. In ES2, we are concerned with bounding the problem below, to determine the size of problem for which search is not appropriate; the point at which the problem becomes too small. In order to do this, we seek to find the point at which the metaheuristic techniques can (only just) outperform a Random Search, since we deem this to indicate that the problem is sufficiently large for it to be worth considering the application of metaheuristic search techniques.

All approaches were run for a maximum of 10,000 function evaluations. The initial population was set to 200. We used a simple binary GA encoding, with one bit to code for each decision variable (the inclusion or exclusion of a requirement). The length of a chromosome is thus equivalent to the number of requirements. Each experimental execution of each algorithm was terminated when the generation number reached 51 (i.e. after 10,000 evaluations). All genetic approaches used tournament selection (with a tournament size of 5), single-point crossover and bitwise mutation for binary-coded GAs. The crossover probability was set to $P_c = 0.8$ and the mutation probability to $P_m = 1/n$ (where $n$ is the string length for binary-coded GAs).
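The three operators named above can be sketched in Python as follows. The parameter values (tournament size 5, $P_c = 0.8$, $P_m = 1/n$) come from the text; the function names are hypothetical, and the tournament shown compares a scalar fitness, which is an assumption that fits the Single-Objective GA rather than the dominance-based variants.

```python
import random
from typing import List, Tuple


def tournament_select(fitness: List[float], rng: random.Random, size: int = 5) -> int:
    """Return the index of the fittest among `size` randomly drawn individuals."""
    contestants = rng.sample(range(len(fitness)), size)
    return max(contestants, key=lambda i: fitness[i])


def single_point_crossover(a: List[int], b: List[int], rng: random.Random,
                           pc: float = 0.8) -> Tuple[List[int], List[int]]:
    """With probability pc, swap the tails of a and b after a random cut point."""
    if len(a) < 2 or rng.random() >= pc:
        return a[:], b[:]
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(x: List[int], rng: random.Random) -> List[int]:
    """Flip each bit independently with probability 1/n, n being the string length."""
    pm = 1.0 / len(x)
    return [1 - bit if rng.random() < pm else bit for bit in x]


rng = random.Random(0)
child1, child2 = single_point_crossover([0] * 6, [1] * 6, rng, pc=1.0)
mutant = bit_flip_mutation([0, 1, 0, 1], rng)
```

A mutation rate of 1/n flips one bit per chromosome on average, so most offspring differ from their parents by a single requirement being added or dropped.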

7. RESULTS AND ANALYSIS

7.1 Empirical Study 1—Scale Problem

In ES1 we investigate the relative performance of the four approaches to the MONRP for cases that we consider typical. We consider three ‘scales’ of problem that we hope are characteristic of some of the problems to which the approach may be applied. The number of customers and requirements for each scale problem is listed in Table 1:

Table 1: Scale test sets

Scale   Customers   Requirements
S1      15          40
S2      50          80
S3      100         140

The results of S1, S2 and S3 are shown in Figure 1(a-c). The figure shows typical results from the 5 runs of each algorithm. Space does not permit us to show all results, but the results from other runs were very similar to those shown in the figure. In each picture, the results show the non-dominated front obtained after 50 generations with NSGA-II, Pareto GA, Single-Objective GA, and all 10,000 solutions obtained by Random Search.


For the Single-Objective GA, we used nine different values of the weight coefficient ω for each objective, ranging from 0.1 to 0.9 in steps of 0.1. The algorithm was executed 9 times, once for each choice of weight coefficients, to obtain different solutions within a single experiment.
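The weight sweep can be sketched as follows. The exact form of the weighted fitness function is defined elsewhere in the paper; the form used here, ω · score − (1 − ω) · cost without any normalization, is an assumption made for illustration, as are the function names and the tiny three-requirement instance. Exhaustive enumeration stands in for the GA so that the example stays self-contained.

```python
from itertools import product

def weighted_fitness(chromosome, scores, costs, w):
    """Assumed weighted-sum fitness: w * total score - (1 - w) * total cost."""
    total_score = sum(s for s, bit in zip(scores, chromosome) if bit)
    total_cost = sum(c for c, bit in zip(costs, chromosome) if bit)
    return w * total_score - (1.0 - w) * total_cost

def best_for_weight(scores, costs, w):
    """Brute-force stand-in for one GA run; only feasible for tiny instances."""
    candidates = product((0, 1), repeat=len(scores))
    return max(candidates, key=lambda ch: weighted_fitness(ch, scores, costs, w))

# The nine weight values 0.1, 0.2, ..., 0.9 described in the text.
weights = [round(0.1 * k, 1) for k in range(1, 10)]

# Hypothetical per-requirement scores and costs.
scores, costs = [4, 5, 6], [1, 2, 3]
solutions = {w: best_for_weight(scores, costs, w) for w in weights}
```

In this toy instance the smallest weight selects no requirements and the largest selects all of them, which mirrors how extreme weights push a weighted search towards the ends of the front.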

The results lend evidence to support the following claims:

1. The NSGA-II algorithm performs the best in all three scale problems. Figures 1(a), (b) and (c) show a smooth, uninterrupted Pareto-optimal front generated by NSGA-II. It finds a more diverse distribution of solutions, and it also converges on results that are better than those of any of the other algorithms, because its front dominates those produced by the other approaches. This provides empirical evidence that the NSGA-II algorithm is effective in solving the MONRP and that it may outperform the Pareto GA and the Single-Objective GA (as well, of course, as Random Search).

2. The weight-based Single-Objective GA, when ω is close to 0.0 or 1.0, drives the search towards the extreme ends of the Pareto front. These extreme parts of the Pareto front do not appear to be explored by NSGA-II, even though it does produce a good spread of results. In such extreme solutions, one objective is maximized to the almost total exclusion of the other. Although the Single-Objective GA may have difficulties in finding a good spread of solutions near the Pareto-optimal front, the extreme non-dominated solutions that it managed to find provide a necessary supplement to the NSGA-II solutions. They may be useful in real world scenarios, because they allow the software engineer to explore the extremes of the Pareto front, where one objective dominates.

3. The trend we observed from the three experiments is that the larger the scale of the test problem, the wider the gap in performance, both between the metaheuristic techniques and Random Search and between the leader (NSGA-II) and the runner-up (Pareto GA). This indicates that as the problem scales, the performance improvement offered by the use of NSGA-II also scales, making it an increasingly preferred choice of solution algorithm.

7.2 Empirical Study 2: Boundary Problem

In ES2, we want to explore the boundaries of the MONRP. For the lower boundary, there are two extreme situations, namely many customers with very few requirements and vice versa. In this section, we are interested in obtaining the ‘critical mass’ of both customers and requirements. Once the problem size exceeds this critical point, it becomes worthwhile applying metaheuristic techniques to the MONRP; below this point, the problem is too trivial to merit a metaheuristic solution.

In the first situation, there are many customers and few requirements. We set the number of customers to 100, with the aim of finding the smallest set of requirements to meet the critical point.

Figure 1(d) illustrates this critical point, with the number of requirements set to 20. From the figure we can see that there is no gap between the solutions generated by Random Search and the Pareto-optimal front produced by NSGA-II; the NSGA-II and Random Search solutions share several common points. However, when we increase the number of requirements to 25, as shown in Figure 1(e), there is a clear gap between the solutions produced by Random Search and those produced by NSGA-II. If we increase the number of requirements further, the gap continues to widen.

In the second situation, there are few customers and many requirements. We assume that the smallest number of customers is 2. Even with only 2 customers, given sufficient requirements, there is a noticeable difference between the performance of the metaheuristic search algorithms and that of Random Search. This is illustrated in Figure 1(f), where the number of requirements is 200. In this figure, NSGA-II clearly outperforms the other algorithms.

Thus it can be seen that the ‘tipping point’ is determined primarily by the number of requirements, which should exceed about 20. By contrast, there is no number of customers that is ‘too small’ for the problem to be worthwhile.
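One way to get a feel for the scale involved is to compare the evaluation budget with the number of possible requirement subsets, which is 2^n for n requirements under the binary encoding. The arithmetic below is an illustration added here and is not an analysis performed in the study; the function name is hypothetical, and the fraction is an upper bound because Random Search may sample the same subset more than once.

```python
EVALUATIONS = 10_000  # evaluation budget used in the experiments

def sampled_fraction(n_requirements, evaluations=EVALUATIONS):
    """Upper bound on the share of the 2^n subsets that the budget can visit."""
    return min(1.0, evaluations / 2 ** n_requirements)

for n in (20, 25, 40):
    # Print the number of requirements, the search space size and the fraction.
    print(n, 2 ** n, f"{sampled_fraction(n):.2e}")
```

Moving from 20 to 25 requirements multiplies the search space by 32, so the same budget covers a 32 times smaller share of it. This is consistent with the observed tipping point, though it does not by itself explain it.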

8. CONCLUSIONS AND FUTURE WORK

In this paper we address the Multi-Objective Next Release Problem (MONRP) for the first time, in order to optimize both value and cost simultaneously. These objectives are naturally conflicting, so attaining a single optimal solution may not be possible in many cases. Instead, a set of Pareto-optimal solutions is sought that enables decision makers to select the best solutions in different circumstances based on their priorities. Four search techniques, namely Random Search, Single-Objective GA, Pareto GA and NSGA-II, were examined to find approximations of the Pareto front in different problem instances.

It was observed that NSGA-II outperforms the other techniques in finding a large and important part of the Pareto front in large problems. However, for these problems, a Single-Objective GA (applied to the unified objective function) performs better in finding extreme regions along the front when it is run iteratively, using varying weights for the two objectives.

In small problem instances, no major difference was observed between the solution techniques in terms of the quality of the Pareto front found. In this way, the paper provides an empirically determined lower-bound benchmark on problem size, indicating the point below which the MONRP becomes trivial.

Future work will verify these findings by applying the search techniques to real world problems. This will provide valuable feedback to researchers and practitioners in both the search techniques and software engineering communities. Other reformulations of the MONRP, considering different sets of objectives and constraints, including dependency relationships between requirements, will also be investigated. This in turn may give rise to the need for the development of more efficient solution techniques.

[Figure 1: six plots, panels (a) to (f), each showing the solutions found by NSGA-II, Pareto GA, Random search and Single-Objective GA. Axes: Score and −1*Cost. Panels: (a) 15 customers; 40 requirements. (b) 50 customers; 80 requirements. (c) 100 customers; 140 requirements. (d) 100 customers; 20 requirements. (e) 100 customers; 25 requirements. (f) 2 customers; 200 requirements.]

Figure 1: In (a), (b) and (c) metaheuristic search techniques have outperformed Random Search. NSGA-II performed better than or equal to the others for a large part of the Pareto front, while the Single-Objective GA performed better in the extreme regions. The gap between the search techniques and Random Search became larger as the problem size increased. (d) shows the boundary case concerning the number of requirements, beyond which Random Search fails to produce results comparable with those of the metaheuristic search techniques. (e) shows that with a 25% increase in the number of requirements, the gap became significant. NSGA-II performed the best, and the Pareto GA shared part of the front with NSGA-II. (f) shows that the gap was obviously large. NSGA-II has outperformed the Pareto GA, but not the Single-Objective GA in the extreme ends of the front.

9. REFERENCES

[1] Antoniol, G., Penta, M. D., and Harman, M. Search-based techniques applied to optimization of project planning for a massive maintenance project. In 21st IEEE International Conference on Software Maintenance (Los Alamitos, California, USA, 2005), pp. 240–249.

[2] Bagnall, A., Rayward-Smith, V., and Whittley, I. The next release problem. Information and Software Technology 43, 14 (Dec. 2001), 883–890.

[3] Chicano, F., and Alba, E. Management of software projects with GAs. In 6th Metaheuristics International Conference (MIC2005) (Vienna, Austria, Aug. 2005).

[4] Coello Coello, C. A., Van Veldhuizen, D. A., and Lamont, G. B. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002.

[5] Cohen, M., Kooi, S. B., and Srisa-an, W. Clustering the heap in multi-threaded applications for improved garbage collection. In GECCO 2006: Proceedings of the 8th annual conference on Genetic and evolutionary computation (Seattle, Washington, USA, 8-12 July 2006), M. Keijzer, M. Cattolico, D. Arnold, V. Babovic, C. Blum, P. Bosman, M. V. Butz, C. Coello Coello, D. Dasgupta, S. G. Ficici, J. Foster, A. Hernandez-Aguirre, G. Hornby, H. Lipson, P. McMinn, J. Moore, G. Raidl, F. Rothlauf, C. Ryan, and D. Thierens, Eds., vol. 2, ACM Press, pp. 1901–1908.

[6] Collette, Y., and Siarry, P. Multiobjective Optimization: Principles and Case Studies. Springer, 2004.

[7] Deb, K. Multi-Objective Optimization Using Evolutionary Algorithms. Wiley, Chichester, UK, 2001.

[8] Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA–II. IEEE Transactions on Evolutionary Computation 6, 2 (Apr. 2002), 182–197.

[9] Doval, D., Mancoridis, S., and Mitchell, B. S. Automatic clustering of software systems using a genetic algorithm. In International Conference on Software Tools and Engineering Practice (STEP’99) (Pittsburgh, PA, 30 August - 2 September 1999).

[10] Greer, D., and Ruhe, G. Software release planning: an evolutionary and iterative approach. Information & Software Technology 46, 4 (2004), 243–253.

[11] Harman, M., Steinhofel, K., and Skaliotis, A. Search based approaches to component selection and prioritization for the next release problem. In 22nd International Conference on Software Maintenance (ICSM 06) (Philadelphia, Pennsylvania, USA, Sept. 2006). To appear.

[12] Harman, M., Swift, S., and Mahdavi, K. An empirical study of the robustness of two module clustering fitness functions. In Genetic and Evolutionary Computation Conference (GECCO 2005) (Washington DC, USA, June 2005), pp. 1029–1036.

[13] Horn, J., and Nafpliotis, N. Multiobjective optimization using the niched Pareto genetic algorithm. Tech. Rep. IllIGAL 93005, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, 1993.

[14] Karlsson, J., Wohlin, C., and Regnell, B. An evaluation of methods for prioritizing software requirements. Information and Software Technology 39 (1998), 939–947.

[15] Kirsopp, C., Shepperd, M., and Hart, J. Search heuristics, case-based reasoning and software project effort prediction. In GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference (San Francisco, CA 94104, USA, 9-13 July 2002), W. B. Langdon, E. Cantú-Paz, K. Mathias, R. Roy, D. Davis, R. Poli, K. Balakrishnan, V. Honavar, G. Rudolph, J. Wegener, L. Bull, M. A. Potter, A. C. Schultz, J. F. Miller, E. Burke, and N. Jonoska, Eds., Morgan Kaufmann Publishers, pp. 1367–1374.

[16] Mahdavi, K., Harman, M., and Hierons, R. M. A multiple hill climbing approach to software module clustering. In IEEE International Conference on Software Maintenance (Los Alamitos, California, USA, Sept. 2003), pp. 315–324.

[17] Mancoridis, S., Mitchell, B. S., Chen, Y.-F., and Gansner, E. R. Bunch: A clustering tool for the recovery and maintenance of software system structures. In Proceedings; IEEE International Conference on Software Maintenance (1999), IEEE Computer Society Press, pp. 50–59.

[18] Mancoridis, S., Mitchell, B. S., Rorres, C., Chen, Y.-F., and Gansner, E. R. Using automatic clustering to produce high-level system organizations of source code. In International Workshop on Program Comprehension (IWPC’98) (Los Alamitos, California, USA, 1998), pp. 45–53.

[19] Mitchell, B. S., and Mancoridis, S. On the automatic modularization of software systems using the bunch tool. 193–208.

[20] O’Keeffe, M., and O’Cinneide, M. Search-based software maintenance. In Conference on Software Maintenance and Reengineering (CSMR’06) (Mar. 2006), pp. 249–260.

[21] Osyczka, A. In Multicriteria optimization for engineering design (1985), Design Optimization, pp. 193–227.

[22] Papadimitriou, C. H., and Steiglitz, K. Combinatorial Optimization: Algorithms and Complexity. Dover, 1998.

[23] Seng, O., Stammel, J., and Burkhart, D. Search-based determination of refactorings for improving the class structure of object-oriented systems. In GECCO 2006: Proceedings of the 8th annual conference on Genetic and evolutionary computation (Seattle, Washington, USA, 8-12 July 2006), M. Keijzer, M. Cattolico, D. Arnold, V. Babovic, C. Blum, P. Bosman, M. V. Butz, C. Coello Coello, D. Dasgupta, S. G. Ficici, J. Foster, A. Hernandez-Aguirre, G. Hornby, H. Lipson, P. McMinn, J. Moore, G. Raidl, F. Rothlauf, C. Ryan, and D. Thierens, Eds., vol. 2, ACM Press, pp. 1909–1916.

[24] Srinivas, N., and Deb, K. Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms. Evolutionary Computation 2, 3 (Fall 1994), 221–248.

[25] Szidarovsky, F., Gershon, M. E., and Dukstein, L. Techniques for multiobjective decision making in systems management. Elsevier, New York, 1986.
