

J Comb Optim, DOI 10.1007/s10878-012-9514-7

A study of search algorithms’ optimization speed

Andrea Valsecchi · Leonardo Vanneschi · Giancarlo Mauri

© Springer Science+Business Media, LLC 2012

A. Valsecchi: European Centre for Soft Computing, 33600, Mieres, Spain. e-mail: [email protected]
L. Vanneschi (✉) · G. Mauri: DISCo, Università di Milano-Bicocca, 20126, Milan, Italy. e-mail: [email protected]
G. Mauri: e-mail: [email protected]
L. Vanneschi: ISEGI, Universidade Nova de Lisboa, 1070-312, Lisbon, Portugal. e-mail: [email protected]

Abstract Search algorithms are often compared by the optimization speed achieved on some sets of cost functions. Here some properties of algorithms’ optimization speed are introduced and discussed. In particular, we show that determining whether a set of cost functions F admits a search algorithm having a given optimization speed is an NP-complete problem. Further, we derive an explicit formula to calculate the best achievable optimization speed when F is closed under permutation. Finally, we show that the optimization speed achieved by some well-known optimization techniques can be much worse than the best theoretical value, at least on some sets of optimization benchmarks.

Keywords Optimization problems · Search algorithms · Optimization speed

1 Introduction

Search algorithms are robust optimization techniques aimed at solving optimization problems on discrete or continuous domains. The No Free Lunch theorem (Wolpert



and Macready 1997; Schumacher et al. 2001; Igel and Toussaint 2004; Rowe et al. 2009) states, loosely speaking, that no such algorithm has a better performance than any other if tested on all possible cost functions; therefore, in order to find the best technique (or even a suitable one) to solve a given problem, the different search algorithms must be compared. Such a comparison is often based on the optimization speed or first hitting time (Jackson et al. 2009), i.e. the number of evaluations of the cost function needed to find an optimal solution, achieved by these algorithms. To rate an algorithm in absolute terms, one can compare its optimization speed with the best achievable on that problem.

In this work, we present some properties of search algorithms’ optimization speed. In particular, we derive an equation to express this speed for some particular sets of cost functions (closed under permutation, or c.u.p.). In general, we prove that determining whether a set of cost functions admits a search algorithm with given optimization speed is an NP-complete problem. Finally, we are interested in discovering how much the optimization speed of some well-known search algorithms differs from the best theoretical one (or at least an approximation of it) on a set of optimization benchmarks. These results extend the ones published in Valsecchi and Vanneschi (2008).

This paper is structured as follows: in Sect. 2 we give some basic definitions and restate the non-uniform sharpened no free lunch theorem given in Igel and Toussaint (2004). In Sect. 3 we introduce the concepts of optimization speed and minimality for search algorithms; we then present an equation for the best achievable optimization speed of a search algorithm on c.u.p. sets of cost functions; finally, we prove that determining whether a set of cost functions admits a search algorithm with bounded optimization speed is an NP-complete problem. Section 4 contains the experimental comparison between an approximation of the best optimization speed and the speed achieved by Genetic Algorithms (GAs) and Simulated Annealing (SA) (see Holland 1975; Goldberg 1989; Aarts and Korst 1989).

2 The No Free Lunch theorem

We follow Schumacher et al. (2001) in the definition of search algorithms. Let X be a finite search space, let f: X → Y be a cost (or fitness) function, and let Y be the set of all possible cost values. We call trace over f a sequence of pairs:

t = ⟨(x1, y1), . . . , (xm, ym)⟩

such that every xi ∈ X and yi = f(xi). A trace t represents a sequence of solutions evaluated by a search algorithm, together with their cost values.

Given a trace t = ⟨t1, t2, . . . , tm⟩, the notation tX is used for the sequence ⟨x1, x2, . . . , xm⟩ and the notation tY for ⟨y1, y2, . . . , ym⟩. We call simple a trace t such that ti = tj ⇒ i = j, i.e. a trace in which each solution appears only once. Let Tf be the set of all possible traces over the function f. Given a set of functions

F ⊆ YX, let

T = ⋃f∈F Tf


We call search operator a function g: T → X. Given a trace, a search operator is the function that chooses the solution that will be examined at the next step. A search operator g is said to be non-repeating (Radcliffe and Surry 1995) if g(t) ∉ tX for each t ∈ T. Note that, if t ∈ T is simple and g is non-repeating, then also

t′ = t ‖ (g(t), f ◦ g(t)),

where ‖ is the concatenation operator, is a simple trace. A deterministic search algorithm Ag is an application

Ag: (F × T ) → T

such that

∀t ∈ T: Ag(f, t) = t ‖ (g(t), f ◦ g(t))

where g is a search operator. Ag is said to be non-repeating if g is non-repeating. In this work we will focus on deterministic, non-repeating search algorithms. In what follows, we will refer to such algorithms simply as search algorithms.

With Am(f, t) we denote the application of m < |X| iterations of a search algorithm A to a trace t, defined as:

A0(f, t) = t
Am+1(f, t) = A(f, Am(f, t))

Finally, we define Am(f) = Am(f, ⟨⟩), where ⟨⟩ is the empty trace. A search algorithm can be easily represented by a decision tree. Given a search algorithm Ag, the associated tree is defined as follows:

1. The root ρ is labeled g(⟨⟩);
2. Let t be the trace obtained by concatenating the labels of nodes and arcs from the root to the current node. Then, for each f such that t ∈ Tf, from the current node there is an arc labeled f(g(t)) that leads to a node labeled g(t).

Example 1 Consider the following functions

f1 = (0,1,2) f2 = (3,1,1)

f3 = (2,1,2) f4 = (0,2,3)

where, for i = 1, 2, 3, 4, fi has been defined by listing its values over X = {x1, x2, x3}, i.e. fi = (fi(x1), fi(x2), fi(x3)). Let A be a search algorithm such that

A3(f1) = ⟨(x2, 1), (x3, 2), (x1, 0)⟩
A3(f2) = ⟨(x2, 1), (x3, 1), (x1, 3)⟩
A3(f3) = ⟨(x2, 1), (x3, 2), (x1, 2)⟩
A3(f4) = ⟨(x2, 2), (x1, 0), (x3, 3)⟩

The tree representation of A is shown in Fig. 1.


Fig. 1 The tree representation of the search algorithm described in Example 1
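For concreteness, the decision tree of Example 1 can be written down and replayed in a few lines of code. The following Python sketch is ours (the nested-dictionary encoding and the function run are not from the paper); it reproduces the four traces listed above.

# Minimal sketch (ours): the search algorithm A of Example 1 as a decision tree.
# A node is a dict {"x": solution, "branches": {observed cost: subtree}};
# a missing branch means the trace is not extended further.

# The four cost functions of Example 1, given as their values at x1, x2, x3.
functions = {
    "f1": {"x1": 0, "x2": 1, "x3": 2},
    "f2": {"x1": 3, "x2": 1, "x3": 1},
    "f3": {"x1": 2, "x2": 1, "x3": 2},
    "f4": {"x1": 0, "x2": 2, "x3": 3},
}

# Decision tree consistent with the traces A3(f1), ..., A3(f4):
# evaluate x2 first; on cost 1 evaluate x3 then x1, on cost 2 evaluate x1 then x3.
tree = {"x": "x2", "branches": {
    1: {"x": "x3", "branches": {
        2: {"x": "x1", "branches": {}},
        1: {"x": "x1", "branches": {}},
    }},
    2: {"x": "x1", "branches": {
        0: {"x": "x3", "branches": {}},
    }},
}}

def run(tree, f, steps):
    """Replay the deterministic search algorithm for `steps` iterations on f."""
    trace, node = [], tree
    for _ in range(steps):
        if node is None:
            break
        x = node["x"]
        y = f[x]
        trace.append((x, y))
        node = node["branches"].get(y)
    return trace

for name, f in functions.items():
    print(name, run(tree, f, 3))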

We assume that the performance of an algorithm A after m iterations with respect to a function f depends only on the sequence

Am(f)Y = ⟨f(x1), f(x2), . . . , f(xm)⟩

of cost-values the algorithm has produced. Let the function c denote a performance measure mapping sequences of cost-values to the real numbers. For example, in the case of function minimization, a performance measure that returns the minimum cost-value in the sequence could be a reasonable choice.

Let the double-bracket symbol {{·}} denote a multi-set.

Theorem 1 (“No Free Lunch” theorem, Wolpert and Macready 1997) It holds that

c({{ A(f)Y | f ∈ YX }}) = c({{ B(f)Y | f ∈ YX }})

for all search algorithms A and B, and for all performance measures c.

The theorem states that no matter what performance measure is considered, all algorithms have the same performance when tested on all possible cost functions. This is no real issue in practice, as almost all functions have no physical realization. However, the conclusion of the theorem can still hold even when smaller sets of functions are considered.

Let ΠX be the set of permutations of X. The notion of closure under permutation is introduced.

Definition 1 A set F ⊆ YX is called closed under permutation (c.u.p.) if for each f ∈ F and for each permutation σ in ΠX, (f ◦ σ) ∈ F.

The following holds.

Theorem 2 (Sharpened NFL theorem, Schumacher et al. 2001) Let F ⊆ YX. It holds that

c({{ A(f)Y | f ∈ F }}) = c({{ B(f)Y | f ∈ F }})

for all search algorithms A and B, and for all performance measures c, if and only if F is closed under permutation.

That is, all algorithms have identical performance, irrespective of the performance measure, if and only if F is closed under permutation. This result can be


further extended to the non-uniform case, in which the objective functions are selected randomly according to a given probability distribution p. Let us call block a set [f] = {f ◦ σ | σ ∈ ΠX}, and note that {[f] | f ∈ YX} is a partition of the set of cost functions (English 2004). That is, each cost function is a member of exactly one block. A probability distribution p is called block-uniform iff g ∈ [f] implies p(f) = p(g) for all f, g ∈ YX.

Theorem 3 (Non-uniform sharpened NFL theorem, English 2004) Let Z be a random variable taking values in YX, and let p be its probability distribution. Then c(A(Z)Y) is equal in distribution to c(B(Z)Y) for all search algorithms A and B if and only if p is block-uniform.

See Häggström (2006) for an intuitive explanation of block-uniformity in terms of exchangeability of the random variables Z(x).

3 Optimization speed and minimal search algorithms

Given a search algorithm A and a cost function f: X → Y, we define the optimization speed (or first hitting time) as

ϕA(f) = min{m ∈ N | Am(f) contains an optimal solution}

Given F ⊆ YX, the average of this value over F is:

ϕA(F) = (1/|F|) ∑f∈F ϕA(f)

As X and Y are finite, the set of traces for F is finite, and thus so is the set of search algorithms. For such a set, we denote the best average optimization speed achievable by

Δ(F) = min{ϕA(F)}

where the minimum is taken over all search algorithms A.

A search algorithm A is called minimal for F iff ϕA(F) = Δ(F). Note that, in general, such an algorithm is not unique. We can define the analogous concepts for the non-uniform case. Let Z be a random variable taking values in YX according to a distribution p; then we define

ϕA(Z) = ∑f∈YX ϕA(f) p(f)

and

Δ(Z) = min{ϕA(Z)}

3.1 A particular case: sets of functions closed under permutation

Let x = x1, . . . , xn be a shortest sequence over X containing an optimum for each function in F, and exhibiting the property that i < j implies that xi is an optimum of at least as many functions as xj. Then, the average optimization speed for an algorithm with such an X-trace is at most (n + 1)/2, with n ≤ min{|X|, |F|}. However, when


we consider the case in which F is c.u.p., by the Sharpened NFL theorem every non-repeating algorithm has the same average optimization speed as a random, exhaustive search. Drawing solutions uniformly from X without replacement, the expected number of trials required to obtain an optimal solution of f is (|X| + 1)/(op(f) + 1), where op(f) denotes the number of optimal solutions of f.

Theorem 4 (Igel and Toussaint 2004) If F is c.u.p. and every function in F has exactly k optimal solutions, then:

Δ(F) = (|X| + 1)/(k + 1)
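As a quick sanity check of Theorem 4 (our toy instance, not from the paper), take F to be all cost functions on |X| = 6 points with values in {0, 1} and exactly k = 2 minima; F is c.u.p., and any fixed evaluation order attains (|X| + 1)/(k + 1) = 7/3 on average.

# Sketch (ours): verify Theorem 4 on a small c.u.p. set, namely all cost functions
# on |X| = 6 points with values in {0, 1} and exactly k = 2 optima (zeros).
from itertools import combinations

n, k = 6, 2
X = range(n)
F = [set(zeros) for zeros in combinations(X, k)]   # a function = its set of optima

def speed(order, optima):
    """First hitting time of an optimum for the fixed evaluation order."""
    return next(m for m, x in enumerate(order, start=1) if x in optima)

order = list(X)   # any order gives the same average on a c.u.p. set
avg = sum(speed(order, f) for f in F) / len(F)
print(avg, (n + 1) / (k + 1))   # both equal 7/3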

We can prove a stronger result.

Theorem 5 Let Z be a random variable with block-uniform distribution p on YX and let F be the basis of p, i.e. F = {f ∈ YX | p(f) ≠ 0}. Then

Δ(Z) = (|X| + 1) ∑[f]⊆F (∑g∈[f] p(g)) / (op(f) + 1)

Proof Each block [f] is c.u.p. and every function in it has the same number of optimal solutions op(f). According to the previous theorems, for any search algorithm A

Δ([f]) = ϕA([f]) = (|X| + 1)/(op(f) + 1).

Thus, the following property holds:

Δ(Z) = ∑[f]⊆F Δ([f]) p([f]) = (|X| + 1) ∑[f]⊆F p([f]) / (op(f) + 1)   (1)

where p([f]) = ∑g∈[f] p(g). □

Note that the equation in Theorem 5 provides the expected performance of random search, consistently with the NFL theorem.
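The formula of Theorem 5 is straightforward to evaluate once the functions are grouped into blocks: two functions lie in the same block exactly when their multisets of cost values coincide. The Python sketch below is ours; the tiny function space and the block-uniform distribution p are illustrative, and optima are taken to be the minimal-cost points.

# Sketch (ours): evaluate Delta(Z) = (|X|+1) * sum over blocks [f] of p([f])/(op(f)+1)
# for a block-uniform distribution p on a tiny set of functions.
from itertools import product
from collections import defaultdict

X = range(3)
Y = (0, 1)
functions = list(product(Y, repeat=len(X)))   # a function f is its tuple of values

def op(f):
    return f.count(min(f))                    # number of optimal (minimal-cost) points

# Group functions into blocks: same sorted value tuple <=> same block.
blocks = defaultdict(list)
for f in functions:
    blocks[tuple(sorted(f))].append(f)

# Block-uniform p: give each block equal total weight, spread uniformly inside it.
p = {f: (1.0 / len(blocks)) / len(members)
     for members in blocks.values() for f in members}

delta = (len(X) + 1) * sum(
    sum(p[g] for g in members) / (op(members[0]) + 1)
    for members in blocks.values()
)
print(delta)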

3.2 The general case

We prove that, given w ∈ N and a set of functions F, determining whether Δ(F) is smaller than w is an NP-complete problem. The proof uses a well-known computational problem, called Binary Decision Tree (BDT), which we briefly introduce below.

Let O = {o1, . . . , on} be a finite set of objects and let T = {t1, . . . , tk} be a finite set of tests. For each test ti and each object oj, we either have ti(oj) = 1 or ti(oj) = 0; also, for every pair of distinct objects o, o′ ∈ O there exists t ∈ T such that t(o) ≠ t(o′). The BDT problem consists in constructing an identification procedure for the objects in O

such that the expected number of tests required to identify an element is smaller than a given threshold w. Such a procedure can be represented by a binary decision tree having the root and internal nodes labeled with tests in T and the leaf nodes labeled


with objects in O. To identify an unknown object o, one first applies the test labeling the root to o; if the outcome of that test is 0, the left branch is traversed, otherwise the right one. This procedure is repeated at the root of each subtree until one reaches a terminal node that identifies the object.

Let p(o) be the length of the path from the root of the tree to the terminal node that identifies o, i.e. the number of tests required to identify o, and let s(t) = ∑o∈O p(o). Then, the cost of a tree t is the average external path length:

c(t) = s(t)/|O| = (1/|O|) ∑o∈O p(o)

Problem 1 (Binary decision tree) Given (O, T) defined as above and w ∈ N, determine whether a decision tree for (O, T) with cost smaller than w exists.
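To fix ideas, a binary decision tree and its cost c(t) can be represented very compactly. The sketch below is ours, with an invented three-object instance; the tuple encoding of trees is not part of the BDT formulation above.

# Sketch (ours): a binary decision tree as nested tuples and its cost c(t).
# An internal node is (test_index, left_subtree, right_subtree); a leaf is an object.

def path_lengths(tree, depth=0, out=None):
    """Map each object (leaf) to the number of tests applied before identifying it."""
    if out is None:
        out = {}
    if isinstance(tree, tuple):
        _, left, right = tree
        path_lengths(left, depth + 1, out)
        path_lengths(right, depth + 1, out)
    else:
        out[tree] = depth
    return out

def cost(tree, objects):
    p = path_lengths(tree)
    return sum(p[o] for o in objects) / len(objects)

# Invented instance: three objects, two tests; t0 separates "a" from the rest,
# t1 separates "b" from "c".
objects = ["a", "b", "c"]
tree = (0, (1, "b", "c"), "a")     # test t0: outcome 0 -> apply t1, outcome 1 -> "a"
print(cost(tree, objects))          # (1 + 2 + 2) / 3 = 5/3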

BDT is known to be NP-complete (Hyafil and Rivest 1976). A test t is said to be identifying if there is a single o ∈ O such that t(o) = 1. The following property holds:

Lemma 1 Given a set of objects O, a set of tests T, a decision tree A for (O, T) and a set of identifying tests I, there exists a decision tree A′ for (O, T) with c(A′) ≤ c(A) such that all tests in T \ I have all their ancestors in T \ I and the number of test occurrences of I in A and A′ is the same.

Proof Suppose that A has a subtree γ like the following one:

where t̂ is an identifying test in I and α, β are subtrees. Note that α and β cannot contain leaves labeled o, as both subtrees are reached only if test t̂ yields 0. We can suppose t(o) = 0 without loss of generality. Now, consider the following tree δ

Subtrees α and β are both still reached only if test t̂ yields 0. The path length p(o) is increased by 1, and the path lengths of leaves in subtree β are decreased by 1. There is at least one leaf in β, so c(A′) ≤ c(A). Repeatedly restructuring the tree in this manner yields a tree in which nodes labeled with identifying tests have only leaves as children. □


We define the following problem:

Problem 2 (Search algorithm) Given a set of functions F ⊆ YX and w ∈ N, determine whether there exists a search algorithm A such that ϕA(F) ≤ w.

Then, the following theorem holds:

Theorem 6 The Search Algorithm (SA) problem is NP-complete.

Proof We assume F to be explicitly specified in the problem instance, i.e. the tape of the Turing machine encodes a set of pairs (x, f(x)) for each x ∈ X and f ∈ F. The size of the problem instance is polynomial with respect to |X| and |F|.

We need to prove that SA is in NP. Consider the verifier-based definition of NP, according to which a problem is in NP if and only if there exists a verifier for the problem that executes in polynomial time. We can check whether an algorithm has optimization speed over F at most w by running the algorithm over each function in F and computing the optimization speed. Each run will take O(|X|) time, as a (non-repeating) search algorithm evaluates the function at each point at most once. Thus, the verification of a solution takes time O(|X||F|) and therefore SA is in NP.

To prove that SA is NP-complete, we show that BDT can be reduced to it. For every pair (O, T) let X = {x1, . . . , xn+k}, Y = {0, 1, 2} and F = {f1, . . . , fn} such that

fi(xj) :=  tj(oi)   for 1 ≤ j ≤ k
           2        for j = k + i
           0        otherwise

We prove the following:

BDT(O, T, w) = yes ⇐⇒ SA(F, X, Y, w) = yes

⇒ Since no pair of functions f, f′ ∈ F has the same optimum, knowing the optimal solution of a function is equivalent to identifying a function among F. Therefore, given a binary decision tree for (O, T) we can build a search algorithm for F

by replacing every label tj with xj and every label oi with xk+i. The optimization speed of such a search algorithm over F would be equal to the cost of the decision tree.

⇐ Given a search algorithm for (F, X, Y), we can build a binary decision tree for (O, T) with a smaller or equal cost. We begin by replacing the labels 2 with 1 and the labels xj with tj for 1 ≤ j ≤ k. Then, we replace each label xk+i with oi if it labels a leaf and with ri if it labels an internal node. Now we get a binary decision tree A for the structure (O, T ∪ R), where R = {r1, . . . , rn} and a test ri(oj) = 1 iff i = j. Therefore, all tests in R are identifying.


Using the lemma above with I := R, the tree A can be transformed into a tree A′ such that all tests in T have all their ancestors in T. Thus, every ending of A′ will be of the form

with one of the nodes oep, oep+1 possibly missing. If this is the case, we can just remove rep and attach its only child to rep−1. Otherwise, consider oep and oep+1. By hypothesis, T contains a test t̄ such that t̄(oep) ≠ t̄(oep+1), and clearly t̄ cannot be an ancestor of rep, otherwise oep and oep+1 would have been placed in different subtrees of t̄. Thus we can replace every occurrence of rep with t̄, obtaining a tree A′′.

Using the previous lemma again and iterating this step, we can replace every occurrence of a test in R with an occurrence of a test in T, obtaining a binary decision tree for (O, T) with a cost equal to or smaller than w.

To end the proof, we note that the reduction can be performed in polynomial time: essentially, we only need to copy the encoding of T and add some trivial extra values to each function.

Finally, note that

∃A : ϕA(F ) ≤ w ⇐⇒ Δ(F) ≤ w

hence determining whether Δ(F) ≤ w is equivalent to the Search Algorithm problem. □
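As an illustration of the reduction used in the proof, the following sketch (ours) builds the set F from a BDT instance given as a 0/1 matrix of test outcomes; object oi is turned into a function fi whose unique optimum is the extra point xk+i. The matrix encoding and the name reduce_bdt are our choices.

# Sketch (ours): build the functions f_i of the reduction from a BDT instance.
# tests[j][i] is the outcome t_{j+1}(o_{i+1}) in {0, 1}: k tests, n objects.

def reduce_bdt(tests):
    k = len(tests)        # number of tests
    n = len(tests[0])     # number of objects
    F = []
    for i in range(n):    # one function per object
        values = [tests[j][i] for j in range(k)]          # f_i(x_j) = t_j(o_i), j = 1..k
        values += [2 if j == i else 0 for j in range(n)]  # f_i(x_{k+i}) = 2, 0 elsewhere
        F.append(values)  # values[j-1] encodes f_i(x_j)
    return F

# Example: three objects distinguished by two tests.
tests = [[1, 0, 0],
         [0, 1, 0]]
for values in reduce_bdt(tests):
    print(values)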

4 Automatically generated search algorithms

4.1 Motivation

We wish to show that the difference between Δ(F) and the optimization speed achieved by popular search algorithms can be remarkably large. To avoid the computation of Δ(F), which would be NP-hard, one can build a search algorithm S that, unlike regular search algorithms, takes advantage of the knowledge of the functions in F. If the claim above holds for ϕS(F), it will also hold for Δ(F), since Δ(F) ≤ ϕS(F). To build such a search algorithm, we introduce the algorithm called RR (Algorithm 1). It is important to remark that this is not a feasible approach for real applications, as building the tree representation of a search algorithm alone requires time (and space) Θ(|X||F|), which is a higher cost than sampling exhaustively every solution in X for every function in F. Rather, RR is a tool in a proof of concept.


Input: A set of functions F and the set Z ⊆ X of solutions not examined yet.
Output: A search algorithm for F.
begin
    if every function in F has an optimal solution in X \ Z then
        Create a unary tree t with |Z| nodes;
        Label the nodes of t with the solutions in Z;
        Label every arc of t with ∗;
        return t;
    else
        Create a root node for the tree;
        if all the functions in F have an optimal solution x ∈ Z in common then
            Label root with x;
            Add a new tree branch labeled ∗ below root;
            Add RR(F, Z \ {x}) below the new branch;
        else
            x̂ = Solution_Selection(F, Z);
            Label root with x̂;
            foreach a ∈ {f(x̂) | f ∈ F} do
                Add a new tree branch labeled a below root;
                Fa = {f ∈ F | f(x̂) = a};
                Add RR(Fa, Z \ {x̂}) below the new branch;
            end
        end
        return root;
    end
end

Algorithm 1: The RR algorithm

4.2 The RR algorithm

RR is a greedy recursive algorithm that builds a compact tree representation of a search algorithm in a top-down fashion. The algorithm has two parameters: a set of functions F and the set Z ⊆ X of the solutions not yet examined. As a preprocessing step, F is examined and a table containing the optimal solutions of each function is created. Then, RR is called on input (F, X). At the beginning, the algorithm determines whether every function in F has an optimal solution not in Z. If the answer is positive, since the solutions visited by the search algorithm after an optimal solution do not figure in the calculation of ϕ, RR creates a unary tree t with |Z| nodes. The nodes of this tree are labeled with the solutions in Z and every arc is labeled with ∗, meaning any value.

Then, RR checks whether all the functions in F have an optimal solution x in common. If the answer is positive, the algorithm creates a root node labeled with x and assigns the tree returned by RR(F, Z \ {x}) as its subtree. Otherwise, the algorithm calls a solution selection routine, which returns a solution x̂ ∈ Z. Let Fa be the set of all functions in F having value a at x̂. Then, for every a, the RR algorithm is executed on input (Fa, Z \ {x̂}) and the result is attached to a branch of the root node labeled a.


The task of the solution selection routine is to select the most appropriate solution among those that have not been examined yet. In order to keep the tree balanced, we select the point producing the most even split, i.e. the point such that the variance of the sizes of the sets in the family {Fa} is as small as possible. Other splitting criteria, e.g. minimizing the average size or the average entropy (logarithmic size) of the subsets, may yield better results. The routine is shown in Algorithm 2.

Input: A set of functions F and the set Z ⊆ X of solutions not examined yet.
Output: A solution in Z.
begin
    foreach x ∈ Z do
        μx = 0;
        Vx = 0;
        foreach a ∈ {f(x) | f ∈ F} do
            Sa = |{f ∈ F | f(x) = a}|;
            μx += Sa;
            Vx += (Sa)^2;
        end
        Vx = Vx/n − (μx/n)^2;    /* Vx = variance of the Sa */
    end
    x̂ = arg min x∈Z (Vx);
    return x̂;
end

Algorithm 2: The solution selection algorithm
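For readers who prefer an executable form, here is a compact Python sketch of RR together with the variance-based splitting rule. The code is ours: optima are taken to be minimal-cost points, functions are represented as dictionaries mapping solutions to costs, and the variance is computed over the sizes of the sets Fa (our reading of the n appearing in Algorithm 2). The trees are returned as nested dictionaries rather than drawn.

# Sketch (ours) of Algorithm 1 (RR) and Algorithm 2 (solution selection).
# A function is a dict x -> cost; optima are taken to be minimal-cost points.

def optima(f):
    best = min(f.values())
    return {x for x, y in f.items() if y == best}

def solution_selection(F, Z):
    """Pick the unexamined point whose split {F_a} has the most even sizes."""
    def variance_of_split(x):
        sizes = {}
        for f in F:
            sizes[f[x]] = sizes.get(f[x], 0) + 1
        s = list(sizes.values())
        mean = sum(s) / len(s)
        return sum((v - mean) ** 2 for v in s) / len(s)
    return min(Z, key=variance_of_split)

def RR(F, Z):
    """Build a (compact) decision-tree search algorithm for the functions in F."""
    if all(opt - Z for opt in map(optima, F)):
        # Every function already has an optimum outside Z: the remaining points
        # can be listed in any order (arcs implicitly labeled '*').
        return {"chain": sorted(Z)}
    common = set.intersection(*map(optima, F)) & Z
    if common:
        x = min(common)
        return {"x": x, "branches": {"*": RR(F, Z - {x})}}
    x = solution_selection(F, Z)
    branches = {}
    for a in {f[x] for f in F}:
        Fa = [f for f in F if f[x] == a]
        branches[a] = RR(Fa, Z - {x})
    return {"x": x, "branches": branches}

# Example usage on the four functions of Example 1 (keys 1, 2, 3 stand for x1, x2, x3).
functions = [{1: 0, 2: 1, 3: 2}, {1: 3, 2: 1, 3: 1},
             {1: 2, 2: 1, 3: 2}, {1: 0, 2: 2, 3: 3}]
print(RR(functions, set(functions[0])))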

4.3 Experimental setting and results

In line with our motivations, in designing our experiments we wanted to use functions that pose a real challenge for traditional search algorithms, but without resorting to ad-hoc or completely artificial problems. We chose two well-known families of test problems: NK fitness landscapes with adjacent neighborhood (Altenberg 1997) and trap functions (Deb and Goldberg 1993).

The sets of functions used in our experiments are partitioned into two groups, each composed of four sets. The first group contains sets of trap functions. Trap functions are defined over X = {0, 1}N, and their value is determined by the distance to a given optimal solution x̂. They depend on the values of two constants: B (the width of the attractive basin for each optimum) and R (their relative importance), so that the cost of a solution is

f(x) = (B − d(x))/B           if d(x) ≤ B
f(x) = R (d(x) − B)/(N − B)   otherwise

where d(x) is the Euclidean distance between x and x̂. The four sets used for our experiments are composed of 100, 250, 500 and 1000 trap functions, respectively.


The values of B, R and the optimal solution have been chosen at random with uniformly distributed probability over their domains (i.e., the range (0, 1) for B and R). The number of bits N is 8.
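A direct transcription of this construction in Python (ours): solutions are encoded as bit tuples, and B, R and x̂ are drawn as described above; the helper names are ours.

# Sketch (ours): a random trap function over binary strings of length N = 8,
# following the definition above (d is the Euclidean distance to the optimum x̂).
import math
import random

N = 8

def random_trap(rng=random):
    B = rng.random()                      # basin width, drawn from (0, 1)
    R = rng.random()                      # relative importance, drawn from (0, 1)
    x_hat = tuple(rng.randint(0, 1) for _ in range(N))   # optimal solution

    def d(x):
        return math.sqrt(sum(a != b for a, b in zip(x, x_hat)))

    def f(x):
        dist = d(x)
        if dist <= B:
            return (B - dist) / B
        return R * (dist - B) / (N - B)

    return f, x_hat

f, x_hat = random_trap(random.Random(0))
print(f(x_hat), f(tuple(1 - b for b in x_hat)))   # cost at the optimum and far from it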

The second group of functions contains four sets of NK-landscape functions. NK-landscape functions are defined by two constants (N and K) and one function φ: [0, 1]K+1 → [0, 1], according to the formula

f(x) = ∑i=1,...,N φ(xi−⌊K/2⌋, . . . , xi+⌊K/2⌋)

Periodic boundary conditions are used, meaning xj = xj±N. The sets of functions we have used are composed of 100, 250, 500 and 1000 NK-landscapes, respectively, whose parameters have been generated uniformly at random: K was chosen among the values 0, . . . , N − 1 and φ among all functions [0, 1]K+1 → [0, 1]. For all functions, the search space is composed of binary strings of 8 bits (i.e. N = 8).
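A minimal Python sketch (ours) of such a randomly generated NK-landscape with adjacent neighborhood and periodic boundary. Storing φ as a table over the 2^(K+1) possible bit patterns is our implementation choice, and for odd K we read the window xi−⌊K/2⌋, . . . , xi+⌊K/2⌋ as the K + 1 consecutive positions starting at i − ⌊K/2⌋.

# Sketch (ours): a random NK landscape with adjacent neighbourhood on N = 8 bits.
# phi is stored as a random table over the 2^(K+1) possible neighbourhood patterns.
import random
from itertools import product

N = 8

def random_nk(rng=random):
    K = rng.randrange(N)                                   # K in {0, ..., N-1}
    phi = {p: rng.random() for p in product((0, 1), repeat=K + 1)}

    def f(x):                                              # x is a tuple of N bits
        total = 0.0
        for i in range(N):
            # window of K+1 consecutive bits around position i,
            # with periodic boundary conditions (indices taken mod N)
            window = tuple(x[(i - K // 2 + j) % N] for j in range(K + 1))
            total += phi[window]
        return total

    return f

f = random_nk(random.Random(1))
print(f((0,) * N), f((1,) * N))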

The search algorithms produced by RR are compared with Genetic Algorithms (GAs) and Simulated Annealing (SA) (see Goldberg 1989; Holland 1975; Aarts and Korst 1989, respectively). We also computed the expected optimization speed of Random Search (RS) to be used as a baseline for the comparison. Since both GAs and SA are repeating search algorithms, in the calculation of ϕ the cost function evaluations are counted ignoring repetitions. Moreover, since both GAs and SA are non-deterministic algorithms, 100 independent runs have been performed. Runs in which the algorithm failed to find the optimum were counted as 256, the cost of an exhaustive search.

Both GAs and SA have been tested using many different parameter values. Below, we report the parameter setting that allowed us to obtain the best results on the presented test functions. For GAs the following set of parameters is used:

– population size of 100 potential solutions;
– uniform crossover with rate 0.9;
– standard mutation with rate 0.01;
– tournament selection with size 2;
– elitism;
– number of generations 200.

For the SA the following set of parameters is used:

– initial temperature 0.95;
– temperature decrement factor 0.95;
– final temperature 10⁻⁹;
– iterations per cycle 50.
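For reference, the two settings can be collected into plain configuration dictionaries (the key names are ours):

# The GA and SA settings listed above, as plain Python dictionaries (key names ours).
GA_PARAMS = {
    "population_size": 100,
    "crossover": "uniform",
    "crossover_rate": 0.9,
    "mutation": "standard",
    "mutation_rate": 0.01,
    "selection": "tournament",
    "tournament_size": 2,
    "elitism": True,
    "generations": 200,
}

SA_PARAMS = {
    "initial_temperature": 0.95,
    "temperature_decrement_factor": 0.95,
    "final_temperature": 1e-9,
    "iterations_per_cycle": 50,
}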

Test results are reported in Table 1. The first column names the sets of functions F on which the experiments were done (e.g., “Trap p” names a set of p randomly chosen trap functions). The next columns report the scores obtained by the search algorithms. For GA and SA, we report the median value of 100 runs and, in parentheses, the number of runs that succeeded in finding the global optimum. Given the heavy tail of the GA and SA performance distributions, the median is a better indication of typical performance than the mean. Finally, the last column reports the expected optimization speed of Random Search.


Table 1 The number of evaluations of the cost function needed to obtain an optimal solution for the algorithms produced by RR, Genetic Algorithm and Simulated Annealing. For Genetic Algorithm and Simulated Annealing, the number of successful runs is reported in parentheses. The last column gives the expected number of evaluations for Random Search

F           RR (mean)   GA (median)   SA (median)   RS (mean)
Trap 1000   3.03        124 (82)      148 (76)      128.5
Trap 500    2.98        126 (82)      144 (77)      128.5
Trap 250    2.96        125 (83)      141 (78)      128.5
Trap 100    2.76        125 (85)      150 (75)      128.5
NK 1000     5.66        87 (96)       131 (81)      84.8
NK 500      4.55        92 (98)       137 (80)      87.4
NK 250      4.2         91 (97)       129 (82)      86.9
NK 100      3.83        93 (98)       132 (81)      82.7

The optimization speed achieved by RR is remarkably small. With values between 2 and 6, RR scored less than one-tenth the value of random search. In contrast, the median performance of both GA and SA is comparable with that of RS. The mean performance of GA and SA, not reported, is worse than that of RS due to the low convergence rate achieved on some of the test problems.

The results of the experiments prove our thesis: the value of Δ(F) can be far lower than the performance of commonly used search algorithms. However, such performance came at a price: we used perfect knowledge of the problems we were dealing with and an extremely large amount of resources. That makes our approach completely infeasible in practice. Also, the search algorithms we produced are completely tailored to the problem instances they were built with (i.e. a specific set of functions), as they essentially memorize what to do rather than use a general approach like common search algorithms. As a consequence, they have no ability to generalize to other sets of instances of the same problem and are likely to perform worse than random search.

5 Conclusions

“No free lunch” theorems imply that if the optimization practitioner knows nothing as to how the cost function associates solutions with values, then the choice of search algorithm is arbitrary. As a consequence, given a particular optimization problem, it is not straightforward to find a suitable algorithm to solve it in an efficient way. In general, various algorithms are compared in order to find the most efficient one, and optimization speed (i.e. the number of steps needed by an algorithm to find an optimal solution) is often used as an efficiency criterion. Thus, the best optimization speed that can be obtained by an algorithm over a set of functions is an important measure, because it can give an immediate view of “how good” the algorithms we are testing are and how much they can (at least in theory) be improved.


In this paper, we have presented a study of the best optimization speed and some of its properties. We have been able to derive an equation for the best optimization speed for particular sets of functions, i.e. sets of functions closed under permutation, and in general for block-uniform distributions of cost functions. Furthermore, we have proven that determining whether a set of cost functions admits a search algorithm with bounded optimization speed is an NP-complete problem.

However, there may exist an approximation algorithm able to estimate the best optimization speed in polynomial time. A formal method (that we have called RR) to automatically generate a “suitable” optimization algorithm for a given set of cost functions has been presented in this paper. We have used the optimization speed of the algorithms generated by RR as a bound on the best optimization speed, and we have compared it with the optimization speed of genetic algorithms and simulated annealing on some instance sets of well-known optimization benchmarks (these sets were composed of various trap functions and NK landscapes). Even though RR has a high computational complexity, and thus is not of practical use, we believe that it is interesting to point out that the algorithms generated by RR have remarkably outperformed genetic algorithms and simulated annealing over all sets of functions that we have studied. Given that the best optimization speed can be even better than the optimization speed of the algorithms generated by RR, it is interesting to remark how commonly used optimization algorithms (like genetic algorithms or simulated annealing) can offer poor performance compared to the best theoretical one.

References

Aarts E, Korst J (1989) Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing. Wiley, New York

Altenberg L (1997) NK fitness landscapes. In: Back T, et al (eds) Handbook of evolutionary computation. IOP Publishing Ltd and Oxford University Press, Bristol, pp B2.7:5–B2.7:10

Deb K, Goldberg DE (1993) Analyzing deception in trap functions. In: Whitley D (ed) Foundations of genetic algorithms, vol 2. Morgan Kaufmann, San Mateo, pp 93–108

English TM (2004) On the structure of sequential search: beyond “no free lunch”. In: Gottlieb J, Raidl GR (eds) EvoCOP. Lecture notes in computer science, vol 3004. Springer, Berlin, pp 95–103

Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading

Häggström O (2006) Intelligent design and the NFL theorems. Biol Philos 22(2):217–230

Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor

Hyafil L, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inf Process Lett 5(1):15–17

Igel C, Toussaint M (2004) A no-free-lunch theorem for non-uniform distributions of target functions. J Math Model Algorithms 3(4):313–322

Jackson K, Kreinin A, Zhang W (2009) Randomization in the first hitting time problem. Stat Probab Lett 79(23):2422–2428. doi:10.1016/j.spl.2009.08.016

Radcliffe N, Surry PD (1995) Fundamental limitations on search algorithms: evolutionary computing in perspective. Lecture notes in computer science, vol 1000. Springer, Berlin, pp 275–291

Rowe JE, Vose MD, Wright AH (2009) Reinterpreting no free lunch. Evol Comput 17:117–129. doi:10.1162/evco.2009.17.1.117

Schumacher C, Vose MD, Whitley LD (2001) The no free lunch and problem description length. In: Spector L, Goodman ED, Wu A, Langdon WB, Voigt HM, Gen M, Sen S, Dorigo M, Pezeshk S, Garzon MH, Burke E (eds) Proceedings of the genetic and evolutionary computation conference (GECCO-2001). Morgan Kaufmann, San Francisco, pp 565–570. URL citeseer.ist.psu.edu/schumacher01no.html


Valsecchi A, Vanneschi L (2008) A study of some implications of the no free lunch theorem. In: Giacobini M, et al (eds) International workshop on theoretical aspects in artificial evolution, EvoTheory 2008. Proceedings of applications of evolutionary computing, EvoWorkshops 2008. Lecture notes in computer science, vol 4974. Springer, Berlin, pp 633–642

Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82. URL citeseer.ist.psu.edu/wolpert96no.html