
EVALUATION OF RANDOM GRADIENT TECHNIQUES FOR UNCONSTRAINED OPTIMIZATION

F. ARCHETTI (1)

ABSTRACT: The random gradient method is first evaluated analytically in connection with «Adaptive Step Size Random Search» (A.S.S.R.S.). The random weighted gradient technique (R.W.G.) is proposed as an effective algorithm for unconstrained optimization. Its implementation and numerical performance on some test problems are analyzed.

RIASSUNTO: The statistical gradient method is first evaluated analytically by comparing it with «Adaptive Step Size Random Search» (A.S.S.R.S.). The weighted statistical gradient technique (R.W.G.) is proposed as an effective algorithm for unconstrained optimization. The implementation of the algorithm and its numerical behaviour on some test problems are then examined.

1. Introduction.

Random search methods for minimization were first suggested by Brooks [1] and Rastrigin [2], who compared a «fixed step size random search» with a fixed step size gradient technique, showing that in some situations the random search is superior.

A major improvement was realized by Schumer-Steiglitz [3]: studying the influence of the step size on the performance of the search, they analyzed «Optimum Step Size Random Search» (O.S.S.R.S.), a theoretical model of the random search for the objective function $Q = \sum_{i=1}^{n} x_i^2$, and derived an actual algorithm, «Adaptive Step Size Random Search» (A.S.S.R.S.), which approximates the performance of O.S.S.R.S.

They tested A.S.S.R.S. against the Newton-Raphson method and the simplex method of Nelder and Mead for the objective functions $Q = \sum_{i=1}^{n} x_i^2$, $Q = \sum_{i=1}^{n} c_i x_i^2$ and $Q = \sum_{i=1}^{n} c_i x_i^4$, where the $c_i$ are random variables uniformly chosen in [0.1, 1].

Received 7-5-74. (1) Ricercatore C.N.R., Laboratorio di Analisi Numerica, Pavia.


The performance criterion used is the number of function evaluations required to obtain a prefixed accuracy (this number in the sequel will be indicated by IFUN). Schumer and Steiglitz showed that A.S.S.R.S., whose IFUN grows linearly with n, outperforms Newton-Raphson and the simplex method for high dimensionality problems. White-Day [4] improved the A.S.S.R.S. parameters, testing it for the same functions against the Fletcher-Powell method (FLEPOW): for FLEPOW, IFUN is actually a linear function of n with a better slope than A.S.S.R.S. Schrack-Borowsky [5] tested A.S.S.R.S. and other random methods extensively, showing their stability in noisy conditions but remarking their poor performance near the minimum and their ineffectiveness in the presence of ridges and narrow valleys in the objective functions.

Despite these results and the widespread opinion that deterministic methods outperform random search for most problems, the latter seems nevertheless interesting and worth further study for the following reasons: i) small memory storage required; ii) structural simplicity and ease of computer implementation; iii) stability with respect to noise, both inherent and generated (rounding errors). In this paper we introduce a family of random methods related to the work of Ermol'ev [6] and Nikolaev [7], [8], [9], more general than A.S.S.R.S., and test their numerical performance for the same functions of White-Day. These methods retain the advantages listed in i), ii), iii) and show a much better rate of convergence than A.S.S.R.S., comparing favorably with FLEPOW.

2. O.S.S.R.S. and A.S.S.R.S. methods.

We briefly recall some analytical results of Schumer-Steiglitz that will prove useful in later sections. The algorithm for the «fixed step size random search» is

\[ x_{i+1} = x_i + a_i \, \Delta x_{i+1} \]

where $x_i$ is the position at the i-th step and

\[ a_i = \begin{cases} 0 & \text{if } Q_i \ge Q^*_{i-1} \\ 1 & \text{if } Q_i < Q^*_{i-1} \end{cases} \qquad i = 1, 2, \ldots, \qquad Q_i = Q(x_i), \quad Q^*_i = \min_{j \le i} Q_j . \]

$\Delta x_i$ is a random vector, with $|\Delta x_i| = s$, that is distributed uniformly on the hypersphere of radius s with center at the origin.
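As an illustration, the following minimal sketch (our own, not the authors' code; the function names and parameter values are assumptions) implements this fixed step size random search for a generic objective Q, drawing each trial displacement uniformly on the hypersphere of radius s and accepting it only on improvement.

```python
import numpy as np

def fixed_step_random_search(Q, x0, s=0.1, max_trials=10000, seed=0):
    """Fixed step size random search: a trial step of length s, drawn
    uniformly on the hypersphere, is accepted only if it decreases Q."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    q_best = Q(x)
    for _ in range(max_trials):
        d = rng.standard_normal(x.size)
        dx = s * d / np.linalg.norm(d)   # |dx| = s, direction uniform on the sphere
        q_trial = Q(x + dx)
        if q_trial < q_best:             # a_i = 1 only when the trial improves Q
            x, q_best = x + dx, q_trial
    return x, q_best

# Example: Q(x) = sum(x_i^2) in R^10
x_min, q_min = fixed_step_random_search(lambda z: float(np.dot(z, z)),
                                        np.random.randn(10), s=0.05)
```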

Let $Q = \sum_{i=1}^{n} x_i^2$ and $\varphi$ be the angle between the displacement vector and the negative direction of the gradient; $\varphi_{\max}$, depending on s, is the largest value of $\varphi$ giving an improvement of the function Q. The probability density of $\varphi$, in view of the assumed uniformity of the distribution on the hypersphere, is:

\[ p(\varphi) = \frac{\sin^{n-2}\varphi}{2\displaystyle\int_0^{\pi/2} \sin^{n-2}\varphi \, d\varphi}, \qquad \varphi \in [0, \pi]. \]

If the step size is small we may approximate the function Q by a hyperplane, so that the probability of a successful step is nearly 0.5; the improvement, however, is very small. A large step size instead gives a small probability of improvement. Let $\Delta Q = Q_{i+1} - Q_i$ and $\eta = s/\varrho$, where $\varrho = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}$. Schumer-Steiglitz considered the normalized expected improvement

\[ I(n, \eta) = -\,\frac{E[\Delta Q]}{Q} = \frac{\displaystyle\int_0^{\varphi_{\max}} (2\eta\cos\varphi - \eta^2)\,\sin^{n-2}\varphi \, d\varphi}{2\displaystyle\int_0^{\pi/2} \sin^{n-2}\varphi \, d\varphi}. \]

The desire to maximize $I(n, \eta)$ rules the step size choice: setting $dI(n, \eta)/d\eta = 0$, with $\varphi_{\max} = \arccos(\eta/2)$, we get the equation in $\eta$ ($0 \le \eta < 2$)

\[ \eta \int_0^{\arccos(\eta/2)} \sin^{n-2}\varphi \, d\varphi = \frac{(1 - \eta^2/4)^{(n-1)/2}}{n-1}. \]
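This condition is easy to check numerically. The sketch below (ours, not from the paper; it relies on standard SciPy quadrature and root finding) solves the equation for a few dimensions and compares the root with the asymptotic value quoted next.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def step_size_residual(eta, n):
    """Residual of the optimality condition for the normalized step eta."""
    phi_max = np.arccos(eta / 2.0)
    integral, _ = quad(lambda phi: np.sin(phi) ** (n - 2), 0.0, phi_max)
    return eta * integral - (1.0 - eta ** 2 / 4.0) ** ((n - 1) / 2.0) / (n - 1)

for n in (10, 50, 200):
    eta_opt = brentq(step_size_residual, 1e-6, 1.9, args=(n,))
    print(n, round(eta_opt, 4), round(1.225 / np.sqrt(n), 4))  # agree for large n
```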

Appropriate approximations yield, for large n, the asymptotic relations

\[ \eta_{\mathrm{opt}} \simeq \frac{1.225}{\sqrt{n}}, \qquad I_{\max} \simeq \frac{0.406}{n}, \qquad P_{\mathrm{opt}} \simeq 0.27, \]

where $P_{\mathrm{opt}}$ is the probability of a successful step, corresponding to $\eta = \eta_{\mathrm{opt}}$. The calculations leading to such results are carefully developed in the paper by Schumer-Steiglitz. We can easily prove that, if the normalized improvements are independent random variables, the following asymptotic relation holds:

\[ E[\mathrm{IFUN}] \sim K' n \qquad \text{where} \quad K' = \frac{1}{K}\,\log\frac{Q_0}{Q_f}, \quad K = 0.406, \]

$Q_0$ is the initial value of the objective function and $Q_f$ is the final value


assuring the prefixed accuracy. O.S.S.R.S. is a theoretical model: the relation $s_{\mathrm{opt}} = 1.225\,\varrho/\sqrt{n}$ is derived by an asymptotic approximation and holds only for $Q = \sum_{i=1}^{n} x_i^2$.
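As a numerical illustration of these asymptotic relations (our own sketch; the constants are the ones quoted above, and the helper name is ours), the predicted optimal step and expected cost can be tabulated for a few dimensions:

```python
import numpy as np

K = 0.406  # asymptotic value of n * I_max for O.S.S.R.S.

def ossrs_predictions(n, Q0, Qf):
    """Asymptotic O.S.S.R.S. quantities: optimal normalized step, success
    probability, and expected number of function evaluations."""
    eta_opt = 1.225 / np.sqrt(n)
    P_opt = 0.27
    expected_ifun = (n / K) * np.log(Q0 / Qf)
    return eta_opt, P_opt, expected_ifun

for n in (10, 30, 60):
    eta, p, ifun = ossrs_predictions(n, Q0=1.0, Qf=1e-8)
    print(f"n={n:2d}  eta_opt={eta:.3f}  P_opt={p:.2f}  E[IFUN]~{ifun:.0f}")
```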

In the actual algorithm A.S.S.R.S. the evaluation of the step size requires more experimentation [3] and the observed improvement is less than $I_{\max}$.

The importance of this model is that the linear behaviour of IFUN, as well as the relation between the prefixed accuracy and IFUN, holds for A.S.S.R.S. and other objective functions. We may introduce an adaptivity principle in the direction of the search by taking, after an unsuccessful trial, the next step in the opposite direction. For the objective function

$Q = \sum_{i=1}^{n} x_i^2$, the performance of the above random search with reversal has been evaluated analytically in [10], and the improvement may be actually tested. For other objective functions the actual improvement becomes negligible, and our computations seem to suggest that this adaptivity is rather rough. The more complex architecture of random gradient methods will give us the possibility of introducing in the algorithm a suitable adaptivity principle.

3. Developments of random gradient techniques.

We consider methods of the form $x_{k+1} = x_k + \alpha_k \xi_k$, where $\xi_k$ is the direction along which the search at the k-th iteration takes place and $\alpha_k$ is the step size. Also O.S.S.R.S. may be written as one of these methods but, while for O.S.S.R.S. the step size choice is intended to maximize the expected improvement, in this paper we develop random gradient techniques dealing separately with the choice of $\xi_k$ and $\alpha_k$.

4. Choice of the step size.

Given the values $P_0 = Q_k = Q(x_k)$, $P_1 = Q(x_k + \xi_k)$ and $P_2 = Q(x_k + 2\xi_k)$, we use a quadratic fitting

\[ (1) \qquad \Phi_k(\alpha) = a\alpha^2 + b\alpha + c \]

of the objective function along the direction $\xi_k$. The optimal value of the step size $\alpha_k^{\mathrm{opt}}$ is such that

\[ Q(x_k + \alpha_k^{\mathrm{opt}} \xi_k) = \min_{\alpha} Q(x_k + \alpha \xi_k). \]

From (1), for quadratic functions, we obtain:

\[ (2) \qquad \alpha_k^{\mathrm{opt}} = -\frac{b}{2a} = \frac{3P_0 - 4P_1 + P_2}{2(P_0 - 2P_1 + P_2)}. \]
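A minimal sketch of the step size choice by (2) (our illustration; the helper name and the safeguard for a degenerate fit are ours): three evaluations along $\xi_k$, at $\alpha = 0, 1, 2$, determine the fitted parabola and its minimizer.

```python
import numpy as np

def parabolic_step(Q, x, xi):
    """Step size by the quadratic fit (2) through P0 = Q(x), P1 = Q(x + xi),
    P2 = Q(x + 2*xi); returns alpha_opt = -b / (2a)."""
    P0, P1, P2 = Q(x), Q(x + xi), Q(x + 2.0 * xi)
    denom = 2.0 * (P0 - 2.0 * P1 + P2)
    if abs(denom) < 1e-12:          # locally linear fit: fall back to a unit step
        return 1.0
    return (3.0 * P0 - 4.0 * P1 + P2) / denom

# For an exactly quadratic Q the fit is exact: here x + 2*xi = 0, so alpha = 2.
Q = lambda z: float(np.dot(z, z))
x, xi = np.array([1.0, 2.0]), np.array([-0.5, -1.0])
alpha = parabolic_step(Q, x, xi)    # alpha == 2.0
```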


For non-quadratic functions the approximation by (1) is rough and the calculation by (2) leads to a non-optimal step value. Actual computations for the objective function $Q = \sum_{i=1}^{n} c_i x_i^4$ show that the non-optimal step size choice by (2) doesn't affect significantly the performance of the search. For more complex objective functions we could rely upon a bulky literature on line optimization, using e.g. the golden section rule, cubic or repeated quadratic fittings.

However, the application of such line techniques to random search will be the subject of future investigations. For the function $Q = \sum_{i=1}^{n} x_i^2 = (x, x)$ and a uniform random choice of $\xi_k$ we evaluate the performance of the method $x_{k+1} = x_k + \alpha_k^{\mathrm{opt}} \xi_k$.

The improvement at the k-th iteration is

\[ \Delta Q_k = Q(x_{k+1}) - Q(x_k) = \alpha_k^2\,(\xi_k, \xi_k) + 2\alpha_k\,(\xi_k, x_k). \]

Setting $d\Delta Q_k / d\alpha_k = 0$ we get $\alpha_k^{\mathrm{opt}} = -\dfrac{(x_k, \xi_k)}{(\xi_k, \xi_k)}$. The same analysis for $Q = \sum_{i=1}^{n} c_i x_i^2$ would result in $\alpha_k^{\mathrm{opt}} = -\dfrac{(Dx_k, \xi_k)}{(D\xi_k, \xi_k)}$, where D is the diagonal matrix with $d_{jj} = c_j$.
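This exact line minimizer is easy to verify numerically (a sketch of ours, not from the paper; the names and random test data are assumptions):

```python
import numpy as np

def alpha_opt_quadratic(x, xi, c):
    """Exact minimizer of Q(x) = sum(c_i x_i^2) along the direction xi:
    alpha_opt = -(D x, xi) / (D xi, xi) with D = diag(c)."""
    return -np.dot(c * x, xi) / np.dot(c * xi, xi)

rng = np.random.default_rng(0)
n = 5
c = rng.uniform(0.1, 1.0, n)
x, xi = rng.standard_normal(n), rng.standard_normal(n)
Q = lambda z: float(np.dot(c * z, z))

a_star = alpha_opt_quadratic(x, xi, c)
# Small perturbations of a_star can only increase Q along the line:
assert Q(x + a_star * xi) <= Q(x + (a_star + 1e-3) * xi)
assert Q(x + a_star * xi) <= Q(x + (a_star - 1e-3) * xi)
```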

Let $\varphi$ be the angle between $\xi_k$ and the negative direction of the gradient; if $Q = \sum_{i=1}^{n} x_i^2$, we get $Q(x_{k+1}) = Q(x_k)\sin^2\varphi$, because $x_{k+1}$, with the optimal choice of $\alpha_k$, lies on the n-dimensional hypersphere through $x_k$ and the origin. The normalized improvement is

\[ I_k = -\frac{\Delta Q_k}{Q_k} = \frac{Q(x_k) - Q(x_{k+1})}{Q(x_k)} = \cos^2\varphi. \]

For the assumed uniformity of the distribution of $\varphi$, the expected normalized improvement is

\[ \bar I_k = E[I_k] = E[\cos^2\varphi] = \frac{\displaystyle\int_0^{\pi/2} \cos^2\varphi\,\sin^{n-2}\varphi\,d\varphi}{\displaystyle\int_0^{\pi/2} \sin^{n-2}\varphi\,d\varphi} = \frac{1}{n}. \]
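This expectation can be confirmed by simulation (an illustrative check of ours): for directions drawn uniformly on the unit hypersphere of $R^n$, the mean of $\cos^2\varphi$ with respect to any fixed unit vector is close to 1/n.

```python
import numpy as np

def mean_cos2(n, trials=100_000, seed=0):
    """Monte Carlo estimate of E[cos^2(phi)] between a uniform random direction
    on the unit sphere in R^n and the fixed unit vector e_1."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((trials, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return float(np.mean(v[:, 0] ** 2))       # cos(phi) = <v, e_1>

for n in (5, 20, 50):
    print(n, round(mean_cos2(n), 4), round(1.0 / n, 4))   # the estimates match 1/n
```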

We may estimate the number of iterations required to satisfy a stopping condition $Q(x_k) \le Q_f$, starting with $Q(x_0) = Q_0$. $Q_k$ may be expressed recursively as

\[ (3) \qquad Q_k = Q_{k-1}(1 - I_{k-1}) = Q_0 \prod_{j=0}^{k-1} (1 - I_j). \]


By taking the expected value of both sides, under the reasonable assumption that the normalized improvements are independent random variables, we get $E(Q_k) = Q_0\left(1 - \frac{1}{n}\right)^k$. Setting $E(Q_k) = Q_f$, the stopping condition yields an estimate of the required number of steps, $k = \log(Q_f/Q_0)/\log(1 - 1/n)$, and the asymptotic expression for large n becomes $k \sim n \log\frac{Q_0}{Q_f}$. For O.S.S.R.S. we have $E[\mathrm{IFUN}] \sim \frac{n}{K}\log\frac{Q_0}{Q_f}$, where $K = 0.406$.

As each step in the above method requires the two function evaluations $Q(x_k + \xi_k)$ and $Q(x_k + 2\xi_k)$, we get $E[\mathrm{IFUN}] \sim 2n\log\frac{Q_0}{Q_f}$, a slightly better value than the $\frac{n}{0.406}\log\frac{Q_0}{Q_f} \simeq 2.46\,n\log\frac{Q_0}{Q_f}$ of O.S.S.R.S. Moreover, it is important to remark that while O.S.S.R.S. is only a theoretical model, the performance evaluated analytically for the method of this section has been actually tested. We now try to improve this performance by analyzing a random technique for the choice of the direction.

5. Direction choice.

Let $\psi^j$, $j = 1, \ldots, m$, be a system of independent n-dimensional normal random vectors with zero mathematical expectation and unit covariance matrix, and $\eta^j$, $j = 1, \ldots, m$, an orthonormalized basis obtained from the $\psi^j$.

At the point $x_k$ we define the random m-gradient $G^m(x_k)$

\[ G^m(x_k) = \sum_{j=1}^{m} \eta^j \lim_{s \to 0} \frac{Q(x_k + s\eta^j) - Q(x_k)}{s}. \]
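A sketch of this estimate in finite precision (ours, not the paper's code; a small finite-difference increment replaces the limit, and a QR factorization provides the orthonormalization of the Gaussian vectors):

```python
import numpy as np

def random_m_gradient(Q, x, m, s=1e-6, rng=None):
    """Estimate G^m(x) = sum_j eta^j * dQ/d(eta^j) from m orthonormalized
    Gaussian directions, using forward differences with increment s."""
    rng = rng or np.random.default_rng()
    psi = rng.standard_normal((x.size, m))   # independent normal vectors psi^j
    eta, _ = np.linalg.qr(psi)               # orthonormal columns eta^1 ... eta^m
    q0 = Q(x)
    g = np.zeros(x.size)
    for j in range(m):
        g += eta[:, j] * (Q(x + s * eta[:, j]) - q0) / s
    return g                                 # the search direction is xi = -g
```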

Let $\xi_k = -G^m(x_k)$. For quadratic objective functions $Q = (Ax, x)$, where A is positive definite, Nikolaev proved in [8] that $\varphi$ has the probability density function

\[ p(\varphi) = 2\,B^{-1}\!\left(\frac{m}{2}, \frac{n-m}{2}\right) (\sin\varphi)^{n-m-1} (\cos\varphi)^{m-1} \]

and $\cos^2\varphi$ follows a $\beta$-distribution with density:

\[ p\!\left(x; \frac{m}{2}, \frac{n-m}{2}\right) = B^{-1}\!\left(\frac{m}{2}, \frac{n-m}{2}\right) x^{\frac{m}{2}-1} (1-x)^{\frac{n-m}{2}-1}. \]


As we are interested in evaluating the performance of the random $G^m$ method, from the expression $E[\cos^2\varphi] = m/n$ (the mean of a $\beta(m/2, (n-m)/2)$ variable) we get a formula analogous to (3):

\[ E[Q(x_{k+1})] \le Q(x_k)\left(1 - \frac{m}{n\,\chi^*}\right), \]

where $\chi^*$ is the spectral condition number of the matrix A. For large values of n, the stopping condition $Q(x_k) \le Q_f$ yields the asymptotic expression of the required number of steps

\[ k \sim \frac{n\,\chi^*}{m} \log\frac{Q_0}{Q_f}. \]

As each step with the random $G^m$ method requires $m + 3$ function evaluations, we get the asymptotic estimate

\[ E[\mathrm{IFUN}]_{G^m} \sim \frac{m+3}{m}\, n\,\chi^* \log\frac{Q_0}{Q_f}. \]
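Taking this estimate at face value (an illustrative calculation of ours; the values of n, the condition number and the accuracy are assumptions), the cost factor (m + 3)/m drops quickly as m grows, which is consistent with the remark below that the $G^m$ method improves on $G^1$:

```python
import numpy as np

def gm_ifun_estimate(n, m, chi, Q0, Qf):
    """Asymptotic E[IFUN] of the random m-gradient method (estimate above)."""
    return (m + 3) / m * n * chi * np.log(Q0 / Qf)

# Cost factor (m + 3)/m and the full estimate for n = 30, chi = 1, Q0/Qf = 1e8
for m in (1, 2, 3, 5, 7):
    print(m, (m + 3) / m, round(gm_ifun_estimate(30, m, 1.0, 1.0, 1e-8)))
```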

The $G^m$ method, for $m \ge 3$, performs better than $G^1$, and this improvement has been actually tested. From the $G^m$ method, $m \ge 2$, we may derive the Random Weighted Gradient method (R.W.G.): the larger the improvement $\Delta Q_k^j = Q(x_k + s\eta^j) - Q(x_k)$, the greater the effect the j-th trial should have on the direction $\xi_k$. To enhance this effect we use the following formula

\[ (4) \qquad \xi_k = -\sum_{j=1}^{m} \eta^j \,\mathrm{sgn}(\Delta Q_k^j) \lim_{s \to 0} \left|\frac{\Delta Q_k^j}{s}\right|^p \]

where p is a positive exponent. A formula related to (4) was first suggested in [11]. The gradient approximation given by (4) proved very effective. We remark that in the implementation of the R.W.G. algorithm no attempt is made to estimate $\xi_k$ accurately far from the minimum. Initially s takes a nominal value (s = 1) and is reduced during the minimization procedure. The structure and the numerical performance of the R.W.G. algorithm will be analyzed in detail in the next section.

6. The R.W.G. Algorithm and its numerical performance.

The algorithm is set up in the following way:

1) Generate $\psi^j$, $j = 1, \ldots, m$, a system of n-dimensional random vectors uniformly distributed on the unit hypersphere with center at the origin.

2) Evaluate the search direction $\xi_k$ by the following formula,

\[ (5) \qquad \xi_k = -\,\frac{\displaystyle\sum_{j=1}^{m} \psi^j\,\mathrm{sgn}(\Delta Q_k^j)\,|\Delta Q_k^j|^p}{\displaystyle\sum_{j=1}^{m} |\Delta Q_k^j|^p}, \]

considering both successful and unsuccessful trials; here $\Delta Q_k^j = Q(x_k + s\psi^j) - Q(x_k)$.

3) By parabolic interpolation determine $\alpha_k$ along the direction $\xi_k$, using formula (2).

4) In determining $\xi_{k+1}$: if no trial happened to be a success among the m performed at $x_k$, reduce s by a factor $\delta$ (a sketch of the whole procedure is given after this list).
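The following is a minimal sketch of steps 1)-4) (our own reading of the procedure, not the original implementation; the parameter defaults follow the values discussed below, and the stopping test and acceptance rule are assumptions):

```python
import numpy as np

def rwg_minimize(Q, x0, m=5, p=2, s=1.0, delta=0.2, q_stop=1e-8,
                 max_iter=5000, rng=None):
    """Random Weighted Gradient (R.W.G.) sketch: weighted direction (5),
    parabolic step (2), and reduction of the trial length s by the factor
    delta whenever none of the m trials at x_k improves Q."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        qk = Q(x)
        if qk <= q_stop:
            break
        # 1) m directions uniformly distributed on the unit hypersphere
        psi = rng.standard_normal((m, x.size))
        psi /= np.linalg.norm(psi, axis=1, keepdims=True)
        # 2) weighted direction (5), using successful and unsuccessful trials
        dq = np.array([Q(x + s * psi[j]) - qk for j in range(m)])
        wsum = np.sum(np.abs(dq) ** p)
        xi = -psi.T @ (np.sign(dq) * np.abs(dq) ** p) / (wsum if wsum > 0 else 1.0)
        # 3) parabolic step (2) along xi
        P0, P1, P2 = qk, Q(x + xi), Q(x + 2.0 * xi)
        denom = 2.0 * (P0 - 2.0 * P1 + P2)
        alpha = (3.0 * P0 - 4.0 * P1 + P2) / denom if abs(denom) > 1e-12 else 1.0
        if Q(x + alpha * xi) < qk:      # accept the step only on improvement
            x = x + alpha * xi
        # 4) if none of the m trials was a success, reduce s
        if np.all(dq >= 0.0):
            s *= delta
    return x, Q(x)

# Example: a diagonal quadratic in R^20, starting on the unit hypersphere
c = np.random.default_rng(1).uniform(0.1, 1.0, 20)
x0 = np.random.default_rng(2).standard_normal(20)
x0 /= np.linalg.norm(x0)
x_star, q_star = rwg_minimize(lambda z: float(np.dot(c * z, z)), x0)
```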

The exponent p has a marked effect on the performance of the search: p = 2 proved to be the best choice. The parameters m and $\delta$ were evaluated experimentally. We set $\delta = 0.2$ and m = 3, 5, 7 respectively for n = 10, n = (20, 30) and n = (40, 50, 60). By allowing both successful and unsuccessful trials to be considered in (5) we introduce an adaptivity principle in the direction choice, thereby improving the performance of the search. Computations were carried out on the UNIVAC 1106 and CII 10020 computers of the computing center of the «Università di Milano». Each function has been minimized for n = 10, 20, 30, 40, 50, 60. For each dimension tested, 5 independent trials were run. The starting point for each of these trials was determined randomly on the unit hypersphere with center at the origin. The stopping condition was set by $Q_f \le 10^{-8}$ (for the objective function $Q = \sum_{i=1}^{n} c_i x_i^4$, $Q_f \le 10^{-8} Q_0$). Analogous results have been obtained by imposing the stopping condition on the step size. A.S.S.R.S. has been implemented through extensive experimentation, and better results than those reported by White-Day were obtained. The average number of required function evaluations is represented by IFUN. For the dimensions we tested, the obtained values are well fitted by the straight lines shown in the following diagrams, whose slopes are reported in Table 1.

TABLE 1. Slopes of the straight lines fitting IFUN versus n.

A.S.S.R.S.   55.2   76.1   40.1
R.W.G.       32.2   44.5   26.6

[Figures: IFUN versus n for each test function; the fitted straight lines have the slopes reported in Table 1.]

Fig. 4 shows the IFUN behaviour for the objective function $Q = \sum_{i=1}^{n} c_i x_i^2$ and the stopping conditions $Q_f = 10^{-h}$, $h = 1, 2, \ldots, 12$. For both methods tested, IFUN behaves linearly with h, as derived analytically in earlier sections. The advantage of R.W.G. over A.S.S.R.S. grows as the required precision increases.

Comparisons with other algorithms are generally difficult to establish. In the case of quadratic functions, the results obtained by R.W.G. are only slightly worse than those reported for FLEPOW by White-Day. The experiments we performed for $Q = \sum_{i=1}^{n} c_i x_i^4$ suggest that, at least in some situations, R.W.G. can compete with deterministic minimization algorithms also for non-quadratic functions.

7. Concluding remarks.

More research is needed in order to develop random methods suited for more complex objective functions. The R.W.G. algorithm retains the positive features of other random search methods (e.g. small memory storage and ease of computer implementation) and shows a much better rate of convergence. For these reasons R.W.G. may well appear an attractive technique for high dimensionality optimization problems.


REFERENCES

[1] S. H. BROOKS, Discussion of random methods for locating surface maxima, Operations Research 6 (1958), 244-251.

[2] L. A. RASTRIGIN, The convergence of the random search method in the extremal control of a many parameter system, Automation and Remote Control 24 (1963), 1337-1342.

[3] M. A. SCHUMER, K. STEIGLITZ, Adaptive Step Size Random Search, I.E.E.E. Transactions on Automatic Control, vol. AC-13, no. 3 (1968), 270-276.

[4] L. J. WHITE, R. G. DAY, An evaluation of Adaptive Step Size Random Search, I.E.E.E. Transactions on Automatic Control, vol. AC-16, no. 5 (1971), 475-478.

[5] G. SCHRACK, N. BOROWSKY, An experimental comparison of three random searches, in Numerical Methods for Non-linear Optimization (F. Lootsma ed.), Academic Press, London (1972).

[6] YU. M. ERMOL'EV, On the method of generalized stochastic gradients and stochastic quasi-Fejér sequences, Cybernetics, vol. 5, no. 2 (1969), 208-220.

[7] E. G. NIKOLAEV, The random m-gradient method, Automatic Control (1) 8 (1969), 26-29.

[8] E. G. NIKOLAEV, Steepest descent based on the random m-gradient method, Automatic Control (3) 4 (1970), 39-44.

[9] E. G. NIKOLAEV, Steepest descent with a random choice of directions, Automatic Control (5) 4 (1970), 25-31.

[10] J. P. LAWRENCE, F. P. EMAD, An analytic comparison of random searching and gradient searching for the extremum of a known objective function, I.E.E.E. Transactions on Automatic Control, vol. AC-18, no. 6 (1973), 669-671.

[11] P. I. BARTOLOMEI, Investigation of the statistical gradient method and its recent modification in the problem of optimizing a multiparameter system, Automatic Control (2) 5 (1971), 32-37.