
The Trust Region Algorithm
Two Semester Project

Christian Adom (k0849957)
Kingston University
27 April 2012

Abstract: Trust region methods are modern techniques for solving optimization problems. In this paper the operation and underlying theory of trust-region algorithms are investigated. The convergence properties of the basic algorithm in relation to the Cauchy point are also examined. The basic algorithm is then extended by incorporating Powell's double dog-leg step. The final algorithm is programmed in MATLAB and implemented on a test problem. The performance of the algorithm is then compared with the Newton-Raphson method.


Contents

Introduction
The Trust Region Method
(B.T.R.A.) Basic Trust Region Algorithm
The Trust Region Sub-problem
    Local Model Minimiser
    The Cauchy Point and the Model Decrease
        Convergence of the Algorithm
        Case 1: Model Minimiser within the Trust Region
        Case 1b: Model Minimiser outside the Trust Region
        Case 2: Negative Curvature
    Powell's Double Dog-leg Step
        Dogleg Parameters
        The Double Dog-leg Algorithm
MATLAB Implementation
    Code for Newton's Method
    Code for Trust Region Double Dog-leg
Test Problem
Analysis of the Test Problem
Conclusion
References


Introduction

Line search algorithms are one of the two basic classes of methods for solving optimization problems. These algorithms employ a descent direction as the search direction, with the aim of reducing some objective function by taking an appropriate step length along this direction. [1]

Examples of such methods are steepest (gradient) descent, Newton's method and quasi-Newton methods. [1]

In this paper an alternative method of solving unconstrained optimization problems will be examined, namely the Trust region method.

Trust region methods have been in development for over five decades and have their roots in the field of non-linear parameter estimation. [2]

The development of the method is primarily attributed to three individuals who appear to have developed it independently:

Kenneth Levenberg (1944) - Researched adding a multiple of the identity to the Hessian as a stabilization/damping procedure in an attempt to derive the solution of a number of nonlinear least-squares problems. [2]

Morrison (1960) - In a paper on trajectory tracking, Morrison built on Levenberg's ideas, enhancing the convergence of the estimation algorithm by minimizing a quadratic model. In Morrison's paper a technique based on eigenvalue decomposition is given to compute the model's minimum within a chosen sphere. [2]

Donald Marquardt (1963) - While researching the link between damping the Hessian and reducing the length of the step, Marquardt came to a similar conclusion as Morrison by proving that minimizing a damped model is equivalent to minimizing the original model within a restricted region. [2]

The trust region method is based primarily on the idea of approximating some objective function within a certain region at each iteration.

In contrast to line search algorithms (which also employ this idea of solving approximate models), the approximate model used with the trust region algorithm is constrained to a region around the current iterate, with the idea that the model can only be "trusted" within this bound. This is the main difference between trust region and line search algorithms.

The most prevalent line search algorithms are the Newton and quasi-Newton methods, which are widely used within the field of optimization due to their fast (quadratic) convergence, provided certain conditions are satisfied.


The trust region algorithm is in fact a modification of such methods, in that it restricts the Newton step within the bounds of the trust region.[3]

This approach might at first seem counter-intuitive as an important feature of any algorithm is to reach the optimal point as quickly as possible.

However, the trust region approach addresses (and remedies) the major drawbacks inherent in Newton's method and safeguards Newton's method from diverging. In fact most modern algorithms use a combination of line search and trust region techniques for unconstrained optimization problems.

As a reminder, we know that Newton's method will converge to a local minimum if:

1. The start point is not too far from the optimal point (less negative curvature to deal with); and
2. The Hessian matrix (or its approximation) is positive definite at each iteration.

In fact the requirement for a positive definite matrix is to ensure that the curvature of the function is always positive.

Now it can be proven that global convergence can still be achieved given an indefinite Hessian if a constraint of the form \( \|s_k\|_2 \le \Delta_k \) is imposed on the step size. [4]

This idea of constraining the step size is one of the distinct features of the trust region method, where \( \Delta_k \) is defined as the trust region radius at iteration k: the region where we trust the model/approximation to be "a faithful representation" of the objective function.

Consider the case where, in a search for a minimum, we encounter a region of negative curvature (Hessian negative definite). In this case Newton's method will most likely diverge, whilst the trust region algorithm is designed to calculate a step of length \( \|s_k\|_2 = \Delta_k \), which will usually be a long step in the direction of the minimum [4]. We will explore this idea in more detail at a later stage.

The major drawback of the trust region method is that in order to obtain the step \( s_k \), a minimization problem subject to one constraint (known as the trust region sub-problem) must be solved. This is not trivial and can be both computationally expensive and time consuming, especially if there are a large number of variables.

Finally, it is worth noting that trust region methods have a wide range of applications within the fields of science, engineering and even the social sciences. The table below gives a few examples of these applications.


Table of Applications [2]

Applied Mathematics: Min-cost flows, bi-level programming, least-distance problems, boundary value problems, partial and ordinary differential equations
Physics: Fluid dynamics, optics, electromagnetism
Chemistry: Physical chemistry, chemical engineering, molecular modelling, mass transfer
Engineering: Transportation analysis, radar applications, circuit design
Economics & Sociology: Game theory, random utility models, financial portfolio management


The Trust Region Method

Given an unconstrained optimization problem of the form:

\[ \min f(x), \quad x \in \mathbb{R}^n \quad [1.0] \]

where f(x) is assumed to be real valued and twice continuously differentiable, the trust region is defined as:

\[ B_k = \{ x \in \mathbb{R}^n \mid \|x - x_k\|_2 \le \Delta_k \} \quad [1.1] \; [5] \]

where \( \Delta_k \) is the trust region radius at the k-th iteration.

Now, it is worth noting that the shape of the trust region can differ depending on the type of norm used (as shown in Figure [1.0]), and some authors have suggested that the shape of the trust region should be adjusted with each iteration; however, for simplicity this paper will only consider the 2-norm.

Fig [1.0] - Various shapes of the trust region.

The trust-region approach to solving [1.0] begins by approximating the objective function with a model function, since the model function will usually be easier to handle and less costly to evaluate.


This model function is usually chosen to be quadratic, and is based on the idea that a function can be expanded locally by its Taylor series:

\[ f(x + \delta x) = f(x) + \delta x\, f'(x) + \frac{\delta x^2}{2} f''(x) + \dots \quad [1.2] \]

Now [1.2] can be extended from one dimension to many dimensions so that the quadratic model can be defined as:

\[ m_k(x_k + s_k) = f(x_k) + g_k^T s_k + \tfrac{1}{2} s_k^T H_k s_k \quad [1.3] \; [6] \]

where g is the gradient vector (or its approximation), H is the Hessian matrix (the square matrix of second-order partial derivatives of the function) [2], and s is the trial step.

Note: In real-life problems (where there are a large number of variables) the Hessian matrix is usually approximated by methods such as DFP and BFGS. However, in this paper the analytical Hessian will be used, since the test problems will only consist of a small number of (at most three) variables.
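To make [1.3] concrete, the fragment below (an illustrative sketch using an arbitrary example function, not one of the test problems used later) builds and evaluates the quadratic model at a point:

% Sketch: build and evaluate the quadratic model [1.3].
% The function f(x) = x(1)^2 + 3*x(2)^2 is an arbitrary example.
xk = [1; 2];                           % current iterate
fk = xk(1)^2 + 3*xk(2)^2;              % f(xk)
gk = [2*xk(1); 6*xk(2)];               % gradient at xk
Hk = [2 0; 0 6];                       % Hessian at xk
mk = @(s) fk + gk'*s + 0.5*(s'*Hk*s);  % model m_k(x_k + s)
s = [-0.1; -0.2];                      % a trial step
disp(mk(s))                            % model prediction of f(xk + s)

For this purely quadratic example the model is exact, so mk(s) equals f(xk + s); for a general function the two agree only near xk, which is precisely why the trust region is needed.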

Once the model function is constructed the algorithm is then concerned with finding a step that sufficiently reduces the model within the trust region. This step is what we call the trial step.

Two further conditions are also placed on this step:

\[ x_k + s_k \in B_k \quad \text{and} \quad \|s_k\|_2 \le \Delta_k \quad [1.4] \]

Once a step that satisfies the conditions mentioned above is obtained, the algorithm needs a way of deciding whether the reduction predicted by the model using this trial step agrees with the actual reduction observed in the objective function; thus it evaluates what is known as the ratio of agreement:

\[ \rho_k \overset{\text{def}}{=} \frac{\text{Actual Reduction}}{\text{Predicted Reduction}} = \frac{f(x_k) - f(x_k + s_k)}{f(x_k) - m_k(x_k + s_k)} \quad [1.5] \; [7] \]

If the value obtained from [1.5] shows "adequate" agreement between the reduction in the model and in the objective function, the trial step is accepted and is used to compute the next iterate. In addition, if this agreement is very close, the trust region radius can be expanded in the hope that the model function will continue to approximate the objective function well within the enlarged region.

Alternatively, if the value obtained from [1.5] shows "inadequate" agreement between the reduction in the model and in the objective function, the trial step is rejected and the trust region radius is reduced in the hope that the model function will be better able to approximate the objective function within a smaller region.

(B.T.R.A.) Basic Trust Region Algorithm [8]

Step 0: Initialization
- Set k = 0
- Choose an initial guess/search point, defined as \( x_0 \)
- Choose an initial trust region radius, defined as \( \Delta_0 \)
- Choose parameters \( \eta_1, \eta_2, \gamma_1, \gamma_2 \) such that \( 0 < \eta_1 \le \eta_2 < 1 \) and \( 0 < \gamma_1 \le \gamma_2 < 1 \)

Step 1: Model Definition
- Define \( m_k(x_k + s) = f(x_k) + g_k^T s + \tfrac{1}{2} s^T H_k s \)

Step 2: Step Calculation
- Determine a step \( s_k \) that reduces the model, subject to \( x_k + s_k \in B_k \) and \( \|s_k\|_2 \le \Delta_k \)

Step 3: Acceptance of the Trial Point
- Compute \( \rho_k = \dfrac{f(x_k) - f(x_k + s_k)}{f(x_k) - m_k(x_k + s_k)} \)
- If \( \rho_k \ge \eta_1 \), set \( x_{k+1} = x_k + s_k \); otherwise set \( x_{k+1} = x_k \)

Step 4: Trust Region Radius Update
- \( \Delta_{k+1} \in [\Delta_k, \infty) \) if \( \rho_k \ge \eta_2 \); \( \Delta_{k+1} \in [\gamma_2 \Delta_k, \Delta_k] \) if \( \rho_k \in [\eta_1, \eta_2) \); \( \Delta_{k+1} \in [\gamma_1 \Delta_k, \gamma_2 \Delta_k] \) if \( \rho_k < \eta_1 \)

Step 5: Stopping Criterion
- Stop when \( \|g_k\|_2 < \varepsilon \) or \( \|s_k\|_2 < \varepsilon \); otherwise increment k by 1 and go to Step 1. A MATLAB sketch of this loop follows.
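The following sketch mirrors Steps 0-5 above. It assumes f, grad and hess are function handles supplied by the caller, and solve_subproblem is a hypothetical placeholder for any routine (Cauchy point, dogleg, or exact minimiser) that returns a step with norm at most Delta; it is an outline of the control flow, not a complete implementation.

% Minimal sketch of the Basic Trust Region Algorithm [8].
% f, grad, hess: function handles; solve_subproblem: hypothetical
% routine returning a step s with norm(s,2) <= Delta.
function x = btra(f, grad, hess, x, Delta, tol, maxit)
    eta1 = 0.01; eta2 = 0.9;                      % acceptance thresholds
    gamma2 = 0.5;                                 % shrink factor in [gamma1, gamma2]
    for k = 1:maxit
        g = grad(x); H = hess(x);
        if norm(g,2) < tol, break; end            % Step 5: stopping criterion
        s = solve_subproblem(g, H, Delta);        % Step 2: trial step
        predicted = -(g'*s + 0.5*(s'*H*s));       % f(x) - m(x+s), from [1.3]
        rho = (f(x) - f(x + s)) / predicted;      % Step 3: ratio of agreement [1.5]
        if rho >= eta1, x = x + s; end            % accept the trial point
        if rho >= eta2
            Delta = 2*Delta;                      % very successful: expand radius
        elseif rho < eta1
            Delta = gamma2*Delta;                 % unsuccessful: shrink radius
        end                                       % Step 4: radius update
    end
end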


The Trust Region Sub-problem

An important part of the trust region algorithm is the determination of the trial step \( s_k \) that reduces the model defined in [1.3].

In order to obtain this trial step, a constrained minimization problem must be solved. This problem is known as the trust region sub-problem and takes the form:

\[ \min_s m_k(s) = g^T s + \tfrac{1}{2} s^T H s \quad [1.5] \; [9] \]

subject to \( \|s\|_2 \le \Delta_k \).

Due to the importance of [1.5], the rest of this paper will be dedicated to examining a small subset of the known methods for efficiently solving this problem. There are three primary methods for solving [1.5], namely the local model minimiser, the Cauchy point and the double dog-leg step. This paper will focus on the last two methods, the Cauchy point and the double dog-leg step, whilst the local model minimiser will only be discussed briefly.

Local Model Minimiser

The idea behind this method is to find a step \( s_k \) which minimizes the model defined in [1.3] whilst satisfying the constraints. The main advantage of this method is that it usually gives an asymptotically fast rate of convergence. It takes the form:

Given:

\[ \min_s m_k(s) = g^T s + \tfrac{1}{2} s^T H s \quad [1.5] \; [10] \]

subject to \( \|s\|_2 \le \Delta_k \),

determine the global minimiser of [1.5] such that:

\[ (H + \lambda I) s = -g \quad [1.6] \; [10] \]

where \( H + \lambda I \) is positive semi-definite and the Lagrange multiplier \( \lambda \ge 0 \).

In order to solve [1.5]-[1.6], a unique \( \lambda^* \) must be found which satisfies the conditions above. This is usually done by applying Newton's method [10], as the sketch below illustrates.
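A rough outline of that idea follows: Newton's method is applied to the scalar equation \( \|s(\lambda)\|_2 = \Delta \), where \( s(\lambda) \) solves [1.6], using the standard correction based on a Cholesky factorisation. This is an unsafeguarded illustration (it assumes \( H + \lambda I \) stays positive definite at each trial \( \lambda \)), not a production implementation.

% Sketch: Newton iteration for the lambda in [1.6] (unsafeguarded).
% Assumes H + lam*I is positive definite at every trial lam.
function [s, lam] = local_model_minimiser(g, H, Delta)
    lam = 0;
    for l = 1:20
        R = chol(H + lam*eye(size(H,1)));  % H + lam*I = R'*R
        s = R \ (R' \ (-g));               % solve (H + lam*I)*s = -g
        if lam == 0 && norm(s,2) <= Delta
            return                          % interior solution: lambda = 0
        end
        if abs(norm(s,2) - Delta) < 1e-8*Delta
            return                          % step lies on the boundary
        end
        q = R' \ s;                         % auxiliary solve for the correction
        lam = max(0, lam + (norm(s,2)/norm(q,2))^2*(norm(s,2) - Delta)/Delta);
    end
end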


If a unique \( \lambda^* \) can be found at each iteration, then a step \( s_k \) that sufficiently reduces the model can be computed and consequently global convergence can be achieved. Yet this method has some major drawbacks: of all the methods available it is the most computationally expensive, since obtaining the solution to [1.5]-[1.6] requires the factorisation of \( H + \lambda I \), and matrix factorization can be very demanding [11].

Therefore, rather than obtaining the exact local model minimiser at each iteration, algorithms have been designed that seek to approximate it. A few examples include the preconditioned conjugate gradient, Levenberg-Marquardt and Powell's dog-leg methods [12].

The Cauchy Point and the Model Decrease

Before discussing the double dog-leg method, it is important to examine what is known as the Cauchy point. As discussed above, all trust region algorithms seek to minimise some model or approximation of an objective function within a specific region. A simple way to do this is to examine the behaviour of the model along the steepest descent direction, as this is where we can expect a significant reduction in the model. The model is minimised along the Cauchy arc; the point where we can expect the greatest decrease in the model is known as the Cauchy point, and the step taken towards such a point is called the Cauchy step.

Figure [1.1] - Contour plot of the Rosenbrock function: an example of the Cauchy point within the trust region \( B_k \). The red dot represents the Cauchy point, the dashed arrow represents the Cauchy arc in the negative gradient direction \( -\nabla f(x) \), and the Cauchy step (denoted \( s^C \)) runs from the current search point \( x_c \) to the Cauchy point.

The Cauchy point is defined mathematically as [6]:

\[ x_k^C \overset{\text{def}}{=} x_k + s_k^C \quad [1.7] \]

where

\[ s_k^C = -\alpha_k g_k \ \text{ is the Cauchy step} \quad [1.71] \]

and \( \alpha_k \ge 0 \), \( x_k^C \in B_k \).

Convergence of the Algorithm

It can be proved that the achievable model decrease at each iteration is given by:

\[ \text{Predicted Reduction} \overset{\text{def}}{=} f(x_k) - m_k(x_k + s_k) \ge \tfrac{1}{2} \|g_k\|_2 \min\!\left[ \Delta_k, \frac{\|g_k\|_2}{1 + \|H_k\|_2} \right] \quad [1.8] \; [13] \]
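As a quick numerical illustration of [1.8] (with arbitrary sample data chosen here, not taken from the later tests), the decrease achieved by a Cauchy step can be checked against the bound:

% Check the model decrease bound [1.8] for a Cauchy step (sample data).
g = [3; -1]; H = [4 1; 1 2]; Delta = 0.5;           % gradient, Hessian, radius
alpha = min(norm(g,2)^2/(g'*H*g), Delta/norm(g,2)); % optimal alpha, capped at the boundary
s = -alpha*g;                                       % Cauchy step [1.71]
decrease = -(g'*s + 0.5*(s'*H*s));                  % f(x_k) - m_k(x_k + s)
bound = 0.5*norm(g,2)*min(Delta, norm(g,2)/(1 + norm(H,2)));
fprintf('decrease = %.4f >= bound = %.4f\n', decrease, bound)

For this data the script prints decrease = 1.1811 >= bound = 0.7906, confirming the inequality.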

This proof will not be shown in this paper; however, some of the important convergence properties of the algorithm will be explored. To determine the Cauchy point there are three particular cases to consider, discussed below.

Case 1: Model Minimiser within the Trust Region

Let:

\[ m_k(x_k - \alpha g_k) \overset{\text{def}}{=} m(x_k^C) \quad [1.9] \]

Then applying [1.3] we can write [1.9] as:

\[ m(x_k^C) = f(x_k) - \alpha \|g_k\|_2^2 + \tfrac{1}{2} \alpha^2\, g_k^T H_k g_k \quad [2.0] \; [14] \]

Now introduce the condition:

\[ g_k^T H_k g_k > 0 \quad [2.1] \]

(i.e. require that the curvature of the model along the descent direction be positive). This is to ensure convergence to a local minimum. If the above condition holds, then the optimal value of alpha (denoted \( \alpha^* \)) which minimises the model defined in [2.0] along the Cauchy arc is found by the usual method of differentiating and equating to zero. Thus:


\[ \frac{\partial m_k(x_k^C)}{\partial \alpha} = -\|g_k\|_2^2 + \alpha\, g_k^T H_k g_k \quad [2.2] \]

Then equating [2.2] to zero and solving for alpha gives:

\[ \alpha_k^* = \frac{\|g_k\|_2^2}{g_k^T H_k g_k} \quad [2.3] \]

Now we know from [1.71] that the Cauchy step is given by \( s_k^C = -\alpha_k g_k \); thus we can expect the Cauchy point to lie within the trust region when:

\[ \alpha_k^* \|g_k\|_2 \le \Delta_k \quad [2.4] \]

If this is the case then it is expedient to choose the value for alpha as the optimal value defined by [2.3]. Therefore we have:

\[ \alpha_k = \alpha_k^* \quad [2.5] \]

Now substituting [2.3] into [2.0] allows us to deduce the amount of decrease we can expect to achieve from the model when the Cauchy point is within the trust region. Thus:

\[ m(x_k^C) = f(x_k) - \left( \frac{\|g_k\|_2^2}{g_k^T H_k g_k} \right) \|g_k\|_2^2 + \frac{1}{2} \left( \frac{\|g_k\|_2^2}{g_k^T H_k g_k} \right)^2 g_k^T H_k g_k \]

\[ \Rightarrow \; f(x_k) - m_k(x_k^C) = \frac{1}{2} \left( \frac{\|g_k\|_2^4}{g_k^T H_k g_k} \right) \quad [2.6] \]

Case 1b: Model Minimiser outside the Trust Region

This is a sub-case of Case 1 rather than a separate case on its own, as we assume that condition [2.1] still holds. If the model minimiser lies outside the trust region:

\[ \alpha_k^* \|g_k\|_2 > \Delta_k \quad [2.7] \; [14] \]

then it is prudent to step back to the boundary of the trust region to avoid divergence. Thus the appropriate value for the parameter alpha in this case is given by:

\[ \alpha_k \|g_k\|_2 = \Delta_k \;\Rightarrow\; \alpha_k = \frac{\Delta_k}{\|g_k\|_2} \quad [2.8] \]

Now substituting [2.8] into [2.0] allows us to deduce the amount of decrease we can expect to achieve from the model when the Cauchy point lies on the boundary of the trust region. Thus:

\[ m(x_k^C) = f(x_k) - \left( \frac{\Delta_k}{\|g_k\|_2} \right) \|g_k\|_2^2 + \frac{1}{2} \left( \frac{\Delta_k}{\|g_k\|_2} \right)^2 g_k^T H_k g_k \]

\[ \Rightarrow \; f(x_k) - m_k(x_k^C) = \|g_k\|_2 \Delta_k - \frac{1}{2} \left( \frac{\Delta_k}{\|g_k\|_2} \right)^2 g_k^T H_k g_k \quad [2.9] \]


Case 2: Negative Curvature

This case corresponds to the situation when [2.1] is violated, giving:

\[ g_k^T H_k g_k < 0 \quad [3.0] \]

Then [3.0] implies that:

\[ m(x_k^C) = f(x_k) - \alpha_k \|g_k\|_2^2 + \tfrac{1}{2} \alpha_k^2\, g_k^T H_k g_k \le f(x_k) - \alpha_k \|g_k\|_2^2 \quad [3.1] \; [14] \]

(since the quadratic term is negative by [3.0]). Now, since the Cauchy point is on the boundary of the trust region, we substitute [2.8] into [3.1] to obtain:

\[ m(x_k^C) = f(x_k) - \Delta_k \|g_k\|_2 + \frac{1}{2} \left( \frac{\Delta_k}{\|g_k\|_2} \right)^2 g_k^T H_k g_k \le f(x_k) - \Delta_k \|g_k\|_2 \quad [3.2] \]

(again because the curvature term is negative by [3.0]), giving:

\[ f(x_k) - m(x_k^C) \ge \Delta_k \|g_k\|_2 \]

as the amount of decrease we can expect to achieve from the model when we have negative curvature. This concludes the analysis of the Cauchy point. The case analysis above is summarised in the code sketch that follows.
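The three cases translate directly into a small amount of code. The following MATLAB sketch (with names chosen here for illustration) returns the Cauchy step for a given gradient, Hessian and radius:

% Sketch: compute the Cauchy step, following the case analysis above.
function sc = cauchy_step(g, H, Delta)
    gnorm = norm(g, 2);
    curvature = g'*H*g;                 % curvature along steepest descent
    if curvature > 0
        aopt = gnorm^2 / curvature;     % optimal alpha [2.3]
        if aopt*gnorm <= Delta
            alpha = aopt;               % Case 1: minimiser inside the region
        else
            alpha = Delta/gnorm;        % Case 1b: step back to the boundary [2.8]
        end
    else
        alpha = Delta/gnorm;            % Case 2: negative curvature, go to the boundary
    end
    sc = -alpha*g;                      % Cauchy step [1.71]
end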

Powell's Double Dog-leg Step

As discussed above, the Cauchy step provides a trial point which gives a model decrease; its greatest advantage is that it is computationally cheap to obtain. However, the method is based on the steepest descent direction, so repeated steps towards the Cauchy point will often result in slow convergence. This is perhaps the reason why it is very rarely used as the sole search method.

This brings us to the double dog-leg step, attributed to Powell. This method works in a similar way to the Levenberg-Marquardt method in that it uses combinations of the steepest descent and Gauss-Newton directions. It addresses both the slow convergence of steepest descent and the difficulty of computing the exact local model minimiser.

The algorithm begins by computing the step to the Newton point (see [3.3] below). If this point is within the trust region, the Newton step is taken as the trial step, the sub-problem is solved, and we proceed to Step 3 of the Basic Trust Region Algorithm defined above. If the Newton point is outside the trust region, the algorithm first computes the step to the Cauchy point. If this point is on the boundary of the trust region then no better step can be achieved, so the Cauchy step is taken as the trial step, the sub-problem is solved, and again we proceed to Step 3 of the B.T.R.A. If the Cauchy point is within the trust region, the algorithm connects a line from the Cauchy point to a point in the Gauss-Newton direction, and the dogleg step is found along this line. In fact the main purpose of the algorithm is to calculate an exact step of \( \|s\|_2 = \Delta_k \) (i.e. a step to the boundary of the trust region) where we can expect a good model reduction.

The double dogleg step has two important properties that make the process of finding the step mathematically sound and computationally efficient. Firstly, as the algorithm moves from the current iterate to the Cauchy point, and on to the new point, the distance from the current iterate increases monotonically; that is, for any \( \Delta_k \le \|H_c^{-1} \nabla f(x_c)\|_2 \) there is a unique point \( x_{k+1} \) on the dogleg curve (see Figure [1.2]) such that \( \|s_k\|_2 = \Delta_k \). Secondly, the value of the quadratic model defined in [1.3] decreases monotonically as \( s_k \) moves along the dogleg curve from the current iterate, through the Cauchy point, to the new point [15].

Fig [1.2] - The process of computing the dogleg step.

Dogleg Parameters

So far the general form of the algorithm has been given; now the mathematics behind it is examined. The mathematical parameters for calculating the double dogleg were developed in 1979 by Dennis and Mei. The Newton step is given by:

\[ s^N = -H^{-1} g \quad [3.3] \]

Note: in practice the inverse Hessian is not computed; rather, the linear system \( H s^N = -g \) is solved. The step in the Newton direction is given by:

\[ \hat{s}^N = \eta\, s^N \quad [3.4] \; [16] \]


where

\[ \eta = 0.8\gamma + 0.2, \quad \gamma \le \eta \le 1 \quad [3.5] \; [16] \]

is a scaling factor used to reduce the length of the Newton step, and

\[ \gamma = \frac{\|g\|_2^4}{(g^T H g)(g^T H^{-1} g)} \quad [3.6] \; [16] \]

Now, given this initial set of parameters, the dogleg step is given by:

\[ s^D = s^C + \lambda (\hat{s}^N - s^C) \quad [3.7] \; [16] \]

where \( 0 \le \lambda \le 1 \) and \( s^C \) is defined by [1.71].

Now the aim of the algorithm is to take a step that sufficiently reduces the model at each iteration; a possible means of achieving this is to take a step to the boundary of the trust region, i.e. our step must satisfy:

\[ \|s_k^D\|_2 = \Delta_k \quad [3.8] \]

This brings us to a second issue: the algorithm must find a value of \( \lambda \) that satisfies [3.8]. This value is found by solving the quadratic equation:

\[ \|s^C + \lambda (\hat{s}^N - s^C)\|_2^2 = \Delta^2 \quad [3.9] \]

Then [3.9] can be expanded and re-written as:

\[ \|\hat{s}^N - s^C\|_2^2\, \lambda^2 + 2\left( (\hat{s}^N - s^C)^T s^C \right) \lambda + \|s^C\|_2^2 - \Delta^2 = 0 \quad [4.0] \]

Applying the quadratic formula \( \lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \) to [4.0] gives:

\[ \lambda = \frac{ -2\left( (\hat{s}^N - s^C)^T s^C \right) \pm \sqrt{ 4\left( (\hat{s}^N - s^C)^T s^C \right)^2 - 4\, \|\hat{s}^N - s^C\|_2^2 \left( \|s^C\|_2^2 - \Delta^2 \right) } }{ 2\, \|\hat{s}^N - s^C\|_2^2 } \quad [4.1] \]

Note: the algorithm always chooses the positive root of [4.1], since we must have \( 0 \le \lambda \le 1 \). A sketch of the full dogleg computation follows.
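Putting [3.3]-[4.1] together, the dogleg step can be computed in a few lines. The sketch below assumes the situation in which the dogleg step is actually needed, i.e. the Newton step lies outside the trust region while the Cauchy step sc lies strictly inside it:

% Sketch: double dogleg step from [3.3]-[4.1]. Assumes norm(sn,2) > Delta
% and norm(sc,2) < Delta (the case in which the dogleg step is needed).
function sd = dogleg_step(g, H, sc, Delta)
    sn = -(H\g);                                   % Newton step [3.3]
    gamma = norm(g,2)^4/((g'*H*g)*(-(g'*sn)));     % [3.6]: note g'*inv(H)*g = -g'*sn
    eta = 0.8*gamma + 0.2;                         % [3.5]
    snhat = eta*sn;                                % shortened Newton step [3.4]
    v = snhat - sc;
    % Positive root of the quadratic [4.0], so that norm(sd,2) = Delta [3.8]
    a = v'*v;  b = 2*(v'*sc);  c = sc'*sc - Delta^2;
    lambda = (-b + sqrt(b^2 - 4*a*c))/(2*a);       % [4.1]
    sd = sc + lambda*v;                            % dogleg step [3.7]
end

Since \( \|s^C\|_2 < \Delta \) makes the constant term c negative, the discriminant is positive and the positive root always exists.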

The Double Dog-leg Algorithm

Step 1:
- Compute \( s^N \) [3.3]. If \( \|s^N\|_2 \) is less than the trust region radius, go to Step 3 of the B.T.R.A. Else, if \( \|s^N\|_2 \) is greater than the trust region radius, proceed to Step 2.

Step 2:
- Compute \( s^C \) [1.71]. If \( \|s^C\|_2 \) is equal to the trust region radius, go to Step 3 of the B.T.R.A. Else, if \( \|s^C\|_2 \) is less than the trust region radius, proceed to Step 3.


Step 3:
- Compute \( s^D \) [3.7]. Go to Step 3 of the B.T.R.A.

MATLAB Implementation

In this section of the report we put the theory into practice by implementing the trust region algorithm in a computer program. The main purpose of the programs is not only to test the convergence properties of the trust region algorithm but also to compare it with the Newton-Raphson method. All the computer programs have been written in MATLAB.

Code for Newton's Method

The Newton-Raphson method was simple to code; the main work of the algorithm is done where it solves the linear system \( H s = -g \) (the statement sn = h\(-g); below).

%MATLAB CODE
%_______________________________________________________________________
%Newton Algorithm

disp('****')
disp('Newton Method')
disp('By: Christian Adom, Kingston University, k0849957')
disp('****')

%Display objective function to minimize
disp('Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4')
%Display 3D plot of function
ezmeshc('-10*x^2 + 10*y^2 + 4*sin(x*y) - 2*x + x^4',[-2,2,-2,2])
disp('****')

%Known local minimum of function
xo = [2.3029;-0.3409];
disp('known minimum of function at')
disp(xo)

%Ask the user to supply a start point for the search
in1 = input('Please Enter starting x value:');
in2 = input('Please Enter starting y value:');
%Store user input in a vector
x = [in1;in2];
disp('****')
disp('Start point')
disp(x)

%Stopping criterion
normtol = input('Stop search when either 2-norm of gradient or step length is less than?:');
disp('****')

%Maximum number of iterations allowed
Maxit = input('Please enter maximum number of iterations allowed:');
disp('****')

%Initialise the iteration counter and dummy values for the norms
iteration = 0;
n = 10;
TrialStepNorm = 10;

%Loop until the 2-norm of the gradient or the step length falls below
%the tolerance, or the maximum number of iterations is exceeded
while (n > normtol && TrialStepNorm > normtol && iteration < Maxit)
    iteration = iteration + 1;
    disp('**********************************')
    disp('iteration')
    disp(iteration)

    %Compute value of objective function at current point
    f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4;
    disp('Current Objective function Value')
    disp(f)

    %Compute gradient vector of objective function at current point
    g = [-20*x(1)+4*cos(x(1)*x(2))*x(2)-2+4*x(1)^3;
          20*x(2)+4*cos(x(1)*x(2))*x(1)];
    disp('Current Gradient vector value')
    disp(g)

    %Compute Hessian matrix of objective function at current point
    h = [-20-4*sin(x(1)*x(2))*x(2)^2+12*x(1)^2, -4*sin(x(1)*x(2))*x(1)*x(2)+4*cos(x(1)*x(2));
         -4*sin(x(1)*x(2))*x(1)*x(2)+4*cos(x(1)*x(2)), 20-4*sin(x(1)*x(2))*x(1)^2];
    disp('Current Hessian Matrix Value')
    disp(h)

    %Classify the Hessian from its leading element and determinant
    FirstElement = h(1,1);
    dm = det(h);
    disp('The determinant is')
    disp(dm)
    if FirstElement > 0 && dm > 0
        disp('Hessian is positive definite')
    elseif FirstElement < 0 && dm > 0
        disp('Hessian is negative definite')
    elseif FirstElement >= 0 && dm >= 0
        disp('Hessian is positive semidefinite')
    elseif FirstElement <= 0 && dm >= 0
        disp('Hessian is negative semidefinite')
    else
        disp('Hessian is indefinite')
    end

    %Compute 2-norm of gradient
    n = norm(g,2);
    disp('2 norm of gradient')
    disp(n)

    %Solve the linear system h*sn = -g to obtain the Newton step
    sn = h\(-g);
    disp('Newton step')
    disp(sn)
    TrialStepNorm = norm(sn,2);

    %Compute the Newton point and update the iterate
    xn = x + sn;
    disp('Newtons point')
    disp(xn)
    x = xn;
end

disp('*********************************************************************')
disp('RESULTS')
disp('*********************************************************************')
disp('Distance from current point to optimal solution')
disp(norm(x - xo,2))
disp('The 2 norm of gradient is')
disp(n)
disp('The 2 norm of step length is')
disp(TrialStepNorm)
disp('Location of minimum at')
disp(x)
disp('Function value at minimum')
disp(f)
disp('Total Number of iterations')
disp(iteration)

%Display a contour plot of objective function
fplot4 = @(x,y) -10*x.^2 + 10*y.^2 + 4*sin(x.*y) - 2*x + x.^4;
ezcontour(fplot4,[-5,5,-5,5],49)

Code for Trust Region Double Dog-leg

The code for the double dog-leg is considerably larger and more complex than the Newton code, because it incorporates three methods (steepest descent, Newton and dogleg) to solve the trust region sub-problem.

%MATLAB CODE
%_______________________________________________________________________
%Basic Trust region Algorithm
%The Double Dogleg Step

disp('****')
disp('Welcome to the Trust region Algorithm')
disp('Double Dogleg Step')
disp('By: Christian Adom, Kingston University, k0849957')
disp('****')

%Display objective function to minimize
disp('Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4')
%Display a 3D plot of function
ezmeshc('-10*x^2 + 10*y^2 + 4*sin(x*y) - 2*x + x^4',[-2,2,-2,2])
disp('****')

%Known local minimum of objective function
xo = [2.3029;-0.3409];
disp('known minimum of function at')
disp(xo)

%Ask the user to supply a start point for the search
in1 = input('Please Enter starting x value:');
in2 = input('Please Enter starting y value:');
%Store values as a vector
x = [in1;in2];
disp('****')
disp('Start point')
disp(x)

%Trust region radius modification thresholds
eta1 = 0.01;
eta2 = 0.9;

%Iteration counter
iteration = 0;

%Dummy values for norms
n = 10;
TrialStepNorm = 10;

%Stopping criterion
normtol = input('Stop search when either 2-norm of gradient or step length is less than?:');
disp('****')

%Maximum number of iterations allowed
Maxit = input('Please enter maximum number of iterations allowed:');
disp('****')

%Initial trust region radius
del = input('Please enter initial trust region radius:');
disp('initial trust region radius')
disp(del)

%Loop until the 2-norm of the gradient or the step length falls below
%the tolerance, or the maximum number of iterations is exceeded
while (n > normtol && TrialStepNorm > normtol && iteration < Maxit)
    iteration = iteration + 1;
    disp('**********************************')
    disp('iteration')
    disp(iteration)

    %Compute value of objective function at current point
    f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4;
    disp('Current Objective function Value')
    disp(f)

    %Compute gradient vector of objective function at current point
    g = [-20*x(1)+4*cos(x(1)*x(2))*x(2)-2+4*x(1)^3;
          20*x(2)+4*cos(x(1)*x(2))*x(1)];
    disp('Current Gradient vector value')
    disp(g)

    %Compute Hessian matrix of objective function at current point
    h = [-20-4*sin(x(1)*x(2))*x(2)^2+12*x(1)^2, -4*sin(x(1)*x(2))*x(1)*x(2)+4*cos(x(1)*x(2));
         -4*sin(x(1)*x(2))*x(1)*x(2)+4*cos(x(1)*x(2)), 20-4*sin(x(1)*x(2))*x(1)^2];
    disp('Current Hessian Matrix Value')
    disp(h)

    %Classify the Hessian from its leading element and determinant
    FirstElement = h(1,1);
    dm = det(h);
    disp('The determinant is')
    disp(dm)
    if FirstElement > 0 && dm > 0
        disp('Hessian is positive definite')
    elseif FirstElement < 0 && dm > 0
        disp('Hessian is negative definite')
    elseif FirstElement >= 0 && dm >= 0
        disp('Hessian is positive semidefinite')
    elseif FirstElement <= 0 && dm >= 0
        disp('Hessian is negative semidefinite')
    else
        disp('Hessian is indefinite')
    end

    %Compute 2-norm of gradient
    n = norm(g,2);
    disp('2 norm of gradient')
    disp(n)

    %Solve the linear system h*sn = -g to obtain the Newton step
    sn = h\(-g);
    disp('Newton step')
    disp(sn)
    xn = x + sn;
    disp('Newtons point')
    disp(xn)
    snnorm = norm(sn,2);
    disp('length of newton step is')
    disp(snnorm)
    disp('current TRA radius')
    disp(del)

    %If the Newton step is longer than the trust region radius,
    %first calculate the step to the Cauchy point
    if snnorm > del
        disp('Newton step is greater than current T.R radius')
        disp('Move to calculating cauchy step')

        %Curvature of the model along the steepest descent direction
        curvature = g'*h*g;

        if curvature > 0
            %Case 1: model minimiser within trust region
            disp('curvature along steepest descent is positive, optimal value for alpha is')
            aopt = n^2/curvature;
            disp(aopt)
            if aopt*n <= del
                disp('Cauchy point lies within interior of trust region')
                alpha = aopt;   %Set alpha to its optimal value [2.3]
            else
                %Case 1b: model minimiser outside trust region
                disp('Model Minimiser outside trust region')
                disp('Compute Cauchy point at boundary of trust region')
                alpha = del/n;  %Place the Cauchy point on the boundary [2.8]
            end
        else
            %Case 2: negative curvature
            disp('curvature along steepest descent is negative')
            disp('Compute Cauchy point at boundary of trust region')
            alpha = del/n;      %Place the Cauchy point on the boundary [2.8]
        end

        %Compute Cauchy step and Cauchy point
        sc = -alpha*g;
        disp('cauchy step is')
        disp(sc)
        xc = x + sc;
        disp('cauchy point is')
        disp(xc)
        scnorm = norm(sc,2);
        disp('length of the cauchy step is:')
        disp(scnorm)
        disp('current TRA radius')
        disp(del)

        %If the Cauchy step is strictly inside the radius, calculate the
        %dogleg step. Floating-point equality is tested with a tolerance
        %rather than ==.
        if abs(scnorm - del) > 0.0001
            disp('Cauchy step is less than current trust region radius, move to calculating dogleg step')

            %Dogleg parameters [3.5]-[3.6] (see report)
            gamma = n^4/(curvature*(g'*(-sn)));
            disp('Value for gamma')
            disp(gamma)
            kappa = 0.8*gamma + 0.2;
            if kappa >= gamma && kappa <= 1
                disp('value for kappa')
                disp(kappa)
            else
                disp('kappa value out of bounds')
                disp(kappa)
                break
            end

            %Compute the shortened step in the Newton direction [3.4]
            snhat = kappa*sn;
            disp('Step to nhat')
            disp(snhat)
            nhat = x + snhat;
            disp('nhat point')
            disp(nhat)
            snhatnorm = norm(snhat,2);
            disp('length nhat step')
            disp(snhatnorm)
            disp('current TRA radius')
            disp(del)

            v = snhat - sc;

            %Value of lambda that satisfies [3.8]-[3.9]: roots of the
            %quadratic [4.0] with a = v'*v, b = 2*v'*sc, c = sc'*sc - del^2
            discr = sqrt((2*(v'*sc))^2 - 4*(v'*v)*(sc'*sc - del^2));
            lambda1 = (-2*(v'*sc) + discr)/(2*(v'*v));
            lambda2 = (-2*(v'*sc) - discr)/(2*(v'*v));

            %Choose the positive root of [4.1]
            if lambda1 >= 0
                lambdaopt = lambda1;
            elseif lambda2 >= 0
                lambdaopt = lambda2;
            else
                disp('Value for lambda is negative')
                break
            end
            disp('Optimal value for lambda is')
            disp(lambdaopt)

            %Sub-problem solved; compute the dogleg trial step [3.7]
            TrialStep = sc + lambdaopt*v;
            disp('Trial step (using dogleg)')
            disp(TrialStep)
        else
            %The Cauchy step reaches the boundary, so use it as the
            %trial step and avoid calculating the dogleg step altogether
            disp('Cauchy step is equal to Trust Region radius, thus use Cauchy step')
            TrialStep = sc;
            disp('Trial step (using cauchy) is')
            disp(TrialStep)
        end
    else
        %The Newton step is inside the trust region, so use it directly
        disp('Newton step is less than Trust region radius, thus use newton step')
        TrialStep = sn;
        disp('Trial step (using Newton)')
        disp(TrialStep)
    end

    %Compute length of trial step and the trial point
    TrialStepNorm = norm(TrialStep,2);
    disp('Trial step length is:')
    disp(TrialStepNorm)
    TrialPoint = x + TrialStep;
    disp('New Trial point')
    disp(TrialPoint)

    %Compute quadratic model value for the trial step
    m = f + TrialStep'*g + 0.5*TrialStep'*h*TrialStep;
    disp('Quadratic Model value')
    disp(m)

    %Compute function value at trial point
    fn = -10*TrialPoint(1)^2 + 10*TrialPoint(2)^2 + 4*sin(TrialPoint(1)*TrialPoint(2)) - 2*TrialPoint(1) + TrialPoint(1)^4;
    disp('Function value at Trial point')
    disp(fn)

    %Predicted and actual reductions, and the ratio of agreement [1.5]
    Predred = f - m;
    disp('Predicted reduction value is:')
    disp(Predred)
    Actualred = f - fn;
    disp('Actual reduction')
    disp(Actualred)
    r = Actualred/Predred;
    disp('Ratio of agreement')
    disp(r)

    %Acceptance of trial point and trust region radius adjustment
    if r >= eta2
        x = TrialPoint;
        del = 2*del;    %Double trust region radius
        disp('very successful iteration')
        disp('New point')
        disp(x)
        disp('New trust region radius')
        disp(del)
    elseif r >= eta1 && r < eta2
        x = TrialPoint;
        disp('successful iteration')
        disp('New point')
        disp(x)
        disp('Trust region radius remains the same as:')
        disp(del)
    else
        disp('unsuccessful iteration')
        disp('Retain current Point at:')
        disp(x)
        disp('reduce trust region radius to:')
        del = 0.5*del;  %Halve trust region radius
        disp(del)
    end

    %Distance from current point to known optimal solution
    distSol = norm(x - xo,2);
    disp('Distance from current point to optimal solution')
    disp(distSol)
end

disp('***********************************************************************************')
disp('RESULTS')
disp('***********************************************************************************')
disp('Distance from current point to optimal solution')
disp(distSol)
disp('The 2 norm of gradient is')
disp(n)
disp('The 2 norm of step length is')
disp(TrialStepNorm)
disp('Location of minimum at')
disp(x)
disp('Function value at minimum')
disp(f)
disp('Total Number of iterations')
disp(iteration)
disp('Final trust region radius')
disp(del)

%Display contour plot of objective function
fplot4 = @(x,y) -10*x.^2 + 10*y.^2 + 4*sin(x.*y) - 2*x + x.^4;
ezcontour(fplot4,[-5,5,-5,5],49)

Test Problem

In this section we examine one test problem while varying the start point of the search. The aim is to compare the trust region (dogleg) method with Newton's method from different start points.

Consider the function:

f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4

This function has two local minima, located at \( x^* = (2.3, -0.3) \) and \( x^* = (-2.3, 0.3) \).

Search 1 (Newton)

****
Newton Method
By: Christian Adom, Kingston University, k0849957
****
Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4
****
known minimum of function at
    2.3029   -0.3409
Please Enter starting x value: 2
Please Enter starting y value: -1
****
Start point
     2    -1
Stop search when either 2-norm of gradient or step length is less than?: 0.05
****
Please enter maximum number of iterations allowed: 50
*********************************************
RESULTS

Search 1 (Trust Region Dog Leg)

Welcome to the Trust region Algorithm
Double Dogleg Step
By: Christian Adom, Kingston University, k0849957
****
Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4
****
known minimum of function at
    2.3029   -0.3409
Please Enter starting x value: 2
Please Enter starting y value: -1
****
Start point
     2    -1
Stop search when either 2-norm of gradient or step length is less than?: 0.05
****
Please enter maximum number of iterations allowed: 50
****
Please enter initial trust region radius: 1
initial trust region radius
     1
*********************************************
RESULTS
*********************************************


Search 2 (Newton)

****
Newton Method
By: Christian Adom, Kingston University, k0849957
****
Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4
****
known minimum of function at
    2.3029   -0.3409
Please Enter starting x value: 3
Please Enter starting y value: -2
****
Start point
     3    -2
Stop search when either 2-norm of gradient or step length is less than?: 0.05
****
Please enter maximum number of iterations allowed: 50
*********************************************
RESULTS
*********************************************
Distance from current point to optimal solution
The 2 norm of gradient is

Search 2 (Trust Region Dogleg)

Welcome to the Trust region Algorithm
Double Dogleg Step
By: Christian Adom, Kingston University, k0849957
****
Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4
****
known minimum of function at
    2.3029   -0.3409
Please Enter starting x value: 3
Please Enter starting y value: -2
****
Start point
     3    -2
Stop search when either 2-norm of gradient or step length is less than?: 0.05
****
Please enter maximum number of iterations allowed: 50
****
Please enter initial trust region radius: 1
initial trust region radius
     1
*************************************************
RESULTS
*************************************************
Distance from current point to optimal solution
    0.0094


Search 3 (Newton)

****
Newton Method
By: Christian Adom, Kingston University, k0849957
****
Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4
****
known minimum of function at
    2.3029   -0.3409
Please Enter starting x value: 5
Please Enter starting y value: 4
****
Start point
     5     4
Stop search when either 2-norm of gradient or step length is less than?: 0.05
****
Please enter maximum number of iterations allowed: 50
*********************************************
RESULTS
*********************************************
Distance from current point to optimal solution

Search 3 (Trust Region Dogleg)

****
Welcome to the Trust region Algorithm
Double Dogleg Step
By: Christian Adom, Kingston University, k0849957
****
Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4
****
known minimum of function at
    2.3029   -0.3409
Please Enter starting x value: 5
Please Enter starting y value: 4
****
Start point
     5     4
Stop search when either 2-norm of gradient or step length is less than?: 0.05
****
Please enter maximum number of iterations allowed: 50
****
Please enter initial trust region radius: 1
initial trust region radius
     1
*************************************************
RESULTS
*************************************************
Distance from current point to optimal solution


Search 4 (Newton)

****
Newton Method
By: Christian Adom, Kingston University, k0849957
****
Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4
****
known minimum of function at
    2.3029   -0.3409
Please Enter starting x value: 50
Please Enter starting y value: -56
****
Start point
    50   -56
Stop search when either 2-norm of gradient or step length is less than?: 0.05
****
Please enter maximum number of iterations allowed: 50
*******************************************
RESULTS
*******************************************
Distance from current point to optimal solution
The 2 norm of gradient is

Search 4 (Trust Region Dogleg)

****
Welcome to the Trust region Algorithm
Double Dogleg Step
By: Christian Adom, Kingston University, k0849957
****
Minimize the function: f = -10*x(1)^2 + 10*x(2)^2 + 4*sin(x(1)*x(2)) - 2*x(1) + x(1)^4
****
known minimum of function at
    2.3029   -0.3409
Please Enter starting x value: 50
Please Enter starting y value: -56
****
Start point
    50   -56
Stop search when either 2-norm of gradient or step length is less than?: 0.05
****
Please enter maximum number of iterations allowed: 50
****
Please enter initial trust region radius: 1
initial trust region radius
     1
****************************************************
RESULTS
****************************************************
Distance from current point to optimal solution
    0.0094

Analysis of the Test Problem

This section of the report gives some analysis of the results produced by the algorithms. Firstly, an initial trust region radius of 1 is chosen for all the tests; however, it is at the user's discretion to determine the most appropriate value based on the nature of the problem to be solved. Secondly, a cap of 50 iterations is imposed on the search, and lastly, both algorithms begin searching at the same start point. The idea is to start very close to the optimal point and then gradually choose start points further away from the optimal point with each search, demonstrating the tendency of Newton's method to diverge when started far from the minimum and, in turn, the robustness of the trust region approach.

Search 1:

The first search begins from a start point very close to the minimum. In this search the Newton-Raphson method is more efficient, since it reaches the local minimum in only four iterations whilst the trust region (dogleg) method takes six. This is to be expected, since the major advantage of Newton's method is fast convergence near the minimum, whilst the trust region method usually begins by taking steps towards the Cauchy point.

Search 2:

In the second search the start point is moved only slightly further from the minimum, but even this is enough to cause Newton's method to diverge: it reaches the iteration cap of 50 with coordinates far from the minimum. The trust region (dogleg) method, as predicted by the theory, converges and reaches one of the local minima in 23 iterations. The reason Newton's method failed is most likely that the Hessian was not positive definite at some iteration, which caused divergence. It is also interesting to observe the size of the trust region radius at the end of the search; such a small radius probably indicates a series of average or poor ratios of agreement (see [1.5]) throughout the search.

Search 3:

The analysis is similar to that of Search 2.

Search 4:

It is not surprising that Newton's method again fails when starting so far from the minimum; what is more striking is the fast convergence of the trust region method (only 35 iterations) when starting at [50, -56], a long way from the local minima at (2.3, -0.3) and (-2.3, 0.3). In addition, the large trust region radius (namely 512) at the end of the search most likely indicates a series of very successful iterations.

Conclusion

This report has explored the underlying theory of the trust region algorithm and its operations. The convergence properties of the algorithm when taking steps to the Cauchy point have been examined, and it has been shown that the double dog-leg method is far superior in speed and convergence to steepest descent or Newton's method alone. In summary, the trust region method is a modification of Newton's method that safeguards it from diverging by restricting the step size within the bounds of the trust region.

References

[1] Wikipedia. Line search [online]. Available at: http://en.wikipedia.org/wiki/Line_search

[2] Andrew R. Conn, Nicholas I. M. Gould and Philippe L. Toint (2000, pp. 8-12). Trust-Region Methods. SIAM.

[3] Frank Vanden Berghen (2004, p. 4). Levenberg-Marquardt algorithms vs Trust Region algorithms [pdf]. Available at: http://www.applied-mathematics.net/LMvsTR/LMvsTR.pdf

[4] Frank Vanden Berghen (2004, p. 3). Levenberg-Marquardt algorithms vs Trust Region algorithms [pdf]. Available at: http://www.applied-mathematics.net/LMvsTR/LMvsTR.pdf

[5] Andrew R. Conn, Nicholas I. M. Gould and Philippe L. Toint (2000, p. 115). Trust-Region Methods. SIAM.

[6] Andrew R. Conn, Nicholas I. M. Gould and Philippe L. Toint (2000, p. 117). Trust-Region Methods. SIAM.

[7] Andrew R. Conn, Nicholas I. M. Gould and Philippe L. Toint (2000, p. 118). Trust-Region Methods. SIAM.

[8] Andrew R. Conn, Nicholas I. M. Gould and Philippe L. Toint (2000, p. 116). Trust-Region Methods. SIAM.

[9] Ya-xiang Yuan (n.d., p. 3). A review of trust region algorithms for optimization [pdf]. Available at: ftp://ftp.cc.ac.cn/pub/yyx/papers/p995.pdf

[10] Ya-xiang Yuan (n.d., p. 4). A review of trust region algorithms for optimization [pdf]. Available at: ftp://ftp.cc.ac.cn/pub/yyx/papers/p995.pdf

[11] Andrew R. Conn, Nicholas I. M. Gould and Philippe L. Toint (2000, p. 201). Trust-Region Methods. SIAM.

[12] Ya-xiang Yuan (n.d., p. 5). A review of trust region algorithms for optimization [pdf]. Available at: ftp://ftp.cc.ac.cn/pub/yyx/papers/p995.pdf

[13] Andrew R. Conn, Nicholas I. M. Gould and Philippe L. Toint (2000, p. 125). Trust-Region Methods. SIAM.

[14] Nick Gould (n.d.). Trust-region methods for unconstrained optimization [pdf]. Available at: http://www.numerical.rl.ac.uk/nimg/msc/lectures/part3.2.pdf

[15] J. Dennis and R. Schnabel (1996, p. 139). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM.

[16] J. Dennis and R. Schnabel (1996, p. 141). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM.