
Math 523: Numerical Analysis I

Solution of Homework 4. Numerical Optimization

Problem 1. Program the steepest descent and Newton's methods using the backtracking line search algorithm (using either the Wolfe conditions or the Goldstein conditions). Use them to minimize the Rosenbrock function

F(x, y) = 100(y − x^2)^2 + (1 − x)^2.

Set the initial step size to be 1 and print out the step size at each iteration in your algorithms. Test your algorithms with two initial guesses: (1.2, 1.2) and (−1.2, 1). What happens if you choose step sizes always equal to 1.0? What should be the exact minimizer and minimal value? What are the convergence rates for these algorithms?

Problem 2. Implement the BFGS quasi-Newton method with a line search satisfying the Wolfe conditions. Check the condition y_k^T s_k > 0 at each iteration. Use your code to minimize the Rosenbrock function in Problem 1.

Problem 3. Test your algorithms from the previous two problems, with and without line search, for minimizing the function

F(x, y) = x^4 + y^2.

Problem 4. Minimize the following function

F(x, y) = x^4/4 − x^2 + 2x + (y − 1)^2

with the pure Newton's method (α = 1). Explain why the pure Newton's method does not converge for this problem (Hint: examine the Hessian matrix). What if you use the line search step length selection rules?


Solution. In this homework, we mainly focus on applying different minimization techniques to some simple test objective functions. From our numerical tests, we can draw some preliminary conclusions:

• When Newton's method converges, it converges very fast (quadratic convergence asymptotically).

• Newton's method requires second-order derivatives, which can be difficult, if not impossible, to obtain. Furthermore, storing the second derivatives requires O(n^2) memory, where n is the number of variables of the objective function. The steepest descent method and quasi-Newton methods can be used instead.

• The quasi-Newton method is a good compromise between convergence speed and complexity. It usually converges fast, and sometimes converges even without step length control. The drawback is the high storage requirement.

• The steepest descent method usually does not converge without step length control unless we fix the step length α to be sufficiently small. It is a low-complexity, low-storage method, and it is the last-resort choice in the Matlab function fminunc (unconstrained minimization).

• The initial guess is extremely important for Newton-like methods. How to find a “good enough” initial guess is an interesting question to explore.

• For non-convex functions, like the one in Problem 4, the Hessian matrix might not always be positive definite. This is crucial for Newton-like methods: if the Hessian is not positive definite, the Newton direction might not even be a descent direction, as discussed in class (see the short check after this list).

• The performance of line search algorithms depends on many parameters, such as the condition(s) you choose and the constants in those conditions. Different line search conditions can give different performance for your descent direction methods.
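To make the point about Problem 4 concrete, here is a quick check (an illustration only, not part of the assigned code): the Hessian of F(x, y) = x^4/4 − x^2 + 2x + (y − 1)^2 is diag(3x^2 − 2, 2), which is indefinite whenever 3x^2 < 2, in particular along the whole line x = 0.

% Hand-computed Hessian of the Problem 4 objective (the handle name is ad hoc)
hessF4 = @(x) [3*x(1)^2 - 2, 0; 0, 2];
eig(hessF4([0; 1]))   % eigenvalues -2 and 2: indefinite, so the Newton
                      % direction need not be a descent direction there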

Here is the Matlab code for general descent direction methods:


function [min, xhist, steplength] = ...
    descentdirect(f, grad_f, hessian_f, x0, maxit, tol, dir, line)
%--------------------------------------------------------------------------
% General descent direction method
%--------------------------------------------------------------------------
% f:         objective function
% grad_f:    gradient of f
% hessian_f: Hessian matrix of f
% x0:        initial guess
% maxit:     maximum number of iterations
% tol:       tolerance on the gradient norm
% dir:       1 = steepest descent, 2 = Newton, 3 = quasi-Newton (BFGS)
% line:      1 = no step length control (alpha = 1), 2 = Wolfe line search
%--------------------------------------------------------------------------
%--------------------------------------------------------------------------
% Parameters
%--------------------------------------------------------------------------
c_1 = 10^-4; c_2 = 0.9;
%c_1 = 0.4; c_2 = 0.6;
alpha_max = 16;
%--------------------------------------------------------------------------
% Initialization
%--------------------------------------------------------------------------
i = 1; x_k = x0; stop = 1; xsize = length(x0); B_k = eye(xsize);
xhist = zeros(xsize,maxit); xhist(:,i) = x_k;
steplength = zeros(maxit,1);
%--------------------------------------------------------------------------
while stop && i < maxit
    %--------------------------------------------------------------------
    % Search Direction
    %--------------------------------------------------------------------
    if (dir == 1)        % Steepest descent direction
        p_k = steepdir(grad_f,x_k);
    elseif (dir == 2)    % Newton's direction
        p_k = newtondir(grad_f,hessian_f,x_k);
    elseif (dir == 3)    % Quasi-Newton direction
        p_k = qnewtondir(grad_f,B_k,x_k);
    end
    %--------------------------------------------------------------------
    % Step Length
    %--------------------------------------------------------------------
    if (line == 1)       % No stepsize control
        alpha = 1;
    elseif (line == 2)   % Wolfe conditions
        alpha = linesearch(f, grad_f, p_k, x_k, c_1, c_2, alpha_max);
    end
    steplength(i) = alpha;
    %--------------------------------------------------------------------
    % Update
    %--------------------------------------------------------------------
    x_old = x_k;
    x_k = x_k + alpha * p_k;
    i = i + 1;
    xhist(:,i) = x_k;
    if (norm(grad_f(x_k)) < tol) || (norm(x_k - x_old) < 1e-12)
        stop = 0;
    end
    %--------------------------------------------------------------------
    % Updating quasi-Newton matrix
    %--------------------------------------------------------------------
    if (dir == 3)        % BFGS update of B_k
        B_k = bfgs(x_k,x_old,B_k,grad_f);
    end
end
min = x_k;
steplength = steplength(1:i);
xhist = xhist(:,1:i);
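The linesearch routine called in the Step Length section above (the backtracking search with the Wolfe conditions) is not reproduced in this excerpt. For orientation only, here is a minimal sketch with the same call signature; it starts from alpha = 1, halves the step until the sufficient decrease (Armijo) condition holds, and merely warns if the curvature condition fails. It is a stand-in under those assumptions, not the routine used for the homework runs.

%--------------------------------------------------------------------
function alpha = linesearch_sketch(f, grad_f, p_k, x_k, c_1, c_2, alpha_max)
% Minimal backtracking sketch (illustration only; not the homework routine)
alpha = min(1, alpha_max);
f_k   = feval(f, x_k);
slope = feval(grad_f, x_k)' * p_k;        % directional derivative at x_k
while feval(f, x_k + alpha * p_k) > f_k + c_1 * alpha * slope
    alpha = alpha / 2;                    % sufficient decrease fails: backtrack
    if alpha < 1e-12, break; end          % guard against a stalled search
end
if feval(grad_f, x_k + alpha * p_k)' * p_k < c_2 * slope
    warning('curvature condition not met at alpha = %g', alpha);
end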


%--------------------------------------------------------------------
function p_k = steepdir(grad_f,x_k)
% Steepest descent direction
p_k = - feval(grad_f, x_k);
%--------------------------------------------------------------------
function p_k = newtondir(grad_f,hessian_f,x_k)
% Newton's direction
grad_f_k = feval(grad_f, x_k);
hessian_f_k = feval(hessian_f, x_k);
p_k = - hessian_f_k\grad_f_k;
%--------------------------------------------------------------------
function p_k = qnewtondir(grad_f,B_k,x_k)
% Quasi-Newton direction
grad_f_k = feval(grad_f, x_k);
p_k = -B_k\grad_f_k;
%--------------------------------------------------------------------
function B_k = bfgs(x_k,x_old,B_k,grad_f)
% BFGS update of the Hessian approximation B_k
s_k = x_k - x_old;
y_k = feval(grad_f, x_k) - feval(grad_f, x_old);
if y_k' * s_k <= 0
    return          % skip the update to keep B_k positive definite
end
B_k = B_k - (B_k * s_k * s_k' * B_k) / (s_k' * B_k * s_k) + ...
      (y_k * y_k') / (y_k' * s_k);
%--------------------------------------------------------------------
function B_k = sr1(x_k,x_old,B_k,grad_f)
% SR1 update (alternative to BFGS; not called in descentdirect above)
s_k = x_k - x_old;
y_k = feval(grad_f, x_k) - feval(grad_f, x_old);
if y_k' * s_k <= 0
    return          % skip the update
end
B_k = B_k + ((y_k - B_k * s_k) * (y_k - B_k * s_k)') / ...
      ((y_k - B_k * s_k)' * s_k);
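Finally, one possible driver for Problem 1 (a sketch: the handle names, the iteration limit 1000, and the tolerance 1e-8 are my own choices, not prescribed by the assignment; the gradient and Hessian of the Rosenbrock function are computed by hand):

f         = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
grad_f    = @(x) [ -400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); ...
                    200*(x(2) - x(1)^2) ];
hessian_f = @(x) [ 1200*x(1)^2 - 400*x(2) + 2, -400*x(1); ...
                   -400*x(1),                   200 ];
x0 = [-1.2; 1];                  % the second initial guess from Problem 1
% dir = 2 selects Newton's direction, line = 2 the Wolfe line search
[xmin, xhist, steplength] = descentdirect(f, grad_f, hessian_f, x0, 1000, 1e-8, 2, 2);

Since both terms of the Rosenbrock function are nonnegative and vanish simultaneously only at (1, 1), the exact minimizer is (1, 1) with minimal value 0, which is what the iterates should approach.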
