Math 523: Numerical Analysis I
Solution of Homework 4. Numerical Optimization
Problem 1. Program the steepest descent and Newton’s methods using the backtracking line search algorithm (using either the Wolfe conditions or the Goldstein conditions). Use them to minimize the Rosenbrock function
F(x, y) = 100(y − x^2)^2 + (1 − x)^2.
Set the initial step size to 1 and print out the step size at each iteration of your algorithms. Test your algorithms with two initial guesses: (1.2, 1.2) and (−1.2, 1). What if you choose step sizes always equal to 1.0? What are the exact minimizer and minimal value? What are the convergence rates of these algorithms?
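For reference in the solution below, the Rosenbrock function, its gradient, and its Hessian can be supplied as Matlab function handles. The handle names (f_rosen, grad_rosen, hess_rosen) are our own and are assumed by the usage example at the end of this solution; the exact minimizer is (1, 1) with minimal value 0.

% Rosenbrock function, gradient, and Hessian
% (x is a 2-vector: x(1) plays the role of x, x(2) of y)
f_rosen    = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
grad_rosen = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); ...
                   200*(x(2) - x(1)^2)];
hess_rosen = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1); ...
                   -400*x(1), 200];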
Problem 2. Implement the BFGS quasi-Newton method with a line search satisfying the Wolfe conditions. Check the condition y_k^T s_k > 0 at each iteration. Use your code to minimize the Rosenbrock function in Problem 1.
Problem 3. Test your algorithms from the previous two problems, with and without line search, for minimizing the function
F(x, y) = x^4 + y^2.
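The corresponding handles (again with names of our own choosing) are straightforward. Note that the Hessian diag(12x^2, 2) is singular at the minimizer (0, 0); a short computation shows the pure Newton step gives x_{k+1} = 2x_k/3 in the first component, i.e. only linear convergence here.

% Problem 3 test function, gradient, and Hessian
f_p3    = @(x) x(1)^4 + x(2)^2;
grad_p3 = @(x) [4*x(1)^3; 2*x(2)];
hess_p3 = @(x) [12*x(1)^2, 0; 0, 2];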
Problem 4. Minimize the following function
F(x, y) = x^4/4 − x^2 + 2x + (y − 1)^2
with the pure Newton’s method (α = 1). Explain why the pure Newton’s method does not converge for this problem (Hint: examine the Hessian matrix). What if you use the line search step length selection rules?
Solution. In this homework, we mainly focus on applying different minimization techniques to some simple test objective functions. From our numerical tests, we can draw some preliminary conclusions:
• When Newton’s method converges, it converges very fast (asymptotically quadratic convergence).
• Newton’s method requires second-order derivatives, which are difficult, if not impossible, to obtain. Furthermore, storing the second derivatives requires O(n^2) storage, where n is the number of variables of the objective function. The steepest descent method and quasi-Newton methods can be used instead.
• The quasi-Newton method is a good compromise between convergence speed and complexity. It usually converges fast, and sometimes converges even without step length control. The drawback is the high storage requirement.
• The steepest descent method usually does not converge without step length control unless we fix the step length α to be sufficiently small. It is a low-complexity, low-storage method. It is the method of last resort in the Matlab function fminunc (unconstrained minimization).
• The initial guess is extremely important for Newton-like methods. How to find a “good enough” initial guess is an interesting question to explore.
• For nonconvex functions, like the one in Problem 4, the Hessian matrix might not always be positive definite. This is crucial for Newton-like methods because if the Hessian is not positive definite, the Newton direction might not even be a descent direction, as discussed in class (see the check sketched after this list).
• The performance of line search algorithms depends on many parameters, such as the condition(s) you choose and the constants in those conditions. Different line search conditions can give different performance for your descent direction methods.
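To illustrate the point about indefinite Hessians: for the Problem 4 function, the Hessian is diag(3x^2 − 2, 2), which is indefinite whenever |x| < sqrt(2/3) ≈ 0.82. A quick check (a sketch of our own, with a handle name hess_p4 introduced here):

% Hessian of F(x,y) = x^4/4 - x^2 + 2x + (y-1)^2 from Problem 4
hess_p4 = @(x) [3*x(1)^2 - 2, 0; 0, 2];
eig(hess_p4([0; 1]))   % returns -2 and 2: indefinite at (0, 1), so the
                       % Newton direction there need not be a descent direction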
Here is the Matlab code for general descent direction methods:
function [xmin, xhist, steplength] = ...
descentdirect(f, grad_f, hessian_f, x0, maxit, tol, dir, line)
%--------------------------------------------------------------------------
% General descent direction method
%--------------------------------------------------------------------------
% f:         objective function
% grad_f:    gradient of f
% hessian_f: Hessian matrix of f
% x0:        initial guess
% maxit:     maximum number of iterations
% tol:       tolerance on the gradient norm
% dir:       search direction: 1 = steepest descent, 2 = Newton, 3 = quasi-Newton
% line:      step length rule: 1 = fixed step alpha = 1, 2 = Wolfe line search
% Outputs:   xmin (approximate minimizer), xhist (iterate history),
%            steplength (step length history)
%--------------------------------------------------------------------------
% Parameter
%--------------------------------------------------------------------------
c_1 = 10^-4; c_2 = 0.9;
%c_1 = 0.4; c_2 = 0.6;
alpha_max = 16;
%--------------------------------------------------------------------------
% Initialization
%--------------------------------------------------------------------------
i = 1; x_k = x0; stop = 1; xsize = length(x0); B_k = eye(xsize);
xhist = zeros(xsize,maxit); xhist(:,i) = x_k;
steplength = zeros(maxit,1);
%--------------------------------------------------------------------------
while stop && i < maxit
%--------------------------------------------------------------------
% Search Direction
%--------------------------------------------------------------------
if (dir == 1) % Steepest descent direction
p_k = steepdir(grad_f,x_k);
elseif (dir == 2) % Newton's direction
p_k = newtondir(grad_f,hessian_f,x_k);
elseif (dir == 3) % Quasi-Newton direction
p_k = qnewtondir(grad_f,B_k,x_k);
end
%--------------------------------------------------------------------
% Step Length
%--------------------------------------------------------------------
if (line == 1) % No stepsize control
alpha = 1;
elseif (line == 2) % Wolfe condition
alpha = linesearch(f, grad_f, p_k, x_k, c_1, c_2, alpha_max);
end
steplength(i) = alpha;
%--------------------------------------------------------------------
% Update
%--------------------------------------------------------------------
x_old = x_k;
x_k = x_k + alpha * p_k;
i = i + 1;
xhist(:,i) = x_k;
if (norm(grad_f(x_k)) < tol) || (norm(x_k - x_old) < 1e-12)
stop = 0;
end
%--------------------------------------------------------------------
% Updating Quasi-Newton matrix
%--------------------------------------------------------------------
if (dir == 3) % Quasi-Newton direction
B_k = bfgs(x_k,x_old,B_k,grad_f);
end
end
xmin = x_k;
steplength = steplength(1:i);
xhist = xhist(:,1:i);
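The helper linesearch called above is not reproduced in this write-up. A minimal backtracking sketch of our own, enforcing only the sufficient-decrease (Armijo) part of the Wolfe conditions, could look as follows; a full implementation would also use c_2 for the curvature condition and alpha_max to bound the trial steps.

function alpha = linesearch(f, grad_f, p_k, x_k, c_1, c_2, alpha_max)
% Backtracking line search (sketch): halve alpha until the Armijo
% condition f(x + alpha*p) <= f(x) + c_1*alpha*grad'*p holds.
% c_2 and alpha_max are kept only for interface compatibility here.
alpha = 1;                          % initial step size, as required
f_k = feval(f, x_k);
slope = feval(grad_f, x_k)' * p_k;  % directional derivative (< 0 for descent)
while feval(f, x_k + alpha * p_k) > f_k + c_1 * alpha * slope
alpha = alpha / 2;                  % backtrack
if alpha < 1e-12, break; end        % guard against stagnation
end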
%--------------------------------------------------------------------
function p_k = steepdir(grad_f,x_k)
% Steepest descent direction
p_k = - feval(grad_f, x_k);
%--------------------------------------------------------------------
function p_k = newtondir(grad_f,hessian_f,x_k)
% Newton's direction
grad_f_k = feval(grad_f, x_k);
hessian_f_k = feval(hessian_f, x_k);
p_k = - hessian_f_k\grad_f_k;
%--------------------------------------------------------------------
function p_k = qnewtondir(grad_f,B_k,x_k)
% Quasi-Newton direction
grad_f_k = feval(grad_f, x_k);
p_k = -B_k\grad_f_k;
%--------------------------------------------------------------------
function B_k = bfgs(x_k,x_old,B_k,grad_f)
% BFGS
s_k = x_k - x_old;
y_k = feval(grad_f, x_k) - feval(grad_f, x_old);
if y_k' * s_k <= 0   % skip the update to preserve positive definiteness of B_k
return
end
B_k = B_k - (B_k * s_k * s_k' * B_k) / (s_k' * B_k * s_k) + ...
(y_k * y_k') / (y_k' * s_k);
%--------------------------------------------------------------------
function B_k = sr1(x_k,x_old,B_k,grad_f)
% SR1 (symmetric rank-one) update, provided as an alternative to bfgs
% (not called by the driver above)
s_k = x_k - x_old;
y_k = feval(grad_f, x_k) - feval(grad_f, x_old);
if y_k' * s_k <= 0   % same safeguard as in bfgs
return
end
B_k = B_k + ((y_k - B_k * s_k) * (y_k - B_k * s_k)') / ...
((y_k - B_k * s_k)' * s_k);
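Finally, a usage example of our own (assuming the Rosenbrock handles f_rosen, grad_rosen, hess_rosen sketched under Problem 1): running Newton’s method with the Wolfe line search from the initial guess (−1.2, 1).

x0 = [-1.2; 1];
[xmin, xhist, steplength] = descentdirect(f_rosen, grad_rosen, ...
hess_rosen, x0, 200, 1e-8, 2, 2);  % dir = 2 (Newton), line = 2 (Wolfe)
disp(xmin')                        % should approach the exact minimizer (1, 1)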