Chapter 6
UNCONSTRAINED MULTIVARIABLE OPTIMIZATION
TRANSCRIPT
6.1 Function Values Only
6.2 First Derivatives of f (gradient and conjugate direction methods)
6.3 Second Derivatives of f (e.g., Newton’s method)
6.4 Quasi-Newton methods
General Strategy for Gradient Methods

(1) Calculate a search direction $s^k$.
(2) Select a step length $\alpha^k$ in that direction to reduce $f(x)$:

    $x^{k+1} = x^k + \alpha^k s^k = x^k + \Delta x^k$

Steepest Descent Search Direction

    $s^k = -\nabla f(x^k)$    (no need to normalize)

The method terminates at any stationary point. Why? Because $\nabla f(x) = 0$ gives $s^k = 0$.
So the procedure can stop at a saddle point. One must show that $H(x^*)$ is positive definite to confirm a minimum.

Step Length

How to pick $\alpha$:
• analytically
• numerically
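The two-phase strategy above (direction, then step length) can be sketched in a few lines. This is a minimal illustration, not from the slides; a fixed step length stands in for a proper 1-D search, and the quadratic test function is an arbitrary choice:

```python
import numpy as np

def steepest_descent(f, grad, x0, alpha=0.01, tol=1e-8, max_iter=10000):
    """Minimize f by steepest descent with a fixed step length alpha.

    The search direction is s^k = -grad f(x^k); iteration stops when the
    gradient norm falls below tol, i.e., at a stationary point.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * g          # x^{k+1} = x^k + alpha * s^k with s^k = -g
    return x

# Illustrative quadratic bowl: f(x) = x1^2 + 4 x2^2, minimum at the origin
f = lambda x: x[0]**2 + 4 * x[1]**2
grad = lambda x: np.array([2 * x[0], 8 * x[1]])
x_min = steepest_descent(f, grad, [3.0, 2.0])
```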
Analytical Method

How does one minimize a function in a search direction using an analytical method? It means $s$ is fixed and you want to pick $\alpha$, the step length, to minimize $f(x)$. Note that a quadratic approximation gives

    $f(x^k + \alpha s^k) \approx f(x^k) + \nabla f(x^k)^T (\alpha s^k) + \tfrac{1}{2}(\alpha s^k)^T H(x^k)(\alpha s^k)$

Setting the derivative with respect to $\alpha$ to zero,

    $\dfrac{d f(x^k + \alpha s^k)}{d\alpha} = 0 = \nabla f(x^k)^T s^k + \alpha\,(s^k)^T H(x^k)\, s^k$

and solving for $\alpha$:

    $\alpha^k = -\dfrac{\nabla f(x^k)^T s^k}{(s^k)^T H(x^k)\, s^k}$    (6.9)

This yields a minimum of the approximating function.
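Equation (6.9) is easy to exercise numerically. A small sketch follows; the quadratic, its Hessian, and the starting point are illustrative choices, not from the slides:

```python
import numpy as np

def exact_step_length(g, H, s):
    """Optimal step length (Eq. 6.9): alpha = -g^T s / (s^T H s),
    where g = grad f(x^k) and H is the Hessian."""
    return -float(g @ s) / float(s @ H @ s)

# For f(x) = x1^2 + 4 x2^2 the Hessian is diag(2, 8) and grad f(x) = H x
H = np.diag([2.0, 8.0])
x = np.array([3.0, 2.0])
g = H @ x
s = -g                               # steepest-descent direction
alpha = exact_step_length(g, H, s)
x_new = x + alpha * s
# For an exact line search on a quadratic, the new gradient is
# orthogonal to the search direction: grad f(x_new)^T s = 0.
```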
Numerical Method

Use a coarse search first:
(1) Fixed ($\alpha$ = 1) or variable ($\alpha$ = 1, 2, ½, etc.)

Options for optimizing $\alpha$:
(1) Use interpolation, such as quadratic or cubic
(2) Region elimination (golden section search)
(3) Newton, secant, quasi-Newton
(4) Random
(5) Analytical optimization

(1), (3), and (5) are preferred. However, it may not be desirable to optimize $\alpha$ exactly (better to generate new search directions).
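As an illustration of option (2), region elimination, here is a minimal golden-section search for the step length. The implementation and the test function in the comment are my own sketch, not code from the slides:

```python
import math

def golden_section(phi, a, b, tol=1e-8):
    """Golden-section search for the minimizer of a unimodal phi on [a, b].

    Each pass shrinks the bracket by the inverse golden ratio (~0.618),
    so the minimizer's location is pinned down geometrically fast.
    """
    invphi = (math.sqrt(5) - 1) / 2           # 1/golden ratio, ~0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):                   # minimum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                                 # minimum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

# e.g. phi(alpha) = f(x^k + alpha * s^k) for a fixed direction s^k
alpha_star = golden_section(lambda a: (a - 0.3)**2, 0.0, 1.0)
```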
Suppose we calculate the gradient at the point $x^T = [2\ \ 2]$.
Termination Criteria

For minimization you can use up to three criteria for termination:

(1) $\dfrac{|f(x^{k+1}) - f(x^k)|}{|f(x^k)|} < \varepsilon_1$, except when $f(x^k) = 0$; then use $|f(x^{k+1}) - f(x^k)| < \varepsilon_2$

(2) $\left|\dfrac{\Delta x_i^k}{x_i^k}\right| < \varepsilon_3$, except when $x_i^k = 0$; then use $|\Delta x_i^k| < \varepsilon_4$

(3) $\|\nabla f(x^k)\| < \varepsilon_5$ or $|s_i^k| < \varepsilon_6$

Two failure modes (sketched on the slide as plots of $f(x)$ vs. $x$) motivate combining the criteria:
• Big change in $f(x)$ but little change in $x$: the code will stop prematurely if $x$ is the sole criterion.
• Big change in $x$ but little change in $f(x)$: the code will stop prematurely if $f(x)$ is the sole criterion.
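A combined implementation of the three criteria might look like this; the tolerance values are illustrative defaults, not prescribed by the slides:

```python
import numpy as np

def converged(f_old, f_new, x_old, x_new, grad_new,
              eps=(1e-8, 1e-10, 1e-8, 1e-10, 1e-6)):
    """Combined termination test using all three slide criteria.

    eps = (eps1, eps2, eps3, eps4, eps5) are illustrative tolerances.
    All three tests must pass, guarding against the two failure modes
    of using f(x) alone or x alone.
    """
    e1, e2, e3, e4, e5 = eps
    # (1) relative change in f, or absolute change when f_old == 0
    df = abs(f_new - f_old)
    ok_f = df < e2 if f_old == 0 else df / abs(f_old) < e1
    # (2) relative change in each x_i, or absolute change where x_i == 0
    dx = np.abs(x_new - x_old)
    denom = np.where(x_old == 0, 1.0, np.abs(x_old))
    tol = np.where(x_old == 0, e4, e3)
    ok_x = bool(np.all(dx / denom < tol))
    # (3) gradient norm
    ok_g = np.linalg.norm(grad_new) < e5
    return ok_f and ok_x and ok_g
```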
Conjugate Search Directions

• Improvement over the gradient method for general quadratic functions
• Basis for many NLP techniques
• Two search directions are conjugate relative to $Q$ if

    $(s^{(i)})^T Q\, s^{(j)} = 0$

• To minimize $f(x)$, $x$ an $n \times 1$ vector, when $H$ is a constant matrix ($= Q$), you are guaranteed to reach the optimum in $n$ conjugate-direction stages if you minimize exactly at each stage (one-dimensional search).
Conjugate Gradient Method

Step 1. At $x^0$ calculate $f(x^0)$. Let

    $s^0 = -\nabla f(x^0)$

Step 2. Save $\nabla f(x^0)$ and compute

    $x^1 = x^0 + \alpha^0 s^0$

by minimizing $f(x)$ with respect to $\alpha$ in the $s^0$ direction (i.e., carry out a unidimensional search for $\alpha^0$).

Step 3. Calculate $f(x^1)$ and $\nabla f(x^1)$. The new search direction is a linear combination of $s^0$ and $\nabla f(x^1)$:

    $s^1 = -\nabla f(x^1) + s^0 \dfrac{\nabla f(x^1)^T \nabla f(x^1)}{\nabla f(x^0)^T \nabla f(x^0)}$

For the kth iteration the relation is

    $s^{k+1} = -\nabla f(x^{k+1}) + s^k \dfrac{\nabla f(x^{k+1})^T \nabla f(x^{k+1})}{\nabla f(x^k)^T \nabla f(x^k)}$    (6.6)

For a quadratic function it can be shown that these successive search directions are conjugate. After $n$ iterations ($k = n$), the quadratic function is minimized. For a nonquadratic function, the procedure cycles again with $x^{n+1}$ becoming $x^0$.

Step 4. Test for convergence to the minimum of $f(x)$. If convergence is not attained, return to step 3.

Step n. Terminate the algorithm when $\|\nabla f(x^k)\|$ is less than some prescribed tolerance.
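For a quadratic written as $f(x) = \tfrac{1}{2}x^T Q x - b^T x$, the steps above combine with the exact step length of Eq. (6.9) into a short routine. This is a sketch; the test matrix below is an arbitrary symmetric positive definite example:

```python
import numpy as np

def cg_quadratic(Q, b, x0):
    """Conjugate gradient (Fletcher-Reeves weights, Eq. 6.6) for
    f(x) = 0.5 x^T Q x - b^T x, using the exact step length for a
    quadratic, so the minimizer Q^{-1} b is reached in <= n stages."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x - b                        # gradient of f
    s = -g                               # step 1: steepest descent
    for _ in range(len(b)):
        if np.linalg.norm(g) < 1e-12:
            break
        alpha = -(g @ s) / (s @ Q @ s)   # exact 1-D minimization (Eq. 6.9)
        x = x + alpha * s
        g_new = Q @ x - b
        beta = (g_new @ g_new) / (g @ g) # weighting factor in Eq. (6.6)
        s = -g_new + beta * s            # new conjugate direction
        g = g_new
    return x

# Arbitrary SPD test problem:
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star = cg_quadratic(Q, b, [0.0, 0.0])
```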
Example: Conjugate Gradient

Minimize $f = (x_1 - 3)^2 + 9(x_2 - 5)^2$ using the method of conjugate gradients with $x_1^0 = 1$ and $x_2^0 = 1$ as an initial point. In vector notation,

    $x^0 = [1\ \ 1]^T, \qquad \nabla f(x^0) = [-4\ \ {-72}]^T$

Steepest Descent Step (1-D Search)

For steepest descent,

    $s^0 = -\nabla f(x^0) = [4\ \ 72]^T$

so $x^1 = x^0 + \alpha^0 s^0 = [1\ \ 1]^T + \alpha^0 [4\ \ 72]^T$. The objective function can be expressed as a function of $\alpha^0$ as follows:

    $f(\alpha^0) = (4\alpha^0 - 2)^2 + 9(72\alpha^0 - 4)^2$

Minimizing $f(\alpha^0)$, we obtain $f = 3.1594$ at $\alpha^0 = 0.0555$. Hence

    $x^1 = [1.223\ \ 5.011]^T$
Calculate Weighting of Previous Step

The new gradient can now be determined as

    $\nabla f(x^1) = [-3.554\ \ 0.197]^T$

and $\beta^0$ can be computed as

    $\beta^0 = \dfrac{(3.554)^2 + (0.197)^2}{(4)^2 + (72)^2} = 0.00244$

Generate New (Conjugate) Search Direction

    $s^1 = [3.554\ \ {-0.197}]^T + 0.00244\,[4\ \ 72]^T = [3.564\ \ {-0.022}]^T$

and $x^2 = x^1 + \alpha^1 s^1$.

One-Dimensional Search

Solving for $\alpha^1$ as before [i.e., expressing $f(x^1 + \alpha^1 s^1)$ as a function of $\alpha^1$ and minimizing with respect to $\alpha^1$] yields $f = 5.91 \times 10^{-10}$ at $\alpha^1 = 0.4986$. Hence

    $x^2 = [1.223\ \ 5.011]^T + 0.4986\,[3.564\ \ {-0.022}]^T = [3.0000\ \ 5.0000]^T$

which is the optimum (in 2 steps, which agrees with the theory).
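The two steps of this worked example can be checked numerically. The rewrite of $f$ in the form $\tfrac{1}{2}x^T Q x - b^T x$ (plus a constant) with $Q = \mathrm{diag}(2, 18)$ and $b = (6, 90)$ is my own restatement, not from the slides:

```python
import numpy as np

# f = (x1 - 3)^2 + 9(x2 - 5)^2 has gradient Q x - b with:
Q = np.diag([2.0, 18.0])
b = np.array([6.0, 90.0])

x = np.array([1.0, 1.0])
g = Q @ x - b                    # gives (-4, -72) at x^0, as on the slide
s = -g
for _ in range(2):               # theory: 2 conjugate steps for n = 2
    alpha = -(g @ s) / (s @ Q @ s)        # exact 1-D search
    x = x + alpha * s
    g_new = Q @ x - b
    s = -g_new + ((g_new @ g_new) / (g @ g)) * s  # Eq. (6.6)
    g = g_new
# x should now be the optimum (3, 5)
```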
Fletcher-Reeves Conjugate Gradient Method

Let

    $s^0 = -\nabla f(x^0)$
    $s^1 = -\nabla f(x^1) + \beta_1 s^0$
    $s^2 = -\nabla f(x^2) + \beta_2 s^1$

The $\beta_k$ are chosen to make $(s^{k+1})^T H s^k = 0$ (conjugate directions).

Derivation (let $H^k = H$): from the quadratic approximation, the gradients at successive points satisfy

    $\nabla f(x^{k+1}) - \nabla f(x^k) = H(x^{k+1} - x^k) = \alpha^k H s^k$
so

    $\alpha^k H s^k = \nabla f(x^{k+1}) - \nabla f(x^k)$
    $(s^{k+1})^T H s^k = (s^{k+1})^T [\nabla f(x^{k+1}) - \nabla f(x^k)] / \alpha^k$

Using the definition of conjugate directions, $(s^{k+1})^T H s^k = 0$:

    $[-\nabla f(x^{k+1}) + \beta_{k+1} s^k]^T [\nabla f(x^{k+1}) - \nabla f(x^k)] = 0$

With $\nabla f(x^{k+1})^T \nabla f(x^k) = 0$ and $\nabla f(x^{k+1})^T s^k = 0$ (which hold for exact line searches on a quadratic), and solving for the weighting factor:

    $\beta_{k+1} = \dfrac{\nabla f(x^{k+1})^T \nabla f(x^{k+1})}{\nabla f(x^k)^T \nabla f(x^k)}, \qquad s^{k+1} = -\nabla f(x^{k+1}) + \beta_{k+1} s^k$
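For a nonquadratic $f$ the same update applies with a numerical 1-D search and periodic restarts. A rough sketch follows; the grid-sampled line search is a crude stand-in for the interpolation or region-elimination methods discussed earlier, and the test function is the chapter's quadratic example:

```python
import numpy as np

def fletcher_reeves(f, grad, x0, n_restart, tol=1e-8, max_cycles=50):
    """Fletcher-Reeves CG for general f, restarting from steepest
    descent every n_restart steps (the nonquadratic cycling rule).

    The 1-D search samples a fixed grid of step lengths: an
    illustrative stand-in, not an exact minimization.
    """
    alphas = np.logspace(-4, 1, 200)          # candidate step lengths
    x = np.asarray(x0, dtype=float)
    for _ in range(max_cycles):
        g = grad(x)
        s = -g                                # restart: steepest descent
        for _ in range(n_restart):
            if np.linalg.norm(g) < tol:
                return x
            trials = [f(x + a * s) for a in alphas]
            x = x + alphas[int(np.argmin(trials))] * s
            g_new = grad(x)
            beta = (g_new @ g_new) / (g @ g)  # FR weighting factor
            s = -g_new + beta * s
            g = g_new
    return x

f = lambda x: (x[0] - 3)**2 + 9 * (x[1] - 5)**2
grad = lambda x: np.array([2 * (x[0] - 3), 18 * (x[1] - 5)])
x_min = fletcher_reeves(f, grad, [1.0, 1.0], n_restart=2)
```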
Linear vs. Quadratic Approximation of f(x)

    $f(x) \approx f(x^k) + \nabla f(x^k)^T (x - x^k) + \tfrac{1}{2}(x - x^k)^T H(x^k)(x - x^k)$

(1) Using a linear approximation of $f(x)$:

    $\dfrac{d f(x)}{d x} = \nabla f(x^k)$, so setting it to zero gives $0 = \nabla f(x^k)$ and we cannot solve for $\Delta x = x - x^k$!

(2) Using a quadratic approximation for $f(x)$:

    $\dfrac{d f(x)}{d x} = 0 = \nabla f(x^k) + H(x^k)(x^{k+1} - x^k)$

Newton's method solves one of these:

    $0 = \nabla f(x^k) + H(x^k)(x^{k+1} - x^k)$    (simultaneous equation-solving)
    or $x^{k+1} = x^k - H^{-1}(x^k)\,\nabla f(x^k)$
Note: Both direction and step length are determined.
- Requires second derivatives (the Hessian)
- $H$, $H^{-1}$ must be positive definite (for a minimum) to guarantee convergence
- Iterate if $f(x)$ is not quadratic

Modified Newton's Procedure:

    $x^{k+1} = x^k - \alpha^k [H(x^k)]^{-1}\,\nabla f(x^k)$

$\alpha^k = 1$ for Newton's method. (If $H = I$, you have steepest descent.)

Example

Minimize $f(x) = x_1^2 + 20 x_2^2$ starting at $x^{0T} = [1\ \ 1]$.
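A sketch of the modified Newton procedure applied to this example (with $\alpha = 1$ it is plain Newton's method, and one step suffices for a quadratic):

```python
import numpy as np

def modified_newton(grad, hess, x0, alpha=1.0, tol=1e-10, max_iter=100):
    """Modified Newton: x^{k+1} = x^k - alpha * H(x^k)^{-1} grad f(x^k).

    alpha = 1 recovers Newton's method; replacing H by I would give
    steepest descent. Assumes H(x^k) is positive definite; solve() is
    used rather than forming the inverse explicitly.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * np.linalg.solve(hess(x), g)
    return x

# Example from the slides: f(x) = x1^2 + 20 x2^2, x^0 = [1, 1]
grad = lambda x: np.array([2 * x[0], 40 * x[1]])
hess = lambda x: np.diag([2.0, 40.0])
x_min = modified_newton(grad, hess, [1.0, 1.0])   # one step reaches [0, 0]
```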
Marquardt's Method

If $H(x)$ or $H^{-1}(x)$ is not always positive definite, make it positive definite.

Let $\tilde{H}(x) = H(x) + \beta I$; similarly for $H^{-1}(x)$. Here $\beta$ is a positive constant large enough to shift all the negative eigenvalues of $H(x)$.

Example

At the start of the search, $H(x)$ is evaluated at $x^0$ and found to be

    $H(x^0) = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$

which is not positive definite, as the eigenvalues are $e_1 = 3$, $e_2 = -1$.

Modify $H(x^0)$ to be (with $\beta = 2$)

    $\tilde{H}(x^0) = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} + 2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$

which is positive definite, as the eigenvalues are $e_1 = 5$, $e_2 = 1$.

$\beta$ is adjusted as the search proceeds.
Step 1. Pick $x^0$, the starting point. Let $\varepsilon$ = convergence criterion.

Step 2. Set $k = 0$. Let $\beta^0 = 10^3$.

Step 3. Calculate $\nabla f(x^k)$.

Step 4. Is $\|\nabla f(x^k)\| < \varepsilon$? If yes, terminate. If no, continue.

Step 5. Calculate $s(x^k) = -[H^k + \beta^k I]^{-1}\,\nabla f(x^k)$.

Step 6. Calculate $x^{k+1} = x^k + s(x^k)$.

Step 7. Is $f(x^{k+1}) < f(x^k)$? If yes, go to step 8. If no, go to step 9.

Step 8. Set $\beta^{k+1} = \tfrac{1}{4}\beta^k$ and $k = k + 1$. Go to step 3.

Step 9. Set $\beta^k = 2\beta^k$. Go to step 5.
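The nine steps translate directly into code. This is a sketch, with an inner loop playing the role of the step 9 to step 5 return, tried on the earlier quadratic example:

```python
import numpy as np

def marquardt(f, grad, hess, x0, eps=1e-6, beta0=1e3, max_iter=500):
    """Marquardt's method per the nine steps: the Hessian is shifted by
    beta*I; beta shrinks by 1/4 on a successful step and doubles when
    the step fails to reduce f."""
    x = np.asarray(x0, dtype=float)
    beta = beta0
    n = len(x)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:                              # step 4
            break
        while True:
            s = -np.linalg.solve(hess(x) + beta * np.eye(n), g)  # step 5
            x_new = x + s                                        # step 6
            if f(x_new) < f(x):                                  # step 7
                x, beta = x_new, beta / 4                        # step 8
                break
            beta *= 2                                            # step 9
    return x

# Check on the quadratic example f(x) = x1^2 + 20 x2^2:
f = lambda x: x[0]**2 + 20 * x[1]**2
grad = lambda x: np.array([2 * x[0], 40 * x[1]])
hess = lambda x: np.diag([2.0, 40.0])
x_min = marquardt(f, grad, hess, [1.0, 1.0])
```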
Secant Methods

Recall that for a one-dimensional search the secant method only uses values of $f(x)$ and $f'(x)$:

    $x^{k+1} = x^k - f'(x^k)\,\dfrac{x^k - x^p}{f'(x^k) - f'(x^p)}$

That is, $f'(x)$ is approximated by a straight line (the secant), whose slope stands in for $f''(x)$. Hence it is called a "quasi-Newton" method.

The basic idea (for a quadratic function):

    $\nabla f(x) = 0 = \nabla f(x^k) + H(x - x^k)$, or $x = x^k - H^{-1}\,\nabla f(x^k)$

Pick two points to start ($x^k$ = reference point):

    $\nabla f(x^1) - \nabla f(x^k) = \tilde{H}(x^1 - x^k)$
    $\nabla f(x^2) - \nabla f(x^k) = \tilde{H}(x^2 - x^k)$
    $\nabla f(x^2) - \nabla f(x^1) = y = \tilde{H}(x^2 - x^1)$
For a non-quadratic function, $\tilde{H}$ would be calculated, after taking a step from $x^k$ to $x^{k+1}$, by solving the secant equations

    $y^k = \tilde{H}\,\Delta x^k$ or $\Delta x^k = \tilde{H}^{-1} y^k$

- An infinite number of candidates exist for $\tilde{H}$ when $n > 1$
- We want to choose $\tilde{H}$ (or $\tilde{H}^{-1}$) close to $H$ (or $H^{-1}$) in some sense. Several methods can be used to update $\tilde{H}$.
• Probably the best update formula is the BFGS update (Broyden-Fletcher-Goldfarb-Shanno), ca. 1970
• BFGS is the basis for the unconstrained optimizer in the Excel Solver
• It does not require inverting the Hessian matrix but approximates the inverse with values of $\nabla f$
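The BFGS update of the inverse-Hessian approximation can be written compactly in the standard product form; the sample $\Delta x$ and $y$ values below are arbitrary:

```python
import numpy as np

def bfgs_inverse_update(Hinv, dx, y):
    """BFGS update of the inverse-Hessian approximation Hinv.

    dx = x^{k+1} - x^k and y = grad f(x^{k+1}) - grad f(x^k).
    The updated matrix satisfies the secant equation Hinv_new @ y = dx,
    built only from gradient values (no Hessian inversion needed).
    """
    rho = 1.0 / float(y @ dx)
    I = np.eye(len(dx))
    V = I - rho * np.outer(dx, y)
    return V @ Hinv @ V.T + rho * np.outer(dx, dx)

# Check the secant equation on a sample step:
dx = np.array([1.0, 2.0])          # x^{k+1} - x^k
y = np.array([2.0, 16.0])          # change in gradient
Hinv_new = bfgs_inverse_update(np.eye(2), dx, y)
```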