Chapter 6
UNCONSTRAINED MULTIVARIABLE OPTIMIZATION
TRANSCRIPT
6.1 Function Values Only
6.2 First Derivatives of f (gradient and conjugate direction methods)
6.3 Second Derivatives of f (e.g., Newton’s method)
6.4 Quasi-Newton methods
General Strategy for Gradient Methods

(1) Calculate a search direction $s^k$.
(2) Select a step length $\alpha^k$ in that direction to reduce $f(x)$:

    $x^{k+1} = x^k + \alpha^k s^k = x^k + \Delta x^k$

Steepest Descent Search Direction

    $s^k = -\nabla f(x^k)$    (no need to normalize)

The method terminates at any stationary point. Why? Because $\nabla f(x) = 0$ gives $s^k = 0$.
So the procedure can stop at a saddle point. One must show that $H(x^*)$ is positive definite to confirm a minimum.

Step Length

How to pick $\alpha$:
• analytically
• numerically
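The two-phase strategy above (direction, then step length) can be sketched in a few lines. This is a minimal illustration, not from the slides; a fixed step length stands in for a proper 1-D search, and the quadratic test function is an arbitrary choice:

```python
import numpy as np

def steepest_descent(f, grad, x0, alpha=0.01, tol=1e-8, max_iter=10000):
    """Minimize f by steepest descent with a fixed step length alpha.

    The search direction is s^k = -grad f(x^k); iteration stops when the
    gradient norm falls below tol, i.e., at a stationary point.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * g          # x^{k+1} = x^k + alpha * s^k with s^k = -g
    return x

# Illustrative quadratic bowl: f(x) = x1^2 + 4 x2^2, minimum at the origin
f = lambda x: x[0]**2 + 4 * x[1]**2
grad = lambda x: np.array([2 * x[0], 8 * x[1]])
x_min = steepest_descent(f, grad, [3.0, 2.0])
```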
Analytical Method

How does one minimize a function in a search direction using an analytical method? It means $s$ is fixed and you want to pick $\alpha$, the step length, to minimize $f(x)$. Note that a quadratic approximation gives

    $f(x^k + \alpha s^k) \approx f(x^k) + \nabla f(x^k)^T (\alpha s^k) + \tfrac{1}{2}(\alpha s^k)^T H(x^k)(\alpha s^k)$

Setting the derivative with respect to $\alpha$ to zero,

    $\dfrac{d f(x^k + \alpha s^k)}{d\alpha} = 0 = \nabla f(x^k)^T s^k + \alpha\,(s^k)^T H(x^k)\, s^k$

and solving for $\alpha$:

    $\alpha^k = -\dfrac{\nabla f(x^k)^T s^k}{(s^k)^T H(x^k)\, s^k}$    (6.9)

This yields a minimum of the approximating function.
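Equation (6.9) is easy to exercise numerically. A small sketch follows; the quadratic, its Hessian, and the starting point are illustrative choices, not from the slides:

```python
import numpy as np

def exact_step_length(g, H, s):
    """Optimal step length (Eq. 6.9): alpha = -g^T s / (s^T H s),
    where g = grad f(x^k) and H is the Hessian."""
    return -float(g @ s) / float(s @ H @ s)

# For f(x) = x1^2 + 4 x2^2 the Hessian is diag(2, 8) and grad f(x) = H x
H = np.diag([2.0, 8.0])
x = np.array([3.0, 2.0])
g = H @ x
s = -g                               # steepest-descent direction
alpha = exact_step_length(g, H, s)
x_new = x + alpha * s
# For an exact line search on a quadratic, the new gradient is
# orthogonal to the search direction: grad f(x_new)^T s = 0.
```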
Numerical Method

Use a coarse search first:
(1) Fixed ($\alpha$ = 1) or variable ($\alpha$ = 1, 2, ½, etc.)

Options for optimizing $\alpha$:
(1) Use interpolation, such as quadratic or cubic
(2) Region elimination (golden section search)
(3) Newton, secant, quasi-Newton
(4) Random
(5) Analytical optimization

(1), (3), and (5) are preferred. However, it may not be desirable to optimize $\alpha$ exactly (better to generate new search directions).
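As an illustration of option (2), region elimination, here is a minimal golden-section search for the step length. The implementation and the test function in the comment are my own sketch, not code from the slides:

```python
import math

def golden_section(phi, a, b, tol=1e-8):
    """Golden-section search for the minimizer of a unimodal phi on [a, b].

    Each pass shrinks the bracket by the inverse golden ratio (~0.618),
    so the minimizer's location is pinned down geometrically fast.
    """
    invphi = (math.sqrt(5) - 1) / 2           # 1/golden ratio, ~0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):                   # minimum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                                 # minimum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

# e.g. phi(alpha) = f(x^k + alpha * s^k) for a fixed direction s^k
alpha_star = golden_section(lambda a: (a - 0.3)**2, 0.0, 1.0)
```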
Suppose we calculate the gradient at the point $x^T = [2\ \ 2]$.
Termination Criteria

For minimization you can use up to three criteria for termination:

(1) $\dfrac{|f(x^{k+1}) - f(x^k)|}{|f(x^k)|} < \varepsilon_1$, except when $f(x^k) = 0$; then use $|f(x^{k+1}) - f(x^k)| < \varepsilon_2$

(2) $\left|\dfrac{\Delta x_i^k}{x_i^k}\right| < \varepsilon_3$, except when $x_i^k = 0$; then use $|\Delta x_i^k| < \varepsilon_4$

(3) $\|\nabla f(x^k)\| < \varepsilon_5$ or $|s_i^k| < \varepsilon_6$

Two failure modes (sketched on the slide as plots of $f(x)$ vs. $x$) motivate combining the criteria:
• Big change in $f(x)$ but little change in $x$: the code will stop prematurely if $x$ is the sole criterion.
• Big change in $x$ but little change in $f(x)$: the code will stop prematurely if $f(x)$ is the sole criterion.
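A combined implementation of the three criteria might look like this; the tolerance values are illustrative defaults, not prescribed by the slides:

```python
import numpy as np

def converged(f_old, f_new, x_old, x_new, grad_new,
              eps=(1e-8, 1e-10, 1e-8, 1e-10, 1e-6)):
    """Combined termination test using all three slide criteria.

    eps = (eps1, eps2, eps3, eps4, eps5) are illustrative tolerances.
    All three tests must pass, guarding against the two failure modes
    of using f(x) alone or x alone.
    """
    e1, e2, e3, e4, e5 = eps
    # (1) relative change in f, or absolute change when f_old == 0
    df = abs(f_new - f_old)
    ok_f = df < e2 if f_old == 0 else df / abs(f_old) < e1
    # (2) relative change in each x_i, or absolute change where x_i == 0
    dx = np.abs(x_new - x_old)
    denom = np.where(x_old == 0, 1.0, np.abs(x_old))
    tol = np.where(x_old == 0, e4, e3)
    ok_x = bool(np.all(dx / denom < tol))
    # (3) gradient norm
    ok_g = np.linalg.norm(grad_new) < e5
    return ok_f and ok_x and ok_g
```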
Conjugate Search Directions

• Improvement over the gradient method for general quadratic functions
• Basis for many NLP techniques
• Two search directions are conjugate relative to $Q$ if

    $(s^{(i)})^T Q\, s^{(j)} = 0$

• To minimize $f(x)$, $x$ an $n \times 1$ vector, when $H$ is a constant matrix ($= Q$), you are guaranteed to reach the optimum in $n$ conjugate-direction stages if you minimize exactly at each stage (one-dimensional search).
Conjugate Gradient Method

Step 1. At $x^0$ calculate $f(x^0)$. Let

    $s^0 = -\nabla f(x^0)$

Step 2. Save $\nabla f(x^0)$ and compute

    $x^1 = x^0 + \alpha^0 s^0$

by minimizing $f(x)$ with respect to $\alpha$ in the $s^0$ direction (i.e., carry out a unidimensional search for $\alpha^0$).

Step 3. Calculate $f(x^1)$ and $\nabla f(x^1)$. The new search direction is a linear combination of $s^0$ and $\nabla f(x^1)$:

    $s^1 = -\nabla f(x^1) + s^0 \dfrac{\nabla f(x^1)^T \nabla f(x^1)}{\nabla f(x^0)^T \nabla f(x^0)}$

For the kth iteration the relation is

    $s^{k+1} = -\nabla f(x^{k+1}) + s^k \dfrac{\nabla f(x^{k+1})^T \nabla f(x^{k+1})}{\nabla f(x^k)^T \nabla f(x^k)}$    (6.6)

For a quadratic function it can be shown that these successive search directions are conjugate. After $n$ iterations ($k = n$), the quadratic function is minimized. For a nonquadratic function, the procedure cycles again with $x^{n+1}$ becoming $x^0$.

Step 4. Test for convergence to the minimum of $f(x)$. If convergence is not attained, return to step 3.

Step n. Terminate the algorithm when $\|\nabla f(x^k)\|$ is less than some prescribed tolerance.
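For a quadratic written as $f(x) = \tfrac{1}{2}x^T Q x - b^T x$, the steps above combine with the exact step length of Eq. (6.9) into a short routine. This is a sketch; the test matrix below is an arbitrary symmetric positive definite example:

```python
import numpy as np

def cg_quadratic(Q, b, x0):
    """Conjugate gradient (Fletcher-Reeves weights, Eq. 6.6) for
    f(x) = 0.5 x^T Q x - b^T x, using the exact step length for a
    quadratic, so the minimizer Q^{-1} b is reached in <= n stages."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x - b                        # gradient of f
    s = -g                               # step 1: steepest descent
    for _ in range(len(b)):
        if np.linalg.norm(g) < 1e-12:
            break
        alpha = -(g @ s) / (s @ Q @ s)   # exact 1-D minimization (Eq. 6.9)
        x = x + alpha * s
        g_new = Q @ x - b
        beta = (g_new @ g_new) / (g @ g) # weighting factor in Eq. (6.6)
        s = -g_new + beta * s            # new conjugate direction
        g = g_new
    return x

# Arbitrary SPD test problem:
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star = cg_quadratic(Q, b, [0.0, 0.0])
```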
Example: Conjugate Gradient

Minimize $f = (x_1 - 3)^2 + 9(x_2 - 5)^2$ using the method of conjugate gradients with $x_1^0 = 1$ and $x_2^0 = 1$ as an initial point. In vector notation,

    $x^0 = [1\ \ 1]^T, \qquad \nabla f(x^0) = [-4\ \ {-72}]^T$

Steepest Descent Step (1-D Search)

For steepest descent,

    $s^0 = -\nabla f(x^0) = [4\ \ 72]^T$

so $x^1 = x^0 + \alpha^0 s^0 = [1\ \ 1]^T + \alpha^0 [4\ \ 72]^T$. The objective function can be expressed as a function of $\alpha^0$ as follows:

    $f(\alpha^0) = (4\alpha^0 - 2)^2 + 9(72\alpha^0 - 4)^2$

Minimizing $f(\alpha^0)$, we obtain $f = 3.1594$ at $\alpha^0 = 0.0555$. Hence

    $x^1 = [1.223\ \ 5.011]^T$
Calculate Weighting of Previous Step

The new gradient can now be determined as

    $\nabla f(x^1) = [-3.554\ \ 0.197]^T$

and $\beta^0$ can be computed as

    $\beta^0 = \dfrac{(3.554)^2 + (0.197)^2}{(4)^2 + (72)^2} = 0.00244$

Generate New (Conjugate) Search Direction

    $s^1 = [3.554\ \ {-0.197}]^T + 0.00244\,[4\ \ 72]^T = [3.564\ \ {-0.022}]^T$

and $x^2 = x^1 + \alpha^1 s^1$.

One-Dimensional Search

Solving for $\alpha^1$ as before [i.e., expressing $f(x^1 + \alpha^1 s^1)$ as a function of $\alpha^1$ and minimizing with respect to $\alpha^1$] yields $f = 5.91 \times 10^{-10}$ at $\alpha^1 = 0.4986$. Hence

    $x^2 = [1.223\ \ 5.011]^T + 0.4986\,[3.564\ \ {-0.022}]^T = [3.0000\ \ 5.0000]^T$

which is the optimum (in 2 steps, which agrees with the theory).
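The two steps of this worked example can be checked numerically. The rewrite of $f$ in the form $\tfrac{1}{2}x^T Q x - b^T x$ (plus a constant) with $Q = \mathrm{diag}(2, 18)$ and $b = (6, 90)$ is my own restatement, not from the slides:

```python
import numpy as np

# f = (x1 - 3)^2 + 9(x2 - 5)^2 has gradient Q x - b with:
Q = np.diag([2.0, 18.0])
b = np.array([6.0, 90.0])

x = np.array([1.0, 1.0])
g = Q @ x - b                    # gives (-4, -72) at x^0, as on the slide
s = -g
for _ in range(2):               # theory: 2 conjugate steps for n = 2
    alpha = -(g @ s) / (s @ Q @ s)        # exact 1-D search
    x = x + alpha * s
    g_new = Q @ x - b
    s = -g_new + ((g_new @ g_new) / (g @ g)) * s  # Eq. (6.6)
    g = g_new
# x should now be the optimum (3, 5)
```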
Fletcher-Reeves Conjugate Gradient Method

Let

    $s^0 = -\nabla f(x^0)$
    $s^1 = -\nabla f(x^1) + \beta_1 s^0$
    $s^2 = -\nabla f(x^2) + \beta_2 s^1$

The $\beta_k$ are chosen to make $(s^{k+1})^T H s^k = 0$ (conjugate directions).

Derivation (let $H^k = H$): from the quadratic approximation, the gradients at successive points satisfy

    $\nabla f(x^{k+1}) - \nabla f(x^k) = H(x^{k+1} - x^k) = \alpha^k H s^k$
so

    $\alpha^k H s^k = \nabla f(x^{k+1}) - \nabla f(x^k)$
    $(s^{k+1})^T H s^k = (s^{k+1})^T [\nabla f(x^{k+1}) - \nabla f(x^k)] / \alpha^k$

Using the definition of conjugate directions, $(s^{k+1})^T H s^k = 0$:

    $[-\nabla f(x^{k+1}) + \beta_{k+1} s^k]^T [\nabla f(x^{k+1}) - \nabla f(x^k)] = 0$

With $\nabla f(x^{k+1})^T \nabla f(x^k) = 0$ and $\nabla f(x^{k+1})^T s^k = 0$ (which hold for exact line searches on a quadratic), and solving for the weighting factor:

    $\beta_{k+1} = \dfrac{\nabla f(x^{k+1})^T \nabla f(x^{k+1})}{\nabla f(x^k)^T \nabla f(x^k)}, \qquad s^{k+1} = -\nabla f(x^{k+1}) + \beta_{k+1} s^k$
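For a nonquadratic $f$ the same update applies with a numerical 1-D search and periodic restarts. A rough sketch follows; the grid-sampled line search is a crude stand-in for the interpolation or region-elimination methods discussed earlier, and the test function is the chapter's quadratic example:

```python
import numpy as np

def fletcher_reeves(f, grad, x0, n_restart, tol=1e-8, max_cycles=50):
    """Fletcher-Reeves CG for general f, restarting from steepest
    descent every n_restart steps (the nonquadratic cycling rule).

    The 1-D search samples a fixed grid of step lengths: an
    illustrative stand-in, not an exact minimization.
    """
    alphas = np.logspace(-4, 1, 200)          # candidate step lengths
    x = np.asarray(x0, dtype=float)
    for _ in range(max_cycles):
        g = grad(x)
        s = -g                                # restart: steepest descent
        for _ in range(n_restart):
            if np.linalg.norm(g) < tol:
                return x
            trials = [f(x + a * s) for a in alphas]
            x = x + alphas[int(np.argmin(trials))] * s
            g_new = grad(x)
            beta = (g_new @ g_new) / (g @ g)  # FR weighting factor
            s = -g_new + beta * s
            g = g_new
    return x

f = lambda x: (x[0] - 3)**2 + 9 * (x[1] - 5)**2
grad = lambda x: np.array([2 * (x[0] - 3), 18 * (x[1] - 5)])
x_min = fletcher_reeves(f, grad, [1.0, 1.0], n_restart=2)
```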
Linear vs. Quadratic Approximation of f(x)

    $f(x) \approx f(x^k) + \nabla f(x^k)^T (x - x^k) + \tfrac{1}{2}(x - x^k)^T H(x^k)(x - x^k)$

(1) Using a linear approximation of $f(x)$:

    $\dfrac{d f(x)}{d x} = \nabla f(x^k)$, so setting it to zero gives $0 = \nabla f(x^k)$ and we cannot solve for $\Delta x = x - x^k$!

(2) Using a quadratic approximation for $f(x)$:

    $\dfrac{d f(x)}{d x} = 0 = \nabla f(x^k) + H(x^k)(x^{k+1} - x^k)$

Newton's method solves one of these:

    $0 = \nabla f(x^k) + H(x^k)(x^{k+1} - x^k)$    (simultaneous equation-solving)
    or $x^{k+1} = x^k - H^{-1}(x^k)\,\nabla f(x^k)$
Note: Both direction and step length are determined.
- Requires second derivatives (the Hessian)
- $H$, $H^{-1}$ must be positive definite (for a minimum) to guarantee convergence
- Iterate if $f(x)$ is not quadratic

Modified Newton's Procedure:

    $x^{k+1} = x^k - \alpha^k [H(x^k)]^{-1}\,\nabla f(x^k)$

$\alpha^k = 1$ for Newton's method. (If $H = I$, you have steepest descent.)

Example

Minimize $f(x) = x_1^2 + 20 x_2^2$ starting at $x^{0T} = [1\ \ 1]$.
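A sketch of the modified Newton procedure applied to this example (with $\alpha = 1$ it is plain Newton's method, and one step suffices for a quadratic):

```python
import numpy as np

def modified_newton(grad, hess, x0, alpha=1.0, tol=1e-10, max_iter=100):
    """Modified Newton: x^{k+1} = x^k - alpha * H(x^k)^{-1} grad f(x^k).

    alpha = 1 recovers Newton's method; replacing H by I would give
    steepest descent. Assumes H(x^k) is positive definite; solve() is
    used rather than forming the inverse explicitly.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * np.linalg.solve(hess(x), g)
    return x

# Example from the slides: f(x) = x1^2 + 20 x2^2, x^0 = [1, 1]
grad = lambda x: np.array([2 * x[0], 40 * x[1]])
hess = lambda x: np.diag([2.0, 40.0])
x_min = modified_newton(grad, hess, [1.0, 1.0])   # one step reaches [0, 0]
```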
Marquardt's Method

If $H(x)$ or $H^{-1}(x)$ is not always positive definite, make it positive definite.

Let $\tilde{H}(x) = H(x) + \beta I$; similarly for $H^{-1}(x)$. Here $\beta$ is a positive constant large enough to shift all the negative eigenvalues of $H(x)$.

Example

At the start of the search, $H(x)$ is evaluated at $x^0$ and found to be

    $H(x^0) = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$

which is not positive definite, as the eigenvalues are $e_1 = 3$, $e_2 = -1$.

Modify $H(x^0)$ to be (with $\beta = 2$)

    $\tilde{H}(x^0) = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} + 2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$

which is positive definite, as the eigenvalues are $e_1 = 5$, $e_2 = 1$.

$\beta$ is adjusted as the search proceeds.
Step 1. Pick $x^0$, the starting point. Let $\varepsilon$ = convergence criterion.

Step 2. Set $k = 0$. Let $\beta^0 = 10^3$.

Step 3. Calculate $\nabla f(x^k)$.

Step 4. Is $\|\nabla f(x^k)\| < \varepsilon$? If yes, terminate. If no, continue.

Step 5. Calculate $s(x^k) = -[H^k + \beta^k I]^{-1}\,\nabla f(x^k)$.

Step 6. Calculate $x^{k+1} = x^k + s(x^k)$.

Step 7. Is $f(x^{k+1}) < f(x^k)$? If yes, go to step 8. If no, go to step 9.

Step 8. Set $\beta^{k+1} = \tfrac{1}{4}\beta^k$ and $k = k + 1$. Go to step 3.

Step 9. Set $\beta^k = 2\beta^k$. Go to step 5.
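The nine steps translate directly into code. This is a sketch, with an inner loop playing the role of the step 9 to step 5 return, tried on the earlier quadratic example:

```python
import numpy as np

def marquardt(f, grad, hess, x0, eps=1e-6, beta0=1e3, max_iter=500):
    """Marquardt's method per the nine steps: the Hessian is shifted by
    beta*I; beta shrinks by 1/4 on a successful step and doubles when
    the step fails to reduce f."""
    x = np.asarray(x0, dtype=float)
    beta = beta0
    n = len(x)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:                              # step 4
            break
        while True:
            s = -np.linalg.solve(hess(x) + beta * np.eye(n), g)  # step 5
            x_new = x + s                                        # step 6
            if f(x_new) < f(x):                                  # step 7
                x, beta = x_new, beta / 4                        # step 8
                break
            beta *= 2                                            # step 9
    return x

# Check on the quadratic example f(x) = x1^2 + 20 x2^2:
f = lambda x: x[0]**2 + 20 * x[1]**2
grad = lambda x: np.array([2 * x[0], 40 * x[1]])
hess = lambda x: np.diag([2.0, 40.0])
x_min = marquardt(f, grad, hess, [1.0, 1.0])
```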
Secant Methods

Recall that for a one-dimensional search the secant method only uses values of $f(x)$ and $f'(x)$:

    $x^{k+1} = x^k - f'(x^k)\,\dfrac{x^k - x^p}{f'(x^k) - f'(x^p)}$

That is, $f'(x)$ is approximated by a straight line (the secant), whose slope stands in for $f''(x)$. Hence it is called a "quasi-Newton" method.

The basic idea (for a quadratic function):

    $\nabla f(x) = 0 = \nabla f(x^k) + H(x - x^k)$, or $x = x^k - H^{-1}\,\nabla f(x^k)$

Pick two points to start ($x^k$ = reference point):

    $\nabla f(x^1) - \nabla f(x^k) = \tilde{H}(x^1 - x^k)$
    $\nabla f(x^2) - \nabla f(x^k) = \tilde{H}(x^2 - x^k)$
    $\nabla f(x^2) - \nabla f(x^1) = y = \tilde{H}(x^2 - x^1)$
For a non-quadratic function, $\tilde{H}$ would be calculated, after taking a step from $x^k$ to $x^{k+1}$, by solving the secant equations

    $y^k = \tilde{H}\,\Delta x^k$ or $\Delta x^k = \tilde{H}^{-1} y^k$

- An infinite number of candidates exist for $\tilde{H}$ when $n > 1$
- We want to choose $\tilde{H}$ (or $\tilde{H}^{-1}$) close to $H$ (or $H^{-1}$) in some sense. Several methods can be used to update $\tilde{H}$.
• Probably the best update formula is the BFGS update (Broyden-Fletcher-Goldfarb-Shanno), ca. 1970
• BFGS is the basis for the unconstrained optimizer in the Excel Solver
• It does not require inverting the Hessian matrix but approximates the inverse with values of $\nabla f$
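The BFGS update of the inverse-Hessian approximation can be written compactly in the standard product form; the sample $\Delta x$ and $y$ values below are arbitrary:

```python
import numpy as np

def bfgs_inverse_update(Hinv, dx, y):
    """BFGS update of the inverse-Hessian approximation Hinv.

    dx = x^{k+1} - x^k and y = grad f(x^{k+1}) - grad f(x^k).
    The updated matrix satisfies the secant equation Hinv_new @ y = dx,
    built only from gradient values (no Hessian inversion needed).
    """
    rho = 1.0 / float(y @ dx)
    I = np.eye(len(dx))
    V = I - rho * np.outer(dx, y)
    return V @ Hinv @ V.T + rho * np.outer(dx, dx)

# Check the secant equation on a sample step:
dx = np.array([1.0, 2.0])          # x^{k+1} - x^k
y = np.array([2.0, 16.0])          # change in gradient
Hinv_new = bfgs_inverse_update(np.eye(2), dx, y)
```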