Intro to Optimization
MATH-415 Numerical Analysis
Optimization
• Optimization is finding values of the variables that minimize or maximize the objective function while (in general) satisfying some constraints
• Some slides from the Princeton University course COS-323 are used here: https://www.cs.princeton.edu/courses/archive/fall09/cos323/syll.html
Different Kinds of Optimization
[Figure from: Optimization Technology Center, http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/]
Optimization in 1-D
• Let us have a function of one variable, $f : [a, b] \to \mathbb{R}$, with $a, b \in \mathbb{R}$
• Our task is to find the minimum (or maximum) of this function
Optimization in 1-D
• Look for analogies to bracketing in the bisection method for solving nonlinear equations
• What does it mean to bracket a minimum? Three points $(x_{\text{left}}, f(x_{\text{left}}))$, $(x_{\text{mid}}, f(x_{\text{mid}}))$, $(x_{\text{right}}, f(x_{\text{right}}))$ with
  $x_{\text{left}} < x_{\text{mid}} < x_{\text{right}}$, $\quad f(x_{\text{mid}}) < f(x_{\text{left}})$, $\quad f(x_{\text{mid}}) < f(x_{\text{right}})$
Optimization in 1-D
• Once we have these properties, there is at least one local minimum between $x_{\text{left}}$ and $x_{\text{right}}$
• Establishing a bracket initially (a code sketch follows this list):
  Given $x_{\text{initial}}$ and an increment
  Evaluate $f(x_{\text{initial}})$, $f(x_{\text{initial}} + \text{increment})$
  If decreasing, step until you find an increase
  Else, step in the opposite direction until you find an increase
  Grow the increment at each step
• For maximization: substitute –f for f
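A minimal Python sketch of the bracketing procedure above; the starting point, initial increment, and growth factor are illustrative choices:

```python
def bracket_minimum(f, x0, step=1e-2, grow=2.0, max_iter=100):
    """Return (x_left, x_mid, x_right) bracketing a local minimum of f."""
    a, fa = x0, f(x0)
    b, fb = x0 + step, f(x0 + step)
    if fb > fa:                      # going uphill: reverse direction
        a, b, fb, step = b, a, fa, -step
    for _ in range(max_iter):
        step *= grow                 # grow the increment at each step
        c, fc = b + step, f(b + step)
        if fc > fb:                  # f increased again: bracket established
            return (a, b, c) if a < c else (c, b, a)
        a, b, fb = b, c, fc
    raise RuntimeError("no bracket found; f may be monotonic")

# Example: bracket the minimum of (x - 3)^2 starting from x = 0
print(bracket_minimum(lambda x: (x - 3)**2, 0.0))
```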
Optimization in 1-D
• Strategy: evaluate the function at some new point $x_{\text{new}}$ inside the bracket
[Figure: bracket points $(x_{\text{left}}, f(x_{\text{left}}))$, $(x_{\text{mid}}, f(x_{\text{mid}}))$, $(x_{\text{right}}, f(x_{\text{right}}))$ and the new point $(x_{\text{new}}, f(x_{\text{new}}))$]
Optimization in 1-D
• Strategy: evaluate the function at some $x_{\text{new}}$. Here, the new “bracket” points are $x_{\text{new}}$, $x_{\text{mid}}$, $x_{\text{right}}$
Optimization in 1-D
• Strategy: evaluate the function at some $x_{\text{new}}$. Here, the new “bracket” points are $x_{\text{left}}$, $x_{\text{new}}$, $x_{\text{mid}}$
Optimization in 1-D
• Unlike with the bisection method for solving nonlinear equations, we can’t always guarantee that the interval will be reduced by a factor of 2
• Let’s find the optimal place for $x_{\text{mid}}$, relative to left and right, that will guarantee the same factor of reduction regardless of outcome
Optimization in 1-D: Golden Section Method
• $x_1$ and $x_2$ shall be selected in such a way that $\frac{b-a}{b-x_1} = \frac{b-a}{x_2-a} = \varphi \approx 1.618$
• $\varphi$ is the golden section (the positive root of the equation $\varphi^2 - \varphi - 1 = 0$: $\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618$)
Optimization in 1-D: Golden Section Method
• $x_1 = b - \frac{b-a}{\varphi}$ performs the golden section of $[a, x_2]$
• $x_2 = a + \frac{b-a}{\varphi}$ performs the golden section of $[x_1, b]$
Optimization in 1-D: Golden Section Method: Algorithm (min)
• Localize the interval $[a, b]$ to bracket the min and determine $\varepsilon$, the accuracy of a solution
• Find $x_1 = b - \frac{b-a}{\varphi}$; $x_2 = a + \frac{b-a}{\varphi}$; $y_1 = f(x_1)$; $y_2 = f(x_2)$
• If $y_1 > y_2$ then $a = x_1$; $x_1 = x_2$; $y_1 = y_2$; $x_2 = a + \frac{b-a}{\varphi}$; $y_2 = f(x_2)$
• else $b = x_2$; $x_2 = x_1$; $y_2 = y_1$; $x_1 = b - \frac{b-a}{\varphi}$; $y_1 = f(x_1)$
• If $b - a < \varepsilon$ then $x = \frac{a+b}{2}$ is our solution; otherwise repeat from the comparison step (a Python sketch follows below)
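A runnable Python sketch of the minimization loop above; the update rules and stopping test follow the algorithm, while the test function and tolerance are illustrative:

```python
import math

PHI = (1 + math.sqrt(5)) / 2        # golden section: root of phi^2 - phi - 1 = 0

def golden_min(f, a, b, eps=1e-8):
    """Golden section search for a minimum of f on a bracketing interval [a, b]."""
    x1 = b - (b - a) / PHI
    x2 = a + (b - a) / PHI
    y1, y2 = f(x1), f(x2)
    while b - a >= eps:
        if y1 > y2:                 # minimum lies in [x1, b]
            a, x1, y1 = x1, x2, y2  # old x2 becomes the new x1
            x2 = a + (b - a) / PHI
            y2 = f(x2)
        else:                       # minimum lies in [a, x2]
            b, x2, y2 = x2, x1, y1  # old x1 becomes the new x2
            x1 = b - (b - a) / PHI
            y1 = f(x1)
    return (a + b) / 2

# Example: minimum of (x - 3)^2 on [0, 10]
print(golden_min(lambda x: (x - 3)**2, 0.0, 10.0))  # ~3.0
```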
Optimization in 1-D: Golden Section Method: Algorithm (max)
• Localize the interval $[a, b]$ to bracket the max and determine $\varepsilon$, the accuracy of a solution
• Find $x_1 = b - \frac{b-a}{\varphi}$; $x_2 = a + \frac{b-a}{\varphi}$; $y_1 = f(x_1)$; $y_2 = f(x_2)$
• If $y_1 < y_2$ then $a = x_1$; $x_1 = x_2$; $y_1 = y_2$; $x_2 = a + \frac{b-a}{\varphi}$; $y_2 = f(x_2)$
• else $b = x_2$; $x_2 = x_1$; $y_2 = y_1$; $x_1 = b - \frac{b-a}{\varphi}$; $y_1 = f(x_1)$
• If $b - a < \varepsilon$ then $x = \frac{a+b}{2}$ is our solution; otherwise repeat from the comparison step
Optimization in 1-D: Golden Section Method: Analysis
• The golden section method is one of the classical methods of 1-D optimization
• Like the bisection method for solving nonlinear equations, it has linear convergence: each step narrows the bracketing interval in the golden proportion, a factor of $1/\varphi \approx 0.618$
Error Tolerance
• Around a minimum the derivative is 0, so the Taylor expansion gives
  $f(x + \Delta x) = f(x) + \tfrac{1}{2} f''(x)\,\Delta x^2 + \ldots$
  The computed $f(x + \Delta x)$ becomes indistinguishable from $f(x)$ once
  $\tfrac{1}{2} f''(x)\,\Delta x^2 \sim \varepsilon_{\text{machine}}\, f(x)$, i.e. $\Delta x \sim \sqrt{\varepsilon_{\text{machine}}}$
• Rule of thumb: pointless to ask for more accuracy than $\sqrt{\varepsilon_{\text{machine}}}$
  Can use double precision if you want a single-precision result (and/or have single-precision data)
Faster 1-D Optimization
• Trade off super-linear convergence for worse robustness
  Combine with Golden Section search for safety
• Usual bag of tricks (a sketch of the parabola fit follows below):
  Fit a parabola through 3 points, find its minimum
  Compute derivatives as well as positions, fit a cubic
  Use second derivatives: Newton
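A sketch of the first trick: the vertex of the parabola through three points gives the next trial point. This is a single step of successive parabolic interpolation, not a full safeguarded method such as Brent’s:

```python
def parabola_min(x1, y1, x2, y2, x3, y3):
    """Abscissa of the vertex of the parabola through three points.
    Assumes the points are not collinear (denominator nonzero)."""
    num = (x2 - x1)**2 * (y2 - y3) - (x2 - x3)**2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    return x2 - 0.5 * num / den

# Exact for a true parabola: three samples of (x - 3)^2
print(parabola_min(0, 9, 2, 1, 5, 4))  # -> 3.0
```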
Newton’s Method
• At each step: $x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$ (a code sketch follows below)
• Requires 1st and 2nd derivatives
• Quadratic convergence
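A minimal sketch of the iteration above, assuming the first and second derivatives are supplied as callables:

```python
def newton_min(fprime, fsecond, x0, tol=1e-10, max_iter=50):
    """Newton's method for 1-D optimization: iterate toward f'(x) = 0."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:          # converged
            return x
    raise RuntimeError("Newton iteration did not converge")

# Example: f(x) = x^4 - 2x^2 has a minimum at x = 1
print(newton_min(lambda x: 4*x**3 - 4*x,    # f'
                 lambda x: 12*x**2 - 4,     # f''
                 0.8))                      # ~1.0
```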
Multi-Dimensional Optimization (Finding min/max of functions of multiple variables)
• Important in many areas:
  Fitting a model to measured data
  Finding the best design in some parameter space
• Hard in general:
  Weird shapes: multiple extrema, saddles, curved or elongated valleys, etc.
  Can’t bracket
Newton’s Method in Multiple Dimensions
• Replace the 1st derivative with the gradient and the 2nd derivative with the Hessian matrix:
$\nabla f(x, y) = \begin{pmatrix} \partial f/\partial x \\ \partial f/\partial y \end{pmatrix}, \qquad H = \begin{pmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x\,\partial y \\ \partial^2 f/\partial x\,\partial y & \partial^2 f/\partial y^2 \end{pmatrix}$
Newton’s Method in Multiple Dimensions
• So, at each step: $x_{k+1} = x_k - H^{-1}(x_k)\,\nabla f(x_k)$ (a code sketch follows below)
• Tends to be extremely fragile unless the function is very smooth and the starting point is close to the minimum
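A sketch of one Newton step in 2-D; it solves $H\,d = -\nabla f$ rather than forming $H^{-1}$ explicitly, and the quadratic test function is an illustrative choice:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One multi-dimensional Newton step: x + d, where H(x) d = -grad f(x)."""
    return x + np.linalg.solve(hess(x), -grad(x))

# Example: f(x, y) = (x - 1)^2 + 10 (y + 2)^2, minimum at (1, -2)
grad = lambda p: np.array([2 * (p[0] - 1), 20 * (p[1] + 2)])
hess = lambda p: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_step(grad, hess, np.array([0.0, 0.0])))  # quadratic: one step suffices
```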
Steepest Descent Methods
• What if you can’t / don’t want to use the 2nd derivative?
• “Quasi-Newton” methods estimate the Hessian
• Alternative: walk along the (negative of the) gradient… (a code sketch follows below)
  Perform 1-D minimization along the line passing through the current point in the direction of the gradient
  Once done, re-compute the gradient, iterate
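A sketch of steepest descent; it reuses the golden_min routine from the golden-section sketch above for the 1-D line minimization, and bracketing the step length in [0, 1] is an assumption that suits this test function:

```python
import numpy as np

def steepest_descent(f, grad, x0, n_iter=100, tol=1e-8):
    """Repeatedly minimize f along the negative gradient direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # gradient ~ 0: done
            break
        t = golden_min(lambda s: f(x - s * g), 0.0, 1.0)  # 1-D line search
        x = x - t * g                            # step, then re-compute gradient
    return x

# Example: narrow quadratic valley with minimum at (1, -2)
f = lambda p: (p[0] - 1)**2 + 10 * (p[1] + 2)**2
grad = lambda p: np.array([2 * (p[0] - 1), 20 * (p[1] + 2)])
print(steepest_descent(f, grad, [0.0, 0.0]))
```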
Problem With Steepest Descent
• With exact line searches, each new gradient is orthogonal to the previous search direction, so the method zig-zags slowly along narrow or curved valleys
Conjugate Gradient Methods
• Idea: avoid “undoing” minimization that’s already been done
• Walk along direction $d_{k+1} = -g_{k+1} + \beta_k d_k$, where $g_k$ denotes the gradient at step $k$
• Polak-Ribière formula: $\beta_k = \frac{g_{k+1}^{\mathsf T}\,(g_{k+1} - g_k)}{g_k^{\mathsf T}\, g_k}$
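A sketch of nonlinear conjugate gradient with the Polak-Ribière formula, again reusing golden_min for the line search; the [0, 1] step bracket and the reset to steepest descent when β < 0 are common practical choices, not prescribed by the slide:

```python
import numpy as np

def nonlinear_cg(f, grad, x0, n_iter=50, tol=1e-8):
    """Nonlinear conjugate gradient with the Polak-Ribiere beta."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                       # first direction: steepest descent
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        t = golden_min(lambda s: f(x + s * d), 0.0, 1.0)
        x = x + t * d
        g_new = grad(x)
        beta = g_new @ (g_new - g) / (g @ g)     # Polak-Ribiere formula
        d = -g_new + max(beta, 0.0) * d          # reset direction if beta < 0
        g = g_new
    return x

# Same quadratic as above: a 2-D quadratic needs only ~2 line searches
f = lambda p: (p[0] - 1)**2 + 10 * (p[1] + 2)**2
grad = lambda p: np.array([2 * (p[0] - 1), 20 * (p[1] + 2)])
print(nonlinear_cg(f, grad, [0.0, 0.0]))
```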
Conjugate Gradient Methods
• Conjugate gradient implicitly obtains information about Hessian
• For quadratic function in n dimensions, gets exact solution in n steps (ignoring roundoff error)
• Works well in practice…
Value-Only Methods in Multi-Dimensions
• If you can’t evaluate gradients, life is hard
• Can use approximate (numerically evaluated) gradients:
$\nabla f(x) \approx \frac{1}{\delta}\begin{pmatrix} f(x + \delta e_1) - f(x) \\ f(x + \delta e_2) - f(x) \\ f(x + \delta e_3) - f(x) \end{pmatrix}$
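A sketch of the forward-difference gradient above; the step size delta is an illustrative choice (per the earlier rule of thumb, on the order of the square root of machine epsilon):

```python
import numpy as np

def approx_grad(f, x, delta=1e-6):
    """Forward-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta                 # perturb one coordinate at a time
        g[i] = (f(x + e) - fx) / delta
    return g

# Example: gradient of (x - 1)^2 + 10 (y + 2)^2 at the origin is (-2, 40)
print(approx_grad(lambda p: (p[0] - 1)**2 + 10 * (p[1] + 2)**2, [0.0, 0.0]))
```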
Generic Optimization Strategies
• Uniform sampling:
  Cost rises exponentially with # of dimensions
• Simulated annealing (a code sketch follows below):
  Search in random directions
  Start with large steps, gradually decrease
  “Annealing schedule”: how fast to cool?
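A minimal simulated-annealing sketch in 1-D; the geometric cooling schedule and the Metropolis-style acceptance rule are common illustrative choices, not prescribed by the slide:

```python
import math, random

def anneal(f, x0, step0=1.0, t0=1.0, cooling=0.95, n_iter=1000):
    """Simulated annealing: random steps accepted with a temperature-
    dependent probability; step size and temperature decay together."""
    x, fx, t, step = x0, f(x0), t0, step0
    for _ in range(n_iter):
        x_new = x + random.uniform(-step, step)   # search in a random direction
        f_new = f(x_new)
        # Always accept improvements; accept uphill moves with prob e^(-df/t)
        if f_new < fx or random.random() < math.exp(-(f_new - fx) / t):
            x, fx = x_new, f_new
        t *= cooling                              # annealing schedule
        step *= cooling                           # gradually decrease steps
    return x

# Example: minimum of (x - 3)^2
print(anneal(lambda x: (x - 3)**2, 0.0))  # ~3
```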