Intro to Optimization
MATH-415 Numerical Analysis
Optimization
• Optimization is finding values of the variables that minimize or maximize the objective function while (in general) satisfying some constraints
• Some slides from the Princeton University course COS-323 are used here: https://www.cs.princeton.edu/courses/archive/fall09/cos323/syll.html
Different Kinds of Optimization
[Figure from: Optimization Technology Center, http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/]
Optimization in 1-D
• Let us have a function of one variable, $f : [a, b] \to \mathbb{R}$, with $a, b \in \mathbb{R}$
• Our task is to find the minimum (or maximum) of this function
Optimization in 1-D
• Look for analogies to bracketing in the bisection method for solving nonlinear equations
• What does it mean to bracket a minimum? Three points $(x_{\text{left}}, f(x_{\text{left}}))$, $(x_{\text{mid}}, f(x_{\text{mid}}))$, $(x_{\text{right}}, f(x_{\text{right}}))$ with
  $x_{\text{left}} < x_{\text{mid}} < x_{\text{right}}$, $\quad f(x_{\text{mid}}) < f(x_{\text{left}})$, $\quad f(x_{\text{mid}}) < f(x_{\text{right}})$
Optimization in 1-D
• Once we have these properties, there is at least one local minimum between $x_{\text{left}}$ and $x_{\text{right}}$
• Establishing a bracket initially (a code sketch follows this list):
  Given $x_{\text{initial}}$ and an increment
  Evaluate $f(x_{\text{initial}})$, $f(x_{\text{initial}} + \text{increment})$
  If decreasing, step until you find an increase
  Else, step in the opposite direction until you find an increase
  Grow the increment at each step
• For maximization: substitute –f for f
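A minimal Python sketch of the bracketing procedure above; the starting point, initial increment, and growth factor are illustrative choices:

```python
def bracket_minimum(f, x0, step=1e-2, grow=2.0, max_iter=100):
    """Return (x_left, x_mid, x_right) bracketing a local minimum of f."""
    a, fa = x0, f(x0)
    b, fb = x0 + step, f(x0 + step)
    if fb > fa:                      # going uphill: reverse direction
        a, b, fb, step = b, a, fa, -step
    for _ in range(max_iter):
        step *= grow                 # grow the increment at each step
        c, fc = b + step, f(b + step)
        if fc > fb:                  # f increased again: bracket established
            return (a, b, c) if a < c else (c, b, a)
        a, b, fb = b, c, fc
    raise RuntimeError("no bracket found; f may be monotonic")

# Example: bracket the minimum of (x - 3)^2 starting from x = 0
print(bracket_minimum(lambda x: (x - 3)**2, 0.0))
```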
Optimization in 1-D
• Strategy: evaluate the function at some new point $x_{\text{new}}$ inside the bracket
[Figure: bracket points $(x_{\text{left}}, f(x_{\text{left}}))$, $(x_{\text{mid}}, f(x_{\text{mid}}))$, $(x_{\text{right}}, f(x_{\text{right}}))$ and the new point $(x_{\text{new}}, f(x_{\text{new}}))$]
Optimization in 1-D
• Strategy: evaluate the function at some $x_{\text{new}}$. Here, the new “bracket” points are $x_{\text{new}}$, $x_{\text{mid}}$, $x_{\text{right}}$
Optimization in 1-D
• Strategy: evaluate the function at some $x_{\text{new}}$. Here, the new “bracket” points are $x_{\text{left}}$, $x_{\text{new}}$, $x_{\text{mid}}$
Optimization in 1-D
• Unlike with the bisection method for solving nonlinear equations, we can’t always guarantee that the interval will be reduced by a factor of 2
• Let’s find the optimal place for $x_{\text{mid}}$, relative to left and right, that will guarantee the same factor of reduction regardless of outcome
Optimization in 1-D: Golden Section Method
• $x_1$ and $x_2$ shall be selected in such a way that $\frac{b-a}{b-x_1} = \frac{b-a}{x_2-a} = \varphi \approx 1.618$
• $\varphi$ is the golden section (the positive root of the equation $\varphi^2 - \varphi - 1 = 0$: $\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618$)
Optimization in 1-D: Golden Section Method
• $x_1 = b - \frac{b-a}{\varphi}$ performs the golden section of $[a, x_2]$
• $x_2 = a + \frac{b-a}{\varphi}$ performs the golden section of $[x_1, b]$
Optimization in 1-D: Golden Section Method: Algorithm (min)
• Localize the interval $[a, b]$ to bracket the min and determine $\varepsilon$, the accuracy of a solution
• Find $x_1 = b - \frac{b-a}{\varphi}$; $x_2 = a + \frac{b-a}{\varphi}$; $y_1 = f(x_1)$; $y_2 = f(x_2)$
• If $y_1 > y_2$ then $a = x_1$; $x_1 = x_2$; $y_1 = y_2$; $x_2 = a + \frac{b-a}{\varphi}$; $y_2 = f(x_2)$
• else $b = x_2$; $x_2 = x_1$; $y_2 = y_1$; $x_1 = b - \frac{b-a}{\varphi}$; $y_1 = f(x_1)$
• If $b - a < \varepsilon$ then $x = \frac{a+b}{2}$ is our solution; otherwise repeat from the comparison step (a Python sketch follows below)
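A runnable Python sketch of the minimization loop above; the update rules and stopping test follow the algorithm, while the test function and tolerance are illustrative:

```python
import math

PHI = (1 + math.sqrt(5)) / 2        # golden section: root of phi^2 - phi - 1 = 0

def golden_min(f, a, b, eps=1e-8):
    """Golden section search for a minimum of f on a bracketing interval [a, b]."""
    x1 = b - (b - a) / PHI
    x2 = a + (b - a) / PHI
    y1, y2 = f(x1), f(x2)
    while b - a >= eps:
        if y1 > y2:                 # minimum lies in [x1, b]
            a, x1, y1 = x1, x2, y2  # old x2 becomes the new x1
            x2 = a + (b - a) / PHI
            y2 = f(x2)
        else:                       # minimum lies in [a, x2]
            b, x2, y2 = x2, x1, y1  # old x1 becomes the new x2
            x1 = b - (b - a) / PHI
            y1 = f(x1)
    return (a + b) / 2

# Example: minimum of (x - 3)^2 on [0, 10]
print(golden_min(lambda x: (x - 3)**2, 0.0, 10.0))  # ~3.0
```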
Optimization in 1-D: Golden Section Method: Algorithm (max)
• Localize the interval $[a, b]$ to bracket the max and determine $\varepsilon$, the accuracy of a solution
• Find $x_1 = b - \frac{b-a}{\varphi}$; $x_2 = a + \frac{b-a}{\varphi}$; $y_1 = f(x_1)$; $y_2 = f(x_2)$
• If $y_1 < y_2$ then $a = x_1$; $x_1 = x_2$; $y_1 = y_2$; $x_2 = a + \frac{b-a}{\varphi}$; $y_2 = f(x_2)$
• else $b = x_2$; $x_2 = x_1$; $y_2 = y_1$; $x_1 = b - \frac{b-a}{\varphi}$; $y_1 = f(x_1)$
• If $b - a < \varepsilon$ then $x = \frac{a+b}{2}$ is our solution; otherwise repeat from the comparison step
Optimization in 1-D: Golden Section Method: Analysis
• The golden section method is one of the classical methods of 1-D optimization
• Like the bisection method for solving nonlinear equations, it has linear convergence: each step narrows the bracketing interval in the golden proportion, a factor of $1/\varphi \approx 0.618$
Error Tolerance
• Around a minimum the derivative is 0, so the Taylor expansion gives
  $f(x + \Delta x) = f(x) + \tfrac{1}{2} f''(x)\,\Delta x^2 + \ldots$
  The computed $f(x + \Delta x)$ becomes indistinguishable from $f(x)$ once
  $\tfrac{1}{2} f''(x)\,\Delta x^2 \sim \varepsilon_{\text{machine}}\, f(x)$, i.e. $\Delta x \sim \sqrt{\varepsilon_{\text{machine}}}$
• Rule of thumb: pointless to ask for more accuracy than $\sqrt{\varepsilon_{\text{machine}}}$
  Can use double precision if you want a single-precision result (and/or have single-precision data)
Faster 1-D Optimization
• Trade off super-linear convergence for worse robustness
  Combine with Golden Section search for safety
• Usual bag of tricks (a sketch of the parabola fit follows below):
  Fit a parabola through 3 points, find its minimum
  Compute derivatives as well as positions, fit a cubic
  Use second derivatives: Newton
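A sketch of the first trick: the vertex of the parabola through three points gives the next trial point. This is a single step of successive parabolic interpolation, not a full safeguarded method such as Brent’s:

```python
def parabola_min(x1, y1, x2, y2, x3, y3):
    """Abscissa of the vertex of the parabola through three points.
    Assumes the points are not collinear (denominator nonzero)."""
    num = (x2 - x1)**2 * (y2 - y3) - (x2 - x3)**2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    return x2 - 0.5 * num / den

# Exact for a true parabola: three samples of (x - 3)^2
print(parabola_min(0, 9, 2, 1, 5, 4))  # -> 3.0
```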
Newton’s Method
• At each step: $x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$ (a code sketch follows below)
• Requires 1st and 2nd derivatives
• Quadratic convergence
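A minimal sketch of the iteration above, assuming the first and second derivatives are supplied as callables:

```python
def newton_min(fprime, fsecond, x0, tol=1e-10, max_iter=50):
    """Newton's method for 1-D optimization: iterate toward f'(x) = 0."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:          # converged
            return x
    raise RuntimeError("Newton iteration did not converge")

# Example: f(x) = x^4 - 2x^2 has a minimum at x = 1
print(newton_min(lambda x: 4*x**3 - 4*x,    # f'
                 lambda x: 12*x**2 - 4,     # f''
                 0.8))                      # ~1.0
```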
Multi-Dimensional Optimization (Finding min/max of functions of multiple variables)
• Important in many areas:
  Fitting a model to measured data
  Finding the best design in some parameter space
• Hard in general:
  Weird shapes: multiple extrema, saddles, curved or elongated valleys, etc.
  Can’t bracket
Newton’s Method in Multiple Dimensions
• Replace the 1st derivative with the gradient and the 2nd derivative with the Hessian matrix:
$\nabla f(x, y) = \begin{pmatrix} \partial f/\partial x \\ \partial f/\partial y \end{pmatrix}, \qquad H = \begin{pmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x\,\partial y \\ \partial^2 f/\partial x\,\partial y & \partial^2 f/\partial y^2 \end{pmatrix}$
Newton’s Method in Multiple Dimensions
• So, at each step: $x_{k+1} = x_k - H^{-1}(x_k)\,\nabla f(x_k)$ (a code sketch follows below)
• Tends to be extremely fragile unless the function is very smooth and the starting point is close to the minimum
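A sketch of one Newton step in 2-D; it solves $H\,d = -\nabla f$ rather than forming $H^{-1}$ explicitly, and the quadratic test function is an illustrative choice:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One multi-dimensional Newton step: x + d, where H(x) d = -grad f(x)."""
    return x + np.linalg.solve(hess(x), -grad(x))

# Example: f(x, y) = (x - 1)^2 + 10 (y + 2)^2, minimum at (1, -2)
grad = lambda p: np.array([2 * (p[0] - 1), 20 * (p[1] + 2)])
hess = lambda p: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_step(grad, hess, np.array([0.0, 0.0])))  # quadratic: one step suffices
```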
Steepest Descent Methods
• What if you can’t / don’t want to use the 2nd derivative?
• “Quasi-Newton” methods estimate the Hessian
• Alternative: walk along the (negative of the) gradient… (a code sketch follows below)
  Perform 1-D minimization along the line passing through the current point in the direction of the gradient
  Once done, re-compute the gradient, iterate
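A sketch of steepest descent; it reuses the golden_min routine from the golden-section sketch above for the 1-D line minimization, and bracketing the step length in [0, 1] is an assumption that suits this test function:

```python
import numpy as np

def steepest_descent(f, grad, x0, n_iter=100, tol=1e-8):
    """Repeatedly minimize f along the negative gradient direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # gradient ~ 0: done
            break
        t = golden_min(lambda s: f(x - s * g), 0.0, 1.0)  # 1-D line search
        x = x - t * g                            # step, then re-compute gradient
    return x

# Example: narrow quadratic valley with minimum at (1, -2)
f = lambda p: (p[0] - 1)**2 + 10 * (p[1] + 2)**2
grad = lambda p: np.array([2 * (p[0] - 1), 20 * (p[1] + 2)])
print(steepest_descent(f, grad, [0.0, 0.0]))
```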
Problem With Steepest Descent
• With exact line searches, each new gradient is orthogonal to the previous search direction, so the method zig-zags slowly along narrow or curved valleys
Conjugate Gradient Methods
• Idea: avoid “undoing” minimization that’s already been done
• Walk along direction $d_{k+1} = -g_{k+1} + \beta_k d_k$, where $g_k$ denotes the gradient at step $k$
• Polak-Ribière formula: $\beta_k = \frac{g_{k+1}^{\mathsf T}\,(g_{k+1} - g_k)}{g_k^{\mathsf T}\, g_k}$
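A sketch of nonlinear conjugate gradient with the Polak-Ribière formula, again reusing golden_min for the line search; the [0, 1] step bracket and the reset to steepest descent when β < 0 are common practical choices, not prescribed by the slide:

```python
import numpy as np

def nonlinear_cg(f, grad, x0, n_iter=50, tol=1e-8):
    """Nonlinear conjugate gradient with the Polak-Ribiere beta."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                       # first direction: steepest descent
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        t = golden_min(lambda s: f(x + s * d), 0.0, 1.0)
        x = x + t * d
        g_new = grad(x)
        beta = g_new @ (g_new - g) / (g @ g)     # Polak-Ribiere formula
        d = -g_new + max(beta, 0.0) * d          # reset direction if beta < 0
        g = g_new
    return x

# Same quadratic as above: a 2-D quadratic needs only ~2 line searches
f = lambda p: (p[0] - 1)**2 + 10 * (p[1] + 2)**2
grad = lambda p: np.array([2 * (p[0] - 1), 20 * (p[1] + 2)])
print(nonlinear_cg(f, grad, [0.0, 0.0]))
```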
Conjugate Gradient Methods
• Conjugate gradient implicitly obtains information about Hessian
• For quadratic function in n dimensions, gets exact solution in n steps (ignoring roundoff error)
• Works well in practice…
Value-Only Methods in Multi-Dimensions
• If you can’t evaluate gradients, life is hard
• Can use approximate (numerically evaluated) gradients:
$\nabla f(x) \approx \frac{1}{\delta}\begin{pmatrix} f(x + \delta e_1) - f(x) \\ f(x + \delta e_2) - f(x) \\ f(x + \delta e_3) - f(x) \end{pmatrix}$
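A sketch of the forward-difference gradient above; the step size delta is an illustrative choice (per the earlier rule of thumb, on the order of the square root of machine epsilon):

```python
import numpy as np

def approx_grad(f, x, delta=1e-6):
    """Forward-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta                 # perturb one coordinate at a time
        g[i] = (f(x + e) - fx) / delta
    return g

# Example: gradient of (x - 1)^2 + 10 (y + 2)^2 at the origin is (-2, 40)
print(approx_grad(lambda p: (p[0] - 1)**2 + 10 * (p[1] + 2)**2, [0.0, 0.0]))
```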
Generic Optimization Strategies
• Uniform sampling:
  Cost rises exponentially with # of dimensions
• Simulated annealing (a code sketch follows below):
  Search in random directions
  Start with large steps, gradually decrease
  “Annealing schedule”: how fast to cool?
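A minimal simulated-annealing sketch in 1-D; the geometric cooling schedule and the Metropolis-style acceptance rule are common illustrative choices, not prescribed by the slide:

```python
import math, random

def anneal(f, x0, step0=1.0, t0=1.0, cooling=0.95, n_iter=1000):
    """Simulated annealing: random steps accepted with a temperature-
    dependent probability; step size and temperature decay together."""
    x, fx, t, step = x0, f(x0), t0, step0
    for _ in range(n_iter):
        x_new = x + random.uniform(-step, step)   # search in a random direction
        f_new = f(x_new)
        # Always accept improvements; accept uphill moves with prob e^(-df/t)
        if f_new < fx or random.random() < math.exp(-(f_new - fx) / t):
            x, fx = x_new, f_new
        t *= cooling                              # annealing schedule
        step *= cooling                           # gradually decrease steps
    return x

# Example: minimum of (x - 3)^2
print(anneal(lambda x: (x - 3)**2, 0.0))  # ~3
```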