
Page 1: Introduction to Function Minimization

Introduction to Function Minimization

Page 2: Introduction to Function Minimization

Motivation example

Data on the height of a group of 10000 people, men and women.

Gender was not recorded: it is not known who was a man and who was a woman.

Can one estimate number of men in the group?

Asymmetric histogram

Non-Gaussian: two subgroups (men & women)? Two superposed Gaussians?

______________________
This is artificially simulated data, just a demo: two Gaussians with different means, mixed randomly with men/women = 7/3.

Page 3: Introduction to Function Minimization

See the error bars!

[Figure: height histogram with error bars; vertical axis N]

Page 4: Introduction to Function Minimization

Best Gaussian fit

Page 5: Introduction to Function Minimization

Two Gaussians: best fit

This is artificially simulated data, just a demo; two Gaussians with different means were used for the simulation, mixed randomly with men/women = 7/3.

Page 6: Introduction to Function Minimization

Page 7: Introduction to Function Minimization

Page 8: Introduction to Function Minimization

Press-the-button user: find the best fit by two Gaussians. But: how is it done?

Gaussian:

$$\mathrm{N}(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Two Gaussians

$$\mathrm{F}(x;p_0,p_1,p_2,p_3,p_4,p_5) = p_0\,\mathrm{N}(x;p_1,p_2) + p_3\,\mathrm{N}(x;p_4,p_5)$$

Find the best values of the parameters

Needs goodness-of-fit criterion

Fit function is nonlinear in its parameters
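A minimal C++ sketch of this two-Gaussian model (the helper names gauss and model are illustrative, not part of the original slides):

#include <cmath>

const double PI = 3.14159265358979323846;

// N(x; mu, sigma): Gaussian density
double gauss(double x, double mu, double sigma) {
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (std::sqrt(2.0 * PI) * sigma);
}

// F(x; p0..p5) = p0 * N(x; p1, p2) + p3 * N(x; p4, p5)
double model(double x, const double p[6]) {
    return p[0] * gauss(x, p[1], p[2]) + p[3] * gauss(x, p[4], p[5]);
}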

Page 9: Introduction to Function Minimization

50 histogram bins; each bin represents "one data point":

$x_j$ : $j$-th bin position, $h_j$ : bin content, $\sigma_j$ : its error

Goodness-of-fit (least squares):

$$\chi^2(p_0,\ldots,p_5) = \sum_{j=1}^{50} \frac{\bigl(h_j - \mathrm{F}(x_j;p_0,\ldots,p_5)\bigr)^2}{\sigma_j^2}$$
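A direct C++ transcription of this least-squares criterion might look like the following sketch (the function-pointer interface and names are an illustrative choice):

#include <cstddef>
#include <vector>

// chi^2(p0..p5) = sum over bins of (h_j - F(x_j; p))^2 / sigma_j^2
double chi2(const std::vector<double>& x,      // bin positions x_j
            const std::vector<double>& h,      // bin contents h_j
            const std::vector<double>& sigma,  // bin errors sigma_j
            const double p[6],
            double (*F)(double, const double*)) {
    double sum = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        const double r = (h[j] - F(x[j], p)) / sigma[j];
        sum += r * r;
    }
    return sum;
}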

Page 10: Introduction to Function Minimization

Mathematically, the problem is the following

I have a function of n variables (the parameters of the goodness-of-fit criterion):

$$\chi^2(p_0,p_1,p_2,p_3,p_4,p_5)$$

I want to find the point $(p_0,p_1,p_2,p_3,p_4,p_5)$ for which the function achieves its minimum.

The function is often nonlinear, so an analytical solution of the problem is usually hopeless. I need numerical methods to attack the problem.

Page 11: Introduction to Function Minimization

Numerically, the problem is the following

I have a function (in the sense of a program subroutine) of n parameters:

$$\mathrm{FCN}(p_0,p_1,p_2,p_3,p_4,p_5)$$

I want to find the point $(p_0,p_1,p_2,p_3,p_4,p_5)$ for which the function achieves its minimum. Each call to evaluate the function value is often time-consuming, having in mind a definition like

$$\mathrm{FCN}(p_0,\ldots,p_5) = \sum_{j=1}^{50} \frac{\bigl(h_j - \mathrm{F}(x_j;p_0,\ldots,p_5)\bigr)^2}{\sigma_j^2}$$

I need a numerical procedure which can call the function FCN and, by repeating calls with different values of the parameters, finally find the minimizing set of parameter values.

Page 12: Introduction to Function Minimization

Stepping algorithms

• Start at some point in the parameter space
• Choose direction and step size
• Walk according to some clever strategy, doing iterative steps and looking for small values of the minimized function

Page 13: Introduction to Function Minimization

One-dimensional optimization problem

Stepping algorithm with adaptable step size

fcurrent = FCN(xcurrent);

repeat forever {
    xtrial = xcurrent + step;
    ftrial = FCN(xtrial);
    if (ftrial < fcurrent)    // success
        { xcurrent = xtrial; fcurrent = ftrial; step = 3*step; }
    else                      // failure
        { step = -0.4*step; }
}

Fast approach to the region of the minimum, slow convergence at the end.
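A complete, runnable C++ version of this stepping algorithm is sketched below; an end criterion (discussed on page 19) is added so the loop terminates, and the tolerance and demo function are illustrative assumptions:

#include <cmath>
#include <cstdio>

// Success-failure minimization in one dimension with adaptive step size.
// End criterion: stop when |step| falls below a tolerance.
double minimize1d(double (*FCN)(double), double x, double step, double tol) {
    double f = FCN(x);
    while (std::fabs(step) > tol) {
        const double xtrial = x + step;
        const double ftrial = FCN(xtrial);
        if (ftrial < f) {            // success: accept point, enlarge step
            x = xtrial;
            f = ftrial;
            step *= 3.0;
        } else {                     // failure: reverse and shrink step
            step *= -0.4;
        }
    }
    return x;
}

int main() {
    // demo: the minimum of (x - 2)^2 is at x = 2
    const double xmin = minimize1d(
        [](double x) { return (x - 2.0) * (x - 2.0); }, 0.0, 0.5, 1e-9);
    std::printf("xmin = %.6f\n", xmin);
    return 0;
}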

Page 14: Introduction to Function Minimization
Page 15: Introduction to Function Minimization
Page 16: Introduction to Function Minimization

Success-failure method

[Figure: sampled points of the success-failure method, with a fitted parabola]

Page 17: Introduction to Function Minimization

[Figure: line and parabola fitted through the sampled points]

line ............ estimates the first derivative (gradient)

parabola ...... estimates the first as well as the second derivative

Page 18: Introduction to Function Minimization

A stepping method can estimate the gradient as well as the second derivative.

Around the minimum, all smooth functions look like a parabola.

Newton: go straight to the minimum.

$$f(x) \approx f(x_0) + g\,(x - x_0) + \tfrac{1}{2}\,G\,(x - x_0)^2$$

$$f'(x_{\min}) = g + G\,(x_{\min} - x_0) = 0$$

$$x_{\min} = x_0 - G^{-1}\,g$$

The inverse of the second derivative makes it possible to jump straight to the minimum, in the direction of the negative gradient.
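A sketch of one such Newton step in one dimension, estimating both derivatives from a parabola through three nearby points (the function name and the probe distance d are illustrative):

// One 1D Newton step: estimate g and G from a parabola through three
// nearby points (x - d, x, x + d) and jump to x - g/G.
double newton_step(double (*FCN)(double), double x, double d) {
    const double fm = FCN(x - d);
    const double f0 = FCN(x);
    const double fp = FCN(x + d);
    const double g = (fp - fm) / (2.0 * d);           // first derivative
    const double G = (fp - 2.0 * f0 + fm) / (d * d);  // second derivative
    return x - g / G;   // x_min = x - G^{-1} g, exact for a parabola
}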

Page 19: Introduction to Function Minimization

Any stepping method needs

• starting point
• initial step size
• end criterion (otherwise infinite loop), e.g.
    • step size < δ
    • improvement in the function values < ε

Page 20: Introduction to Function Minimization

Problem of local and/or boundary minima

Page 21: Introduction to Function Minimization

Many-dimensional minimization

$$\mathrm{FCN}(p_0, p_1)$$

• start at point (p0,p1)

• fix value p1

• perform minimization with respect to p0

• fix value p0

• perform minimization with respect to p1, and repeat (a sketch of this strategy follows below)
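A minimal C++ sketch of this alternating strategy, reusing a success-failure minimizer as the one-dimensional building block (all names, step sizes, and tolerances are illustrative assumptions):

#include <cmath>
#include <functional>

// Success-failure minimizer in one dimension (as sketched earlier).
static double min1d(const std::function<double(double)>& f,
                    double x, double step = 0.1, double tol = 1e-7) {
    double fx = f(x);
    while (std::fabs(step) > tol) {
        const double xt = x + step;
        const double ft = f(xt);
        if (ft < fx) { x = xt; fx = ft; step *= 3.0; }
        else         { step *= -0.4; }
    }
    return x;
}

// Alternating one-dimensional minimization of FCN(p0, p1).
void minimize2d(double (*FCN)(double, double),
                double& p0, double& p1, int iterations) {
    for (int it = 0; it < iterations; ++it) {
        p0 = min1d([&](double a) { return FCN(a, p1); }, p0);  // p1 fixed
        p1 = min1d([&](double b) { return FCN(p0, b); }, p1);  // p0 fixed
    }
}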

Page 22: Introduction to Function Minimization

The axes point in the wrong directions: not in the direction of the gradient.

Cure: rotate the axes after each iteration so that the first axis points in the direction of the (estimated) gradient.

Page 23: Introduction to Function Minimization
Page 24: Introduction to Function Minimization

Simplex minimization

Get rid of the worst point: reflect it along the estimated gradient direction to obtain a trial point.

Page 25: Introduction to Function Minimization
Page 26: Introduction to Function Minimization

Gradient methods

Stepping methods which use, in addition to the function value at the current point, also the local gradient at that point.

The local gradient can be obtained

• by calling a user-supplied procedure which returns the vector (n-dimensional array) of first order derivatives

• by estimating the gradient numerically, evaluating the function value at n points in a very small neighborhood of the current point

Performing one-dimensional minimization in the direction of the negative gradient significantly improves the current point position
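A minimal sketch of the numerical gradient estimate described above, using forward differences (the function name and the step size eps are illustrative):

#include <cstddef>
#include <functional>
#include <vector>

// Numerical gradient: evaluate FCN at n points in a very small
// neighborhood of p and form forward-difference quotients.
std::vector<double> numericalGradient(
        const std::function<double(const std::vector<double>&)>& FCN,
        std::vector<double> p, double eps = 1e-6) {
    const double f0 = FCN(p);
    std::vector<double> g(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        p[i] += eps;
        g[i] = (FCN(p) - f0) / eps;   // estimate of dF/dp_i
        p[i] -= eps;                  // restore the point
    }
    return g;
}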

Page 27: Introduction to Function Minimization

Quadratic approximation to the minimized function at the current point p0 in n dimensions

$$\mathrm{F}(p) \approx f_0 + \sum_i g_i\,(p_i - p_{0,i}) + \tfrac{1}{2}\sum_{i,j}(p_i - p_{0,i})\,G_{ij}\,(p_j - p_{0,j})$$

Knowing $g$, one can perform a one-dimensional optimization with respect to the "step size" $\alpha$ in search of the best point

$$p_i = p_{0,i} - \alpha\, g_i$$

If $G_0$ were known, the minimum would be at

$$p_i = p_{0,i} - \sum_j (G_0^{-1})_{ij}\, g_j$$

It would be useful if the gradient method could also estimate the matrix $G^{-1}$.

Page 28: Introduction to Function Minimization

If the minimized function is exactly quadratic, then G is constant in the whole parameter space

$$\mathrm{F}(p) \approx f_0 + \sum_i g_i\,(p_i - p_{0,i}) + \tfrac{1}{2}\sum_{i,j}(p_i - p_{0,i})\,G_{ij}\,(p_j - p_{0,j})$$

If the minimized function is not exactly quadratic, we expect slow variations of G in the region not far from the minimum

A local numerical estimate of $G$ is costly: it needs many calls to F, and then a matrix inversion is needed.

Idea: can one iteratively estimate $G^{-1}$?

Page 29: Introduction to Function Minimization

Example of a variable metric method

Current point: $p_0$, gradient $g_0$, estimated $(G^{-1})$.

Go to

$$p_i = p_{0,i} - \alpha \sum_j (G^{-1})_{ij}\, g_{0,j}$$

for the optimized value of the step size $\alpha$.

Evaluate at the new point $p$: the gradient $g$.

Calculate

$$\delta_i = p_i - p_{0,i}, \qquad \gamma_i = g_i - g_{0,i}$$

Iterate, updating the estimate of $G^{-1}$:

$$(G^{-1})_{ij} \;\leftarrow\; (G^{-1})_{ij} + \frac{\delta_i\,\delta_j}{\sum_k \delta_k\,\gamma_k} - \frac{\Bigl(\sum_k (G^{-1})_{ik}\,\gamma_k\Bigr)\Bigl(\sum_l (G^{-1})_{jl}\,\gamma_l\Bigr)}{\sum_{k,l}\,\gamma_k\,(G^{-1})_{kl}\,\gamma_l}$$

For quadratic functions the iteration converges to the minimum and to the true $G^{-1}$.
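The rank-two update reconstructed above matches the classic Davidon-Fletcher-Powell formula; a minimal C++ sketch of one such update, assuming dense vectors and matrices (all names are illustrative):

#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// One update of the inverse-Hessian estimate H (H ~ G^{-1}) from the
// step delta = p - p0 and the gradient change gamma = g - g0.
void updateInverseHessian(Mat& H, const Vec& delta, const Vec& gamma) {
    const std::size_t n = delta.size();
    Vec Hg(n, 0.0);             // Hg = H * gamma
    double gHg = 0.0;           // gamma^T H gamma
    double dg  = 0.0;           // delta^T gamma
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) Hg[i] += H[i][j] * gamma[j];
        gHg += gamma[i] * Hg[i];
        dg  += delta[i] * gamma[i];
    }
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            H[i][j] += delta[i] * delta[j] / dg - Hg[i] * Hg[j] / gHg;
}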

Page 30: Introduction to Function Minimization

Variable metric methods

• fast convergence in the almost quadratic region around minimum

• added value: good knowledge of $G^{-1}$ at the minimum, which means that the shape of the optimized function around the minimum is known

Page 31: Introduction to Function Minimization

MINUIT
Author: F. James (CERN)

Complex minimization program (package) comprising various minimization procedures as well as other useful data-fitting tools

Among them

• SIMPLEX

• MIGRAD (variable metric method)

Page 32: Introduction to Function Minimization

MINUIT
Originally a FORTRAN program in the CERN library

Now available in C++

• stand alone (SEAL project)

http://seal.web.cern.ch/seal/MathLibs/Minuit2/html/index.html

• contained in the CERN ROOT package

http://root.cern.ch

Available in Java (FreeHEP JAIDA project)

http://java.freehep.org/freehep-jminuit/index.html
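For illustration, a minimal sketch of driving such a fit through ROOT's classic TMinuit interface follows; the parameter names, starting values, and the chi2 helper are illustrative assumptions, not part of the original slides:

#include "TMinuit.h"

double chi2(const Double_t* p);   // hypothetical: chi^2 of the two-Gaussian fit

// User-supplied FCN in the form TMinuit expects: it receives the current
// parameter values and returns the function value through fval.
void fcn(Int_t& npar, Double_t* grad, Double_t& fval, Double_t* p, Int_t iflag) {
    fval = chi2(p);
}

void fit() {
    TMinuit minuit(6);                // six parameters p0..p5
    minuit.SetFCN(fcn);
    // index, name, start value, initial step, lower/upper limit (0, 0 = none)
    minuit.DefineParameter(0, "norm1",  7000.0, 10.0, 0.0, 0.0);  // illustrative
    minuit.DefineParameter(1, "mean1",   180.0,  1.0, 0.0, 0.0);
    minuit.DefineParameter(2, "sigma1",    7.0,  0.1, 0.0, 0.0);
    minuit.DefineParameter(3, "norm2",  3000.0, 10.0, 0.0, 0.0);
    minuit.DefineParameter(4, "mean2",   167.0,  1.0, 0.0, 0.0);
    minuit.DefineParameter(5, "sigma2",    7.0,  0.1, 0.0, 0.0);
    minuit.Migrad();                  // variable metric minimization
}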

Page 33: Introduction to Function Minimization

References

F. James and M. Winkler, C++ MINUIT User's Guide: http://seal.cern.ch/documents/minuit/mnusersguide.pdf

F. James, Minuit Tutorial on Function Minimization: http://seal.cern.ch/documents/minuit/mntutorial.pdf

F. James, The Interpretation of Errors in Minuit: http://seal.cern.ch/documents/minuit/mnerror.pdf

Microsoft Visual C++ Express is a free C++ for Windows: http://www.microsoft.com/express/vc/