Introduction to Function Minimization
Posted on 23-Jan-2016
Motivation example
Data on the height of a group of 10000 people, men and women. Gender was not recorded: it is not known who was a man and who was a woman.
Can one estimate the number of men in the group?
Asymmetric histogram
Non-Gaussian: two subgroups (men and women), so perhaps two superposed Gaussians?
[Figure: histogram of the heights (vertical axis N) with error bars, and the best fit by a single Gaussian]
This is artificially simulated data, just a demo: two Gaussians with different means were used for the simulation, with men/women randomly in the ratio 7/3.
[Figure: the same histogram with the best fit by two Gaussians]
A press-the-button user simply asks the software for the best fit by two Gaussians. But how is it done?
Gaussian
$$N(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Two Gaussians
$$F(x;p_0,p_1,p_2,p_3,p_4,p_5) = p_0\,N(x;p_1,p_2) + p_3\,N(x;p_4,p_5)$$
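As a minimal C++ sketch of the two formulas above, the model can be coded as two small functions. The names `gaussN` and `twoGauss`, and packing $p_0,\dots,p_5$ into a `std::array<double, 6>`, are illustrative assumptions, not something prescribed by the slides.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Normalized Gaussian N(x; mu, sigma).
double gaussN(double x, double mu, double sigma) {
    assert(sigma > 0.0);
    const double pi = 3.14159265358979323846;
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (std::sqrt(2.0 * pi) * sigma);
}

// Two-Gaussian model F(x; p0..p5) = p0*N(x; p1, p2) + p3*N(x; p4, p5):
// p0, p3 scale the components, p1, p4 are the means, p2, p5 the widths.
double twoGauss(double x, const std::array<double, 6>& p) {
    return p[0] * gaussN(x, p[1], p[2]) + p[3] * gaussN(x, p[4], p[5]);
}
```

Since both Gaussians are normalized, the ratio $p_0/(p_0+p_3)$ of fitted parameters estimates the fraction of the data in the first component; if that component describes the men, this answers the motivating question.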
Find the best values of the parameters
This needs a goodness-of-fit criterion
The fit function is nonlinear in its parameters
50 histogram bins; each bin represents "one data point":
$x_j$: position of the $j$-th bin
$h_j$: bin content
$\sigma_j$: error
Goodness-of-fit (least squares):
$$\chi^2(p_0,p_1,p_2,p_3,p_4,p_5) = \sum_{j=1}^{50} \frac{\left(h_j - F(x_j;p_0,p_1,p_2,p_3,p_4,p_5)\right)^2}{\sigma_j^2}$$
Mathematically, the problem is the following
I have a function of n variables, the goodness-of-fit $\chi^2(p_0,p_1,p_2,p_3,p_4,p_5)$, whose variables are the fit parameters.
I want to find the point $(p_0,p_1,p_2,p_3,p_4,p_5)$ for which the function achieves its minimum.
The function is often nonlinear, so an analytical solution of the problem is usually hopeless. I need numerical methods to attack the problem.
Numerically, the problem is the following
I have a function (in the sense of a program subroutine) of n parameters, $\mathrm{FCN}(p_0,p_1,p_2,p_3,p_4,p_5)$.
I want to find the point $(p_0,p_1,p_2,p_3,p_4,p_5)$ for which the function achieves its minimum. Each call to evaluate the function value is often time-consuming, bearing in mind its definition, e.g.
$$\mathrm{FCN}(p_0,p_1,p_2,p_3,p_4,p_5) = \sum_{j=1}^{50} \frac{\left(h_j - F(x_j;p_0,p_1,p_2,p_3,p_4,p_5)\right)^2}{\sigma_j^2}$$
I need a numerical procedure that calls FCN repeatedly, with possibly different values of the parameters, and finally finds the set of parameter values that minimizes it.
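A sketch of FCN as a C++ subroutine, assuming the model is passed in as a generic callable; the names `Model` and `chi2` are hypothetical. The loop is exactly the sum over histogram bins, so every call costs one model evaluation per bin, which is why each call can be expensive.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model type: F(x; p) for a parameter vector p.
using Model = std::function<double(double, const std::vector<double>&)>;

// chi^2(p) = sum_j (h_j - F(x_j; p))^2 / sigma_j^2 over all histogram bins.
// x: bin positions, h: bin contents, sigma: bin errors.
double chi2(const Model& F, const std::vector<double>& x,
            const std::vector<double>& h, const std::vector<double>& sigma,
            const std::vector<double>& p) {
    assert(x.size() == h.size() && h.size() == sigma.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        assert(sigma[j] > 0.0);            // each bin needs a positive error
        const double r = (h[j] - F(x[j], p)) / sigma[j];
        sum += r * r;
    }
    return sum;
}
```

A minimizer only sees a function of the parameters, so in practice the data would be bound in a closure, e.g. `auto FCN = [&](const std::vector<double>& p) { return chi2(F, x, h, sigma, p); };`.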
Stepping algorithms
• Start at some point in the parameter space
• Choose a direction and a step size
• Walk according to some clever strategy, doing iterative steps and looking for small values of the minimized function
One-dimensional optimization problem
Stepping algorithm with adaptable step size
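#include <cmath>
#include <functional>
// Success-failure stepping in 1D. The loop stops when |step| <= delta
// (end criterion); a literal "repeat forever" would never terminate.
double minimize1D(const std::function<double(double)>& FCN,
                  double xcurrent, double step, double delta) {
  double fcurrent = FCN(xcurrent);
  while (std::fabs(step) > delta) {
    double xtrial = xcurrent + step, ftrial = FCN(xtrial);
    if (ftrial < fcurrent) {  // success: accept point, enlarge step
      xcurrent = xtrial; fcurrent = ftrial; step = 3 * step;
    } else {                  // failure: reverse direction, shrink step
      step = -0.4 * step;
    }
  }
  return xcurrent;
}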
Fast approach to the region of the minimum, slow convergence at the end
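Success-failure method
[Figure: trial points of the success-failure method, with a line and a parabola drawn through them]
A line (through two points) estimates the first derivative (gradient); a parabola (through three points) estimates the first as well as the second derivative.
So a stepping method can estimate the gradient as well as the second derivative.
Near the minimum, any smooth function looks like a parabola.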
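To make the line and parabola estimates concrete, here is a small C++ sketch for the special case of three equally spaced points $x_0-h$, $x_0$, $x_0+h$. Equal spacing is an assumption for simplicity (the points visited by the success-failure method are generally not equally spaced), and the names `ParabolaFit` and `fitParabola` are hypothetical. The outer points give the slope, the second difference gives the curvature, and the vertex of the parabola is $x_0 - f'/f''$.

```cpp
#include <cassert>

// Estimates from three equally spaced points (x0 - h, x0, x0 + h).
struct ParabolaFit {
    double slope;      // f'(x0): slope of the line through the outer points
    double curvature;  // f''(x0): second difference
    double xmin;       // vertex of the parabola (a minimum if curvature > 0)
};

ParabolaFit fitParabola(double x0, double h, double fm, double f0, double fp) {
    assert(h > 0.0);
    const double slope = (fp - fm) / (2.0 * h);
    const double curvature = (fp - 2.0 * f0 + fm) / (h * h);
    assert(curvature != 0.0);          // collinear points: no vertex
    return {slope, curvature, x0 - slope / curvature};
}
```

For $f(x) = 3(x-1)^2 + 2$ sampled at $-0.5$, $0$, $0.5$ the fit returns slope $-6$, curvature $6$ and vertex $1$, the exact minimum, since the function is itself a parabola.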
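Newton: go straight to the minimum
With $g = f'(x_0)$ and $G = f''(x_0)$:
$$f(x) \approx f(x_0) + g\,(x - x_0) + \tfrac{1}{2}\,G\,(x - x_0)^2$$
$$f'(x_{\min}) = g + G\,(x_{\min} - x_0) = 0$$
$$x_{\min} = x_0 - G^{-1} g$$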
Multiplying the negative gradient by the inverse of the second derivative gives a step that jumps straight to the minimum (exactly so for a parabola).
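The Newton step can be iterated. A minimal C++ sketch, assuming the user supplies the first and second derivatives as callables; `newton1D`, `d1`, `d2` and the default tolerances are hypothetical names and values.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Newton iteration x <- x - f'(x) / f''(x) with user-supplied derivatives.
// Stops when the step is smaller than tol or after maxIter steps.
double newton1D(const std::function<double(double)>& d1,
                const std::function<double(double)>& d2,
                double x, double tol = 1e-12, int maxIter = 50) {
    for (int k = 0; k < maxIter; ++k) {
        const double G = d2(x);
        assert(G > 0.0);               // must be where the function curves upward
        const double step = -d1(x) / G;
        x += step;
        if (std::fabs(step) < tol) break;
    }
    return x;
}
```

On an exact parabola such as $(x-2)^2$ a single step from any starting point lands on $x = 2$; on a function that is only approximately quadratic near the minimum, e.g. $\cosh(x-1)$, a few steps converge very fast once the iteration is close enough.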
Any stepping method needs
• a starting point
• an initial step size
• an end criterion (otherwise an infinite loop), e.g. step size < δ, or improvement in the function value < ε
Problem of local and/or boundary minima
Many-dimensional minimization
$\mathrm{FCN}(p_0, p_1)$
• start at point (p0,p1)
• fix value p1
• perform minimization with respect to p0
• fix value p0
• perform minimization with respect to p1
Problem: the axes point in the wrong directions, not in the direction of the gradient.
Cure: rotate the axes after each iteration so that the first axis points in the direction of the (estimated) gradient.
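The plain scheme in the bullets (without the axis rotation) can be sketched in C++ by reusing the success-failure stepping of the one-dimensional slide along each axis in turn. The names `minimizeAlongAxis` and `coordinateSearch`, the initial step 0.1 and the stopping threshold are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Fcn = std::function<double(const std::vector<double>&)>;

// Success-failure stepping along coordinate i, other coordinates fixed.
void minimizeAlongAxis(const Fcn& FCN, std::vector<double>& p, std::size_t i,
                       double step, double delta) {
    double fcur = FCN(p);
    while (std::fabs(step) > delta) {
        std::vector<double> trial = p;
        trial[i] += step;
        const double ftrial = FCN(trial);
        if (ftrial < fcur) { p = trial; fcur = ftrial; step *= 3.0; }
        else               { step *= -0.4; }
    }
}

// Cyclic coordinate search without axis rotation: minimize in p0 with the
// other parameters fixed, then in p1, ..., and repeat for `sweeps` rounds.
std::vector<double> coordinateSearch(const Fcn& FCN, std::vector<double> p,
                                     int sweeps, double step = 0.1,
                                     double delta = 1e-10) {
    assert(!p.empty() && sweeps > 0);
    for (int s = 0; s < sweeps; ++s)
        for (std::size_t i = 0; i < p.size(); ++i)
            minimizeAlongAxis(FCN, p, i, step, delta);
    return p;
}
```

For a function with correlated parameters such as $u^2 + v^2 + 1.5uv$ (with $u = p_0 - 1$, $v = p_1 + 2$), each sweep only reduces the distance to the minimum by a constant factor, so many sweeps are needed: this is the zigzag caused by axes that are not aligned with the valley.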
Simplex minimization
[Figure: a simplex of points; get rid of the worst point and place a trial point in the estimated gradient direction]
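One common way to implement this idea is the Nelder-Mead simplex method. The C++ sketch below is a minimal Nelder-Mead-style variant (reflection, expansion, inside contraction and shrink, with a fixed iteration count instead of a convergence test); it is not MINUIT's SIMPLEX code, and `simplexMinimize` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

using Point = std::vector<double>;
using Fcn = std::function<double(const Point&)>;

// Minimal Nelder-Mead-style simplex search over a fixed number of iterations.
Point simplexMinimize(const Fcn& FCN, const Point& start, double size, int maxIter) {
    const std::size_t n = start.size();
    assert(n > 0 && size > 0.0);
    std::vector<Point> v(n + 1, start);                 // n+1 vertices
    for (std::size_t i = 0; i < n; ++i) v[i + 1][i] += size;
    std::vector<double> f(n + 1);
    for (std::size_t i = 0; i <= n; ++i) f[i] = FCN(v[i]);

    // Point on the line from the worst point w through the centroid c:
    // t = 1 reflects, t = 2 expands, t = -0.5 contracts (inside).
    auto along = [n](const Point& c, const Point& w, double t) {
        Point r(n);
        for (std::size_t k = 0; k < n; ++k) r[k] = c[k] + t * (c[k] - w[k]);
        return r;
    };

    for (int it = 0; it < maxIter; ++it) {
        // Order vertices: v[0] best ... v[n] worst.
        std::vector<std::size_t> idx(n + 1);
        std::iota(idx.begin(), idx.end(), 0);
        std::sort(idx.begin(), idx.end(),
                  [&f](std::size_t a, std::size_t b) { return f[a] < f[b]; });
        std::vector<Point> vs;
        std::vector<double> fs;
        for (std::size_t i : idx) { vs.push_back(v[i]); fs.push_back(f[i]); }
        v.swap(vs);
        f.swap(fs);

        Point c(n, 0.0);                                // centroid without the worst
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t k = 0; k < n; ++k) c[k] += v[i][k] / static_cast<double>(n);

        Point r = along(c, v[n], 1.0);                  // get rid of the worst point
        double fr = FCN(r);
        if (fr < f[0]) {                                // new best: try to expand
            Point e = along(c, v[n], 2.0);
            double fe = FCN(e);
            if (fe < fr) { r = e; fr = fe; }
            v[n] = r; f[n] = fr;
        } else if (fr < f[n - 1]) {                     // better than second worst
            v[n] = r; f[n] = fr;
        } else {
            Point ct = along(c, v[n], -0.5);
            double fct = FCN(ct);
            if (fct < f[n]) { v[n] = ct; f[n] = fct; }
            else {                                      // shrink toward the best point
                for (std::size_t i = 1; i <= n; ++i) {
                    for (std::size_t k = 0; k < n; ++k) v[i][k] = 0.5 * (v[i][k] + v[0][k]);
                    f[i] = FCN(v[i]);
                }
            }
        }
    }
    return v[std::min_element(f.begin(), f.end()) - f.begin()];
}
```

The method needs only function values, no derivatives, which makes it robust far from the minimum but slow close to it.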
Gradient methods
Stepping methods that use, in addition to the function value at the current point, the local gradient at that point
The local gradient can be obtained
• by calling a user-supplied procedure which returns the vector (n-dimensional array) of first order derivatives
• by estimating the gradient numerically, evaluating the function at n points in a very small neighborhood of the current point
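The numerical option can be sketched with forward differences, which need exactly $n$ extra calls to FCN besides the value at the current point. The names and the default step `h = 1e-6` are assumptions; a too-small `h` amplifies rounding errors, a too-large one adds truncation error.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Point = std::vector<double>;
using Fcn = std::function<double(const Point&)>;

// Forward-difference gradient at p: one extra call to FCN per parameter.
// fp = FCN(p) is passed in because a minimizer usually has it already.
Point numericalGradient(const Fcn& FCN, const Point& p, double fp, double h = 1e-6) {
    assert(h > 0.0);
    Point g(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        Point q = p;
        q[i] += h;                       // shift one coordinate only
        g[i] = (FCN(q) - fp) / h;        // truncation error grows with h
    }
    return g;
}
```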
Performing a one-dimensional minimization in the direction of the negative gradient significantly improves the position of the current point
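A minimal steepest-descent sketch in C++, assuming a user-supplied gradient. For the one-dimensional minimization along $-g$ it swaps in a golden-section search on a fixed interval $[0, \alpha_{\max}]$, which assumes the function is unimodal along the line on that interval; all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Point = std::vector<double>;
using Fcn = std::function<double(const Point&)>;
using Grad = std::function<Point(const Point&)>;

// Golden-section search for the minimum of phi on [a, b] (phi unimodal there).
double goldenSection(const std::function<double(double)>& phi, double a, double b, double tol) {
    const double r = (std::sqrt(5.0) - 1.0) / 2.0;      // 0.618...
    double c = b - r * (b - a), d = a + r * (b - a);
    double fc = phi(c), fd = phi(d);
    while (b - a > tol) {
        if (fc < fd) { b = d; d = c; fd = fc; c = b - r * (b - a); fc = phi(c); }
        else         { a = c; c = d; fc = fd; d = a + r * (b - a); fd = phi(d); }
    }
    return 0.5 * (a + b);
}

// Steepest descent: minimize FCN along -g over alpha in [0, alphaMax],
// move there, and repeat until |g| < gtol or maxIter iterations.
Point steepestDescent(const Fcn& FCN, const Grad& grad, Point p,
                      double alphaMax, int maxIter, double gtol) {
    assert(alphaMax > 0.0);
    for (int it = 0; it < maxIter; ++it) {
        const Point g = grad(p);
        double norm2 = 0.0;
        for (double gi : g) norm2 += gi * gi;
        if (std::sqrt(norm2) < gtol) break;
        auto phi = [&](double alpha) {
            Point q = p;
            for (std::size_t i = 0; i < q.size(); ++i) q[i] -= alpha * g[i];
            return FCN(q);
        };
        const double alpha = goldenSection(phi, 0.0, alphaMax, 1e-12);
        for (std::size_t i = 0; i < p.size(); ++i) p[i] -= alpha * g[i];
    }
    return p;
}
```

On an elongated quadratic such as $(p_0-1)^2 + 10(p_1+2)^2$, successive search directions are perpendicular and the iterates zigzag toward the minimum, which is what motivates using second-derivative information.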
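Quadratic approximation to the minimized function at the current point $p_0$ in $n$ dimensions (summation over repeated indices), where $f_0 = F(p_0)$, $g_{0,i} = \partial F/\partial p_i$ and $G_{0,ij} = \partial^2 F/\partial p_i\,\partial p_j$ at $p_0$:
$$F(p) \approx f_0 + g_{0,i}\,(p_i - p_{0,i}) + \tfrac{1}{2}\,G_{0,ij}\,(p_i - p_{0,i})(p_j - p_{0,j})$$
Knowing $g_0$, one can perform a one-dimensional optimization with respect to the "step size" $\alpha$ in search of the best point
$$p_i = p_{0,i} - \alpha\, g_{0,i}$$
If $G_0$ were known, the minimum would be at
$$p_i = p_{0,i} - (G_0^{-1})_{ij}\, g_{0,j}$$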
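With a known $G_0$, the minimum of the quadratic approximation is reached in a single step $p = p_0 - G_0^{-1} g_0$. A two-dimensional C++ sketch with the 2×2 inverse written out explicitly; the types `Vec2`, `Mat2` and the function name are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// One Newton step in two dimensions: p = p0 - G^{-1} g,
// using the closed-form inverse of a 2x2 matrix.
Vec2 newtonStep2D(const Vec2& p0, const Vec2& g, const Mat2& G) {
    const double det = G[0][0] * G[1][1] - G[0][1] * G[1][0];
    assert(std::fabs(det) > 0.0);      // G must be invertible
    const Mat2 Ginv = {{{ G[1][1] / det, -G[0][1] / det},
                        {-G[1][0] / det,  G[0][0] / det}}};
    return {p0[0] - (Ginv[0][0] * g[0] + Ginv[0][1] * g[1]),
            p0[1] - (Ginv[1][0] * g[0] + Ginv[1][1] * g[1])};
}
```

For $F = u^2 + 4v^2 + uv$ with $u = p_0 - 1$, $v = p_1 + 2$, starting at $(0, 0)$ the gradient is $(0, 15)$ and $G = \begin{pmatrix}2 & 1\\ 1 & 8\end{pmatrix}$, and the step lands exactly on the minimum $(1, -2)$.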
It would be useful if the gradient method could also estimate the matrix $G^{-1}$
If the minimized function is exactly quadratic, then $G$ is constant in the whole parameter space:
$$F(p) = f_0 + g_{0,i}\,(p_i - p_{0,i}) + \tfrac{1}{2}\,G_{0,ij}\,(p_i - p_{0,i})(p_j - p_{0,j})$$
If the minimized function is not exactly quadratic, we expect slow variations of G in the region not far from the minimum
A local numerical estimate of $G$ is costly: it needs many calls to $F$, and then a matrix inversion.
Idea: can one estimate $G^{-1}$ iteratively?
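Example of a variable metric method (summation over repeated indices)
Current point: $p_0$, with gradient $g_0$ and the current estimate of $G_0^{-1}$ for the optimized function
Go to: $p_{1,i} = p_{0,i} - (G_0^{-1})_{ij}\, g_{0,j}$
Evaluate the gradient $g_1$ at $p_1$ and calculate:
$$\delta_i = p_{1,i} - p_{0,i}, \qquad \gamma_i = g_{1,i} - g_{0,i}$$
$$(G_1^{-1})_{ij} = (G_0^{-1})_{ij} + \frac{\delta_i\,\delta_j}{\delta_k\,\gamma_k} - \frac{(G_0^{-1})_{ik}\,\gamma_k\;\gamma_l\,(G_0^{-1})_{lj}}{\gamma_l\,(G_0^{-1})_{lk}\,\gamma_k}$$
Iterate: replace $p_0$, $g_0$, $G_0^{-1}$ by $p_1$, $g_1$, $G_1^{-1}$.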
For quadratic functions, the iteration converges to the minimum and to the true $G^{-1}$.
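The iteration can be sketched in C++. The update of $G^{-1}$ on this slide has the Davidon-Fletcher-Powell (DFP) form, and the code implements that formula. Two assumptions go beyond the slide: the start is the identity matrix, and instead of the full step $p_1 = p_0 - G_0^{-1} g_0$ the code searches along that direction with a parabola through $\alpha = 0, 1, 2$, which is exact for a quadratic function. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

struct VMResult { Vec p; Mat Hinv; int iterations; };

// Variable metric sketch: Hinv estimates G^{-1} and is corrected after each
// step with the rank-two (DFP) update from the slide.
VMResult variableMetric(const std::function<double(const Vec&)>& F,
                        const std::function<Vec(const Vec&)>& grad,
                        Vec p, int maxIter = 100, double gtol = 1e-9) {
    const std::size_t n = p.size();
    Mat H(n, Vec(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) H[i][i] = 1.0;    // assumed start: identity
    Vec g = grad(p);
    int it = 0;
    for (; it < maxIter; ++it) {
        double gnorm2 = 0.0;
        for (double gi : g) gnorm2 += gi * gi;
        if (std::sqrt(gnorm2) < gtol) break;               // converged
        Vec d(n, 0.0);                                     // d = -Hinv * g
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) d[i] -= H[i][j] * g[j];
        auto at = [&](double a) {                          // p + a * d
            Vec q = p;
            for (std::size_t i = 0; i < n; ++i) q[i] += a * d[i];
            return q;
        };
        // Parabola through alpha = 0, 1, 2; its vertex gives the step length.
        const double f0 = F(p), f1 = F(at(1.0)), f2 = F(at(2.0));
        const double curv = f0 - 2.0 * f1 + f2;
        assert(curv > 0.0);                                // F must curve upward along d
        const double alpha = (3.0 * f0 - 4.0 * f1 + f2) / (2.0 * curv);
        const Vec p1 = at(alpha), g1 = grad(p1);
        Vec delta(n), gam(n), Hg(n, 0.0);                  // gam = gamma, Hg = Hinv * gamma
        for (std::size_t i = 0; i < n; ++i) { delta[i] = p1[i] - p[i]; gam[i] = g1[i] - g[i]; }
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) Hg[i] += H[i][j] * gam[j];
        double dTg = 0.0, gTHg = 0.0;
        for (std::size_t i = 0; i < n; ++i) { dTg += delta[i] * gam[i]; gTHg += gam[i] * Hg[i]; }
        if (dTg > 0.0 && gTHg > 0.0)                       // keep Hinv positive definite
            for (std::size_t i = 0; i < n; ++i)
                for (std::size_t j = 0; j < n; ++j)
                    H[i][j] += delta[i] * delta[j] / dTg - Hg[i] * Hg[j] / gTHg;
        p = p1;
        g = g1;
    }
    return {p, H, it};
}
```

For the quadratic $u^2 + 4v^2 + uv$ ($u = p_0 - 1$, $v = p_1 + 2$) started at $(0,0)$, the sketch reaches the minimum after two iterations, and the final matrix equals the true $G^{-1} = \frac{1}{15}\begin{pmatrix}8 & -1\\ -1 & 2\end{pmatrix}$ up to rounding.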
Variable metric methods
• fast convergence in the almost quadratic region around the minimum
• added value: a good estimate of $G^{-1}$ at the minimum, which means that the shape of the optimized function around the minimum is known
MINUIT
Author: F. James (CERN)
A complex minimization program (package) comprising various minimization procedures as well as other useful data-fitting tools
Among them
• SIMPLEX
• MIGRAD (variable metric method)
MINUIT
Originally a FORTRAN program in the CERN library, now available in C++
• stand-alone (SEAL project)
http://seal.web.cern.ch/seal/MathLibs/Minuit2/html/index.html
• contained in the CERN ROOT package
http://root.cern.ch
Available in Java (FreeHEP JAIDA project)
http://java.freehep.org/freehep-jminuit/index.html
References
F. James and M. Winkler, C++ MINUIT User's Guide, http://seal.cern.ch/documents/minuit/mnusersguide.pdf
F. James, Minuit Tutorial on Function Minimization, http://seal.cern.ch/documents/minuit/mntutorial.pdf
F. James, The Interpretation of Errors in Minuit, http://seal.cern.ch/documents/minuit/mnerror.pdf
Microsoft Visual C++ Express is a free C++ development environment for Windows: http://www.microsoft.com/express/vc/