
IMM

METHODS FOR NON-LINEAR LEAST SQUARES PROBLEMS

2nd Edition, April 2004

K. Madsen, H.B. Nielsen, O. Tingleff

Informatics and Mathematical Modelling, Technical University of Denmark

CONTENTS

1. Introduction and Definitions
2. Descent Methods
   2.1. The Steepest Descent Method
   2.2. Newton's Method
   2.3. Line Search
   2.4. Trust Region and Damped Methods
3. Non-Linear Least Squares Problems
   3.1. The Gauss-Newton Method
   3.2. The Levenberg-Marquardt Method
   3.3. Powell's Dog Leg Method
   3.4. A Hybrid Method: L-M and Quasi-Newton
   3.5. A Secant Version of the L-M Method
   3.6. A Secant Version of the Dog Leg Method
   3.7. Final Remarks
Appendix
References

1. INTRODUCTION AND DEFINITIONS

In this booklet we consider the following problem,

Definition 1.1. Least Squares Problem
Find $x^*$, a local minimizer for 1)
$$F(x) = \tfrac{1}{2}\sum_{i=1}^{m} \bigl(f_i(x)\bigr)^2 ,$$
where $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$ are given functions, and $m \ge n$.

Example 1.1. An important source of least squares problems is data fitting. As an example consider the data points $(t_1, y_1), \ldots, (t_m, y_m)$ shown in Figure 1.1.

[Figure 1.1. Data points $\{(t_i, y_i)\}$ (marked by +) and model $M(x, t)$ (marked by full line).]

Further, we are given a fitting model,
$$M(x, t) = x_3 e^{x_1 t} + x_4 e^{x_2 t} .$$

1) The factor $\tfrac{1}{2}$ in the definition of $F(x)$ has no effect on $x^*$. It is introduced for convenience; see the footnote to (3.1b).

The model depends on the parameters $x = [x_1, x_2, x_3, x_4]^\top$. We assume that there exists an $x^\dagger$ so that
$$y_i = M(x^\dagger, t_i) + \varepsilon_i ,$$
where the $\{\varepsilon_i\}$ are (measurement) errors on the data ordinates, assumed to behave like white noise.

For any choice of $x$ we can compute the residuals
$$f_i(x) = y_i - M(x, t_i) = y_i - x_3 e^{x_1 t_i} - x_4 e^{x_2 t_i} , \quad i = 1, \ldots, m .$$

For a least squares fit the parameters are determined as the minimizer $x^*$ of the sum of squared residuals. This is seen to be a problem of the form in Definition 1.1 with $n = 4$. The graph of $M(x^*, t)$ is shown by the full line in Figure 1.1.

A least squares problem is a special variant of the more general problem: Given a function $F : \mathbb{R}^n \to \mathbb{R}$, find an argument of $F$ that gives the minimum value of this so-called objective function or cost function.

Definition 1.2. Global Minimizer
Given $F : \mathbb{R}^n \to \mathbb{R}$. Find
$$x^+ = \operatorname{argmin}_{x} \{F(x)\} .$$

This problem is very hard to solve in general, and we only present methods for solving the simpler problem of finding a local minimizer for $F$, an argument vector which gives a minimum value of $F$ inside a certain region whose size is given by $\delta$, where $\delta$ is a small, positive number.

Definition 1.3. Local Minimizer
Given $F : \mathbb{R}^n \to \mathbb{R}$. Find $x^*$ so that
$$F(x^*) \le F(x) \quad \text{for} \quad \|x - x^*\| < \delta .$$

In the remainder of this introduction we shall discuss some basic concepts in optimization, and Chapter 2 is a brief review of methods for finding a local minimizer for general cost functions. For more details we refer to Frandsen et al (2004). In Chapter 3 we give methods that are specially tuned for least squares problems.

We assume that the cost function $F$ is differentiable and so smooth that the following Taylor expansion is valid, 2)
$$F(x + h) = F(x) + h^\top g + \tfrac{1}{2} h^\top H h + O(\|h\|^3) , \qquad (1.4a)$$
where $g$ is the gradient,
$$g \equiv F'(x) = \begin{bmatrix} \dfrac{\partial F}{\partial x_1}(x) \\ \vdots \\ \dfrac{\partial F}{\partial x_n}(x) \end{bmatrix} , \qquad (1.4b)$$
and $H$ is the Hessian,
$$H \equiv F''(x) = \left[ \dfrac{\partial^2 F}{\partial x_i \partial x_j}(x) \right] . \qquad (1.4c)$$

If $x^*$ is a local minimizer and $\|h\|$ is sufficiently small, then we cannot find a point $x^* + h$ with a smaller $F$-value. Combining this observation with (1.4a) we get

Theorem 1.5. Necessary condition for a local minimizer.
If $x^*$ is a local minimizer, then
$$g^* \equiv F'(x^*) = 0 .$$

We use a special name for arguments that satisfy the necessary condition:

Definition 1.6. Stationary point. If
$$g_s \equiv F'(x_s) = 0 ,$$
then $x_s$ is said to be a stationary point for $F$.

2) Unless otherwise specified, $\|\cdot\|$ denotes the 2-norm, $\|h\| = \sqrt{h_1^2 + \cdots + h_n^2}$.

Thus, a local minimizer is also a stationary point, but so is a local maximizer. A stationary point which is neither a local maximizer nor a local minimizer is called a saddle point. In order to determine whether a given stationary point is a local minimizer or not, we need to include the second order term in the Taylor series (1.4a). Inserting $x_s$ we see that
$$F(x_s + h) = F(x_s) + \tfrac{1}{2} h^\top H_s h + O(\|h\|^3) \quad \text{with} \quad H_s = F''(x_s) . \qquad (1.7)$$

From definition (1.4c) of the Hessian it follows that any $H$ is a symmetric matrix. If we request that $H_s$ is positive definite, then its eigenvalues are greater than some number $\delta > 0$ (see Appendix A), and
$$h^\top H_s h > \delta \|h\|^2 .$$

This shows that for $\|h\|$ sufficiently small the third term on the right-hand side of (1.7) will be dominated by the second. This term is positive, so that we get

Theorem 1.8. Sufficient condition for a local minimizer.
Assume that $x_s$ is a stationary point and that $F''(x_s)$ is positive definite. Then $x_s$ is a local minimizer.

If $H_s$ is negative definite, then $x_s$ is a local maximizer. If $H_s$ is indefinite (ie it has both positive and negative eigenvalues), then $x_s$ is a saddle point.

2. DESCENT METHODS

All methods for non-linear optimization are iterative: From a starting point $x_0$ the method produces a series of vectors $x_1, x_2, \ldots$, which (hopefully) converges to $x^*$, a local minimizer for the given function, see Definition 1.3. Most methods have measures which enforce the descending condition
$$F(x_{k+1}) < F(x_k) . \qquad (2.1)$$

This prevents convergence to a maximizer and also makes it less probable that we converge towards a saddle point. If the given function has several minimizers the result will depend on the starting point $x_0$. We do not know which of the minimizers will be found; it is not necessarily the minimizer closest to $x_0$.

In many cases the method produces vectors which converge towards the minimizer in two clearly different stages. When $x_0$ is far from the solution we want the method to produce iterates which move steadily towards $x^*$. In this "global stage" of the iteration we are satisfied if the errors do not increase except in the very first steps, ie
$$\|e_{k+1}\| < \|e_k\| \quad \text{for} \quad k > K ,$$
where $e_k$ denotes the current error,
$$e_k = x_k - x^* . \qquad (2.2)$$

In the final stage of the iteration, where $x_k$ is close to $x^*$, we want faster convergence. We distinguish between

Linear convergence:
$$\|e_{k+1}\| \le a \|e_k\| \quad \text{when } \|e_k\| \text{ is small}; \quad 0 < a < 1 , \qquad (2.3a)$$
Quadratic convergence:
$$\|e_{k+1}\| = O(\|e_k\|^2) \quad \text{when } \|e_k\| \text{ is small} , \qquad (2.3b)$$
Superlinear convergence:
$$\|e_{k+1}\| / \|e_k\| \to 0 \quad \text{for } k \to \infty . \qquad (2.3c)$$

The methods presented in this lecture note are descent methods which satisfy the descending condition (2.1) in each step of the iteration. One step from the current iterate consists in
1. Find a descent direction $h_d$ (discussed below), and
2. find a step length giving a good decrease in the $F$-value.
Thus an outline of a descent method is

Algorithm 2.4. Descent method

begin
  k := 0;  x := x_0;  found := false                 {Starting point}
  while (not found) and (k < k_max)
    h_d := search_direction(x)                       {From x and downhill}
    if (no such h exists)
      found := true                                  {x is stationary}
    else
      α := step_length(x, h_d)                       {from x in direction h_d}
      x := x + α h_d;  k := k+1                      {next iterate}
end
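A minimal MATLAB sketch of this outline (the function names search_direction and step_length are placeholders of my choosing, assumed to be supplied by the user):

  function x = descent_method(search_direction, step_length, x0, kmax)
  % Generic descent method following the outline of Algorithm 2.4.
  % search_direction and step_length are user-supplied function handles;
  % search_direction should return [] when no descent direction exists.
  k = 0;  x = x0;  found = false;
  while ~found && k < kmax
      hd = search_direction(x);          % downhill direction from x
      if isempty(hd)
          found = true;                  % x is stationary
      else
          alpha = step_length(x, hd);    % eg a soft line search
          x = x + alpha*hd;  k = k + 1;  % next iterate
      end
  end
  end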

Consider the variation of the $F$-value along the half line starting at $x$ and with direction $h$. From the Taylor expansion (1.4a) we see that
$$F(x + \alpha h) = F(x) + \alpha h^\top F'(x) + O(\alpha^2) \simeq F(x) + \alpha h^\top F'(x) \quad \text{for } \alpha \text{ sufficiently small.} \qquad (2.5)$$

We say that $h$ is a descent direction if $F(x + \alpha h)$ is a decreasing function of $\alpha$ at $\alpha = 0$. This leads to the following definition.


$F''(x)$ is positive definite, then we get quadratic convergence (defined in (2.3)). On the other hand, if $x$ is in a region where $F''(x)$ is negative definite everywhere, and where there is a stationary point, the basic Newton method (2.9) would converge (quadratically) towards this stationary point, which is a maximizer. We can avoid this by requiring that all steps taken are in descent directions.

We can build a hybrid method, based on Newton's method and the steepest descent method. According to (2.10) the Newton step is guaranteed to be downhill if $F''(x)$ is positive definite, so a sketch of the central section of this hybrid algorithm could be

  if F″(x) is positive definite
    h := h_n
  else
    h := h_sd
  x := x + αh
                                                              (2.11)

Here, $h_{\mathrm{sd}}$ is the steepest descent direction and $\alpha$ is found by line search; see Section 2.3. A good tool for checking a matrix for positive definiteness is Cholesky's method (see Appendix A) which, when successful, is also used for solving the linear system in question. Thus, the check for definiteness is almost for free.

In Section 2.4 we introduce some methods, where the computation of the search direction $h_d$ and step length is done simultaneously, and give a version of (2.11) without line search. Such hybrid methods can be very efficient, but they are hardly ever used. The reason is that they need an implementation of $F''(x)$, and for complicated application problems this is not available. Instead we can use a so-called Quasi-Newton method, based on series of matrices which gradually approach $H^* = F''(x^*)$. In Section 3.4 we present such a method. Also see Chapter 5 in Frandsen et al (2004).
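A MATLAB sketch of the Cholesky-based test in (2.11), assuming g and H hold $F'(x)$ and $F''(x)$ and that alpha comes from a line search:

  % Choose between the Newton step and steepest descent, cf (2.11).
  [C, p] = chol(H);          % p == 0  <=>  H is positive definite
  if p == 0
      h = -(C \ (C' \ g));   % Newton step: solve H*h = -g, reusing the factor
  else
      h = -g;                % steepest descent direction
  end
  x = x + alpha*h;           % alpha found by line search (Section 2.3)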

2.3. Line Search

Given a point $x$ and a descent direction $h$. The next iteration step is a move from $x$ in direction $h$. To find out how far to move, we study the variation of the given function along the half line from $x$ in the direction $h$,
$$\varphi(\alpha) = F(x + \alpha h) , \quad x \text{ and } h \text{ fixed}, \ \alpha \ge 0 . \qquad (2.12)$$

An example of the behaviour of $\varphi(\alpha)$ is shown in Figure 2.1.

[Figure 2.1. Variation of the cost function along the search line: the curve $y = \varphi(\alpha)$ together with the level $y = \varphi(0)$.]

Our $h$ being a descent direction ensures that
$$\varphi'(0) = h^\top F'(x) < 0 ,$$
indicating that if $\alpha$ is sufficiently small, we satisfy the descending condition (2.1), which is equivalent to
$$\varphi(\alpha) < \varphi(0) .$$

Often, we are given an initial guess on $\alpha$, eg $\alpha = 1$ with Newton's method. Figure 2.1 illustrates that three different situations can arise

1. $\alpha$ is so small that the gain in value of the objective function is very small. $\alpha$ should be increased.
2. $\alpha$ is too large: $\varphi(\alpha) \ge \varphi(0)$. Decrease $\alpha$ in order to satisfy the descent condition (2.1).
3. $\alpha$ is close to the minimizer 1) of $\varphi(\alpha)$. Accept this $\alpha$-value.

1) More precisely: the smallest local minimizer of $\varphi$. If we increase $\alpha$ beyond the interval shown in Figure 2.1, it may well happen that we get close to another local minimum for $F$.


at $x$ (unless $x = x^*$): by a proper modification of $\Delta$ or $\mu$ we aim at having better luck in the next iteration step.

Since $L(h)$ is assumed to be a good approximation to $F(x + h)$ for $h$ sufficiently small, the reason why the step failed is that $h$ was too large, and should be reduced. Further, if the step is accepted, it may be possible to use a larger step from the new iterate and thereby reduce the number of steps needed before we reach $x^*$.

The quality of the model with the computed step can be evaluated by the so-called gain ratio
$$\varrho = \frac{F(x) - F(x + h)}{L(0) - L(h)} , \qquad (2.18)$$
ie the ratio between the actual and predicted decrease in function value. By construction the denominator is positive, and the numerator is negative if the step was not downhill: it was too large and should be reduced.

With a trust region method we monitor the step length by the size of the radius $\Delta$. The following updating strategy is widely used,

  if ϱ < 0.25
    Δ := Δ/2
  elseif ϱ > 0.75
    Δ := max{Δ, 3‖h‖}
                                                              (2.19)

Thus, if $\varrho < \tfrac14$, we decide to use smaller steps, while $\varrho > \tfrac34$ indicates that it may be possible to use larger steps. A trust region algorithm is not sensitive to minor changes in the thresholds 0.25 and 0.75, the divisor $p_1 = 2$ or the factor $p_2 = 3$, but it is important that the numbers $p_1$ and $p_2$ are chosen so that the $\Delta$-values cannot oscillate.

In a damped method a small value of $\varrho$ indicates that we should increase the damping factor $\mu$ and thereby increase the penalty on large steps. A large value of $\varrho$ indicates that $L(h)$ is a good approximation to $F(x + h)$ for the computed $h$, and the damping may be reduced. A widely used strategy is the following, which is similar to (2.19), and was originally proposed by Marquardt (1963),

  if ϱ < 0.25
    μ := μ*2
  elseif ϱ > 0.75
    μ := μ/3
                                                              (2.20)

Again, the method is not sensitive to minor changes in the thresholds 0.25 and 0.75 or the numbers $p_1 = 2$ and $p_2 = 3$, but it is important that the numbers $p_1$ and $p_2$ are chosen so that the $\mu$-values cannot oscillate. Experience shows that the discontinuous changes across the thresholds 0.25 and 0.75 can give rise to a "flutter" (illustrated in Example 3.7) that can slow down convergence, and we demonstrated in Nielsen (1999) that the following strategy in general outperforms (2.20),

  if ϱ > 0
    μ := μ * max{1/3, 1 − (2ϱ − 1)³};  ν := 2
  else
    μ := μ*ν;  ν := 2*ν
                                                              (2.21)

The factor $\nu$ is initialized to $\nu = 2$. Notice that a series of consecutive failures results in rapidly increasing $\mu$-values. The two updating formulas are illustrated below.

[Figure 2.2. Updating of $\mu$ by (2.21) with $\nu = 2$ (full line) and by Marquardt's strategy (2.20) (dashed line): $\mu_{\mathrm{new}}/\mu$ as a function of $\varrho$.]
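A small MATLAB helper implementing the updating strategy (2.21) (a sketch; mu and nu are the current damping and growth factors, rho the gain ratio (2.18)):

  function [mu, nu] = update_damping(mu, nu, rho)
  % Update the damping parameter mu by strategy (2.21), Nielsen (1999).
  if rho > 0                                   % step was accepted
      mu = mu * max(1/3, 1 - (2*rho - 1)^3);
      nu = 2;
  else                                         % step was rejected
      mu = mu * nu;
      nu = 2 * nu;
  end
  end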

2.4.1. Computation of the step. In a damped method the step is computed as a stationary point for the function
$$\psi_{\mu}(h) = L(h) + \tfrac{1}{2}\mu\, h^\top h .$$


This means that $h_{\mathrm{dm}}$ is a solution to
$$\psi_{\mu}'(h) = L'(h) + \mu h = 0 ,$$
and from the definition of $L(h)$ in (2.14) we see that this is equivalent to
$$(B + \mu I)\, h_{\mathrm{dm}} = -c , \qquad (2.22)$$
where $I$ is the identity matrix. If $\mu$ is sufficiently large, the symmetric matrix $B + \mu I$ is positive definite (shown in Appendix A), and then it follows from Theorem 1.8 that $h_{\mathrm{dm}}$ is a minimizer for $\psi_{\mu}$.

Example 2.1. In a damped Newton method the model $L(h)$ is given by $c = F'(x)$ and $B = F''(x)$, and (2.22) takes the form
$$(F''(x) + \mu I)\, h_{\mathrm{dn}} = -F'(x) .$$
$h_{\mathrm{dn}}$ is the so-called damped Newton step. If $\mu$ is very large, then
$$h_{\mathrm{dn}} \simeq -\frac{1}{\mu} F'(x) ,$$
ie a short step in a direction close to the steepest descent direction. On the other hand, if $\mu$ is very small, then $h_{\mathrm{dn}}$ is close to the Newton step $h_{\mathrm{n}}$. Thus, we can think of the damped Newton method as a hybrid between the steepest descent method and the Newton method.

We return to damped methods in Section 3.2.

In a trust region method the step $h_{\mathrm{tr}}$ is the solution to a constrained optimization problem,
$$\text{minimize } L(h) \quad \text{subject to} \quad h^\top h \le \Delta^2 . \qquad (2.23)$$

It is outside the scope of this booklet to discuss this problem in any detail (see Madsen et al (2004) or Section 4.1 in Nocedal and Wright (1999)). We just want to mention a few properties.

If the matrix $B$ in (2.14) is positive definite, then the unconstrained minimizer of $L$ is the solution to
$$B h = -c ,$$
and if this is sufficiently small (if it satisfies $h^\top h \le \Delta^2$), then this is the desired step, $h_{\mathrm{tr}}$. Otherwise, the constraint is active, and the problem is more complicated. With a similar argument as we used on page 11, we can see that we do not have to compute the true solution to (2.23), and in Sections 3.3 and 3.4 we present two ways of computing an approximation to $h_{\mathrm{tr}}$.

Finally, we present two similarities between a damped method and a trust region method in the case where $B$ is positive definite: In case the unconstrained minimizer is outside the trust region, it can be shown (Theorem 2.11 in Madsen et al (2004)) that there exists a $\lambda > 0$ such that
$$B h_{\mathrm{tr}} + c = -\lambda h_{\mathrm{tr}} . \qquad (2.24a)$$
By reordering this equation and comparing it with (2.22) we see that $h_{\mathrm{tr}}$ is identical with the damped step $h_{\mathrm{dm}}$ computed with the damping parameter $\mu = \lambda$. On the other hand, one can also show (Theorem 5.11 in Frandsen et al (2004)) that if we compute $h_{\mathrm{dm}}$ for a given $\mu \ge 0$, then
$$h_{\mathrm{dm}} = \operatorname{argmin}_{\|h\| \le \|h_{\mathrm{dm}}\|} \{L(h)\} , \qquad (2.24b)$$
ie $h_{\mathrm{dm}}$ is equal to $h_{\mathrm{tr}}$ corresponding to the trust region radius $\Delta = \|h_{\mathrm{dm}}\|$. Thus, the two classes of methods are closely related, but there is not a simple formula for the connection between the $\Delta$- and $\mu$-values that give the same step.

3. NON-LINEAR LEAST SQUARES PROBLEMS

In the remainder of this lecture note we shall discuss methods for nonlinear least squares problems. Given a vector function $f : \mathbb{R}^n \to \mathbb{R}^m$ with $m \ge n$. We want to minimize $\|f(x)\|$, or equivalently to find
$$x^* = \operatorname{argmin}_{x} \{F(x)\} , \qquad (3.1a)$$
where
$$F(x) = \tfrac{1}{2}\sum_{i=1}^{m} \bigl(f_i(x)\bigr)^2 = \tfrac{1}{2}\|f(x)\|^2 = \tfrac{1}{2} f(x)^\top f(x) . \qquad (3.1b)$$

Least squares problems can be solved by general optimization methods, but we shall present special methods that are more efficient. In many cases they achieve better than linear convergence, sometimes even quadratic convergence, even though they do not need implementation of second derivatives.

In the description of the methods in this chapter we shall need formulas for derivatives of $F$: Provided that $f$ has continuous second partial derivatives, we can write its Taylor expansion as
$$f(x + h) = f(x) + J(x) h + O(\|h\|^2) , \qquad (3.2a)$$
where $J \in \mathbb{R}^{m \times n}$ is the Jacobian. This is a matrix containing the first partial derivatives of the function components,
$$\bigl(J(x)\bigr)_{ij} = \frac{\partial f_i}{\partial x_j}(x) . \qquad (3.2b)$$

As regards $F : \mathbb{R}^n \to \mathbb{R}$, it follows from the first formulation in (3.1b) that 1)
$$\frac{\partial F}{\partial x_j}(x) = \sum_{i=1}^{m} f_i(x)\, \frac{\partial f_i}{\partial x_j}(x) . \qquad (3.3)$$

Thus, the gradient (1.4b) is
$$F'(x) = J(x)^\top f(x) . \qquad (3.4a)$$

We shall also need the Hessian of $F$. From (3.3) we see that the element in position $(j, k)$ is
$$\frac{\partial^2 F}{\partial x_j \partial x_k}(x) = \sum_{i=1}^{m} \left( \frac{\partial f_i}{\partial x_j}(x)\, \frac{\partial f_i}{\partial x_k}(x) + f_i(x)\, \frac{\partial^2 f_i}{\partial x_j \partial x_k}(x) \right) ,$$
showing that
$$F''(x) = J(x)^\top J(x) + \sum_{i=1}^{m} f_i(x)\, f_i''(x) . \qquad (3.4b)$$
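With a routine that returns $f(x)$ and $J(x)$ (such as the MATLAB function in Example 3.4 below), the quantities above are computed directly; a small sketch:

  % f : m-vector holding f(x),  J : m-by-n Jacobian J(x)
  Fval = 0.5*(f'*f);   % F(x), cf (3.1b)
  g    = J'*f;         % gradient F'(x), cf (3.4a)
  Hgn  = J'*J;         % first term of F''(x) in (3.4b); the omitted term
                       % vanishes when the f_i are linear or f(x*) = 0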

Example 3.1. The simplest case of (3.1) is when $f(x)$ has the form
$$f(x) = b - A x ,$$
where the vector $b \in \mathbb{R}^m$ and matrix $A \in \mathbb{R}^{m \times n}$ are given. We say that this is a linear least squares problem. In this case $J(x) = -A$ for all $x$, and from (3.4a) we see that
$$F'(x) = -A^\top (b - A x) .$$
This is zero for $x^*$ determined as the solution to the so-called normal equations,
$$(A^\top A)\, x^* = A^\top b . \qquad (3.5)$$

The problem can be written in the form
$$A x \simeq b ,$$
and alternatively we can solve it via orthogonal transformation: Find an orthogonal matrix $Q$ such that

1) If we had not used the factor $\tfrac{1}{2}$ in the definition (3.1b), we would have got an annoying factor of 2 in a lot of expressions.
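The two solution approaches of Example 3.1 in MATLAB (a sketch with random test data; backslash on a full-rank rectangular matrix uses an orthogonal (QR) factorization):

  A = randn(100, 4);  b = randn(100, 1);   % a random overdetermined test problem
  x_ne = (A'*A) \ (A'*b);                  % normal equations (3.5)
  x_qr = A \ b;                            % orthogonal transformation (QR)
  norm(x_ne - x_qr)                        % the two agree to within rounding errors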


$$F(x + h) \simeq L(h) \equiv \tfrac{1}{2}\ell(h)^\top \ell(h) = \tfrac{1}{2} f^\top f + h^\top J^\top f + \tfrac{1}{2} h^\top J^\top J h = F(x) + h^\top J^\top f + \tfrac{1}{2} h^\top J^\top J h \qquad (3.7b)$$
(with $f = f(x)$ and $J = J(x)$). The Gauss-Newton step $h_{\mathrm{gn}}$ minimizes $L(h)$,
$$h_{\mathrm{gn}} = \operatorname{argmin}_{h} \{L(h)\} .$$
It is easily seen that the gradient and the Hessian of $L$ are
$$L'(h) = J^\top f + J^\top J h , \qquad L''(h) = J^\top J . \qquad (3.8)$$
Comparison with (3.4a) shows that $L'(0) = F'(x)$. Further, we see that the matrix $L''(h)$ is independent of $h$. It is symmetric and if $J$ has full rank, ie if the columns are linearly independent, then $L''(h)$ is also positive definite, cf Appendix A. This implies that $L(h)$ has a unique minimizer, which can be found by solving
$$(J^\top J)\, h_{\mathrm{gn}} = -J^\top f . \qquad (3.9)$$
This is a descent direction for $F$ since
$$h_{\mathrm{gn}}^\top F'(x) = h_{\mathrm{gn}}^\top (J^\top f) = -h_{\mathrm{gn}}^\top (J^\top J)\, h_{\mathrm{gn}} < 0 . \qquad (3.10)$$
Thus, we can use $h_{\mathrm{gn}}$ for $h_d$ in Algorithm 2.4. The typical step is

  Solve (JᵀJ) h_gn = −Jᵀf
  x := x + α h_gn
                                                              (3.11)

where $\alpha$ is found by line search. The classical Gauss-Newton method uses $\alpha = 1$ in all steps. The method with line search can be shown to have guaranteed convergence, provided that
a) $\{x \mid F(x) \le F(x_0)\}$ is bounded, and
b) the Jacobian $J(x)$ has full rank in all steps.
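A compact MATLAB sketch of the classical Gauss-Newton method ($\alpha = 1$ in (3.11)); fJfun is an assumed user-supplied handle returning $f(x)$ and $J(x)$, and the stopping test is a simple step-size rule of my choosing:

  function x = gauss_newton(fJfun, x0, kmax, tol)
  % Classical Gauss-Newton: repeat the typical step (3.11) with alpha = 1.
  % fJfun returns [f, J] at x (eg the function fitexp in Example 3.4).
  x = x0;
  for k = 1:kmax
      [f, J] = fJfun(x);
      hgn = (J'*J) \ (-(J'*f));             % solve the normal equations (3.9)
      x = x + hgn;
      if norm(hgn) <= tol*(norm(x) + tol)   % simple relative step-size test
          break
      end
  end
  end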

In Chapter 2 we saw that Newton's method for optimization has quadratic convergence. This is normally not the case with the Gauss-Newton method. To see this, we compare the search directions used in the two methods,
$$F''(x)\, h_{\mathrm{n}} = -F'(x) \qquad \text{and} \qquad L''(h)\, h_{\mathrm{gn}} = -L'(0) .$$
We already remarked at (3.8) that the two right-hand sides are identical, but from (3.4b) and (3.8) we see that the coefficient matrices differ:
$$F''(x) = L''(h) + \sum_{i=1}^{m} f_i(x)\, f_i''(x) . \qquad (3.12)$$
Therefore, if $f(x^*) = 0$, then $L''(h) \simeq F''(x)$ for $x$ close to $x^*$, and we get quadratic convergence also with the Gauss-Newton method. We can expect superlinear convergence if the functions $\{f_i\}$ have small curvatures or if the $\{|f_i(x^*)|\}$ are small, but in general we must expect linear convergence. It is remarkable that the value of $F(x^*)$ controls the convergence speed.

Example 3.3. Consider the simple problem with $n = 1$, $m = 2$ given by
$$f(x) = \begin{bmatrix} x + 1 \\ \lambda x^2 + x - 1 \end{bmatrix} , \qquad F(x) = \tfrac{1}{2}(x+1)^2 + \tfrac{1}{2}(\lambda x^2 + x - 1)^2 .$$
It follows that
$$F'(x) = 2\lambda^2 x^3 + 3\lambda x^2 - 2(\lambda - 1) x ,$$
so $x = 0$ is a stationary point for $F$. Now,
$$F''(x) = 6\lambda^2 x^2 + 6\lambda x - 2(\lambda - 1) .$$
This shows that if $\lambda < 1$, then $F''(0) > 0$, so $x = 0$ is a local minimizer; actually, it is the global minimizer.

The Jacobian is
$$J(x) = \begin{bmatrix} 1 \\ 2\lambda x + 1 \end{bmatrix} ,$$
and the classical Gauss-Newton method from $x_k$ gives
$$x_{k+1} = x_k - \frac{2\lambda^2 x_k^3 + 3\lambda x_k^2 - 2(\lambda - 1) x_k}{2 + 4\lambda x_k + 4\lambda^2 x_k^2} .$$
Now, if $\lambda \ne 0$ and $x_k$ is close to zero, then
$$x_{k+1} = x_k + (\lambda - 1) x_k + O(x_k^2) = \lambda x_k + O(x_k^2) .$$
Thus, if $|\lambda| < 1$, we have linear convergence. If $\lambda < -1$, then the classical Gauss-Newton method cannot find the minimizer. Eg with $\lambda = -2$ and $x_0 = 0.1$ we get a seemingly chaotic behaviour of the iterates,

  k    x_k
  0    0.1000
  1   -0.3029
  2    0.1368
  3   -0.4680
  ...  ...

Finally, if $\lambda = 0$, then
$$x_{k+1} = x_k - x_k = 0 ,$$
ie we find the solution in one step. The reason is that in this case $f$ is a linear function.

Example 3.4. For the data fitting problem from Example 1.1 the $i$th row of the Jacobian matrix is
$$J(x)_{i,:} = \begin{bmatrix} -x_3 t_i e^{x_1 t_i} & -x_4 t_i e^{x_2 t_i} & -e^{x_1 t_i} & -e^{x_2 t_i} \end{bmatrix} .$$
If the problem is consistent (ie $f(x^*) = 0$), then the Gauss-Newton method with line search will have quadratic final convergence, provided that $x_1^*$ is significantly different from $x_2^*$. If $x_1^* = x_2^*$, then $\operatorname{rank}(J(x^*)) \le 2$, and the Gauss-Newton method fails.

If one or more measurement errors are large, then $f(x^*)$ has some large components, and this may slow down the convergence.

In MATLAB we can give a very compact function for computing $f$ and $J$: Suppose that x holds the current iterate and that the $m{\times}2$ array ty holds the coordinates of the data points. The following function returns f and J containing $f(x)$ and $J(x)$, respectively.

  function [f, J] = fitexp(x, ty)
  t = ty(:,1);  y = ty(:,2);
  E = exp(t * [x(1), x(2)]);
  f = y - E*[x(3); x(4)];
  J = -[x(3)*t.*E(:,1), x(4)*t.*E(:,2), E];
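A small usage sketch (the data and parameter values below are made up for illustration; they are not the data behind Figure 1.1):

  t  = (0:0.5:5)';                                          % abscissas
  y  = 3*exp(-4*t) - 2*exp(-0.5*t) + 0.01*randn(size(t));   % synthetic ordinates
  ty = [t, y];
  x0 = [-1; -2; 1; -1];                                     % trial parameters [x1 x2 x3 x4]
  [f, J] = fitexp(x0, ty);                                  % residuals and Jacobian at x0
  F = 0.5*(f'*f);  g = J'*f;                                % cost (3.1b) and gradient (3.4a)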

Example 3.5. Consider the problem from Example 3.2, $f(x) = 0$ with $f : \mathbb{R}^n \to \mathbb{R}^n$. If we use Newton-Raphson's method to solve this problem, the typical iteration step is
$$\text{Solve } J(x)\, h_{\mathrm{nr}} = -f(x) ; \quad x := x + h_{\mathrm{nr}} .$$
The Gauss-Newton method applied to the minimization of $F(x) = \tfrac{1}{2} f(x)^\top f(x)$ has the typical step
$$\text{Solve } \bigl(J(x)^\top J(x)\bigr)\, h_{\mathrm{gn}} = -J(x)^\top f(x) ; \quad x := x + h_{\mathrm{gn}} .$$
Note that $J(x)$ is a square matrix, and we assume that it is nonsingular. Then $(J(x)^\top)^{-1}$ exists, and it follows that $h_{\mathrm{gn}} = h_{\mathrm{nr}}$. Therefore, when applied to Powell's problem from Example 3.2, the Gauss-Newton method will have the same troubles as discussed for Newton-Raphson's method in that example.

These examples show that the Gauss-Newton method may fail, both with and without a line search. Still, in many applications it gives quite good performance, though it normally only has linear convergence as opposed to the quadratic convergence from Newton's method with implemented second derivatives.

In Sections 3.2 and 3.3 we give two methods with superior global performance, and in Section 3.4 we give modifications to the first method so that we achieve superlinear final convergence.

3.2. The Levenberg-Marquardt Method

Levenberg (1944) and later Marquardt (1963) suggested to use a damped Gauss-Newton method, cf Section 2.4. The step $h_{\mathrm{lm}}$ is defined by the following modification to (3.9),
$$(J^\top J + \mu I)\, h_{\mathrm{lm}} = -g \quad \text{with} \quad g = J^\top f \ \text{ and } \ \mu \ge 0 . \qquad (3.13)$$
Here, $J = J(x)$ and $f = f(x)$. The damping parameter $\mu$ has several effects:

a) For all $\mu > 0$ the coefficient matrix is positive definite, and this ensures that $h_{\mathrm{lm}}$ is a descent direction, cf (3.10).


Algorithm 3.16. Levenberg-Marquardt method

begin
  k := 0;  ν := 2;  x := x_0
  A := J(x)ᵀ J(x);  g := J(x)ᵀ f(x)
  found := (‖g‖∞ ≤ ε₁);  μ := τ * max{a_ii}
  while (not found) and (k < k_max)
    k := k+1;  Solve (A + μI) h_lm = −g
    if ‖h_lm‖ ≤ ε₂ (‖x‖ + ε₂)
      found := true
    else
      x_new := x + h_lm
      ϱ := (F(x) − F(x_new)) / (L(0) − L(h_lm))
      if ϱ > 0                                        {step acceptable}
        x := x_new
        A := J(x)ᵀ J(x);  g := J(x)ᵀ f(x)
        found := (‖g‖∞ ≤ ε₁)
        μ := μ * max{1/3, 1 − (2ϱ − 1)³};  ν := 2
      else
        μ := μ*ν;  ν := 2*ν
end

The step $h_{\mathrm{lm}}$ can also be found as the least squares solution to the overdetermined system
$$\begin{bmatrix} f(x) \\ 0 \end{bmatrix} + \begin{bmatrix} J(x) \\ \sqrt{\mu}\, I \end{bmatrix} h \simeq 0 .$$
As mentioned in Example 3.1, the most accurate solution is found via orthogonal transformation. However, the solution $h_{\mathrm{lm}}$ is just a step in an iterative process, and need not be computed very accurately, and since the solution via the normal equations is cheaper, this method is normally employed.
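A self-contained MATLAB sketch along the lines of Algorithm 3.16 (fJfun is an assumed user-supplied handle returning $f(x)$ and $J(x)$; an illustration, not the authors' reference code):

  function [x, k] = marquardt(fJfun, x0, tau, eps1, eps2, kmax)
  % Levenberg-Marquardt method, following the structure of Algorithm 3.16.
  k = 0;  nu = 2;  x = x0;
  [f, J] = fJfun(x);
  A = J'*J;  g = J'*f;
  found = (norm(g, inf) <= eps1);
  mu = tau * max(diag(A));
  while ~found && k < kmax
      k = k + 1;
      hlm = (A + mu*eye(length(x))) \ (-g);
      if norm(hlm) <= eps2*(norm(x) + eps2)
          found = true;
      else
          xnew = x + hlm;
          [fnew, Jnew] = fJfun(xnew);
          % gain ratio (2.18); the factors 1/2 in F and L cancel, and
          % L(0) - L(hlm) = 0.5*hlm'*(mu*hlm - g)
          rho = (f'*f - fnew'*fnew) / (hlm'*(mu*hlm - g));
          if rho > 0                       % step acceptable
              x = xnew;  f = fnew;  J = Jnew;
              A = J'*J;  g = J'*f;
              found = (norm(g, inf) <= eps1);
              mu = mu * max(1/3, 1 - (2*rho - 1)^3);  nu = 2;
          else
              mu = mu * nu;  nu = 2*nu;
          end
      end
  end
  end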

Example 3.7. We have used Algorithm 3.16 on the data fitting problem from Examples 1.1 and 3.4. Figure 1.1 indicates that both $x_1$ and $x_2$ are negative and that $M(x^*, 0) \simeq 0$. These conditions are satisfied by $x_0 = [-1, -2, 1, -1]^\top$. Further, we used $\tau = 10^{-3}$ in the expression (3.14) for $\mu_0$ and the stopping criteria given by (3.15) with $\varepsilon_1 = \varepsilon_2 = 10^{-8}$, $k_{\max} = 200$. The algorithm stopped after 62 iteration steps with $x \simeq [-4, -5, 4, -4]^\top$. The performance is illustrated below; note the logarithmic ordinate axis.

This problem is not consistent, so we could expect linear final convergence. The last 7 iteration steps indicate a much better (superlinear) convergence. The explanation is that the $f_i''(x)$ are slowly varying functions of $t_i$, and the $f_i(x^*)$ have random sign, so that the contributions to the "forgotten term" in (3.12) almost cancel out. Such a situation occurs in many data fitting applications.

[Figure 3.2a. The L-M method applied to the fitting problem from Example 1.1: $F(x)$ and $\|g\|$ versus iteration number (logarithmic ordinate axis).]

For comparison, Figure 3.2b shows the performance with the updating strategy (2.20). From step 5 to step 68 we see that each decrease in $\mu$ is immediately followed by an increase, and the norm of the gradient has a rugged behaviour. This slows down the convergence, but the final stage is as in Figure 3.2a.

[Figure 3.2b. Performance with updating strategy (2.20): $F(x)$ and $\|g\|$ versus iteration number.]

Example 3.8. Figure 3.3 illustrates the performance of Algorithm 3.16 applied to Powell's problem from Examples 3.2 and 3.5. The starting point is $x_0 = [3, 1]^\top$, $\mu_0$ given by $\tau = 1$ in (3.14), and we use $\varepsilon_1 = \varepsilon_2 = 10^{-15}$, $k_{\max} = 100$ in the stopping criteria (3.15).

[Figure 3.3. The L-M method applied to Powell's problem: $F(x)$ and $\|g\|$ versus iteration number.]

The iteration seems to stall between steps 22 and 30. This is an effect of the (almost) singular Jacobian matrix. After that there seems to be linear convergence. The iteration is stopped by the safeguard at the point x = [-3.82e-08, -1.38e-03]. This is a better approximation to $x^* = 0$ than we found in Example 3.2, but we want to be able to do even better; see Examples 3.10 and 3.17.

3.3. Powell's Dog Leg Method

As the Levenberg-Marquardt method, this method works with combinations of the Gauss-Newton and the steepest descent directions. Now, however, controlled explicitly via the radius of a trust region, cf Section 2.4. Powell's name is connected to the algorithm because he proposed how to find an approximation to $h_{\mathrm{tr}}$, defined by (2.23).

Given $f : \mathbb{R}^n \to \mathbb{R}^m$. At the current iterate $x$ the Gauss-Newton step $h_{\mathrm{gn}}$ is the least squares solution to the linear system
$$J(x)\, h \simeq -f(x) . \qquad (3.17)$$
It can be computed by solving the normal equations
$$\bigl(J(x)^\top J(x)\bigr)\, h_{\mathrm{gn}} = -J(x)^\top f(x) . \qquad (3.18a)$$
The steepest descent direction is given by
$$h_{\mathrm{sd}} = -g = -J(x)^\top f(x) . \qquad (3.18b)$$
This is a direction, not a step, and to see how far we should go, we look at the linear model

$$f(x + \alpha h_{\mathrm{sd}}) \simeq f(x) + \alpha J(x) h_{\mathrm{sd}} ,$$
$$F(x + \alpha h_{\mathrm{sd}}) \simeq \tfrac{1}{2}\|f(x) + \alpha J(x) h_{\mathrm{sd}}\|^2 = F(x) + \alpha h_{\mathrm{sd}}^\top J(x)^\top f(x) + \tfrac{1}{2}\alpha^2 \|J(x) h_{\mathrm{sd}}\|^2 .$$

This function of $\alpha$ is minimal for
$$\alpha = -\frac{h_{\mathrm{sd}}^\top J(x)^\top f(x)}{\|J(x) h_{\mathrm{sd}}\|^2} = \frac{\|g\|^2}{\|J(x) g\|^2} . \qquad (3.19)$$

Now we have two candidates for the step to take from the current point $x$: $a = \alpha h_{\mathrm{sd}}$ and $b = h_{\mathrm{gn}}$. Powell suggested to use the following strategy for choosing the step, when the trust region has radius $\Delta$. The last case in the strategy is illustrated in Figure 3.4.

  if ‖h_gn‖ ≤ Δ
    h_dl := h_gn
  elseif ‖α h_sd‖ ≥ Δ
    h_dl := (Δ/‖h_sd‖) h_sd
  else
    h_dl := α h_sd + β (h_gn − α h_sd)
      with β chosen so that ‖h_dl‖ = Δ.
                                                              (3.20a)

[Figure 3.4. Trust region and Dog Leg step: the step $h_{\mathrm{dl}}$ goes from $x$ via the end point of $a = \alpha h_{\mathrm{sd}}$ towards $b = h_{\mathrm{gn}}$. 4)]

4) The name Dog Leg is taken from golf: The fairway at a "dog leg hole" has a shape as the line from $x$ (the tee point) via the end point of $a$ to the end point of $h_{\mathrm{dl}}$ (the hole). Powell is a keen golfer!

With $a$ and $b$ as defined above, and $c = a^\top(b - a)$, we can write
$$\psi(\beta) \equiv \|a + \beta(b - a)\|^2 - \Delta^2 = \|b - a\|^2 \beta^2 + 2c\beta + \|a\|^2 - \Delta^2 .$$
We seek a root for this second degree polynomial, and note that $\psi(\beta) \to +\infty$ for $\beta \to \infty$; $\psi(0) = \|a\|^2 - \Delta^2 < 0$; $\psi(1) = \|h_{\mathrm{gn}}\|^2 - \Delta^2 > 0$. Thus, $\psi$ has one negative root and one root in $]0, 1[$. We seek the latter, and the most accurate computation of it is given by
$$\beta = \begin{cases} \dfrac{-c + \sqrt{c^2 + \|b - a\|^2 (\Delta^2 - \|a\|^2)}}{\|b - a\|^2} & \text{if } c \le 0 , \\[2ex] \dfrac{\Delta^2 - \|a\|^2}{c + \sqrt{c^2 + \|b - a\|^2 (\Delta^2 - \|a\|^2)}} & \text{otherwise.} \end{cases} \qquad (3.20b)$$
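The dog leg step (3.18)-(3.20) collected in one MATLAB helper (a sketch; f and J are the current residual vector and Jacobian, Delta the trust region radius):

  function hdl = dogleg_step(f, J, Delta)
  % Dog Leg step, cf (3.18)-(3.20).
  g     = J'*f;
  hsd   = -g;                                  % steepest descent direction (3.18b)
  alpha = (g'*g) / norm(J*g)^2;                % optimal step along hsd, cf (3.19)
  hgn   = (J'*J) \ (-(J'*f));                  % Gauss-Newton step (3.18a)
  if norm(hgn) <= Delta
      hdl = hgn;
  elseif alpha*norm(hsd) >= Delta
      hdl = (Delta/norm(hsd)) * hsd;
  else
      a = alpha*hsd;  b = hgn;                 % interpolate between a and b
      c = a'*(b - a);
      d = sqrt(c^2 + norm(b - a)^2 * (Delta^2 - norm(a)^2));
      if c <= 0
          beta = (-c + d) / norm(b - a)^2;
      else
          beta = (Delta^2 - norm(a)^2) / (c + d);
      end
      hdl = a + beta*(b - a);                  % lies on the trust region boundary
  end
  end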

As in the L-M method we can use the gain ratio
$$\varrho = \bigl(F(x) - F(x + h_{\mathrm{dl}})\bigr) / \bigl(L(0) - L(h_{\mathrm{dl}})\bigr)$$
to monitor the iteration. Again, $L$ is the linear model
$$L(h) = \tfrac{1}{2}\|f(x) + J(x) h\|^2 .$$

In the L-M method we used $\varrho$ to control the size of the damping parameter. Here, we use it to control the radius $\Delta$ of the trust region. A large value of $\varrho$ indicates that the linear model is good. We can increase $\Delta$ and thereby take longer steps, and they will be closer to the Gauss-Newton direction. If $\varrho$ is small (maybe even negative) then we reduce $\Delta$, implying smaller steps, closer to the steepest descent direction. Below we summarize the algorithm.

We have the following remarks.

1. Initialization. $x_0$ and $\Delta_0$ should be supplied by the user.

2. We use the stopping criteria (3.15) supplemented with $\|f(x)\|_\infty \le \varepsilon_3$, reflecting that $f(x^*) = 0$ in case of $m = n$, ie a nonlinear system of equations.

3. If $m = n$, then "$\simeq$" is replaced by "$=$", cf (3.6), and we do not use the detour around the normal equations (3.18a); see Example 3.9.

Algorithm 3.21. Dog Leg Method

begin
  k := 0;  x := x_0;  Δ := Δ_0;  g := J(x)ᵀ f(x)                  {1}
  found := (‖f(x)‖∞ ≤ ε₃) or (‖g‖∞ ≤ ε₁)                          {2}
  while (not found) and (k < k_max)
    k := k+1;  Compute α by (3.19)
    h_sd := −g;  Solve J(x) h_gn ≅ −f(x)                          {3}
    Compute h_dl by (3.20)
    if ‖h_dl‖ ≤ ε₂ (‖x‖ + ε₂)
      found := true
    else
      x_new := x + h_dl
      ϱ := (F(x) − F(x_new)) / (L(0) − L(h_dl))                   {4}
      if ϱ > 0
        x := x_new;  g := J(x)ᵀ f(x)
        found := (‖f(x)‖∞ ≤ ε₃) or (‖g‖∞ ≤ ε₁)
      if ϱ > 0.75                                                 {5}
        Δ := max{Δ, 3‖h_dl‖}
      elseif ϱ < 0.25
        Δ := Δ/2;  found := (Δ ≤ ε₂ (‖x‖ + ε₂))                   {6}
end

4. Corresponding to the three cases in (3.20a) we can show that
$$L(0) - L(h_{\mathrm{dl}}) = \begin{cases} F(x) & \text{if } h_{\mathrm{dl}} = h_{\mathrm{gn}} , \\[1ex] \dfrac{\Delta(2\|\alpha g\| - \Delta)}{2\alpha} & \text{if } h_{\mathrm{dl}} = \dfrac{-\Delta}{\|g\|}\, g , \\[1ex] \tfrac{1}{2}\alpha (1 - \beta)^2 \|g\|^2 + \beta(2 - \beta) F(x) & \text{otherwise.} \end{cases}$$

5. Strategy (2.19) is used to update the trust region radius.

6. Extra stopping criterion. If $\Delta \le \varepsilon_2(\|x\| + \varepsilon_2)$, then (3.15b) will surely be satisfied in the next step.


if $F(x^*) = 0$. The iteration starts with a series of steps with the L-M method. If the performance indicates that $F(x^*)$ is significantly nonzero, then we switch to the Quasi-Newton method for better performance. It may happen that we get an indication that it is better to switch back to the L-M method, so there is also a mechanism for that.

The switch to the Quasi-Newton method is made if the condition
$$\|F'(x)\|_\infty < 0.02 \cdot F(x) \qquad (3.22)$$
is satisfied in three consecutive, successful iteration steps. This is interpreted as an indication that we are approaching an $x^*$ with $F'(x^*) = 0$ and $F(x^*)$ significantly nonzero. As discussed in connection with (3.12), this can lead to slow, linear convergence.

The Quasi-Newton method is based on having an approximation $B$ to the Hessian $F''(x)$ at the current iterate $x$, and the step $h_{\mathrm{qn}}$ is found by solving
$$B\, h_{\mathrm{qn}} = -F'(x) . \qquad (3.23)$$
This is an approximation to the Newton equation (2.9a).

The approximation $B$ is updated by the BFGS strategy, cf Section 5.10 in Frandsen et al (2004): Every $B$ in the series of approximation matrices is symmetric (as any $F''(x)$) and positive definite. This ensures that $h_{\mathrm{qn}}$ is downhill, cf (2.10). We start with the symmetric, positive definite matrix $B_0 = I$, and the BFGS update consists of a rank 2 matrix to be added to the current $B$. Madsen (1988) uses the following version, advocated by Al-Baali and Fletcher (1985),

  h := x_new − x;   y := J_newᵀ J_new h + (J_new − J)ᵀ f(x_new)
  if hᵀy > 0
    v := B h;   B := B + (1/(hᵀy)) y yᵀ − (1/(hᵀv)) v vᵀ
                                                              (3.24)

with $J = J(x)$, $J_{\mathrm{new}} = J(x_{\mathrm{new}})$. As mentioned, the current $B$ is positive definite, and it is changed only if $h^\top y > 0$. In this case it can be shown that also the new $B$ is positive definite.
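In MATLAB the update (3.24) becomes a small helper (a sketch; the function name is mine):

  function B = bfgs_update(B, x, xnew, J, Jnew, fnew)
  % BFGS update (3.24) of the approximate Hessian B,
  % cf Al-Baali and Fletcher (1985).
  h = xnew - x;
  y = Jnew'*(Jnew*h) + (Jnew - J)'*fnew;
  if h'*y > 0                     % only update when positive definiteness is preserved
      v = B*h;
      B = B + (y*y')/(h'*y) - (v*v')/(h'*v);
  end
  end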

The Quasi-Newton method is not robust in the global stage of the iteration; it is not guaranteed to be descending. At the solution $x^*$ we have $F'(x^*) = 0$, and good final convergence is indicated by rapidly decreasing values of $\|F'(x)\|$. If these norm values do not decrease rapidly enough, then we switch back to the L-M method.

The algorithm is summarized below. It calls the auxiliary functions LMstep and QNstep, implementing the two methods.

Algorithm 3.25. A Hybrid Method

begin
  k := 0;  x := x_0;  μ := μ_0;  B := I                            {1}
  found := (‖F'(x)‖∞ ≤ ε₁);  method := L-M
  while (not found) and (k < k_max)
    k := k+1
    case method of
      L-M:
        [x_new, found, better, method, ...] := LMstep(x, ...)      {2}
      Q-N:
        [x_new, found, better, method, ...] := QNstep(x, B, ...)   {2}
    Update B by (3.24)                                             {3}
    if better
      x := x_new
end

We have the following remarks:

1. Initialization. $\mu_0$ can be found by (3.14). The stopping criteria are given by (3.15).

2. The dots indicate that we also transfer current values of $f$ and $J$ etc, so that we do not have to recompute them for the same $x$.

3. Notice that both L-M and Quasi-Newton steps contribute information for the approximation of the Hessian matrix.

The two auxiliary functions are given below.

Function 3.26. Levenberg-Marquardt step

[x_new, found, better, method, ...] := LMstep(x, ...)
begin
  x_new := x;  method := L-M
  Solve (J(x)ᵀ J(x) + μI) h_lm = −F'(x)
  if ‖h_lm‖ ≤ ε₂ (‖x‖ + ε₂)
    found := true
  else
    x_new := x + h_lm
    ϱ := (F(x) − F(x_new)) / (L(0) − L(h_lm))                     {4}
    if ϱ > 0
      better := true;  found := (‖F'(x_new)‖∞ ≤ ε₁)
      if ‖F'(x_new)‖∞ < 0.02 · F(x_new)                           {5}
        count := count + 1
        if count = 3                                              {6}
          method := Q-N
      else
        count := 0
    else
      count := 0;  better := false
end

We have the following remarks on the functions LMstep and QNstep:

4. The gain ratio $\varrho$ is also used to update $\mu$ as in Algorithm 3.16.

5. Indication that it might be time to switch method. The parameter count is initialized to zero at the start of Algorithm 3.25.

6. (3.22) was satisfied in three consecutive iteration steps, all of which had $\varrho > 0$, ie $x$ was changed in each of these steps.

Function 3.27. Quasi-Newton step

[x_new, found, better, method, ...] := QNstep(x, B, ...)
begin
  x_new := x;  method := Q-N;  better := false
  Solve B h_qn = −F'(x)
  if ‖h_qn‖ ≤ ε₂ (‖x‖ + ε₂)
    found := true
  else
    if ‖h_qn‖ > Δ                                                 {7}
      h_qn := (Δ/‖h_qn‖) h_qn
    x_new := x + h_qn
    if ‖F'(x_new)‖∞ ≤ ε₁                                          {8}
      found := true
    else                                                          {9}
      better := (F(x_new) < F(x)) or
                (F(x_new) ≤ (1+δ) F(x) and ‖F'(x_new)‖∞ < ‖F'(x)‖∞)
      if ‖F'(x_new)‖∞ ≥ ‖F'(x)‖∞                                  {10}
        method := L-M
end

7. We combine the Quasi-Newton method with a trust region approach, with a simple treatment of the case where the bound is active, cf page 15f. At the switch from the L-M method $\Delta$ is initialized to $\max\{1.5\,\varepsilon_2(\|x\| + \varepsilon_2),\ \tfrac15\|h_{\mathrm{lm}}\|\}$.

8. Not shown: $\Delta$ is updated by means of (2.19).

9. In this part of the algorithm we focus on getting $F'$ closer to zero, so we accept a slight increase in the value of $F$, eg $\delta = \sqrt{\varepsilon_M}$, where $\varepsilon_M$ is the computer's unit roundoff.

10. The gradients do not decrease fast enough.


The simplest remedy is to replace $J(x)$ by a matrix $B$ obtained by numerical differentiation: The $(i, j)$th element is approximated by the finite difference approximation
$$\frac{\partial f_i}{\partial x_j}(x) \simeq \frac{f_i(x + \delta e_j) - f_i(x)}{\delta} \equiv b_{ij} , \qquad (3.28)$$
where $e_j$ is the unit vector in the $j$th coordinate direction and $\delta$ is an appropriately small real number. With this strategy each iterate $x$ needs $n{+}1$ evaluations of $f$, and since $\delta$ is probably much smaller than the distance $\|x - x^*\|$, we do not get much more information on the global behaviour of $f$ than we would get from just evaluating $f(x)$. We want better efficiency.
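For reference, the forward difference approximation (3.28) in MATLAB (a sketch; ffun returns the residual vector and delta is the chosen step size):

  function B = fd_jacobian(ffun, x, delta)
  % Forward difference approximation (3.28) to the Jacobian; costs n extra
  % evaluations of f on top of f(x) itself.
  fx = ffun(x);
  n  = length(x);
  B  = zeros(length(fx), n);
  for j = 1:n
      ej = zeros(n, 1);  ej(j) = 1;              % j-th unit vector
      B(:, j) = (ffun(x + delta*ej) - fx) / delta;
  end
  end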

Example 3.14. Let $m = n = 1$ and consider one nonlinear equation
$$f : \mathbb{R} \to \mathbb{R} . \quad \text{Find } \hat{x} \text{ such that } f(\hat{x}) = 0 .$$
For this problem we can write the Newton-Raphson algorithm (3.6) in the form

  f(x + h) ≃ ℓ(h) ≡ f(x) + f′(x) h
  solve the linear problem ℓ(h) = 0
  x_new := x + h
                                                              (3.29)

If we cannot implement $f'(x)$, then we can approximate it by
$$\bigl(f(x + \delta) - f(x)\bigr)/\delta$$
with $\delta$ chosen appropriately small. More generally, we can replace (3.29) by

  f(x + h) ≃ λ(h) ≡ f(x) + b h   with b ≃ f′(x)
  solve the linear problem λ(h) = 0
  x_new := x + h
                                                              (3.30a)

Suppose that we already know $x_{\mathrm{prev}}$ and $f(x_{\mathrm{prev}})$. Then we can fix the factor $b$ (the approximation to $f'(x)$) by requiring that
$$f(x_{\mathrm{prev}}) = \lambda(x_{\mathrm{prev}} - x) . \qquad (3.30b)$$
This gives $b = \bigl(f(x) - f(x_{\mathrm{prev}})\bigr)/(x - x_{\mathrm{prev}})$, and with this choice of $b$ we recognize (3.30) as the secant method, see eg pp 70f in Eldén et al (2004).

The main advantage of the secant method over an alternative finite difference approximation to Newton-Raphson's method is that we only need one function evaluation per iteration step instead of two.

Now, consider the linear model (3.7a) for $f : \mathbb{R}^n \to \mathbb{R}^m$,
$$f(x + h) \simeq \ell(h) \equiv f(x) + J(x) h .$$
We will replace it by
$$f(x + h) \simeq \lambda(h) \equiv f(x) + B h ,$$
where $B$ is the current approximation to $J(x)$. In the next iteration step we need $B_{\mathrm{new}}$ so that
$$f(x_{\mathrm{new}} + h) \simeq f(x_{\mathrm{new}}) + B_{\mathrm{new}} h .$$
Especially, we want this model to hold with equality for $h = x - x_{\mathrm{new}}$, ie
$$f(x) = f(x_{\mathrm{new}}) + B_{\mathrm{new}} (x - x_{\mathrm{new}}) . \qquad (3.31a)$$
This gives us $m$ equations in the $m \cdot n$ unknown elements of $B_{\mathrm{new}}$, so we need more conditions. Broyden (1965) suggested to supplement (3.31a) with
$$B_{\mathrm{new}} v = B v \quad \text{for all} \quad v \perp (x - x_{\mathrm{new}}) . \qquad (3.31b)$$
It is easy to verify that the conditions (3.31a-b) are satisfied by

Definition 3.32. Broyden's Rank One Update
$$B_{\mathrm{new}} = B + u h^\top ,$$
where
$$h = x_{\mathrm{new}} - x , \qquad u = \frac{1}{h^\top h}\bigl(f(x_{\mathrm{new}}) - f(x) - B h\bigr) .$$

Note that condition (3.31a) corresponds to the secant condition (3.30b) in the case $n = 1$. We say that this approach is a generalized secant method.
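A direct MATLAB transcription of Definition 3.32 (the function name is mine):

  function Bnew = broyden_update(B, x, xnew, fx, fxnew)
  % Broyden's rank one update, cf Definition 3.32: Bnew*h = f(xnew) - f(x)
  % for h = xnew - x, while Bnew*v = B*v for any v orthogonal to h.
  h = xnew - x;
  u = (fxnew - fx - B*h) / (h'*h);
  Bnew = B + u*h';
  end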

A brief sketch of the central part of Algorithm 3.16 with this modification has the form


with $t_i = 45 + 5i$ and

  i   y_i      i   y_i      i   y_i
  1   34780    7   11540   12   5147
  2   28610    8    9744   13   4427
  3   23650    9    8261   14   3820
  4   19630   10    7030   15   3307
  5   16370   11    6005   16   2872
  6   13720

The minimizer is $x^* \simeq [\,5.61\cdot 10^{-3},\ 6.18\cdot 10^{3},\ 3.45\cdot 10^{2}\,]^\top$ with $F(x^*) \simeq 43.97$. An alternative formulation is
$$\varphi_i(z) = 10^{-3} y_i - z_1 \exp\!\left(\frac{10 z_2}{u_i + z_3} - 13\right) , \quad i = 1, \ldots, 16 ,$$
with $u_i = 0.45 + 0.05 i$. The reformulation corresponds to
$$z = \bigl[\,10^{-3} e^{13} x_1 ,\ 10^{-3} x_2 ,\ 10^{-2} x_3\,\bigr]^\top ,$$
and the minimizer is $z^* \simeq [\,2.48,\ 6.18,\ 3.45\,]^\top$ with objective value $\simeq 4.397\cdot 10^{-5}$.

If we use Algorithm 3.16 with $\tau = 1$, $\varepsilon_1 = 10^{-6}$, $\varepsilon_2 = 10^{-10}$ and the equivalent starting vectors
$$x_0 = \bigl[\,2\cdot 10^{-2},\ 4\cdot 10^{3},\ 2.5\cdot 10^{2}\,\bigr]^\top , \qquad z_0 = [\,8.85,\ 4,\ 2.5\,]^\top ,$$
then the iteration is stopped by (3.15b) after 175 iteration steps with the first formulation, and by (3.15a) after 88 steps with the well-scaled reformulation.

APPENDIX

A. Symmetric, Positive Definite Matrices

The matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^\top$, ie if $a_{ij} = a_{ji}$ for all $i, j$.

Definition A.1. The symmetric matrix $A \in \mathbb{R}^{n \times n}$ is
positive definite if $x^\top A x > 0$ for all $x \in \mathbb{R}^n$, $x \ne 0$,
positive semidefinite if $x^\top A x \ge 0$ for all $x \in \mathbb{R}^n$, $x \ne 0$.

Some useful properties of such matrices are listed in Theorem A.2 below. The proof can be found by combining theorems in almost any textbook on linear algebra and on numerical linear algebra. At the end of this appendix we give some practical implications of the theorem.

Now, let $J \in \mathbb{R}^{m \times n}$ be given, and let
$$A = J^\top J .$$
Then $A^\top = J^\top (J^\top)^\top = A$, ie $A$ is symmetric. Further, for any nonzero $x \in \mathbb{R}^n$ let $y = J x$. Then
$$x^\top A x = x^\top J^\top J x = y^\top y \ge 0 ,$$
showing that $A$ is positive semidefinite. If $m \ge n$ and the columns in $J$ are linearly independent, then $x \ne 0 \Rightarrow y \ne 0$ and $y^\top y > 0$. Thus, in this case $A$ is positive definite.

From (A.3) below it follows immediately that
$$(A + \mu I) v_j = (\lambda_j + \mu) v_j , \quad j = 1, \ldots, n$$
for any $\mu \in \mathbb{R}$. Combining this with 2° in Theorem A.2 we see that if $A$ is symmetric and positive semidefinite and $\mu > 0$, then the matrix $A + \mu I$ is also symmetric and it is guaranteed to be positive definite.

Theorem A.2. Let $A \in \mathbb{R}^{n \times n}$ be symmetric and let $A = L U$, where $L$ is a unit lower triangular matrix and $U$ is an upper triangular matrix. Further, let $\{(\lambda_j, v_j)\}_{j=1}^{n}$ denote the eigensolutions of $A$, ie
$$A v_j = \lambda_j v_j , \quad j = 1, \ldots, n . \qquad (A.3)$$
Then
1° The eigenvalues are real, $\lambda_j \in \mathbb{R}$, and the eigenvectors $\{v_j\}$ form an orthonormal basis of $\mathbb{R}^n$.
2° The following statements are equivalent
   a) $A$ is positive definite (positive semidefinite),
   b) All $\lambda_j > 0$ ($\lambda_j \ge 0$),
   c) All $u_{ii} > 0$ ($u_{ii} \ge 0$).
If $A$ is positive definite, then
3° The LU-factorization is numerically stable.
4° $U = D L^\top$ with $D = \operatorname{diag}(u_{ii})$.
5° $A = C^\top C$, the Cholesky factorization. $C \in \mathbb{R}^{n \times n}$ is upper triangular.

The condition number of a symmetric matrix $A$ is
$$\kappa_2(A) = \max\{|\lambda_j|\} / \min\{|\lambda_j|\} .$$
If $A$ is positive (semi)definite and $\mu > 0$, then
$$\kappa_2(A + \mu I) = \frac{\max\{\lambda_j\} + \mu}{\min\{\lambda_j\} + \mu} \le \frac{\max\{\lambda_j\} + \mu}{\mu} ,$$
and this is a decreasing function of $\mu$.

Finally, some remarks on Theorem A.2 and practical details: A unit lower triangular matrix $L$ is characterized by $\ell_{ii} = 1$ and $\ell_{ij} = 0$ for $j > i$. Note that the LU-factorization $A = LU$ is made without pivoting. Also note that points 4°-5° give the following relation between the LU- and the Cholesky-factorization

$$A = L U = L D L^\top = C^\top C ,$$
showing that
$$C = D^{1/2} L^\top , \quad \text{with} \quad D^{1/2} = \operatorname{diag}(\sqrt{u_{ii}}) .$$

The Cholesky factorization can be computed directly (ie without the intermediate results $L$ and $U$) by the following algorithm, which includes a test for positive definiteness.

Algorithm A.4. Cholesky Factorization

begin
  k := 0;  posdef := true                                      {Initialisation}
  while posdef and k < n
    k := k+1;  d := a_kk − Σ_{i=1..k−1} c_ik²
    if d > 0                                                   {test for pos. def.}
      c_kk := √d                                               {diagonal element}
      for j := k+1, ..., n
        c_kj := (a_kj − Σ_{i=1..k−1} c_ij c_ik) / c_kk          {superdiagonal elements}
    else
      posdef := false
end

The cost of this algorithm is about $\tfrac{1}{3} n^3$ flops. Once $C$ is computed, the system $A x = b$ can be solved by forward and back substitution in
$$C^\top z = b \quad \text{and} \quad C x = z ,$$
respectively. Each of these steps costs about $n^2$ flops.
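The same computation in MATLAB, using the built-in chol (a sketch on a random test matrix):

  n = 5;  M = randn(n);
  A = M'*M + eye(n);         % a random symmetric positive definite test matrix
  b = randn(n, 1);
  [C, p] = chol(A);          % C'*C = A when p == 0, ie A is positive definite
  if p == 0
      z = C' \ b;            % forward substitution in C'*z = b
      x = C \ z;             % back substitution in C*x = z
  end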

B. Minimum Norm Least Squares Solution

Consider the least squares problem: Given $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $b \in \mathbb{R}^m$, find $h \in \mathbb{R}^n$ such that
$$\|b - A h\|$$
is minimized. To analyze this, we shall use the singular value decomposition (SVD) of $A$,
$$A = U \Sigma V^\top , \qquad (B.1)$$
where the matrices $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal, and

$$\Sigma = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & 0 & \end{bmatrix} \quad \text{with} \quad \sigma_1 \ge \cdots \ge \sigma_p > 0 , \quad \sigma_{p+1} = \cdots = \sigma_n = 0 .$$

The $\sigma_j$ are the singular values. The number $p$ is the rank of $A$, equal to the dimension of the subspace $\mathcal{R}(A) \subseteq \mathbb{R}^m$ (the so-called range of $A$) that contains every possible linear combination of the columns of $A$.

Let $\{u_j\}_{j=1}^{m}$ and $\{v_j\}_{j=1}^{n}$ denote the columns in $U$ and $V$, respectively. Since the matrices are orthogonal, the vectors form two orthonormal sets, ie
$$u_i^\top u_j = v_i^\top v_j = \begin{cases} 1 , & i = j , \\ 0 , & \text{otherwise.} \end{cases} \qquad (B.2)$$

From (B.1) and (B.2) it follows that
$$A = \sum_{j=1}^{p} \sigma_j u_j v_j^\top \qquad \text{and} \qquad A v_k = \sigma_k u_k , \quad k = 1, \ldots, n . \qquad (B.3)$$

The $\{u_j\}$ and $\{v_j\}$ can be used as orthonormal bases in $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively, so we can write
$$b = \sum_{j=1}^{m} \beta_j u_j , \qquad h = \sum_{i=1}^{n} \eta_i v_i , \qquad (B.4)$$
and by use of (B.3) we see that
$$r = b - A h = \sum_{j=1}^{p} (\beta_j - \sigma_j \eta_j) u_j + \sum_{j=p+1}^{m} \beta_j u_j ,$$
and by means of (B.2) this implies
$$\|r\|^2 = r^\top r = \sum_{j=1}^{p} (\beta_j - \sigma_j \eta_j)^2 + \sum_{j=p+1}^{m} \beta_j^2 . \qquad (B.5)$$
This is minimized when
$$\beta_j - \sigma_j \eta_j = 0 , \quad j = 1, \ldots, p .$$
Thus, the least squares solution can be expressed as
$$h = \sum_{j=1}^{p} \frac{\beta_j}{\sigma_j}\, v_j + \sum_{j=p+1}^{n} \eta_j v_j .$$
When $p < n$ the least squares solution has $n - p$ degrees of freedom: $\eta_{p+1}, \ldots, \eta_n$ are arbitrary. Similar to the discussion around (B.5) we see that $\|h\|$ is minimized when $\eta_{p+1} = \cdots = \eta_n = 0$. The solution corresponding to this choice of the free parameters is the so-called minimum norm solution,
$$h_{\min} = \sum_{j=1}^{p} \frac{\beta_j}{\sigma_j}\, v_j = \sum_{j=1}^{p} \frac{u_j^\top b}{\sigma_j}\, v_j .$$
The reformulation follows from (B.4) and (B.2).
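In MATLAB the minimum norm solution can be obtained from the SVD exactly as derived above, or with pinv (a sketch on a small rank-deficient example):

  A = [1 2 1; 2 4 2; 1 1 1; 3 6 3];     % rank(A) = 2 < n = 3: the LSQ solution is not unique
  b = [1; 2; 3; 4];
  [U, S, V] = svd(A);
  s = diag(S);
  p = sum(s > max(size(A))*eps(s(1)));  % numerical rank
  hmin = V(:, 1:p) * ((U(:, 1:p)'*b) ./ s(1:p));   % minimum norm solution, as derived above
  norm(hmin - pinv(A)*b)                % pinv gives the same vector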


REFERENCES

1. M. Al-Baali and R. Fletcher (1985): Variational Methods for Non-Linear Least-Squares. J. Opl. Res. Soc. 36, No. 5, pp 405-421.

2. C.G. Broyden (1965): A Class of Methods for Solving Nonlinear Simultaneous Equations. Maths. Comp. 19, pp 577-593.

3. J.E. Dennis, Jr. and R.B. Schnabel (1983): Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice Hall.

4. L. Eldén, L. Wittmeyer-Koch and H.B. Nielsen (2004): Introduction to Numerical Computation: analysis and MATLAB illustrations, to appear.

5. P.E. Frandsen, K. Jonasson, H.B. Nielsen and O. Tingleff (2004): Unconstrained Optimization, 3rd Edition. IMM, DTU. Available as http://www.imm.dtu.dk/courses/02611/uncon.pdf

6. P.E. Gill, W. Murray, M.A. Saunders and M.H. Wright (1984): Sparse Matrix Methods in Optimization. SIAM J.S.S.C. 5, pp 562-589.

7. G. Golub and V. Pereyra (2003): Separable Nonlinear Least Squares: The Variable Projection Method and its Applications. Inverse Problems 19, pp R1-R26.

8. G. Golub and C.F. Van Loan (1996): Matrix Computations, 3rd edition. Johns Hopkins University Press.

9. K. Levenberg (1944): A Method for the Solution of Certain Problems in Least Squares. Quart. Appl. Math. 2, pp 164-168.

10. K. Madsen (1988): A Combined Gauss-Newton and Quasi-Newton Method for Non-Linear Least Squares. Institute for Numerical Analysis (now part of IMM), DTU. Report NI-88-10.

11. K. Madsen and H.B. Nielsen (2002): Supplementary Notes for 02611 Optimization and Data Fitting. IMM, DTU. Available as http://www.imm.dtu.dk/courses/02611/SN.pdf

12. K. Madsen, H.B. Nielsen and J. Søndergaard (2002): Robust Subroutines for Non-Linear Optimization. IMM, DTU. Report IMM-REP-2002-02. Available as http://www.imm.dtu.dk/~km/F_package/robust.pdf

13. K. Madsen, H.B. Nielsen and O. Tingleff (2004): Optimization with Constraints, 2nd Edition. IMM, DTU. Available as http://www.imm.dtu.dk/courses/02611/ConOpt.pdf

14. D. Marquardt (1963): An Algorithm for Least Squares Estimation of Nonlinear Parameters. SIAM J. Appl. Math. 11, pp 431-441.

15. H.B. Nielsen (1997): Direct Methods for Sparse Matrices. IMM, DTU. Available as http://www.imm.dtu.dk/~hbn/publ/Nsparse.ps

16. H.B. Nielsen (1999): Damping Parameter in Marquardt's Method. IMM, DTU. Report IMM-REP-1999-05. Available as http://www.imm.dtu.dk/~hbn/publ/TR9905.ps

17. H.B. Nielsen (2000): Separable Nonlinear Least Squares. IMM, DTU. Report IMM-REP-2000-01. Available as http://www.imm.dtu.dk/~hbn/publ/TR0001.ps

18. J. Nocedal and S.J. Wright (1999): Numerical Optimization. Springer.

19. M.J.D. Powell (1970): A Hybrid Method for Non-Linear Equations. In P. Rabinowitz (ed): Numerical Methods for Non-Linear Algebraic Equations. Gordon and Breach, pp 87ff.

20. P.L. Toint (1987): On Large Scale Nonlinear Least Squares Calculations. SIAM J.S.S.C. 8, pp 416-435.
