Linear Least Squares Problems



    02610 Optimization and Data Fitting Linear Data Fitting Problems 1

    Data Fitting and Linear Least-Squares Problems

    This lecture is based on the book P. C. Hansen, V. Pereyra and G. Scherer, Least Squares Data Fitting with Applications, Johns Hopkins University Press, to appear (the necessary chapters are available on CampusNet)

    and we cover this material:

    Section 1.1: Motivation.
    Section 1.2: The data fitting problem.
    Section 1.4: The residuals and their properties.
    Section 2.1: The linear least squares problem.
    Section 2.2: The QR factorization.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 2

    Example: Parameter estimation

    Figure: the measured NMR signal from frozen cod versus time t (seconds), for t between 0 and 0.4 s.

    The measured NMR signal reflects the amount of different types of water environments in the meat. The ideal time signal Γ(t) from NMR is:

        Γ(t) = x_1 e^{-λ_1 t} + x_2 e^{-λ_2 t} + x_3,    λ_1, λ_2 > 0 are known.

    Amplitudes x_1 and x_2 are proportional to the amount of water containing the two kinds of protons. The constant x_3 accounts for an undesired background (bias) in the measurements.

    Goal: estimate the three unknown parameters and then compute the different kinds of water contents in the meat sample.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 3

    Example: Data approximation

    Measurements of air pollution, in the form of the NO concentration, over a period of 24 hours, on H. C. Andersens Boulevard.

    Figure: the NO measurements over the 24-hour period with a polynomial fit (left) and a trigonometric fit (right).

    Fit a smooth curve to the measurements, so that we can compute the concentration at an arbitrary time between 0 and 24 hours.

    Polynomial and periodic models:

        M(x, t) = x_1 t^p + x_2 t^{p-1} + ... + x_p t + x_{p+1},
        M(x, t) = x_1 + x_2 sin(ω t) + x_3 cos(ω t) + x_4 sin(2ω t) + x_5 cos(2ω t) + ...
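    As a rough illustration, a minimal MATLAB sketch of the polynomial model fitted in the least squares sense is shown below. The data, the degree p, and all numbers are hypothetical placeholders, not the measurements from the slide:

        % Hypothetical data: 24 hourly NO "measurements" (values made up)
        t = (0:23)';                        % time of day in hours
        y = 150 + 80*sin(2*pi*t/24) + 10*randn(24,1);

        % Polynomial model M(x,t) = x_1 t^p + ... + x_p t + x_{p+1}
        p = 4;                              % assumed polynomial degree
        A = t.^(p:-1:0);                    % column j holds t.^(p+1-j)
        x = A \ y;                          % least squares coefficients via backslash
        yfit = A*x;                         % fitted concentrations at the abscissas

    The trigonometric model is handled in exactly the same way; only the columns of A change (a column of ones, sin(ω t), cos(ω t), and so on).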


    02610 Optimization and Data Fitting Linear Data Fitting Problems 6

    The Linear Data Fitting Problem

    We wish to compute an approximation to the data, given by the fitting model M(x, t).

    The vector x = (x_1, x_2, ..., x_n)^T ∈ R^n contains n parameters that characterize the model, to be determined from the given noisy data.

    The linear data fitting problem:

        Linear fitting model:    M(x, t) = Σ_{j=1}^n x_j f_j(t).

    Here the functions f_j(t) are chosen either because they reflect the underlying physical/chemical/... model (parameter estimation) or because they are easy to work with (data approximation).

    The order of the fit n should be somewhat smaller than the number m of data points.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 7

    The Least Squares (LSQ) Fit

    A standard technique for determining the parameters. We introduce the residual r_i associated with the data points as

        r_i = y_i - M(x, t_i),    i = 1, 2, ..., m.

    Note that each residual is a function of the parameter vector x, i.e., r_i = r_i(x). A least squares fit is a choice of the parameter vector x that minimizes the sum-of-squares of the residuals:

        LSQ fit:    min_x Σ_{i=1}^m r_i(x)^2 = min_x Σ_{i=1}^m ( y_i - M(x, t_i) )^2.

    Sum-of-absolute-values makes the problem harder to solve:

        min_x Σ_{i=1}^m |r_i(x)| = min_x Σ_{i=1}^m | y_i - M(x, t_i) |.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 8

    Summary: the Linear LSQ Data Fitting Problem

    Given: data (t_i, y_i), i = 1, ..., m, with measurement errors.

    Our data model: there is an (unknown) function Γ(t) such that

        y_i = Γ(t_i) + e_i,    i = 1, 2, ..., m.

    The data errors: the e_i are unknown, but we may have some statistical information about them.

    Our linear fitting model:

        M(x, t) = Σ_{j=1}^n x_j f_j(t),    f_j(t) = given functions.

    The linear least squares fit:

        min_x Σ_{i=1}^m r_i(x)^2 = min_x Σ_{i=1}^m ( y_i - M(x, t_i) )^2.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 9

    The Underlying Idea

    Recall our underlying data model: y_i = Γ(t_i) + e_i, i = 1, 2, ..., m, and consider the residuals for i = 1, ..., m:

        r_i = y_i - M(x, t_i)
            = ( y_i - Γ(t_i) ) + ( Γ(t_i) - M(x, t_i) )
            = e_i + ( Γ(t_i) - M(x, t_i) ).

    The data error e_i is from the measurements.

    The approximation error Γ(t_i) - M(x, t_i) is due to the discrepancy between the pure-data function and the fitting model.

    A good fitting model M(x, t) is one for which the approximation errors are of the same size as the data errors.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 10

    Matrix-Vector Notation

    Define the matrix A ∈ R^{m×n} and the vectors y, r ∈ R^m as follows,

        A = [ f_1(t_1)  f_2(t_1)  ...  f_n(t_1)
              f_1(t_2)  f_2(t_2)  ...  f_n(t_2)
                ...       ...            ...
              f_1(t_m)  f_2(t_m)  ...  f_n(t_m) ],

        y = ( y_1, y_2, ..., y_m )^T,    r = ( r_1, r_2, ..., r_m )^T,

    i.e., y is the vector of observations, r is the vector of residuals, and the matrix A is constructed such that the j-th column is the j-th model basis function sampled at the abscissas t_1, t_2, ..., t_m. Then

        r = y - A x    and    ρ(x) = Σ_{i=1}^m r_i(x)^2 = ||r||_2^2 = ||y - A x||_2^2.

    The data fitting problem in linear algebra notation: min_x ρ(x).
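    A minimal MATLAB sketch of this construction, assuming the basis functions are available as function handles; the basis functions and data below are placeholders, not those of the slides:

        % Placeholder data and basis functions f_1(t) = t, f_2(t) = 1
        t = linspace(0, 1, 30)';                 % abscissas t_1, ..., t_m
        y = 2*t + 0.5 + 0.05*randn(size(t));     % synthetic noisy observations
        f = {@(t) t, @(t) ones(size(t))};        % cell array of basis function handles

        m = numel(t);  n = numel(f);
        A = zeros(m, n);
        for j = 1:n
            A(:,j) = f{j}(t);                    % j-th column = f_j sampled at the abscissas
        end

        x   = A \ y;                             % least squares fit
        r   = y - A*x;                           % residual vector
        rho = norm(r)^2;                         % rho(x) = ||y - A x||_2^2

    The loop fills A column by column, exactly as in the definition above; the backslash operator then solves the least squares problem (by QR factorization, see the later slides).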


    02610 Optimization and Data Fitting Linear Data Fitting Problems 11

    Contour Plots

    NMR problem: fixing x_3 = 0.3 gives a problem min_x ρ(x) with two unknowns x_1 and x_2.

    Left: ρ(x) versus x. Right: level curves {x ∈ R^2 | ρ(x) = c}.

    Contours are ellipsoids, and cond(A) = eccentricity of the ellipsoids.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 12

    The Example Again

    We return to the NMR data fitting problem. For this problem there are m = 50 data points and n = 3 model basis functions:

        f_1(t) = e^{-λ_1 t},    f_2(t) = e^{-λ_2 t},    f_3(t) = 1.

    The coefficient matrix:

        A = [ 1.0000e+000  1.0000e+000  1.0000e+000
              8.0219e-001  9.3678e-001  1.0000e+000
              6.4351e-001  8.7756e-001  1.0000e+000
              5.1622e-001  8.2208e-001  1.0000e+000
              4.1411e-001  7.7011e-001  1.0000e+000
              3.3219e-001  7.2142e-001  1.0000e+000
              etc          etc          etc         ]


    02610 Optimization and Data Fitting Linear Data Fitting Problems 13

    The Example Again, Continued

    The LSQ solution to the least squares problem is x* = (x*_1, x*_2, x*_3)^T with elements

        x*_1 = 1.303,    x*_2 = 1.973,    x*_3 = 0.305.

    The exact parameters used to generate the data are 1.27, 2.04, 0.3.

    Figure: the measured data and the least squares fit versus time t (seconds), for t between 0 and 0.4 s.
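    A MATLAB sketch that mimics this experiment. The decay rates are not stated on these slides; λ_1 = 27 and λ_2 = 8 are assumed here because they reproduce the coefficient matrix entries shown above, and the noise level 0.1 is the σ used later for the unweighted data. The computed solution changes with the noise realization, so it will only roughly match the values above:

        % Assumed setup for the NMR example (decay rates and noise level are assumptions)
        lambda1 = 27;  lambda2 = 8;              % assumed decay rates
        m = 50;
        t = linspace(0, 0.4, m)';                % 50 sampling times in [0, 0.4] s
        xexact = [1.27; 2.04; 0.3];              % exact parameters from the slide

        A = [exp(-lambda1*t), exp(-lambda2*t), ones(m,1)];
        y = A*xexact + 0.1*randn(m,1);           % synthetic noisy measurements

        xstar = A \ y;                           % LSQ solution; compare with the slide's 1.303, 1.973, 0.305
        plot(t, y, 'o', t, A*xstar, '-')         % measured data and least squares fit
        xlabel('time t (seconds)')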


    02610 Optimization and Data Fitting Linear Data Fitting Problems 15

    More About Weights

    Consider the expected value of the weighted sum-of-squares:

        E[ Σ_{i=1}^m ( r_i(x) / σ_i )^2 ] = Σ_{i=1}^m E[ r_i(x)^2 / σ_i^2 ]
                                          = Σ_{i=1}^m E[ e_i^2 / σ_i^2 ] + Σ_{i=1}^m E[ ( Γ(t_i) - M(x, t_i) )^2 / σ_i^2 ]
                                          = m + Σ_{i=1}^m E[ ( Γ(t_i) - M(x, t_i) )^2 ] / σ_i^2,

    where we used that E(e_i) = 0 and E(e_i^2) = σ_i^2.

    Intuitive result: we can allow the expected value of the approximation errors to be larger for those data (t_i, y_i) that have larger standard deviations (i.e., larger errors).


    02610 Optimization and Data Fitting Linear Data Fitting Problems 16

    Weights in the Matrix-Vector Formulation

    We introduce the diagonal matrix

        W = diag(w_1, ..., w_m),    w_i = 1/σ_i,    i = 1, 2, ..., m.

    Then the weighted LSQ problem is min_x ρ_W(x) with

        ρ_W(x) = Σ_{i=1}^m ( r_i(x) / σ_i )^2 = ||W (y - A x)||_2^2.

    Same computational problem as before, with y → W y and A → W A.
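    In MATLAB the weighted problem is a row scaling followed by the usual solve. A sketch, reusing A and y from the NMR sketch above and the standard deviations from the example on the next slide:

        % Weighted LSQ: scale the rows by w_i = 1/sigma_i, then solve as before
        sigma = [0.5*ones(10,1); 0.1*ones(40,1)];   % assumed per-point standard deviations
        W  = diag(1 ./ sigma);                      % diagonal weight matrix
        xw = (W*A) \ (W*y);                         % weighted LSQ solution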


    02610 Optimization and Data Fitting Linear Data Fitting Problems 17

    Example with Weights

    The NMR data again, but we add larger Gaussian noise to the first 10 data points, with standard deviation 0.5:

        σ_i = 0.5,  i = 1, 2, ..., 10    (first 10 data with larger errors)
        σ_i = 0.1,  i = 11, 12, ..., 50    (remaining data with smaller errors).

    The weights w_i = 1/σ_i are 2, 2, ..., 2, 10, 10, ..., 10.

    We solve the problem with and without weights for 10,000 instances of the noise, and consider x*_2 (the exact value is 2.04).


    02610 Optimization and Data Fitting Linear Data Fitting Problems 18

    Geometric Characterization of the LSQ Solution

    Our notation in Chapter 2 (no weights, and y → b):

        x* = argmin_x ||r||_2,    r = b - A x.

    Geometric interpretation: smallest residual when r ⊥ range(A).

    Figure: b is decomposed as the sum of A x* in range(A) = span of the columns of A and the residual r, which is orthogonal to range(A).


    02610 Optimization and Data Fitting Linear Data Fitting Problems 19

    Derivation of Closed-Form Solution

    The two components of b = A x + r must be orthogonal:

        (A x)^T r = 0    ⇔    x^T A^T (b - A x) = 0
                         ⇔    x^T A^T b - x^T A^T A x = 0    ⇔    x^T (A^T b - A^T A x) = 0,

    which leads to the normal equations:

        A^T A x = A^T b    ⇔    x = (A^T A)^{-1} A^T b.

    Computational aspect: sensitivity to rounding errors is controlled by cond(A^T A) = cond(A)^2.

    A side remark: the matrix A† = (A^T A)^{-1} A^T in the above expression for x is called the pseudoinverse of A.
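    For comparison, a sketch of the normal-equations route in MATLAB (b is the data vector in the Chapter 2 notation; the QR-based solve on the next slides is the safer default when A is ill conditioned):

        b   = y;                                 % Chapter 2 notation: the data vector is called b
        xne = (A'*A) \ (A'*b);                   % normal equations; sensitive to cond(A)^2
        xpi = pinv(A)*b;                         % pseudoinverse route (MATLAB's pinv uses the SVD)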


    02610 Optimization and Data Fitting Linear Data Fitting Problems 20

    Avoiding the Normal Equations

    QR factorization of A ∈ R^{m×n} with m > n:

        A = Q [ R_1 ; 0 ]    with    Q ∈ R^{m×m},  R_1 ∈ R^{n×n},

    where R_1 is upper triangular and Q is orthogonal, i.e.,

        Q^T Q = I_m,    Q Q^T = I_m,    ||Q v||_2 = ||v||_2 for any v.

    Splitting: Q = (Q_1, Q_2)  ⇒  A = Q_1 R_1, where Q_1 ∈ R^{m×n}.

    [Q,R] = qr(A)      full QR factorization, Q = Q and R = [ R_1 ; 0 ].
    [Q,R] = qr(A,0)    economy-size version with Q = Q_1 and R = R_1.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 21

    Least squares solution

        ||b - A x||_2^2 = || b - Q [ R_1 ; 0 ] x ||_2^2
                        = || Q ( Q^T b - [ R_1 ; 0 ] x ) ||_2^2
                        = || [ Q_1^T b ; Q_2^T b ] - [ R_1 x ; 0 ] ||_2^2
                        = || Q_1^T b - R_1 x ||_2^2 + || Q_2^T b ||_2^2.

    The last term is independent of x. The first term is minimal when

        R_1 x = Q_1^T b    ⇔    x = R_1^{-1} Q_1^T b.

    That's easy! Multiply with Q_1^T followed by a backsolve with R_1. Sensitivity to rounding errors is controlled by cond(R) = cond(A).

    MATLAB: x = A\b; uses the QR factorization.
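    Spelled out with the economy-size factorization (a sketch; up to rounding errors it gives the same result as x = A\b):

        [Q1, R1] = qr(A, 0);                     % economy-size QR: A = Q1*R1
        xqr = R1 \ (Q1'*b);                      % multiply by Q1^T, then backsolve with R1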


    02610 Optimization and Data Fitting Linear Data Fitting Problems 22

    NMR Example Again, Again

    A and Q_1:

        A   = [ 1.00      1.00      1
                0.80      0.94      1
                0.64      0.88      1
                ...       ...       ...
                3.2e-05   4.6e-02   1
                2.5e-05   4.4e-02   1
                2.0e-05   4.1e-02   1 ],

        Q_1 = [ 0.597     0.281     0.172
                0.479     0.139     0.071
                0.384     0.029     0.002
                ...       ...       ...
                1.89e-05  0.030     0.224
                1.52e-05  0.028     0.226
                1.22e-05  0.026     0.229 ],

        R_1 = [ 1.67  2.40  3.02
                0     1.54  5.16
                0     0     3.78 ],    Q_1^T b = [ 7.81
                                                   4.32
                                                   1.19 ].


    02610 Optimization and Data Fitting Linear Data Fitting Problems 23

    Normal equations vs QR factorization

    Normal equations

        Squared condition number: cond(A^T A) = cond(A)^2.
        OK for well conditioned A.
        Work = m n^2 + (1/3) n^3 flops.

    QR factorization

        No squaring of the condition number: cond(R) = cond(A).
        Can better handle ill conditioned A.
        Work = 2 m n^2 - (2/3) n^3 flops.

    Normal eqs. always faster, but risky for ill-conditioned matrices.
    QR factorization is always stable  ⇒  better black-box method.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 24

    Residual Analysis (Section 1.4)

    Must choose the fitting model M(x*, t) = Σ_{j=1}^n x*_j f_j(t) such that the data errors and the approximation errors are balanced.

    Hence we must analyze the residuals r_1, r_2, ..., r_m:

    1. The model captures the pure-data function well enough when the approximation errors are smaller than the data errors. Then the residuals are dominated by the data errors and some of the statistical properties of the errors carry over to the residuals.

    2. If the fitting model does not capture the behavior of the pure-data function, then the residuals are dominated by the approximation errors. Then the residuals will tend to behave as a sampled signal and show strong local correlations.

    Use the simplest model (i.e., the smallest n) that satisfies 1.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 25

    Residual Analysis Assumptions and Techniques

    We make the following assumptions about the data errors e_i:

        They are random variables with mean zero and identical variance, i.e., E(e_i) = 0 and E(e_i^2) = σ^2 for i = 1, 2, ..., m.
        They belong to a normal distribution, e_i ∼ N(0, σ^2).

    We describe two tests with two different properties.

        Randomness test: check for randomness of the signs of r_i.
        Autocorrelation test: check if the residuals are uncorrelated.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 26

    Test for Random Signs

    Can we consider the signs of the residuals to be random? Run test from time series analysis.

    Given a sequence of two symbols (in our case, + and - for positive and negative residuals r_i), a run is defined as a succession of identical symbols surrounded by different symbols.

    The example sequence has m = 17 elements, n_+ = 8 pluses, n_- = 9 minuses, and u = 5 runs: + + +, a run of minuses, + +, a second run of minuses, and + + +.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 27

    The Run Test

    The distribution of runs u (not the residuals!) can be approximated by a normal distribution with mean μ_u and standard deviation σ_u given by

        μ_u = 2 n_+ n_- / m + 1,    σ_u^2 = (μ_u - 1)(μ_u - 2) / (m - 1).

    With a 5% significance level we will accept the sign sequence as random if

        z = |u - μ_u| / σ_u < 1.96.

    In the above example with 5 runs we have z = 2.25 and the sequence of signs cannot be considered random.
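    A small MATLAB sketch of the run test, applied to a residual vector computed from A and b as in the earlier sketches:

        % Run test on the signs of the residuals
        r  = b - A*(A\b);                        % residuals of the LSQ fit
        sg = sign(r);  sg(sg == 0) = 1;          % signs; treat zero residuals as positive
        np = sum(sg > 0);  nm = sum(sg < 0);     % n_plus and n_minus
        m  = numel(sg);
        u  = 1 + sum(sg(2:end) ~= sg(1:end-1));  % number of runs = sign changes + 1
        mu_u  = 2*np*nm/m + 1;                   % mean of u under randomness
        sig_u = sqrt((mu_u - 1)*(mu_u - 2)/(m - 1));
        z  = abs(u - mu_u)/sig_u;                % accept the signs as random if z < 1.96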


    02610 Optimization and Data Fitting Linear Data Fitting Problems 28

    Test for Correlation

    Are short sequences of residuals r_i, r_{i+1}, r_{i+2}, ... correlated? This is a clear indication of trends.

    Define the autocorrelation ρ of the residuals, as well as the trend threshold T, as the quantities

        ρ = Σ_{i=1}^{m-1} r_i r_{i+1},    T = (1/√(m-1)) Σ_{i=1}^m r_i^2.

    Trends are likely to be present in the residuals if the absolute value of the autocorrelation exceeds the trend threshold, i.e., if |ρ| > T.

    In some presentations, the mean of the residuals is subtracted before computing ρ and T. Not necessary here, since we assume that the errors have zero mean.
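    The corresponding check in MATLAB, using the same residual vector r as in the run-test sketch:

        % Autocorrelation test for trends in the residuals
        rho = sum(r(1:end-1).*r(2:end));         % autocorrelation of neighbouring residuals
        T   = sum(r.^2)/sqrt(numel(r) - 1);      % trend threshold
        trend_present = abs(rho) > T;            % trends are likely if |rho| > T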


    02610 Optimization and Data Fitting Linear Data Fitting Problems 29

    Run test: for n ≥ 5 we have z < 1.96  ⇒  random residuals.
    Autocorrelation test: for n = 6 and 7 we have |ρ| < T.


    02610 Optimization and Data Fitting Linear Data Fitting Problems 30

    Covariance Matrices

    Recall that b_i = Γ(t_i) + e_i, i.e., b = Γ + e with Γ = ( Γ(t_1), ..., Γ(t_m) )^T deterministic and E(e) = 0. Covariance matrix for b:

        Cov(b) = E( (b - E(b)) (b - E(b))^T ) = E( e e^T ).

    Covariance matrix for x*:

        Cov(x*) = Cov( (A^T A)^{-1} A^T b ) = (A^T A)^{-1} A^T Cov(b) A (A^T A)^{-1}.

    White noise in the data: Cov(b) = σ^2 I_m  ⇒  Cov(x*) = σ^2 (A^T A)^{-1}.

    Recall that:

        [Cov(x*)]_{ij} = Cov(x*_i, x*_j),  i ≠ j,
        [Cov(x*)]_{ii} = st.dev(x*_i)^2.
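    A sketch of how these quantities are typically computed in MATLAB under the white-noise assumption (sigma is the noise standard deviation, or the estimate s from the next slide):

        sigma = 0.1;                             % assumed noise level
        Covx  = sigma^2 * inv(A'*A);             % Cov(x*) = sigma^2 (A'*A)^(-1)
        stdx  = sqrt(diag(Covx));                % standard deviations of the coefficients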


    02610 Optimization and Data Fitting Linear Data Fitting Problems 31

    Estimation of Noise Standard Deviation

    White noise in the data: Cov(b) = σ^2 I_m. Can show that:

        Cov(r*) = σ^2 Q_2 Q_2^T,

        E( ||r*||_2^2 ) = ||Q_2^T Γ||_2^2 + E( ||Q_2^T e||_2^2 ) = ||Q_2^T Γ||_2^2 + (m - n) σ^2.

    Hence, if the approximation errors are smaller than the data errors then ||r*||_2^2 ≈ (m - n) σ^2 and the scaled residual norm

        s = ||r*||_2 / √(m - n)

    is an estimate of the standard deviation of the errors in the data. Monitor s as a function of n to estimate σ.
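    In MATLAB (a sketch, continuing the notation of the earlier sketches; A holds the current n basis functions):

        xstar = A \ b;                           % LSQ fit with the current model
        rstar = b - A*xstar;                     % its residual vector
        [m, n] = size(A);
        s = norm(rstar)/sqrt(m - n);             % estimate of sigma; monitor s as n varies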


    02610 Optimization and Data Fitting Linear Data Fitting Problems 32

    Residual Norm and Scaled Residual Norm

    Figure: for the polynomial fit (left) and the trigonometric fit (right), the residual norm ||r*||_2 (top) and the scaled residual norm s* (bottom) plotted as functions of the fit order n.

    The residual norm is monotonically decreasing, while s* has a plateau/minimum.