
Page 1: Scientific Computing Chapter 3 - Linear Least squares

Scientific Computing Chapter 3 - Linear Least Squares

Guanghua Tan, [email protected]

School of Computer and Communication, HNU

Page 2: Scientific Computing Chapter 3 - Linear Least squares

Outline

Least Squares Data Fitting

Existence, Uniqueness, and Conditioning

Solving Linear Least Squares Problems

Page 3: Scientific Computing Chapter 3 - Linear Least squares

Method of Least Squares

Measurement errors are inevitable in observational and experimental sciences

Errors can be smoothed out by averaging over many cases, i.e., taking more measurements than are strictly necessary to determine parameters of system

Resulting system is over-determined, so usually there is no exact solution

Page 4: Scientific Computing Chapter 3 - Linear Least squares

Example: Fitting a Plane

A plane can be determined by 3 points, but scanned data may contain many points $p_i = (x_i, y_i, z_i)$

Plane equation: $ax + by + cz = 0$

Fitting all n points gives over-determined system

$$\begin{bmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \\ \vdots & \vdots & \vdots \\ x_n & y_n & z_n \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} \cong \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

Page 5: Scientific Computing Chapter 3 - Linear Least squares

Linear Least Squares

For linear problems, we obtain over-determined linear system Ax = b, with m×n matrix A, m > n

System is better written $Ax \cong b$, since equality is usually not exactly satisfiable when m > n

Least squares solution x minimizes squared Euclidean norm of residual vector $r = b - Ax$:

$$\min_x \|r\|_2^2 = \min_x \|b - Ax\|_2^2$$

Page 6: Scientific Computing Chapter 3 - Linear Least squares

Data Fitting

Given m data points $(t_i, y_i)$, find n-vector x of parameters that gives "best fit" to model function $f(t, x)$:

$$\min_x \sum_{i=1}^{m} \left( y_i - f(t_i, x) \right)^2$$

Problem is linear if function f is linear in components of x,

$$f(t, x) = x_1 \phi_1(t) + x_2 \phi_2(t) + \cdots + x_n \phi_n(t)$$

where the functions $\phi_j(t)$ depend only on t

Page 7: Scientific Computing Chapter 3 - Linear Least squares

Data Fitting

Polynomial fitting

$$f(t, x) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1}$$

is linear, since polynomial is linear in coefficients, though nonlinear in independent variable t

Fitting sum of exponentials

$$f(t, x) = x_1 e^{x_2 t} + \cdots + x_{n-1} e^{x_n t}$$

is example of nonlinear problem

For now, we will consider only linear least squares problems

Page 8: Scientific Computing Chapter 3 - Linear Least squares

Example: Data Fitting

Fitting quadratic polynomial $f(t, x) = x_1 + x_2 t + x_3 t^2$ to five data points gives linear least squares problem

Page 9: Scientific Computing Chapter 3 - Linear Least squares

Example: Data Fitting

For the given data, the over-determined 5×3 linear system is $Ax \cong b$

Solution, which we will see how to compute later, is $x = [0.086,\ 0.400,\ 1.429]^T$

So approximating polynomial is $p(t) = 0.086 + 0.400\,t + 1.429\,t^2$
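
To make the workflow concrete, here is a minimal Python/NumPy sketch of the same kind of quadratic fit; the data values below are made up for illustration and are not the slide's actual data.

```python
import numpy as np

# Made-up sample data (t_i, y_i); the slide's actual data are not reproduced here
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([ 1.0,  0.5, 0.0, 0.5, 2.0])

# Build the 5x3 matrix for the quadratic model f(t, x) = x1 + x2*t + x3*t^2
A = np.column_stack([np.ones_like(t), t, t**2])

# Solve the over-determined system A x ~= y in the least squares sense
x, residual, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print("coefficients:", x)
print("squared residual norm:", residual)
```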

Page 10: Scientific Computing Chapter 3 - Linear Least squares

Outline

Least Squares Data Fitting

Existence, Uniqueness, and Conditioning

Solving Linear Least Squares Problems

Page 11: Scientific Computing Chapter 3 - Linear Least squares

Existence and Uniqueness

Linear least squares problem $Ax \cong b$ always has solution

Solution is unique if, and only if, columns of A are linearly independent, i.e., rank(A) = n, where A is m×n

If rank(A) < n, then A is rank-deficient, and solution of linear least squares problem is not unique

For now we assume A has full column rank n

Page 12: Scientific Computing Chapter 3 - Linear Least squares

Normal Equations

To minimize squared Euclidean norm of residual vector, consider

$$\varphi(x) = \|r\|_2^2 = r^T r = (b - Ax)^T (b - Ax) = b^T b - 2 x^T A^T b + x^T A^T A x$$

Necessary condition for a minimum is $\nabla \varphi(x) = 0$; taking derivative with respect to x and setting it to 0 gives

$$2 A^T A x - 2 A^T b = 0$$

which reduces to n×n linear system of normal equations

$$A^T A x = A^T b$$

Page 13: Scientific Computing Chapter 3 - Linear Least squares

Geometric Interpretation

Vectors $v_1$ and $v_2$ are orthogonal if their inner product is zero, $v_1^T v_2 = 0$

Space spanned by columns of m×n matrix A, $\mathrm{span}(A) = \{ Ax : x \in \mathbb{R}^n \}$, is of dimension at most n

If m > n, b generally does not lie in span(A), so there is no exact solution to Ax = b

Page 14: Scientific Computing Chapter 3 - Linear Least squares

Geometric Interpretation

Vector y = Ax in span(A) closest to b in 2-norm occurs when residual r = b - Ax is orthogonal to span(A),

$$0 = A^T r = A^T (b - Ax)$$

again giving system of normal equations

$$A^T A x = A^T b$$

The solution to the least squares problem is then

$$x = (A^T A)^{-1} A^T b$$

Page 15: Scientific Computing Chapter 3 - Linear Least squares

Pseudo-inverse and Condition Number

Non-square m×n matrix A has no inverse in usual sense

If rank(A) = n, pseudo-inverse is defined by

$$A^+ = (A^T A)^{-1} A^T$$

and condition number by

$$\mathrm{cond}(A) = \|A\|_2 \, \|A^+\|_2$$

By convention, $\mathrm{cond}(A) = \infty$ if rank(A) < n

Page 16: Scientific Computing Chapter 3 - Linear Least squares

Condition Number

Just as condition number of square matrix measures closeness to singularity, condition number of rectangular matrix measures closeness to rank deficiency

Least squares solution of $Ax \cong b$ is given by

$$x = (A^T A)^{-1} A^T b$$

Page 17: Scientific Computing Chapter 3 - Linear Least squares

Sensitivity and Conditioning

Sensitivity of least squares solution to $Ax \cong b$ depends on b as well as A

Define angle θ between b and y = Ax by

$$\cos(\theta) = \frac{\|y\|_2}{\|b\|_2} = \frac{\|Ax\|_2}{\|b\|_2}$$

Bound on perturbation Δx in solution x due to perturbation Δb in b is given by

$$\frac{\|\Delta x\|_2}{\|x\|_2} \le \mathrm{cond}(A) \, \frac{1}{\cos(\theta)} \, \frac{\|\Delta b\|_2}{\|b\|_2}$$
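
A minimal sketch of checking this bound numerically on a random test problem; the matrix and perturbation below are arbitrary, chosen only to illustrate the quantities involved.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)

# Least squares solution and the quantities appearing in the bound
x, *_ = np.linalg.lstsq(A, b, rcond=None)
condA = np.linalg.cond(A)                       # ||A||_2 ||A^+||_2
cos_theta = np.linalg.norm(A @ x) / np.linalg.norm(b)

# Perturb b slightly and compare the actual change with the bound
db = 1e-6 * rng.standard_normal(20)
x_pert, *_ = np.linalg.lstsq(A, b + db, rcond=None)

lhs = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
rhs = condA / cos_theta * np.linalg.norm(db) / np.linalg.norm(b)
print(f"relative change {lhs:.2e} <= bound {rhs:.2e}")
```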

Page 18: Scientific Computing Chapter 3 - Linear Least squares

Sensitivity and Conditioning

Similarly, for perturbation E in matrix A,

$$\frac{\|\Delta x\|_2}{\|x\|_2} \lesssim \left( [\mathrm{cond}(A)]^2 \tan(\theta) + \mathrm{cond}(A) \right) \frac{\|E\|_2}{\|A\|_2}$$

Condition number of least squares solution is about cond(A) if residual is small, but can be squared or arbitrarily worse for large residual

Page 19: Scientific Computing Chapter 3 - Linear Least squares

Outline

Least Squares Data Fitting

Existence, Uniqueness, and Conditioning

Solving Linear Least Squares Problems

Page 20: Scientific Computing Chapter 3 - Linear Least squares

Normal Equations Method

If m×n matrix A has rank n, then symmetric n×n matrix $A^T A$ is positive definite, so its Cholesky factorization

$$A^T A = L L^T$$

can be used to obtain solution x to system of normal equations

$$A^T A x = A^T b$$

which has same solution as least squares problem

Normal equations method thus involves a sequence of transformations: rectangular → square → triangular
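
A minimal Python sketch of the normal equations method just described, using a Cholesky factorization of $A^T A$ (A is assumed to have full column rank); the function name is hypothetical.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lstsq_normal_equations(A, b):
    """Solve the least squares problem A x ~= b via the normal equations,
    using a Cholesky factorization of A^T A (full column rank assumed)."""
    AtA = A.T @ A                # rectangular -> square symmetric positive definite
    Atb = A.T @ b
    c, lower = cho_factor(AtA)   # square -> triangular: A^T A = L L^T
    return cho_solve((c, lower), Atb)
```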

Page 21: Scientific Computing Chapter 3 - Linear Least squares

Example: Normal Equations Method

For polynomial data-fitting example given previously, normal equations method gives the 3×3 system $A^T A x = A^T b$

Page 22: Scientific Computing Chapter 3 - Linear Least squares

Example, continued

Cholesky factorization of symmetric positive definite matrix $A^T A$ gives $A^T A = L L^T$

Solving lower triangular system $Lz = A^T b$ by forward-substitution gives $z = [1.789,\ 0.632,\ 1.336]^T$

Solving upper triangular system $L^T x = z$ by back-substitution gives $x = [0.086,\ 0.400,\ 1.429]^T$

Page 23: Scientific Computing Chapter 3 - Linear Least squares

Shortcomings of Normal Equations

Information can be lost in forming $A^T A$ and $A^T b$

For example, take

$$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix}$$

where ε is positive number smaller than $\sqrt{\epsilon_{\mathrm{mach}}}$

Then in floating-point arithmetic

$$A^T A = \begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

which is singular
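
A short sketch reproducing this loss of information in double precision, where $\sqrt{\epsilon_{\mathrm{mach}}} \approx 1.5 \times 10^{-8}$.

```python
import numpy as np

eps = 1e-8                          # smaller than sqrt(machine epsilon) for float64
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

AtA = A.T @ A
print(AtA)                          # [[1. 1.] [1. 1.]] -- the eps^2 terms are lost
print(np.linalg.matrix_rank(AtA))   # 1: A^T A is numerically singular
print(np.linalg.matrix_rank(A))     # 2: A itself still has full column rank
```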

Page 24: Scientific Computing Chapter 3 - Linear Least squares

Shortcomings of Normal Equations

Sensitivity of solution is also worsened, since

$$\mathrm{cond}(A^T A) = [\mathrm{cond}(A)]^2$$

Page 25: Scientific Computing Chapter 3 - Linear Least squares

Augmented System Method

Definition of residual, $r + Ax = b$, together with orthogonality requirement, $A^T r = 0$, gives (m+n)×(m+n) augmented system

$$\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}$$

Augmented system is symmetric but not positive definite, is larger than original system, and requires storing two copies of A

But it allows greater freedom in choosing pivots in computing LU factorization

Page 26: Scientific Computing Chapter 3 - Linear Least squares

Augmented System

Introducing scaling parameter α gives system

$$\begin{bmatrix} \alpha I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r / \alpha \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}$$

which allows control over relative weights of two subsystems in choosing pivots

Reasonable rule of thumb is to take

$$\alpha = \max_{i,j} |a_{ij}| \, / \, 1000$$

Augmented system is sometimes useful, but is far from ideal in work and storage required
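
A minimal sketch of the scaled augmented system approach, assuming the rule of thumb above for α; the function name and return convention are hypothetical.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def lstsq_augmented(A, b, alpha=None):
    """Solve A x ~= b via the (m+n)x(m+n) scaled augmented system;
    alpha defaults to the rule of thumb max|a_ij| / 1000."""
    m, n = A.shape
    if alpha is None:
        alpha = np.abs(A).max() / 1000.0
    K = np.block([[alpha * np.eye(m), A],
                  [A.T,               np.zeros((n, n))]])
    rhs = np.concatenate([b, np.zeros(n)])
    sol = lu_solve(lu_factor(K), rhs)      # LU factorization with partial pivoting
    r_scaled, x = sol[:m], sol[m:]         # first block of the unknown is r / alpha
    return x, alpha * r_scaled
```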

Page 27: Scientific Computing Chapter 3 - Linear Least squares

Orthogonal Transformations

We seek alternative method that avoids numerical difficulties of normal equations

We need numerically robust transformation that produces easier problem without changing solution

What kind of transformation leaves least squares solution unchanged?

Page 28: Scientific Computing Chapter 3 - Linear Least squares

Orthogonal Transformations

Square matrix Q is orthogonal if $Q^T Q = I$

Multiplication of vector by orthogonal matrix preserves Euclidean norm:

$$\|Qv\|_2^2 = (Qv)^T (Qv) = v^T Q^T Q v = v^T v = \|v\|_2^2$$

Thus, multiplying both sides of least squares problem by orthogonal matrix does not change its solution

Page 29: Scientific Computing Chapter 3 - Linear Least squares

Triangular Least Squares

As with square linear systems, suitable target in simplifying least squares problems is triangular form

Upper triangular over-determined (m > n) least squares problem has form

$$\begin{bmatrix} R \\ 0 \end{bmatrix} x \cong \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

where R is n×n upper triangular and b is partitioned similarly

Residual is

$$\|r\|_2^2 = \|b_1 - R x\|_2^2 + \|b_2\|_2^2$$

Page 30: Scientific Computing Chapter 3 - Linear Least squares

Triangular Least Squares

We have no control over second term, $\|b_2\|_2^2$, but first term becomes zero if x satisfies n×n triangular system

$$R x = b_1$$

which can be solved by back-substitution

Resulting x is least squares solution, and minimum sum of squares is

$$\|r\|_2^2 = \|b_2\|_2^2$$

Page 31: Scientific Computing Chapter 3 - Linear Least squares

Triangular Least Squares

So our strategy is to transform general least squares problem to triangular form using orthogonal transformation, so that least squares solution is preserved

Page 32: Scientific Computing Chapter 3 - Linear Least squares

QR Factorization

Given m×n matrix A, with m > n, we seek m×m orthogonal matrix Q such that

$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$$

where R is n×n and upper triangular

Linear least squares problem $Ax \cong b$ is then transformed into triangular least squares problem

$$Q^T A x = \begin{bmatrix} R \\ 0 \end{bmatrix} x \cong \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = Q^T b$$

which has same solution, since multiplication by the orthogonal matrix $Q^T$ preserves the norm of the residual
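
A minimal sketch of solving a least squares problem this way in Python, using the reduced factorization $A = Q_1 R$ described on the next slide; the function name is hypothetical.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def lstsq_qr(A, b):
    """Solve A x ~= b via a (reduced) QR factorization: R x = Q1^T b."""
    Q1, R = qr(A, mode='economic')     # A = Q1 R, Q1 m-by-n, R n-by-n upper triangular
    c1 = Q1.T @ b                      # transform the right-hand side
    return solve_triangular(R, c1)     # back-substitution
```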

Page 33: Scientific Computing Chapter 3 - Linear Least squares

Orthogonal Bases

If we partition m×m orthogonal matrix Q = [Q1 Q2], where Q1 is m×n, then

$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} = [\,Q_1 \; Q_2\,] \begin{bmatrix} R \\ 0 \end{bmatrix} = Q_1 R$$

is called reduced QR factorization of A

Columns of Q1 are orthonormal basis for span(A)

Solution to least squares problem $Ax \cong b$ is given by solution to square system

$$R x = Q_1^T b$$

Page 34: Scientific Computing Chapter 3 - Linear Least squares

Computing QR Factorization

To compute QR factorization of m×n matrix A, with m>n, we annihilate subdiagonal entries of successive columns of A, eventually reaching upper triangular form

Similar to LU factorization by Gaussian elimination, but use orthogonal transformations instead of elementary elimination matrices

Page 35: Scientific Computing Chapter 3 - Linear Least squares

Computing QR Factorization

Possible methods include Householder transformations

Givens rotations

Gram-Schmidt orthogonalization

Page 36: Scientific Computing Chapter 3 - Linear Least squares

Householder Transformations

Householder transformation has form

$$H = I - 2 \, \frac{v v^T}{v^T v}$$

for nonzero vector v

H is orthogonal and symmetric: $H = H^T = H^{-1}$

Given vector a, we want to choose v so that

$$H a = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \alpha \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \alpha e_1$$

Page 37: Scientific Computing Chapter 3 - Linear Least squares

Householder Transformation

Geometrically, H reflects a onto $\alpha e_1$, where $\alpha = \pm \|a\|_2$: with $v = a - \alpha e_1$, the reflection is $Ha = a - 2 \frac{v^T a}{v^T v} v = \alpha e_1$

Page 38: Scientific Computing Chapter 3 - Linear Least squares

Householder Transformation

Substituting into formula for H, we can take

$$v = a - \alpha e_1$$

and $\alpha = \pm \|a\|_2$, with sign chosen to avoid cancellation, i.e.

$$\alpha = -\mathrm{sign}(a_1) \, \|a\|_2$$

Page 39: Scientific Computing Chapter 3 - Linear Least squares

Example: Householder Transformation

If $a = [2,\ 1,\ 2]^T$, then we take

$$v = a - \alpha e_1 = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} - \alpha \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$

where $\alpha = \pm \|a\|_2 = \pm 3$

Since $a_1$ is positive, we choose negative sign for α to avoid cancellation, so

$$v = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} - (-3) \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ 2 \end{bmatrix}$$

Page 40: Scientific Computing Chapter 3 - Linear Least squares

Example: Householder Transformation

To confirm that transformation works,

$$H a = a - 2 \, \frac{v^T a}{v^T v} \, v = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} - 2 \cdot \frac{15}{30} \begin{bmatrix} 5 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -3 \\ 0 \\ 0 \end{bmatrix} = \alpha e_1$$

Page 41: Scientific Computing Chapter 3 - Linear Least squares

Householder QR Factorization

To compute QR factorization of A, use Householder transformations to annihilate subdiagonal entries of each successive column

Each Householder transformation is applied to entire matrix, but does not affect prior columns, so zeros are preserved

Process just described produces factorization

$$H_n \cdots H_1 A = \begin{bmatrix} R \\ 0 \end{bmatrix}$$

where R is n×n and upper triangular

Page 42: Scientific Computing Chapter 3 - Linear Least squares

Householder QR Factorization

If $H_n \cdots H_1 A = \begin{bmatrix} R \\ 0 \end{bmatrix}$, then

$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad \text{where } Q = (H_n \cdots H_1)^T = H_1 \cdots H_n$$

In applying Householder transformation H to arbitrary vector u,

$$H u = \left( I - 2 \, \frac{v v^T}{v^T v} \right) u = u - 2 \, \frac{v^T u}{v^T v} \, v$$

which is much cheaper than general matrix-vector multiplication and requires only vector v, not full matrix H
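
A small sketch of applying H without forming it, checked against the worked example above; the helper name is hypothetical.

```python
import numpy as np

def householder_apply(v, u):
    """Apply H = I - 2 v v^T / (v^T v) to u without forming H explicitly."""
    return u - (2.0 * (v @ u) / (v @ v)) * v

# The worked example above: a = [2, 1, 2]^T, v = [5, 1, 2]^T
a = np.array([2.0, 1.0, 2.0])
v = np.array([5.0, 1.0, 2.0])
print(householder_apply(v, a))   # [-3.  0.  0.] = alpha * e1 with alpha = -3
```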

Page 43: Scientific Computing Chapter 3 - Linear Least squares

Implementation of Householder QR Factorization

Pseudocode:

for k = 1 to n                                          { loop over columns }
    α_k = -sign(a_kk) · sqrt(a_kk² + ... + a_mk²)       { compute Householder vector for current column }
    v_k = [0 ... 0  a_kk ... a_mk]^T − α_k e_k
    β_k = v_k^T v_k
    if β_k = 0 then                                     { skip current column if it is already zero }
        continue with next k
    for j = k to n                                      { apply transformation to remaining submatrix }
        γ_j = v_k^T a_j
        a_j = a_j − (2 γ_j / β_k) v_k
    end
end
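
For reference, a runnable Python sketch of this pseudocode; function and variable names are my own, and the return convention (R plus the Householder vectors) is one of several reasonable choices.

```python
import numpy as np

def householder_qr(A):
    """Reduce A to upper triangular form with Householder reflections,
    following the pseudocode above; returns R and the Householder vectors."""
    A = A.astype(float).copy()
    m, n = A.shape
    vs = []
    for k in range(n):
        a = A[k:, k]
        # Sign chosen opposite to a[0] to avoid cancellation
        alpha = -np.sign(a[0]) * np.linalg.norm(a) if a[0] != 0 else -np.linalg.norm(a)
        v = a.copy()
        v[0] -= alpha                      # v = a - alpha * e1
        beta = v @ v
        vs.append((k, v))
        if beta == 0.0:                    # column already zero below the diagonal
            continue
        # Apply H_k to the remaining submatrix without forming H_k explicitly
        A[k:, k:] -= np.outer(v, (2.0 / beta) * (v @ A[k:, k:]))
    return np.triu(A[:n, :]), vs
```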

Page 44: Scientific Computing Chapter 3 - Linear Least squares

Householder QR Factorization

To preserve solution of linear least squares problem, right-hand side b is transformed by same sequence of Householder transformations

Then solve triangular least squares problem

$$\begin{bmatrix} R \\ 0 \end{bmatrix} x \cong Q^T b$$

For solving linear least squares problem, product Q of Householder transformations need not be formed explicitly

Page 45: Scientific Computing Chapter 3 - Linear Least squares

Householder QR Factorization

R can be stored in upper triangle of array initially containing A

Householder vectors v can be stored in (now zero) lower triangular portion of A (almost)

Householder transformations most easily applied in this form anyway

Page 46: Scientific Computing Chapter 3 - Linear Least squares

Example: Householder QR Factorization

For polynomial data fitting example given previously, the Householder vector $v_1$ is chosen to annihilate the subdiagonal entries of the first column of A

Page 47: Scientific Computing Chapter 3 - Linear Least squares

Example, continued

Applying resulting Householder transformation $H_1$ yields transformed matrix and right-hand side

Householder vector $v_2$ is then chosen to annihilate subdiagonal entries of second column of $H_1 A$

Page 48: Scientific Computing Chapter 3 - Linear Least squares

Example, continued

Applying resulting Householder transformation $H_2$ yields transformed matrix and right-hand side

Householder vector $v_3$ is then chosen to annihilate subdiagonal entries of third column of $H_2 H_1 A$

Page 49: Scientific Computing Chapter 3 - Linear Least squares

Example, continued

Applying resulting Householder transformation $H_3$ yields the triangular form

Now solve upper triangular system $R x = c_1$ by back-substitution to obtain $x = [0.086,\ 0.400,\ 1.429]^T$

Page 50: Scientific Computing Chapter 3 - Linear Least squares

Givens Rotations

Givens rotations introduce zeros one at a time

Given vector $a = [a_1,\ a_2]^T$, choose scalars c and s so that

$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \alpha \\ 0 \end{bmatrix}$$

with $c^2 + s^2 = 1$, or equivalently, $\alpha = \sqrt{a_1^2 + a_2^2}$

Previous equation can be rewritten

$$\begin{bmatrix} a_1 & a_2 \\ a_2 & -a_1 \end{bmatrix} \begin{bmatrix} c \\ s \end{bmatrix} = \begin{bmatrix} \alpha \\ 0 \end{bmatrix}$$

Page 51: Scientific Computing Chapter 3 - Linear Least squares

Givens Rotations

Gaussian elimination on this 2×2 system yields a triangular system

Back-substitution then gives s and c in terms of α

Finally, $c^2 + s^2 = 1$ implies $\alpha = \sqrt{a_1^2 + a_2^2}$, so

$$c = \frac{a_1}{\sqrt{a_1^2 + a_2^2}}, \qquad s = \frac{a_2}{\sqrt{a_1^2 + a_2^2}}$$

Page 52: Scientific Computing Chapter 3 - Linear Least squares

Example: Givens Rotation

Let $a = [4,\ 3]^T$; to annihilate second entry we compute cosine and sine

$$c = \frac{a_1}{\sqrt{a_1^2 + a_2^2}} = \frac{4}{5} = 0.8, \qquad s = \frac{a_2}{\sqrt{a_1^2 + a_2^2}} = \frac{3}{5} = 0.6$$

Rotation is then given by

$$G = \begin{bmatrix} c & s \\ -s & c \end{bmatrix} = \begin{bmatrix} 0.8 & 0.6 \\ -0.6 & 0.8 \end{bmatrix}$$

To confirm that rotation works,

$$G a = \begin{bmatrix} 0.8 & 0.6 \\ -0.6 & 0.8 \end{bmatrix} \begin{bmatrix} 4 \\ 3 \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \end{bmatrix}$$
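
A small Python sketch of computing and applying a Givens rotation, checked against the example above; the function name is hypothetical.

```python
import numpy as np

def givens(a1, a2):
    """Compute c, s such that [[c, s], [-s, c]] @ [a1, a2] = [alpha, 0]."""
    alpha = np.hypot(a1, a2)          # sqrt(a1^2 + a2^2), computed without overflow
    if alpha == 0.0:
        return 1.0, 0.0
    return a1 / alpha, a2 / alpha

# The worked example above: a = [4, 3]^T
c, s = givens(4.0, 3.0)
G = np.array([[c, s], [-s, c]])
print(c, s)                        # 0.8 0.6
print(G @ np.array([4.0, 3.0]))    # [5. 0.]
```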

Page 53: Scientific Computing Chapter 3 - Linear Least squares

Givens QR Factorization

More generally, to annihilate selected component of vector in n dimensions, rotate target component with another component

By systematically annihilating successive entries, we can reduce matrix to upper triangular form using sequence of Givens rotations

Page 54: Scientific Computing Chapter 3 - Linear Least squares

Givens QR Factorization

Each rotation is orthogonal, so their product is orthogonal, producing QR factorization

Straightforward implementation of Givens method requires about 50% more work than Householder method, and also requires more storage, since each rotation requires two numbers, c and s, to define it

Page 55: Scientific Computing Chapter 3 - Linear Least squares

Givens QR Factorization

These disadvantages can be overcome, but at the cost of a more complicated implementation

Givens can be advantageous for computing QR factorization when many entries of matrix are already zero, since those annihilations can be skipped

Page 56: Scientific Computing Chapter 3 - Linear Least squares

Gram-Schmidt Orthogonalization

Given vectors $a_1$ and $a_2$, we seek orthonormal vectors $q_1$ and $q_2$ having same span

This can be accomplished by subtracting from second vector its projection onto first vector, and normalizing both resulting vectors

Page 57: Scientific Computing Chapter 3 - Linear Least squares

Gram-Schmidt Orthogonalization

Process can be extended to any number of vectors $a_1, \ldots, a_k$, orthogonalizing each successive vector against all preceding ones, giving classical Gram-Schmidt procedure

Resulting $q_k$ and $r_{jk}$ form reduced QR factorization of A

Page 58: Scientific Computing Chapter 3 - Linear Least squares

Modified Gram-Schmidt

Classical Gram-Schmidt procedure often suffers loss of orthogonality in finite-precision arithmetic

Also, separate storage is required for A, Q, and R, since original $a_k$ are needed in inner loop, so $q_k$ cannot overwrite columns of A

Both deficiencies are remedied by modified Gram-Schmidt procedure, with each vector orthogonalized in turn against all subsequent vectors, so $q_k$ can overwrite $a_k$

Page 59: Scientific Computing Chapter 3 - Linear Least squares

Modified Gram-Schmidt Factorization

Modified Gram-Schmidt algorithm
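
The slide's algorithm listing is not reproduced here; below is a minimal Python sketch of the modified Gram-Schmidt procedure as described on the previous slide (A is assumed to have full column rank), with a hypothetical function name.

```python
import numpy as np

def mgs_qr(A):
    """Reduced QR factorization A = Q R by modified Gram-Schmidt."""
    Q = A.astype(float).copy()       # columns of A are overwritten by the q_k
    m, n = Q.shape
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]                       # normalize current column
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]          # orthogonalize subsequent columns against q_k
            Q[:, j] -= R[k, j] * Q[:, k]
    return Q, R
```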

Page 60: Scientific Computing Chapter 3 - Linear Least squares

Rank Deficiency

If rank(A) < n, then QR factorization still exists, but yields singular upper triangular factor R, and multiple vectors x give minimum residual norm

Common practice selects the minimum-residual solution x having smallest norm

Can be computed by QR factorization with column pivoting or by singular value decomposition (SVD)

Rank of matrix is often not clear cut in practice, so relative tolerance is used to determine rank

Page 61: Scientific Computing Chapter 3 - Linear Least squares

Example: Near Rank Deficiency

Consider a 3×2 matrix whose two columns are nearly linearly dependent

Computing its QR factorization shows that R is extremely close to singular (exactly singular to the 3-digit accuracy of the problem statement)

Page 62: Scientific Computing Chapter 3 - Linear Least squares

Example: Near Rank Deficiency

If R is used to solve linear least squares problem, result is highly sensitive to perturbations in right-hand side

For practical purposes, rank(A) = 1 rather than 2, because the columns are nearly linearly dependent

Page 63: Scientific Computing Chapter 3 - Linear Least squares

QR with Column Pivoting

Instead of processing columns in natural order, select for reduction at each stage column of remaining unreduced submatrix having maximum Euclidean norm

If rank(A) = k < n, then after k steps, norms of remaining unreduced columns will be zero (or "negligible" in finite-precision arithmetic) below row k

Yields orthogonal factorization of form

$$A P = Q \begin{bmatrix} R & S \\ 0 & 0 \end{bmatrix}$$

where P is permutation matrix and leading block R is k×k, upper triangular, and nonsingular

Page 64: Scientific Computing Chapter 3 - Linear Least squares

QR with Column Pivoting

Basic solution to least squares problem $Ax \cong b$ can now be computed by solving triangular system $R z = c_1$, where $c_1$ contains first k components of $Q^T b$, and then taking

$$x = P \begin{bmatrix} z \\ 0 \end{bmatrix}$$

Minimum-norm solution can be computed, if desired, at expense of additional processing to annihilate S
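
A minimal Python sketch of computing the basic solution with column-pivoted QR; the function name and the particular rank tolerance are my own choices.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def lstsq_qr_pivoting(A, b, tol=1e-12):
    """Basic solution of A x ~= b via QR with column pivoting;
    numerical rank k is decided with a relative tolerance."""
    Q, R, piv = qr(A, mode='economic', pivoting=True)        # A[:, piv] = Q R
    k = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))   # numerical rank
    c1 = (Q.T @ b)[:k]
    z = solve_triangular(R[:k, :k], c1)
    x = np.zeros(A.shape[1])
    x[piv[:k]] = z                    # undo the column permutation
    return x
```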

Page 65: Scientific Computing Chapter 3 - Linear Least squares

QR with Column Pivoting

Rank(A) is usually unknown, so rank is determined by monitoring norms of remaining unreduced columns and terminating factorization when maximum value falls below chosen tolerance

Page 66: Scientific Computing Chapter 3 - Linear Least squares

Singular Value Decomposition

Singular value decomposition (SVD) of m×n matrix A has form

$$A = U \Sigma V^T$$

where U is m×m orthogonal matrix, V is n×n orthogonal matrix, and Σ is m×n diagonal matrix, with

$$\sigma_{ij} = 0 \ \text{for } i \ne j, \qquad \sigma_{ii} = \sigma_i \ge 0$$

Page 67: Scientific Computing Chapter 3 - Linear Least squares

Singular Value Decomposition

Diagonal entries $\sigma_i$, called singular values of A, are usually ordered so that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$

Columns $u_i$ of U and $v_i$ of V are called left and right singular vectors

Computation of SVD will be discussed in next chapter; here we cover only applications of SVD

Page 68: Scientific Computing Chapter 3 - Linear Least squares

Example: SVD

SVD of example matrix A is given by $A = U \Sigma V^T$, with the singular values appearing on the diagonal of Σ

Page 69: Scientific Computing Chapter 3 - Linear Least squares

Applications of SVD

Minimum norm solution to $Ax \cong b$ is given by

$$x = \sum_{\sigma_i \ne 0} \frac{u_i^T b}{\sigma_i} \, v_i$$

For ill-conditioned or rank-deficient problems, "small" singular values can be omitted from summation to stabilize solution

Euclidean matrix norm: $\|A\|_2 = \sigma_{\max}$

Euclidean condition number of matrix: $\mathrm{cond}(A) = \sigma_{\max} / \sigma_{\min}$

Rank of matrix: number of nonzero singular values

Page 70: Scientific Computing Chapter 3 - Linear Least squares

Pseudo-inverse

Define pseudo-inverse of (possibly rectangular) diagonal matrix by transposing and taking scalar pseudo-inverse of each entry

Then pseudo-inverse of general real m×n matrix A is given by

$$A^+ = V \Sigma^+ U^T$$

Pseudo-inverse always exists, whether or not matrix is square or has full rank

If A is square and nonsingular, then $A^+ = A^{-1}$

In all cases, minimum-norm solution to $Ax \cong b$ is given by $x = A^+ b$
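
A minimal Python sketch of the SVD-based minimum-norm solution described above, dropping "small" singular values below a relative tolerance; the function name and tolerance are my own choices.

```python
import numpy as np

def lstsq_svd(A, b, tol=1e-12):
    """Minimum-norm least squares solution x = A^+ b computed from the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > tol * s[0]
    # x = sum over nonzero sigma_i of (u_i^T b / sigma_i) v_i
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])

# For comparison, NumPy also provides the pseudo-inverse directly:
# x = np.linalg.pinv(A) @ b
```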

Page 71: Scientific Computing Chapter 3 - Linear Least squares

Lower-Rank Matrix Approximation

Another way to write SVD is

$$A = U \Sigma V^T = \sigma_1 E_1 + \sigma_2 E_2 + \cdots + \sigma_n E_n, \qquad E_i = u_i v_i^T$$

Each $E_i$ has rank 1 and can be stored using only m + n storage locations

Condensed approximation to A is obtained by omitting from summation terms corresponding to small singular values

Approximation using k largest singular values is closest matrix of rank k to A

Approximation is useful in image processing, data compression, information retrieval, cryptography, etc.
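
A short Python sketch of forming the rank-k approximation by keeping the k largest singular values; the function name is hypothetical.

```python
import numpy as np

def best_rank_k(A, k):
    """Closest rank-k approximation to A:
    A_k = sigma_1 E_1 + ... + sigma_k E_k with E_i = u_i v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]   # scale the first k columns of U, multiply by rows of V^T
```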

Page 72: Scientific Computing Chapter 3 - Linear Least squares

Total Least Squares

Ordinary least squares is applicable when right-hand side b is subject to random error but matrix A is known accurately

When all data, including A, are subject to error, then total least squares is more appropriate

Total least squares minimizes orthogonal distances, rather than vertical distances, between model and data

Page 73: Scientific Computing Chapter 3 - Linear Least squares

Total least squares solution can be computed from SVD of [A, b]: if $v_{n+1}$ is the right singular vector corresponding to the smallest singular value, and $v_{n+1,\,n+1} \ne 0$, then

$$x = -\frac{1}{v_{n+1,\,n+1}} \begin{bmatrix} v_{1,\,n+1} \\ \vdots \\ v_{n,\,n+1} \end{bmatrix}$$
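
A minimal Python sketch of this construction; the function name and the zero test are my own choices.

```python
import numpy as np

def total_least_squares(A, b):
    """Total least squares solution from the SVD of the augmented matrix [A, b]."""
    m, n = A.shape
    _, _, Vt = np.linalg.svd(np.column_stack([A, b]))
    v = Vt[-1]                       # right singular vector for the smallest singular value
    if abs(v[n]) < np.finfo(float).eps:
        raise np.linalg.LinAlgError("no TLS solution: last component of v is (numerically) zero")
    return -v[:n] / v[n]
```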

Page 74: Scientific Computing Chapter 3 - Linear Least squares

Comparison of Methods

Forming normal equations matrix $A^T A$ requires about $n^2 m / 2$ multiplications, and solving resulting symmetric linear system requires about $n^3 / 6$ multiplications

Solving least squares problem using Householder QR factorization requires about $m n^2 - n^3 / 3$ multiplications

If m ≈ n, both methods require about same amount of work

If m >> n, Householder QR requires about twice as much work as normal equations

Page 75: Scientific Computing Chapter 3 - Linear Least squares

Comparison of Methods

Cost of SVD is proportional to $m n^2 + n^3$, with proportionality constant ranging from 4 to 10, depending on algorithm used

Normal equations method produces solution whose relative error is proportional to $[\mathrm{cond}(A)]^2$

Required Cholesky factorization can be expected to break down if $\mathrm{cond}(A) \approx 1/\sqrt{\epsilon_{\mathrm{mach}}}$ or worse

Page 76: Scientific Computing Chapter 3 - Linear Least squares

Comparison of Methods

Householder method produces solution whose relative error is proportional to

$$\mathrm{cond}(A) + \|r\|_2 \, [\mathrm{cond}(A)]^2$$

which is best possible, since this is inherent sensitivity of solution to least squares problem

Householder method can be expected to break down (in back-substitution phase) only if $\mathrm{cond}(A) \approx 1/\epsilon_{\mathrm{mach}}$ or worse

Page 77: Scientific Computing Chapter 3 - Linear Least squares

Comparison of Methods

Householder is more accurate and more broadly applicable than normal equations

These advantages may not be worth additional cost, however, when problem is sufficiently well conditioned that normal equations provide sufficient accuracy

For rank-deficient or nearly rank-deficient problems, Householder with column pivoting can produce useful solution when normal equations method fails outright

SVD is even more robust and reliable than Householder, but substantially more expensive

Page 78: Scientific Computing Chapter 3 - Linear Least squares

Time Adjustment

I would like to move next Thursday's class to next Tuesday

Is there any time available? Morning or afternoon?