Scientific Computing Chapter 3 - Linear Least Squares
Guanghua Tan, [email protected]
School of Computer and Communication, HNU
Outline
Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems
Least Squares Data Fitting
Method of Least Squares
Measurement errors are inevitable in observational and experimental sciences
Errors can be smoothed out by averaging over many cases, i.e., taking more measurements than are strictly necessary to determine parameters of system
Resulting system is over-determined, so usually there is no exact solution
Example: Fitting a Plane
A plane can be determined by 3 points, but scan data may contain many more points p_i = (x_i, y_i, z_i)
Plane equation: ax + by + cz = 0
Stacking one equation per point gives the over-determined system
[ x_1  y_1  z_1 ]  [ a ]     [ 0 ]
[ x_2  y_2  z_2 ]  [ b ]  ≈  [ 0 ]
[  ⋮    ⋮    ⋮  ]  [ c ]     [ ⋮ ]
[ x_n  y_n  z_n ]            [ 0 ]
Linear Least Squares
For linear problems, we obtain over-determined linear system Ax = b, with m×n matrix A, m > n
System is better written Ax ≈ b, since equality is usually not exactly satisfiable when m > n
Least squares solution x minimizes squared Euclidean norm of residual vector r = b − Ax:
min_x ||r||_2^2 = min_x ||b − Ax||_2^2
Data Fitting
Given m data points (t_i, y_i), find n-vector x of parameters that gives "best fit" to model function f(t, x):
min_x Σ_{i=1}^{m} (y_i − f(t_i, x))^2
Problem is linear if function f is linear in components of x,
f(t, x) = x_1 φ_1(t) + x_2 φ_2(t) + ... + x_n φ_n(t)
where the functions φ_j(t) depend only on t
Data Fitting
Polynomial fitting
f(t, x) = x_1 + x_2 t + x_3 t^2 + ... + x_n t^{n−1}
is linear, since polynomial is linear in coefficients x_j, though nonlinear in independent variable t
Fitting sum of exponentials
f(t, x) = x_1 e^{x_2 t} + ... + x_{n−1} e^{x_n t}
is example of nonlinear problem
For now, we will consider only linear least squares problems
Example: Data Fitting
Fitting quadratic polynomial p(t) = x_1 + x_2 t + x_3 t^2 to five data points (t_i, y_i) gives linear least squares problem A x ≈ y, where row i of the 5×3 matrix A is [1, t_i, t_i^2]
For the given data, the resulting over-determined 5×3 linear system has solution, which we will see how to compute later,
x = [0.086, 0.400, 1.429]^T
So approximating polynomial is p(t) ≈ 0.086 + 0.400 t + 1.429 t^2
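A minimal NumPy sketch of such a quadratic fit; the data values below are illustrative assumptions chosen to be consistent with the solution quoted above, not taken verbatim from the original slide:

import numpy as np

# illustrative data points (t_i, y_i); assumed, not from the original slide
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([ 1.0,  0.5, 0.0, 0.5, 2.0])

A = np.vander(t, 3, increasing=True)       # 5x3 matrix with rows [1, t_i, t_i^2]
x, *_ = np.linalg.lstsq(A, y, rcond=None)  # least squares solution
print(x)                                   # approximately [0.086, 0.400, 1.429]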
Outline
Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems
Existence and Uniqueness
Linear least squares problem Ax ≈ b always has a solution
Solution is unique if and only if columns of A are linearly independent, i.e., rank(A) = n, where A is m×n
If rank(A) < n, then A is rank-deficient, and solution of linear least squares problem is not unique
For now, we assume A has full column rank n
Normal Equations
To minimize squared Euclidean norm of residual vector,
φ(x) = ||r||_2^2 = r^T r = (b − Ax)^T (b − Ax) = b^T b − 2 x^T A^T b + x^T A^T A x
take derivative with respect to x and set it to 0 (necessary condition for a minimum: ∇φ(x) = 0):
∇φ(x) = 2 A^T A x − 2 A^T b = 0
which reduces to n×n linear system of normal equations
A^T A x = A^T b
Geometric Interpretation
Vectors v_1 and v_2 are orthogonal if their inner product is zero, v_1^T v_2 = 0
Space spanned by columns of m×n matrix A, span(A) = {Ax : x ∈ R^n}, is of dimension at most n
If m > n, b generally does not lie in span(A), so there is no exact solution to Ax = b
Geometric Interpretation
Vector y = Ax in span(A) closest to b in 2-norm occurs when residual r = b − Ax is orthogonal to span(A):
0 = A^T r = A^T (b − Ax)
again giving system of normal equations
A^T A x = A^T b
The solution to the least squares problem is then
x = (A^T A)^{−1} A^T b
Pseudo-inverse and Condition Number
Non-square m×n matrix A has no inverse in usual sense
If rank(A) = n, pseudo-inverse is defined by
A^+ = (A^T A)^{−1} A^T
and condition number by
cond(A) = ||A||_2 · ||A^+||_2
By convention, cond(A) = ∞ if rank(A) < n
Condition Number Just as condition number of square matrix
measures closeness to singularity, condition number of rectangular matrix measures closeness to rank deficiency
Least squares solution of Ax ≈ b is given by
x = (A^T A)^{−1} A^T b = A^+ b
Sensitivity and Conditioning
Sensitivity of least squares solution to Ax ≈ b depends on b as well as A
Define angle θ between b and y = Ax by
cos(θ) = ||y||_2 / ||b||_2 = ||Ax||_2 / ||b||_2
Bound on perturbation Δx in solution x due to perturbation Δb in b is given by
||Δx||_2 / ||x||_2 ≤ cond(A) · (1 / cos(θ)) · ||Δb||_2 / ||b||_2
Sensitivity and Conditioning
Similarly, for perturbation E in matrix A,
||Δx||_2 / ||x||_2 ≲ ( [cond(A)]^2 tan(θ) + cond(A) ) · ||E||_2 / ||A||_2
Condition number of least squares solution is about cond(A) if residual is small, but can be squared or arbitrarily worse for large residual
Outline
Least Squares Data Fitting
Existence, Uniqueness, and Conditioning
Solving Linear Least Squares Problems
Normal Equations Method
If m×n matrix A has rank n, then symmetric n×n matrix A^T A is positive definite, so its Cholesky factorization
A^T A = L L^T
can be used to obtain solution x to system of normal equations
A^T A x = A^T b
which has same solution as least squares problem
Normal equations method thus transforms the rectangular problem into a square one, and then into a triangular one
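A minimal sketch of the normal equations method in NumPy/SciPy, assuming A has full column rank; the two triangular solves mirror the forward- and back-substitution steps described above:

import numpy as np
from scipy.linalg import solve_triangular

def lstsq_normal_equations(A, b):
    G = A.T @ A                                 # n x n symmetric positive definite
    c = A.T @ b
    L = np.linalg.cholesky(G)                   # G = L L^T
    z = solve_triangular(L, c, lower=True)      # forward-substitution: L z = A^T b
    x = solve_triangular(L.T, z, lower=False)   # back-substitution:    L^T x = z
    return x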
Example: Normal Equations Method
For polynomial data-fitting example given previously, normal equations method forms A^T A and A^T b
Example, continued
Cholesky factorization of symmetric positive definite matrix A^T A gives A^T A = L L^T
Solving lower triangular system L z = A^T b by forward-substitution gives
z = [1.789, 0.632, 1.336]^T
Solving upper triangular system L^T x = z by back-substitution gives
x = [0.086, 0.400, 1.429]^T
Shortcomings of Normal Equations
Information can be lost in forming A^T A and A^T b
For example, take
A = [ 1  1 ]
    [ ε  0 ]
    [ 0  ε ]
where ε is a positive number smaller than √ε_mach
Then in floating-point arithmetic
A^T A = [ 1 + ε^2     1     ]  =  [ 1  1 ]
        [    1     1 + ε^2  ]     [ 1  1 ]
which is singular
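A small floating-point demonstration of this loss of information; epsilon here is an assumed value below sqrt(eps_mach) for IEEE double precision:

import numpy as np

eps = 1e-9                       # eps^2 = 1e-18 < eps_mach ~ 2.2e-16
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])
G = A.T @ A                      # fl(1 + eps^2) = 1, so G = [[1, 1], [1, 1]]
print(np.linalg.matrix_rank(A))  # 2
print(np.linalg.matrix_rank(G))  # 1 -> A^T A is numerically singular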
Shortcomings of Normal Equations
Sensitivity of solution is also worsened, since
cond(A^T A) = [cond(A)]^2
Augmented System Method
Definition of residual r = b − Ax together with orthogonality requirement A^T r = 0 give (m+n)×(m+n) augmented system
[ I    A ] [ r ]   [ b ]
[ A^T  0 ] [ x ] = [ 0 ]
Augmented system is symmetric but not positive definite, is larger than original system, and requires storing two copies of A
But it allows greater freedom in choosing pivots in computing LU factorization
Augmented System
Introducing scaling parameter α gives system
[ αI   A ] [ r/α ]   [ b ]
[ A^T  0 ] [  x  ] = [ 0 ]
which allows control over relative weights of two subsystems in choosing pivots
Reasonable rule of thumb is to take
α = max_{i,j} |a_ij| / 1000
Augmented system is sometimes useful, but is far from ideal in work and storage required
Orthogonal Transformation
We seek alternative method that avoids numerical difficulties of normal equations
We need numerically robust transformation that produces easier problem without changing solution
What kind of transformation leaves least squares solution unchanged?
Orthogonal Transformations
Square matrix Q is orthogonal if Q^T Q = I
Multiplication of vector by orthogonal matrix preserves Euclidean norm:
||Qv||_2^2 = (Qv)^T (Qv) = v^T Q^T Q v = v^T v = ||v||_2^2
Thus, multiplying both sides of least squares problem by orthogonal matrix does not change its solution
Triangular Least Squares
As with square linear systems, suitable target in simplifying least squares problems is triangular form
Upper triangular over-determined (m > n) least squares problem has form
[ R ]        [ b_1 ]
[ 0 ]  x  ≈  [ b_2 ]
where R is n×n and upper triangular, and right-hand side b is partitioned similarly
Residual is
||r||_2^2 = ||b_1 − R x||_2^2 + ||b_2||_2^2
Triangular Least Squares
We have no control over second term, ||b_2||_2^2, but first term becomes zero if x satisfies n×n triangular system
R x = b_1
which can be solved by back-substitution
Resulting x is least squares solution, and minimum sum of squares is
||r||_2^2 = ||b_2||_2^2
Triangular Least Squares
So our strategy is to transform general least squares problem to triangular form using orthogonal transformation, so that least squares solution is preserved
QR Factorization
Given m×n matrix A, with m > n, we seek m×m orthogonal matrix Q such that
A = Q [ R ]
      [ 0 ]
where R is n×n and upper triangular
Linear least squares problem Ax ≈ b is then transformed into triangular least squares problem
[ R ]        [ c_1 ]
[ 0 ]  x  ≈  [ c_2 ]  =  Q^T b
which has same solution, since
||r||_2^2 = ||b − Ax||_2^2 = ||Q^T b − [R; 0] x||_2^2 = ||c_1 − R x||_2^2 + ||c_2||_2^2
Orthogonal Bases
If we partition m×m orthogonal matrix Q = [Q_1  Q_2], where Q_1 is m×n, then
A = Q [ R ] = Q_1 R
      [ 0 ]
is called reduced QR factorization of A
Columns of Q_1 are orthonormal basis for span(A)
Solution to least squares problem Ax ≈ b is given by solution to square system
R x = Q_1^T b
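A minimal sketch of solving the least squares problem via the reduced QR factorization with NumPy/SciPy, assuming A has full column rank:

import numpy as np
from scipy.linalg import solve_triangular

def lstsq_qr(A, b):
    Q1, R = np.linalg.qr(A)                      # reduced QR: A = Q1 R, Q1 is m x n
    c1 = Q1.T @ b
    return solve_triangular(R, c1, lower=False)  # back-substitution: R x = Q1^T b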
Computing QR Factorization
To compute QR factorization of m×n matrix A, with m>n, we annihilate subdiagonal entries of successive columns of A, eventually reaching upper triangular form
Similar to LU factorization by Gaussian elimination, but use orthogonal transformations instead of elementary elimination matrices
Computing QR Factorization
Possible methods include Householder transformations
Givens rotations
Gram-Schmidt orthogonalization
Householder Transformations
Householder transformation has form
H = I − 2 (v v^T) / (v^T v)
for nonzero vector v
H is orthogonal and symmetric: H = H^T = H^{−1}
Given vector a, we want to choose v so that
H a = [ α ]       [ 1 ]
      [ 0 ]  =  α [ 0 ]  =  α e_1
      [ ⋮ ]       [ ⋮ ]
      [ 0 ]       [ 0 ]
where e_1 is the first column of the identity and α = ±||a||_2
Householder Transformation
Substituting into formula for H, we can take
v = a − α e_1
and α = ±||a||_2, with sign chosen to avoid cancellation, i.e.,
α = −sign(a_1) ||a||_2
Example: Householder Transformation
If a = [2, 1, 2]^T, then we take
v = a − α e_1 = [ 2 ]       [ 1 ]
                [ 1 ]  −  α [ 0 ]
                [ 2 ]       [ 0 ]
where α = ±||a||_2 = ±3
Since a_1 is positive, we choose negative sign for α to avoid cancellation, so
v = [ 2 ]       [ 1 ]     [ 5 ]
    [ 1 ]  +  3 [ 0 ]  =  [ 1 ]
    [ 2 ]       [ 0 ]     [ 2 ]
Example: Householder Transformation
To confirm that transformation works,
H a = a − (2 v^T a / v^T v) v = [2, 1, 2]^T − [5, 1, 2]^T = [−3, 0, 0]^T
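The same check in NumPy, using the numbers from the example above:

import numpy as np

a = np.array([2.0, 1.0, 2.0])
alpha = -np.sign(a[0]) * np.linalg.norm(a)   # sign chosen to avoid cancellation: -3
v = a - alpha * np.array([1.0, 0.0, 0.0])    # v = [5, 1, 2]^T
Ha = a - 2 * (v @ a) / (v @ v) * v
print(Ha)                                    # approximately [-3, 0, 0]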
Householder QR Factorization
To compute QR factorization of A, use Householder transformations to annihilate sub-diagonal entries of each successive column
Each Householder transformation is applied to entire matrix, but does not affect prior columns, so zeros are preserved
Process just described produces factorization
H_n ⋯ H_1 A = [ R ]
              [ 0 ]
where R is n×n and upper triangular
Householder QR Factorization
If Q = H_1 ⋯ H_n, then
A = Q [ R ]
      [ 0 ]
since each H_k is symmetric and orthogonal, so Q = (H_n ⋯ H_1)^T = H_1 ⋯ H_n
In applying Householder transformation H to arbitrary vector u,
H u = ( I − 2 (v v^T)/(v^T v) ) u = u − ( 2 (v^T u)/(v^T v) ) v
which is much cheaper than general matrix-vector multiplication and requires only vector v, not full matrix H
Implementation of Householder QR Factorization
Pseudocode:
for k = 1 to n                                        { loop over columns }
    α_k = −sign(a_kk) · sqrt(a_kk^2 + ... + a_mk^2)   { compute Householder vector for current column }
    v_k = [0 ... 0  a_kk ... a_mk]^T − α_k e_k
    β_k = v_k^T v_k
    if β_k = 0 then continue with next k              { skip current column if it is already zero }
    for j = k to n                                    { apply transformation to remaining submatrix }
        γ_j = v_k^T a_j
        a_j = a_j − (2 γ_j / β_k) v_k
    end
end
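A compact Python translation of this pseudocode, applied to A and the right-hand side b together; a sketch assuming m >= n and floating-point input, not an optimized implementation:

import numpy as np
from scipy.linalg import solve_triangular

def householder_qr_lstsq(A, b):
    A, b = A.astype(float).copy(), b.astype(float).copy()
    m, n = A.shape
    for k in range(n):
        a = A[k:, k]
        norm_a = np.linalg.norm(a)
        alpha = -norm_a if a[0] >= 0 else norm_a    # sign chosen to avoid cancellation
        v = a.copy()
        v[0] -= alpha                               # v = a - alpha * e_1
        beta = v @ v
        if beta == 0.0:
            continue                                # column already zero below diagonal
        # apply H_k = I - 2 v v^T / (v^T v) to remaining columns and to b
        A[k:, k:] -= np.outer(v, 2.0 * (v @ A[k:, k:]) / beta)
        b[k:] -= (2.0 * (v @ b[k:]) / beta) * v
    # solve triangular system R x = c_1 by back-substitution
    return solve_triangular(A[:n, :n], b[:n], lower=False)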
Householder QR Factorization
To preserve solution of linear least squares problem, right-hand side b is transformed by same sequence of Householder transformations
Then solve triangular least squares problem
[ R ] x ≈ Q^T b
[ 0 ]
For solving linear least squares problem, product Q of Householder transformations need not be formed explicitly
Householder QR Factorization
R can be stored in upper triangle of array initially containing A
Householder vectors v can be stored in (now zero) lower triangular portion of A (almost)
Householder transformations most easily applied in this form anyway
Example: Householder QR Factorization
For polynomial data fitting example given previously, with
Householder vector v1 for annihilating subdiagonal entries of first column of A is
Example, continued Applying resulting Householder transformation H1
yields transformed matrix and right-hand side
Householder vector v2 for annihilating subdiagonal entries of second column of H1A is
Example, continued Applying resulting Householder transformation H2 yields
Householder vector v3 for annihilating subdiagonal entries of third column of H2H1A is
Example, continued Applying resulting Householder transformation H3
yields
Now solve upper triangular system R x = c_1 by back-substitution to obtain
x = [0.086, 0.400, 1.429]^T
Givens Rotations
Givens rotations introduce zeros one at a time
Given vector a = [a_1, a_2]^T, choose scalars c and s so that
[  c  s ] [ a_1 ]   [ α ]
[ −s  c ] [ a_2 ] = [ 0 ]
with c^2 + s^2 = 1, or equivalently, α = sqrt(a_1^2 + a_2^2)
Previous equation can be rewritten
[ a_1   a_2 ] [ c ]   [ α ]
[ a_2  −a_1 ] [ s ] = [ 0 ]
Givens Rotations
Gaussian elimination on this 2×2 system yields triangular system for c and s
Back-substitution then gives c and s in terms of α, a_1, and a_2
Finally, c^2 + s^2 = 1 implies α = sqrt(a_1^2 + a_2^2), so
c = a_1 / sqrt(a_1^2 + a_2^2),   s = a_2 / sqrt(a_1^2 + a_2^2)
Example: Givens Rotation
Let a = [4, 3]^T; to annihilate second entry we compute cosine and sine
α = sqrt(4^2 + 3^2) = 5,   c = a_1/α = 0.8,   s = a_2/α = 0.6
Rotation is then given by
G = [  0.8  0.6 ]
    [ −0.6  0.8 ]
To confirm that rotation works,
G a = [  0.8  0.6 ] [ 4 ]   [ 5 ]
      [ −0.6  0.8 ] [ 3 ] = [ 0 ]
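The same computation in NumPy:

import numpy as np

a1, a2 = 4.0, 3.0
alpha = np.hypot(a1, a2)             # 5
c, s = a1 / alpha, a2 / alpha        # 0.8, 0.6
G = np.array([[ c, s],
              [-s, c]])
print(G @ np.array([a1, a2]))        # [5, 0]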
Givens QR Factorization
More generally, to annihilate selected component of vector in n dimensions, rotate target component with another component
By systematically annihilating successive entries, we can reduce matrix to upper triangular form using sequence of Givens rotations
Givens QR Factorization Each rotation is orthogonal, so their product
is orthogonal, producing QR factorization Straightforward implementation of Givens
method requires about 50% more work than Householder method, and also requires more storage, since each rotation requires two numbers, c and s, to define it
Givens QR Factorization These disadvantages can be overcome, but
requires more complicated implementation Givens can be advantageous for computing
QR factorization when many entries of matrix are already zero, since those annihilations can be skipped
Gram-Schmidt Orthogonalization
Given vectors a1 and a2, we seek orthonormal vectors q1 and q2 having same span
This can be accomplished by subtracting from second vector its projection onto first vector and normalizing both resulting vectors, as shown in diagram
Gram-Schmidt Orthogonalization
Process can be extended to any number of vectors, orthogonalizing each successive vector against all preceding ones, giving classical Gram-Schmidt procedure
Resulting qk and rjk form reduced QR factorization of A
Modified Gram-Schmidt
Classical Gram-Schmidt procedure often suffers loss of orthogonality in finite-precision arithmetic
Also, separate storage is required for A, Q, and R, since original a_k are needed in inner loop, so q_k cannot overwrite columns of A
Both deficiencies are improved by modified Gram-Schmidt procedure, in which each vector, once computed, is orthogonalized in turn against all subsequent vectors, so q_k can overwrite a_k
Modified Gram-Schmidt Factorization
Modified Gram-Schmidt algorithm orthogonalizes each column in turn and immediately updates all later columns; a sketch is given below
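A minimal sketch of modified Gram-Schmidt in NumPy, producing the reduced QR factors; A is assumed to have full column rank, and its columns are overwritten by the q_k:

import numpy as np

def modified_gram_schmidt(A):
    Q = A.astype(float).copy()
    m, n = Q.shape
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]                    # normalize current column -> q_k
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]
            Q[:, j] -= R[k, j] * Q[:, k]      # orthogonalize later columns against q_k
    return Q, R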
Rank Deficiency
If rank(A) < n, then QR factorization still exists, but yields singular upper triangular factor R, and multiple vectors x give minimum residual norm
Common practice selects the minimum-residual solution x having smallest norm
This can be computed by QR factorization with column pivoting or by singular value decomposition (SVD)
Rank of matrix is often not clear-cut in practice, so relative tolerance is used to determine rank
Example: Near Rank Deficiency
Consider a 3×2 matrix whose columns are nearly linearly dependent
Computing its QR factorization yields upper triangular factor R that is extremely close to singular (exactly singular to 3-digit accuracy of problem statement)
Example: Near Rank Deficiency
If R is used to solve linear least squares problem, result is highly sensitive to perturbations in right-hand side
For practical purposes, rank(A) = 1 rather than 2, because columns are nearly linearly dependent
QR with Column Pivoting Instead of processing columns in natural order, select
for reduction at each stage column of remaining unreduced submatrix having maximum Euclidean norm
If rank(A)=k<n, then after k steps, norms of remaining unreduced columns will be zero (or “negligible” in finite-precision arithmetic) below row k
Yields orthogonal factorization of form
Q^T A P = [ R  S ]
          [ 0  0 ]
where R is k×k, upper triangular, and nonsingular, and P is permutation matrix performing column interchanges
QR with Column Pivoting
Basic solution to least squares problem Ax ≈ b can now be computed by solving triangular system
R z = c_1
where c_1 contains first k components of Q^T b, and then taking
x = P [ z ]
      [ 0 ]
Minimum-norm solution can be computed, if desired, at expense of additional processing to annihilate S
QR with Column Pivoting Rank(A) is usually unknown, so rank is
determined by monitoring norms of remaining unreduced columns and terminating factorization when maximum value falls below chosen tolerance
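A short sketch of this rank-monitoring idea using SciPy's pivoted QR; the tolerance is an assumed parameter:

import numpy as np
from scipy.linalg import qr

def numerical_rank(A, tol=1e-12):
    # with pivoting=True, A[:, piv] = Q @ R and |R[0,0]| >= |R[1,1]| >= ...
    Q, R, piv = qr(A, pivoting=True)
    d = np.abs(np.diag(R))
    return int(np.sum(d > tol * d[0]))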
Singular Value Decomposition
Singular value decomposition (SVD) of m×n matrix A has form
A = U Σ V^T
where U is m×m orthogonal matrix, V is n×n orthogonal matrix, and Σ is m×n diagonal matrix, with
σ_ij = 0 for i ≠ j,   σ_ii = σ_i ≥ 0
Singular Value Decomposition
Diagonal entries σ_i, called singular values of A, are usually ordered so that
σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0
Columns u_i of U and v_i of V are called left and right singular vectors
Computation of SVD will be discussed in next chapter; here we cover only applications of SVD
Example: SVD
SVD of an example matrix A is given by its factors U, Σ, and V^T
Applications of SVD
Minimum-norm solution to Ax ≈ b is given by
x = Σ_{σ_i ≠ 0} (u_i^T b / σ_i) v_i
For ill-conditioned or rank-deficient problems, "small" singular values can be omitted from summation to stabilize solution
Euclidean matrix norm: ||A||_2 = σ_max
Euclidean condition number of matrix: cond(A) = σ_max / σ_min
Rank of matrix: number of nonzero singular values
Pseudo-inverse
Define pseudo-inverse of (possibly rectangular) diagonal matrix by transposing it and taking scalar pseudo-inverse (1/σ if σ ≠ 0, else 0) of each entry
Then pseudo-inverse of general real m×n matrix A is given by
A^+ = V Σ^+ U^T
Pseudo-inverse always exists, whether or not matrix is square or has full rank
If A is square and nonsingular, then A^+ = A^{−1}
In all cases, minimum-norm solution to Ax ≈ b is given by
x = A^+ b
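A minimal sketch of the SVD-based minimum-norm solution in NumPy, dropping singular values below an assumed relative tolerance; when nothing is dropped it is equivalent to np.linalg.pinv(A) @ b:

import numpy as np

def lstsq_svd(A, b, tol=1e-12):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > tol * s[0]                                # treat "small" singular values as zero
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])   # sum of (u_i^T b / sigma_i) v_i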
Lower-Rank Matrix Approximation
Another way to write SVD is
A = U Σ V^T = E_1 + E_2 + ... + E_n,   with E_i = σ_i u_i v_i^T
Each E_i has rank 1 and can be stored using only m + n storage locations
Condensed approximation to A is obtained by omitting from summation terms corresponding to small singular values
Approximation using k largest singular values is closest matrix of rank k to A
Approximation is useful in image processing, data compression, information retrieval, cryptography, etc.
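A short sketch of forming the rank-k approximation from the k leading terms of this expansion; k is an assumed parameter:

import numpy as np

def best_rank_k(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # sigma_1 u_1 v_1^T + ... + sigma_k u_k v_k^T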
Total Least Squares Ordinary least squares is applicable when right-
hand side b is subject to random error but matrix A is known accurately
When all data, including A, are subject to error, then total least squares is more appropriate
Total least squares minimizes orthogonal distances, rather than vertical distances, between model and data
Total least squares solution can be computed from SVD of [A, b]
In terms of the right singular vectors of [A, b], the total least squares solution is
x = − [ v_{1,n+1} ]
      [     ⋮     ]  /  v_{n+1,n+1}
      [ v_{n,n+1} ]
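A hedged sketch of computing the total least squares solution from the SVD of [A, b]; it assumes the smallest singular value of [A, b] is simple and v_{n+1,n+1} is nonzero:

import numpy as np

def total_least_squares(A, b):
    n = A.shape[1]
    C = np.column_stack([A, b])        # augmented matrix [A, b]
    U, s, Vt = np.linalg.svd(C)
    v = Vt[-1]                         # right singular vector for smallest singular value
    return -v[:n] / v[n]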
Comparison of Methods
Forming normal equations matrix A^T A requires about n^2 m / 2 multiplications, and solving resulting symmetric linear system requires about n^3 / 6 multiplications
Solving least squares problem using Householder QR factorization requires about m n^2 − n^3 / 3 multiplications
If m ≈ n, both methods require about same amount of work
If m >> n, Householder QR requires about twice as much work as normal equations
Comparison of Methods
Cost of SVD is proportional to m n^2 + n^3, with proportionality constant ranging from 4 to 10, depending on algorithm used
Normal equations method produces solution whose relative error is proportional to [cond(A)]^2
Required Cholesky factorization can be expected to break down if cond(A) ≈ 1/√ε_mach or worse
Comparison of Methods
Householder method produces solution whose relative error is proportional to
cond(A) + ||r||_2 [cond(A)]^2
which is best possible, since this is inherent sensitivity of solution to least squares problem
Householder method can be expected to break down (in back-substitution phase) only if cond(A) ≈ 1/ε_mach or worse
Comparison of Methods
Householder is more accurate and more broadly applicable than normal equations
These advantages may not be worth additional cost, however, when problem is sufficiently well conditioned that normal equations provide sufficient accuracy
For rank-deficient or nearly rank-deficient problems, Householder with column pivoting can produce useful solution when normal equations method fails outright
SVD is even more robust and reliable than Householder, but substantially more expensive
Time Adjustment
I would like to move next Thursday's class to next Tuesday
Is there any time available, morning or afternoon?