Lecture 8: Fast Linear Solvers (Part 3)
1
Cholesky Factorization
⢠Matrix ðŽ is symmetric if ðŽ = ðŽð.
⢠Matrix ðŽ is positive definite if for all ð â ð, ðððšð > 0.
⢠A symmetric positive definite matrix ðŽ has Cholesky factorization ðŽ = ð¿ð¿ð, where ð¿ is a lower triangular matrix with positive diagonal entries.
⢠Example.
â ðŽ = ðŽð with ððð > 0 and ððð > |ððð|ðâ ð is positive definite.
2
ðŽ =
ð11 ð12 ⊠ð1ðð12 ð22 ⊠ð2ðâ®
ð1ð
â®ð2ð
â±âŠ
â®ððð
=
ð11 0 ⊠0ð21 ð22 ⊠0â®ðð1
â®ðð2
â±âŠ
â®ððð
ð11 ð21 ⊠ðð10 ð22 ⊠ðð2â®0
â®0
â±âŠ
â®ððð
3
In 2 Ã 2 matrix size case ð11 ð21ð21 ð22
=ð11 0ð21 ð22
ð11 ð210 ð22
ð11 = ð11; ð21 = ð21/ð11; ð22 = ð22 â ð212
4
Submatrix Cholesky Factorization Algorithm
5
for ð = 1 to ð ððð = ððð for ð = ð + 1 to ð ððð = ððð/ððð end for ð = ð + 1 to ð // reduce submatrix for ð = ð to ð ððð = ððð â ðððððð
end end end
Equivalent to ðððððð due
to symmetry
6
Remark:
1. This is a variation of Gaussian Elimination algorithm.
2. Storage of matrix ðŽ is used to hold matrix ð¿ .
3. Only lower triangle of ðŽ is used (See ððð = ððð â ðððððð).
4. Pivoting is not needed for stability.
5. About ð3/6 multiplications and about ð3/6 additions are required.
Data Access Pattern
Read only
Read and write
Column Cholesky Factorization Algorithm
7
for ð = 1 to ð for ð = 1 to ð â 1 for ð = ð to ð ððð = ððð â ðððððð
end end ððð = ððð
for ð = ð + 1 to ð ððð = ððð/ððð
end end
Data Access Pattern
8
Read only
Read and write
Parallel Algorithm
⢠Parallel algorithms are similar to those for LU factorization.
9
References ⢠X. S. Li and J. W. Demmel, SuperLU_Dist: A scalable
distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software 29:110-140, 2003
⢠P. Hénon, P. Ramet, and J. Roman, PaStiX: A high-performance parallel direct solver for sparse symmetric positive definite systems, Parallel Computing 28:301-321, 2002
QR Factorization and HouseHolder Transformation
Theorem Suppose that matrix ðŽ is an ð à ð matrix with linearly independent columns, then ðŽ can be factored as ðŽ = ðð
where ð is an ð Ã ð matrix with orthonormal columns and ð is an invertible ð Ã ð upper triangular matrix.
10
QR Factorization by Gram-Schmidt Process
Consider matrix ðŽ = [ð1 ð2 ⊠|ðð]
Then,
ð1 = ð1, ð1 =ð1
||ð1||
ð2 = ð2 â (ð2 â ð1)ð1, ð2 =ð2
||ð2||
âŠ
ðð = ðð â ðð â ð1 ð1 ââ¯â ðð â ððâ1 ððâ1, ðð =ðð
||ðð||
ðŽ = ð1 ð2 ⊠ðð
= ð1 ð2 ⊠ðð
ð1 â ð1 ð2 â ð1 ⊠ðð â ð1
0 ð2 â ð2 ⊠ðð â ð2
â® â® â± â®0 0 ⊠ðð â ðð
= ðð
11
Classical/Modified Gram-Schmidt Process
for j=1 to n
ðð = ðð
for i = 1 to j-1
ððð = ðð
âðð (ð¶ððð ð ðððð)
ððð = ððâðð (ðððððððð)
ðð = ðð â ððððð
endfor
ððð = ||ðð||2
ðð = ðð/ððð
endfor 12
Householder Transformation
Let ð£ â ð ð be a nonzero vector, the ð à ð matrix
ð» = ðŒ â 2ððð
ððð
is called a Householder transformation (or reflector).
⢠Alternatively, let ð = ð/||ð||, ð» can be rewritten as
ð» = ðŒ â 2ððð.
Theorem. A Householder transformation ð» is symmetric and orthogonal, so ð» = ð»ð = ð»â1.
13
1. Let vector ð be perpendicular to ð.
ðŒ â 2ððð
ðððð = ð â 2
ð(ððð)
ððð= ð
2. Let ð =ð
| ð |. Any vector ð can be written as
ð = ð + ððð ð. ðŒ â 2ððð ð = ðŒ â 2ððð ð + ððð ð = ð â ððð ð
14
⢠Householder transformation is used to selectively zero out blocks of entries in vectors or columns of matrices.
⢠Householder is stable with respect to round-off error.
15
For a given a vector ð, find a Householder
transformation, ð», such that ð»ð = ð
10â®0
= ðð1
â Previous vector reflection (case 2) implies that vector ð is in parallel with ð â ð»ð.
â Let ð = ð â ð»ð = ð â ðð1, where ð = ±| ð | (when the arithmetic is exact, sign does not matter).
â ð = ð/|| ð||
â In the presence of round-off error, use
ð = ð + ð ððð ð¥1 | ð |ð1
to avoid catastrophic cancellation. Here ð¥1 is the first entry in the vector ð.
Thus in practice, ð = âð ððð ð¥1 | ð | 16
Algorithm for Computing the Householder Transformation
17
ð¥ð = max { ð¥1 , ð¥2 , ⊠, ð¥ð } for ð = 1 to ð ð£ð = ð¥ð/ð¥ð end
ðŒ = ð ððð(ð£1)[ð£12 + ð£2
2 +â¯+ ð£ð2]1/2
ð£1 = ð£1 + ðŒ ðŒ = âðŒð¥ð ð = ð/| ð |
Remark: 1. The computational complexity is ð(ð). 2. ð»ð is an ð ð operation compared to ð(ð2) for a general matrix-vector product. ð»ð = ð â ðð(ðð»ð).
QR Factorization by HouseHolder Transformation
Key idea:
Apply Householder transformation successively to columns of matrix to zero out sub-diagonal entries of each column.
18
⢠Stage 1 â Consider the first column of matrix ðŽ and determine a
HouseHolder matrix ð»1 so that ð»1
ð11ð21â®
ðð1
=
ðŒ10â®0
â Overwrite ðŽ by ðŽ1, where
ðŽ1 =
ðŒ1 ð12â ⯠ð1ð
â
0 ð22â â® ð2ð
â
â® â® â® â®0 ðð2
â ⯠ðððâ
= ð»1ðŽ
Denote vector which defines ð»1 vector ð1. ð1 = ð1/| ð1 |
Remark: Column vectors
ð12â
ð22â
â®ðð2â
âŠ
ð1ðâ
ð2ðâ
â®ðððâ
should be computed by
ð»ð = ð â ðð(ðð»ð). 19
⢠Stage 2 â Consider the second column of the updated matrix
ðŽ â¡ ðŽ1 = ð»1ðŽ and take the part below the diagonal to determine a Householder matrix ð»2
â so that
ð»2â
ð22â
ð32â
â®ðð2â
=
ðŒ2
0â®0
Remark: size of ð»2â is (ð â 1) à (ð â 1).
Denote vector which defines ð»2â vector ð2. ð2 = ð2/| ð2 |
â Inflate ð»2â to ð»2 where
ð»2 =1 00 ð»2
â
Overwrite ðŽ by ðŽ2 â¡ ð»2ðŽ1
20
⢠Stage k
â Consider the ðð¡â column of the updated matrix ðŽ and take the part below the diagonal to determine a Householder matrix ð»ð
â so that
ð»ðâ
ðððâ
ðð+1,ðâ
â®ðð,ðâ
=
ðŒð
0â®0
Remark: size of ð»ðâ is (ð â ð + 1) à (ð â ð + 1).
Denote vector which defines ð»ðâ vector ðð. ðð = ðð/| ðð |
â Inflate ð»ðâ to ð»ð where
ð»ð =ðŒðâ1 0
0 ð»ðâ , ðŒðâ1 is (ð â 1) à (ð â 1) identity matrix.
Overwrite ðŽ by ðŽð â¡ ð»ððŽðâ1
21
⢠After (ð â 1) stages, matrix ðŽ is transformed into an upper triangular matrix ð .
⢠ð = ðŽðâ1 = ð»ðâ1ðŽðâ2 = ⯠=ð»ðâ1ð»ðâ2âŠð»1ðŽ
⢠Set ðð = ð»ðâ1ð»ðâ2âŠð»1. Then ðâ1 = ðð. Thus ð = ð»1
ðð»2ð âŠð»ðâ1
ð
⢠ðŽ = ðð
22
Example. ðŽ =1 11 21 3
.
Find ð»1 such that ð»1
111
=ðŒ100
.
By ð = âð ððð ð¥1 | ð |,
ðŒ1 = â 3 = â1.721.
ð¢1 = 0.8881, ð¢2 = ð¢3 = 0.3250.
By ð»ð = ð â ðð ðð»ð , ð»1
123
=â3.46410.36611.3661
ð»1ðŽ =â1.721 â3.4641
0 0.36610 1.3661
23
Find ð»2â and ð»2, where
ð»2â 0.36611.3661
=ðŒ2
0
So ðŒ2 = â1.4143, ð2 =0.79220.6096
So ð = ð»2ð»1ðŽ =â1.7321 â3.4641
0 â1.41430 0
ð = (ð»2ð»1)ð= ð»1
ðð»2ð
=â0.5774 0.7071 â0.4082â0.5774 0 0.8165â0.5774 â0.7071 â0.4082
24
ðŽð = ð â ðð ð = ð
â ðððð ð = ððð â ð ð = ððð
Remark: Q rarely needs explicit construction
25
Parallel Householder QR
⢠Householder QR factorization is similar to Gaussian elimination for LU factorization.
â Additions + multiplications: ð4
3ð3 for QR versus
ð2
3ð3 for LU
⢠Forming Householder vector ððis similar to computing multipliers in Gaussian elimination.
⢠Subsequent updating of remaining unreduced portion of matrix is similar to Gaussian elimination.
⢠Parallel Householder QR is similar to parallel LU. Householder vectors need to broadcast among columns of matrix.
26
References
⢠M. Cosnard, J. M. Muller, and Y. Robert. Parallel QR decomposition of a rectangular matrix, Numer. Math. 48:239-249, 1986
⢠A. Pothen and P. Raghavan. Distributed orthogonal factorization: Givens and Householder algorithms, SIAM J. Sci. Stat. Comput. 10:1113-1134, 1989
27