lecture 8: fast linear solvers (part 3)zxu2/acms60212-40212/lec-09-3.pdfcholesky factorization...

Post on 22-Jan-2021

7 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Lecture 8: Fast Linear Solvers (Part 3)

1

Cholesky Factorization

‱ Matrix 𝐮 is symmetric if 𝐮 = 𝐮𝑇.

‱ Matrix 𝐮 is positive definite if for all 𝒙 ≠ 𝟎, 𝒙𝑇𝑹𝒙 > 0.

‱ A symmetric positive definite matrix 𝐮 has Cholesky factorization 𝐮 = 𝐿𝐿𝑇, where 𝐿 is a lower triangular matrix with positive diagonal entries.

‱ Example.

– 𝐮 = 𝐮𝑇 with 𝑎𝑖𝑖 > 0 and 𝑎𝑖𝑖 > |𝑎𝑖𝑗|𝑗≠𝑖 is positive definite.

2

𝐮 =

𝑎11 𝑎12 
 𝑎1𝑛𝑎12 𝑎22 
 𝑎2𝑛⋼

𝑎1𝑛

⋼𝑎2𝑛

⋱ 

⋼𝑎𝑛𝑛

=

𝑙11 0 
 0𝑙21 𝑙22 
 0⋼𝑙𝑛1

⋼𝑙𝑛2

⋱ 

⋼𝑙𝑛𝑛

𝑙11 𝑙21 
 𝑙𝑛10 𝑙22 
 𝑙𝑛2⋼0

⋼0

⋱ 

⋼𝑙𝑛𝑛

3

In 2 × 2 matrix size case 𝑎11 𝑎21𝑎21 𝑎22

=𝑙11 0𝑙21 𝑙22

𝑙11 𝑙210 𝑙22

𝑙11 = 𝑎11; 𝑙21 = 𝑎21/𝑙11; 𝑙22 = 𝑎22 − 𝑙212

4

Submatrix Cholesky Factorization Algorithm

5

for 𝑘 = 1 to 𝑛 𝑎𝑘𝑘 = 𝑎𝑘𝑘 for 𝑖 = 𝑘 + 1 to 𝑛 𝑎𝑖𝑘 = 𝑎𝑖𝑘/𝑎𝑘𝑘 end for 𝑗 = 𝑘 + 1 to 𝑛 // reduce submatrix for 𝑖 = 𝑗 to 𝑛 𝑎𝑖𝑗 = 𝑎𝑖𝑗 − 𝑎𝑖𝑘𝑎𝑗𝑘

end end end

Equivalent to 𝑎𝑖𝑘𝑎𝑘𝑗 due

to symmetry

6

Remark:

1. This is a variation of Gaussian Elimination algorithm.

2. Storage of matrix 𝐮 is used to hold matrix 𝐿 .

3. Only lower triangle of 𝐮 is used (See 𝑎𝑖𝑗 = 𝑎𝑖𝑗 − 𝑎𝑖𝑘𝑎𝑗𝑘).

4. Pivoting is not needed for stability.

5. About 𝑛3/6 multiplications and about 𝑛3/6 additions are required.

Data Access Pattern

Read only

Read and write

Column Cholesky Factorization Algorithm

7

for 𝑗 = 1 to 𝑛 for 𝑘 = 1 to 𝑗 − 1 for 𝑖 = 𝑗 to 𝑛 𝑎𝑖𝑗 = 𝑎𝑖𝑗 − 𝑎𝑖𝑘𝑎𝑗𝑘

end end 𝑎𝑗𝑗 = 𝑎𝑗𝑗

for 𝑖 = 𝑗 + 1 to 𝑛 𝑎𝑖𝑗 = 𝑎𝑖𝑗/𝑎𝑗𝑗

end end

Data Access Pattern

8

Read only

Read and write

Parallel Algorithm

‱ Parallel algorithms are similar to those for LU factorization.

9

References ‱ X. S. Li and J. W. Demmel, SuperLU_Dist: A scalable

distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software 29:110-140, 2003

‱ P. HĂ©non, P. Ramet, and J. Roman, PaStiX: A high-performance parallel direct solver for sparse symmetric positive definite systems, Parallel Computing 28:301-321, 2002

QR Factorization and HouseHolder Transformation

Theorem Suppose that matrix 𝐮 is an 𝑚 × 𝑛 matrix with linearly independent columns, then 𝐮 can be factored as 𝐮 = 𝑄𝑅

where 𝑄 is an 𝑚 × 𝑛 matrix with orthonormal columns and 𝑅 is an invertible 𝑛 × 𝑛 upper triangular matrix.

10

QR Factorization by Gram-Schmidt Process

Consider matrix 𝐮 = [𝒂1 𝒂2 
 |𝒂𝑛]

Then,

𝒖1 = 𝒂1, 𝒒1 =𝒖1

||𝒖1||

𝒖2 = 𝒂2 − (𝒂2 ∙ 𝒒1)𝒒1, 𝒒2 =𝒖2

||𝒖2||




𝒖𝑘 = 𝒂𝑘 − 𝒂𝑘 ∙ 𝒒1 𝒒1 −⋯− 𝒂𝑘 ∙ 𝒒𝑘−1 𝒒𝑘−1, 𝒒𝑘 =𝒖𝑘

||𝒖𝑘||

𝐮 = 𝒂1 𝒂2 
 𝒂𝑛

= 𝒒1 𝒒2 
 𝒒𝑛

𝒂1 ∙ 𝒒1 𝒂2 ∙ 𝒒1 
 𝒂𝑛 ∙ 𝒒1

0 𝒂2 ∙ 𝒒2 
 𝒂𝑛 ∙ 𝒒2

⋼ ⋼ ⋱ ⋼0 0 
 𝒂𝑛 ∙ 𝒒𝑛

= 𝑄𝑅

11

Classical/Modified Gram-Schmidt Process

for j=1 to n

𝒖𝑗 = 𝒂𝑗

for i = 1 to j-1

𝑟𝑖𝑗 = 𝒒𝑖

∗𝒂𝑗 (đ¶đ‘™đ‘Žđ‘ đ‘ đ‘–đ‘đ‘Žđ‘™)

𝑟𝑖𝑗 = 𝒒𝑖∗𝒖𝑗 (𝑀𝑜𝑑𝑖𝑓𝑖𝑒𝑑)

𝒖𝑗 = 𝒖𝑗 − 𝑟𝑖𝑗𝒒𝑖

endfor

𝑟𝑗𝑗 = ||𝒖𝑗||2

𝒒𝑗 = 𝒖𝑗/𝑟𝑗𝑗

endfor 12

Householder Transformation

Let 𝑣 ∈ 𝑅𝑛 be a nonzero vector, the 𝑛 × 𝑛 matrix

đ» = đŒ − 2𝒗𝒗𝑇

𝒗𝑇𝒗

is called a Householder transformation (or reflector).

‱ Alternatively, let 𝒖 = 𝒗/||𝒗||, đ» can be rewritten as

đ» = đŒ − 2𝒖𝒖𝑇.

Theorem. A Householder transformation đ» is symmetric and orthogonal, so đ» = đ»đ‘‡ = đ»âˆ’1.

13

1. Let vector 𝒛 be perpendicular to 𝒗.

đŒ − 2𝒗𝒗𝑇

𝒗𝑇𝒗𝒛 = 𝒛 − 2

𝒗(𝒗𝑇𝒛)

𝒗𝑇𝒗= 𝒛

2. Let 𝒖 =𝒗

| 𝒗 |. Any vector 𝒙 can be written as

𝒙 = 𝒛 + 𝒖𝑇𝒙 𝒖. đŒ − 2𝒖𝒖𝑇 𝒙 = đŒ − 2𝒖𝒖𝑇 𝒛 + 𝒖𝑇𝒙 𝒖 = 𝒛 − 𝒖𝑇𝒙 𝒖

14

‱ Householder transformation is used to selectively zero out blocks of entries in vectors or columns of matrices.

‱ Householder is stable with respect to round-off error.

15

For a given a vector 𝒙, find a Householder

transformation, đ», such that đ»đ’™ = 𝑎

10⋼0

= 𝑎𝒆1

– Previous vector reflection (case 2) implies that vector 𝒖 is in parallel with 𝒙 − đ»đ’™.

– Let 𝒗 = 𝒙 − đ»đ’™ = 𝒙 − 𝑎𝒆1, where 𝑎 = ±| 𝒙 | (when the arithmetic is exact, sign does not matter).

– 𝒖 = 𝒗/|| 𝒗||

– In the presence of round-off error, use

𝒗 = 𝒙 + 𝑠𝑖𝑔𝑛 đ‘„1 | 𝒙 |𝒆1

to avoid catastrophic cancellation. Here đ‘„1 is the first entry in the vector 𝒙.

Thus in practice, 𝑎 = −𝑠𝑖𝑔𝑛 đ‘„1 | 𝒙 | 16

Algorithm for Computing the Householder Transformation

17

đ‘„đ‘š = max { đ‘„1 , đ‘„2 , 
 , đ‘„đ‘› } for 𝑘 = 1 to 𝑛 𝑣𝑘 = đ‘„đ‘˜/đ‘„đ‘š end

đ›Œ = 𝑠𝑖𝑔𝑛(𝑣1)[𝑣12 + 𝑣2

2 +⋯+ 𝑣𝑛2]1/2

𝑣1 = 𝑣1 + đ›Œ đ›Œ = âˆ’đ›Œđ‘„đ‘š 𝒖 = 𝒗/| 𝒗 |

Remark: 1. The computational complexity is 𝑂(𝑛). 2. đ»đ’™ is an 𝑂 𝑛 operation compared to 𝑂(𝑛2) for a general matrix-vector product. đ»đ’™ = 𝒙 − 𝟐𝒖(đ’–đ‘»đ’™).

QR Factorization by HouseHolder Transformation

Key idea:

Apply Householder transformation successively to columns of matrix to zero out sub-diagonal entries of each column.

18

‱ Stage 1 – Consider the first column of matrix 𝐮 and determine a

HouseHolder matrix đ»1 so that đ»1

𝑎11𝑎21⋼

𝑎𝑛1

=

đ›Œ10⋼0

– Overwrite 𝐮 by 𝐮1, where

𝐮1 =

đ›Œ1 𝑎12∗ ⋯ 𝑎1𝑛

∗

0 𝑎22∗ ⋼ 𝑎2𝑛

∗

⋼ ⋼ ⋼ ⋼0 𝑎𝑛2

∗ ⋯ 𝑎𝑛𝑛∗

= đ»1𝐮

Denote vector which defines đ»1 vector 𝒗1. 𝒖1 = 𝒗1/| 𝒗1 |

Remark: Column vectors

𝑎12∗

𝑎22∗

⋼𝑎𝑛2∗




𝑎1𝑛∗

𝑎2𝑛∗

⋼𝑎𝑛𝑛∗

should be computed by

đ»đ’™ = 𝒙 − 𝟐𝒖(đ’–đ‘»đ’™). 19

‱ Stage 2 – Consider the second column of the updated matrix

𝐮 ≡ 𝐮1 = đ»1𝐮 and take the part below the diagonal to determine a Householder matrix đ»2

∗ so that

đ»2∗

𝑎22∗

𝑎32∗

⋼𝑎𝑛2∗

=

đ›Œ2

0⋼0

Remark: size of đ»2∗ is (𝑛 − 1) × (𝑛 − 1).

Denote vector which defines đ»2∗ vector 𝒗2. 𝒖2 = 𝒗2/| 𝒗2 |

– Inflate đ»2∗ to đ»2 where

đ»2 =1 00 đ»2

∗

Overwrite 𝐮 by 𝐮2 ≡ đ»2𝐮1

20

‱ Stage k

– Consider the 𝑘𝑡ℎ column of the updated matrix 𝐮 and take the part below the diagonal to determine a Householder matrix đ»đ‘˜

∗ so that

đ»đ‘˜âˆ—

𝑎𝑘𝑘∗

𝑎𝑘+1,𝑘∗

⋼𝑎𝑛,𝑛∗

=

đ›Œđ‘˜

0⋼0

Remark: size of đ»đ‘˜âˆ— is (𝑛 − 𝑘 + 1) × (𝑛 − 𝑘 + 1).

Denote vector which defines đ»đ‘˜âˆ— vector 𝒗𝑘. 𝒖𝑘 = 𝒗𝑘/| 𝒗𝑘 |

– Inflate đ»đ‘˜âˆ— to đ»đ‘˜ where

đ»đ‘˜ =đŒđ‘˜âˆ’1 0

0 đ»đ‘˜âˆ— , đŒđ‘˜âˆ’1 is (𝑘 − 1) × (𝑘 − 1) identity matrix.

Overwrite 𝐮 by 𝐮𝑘 ≡ đ»đ‘˜đŽđ‘˜âˆ’1

21

‱ After (𝑛 − 1) stages, matrix 𝐮 is transformed into an upper triangular matrix 𝑅.

‱ 𝑅 = 𝐮𝑛−1 = đ»đ‘›âˆ’1𝐮𝑛−2 = ⋯ =đ»đ‘›âˆ’1đ»đ‘›âˆ’2â€Šđ»1𝐮

‱ Set 𝑄𝑇 = đ»đ‘›âˆ’1đ»đ‘›âˆ’2â€Šđ»1. Then 𝑄−1 = 𝑄𝑇. Thus 𝑄 = đ»1

đ‘‡đ»2𝑇 â€Šđ»đ‘›âˆ’1

𝑇

‱ 𝐮 = 𝑄𝑅

22

Example. 𝐮 =1 11 21 3

.

Find đ»1 such that đ»1

111

=đ›Œ100

.

By 𝑎 = −𝑠𝑖𝑔𝑛 đ‘„1 | 𝒙 |,

đ›Œ1 = − 3 = −1.721.

𝑱1 = 0.8881, 𝑱2 = 𝑱3 = 0.3250.

By đ»đ’™ = 𝒙 − 𝟐𝒖 đ’–đ‘»đ’™ , đ»1

123

=−3.46410.36611.3661

đ»1𝐮 =−1.721 −3.4641

0 0.36610 1.3661

23

Find đ»2∗ and đ»2, where

đ»2∗ 0.36611.3661

=đ›Œ2

0

So đ›Œ2 = −1.4143, 𝒖2 =0.79220.6096

So 𝑅 = đ»2đ»1𝐮 =−1.7321 −3.4641

0 −1.41430 0

𝑄 = (đ»2đ»1)𝑇= đ»1

đ‘‡đ»2𝑇

=−0.5774 0.7071 −0.4082−0.5774 0 0.8165−0.5774 −0.7071 −0.4082

24

𝐮𝒙 = 𝒃 ⇒ 𝑄𝑅𝒙 = 𝒃

⇒ 𝑄𝑇𝑄𝑅𝒙 = 𝑄𝑇𝒃 ⇒ 𝑅𝒙 = 𝑄𝑇𝒃

Remark: Q rarely needs explicit construction

25

Parallel Householder QR

‱ Householder QR factorization is similar to Gaussian elimination for LU factorization.

– Additions + multiplications: 𝑂4

3𝑛3 for QR versus

𝑂2

3𝑛3 for LU

‱ Forming Householder vector 𝒗𝑘is similar to computing multipliers in Gaussian elimination.

‱ Subsequent updating of remaining unreduced portion of matrix is similar to Gaussian elimination.

‱ Parallel Householder QR is similar to parallel LU. Householder vectors need to broadcast among columns of matrix.

26

References

‱ M. Cosnard, J. M. Muller, and Y. Robert. Parallel QR decomposition of a rectangular matrix, Numer. Math. 48:239-249, 1986

‱ A. Pothen and P. Raghavan. Distributed orthogonal factorization: Givens and Householder algorithms, SIAM J. Sci. Stat. Comput. 10:1113-1134, 1989

27

top related