
Parallel Numerics, WT 2016/2017

5 Iterative Methods for Sparse Linear Systems of Equations


Contents

1 Introduction
  1.1 Computer Science Aspects
  1.2 Numerical Problems
  1.3 Graphs
  1.4 Loop Manipulations

2 Elementary Linear Algebra Problems
  2.1 BLAS: Basic Linear Algebra Subroutines
  2.2 Matrix-Vector Operations
  2.3 Matrix-Matrix-Product

3 Linear Systems of Equations with Dense Matrices
  3.1 Gaussian Elimination
  3.2 Parallelization
  3.3 QR-Decomposition with Householder Matrices

4 Sparse Matrices
  4.1 General Properties, Storage
  4.2 Sparse Matrices and Graphs
  4.3 Reordering
  4.4 Gaussian Elimination for Sparse Matrices

5 Iterative Methods for Sparse Linear Systems of Equations
  5.1 Stationary Methods
  5.2 Nonstationary Methods
  5.3 Preconditioning

6 Domain Decomposition
  6.1 Overlapping Domain Decomposition
  6.2 Non-overlapping Domain Decomposition
  6.3 Schur Complements


• Disadvantages of direct methods (in parallel):
  – strongly sequential
  – may lead to dense matrices
  – sparsity pattern changes, additional entries necessary
  – indirect addressing
  – storage
  – computational effort

• Iterative solver:
  – choose an initial guess = starting vector x^(0), e.g., x^(0) = 0
  – iteration function x^(k+1) := Φ(x^(k))

• Applied to solving a linear system:
  – the main part of Φ should be a matrix-vector multiplication Ax (matrix-free!?)
  – easy to parallelize, no change in the sparsity pattern of A

    x^(k) → x̄ = A^(−1)b for k → ∞

  – Main problem: fast convergence!


5.1. Stationary Methods
5.1.1. Richardson Iteration


• Construct from Ax = b an iteration process:

  b = Ax = (A − I + I)x = x − (I − A)x  ⇒  x = b + (I − A)x = b + Nx

  with the (artificial) splitting N := I − A of A.

• This leads to a fixed-point equation x = Φ(x) with Φ(x) := b + Nx:

  start: x^(0);
  x^(k+1) := Φ(x^(k)) = b + N x^(k) = b + (I − A) x^(k)


Richardson Iteration (cont.)

start: x^(0);
x^(k+1) := Φ(x^(k)) = b + N x^(k) = b + (I − A) x^(k)

If the sequence x^(k) converges, x^(k) → x̄, then

x̄ = Φ(x̄) = b + N x̄ = b + (I − A) x̄  ⇒  A x̄ = b,

and therefore x^(k) → x̄ := A^(−1) b.

Residual-based formulation:

x^(k+1) = Φ(x^(k)) = b + (I − A) x^(k) = b + x^(k) − A x^(k)
        = x^(k) + (b − A x^(k)) = x^(k) + r(x^(k))

with the residual r(x) := b − A x.
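To make the scheme concrete, a minimal Python/NumPy sketch of the residual form (not part of the original slides; the function name, iteration cap and tolerance test are illustrative assumptions):

import numpy as np

def richardson(A, b, x0=None, max_iter=200, tol=1e-10):
    # Richardson iteration x^(k+1) = x^(k) + r(x^(k)), r(x) = b - A x.
    # Converges only if rho(I - A) < 1, i.e. if A is close to I.
    x = np.zeros(len(b)) if x0 is None else x0.copy()
    for _ in range(max_iter):
        r = b - A @ x                  # residual r^(k)
        if np.linalg.norm(r) < tol:    # illustrative stopping test
            break
        x = x + r
    return x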


Convergence Analysis via Neumann Series

x^(k) = b + N x^(k−1) = b + N(b + N x^(k−2)) = b + N b + N^2 x^(k−2)
      = ... = b + N b + N^2 b + ... + N^(k−1) b + N^k x^(0)
      = Σ_{j=0}^{k−1} N^j b + N^k x^(0) = (Σ_{j=0}^{k−1} N^j) b + N^k x^(0)

Special case x^(0) = 0:

x^(k) = (Σ_{j=0}^{k−1} N^j) b

⇒ x^(k) ∈ span{b, Nb, N^2 b, ..., N^(k−1) b} = span{b, Ab, A^2 b, ..., A^(k−1) b} = K_k(A, b),

which is called the Krylov space of A and b.

For ‖N‖ < 1 it holds:

Σ_{j=0}^{k−1} N^j → Σ_{j=0}^{∞} N^j = (I − N)^(−1) = (I − (I − A))^(−1) = A^(−1)
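A small numerical sanity check of this limit (illustrative only; the random test matrix and the scaling enforcing ‖N‖ < 1 are assumptions, not from the lecture):

import numpy as np

rng = np.random.default_rng(0)
n = 5
N = rng.standard_normal((n, n))
N = 0.5 * N / np.linalg.norm(N, 2)     # enforce ||N||_2 < 1
A = np.eye(n) - N                      # so N = I - A
S = np.zeros((n, n))
P = np.eye(n)
for _ in range(60):                    # partial sums of the Neumann series
    S = S + P
    P = P @ N
print(np.linalg.norm(S - np.linalg.inv(A)))   # -> close to 0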


Convergence Analysis via Neumann Series (cont.)

x^(k) → (Σ_{j=0}^{∞} N^j) b = (I − N)^(−1) b = A^(−1) b = x̄

The Richardson iteration converges for ‖N‖ < 1, i.e. for A ≈ I.

Error analysis for e^(k) := x^(k) − x̄:

e^(k+1) = x^(k+1) − x̄ = Φ(x^(k)) − Φ(x̄) = (b + N x^(k)) − (b + N x̄) = N (x^(k) − x̄) = N e^(k)

‖e^(k)‖ ≤ ‖N‖ ‖e^(k−1)‖ ≤ ‖N‖^2 ‖e^(k−2)‖ ≤ ... ≤ ‖N‖^k ‖e^(0)‖

‖N‖ < 1 ⇒ ‖N‖^k → 0 for k → ∞ ⇒ ‖e^(k)‖ → 0

• Convergence if ρ(N) = ρ(I − A) < 1, where ρ is the spectral radius:
  ρ(N) = |λ_max| = max_i |λ_i|  (λ_i eigenvalues of N)
• Hence all eigenvalues of A have to lie in the circle around 1 with radius 1.


Splittings of A

• Richardson converges only in very special cases! Try to improve the iteration for better convergence.
• Write A in the form A := M − N:

  b = Ax = (M − N)x = Mx − Nx  ⇔  x = M^(−1) b + M^(−1) N x = Φ(x)

  Φ(x) = M^(−1) b + M^(−1) N x = M^(−1) b + M^(−1) (M − A) x = M^(−1) (b − A x) + x = x + M^(−1) r(x)

• N should be such that Ny can be evaluated efficiently.
• M should be such that M^(−1)y can be evaluated efficiently.

  x^(k+1) = x^(k) + M^(−1) r^(k)

• The iteration with splitting M − N is equivalent to Richardson applied to

  M^(−1) A x = M^(−1) b
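A generic Python/NumPy sketch of this update (illustrative; the callback name solve_M is an assumption and stands for any cheap application of M^(−1), e.g. a diagonal or triangular solve):

import numpy as np

def splitting_iteration(A, b, solve_M, x0=None, max_iter=200, tol=1e-10):
    # x^(k+1) = x^(k) + M^{-1} r^(k) for a splitting A = M - N.
    # Converges if rho(I - M^{-1} A) < 1.
    x = np.zeros(len(b)) if x0 is None else x0.copy()
    for _ in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        x = x + solve_M(r)             # solve_M(y) returns M^{-1} y
    return x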


Convergence

• The iteration with splitting A = M − N converges if

  ρ(M^(−1) N) = ρ(I − M^(−1) A) < 1

• For fast convergence it should hold that
  – M^(−1) A ≈ I,
  – M^(−1) A is better conditioned than A itself.
• Such a matrix M is called a preconditioner for A. It is also used in other iterative methods to accelerate convergence.
• Condition number:

  κ(A) = ‖A^(−1)‖ ‖A‖,  |λ_max / λ_min|,  or  σ_max / σ_min


5.1.2. Jacobi (Diagonal) Splitting

Choose A = M − N = D − (L + U) with
  D = diag(A),
  −L the strictly lower triangular part of A, and
  −U the strictly upper triangular part.

x^(k+1) = D^(−1) b + D^(−1) (L + U) x^(k) = D^(−1) b + D^(−1) (D − A) x^(k) = x^(k) + D^(−1) r^(k)

Convergent for A ≈ diag(A), i.e. for diagonally dominant matrices:

ρ(D^(−1) N) = ρ(I − D^(−1) A) < 1


Jacobi (Diagonal) Splitting (cont.)

Iteration process written elementwise:

x^(k+1) = D^(−1) (b − (A − D) x^(k))  ⇒  x_j^(k+1) = (1/a_jj) (b_j − Σ_{m=1, m≠j}^{n} a_{j,m} x_m^(k))

a_jj x_j^(k+1) = b_j − Σ_{m=1}^{j−1} a_{j,m} x_m^(k) − Σ_{m=j+1}^{n} a_{j,m} x_m^(k)

• Damping or relaxation for improving convergence.
• Idea: view the iterative method as a correction of the last iterate in a search direction.
• Introduce a step length for this correction step:

  x^(k+1) = x^(k) + D^(−1) r^(k)  →  x^(k+1) = x^(k) + ω D^(−1) r^(k)

  with an additional damping parameter ω.
• Damped Jacobi iteration (see the sketch below):

  x_damped^(k+1) = (ω + 1 − ω) x^(k) + ω D^(−1) r^(k) = ω x^(k+1) + (1 − ω) x^(k),

  where x^(k+1) on the right is the undamped Jacobi iterate.
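A minimal Python/NumPy sketch of the (damped) Jacobi iteration (illustrative, not the lecture's code; ω = 1 gives the plain method):

import numpy as np

def damped_jacobi(A, b, omega=1.0, x0=None, max_iter=500, tol=1e-10):
    # x^(k+1) = x^(k) + omega * D^{-1} r^(k), with D = diag(A).
    d = np.diag(A)
    x = np.zeros(len(b)) if x0 is None else x0.copy()
    for _ in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        x = x + omega * r / d          # all components updated in parallel
    return x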


Damped Jacobi Iteration

x^(k+1) = x^(k) + ω D^(−1) r^(k) = x^(k) + ω D^(−1) (b − A x^(k)) = ...
        = ω D^(−1) b + [(1 − ω) I + ω D^(−1) (L + U)] x^(k)

converges for

ρ((1 − ω) I + ω D^(−1) (L + U)) < 1,

where the iteration matrix tends to I for ω → 0.

Look for the optimal ω with the best convergence (additional degree of freedom).


Parallelism in the Jacobi Iteration

• The Jacobi method is easy to parallelize: only Ax and D^(−1)x are needed.
• But convergence is often too slow!
• Improvement: block Jacobi iteration.


5.1.4. Gauss-Seidel Iteration

Always use the newest information available!

Jacobi iteration:

a_jj x_j^(k+1) = b_j − Σ_{m=1}^{j−1} a_{j,m} x_m^(k) − Σ_{m=j+1}^{n} a_{j,m} x_m^(k)

(the values x_1^(k+1), ..., x_{j−1}^(k+1) are already computed, but Jacobi still uses the old x_m^(k))

Gauss-Seidel iteration:

a_jj x_j^(k+1) = b_j − Σ_{m=1}^{j−1} a_{j,m} x_m^(k+1) − Σ_{m=j+1}^{n} a_{j,m} x_m^(k)

(the already computed x_m^(k+1) are used immediately)


Gauss-Seidel Iteration (cont.)

• Compare the dependency graphs for general iterative algorithms. Here:

  x = f(x) = D^(−1) (b + (D − A) x) = D^(−1) (b + (L + U) x)

  leads to the splitting A = (D − L) − U = M − N:

  x^(k+1) = (D − L)^(−1) b + (D − L)^(−1) U x^(k)
          = (D − L)^(−1) b + (D − L)^(−1) (D − L − A) x^(k)
          = x^(k) + (D − L)^(−1) r^(k)

• Convergence depends on the spectral radius: ρ(I − (D − L)^(−1) A) < 1
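A sketch of one possible Python/NumPy implementation (illustrative; the in-place sweep over x_j is exactly the "newest information" rule):

import numpy as np

def gauss_seidel(A, b, x0=None, max_iter=500, tol=1e-10):
    # Forward sweep: when x_j is updated, the new values
    # x_1^(k+1), ..., x_{j-1}^(k+1) are already in x and get used.
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    for _ in range(max_iter):
        for j in range(n):
            s = A[j, :j] @ x[:j] + A[j, j+1:] @ x[j+1:]
            x[j] = (b[j] - s) / A[j, j]
        if np.linalg.norm(b - A @ x) < tol:
            break
    return x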


Parallelism in the Gauss-Seidel Iteration

• The linear system with D − L is easy to solve because D − L is lower triangular, but
• it is strongly sequential!
• Use red-black ordering or graph colouring as a compromise: parallelism ↔ convergence.


Successive Over-Relaxation (SOR)

• Damping or relaxation:

  x^(k+1) = x^(k) + ω (D − L)^(−1) r^(k) = ω (D − L)^(−1) b + [(1 − ω) I + ω (D − L)^(−1) U] x^(k)

• Convergence depends on the spectral radius of the iteration matrix (1 − ω) I + ω (D − L)^(−1) U.
• Parallelization of SOR = parallelization of Gauss-Seidel.
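A sketch analogous to the Gauss-Seidel one above (illustrative; elementwise, SOR only blends the Gauss-Seidel value with the old one, and ω = 1 recovers Gauss-Seidel):

import numpy as np

def sor(A, b, omega=1.0, x0=None, max_iter=500, tol=1e-10):
    # x_j <- (1 - omega) * x_j + omega * x_j^{Gauss-Seidel}
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    for _ in range(max_iter):
        for j in range(n):
            s = A[j, :j] @ x[:j] + A[j, j+1:] @ x[j+1:]
            x[j] = (1 - omega) * x[j] + omega * (b[j] - s) / A[j, j]
        if np.linalg.norm(b - A @ x) < tol:
            break
    return x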


Stationary Methods (in General)

• Can always be written in the two normal forms

  x^(k+1) = c + B x^(k),

  with convergence depending on the spectral radius ρ(B) of the iteration matrix B, and

  x^(k+1) = x^(k) + F r^(k)

  with preconditioner F and B = I − F A.

• For x^(0) = 0:

  x^(k) ∈ K_k(B, c),

  which is the Krylov space with respect to the matrix B and vector c.

• Slow convergence (but good smoothing properties! → multigrid)


MATLAB Example

• clear; n=100; k=10; omega=1; stationary
• For tridiag(−0.5, 1, −0.5):
  – Jacobi iteration matrix norm: |cos(π/n)|
  – Gauss-Seidel: cos(π/n)^2
  – both < 1 → convergence, but slow
• To improve convergence → nonstationary methods (or multigrid), as checked in the sketch below.
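The lecture's script 'stationary' is not reproduced here; the following illustrative Python/NumPy check (an assumption, not the original code) computes the spectral radii of the two iteration matrices and compares them against the quoted values:

import numpy as np

n = 100
A = np.eye(n) - 0.5 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
D = np.diag(np.diag(A))
L = -np.tril(A, -1)                    # convention A = D - L - U
U = -np.triu(A, 1)

rho = lambda B: np.abs(np.linalg.eigvals(B)).max()
print(rho(np.linalg.solve(D, L + U)))  # Jacobi: ~ |cos(pi/n)|
print(rho(np.linalg.solve(D - L, U)))  # Gauss-Seidel: ~ cos(pi/n)^2
print(np.cos(np.pi / n), np.cos(np.pi / n) ** 2)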


Chair of Informatics V (SCCS)
Efficient Numerical Algorithms, Parallel & HPC

• High-dimensional numerics (sparse grids)
• Fast iterative solvers (multi-level methods, preconditioners, eigenvalue solvers)
• Uncertainty Quantification
• Space-filling curves
• Numerical linear algebra
• Numerical algorithms for image processing
• HW-aware numerical programming

Fields of application in simulation:

• CFD (incl. fluid-structure interaction)
• Plasma physics
• Molecular dynamics
• Quantum chemistry

Further info → www5.in.tum.de
Feel free to come around and ask for thesis topics!


5.2. Nonstationary Methods
5.2.1. Gradient Method

• Consider A = A^T > 0 (A symmetric positive definite)
• Function Φ(x) = (1/2) x^T A x − b^T x
• an n-dimensional paraboloid R^n → R
• Gradient: ∇Φ(x) = A x − b
• The position with ∇Φ(x) = 0 is exactly the minimum of the paraboloid
• Instead of solving A x = b, consider min_x Φ(x)
• Local descent direction y: ∇Φ(x) · y is minimal for y = −∇Φ(x)


Gradient Method (cont.)

• Optimization: start with x^(0),

  x^(k+1) := x^(k) + α_k d^(k)

  with search direction d^(k) and step size α_k.

• In view of the previous results, the optimal (local) search direction is

  d^(k) := −∇Φ(x^(k)) = b − A x^(k)

• To determine α_k, minimize

  g(α) := Φ(x^(k) + α (b − A x^(k)))
        = (1/2) (x^(k) + α d^(k))^T A (x^(k) + α d^(k)) − b^T (x^(k) + α d^(k))
        = (1/2) α^2 d^(k)T A d^(k) − α d^(k)T d^(k) + (1/2) x^(k)T A x^(k) − x^(k)T b

  ⇒  α_k = (d^(k)T d^(k)) / (d^(k)T A d^(k)),  with d^(k) = −∇Φ(x^(k)) = b − A x^(k)
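Putting the pieces together, an illustrative Python/NumPy sketch (not the lecture's code; stopping test and iteration cap are assumptions):

import numpy as np

def steepest_descent(A, b, x0=None, max_iter=1000, tol=1e-10):
    # Gradient method for SPD A: d^(k) = b - A x^(k) (negative gradient),
    # alpha_k = (d^T d) / (d^T A d) from the 1-dim. minimization above.
    x = np.zeros(len(b)) if x0 is None else x0.copy()
    for _ in range(max_iter):
        d = b - A @ x
        if np.linalg.norm(d) < tol:
            break
        alpha = (d @ d) / (d @ (A @ d))    # exact line search
        x = x + alpha * d
    return x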


Gradient Method (cont. 2)

x^(k+1) = x^(k) + (‖b − A x^(k)‖_2^2 / ((b − A x^(k))^T A (b − A x^(k)))) · (b − A x^(k))

• Method of steepest descent.
• Disadvantage: distorted contour lines.
• Slow convergence (zig-zag path).
• The local descent direction is not globally optimal.


Analysis of the Gradient Method

• Definition of the A-norm: ‖x‖_A := √(x^T A x)
• Consider the error relative to x̄ = A^(−1) b:

  ‖x − x̄‖_A^2 = ‖x − A^(−1) b‖_A^2 = (x^T − b^T A^(−1)) A (x − A^(−1) b)
              = x^T A x − 2 b^T x + b^T A^(−1) b = 2 Φ(x) + b^T A^(−1) b

• Therefore, minimizing Φ is equivalent to minimizing the error in the A-norm! A more detailed analysis reveals:

  ‖x^(k+1) − x̄‖_A^2 ≤ (1 − 1/κ(A)) · ‖x^(k) − x̄‖_A^2

• Therefore, for κ(A) ≫ 1 very slow convergence!


5.2.2. The Conjugate Gradient Method

• Improve the descent direction to be globally optimal.
• x^(k+1) := x^(k) + α_k p^(k), with the search direction not being the negative gradient, but the projection of the gradient that is A-conjugate to all previous search directions:

  p^(k) ⊥ A p^(j) for all j < k,  i.e.  p^(k) ⊥_A p^(j),  i.e.  p^(k)T A p^(j) = 0 for j < k

• We choose the new search direction as the component of the last residual that is A-conjugate to all previous search directions.
• α_k again by 1-dim. minimization as before (for the chosen p^(k)).


The Conjugate Gradient Algorithm

p^(0) = r^(0) = b − A x^(0)
for k = 0, 1, 2, ... do
    α^(k) = −⟨r^(k), r^(k)⟩ / ⟨p^(k), A p^(k)⟩
    x^(k+1) = x^(k) − α^(k) p^(k)
    r^(k+1) = r^(k) + α^(k) A p^(k)
    if ‖r^(k+1)‖_2^2 ≤ ε then break
    β^(k) = ⟨r^(k+1), r^(k+1)⟩ / ⟨r^(k), r^(k)⟩
    p^(k+1) = r^(k+1) + β^(k) p^(k)
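A direct transcription into Python/NumPy (illustrative; it keeps the slide's sign convention for α, and the default of n iterations is an assumption):

import numpy as np

def cg(A, b, x0=None, eps=1e-12, max_iter=None):
    # CG for SPD A. Note alpha < 0 here, so x is updated with -alpha * p.
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = -rr / (p @ Ap)
        x = x - alpha * p
        r = r + alpha * Ap
        rr_new = r @ r
        if rr_new <= eps:              # ||r^(k+1)||_2^2 <= eps
            break
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return x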


Properties of Conjugate Gradients

• It holds

  p^(j)T A p^(k) = 0 = r^(j)T r^(k)  for j ≠ k

• and

  span(p^(1), ..., p^(j)) = span(r^(0), ..., r^(j−1)) = span(r^(0), A r^(0), ..., A^(j−1) r^(0)) = K_j(A, r^(0))

• Especially for x^(0) = 0 it holds

  K_j(A, r^(0)) = span(b, A b, ..., A^(j−1) b)

• x^(k) is the best approximate solution to A x = b in the subspace K_k(A, r^(0)). For x^(0) = 0: x^(k) ∈ K_k(A, b)
• Error:

  ‖x^(k) − x̄‖_A = min_{x ∈ K_k(A,b)} ‖x − x̄‖_A

• Cheap 1-dim. minimization gives the optimal k-dim. solution for free!


Properties of Conjugate Gradients (cont.)

• Consequence: after n steps, K_n(A, b) = R^n, and therefore x^(n) = A^(−1) b is the exact solution in exact arithmetic.
• Unfortunately, this is not true in floating-point arithmetic.
• Furthermore, O(n) iteration steps would be too costly: costs = #iterations × cost of a matrix-vector product.
• The matrix-vector product is easy to parallelize.
• But how to get fast convergence and reduce #iterations?


Error Estimation (x^(0) = 0)

‖e^(k)‖_A = ‖x^(k) − x̄‖_A = min_{x ∈ K_k(A,b)} ‖x − x̄‖_A
          = min_{α_0,...,α_{k−1}} ‖ Σ_{j=0}^{k−1} α_j A^j b − x̄ ‖_A
          = min_{P^(k−1)} ‖ P^(k−1)(A) b − x̄ ‖_A
          = min_{P^(k−1)} ‖ P^(k−1)(A) A x̄ − x̄ ‖_A
          = min_{P^(k−1)} ‖ (P^(k−1)(A) A − I)(x̄ − x^(0)) ‖_A
          = min_{Q^(k): Q^(k)(0)=1} ‖ Q^(k)(A) e^(0) ‖_A

for polynomials Q^(k)(x) of degree k with Q^(k)(0) = 1.


Error Estimation

• The matrix A has an orthonormal basis of eigenvectors u_j, j = 1, ..., n, with eigenvalues λ_j.
• It holds

  A u_j = λ_j u_j, j = 1, ..., n,  and  u_j^T u_k = 0 for j ≠ k and 1 for j = k

• Start error in this ONB: e^(0) = Σ_{j=1}^{n} ρ_j u_j

‖e^(k)‖_A = min_{Q^(k)(0)=1} ‖ Q^(k)(A) Σ_{j=1}^{n} ρ_j u_j ‖_A = min_{Q^(k)(0)=1} ‖ Σ_{j=1}^{n} ρ_j Q^(k)(A) u_j ‖_A
          = min_{Q^(k)(0)=1} ‖ Σ_{j=1}^{n} ρ_j Q^(k)(λ_j) u_j ‖_A
          ≤ min_{Q^(k)(0)=1} { max_{j=1,...,n} |Q^(k)(λ_j)| } ‖ Σ_{j=1}^{n} ρ_j u_j ‖_A
          = min_{Q^(k)(0)=1} { max_{j=1,...,n} |Q^(k)(λ_j)| } ‖e^(0)‖_A


Error Estimates

By choosing polynomials with Q^(k)(0) = 1, we can derive error estimates for the error after the k-th step.

Choose, e.g., Q^(k)(x) := |1 − 2x/(λ_max + λ_min)|^k:

‖e^(k)‖_A ≤ max_{j=1,...,n} |Q^(k)(λ_j)| ‖e^(0)‖_A = |1 − 2λ_max/(λ_max + λ_min)|^k ‖e^(0)‖_A
          = ((λ_max − λ_min)/(λ_max + λ_min))^k ‖e^(0)‖_A = ((κ(A) − 1)/(κ(A) + 1))^k ‖e^(0)‖_A


Better Estimates

• Choose normalized Chebyshev polynomials T_n(x) = cos(n arccos(x)):

  ‖e^(k)‖_A ≤ (1 / T_k((κ(A) + 1)/(κ(A) − 1))) ‖e^(0)‖_A ≤ 2 ((√κ(A) − 1)/(√κ(A) + 1))^k ‖e^(0)‖_A

• For clustered eigenvalues choose a special polynomial, e.g. assume that A has only two eigenvalues λ_1 and λ_2:

  Q^(2)(x) := (λ_1 − x)(λ_2 − x)/(λ_1 λ_2)

  ‖e^(2)‖_A ≤ max_{j=1,2} |Q^(2)(λ_j)| ‖e^(0)‖_A = 0

  Convergence of CG after two steps!


Outliers/Cluster

Assume the matrix has an eigenvalue λ_1 > 1 and all other eigenvalues are contained in an ε-neighborhood of 1:

∀λ ≠ λ_1:  |λ − 1| < ε

Q^(2)(x) := (λ_1 − x)(1 − x)/λ_1

‖e^(2)‖_A ≤ max_{|λ−1|<ε} |(λ_1 − λ)(1 − λ)/λ_1| ‖e^(0)‖_A ≤ ((λ_1 − 1 + ε) ε / λ_1) ‖e^(0)‖_A = O(ε) ‖e^(0)‖_A

Very good approximation by CG after only two steps!

Important: a small number of outliers combined with a cluster.


Conjugate Gradients Summary

• To get fast convergence and reduce the number of iterations: find a preconditioner M such that M^(−1) A x = M^(−1) b has clustered eigenvalues.
• Conjugate gradients (CG) is in general the method of choice for symmetric positive definite A.
• To improve convergence, include preconditioning (PCG).
• CG has two important properties: it is optimal and cheap.


Parallel Conjugate Gradients Algorithm


Non-Blocking Collective Operations

• Use them to hide communication!
• Allow overlap of numerical computation and communication.
• In MPI-1/MPI-2 only possible for point-to-point communication: MPI_Isend and MPI_Irecv. Additional libraries are necessary for collective operations! Example: LibNBC (non-blocking collectives).
• Included in the new MPI-3 standard.


Example: Pseudocode for a Non-Blocking Reduction

MPI_Request req;
MPI_Status stat;
int sbuf1[SIZE], rbuf1[SIZE], buf2[SIZE];

// compute sbuf1
compute(sbuf1, SIZE);

// start non-blocking allreduce of sbuf1;
// computation and communication overlap
MPI_Iallreduce(sbuf1, rbuf1, SIZE, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &req);

// compute buf2 (independent of rbuf1)
compute(buf2, SIZE);

// synchronization: wait until rbuf1 is valid
MPI_Wait(&req, &stat);

// use data in rbuf1; final computation
evaluate(rbuf1, buf2, SIZE);


Iterative Methods for General (Nonsymmetric) A: BiCGSTAB

• Applicable if little memory is at hand and A^T is not available.
• Computational costs per iteration similar to BiCG and CGS.
• An alternative to CGS that avoids the irregular convergence patterns of CGS while maintaining similar convergence speed.
• Less loss of accuracy in the updated residual.


Iterative Methods for General (Nonsymmetric) A: GMRES

• Leads to the smallest residual for a fixed number of iteration steps, but these steps become increasingly expensive.
• To limit the increasing storage requirements and work per iteration step, restarting is necessary. When to do so depends on A and b; it requires skill and experience.
• Requires only matrix-vector products with the coefficient matrix.
• The number of inner products grows linearly with the iteration number (up to the restart point).
• An implementation based on classical Gram-Schmidt → the inner products are independent → only one synchronization point. Using modified Gram-Schmidt → one synchronization point per inner product.

We consider GMRES in the following.


5.2.3. GMRES

• Iterative solution method for general A.
• Consider a small subspace U_m and determine the optimal approximate solution to Ax = b in U_m. Hence x is of the form x := U_m y:

  min_{x ∈ U_m} ‖Ax − b‖_2 = min_y ‖A (U_m y) − b‖_2

• Can be solved by the normal equations: (U_m^T A^T A U_m) y = U_m^T A^T b
• Try to find a sequence of "good" subspaces U_1 → U_2 → U_3 → ... such that we can iteratively update the optimal solutions

  x_1 → x_2 → x_3 → ... → A^(−1) b

  using mainly matrix-vector products.


GMRES: Subspace

Which subspace gives fast convergence and easy computations?

U_m := K_m(A, b) = span(b, Ab, ..., A^(m−1) b)  (Krylov space)

Problem: b, Ab, A^2 b, ... is a bad basis for this subspace!

So we need a first step to compute a good basis for U_m:

Start with u_1 := b / ‖b‖_2 and do for j = 2 : m:

  ũ_j := A u_{j−1} − Σ_{k=1}^{j−1} (u_k^T A u_{j−1}) u_k = A u_{j−1} − Σ_{k=1}^{j−1} h_{k,j−1} u_k

  u_j := ũ_j / ‖ũ_j‖_2 = ũ_j / h_{j,j−1}

which is the standard orthogonalization method (Arnoldi method).
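An illustrative Python/NumPy sketch of this step (not the lecture's code; it uses the modified Gram-Schmidt variant, which produces the same u_j and h_{k,j} in exact arithmetic but is numerically more stable):

import numpy as np

def arnoldi(A, b, m):
    # Orthonormal basis u_1,...,u_{m+1} of the Krylov spaces and the
    # (m+1) x m Hessenberg matrix H with A U_m = U_{m+1} H.
    n = len(b)
    U = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    U[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ U[:, j]
        for k in range(j + 1):
            H[k, j] = U[:, k] @ w          # h_{k,j}
            w = w - H[k, j] * U[:, k]      # orthogonalize against u_1..u_{j+1}
        H[j + 1, j] = np.linalg.norm(w)
        U[:, j + 1] = w / H[j + 1, j]      # (lucky) breakdown if w = 0
    return U, H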


Matrix Form of the Arnoldi ONB

U_m = span(b, Ab, ..., A^(m−1) b) = span(u_1, u_2, ..., u_m)  (ONB)

Write this orthogonalization method in matrix form:

A u_{j−1} = Σ_{k=1}^{j−1} h_{k,j−1} u_k + ũ_j = Σ_{k=1}^{j} h_{k,j−1} u_k

A U_m = A (u_1 ... u_m) = (u_1 ... u_{m+1}) H_{m+1,m} = U_{m+1} H_{m+1,m}

with the (m+1) × m upper Hessenberg matrix

            [ h_11  h_12  ...  h_1m      ]
            [ h_21  h_22  ...  h_2m      ]
H_{m+1,m} = [  0    h_32  ...  h_3m      ]
            [  :          ...  h_mm      ]
            [  0     0    ...  h_{m+1,m} ]


GMRES: Minimization

This leads to the minimization problem

min_{x ∈ U_m} ‖Ax − b‖ = min_y ‖A U_m y − b‖
                       = min_y ‖U_{m+1} H_{m+1,m} y − ‖b‖ u_1‖
                       = min_y ‖U_{m+1} (H_{m+1,m} y − ‖b‖ e_1)‖
                       = min_y ‖H_{m+1,m} y − ‖b‖ e_1‖,

because U_{m+1} is part of an orthogonal matrix.
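Combining the Arnoldi sketch above with this small least-squares problem gives a compact, unrestarted GMRES (illustrative only; a practical implementation would update a QR factorization of H incrementally instead of calling a dense least-squares solver):

import numpy as np

def gmres_sketch(A, b, m):
    # min_y || H y - ||b|| e_1 ||_2 over the Krylov space K_m(A, b),
    # then x = U_m y. Uses arnoldi() from the sketch above.
    U, H = arnoldi(A, b, m)
    rhs = np.zeros(m + 1)
    rhs[0] = np.linalg.norm(b)             # ||b|| e_1
    y = np.linalg.lstsq(H, rhs, rcond=None)[0]
    return U[:, :m] @ y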
