AA220/CS238
Parallel Methods in Numerical Analysis
Introduction to Sparse Direct Solver
(Symmetric Positive Definite Systems)
Kincho H. Law
Professor of Civil and Environmental Engineering
Stanford University
Email: [email protected]
May 13, 2005
A Typical Finite Element Program
[Figure: block diagram of the program components: User Interface (UI) (mesh generation, boundary conditions, etc.); Element Library (element characteristics); Matrix and RHS Formation and Assembly; Linear Solver (direct, iterative); Post Processing; Nonlinear and/or Adaptive Solver.]
Direct Solution Scheme
Solve $Ku = p$:
1. $LDL^T u = p$ (factor $K = LDL^T$)
2. $Ly = p$ (forward solve)
3. $DL^T u = y$ (backward solve)
K: Symmetric Positive Definite Matrix
L: Lower Triangular Matrix
D: Diagonal Matrix
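The three steps of the direct solution scheme can be sketched in a few lines of dense Python. This is only an illustration under stated assumptions: the function names `ldlt_factor` and `ldlt_solve` are hypothetical, the 3x3 matrix is a sample symmetric positive definite matrix, and no sparsity is exploited.

```python
def ldlt_factor(K):
    """Return (L, D) with K = L * diag(D) * L^T, L unit lower triangular (dense sketch)."""
    n = len(K)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        # Pivot: diagonal entry minus contributions of earlier columns.
        D[j] = K[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (K[i][j] - s) / D[j]
    return L, D


def ldlt_solve(L, D, p):
    """Solve L D L^T u = p by a forward solve followed by a backward solve."""
    n = len(p)
    y = [0.0] * n
    for i in range(n):                      # step 2: L y = p
        y[i] = p[i] - sum(L[i][k] * y[k] for k in range(i))
    u = [0.0] * n
    for i in reversed(range(n)):            # step 3: D L^T u = y
        u[i] = y[i] / D[i] - sum(L[k][i] * u[k] for k in range(i + 1, n))
    return u


# Sample SPD matrix (an assumption for illustration) and a load vector p = K * [1, 1, 1].
K = [[8.0, 2.0, 2.0], [2.0, 8.0, 0.0], [2.0, 0.0, 8.0]]
L, D = ldlt_factor(K)
u = ldlt_solve(L, D, [12.0, 10.0, 10.0])
```

The factorization is done once; each additional right-hand side only needs the two triangular solves.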
Motivation Example: Bandwidth Minimization
Reverse Cuthill McKee Algorithm
Motivation Example: Sparsity within Profile
Reverse Cuthill McKee Algorithm
Motivation Example: Sparse Matrix Ordering
Minimum Degree Ordering
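Introductory Example
[Figure: three-node model with nodes numbered 3, 1, 2; the element stiffness matrix for an element with end nodes 1 and 2 is $\begin{bmatrix} 4.0 & 2.0 \\ 2.0 & 4.0 \end{bmatrix}$.]
Stiffness matrix system:
\[
\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 2.000 & 8.000 & 0 \\ 2.000 & 0 & 8.000 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix}
=
\begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix}
\]
Eq. 2 - (2/8) x Eq. 1 and Eq. 3 - (2/8) x Eq. 1:
\[
\begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & 0 & 1.000 \end{bmatrix}
\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 0 & 7.500 & -0.500 \\ 0 & -0.500 & 7.500 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix}
=
\begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix}
\]
Eq. 3 - (-0.5/7.5) x Eq. 2:
\[
\begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & -0.067 & 1.000 \end{bmatrix}
\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 0 & 7.500 & -0.500 \\ 0 & 0 & 7.467 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix}
=
\begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix}
\]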
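Symbolic Representation of Symmetric Matrices and Gaussian Elimination
\[
\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 2.000 & 8.000 & 0 \\ 2.000 & 0 & 8.000 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix}
=
\begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix}
\]
Graph $G^0$ of $A$: nodes 1, 2, 3; edges (1,2) and (1,3), one edge per off-diagonal nonzero pair of $A$
After eliminating $x_1$:
\[
\begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & 0 & 1.000 \end{bmatrix}
\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 0 & 7.500 & -0.500 \\ 0 & -0.500 & 7.500 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix}
=
\begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix}
\]
\[
A^1 = \begin{bmatrix} 8.000 & 0.000 \\ 0.000 & 8.000 \end{bmatrix}
- \begin{bmatrix} 0.250 \\ 0.250 \end{bmatrix} \begin{bmatrix} 2.000 & 2.000 \end{bmatrix}
= \begin{bmatrix} 7.500 & -0.500 \\ -0.500 & 7.500 \end{bmatrix}
\]
Graph $G^1$ of $A$: nodes 2, 3; edge (2,3), which is a fill-in edge
Graph $G^1$ of $A^1$ is constructed as follows:
1. Delete node $x_1$ and its incident edges
2. Add edges to the graph so that the adjacent nodes of $x_1$ are now pairwise adjacent in graph $G^1$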
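The two-step graph construction for one elimination step can be sketched as follows. This is a minimal illustration, assuming the graph is stored as a dictionary of neighbour sets; the helper name `eliminate` is hypothetical.

```python
def eliminate(adj, v):
    """Eliminate node v from the graph adj (dict: node -> set of neighbours).

    Returns the list of fill-in edges that had to be added.
    """
    nbrs = sorted(adj.pop(v))
    for u in nbrs:
        adj[u].discard(v)                  # 1. delete v and its incident edges
    fill = []
    for idx, a in enumerate(nbrs):         # 2. make the neighbours of v pairwise adjacent
        for b in nbrs[idx + 1:]:
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill.append((a, b))
    return fill


# Three-node graph with edges (1,2) and (1,3).
adj = {1: {2, 3}, 2: {1}, 3: {1}}
fill = eliminate(adj, 1)
```

For the three-node graph, eliminating node 1 leaves nodes 2 and 3 joined by the single fill-in edge (2,3), which corresponds to the new off-diagonal entry -0.500 in the reduced matrix.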
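Symbolic Representation of Gaussian Elimination
Graph $G^2$ of $A$: node 3 alone, after eliminating $x_2$. The last pivot is $7.500 - (0.500)(0.067) = 7.467$.
The elimination gives $A = LU$:
\[
\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 2.000 & 8.000 & 0 \\ 2.000 & 0 & 8.000 \end{bmatrix}
=
\begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & -0.067 & 1.000 \end{bmatrix}
\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 0 & 7.500 & -0.500 \\ 0 & 0 & 7.467 \end{bmatrix}
\]
Since $A$ is symmetric, $U = DL^T$ and therefore $A = LDL^T$:
\[
\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 2.000 & 8.000 & 0 \\ 2.000 & 0 & 8.000 \end{bmatrix}
=
\begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & -0.067 & 1.000 \end{bmatrix}
\begin{bmatrix} 8.000 & 0 & 0 \\ 0 & 7.500 & 0 \\ 0 & 0 & 7.467 \end{bmatrix}
\begin{bmatrix} 1.000 & 0.250 & 0.250 \\ 0 & 1.000 & -0.067 \\ 0 & 0 & 1.000 \end{bmatrix}
\]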
Summary of Results
Given a symmetric system of equations $Ax = y$:
• Numerical factorization of A into LU
• For a symmetric system, A can be factored as $LDL^T$
[Figure: graph of A (nodes 1, 2, 3; edges (1,2) and (1,3)) and graph of L+U (the same graph plus the fill-in edge (2,3)).]
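Another Example
[Figure: six-node model; the element stiffness matrix for an element with end nodes 1 and 2 is $\begin{bmatrix} 4.0 & 2.0 \\ 2.0 & 4.0 \end{bmatrix}$.]
$Ax = y$:
\[
\begin{bmatrix}
8.0000 & 2.0000 & 2.0000 & 0 & 0 & 0 \\
2.0000 & 8.0000 & 0 & 0 & 0 & 2.0000 \\
2.0000 & 0 & 8.0000 & 0 & 0 & 0 \\
0 & 0 & 0 & 8.0000 & 2.0000 & 0 \\
0 & 0 & 0 & 2.0000 & 8.0000 & 2.0000 \\
0 & 2.0000 & 0 & 0 & 2.0000 & 8.0000
\end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{Bmatrix}
=
\begin{Bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{Bmatrix}
\]
$A = LDL^T$ with
\[
L = \begin{bmatrix}
1.0000 & 0 & 0 & 0 & 0 & 0 \\
0.2500 & 1.0000 & 0 & 0 & 0 & 0 \\
0.2500 & -0.0667 & 1.0000 & 0 & 0 & 0 \\
0 & 0 & 0 & 1.0000 & 0 & 0 \\
0 & 0 & 0 & 0.2500 & 1.0000 & 0 \\
0 & 0.2667 & 0.0179 & 0 & 0.2667 & 1.0000
\end{bmatrix},
\quad
D = \mathrm{diag}(8.0000,\ 7.5000,\ 7.4667,\ 8.0000,\ 7.5000,\ 6.9310)
\]
Not all entries within the profile (band) are nonzero!
Nonzero structure of A (X: nonzero, .: zero):
      1  2  3  4  5  6
  1   X  X  X  .  .  .
  2   X  X  .  .  .  X
  3   X  .  X  .  .  .
  4   .  .  .  X  X  .
  5   .  .  .  X  X  X
  6   .  X  .  .  X  X
Nonzero structure of L (fill-in entries at L(3,2) and L(6,3)):
      1  2  3  4  5  6
  1   X
  2   X  X
  3   X  X  X
  4   .  .  .  X
  5   .  .  .  X  X
  6   .  X  X  .  X  X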
Graph Representation of Gaussian Elimination
Graph of A: nodes 1 to 6; edges (1,2), (1,3), (2,6), (4,5), (5,6)
Graph after elimination of node 1: nodes 2 to 6; edges (2,3), (2,6), (4,5), (5,6); (2,3) is a fill-in edge
Graph after elimination of node 2: nodes 3 to 6; edges (3,6), (4,5), (5,6); (3,6) is a fill-in edge
Graph after elimination of node 3: nodes 4 to 6; edges (4,5), (5,6)
Graph after elimination of node 4: nodes 5 and 6; edge (5,6)
Graph after elimination of node 5: node 6
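Matrix A:
\[
A = \begin{bmatrix}
8.0000 & 2.0000 & 2.0000 & 0 & 0 & 0 \\
2.0000 & 8.0000 & 0 & 0 & 0 & 2.0000 \\
2.0000 & 0 & 8.0000 & 0 & 0 & 0 \\
0 & 0 & 0 & 8.0000 & 2.0000 & 0 \\
0 & 0 & 0 & 2.0000 & 8.0000 & 2.0000 \\
0 & 2.0000 & 0 & 0 & 2.0000 & 8.0000
\end{bmatrix}
\]
Nonzero Structure of Matrix A (X: nonzero, .: zero):
      1  2  3  4  5  6
  1   X  X  X  .  .  .
  2   X  X  .  .  .  X
  3   X  .  X  .  .  .
  4   .  .  .  X  X  .
  5   .  .  .  X  X  X
  6   .  X  .  .  X  X
Graph of Matrix A: edges (1,2), (1,3), (2,6), (4,5), (5,6)
Matrix Factor L:
\[
L = \begin{bmatrix}
1.0000 & 0 & 0 & 0 & 0 & 0 \\
0.2500 & 1.0000 & 0 & 0 & 0 & 0 \\
0.2500 & -0.0667 & 1.0000 & 0 & 0 & 0 \\
0 & 0 & 0 & 1.0000 & 0 & 0 \\
0 & 0 & 0 & 0.2500 & 1.0000 & 0 \\
0 & 0.2667 & 0.0179 & 0 & 0.2667 & 1.0000
\end{bmatrix}
\]
Nonzero Structure of Matrix $L + L^T$:
      1  2  3  4  5  6
  1   X  X  X  .  .  .
  2   X  X  X  .  .  X
  3   X  X  X  .  .  X
  4   .  .  .  X  X  .
  5   .  .  .  X  X  X
  6   .  X  X  .  X  X
Graph of Matrix $L + L^T$: the graph of A plus the fill-in edges (2,3) and (3,6)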
Matrix Numbering as Graph Ordering
Minimum Degree Ordering Scheme (A Greedy Strategy)
Purpose: Minimize the number of “fill-in” nonzero entries in the matrix factor L
1. Select a node with minimum degree from the graph and label the node
2. Eliminate the node and perform the graph transformation, adding fill-in edges if necessary
3. Repeat steps 1 and 2 until all nodes are labelled
[Figure: the six-node example renumbered by minimum degree ordering, with the nonzero structure of the reordered matrix.]
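The greedy strategy can be sketched directly on a graph stored as neighbour sets. This is a simplified illustration, not a production ordering code: the function name `minimum_degree_order` is hypothetical, ties are broken by the smallest node label (an assumption), and the edge list is assumed to be the graph of the six-node example matrix.

```python
def minimum_degree_order(nodes, edges):
    """Return (order, fill): elimination order and the fill-in edges it creates."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    order, fill = [], []
    while adj:
        # Step 1: pick a node of minimum degree (ties: smallest label).
        v = min(adj, key=lambda x: (len(adj[x]), x))
        # Step 2: eliminate it and connect its neighbours pairwise.
        nbrs = sorted(adj.pop(v))
        for u in nbrs:
            adj[u].discard(v)
        for idx, a in enumerate(nbrs):
            for b in nbrs[idx + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill.append((a, b))
        order.append(v)                    # Step 3: repeat until no node is left
    return order, fill


# Graph assumed for the six-node example: edges (1,2), (1,3), (2,6), (4,5), (5,6).
order, fill = minimum_degree_order(range(1, 7), [(1, 2), (1, 3), (2, 6), (4, 5), (5, 6)])
```

With this tie-breaking rule the sketch eliminates the nodes in the order 3, 1, 2, 4, 5, 6 and creates no fill-in, whereas the natural ordering 1 to 6 creates the two fill-in edges (2,3) and (3,6).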
[Figure: seven-node example with nodes a to g. Panels: original graph and matrix structure; the graphs after eliminating nodes 1 through 7 in minimum degree order; reordered graph and matrix structure.]
Summary of Numerical Factorization for Sparse Symmetric Matrices
Given a symmetric matrix A, compute the matrix factors L and D such that $A = LDL^T$
1. Order the matrix A such that it has a desirable structure
2. Symbolic factorization to determine the structure of L
3. Numerically factorize the matrix A into L and D utilizing the nonzero structure of L
4. Forward and backward solutions: $Lz = y$; $DL^T x = z$
Focus of Discussion:
• Graph and Tree Representation of the Sparse Factor L
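Direct (Cholesky) Factorization
\[
A = \begin{bmatrix} M & u \\ u^T & s \end{bmatrix}
= \begin{bmatrix} L_M & 0 \\ w^T & 1 \end{bmatrix}
\begin{bmatrix} D_M & 0 \\ 0 & d \end{bmatrix}
\begin{bmatrix} L_M^T & w \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} L_M D_M L_M^T & L_M D_M w \\ w^T D_M L_M^T & w^T D_M w + d \end{bmatrix}
\]
Suppose $[M] = [L_M][D_M][L_M]^T$ is known, where $[M]$ is the leading 5x5 block of
\[
A = \begin{bmatrix}
8.0000 & 2.0000 & 2.0000 & 0 & 0 & 0 \\
2.0000 & 8.0000 & 0 & 0 & 0 & 2.0000 \\
2.0000 & 0 & 8.0000 & 0 & 0 & 0 \\
0 & 0 & 0 & 8.0000 & 2.0000 & 0 \\
0 & 0 & 0 & 2.0000 & 8.0000 & 2.0000 \\
0 & 2.0000 & 0 & 0 & 2.0000 & 8.0000
\end{bmatrix}
\]
\[
[L_M] = \begin{bmatrix}
1.000 & 0 & 0 & 0 & 0 \\
0.250 & 1.000 & 0 & 0 & 0 \\
0.250 & -0.067 & 1.000 & 0 & 0 \\
0 & 0 & 0 & 1.000 & 0 \\
0 & 0 & 0 & 0.250 & 1.000
\end{bmatrix},
\quad
[D_M] = \mathrm{diag}(8.000,\ 7.500,\ 7.467,\ 8.000,\ 7.500)
\]
The problem is to compute w and d:
\[
[L_M][D_M]\{w\} = \{u\} \;\Rightarrow\; \{w\} = [D_M]^{-1}[L_M]^{-1}\{u\}
\]
\[
s = \{w\}^T[D_M]\{w\} + d \;\Rightarrow\; d = s - \{w\}^T[D_M]\{w\}
\]
\[
\{w\} = \mathrm{diag}\!\left(\tfrac{1}{8.000},\ \tfrac{1}{7.500},\ \tfrac{1}{7.467},\ \tfrac{1}{8.000},\ \tfrac{1}{7.500}\right)
\begin{bmatrix}
1.000 & 0 & 0 & 0 & 0 \\
0.250 & 1.000 & 0 & 0 & 0 \\
0.250 & -0.067 & 1.000 & 0 & 0 \\
0 & 0 & 0 & 1.000 & 0 \\
0 & 0 & 0 & 0.250 & 1.000
\end{bmatrix}^{-1}
\begin{Bmatrix} 0 \\ 2.000 \\ 0 \\ 0 \\ 2.000 \end{Bmatrix}
=
\begin{Bmatrix} 0 \\ 0.267 \\ 0.018 \\ 0 \\ 0.267 \end{Bmatrix}
\]
\[
d = 8.000 -
\begin{Bmatrix} 0 & 0.267 & 0.018 & 0 & 0.267 \end{Bmatrix}
\mathrm{diag}(8.000,\ 7.500,\ 7.467,\ 8.000,\ 7.500)
\begin{Bmatrix} 0 \\ 0.267 \\ 0.018 \\ 0 \\ 0.267 \end{Bmatrix}
= 6.931
\]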
Structure of {w} is related to forward solve
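The computation of w and d for one new row can be sketched as a forward solve followed by a diagonal scaling. This is a dense illustration only; the name `border_step` is hypothetical, and the factor entries are entered as exact fractions (-1/15 and 112/15) rather than the rounded values -0.067 and 7.467.

```python
def border_step(L_M, D_M, u, s):
    """Given M = L_M diag(D_M) L_M^T, return (w, d) for the bordered matrix [[M, u], [u^T, s]]."""
    n = len(u)
    z = [0.0] * n
    for i in range(n):                       # forward solve: L_M z = u
        z[i] = u[i] - sum(L_M[i][k] * z[k] for k in range(i))
    w = [z[i] / D_M[i] for i in range(n)]    # w = D_M^{-1} z
    d = s - sum(w[i] * D_M[i] * w[i] for i in range(n))
    return w, d


# Leading 5x5 factor of the six-node example (values assumed from the slides).
L_M = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.25, 1.0, 0.0, 0.0, 0.0],
    [0.25, -1.0 / 15.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.25, 1.0],
]
D_M = [8.0, 7.5, 112.0 / 15.0, 8.0, 7.5]
w, d = border_step(L_M, D_M, [0.0, 2.0, 0.0, 0.0, 2.0], 8.0)
```

The zero entries of w (positions 1 and 4) are exactly the positions the forward solve never touches, which is why the structure of {w} can be predicted before any arithmetic is done.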
Definition for a Tree Structure of Matrix Factor
(Law and Fenves 81, 86; Liu 86, 88; Schreiber 82)
Define
\[
\mathrm{PARENT}(j) = \min \{\, i > j \mid L(i,j) \neq 0 \,\}
\]
Note: The array PARENT holds, for each column j, the row subscript of the first off-diagonal nonzero entry in column j of the lower triangular matrix factor L. PARENT(j) = 0 if column j has no off-diagonal nonzero entry.
Lemma: If $A(i,j) \neq 0$ (or $L(i,j) \neq 0$) with $j < i$, then for each
\[
k = \mathrm{PARENT}(\mathrm{PARENT}(\ldots(\mathrm{PARENT}(j))\ldots)) \quad \text{with } k < i,
\]
$L(i,k) \neq 0$.
That is, given the tree T(A) and the nonzero entries of A, we can obtain the nonzero entries in each row of the matrix factor L by tracing the path along the tree from each nonzero off-diagonal column subscript of A to the row number of interest.
[Figure: matrix structure of $L + L^T$ for the six-node example and the tree of matrix factor L: PARENT(1) = 2, PARENT(2) = 3, PARENT(3) = 6, PARENT(4) = 5, PARENT(5) = 6; node 6 is the root.]
ALGORITHM: ROW_STRUCTURE
/* Determine the data structure for row i of matrix factor L */
BEGIN
   Sort the column subscripts of the off-diagonal nonzero entries of row i of A
   in ascending order and store them in a linked list array LIST;
   j = HEAD of LIST;
   WHILE j ≠ 0 DO
   BEGIN
      IF LIST(j) ≠ 0 THEN next = LIST(j)
      ELSE next = i;
      r = j;
      WHILE 0 < r < next DO
      BEGIN
         add subscript r to row i of L;
         r = PARENT(r);
      ENDWHILE;
      IF r ≠ 0 and r < i THEN sort r to LIST;
      j = LIST(j);
   ENDWHILE;
END.
Example: Row 6 of Matrix K
[Figure: matrix structure and tree of matrix factor L, with PARENT(2) = 3, PARENT(3) = 6, PARENT(5) = 6.]
Nonzero entries are denoted in the linked list LIST.
HEAD = 2; LIST: <0, 5, 0, 0, 6, 0>
j = 2:
   next = LIST(2) = 5;
   r = 2;
   add entry 2 to row 6 of L (i.e. L(6,2) ≠ 0);
   r = PARENT(2) = 3;
   sort 3 to LIST (i.e. LIST: <0, 3, 5, 0, 6, 0>);
   j = LIST(2) = 3;
j = 3:
   next = LIST(3) = 5;
   r = 3;
   add entry 3 to row 6 of L (i.e. L(6,3) ≠ 0);
   r = PARENT(3) = 6;
   j = LIST(3) = 5;
j = 5:
   next = LIST(5) = 6;
   r = 5;
   add entry 5 to row 6 of L (i.e. L(6,5) ≠ 0);
   r = PARENT(5) = 6;
   j = LIST(5) = 6;
j = 6:
   j = LIST(6) = 0.
Row 6 of L therefore has off-diagonal nonzero entries in columns 2, 3 and 5.
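The ROW_STRUCTURE pseudocode can be sketched in Python as below. The sketch follows the pseudocode's inner WHILE loop literally, so for row 6 it walks from 2 to 3 in a single pass instead of re-inserting 3 into the list; the resulting subscripts are the same. The names `row_structure`, `a_cols` and `link` are hypothetical, arrays are 1-based with index 0 unused, and the PARENT array is assumed to be that of the six-node example.

```python
def row_structure(i, a_cols, parent):
    """Column subscripts (ascending) of the off-diagonal nonzeros in row i of L.

    a_cols: off-diagonal nonzero columns of row i of A.
    parent: 1-based PARENT array; parent[r] == 0 means r has no parent.
    """
    cols = sorted(c for c in a_cols if c < i)
    link = [0] * (i + 1)                     # link[j] = subscript following j in LIST
    for a, b in zip(cols, cols[1:]):
        link[a] = b
    row = []
    j = cols[0] if cols else 0               # HEAD of LIST
    while j != 0:
        nxt = link[j] if link[j] != 0 else i
        r = j
        while 0 < r < nxt:
            row.append(r)
            r = parent[r]
        if r != 0 and r < i:
            # "sort r to LIST": insert r at its ascending position unless already present
            prev = j
            while link[prev] != 0 and link[prev] < r:
                prev = link[prev]
            if link[prev] != r:
                link[r] = link[prev]
                link[prev] = r
        j = link[j]
    return row


PARENT = [0, 2, 3, 6, 5, 6, 0]               # assumed six-node tree, index 0 unused
row6 = row_structure(6, [2, 5], PARENT)
row3 = row_structure(3, [1], PARENT)
```

Because each path is cut off at the next subscript in the list, the subscripts come out in ascending order without a separate sort.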
ALGORITHM: TREE-STRUCTURE
/* Given the nonzero entries of A, determine the tree structure of matrix A */
BEGIN
   Initialize array PARENT to 0;
   FOR each row i = 2 TO n, DO
      FOR each nonzero entry A(i,j) of row i with j < i, DO
         r = j;
         WHILE ((PARENT(r) ≠ 0) AND (PARENT(r) ≠ i)) DO
            r = PARENT(r);
         ENDWHILE.
         IF (PARENT(r) = 0) THEN PARENT(r) = i;
      ENDFOR
   ENDFOR.
END.
[Figure: growth of the tree as rows 2 to 6 of the six-node example are processed, shown with the structure of Matrix A and the structure of Matrix $L + L^T$.]
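A direct Python transcription of TREE-STRUCTURE is sketched below. The name `elimination_tree` and the `lower_rows` input format (row subscript mapped to its off-diagonal column subscripts below the diagonal) are assumptions made for this illustration; arrays are 1-based with index 0 unused.

```python
def elimination_tree(n, lower_rows):
    """Return the 1-based PARENT array of the tree of A (parent[0] is unused)."""
    parent = [0] * (n + 1)
    for i in range(2, n + 1):
        for j in lower_rows.get(i, ()):
            r = j
            # Walk up from j until reaching a root or a node already attached to i.
            while parent[r] != 0 and parent[r] != i:
                r = parent[r]
            if parent[r] == 0:
                parent[r] = i
    return parent


# Lower-triangular off-diagonal structure assumed for the six-node example.
lower_rows = {2: [1], 3: [1], 5: [4], 6: [2, 5]}
parent = elimination_tree(6, lower_rows)
```

Only the structure of A is needed: the fill-in entries L(3,2) and L(6,3) are never formed, yet PARENT(2) = 3 and PARENT(3) = 6 are found by walking up the partially built tree.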
ALGORITHM: TREE-STRUCTURE (with tree compression) (Ref: Liu 86, 88)
/* Given the nonzero entries of A, determine the tree structure of matrix A */
BEGIN
   Initialize arrays PARENT and ANCESTOR to 0;
   FOR each row i = 2 TO n, DO
      FOR each nonzero entry A(i,j) of row i with j < i, DO
         r = j;
         WHILE ((ANCESTOR(r) ≠ 0) AND (ANCESTOR(r) < i)) DO
            t = ANCESTOR(r);
            ANCESTOR(r) = i;
            r = t;
         ENDWHILE.
         IF (ANCESTOR(r) = 0) THEN DO
            ANCESTOR(r) = i;
            PARENT(r) = i;
         ENDIF.
      ENDFOR
   ENDFOR.
END.
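The compressed variant differs only in the auxiliary ANCESTOR array, which shortcuts every visited node to the current row so that later walks are shorter. A minimal sketch, with the hypothetical name `elimination_tree_compressed` and the same assumed `lower_rows` input format (row subscript mapped to its off-diagonal column subscripts below the diagonal):

```python
def elimination_tree_compressed(n, lower_rows):
    """PARENT array computed with path compression through an ANCESTOR array."""
    parent = [0] * (n + 1)
    ancestor = [0] * (n + 1)
    for i in range(2, n + 1):
        for j in lower_rows.get(i, ()):
            r = j
            while ancestor[r] != 0 and ancestor[r] < i:
                t = ancestor[r]      # remember where the old shortcut pointed
                ancestor[r] = i      # compress: r now points straight at row i
                r = t
            if ancestor[r] == 0:
                ancestor[r] = i
                parent[r] = i
    return parent


# Six-node example structure (assumed), plus a longer chain to exercise the compression.
parent6 = elimination_tree_compressed(6, {2: [1], 3: [1], 5: [4], 6: [2, 5]})
chain = elimination_tree_compressed(4, {2: [1], 3: [2], 4: [1, 3]})
```

The PARENT array is unchanged by the compression; only the number of steps taken in the inner loop is reduced.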
[Figure: example model ordered using the RCM (bandwidth minimization) algorithm, with its matrix structure and elimination tree.]
Note: Zero entries within the band!
Restructuring of Ordered Elimination Tree:
ALGORITHM: Binary-Tree Representation
/* Given the PARENT array */
BEGIN
   FOR each node i, DO
      IF (PARENT(i) ≠ 0) THEN DO
      BEGIN
         r = PARENT(i);
         IF (CHILD(r) ≠ 0) THEN DO
         BEGIN
            SIBLING(i) = CHILD(r);
            CHILD(r) = i;
         ENDIF.
         ELSE CHILD(r) = i;
      ENDIF
   ENDFOR.
END.
[Figure: six-node example with its matrix structure, its tree given by PARENT, and the same tree in binary-tree form using CHILD and SIBLING links.]
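The conversion from the PARENT array to CHILD and SIBLING links can be sketched as follows. The name `binary_tree` is hypothetical, arrays are 1-based with index 0 unused, and the PARENT array is assumed to be <2, 3, 6, 5, 6, 0>, the six-node tree used in the ROW_STRUCTURE example rather than the renumbered tree on this slide. The two branches of the pseudocode collapse into one assignment because CHILD(r) is 0 when r has no child yet.

```python
def binary_tree(parent):
    """Return (child, sibling) arrays for the tree given by a 1-based PARENT array."""
    n = len(parent) - 1
    child = [0] * (n + 1)
    sibling = [0] * (n + 1)
    for i in range(1, n + 1):
        r = parent[i]
        if r != 0:
            sibling[i] = child[r]    # previous first child (0 if none) becomes i's sibling
            child[r] = i             # i becomes the first child of its parent
    return child, sibling


child, sibling = binary_tree([0, 2, 3, 6, 5, 6, 0])
```

Each node then needs only two links, however many children it has: CHILD points to its most recently visited child and SIBLING chains the remaining children together.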
Restructuring of Ordered Elimination Tree: Post-Order Traversal
ALGORITHM: POST-ORDER(r, number)
/* Given the Binary Tree Representation */
/* Initially set r = n (root of T(K)) and number = 1 */
BEGIN
   t = r;
   IF (t ≠ 0) THEN DO
   BEGIN
      POST-ORDER(CHILD(t), number);
      label node t = number;
      number = number + 1;
      POST-ORDER(SIBLING(t), number);
   ENDIF
END.
[Figure: the six-node tree before and after post-order relabelling, with the corresponding matrix structures.]
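A recursive Python sketch of the post-order traversal is given below. The name `post_order` is hypothetical; the CHILD and SIBLING arrays are assumed to describe the six-node tree with PARENT = <2, 3, 6, 5, 6, 0>, and the counter `number` is shared across calls, matching a pass-by-reference reading of the pseudocode.

```python
def post_order(child, sibling, root):
    """Return label[], where label[t] is the new post-order number of node t (1-based)."""
    label = [0] * len(child)
    number = 1

    def visit(t):
        nonlocal number
        if t != 0:
            visit(child[t])          # number every node below t first
            label[t] = number
            number += 1
            visit(sibling[t])        # then continue with the next sibling of t

    visit(root)
    return label


# CHILD and SIBLING arrays assumed for the six-node tree (index 0 unused).
child = [0, 0, 1, 2, 0, 4, 5]
sibling = [0, 0, 0, 0, 0, 3, 0]
label = post_order(child, sibling, 6)
```

Every subtree receives a contiguous block of new numbers (nodes 4, 5 become 1, 2 and nodes 1, 2, 3 become 3, 4, 5 here), which is what exposes the block partitioning of the reordered matrix.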
[Figure: the example model after post-ordering of the elimination tree, with its matrix structure and elimination tree.]
Post-ordering of the elimination tree
• Preserve number of fill-in entries
• Reveal matrix partitioning
• Allow “block” data structure : principal submatrix, row segments
Summary of Numerical Factorization for Sparse Symmetric Matrices
Given a symmetric matrix A, compute the matrix factors L and D such that $A = LDL^T$
1. Order the matrix A such that it has a desirable structure
2. Symbolic factorization to determine the structure of L
   1. Given the structure of A, determine the tree structure T(A) of A
   2. Given T(A) and the structure of A, determine the structure of L
3. Numerically factor the matrix A into L and D utilizing the nonzero structure of L
A Typical Finite Element Program
[Figure: block diagram of the program components: User Interface (UI) (mesh generation, boundary conditions, etc.); Element Library (element characteristics); Matrix and RHS Formation and Assembly; Linear Solver (direct, iterative); Post Processing; Nonlinear and/or Adaptive Solver. Annotations for the direct solution scheme: ordering the equations; data structure for the system matrix; profile solver; sparse solver.]
Performance of Sparse Linear Solver
[Figure: performance (seconds) versus number of equations (0 to 25000) for the Profile Solver, Minimum Degree, and Multilevel Nested Dissection.]
Performance in seconds (Ref: Jun Peng 2002). MultiND and MinD are sparse solvers; Profile is the traditional solver.
Model          Neq     MultiND   MinD    Profile
Square40       3354    0.191     0.286   0.425
Humboldt 1     5206    0.282     0.26    1.027
Humboldt 2     7294    0.61      0.788   5.126
Plate 100x20   12516   2.532     3.272   14.33
Square100      20394   2.518     5.081   15.82
Comparison of Different Linear Solvers
The Models: 1. Brick6x8x50; 2. Humboldt1; 3. Humboldt2; 4. Square100x100
[Figure: bar chart of performance (seconds, 0 to 30) for the 4 tested models. Solvers compared: SymSparse::MultiND, SymSparse::MinD, SymSparse::GenND, Profile, SuperLU, UmfPack.]
(Ref: Jun Peng 2002)
General Remarks
References:
• Law and Fenves, “A Node Addition Model for Symbolic Factorization,” ACM TOMS, 12(1):37-50, 1986.
• Mackay, Law and Raefsky, “An Implementation of a Generalized Sparse/Profile Finite Element Solution Method,” Computers and Structures, 41(4):723-737, 1991.
• George and Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice Hall, 1981.
• Duff, Erisman and Reid, Direct Methods for Sparse Matrices, Oxford Science Publications, 1986.
Software Packages: Sparspak, YSMP, UMFPACK, SuperLU, etc.
Next Lecture: Parallel Implementation of a Sparse Direct Solver