support-graph preconditioning john r. gilbert mit and u. c. santa barbara coauthors: marshall bern,...
TRANSCRIPT
Support-Graph Support-Graph PreconditioningPreconditioning
John R. Gilbert
MIT and U. C. Santa Barbara
coauthors:
Marshall Bern, Bruce Hendrickson, Nhat Nguyen, Sivan Toledo
authors of other work surveyed:
Erik Boman, Doron Chen, Keith Gremban, Bruce Hendrickson, Gary Miller, Sivan Toledo, Pravin Vaidya, Marco Zagha
Systems of Linear Equations: Ax = bSystems of Linear Equations: Ax = b
• A is large & sparse• say n = 105 to 108, # nonzeros = O(n)
• Physical setting sometimes implies G(A) has separators• O(n1/2) in 2D, O(n2/3) in 3D
• Here: A is symmetric and positive (semi)definite
• Direct methods: A=LU
• Iterative methods: y(k+1) = Ay(k)
Graphs and Sparse MatricesGraphs and Sparse Matrices [Parter, … ]
10
13
2
4
5
6
7
8
9
10
13
2
4
5
6
7
8
9
G(A) G+(A)[chordal]
Symmetric Gaussian elimination:for j = 1 to n add edges between j’s higher-numbered neighbors
Fill: new nonzeros in factor
Elimination tree with nested dissection
Fill-reducing matrix permutationsFill-reducing matrix permutations
• Theory: approx optimal separators => approx optimal fill and flop count
• Orderings: nested dissection, minimum degree, hybrids
• Graph partitioning: spectral, geometric, multilevel
Vertex separator in graph of matrix
0 50 100
0
20
40
60
80
100
120
nz = 844
Matrix reordered by nested dissection
Conjugate gradient iterationConjugate gradient iteration
• One matrix-vector multiplication per iteration• Two vector dot products per iteration• Four n-vectors of working storage
x0 = 0, r0 = b, p0 = r0
for k = 1, 2, 3, . . .
αk = (rTk-1rk-1) / (pT
k-1Apk-1) step length
xk = xk-1 + αk pk-1 approx solution
rk = rk-1 – αk Apk-1 residual
βk = (rTk rk) / (rT
k-1rk-1) improvement
pk = rk + βk pk-1 search direction
Conjugate gradient: ConvergenceConjugate gradient: Convergence
• In exact arithmetic, CG converges in n steps (completely unrealistic!!)
• Accuracy after k steps of CG is related to:• consider polynomials of degree k that are equal to 1 at 0.• how small can such a polynomial be at all the eigenvalues of A?
• Thus, eigenvalues close together are good.
• Condition number: κ(A) = ||A||2 ||A-1||2 = λmax(A) / λmin(A)
• Residual is reduced by a constant factor by O(κ1/2(A)) iterations of CG.
PreconditionersPreconditioners
• Suppose you had a matrix B such that:
(1) condition number κ(B-1A) is small
(2) By = z is easy to solve
• Then you could solve (B-1A)x = B-1b instead of Ax = b(actually (B-1/2AB-1/2) B1/2 x = B-1/2 b, but never mind)
• B = A is great for (1), not for (2)• B = I is great for (2), not for (1)
Support Graph PreconditioningSupport Graph Preconditioning
+: New analytic tools, some new preconditioners
+: Can use existing direct-methods software
-: Current theory and techniques limited
• Define a preconditioner B for matrix A
• Explicitly compute the factorization B = LU
• Choose nonzero structure of B to make factoring cheap
(using combinatorial tools from direct methods)
• Prove bounds on condition number using both
algebraic and combinatorial tools
Spanning Tree Preconditioner Spanning Tree Preconditioner [Vaidya]
• A is symmetric positive definite with negative off-diagonal nzs
• B is a maximum-weight spanning tree for A (with diagonal modified to preserve row sums)
• factor B in O(n) space and O(n) time
• applying the preconditioner costs O(n) time per iteration
G(A) G(B)
Spanning Tree Preconditioner Spanning Tree Preconditioner [Vaidya]
• support each edge of A by a path in B
• dilation(A edge) = length of supporting path in B
• congestion(B edge) = # of supported A edges
• p = max congestion, q = max dilation
• condition number κ(B-1A) bounded by p·q (at most O(n2))
G(A) G(B)
Spanning Tree Preconditioner Spanning Tree Preconditioner [Vaidya]
• can improve congestion and dilation by adding a few
strategically chosen edges to B
• cost of factor+solve is O(n1.75), or O(n1.2) if A is planar
• in experiments by Chen & Toledo, often better than
drop-tolerance MIC for 2D problems, but not for 3D.
G(A) G(B)
Support Graphs Support Graphs [after Gremban/Miller/Zagha]
Intuition from resistive networks:
How much must you amplify B to provide the conductivity of A?
• The support of B for A is
σ(A, B) = min{τ : xT(tB – A)x 0 for all x, all t τ}
• In the SPD case, σ(A, B) = max{λ : Ax = λBx} = λmax(A, B)
• Theorem:
If A, B are SPD then κ(B-1A) = σ(A, B) · σ(B, A)
Splitting and Congestion/Dilation LemmasSplitting and Congestion/Dilation Lemmas
• Split A = A1+ A2 + ··· + Ak and B = B1+ B2 + ··· + Bk
• Ai and Bi are positive semidefinite
• Typically they correspond to pieces of the graphs of A and B
(edge, path, small subgraph)
• Lemma: σ(A, B) maxi {σ(Ai , Bi)}
• Lemma: σ(edge, path) (worst weight ratio) · (path length)
• In the MST case:
• Ai is an edge and Bi is a path, to give σ(A, B) p·q
• Bi is an edge and Ai is the same edge, to give σ(B, A) 1
Support-graph analysis of modified incomplete CholeskySupport-graph analysis of modified incomplete Cholesky
• B has positive (dotted) edges that cancel fill• B has same row sums as A
Strategy: Use the negative edges of B to support both the negative edges of A and the positive edges of B.
-1 -1 -1
-1 -1 -1
-1 -1 -1
-1 -1 -1
-1-1 -1-1
-1-1 -1-1
-1-1 -1-1
A
A = 2D model Poisson problem
.5 .5 .5
.5 .5 .5
.5 .5 .5
-1 -1 -1
-1 -1 -1
-1 -1 -1
-1 -1 -1
-1-1 -1-1
-1-1 -1-1
-1-1 -1-1
B
B = MIC preconditioner for A
Supporting positive edges of BSupporting positive edges of B
• Every dotted (positive) edge in B is supported by two paths in B
• Each solid edge of B supports one or two dotted edges
• Tune fractions to support each dotted edge exactly
• 1/(2n – 2) of each solid edge is left over to support an edge of A
Analysis of MIC: SummaryAnalysis of MIC: Summary
• Each edge of A is supported by the leftover 1/(2n – 2) fraction of the same edge of B.
• Therefore σ(A, B) 2n – 2
• Easy to show σ(B, A) 1
• For this 2D model problem, condition number is O(n1/2)
• Similar argument in 3D gives condition number O(n1/3) or O(n2/3) (depending on boundary conditions)
Support-graph analysis for Support-graph analysis for better preconditioners?better preconditioners?
• For model problems, the preconditioners analyzed so far are not as efficient as multigrid, domain decomposition, etc.
• Gremban/Miller: a hierarchical support-graph preconditioner, but condition number still not polylogarithmic.
• We analyze a multilevel-diagonal-scaling-like preconditioner in 1D . . .
• . . . but we haven’t proved tight bounds in higher dimensions.
Hierarchical preconditioner (1D model problem) Hierarchical preconditioner (1D model problem)
• Good support in both directions: σ(A, B) = σ(B, A) = O(1)
• But, B is a mesh => expensive to factor and to apply
A
B
-1 -1 -1 -1 -1 -1
-2-1-1
Hierarchical preconditioner continued Hierarchical preconditioner continued
• Drop fill in factor of B (i.e. add positive edges) =>
factoring and preconditioning are cheap
• But B cannot support both A and its own positive edges;
σ(A, B) is infinite
A
B
-1 -1 -1 -1 -1 -1
-2 -1-1.5 .5
Hierarchical preconditioner continued Hierarchical preconditioner continued
• Solution: add a coarse mesh to support positive edges
• Now σ(A, B) log2(n+1)
• Elimination/splitting analysis gives σ(B, A) = 1
• Therefore condition number = O(log n)
A
B
-1 -1 -1 -1 -1 -1
-2 -1-1.5 .5
-.5-.5-1
Generalization to higher dimensions?Generalization to higher dimensions?
• Idea: mimic the 1D construction• Generate a coarsening hierarchy with overlapping subdomains• Use σ(B, A) = 1 as a constraint to choose weights• Defines a preconditioner for regular or irregular problems
• Sublinear bounds on σ(A, B) for some 2D model problems.
• (Very preliminary) experiments show slow growth in κ(B-1A)
Model problem on triangular mesh
n 45 153 561 2145 4753
ICC(0): iterations 8 12 19 33 45
New: iterations 14 15 15 17 20
Algebraic framework Algebraic framework
• The support of B for A is
σ(A, B) = min{τ : xT(tB – A)x 0 for all x, all t τ}
• In the SPD case, σ(A, B) = max{λ : Ax = λBx} = λmax(A, B)
• If A, B are SPD then κ(B-1A) = σ(A, B) · σ(B, A)
• [Boman/Hendrickson] If V·W=U, then σ(U·UT, V·VT) ||W||22
Algebraic framework Algebraic framework [Boman/Hendrickson]
Lemma: If V·W=U, then σ(U·UT, V·VT) ||W||22
Proof:
• take t ||W||22 = λmax(W·WT) = max x0 { xTW·WTx / xTx }
• then xT (tI - W·WT) x 0 for all x
• letting x = VTy gives yT (tV·VT - U·UT) y 0 for all y
• recall σ(A, B) = min{τ : xT(tB – A)x 0 for all x, all t τ}
• thus σ(U·UT, V·VT) t
A B
-a2
-b2
-a2 -c2
-b2
[ ]a2 +b2 -a2 -b2
-a2 a2 +c2 -c2
-b2 -c2 b2 +c2 [ ]a2 +b2 -a2 -b2
-a2 a2 -b2 b2
[ ] a b
-a c -b -c
[ ] a b
-a c -b
U V
=VVT=UUT
[ ] 1 -c/a
1 c/b/b
W
= x
σ(A, B) ||W||22 ||W|| x ||W||1
= (max row sum) x (max col sum)
(max congestion) x (max dilation)
Open problems IOpen problems I
• Other subgraph constructions for better bounds on ||W||22 ?
• For example [Boman],
||W||22 ||W||F2 = sum(wij
2) = sum of (weighted) dilations,
and [Alon, Karp, Peleg, West] show there exists a spanning tree with average weighted dilation exp(O((log n loglog n)1/2)) = o(n );
this gives condition number O(n1+) and solution time O(n1.5+),
compared to Vaidya O(n1.75) with augmented spanning tree
• Is there a construction that minimizes ||W||22 directly?
Open problems IIOpen problems II
• Make spanning tree methods more effective in 3D?• Vaidya gives O(n1.75) in general, O(n1.2) in 2D• Issue: 2D uses bounded excluded minors, not just separators
• Spanning tree methods for more general matrices?
• All SPD matrices? ([Boman, Chen, Hendrickson, Toledo]: different matroid for all diagonally dominant SPD matrices)
• Finite element problems? ([Boman]: Element-by-element preconditioner for bilinear quadrilateral elements)
• Analyze a multilevel method in general?
Complexity of linear solversComplexity of linear solvers
2D 3DSparse Cholesky: O(n1.5 ) O(n2 )
CG, exact arithmetic: O(n2 ) O(n2 )
CG, no precond: O(n1.5 ) O(n1.33 )
CG, modified IC: O(n1.25 ) O(n1.17 )
CG, support trees: O(n1.20 ) O(n1.5+ )
Multigrid: O(n) O(n)
n1/2 n1/3
Time to solve model problem (Poisson’s equation) on regular mesh
ReferencesReferences
• M. Bern, J. Gilbert, B. Hendrickson, N. Nguyen, S. Toledo. Support-graph preconditioners. (Submitted for publication, 2001.) ftp://parcftp.xerox.com/gilbert/support-graph.ps
• K. Gremban, G. Miller, M. Zagha. Performance evaluation of a parallel preconditioner. IPPS 1995.
• D. Chen, S. Toledo. Implementation and evaluation of Vaidya’s preconditioners. Preconditioning 2001. (Submitted for publication, 2001.)
• E. Boman, B. Hendrickson. Support theory for preconditioning. (Submitted for publication, 2001.)
• E. Boman, D. Chen, B. Hendrickson, S. Toledo. Maximum-weight-basis preconditioners. (Submitted for publication, 2001.)
• Bruce Hendrickson’s support theory web page: http://www.cs.sandia.gov/~bahendr/support.html