
Research Collection

Doctoral Thesis

Parallel multigrid methods using sparse approximate inverses

Author(s): Bröker, Oliver

Publication Date: 2003

Permanent Link: https://doi.org/10.3929/ethz-a-004617648

Rights / License: In Copyright - Non-Commercial Use Permitted

This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.

ETH Library


DISS. ETH NO. 15129

Parallel Multigrid Methods using

Sparse Approximate Inverses

A dissertation submitted to the

SWISS FEDERAL INSTITUTE OF TECHNOLOGY ZURICH

for the degree of

Dr. sc. ETH Zurich

presented by

OLIVER BRÖKER

Dipl. Inf., Universität Bonn

born May 8, 1971 in Hannover, Germany

citizen of Germany

accepted on the recommendation of

Prof. Dr. Walter Gander, examiner
Prof. Dr. Marcus J. Grote, co-examiner

Dr. Klaus Stüben, co-examiner

2003


Acknowledgments

I very much enjoyed working on this thesis, not only because of my strong interest in scientific computing, but also because I had the chance to meet and work with numerous wonderful people.

Very fortunate circumstances made Prof. Marcus J. Grote my advisor. His careful planning, constant support and friendliness were really great. Thank you for all your help! I would also like to thank Prof. Walter Gander for giving me the opportunity to do my Ph.D. thesis in his group. It is an honor for me to have Dr. Klaus Stüben, one of the inventors of AMG, as a co-examiner.

Many thanks go to the members of my group. It was extremely enjoyable to work with you all. I would like to mention Erwin Achermann, Peter Arbenz, Oscar Chinellato, Roman Geus, Leonhard Jaschke, and Rolf Strebel. Roman not only proposed that I come to Zurich, he also provided his PySparse library with exceptional support. Oscar has been especially helpful in the last months of this work. Additionally I'd like to thank Anne Preisig for organizational help and the ISG, especially Marc Schmitt, for computer support. Thanks also to Alexander Houben, Thomas Tscherrig, and Ljiljana Vukelja for writing their Master's theses with me. Anja Gerdes kindly created both covers with a lot of patience.

Parts of this work were carried out at Lawrence Livermore National Labs in Livermore, California. I would like to thank Edmond Chow for his kind invitation and support during my stay there. Michael Lambert and Markus Kowarschik are well remembered for a great time in Livermore.

I would like to thank Tanja Clees, Arno Krechel, and Kees Oosterlee, members of SCAI at the Fraunhofer Institute, Birlinghoven, for their cooperation. Special thanks go to Ulrich Trottenberg (and his family), Oliver McBryan (and his family), and Wolfgang Joppich for all the support very early in my studies. The group of Prof. Ulrich Rüde at the University of Erlangen-Nürnberg and the group of Prof. Michael Griebel at the University of Bonn are thanked for their hospitality.

Coming to Zurich made my life rich and happy. Alexx Below, Edo Gerdes and Jean-Phillipe Escher are remembered for great times. Wolfram Schlickenrieder, Gabriele Neyer, Karin Meier, and Frank Horn – you are unique Villa-housemates! Thanks go out to the Nosecakes Bernhard Hoecker, Bastian Pastewka and Keirut Wenzel for keeping in touch. Stephan Heuel and Christian Schmelter have also been really great friends from a larger distance. Yet, my best friend of all times is Christina Marchand. I am in love with you and our daughter Vivien.

Finally I would like to thank my parents Ruth and Wolfgang and my sister Julia for all their encouragement, guidance, support and trust.

Oliver Bröker
Zurich, May 2003

Thank you, please come again!
— Apu in “The Simpsons”

You have proofread parts of this thesis — not only the author, but surely also the reader sincerely thanks you for this!


Abstract

We consider combining sparse approximate inverses with geometric and algebraic multigrid, see Ruge and Stüben (1987). The approximate inverses we use are based on the SPAI algorithm by Grote and Huckle (1997). For a given matrix $A$, a sparse approximate inverse $M$ is computed by minimizing $\|AM - I\|$ in the Frobenius norm. The minimization problem decouples into independent, local and small least squares problems. The approach is promising, because it is naturally parallel and flexible.

SPAI hierarchy Various choices for the sparsity pattern of $M$ yield a new hierarchy of SPAI inverses:

SPAI-0, where $M$ is diagonal,
SPAI-1r, where $M$ has the sparsity pattern defined by the strong connections from algebraic multigrid,
SPAI-1, where $M$ has the sparsity pattern of $A$, and
SPAI($\varepsilon$), where the sparsity pattern of $M$ is determined automatically by the SPAI algorithm.

Using SPAI inverses We propose three extensions to standard geometric and algebraic multigrid to solve $Ax = b$ and make use of approximate inverses in the

Smoother The approximate inverses can be used as smoothers by using $x \leftarrow x + M(b - Ax)$ as an error correction. The performance of SPAI-1 smoothing is comparable to that of the Gauss-Seidel iteration, while SPAI($\varepsilon$) may be used when the simpler SPAI-0 and SPAI-1 smoothers fail. We show that the theoretical properties of the SPAI-0 smoother extend the ones obtained for damped Jacobi smoothing.

Coarsening & Interpolation The entries of the matrix $M$ can be used as an alternative definition for the strong connections used in algebraic multigrid. For standard problems where the classical definition works well, the strong connections coincide with the new definition using SPAI inverses. We investigate two applications from the finite element method, where the Ruge/Stüben strong connection heuristic fails, but the SPAI inverses yield robust strong connections.

Galerkin Coarse Grid Projection Explicit computation of an approximate inverse enables us to include the matrix $M$ in the Galerkin coarse grid projection. We propose to include $M$ using $A_{l+1} = P^T M_l A_l P$. The preconditioning effect of the altered coarse grid projection can improve the properties of the coarse grid operator. This improves the applicability of algebraic multigrid as a stand-alone solver and as a preconditioner.

Sequential and parallel computational examples show the practical efficiency of these approaches.

Parallel Coarsening Unfortunately the coarsening process in the setup phase of the algebraic multigrid algorithm is a sequential procedure and thus represents a missing link in formulating a large-scale parallel algebraic multigrid algorithm. We present a parallel coarsening algorithm that produces coarse grids whose quality is independent of the processor number.

Object-oriented AMG in Python We have chosen to implement AMG using the Python programming language. The efficient implementation of the AMG setup is discussed in much detail. The resulting open source software package WolfAMG, which implements the classical Ruge/Stüben AMG in conjunction with the SPAI inverses, is described.

Summary Combining sparse approximate inverses with geometric and algebraic multigrid is considered at different levels of research: algorithmic design, theoretical investigation, practical implementation and numerical experiment. Essential advantages of combining multigrid with SPAI are:

- improved robustness,
- inherent parallelism,
- ordering independence, and
- possible local adaptivity.


Zusammenfassung

In dieser Arbeit werden dünn besetzte approximative Inverse mit algebraischen Mehrgitterverfahren (siehe Ruge und Stüben 1987) kombiniert. Die verwendeten approximativen Inversen basieren auf dem SPAI-Algorithmus von Grote und Huckle (1997). Für eine gegebene Matrix $A$ wird eine approximative Inverse $M$ berechnet, die $\|AM - I\|$ in der Frobeniusnorm minimiert. Dieses Minimierungsproblem zerfällt in unabhängige, lokale und kleine Probleme der kleinsten Quadrate. Der Ansatz ist erfolgversprechend, weil er natürlicherweise flexibel und parallel ist.

SPAI-Hierarchie Verschiedene Besetztheitsstrukturen für die approximative Inverse $M$ bilden eine neue Hierarchie von SPAI-Inversen:

SPAI-0, mit diagonalem $M$,
SPAI-1r, wo $M$ die Besetztheitsstruktur der starken Verbindungen im algebraischen Mehrgitterverfahren hat,
SPAI-1, wo $M$ die Nichtnullenstruktur von $A$ hat, und
SPAI($\varepsilon$), wo die Struktur von $M$ dynamisch vom SPAI-Algorithmus bestimmt wird.

Verwendung der SPAI-Inversen Wir schlagen drei Erweiterungen des geometrischen und des algebraischen Mehrgitterverfahrens vor.

Glätter Approximative Inverse können als Glätter verwendet werden, indem $x \leftarrow x + M(b - Ax)$ als Fehlerkorrektur verwendet wird. Die Glättung mit der SPAI-1-Inversen entspricht ungefähr der des Gauss-Seidel-Verfahrens. Die SPAI($\varepsilon$)-Inverse kann verwendet werden, wenn die Glättungseigenschaft der einfacheren SPAI-0- und SPAI-1-Verfahren nicht ausreicht. Wir zeigen ferner, dass die theoretischen Eigenschaften des SPAI-0-Glätters die des gedämpften Jacobi-Verfahrens erweitern.

Vergröberung & Interpolation Die Einträge von $M$ können als alternative Definition der starken Verbindungen im algebraischen Mehrgitterverfahren verwendet werden. Für Standardprobleme gleicht diese neue Definition der klassischen. Wir untersuchen zwei Anwendungen aus der Finite-Elemente-Diskretisierung, bei denen die klassische Ruge/Stüben-Definition fehlschlägt, die auf den SPAI-Inversen basierenden Verbindungen sich allerdings als robustes Kriterium herausstellen.

Galerkin-Grobgitterprojektion Die explizite Behandlung der Galerkin-Grobgitterprojektion erlaubt die Hinzunahme der Matrix $M$ in das zu berechnende Matrixprodukt. Wir schlagen vor, den Grobgitteroperator $A_{l+1} = P^T M_l A_l P$ zu verwenden. Die Vorkonditionierung des veränderten Grobgitteroperators kann dessen Eigenschaften verbessern. Diese Tatsache erweitert die Anwendbarkeit von algebraischem Mehrgitter als eigenständiges Lösungsverfahren.

Sequentielle und parallele Rechenbeispiele demonstrieren die praktische Effizienz dieser Ansätze.

Parallele Vergröberung Bedauerlicherweise ist die Vergröberung in der Aufbauphase des algebraischen Mehrgitterverfahrens von sequentieller Natur; sie ist das letzte fehlende Bindeglied für ein effizientes paralleles Verfahren auf vielen Prozessoren. Wir stellen einen parallelen Vergröberungsalgorithmus vor, der grobe Gitter mit einer Qualität berechnet, die unabhängig von der Zahl der Prozessoren ist.

Resultate Wir betrachten die Kombination von approximativen Inversen und Mehrgitterverfahren unter verschiedenen Gesichtspunkten: Entwurf von Algorithmen, Theorie, praktische Implementierung und numerische Experimente. Die wesentlichen Vorteile der Kombination von SPAI-Inversen mit dem Mehrgitterverfahren sind

- verbesserte Robustheit,
- inhärenter Parallelismus,
- Ordnungsunabhängigkeit und
- potentielle lokale Adaptivität.


Contents

1 Introduction 1
  1.1 Overview of the Ideas in this Thesis 1
  1.2 The Development of Multigrid 5
  1.3 Requirements of Linear Solvers 11
  1.4 Why combine Multigrid with SPAI? 12
  1.5 Organization of Thesis 14

2 Prerequisites: Problems, Solvers, Multigrid and SPAI 19
  2.1 Discretization of PDEs 20
  2.2 Iterative Methods 26
  2.3 Multigrid 31
  2.4 SPAI 45

3 Multigrid Smoothing with SPAI 47
  3.1 Smoothing Idea, SPAI Hierarchy and Theory 49
  3.2 Numerical Experiments for Standard Problems 61
  3.3 Parallel MG with SPAI-1 81
  3.4 Conclusions: SPAI Smoothes and is Parallel 83

4 SPAI as Strong Connections in Multigrid 85
  4.1 Motivation 85
  4.2 Numerical Experiments 90
  4.3 Conclusions 97

5 Preconditioned Galerkin Projection with SPAI 99
  5.1 Including M in the Galerkin projection 99
  5.2 Numerical examples 101

6 Parallel Coarsening by Relaxation 113
  6.1 Introduction 113
  6.2 Our approach 114
  6.3 Compatible Relaxation 115
  6.4 Coarsening by Relaxation 119
  6.5 Approximate Inverse Relaxation 124
  6.6 Numerical results 126
  6.7 Conclusions 129


7 Object-oriented Implementation of AMG in Python 131
  7.1 Efficient Ruge/Stüben coarsening 131
  7.2 Sparse matrices 134
  7.3 Galerkin product 140
  7.4 WolfAMG 145

8 Conclusions 155
  8.1 Is SPAI useful with Multigrid? 155
  8.2 Contributions of this Thesis 156
  8.3 Future Work 156

Bibliography 161


List of Figures

1.1 Overview of the main idea in this thesis 2
1.2 Carl Friedrich Gauss on the former German 10 Mark bill 6
1.3 Parameterized multigrid algorithm 14
1.4 Component dependencies in multigrid 15
1.5 Dependencies of the chapters in this thesis 16

2.1 Meshes 21
2.2 Discrete linear system 24
2.3 Finite element discretization example 25
2.4 Unidirectional error-smoothing 38
2.5 Ideal coarsening configuration for interpolation 40

3.1 Discrete (approximate) Green's functions 55
3.2 Fourier coefficients 57
3.3 Unstructured grid 64
3.4 Inverse comparison for anisotropic Laplacian 68
3.5 Solutions to the constant convection problem 70
3.6 Gauss-Seidel on unidirectional convection problem 72
3.7 SPAI smoothers on unidirectional convection problem 73
3.8 Solutions to the constant convection problem with Shishkin mesh 75
3.9 Smoothers for constant convection problem with Shishkin mesh 76
3.10 Smoothers for rotating convection problem 79
3.11 Rotated anisotropy problem 80
3.12 AMG for rotated anisotropic diffusion problem 82

4.1 Stretched grid example 91
4.2 AMG for unstructured anisotropic diffusion problem 93
4.3 AMG for streamline diffusion problem 95
4.4 AMG for uncommon Laplace discretization 98

5.1 AMG with PCGA for rotated anisotropy 103
5.2 Solutions to the Helmholtz equation 104
5.3 BiCGstab with AMG for Helmholtz 107
5.4 BiCGstab with AMG on the Sherman Collection 109
5.5 BiCGstab with AMG for Venkat problems 110

6.1 Compatible relaxation for Laplacian 116
6.2 Compatible relaxation for anisotropic Laplacian 117


6.3 Compatible relaxation for Laplacian 118
6.4 Random initial guesses 120
6.5 Coarsening by relaxation for Laplacian (1) 122
6.6 Coarsening by relaxation for Laplacian (2) 123
6.7 Solution u of … and … 125
6.8 Approximate inverse relaxation for Laplacian (jac) 125
6.9 Approximate inverse relaxation for anisotropic Laplacian (jac) 126
6.10 Approximate inverse relaxation for Laplacian (gs) 127

7.1 Efficient implementation of coarsening 133
7.2 Sparse matrix 135
7.3 Sparse matrix-matrix multiplication 139
7.4 Galerkin product 141
7.5 In-place Galerkin product 144
7.6 Harvey Keitel as Mr. Wolf in Pulp Fiction 146
7.7 Structure of WolfAMG 147
7.8 Class hierarchy of the module smoother 151
7.9 WolfAMG vs. ML vs. AMG1R5 153


List of Tables

1.1 Development of the complexity of iterative solvers 8

2.1 Relationships of unknowns in AMG 39
2.2 Standard settings for experiments 44
2.3 MG/AMG for Laplace 45

3.1 Basic iterative methods 49
3.2 MG for Laplace 62
3.3 AMG for Laplace (…) 63
3.4 AMG for Laplace (…) 63
3.5 AMG for unstructured Laplace (…) 65
3.6 AMG for unstructured Laplace (…) 65
3.7 MG for locally anisotropic Laplace 66
3.8 AMG for locally anisotropic Laplace (…) 69
3.9 AMG for locally anisotropic Laplace (…) 69
3.10 AMG for unidirectional convection problem 74
3.11 AMG for constant convection problem on Shishkin mesh 77
3.12 AMG for rotating convection problem 77
3.13 AMG for rotated anisotropic diffusion problem 81
3.14 Scalability of parallel MG using SPAI-1 83

6.1 CBR for unstructured Laplacian 129

7.1 WolfAMG vs. ML vs. AMG1R6 153


List of Algorithms

2.1 Restarted GMRES(m) 29
2.2 Preconditioned BiCGstab 30
2.3 Residual correction algorithm 32
2.4 Two-Grid algorithm 33
2.5 Multigrid algorithm 34
2.6 AMG setup 37
2.7 Ruge/Stüben coarsening 41
2.8 Removing Dirichlet nodes 43
2.9 Scaling to unit diagonal 44
2.10 Outline of the SPAI algorithm 46
3.1 SPAI algorithm 51
6.1 Framework for a coarsening algorithm 117
6.2 Coarsening by relaxation 123
7.1 Coarse Grid Selection 133
7.2 Sparse matrix-vector multiplication 137
7.3 Dense matrix-matrix multiplication 138
7.4 Sparse matrix-matrix multiplication 139
7.5 Forward Galerkin product 140
7.6 Backward Galerkin product 140
7.7 In-place Galerkin product 143


Notation

A, B, …              matrices
a_ij                 entry of matrix A in the i-th row and the j-th column
A_i,: , A_:,j        i-th row, resp. j-th column of A
A_(i1,i2,i3),:       submatrix, consisting of rows i1, i2 and i3 of A
A_:,(j1,j2,j3)       submatrix, consisting of columns j1, j2 and j3 of A
N(A_i,:)             indices of the nonzero entries of the i-th row of A
N(A_:,j)             indices of the nonzero entries of the j-th column of A
∂_x                  partial derivative ∂/∂x
⊗                    Kronecker product
[➠file.ext]          reference to file “file.ext”
fl(x)                x converted to rounded floating point representation


Symbols

a          diffusion coefficient, 20
b          convection coefficient (vector), 20
f          continuous right hand side, 20
f_Γ        continuous Dirichlet boundary condition, 20
n          number of gridpoints, 20
h          mesh parameter, 20
Ω, Ω_h     continuous domain, discrete domain, 20
S          stencil, 22
u, u_h     solution vector, grid function, 22
S_l, S_r, S_c   left, right and centered stencil for mixed derivative, 23
A, A_h     discrete operator of a linear system, 23
f_h        right hand side of a linear system, 23
φ          finite element basis function, 24
k          iteration count, 26
x^(k)      approximate solution in k-th iteration, 26
r, r^(k)   residual, residual in k-th iteration, 26
ε_it       accuracy tolerance of iterative methods, 26
ρ          convergence rate, 26
e, e^(k)   error, error in k-th iteration, 26
ω          relaxation parameter, 27
L, U, D    lower and upper triangular, diagonal part, 27
ν_1, ν_2   pre- and post-smoothing steps, 33
P_std      standard interpolation, 35
θ          strong connection threshold, 39
S(A)       strong connections in A, 39
P, C, F    set of points, coarse points and fine points, 39
M          approximate inverse, 45
n_sp       maximal number of nonzeros per row/column, 134
n_nz       number of nonzeros in sparse matrix, 134
row[], col[], val[]   row-, column- and value-array for sparse matrices, 136
link[], head[]        linked list arrays for sparse matrices, 136


Abbreviations

SOR       successive overrelaxation, 7
CG        conjugate gradient method, 7
GMRES     generalized minimum residual method, 7, 27
FDM       finite difference method, 19
FEM       finite element method, 19
AMG       algebraic multigrid, 19
BiCGstab  biconjugate gradient stabilized method, 27
BiCG      biconjugate gradient method, 28
GCA       Galerkin coarse grid approximation, 35
LFA       local Fourier analysis, 54
SPD       symmetric positive definite, 60
PGCA      preconditioned Galerkin coarse grid approximation, 100
CBR       coarsening by relaxation, 119
LNZ       list of the nonzeros, 134
CSR       compressed sparse row, 134
LL        linked list, 136
NumPy     Numeric Python, 148


Chapter 1

Introduction

1.1 Overview of the Ideas in this Thesis 1
1.2 The Development of Multigrid 5
1.3 Requirements of Linear Solvers 11
1.4 Why combine Multigrid with SPAI? 12
1.5 Organization of Thesis 14

This thesis is about combining two ideas: multigrid and the SPAI (sparse approximate inverse) algorithm. In particular, algebraic multigrid (AMG) is a multilevel solver for sparse linear systems introduced by Ruge and Stüben (1987), and the SPAI algorithm (Grote and Huckle 1997) is a method to compute a sparse approximate inverse based on minimizing the Frobenius norm. More precisely, we incorporate the SPAI algorithm in the algebraic multilevel scheme. Where the original SPAI algorithm is not suitable, we introduce new variants of computing similar approximate inverses. Essential advantages of the approach are

- improved robustness,
- inherent parallelism,
- ordering independence and
- possible adaptivity.

Let’s start with a broad overview of what this thesis is all about.

1.1 Overview of the Ideas in this Thesis

Partial differential equations are often used to construct models of the most basic theories from physics, chemistry and engineering. Explicit solutions to partial differential equations are unknown or too complex, except for model problems. The predominant technique today to solve partial differential equations is numerical approximation. This technique approximates the given problem with a discrete version of the given operators. In general this approach generates very large sparse linear systems. One important algorithm to solve such linear systems is the multigrid method.

Figure 1.1 Overview of the main idea in this thesis Pictorial representation of the multigrid V-cycle. We depict that on every level $l$ we additionally compute an approximate inverse $M_l$ and make use of the inverse to increase the parallelism and robustness of the classical AMG method.

(Figure 1.1 shows two panels: a Construction Phase with levels 0, 1 and 2, on which the operators $A_l$ and approximate inverses $M_l$ are built, and a Solution Phase with the V-cycles of iterations 1 and 2.)

The Multigrid Idea Our pictorial example in Figure 1.1 displays a three grid method. The level 0, also called the finest level, is given by the original linear system $A_0 x = b$ that we are interested in solving. For any multigrid method one must construct a series of levels, in our case two additional operators $A_1$ and $A_2$, which resemble coarse approximations of the original problem on the finest level. The coarsening process is generally stopped when the coarsest operator is small enough for efficiently solving the system by a direct method like Gaussian elimination. The number of levels thus strongly depends on the size of the original system $A_0$ and the coarsening scheme used. In geometric multigrid the coarse operators are obtained the same way $A_0$ was obtained, by explicit discretization of the original problem. Algebraic multigrid tries to mimic that process automatically and constructs the coarse operators solely with information taken from $A_0$. The advantage is obvious: instead of having to supply the multigrid solver with a sequence of matrices, algebraic multigrid requires only $A_0$ as the input. The algebraic approach naturally divides the process into two phases: a construction phase and a solution phase.

The coarsening process does not only produce system operators $A_l$ on each level, it also defines so-called transfer operators. The transfer operators are used for transferring a function from one grid level to a neighboring one. The transfer operator from level $l$ to level $l+1$ is called restriction. This is also commonly referred to as a fine to coarse grid transfer. The operator for the reverse way from the coarse to the fine grid is called the interpolation.


To define a multigrid method one needs one more thing: the smoother. Each level must be associated with a method that removes the high frequencies in the error for a given approximation of the solution. Consequently it is called the smoother; it smoothes the error. This fact also motivates the multigrid approach: smooth quantities can be represented on coarser grids. So after a few smoothing steps the process can proceed on a coarser mesh and in essence save computational time by using fewer points.
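To make the smoothing effect concrete, here is a small illustration (our own sketch, not taken from the thesis): a few damped Jacobi sweeps on the one-dimensional Poisson matrix barely reduce the error norm, but they do flatten its high-frequency components.

    # Sketch: damped Jacobi smoothing on the 1D Poisson matrix. The error
    # stays large in norm, but becomes smooth, i.e. representable on a
    # coarser grid.
    import numpy as np

    n = 63
    h = 1.0 / (n + 1)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

    b = np.zeros(n)            # exact solution x = 0, so the error is -x
    x = np.random.rand(n)      # random start: all frequencies present
    omega = 2.0 / 3.0          # classical damping factor for this problem
    d_inv = 1.0 / np.diag(A)
    for _ in range(3):
        x += omega * d_inv * (b - A @ x)   # x <- x + omega * D^-1 * r
    # Now neighboring entries of x differ little: the error is smooth.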

We actually also need a solver for the coarsest problem, in our case $A_2$. The choice here is immaterial though, since we assume the coarsest problem to be so small that e.g. Gaussian elimination with partial pivoting should suffice. Note also that the multigrid process is only efficient when the coarse matrices are sparse enough. We gauge the relative density of those matrices compared to $A_0$ with a measure $c_A$, to be defined exactly in Section 2.3.3.

In Section 2.3 we will go into more detail about the exact construction of the previously mentioned operators and methods. Note though, that the construction of multilevel schemes is highly modular. One can choose between a large variety of grid hierarchies, transfer operators and smoothers. In this thesis we refer to these as components. We mainly distinguish between three components:

Coarsening Generally speaking, describes the process of selecting points from grid $l$ that will be represented on the coarser grid $l+1$.

Interpolation Includes the construction of the interpolation and restriction operators $P_l$ and $R_l$, respectively.

Smoother Defines the method for error smoothing on each level.

One cannot strictly divide the multigrid process into these three categories, because all components in a multigrid method are highly related, but it is useful for presenting general ideas.

What we are really doing is solving the fine level system $A_0 x = b$. The solution process is a recursive process that visits all levels. Our example depicts the often used V-cycle. Starting from the finest level 0 the algorithm alternates (pre-)smoothing (depicted by a small circle) and restriction (depicted by an arrow pointing down). When the coarsest level is reached, a direct solve (depicted by a box) is issued on the coarsest grid. Then the algorithm proceeds back to the finest level by again alternating interpolation and (post-)smoothing. That point concludes a V-cycle. A multigrid iteration repeats V-cycles until a suitable stopping criterion is fulfilled.
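The recursive structure just described fits in a few lines of Python. The following is an illustrative sketch only, not the implementation from Chapter 7: it assumes given per-level operators A[l] and interpolation operators P[l] (restriction taken as the transpose), and uses damped Jacobi as a placeholder smoother.

    # Sketch of one multigrid V-cycle (illustration, not the WolfAMG code).
    import numpy as np

    def smooth(A, b, x, omega=2.0/3.0):
        # damped Jacobi, standing in for an arbitrary smoother
        return x + omega * (b - A @ x) / np.diag(A)

    def v_cycle(A, P, b, x, l=0, nu1=1, nu2=1):
        if l == len(A) - 1:
            return np.linalg.solve(A[l], b)   # direct solve on coarsest level
        for _ in range(nu1):
            x = smooth(A[l], b, x)            # pre-smoothing
        r = b - A[l] @ x                      # residual
        e = v_cycle(A, P, P[l].T @ r,         # restrict, recurse on coarse level
                    np.zeros(A[l + 1].shape[0]), l + 1, nu1, nu2)
        x = x + P[l] @ e                      # interpolate the correction
        for _ in range(nu2):
            x = smooth(A[l], b, x)            # post-smoothing
        return x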

Multigrid and SPAI There are two drawbacks to the (algebraic) multigrid approach:


- Unfortunately (A)MG may fail to converge, i.e. either the error is not reduced after a V-cycle, or it is reduced by such a small factor that the desired accuracy is only reached after a prohibitively large number of steps.

- Solving the resulting systems of equations on parallel machines is of particular interest. One wants to obtain parallel speedup and especially make use of the large memory available on parallel machines with distributed memory. Multigrid in principle is as parallel as its components. Most components are naturally parallel, but unfortunately in the algebraic approach, neither the coarsening nor the smoother is easily parallelizable.

The goal of this thesis is to improve upon those two drawbacks of the algebraic multigrid method using approximate inverses based on the SPAI algorithm.

On each level $l$ we compute a sparse approximate inverse matrix $M_l \approx A_l^{-1}$ based on the SPAI algorithm. We make use of these for all components by

- investigation of the entries of $M_l$,
- matrix-vector multiplication with $M_l$,
- matrix-matrix multiplication with $M_l$.

Obviously this requires additional storage, which we measure with $c_M$, similar to measuring the additional storage for the coarse level systems $A_l$ with $c_A$.

The following table lists the main changes we introduce in the classical algebraic multigrid algorithm:

Component    classical AMG                    new
Smoother     Gauss-Seidel                     iterating using $M_l$
Coarsening   strong connections from $A_l$    strong connections from $M_l$
Coarsening   Galerkin product                 Galerkin product preconditioned by $M_l$

We show for a number of typical applications of algebraic multigrid that we can employ the SPAI inverses to either improve the parallelism while retaining the smoothing performance of the Gauss-Seidel iteration, or even improve the convergence rate at reasonable additional storage cost. Varying the sparsity pattern of the inverses $M_l$, either explicitly or automatically, gives us more flexibility for improving AMG convergence.
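To illustrate the first row of the table above: smoothing with an approximate inverse is a plain residual correction with $M_l$. The sketch below uses the diagonal SPAI-0 inverse, for which the column-wise Frobenius minimization has the closed form $m_{jj} = a_{jj}/\|A_{:,j}\|_2^2$; this is our illustrative reading of SPAI-0, written densely for brevity.

    # Sketch: SPAI-0 inverse and its use as a smoother (dense, illustrative).
    # Minimizing ||A m_j - e_j||_2 over multiples of e_j gives
    # m_jj = a_jj / ||A[:, j]||_2^2.
    import numpy as np

    def spai0(A):
        return np.diag(np.diag(A) / (A * A).sum(axis=0))

    def spai_smooth(A, M, b, x, steps=2):
        for _ in range(steps):
            x = x + M @ (b - A @ x)   # error correction with the inverse
        return x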

Hopefully this introduction has sparked your interest in the combination of AMG and SPAI. The following sections in this chapter discuss the historical development and significance of the multigrid algorithm. We explain why exactly combining multigrid with SPAI may be advantageous. We motivate our approach with four fundamental guiding design principles for adaptive multilevel schemes with approximate inverses. We conclude the chapter with an overview of the remaining chapters in this thesis.

1.2 The Development of Multigrid

This section is a short summary of the development of numerical linear solvers. Its main purpose is to show how one branch of the research in numerical linear algebra emerged in the design of multigrid approaches. We outline the significance of the approach, i.e. that the multigrid algorithm is an optimal solver for a large number of problems.

In a letter dated December 26, 1823, Gauss¹ wrote to his friend Gerling²:

[…] Ich empfehle Ihnen diesen Modus zur Nachahmung. Schwerlich werden Sie je wieder direkt eliminieren, wenigstens nicht, wenn Sie mehr als zwei Unbekannte haben. Das indirekte Verfahren lässt sich im Schlafe ausführen, oder man kann, während desselben, an andere Dinge denken. […]

[…] I suggest this approach for imitation. You will scarcely ever eliminate directly again, at least if you have more than two unknowns. The indirect method can be done asleep, or in the meantime one can think of other things. […]

Obviously Gauss had already realized that Gaussian elimination is a computationally expensive method. The iteration he was referring to in the letter is essentially what is now generally known as the Gauss-Seidel iteration. Seidel (1874) states in his work:

[…] So einfach nun, ihrer mathematischen Natur nach, die Aufgabe ist, […], so mühsam wird ihre numerische Durchführung, wenn die Zahl der Unbekannten beträchtlich gross wird. […] Ich weiss nicht, ob ein Complex mit mehr als einigen siebzig Unbekannten je einheitlich berechnet worden ist. […]

[…] As simple as the task is, considering its mathematical nature, the more troublesome is its numerical execution when the number of unknowns becomes considerably large. […] I am not sure a complex with more than seventy unknowns has ever been thoroughly computed. […]

¹Carl Friedrich Gauss (*1777, †1855)

²Christian Ludwig Gerling (*1788, †1864)


Figure 1.2 Carl Friedrich Gauss on the former German 10 Mark bill The front side displays Carl Friedrich Gauss, a building of the historic Göttingen, where Gauss taught many years as a professor, and the formula and a graph for the Gauss normal distribution. The back side shows a sextant and a triangulation that was part of Gauss's measurements. The first edition of this note was printed on April 16, 1991. Shortly before the final printing a mathematician at the German “Bundesdruckerei” (Federal Printing Works) discovered an error in the formula. Instead of the commonly used Greek letter “sigma” ($\sigma$) a “delta” ($\delta$) was erroneously used. On January 1, 2002 seven Euro banknotes were introduced in 12 Member States of the European Union. The images displayed on the banknotes are modeled on typical architectural styles, rather than something more specific.


Starting from some initial guess $x^{(0)}$ the Gauss-Seidel iteration is defined by

$$x^{(k+1)} = (D + L)^{-1}\left(b - U x^{(k)}\right),$$

where $A = D + L + U$: $D$ is the diagonal, $L$ is the strictly lower triangular, and $U$ is the strictly upper triangular part of $A$. While the iteration was soon shown to be non-effective for large and general linear systems, it is to this day one of the foundations of algebraic multigrid.
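Written component-wise, one Gauss-Seidel sweep looks as follows (a dense sketch of ours; it is equivalent to the matrix form above because each updated entry is used as soon as it is available):

    # One Gauss-Seidel sweep for A x = b, i.e.
    # x_{k+1} = (D + L)^{-1} (b - U x_k) for the splitting A = D + L + U.
    import numpy as np

    def gauss_seidel_sweep(A, b, x):
        for i in range(len(b)):
            # entries j < i already hold new values, entries j > i old ones
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - sigma) / A[i, i]
        return x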

Stationary iterative methods In the early application of the Gauss-Seidel and similar methods like successive overrelaxation (SOR), the formulation of stationary iterative methods based on regular splittings was of importance. Young (1950) and Varga (1962) were early pioneers in investigating the convergence properties of these methods.

A more general framework for the iterative solution of linear equations was already presented by Richardson (1910). The general idea of the Richardson iteration is given by the following equation:

$$x^{(k+1)} = x^{(k)} + \omega\left(b - A x^{(k)}\right).$$

The Richardson iteration in practice was often found to be numerically unstable. Extensions of the idea were given by polynomial acceleration.

New splitting ideas by Peaceman and Rachford, Jr. (1955) led to the development of the alternating direction iteration (ADI). The basic idea in ADI is to split the system matrix into matrices corresponding to coordinate directions. For model problems this yields a set of tridiagonal positive definite matrices that can be treated specially in an alternating fashion. The regular splitting for ADI requires the use of Cartesian meshes.

A breakthrough in iterative solver construction was the independent discovery of the conjugate gradient method (CG) by Hestenes and Stiefel (1952). Each step in CG minimizes the functional

$$F(x) = \tfrac{1}{2}\, x^T A x - b^T x$$

over all $x \in x^{(0)} + K_k$, with the current Krylov subspace

$$K_k = \operatorname{span}\left\{ r^{(0)}, A r^{(0)}, \ldots, A^{k-1} r^{(0)} \right\}.$$

CG is proven to converge for symmetric positive definite problems. A similar algorithm proposed by Lanczos (1952) was described for general matrices. Since then many Krylov subspace methods have been proposed in the literature. A very popular one is the BiCGstab iteration proposed by van der Vorst (1992).

The generalized minimum residual (GMRES) iteration was proposed by Saad and Schultz (1986). It proved to have good numerical behavior, but at a linearly increasing computational and memory cost per iteration: additional vectors have to be stored. One obvious choice is to restart the iteration after a certain number of steps, generally referred to as GMRES($m$).

Preconditioning The convergence of all the above methods is highly dependent on the condition and the eigenvalue distribution of the system matrix. Improving the convergence properties by a transformation into a system with more advantageous properties is now traditionally called preconditioning. Preconditioning by polynomial approximations of the inverse was developed in the 1960s and 1970s. Also the already mentioned SOR method and other iterative methods have to this day been popular as preconditioners.

Another approach is the computation of incomplete factorizations or explicit approximate inverses. Among them are many versions of the popular incomplete $LU$ factorization (ILU) first proposed by Meijerink and van der Vorst (1977). Kolotilina and Yeremin (1993a) and Benzi, Meyer, and Tuma (1996) proposed other factorized approximate inverses.

An alternative to computing factorizations is computing an explicit matrix. A matrix $M$ is computed, for which

$$\| A M - I \| \quad \text{is small}$$

in some chosen norm. The SPAI algorithm by Grote and Huckle (1997) uses the Frobenius norm.

In both cases, the incomplete factorizations and the approximate inverses, the sparsity pattern is a crucial factor for the efficiency of the approximate inverse. The storage space needs to be limited, while the approximate inverse needs to remain effective at the same time.
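For a fixed sparsity pattern the Frobenius minimization can be written down directly. The following sketch of ours (dense and unoptimized, using the pattern of $A$ itself, i.e. the SPAI-1 choice from the abstract) solves one small least-squares problem per column; note that the columns are completely independent of each other:

    # Sketch of fixed-pattern Frobenius minimization: each column m_j of M
    # minimizes ||A[:, J] m - e_j||_2 over its allowed positions J. The real
    # algorithm additionally restricts the rows to those touched by J.
    import numpy as np

    def spai_fixed_pattern(A):
        n = A.shape[0]
        M = np.zeros((n, n))
        for j in range(n):                 # columns are fully independent
            J = np.nonzero(A[:, j])[0]     # allowed nonzeros: pattern of A
            e = np.zeros(n)
            e[j] = 1.0
            m, *_ = np.linalg.lstsq(A[:, J], e, rcond=None)
            M[J, j] = m
        return M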

Table 1.1 Development of the complexity of iterative solvers Given the two-dimensional Poisson problem with $N$ unknowns we list computational complexities for iterative solvers (Joppich 1996). The full multigrid scheme was the first iterative solver to reach the optimal linear complexity. Note that the resulting linear system is sparse.

SOR                     $O(N^{3/2} \log N)$
ADI                     $O(N \log^2 N)$
Bunemann method         $O(N \log N)$
Multigrid V-Cycle       $O(N \log N)$
Full multigrid scheme   $O(N)$

Multigrid methods One naturally seeks optimal solvers. Computational work proportional to the number of unknowns is also referred to as linear complexity or $O(N)$. Linear complexity is certainly optimal. Table 1.1 shows that the goal to find an optimal solver for large sparse linear equations with linear complexity was first reached by the full multigrid scheme.

Linear complexity means that the work per unknown does not increase with growing system size. This is of interest, because discretization of partial differential equations (PDEs) results in linear systems with inherent discretization error. Consequently one is interested in solving systems of maximal size. Methods with non-linear complexity are prone to fail with growing system size. On the other hand, methods with proven linear complexity will certainly be applicable anytime in the future. In analogy to the grid-spacing parameter $h$ in discretization, linear complexity of a method is also commonly referred to as $h$-independence in the multigrid literature.

The first multigrid publications were by Fedorenko (1961) and Bakhvalov (1966). The actual efficiency and generality were first fully realized by Brandt (1973). The underlying idea of multigrid is based on the insight that e.g. the Gauss-Seidel iteration may not be very efficient in reducing the error, but is in many cases very efficient in smoothing the error. We know from information theory that smooth functions can be represented with fewer sampling points. This idea of transferring smooth quantities can be realized using the residual equation. Solving a similar approximate problem on a coarser mesh obviously saves computational time. Additionally the discrete frequency space is shifted: on the coarser grid low frequencies become higher frequencies and can be efficiently reduced by an appropriate smoothing method. Recursive application of this idea yields the multigrid iteration. The success of the multigrid algorithm is due to the perfect interplay of smoothing and coarse grid correction and the overall linear computational complexity.

Since the 1980s the number of publications related to multigrid has increased substantially. Over 3000 references can be found in the MGNET bibliography by Douglas and Douglas (2002).

Multigrid has proven to be an optimal, i.e. $h$-independent, solver for discrete elliptic boundary value problems with variable coefficients for general domains in 1D, 2D and 3D. The multigrid iteration has been successfully implemented in parallel for a variety of problems, see Hackbusch (1993), Trottenberg, Oosterlee, and Schüller (2001) and Wesseling (1982). Today multigrid is an approach, rather than a fixed algorithm.

References An excellent summary of the development of iterative linear solvers was written by Saad and van der Vorst (2000). A whole book dedicated to the development of scientific computing was edited by Nash (1990). It includes a paper by Gutknecht (1990), dedicated to the development of scientific computing in Switzerland, mainly at the Swiss Federal Institute of Technology Zurich (ETH). A similar reference is the paper by Schwarz (1981). Ceruzzi (1998) wrote a more recent and popular summary of the history of modern computing.

Excellent summaries and descriptions of the methods mentioned above exist by various authors. We would like to mention Stoer and Bulirsch (1980)³, Ortega (1988), Hackbusch (1993), Barrett et al. (1994), Saad (1996) and Greenbaum (1997).

Drawbacks of the Multigrid Approach Despite the optimal complexity of geometric multigrid, its popularity compared to Krylov subspace methods is limited. For example, the very popular mathematical software system MATLAB and many other well-known numerical packages do not provide support for multigrid in their standard libraries, while many implement a variety of Krylov subspace iterations. Why is this the case? We identify several reasons:

More difficult to implement Compared to Krylov subspace methods, multigrid is in general inherently more difficult to implement. This is due to the level hierarchy and the associated grid transfers. This also makes it more difficult to design frameworks for multigrid iterations.

Grid hierarchy needed Multigrid requires a grid hierarchy and corresponding operators. In situations with complicated geometries it may be difficult to generate this hierarchy.

Multigrid is an approach While the multigrid idea works with a variety of problems, it is not one single algorithm that solves all these problems. There is a large number of coarsening strategies, smoothers, grid-transfer operators and cycling strategies. Choosing an appropriate set might not always be simple.

Multigrid knowledge required After all, one needs at least some basic knowledge about multigrid to implement a non-standard multigrid solver. The learning process might take too long for many potential users of multigrid.

Applicability The multigrid approach is not as widely applicable as, for example, Gaussian elimination.

To be more precise about the differences between multigrid and classical iterative methods, we present a list of requirements for a linear solver.

³Originally published in German in the two volumes Stoer (1979) and Stoer and Bulirsch (1990).


1.3 Requirements of Linear Solvers

A sparse linear system of equations $Ax = b$ can be a model for amazingly many problems, even too vast to start enumerating. Consequently much research effort has been put into designing solution methods for such problems: linear solvers. The number of methods, acronyms, preconditioners, implementations and papers is already huge and still growing. In contrast to this wide range of linear solvers, most users of numerical software would prefer black-box software and algorithms, and would ideally write programs like:

can be a model for amazingly many problems,even too vast to start enumerating. Consequently much research effort has been put intodesigning solution methods for such problems: linear solvers. The number of methods,acronyms, preconditioners, implementations and papers is already huge and still growing.In contrast to this wide range of linear solvers most users of numerical software wouldprefer black-box software and algorithms and ideally write programs like:

x := solve(A,b).
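In Python with SciPy, for instance, this ideal really is a one-liner for moderate problem sizes (a usage sketch of ours; spsolve is a sparse direct solver and hence exactly the kind of method that stops scaling for very large systems):

    # The black-box ideal in practice: a sparse direct solve with SciPy.
    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import spsolve

    n = 1000
    A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
    b = np.ones(n)
    x = spsolve(A, b)   # fine here, infeasible for really large systems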

There is no such thing as a black-box solver for large sparse linear systems, but one hopes to find suitable solvers for considerably large classes of problems. One such class is the set of problems arising from the discretization of partial differential equations. Even for that class, numerical software today is far from issuing one simple command. Computer storage capacity and speed have reached a level that allows the solution of many very large sparse linear systems on a single workstation. Yet existing methods are limited: direct solvers often quickly exceed the memory requirements, and the deteriorating convergence rate of many iterative methods makes many problems intractable. One has to resort to specialized methods and optimized software. Technical programming issues can easily overwhelm the user and even prohibit the solution. Thus the wish-list of requirements for linear solvers has become more detailed. A linear solver should ideally:

(R1) General … not be limited to a special problem, geometry, discretization technique or resulting matrix class.

(R2) Efficient … use very little (and predictable) amounts of memory and computing time.

(R3) Parallel … achieve good parallel speedup on distributed memory machines with thousands of processors.

(R4) Robust … neither suddenly break down, nor behave in any other unstable fashion.

(R5) Simple … be simple to understand and use, i.e. it should be short and have very few or no parameters.

(R6) Implemented … exist as an efficient implementation, be portable and open source.

While the emphasis of different users may vary, ideally all requirements should be fulfilled at the same time. Especially parallelism (R3) is often under consideration. Really large problems can be solved today only on large scale parallel machines, mainly due to the large amounts of required memory, which is naturally limited on sequential machines by the physical memory.

We investigate the three standard approaches Krylov subspace methods, geometric multi-grid and algebraic multigrid with respect to our list of requirements:

Krylov subspace methods Many standard iterative solvers are widely accepted, becausethey seem to strike a good balance between the above goals. Generally speakingtheir implementation is fairly simple. Preconditioners have improved the applica-bility and the robustness of iterative solvers. Their parallelism cannot be estimatedin general, but many of them are trivially and efficiently parallelized.

MG Geometric multigrid fulfills most of the above requirements as well - especiallyrequirement (R2), since it is proven to be an optimal method for many problems.Unfortunately, the approach is non-trivial to use in practice. Knowledge about themultigrid principle, mathematical and technical generation of the grid hierarchyand the transfer operators is required. Additionally good parallel speedup can bedifficult to obtain - it requires good load balancing of the coarsening process andparallelism in all of its components: smoothing, coarsening and interpolation.

AMG In order to improve requirement (R5) simplicity and (R1) generality, algebraicmultigrid was designed (Ruge and Stuben 1987). As a trade off some of the gen-erality and efficiency of the explicit multigrid approach was lost. Additionally itsparallelism is definitely limited due to the sequential coarsening algorithm and thesequential Gauss-Seidel smoother.

In the following section we discuss why combining SPAI with multigrid is advantageousfor improving on the given requirements for linear solvers.

1.4 Why combine Multigrid with SPAI?

We describe four main guiding design principles for combining multigrid with SPAI ap-proximate inverses:

(1) More Parallelism The computation of the SPAI inverse is based on minimizingthe Frobenius norm. The resulting minimization procedure to compute the approximateinverse � decomposes the problem in � independent small least-square problems, exactlyone for each row of � . We try to replacing the sequential parts in AMG with proceduresbased on the SPAI algorithm. If the new procedure requires no additional sequential

Page 30: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Why combine Multigrid with SPAI? 13

computation and the resulting convergence rate is comparable to the original algorithm,then we have achieved additional parallelism in AMG.

The parallelism of SPAI based algorithms, e.g. procedures that require multiplication withthe approximate inverse � or simply referencing entries in � , has another essential ad-vantage: the result is independent of the processor number. Most attempts in parallelizingAMG rely on the fact that the number of grid points per processor is fairly large and par-allelism for a given problem size is limited by this ratio. In other words, the resultingparallel method is not independent of the number of processors. In contrast, SPAI basedparallel algorithms as mentioned above do not suffer from this limitation.

(2) Ordering Independence In classical AMG coarsening and smoothing operationsare ordering dependent. For some applications the ordering of the unknowns becomescrucial. The computation of a SPAI inverse is ordering independent, i.e. the approxi-mate inverse is invariant under permutation of rows. This property indicates additionalrobustness and possibly more parallelism as mentioned in the previous paragraph.

(3) More Adaptivity The multigrid iteration is an iterative process. Thus the efficiencyof the process is determined by the convergence and the computational cost for a cycle.Classically multigrid is essentially parameter-free. If a multigrid iteration does not con-verge for a given problem one may try to increase the number of pre- and post-smoothingsteps or choose a different interpolation scheme or something alike. Usually the reasonfor deteriorating convergence is a fundamental problem with the interplay of smoothingand coarse grid correction, thus specialized procedures must be found. For many usersthis might be a reason to reject multigrid as a solver, because it may demand a great dealof insight into multigrid theory and implementation work.

In contrast the SPAI algorithm by Grote and Huckle (1997) has a very useful parameter� . The lower this parameter, the better the approximate inverse � . The limit case � ��yields the exact inverse

� � � for � . Even the non-expert may choose to spend more timeand use more memory in sake of a better result.

The same basic idea might be applicable to the multigrid iteration, i.e. a Multigrid( � )algorithm with a parameter � that controls the complexity vs. convergence relationship.Figure 1.3 shows the idea schematically. One may also imagine separate parameters forthe components, i.e. coarsening, interpolation and the smoother.

(4) New Component Relations Multigrid methods rely on the perfect interaction ofsmoothing and coarse grid correction. One can choose to either define a smoother andthen choose an appropriate coarse grid correction method or vice versa. The smoother,

Page 31: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

14 Introduction

Figure 1.3 Parameterized multigrid algorithm

parameter � � � ideal �� � �

time for one iteration slow still fast fastmemory requirements dense still sparse sparseconvergence behavior exact solver good convergence bad convergence

the coarsening and the grid transfer operators can be viewed as components. Figure 1.4(a)shows pictorially that all components must match all other components. As a consequencewhen designing a multigrid algorithm one must fix one or maybe two of the componentsand tailor the other component(s) accordingly.

In classical (geometric) multigrid the grid hierarchy is given, as well as the systems andtransfer operators on each level. Figure 1.4(b) shows this relationship. In contrast, classi-cal algebraic multigrid assumes the smoother to be fixed as Gauss-Seidel smoothing andconstructs operator dependent coarse grids and from that the interpolation operators, seealso Figure 1.4(c).

Both approaches may lead to difficulties in designing an efficient multigrid scheme. Ge-ometric multigrid might not be applicable due to an unknown or irregular grid struc-ture. Resorting to algebraic multigrid does not necessarily work, especially if the Gauss-Seidel iteration is not efficient enough. We try to break the traditional dependencies ofthe multigrid approach and make a step towards a more general scheme as depicted inFigure 1.4(a).

All four anticipated advantages can be utilized in this thesis.

1.5 Organization of Thesis

This thesis deals with combining (algebraic) multigrid with approximate inverses. Weinvestigate the use of approximate inverses based on Frobenius-norm minimization, as

� smoothers,

� definition for strong connections and

� as preconditioners in the Galerkin coarse grid projection.

Additionally we examine the problem of parallel coarsening strategies and investigatea new coarsening algorithm. We implement our suggested modifications of the original

Page 32: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Organization of Thesis 15

Figure 1.4 Component dependencies in multigrid The multigrid componentsSmoother, Coarsening and Interpolation must perfectly match each other (a). In geo-metric multigrid (b) the coarsening and the interpolation are usually given, the smoothermust be designed to match the given components. In algebraic multigrid (c) the situationis reversed. Gauss-Seidel is fixed as a smoother and matching coarsening and interpola-tion components are automatically constructed.

Smoother

Coarsening Interpolation

(a) multigrid components must match eachother

Smoother

InterpolationCoarsening

(b) geometric multigrid

Smoother

InterpolationCoarsening

(c) algebraic multigrid

Page 33: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

16 Introduction

AMG algorithm in the Python programming language.

Figure 1.5 Dependencies of the chapters in this thesis Arrows in this figure representdependencies.

Chapter 1Why combine Multigrid with SPAI?

Chapter 2Prerequisites: Problems, Solvers, Multigridand SPAI

Chapter 3Multigrid Smoothing with SPAI

� � �

Chapter 4SPAI as Strong Connec-tions in Multigrid

Chapter 5Preconditioned GalerkinProjection with SPAI

Chapter 6Parallel Coarsening byRelaxation

� � �

Chapter 7Object-oriented Implementation of AMG inPython

Chapter 8Conclusions

This thesis is organized as follows (Figure 1.5 displays the dependencies of the chapters):

Chapter 2 repeats the necessary prerequisites for all the following chapters. It introducesthe notation and the basic algorithms we use. A great part of this section deals withthe explanation and the definition of the multigrid approach. We focus especiallyon the algebraic coarsening and interpolation approach by Ruge and Stuben (1987).Readers new to multigrid might need some additional references to gain full under-standing of the material presented. Multigrid experts can probably skip most of thesection.

Chapter 3 explains how approximate inverses might be used as smoothers within themultigrid iteration.

Page 34: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Organization of Thesis 17

For a given matrix�

, a sparse approximate inverse � is computed by minimizing���� � � �

���in the Frobenius norm. Various choices for the sparsity pattern of �

yield a new hierarchy of smoothers:

SPAI-0, where � is diagonal;

SPAI-1, where � has the sparsity pattern of�

; and

SPAI( � ) by Grote and Huckle (1997), where the sparsity pattern of � is deter-mined automatically by the SPAI-Algorithm.

We show that in many cases SPAI-0 smoothing is sufficient to obtain satisfactorymultigrid convergence rates. The performance of SPAI-1 smoothing is comparableto that of the Gauss-Seidel iteration, while SPAI( � ) may be used when the sim-pler SPAI-0 and SPAI-1 smoothers fail. Some theoretical results for the SPAI-0smoother are included in this section. We show how memory and computationaltime can be saved by using a reduced sparsity pattern instead of the full SPAI-1approximate inverse.

For improved robustness the SPAI smoothers are combined with an algebraic coars-ening strategy.

Chapter 4 discusses the use of the entries in an approximate inverse � as a definitionfor strong connections. We show analytically for two examples where the standardRuge/Stuben heuristic fails that the entries of the approximate inverse provide morereliable information about the direction of smooth error components. The new cri-terion for strong connections yields a

�-independent method for those examples.

Additionally to the new coarsening and interpolation direction this criterion natu-rally defines a new sparsity pattern for approximate inverse smoothers.

Chapter 5 introduces sparse approximate inverses in the Galerkin coarse grid projection.The resulting preconditioning of the matrix makes the Galerkin projection morerobust and thus extends the applicability of algebraic multigrid. We demonstrate bynumerical examples that this approach is successful in improving the convergencerate of problems with unsatisfactory convergence and enables the solution of someproblems that cannot be solve by strictly applying classical AMG.

Chapter 6 is about a new parallel coarsening strategy. We present a new approach to thisproblem using an optimization algorithm, structurally similar to the well knownSimulated Annealing procedure (Metropolis et al. 1953). We show that less than5 iterations of such an optimization algorithm are sufficient to find a reasonablecoarsening. The method is also based on compatible relaxation (Brandt 2000) andthus incorporates the smoother in the coarsening process. We show the effective-ness not only with respect to the sequential Gauss-Seidel iteration, but other parallel

Page 35: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

18 Introduction

smoothers introduced in Section 3. We assess the effectiveness of such an approachin terms of operator complexity and convergence. Numerical examples for the Pois-son problem are presented.

Chapter 7 gives more detailed descriptions about implementing AMG. A descriptionof an efficient implementation of the Ruge/Stuben coarsening algorithm is given.We describe our implementation of sparse matrices and the corresponding matrix-vector and matrix-matrix multiplication. We also discuss different implementationsof the Galerkin coarse grid approximation. We conclude the chapter with a shortdescription of an object oriented framework for AMG and argue why such a designis useful. The chapter is written with respect to our Python implementation, but thealgorithms and concepts presented are independent of the programming language.

Chapter 8 provides a summary of all the new methods and results presented in this thesis.We outline future work in the area of algebraic multigrid and approximate inverses.

Page 36: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Chapter 2

Prerequisites: Problems, Solvers,Multigrid and SPAI

2.1 Discretization of PDEs . . . . . . . . . . . . . . . . . . . . . . . . 202.1.1 Structured Grids . . . . . . . . . . . . . . . . . . . . . . . . 202.1.2 Unstructured Grids . . . . . . . . . . . . . . . . . . . . . . 23

2.2 Iterative Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 262.2.1 Jacobi and Gauss-Seidel Iteration . . . . . . . . . . . . . . . 272.2.2 GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282.2.3 BiCGstab . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.3 Multigrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312.3.1 Multigrid Principle . . . . . . . . . . . . . . . . . . . . . . 312.3.2 Multigrid Convergence Theory . . . . . . . . . . . . . . . . 362.3.3 Algebraic Multigrid . . . . . . . . . . . . . . . . . . . . . . 372.3.4 Experiment Settings . . . . . . . . . . . . . . . . . . . . . . 44

2.4 SPAI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

This chapter is a summary of the known techniques and algorithms used in this thesis. Westart by restating how partial differential equations are transformed into linear systems ofequations using the finite difference method (FDM) or the finite element method (FEM).Due to the discretization error, which is small only when the number of points is largeand the locality of the differential operators, the resulting linear systems are usually largeand sparse (i.e. very few nonzeros per row) and become intractable by direct methods,such as Gaussian elimination. Thus we introduce some iterative methods that are oftenemployed instead. The multigrid iteration is also an iterative solution scheme. In certainspecial situations it has proven to be an optimal method, i.e. the work per unknown staysconstant as the number of points increases. We explain the multigrid idea and especiallyits sub-variant: algebraic multigrid (AMG). Approximate inverses were popularized aspreconditioners to speed up iterative methods. In this introduction we focus on approxi-mate inverses based on Frobenius-norm minimization and especially the SPAI algorithm(Grote and Huckle 1997).

Page 37: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

20 Prerequisites: Problems, Solvers, Multigrid and SPAI

2.1 Discretization of PDEs

We generally look at second order linear partial differential equations for the unknownfunction � of the form

��� � � � (�� � � �� %

for��� �

� % �

for��� �

with given right hand side%

and boundary conditions% �

in one, two, and three di-mensions. The diffusion coefficient � , the vector of convection parameters

�and the

Helmholtz coefficient�

may vary throughout the given domain�

.

2.1.1 Structured Grids

Grids A large class of grids is given by the tensor product of one dimensional grids� � ���� � ��� with� � � ���� ����

The resulting dimensional mesh is given by� ���� � � �� � � � �� ���� � � �� � �

�� � � � �

Tensor meshes with local refinement are also referred to as Shishkin meshes. Figure 2.1displays typical two dimensional meshes.

In the regular grid case, we discretize the unit square� � � � � � using � grid points in

each direction with a uniform grid spacing� � ! � � � � . The ��� � standard grid in

2D is thus given by the set of interior points� � � ��� �

�� � �� � � ���� � �

� � ��� � �� � �� � �

Second derivative We approximate the second partial derivative by the following wellknown approximation:

� � � � � � � � � � � ��� ��� � � � � � �� � � � � � � � (2.1)

For one dimensional meshes we assume all grid points to be ordered and indexed by themagnitude of their coordinate, i.e.

� ��� � � � ((( � � � . When equally spaced grid points

Page 38: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Discretization of PDEs 21

Figure 2.1 Meshes Displays the standard grid and typical tensor meshes.

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

x

y

(a) standard grid

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

x

y

(b) tensor mesh with refine-ment toward the center of theregion

0.2 0.4 0.6 0.8 1

0.2

0.4

0.6

0.8

1

x

y

(c) tensor mesh with (log-arithmic) refinement towardthe lower left hand corner

are used the given approximation (2.1) then translates to

��� � � � � � � � � � � � � � � � � � � ��

� � �

In case of unevenly spaced gridpoints with a left meshsize� � � � � � � and a right

meshsize� � �� � � �

� and

� � � � � �we use the following more general approxi-

mation

�'� � � � � ��� � ���� � � � �

� � � � ���� � ���� � �

In case of strong jumps in the diffusion coefficient � , we stabilize the discretization by thefollowing modified approximation (see also Grauschopf, Griebel, and Regler 1997):

� � � � � � � � �

����� �����

if � � � �� � � � � & � � � � � if � � � � � ! � � � � � � � �� � � & � � � � �

� � if � � � �� ! � � � � � � � �

� � � & � � � otherwise

with

& � ��� � ���� � � �

� � � ��� and � � � �

� � � � �� � � ���� � � �

Page 39: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

22 Prerequisites: Problems, Solvers, Multigrid and SPAI

First derivative For the first derivative we use an upwinding scheme, i.e. an approxi-mation of the derivative that depends on the sign of the convection parameter

�. It is well-

known that centered differences and downwind schemes yield an unstable discretization.The upwind scheme is defined by:

� � � � � � � � � � � ! � � if

� � � � � � � � � � � ! � � if

� � ��� � � (2.2)

Equation (2.2) is already given for unequal meshsizes.

Remark 2.1.1 (higher dimensions) The above definitions for the derivatives carry overto any number of dimensions as long as the grids used are regular enough and alignedparallel to the axes. Note that the indices � � � � � and � � � must be adjusted according tothe ordering of the points. �Remark 2.1.2 (higher order discretizations) Note that for many applications with smoothsolutions, higher order discretizations are more efficient. Generally speaking higher or-der discretizations can be achieved by using more neighboring points and/or combiningupwinding schemes with central differences where stability allows to do so. ApplyingAMG to higher order stencils is straightforward. As in any discretization, stencils withlarge positive off-diagonal entries may pose a problem for AMG. �Remark 2.1.3 (general geometries) The FDM is applicable for general geometries andboundary conditions, but the approximation for non-rectangular domains and Neumannboundary conditions requires the insertion of additional grid points. �Stencil notation First and second order methods result in compact nine point stencils.In two dimensions such a nine-point stencil

�is a � ��� real–matrix with entries indexed

by the set � � � � � � � � � as follows:

� �� �� ��� � � � � � � ��� ��� ��� � � � � � � ��� ��� ��� � � � � � � � � ��� � �

��� (2.3)

Its application to a function � is defined as follows:�� � ��� �

�� ��� �� ���� � � � � � ����� � � �

The matrix index�

in Equation (2.3) only indicates that the 3 � 3 matrix is a stencil. Notethat each stencil corresponds to a row in the resulting system matrix when the wholesystem is assembled.

Page 40: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Discretization of PDEs 23

Mixed Derivatives In special cases for standard grids we will also use the mixed deriva-tive ����� . This derivative can be discretized using a left or right oriented 7-point stencil���

and���

respectively:

� � � ! � ��� � � �� � � �

� � �

��� � � � ! � �

�� � � �� � � � �� � �

���

A more symmetric and sparser 4-point stencil can be obtained by computing an average� �of both stencils above:

� � � !�� � ��� � � � �� � �

� � � �

���

Note that the resulting matrix has no diagonal entry. If not specified otherwise we use thestencil

� �for discretizing the mixed derivative.

Assembly Applying a stencil to every point in the standard grid, writing the grid func-tion as a vector by ordering the unknowns, and adding appropriate boundary conditionsyields a system of linear equations

� � � � % �that solves the discrete system. Figure 2.2 shows such a linear system for lexicographicordering (i.e. by

�-coordinate first and then by � -coordinate).

Figure 2.2 shows the most typical example for this approach. We discretize the equation� � � � with zero boundary conditions on the unit square. We use the standard grid andthe above approximation formulas for discretizing the Laplace operator, which results inthe well known 5-point stencil

��� �� ��� � �� � � � �

� �

��� (2.4)

2.1.2 Unstructured Grids

For unstructured meshes we use the FEM for discretization. We start from the equation

� div � � � � � � � % �

Page 41: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

24 Prerequisites: Problems, Solvers, Multigrid and SPAI

Figure 2.2 Discrete linear System Plot of the sparsity and the solution of a linear systemarising from discretizing � � � � with zero boundary conditions on the unit square andstandard discretization on a 30 � 30 regular grid. (a) display a pattern plot of the upper� � � � � � � part of the corresponding matrix

�. All nonzeros are indicated by a dot.

The first 31 rows are strictly diagonal due to the Dirichlet boundary conditions, while allremaining rows in this picture contain five nonzeros corresponding to the standard 5-pointstencil.

0 20 40 60 80 100

0

20

40

60

80

100

nz = 330

(a) matrix

0

0.5

1

0

0.5

10

0.02

0.04

0.06

0.08

xy

u

(b) discrete solution

and Dirichlet boundary conditions

� �

for� � � �

Using the Galerkin approach and integration by parts yields the weak formulation

find ���� �� � � such that� ��� �� � � �

���� � � � � �

��� � � ��� % �

We assume � to be a linear combination of simple functions

with local support:

� �

The discretization and expansion yield� � � � � � � � � � � � � � � � ��� � � %

Page 42: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Discretization of PDEs 25

Figure 2.3 Finite element discretization example Grid and solution of the equation� � � � for a triangulation of the shape of Switzerland. (Dimensions and scales are notaccurate.)

0 500 1000 1500 2000 2500

50 100 150 200 250 300 350 400 450 500

40

60

80

100

120

140

160

180

200

220

240

x

y

The resulting linear system�� %

is thus given as

� � � � � � � � � � � ��� � � � and (2.5)

% � � % � �We assume � and � to be piecewise constant functions. The local support of the basis func-tions

results in a sparse linear system, because most � � of the terms in Equation (2.5)

vanish.

We use piecewise linear basis functions over triangles and bilinear basis functions overrectangles. Figure 2.3 shows a plot of the solution for the Poisson problem � � � �with zero boundary conditions, obtained by the described FEM for a triangulation of thearea of Switzerland.

Remark 2.1.4 Note that we have left out many of the details of how we practicallyobtained the resulting linear system. We obtain the system by coordinate transforma-tions, integration of reference elements and assembly of the complete matrix. We refer towell known publications for these details, such as the book by Braess (1997) or the verydetailed book by Schwarz (1991). �

Page 43: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

26 Prerequisites: Problems, Solvers, Multigrid and SPAI

2.2 Iterative Methods

Main iteration loop We are seeking the solution to a given linear system�� %

using an iterative scheme. Starting from a given initial approximation � �� to � , we com-

pute consecutive approximations � � � � � � � � � � � ���� � � ���� until the residual

� ���� % � � � ����has been effectively reduced to a given tolerance � it(we generally use � � � � ):�

� �����

�� ��� � � it

We measure the convergence � of the method after�

iterations by computing the averagereduction of the residual norm in each step:

� � �

� �����

�� ��� � � � � � �

% � � � �����

�% � � �

��� � � � � �

We define the error in each step to be� ���� � � � ���� �

Remark 2.2.1 (additional stopping criterions) In practice we use three additional stop-ping criterions:

Limit iterations The number of iterations is generally limited by a given number. Thiscriterion aborts iterations that converge very slowly.

Divergence If at any point the relative residual norm exceeds 50 orders of magnitude, i.e.�� ����

��� ��� � � ��� �

the iteration is stopped. This criterion prevents unnecessary computation of diverg-ing iterations. Note that, theoretically one can find examples where this criterionstops a converging iteration.

Page 44: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Iterative Methods 27

2.2.1 Jacobi and Gauss-Seidel Iteration

The damped Jacobi iteration

� ���� � � � � � � � ���� � �% � � ���� � � � �����

� ��� � � � (2.6)

or the Gauss-Seidel method

� ���� � � % � � ��� � � � ���� � �� � � ��� � � � ������ � (2.7)

are well known examples of iterative methods.

For stationary methods the effect of an iterative step on the error can best be describedusing matrix notation of the form:

� ���� � � � � ���� � ��� � � � ���� �Let� � � � � � , with � the diagonal,

�the strictly lower triangular part, and � the

strictly upper triangular part of�

. Then, the damped Jacobi iteration corresponds to

��� � � � � � � � � (2.8)

whereas the Gauss-Seidel iteration corresponds to

�� �� � � � � � � � � � � (2.9)

In (2.6) the parameter � is chosen to maximize the reduction of the error in some sensethough. The optimal value � is problem and method dependent and usually unknown apriori. Although Gauss-Seidel typically leads to faster convergence, it is more difficultto implement in parallel (especially for sparse matrices), because each smoothing step in(2.7) requires the solution of a lower triangular system.

Both iterative methods exhibit impractical convergence rates for many important prob-lems, but are quite useful components within the multigrid iteration which we will intro-duce in the next section.

Krylov subspace methods have been proven to be more robust than the simple iterationsintroduced in equations (2.8) and (2.9). From the large variety of methods two algorithmshave become very popular: the generalized minimum residual method (GMRES) and thebi-conjugated stabilized CG method (BiCGstab). We focus on those two methods in thenext sections.

Page 45: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

28 Prerequisites: Problems, Solvers, Multigrid and SPAI

2.2.2 GMRES

GMRES (Saad and Schultz 1986) generates a sequence of orthogonal unit vectors �

that form a basis for the Krylov subspace� � � � � span �

�� � � �

�� ���� � � � � � �

���

The orthogonalization is done using a modified Gram-Schmidt procedure. The neededcoefficients and inner products are stored for fast retrieval.

The approximation� ���� in step

�of GMRES has the form

� ���� � � � � � � � � �

where the coefficients � � � � are determined such that the residual norm is minimized ineach step: ���

� � ��� ����� � min! � (2.10)

Obviously this method converges within � steps, but also becomes more computationallyexpensive in every step. Thus restarted versions of the algorithm with restarts after �

iterations are usually used. Algorithm 2.1 outlines preconditioned restarted GMRES.

Remark 2.2.2 One of the main properties of GMRES is that the residual norm inEquation (2.10) can be computed without computing the iterate itself. Thus the costlycomputation of � can be delayed until the residual is minimized to a given tolerance. Thiseffectively moves the statement on line 18 in Algorithm 2.1 outside the innermost loop.�2.2.3 BiCGstab

The well-known Conjugate Gradient method (Hestenes and Stiefel 1952) is a methodsuitable for symmetric positive definite matrices and thus not applicable in general. TheCG method is therefore extended with relations based on

� �without explicitly multiply-

ing with� �

. The resulting method is called BiConjugate gradient method (BiCG). Thecorrect choice of search directions and step sizes ensure bi-orthogonality of the searchdirections. BiCG has very unstable convergence behavior and is thus augmented by twoadditional stabilizations: the repeated application of the BiCG contraction operator andessentially repeated applications of the GMRES(1) method. The resulting method by vander Vorst (1992) is given in Algorithm 2.2

Page 46: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Iterative Methods 29

Algorithm 2.1 Restarted GMRES( � ) Outline of the restarted GMRES method.

1 GMRES � � � � � �� � � � �

��� 2 �

� ��

3 �4 while no convergence

�5

� � � � � � � – initial residual –6 � � � !

�������

– set initial –7 �

�������� �

8 for � � ���� � �

9� � �

� – compute new Krylov vector –

10 for� � ���� � � – modified Gram-Schmidt –

11

�� � � � � ���� � – project –

12� � � � � �

���� – remove –13 end for14

� � ���

����

���– compute

�entry –

15 � � � � ! �

� ���

– scale new vector –16 compute � such that

���� � � �

���� � min!

17 end for18 �

� �� � � � ���� – update solution –

19 � � ���20 end while21 return � �22

Page 47: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

30 Prerequisites: Problems, Solvers, Multigrid and SPAI

Algorithm 2.2 Preconditioned BiCGstab Outline of the preconditioned BiCGstabmethod.

1 BiCGstab � � � � � �� � � � �

2� �� � � � �

��

3

�� � ��

4 � �5 while no convergence

�6 �

� � � �� � �

� � �

7 if � �8

� � �

� � �

9 else10 �

� � � �

� � � ! �

� � � � ��

� � � ! �

� � �

11� � �

� � � ���

� � � �

� � � � �

� � �

� � � �

12 end if13

�� � � �

14 � � ��

15 � � �

� � � ! �� �

16 � �

� � � ���

17 break if

��

��

tol18

�� � �

19 � � �

�20

� � �

�� ! ��

�21

� � �

� � � ���

� �� � �

� ��

22� � � � �

� �

23 break if � � �

24 end while25 return � �26

Page 48: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Multigrid 31

2.3 Multigrid

Krylov subspace methods, e.g. GMRES or BiCGstab mentioned in the previous section,all share one undesired property: the work per unknown increases as the matrix sizegrows. This fact can best be demonstrated by looking at the Poisson problem, the standardmodel problem for elliptic partial differential equations, which is given by

� � � � ��� � � for � ��� � � � � � � � and

� � ��� � � for � ��� � � � � � � � ��

Discretizing the Poisson problem with standard discretization techniques introduced inSection 2.1 yields the the discretization stencil (2.4). The resulting system matrix is sym-metric positive definite and has the following eigenvalues� � � � ! � � ����� ��� ���� � � ��� ��� � � � � for � �� � � ���� � � � � �Thus the condition number of the discrete Laplace operator

� � is given by:

� ���� ��� ������� ��� �

����� � ��� � !�� ������ � �� � !�� � � � � � � �

The condition number thus increases quadratically for� � � . A well-known result for

the CG iteration (see e.g. Greenbaum 1997) relates the convergence rate to the conditionnumber of the operator as follows:���

� � � ������ � ��� � � �� � � � �

���� � �

������

This result clearly shows that even for this simple symmetric positive definite modelproblem the most obvious Krylov subspace method, the CG method, does not achieve�

-independent convergence.

The convergence rate of the multigrid algorithm, which we will discuss in the next section,solves the model problem independent of the mesh parameter

�.

2.3.1 Multigrid Principle

The multigrid algorithm is often motivated by the following observation:

For the model problem and a random initial guess, the Gauss-Seidel iteration and thedamped Jacobi iteration, given in equations (2.6) and (2.7) are efficient in reducing the2-norm of the error in the first few iterations. After this initial error reduction the con-vergence becomes very slow. The effect can easily be analyzed using Fourier analysis,

Page 49: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

32 Prerequisites: Problems, Solvers, Multigrid and SPAI

see (Briggs, Henson, and McCormick 2000). For a more intuitive explanation withoutmathematical rigor, let us look at the Jacobi iteration. To compute a new component ina step of the iteration, the method uses information from neighboring points. Thus onecan imagine that in each step information is transported from one node to the neighboringones. High frequency error components are local and thus quickly eliminated while lowfrequency error components must be eliminated by propagating the information throughlarger regions of the grid. In other words: the first few effective steps of the iterationdamps high frequency error components, while the low frequency error components takemuch longer to be eliminated. While the norm of the error might be still quite large after afew iterations, it is usually smooth in some sense. That is the reason why these iterationsare called smoothers. They smooth the error.

The smoothness of the error must obviously be exploited implicitly, because the erroritself is unknown. We can make use of the smooth error using the residual equation thatrelates the error with the residual:

�� % � � � � (2.11)

and the fact that smooth discrete functions can be represented on a coarser grid. Usingcoarser grids in essence saves computational time. Some of the low frequencies on thefine grid become high frequencies on the coarse grid and the smoothing method becomeseffective again.

The residual equation alone can be used to formulate the residual correction algorithm(see Algorithm 2.3).

Algorithm 2.3 Residual correction algorithm Instead of solving the problem directly,the residual correction algorithm uses the residual as a right hand side to the originalsystem. The solution of the modified system is the error of the original system.

1 ResidualCorrection � � � � � � Solve � � � 2

� � � � � – calculate residual –3

� Solve � � � � – solve for the error –

4 �� � � – error correction –

5 return � �6

Let � � �� ���

�denote the number of points on the fine grid and �

� ��

�the

number of points on the coarse grid. Given transfer operators for a grid function fromthe fine grid to coarse grid (restriction)

� ��� ���� � � and from the coarse grid to thefine grid (interpolation)

� �� � � � ��� and the system matrix on the coarse grid� �

onecan formulate the two–grid algorithm (see Algorithm 2.4), which combines smoothingand coarse grid correction. The parameters � � and � � denote the number of pre- and

Page 50: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Multigrid 33

post-smoothing steps. These numbers are usually small, i.e. one or two pre- and post-smoothing steps are usually sufficient.

Remark 2.3.1 (two–grid in practice) The use of the two–grid algorithm is twofold. Itis not an efficient routine as presented in Algorithm 2.4, due to the exact solution of a(still) large system in line 7. In conjunction with an inexact solver it can be useful as apreconditioner. In practice it can also be useful when debugging, testing or evaluating amultigrid code. �Recursive application of this procedure on the coarse level eliminates the direct solve inline 7 of Algorithm 2.4 and leads to the multigrid iteration shown in Algorithm 2.5. Itsadditional parameter � controls the cycling structure of the multigrid iteration. A choiceof �

� is called a V-cycle, while � � leads to the well-known W-cycle.

The recursion works so well, because low frequencies on a grid level become high fre-quencies on the next coarser grid. Thus simple smoothers become effective again.

Algorithm 2.4 Two–Grid algorithm

1 Twogrid � �� � � � � � � � Smoother � � � � � � � � � � � � � � � �

2 ���

3 for � � steps4 �

Smoother � � � � � � � � – pre-smoothing –

5 end for6

� � � � � �7 �

� � � � � �� � � – coarse grid correction –

8 for � � steps9 �

Smoother � � � � � � � � – post-smoothing –

10 end for11 return � �12

Remark 2.3.2 (initial guess) The recursive call within the multigrid iteration, i.e. line 9in Algorithm 2.5, has a 0 as its first argument. This means that the coarse grid correctionprocedure is started with a zero vector as the initial guess. This is clearly a good choicesince the result of the call is an error correction, which is small, when the iteration is closeto the solution. �Galerkin product The coarse grid problem

� �may be obtained analytically by simply

using the discretization techniques that were used for obtaining the fine grid problem� � ,

Page 51: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

34 Prerequisites: Problems, Solvers, Multigrid and SPAI

Algorithm 2.5 Multigrid algorithm Recursive application of the two–grid algorithm(Algorithm 2.4) yields the multigrid algorithm. The cycle is started with an initial guess� �� and the corresponding level number � � .

1 Multigrid � � � � � � Smoother � � � � � �#� � ����

2 for � � steps3 � � Smoother � � � � � � � � � – pre-smoothing –4 end for5

� � � � � � � � � – residual calculation –6

� � � � � � � � – restrict the residual –

7 if � � ��

then8 for � steps9

� � � �

Multigrid � � � � � � � � ��� � Smoother � � � � � � � � ��

10 – multigrid recursion –11 end for12 else13

� � � � � � �� � � � � � � – direct solve –

14 end � %15

� � � � � � � � – error interpolation –16 � � � � � � � – error correction –17 for � � steps18 � � Smoother � � � � � � � � � – post-smoothing –19 end for20 return � �21

Page 52: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Multigrid 35

i.e.��� �� � � �

Surprisingly, computing the operator� � � might have several disadvantages. It is not

guaranteed that� � � approximates the coarse grid problem well enough for satisfactory

multigrid convergence rates. Especially for complicated geometries it can be difficult toconstruct a coarse operator with very few unknowns. Non-connected regions, boundarieswith fine resolution and coarse grid matrices with large condition number can be issues inanalytically constructing coarse grid operators, see Trottenberg et al. (2001) and Alcouffeet al. (1981).

The Galerkin coarse grid approximation (GCA) is an alternative for computing a coarsegrid operator. Given an interpolation operator

�with full rank and appropriate size the

coarse grid operator is defined as��� � � � � � � � (2.12)

For a suitable restriction�

one may define the more general coarse grid operator� �

� � � � . For selected problems and grids the GCA resembles the analytic operator� � � .

The operator obtained by the GCA in (2.12) has interesting properties, i.e. symmetry andpositive definiteness are conserved.

The GCA scaling factor � has obviously no direct effect upon the matrix properties itself,but induces a simple scaling of the result and thus over- or under-correction in the errorcorrection in the multigrid cycle.

Remark 2.3.3 (sparse coarse operators) A critical issue in retaining ideal multigrid ef-ficiency is the bounded relative density (number of nonzeros in each row) of the coarsegrid operator. Recall that the efficiency of many smoothers, like the Gauss-Seidel itera-tion, depends on the density of the given matrix. Various methods for operator truncationcan solve the problem of increasing relative densities, but the process is often found to beunstable and cause strong divergence of the multigrid iteration. �Standard interpolation and injection For the standard � � � grid with standard coars-ening the standard interpolation

� std and standard injection� std-inj can be defined as:

� std � � � � � � � � � ���� � � � � � $ � � � � � � � � � ���� � � � � �

We realize that this introduction is very short. For more detailed introductions to multigridthe reader is referred to Briggs et al. (2000), Trottenberg et al. (2001) and Wesseling

Page 53: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

36 Prerequisites: Problems, Solvers, Multigrid and SPAI

(1992).

2.3.2 Multigrid Convergence Theory

The classical convergence theory by Hackbusch (1985) is based on a splitting of the iter-ation matrix of the two-level method:

� � ������ � � ������ (2.13)

with � � � � � � � � �� � � �For � � � and � � � Equation (2.13) leads to the following estimate:���

� �����

��� � � � � ����� ���� � � ��

����

���� � �� � � � � �� �

��� ���� � � ��

���(2.14)

Looking separately at the two factors in Equation (2.14) leads to the widely used smoothing–and approximation–properties. The smoothing property in essence states that the smootherremoves the high frequencies in the error, while not amplifying the low error frequencies.The approximation property forces the coarse grid correction procedure to be reasonable.

Smoothing property ���� � � �

���� � � � � ��� �

� � �� � � for � � � (2.15)

Approximation property ���� � �� � � � � �� �

���� � � � � �� � (2.16)

Here � � is the order of the differential equation to be solved. With both approximationproperties fulfilled there exists a � for which the twogrid–iteration converges with

��� � ����

� � � � ��.

A more general�

-independent convergence for the W-Cycle can be shown if the contrac-tion number is level–independent.

For symmetric positive definite systems both conditions imply convergence of the V-cycle.

Page 54: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Multigrid 37

2.3.3 Algebraic Multigrid

Multigrid (MG) methods are sensitive to the subtle interplay between smoothing andcoarse-grid correction. When a standard geometric multigrid method is applied to dif-ficult problems, say with strong anisotropy, this interplay is disturbed, because the error isno longer smoothed equally well in all directions. Although manual intervention and se-lection of coarse grids can sometimes overcome this difficulty, it remains cumbersome toapply in practice to unstructured grids and complex geometries. In contrast, an algebraicmultigrid approach compensates for the deficient smoothing by a sophisticated choice ofthe coarser grids and the interpolation operators, which is based solely on the input matrix�

.

Many AMG variants exist, which differ in the coarsening strategy or the interpolationused – an introduction to various AMG methods can be found in Wagner (1998).

Following Ruge and Stuben (1987), we now describe the algebraic coarsening strategyand interpolation operators, which we shall combine with approximate inverses later anduse for our numerical experiments. We refer to this algorithm as classical AMG. Bythis we mean the approach implemented in the publicly availabe code AMG1R5. Thisfirst fairly general AMG code was made publically available in the mid eighties. Futureversions of this development have been substantially improved, but are only commerciallyavailable.

Setup As already indicated, the multigrid iteration itself (Algorithm 2.5) is not changedin the algebraic case. In AMG a setup phase constructs a hierarchy of grids (Algo-rithm 2.6) and the solution is found by combining smoothing and coarse grid correctionrecursively on the constructed levels.

Algorithm 2.6 AMG setup

1 AMGSetup � � � � 2 � �3 while ���� � � � � � � min

�4 � ����� � � � � � ��� � � � – compute coarsening –5

� � � � �� � � � & � ��� � � � � ��� � – compute interpolation –

6

� � � � � ���� � � � � – Galerkin approximation –

7 � � � �8 end while9 return � � ���� � � � � ��� ���� � � � � – return operators –

10

Page 55: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

38 Prerequisites: Problems, Solvers, Multigrid and SPAI

Figure 2.4 Unidirectional error-smoothing Plot of the error on a 32 � 32 mesh afterfive Gauss-Seidel smoothing steps for the anisotropic Poisson problem described in Sec-tion 2.3.3 ( �

� � � � ). The smooth error component is aligned with the anisotropy, whichcan be read from the stencils.

5

10

15

20

25

30

5

10

15

20

25

30

0

0.2

0.4

0.6

xy

ER

RO

R

Anisotropic stencil:

�� ��� � �� � � � � � � �

� �

Isotropic stencil:

�� ��� � �� � � � �

� �

Strong connections The Ruge/Stuben (1985) approach is based on the assumption thatthe Gauss-Seidel iteration is used as the smoother. For � -matrices the direction ofsmoothing can then be read from the stencils. A good example for illustration is thelocally anisotropic Laplace equation:

� � � � � � �� � � �

�with

� � � ��� � � and �� � ��� � � if � ��� � � � � � ��� � � ��� � � �

� otherwise�

The equation is discretized on the boundary of the inner square as described in Section 2.1,which results in the stencil

�� ��� � �� � � � � � � �

� �

��

(2.17)

in the interior of the region.

Figure 2.4 shows the unidirectional smoothing of the Gauss-Seidel in the region of anisotropyin the middle of the domain. This effect can be read from the stencil in the interior of thedomain. The direction of smoothing can be identified by looking at the relative magnitudeof the negative off-diagonal entries in a row.

For small � stencil (2.17) has relatively small negative entries � � in the � direction (ver-

Page 56: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Multigrid 39

Table 2.1 Relationships of unknowns in AMG

Condition Notation Interpretation� � ����� ������� ��� � �

��

������� � � (strongly) depends on �

and ������ � or

� (strongly) influences �� � ��� � ������ ��� � �

��

����� � � � weakly depends on �

and ������ � or

� weakly influences �

tical in the stencil) and relatively large negative entries � � in the�

direction (horizontalin the stencil). The large entries correspond to efficient error smoothing in that direction,while the error components in the direction of small entries are essentially decorrelated.

The above mentioned large entries are called strong connections and the small entriescorrespondingly weak connections. The formal definition also declares all positive entriesas weak connections. Negative off-diagonal entries with an absolute value at least aslarge as the absolute value of the largest entry multiplied with some factor � � � � � � �are declared as being strong connections. This factor is called the strong connectionthreshold. The formal definition and notation for strong connections and similar relationscan be found in Table 2.1. We define the set of dependencies of a point � as

� �

���� �

the set of influences of a point � as

� � �

���� �

Strong connections are never stored explicitly in the original Ruge/Stuben approach. Toincrease the modularity of our code we store strong connections in a matrix

� � � .Based on the notion of strong and weak connections we can now proceed on to introducingthe matrix dependent coarsening and interpolation.

Algebraic coarsening On every level, the coarsening strategy must divide�

, the setof all points on that level, into two disjoint sets: � , the coarse points, also present on thecoarser level, and � , the fine points, which are absent on the coarser level. The choice of� and � induces the � ! � –splitting of

� � � � , with ��� � ��.

Coarse grid correction heavily depends on accurate interpolation. Accurate interpolationis guaranteed if every � point is surrounded by sufficiently many strongly dependent �points. A typical configuration is shown in Figure 2.5.

Page 57: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

40 Prerequisites: Problems, Solvers, Multigrid and SPAI

Figure 2.5 Ideal coarsening configuration for interpolation Strong dependencies areindicated with solid arrows, while weak dependencies are represented by dashed arrows.� points are represented by solid circles, whereas F points are represented by dashedcircles.Hence all the strong dependencies of point � ( � � and ��� ) are � points; therefore � � and ���are good candidates for interpolating � .

���

���

� � � strongdependency

weakdependency

C point

F point

The coarsening algorithm attempts to determine a � ! � –splitting, which maximizes the� –to– � dependency for all � points (Coarsening Goal 1), with a minimal set � (Coars-ening Goal 2). It is important to strike a good balance between these two conflicting goals,as the overall computational effort depends not only on the convergence rate, but also onthe amount of work per multigrid cycle. Clearly the optimal � ! � -splitting minimizestotal execution time. However, since the convergence rate is generally unpredictable, thecoarsening algorithm merely attempts to meet Coarsening Goals 1 and 2 in a heuristicfashion. In doing so, its complexity must not exceed � � �� �

�to retain the overall

complexity of the V-cycle multigrid iteration.

Ruge&Stuben heuristic The Ruge/Stuben coarsening is a greedy optimization algo-rithm based on a preliminary � point choice (see Algorithm 2.7 on the next page). Eachpoint is associated with a

� � � �!� � � � . Initially all point are neither � , nor � point, but in aset � . Then the point with the highest

� � � �!� � � � � � is chosen as a � point. Its influences� � that are still in � are then assigned � points. This procedure is repeated until the set� is empty. For more details see Section 7.1 on page 131.

This classical coarsening algorithm is known to produce very good coarse sets � , espe-cially for regular geometric problems. For very special problems it even produces thesame coarse grids as used in geometric multigrid.

Remark 2.3.4 The procedure is obviously sequential. The recomputation of the� � � ��� � � �

Page 58: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

Multigrid 41

Algorithm 2.7 Ruge/Stuben coarsening

1 RugeStubenCoarsening � � ��� � 2 initialize

� � � ��� � � �3 � � �� � � � � ��� � �4 while � � � � 5 select � � � with maximal

� � � ��� � � � � � – select C point –6 � � � �

7 for all �

� � �

– make influences � points –8 � � � �

9 update

� � � �!� � � �10

11 return ��� � �12

makes it impossible to pick several points in one step. Parallel algorithms can be foundin (Krechel and Stuben 1999; Henson and Meyer-Yang 2002). We will discuss parallelapproaches in Section 6. �Interpolation Within the multigrid iteration a coarse grid to fine grid transfer operatorcalled interpolation is needed. In the following we describe Ruge/Stuben direct inter-polation. When defining an interpolation for general matrices we consider the followingprinciples:

Smooth error interpolation We assume the error to be smooth along strong connections.Error interpolation weights should be in the direction of smooth error components.We would like to note that we also want the following converse to be true: wedo not want to interpolate error components in the direction of weak connections,because the error will be rough in that direction.

Small interpolatory set The set of strongly connected coarse neighbors � might be small;most coarsening algorithms do only guarantee

��

�� � .

Handling of positive entries The matrix may contain (large) positive entries.

For nearly � -matrices the algebraically smooth error, i.e. the error after a few iterationsof the Gauss-Seidel method, approximately satisfies

� ��� � (2.18)

Page 59: In Copyright - Non-Commercial Use Permitted Rights ...26758/eth... · Parallel Multigrid Methods using Sparse Approximate Inverses A dissertation submitted to the SWISS FEDERAL INSTITUTE

42 Prerequisites: Problems, Solvers, Multigrid and SPAI

A splitting of Equation (2.18) is given by four sets:� �

are the strong negative connec-tions,

� � the weak negative connections,� � the strong positive connections and

� �the weak positive connections. For � -matrices the sets

� � and� � are empty, while for

essentially positive type matrices the set� � is assumed to be empty. It is assumed that

these four sets are disjoint and that their union contains all neighbors.

Using this splitting, under any definition of strong and weak connections Equation (2.18) can be split in the following way:

    a_ii e_i + Σ_{j ∈ N_i^{s,−}} a_ij e_j + Σ_{j ∈ N_i^{w,−}} a_ij e_j + Σ_{j ∈ N_i^{s,+}} a_ij e_j + Σ_{j ∈ N_i^{w,+}} a_ij e_j ≈ 0.        (2.19)

We approximate and rewrite Equation (2.19) by

- eliminating some of the sums with an appropriate rescaling of the factors a_ij and/or a modification of a_ii, and
- solving for e_i.

For all i ∈ F we then obtain an interpolation formula of the form

    e_i = Σ_{k ∈ P_i} w_ik e_k.

In the general case positive and negative connections are treated symmetrically (P_i^− = N_i^{s,−} ∩ C and P_i^+ = N_i^{s,+} ∩ C). The weak connections are distributed proportionally among their strong counterparts:

    α_i = ( Σ_{j ∈ N_i^−} a_ij ) / ( Σ_{k ∈ P_i^−} a_ik ),        β_i = ( Σ_{j ∈ N_i^+} a_ij ) / ( Σ_{k ∈ P_i^+} a_ik ).

If the strong counterpart is empty (even if the weak set is also empty), the corresponding connections are added to the diagonal and the respective α_i or β_i is set to 0. The weights are then given by:

    w_ik = −α_i a_ik / a_ii   for k ∈ P_i^−,        w_ik = −β_i a_ik / a_ii   for k ∈ P_i^+.
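As a sketch, the weights of a single F point can be computed as follows. This is an illustration only (plain NumPy, with a dense matrix row for clarity; the hypothetical arguments Pm = P_i^− and Pp = P_i^+ are assumed to come from the C/F-splitting and the strong connections):

    def direct_interpolation_weights(a_row, i, Pm, Pp):
        """Direct interpolation weights w_ik for one F point i (sketch).
        a_row: dense i-th row of A; Pm/Pp: strong negative/positive
        coarse neighbors of i."""
        neg = [j for j, a in enumerate(a_row) if j != i and a < 0.0]
        pos = [j for j, a in enumerate(a_row) if j != i and a > 0.0]
        diag = a_row[i]
        if Pm:   # distribute all negative connections over P_i^-
            alpha = sum(a_row[j] for j in neg) / sum(a_row[k] for k in Pm)
        else:    # empty strong counterpart: add to diagonal, set alpha to 0
            alpha, diag = 0.0, diag + sum(a_row[j] for j in neg)
        if Pp:   # same for the positive connections
            beta = sum(a_row[j] for j in pos) / sum(a_row[k] for k in Pp)
        else:
            beta, diag = 0.0, diag + sum(a_row[j] for j in pos)
        w = {k: -alpha * a_row[k] / diag for k in Pm}
        w.update({k: -beta * a_row[k] / diag for k in Pp})
        return w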

Remark 2.3.5 The C/F-splitting for direct interpolation needs to provide enough F-to-C connectivity in the coarse grid selection. This limits the speed of coarsening. Further F-to-C connectivity can be achieved by additionally using strong connections to F variables. □


Remark 2.3.6 Many other interpolation schemes exist, such as indirect interpolation, Jacobi interpolation and multipass interpolation by Stuben (2001). An interpolation scheme for compact 9-point stencils in finite difference equations was proposed by de Zeeuw (1990). Wagner (2000) derives an interpolation scheme based on a norm minimization involving the iteration matrix of the smoother. More recent schemes were proposed by Wan, Chan, and Smith (2000) and Brandt (2000). Further interpolation methods are contained in the references of these citations. The experiments in this thesis are restricted to the very basic direct interpolation. □

Remove Dirichlet nodes Before we solve a given problem we eliminate all Dirichlet unknowns, i.e. unknowns that correspond to rows in the matrix with solely diagonal entries. Such unknowns need to be forced to stay on the fine grid, because they cannot be interpolated; their values can be deduced solely from the corresponding entry in the right-hand side. We therefore eliminate those Dirichlet nodes in a preprocessing step. The solution of a Dirichlet unknown i is obviously given by x(i) = b(i)/a(i,i), see Algorithm 2.8.

Algorithm 2.8 Removing Dirichlet nodes This algorithm removes rows and columns that have solely diagonal entries from the matrix and modifies the right-hand side of the system accordingly. Such rows can emerge from Dirichlet boundary conditions and may degrade smoothing efficiency on the coarsest level.

 1  RemoveDirichlet(A, b)
 2    D := set of Dirichlet rows        – rows to remove –
 3    K := set of non-Dirichlet rows    – rows to keep –
 4    x := zero vector
 5    for all i ∈ D
 6      x(i) := b(i) / a(i,i)           – compute partial solution –
 7    end for
 8    c := A x                          – compute correction –
 9    b̃ := (b − c)(K)                   – reduce right-hand side –
10    Ã := A(K, K)                      – reduce system matrix –
11    return (Ã, b̃)                     – return preprocessed system –
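A direct SciPy translation of Algorithm 2.8 might look as follows; this is a sketch that assumes a row with a single nonzero entry has that entry on the diagonal:

    import numpy as np
    import scipy.sparse as sp

    def remove_dirichlet(A, b):
        """Eliminate unknowns whose rows contain only the diagonal entry."""
        A = sp.csr_matrix(A)
        one_entry = np.diff(A.indptr) == 1          # rows to remove
        keep = np.flatnonzero(~one_entry)           # rows to keep
        x = np.zeros(A.shape[0])
        d = np.flatnonzero(one_entry)
        x[d] = b[d] / A.diagonal()[d]               # partial solution
        b_red = (b - A @ x)[keep]                   # reduce right-hand side
        A_red = A[keep][:, keep]                    # reduce system matrix
        return A_red, b_red, x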

Diagonal scaling Another common transformation before the setup phase scales all rows of the input matrix to rows with a unit diagonal. Algorithm 2.9 shows this simple procedure, including the necessary transformation of the right-hand side. Note that this scaling destroys possible symmetry in the original operator A.


Algorithm 2.9 Scaling to unit diagonal This algorithm scales all rows in the matrix to a diagonal of 1.

 1  ScaleToUnitdiagonal(A, b)
 2    for all i
 3      if a(i,i) = 0
 4        d := 1
 5      else
 6        d := a(i,i)
 7      end if
 8      ã(i,:) := a(i,:) / d    – scale a row –
 9      b̃(i) := b(i) / d        – scale right hand side –
10    end for all
11    return (Ã, b̃)             – return preprocessed system –
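In SciPy the same transformation is a minimal sketch (function name hypothetical):

    import numpy as np
    import scipy.sparse as sp

    def scale_to_unit_diagonal(A, b):
        """Scale every row of A and of b by the diagonal entry (1 if it is zero)."""
        A = sp.csr_matrix(A)
        d = A.diagonal().copy()
        d[d == 0.0] = 1.0
        return sp.diags(1.0 / d) @ A, b / d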

Table 2.2 Standard settings for experiments

    θ = 0.25          strong connection threshold
    σ                 Galerkin projection scaling factor
    ν_pre, ν_post     number of smoothing steps
    n_min = 100       stopping criterion for coarsening
    ω = 4/5           optimal Jacobi damping for 2D Laplacian
    tol_it            tolerance for iteration

2.3.4 Experiment Settings

We report on experiments with our own implementation of MG and AMG, described in Section 7. Unless reported otherwise, our standard settings are those given in Table 2.2.

We run all of our experiments on a standard Intel Pentium 4 PC running RedHat Linux 7.1. Our ASUS P4B series motherboard is equipped with 256 KB cache and 1 GB of main memory. The processor is clocked at 1.6 GHz. We are running the Linux kernel 2.4.9-31, compiled for the Pentium 4 processor. All experiments can be verified, since the full source code is available1 and each experiment corresponds to a specific file. We indicate that file using the [➠file.ext] notation.

Experiment 1 (MG/AMG for Laplacian) We compare the performance of the algebraic approach to the geometric approach, with standard coarsening and interpolation, for the Poisson problem −Δu = f. Table 2.3 lists operator densities, execution times and convergence rates.

1see http://www.inf.ethz.ch/personal/broeker/wolfamg


Table 2.3 MG/AMG for Laplace Operator complexities c_A, execution times t_sol in seconds and convergence rates ρ for the Laplace problem. We compare geometric with algebraic multigrid using standard components and Gauss-Seidel smoothing. [➠exp 001a.py]

               131 × 131            259 × 259            515 × 515
               c_A  t_sol[s]  ρ     c_A  t_sol[s]  ρ     c_A  t_sol[s]  ρ
    MG         1.6  0.1   0.002     1.6  0.4   0.002     1.6  1.4   0.002
    AMG        2.2  0.1   0.004     2.2  0.5   0.006     2.2  2.1   0.008

All properties are obviously h-independent and comparable. AMG operator densities are slightly higher compared to MG complexities: checkerboard coarsening halves the number of grid points in the algebraic case, while standard coarsening "quarters" the number of points on each level. Convergence rates and operator complexities match the ones reported in the literature, see Ruge and Stuben (1987) and Grauschopf, Griebel, and Regler (1997). The largest problem with 515 × 515 (265,225) unknowns2 can be solved with AMG in approximately 2 seconds. □

2.4 SPAI

It is well known that the inverse A⁻¹ of a sparse matrix A is generally full. Thus computing the inverse directly is generally prohibitively expensive in both memory and time. Recently various algorithms have been proposed to compute a sparse approximate inverse M. Examples are the FSAI approach by Kolotilina and Yeremin (1993a), the MR algorithm by Chow and Saad (1998), and the AINV approach by Benzi, Meyer, and Tuma (1996). Once computed, the approximate inverse M is applied as a preconditioner to the linear system for use with a Krylov subspace iterative method. For a comparative study of various sparse approximate inverse preconditioners we refer to Benzi and Tuma (1999).

In this thesis we shall consider approximate inverses based on the SPAI algorithm by Grote and Huckle (1997). The SPAI algorithm computes an approximate inverse M that minimizes AM − I in the Frobenius norm:

    ‖AM − I‖²_F = Σ_{k=1}^n ‖(AM − I) e_k‖²₂ = Σ_{k=1}^n ‖A m_k − e_k‖²₂.

Since an effective sparsity pattern of M is usually unknown a priori, the original SPAI-Algorithm begins with a diagonal pattern. Then the algorithm proceeds with augmenting the sparsity pattern of M to further reduce each residual r_k = A m_k − e_k. The progressive reduction of the 2-norm of r_k involves two steps.

2The awkward gridsize n = 515 is due to restrictions in the geometric coarsening process.


Algorithm 2.10 Outline of the SPAI algorithm

 1  Spai(A, ε)
 2    for every column m_k of M do
 3      set initial sparsity J              – usually a diagonal pattern –
 4      compute first approximation to m_k
 5      while ‖r_k‖₂ > ε do                 – fill columns of approximate inverse –
 6        compute most profitable indices J̃
 7        update m_k using J and J̃
 8        J := J ∪ J̃
 9      end while
10    end for
11    assemble all m_k into M
12    return M

First, the algorithm identifies a set of potential new candidates, based on the sparsity pattern of A and the current (sparse) residual r_k. Second, the algorithm selects the most profitable entries, usually fewer than five entries, by computing for each candidate a cheap upper bound for the reduction in ‖r_k‖₂. Once the new entries have been selected and added to m_k, the (small) least-squares problem is solved again with the augmented set of indices. The algorithm proceeds until each column m_k of M satisfies

    ‖A m_k − e_k‖₂ ≤ ε.        (2.20)

Here ε is a tolerance set by the user, which controls the fill-in and the quality of the preconditioner M. A lower value of ε usually yields a more effective preconditioner, but the cost of computing M = SPAI(ε) may become prohibitive; moreover, a denser M results in a higher cost per application. The optimal value of ε minimizes the total time. It depends on the problem, the discretization, the desired accuracy, and the computer architecture. Further details about the original SPAI-Algorithm can be found in Grote and Huckle (1997). The further development of a block version of SPAI can be found in Barnard and Grote (1999). A parallel and sequential SPAI code is freely available3.

3see http://www.inf.ethz.ch/personal/broeker/spai


Chapter 3

Multigrid Smoothing with SPAI

3.1 Smoothing Idea, SPAI Hierarchy and Theory 49
    3.1.1 The SPAI Hierarchy 50
    3.1.2 Why should SPAI be a good smoother? 52
    3.1.3 Experimental Local Fourier Analysis 54
    3.1.4 Theoretical properties of SPAI-0 58
3.2 Numerical Experiments for Standard Problems 61
    3.2.1 Poisson problem 62
    3.2.2 Locally Anisotropic diffusion 65
    3.2.3 Constant convection 70
    3.2.4 Constant Convection with Shishkin Mesh 74
    3.2.5 Rotating Flow 75
    3.2.6 Rotated Anisotropy 78
3.3 Parallel MG with SPAI-1 81
3.4 Conclusions: SPAI Smoothes and is Parallel 83

In this chapter we investigate the use of Frobenius-norm minimized approximate inverses as smoothers in the multigrid iteration. We start with a motivation why we consider exactly this class of approximate inverses. There are two main approaches for computing a sparse approximate inverse:

Factorized approximate inverses: In the factorized case one computes an approximate factorization A ≈ F₁ F₂, where F₁ and F₂ are sparse and easy to invert, i.e. an efficient procedure to apply F₂⁻¹ F₁⁻¹ to a vector can be given. This approximate inversion procedure is usually given by backward substitution, or by trivial direct inversion for diagonal matrices. The factorization can also include more than two matrix factors. Examples for factorized approximate inverses are ILU factorizations (see Wesseling 1993), the FSAI approach by Kolotilina and Yeremin (1993b) and the AINV approach by Benzi, Meyer, and Tuma (1996).


Direct approximate inverses: Direct computation of an approximate inverse matrix M means that the matrix is explicitly computed and stored. The application of M then reduces to a matrix-vector product, which is efficient and parallel if M is sufficiently sparse. Direct sparse approximate inverses have become popular in recent years, see e.g. Kolotilina and Yeremin (1993a), Chow and Saad (1998), Grote and Huckle (1997). Benzi and Tuma (1999) provide a comparative study of selected approximate inverses.

Approximate inverses as smoothers were first introduced in the FAPIN algorithm by Frederickson (1975) and then extended and analyzed by Benson and Frederickson (1982) and the references therein. The FAPIN approach presented in Frederickson (1975) is restricted to the constant-coefficient case. The FAPIN algorithm contained many elements of the multigrid iteration as introduced in Section 2.3.

Combining multigrid with ILU techniques as a smoother has been popular and proven effective since the mid-1980s. See Griebel (1992), Wesseling (1993), Saad and Zhang (1999), Hackbusch and Wittum (1993), and the references therein.

Recently Tang and Wan (2000) showed that direct approximate inverses based on Frobenius-norm minimization are effective as smoothers for various elliptic problems on unstructured grids.

Our Approach In this section we introduce direct sparse approximate inverses based on Frobenius-norm minimization as smoothers. The advantage of an explicitly computed inverse is twofold:

Parallelism: The Frobenius-norm minimization procedure makes the computation of the inverse matrix M parallel. The application of the inverse is naturally parallel, since it consists of a simple sparse matrix-vector multiplication.

Additional information: By computing an explicit inverse one obviously has random access to the elements of M. Thus additional information about the nature of the linear system A under investigation becomes available. Incorporating this information in the AMG algorithm provides potential for improving the robustness of the algebraic multigrid solver.

Both of these advantages are inherent to Frobenius-norm minimized direct approximate inverses. Factorized approximate inverses, in contrast, are associated with ordering dependence, sequential computational dependencies and implicit inversion.


3.1 Smoothing Idea, SPAI Hierarchy and Theory

Let M denote a sparse approximation of A⁻¹. The residual equation A e = r, with error e = x − x^(k) and residual r = b − A x^(k), motivates that a corresponding iterative method is given by

    x^(k+1) = x^(k) + M ( b − A x^(k) ).        (3.1)

Classical iterative methods Let A = W − T be a splitting of the original operator. It follows that

    A⁻¹ = (W − T)⁻¹ = ( I − W⁻¹ T )⁻¹ W⁻¹ ≈ W⁻¹.        (3.2)

For a given splitting A = W − T, Equation (3.2) shows that a corresponding iterative method is given by M = W⁻¹.

Let A = L + D + U, with D the diagonal, L the lower triangular part, and U the upper triangular part of A. Table 3.1 lists the corresponding M for some basic smoothers, such as the Jacobi iteration and the Gauss-Seidel iteration.

In multigrid, usually a few such standard iterations are sufficient to obtain satisfactory convergence. In many classical smoothers M is not computed explicitly; instead the inversion contained in M is achieved implicitly by procedures such as backward substitution.

Table 3.1 Basic iterative methods List of approximate inverses M, with A = L + D + U, where L, D and U are the lower triangular, the diagonal and the upper triangular part of A respectively. L̃, D̃ and Ũ are lower triangular, diagonal and upper triangular matrices with a prescribed sparsity pattern.

    damped Jacobi                    M = ω D⁻¹
    Gauss-Seidel lexicographic       M = (L + D)⁻¹  or  M = (D + U)⁻¹
    Gauss-Seidel forward/backward    M = (D + U)⁻¹ D (L + D)⁻¹  or  M = (L + D)⁻¹ D (D + U)⁻¹
    ILU                              M = Ũ⁻¹ D̃⁻¹ L̃⁻¹,  with A ≈ L̃ D̃ Ũ

In order for an iterative scheme described by Equation (3.1) to be an effective smoother, M must efficiently reduce high frequency error components, yet be sparse enough for an efficient computation of M and an efficient evaluation of each iterative step. As the approximate inverse M is known explicitly, each iteration step requires the computation of the residual and one additional matrix-vector multiply of M with the residual r. Thus an iteration step is easy to parallelize and cheap to evaluate, because M is sparse.
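In code, one step of the iteration (3.1) is nothing but a residual computation followed by one additional sparse matrix-vector product. A minimal sketch (A and M are sparse matrices, e.g. from SciPy; the function name is hypothetical):

    def smooth(A, M, x, b, steps=1):
        """Apply Equation (3.1): x <- x + M (b - A x), `steps` times."""
        for _ in range(steps):
            x = x + M @ (b - A @ x)
        return x

Both products are ordinary sparse matrix-vector multiplications, which is precisely the source of the parallelism discussed above.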

3.1.1 The SPAI Hierarchy

We are considering substantially simplified versions of the SPAI smoother (see Section 2.4), where the sparsity pattern of M is not adaptively computed, but rather given a priori. The sparsity pattern of the approximate inverse that is to be computed is coded in a sparse matrix P: the nonzeros of P specify the nonzeros that are to be computed for the approximate inverse. This approach is practical for notational purposes and can be used in programs as well. For computing an approximate inverse with a given sparsity pattern, one can create a matrix with the required sparsity pattern P and then pass P and A to a function that overwrites the values in P with the values of M, thus transforming P into M.

The Frobenius-norm minimization approach

    ‖MA − I‖²_F = Σ_{i=1}^n ‖ e_iᵀ (MA − I) ‖²₂ = Σ_{i=1}^n ‖ m_iᵀ A − e_iᵀ ‖²₂

yields a least squares problem for every row m_iᵀ of M. With the set J = { j : p_ij ≠ 0 } of nonzero indices in the i-th row of P, it is given by

    min_{m_i(J)} ‖ A(J, :)ᵀ m_i(J) − e_i ‖₂.

With Â = A(J, :) and m̂ = m_i(J) this problem can be solved using the normal equations

    ( Â Âᵀ ) m̂ = Â e_i.

This approach is implemented in Algorithm 3.1. M is always well-defined if A is nonsingular.

Algorithm 3.1 Spai algorithm This algorithm computes a sparse approximate inverse M that minimizes ‖MA − I‖_F for an M with the sparsity pattern of P, by solving the normal equations of the least squares problems.

 1  SpaiFixed(A, P)
 2    for i = 1 to n                     – loop over all rows of M –
 3      J := { j : p_ij ≠ 0 }            – indices of nonzeros in row i of P –
 4      Â := A(J, :)                     – select submatrix –
 5      m_i(J) := (Â Âᵀ)⁻¹ Â e_i         – solve normal equations –
 6    end for
 7    return M

Remark 3.1.1 (normal equations) Normal equations are often ill-conditioned. Alternatives to the normal equations are the QR decomposition or the singular value decomposition. The matrices of interest in computing SPAI inverses are all very small, and in practice using the normal equations has proven to be stable enough. Additionally, sparse factorizations are much more complicated to implement. □

Now the question is what sparsity pattern to choose. Demko, Moss, and Smith (1984) analyze the decay of the true inverse of band matrices away from the diagonal. They show that most entries in the exact inverse A⁻¹ are very small. For positive definite band matrices, in particular tridiagonal matrices, they show that the entries of A⁻¹ decay exponentially away from the diagonal. Later in this section we will show that approximate inverses resemble the decay of the discrete Green's function. All this motivates the use of directly computed approximate inverses with sparsity patterns based on powers of A or powers of the strong connections in A.

SPAI-0 For P and consequently M being a diagonal matrix one can calculate the entries of M directly. They are given by:

    m_ii = a_ii / Σ_j a_ij².        (3.3)

Due to this explicit formula one can derive several theoretical properties of the SPAI-0 smoother. We discuss these properties in Section 3.1.4.

SPAI-1 The sparsity pattern of M is that of A: SPAI-1 = SpaiFixed(A, A).

SPAI-1r The sparsity pattern of M is that of the strong connections in A: SPAI-1r = SpaiFixed(A, S(A)).


SPAI-k(r) A more general definition yields:

    SPAI-k  = SpaiFixed(A, Aᵏ),
    SPAI-kr = SpaiFixed(A, S(A)ᵏ).

This general definition explains the naming of the SPAI hierarchy. Generally speaking, the sparsity pattern of M should be as sparse as possible: obviously, the denser M, the more storage is required and the more computational time is spent on a matrix-vector multiply with M. In practice only SPAI-2r and sometimes SPAI-2 will be used.

SPAI(ε) The sparsity pattern of M is determined automatically by the SPAI algorithm (see Algorithm 2.10 on page 46).

Remark 3.1.2 (left and right inverse) In practice it is important to remember that the SPAI(ε) algorithm allows the computation of a left and a right approximate inverse, which minimize ‖MA − I‖_F and ‖AM − I‖_F respectively. For smoothing it is required to compute a left inverse. □

Clearly, when comparing the performance of various smoothers, we cannot limit ourselves to comparing the number of multigrid iterations, but must also take into account the additional computational work due to the smoother. To do so, we calculate the total density ratio c_M of nonzero entries in M to those in A:

    c_M = nnz(M) / nnz(A).

Hence the additional amount of work due to the smoother is proportional to c_M; since one smoothing step consists of a residual computation plus a multiplication with M, its cost is roughly proportional to (1 + c_M) nnz(A).

For a standard five-point stencil on a regular two-dimensional grid, c_SPAI-0 = 1/5, like damped Jacobi. Since c_SPAI-1 = 1, the SPAI-1 smoother is about 67% more expensive here than the SPAI-0 smoother (the cost ratio is (1 + 1)/(1 + 1/5) ≈ 1.67). For SPAI(ε), the total density ratio c_SPAI(ε) depends on ε: it increases monotonically with decreasing ε. We remark that c_SPAI(ε) < 1 whenever SPAI(ε) leads to a sparser approximate inverse than SPAI-1.

3.1.2 Why should SPAI be a good smoother?

Why do approximate inverses yield effective smoothers for problems which come from partial differential equations? As the mesh parameter h tends to zero, the solution of the linear system

    A_h x_h = b_h        (3.4)

tends to the solution of the underlying differential equation

    L u = f        (3.5)

with appropriate boundary conditions. Here the matrix A_h corresponds to a discrete version of the differential operator L. Let m*_i denote the i-th row of (A_h)⁻¹. It solves the linear system

    (A_h)ᵀ m*_i = e_i,        (3.6)

with e_i the i-th unit vector. As h → 0, m*_i tends to the Green's function G(x, y_i), which solves

    L* G(x, y_i) = δ(x − y_i).        (3.7)

Here L* denotes the adjoint differential operator and δ(x − y_i) the "delta-function" centered about y_i. To exhibit the correspondence between m*_i and G(x, y_i), we recall that L* is formally defined by the identity

    (L u, v) = (u, L* v)        (3.8)

for all u, v in appropriate function spaces. Equation (3.8) is the continuous counterpart to the relation

    (A_h x, y) = (x, (A_h)ᵀ y).        (3.9)

From (3.5), (3.7) and (3.8) we conclude that

    u(y_i) = (u, δ(· − y_i)) = (u, L* G(·, y_i)) = (L u, G(·, y_i)) = (f, G(·, y_i)).        (3.10)

Similarly, the combination of (3.4), (3.6), and (3.9) leads to the discrete counterpart of (3.10),

    x_h(i) = (x_h, e_i) = (x_h, (A_h)ᵀ m*_i) = (A_h x_h, m*_i) = (b_h, m*_i).        (3.11)

Comparison of (3.10) and (3.11) shows that m*_i corresponds to G(·, y_i) as h → 0.

The i-th row m_i of the approximate inverse M_h solves

    min ‖ m_iᵀ A_h − e_iᵀ ‖₂    over the fixed sparsity pattern of m_i.

Hence m_i approximates the i-th row of (A_h)⁻¹, that is m*_i in (3.6), in the (discrete) 2-norm for a fixed sparsity pattern of m_i. The nonzero entries of m_i usually lie in a neighborhood of y_i: they correspond to mesh points y_j close to y_i. Therefore, after an appropriate scaling in inverse powers of h, we see that m_i approximates G(x, y_i) locally in the (continuous) L²(Ω)-norm.

For (partial) differential operators, G(x, y_i) typically is singular at x = y_i and decays smoothly, but not necessarily rapidly, with increasing distance ‖x − y_i‖. Clearly the slower the decay, the denser M_h must be to approximate (A_h)⁻¹ well. This deficiency of sparse approximate inverse preconditioners was also pointed out by Tang and Wan (2000). At the same time, however, it suggests that sparse approximate inverses, obtained by the minimization of ‖MA − I‖ in the Frobenius norm, naturally yield smoothers for multigrid. Indeed, to be effective, a preconditioner must approximate (A_h)⁻¹ uniformly over the entire spectrum of A. In contrast, an effective smoother only needs to capture the high-frequency behavior of (A_h)⁻¹. Yet this high-frequency behavior corresponds to the singular, local behavior of G(x, y_i), precisely that which is approximated by m_i.

To illustrate this fact, we consider the standard five-point stencil of the discrete Laplacian on a 16 × 16 grid. In Figure 3.1 we compare A⁻¹ with the Gauss-Seidel approximate inverse (L + D)⁻¹ and two explicit approximate inverses, SPAI-1 and SPAI(0.2). We recall that Gauss-Seidel, a poor preconditioner for this problem, remains an excellent smoother, because it captures the high-frequency behavior of A⁻¹. Similarly, SPAI-1 and SPAI(ε) yield local operators with, as we shall see, good smoothing properties. Despite the resemblance between the Gauss-Seidel and the SPAI approximate inverses, we note the one-sidedness of the former, in contrast to the symmetry of the latter.

Figure 3.1 Discrete (approximate) Green's functions Row m_i of the exact and of different approximate inverses for Laplace's equation on a 16 × 16 grid: (a) row of A⁻¹, the exact inverse; (b) Gauss-Seidel; (c) SPAI-1; (d) SPAI(0.2). (Surface plots not reproduced.)

3.1.3 Experimental Local Fourier Analysis

Local Fourier Analysis (LFA) is a formal method for investigating two-grid convergence rates, see Chapter 4 in Trottenberg, Oosterlee, and Schuller (2001). In this section we experimentally compute approximations to the asymptotic quantities that would be derived in a formal LFA. Even for small gridsizes, like 64 × 64 points, the experimentally computed values present a good approximation of the asymptotic values found in the literature.




We consider the continuous and periodic functions

    φ^(k,l)(x, y) = sin(kπx) sin(lπy).

If the unknowns on the n × n standard grid are given by x_ij = (ih, jh), we evaluate the continuous functions on the grid points and store the values in a vector φ^(k,l):

    φ^(k,l)_ij = φ^(k,l)(ih, jh) = sin(kπih) sin(lπjh).

We refer to the φ^(k,l) as Fourier components. They coincide with the eigenvectors of the discrete Laplacian with zero Dirichlet boundary conditions. They consequently form a basis of R^(n·n). In other words, with real coefficients α_kl any grid function can be expressed by

    e = Σ_{k,l=1}^{n} α_kl φ^(k,l).

We split the Fourier basis into a high- and a low-frequency part by separating the frequencies as follows:

    E_low  = span{ φ^(k,l) : max(k, l) < n/2 },
    E_high = span{ φ^(k,l) : φ^(k,l) ∉ E_low }.

The error smoothing effect can now be analyzed by looking at the reduction of high frequency error components. We investigate this reduction by looking at the reduction of an error that contains all frequencies of the high oscillation space. We therefore choose the initial error

    e^(0) = Σ_{φ^(k,l) ∈ E_high} φ^(k,l),

which is equivalent to using the coefficients

    α_kl = 1 if max(k, l) ≥ n/2,    α_kl = 0 otherwise.

After a step of any standard iteration we obtain an iterated error ē. Analyzing ē in the Fourier representation with coefficients ᾱ_kl, we compute the experimentally observed smoothing factor μ, which we define to be

    μ = max{ |ᾱ_kl| : φ^(k,l) ∈ E_high }.

A formal LFA would yield the asymptotic value of μ. Thus the experimentally observed values of μ present a lower bound for the asymptotic smoothing factor.
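The construction above can be coded directly. The sketch below measures the experimental smoothing factor of ω-Jacobi for the 2-D Dirichlet Laplacian (NumPy/SciPy assumed; the sine transform is carried out by explicit matrix products, and for the optimal ω = 4/5 the result is close to the asymptotic LFA value 0.6):

    import numpy as np
    import scipy.sparse as sp

    def smoothing_factor(n, omega=0.8):
        """Experimentally observed smoothing factor of omega-Jacobi."""
        h = 1.0 / (n + 1)
        T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
        A = (sp.kronsum(T, T) / h**2).tocsr()       # 2-D Laplacian
        k = np.arange(1, n + 1)
        V = np.sin(np.outer(k * np.pi, k * h))      # V[k-1, i-1] = sin(k pi x_i)
        high = np.maximum.outer(k, k) >= n // 2     # high-frequency (k, l) pairs
        alpha = high.astype(float)                  # initial coefficients
        e = (V.T @ alpha @ V).reshape(-1)           # e = sum alpha_kl phi^(k,l)
        e1 = e - omega * (A @ e) / A.diagonal()     # one damped Jacobi sweep
        c = 2.0 / (n + 1)                           # inverse sine-transform scaling
        alpha1 = c * c * (V @ e1.reshape(n, n) @ V.T)
        return np.abs(alpha1[high]).max()

    print(smoothing_factor(64))                     # ~0.6 for omega = 4/5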


Figure 3.2 Fourier coefficients Smoothing analysis for the Laplacian and different smoothers. We plot the coefficients of the error ē after one iteration of various smoothers: (a) initial error e^(0); (b) Jacobi; (c) ω-Jacobi; (d) Gauss-Seidel; (e) SPAI-0; (f) SPAI-1; (g) SPAI-2; (h) SPAI(0.2). (Coefficient plots not reproduced.)

Figure 3.2 shows graphs of the coefficients ᾱ_kl for different smoothers. In Figure 3.2(a) we plot the initial error. The L-shape shows that all coefficients corresponding to low frequencies are zero. Subfigure (b) shows a typical result for the undamped Jacobi iteration.


In the very high frequency part of the domain, represented by the lower right-hand corner of the region, essentially no smoothing is observed. In contrast, the damped Jacobi iteration smoothes, as can be seen by comparing (b) and (c). Subfigure (e) shows that damped Jacobi and SPAI-0 are essentially equivalent for this problem. Gauss-Seidel smoothes without damping, as can be seen in Figure 3.2(d). Subfigures (f)-(h) show the smoothing efficiency of different smoothers based on approximate inverses. The plots show that the smoothing effect improves as the sparsity pattern of the approximate inverse is enlarged.

3.1.4 Theoretical properties of SPAI-0

The approximation property from Equation (2.16) is independent of the smoother. It depends only on the discretization, the prolongation operator P, and the restriction operator R. For instance, with full elliptic regularity, canonical prolongation and restriction, and Galerkin discretization (Equation (2.12)), the approximation property holds for a stable finite element discretization. As a consequence, in the remainder of this section we investigate the smoothing property of the SPAI-0 smoother only.

Periodic Laplacian We first point out a very special situation where SPAI-0 and damped Jacobi, with optimal relaxation parameter ω*, lead to identical smoothers.

Theorem 1 (ω-Jacobi and SPAI-0 coincide for periodic Laplacian) Consider Laplace's equation with periodic boundary conditions in d space dimensions:

    −Δu = f   in Ω = (0, 1)^d.        (3.12)

Let A result from the standard second-order finite difference discretization on a regular mesh. Then SPAI-0 and Jacobi smoothing with optimal damping parameter ω* are identical. □

In this special situation, the minimization of ‖MA − I‖ in the Frobenius norm automatically leads to the same diagonal approximate inverse as damped Jacobi, that is M = ω* D⁻¹. This is quite remarkable, because SPAI-0 is parameter-free, whereas ω* is specifically geared towards optimal smoothing. Hence the Frobenius norm appears as the natural norm to measure the quality of the smoother.

Proof We show the proof in detail for d = 1, and then briefly outline the proof for arbitrary d.

1-D CASE (d = 1): We discretize (3.12) on Ω = (0, 1) with n equispaced grid points x_i = ih, i = 1, …, n, h = 1/n. Then standard second-order finite difference approximation leads, up to the scaling factor 1/h² (which is immaterial in what follows), to the n × n circulant matrix

    A_1-D = tridiag(−1, 2, −1),   with corner entries a_1n = a_n1 = −1,

with eigenvalues

    λ_k = 2 − 2 cos(2πkh) = 4 sin²(πkh),    k = 0, …, n − 1.

The eigenvalues appear in increasing order, λ_k ≤ λ_{k+1} for k ≤ n/2, with the high frequencies located in the range k ≥ n/4. The eigenvalues of the Jacobi smoother S, defined in (2.8), are

    μ_k = 1 − ω λ_k / 2 = 1 − 2ω sin²(πkh).

Each μ_k determines the amplification factor of the k-th Fourier component after one smoothing step with S. For 0 < ω ≤ 1, the eigenvalues μ_k in the high frequency range k ≥ n/4 are both positive and negative. Since they decrease monotonically with k, the optimal value ω*, which minimizes max_{k ≥ n/4} |μ_k|, realizes a high frequency spectrum which is symmetric about the origin. Therefore ω* solves

    1 − ω = −(1 − 2ω),

which yields ω* = 2/3 for d = 1. Since a_ii = 2 and Σ_j a_ij² = 6 for all i, the approximate inverse M of SPAI-0 has the constant diagonal entry m_ii = 2/6 = 1/3 = ω*/2. Therefore SPAI-0 and damped Jacobi coincide, since M = ω* D⁻¹.

GENERAL CASE (d > 1): In d dimensions the discrete Laplacian A_d-D can be written as a sum of Kronecker products of A_1-D with the identity. For instance, in two and three space dimensions we have

    A_2-D = A_1-D ⊗ I + I ⊗ A_1-D,
    A_3-D = A_1-D ⊗ I ⊗ I + I ⊗ A_1-D ⊗ I + I ⊗ I ⊗ A_1-D.

As a consequence the eigenvalues are sums of the eigenvalues of the one-dimensional case. For instance, in two dimensions

    λ_k(A_2-D) = λ_{k₁}(A_1-D) + λ_{k₂}(A_1-D),    k = (k₁, k₂).

Again the eigenvalues increase with increasing k: λ_k ≤ λ_l if k ≤ l, where k and l are d-multi-indices in {0, …, n − 1}^d. Here the high frequencies correspond to multi-indices k = (k₁, …, k_d) with at least one k_j ≥ n/4. The monotonicity of the λ_k implies that of the μ_k, the eigenvalues of S. Thus we obtain ω* by equating the two extreme values of μ on the high frequency range, attained at k = (n/4, 0, …, 0) and at k = (n/2, …, n/2), respectively:

    1 − ω/d = −(1 − 2ω).

This yields

    ω* = 2d / (2d + 1).

Since a_ii = 2d and Σ_j a_ij² = 4d² + 2d for all i, the approximate inverse M of SPAI-0 has the constant diagonal entry m_ii = 2d/(4d² + 2d) = 1/(2d + 1) = ω*/(2d). Therefore SPAI-0 and damped Jacobi coincide, since M = ω* D⁻¹. □

In this special situation, the parameter-free SPAI-0 smoother automatically yields the scaling ω* D⁻¹ which minimizes the smoothing factor; in that sense it is optimal.

General elliptic operator In Broker, Grote, Mayer, and Reusken (2000) we analyze SPAI-0 smoothing for elliptic differential operators of the form

    L u = −a u_xx − 2b u_xy − c u_yy,

where a, b, and c are constant and a c > b². We compare the smoothing performance to the optimal Jacobi smoothing by Yavneh and Olvovsky (1998). Whenever damped Jacobi is a good smoother, SPAI-0 typically provides at least 80%-90% smoothing efficiency with respect to optimally damped Jacobi smoothing for second-order finite difference discretizations of elliptic differential operators.

More general cases We prove that for SPAI-0 the smoothing property holds under reasonable assumptions on the matrix A. More precisely, for A symmetric and positive definite, we prove that SPAI-0 satisfies the smoothing property either if A is weakly diagonally dominant, or if A has at most seven nonzero off-diagonal entries per row. To our knowledge this is the first fairly general theoretical result on the smoothing property of iterative methods that are based on sparse approximate inverses. Previously, Tang and Wan (2000) analyzed the smoothing property of sparse approximate inverse smoothers for boundary value problems with constant coefficients on a two-dimensional regular grid. From a comparison of the SPAI-0 and damped Jacobi smoothers via numerical experiments, we conclude that the parameter-free SPAI-0 smoother is usually preferable to the damped Jacobi method.

Theorem 2 (SPAI-0 is a smoother for the SPD case) Let A be symmetric positive definite (SPD), and let S = I − MA, with M the SPAI-0 approximate inverse given by Equation (3.3). Assume that the maximal number of nonzero off-diagonal entries in each row of A is less than or equal to 7. Then S satisfies the smoothing property (Equation (2.15)). □

Proof See Broker, Grote, Mayer, and Reusken (2000) for the proof, the details, and the fact that the estimate of 7 nonzero entries is sharp. □

Next, we show that if A is weakly diagonally dominant, that is

    Σ_{j ≠ i} |a_ij| ≤ a_ii    for all i,

the number of nonzero entries is immaterial, and we thus obtain another criterion for the smoothing property.

Theorem 3 (SPAI-0 is a smoother for the diagonally dominant case) Let A be symmetric, positive definite, and weakly diagonally dominant. Furthermore, let S = I − MA, with M the SPAI-0 smoother given by Equation (3.3), and assume in addition a mild technical condition on the entries of MA (see the reference below for its precise form). Then S satisfies the smoothing property (Equation (2.15)). □

Proof See Broker, Grote, Mayer, and Reusken (2000) for the proof and the details. □

Note that the above condition is satisfied for matrices resulting from the FEM with piecewise linear elements on triangles with no angle larger than π/2.

3.2 Numerical Experiments for Standard Problems

In this section we verify the theoretical results about SPAI smoothing experimentally. We report convergence rates and timings generated with our WolfAMG code described in Chapter 7. The code is freely available for download1 and each table with results indicates the corresponding test file. Unless specified differently, the standard settings for the tests are given in Table 2.2 on page 44.

1see http://www.inf.ethz.ch/personal/broeker/wolfamg


Table 3.2 MG for Laplace Operator complexities c_A, approximate inverse complexities c_M and convergence rates ρ for solving the Laplace problem. We compare different smoothers for geometric multigrid with standard coarsening and interpolation. [➠exp 001b.py]

                   n = 131          n = 259          n = 515
                   (c_A = 1.6)      (c_A = 1.6)      (c_A = 1.6)
                   c_M    ρ         c_M    ρ         c_M    ρ
    ω-Jacobi       —      0.009     —      0.008     —      0.008
    Gauss-Seidel   —      0.002     —      0.002     —      0.002
    SPAI-0         0.3    0.007     0.3    0.007     0.3    0.007
    SPAI(0.35)     1.1    0.005     1.1    0.005     1.1    0.006
    SPAI-1         1.6    0.001     1.6    0.001     1.6    0.001
    SPAI(0.25)     2.3    0.001     2.4    0.001     2.4    0.001

3.2.1 Poisson problem

We first consider the Poisson problem on the unit square with zero Dirichlet boundary conditions. We discretize the problem with the FDM on the standard grid. We test the problem for geometric and algebraic coarsening, as well as for regular and unstructured grids.

Experiment 2 (MG for Laplacian) This experiment tests geometric multigrid with standard coarsening and interpolation. In Table 3.2 we compare the convergence rates obtained with various smoothers.

All smoothers, including the SPAI smoothers, lead to an h-independent convergence rate. The smoothing performance of SPAI-0 is comparable to the smoothing efficiency of ω-Jacobi, while SPAI-1 is comparable to Gauss-Seidel. We observe a steady decrease of the convergence rate ρ for smaller values of ε, paralleled, of course, by an increase in c_M. Note that SPAI-1 leads to a more effective but denser smoother than SPAI(0.35), yet the situation is reversed as ε is decreased to 0.25. □

Experiment 3 (AMG for Laplacian) We now execute the same test as the previous one, but with the standard algebraic coarsening strategy. The results in Table 3.3 also show ideal multigrid convergence rates for all smoothers under consideration. For this problem SPAI(0.35), SPAI-1 and SPAI(0.25) are identical and thus yield identical convergence rates.

We additionally report timings for our implementation of AMG with SPAI inverses in Table 3.4. We list the setup time t_sup and the solver time t_sol. The most important observation is that the solver time for smoothing with SPAI smoothers is comparable to classical smoothers. Another important observation is that AMG has a setup time that cannot be neglected.


Table 3.3 AMG for Laplace (ρ) Operator complexities c_A, approximate inverse complexities c_M and convergence rates ρ for solving the Laplace problem. We compare different smoothers for algebraic multigrid. [➠exp 001c.py]

                   n = 128          n = 256          n = 512
                   (c_A = 2.2)      (c_A = 2.2)      (c_A = 2.2)
                   c_M    ρ         c_M    ρ         c_M    ρ
    ω-Jacobi       —      0.031     —      0.032     —      0.036
    Gauss-Seidel   —      0.006     —      0.007     —      0.010
    SPAI-0         0.3    0.030     0.3    0.032     0.3    0.036
    SPAI(0.35)     2.2    0.005     2.2    0.006     2.2    0.008
    SPAI-1         2.2    0.005     2.2    0.006     2.2    0.008
    SPAI(0.25)     2.2    0.005     2.2    0.006     2.2    0.008

Table 3.4 AMG for Laplace (t) Setup time t_sup and solve time t_sol for solving the Laplace problem. We compare different smoothers for algebraic multigrid. [➠exp 001c.py]

                   n = 128              n = 256              n = 512
                   t_sup[s]  t_sol[s]   t_sup[s]  t_sol[s]   t_sup[s]  t_sol[s]
    ω-Jacobi       0.3       0.2        1.3       0.9        5.9       3.4
    Gauss-Seidel   0.3       0.1        1.3       0.6        6.1       2.2
    SPAI-0         0.4       0.3        1.7       1.1        7.0       4.2
    SPAI(0.35)     0.6       0.2        2.7       0.9        11.8      3.5
    SPAI-1         0.7       0.2        2.6       0.9        11.0      3.5
    SPAI(0.25)     0.6       0.2        2.6       0.9        11.0      3.5

Obviously, the computation of an approximate inverse on every level adds to the setup time. Note that this problem is a worst case test for an AMG code, because the relative time spent in the solver routine is very small due to the fast convergence. One can read from the table that the setup time doubles from Gauss-Seidel smoothing to SPAI-1 smoothing. We have spent more time optimizing the V-cycle performance rather than the setup. For solving a system with multiple right-hand sides the setup time is, of course, immaterial.

The timings also show that the setup of the SPAI(ε) algorithm in our implementation does not scale linearly with the problem size. Unfortunately, for practical reasons, we have chosen an inefficient interface to an existing SPAI code. □

Experiment 4 (MG for Laplacian on unstructured grid) We now turn to investigating unstructured grids and irregular domains for the Poisson problem. We have used the FEM for discretizing the problem, as described in Section 2.1.2.


Figure 3.3 Unstructured grid Triangulation and solution of the Poisson equation on a domain shaped like a cross-section of the human pelvis. (Mesh and solution plots not reproduced.)

We have chosen the cross-section of a human pelvis as the domain, because it seems neither artificially complicated nor of too simple a nature, as e.g. L-shaped domains. Additionally, the boundary cannot easily be represented nicely with only 100 points, which is exactly our limit n_min for the size of the coarsest system. This justifies even more the use of an algebraic coarsening scheme. The triangulation of the domain and the solution of the resulting linear system can be seen in Figure 3.3.

Convergence results for various smoothers can be seen in Table 3.5. Overall the convergence results are not as ideal as with regular grids. Recall that the coarsening heuristic produces perfect coarse grids for regular meshes. The table does not clearly indicate the asymptotic convergence rate, but the growth of ρ slows down with increasing grid size. Note that the gridsizes are not linearly spaced, but differ by a factor of four from column to column, due to the regular refinement of the underlying triangular mesh. For the largest problem the best convergence rate of 0.22 is achieved using the SPAI-1 smoother. In this example the Jacobi iteration can again be compared to SPAI-0, and Gauss-Seidel to SPAI-1. One can observe that a denser SPAI smoother yields a better convergence rate.

Table 3.6 lists the corresponding setup and solve times. For the smaller problems the setup time still approximately equals the solve time. For the large problem, setup and solve time are of the same order of magnitude, except for the dynamic SPAI(ε) smoothers. Note that the solve times scale roughly linearly with the gridsize. Within this sequential setting the Gauss-Seidel iteration yields the fastest AMG method.


Table 3.5 AMG for unstructured Laplace (ρ) Operator complexities c_A, approximate inverse complexities c_M and convergence rates ρ for solving the Laplace problem on an unstructured grid. We compare different smoothers for algebraic multigrid. [➠exp 006a.py]

                   n = 5755         n = 23813        n = 96841
                   (c_A = 1.7)      (c_A = 1.8)      (c_A = 1.9)
                   c_M    ρ         c_M    ρ         c_M    ρ
    ω-Jacobi       —      0.17      —      0.29      —      0.36
    Gauss-Seidel   —      0.09      —      0.19      —      0.26
    SPAI-0         0.2    0.16      0.2    0.28      0.2    0.35
    SPAI(0.35)     0.9    0.11      0.9    0.23      1.0    0.30
    SPAI-1         1.7    0.07      1.8    0.16      1.9    0.22
    SPAI(0.25)     1.9    0.07      2.0    0.16      2.1    0.23

Table 3.6 AMG for unstructured Laplace (t) Setup time t_sup and solve time t_sol for solving the Laplace problem on an unstructured grid. We compare different smoothers for algebraic multigrid. [➠exp 006a.py]

                   n = 5755             n = 23813            n = 96841
                   t_sup[s]  t_sol[s]   t_sup[s]  t_sol[s]   t_sup[s]  t_sol[s]
    ω-Jacobi       0.1       0.1        0.6       0.9        2.3       4.3
    Gauss-Seidel   0.1       0.1        0.5       0.6        2.4       3.0
    SPAI-0         0.1       0.2        0.6       1.0        2.8       4.9
    SPAI(0.35)     1.2       0.1        7.0       1.0        293.4     5.1
    SPAI-1         0.3       0.1        1.5       0.9        6.9       4.7
    SPAI(0.25)     2.2       0.1        15.1      0.9        601.1     4.8

SPAI smoothers are inherently parallel and their sequential execution times are comparable to the Gauss-Seidel iteration. For this problem the fastest parallel smoother is the Jacobi iteration. □

3.2.2 Locally Anisotropic diffusion

In our next series of experiments we consider the locally anisotropic diffusion problem

    −a(x, y) u_xx − u_yy = f   on the unit square,

with

    a(x, y) = ε   if (x, y) ∈ (1/4, 3/4)²,    a(x, y) = 1   otherwise.


Table 3.7 MG for locally anisotropic Laplace Operator complexities c_A, approximate inverse complexities c_M and convergence rates ρ for solving the locally anisotropic Laplace problem on a 131 × 131 grid. We compare different smoothers for geometric multigrid with standard coarsening and interpolation. [➠exp 003a.py]

                   ε = 1          ε = 10⁻¹       ε = 10⁻²       ε = 10⁻³
                   (c_A = 1.6)    (c_A = 1.6)    (c_A = 1.6)    (c_A = 1.6)
                   c_M   ρ        c_M   ρ        c_M   ρ        c_M   ρ
    ω-Jacobi       —     0.01     —     0.26     —     0.90     —     0.97
    Gauss-Seidel   —     0.00     —     0.28     —     0.81     —     0.95
    SPAI-0         0.3   0.01     0.3   0.31     0.3   0.91     0.3   0.97
    SPAI-1         1.6   0.00     1.6   0.09     1.6   0.81     1.6   0.95
    SPAI(0.7)      0.3   0.01     0.3   0.31     0.3   0.91     0.3   0.97
    SPAI(0.6)      0.3   0.01     0.3   0.31     0.3   0.92     0.3   0.97
    SPAI(0.5)      0.3   0.01     0.6   1.22     0.6   0.76     0.6   0.92


We have already briefly discussed this equation as a motivation for AMG in Section 2.3.3. Classical smoothers yield unidirectional error smoothing in the direction of strong coupling within the area of anisotropic diffusion, see Figure 2.4. Error interpolation must be in the direction of smooth error components, which is the reason for using an algebraic coarsening and interpolation scheme.

Experiment 5 (MG for anisotropic Laplacian) We apply standard coarsening and interpolation to the anisotropic Poisson problem. As indicated above, the multigrid iteration results in a very slowly converging scheme when standard smoothers are used, see Table 3.7. The same is true for the fixed sparsity pattern approximate inverse smoothers SPAI-0 and SPAI-1. Yet the smaller ε for the SPAI(ε) smoother is, the better is the convergence rate ρ. For larger gridsizes this effect cannot be exploited, and for strong anisotropies the SPAI(ε) convergence rates come very close to one. □

In Figure 3.4 we compare rows of the true inverse A⁻¹ with the corresponding rows of the approximate inverse M = SPAI(0.25). We consider a row corresponding to a grid point in the center of the inner region, where a = ε, and a row corresponding to a grid point inside the surrounding region, where a = 1. We observe how the approximate inverse, computed by the SPAI-Algorithm, captures the distinct local features of the true inverse. We recall that the sparsity pattern of M is not fixed a priori, but adapted automatically by the SPAI-Algorithm. The SPAI-1 smoother is denser than the SPAI(0.4) smoother, but yields an inferior convergence rate. This clearly shows that the dynamic sparsity pattern is advantageous for the smoothing efficiency in this example.

These results demonstrate the usefulness of SPAI smoothing. When the area with anisotropic behavior is small, the search of the SPAI(ε) algorithm automatically yields an effective sparsity pattern. Yet when the anisotropy is strong and present in a large region, the pattern becomes too large and thus computationally costly. The major problem of unidirectional error smoothing is solved by the algebraic scheme in AMG, which we will use in our next experiment.

Experiment 6 (AMG for anisotropic Laplacian) Solving the locally anisotropic Laplace equation with algebraic coarsening and interpolation yields a multigrid method that is h-independent and ε-independent for all smoothers investigated. This can be verified in Table 3.8 and Table 3.9 respectively.

The conclusions that can be drawn from the convergence rates are similar to those for the Poisson problem on regular grids: SPAI smoothing works, the correspondence between Jacobi/SPAI-0 and Gauss-Seidel/SPAI-1 holds, and the density/convergence relationships persist. Tables 3.8 and 3.9 show that the resulting method is robust with respect to the gridsize as well as the anisotropy ε for all smoothers. Overall, SPAI smoothers yield acceptable densities c_A and c_M. □


Figure 3.4 Inverse comparison for anisotropic Laplacian Comparison of rows of the true inverse A⁻¹ for the locally anisotropic Poisson problem with the corresponding rows of the approximate inverse M = SPAI(0.25). One row corresponds to a grid point in the outer isotropic region, the other to a grid point in the inner anisotropic region: (a) row of A⁻¹, outer point; (b) SPAI(0.25), outer point; (c) row of A⁻¹, inner point; (d) SPAI(0.25), inner point. (Surface plots not reproduced.)


Table 3.8 AMG for locally anisotropic Laplace (ε) Operator complexities c_A, approximate inverse complexities c_M and convergence rates ρ for solving the locally anisotropic Laplace problem on a 512 × 512 grid. We compare different smoothers for algebraic multigrid. [➠exp 003b.py]

                   ε = 1          ε = 10⁻¹       ε = 10⁻²       ε = 10⁻³
                   (c_A = 2.2)    (c_A = 2.3)    (c_A = 2.3)    (c_A = 2.3)
                   c_M   ρ        c_M   ρ        c_M   ρ        c_M   ρ
    ω-Jacobi       —     0.04     —     0.22     —     0.28     —     0.28
    Gauss-Seidel   —     0.01     —     0.15     —     0.24     —     0.24
    SPAI-0         0.3   0.04     0.3   0.22     0.3   0.31     0.3   0.31
    SPAI-1         2.2   0.01     2.3   0.16     2.3   0.26     2.3   0.26
    SPAI-1r        2.2   0.01     1.9   0.14     1.9   0.20     1.9   0.20
    SPAI(0.6)      0.3   0.04     0.3   0.22     0.3   0.31     0.3   0.31
    SPAI(0.5)      0.3   0.04     0.7   0.16     0.7   0.12     0.7   0.12
    SPAI(0.4)      1.1   0.01     1.3   0.14     1.3   0.12     1.3   0.12

Table 3.9 AMG for locally anisotropic Laplace (n) Operator complexities c_A, approximate inverse complexities c_M and convergence rates ρ for solving the locally anisotropic Laplace problem at a fixed strong anisotropy ε. We compare different smoothers with algebraic multigrid. [➠exp 003c.py]

                   n = 128        n = 256        n = 512
                   (c_A = 2.3)    (c_A = 2.3)    (c_A = 2.3)
                   c_M   ρ        c_M   ρ        c_M   ρ
    ω-Jacobi       —     0.15     —     0.21     —     0.28
    Gauss-Seidel   —     0.08     —     0.14     —     0.24
    SPAI-0         0.3   0.16     0.3   0.23     0.3   0.31
    SPAI-1         2.3   0.07     2.3   0.15     2.3   0.26
    SPAI-1r        1.9   0.06     1.9   0.12     1.9   0.20
    SPAI(0.6)      0.3   0.16     0.3   0.23     0.3   0.31
    SPAI(0.5)      0.7   0.08     0.7   0.12     0.7   0.12
    SPAI(0.4)      1.3   0.07     1.3   0.11     1.3   0.12


3.2.3 Constant convection

We now consider the convection–diffusion equation

    −ε Δu(x, y) + b(x, y) · ∇u(x, y) = f(x, y)        (3.13)

on the unit square, where u vanishes on the boundary. Here u represents any scalar quantity advected by the flow field b. For convection dominated flow, ε ≪ ‖b‖, the linear systems cease to be symmetric and positive definite, so that these problems lie outside of classical multigrid theory. We use centered second-order finite differences for the diffusion, but discretize the convection with first-order upwinding to ensure numerical stability, as described in Section 2.1.1. We are especially interested in convection dominated cases (small ε). AMG converges well in the diffusion dominated cases, because the problem is essentially a small perturbation of the Laplacian.

First, we consider a situation of unidirectional flow with angle α from the x-axis, that is, with constant flow direction b = (cos α, sin α)ᵀ. We analyze the smoothing efficiency of classical and SPAI smoothers for different angles α by explicitly plotting the error on a 30 × 30 grid after 10 iterations of the various smoothing iterations. Solutions to the problem can be seen in Figure 3.5.
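To make the discretization concrete, the following sketch assembles such a system in Python with SciPy (the function name and its interface are our own choices):

    import numpy as np
    import scipy.sparse as sp

    def convection_diffusion(n, eps, alpha):
        # -eps * Laplacian(u) + b . grad(u) on the n x n interior grid, h = 1/(n+1);
        # centered differences for the diffusion, first-order upwinding for the
        # constant convection b = (cos(alpha), sin(alpha))
        h = 1.0 / (n + 1)
        bx, by = np.cos(alpha), np.sin(alpha)
        I = sp.identity(n)
        D2 = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
        def upwind(b):
            if b >= 0:   # positive flow direction: backward difference
                return b * sp.diags([-1.0, 1.0], [-1, 0], shape=(n, n)) / h
            return b * sp.diags([-1.0, 1.0], [0, 1], shape=(n, n)) / h
        # lexicographic ordering of the unknowns, x running fastest
        return (eps * (sp.kron(I, D2) + sp.kron(D2, I))
                + sp.kron(I, upwind(bx)) + sp.kron(upwind(by), I))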

Figure 3.5 Solutions to the constant convection problem. Panels: (a) α = 45°, (b) α = 135°, (c) α = 225°. [➠exp 007p.py]

The Gauss-Seidel iteration is obviously ordering dependent, as most components in an iteration are very likely to use components computed in that same iteration. We use three standard orderings: GS(for) for lexicographic ordering of the unknowns from 1 to n, GS(bak) for the reverse ordering from n to 1, and GS(sym), which alternates GS(for) and GS(bak) in every other iteration. Figure 3.6 shows that GS(sym) smoothing is needed to obtain geometric smoothness of the error after 10 iterations for all values of α.
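Written out, the three sweeps look as follows (a minimal sketch using SciPy triangular solves; the splitting-based formulation is equivalent to the usual in-place loop):

    from scipy.sparse import tril, triu
    from scipy.sparse.linalg import spsolve_triangular

    def gs_for(A, x, b):
        # forward sweep: solve (D + L) x_new = b - U x
        L = tril(A, 0, format="csr")    # diagonal plus strictly lower part
        U = triu(A, 1, format="csr")    # strictly upper part
        return spsolve_triangular(L, b - U @ x, lower=True)

    def gs_bak(A, x, b):
        # backward sweep: solve (D + U) x_new = b - L x
        U = triu(A, 0, format="csr")
        L = tril(A, -1, format="csr")
        return spsolve_triangular(U, b - L @ x, lower=False)

    def gs_sym(A, x, b):
        # GS(sym): a forward sweep followed by a backward sweep
        return gs_bak(A, gs_for(A, x, b), b)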

For single direction Gauss-Seidel iterations the resulting smoothness varies from very smooth to no smoothing at all. A simple analysis of the system matrices explains this effect. In the limit ε → 0 the matrices for the angles α = 45° and α = 225° are upper triangular and lower triangular, respectively. GS(for) is an exact solver for a lower triangular matrix, while for an upper triangular matrix it coincides with the Jacobi iteration and provides hardly any smoothing.

Applying the symmetric variant GS(sym) to the problem yields an α-independent convergence rate. Parallelizing GS(sym) is at least as difficult as parallelizing GS(for) and GS(bak): in special regular cases a red-black ordering of the unknowns easily parallelizes the Gauss-Seidel iteration, yet in the general sparse unstructured case the Gauss-Seidel iteration is difficult to execute efficiently in parallel.

The SPAI smoothers, in contrast, are inherently parallel, and Figure 3.7 shows that the SPAI-0 and even more so the SPAI(0.5) smoothers are α-independent smoothers. To reiterate: the parallelism in the SPAI smoother is due to the Frobenius norm minimization, which separates the minimization problem into independent minimizations for each row, and to the fact that a smoothing step requires only the computation of the residual and the multiplication of the sparse approximate inverse with that residual.
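In code, one smoothing step is nothing more than a residual computation followed by a second sparse matrix-vector product (sketch; M is the precomputed sparse approximate inverse):

    def spai_smooth(A, M, x, b, steps=1):
        # SPAI smoothing: x <- x + M (b - A x); both operations are sparse
        # matrix-vector products and therefore parallelize row by row
        for _ in range(steps):
            x = x + M @ (b - A @ x)
        return x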

We verify our results about the smoothing efficiency of the different smoothers with the following experiment.

Experiment 7 (AMG for constant convection problem) We test AMG on the constant convection problem for a strongly convection dominated case and the aforementioned special angles α = 45°, α = 135°, and α = 225°. Table 3.10 on page 74 lists the convergence rates for different smoothers. The ordering dependence of the GS(for) and GS(bak) smoothers becomes obvious: while both exhibit good convergence rates for α = 135°, each of them deteriorates badly for one of the two remaining flow directions. The GS(sym) smoother converges with excellent convergence rates, as expected from the previous analysis. Both diagonal type smoothers, the ω-Jacobi smoother and the SPAI-0 smoother, also converge with excellent convergence rates, independent of the flow direction, with the rates of SPAI-0 superior for all directions. Especially the SPAI(0.5) smoother yields excellent convergence rates at little extra cost for storing the inverse. Again the heuristic search pattern for nonzero entries proves to find good entries for an efficient AMG smoother. □


Figure 3.6 Gauss-Seidel on the unidirectional convection problem. We plot the error after 10 iterations of different versions of the Gauss-Seidel iteration on a 30 × 30 grid discretization of the unidirectional convection problem (convection dominated) with different angles α of the convection. Panels, one row per angle α = 45°, 135°, 225°: GS(for), GS(sym), GS(bak). [➠exp 007p.py]


Figure 3.7 SPAI smoothers on the unidirectional convection problem. We plot the error after 10 iterations of different SPAI smoothers on a 30 × 30 grid discretization of the unidirectional convection problem (convection dominated) with different angles α of the convection. Panels, one row per angle α = 45°, 135°, 225°: SPAI-0, SPAI-1, SPAI(0.5). [➠exp 007p.py]


Table 3.10 AMG for the unidirectional convection problem. Operator complexities c_A, approximate inverse complexities c_M and convergence rates q for solving the convection dominated unidirectional convection problem on a 64 × 64 grid. We compare different smoothers for algebraic multigrid. [➠exp 007a.py]

  α             45°            90°            135°           180°           225°
            (c_A = 4.1)    (c_A = 2.1)    (c_A = 4.1)    (c_A = 2.3)    (c_A = 4.1)
             c_M    q       c_M    q       c_M    q       c_M    q       c_M    q
  GS(for)     —    0.00      —    0.00      —    0.01      —    0.31      —    0.43
  GS(bak)     —    0.37      —    0.34      —    0.02      —    0.00      —    0.00
  GS(sym)     —    0.00      —    0.00      —    0.01      —    0.00      —    0.00
  ω-Jacobi    —    0.10      —    0.11      —    0.12      —    0.13      —    0.12
  SPAI-0     0.4   0.08     0.4   0.16     0.4   0.10     0.4   0.17     0.4   0.09
  SPAI-1     4.0   0.03     2.1   0.04     4.0   0.04     2.3   0.04     4.0   0.04
  SPAI(0.5)  1.4   0.05     1.1   0.06     1.4   0.06     1.1   0.07     1.3   0.07

3.2.4 Constant Convection with Shishkin Mesh

Figure 3.5(b) shows that the constant convection problem from the previous section exhibits a boundary layer along the left and upper edge of the domain for α = 135°. One may want to resolve the boundary layer. We therefore choose a tensor mesh that distributes half of the points over the boundary layer and the other half over the rest of the region. The solution can be seen in Figure 3.8. We plot the solution of the problem on a standard mesh in Figure 3.8(a): clearly, on the 30 × 30 standard grid the boundary layer is essentially not resolved, i.e. the function drops discontinuously to zero at the boundary layer. The Shishkin mesh resolves the boundary layer well, as can be seen from Figure 3.8(b). To make the resolution even more visible, we magnify the boundary layer by stretching the mesh and plot the solution on the "standard" grid. Figure 3.8(c) shows that the boundary layer is resolved by approximately half of the grid points in each direction and the rest of the region by the other half. Note that for the tensor mesh 3/4 of the mesh points then resolve the two-dimensional boundary layer.
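A one-dimensional Shishkin point distribution can be generated as follows (a sketch assuming the common transition-point choice τ_s = min(1/2, C ε ln n); the constant C, the layer location and the function name are our choices):

    import numpy as np

    def shishkin_points(n, eps, C=2.0):
        # n+1 points on [0, 1] for a boundary layer at x = 1: half of the points
        # resolve the layer [1 - tau_s, 1], the other half the rest of the domain
        tau_s = min(0.5, C * eps * np.log(n))
        coarse = np.linspace(0.0, 1.0 - tau_s, n // 2 + 1)
        fine = np.linspace(1.0 - tau_s, 1.0, n - n // 2 + 1)
        return np.concatenate([coarse, fine[1:]])

    # the tensor mesh is the Cartesian product of two such 1D distributions:
    # X, Y = np.meshgrid(shishkin_points(30, 1e-3), shishkin_points(30, 1e-3))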

Shishkin tensor meshes result in anisotropic meshes where different grid spacings in the two dimensions meet. In our case the anisotropy is significant, approximately 1:10; even stronger anisotropies can easily occur. Anisotropic meshes yield anisotropic stencils. Figure 3.9 displays the smoothing behavior of the smoothers we have been discussing. Error smoothing is unidirectional in the anisotropic region, as in the anisotropic diffusion case in Section 3.2.2. In the convection dominated case, Gauss-Seidel exhibits the expected behavior, i.e. unidirectional smoothing in regions with anisotropic grids and smoothing in all directions in the isotropic regions.


Figure 3.8 Solutions to the constant convection problem with Shishkin mesh. Constant convection problem for α = 135° (convection dominated) on different 30 × 30 grids. Panels: (a) standard grid, (b) Shishkin mesh, (c) Shishkin mesh shown on a uniform ("standard") plot grid. [➠exp 011p.py]

When the convection is strong enough, though, the expected behavior is disturbed, as can be seen in Figures 3.9(a) and 3.9(b): some components of the error experience no smoothing at all. Symmetric Gauss-Seidel yields equal error smoothing in all directions, see Figure 3.9(c). The SPAI smoothers yield the expected behavior in all cases, as can be seen in Figures 3.9(d)–(f).

Experiment 8 (AMG for constant convection on Shishkin mesh) Convergence results for this problem and different smoothers can be seen in Table 3.11 on page 77. The results show what can be expected from the error smoothing plots in Figure 3.9. Convergence rates for unidirectional Gauss-Seidel deteriorate for increasing grid size; all other smoothers yield convergence rates of about 0.6 or below. In this case the SPAI(0.5) smoother is clearly preferable over the SPAI-1 smoother: the dynamic search pattern obviously yields a better smoother, even for smaller values of c_M. Again, improving the smoothing by lowering the SPAI threshold is limited, as can be seen by comparing the convergence rates for SPAI(0.5) and SPAI(0.3). The reduced sparsity pattern inverse SPAI-1r yields a very efficient smoother, with a good convergence rate and a small value of c_M in comparison to the other SPAI smoothers of equivalent convergence rate. □

3.2.5 Rotating Flow

Next, we consider a situation of rotating flow, where u solves (3.13) with a velocity field b that rotates about the center (1/2, 1/2) of the unit square.


Figure 3.9 Smoothers for the constant convection problem with Shishkin mesh. Constant convection problem for α = 135° (convection dominated) on 30 × 30 grids. We plot the error after 10 steps of the indicated smoothers. Panels: (a) GS(for), (b) GS(bak), (c) GS(sym), (d) SPAI-0, (e) SPAI-1, (f) SPAI(0.5). [➠exp 011p.py]

We choose this problem because it is impossible in this example to reorder the unknowns so that the entire system becomes lower triangular for vanishing ε, as in Section 3.2.3. As a consequence (see the preceding section), the multigrid iteration with single direction Gauss-Seidel smoothing converges very slowly for small ε.

The exact smoothing behavior of the smoothers under investigation can be seen in Figure 3.10 on page 79. Unidirectional Gauss-Seidel iterations, i.e. GS(for) and GS(bak), yield no smoothing effect in some area of the region, while the SPAI smoothers all do, as does GS(sym). The following convergence results corroborate the geometric smoothness observed in Figure 3.10.

Experiment 9 (AMG for rotating convection) Table 3.12 on the facing page shows convergence rates for the rotating flow problem on different grids with dominating convection, i.e. small ε. Due to the lack of smoothing efficiency in one of the corners of the region, the smoothers GS(for) and GS(bak) destroy the convergence of the resulting AMG iteration. The GS(sym) iteration yields an excellently converging multigrid scheme.


Table 3.11 AMG for the constant convection problem on a Shishkin mesh. Operator complexities c_A, approximate inverse complexities c_M and convergence rates q for solving the convection dominated constant convection problem (α = 135°) for different grid sizes of a Shishkin mesh that resolves the boundary layer ("div" marks a diverging iteration). We compare different smoothers for algebraic multigrid. [➠exp 011a.py]

                 128 × 128            256 × 256            512 × 512
               c_A   c_M    q       c_A   c_M    q       c_A   c_M    q
  GS(for)      3.9    —    0.81     4.1    —    0.92     4.3    —    div
  GS(bak)      3.9    —    0.74     4.1    —    0.96     4.3    —    div
  GS(sym)      3.9    —    0.09     4.1    —    0.24     4.3    —    0.41
  ω-Jacobi     3.9    —    0.27     4.1    —    0.43     4.3    —    0.61
  SPAI-0       3.9   0.4   0.23     4.1   0.4   0.43     4.3   0.4   0.61
  SPAI-1       3.9   3.9   0.21     4.1   4.1   0.39     4.3   4.3   0.59
  SPAI-1r      3.9   1.5   0.18     4.1   1.5   0.35     4.3   1.5   0.54
  SPAI(0.5)    3.9   1.7   0.20     4.1   1.7   0.40     4.3   1.7   0.58
  SPAI(0.3)    3.9   3.5   0.19     4.1   3.5   0.36     4.3   3.5   0.56

Table 3.12 AMG for the rotating convection problem. Operator complexities c_A, approximate inverse complexities c_M and convergence rates q for solving the convection dominated rotating convection problem on different grid sizes ("div" marks a diverging iteration). We compare different smoothers for algebraic multigrid. [➠exp 008a.py]

                 128 × 128            256 × 256            512 × 512
               c_A   c_M    q       c_A   c_M    q       c_A   c_M    q
  GS(for)      3.9    —    0.81     4.1    —    0.92     4.3    —    div
  GS(bak)      3.9    —    0.74     4.1    —    0.96     4.3    —    div
  GS(sym)      3.9    —    0.09     4.1    —    0.24     4.3    —    0.41
  ω-Jacobi     3.9    —    0.27     4.1    —    0.43     4.3    —    0.61
  SPAI-0       3.9   0.4   0.23     4.1   0.4   0.43     4.3   0.4   0.61
  SPAI-1       3.9   3.9   0.21     4.1   4.1   0.39     4.3   4.3   0.59
  SPAI-1r      3.9   1.5   0.18     4.1   1.5   0.35     4.3   1.5   0.54
  SPAI(0.5)    3.9   1.7   0.20     4.1   1.7   0.40     4.3   1.7   0.58
  SPAI(0.3)    3.9   3.5   0.19     4.1   3.5   0.36     4.3   3.5   0.56


The SPAI smoothers show the expected behavior, as in the previous examples. For the largest problem on a 512 × 512 grid the convergence rates of the SPAI smoothers do not reach the GS(sym) convergence rate of 0.41; the SPAI-1r smoother comes close to that convergence rate at an acceptable c_M.

The large values of c_A are due to a fan-out effect in the Galerkin coarse grid approximation that is also reported in the literature, see Stuben (2001). Note that especially in this situation c_M is smaller than c_A. □

Experiment 10 (AMG for 3D rotating convection) Essentially equivalent results to the previous example can be found for a 3D problem with a rotating convection field whose components point in any of the three coordinate directions. See [➠exp 012a.py] for the exact results. □

3.2.6 Rotated Anisotropy

Another standard test problem for multigrid from the textbooks is given by rotating the anisotropic diffusion problem from the equation

    −u_xx − ε u_yy = f

by an angle α. The resulting differential equation is given by

    −(c² + ε s²) u_xx − 2 (1 − ε) c s u_xy − (s² + ε c²) u_yy = f,   (3.14)

with c = cos α and s = sin α. Figure 3.11 shows solutions of the problem for a series of angles α.
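The coefficients in (3.14) follow directly from rotating the diffusion tensor; a small sketch (the function name is our choice):

    import numpy as np

    def rotated_coeffs(eps, alpha):
        # coefficients (a, b, c) of  -a u_xx - 2 b u_xy - c u_yy  in Eq. (3.14),
        # obtained from the diffusion tensor D = R diag(1, eps) R^T
        co, si = np.cos(alpha), np.sin(alpha)
        a = co**2 + eps * si**2        # u_xx coefficient
        b = (1.0 - eps) * co * si      # mixed u_xy coefficient
        c = si**2 + eps * co**2        # u_yy coefficient
        return a, b, c

    # alpha a multiple of 90 degrees recovers the grid aligned anisotropic Laplacian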

We have seen AMG work well on this problem for α being a multiple of 90°, when the anisotropy is aligned with the grid. What happens to the convergence rate of AMG for α = 45°, or at some other angle in between? The next experiment reproduces results found in the literature (Wesseling 1992; Stuben 2001) using Gauss-Seidel smoothing and tests AMG with SPAI smoothing.


Figure 3.10 Smoothers for the rotating convection problem. We plot the error after 10 steps of the indicated smoothers. Panels: (a) GS(for), (b) GS(bak), (c) GS(sym), (d) SPAI-0, (e) SPAI-1r, (f) SPAI-1, (g) SPAI(0.7), (h) SPAI(0.6), (i) SPAI(0.5). [➠exp 008p.py]


Figure 3.11 Rotated anisotropy problem. Solutions to the rotated anisotropy problem for the angles α = 0°, 15°, 30°, 45°, 60°, 75°, 90°. [➠exp 010p.py]


Table 3.13 AMG for the rotated anisotropic diffusion problem. Operator complexities c_A, approximate inverse complexities c_M and convergence rates q for strong anisotropy (small ε) and different angles α for the rotated anisotropic diffusion problem on a 512 × 512 grid. We compare different smoothers for algebraic multigrid. [➠exp 010a.py]

  α               0°            15°           30°           45°           60°           75°
             (c_A = 2.7)   (c_A = 2.3)   (c_A = 2.4)   (c_A = 2.3)   (c_A = 2.4)   (c_A = 2.3)
              c_M    q      c_M    q      c_M    q      c_M    q      c_M    q      c_M    q
  Gauss-Seidel —    0.00     —    0.84     —    0.92     —    0.46     —    0.92     —    0.37
  ω-Jacobi     —    0.03     —    0.88     —    0.94     —    0.57     —    0.94     —    0.53
  SPAI-0      0.4   0.01    0.2   0.88    0.2   0.93    0.2   0.55    0.2   0.93    0.2   0.49
  SPAI-1      2.7   0.00    2.3   0.80    2.4   0.89    2.3   0.31    2.4   0.89    2.3   0.28
  SPAI-1r     1.2   0.00    0.9   0.83    1.3   0.89    1.2   0.35    1.3   0.89    0.9   0.35
  SPAI-2      7.5   0.00    6.9   0.74    7.4   0.89    6.9   0.18    7.4   0.89    6.6   0.16
  SPAI-2r     2.0   0.00    2.0   0.79    3.5   0.89    3.2   0.24    3.5   0.89    2.1   0.23
  SPAI(0.5)   2.0   0.00    0.9   0.87    0.7   0.90    0.2   0.55    0.7   0.90    0.9   0.47
  SPAI(0.3)   5.0   0.00    3.1   0.82    2.4   0.91    1.9   0.31    2.4   0.91    3.0   0.31

Experiment 11 (AMG for rotated anisotropy) We experiment on Equation (3.14) with strong anisotropy (small ε) and different smoothers. Table 3.13 lists convergence rates for different angles. Note that convergence is worst for all smoothers at intermediate angles, e.g. for α = 30°, where the convergence rate with the Gauss-Seidel smoother rises above 0.9 on a 512 × 512 grid. It is known from the literature that this problem poses difficulties for geometric multigrid, because the smoothing efficiency is disturbed in all grid aligned directions; even with line-relaxation schemes the method is still α-dependent. Stuben (2001) reports that the α-dependency of the AMG iteration can be resolved by either using an F-cycle with an improved interpolation or by combining a Krylov subspace method with AMG as a preconditioner.

Figure 3.12(a) shows that the convergence rate is very similar for a variety of smoothers. Figure 3.12(b) shows that the SPAI-1r smoother is an attractive parallel alternative to Gauss-Seidel smoothing for this example: while the complexity of the approximate inverse stays strictly below 2 for any angle α, the convergence rate q also lies strictly below the convergence rate obtained with Gauss-Seidel. Table 3.13 lists sample convergence rates for different angles. □

3.3 Parallel MG with SPAI-1

Since the SPAI-1 smoother is inherently parallel, it is straightforward to apply within a parallel version of geometric MG. The data is distributed among the processors via domain decomposition, which is well known to work efficiently for a number of multigrid applications; see McBryan et al. (1991) for a survey.


Figure 3.12 AMG for the rotated anisotropic diffusion problem. We plot the rotation angle α versus the convergence rate q and the approximate inverse complexity c_M for strong anisotropy for the rotated anisotropic diffusion problem on a 512 × 512 grid. We compare different smoothers for algebraic multigrid. Panels (curves: GS, SPAI-1, SPAI-1r, SPAI(0.5), SPAI(0.3)): (a) q vs. α, (b) c_M vs. α. [➠exp 010b.py]


The platform we used is the ETH Beowulf cluster, which consists of 192 dual-CPU Pentium III nodes (500 MHz, 1 GB RAM, SuSE Linux 6.2). All nodes are connected via a switched network (100 Mbit/s and 1 Gbit/s), and communication is done with MPI, see Message Passing Interface Forum (1994) and Gropp et al. (1994).

We applied our parallel multigrid implementation to the rotating flow problem (3.13) with small ε. On 128 nodes, the total execution time was 156 seconds on the 4096 × 4096 grid, i.e. approximately 16 million unknowns. The time includes the set-up for the construction of the SPAI-1 smoother, which requires the solution of about sixteen million small (25 × 9) and independent least-squares problems. The use of a coarsest level consisting of only a single mesh point leads to a divergent multigrid iteration for small ε; by increasing the resolution of the coarsest level up to 32 × 32 mesh points, one obtains a convergent multigrid iteration.

To obtain good speed-up with a parallel MG code, it is important to perform coarse grid agglomeration (Trottenberg and Oosterlee 1996), because of the loss of efficiency on coarser grid levels. Although we have not implemented such an agglomeration strategy, our computations scale reasonably well as long as the problem size matches the size of the parallel architecture, see Table 3.14.


Table 3.14 Scalability of parallel MG using SPAI-1. The problem size and the number of processors are increased by a factor of 4, while the total time increases by only 30%.

  Grid size              512 × 512    1023 × 1023
  Number of processors       4            16
  Total time (sec)          20            26

3.4 Conclusions: SPAI Smoothes and is Parallel

Our results for a number of standard test problems for AMG show that sparse approximate inverses, based on the minimization of the Frobenius norm, provide an attractive alternative to classical Jacobi or Gauss-Seidel smoothing. The simpler smoothers, SPAI-0 and SPAI-1, often provide ample smoothing, comparable to damped Jacobi or Gauss-Seidel. This smoothing comes at the cost of additional memory for storing the approximate inverse; yet in a parallel calculation available memory is less of a problem, while using approximate inverses as smoothers guarantees convergence results independent of the number of processors.

We never encountered a problem where Gauss-Seidel smoothing resulted in a converging multigrid scheme but SPAI failed to do so. The opposite, however, does occur, as shown in this chapter. Thus SPAI smoothers have proven more robust than the Gauss-Seidel iteration.

Our implementation of geometric multigrid combined with SPAI-1 smoothing enables the fast solution of very large convection–diffusion problems on massively parallel architectures.


Chapter 4

SPAI as Strong Connections in Multigrid

4.1 Motivation . . . 85
    4.1.1 Approximate Inverses for Constant Coefficients . . . 86
    4.1.2 Approximate Inverses for Typical Stencils . . . 87
4.2 Numerical Experiments . . . 90
    4.2.1 Stretched Grid Example . . . 90
    4.2.2 Unstructured Anisotropic Poisson Problem . . . 92
    4.2.3 Streamline Diffusion Problem . . . 92
    4.2.4 Uncommon Laplace Discretization . . . 94
4.3 Conclusions . . . 97

4.1 Motivation

The Ruge/Stuben AMG approach assumes that the Gauss-Seidel iteration smoothes along so called strong connections (see Section 2.3.3), i.e. in the direction of large negative off-diagonal entries. When the smoothing procedure is changed to approximate inverse smoothing, this heuristic has always proven to be correct, as demonstrated by the convergence results in the previous chapter. Hence the same definition of strong connections can apparently be used for these classes of problems.

However, certain problems do exist where the Ruge/Stuben heuristic does not correctly identify the direction of smooth error components. We shall now show for two examples that using the large positive entries in the SPAI-1 approximate inverse can alleviate this problem.


4.1.1 Approximate Inverses for Constant Coefficients

We consider a 5-point stencil

    S = [ ·  n  ·
          w  c  e
          ·  s  · ],   (4.1)

where each entry is named after the first letter of the orientations north, south, east, west and the center, respectively. Such stencils typically arise in equations with constant coefficients and periodic boundary conditions. They also arise when an infinite grid without boundaries is used; infinite grids are commonly used in the formal analysis of multigrid smoothers, see Chapter 4 in Trottenberg, Oosterlee, and Schuller (2001).

The stencil S can be assembled into the global linear system matrix A. Algorithm 3.1 on page 51 shows how to compute a Frobenius-norm minimal approximate inverse for a given sparsity pattern, and our results from the previous chapter show that the pattern of the matrix A itself yields a robust smoother. Therefore we now look at the explicit construction of SPAI-1 inverses for stencils of the form given in Equation (4.1). The Frobenius-norm minimization introduced in Section 3.1.1 on page 50 yields identical minimization procedures for all rows; thus the approximate inverse matrix M can also be expressed in stencil notation.

The resulting stencil is obtained by first forming the auxiliary matrix Â, whose columns are the five shifted copies of the stencil S. For a 5-point stencil, Â is a 13 × 5 matrix, since shifting the stencil to each of its five positions reaches 13 distinct grid points in total.

With ê denoting the unit vector associated with the center grid point, the minimization of ‖Â m − ê‖₂ over m in the least squares sense yields the corresponding row m of M.


As mentioned previously, M can be written in stencil notation as

    M = [  ·   m_n   ·
          m_w  m_c  m_e
           ·   m_s   ·  ].

The symbolic computation of M for general stencil entries n, s, e, w, c is feasible with the computer algebra tool Maple. With common subexpression optimization, a C program to compute M contains 98 assignments and a total of 476 floating point operations. Such expressions unfortunately are too complicated for analytical considerations and thus make it difficult to prove theoretical properties for general classes of matrices. In the following section we therefore compute M for various distinct 5-point stencils.
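The stencil of M can also be computed numerically with a short least-squares solve. The following sketch (our own helper, written with NumPy) implements the row-wise minimization described above and reproduces, for instance, the values 3/61 and 17/61 obtained for the 2D Laplacian in the next section:

    import numpy as np

    def spai1_stencil(n, s, e, w, c):
        # SPAI-1 stencil of M for a constant-coefficient 5-point stencil on an
        # infinite grid: minimize ||A_hat m - e_hat||_2, where A_hat is the
        # 13 x 5 local matrix and e_hat the unit vector of the center point
        # (row-wise minimization; symmetric stencils give the same result
        # for the column-wise variant)
        val = {(0, 1): n, (0, -1): s, (1, 0): e, (-1, 0): w, (0, 0): c}
        cols = list(val)                      # pattern of the row: 5 offsets
        rows = sorted({(px + qx, py + qy)
                       for (px, py) in cols for (qx, qy) in val})  # 13 points
        A_hat = np.zeros((len(rows), len(cols)))
        for j, (px, py) in enumerate(cols):   # shifted copy of the stencil
            for (qx, qy), v in val.items():
                A_hat[rows.index((px + qx, py + qy)), j] = v
        e_hat = np.zeros(len(rows))
        e_hat[rows.index((0, 0))] = 1.0
        m, *_ = np.linalg.lstsq(A_hat, e_hat, rcond=None)
        return dict(zip(cols, m))

    # 2D Laplacian: neighbors 3/61 ~ 0.0492, center 17/61 ~ 0.279
    print(spai1_stencil(n=-1, s=-1, e=-1, w=-1, c=4))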

4.1.2 Approximate Inverses for Typical Stencils

Two standard problems. The equation at any inner point of the standard finite-difference discretization of the 2D Laplace equation, −Δu = f, can be written in stencil notation as

    S = [  ·  −1   ·
          −1   4  −1
           ·  −1   ·  ].

We mark the strong connections (see Section 2.3.3 on page 38 for a formal definition) using a bold font.

Exact computation, and conversion to floating point numbers with three significant decimal digits, yields

    M = [   ·    3/61    ·                 [   ·     0.0492    ·
          3/61  17/61  3/61       ≈         0.0492   0.279   0.0492
            ·    3/61    ·  ]                 ·     0.0492    ·   ].

The same analysis for the anisotropic Laplace equation

    −u_xx − ε u_yy = f

yields a large ε-dependent expression for M. Using the stencil for small ε,

    S = [  ·     −ε      ·
          −1   2(1+ε)   −1
           ·     −ε      ·  ],

we obtain the approximate inverse

    SPAI-1(S) ≈ [   ·     O(ε)    ·
                  0.200  0.600  0.200
                    ·     O(ε)    ·  ].   (4.2)

For this example the "weak" connections according to the Ruge/Stuben definition yield small values in the approximate inverse and thus very low smoothing efficiency in those directions, exactly as in the Gauss-Seidel iteration. Thus the same coarse grids and interpolation operators, combined with approximate inverse smoothing, yield a converging iteration. For small ε, the vanishing smoothing efficiency in the direction of small entries in M simply results from their negligible influence in the matrix-vector multiplication.

In (4.2) the small connections in the stencil can even be dropped. Since the small entries in M coincide with the small entries in A, the computation of the approximate inverse can be restricted to the strong connections in A. For the anisotropic stencil above and the sparsity pattern restricted to the center row we obtain the approximate inverse

    SPAI(S) ≈ [ 0.200  0.600  0.200 ].   (4.3)

Remark 4.1.1 Stencil (4.2) with its small entries dropped and stencil (4.3) are not exactly the same: the difference in the off-diagonal entries is of the order of ε and vanishes as ε → 0. □

Stretched grids example. Solving the Poisson problem with bilinear quadrilateral finite elements with an aspect ratio of 1/10 (a problem suggested by Chow (2001)) leads to the following stencil:

    S = [ −1.0   1.9  −1.0
          −3.9   8.0  −3.9
          −1.0   1.9  −1.0 ].   (4.4)

Chow suggests a method for finding the strong connections based on a computed set of algebraically smooth vectors. Using the SPAI-1 inverse M as the basis for defining strong connections, we obtain the same strong connections: the only large positive off-diagonal entries of M, of size 0.0574, appear in the east and west directions, while the remaining off-diagonal entries are small or negative.

The approximate inverse clearly indicates the correct direction of smoothing.

This idea leads to a new definition of strong connections, based on an approximate inverse M. We define: the unknown i strongly depends on the unknown j if and only if

    m_ij > 0   and   m_ij ≥ τ · max_{k≠i} m_ik,   with 0 ≤ τ < 1.   (4.5)
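Criterion (4.5) translates directly into code (a sketch; the function name and the CSR-based interface are our choices):

    import numpy as np
    from scipy.sparse import csr_matrix

    def strong_connections(M, tau=0.0):
        # strong connections of Eq. (4.5): i depends strongly on j iff
        # M[i, j] > 0 and M[i, j] >= tau * max_{k != i} M[i, k]
        M = csr_matrix(M)
        strong = {}
        for i in range(M.shape[0]):
            row = M.getrow(i)
            off = row.indices != i                 # drop the diagonal entry
            idx, val = row.indices[off], row.data[off]
            if idx.size == 0:
                strong[i] = []
                continue
            thresh = tau * val.max()
            strong[i] = [int(j) for j, v in zip(idx, val)
                         if v > 0 and v >= thresh]
        return strong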

Strong connections from SPAI. For all the problems presented in the previous section the correct strong connections can be found by looking at the entries of the SPAI-1 inverse; more precisely, the correct strong connections are in the directions of positive entries in the SPAI-1 inverse. For the standard AMG test problems from Chapter 3 this new definition of the strong connections coincides with the Ruge/Stuben definition given in Table 2.1 on page 39. For the large positive connections and stretched grid examples we have shown that the Ruge/Stuben heuristic fails to find the correct strong connections, whereas the suggested heuristic based on the SPAI-1 inverse still succeeds in finding the direction of smooth error components.

For a more flexible and thus possibly more restrictive criterion, we consider applying the Ruge/Stuben heuristic with the parameter τ to the matrix −M instead of A. The choice τ = 0 essentially implies that all positive entries in the approximate inverse are taken as strong connections; any larger value of τ adds a constraint on the allowed magnitude of the entry and is thus more restrictive.

Recall that in Chapter 3 we successfully applied the reduced sparsity pattern of the strong connections to compute a Frobenius norm minimized approximate inverse, called SPAI-1r. For some problems it allowed us to lower the memory requirement for the inverse substantially, often nearly by a factor of two. When using approximate inverses with a sparsity pattern computed from, say, SPAI-1, this idea may still work, but cannot directly be implemented, because it requires the computation of a SPAI-1 inverse to find the strong connections first, before recomputing the approximate inverse with the reduced sparsity pattern.


We denote this recomputed approximate inverse as SPAI-H. We emphasize that we recompute the approximate inverse with the reduced sparsity pattern, instead of just truncating the negligible entries. The recomputation can actually be done on the fly, without ever storing the full SPAI-1 inverse.
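A compact dense sketch of the SPAI-H construction (all helper names and the dense data layout are our choices; an actual implementation proceeds row by row on sparse data, exactly as described above):

    import numpy as np

    def spai_row(A, i, J):
        # Frobenius-minimal row i of M with support J: min ||m^T A - e_i^T||_2
        e = np.zeros(A.shape[0]); e[i] = 1.0
        m, *_ = np.linalg.lstsq(A[J, :].T, e, rcond=None)
        return m

    def spai_h(A, tau=0.0):
        # SPAI-H: compute the SPAI-1 row, reduce its pattern to the strong
        # connections of Eq. (4.5) (keeping the diagonal), then recompute
        # the row on the reduced pattern
        n = A.shape[0]
        M = np.zeros((n, n))
        for i in range(n):
            J1 = np.flatnonzero(A[i] != 0)     # SPAI-1 pattern: pattern of A
            m1 = spai_row(A, i, J1)            # temporary row, discarded again
            off = J1 != i
            top = m1[off].max(initial=0.0)
            keep = off & (m1 > 0) & (m1 >= tau * top)
            J = np.union1d(J1[keep], [i])      # reduced pattern plus diagonal
            M[i, J] = spai_row(A, i, J)
        return M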

Remark 4.1.2 (suggested usage) In case the strong connections from SPAI coincide with the ones from the Ruge/Stuben heuristic, computing SPAI-1 adds unnecessary computation to the AMG setup. In practice we suggest using the original Ruge/Stuben approach for defining strong connections first, because it is computationally cheaper than the new definition proposed in this chapter. If that fails, however, one may proceed by using −M. We have applied this strategy in our numerical examples. □

Remark 4.1.3 (various SPAI inverses possible) In the previous paragraphs we have assumed that M is the SPAI-1 approximate inverse applied in Equation (4.5). Clearly, alternative approximate inverses can be used as a basis for defining the strong connections using Equation (4.5); in fact it might be advantageous to use SPAI(ε), especially for matrices that are considered too dense for AMG. □

We now present some numerical results using the strong connection definition from Equation (4.5).

4.2 Numerical Experiments

In this section we show that the robustness of AMG can be improved by replacing the traditional Ruge/Stuben strong connection heuristic with the one proposed in the previous section, which is based on using the large positive entries of an approximate inverse M. Recall that for finite-difference stencils, e.g. for the anisotropic Poisson problem, we have seen the Ruge/Stuben heuristic coincide with the SPAI-H heuristic. Thus in the following we concentrate on linear systems that stem from finite element discretizations.

4.2.1 Stretched Grid Example

We begin by reporting two-level convergence rates for the stretched grid example from the motivation of this chapter. From stencil (4.4) one can see that the Ruge/Stuben strong connection definition does not find the correct strong connections for the recommended value τ = 0.25, while the SPAI-H criterion does. To demonstrate the effect upon a multilevel algorithm we report on the convergence of a two level iteration.


Mainly we compare three methods: the classical Ruge/Stuben definition of strong connections with Gauss-Seidel as a smoother, SPAI-1 strong connections with the SPAI-1 smoother, and SPAI-1 strong connections with the SPAI-H smoother. We refer to these methods as Ruge/Stuben, SPAI-1 and SPAI-H. In the previous chapter we have shown that SPAI-1 and Gauss-Seidel behave very much alike, thus the choice of smoother is of little relevance in this study.

Experiment 12 (Stretched grid example) We apply standard AMG with only two levels to the stretched grid example suggested by Chow (2001). We use a stretched grid with 100 × 10 grid points on the unit square, i.e. a ratio of 1:10 for the quadrilateral elements.

The graph in Figure 4.1 shows that the two-level convergence of standard Ruge/Stuben AMG exhibits a jump near the value τ = 0.25 commonly suggested in the literature. One may choose to simply raise that value to obtain better two-level convergence. This is generally not recommended, because one may also use a small value of τ (maybe even smaller than 0.25) to limit the density of the coarse grid operators in the coarsening process; additionally, more extreme ratios of the quadrilateral elements would require even higher values of τ. Note that the convergence rate of q = 0.55 obtained for τ = 0.25 leads to a very slow multilevel convergence rate near 1.0.

Using the modified strong connection definition from Equation (4.5) yields the desired two-grid convergence near 0.1 for any value of τ. □

Figure 4.1 Stretched grid example. We plot the convergence rate of classical two-level AMG versus the strong connection threshold τ. We compare the convergence rates obtained with the standard strong connection definition by Ruge/Stuben and the new definition based on SPAI-1 inverses. (x-axis: τ from 0.1 to 0.3; y-axis: convergence q from 0 to 1; curves: Ruge/Stüben, SPAI-1r, SPAI-1.)


4.2.2 Unstructured Anisotropic Poisson Problem

In the case of anisotropic diffusion on unstructured grids, the anisotropy is generally not aligned with the grid. The strong connection heuristic must then gauge for each connection whether the error smoothing in that direction is strong enough for coarsening and interpolation. In the next example we compare the two heuristics for this problem.

Experiment 13 (Unstructured anisotropic Poisson problem) We compare the original Ruge/Stuben strong connection definition to the new definition in (4.5). We compare the sensitivity of the two heuristics with respect to τ by applying both to an anisotropic Poisson problem with small ε on an unstructured grid.

In Figure 4.2(a) we observe that the convergence rates of the Ruge/Stuben heuristic are strictly smaller than the convergence rates of our new definition of strong connections. The graph shows that the best convergence rate with the Ruge/Stuben definition is obtained with the generally recommended value of τ = 0.25. For very small values of τ the convergence rate q of the SPAI-H heuristic approaches 1.0. For intermediate values of τ between 0.3 and 0.8 both definitions yield similar convergence rates, while for larger values both methods yield convergence rates approaching 1.0.

For this example the classical definition of strong connections appears to be better than our newly suggested definition in Equation (4.5). □

4.2.3 Streamline Diffusion Problem

We apply the new strong connection definition to matrices kindly provided by Gunar Matthies from the University of Magdeburg. He considers the convection–diffusion problem

    −ν Δu + b · ∇u = f

on the unit square with a regular grid and u = 0 on the boundary. The discretization is done with finite elements and streamline diffusion stabilization; the right hand side f is chosen such that the solution u is a known smooth function, and the matrix entries are computed with a quadrature formula that integrates the occurring polynomials exactly. For more information on the streamline diffusion method see Matthies and Tobiska (2001) and the references therein. The matrices were computed using the MooNMD software, see also John and Matthies (2002).

Experiment 14 (Streamline diffusion problem) We apply AMG, with the modified strong connection definition from this chapter and with SPAI-H (τ = 0) as both the set of strong connections and the smoother, to the given streamline diffusion problem.


Figure 4.2 AMG for the unstructured anisotropic diffusion problem. We apply AMG with the modified SPAI-H strong connection definition with varying τ and different smoothers to an anisotropic Poisson problem on an unstructured mesh with grid size n = 167297. Panels (x-axis: τ from 0.1 to 1.0; curves: RS, SPAI-1, SPAI-H): (a) convergence q, (b) operator complexity c_A, (c) approximate inverse matrix complexity c_M, (d) solution time t_sol. [➠exp 017g.py]


For small ν classical AMG does not converge for any grid size. The convergence results with the new strong connections are given in Figure 4.3(a): one can see that convergence rates independent of ν are obtained for all grid sizes. Figures 4.3(b) and 4.3(c) show that the overall values of c_A stay strictly below 3, while the approximate inverse complexities are even lower, strictly below 1.2. Better convergence rates, below 0.65 for all values of ν, can be achieved by using the denser SPAI-1 inverse. □

4.2.4 Uncommon Laplace Discretization

In the following example we introduce an uncommon discretization of the anisotropic Poisson problem

    −ε u_xx − u_yy = f   in Ω = (0, 1)²,   (4.6)
    u = 0   on ∂Ω,   (4.7)

with ε = 10⁻³.

Any stencil S can be multiplied with a one-dimensional averaging stencil with weights (γ, 1, γ). The solution of the resulting system is a weighted average of the original stencil, and the product is still a discretization of the PDE corresponding to the original stencil S: for γ → 0 the combined stencil and the original stencil tend to coincide, and for γ = 0 the modification has no effect. Using this modification to discretize the two second-derivative terms of Equation (4.6) separately yields a one-parameter family of stencils

    S_γ,   −0.25 ≤ γ ≤ 0.5,   (4.8)

which we use to discretize Equation (4.6).


Figure 4.3 AMG for the streamline diffusion problem. We test AMG with the modified SPAI-H strong connection definition and smoother on the streamline diffusion problem for different grid sizes n. Panels (x-axis: n = 2112, 8320, 33024, 131584; curves: ν = 10⁰, 10⁻⁴, 10⁻⁸, 10⁻¹²): (a) convergence q, (b) operator complexity c_A, (c) approximate inverse matrix complexity c_M. [➠exp 017f.py]


For selected values of γ, the stencil S_γ reads:

    S_{γ=−0.25} = [  0.2502  −1.0005   0.2502
                    −0.5010   2.0020  −0.5010
                     0.2502  −1.0005   0.2502 ]

    S_{γ=0}     = [    ·      −1.0000     ·
                    −0.0010    2.0020  −0.0010
                       ·      −1.0000     ·    ]

    S_{γ=0.25}  = [ −0.2502  −0.9995  −0.2502
                     0.4990   2.0020   0.4990
                    −0.2502  −0.9995  −0.2502 ]

    S_{γ=0.5}   = [ −0.5005  −0.9990  −0.5005
                     0.9990   2.0020   0.9990
                    −0.5005  −0.9990  −0.5005 ]

All stencils S_γ are discretizations of Equation (4.6). Here S_{γ=0} is the standard discretization of the anisotropic Laplacian; this is the stencil recommended for a direct discretization. The remaining stencils do, however, occur in practice; the unpredictable behavior of the Galerkin coarse grid projection may be one typical reason why.

Let now M_γ be the stencil of the SPAI-1 approximate inverse corresponding to S_γ. In the following, the strong connections are based on Equation (4.5) applied to M_γ with τ = 0. This approach yields the correct strong connections for all stencils under investigation:

In the following the strong connections are based on Equation (4.5) applied to � �� �for � � � � . This approach yields the correct strong connections for all stencils underinvestigation:

� � � ! � � � ���

0.0523 0.2242 0.05230.1570 0.6725 0.15700.0523 0.2242 0.0523

� � � � ���

0.20000.0001 0.5998 0.0001

0.2000

� �#! � � � ���

-0.0525 0.2246 -0.0525-0.1572 0.6731 -0.1572-0.0525 0.2246 -0.0525

� �#!�� � � ���

-0.0824 0.2437 -0.0824-0.2437 0.7255 -0.2437-0.0824 0.2437 -0.0824

Experiment 15 (Uncommon discretization of the anisotropic Laplacian) We compare classical AMG with the standard definition of strong connections to the new definition (4.5) with τ = 0. We report on results for the uncommon discretizations of the Laplace operator given by S_γ in Equation (4.8) on a square grid with ε = 10⁻³.

Figure 4.4 on the following page shows in panel (a) the more robust convergence behavior of the strong connections based on the SPAI-1 inverse. For 0.25 ≤ γ ≤ 0.5, precisely the region where the classical definition of strong connections fails to find the correct direction of smooth error, the strong connections based on the SPAI-1 approximate inverse still yield a converging iteration.

The operator complexities reported in Figure 4.4(b) are all very similar, while the approximate inverse complexity in Figure 4.4(c) is nearly one for all values of γ. □

4.3 Conclusions

We have presented a new heuristic to determine the directions of smooth error components in AMG. This heuristic is similar to that proposed by Ruge and Stuben, as it uses a threshold on the relative magnitude of matrix entries controlled by a parameter τ as a criterion. The difference is that the original definition uses the entries of the system matrix A, while the new definition uses entries of a precomputed approximate inverse M. More exactly, the new strong connection definition defines the largest positive entries in M as strong connections; thus, for the new method, one can apply the old definition to −M.

The method has proven equivalent for the standard test problems on structured grids. When applied to an anisotropic Poisson problem on an unstructured grid, it yields slightly worse, but still comparable, convergence rates compared with the standard Ruge/Stuben heuristic. For more complicated problems from the FEM on stretched grids or from the streamline diffusion method, the Ruge/Stuben heuristic fails to find the direction of error smoothing, while the new definition yields convergence rates independent of the grid size for those problems.

We conclude that for difficult problems the new heuristic might be a good alternative to the classical definition of strong connections. Using the new definition has not proven better in terms of convergence rate in all cases, yet it never failed to find suitable directions. In contrast, we have found examples where the classical Ruge/Stuben approach fails to determine the correct strong connections. Thus the new strong connection definition appears more robust than the classical definition.


Figure 4.4 AMG for the uncommon Laplace discretization. We test AMG with the modified SPAI-H strong connection definition and smoother on the uncommon Laplace discretizations given in Equation (4.8), with ε = 10⁻³. Panels (x-axis: γ from −0.25 to 0.5; curves: RS, SPAI-1, SPAI-H): (a) convergence q, (b) operator complexity c_A, (c) approximate inverse matrix complexity c_M. [➠exp 017h.py]


Chapter 5

Preconditioned Galerkin Projection with SPAI

5.1 Including M in the Galerkin projection . . . 99
5.2 Numerical examples . . . 101
    5.2.1 Rotated Anisotropy . . . 101
    5.2.2 Indefinite Helmholtz Problem . . . 102
    5.2.3 General systems . . . 106
    5.2.4 Conclusions . . . 108

5.1 Including M in the Galerkin projection

Uniform convergence of classical AMG has been proven for the subclass of symmetric M-matrices. A symmetric M-matrix is a symmetric positive definite matrix with a sign constraint, i.e. it must have zero or negative off-diagonal entries. Stuben denotes the matrix classes that extend the M-matrix class to classes of matrices suitable for AMG as ideal classes. A necessary condition for a convergent AMG multilevel scheme is that the operators A_l on all levels belong to one of the ideal classes (see Trottenberg et al. 2001). Yet the recursive definition of the coarse grid operators¹,

    A_{l+1} = R A_l P,

where R and P denote the restriction and interpolation operators, does not guarantee this condition, even if the operator on the finest grid is from one of those ideal classes, e.g. the diagonally dominant matrices.

Wang and Zhang (2002) observed that the number of strongly diagonally dominant rows increases substantially when A is multiplied with an approximate inverse M based on Frobenius norm minimization. This remains true even for very sparse approximate inverses. Apparently, preconditioning with an approximate inverse based on Frobenius norm minimization improves properties such as diagonal dominance.

¹ See also Section 2.3.1 and Equation (2.12).


In Chapter 3 we introduced approximate inverses based on Frobenius norm minimization as smoothers. On every level of the multigrid hierarchy we compute a SPAI approximate inverse M_l of the corresponding operator A_l and use it to construct the smoothing iteration

    x_l ← x_l + M_l (f_l − A_l x_l).

If one chooses such approximate inverse smoothers, the computed inverse might as well be used to exploit the improved matrix properties of M_l A_l compared to those of A_l by itself.

We introduce the preconditioned Galerkin Coarse Grid Approximation (PGCA) of the fine grid equation

    A_l u_l = f_l   (5.1)

by multiplying (5.1) from the left with the corresponding approximate inverse M_l, i.e.

    M_l A_l u_l = M_l f_l.   (5.2)

A coarse approximation of Equation (5.2) yields the coarse grid operator and right hand side

    A_{l+1} = R (M_l A_l) P,   f_{l+1} = R (M_l f_l).   (5.3)

Using Equation (5.3) within algebraic multigrid requires two minor modifications of the setup and cycle (a minimal sketch of the modified setup follows the list):

1. The computation of the Galerkin operator in line 6 of Algorithm 2.6 needs to be changed from A_{l+1} = R A_l P to A_{l+1} = R M_l A_l P.

2. The residual transfer in line 6 of Algorithm 2.5 needs to be changed from r_{l+1} = R r_l to r_{l+1} = R M_l r_l.
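A minimal sketch of the modified setup loop (hypothetical helper names: spai(A) computes the approximate inverse that also serves as the smoother; transfer_ops holds one (R, P) pair per level):

    def pgca_setup(A, transfer_ops, spai):
        # multigrid setup with the preconditioned Galerkin product (5.3):
        #   A_{l+1} = R (M_l A_l) P   instead of   A_{l+1} = R A_l P
        levels = []
        for R, P in transfer_ops:
            M = spai(A)              # SPAI inverse: smoother and preconditioner
            levels.append((A, M, R, P))
            A = R @ (M @ A) @ P      # one extra sparse matrix-matrix product
        levels.append((A, spai(A), None, None))   # coarsest level
        return levels

    # in the cycle, the residual transfer changes accordingly:
    #   r_coarse = R @ (M @ r_fine)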

In some cases we may use a modified interpolation: we use direct interpolation as in classical AMG, but take the weights for the interpolation from the matrix M A instead of the matrix A. The reason is obvious: we are coarsening the operator M A in the preconditioned residual equation (5.2). Note that the structure of the interpolation is unchanged, because the direction of interpolation is still based on the heuristic described by the strong connections; changes are introduced solely for the weights. The modified interpolation does not always yield better interpolation weights.


The presented method is simple and general. Any AMG program must implement some kind of sparse matrix-matrix multiplication, and the preconditioned Galerkin product is implemented by just adding one additional call to that matrix-matrix multiplication routine. The method is general because it can be applied to any matrix suitable for AMG. In the following section we demonstrate the effectiveness of this approach by applying it to a number of test problems.

Remark 5.1.1 (diagonal case) When M has a diagonal pattern, the method is equivalent to traditional GCA. In this case M A is just a scaling of the rows of A with a factor, and such scalings have no effect upon the coarsening or the interpolation. □

Remark 5.1.2 (explicit inverse necessary) This approach is only possible with explicitly computed inverses. Even when factorized approximate inverses are used, the approach is not applicable; when Gauss-Seidel is used, exploiting the smoother in a PGCA is not possible at all. □

Remark 5.1.3 (left inverses vs. right inverses) Note that the SPAI approximate inverses used in this thesis minimize ‖M A − I‖_F and not ‖A M − I‖_F. Thus preconditioning based on M A seems more natural than using A M. For symmetric matrices both variants coincide, but on the finest level only: on the first coarse level and all subsequent levels symmetry is not guaranteed when PGCA is used. One may choose to explicitly compute right approximate inverses for a left preconditioned version of PGCA. We have not pursued this approach because of the potential additional memory cost involved in computing not one, but two approximate inverses. □

5.2 Numerical examples

both variants coincide, but for the finest level only. On the first level and all subsequentlevels symmetry is not guaranteed when using PCGA. One may choose to explicitly com-pute right approximate inverses for a left preconditioned version of PCGA. We have notpursued this approach, because of the potential additional memory cost involved for com-puting not only one, but two approximate inverses. �5.2 Numerical examples

5.2.1 Rotated Anisotropy

In Section 3.2.6 we considered the rotated anisotropic Laplace equation. The results in Figure 3.12 show that the convergence rate of AMG may reach 0.9 for certain angles of the rotation. In the next experiment we attempt to improve on the convergence behavior for such angles by using the PGCA.

Experiment 16 (AMG with PGCA for rotated anisotropy problem) We use the PGCA in AMG and apply it to the rotated anisotropy problem. As strong connections we have found the new definition from the previous section, with a suitable threshold value, to work best for producing sparse coarse grid operators.


Convergence rates can be seen in Figure 5.1(a). We observe a strong dependence of the convergence rate $\rho$ on the angle for AMG with SPAI-1r introduced in the PGCA and as a smoother. For all other SPAI inverses we observe convergence rates that are strictly below the ones obtained with classical AMG, i.e. Gauss-Seidel smoother and standard GCA.

Especially when using the SPAI(0.4) inverse, the AMG iteration converges in a single iteration. The good convergence rate is paralleled by very dense system matrices, as can be seen from the corresponding values of the operator complexity $c_A$ in Figure 5.1(b). Using these inverses is certainly not practical, because in the worst case we observe an operator complexity of 14. Nevertheless these experiments demonstrate that the PGCA in principle behaves as expected for this problem: improving the quality of the inverse improves the convergence of AMG when the PGCA is used.

For the SPAI-1 and SPAI-H versions of the PGCA we observe only slightly higher operator complexities and approximate inverse complexities $c_M$ clearly below 1.5 for all angles. At the same time the convergence rate clearly outperforms the original Ruge/Stuben approach for critical angles.

5.2.2 Indefinite Helmholtz Problem

We consider the Helmholtz equation

    $-\Delta u - \sigma u = f$

on the unit square with zero Dirichlet boundary conditions. We specifically look at constant values $\sigma \geq 0$. Using the standard FDM as described in Section 2.1 yields the discretization stencil

    $\frac{1}{h^2} \begin{bmatrix} & -1 & \\ -1 & 4 - \sigma h^2 & -1 \\ & -1 & \end{bmatrix}.$

Figure 5.2 on page 104 shows solutions of the discrete Helmholtz operator on a 20 × 20 grid. The resulting system matrix $A$ obviously has the eigenvalues of the standard discrete Laplace operator shifted by $-\sigma$. It is thus singular when $\sigma$ is equal to any of the analytically given eigenvalues

    $\lambda_{k,l} = \frac{4}{h^2} \left( \sin^2 \frac{k \pi h}{2} + \sin^2 \frac{l \pi h}{2} \right), \qquad k, l = 1, \ldots, N.$

The nullspace of the singular system is spanned by the corresponding eigenfunctions

    $v_{k,l}(x, y) = \sin(k \pi x) \sin(l \pi y).$

The matrix $A$ is positive definite as long as $\sigma < \lambda_{1,1}$.
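The definiteness bound is easy to check numerically by tabulating the discrete eigenvalues. A small sketch; the grid size and shift are arbitrary example values:

    import numpy as np

    def discrete_laplace_eigenvalues(n):
        """Eigenvalues of the 5-point Laplacian on an n x n interior grid, h = 1/(n+1)."""
        h = 1.0 / (n + 1)
        k = np.arange(1, n + 1)
        lam_1d = (4.0 / h**2) * np.sin(k * np.pi * h / 2.0) ** 2
        return np.add.outer(lam_1d, lam_1d)      # lam[k-1, l-1] = lambda_{k,l}

    lam = discrete_laplace_eigenvalues(20)
    sigma = 100.0
    print("lambda_{1,1} =", lam.min())           # approx. 2*pi^2 for small h
    print("positive definite:", sigma < lam.min())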


Figure 5.1 AMG with PGCA for rotated anisotropy We apply AMG with PGCA to the rotated anisotropy problem for angles from 0 to 40. We use a 128 × 128 grid. [➠exp 010e.py] The four panels compare GS, SPAI-1, SPAI-1r, SPAI(0.5) and SPAI(0.4): (a) convergence rate $\rho$, (b) operator complexity $c_A$, (c) approximate inverse matrix complexity $c_M$, (d) solution time $t_{sol}$. [Plots not reproduced here.]


Figure 5.2 Solutions to the Helmholtz equation Numerical solutions to the Helmholtz equation on a 20 × 20 regular grid with zero boundary conditions and a fixed source term $f$, shown for four increasing values of $\sigma$ in panels (a)-(d). [Plots not reproduced here.]


AMG convergence is obtained as long as $\sigma$ is sufficiently far from an eigenvalue and smaller than the 6th eigenvalue. The reason for poor multigrid performance is that in the indefinite case Gauss-Seidel does not converge for some smooth eigenfunctions. The multigrid iteration converges as long as these smooth eigenfunctions can be represented on the coarsest mesh. Thus limiting the number of levels, e.g. to five levels, improves convergence at the cost of solving a possibly large coarse system. A detailed analysis of the multigrid behavior and experimental results for AMG on the Helmholtz operator can be found in Stuben (2001).

Several ideas have been proposed for improving the convergence of the multigrid iteration for the Helmholtz operator. Brandt and Livshits (1997) introduce a wave-ray approach; the method is based on level-dependent smoothing and the introduction of so-called ray cycles. Bader, Schimper, and Zenger (1999) extend the approach to hierarchical bases. The general approach of combining the BiCGstab iteration with AMG as a preconditioner to improve the convergence was originally proposed by Ruge and Stuben (1987). Elman, Ernst, and O'Leary (2002) propose to use the GMRES(m) iteration as an outer iteration and also on coarse levels for this problem. Huckle and Staudacher (2002) choose special transfer operators to speed up the convergence of the multigrid iteration for the special case of symmetric Toeplitz matrices.

By using the PGCA we expect to improve the spectrum of the coarse grid operators, thereby improving the applicability of AMG to indefinite problems. The next experiment shows this for the Helmholtz problem.

Experiment 17 (BiCGstab with AMG for Helmholtz) We apply BiCGstab with AMG as a preconditioner on a 128 × 128 grid with increasing $\sigma$ up to 600. Using larger values of $\sigma$ yields solutions with oscillations that cannot be represented on a 128 × 128 grid. Note that it becomes easier to solve this problem when the grid size is increased and the number of levels is kept fixed. Stuben (2001) reports AMG to work for this example when the number of levels is limited to five. In contrast we use a limit of 10 points for the coarsest grid, resulting in 9 levels in the algebraic multigrid hierarchy for our given problem size.

The effect of lowering this limit for the coarsening process becomes apparent in the results for Gauss-Seidel in Figure 5.3 on page 107. Subfigure 5.3(a) shows that the convergence rate for Gauss-Seidel rapidly deteriorates with increasing $\sigma$. For $\sigma \geq 200$ classical AMG fails to be a good preconditioner for the BiCGstab iteration.

In contrast, all PGCA versions converge up to the given limit of $\sigma = 600$. Although convergence can be achieved, the convergence rate $\rho$ is clearly close to one for large $\sigma$. Often the best convergence rate is achieved with the SPAI-1/PGCA version. The fact that using SPAI-1 as a smoother alone already improves the convergence shows that SPAI-1 has better smoothing properties for this problem than Gauss-Seidel. The improved convergence rate does not come without a price: Figures 5.3(b) and 5.3(c) show that the overall complexity approximately doubles, independently of $\sigma$. The additional memory


used nevertheless usually pays off in execution time, i.e. the execution time in Figure 5.3(d) is most often minimized by using SPAI-1 with PGCA, especially for large $\sigma$.

5.2.3 General systems

We also apply the approach to a set of systems retrieved from the Internet. We want to test whether the method is general enough to be applied to problems without specific knowledge about the nature of the system. We thus investigate problems that arise from the discretization of partial differential equations.

Sherman Collection We start the investigation with the well-known Sherman collection retrieved from the Matrix Market. In summer 1984, Andrew Sherman of Nolan and Associates, Houston, TX, USA, issued a challenge to the petroleum industry and the numerical analysis community for the fastest solution of 5 systems of linear equations extracted from oil reservoir modeling programs. Each matrix arises from a three-dimensional simulation model using a seven-point finite-difference approximation. The resulting matrices are symmetric, and the corresponding right-hand side vectors are also supplied.

Experiment 18 (BiCGstab with AMG for the Sherman collection) We apply the PGCA to the matrices in the Sherman collection. We apply AMG with our standard settings as a preconditioner to the BiCGstab iteration. Figure 5.4 on page 109 reports results for this experiment.

The graphs show no entries for the sherman2 problem. This very ill-conditioned problem cannot be solved by any of the methods used in this experiment, even when the required tolerance for the solver is relaxed.

Figure 5.4(a) shows that the convergence properties can be substantially improved for the Sherman collection when using approximate inverse smoothers and the PGCA. The SPAI(0.7) inverse only improves the convergence for the sherman5 example; all other SPAI inverses yield an improved convergence rate.

The PGCA obviously yields coarse grid systems that differ from the usual GCA; thus the overall operator complexity $c_A$ changes. Figure 5.4(b) reveals that the operator complexity substantially increases for the static sparsity patterns of SPAI-1 and SPAI-1r, in this case to somewhere between 3.0 and 7.0. For the SPAI($\varepsilon$) approximate inverses, however, the operator complexity obtained by the original Ruge/Stuben AMG approach is only marginally increased.

The amount of additional storage necessary for the approximate inverses is depicted in Subfigure (c). Especially the SPAI-1 smoother exhibits large additional storage. The SPAI($\varepsilon$) approximate inverses in contrast stay strictly below a complexity of one, which means that the storage cost for the approximate inverses on all levels is below the cost of storing the original matrix $A$.


Figure 5.3 BiCGstab with AMG for Helmholtz We apply BiCGstab with AMG as a preconditioner to the Helmholtz problem with increasing $\sigma$. For the cases where approximate inverses are used we use the PGCA. We use a 128 × 128 grid and 10 points as the limit for the number of unknowns on the coarsest grid. [➠exp 016d.py] The four panels compare GS, SPAI-1, SPAI-1/PGCA, SPAI-1r, SPAI-1r/PGCA, SPAI(0.5) and SPAI(0.5)/PGCA for $\sigma = 0, 60, \ldots, 600$: (a) convergence rate $\rho$, (b) operator complexity $c_A$, (c) approximate inverse matrix complexity $c_M$, (d) solution time $t_{sol}$. [Plots not reproduced here.]



Overall the execution time in this experiment is minimal for the SPAI(0.5) smoother. For the sherman1 problem the execution times are very similar for all methods. For all other problems the SPAI(0.5) approximate inverse version of BiCGstab with AMG halves the execution time of the solution phase compared to Gauss-Seidel with standard GCA. For the sherman5 problem the decrease in solution time is most noticeable: the Gauss-Seidel version of the AMG preconditioner needs more than 7 seconds due to the bad convergence rate seen in Subfigure (a).

Venkat Problems The Venkat problems² are matrices assembled and published by V. Venkatakrishnan from NASA. They arise in an unstructured 2D solver for the Euler equations. The Venkat01, Venkat25 and Venkat50 matrices correspond to the time steps 1, 25 and 50 respectively. Each matrix in this set has size 62424 with 1717792 nonzeros. The resulting average of about 27 nonzeros per row is considered high for applying AMG.

Experiment 19 (BiCGstab with AMG for Venkat problems) We again apply BiCGstab with AMG as a preconditioner, now to the Venkat problems. In this case we use a modified interpolation scheme, where we apply direct interpolation to the matrix $MA$ instead of the matrix $A$. Here the modified interpolation is of crucial importance, i.e. no convergence can be achieved with the standard AMG interpolation.

Figure 5.5 on page 110 shows that classical AMG does not converge at all for these problems. The convergence rates shown in Subfigure 5.5(a) show that all PGCA versions of AMG yield good convergence rates for the Venkat01 problem. The best convergence is achieved by using SPAI-1 and PGCA. The convergence rate monotonically increases for increasing time steps in the Euler solver. Note that in this example the overall memory cost for storing the multilevel hierarchy stays well below 4, a typical value for classical algebraic multigrid when a fan-out effect arises, as in a rotating flow situation.

5.2.4 Conclusions

The new idea pursued in this chapter is the incorporation of an approximate inverse into the Galerkin product. The method is meant for situations where the standard GCA does not produce satisfactory coarse grid operators. The method is simple and general, as it only requires one additional matrix-matrix multiplication and can be applied to any problem.

²See also http://www.cise.ufl.edu/~davis/sparse/Simon/


Figure 5.4 BiCGstab with AMG on the Sherman Collection We apply BiCGstab with AMG as a preconditioner to the Sherman collection. For the cases where approximate inverses are used we use the PGCA. [➠exp 017a.py] The four panels compare GS, SPAI-1, SPAI-1r, SPAI(0.7) and SPAI(0.5) on sherman1 through sherman5: (a) convergence rate $\rho$, (b) operator complexity $c_A$, (c) approximate inverse matrix complexity $c_M$, (d) solution time $t_{sol}$. [Plots not reproduced here.]


Figure 5.5 BiCGstab with AMG for Venkat problems We apply BiCGstab with AMG as a preconditioner to the Venkat problems. For the cases where approximate inverses are used we use the PGCA. [➠exp 017c.py] The four panels compare GS, SPAI-1, SPAI-1r, SPAI(0.7) and SPAI(0.5) on venkat01, venkat25 and venkat50: (a) convergence rate $\rho$, (b) operator complexity $c_A$, (c) approximate inverse matrix complexity $c_M$, (d) solution time $t_{sol}$. [Plots not reproduced here.]


Numerical examples demonstrate that good convergence rates for simple problems are maintained, while the convergence rate can be substantially improved when the convergence of classical AMG with standard GCA is slow. Combining the new PGCA approach with approximate inverses as smoothers and with the corresponding strong connection definition can succeed in producing sparse coarse system operators and sparse approximate inverses. The expected problems with fill-in of the coarse grid operators are indeed observed when the approximate inverses become too dense.

The PGCA has been demonstrated to be useful for solving the Helmholtz problem with a full multigrid hierarchy, when AMG is combined with the BiCGstab iteration. Classical AMG and even geometric multigrid fail for this problem for large $\sigma$, due to deficiencies in the Gauss-Seidel smoothing procedure and spurious smooth eigenmodes in the error that cannot be represented on coarse grids. The properties of the coarse grid operators are improved by the PGCA, yielding a converging iteration up to very large $\sigma$.

The application of the PGCA to general sparse systems has demonstrated the generality of the approach. In all examples, the convergence rate of the resulting AMG iteration improves compared to the convergence rate of the original AMG approach.


Chapter 6

Parallel Coarsening by Relaxation

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.2 Our approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.3 Compatible Relaxation . . . . . . . . . . . . . . . . . . . . . . . . 115
6.4 Coarsening by Relaxation . . . . . . . . . . . . . . . . . . . . . . 119

    6.4.1 Principles . . . . . . . . . . . . . . . . . . . . . . . . . . 119
    6.4.2 Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 121

6.5 Approximate Inverse Relaxation . . . . . . . . . . . . . . . . . . . 124
6.6 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

Classical AMG is sequential in nature: the ubiquitous Gauss-Seidel iteration and the Ruge/Stuben greedy coarsening heuristic are inherently sequential. Earlier chapters introduced the SPAI smoothers, and for standard test problems we showed that the SPAI-1 smoother provides smoothing efficiency comparable to the Gauss-Seidel iteration. Parallel coarsening has thus become the "missing link" to a fully parallel AMG method. We therefore devote this chapter to a new parallel coarsening idea.

6.1 Introduction

When AMG is parallelized on a distributed memory parallel machine, the most common approach is to distribute the grid points and the respective equations among the processors in some sensible way that preserves the locality of the operator $A$. Krechel and Stuben (1999) and Henson and Meyer-Yang (2002) report on such approaches.

A canonical approach to parallelizing Algorithm 2.7 is to run the algorithm locally on all processors. This procedure alone produces F-F and C-C connections along the processor boundaries, which introduce interpolation errors and large operator complexities respectively. To eliminate the problem of interpolation errors, a second pass that "fixes" the


boundary is introduced. Variations of this idea exist, including coarsening the boundary first and the interior second, and also other algorithms using ideas from computing independent sets in parallel (Goldberg and Spencer 1989; Luby 1985).

All these approaches share the property that the result depends on the number of processors employed. Generally speaking, the smaller the number of grid points per processor, the worse the resulting convergence rate and operator complexity.

One may argue that parallel execution only makes sense for a reasonable ratio of grid points per processor, but on coarser grids this approach will always lead to possible parallel inefficiencies and larger than necessary operator complexities. Agglomeration techniques (McBryan et al. 1991) are known to alleviate some of the inefficiencies on coarser grids, but they introduce extra complexity in the programming process and possibly expensive volume communication.

6.2 Our approach

In this thesis we have devoted ourselves to exploring the possibilities of a large scale parallel algorithm that does not suffer from the processor dependency of the result. The consequence is a guarantee of perfect parallel scalability of the convergence rate and the operator complexity. The sequential properties of the resulting algorithm are very likely to degrade under this strong requirement of parallel scalability; the resulting algorithm should therefore be compared to other parallel versions, which we have not done yet.

There are several possible approaches, especially those that may be derived from parallel algorithms for finding maximal independent sets (e.g. Goldberg and Spencer 1989; Luby 1985). We have chosen to investigate an algorithmic approach with a structure similar to the well known simulated annealing method (Johnson et al. 1991). Simulated annealing is an iterative algorithm: starting with an approximate solution, in each step one tries to obtain a better solution by applying an alteration function that introduces "small changes" into the approximate solution. The iteration is continued until the approximation is sufficiently good. The art of designing good optimization loops is to choose a good measure for the quality of approximate solutions and to find suitable alteration functions that are likely to improve approximate solutions.

Simulated annealing is known for its long runtime, usually because very many iterations are necessary. The quality of the alteration function often determines the overall quality of the algorithm. We show in this chapter that for parallel coarsening 5 iterations seem sufficient for the problems we have looked at.

We have chosen this approach mainly for two reasons: simplicity and good potential parallelism.


Remark 6.2.1 (Energy functions) In operations research it is very common to optimize an energy function, usually a scalar function that gauges the quality of the current solution (see e.g. Johnson et al. 1991). As in many other cases, it is non-trivial to find such an energy function for a set of coarse grid points. The optimal energy function would be the runtime of a two-grid iteration (multigrid with 2 levels only). This quantity cannot be computed; on fine levels it would require the solution of very large systems. Other heuristic energy functions, such as the number of F-to-C connections, may work as well, but their exact relation to AMG convergence is unknown. In order not to be limited by the formal definition of energy functions, we have chosen to devise the algorithm in a strictly heuristic fashion. Additionally one should keep in mind that many such energy functions might not be computable in parallel, and that the widely used Ruge/Stuben coarsening does not contain such a function either.

6.3 Compatible Relaxation

Compatible relaxation (CR) was introduced by Brandt (2000). The idea is to relax only the $F$ variables ($F$-smoothing) of the equation

    $A u = 0$    (6.1)

with the initial guess $u^0 = \mathbf{1}_F$. Here $\mathbf{1}$ denotes the vector of all ones and $\mathbf{1}_S$ the standard indicator vectors

    $(\mathbf{1}_S)_i = 1$ for $i \in S$ and $0$ otherwise;

we usually use the special cases $\mathbf{1}_F$ and $\mathbf{1}_C$. Since the solution $u$ of Equation (6.1) is known to be 0, one of the first iterates of the relaxation procedure, e.g. $u^1$, $u^2$ or $u^3$, can be evaluated for pointwise convergence of the values at $F$ points: the smaller the value, the better the pointwise relaxation. It is assumed that fast pointwise relaxation is associated with "nearness" to $C$ points with respect to the smoother. Compatible relaxation thus provides a measure of the quality of the current set of coarse variables, and $F$ points with pointwise convergence rates close to 1 are good candidates for being turned into $C$ points in a coarsening algorithm.
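A minimal sketch of such a CR sweep with damped $F$-point Jacobi follows; the damping value and sweep count are illustrative choices, not the settings used in our experiments:

    import numpy as np

    def compatible_relaxation(A, is_F, nu=2, omega=0.5):
        """nu damped Jacobi sweeps on the F equations of A u = 0, from u0 = 1_F.

        A is a NumPy array or SciPy sparse matrix, is_F a boolean mask of the
        F points. Large entries of the returned iterate mark slowly relaxing
        F points, i.e. good candidates for becoming C points.
        """
        u = is_F.astype(float)                     # u0: one at F points, zero at C points
        d = np.asarray(A.diagonal()).ravel()
        for _ in range(nu):
            r = -(A @ u)                           # residual of A u = 0
            u[is_F] += omega * r[is_F] / d[is_F]   # relax the F equations only
        return u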

Example 6.3.1 (Compatible relaxation) As an illustration we use second-order finite differences on the unit square $[0, 1]^2$, using a standard equidistant rectangular grid. The problem we will look at is the anisotropic Laplace equation:

    $-u_{xx} - \varepsilon u_{yy} = f.$


Figure 6.1 Compatible relaxation for the Laplacian ($\varepsilon = 1$), damped Jacobi: (a) initial value $u^0 = \mathbf{1}_F$, (b) $u^1$, (c) $u^2$. [Plots not reproduced here.]

This example is quite unrealistic in the context of algebraic multigrid and is only used as a motivation for compatible relaxation. We consider a C/F splitting with all points being $C$ points except a central circular region of $F$ points. The closer to the center of this region an $F$ point is, the more suitable it is as a $C$ point; the $F$ points on the inner edge of the region are best suited to stay $F$ points. Figure 6.1(a) shows the corresponding vector $\mathbf{1}_F$, which we relax in subsequent steps using Jacobi relaxation of the $F$ points with a damping parameter $\omega$. Figure 6.1(b) shows the solution after one relaxation step: the outermost points of the $F$ region have decreased values in $u$, while the interior points have almost not converged at all. Figure 6.1(c) shows a second smoothing step; the solution shows similar desired properties.

In the case of an anisotropic problem ($\varepsilon \neq 1$), the Jacobi relaxation is known to smooth efficiently only in the direction of strong connection. Figure 6.2 shows a similar series for this problem. Instead of the symmetric $u$ of the isotropic case we now observe a $u$ that matches the anisotropy.

Remark 6.3.1 (CR is as parallel as the smoother) Note that compatible relaxation is as parallel as the smoother used in this procedure. The smoother used in compatible relaxation should certainly be the smoother used in the multigrid iteration. Thus compatible relaxation is as parallel as necessary.

Remark 6.3.2 (No strong-connection matrix required) Furthermore, compatible relaxation does not need the definition of a strong-connection matrix, as is usual in algebraic multigrid (see e.g. Stuben 2001). Huang (1991) and Chow (2001) show the deficiencies of strong-connection matrices. In traditional approaches these deficiencies are transferred into the coarsening.


Figure 6.2 Compatible relaxation for the anisotropic Laplacian, damped Jacobi: (a) initial value $u^0 = \mathbf{1}_F$, (b) $u^1$, (c) $u^2$. [Plots not reproduced here.]

Towards a Coarsening Scheme Compatible relaxation can be used to find good candidates for new $C$ points, e.g. some inner region of the circle from the last example. Starting from an initial splitting $(C, F)$, a set $F^+$ is determined by compatible relaxation using $\nu_{CR}$ iterations and a threshold $\theta_{CR}$:

    $F^+ = \{\, i \in F : u^{\nu}_i > \theta_{CR} \,\}.$

The splitting $(C, F)$ is then updated by moving a suitable coarse subset of $F^+$ from $F$ to $C$. Algorithm 6.1 states the proposed procedure in a more formal way.

Algorithm 6.1 Framework for a coarsening algorithm

 1  Coarsening(A, CoarsenFCT, nu_CR, theta_CR)
 2    (C, F) = CoarsenFCT(A)                 – initial coarsening –
 3    while coarsening not good              – outer loop –
 4      u = 1_F                              – starting vector –
 5      relax A u = 0 for nu_CR steps        – compatible relaxation –
 6      F+ = { i in F : u_i > theta_CR }     – determine potential coarsening points –
 7      C+ = CoarsenFCT(A, F+)               – coarsen from that set –
 8      C = C ∪ C+                           – add C+ to C –
 9      F = F \ C+                           – remove C+ from F –
10    end while
11    return (C, F)


Figure 6.3 Compatible relaxation for the Laplacian ($\varepsilon = 1$), Gauss-Seidel: (a) initial value $u^0 = \mathbf{1}_F$, (b) $u^1$: one step Gauss-Seidel, (c) $u^2$: two steps Gauss-Seidel. [Plots not reproduced here.]

Problems We now discuss several difficulties associated with this approach to devising a large scale parallel algorithm.

Initial guess The initial guess for the required method needs to be a suitable coarse set. Otherwise the algorithm will assess a very large set of $F$ points as equally bad, and no progress is made by compatible relaxation. Finding such a coarse set in turn requires a suitable parallel algorithm.

Additional coarsening needed Algorithm 6.1 incorporates yet another separate coarsening algorithm. While this can possibly be a well-known algorithm, it is still non-trivial to find an algorithm with the desired parallelism.

Fixed and small iteration number The iteration number $\nu$ must be small. If it is too large, the solution may be too close to the correct solution, which is 0. Figure 6.3 shows that few iterations may not yield satisfactory results, though. It is well known that Gauss-Seidel smoothes the error equally in all directions for the standard Poisson problem, yet compatible relaxation yields an asymmetric gauging of the fine set of variables. More such examples can be found for different configurations of F/C points, not only this artificial example with the central spot of $F$ points.

Stopping criterion The outer loop contains an as yet unspecified end condition. The idea is to stop at a prescribed maximal pointwise convergence rate.


6.4 Coarsening by Relaxation

Coarsening by relaxation (CBR) is based upon three main principles: a random initial guess, C-to-F transfer, and simple swapping. We go into more detail about these principles in the following section.

Remark 6.4.1 (Regular test problems) Coarsening by relaxation is targeted towards irregular grid structures. With irregular grid structures, however, it is unclear what the best C/F splitting is; usually it is even unclear whether an optimal splitting exists. Additionally, it is non-trivial to gauge the quality of a splitting, because there is a natural trade-off between the number of coarse grid points and convergence, where either of the goals could be given preference.

For the remainder of this chapter we will therefore still argue based upon experiments with regular and Laplace-like model problems, simply because we need to resort to these well-studied cases where we have some knowledge about the desired C/F splitting.

6.4.1 Principles

Random Initial Guess We know that for the Poisson problem on the standard grid the typical red-black coloring of the nodes yields a good coarsening. For such problems, the original Ruge/Stuben coarsening algorithm finds exactly this coarse set of points. Generally speaking, for a red-black coloring there exist two sets of possible coarse points: either the red or the black points. These two possible C/F splittings are the exact complement of each other, i.e. each point's role is swapped in the two splittings. In the context of an initial guess this means: no point is determined to be a $C$ point or $F$ point from the beginning without knowledge of other points, yet fixing a single point as either $C$ point or $F$ point determines the status of all other points.

For a parallel coarsening algorithm this motivates the following:

- It seems impossible to find a parallel algorithm with acceptable parallel behavior that will find the perfect red-black coloring of the nodes.

- There is no predetermined C/F choice for such regular problems; thus a random initial guess seems appropriate.

Advantages of a random initial guess are:

Simple and parallel Producing a random initial guess is simple and parallel.


Figure 6.4 Random initial guesses Examples of random initial guesses with different densities, shown for a regular 15 × 15 grid in panels (a)-(c). [Plots not reproduced here.]

Controlled density An initial coarse set with a prescribed density $d = |C| / |\Omega|$ can be produced.

Even distribution Randomly selected coarse sets are distributed fairly evenly on the grid. They are usually neither perfect, nor completely useless first guesses. The consequence is that the initial guess definitely needs to be improved substantially, but it need not be completely rearranged. Changing a certain ratio, say between 30% and 70% of all points, should suffice to find a good set of coarse grid points. Figure 6.4 shows random initial sets $C$ for a regular 15 × 15 grid.

Remark 6.4.2 (Deterministic random initial guess) A random initial guess can be produced deterministically. Most random number generators are linear congruential generators, i.e. the random numbers really form an algebraic sequence

    $x_{k+1} = (a x_k + c) \bmod m$

with special values for $a$, $c$ and $m$. The starting value $x_0$ is called the seed. If we use a different sequence for each node, using the node's global index as a seed, we can produce deterministic coarse sets independent of the number of processors if needed. Note that the result might not be deterministic if machine dependent floating point arithmetic is used.
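A minimal sketch of such a deterministic guess; the LCG constants are classical textbook values, chosen here purely for illustration:

    def node_value(global_index, a=1664525, c=1013904223, m=2**32):
        """First value of a per-node LCG stream seeded with the node's global index.
        Pure integer arithmetic keeps the result independent of the processor
        count and of machine dependent floating point behavior."""
        return ((a * global_index + c) % m) / m      # uniform in [0, 1)

    def random_initial_guess(n_nodes, density=0.5):
        """Deterministic random C/F guess: node i is a C point iff its value < density."""
        return [node_value(i) < density for i in range(n_nodes)]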

C-to-F Transfer Brandt suggests an algorithm where one adds $C$ points only. One deficiency of that approach is that one needs to automatically determine when to stop the iteration. In the limit case one would end up with $C = \Omega$. On the other hand, if the iteration is stopped too early, $C$ might be too small. One remedy for this problem is to remove in each step a set $C^-$ of $C$ points in case they have become too dense. One


simple procedure to do so is to use compatible relaxation with the sets $C$ and $F$ swapped. We discuss techniques for finding suitable $C^-$ points in the following section.

Simple Swap In order to avoid the necessity of an additional coarsening algorithm we propose a simple swap, i.e. using the full set $F^+$ instead of a coarse choice from $F^+$, and similarly for the newly introduced set $C^-$. Formally this leads to the following exchange:

    $C \leftarrow (C \setminus C^-) \cup F^+, \qquad F \leftarrow (F \setminus F^+) \cup C^-.$

Remark 6.4.3 (Convergence of CBR) We define CBR to converge in some iteration step if the corresponding sets $F^+$ and $C^-$ are empty. In the case of CBR convergence, further iterations introduce no change in the C/F splitting. For model problems we have observed CBR to converge in three steps or less.

Figure 6.5 shows an example where three steps of the outer loop of CBR are visualized for an unstructured Poisson problem and convergence can be observed.

This convergence property is the central idea that distinguishes this approach from coarsening with compatible relaxation only.

Remark 6.4.4 (Post-processing) In order to define an interpolation it is often required that every $F$ point has at least one strong connection to a coarse point. Alternatively, other requirements may be imposed on the coarse set of points. We assume such requirements to be enforced in a PostProcess routine.

6.4.2 Framework

The principles of Section 6.4.1 form the basis for the framework listed in Algorithm 6.2 on page 123. The focus in devising this algorithm was simplicity; many other versions are possible.

Specifically, line 2 could potentially contain any other coarsening algorithm or a similar heuristic, e.g. an independent set algorithm or a processor-local version of the Ruge/Stuben coarsening.

Lines 6 and 7 could similarly be realized not by a simple swap, but by a more sophisticated coarsening step.

The next section will discuss methods to determine the set $C^-$, as in line 5 of Algorithm 6.2.


Figure 6.5 Coarsening by relaxation for the Laplacian (1) Problem on an unstructured grid. The figure shows the first 4 iterations of Algorithm 6.2: (a) random initial guess, (b) step 1, (c) step 2, (d) step 3. The random initial guess in Subfigure (a) contains some points to be swapped: grey ($F^+$) and white ($C^-$) dots. In (b) the number of such points is already drastically reduced, and in all further steps such points essentially do not occur anymore; (d) shows no points to be swapped. CBR converges in 2 steps for this example. [Plots not reproduced here.]


Figure 6.6 Coarsening by relaxation for the Laplacian (2) Problem on an unstructured grid. The figure shows the resulting grids on consecutive levels using Algorithm 6.2: (a) level 1/2, (b) level 2/3, (c) level 3/4, (d) level 4/5, (e) level 5/6. [Plots not reproduced here.]

Algorithm 6.2 Coarsening by relaxation

 1  CoarseningByRelaxation(A)
 2    (C, F) = RandomSet(Ω)             – initial coarsening –
 3    while coarsening not good         – outer loop –
 4      determine F+ from F             – potential F-to-C points –
 5      determine C- from C             – potential C-to-F points –
 6      C = (C \ C-) ∪ F+               – exchange in C –
 7      F = (F \ F+) ∪ C-               – exchange in F –
 8    end while
 9    PostProcess(C, F)
10    return (C, F)
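A compact sketch of this framework follows. The two measurement functions, i.e. the CR sweep of Section 6.3 and the AIR sweep of Section 6.5, are passed in as callables; the thresholds, density and random seed are illustrative placeholders, and the PostProcess step is omitted:

    import numpy as np

    def coarsening_by_relaxation(A, measure_F, measure_C, density=0.5,
                                 n_outer=5, theta_cr=0.5, theta_air=0.5, seed=0):
        """Sketch of Algorithm 6.2 with illustrative parameter values."""
        n = A.shape[0]
        rng = np.random.default_rng(seed)          # stands in for RandomSet
        is_C = rng.random(n) < density             # random initial guess
        for _ in range(n_outer):
            is_F = ~is_C
            f_plus = is_F & (measure_F(A, is_F) > theta_cr)    # F-to-C candidates
            c_minus = is_C & (measure_C(A, is_C) > theta_air)  # C-to-F candidates
            if not f_plus.any() and not c_minus.any():
                break                              # CBR convergence (Remark 6.4.3)
            is_C = (is_C & ~c_minus) | f_plus      # simple swap
        return is_C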


6.5 Approximate Inverse Relaxation

If $A$ is non-singular, the solution of $A u = e_j$ is the $j$th column of the exact inverse $A^{-1} = (m_1, \ldots, m_n)$. The value $|m_{ij}|$ can be taken as a measure of how important node $i$ is for interpolating node $j$. The solution of

    $A u = \mathbf{1}_C$    (6.2)

is likewise

    $u = \sum_{j \in C} m_j.$    (6.3)

Thus $u_i$ is a measure of how well point $i$ could be interpolated from the current set of coarse points $C$, if it were an $F$ point. This applies to coarse points as well as fine points. Figure 6.7 on the facing page shows the solution of $A u = \mathbf{1}_C$ for the standard and the anisotropic Poisson problem.

An approximation to the $u$ solving Equation (6.2) can be obtained by relaxing Equation (6.2) from a suitable initial guess $u^0$. We call this process approximate inverse relaxation (AIR). If $C^-$ needs to be determined, $u$ is a good measure for potential C-to-F candidates; this is the converse process to compatible relaxation:

    $C^- = \{\, i \in C : u^{\nu}_i > \theta_{AIR} \,\}.$

Additionally, $C$- or $F$-smoothing can be used. If the set $C^-$ needs to be determined, $C$-smoothing prevents weights from $F$ points from being added to $u$; hence the approximate solution $u^{\nu}$ compares $C$ points among themselves, rather than $C$ and $F$ points all together. Figures 6.8 and 6.9 show examples of approximate inverse relaxation with an inner circular patch of $F$ points for the standard as well as the anisotropic Poisson problem. The results are similar to compatible relaxation with swapped roles for the $C$ and $F$ points. Figure 6.10 shows that approximate inverse relaxation with Gauss-Seidel initially suffers from the same asymmetry that can be observed in compatible relaxation, but that the asymmetry vanishes visually after 9 iterations.
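A minimal sketch of an AIR sweep with $C$-smoothed damped Jacobi, mirroring the CR sweep of Section 6.3; the zero initial guess, damping and sweep count are illustrative assumptions:

    import numpy as np

    def approximate_inverse_relaxation(A, is_C, nu=5, omega=0.5):
        """nu damped Jacobi sweeps on the C equations of A u = 1_C, from u0 = 0.

        Only the C equations are relaxed (C-smoothing), so the iterate gauges
        how well each C point is already covered by the remaining coarse points;
        large entries mark C points that are good C-to-F candidates.
        """
        u = np.zeros(A.shape[0])
        b = is_C.astype(float)                     # right-hand side 1_C
        d = np.asarray(A.diagonal()).ravel()
        for _ in range(nu):
            r = b - A @ u
            u[is_C] += omega * r[is_C] / d[is_C]   # relax the C equations only
        return u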

Remark 6.5.1 (Exact result for C-smoothing) Convergent $C$-smoothing in approximate inverse relaxation of Equation (6.2) leads to the solution of the modified system

    $\tilde{A} u = \mathbf{1}_C$,  where  $\tilde{A} = D_C A D_C + D_F$  and  $D_S = \mathrm{diag}(\mathbf{1}_S)$,

i.e. $\tilde{A}$ agrees with $A$ in the rows and columns belonging to $C$ and acts as the identity elsewhere. Note that $\tilde{A}$ may be singular, even if $A$ is not. Figures 6.7(c) and 6.7(d) show solutions of the modified system; they are structurally very similar to the solutions in Figures 6.7(a) and 6.7(b).


Figure 6.7 Solution $u$ of $A u = \mathbf{1}_C$ and of the modified system $\tilde{A} u = \mathbf{1}_C$: (a) Laplacian, (b) anisotropic Laplacian, (c) modified system, Laplacian, (d) modified system, anisotropic Laplacian. [Plots not reproduced here.]

Figure 6.8 Approximate inverse relaxation for the Laplacian (C-smoothing), damped Jacobi: (a) initial value $u^0$, (b) $u^1$, (c) $u^2$. [Plots not reproduced here.]


Figure 6.9 Approximate inverse relaxation for the anisotropic Laplacian (C-smoothing), damped Jacobi: (a) initial value $u^0$, (b) $u^1$, (c) $u^2$. [Plots not reproduced here.]

Remark 6.5.2 (Scaling of $A$) The individual columns $m_j$ of the inverse depend on the scaling of the columns of the matrix $A$. When adding them in Equation (6.3), one must ensure that they are properly, i.e. equally, scaled. Possible scalings include scaling the diagonal entry to one, or scaling the 2-norms of all columns identically. We have not investigated the effect of re-scaling in this thesis, assuming that the scaling is good enough for the sample problems that we have tested.

6.6 Numerical results

Algorithm 6.2 is a framework, and we have yet to specify exactly which measures to use to determine the sets $F^+$ and $C^-$. Before we proceed to discuss numerical results, we make some remarks related to CBR which are independent of the concrete choice of measurement functions.

CBR introduces new parameters:

$\nu_{CBR}$ The number of iterations of the outer loop. We use $\nu_{CBR} = 5$ for all of our experiments, because we have observed experimentally that the process converges in at most 5 iterations, if at all.

$\nu_{CR}, \nu_{AIR}$ The number of iteration steps of compatible relaxation and approximate inverse relaxation. The number of steps for AIR is less critical than for CR and may be quite high. For AIR we use $\nu_{AIR} = 5$, while we use a value of 1 or 2 for $\nu_{CR}$.


Figure 6.10 Approximate inverse relaxation for the Laplacian (C-smoothing), Gauss-Seidel: (a) initial value $u^0$, (b)-(i) successive iterates. [Plots not reproduced here.]


$\theta_{CR}, \theta_{AIR}$ The thresholds for CR and AIR are critical. Based on the observation that the random initial guess is distributed fairly evenly (see Section 6.4.1), we use a reverse engineering approach in our experiments. We assume that in the first iteration a certain ratio $r$ of the points should be changed, and compute a corresponding $\theta$ such that the size of the set to be changed corresponds exactly to $r$, i.e. $|F^+| = r\,|F|$ or $|C^-| = r\,|C|$. This cannot be done without sorting all $F$ or all $C$ points by the corresponding value in $u$. This approach is useful experimentally, but not at all suitable for a parallel and fast application of CBR; in practice one would have to resort to finding suitable $\theta$ directly. We fix this $\theta$ for subsequent iterations, since otherwise CBR would not converge.
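The reverse engineering step amounts to reading off an order statistic. A small sequential sketch (it sorts, and is therefore only useful for experiments; names are illustrative):

    import numpy as np

    def threshold_for_ratio(u, mask, ratio):
        """Return theta such that roughly a fraction `ratio` of the points
        selected by `mask` satisfies u_i > theta, by sorting their values."""
        vals = np.sort(u[mask])
        k = int(np.floor((1.0 - ratio) * len(vals)))
        return vals[min(k, len(vals) - 1)]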

Remark 6.6.1 (Stability) CBR is clearly a randomized algorithm, because of the random initial guess. Although we have previously pointed out that identical results independent of the number of processors may be obtained, it is still desirable for the algorithm to be stable with respect to the randomness introduced. We check this by computing the sample standard deviation

    $\sigma_q = \Big( \frac{1}{N-1} \sum_{i=1}^{N} (q_i - \bar{q})^2 \Big)^{1/2}$

for all properties $q$ of interest, e.g. the convergence rate $\rho$, the operator complexity $c_A$, etc., over $N = 5$ runs. Assuming a normal distribution of $q$, the result of any run lies within the interval $\bar{q} \pm 3\sigma_q$ with probability 99.7%. In none of our tests did the computed values of $\sigma_q$ yield a significant interval.

Remark 6.6.2 ($n$-independence) Our results reflect experiments with at least 30,000 nodes and up to 50,000 points. Due to the desired overall $n$-independence of the full algebraic multigrid algorithm, it is essential to show this property experimentally. This will be future research.

Remark 6.6.3 (Annealing) CBR could easily be forced to converge by adjusting the thresholds for CR/AIR in successive iterations. This approach would be analogous to simulated annealing. We feel that this is not necessary, and for the sake of simplicity we do not introduce such artificial annealing.

Remark 6.6.4 (Operator complexity vs. convergence) In algebraic multigrid there is an inherent trade-off between operator complexity and convergence. Indirectly, via the density of coarse grid points in the initial guess and the chosen thresholds, the user of CBR selects a point within this trade-off. We have chosen to report only values for


the best operator complexity and the best convergence, when a series of tests was made.

Table 6.1 CBR for unstructured Laplacian

    n        method   $\theta$   initial density   $\rho$   $c_A$
    13041    CBR      0.15       0.4               0.71     2.36
    13041    CBR      0.20       0.6               0.41     6.49
    13041    RS-AMG   —          —                 0.41     3.01
    51681    CBR      0.15       0.4               0.76     2.35
    51681    CBR      0.20       0.6               0.43     6.95
    51681    RS-AMG   —          —                 0.43     3.01
    205761   CBR      0.20       0.4               0.78     2.43
    205761   CBR      0.15       0.6               0.42     6.86
    205761   RS-AMG   —          —                 0.45     2.95

Experiment 20 (CBR for unstructured Laplacian) We test the CBR method on an unstructured Poisson problem with different grid sizes. Table 6.1 lists convergence rates $\rho$ and operator complexities $c_A$ for CBR and for the standard Ruge/Stuben coarsening of classical AMG. The CBR algorithm was incorporated into the BoomerAMG code (see Henson and Meyer-Yang 2002).

For each test series of the CBR algorithm we list the parameter setting with the best resulting convergence rate and the setting with the best resulting operator complexity. We see that in neither case can the result compare with the Ruge/Stuben heuristic: either the convergence rate reaches 0.7 and above, or the operator complexity exceeds a value of 6.0. The resulting solution time in either case approximately doubles.

6.7 Conclusions

We developed a new coarsening method for algebraic multigrid. The method is iterative: it derives a coarse grid from a random initial guess for the C/F splitting and then uses a swapping scheme that exchanges $C$ and $F$ points. From experiments we conclude that 5 such iterations are sufficient to obtain a stable C/F splitting. We present results for an unstructured Poisson problem and show that the overall AMG solution time approximately doubles. This loss in efficiency might be recoverable in a truly parallel setting, because of the overall parallelism in the code.

One of the reasons why we do not present more results for CBR is that we realized that the choice of the newly introduced parameters for the initial density of the randomly selected


$C$ points and for the swapping criteria is difficult; the choice of these and also other parameters became too complicated. The main reason for choosing compatible relaxation and approximate inverse relaxation as criteria for swapping $C$ and $F$ points was the fact that the resulting algorithm does not make use of a strong connection definition. It thus promised to tune itself automatically to the smoother used. While this principle remains true for the resulting algorithm, it only pays off in the overall AMG iteration if an interpolation is used that is likewise independent of a strong-connection definition. Due to the lack of such an interpolation scheme so far, we see more disadvantages in the increased number of parameters than advantages.

While we consider CBR irrelevant in the form presented here, this chapter nevertheless shows that the general idea may compete with classical parallel coarsening algorithms. We therefore propose to keep the main idea of the coarsening scheme, based on a random initial guess and an iterative swapping, but to step backwards and adapt it to a given definition of strong connections. The resulting method could require fewer parameters, because the swapping measures can be formulated directly in terms of the number of (strong) connections. The author is planning to investigate such an approach in the future.


Chapter 7

Object-oriented Implementation of AMG in Python

7.1 Efficient Ruge/Stuben coarsening . . . . . . . . . . . . . . . . . . 131

7.2 Sparse matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

7.2.1 Storage formats . . . . . . . . . . . . . . . . . . . . . . . . 134

7.2.2 Sparse matrix-vector multiplication . . . . . . . . . . . . . . 137

7.2.3 Sparse matrix-matrix multiplication . . . . . . . . . . . . . . 137

7.3 Galerkin product . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

7.3.1 Straightforward . . . . . . . . . . . . . . . . . . . . . . . . 140

7.3.2 In-place . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

7.4 WolfAMG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

7.4.1 Introduction to Object Orientation . . . . . . . . . . . . . . 145

7.4.2 WolfAMG is Written in Python . . . . . . . . . . . . . . . . . 146

7.4.3 Example Class: the Smoother . . . . . . . . . . . . . . . . . 149

7.4.4 WolfAMG vs. ML vs. AMG1R5 . . . . . . . . . . . . . . . . 152

Details about the implementation of AMG are rarely discussed in the literature. We have devoted this chapter to describing the details of an efficient implementation of the AMG components. We also describe our implementation of AMG in the Python programming language, but the description of the algorithms in this chapter is independent of the programming language used.

7.1 Efficient Ruge/Stuben coarsening

We have already introduced the coarsening algorithm in Section 2.3.3. Recall the AMG setup as given in Algorithm 2.6 on page 37. Line 6 of the setup requires a routine that coarsens the unknowns on the given level. More precisely, we are seeking a splitting of


the given set of points $\Omega$ into two disjoint sets $C$ and $F$, where exactly the points from $C$ are promoted to the next level. The points in $C$ are therefore called coarse points and the points in $F$ fine points. Note that in the classical AMG approach the coarse points are really fine points as well. We have already presented a coarsening procedure in Algorithm 2.7. In this section we state more precisely how to implement this kind of coarsening efficiently.

To split $\Omega$ into $C$ and $F$, every step of a greedy heuristic moves the most promising candidate from $\Omega$ into $C$, while forcing neighboring points into $F$. This procedure is repeated until all points are distributed. If every step requires at most $O(1)$ operations, and the complexity of all other computations does not exceed $O(|\Omega|)$, the desired overall complexity of $O(|\Omega|)$ is reached.

The greedy heuristic described in Stuben (2001) is based on the following two principles, which correspond to Coarsening Goals 1 and 2:

1. The most promising candidate $i$ for becoming a $C$ point is the one with the highest number of influences $|S_i^T|$. Then all influences of $i$ are added to $F$. This choice supports Coarsening Goal 1 because all $F$ points will eventually have at least one strong $C$ dependency.

2. To keep the number of $C$ points low (Coarsening Goal 2), the algorithm should prefer $C$ points near recently chosen $F$ points; hence, these influences are given a higher priority.

Starting with all points as "undecided points", that is $U = \Omega$, the algorithm proceeds by selecting from $U$ the most promising $C$ point with highest priority. The priority of any point $i$ is defined by

$$\mathrm{Priority}(i) = |S_i^T \cap U| + 2\,|S_i^T \cap F|. \qquad (7.1)$$

Equation (7.1) reflects the preference in choosing as the next $C$ point a point $i \in U$ which influences many previously selected $F$ points. The key advantage of (7.1) is the possibility to update the priority locally and in $O(1)$ time, which results in the desired overall complexity of $O(n \log n)$. We now summarize the Coarse Grid Selection algorithm:

To implement steps (1), (2), and (3) in Algorithm 7.1 efficiently in $O(1)$ time, we maintain a list $L$ of all points sorted by priority, together with a list $I$ holding the position of each point in $L$. Moreover, a list $B$ of the boundaries of all priorities occurring in $L$ enables the immediate update of the sorted list $L$. Figure 7.1 shows a possible segment of the lists $L$, $I$, and $B$.

During the set-up phase of the coarsening algorithm, $L$ is computed and sorted by priority. Step (1) simply chooses the last element of $L$. Steps (2) and (3) are implemented by exchanging a point, whose priority must be either incremented or decremented, with its left- or rightmost neighbor in $L$ of the same priority; then its priority is adjusted, while $L$ remains sorted. Both $I$ and $B$ are updated accordingly.


Algorithm 7.1 Coarse Grid Selection

 1  CoarseGridSelection(S, S^T)
 2    for all i ∈ Ω
 3      set Priority(i) = |S_i^T|
 4    end for all
 5    sort Priority
 6    C = ∅, F = ∅, U = Ω
 7    while U ≠ ∅
 8      select p ∈ U with maximal Priority(p)        – Step (1) –
 9      C = C ∪ {p}, U = U \ {p}
10      for all q ∈ S_p ∩ U                          – dependencies of p –
11        Priority(q) = Priority(q) - 1              – Step (2) –
12      end for all
13      for all q ∈ S_p^T ∩ U                        – influences of p –
14        F = F ∪ {q}, U = U \ {q}
15        for all r ∈ S_q ∩ U                        – dependencies of q –
16          Priority(r) = Priority(r) + 1            – Step (3) –
17        end for all
18      end for all
19    end while
20    return (C, F)

Figure 7.1 Efficient implementation of coarsening: the three lists $L$, $I$, and $B$ enable the efficient implementation of the coarsening algorithm. [Figure: a segment of the lists, showing for each point its index, its position in $L$, its priority, and the priority boundaries stored in $B$.]



Following a suggestion of K. Stuben, we shall skip the second pass of the original coarsening algorithm in (Ruge and Stuben 1987), which enforces an even stronger $F$-to-$C$ dependency. Indirect interpolation schemes render the second pass unnecessary.
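To make the bookkeeping concrete, the following Python sketch implements the greedy selection of Algorithm 7.1. It is a minimal illustration, not the WolfAMG implementation: the strong couplings $S$ and $S^T$ are assumed to be given as lists of index sets, and a simple bucket dictionary stands in for the three lists $L$, $I$, and $B$; a production code would replace the max scan over the buckets by the constant-time exchange described above.

    from collections import defaultdict

    def coarse_grid_selection(S, ST):
        # S[i]: points that i strongly depends on; ST[i]: points influenced by i
        n = len(S)
        priority = [len(ST[i]) for i in range(n)]
        buckets = defaultdict(set)               # priority -> undecided points
        for i in range(n):
            buckets[priority[i]].add(i)
        undecided, C, F = set(range(n)), set(), set()

        def reprioritize(i, delta):              # move i to its new bucket
            buckets[priority[i]].discard(i)
            priority[i] += delta
            buckets[priority[i]].add(i)

        while undecided:
            pmax = max(p for p, b in buckets.items() if b)
            i = buckets[pmax].pop()              # Step (1): best C candidate
            undecided.discard(i); C.add(i)
            for q in S[i] & undecided:           # Step (2): dependencies of i
                reprioritize(q, -1)
            for q in ST[i] & undecided:          # influences of i become F points
                undecided.discard(q)
                buckets[priority[q]].discard(q)
                F.add(q)
                for r in S[q] & undecided:       # Step (3): dependencies of q
                    reprioritize(r, +1)
        return C, F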

7.2 Sparse matrices

Sparse matrices are matrices with very few nonzeros compared to the number of zero entries. We assume for every sparse $n \times n$ matrix the number of nonzeros per row (or column) to be bounded by $c_{sp}$, with $c_{sp} \ll n$. Obviously, storage of and multiplication with a sparse matrix are much more efficient when only the nonzeros are stored rather than all its elements, as in the full case. An example of a sparse matrix and the corresponding list of nonzeros is shown in Figure 7.2. There is a large number of formats to store such nonzero entries. We describe the ones necessary for an efficient sequential implementation of sparse matrices, which we used in the software that computed the experiments in this thesis.

7.2.1 Storage formats

List of nonzeros It is fairly obvious that neither an unsorted nor a sorted list of the nonzeros (LNZ), i.e. a sparse matrix stored in three lists row, col, and val, is sufficient for an effective handling of sparse matrices. In either case it is computationally expensive to insert elements, and matrix-vector multiplication may jump inefficiently within the vector components and produce cache misses. Additionally, it is important to use a structure that has close to random access to rows or columns. Since AMG is rather row oriented, we describe two row oriented storage formats.

Compressed sparse row The compressed sparse row (CSR) sparse matrix format stores the set of nonzeros with its values and corresponding row and column indices. To store a matrix with $n_{nz}$ nonzeros, three arrays are needed:

1. a row pointer array row that holds $n+1$ entries (the row start indices),

2. a column index array col that holds $n_{nz}$ entries (the column indices) and

3. a value array val that holds $n_{nz}$ entries (the values of the matrix entries).


Figure 7.2 Sparse Matrix: Example for a sparse matrix and its corresponding list of nonzeros.

(a) matrix:

    A = ( 7 0 0 0 0 0
          0 1 2 0 0 0
          0 2 0 2 0 0
          0 0 0 0 5 0
          0 0 0 0 6 4 )

(b) LNZ format (random order):

    index  row  col  val
      1     2    2    1
      2     3    4    2
      3     4    5    5
      4     2    3    2
      5     5    6    4
      6     1    1    7
      7     5    5    6
      8     3    2    2

(c) CSR format:

    index  row   col  val
      1     1     1    7
      2     2     2    1
      3     4     3    2
      4     6     2    2
      5     7     4    2
      6    (9)*   5    5
      7     -     5    6
      8     -     6    4
    * end marker

(d) LL format (random order):

    index  root  col  val  link
      1     4     3    2    -1
      2     3     2    2     5
      3     2     2    1     1
      4     8     1    7    -1
      5     6     4    2    -1
      6     -     5    6     7
      7     -     6    4    -1
      8     -     5    5    -1


The value array val holds all the nonzeros. To access element $A(i,j)$ in CSR format one proceeds as follows: the row pointer array at position $i$, row[i], points at the beginning of row $i$ in the column index array col, i.e. col[row[i]] is the first column index of row $i$. Note that row[i+1] marks the beginning of the next row, and row[i+1]-1 therefore marks the end of row $i$. Between the beginning and the end of row $i$, col contains, sorted, all the column indices of row $i$. Now the column index $j$ must be searched for among all of the possible entries, that is, a $k$ between row[i] and row[i+1]-1 that satisfies col[k] = $j$ must be found. If such a $k$ exists, then val[k] is the value at $(i,j)$. Otherwise the entry is not stored and the value at $(i,j)$ is consequently zero by definition.

The CSR format allows for fast matrix-vector multiplication, i.e. in $O(n_{nz} + n)$ time, and fast element retrieval. The fixed row structure of the CSR format is a clear disadvantage for the insertion of new nonzeros. If at any time the insertion of a new nonzero exceeds the number that was reserved for the row in question, new space must be made available for that row. In the worst case all elements, i.e. the whole matrix, have to be shifted. Of course, before creating a sparse matrix in CSR format, one predicts the necessary storage and then preallocates the necessary memory. Predicting the pattern of the matrix can be impossible in some cases, though. One can also resort to allocating very large matrices that are guaranteed to suffice. Yet then unnecessary amounts of memory are allocated, and multiplication with such a matrix is slowed down due to the additional memory that needs to be scanned.

Linked lists are commonly known for better element insertion time than constant length arrays, of course at the cost of more memory usage. This fact is used in the following format for sparse matrices.

Linked list A linked list (LL) sparse matrix format is similar to the CSR format, but rather than storing all elements in a flat val array with corresponding row and column index vectors, each row is a linked list of elements. The first element of row $i$ is indexed by root[i], while each element in the list of values val has a corresponding link entry that connects the following element of the row with the list. A value of $-1$ marks the end of the list of a row. The disadvantage of the LL format is that a column lookup requires the search of all nonzero values of a row. If columns are needed, it can pay off to store a columnwise index in the same fashion as just described for rows.
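The following sketch shows what element insertion into such an LL structure might look like (0-based indices, $-1$ as end marker; the array names follow the description above but are otherwise illustrative):

    def ll_insert(root, col, val, link, i, j, v):
        # walk the linked list of row i looking for column j
        k = root[i]
        while k != -1:
            if col[k] == j:          # entry exists: update in place
                val[k] = v
                return
            k = link[k]
        col.append(j)                # new element: append to the value arrays
        val.append(v)
        link.append(root[i])         # prepend it to row i's list
        root[i] = len(col) - 1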

Figure 7.2 compares the different formats for one particular example. In practice we use all of the described formats and use transformation routines where needed. Generally speaking, we use the LL format for the AMG setup that creates the matrices, and the CSR format for the cycle, where fast matrix-vector multiplication is needed.

Remark 7.2.1 (array reference and insertion) In the following we will not explicitly make use of the described matrix formats when formulating our algorithms. We will assume that reference and insertion of an element $A(i,j)$ are implemented correspondingly.

Matrix Market The widespread Matrix Market format1 can best be compared to the LNZ format, but it is only a format for the storage of matrices in files. The format itself offers more possibilities than described here, namely vector storage and facilities for symmetric matrices and matrix patterns. The concept is really independent of the discussion of matrix formats used for computation.

1see also http://math.nist.gov/MatrixMarket/

7.2.2 Sparse matrix-vector multiplication

For the CSR and the LL format we can easily describe a matrix-vector multiplication routine, see Algorithm 7.2. The major difference between the CSR and the LL format is the computation of the column indices in line 6 of Algorithm 7.2. In either case the algorithm can easily scan the rows: in the CSR case the row is stored contiguously anyway, while in the LL case the row can be processed by following the links in the linked list.

Algorithm 7.2 Sparse matrix-vector multiplication

 1  MatVecMult(A, x)
 2    for i = 1 to n                    – all rows of A –
 3      y[i] = 0                        – zero vector –
 4    end for
 5    for i = 1 to n                    – all rows of A –
 6      for j ∈ cols(A(i,:))            – column indices of the nonzeros of row i –
 7        y[i] = y[i] + A(i,j) · x[j]   – compute entry –
 8      end for
 9    end for
10    return y
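In the CSR format, line 6 of Algorithm 7.2 reduces to a contiguous scan; a minimal Python version (0-based, with illustrative names) reads:

    def csr_matvec(row, col, val, x):
        n = len(row) - 1
        y = [0.0] * n                        # zero the result vector
        for i in range(n):                   # all rows of A
            for k in range(row[i], row[i + 1]):
                y[i] += val[k] * x[col[k]]   # accumulate row i
        return y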

7.2.3 Sparse matrix-matrix multiplication

Sparse matrix-matrix multiplication is lacking in many sparse matrix software packages. The main reasons are the possible inefficiency of the matrix-matrix multiplication, the difficulty of parallelizing it, and the fact that, e.g., for most Krylov subspace methods it is not



necessary to build matrix-matrix products at all. Even a two-grid algorithm (see Algorithm 2.4 on page 33) can often be formulated without matrix-matrix multiplication. Another important issue is the additional fill-in, which is possibly large. AMG, however, does not allow an implementation without building matrix-matrix products, as can be seen from the setup routine, Algorithm 2.6 on page 37.

Let $Z = A \cdot B$, with $A$ and $B$ being sparse matrices with row oriented storage, like the CSR or the LL format. Recall the matrix-matrix multiplication from above: to compute an entry $Z(i,j)$ of the result matrix, the scalar product of the $i$-th row $A(i,:)$ and the $j$-th column $B(:,j)$ has to be calculated (see Figure 7.3). One difficulty now becomes obvious: as $B$ is given in a row oriented format, selecting the column $B(:,j)$ is time-consuming. Another problem with the corresponding dense matrix-matrix procedure (see Algorithm 7.3) has to do with the fact that the result matrix $Z$ is sparse. It is important not to compute zero entries, and the insertion of nonzero elements needs to be executed efficiently. Algorithm 7.4 solves all of the above mentioned issues.

Algorithm 7.3 Dense matrix-matrix multiplication

 1  NaiveMatMatMult(A, B)
 2    for i = 1 to m                    – all rows of A –
 3      for j = 1 to n                  – all columns of B –
 4        Z(i,j) = A(i,:) · B(:,j)      – compute entry –
 5      end for
 6    end for
 7    return Z

Note that line 6 in Algorithm 7.4 inserts an element into the result matrix $Z$ if the corresponding element does not exist yet. We implement this routine by inserting into an LL format matrix, but it may even pay off to precompute the pattern for inserting into a CSR format. The choice of matrix formats really depends on the use of the matrix after the multiplication.


Algorithm 7.4 Sparse matrix-matrix multiplication

 1  SparseMatMatMult(A, B)
 2    Z = 0
 3    for i = 1 to m                    – all rows of A –
 4      for k ∈ cols(A(i,:))            – all nonzeros of row A(i,:) –
 5        for j ∈ cols(B(k,:))          – all nonzeros of row B(k,:) –
 6          Z(i,j) = Z(i,j) + A(i,k) · B(k,j)     – compute entry –
 7        end for
 8      end for
 9    end for
10    return Z
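In Python, the row-oriented triple loop of Algorithm 7.4 can be sketched with a dict-of-rows representation, an LL-like structure with cheap element insertion (an illustrative sketch, not the PySparse routine):

    def sparse_matmat(A, B):
        # A, B: lists of row dictionaries {column: value}
        Z = [dict() for _ in A]              # result rows, initially empty
        for i, Arow in enumerate(A):         # all rows of A
            Zrow = Z[i]
            for k, a_ik in Arow.items():     # nonzeros of row A(i,:)
                for j, b_kj in B[k].items(): # nonzeros of row B(k,:)
                    Zrow[j] = Zrow.get(j, 0.0) + a_ik * b_kj
        return Z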

Figure 7.3 Sparse matrix-matrix multiplication. [Figure: the entry Z(i,j) is accumulated from the nonzeros A(i,k) of row i of A and the corresponding rows B(k,:) of B.]


7.3 Galerkin product

7.3.1 Straightforward

This section describes different variants of implementing Equation (2.12). The straightforward way of calculating the Galerkin product simply consists of applying the matrix-matrix multiplication twice. This can either be done by computing the Galerkin product in a forward procedure (see Algorithm 7.5), or the other way round, namely in a backward procedure (see Algorithm 7.6). A graphic illustration of both algorithms can be seen in Figure 7.4.

Remark 7.3.1 (transpose product) One can obviously choose to implement a multiplication with the transpose of a matrix instead of explicitly transposing the matrix $P$ for the multiplication with the transpose.

Algorithm 7.5 Forward Galerkin product

 1  GalerkinProduct(A^h, P)
 2    P^T = transpose(P)     – build transpose –
 3    Y_1 = P^T · A^h        – build first product –
 4    free(P^T)              – free the storage space of P^T –
 5    A^H = Y_1 · P          – build second product –
 6    free(Y_1)              – free the storage space of Y_1 –
 7    return A^H

Algorithm 7.6 Backward Galerkin product

 1  GalerkinProduct(A^h, P)
 2    Y_2 = A^h · P          – build second product –
 3    P^T = transpose(P)     – build transpose –
 4    A^H = P^T · Y_2        – build first product –
 5    free(P^T)              – free the storage space of P^T –
 6    free(Y_2)              – free the storage space of Y_2 –
 7    return A^H

Whether to prefer the forward or the backward direction strongly depends on the number of nonzeros in $P$ compared to $A^h$. If $n_{nz}(A^h)$ is bigger than $n_{nz}(P)$, forward must be preferred, and otherwise backward.


Figure 7.4 Galerkin product: graphic representation of the forward procedure ($Y_1 = P^T A^h$, then $A^H = Y_1 P$) and the backward procedure ($Y_2 = A^h P$, then $A^H = P^T Y_2$).


To illustrate this, consider the extreme case where $P$ is the zero matrix and $A^h$ is a normal sparse matrix. Computing $Y_1 = P^T \cdot A^h$ takes time proportional to the number of rows of $P^T$ (see Section 7.2.3), and computing $A^H = Y_1 \cdot P$ then again takes time proportional to the number of rows of $Y_1$. But doing this computation the other way round, namely computing $Y_2 = A^h \cdot P$, takes $c_{sp}(A^h) \cdot \mathrm{rows}(A^h)$ operations.

Both methods require the storage of an intermediate matrix $Y_1$ or $Y_2$, and possibly the storage of the transposed matrix $P^T$. On the finest grid this can be the bottleneck within the AMG setup. The following section deals with eliminating this intermediate storage.

7.3.2 In-place

In this section we are seeking an in-place method, i.e. a routine with no or very little additional storage, to compute $A^H = P^T A^h P$. Applying the same principles as in the transition from full matrix-matrix multiplication to sparse matrix-matrix multiplication in Section 7.2.3, namely row-wise access of the matrices and efficient element insertion, leads to Algorithm 7.7. The correctness of the algorithm can easily be verified using Figure 7.5, which shows a graphical illustration of the algorithm and its loop variables in particular. A simple analysis yields that the runtime of the algorithm is bounded by $O(n_{nz}(A^h))$, i.e. its computational complexity behaves linearly in the number of nonzeros of $A^h$. In that sense Algorithm 7.7 is optimal and has the same computational complexity as the direct methods presented in the previous section. Note that this result only remains true if the insertion of elements can be executed efficiently. Recall from the previous sections that this holds only for either the CSR format with a preallocation of nonzeros per row that is not exceeded for any row, or the LL format.

Remark 7.3.2 (in-place conversion) Efficiency of Algorithm 7.7 is only guaranteed for general matrices when using the LL format. If the user desires a CSR format matrix, then a conversion from LL format to CSR format is required. The presented method is really only in-place if the conversion can be done in-place as well. Efficient in-place conversion is possible and described in Geus (2002).

Tests Tscherrig (2002) reports timings for an implementation of the Galerkin product algorithm presented above. Additionally, he presents various approaches for improving the insertion time of elements into a matrix in CSR format. The tests verify the linear behavior of the algorithm for different matrices. They also show that the linear behavior is disturbed for the CSR format when the number of nonzeros is not guessed correctly.


Algorithm 7.7 In-place Galerkin product

 1  GalerkinProduct(A^h, P)
 2    for p = 1 to n_H                          – all rows of A^H –
 3      for q ∈ cols(P^T(p,:))                  – nonzeros of column P(:,p) –
 4        r = 0                                 – begin ad hoc row –
 5        for k ∈ cols(A^h(q,:))                – nonzeros of row A^h(q,:) –
 6          for j ∈ cols(P(k,:))                – nonzeros of row P(k,:) –
 7            r(j) = r(j) + A^h(q,k) · P(k,j)   – compute entry –
 8          end for
 9        end for                               – end ad hoc row –
10        for j ∈ cols(r)                       – nonzeros of the ad hoc row –
11          A^H(p,j) = A^H(p,j) + P^T(p,q) · r(j)      – compute entry –
12        end for
13      end for
14    end for
15    return A^H
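With the dict-of-rows representation from the matrix-matrix sketch above, the ad hoc row technique can be written as follows (illustrative; $P^T$ is passed explicitly as Pt):

    def galerkin_inplace(A_h, P, Pt):
        # Pt[p]: the nonzeros of P^T(p,:), i.e. of column p of P
        n_H = len(Pt)
        A_H = [dict() for _ in range(n_H)]
        for p in range(n_H):                       # all rows of A^H
            for q, ptq in Pt[p].items():           # nonzeros of column P(:,p)
                r = {}                             # begin ad hoc row
                for k, a_qk in A_h[q].items():     # nonzeros of row A^h(q,:)
                    for j, p_kj in P[k].items():   # nonzeros of row P(k,:)
                        r[j] = r.get(j, 0.0) + a_qk * p_kj
                for j, rj in r.items():            # scatter the ad hoc row
                    A_H[p][j] = A_H[p].get(j, 0.0) + ptq * rj
        return A_H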


Figure 7.5 In-place Galerkin product. [Figure: the loop variables p, q, k, and j and the ad hoc row in the computation of A^H = P^T A^h P.]


7.4 WolfAMG

Numerical software is often still written in Fortran or C. The main reason is performance. Yet it is most obvious that the typical open source numerical code one finds on the Internet is rather unreadable. Typical problems include:

Too many indices Due to the lack of efficient data structures, many constructs involve iterated references to arrays and structures. It is sensible to use short variable names for indices that appear often. Consequently, many codes are overwhelmed with cryptic references.

Too many parameters A general observation is that functions in Fortran and C codes tend to have 10 parameters or more. While this is regarded as bad programming practice, it definitely makes code difficult to read, verify and debug.

Long code Standard procedural programming style in practice often leads to very long programs with repeated code, which makes packages difficult to maintain.

The resulting development process suffers from the following troubles:

Too complex The packages become so complex that it is utterly difficult to add another feature.

Wrong code The complete code is difficult to verify.

Long turnaround time The write, compile, link, test cycle becomes very long, especially if large libraries need to be remade or the test routine takes long because large problem sizes are required for testing.

7.4.1 Introduction to Object Orientation

Object orientation is known to solve many structural problems in programming. The concept of object orientation is widespread and part of most modern programming languages, including C++, C#, Java, Modula-3, Smalltalk, Eiffel and Python. An object is the aggregation of variables and corresponding methods. The focus lies on:

Encapsulation The code is divided into a public and a private interface. Encapsulation helps to maintain the coherence of the data.

Polymorphism The ability to assign function names or overload standard operators with functions specific to an object is known as polymorphism. It increases the modularization and re-usability of code.


Inheritance Creating subclasses from a parent class is called inheritance. The reuse of parent class methods is an efficient mechanism for code reuse.

Important standard books on object oriented programming are Booch and Rumbaugh (1997) and Gamma et al. (1986).

Remark 7.4.1 (simple objects in numerics) It is often argued that in numerical computations the objects are simple, i.e. scalars, matrices and vectors, and thus it is concluded that object orientation is of little help. We disagree and will show an example where object orientation is of great help in the following section.

7.4.2 WolfAMG is Written in Python

There are two traditional ways of implementing numerical codes: MATLAB and a C/Fortran mix. Our previous implementations included both versions, and a mix of MATLAB and C as well. The essence is that we found C (and especially Fortran) too complicated to use, and the MATLAB sparse matrix implementation too slow. Geus (2002) reports on similar experiences.

Figure 7.6 Harvey Keitel as Mr. Wolf in Pulp Fiction. Mr. Wolf: "I'm Winston Wolf. I solve problems."

We have chosen the Python language for implementing WolfAMG2, our implementation of AMG with approximate inverses. We use the widely known NumPy package by Ascher et al. (2001) for vectors and dense matrices and the PySparse package by Geus (2002). Additionally, we use the Netlib implementation of the well-known BLAS and LAPACK libraries, see Anderson et al. (1994).

2The name refers to my exceptional roommate Wolfram Schlickenrieder and a character in the movie Pulp Fiction by Quentin Tarantino, released in 1994. The movie ironically introduces Harvey Keitel as a suave problem-solver named "Wolf". See Figure 7.6 and Tarantino (1994) for more details, or why not watch the movie tonight?


For computing SPAI($\varepsilon$) inverses we use the SPAI 3.0 package by Grote and Barnard. Note that the PySparse package interfaces the SuperLU package by Demmel et al. (1997), which is used as the direct solve method in WolfAMG. Figure 7.7 shows the main package and programming language dependencies.

Figure 7.7 Structure of WolfAMG.

The Python programming language has many very useful properties:

Portability Python has been ported to many platforms.

Interpreted The Python interpreter offers a naturally built-in debugger. Additionally, the interpreter can be used to easily verify the behavior of the standard library or other additional software. This saves time and provides an alternative to consulting software documentation.

Fast The core Python language implements fast data structures for structs, arrays, dictionaries and its object system.

Libraries Python comes with a large set of standard open source software.

Interfaces The Python language has standard interfaces to the C language, Java and many other languages. We have developed a simple interface to MATLAB for creating graphics.


Object oriented The object oriented programming model in Python offers simple, short and almost unique ways of programming. The resulting code is generally short, readable and thus easy to understand.

The Python language and many references on Python can be found on the website www.python.org. Most useful when working with Python are the introduction to Python by van Rossum and Fred L. Drake (2002b) and the complete Library Reference, see van Rossum and Fred L. Drake (2002a). The Python-to-C interface is described in a separate document (van Rossum and Fred L. Drake 2002c). A good comparison of Python to other languages was written by Prechelt (2000). Python has been used for large scientific computing projects, see Beazley and Lomdahl (1997), Hinsen (1997) and Geus (2002).

Numeric Python The standard Python array data-type is called a sequence. This datatype is so general that it becomes slow when operating with millions of entries. Thus Numeric Python (NumPy) was created for manipulating large (multi-)dimensional floating point arrays. The NumPy API is an extension that is very similar to the MATLAB API. Note that the development of NumPy is ending and that it will be replaced by a similar library called NumArray. All references on Numeric Python can be found on www.pfdubois.com/numpy/ and the references therein.

PySparse Geus (2002) wrote an extension to Python for sparse matrices. Sparse matrices can be stored in the LL format or the CSR format. A third format for symmetric matrices (SSS format) exists as well. PySparse implements standard matrix operations, such as inserting, updating and deleting elements, referencing matrix slices, performing matrix-vector and matrix-matrix products, computing matrix norms, etc. PySparse also implements a set of standard preconditioners, iterative solvers and the Jacobi-Davidson algorithm for solving the symmetric eigenvalue problem.

Code size The AMG code in its current state comprises

- approximately 1,400 lines of C code,

- approximately 3,000 lines of Python code for the AMG related modules and

- approximately 2,700 lines of Python code for the experiments (including figure and table generation).

This is very little compared to previous unfinished versions of pure C code. Note that a similar amount of code is contained in the PySparse package.


7.4.3 Example Class: the Smoother

It is beyond the scope of this document to describe our full implementation of WolfAMG. Additionally, the API and object hierarchy of WolfAMG are likely to change. Thus we document WolfAMG in a separate reference manual, see Broker (2003). Nevertheless, we would like to show how we were able to benefit from object orientation in Python when implementing AMG. As an example we focus on the smoother as one of the modules.

Technically, the smoother is a routine that takes a vector $u$ as its input and returns a vector $\bar{u}$ of the same dimensions that contains the smoothed approximation. When Gauss-Seidel is used as a smoother, one needs to pass references to $A$ and $b$ as well. In the approximate inverse case, an additional reference to the approximate inverse $M$ is needed. In languages like C it is very difficult to strictly hide the details of the smoother within the AMG code. Vukelja (2002) shows that it is non-trivial to write a preprocessor for the C language that hides the details of a module to a satisfactory degree. Object oriented programming naturally hides the details of the implementation of a module.

In the following we describe the interface of a smoother class as needed within WolfAMG. Note that we specify the exact arguments where necessary, and write (...) for an interface that may be chosen freely.

__init__(...) The class must provide a constructor method. For most smoothers this method will copy some standard arguments into class variables; the number of smoothing steps $\nu_{pre}$ or $\nu_{post}$ will be a typical class variable. Note that the constructor is never called by the WolfAMG code (unless standard smoothers are instantiated because the user has not provided a smoother instance). Thus the interface is open, i.e. (...).

setup(level) It is very likely for smoothers to require some initialization, based on the matrix $A^\ell$ on level $\ell$. The setup method takes the level as an argument, assuming that smoothing needs level dependent information only. Additionally, the smoother may change or add information on the level, e.g. by storing an approximate inverse.

__call__(u, A, b, level) The smoother class exhibits one special method, i.e. the method that provides the smoothing. The special __call__ method is thus overloaded with the smoothing operation. (References to the system matrix and the right hand side could also be retrieved from the level class. We redundantly reference the variables for more clarity.)

Remark 7.4.2 (additional methods) A destructor method is usually needed, but in Python the implicitly given method is sufficient. Additionally, one may want to overload the print operation (__repr__() in Python) with output of information about the smoother.


Every smoothing class will most likely also include more methods specific to the smoother, especially for safely changing attributes.

Remark 7.4.3 (setup free design) Note that this design does not require the user to set up the smoothing method manually. In case the setup() method is not called prior to the __call__() method, the __call__() method itself calls the setup() method. We are interested in a strict separation of setup time and computational time; thus we have chosen this design to enable the user to manually initiate the setup prior to all smoothing. Functions that overwrite the protected issetup variable may also force a call to reissue the setup upon the next smoothing call.

Within this framework, smoothing can be implemented almost completely independently of the implementation of the cycle. Figure 7.8 shows the attributes and methods of all smoothing classes in our smoother module. Note that there is a base class smoother that implements standard methods that may be used as-is or overwritten. Only the smoothing itself, the __call__() method, must be overwritten; the base class __call__ function raises an implementation error. The base class constructor can be used by almost all subclasses, except the Gauss-Seidel method, which contains an extra argument for the direction. Such an extra argument should then be paired with a function for setting the protected value. Such functions can be extensions of the superclass smoother; code rewrites are only necessary for the additional method in the derived class. In our smoother class, everything related to the variables n and omega is used in all subclasses. The approximate inverse smoothers all share a common smoothing method, while they all differ in the setup. All the different smoothers can thus be implemented by implementing the setup function only; again, code reuse is obvious.
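To illustrate the interface described above, here is a condensed sketch of what such a smoother hierarchy might look like; the class names, the attributes n, omega and issetup, and the level object holding its matrix A are illustrative assumptions, not WolfAMG's actual API.

    class Smoother:
        def __init__(self, n=1, omega=1.0):
            self.n, self.omega = n, omega    # steps and damping, shared by subclasses
            self.issetup = False

        def setup(self, level):
            self.issetup = True              # default: nothing to precompute

        def __call__(self, u, A, b, level):
            raise NotImplementedError("subclasses must implement the smoothing")

    class JacobiSmoother(Smoother):
        def setup(self, level):
            self.dinv = 1.0 / level.A.diagonal()  # needs level information only
            self.issetup = True

        def __call__(self, u, A, b, level):
            if not self.issetup:
                self.setup(level)            # setup-free design: lazy initialization
            for _ in range(self.n):          # n damped Jacobi sweeps
                u = u + self.omega * self.dinv * (b - A @ u)
            return u

In this spirit, a Gauss-Seidel subclass would add a direction argument to its constructor, and the approximate inverse smoothers would override setup() only.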

The interpolation and Galerkin modules are implemented similarly. Consult Broker (2003) for more information about the implementation of WolfAMG.

Features of WolfAMG WolfAMG implements the classical Ruge/Stuben approach. It obviously implements different smoothers, i.e. Gauss-Seidel, Jacobi and approximate inverse smoothers. We use classical coarsening and direct interpolation only, but the methods are implemented in a modular and thus extensible fashion. The code provides the preconditioned coarse grid approximation methods described in Chapter 5. We use SuperLU for the direct solve on the coarsest grid.

We would like to mention one feature of the code: the user can specify a different set of modules for every level. While this allows for enormous flexibility, it is in practice difficult to find level dependent components. Yet, especially in the context of approximate inverses, it may be a way of saving memory by adjusting the density of the inverses according to the grid size.


Figure 7.8 Class hierarchy of the module smoother


7.4.4 WolfAMG vs. ML vs. AMG1R5

We compare WolfAMG, our implementation of algebraic multigrid, to other available implementations of AMG.

AMG1R5 The first fairly general AMG program, AMG1R5, is described in Ruge and Stuben (1987). It was mainly written by Klaus Stuben and John Ruge at the former GMD in Birlinghoven, Germany. It is a 4600-line Fortran77 implementation of the original Ruge/Stuben approach. It uses the Yale sparse matrix package as a direct solver.

ML ML is a parallel multigrid equation solver/preconditioning package, written by a team at Sandia National Labs in Livermore, California. See http://www.cs.sandia.gov/%7Etuminaro/ML_Description.html for more information about the code and further references. Several different multigrid methods are provided, including three algebraic schemes: smoothed aggregation, edge element multigrid, and classical AMG. In addition, it is possible to supply grid hierarchy information and finite element basis function information to create multilevel methods, and several smoothers are provided.

Comparing the codes We compare the three codes on the standard 5-point discretization of the Poisson problem. The underlying question really is whether our Python implementation can possibly compete with pure C or Fortran implementations. One should be aware that all the presented codes have a number of parameters to choose from, and that we might not have chosen the best set for the given problem.

We compare setup and solve times for an increasing grid size. One step of symmetric Gauss-Seidel was chosen as the smoother. We tuned the tolerance of the iteration in such a way that all three codes reduce the absolute residual norm of the solution to the same order of magnitude. Results are shown in Table 7.1 and Figure 7.9. The numbers show that there is practically no significant difference between the performance of these codes. From practical experience with AMG it becomes obvious that the solve time for a given problem depends less on the programming language or implementation used than on the proper use of AMG, i.e. the correct choice of components and parameters.


Table 7.1 WolfAMG vs. ML vs. AMG1R5: setup time t_sup and solve time t_sol (in seconds) for solving the Laplace problem. We compare different implementations of algebraic multigrid. [➠exp 015a.py]

                 128x128        256x256        512x512        1024x1024
                 n=16384        n=65536        n=262144       n=1048576
                 t_sup  t_sol   t_sup  t_sol   t_sup  t_sol   t_sup  t_sol
    ML            0.2    0.3     1.1    1.3     4.2    5.3    17.8   20.2
    AMG1R5        0.2    0.2     1.0    1.3     4.0    5.2    16.8   23.2
    WolfAMG       0.3    0.2     1.2    0.9     5.2    3.4    22.5   15.5

Figure 7.9 WolfAMG vs. ML vs. AMG1R5: we plot the grid size n vs. the setup time t_sup and the solve time t_sol for solving the Poisson problem on a regular grid, comparing different implementations of algebraic multigrid. [➠exp 015a.py] [Figure: (a) n vs. t_sup; (b) n vs. t_sol.]


Chapter 8

Conclusions

8.1 Is SPAI useful with Multigrid?
8.2 Contributions of this Thesis
8.3 Future Work

8.1 Is SPAI useful with Multigrid?

The answer to this rhetorical question is clearly "yes". We have shown in this thesis that approximate inverses based on Frobenius norm minimization can be useful in conjunction with geometric and algebraic multigrid. We have introduced SPAI inverses as smoothers, for determining strong connections, and in the Galerkin coarse grid approximation. The advantages are as follows:

Improved Robustness The three major algorithmic improvements presented in this thesis have almost solely improved the convergence rate of the multigrid iteration; the cost in terms of computational time and memory might increase, but compared to classical AMG the convergence rates were mostly improved. Compared to the simple Gauss-Seidel iteration, the slight increase in robustness must be paid for with a substantial increase in memory requirements. Yet, in some experiments better execution times of the solution phase were observed, due to the improved convergence behavior of AMG when approximate inverses were used.

More Parallelism One of the major advantages of Frobenius norm minimization is that the procedure of computing the inverse is naturally parallel. Thus methods for enhancing multigrid convergence that make use of SPAI inverses generally inherit the given parallelism.

Ordering Independence Along with the natural parallelism comes inherent ordering independence. The new methods presented exhibit no ordering dependence. Thus the ordering of the unknowns is immaterial, and the methods are guaranteed to converge independently of that order. This property is additionally useful in the context


of parallel computing. In a parallel environment, ordering independence generally means that the convergence is independent of the number of processors. Thus ordering independence is not a mere practical improvement, but another important feature for designing large scale parallel algorithms.

8.2 Contributions of this Thesis

This thesis treats the combination of algebraic multigrid and approximate inverses on various research levels:

Algorithms We have made use of approximate inverses in all multigrid components. Various sparsity patterns were proposed, with special emphasis on reducing the memory cost for storing the approximate inverses.

Theory The class of matrices for which the SPAI-0 approximate inverse yields a smoother fulfilling the smoothing property was extended, compared to the results found in the literature for the damped Jacobi smoother.

Experiments We have conducted a number of experiments to show the generality, robustness and $h$-independence of the approach for standard scalar test problems. The computations show that the approach can be realized efficiently.

Implementation Our implementation of AMG uses recent programming tools and software design. The resulting code is efficient, portable, extensible and versatile. The code is published as open source software and is thus available to the whole scientific computing community. We have shown the parallel efficiency of the approach with a parallel implementation of SPAI-1 with geometric multigrid.

Extensibility The general approach allows further extensions in multiple respects: generalization to systems of PDEs, adding frameworks to AMG, comparison with polynomial smoothers, exploiting more dynamic adaptivity, redesign of a parallel coarsening algorithm, and extending WolfAMG. We discuss this future work in the next section.

8.3 Future Work

Systems of PDEs Most applications in science stemming from partial differential equations are really systems of PDEs. One can simply treat the resulting system of linear equations as a scalar system and apply AMG to it. However, even for a simple vector


Laplace equation, the resulting matrix is not an M-matrix, and thus convergence cannot be guaranteed.

The point block approach is a generalization of classical AMG to systems of PDEs. The idea was first proposed by Brandt (1984) and Ruge and Stuben (1987). These approaches were recently investigated numerically for a variety of problems by Fullenbach et al. (2000), Fullenbach and Stuben (2002), Fullenbach and Stuben (2003) and by Oeltz (2002).

The point-block approach groups all unknowns corresponding to a single point and forms small subsystems for each point. The block matrix $A$ is then condensed to a scalar matrix $\bar{A}$ by using a fixed matrix norm of each block as the corresponding scalar entry.

Let each node be associated with a number of unknowns, e.g. $u_i = (u_i^{(1)}, \dots, u_i^{(m)})^T$. Then the resulting matrix can be written in block form as

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots \\ A_{21} & A_{22} & \cdots \\ \vdots & & \ddots \end{pmatrix},$$

where each block $A_{ij}$ is an $m \times m$ matrix. A reduced system $\bar{A}$ is then given by

$$\bar{A} = \begin{pmatrix} \|A_{11}\| & \|A_{12}\| & \cdots \\ \|A_{21}\| & \|A_{22}\| & \cdots \\ \vdots & & \ddots \end{pmatrix}.$$

The Ruge/Stuben AMG algorithm is then applied to the scalar matrix $\bar{A}$, with smoothing and interpolation adapted to the blocks.
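As a small illustration of this condensation step, here is a dense Python sketch under the assumption of a Frobenius block norm (any fixed norm would do):

    import numpy as np

    def condense(A, m):
        # A: (N*m) x (N*m) array of m-by-m blocks; returns the N x N
        # matrix of block norms that classical AMG is then applied to
        N = A.shape[0] // m
        Abar = np.empty((N, N))
        for i in range(N):
            for j in range(N):
                block = A[i*m:(i+1)*m, j*m:(j+1)*m]
                Abar[i, j] = np.linalg.norm(block, 'fro')  # a fixed block norm
        return Abar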

The point block approach in AMG is very similar to the idea of a block version of SPAI proposed by Barnard and Grote (1999). The results in this thesis suggest that block versions of SPAI could be efficient smoothers in the point-block AMG approach.

Polynomial Smoothers Recently, polynomial iterative solvers have been used as smoothers for algebraic multigrid by Adams, Brezina, Hu, and Tuminaro (2002). Polynomial smoothers are also inherently parallel. While SPAI smoothers require memory for storing an approximate inverse, polynomial smoothers need to store coefficients only. Yet, for efficient polynomial smoothing a good approximation of the largest eigenvalue of the corresponding system matrix is needed. The cost of applying a polynomial smoother depends mostly on the degree of the polynomial. It would be interesting to compare polynomial smoothers with SPAI smoothers. Polynomial smoothers are certainly limited in the non-symmetric case.


Algorithmic Framework for AMG As we have seen from the many computational experiments carried out, the number of choices of components and parameters for compiling a quickly converging algebraic multigrid scheme is quite large, and the total number of combinations quickly becomes enormous. An exhaustive search for optimization is intractable, but even tuning a few parameters is a time-consuming process. There are two main ideas to improve on that problem:

Manual testing routines: There are recurring steps when testing a new problem with AMG. Starting with a small test problem and a parameter setting that is most likely to yield a converging iteration, we manually change the parameters towards an efficient multilevel scheme. Automating this process saves time and enables even an inexperienced user to make such an investigation.

Genetic Search Algorithms: Oosterlee and Wienands (2002) apply a genetic search algorithm to a multigrid program. The results are surprising, as they find improved parameter sets for well studied standard problems.

Towards what quantity should the choice of parameters and components be tuned? One may want to optimize convergence, solution time, solution time including the setup time, overall memory consumption, or a combination of these. Zitzler, Deb, and Thiele (2000) compare multi-objective evolutionary algorithms for several different problems. The PISA programming package (Platform and Programming Language Independent Interface for Search Algorithms) by Stefan Bleuler et al.1 at the Computer Engineering Laboratory of ETH Zurich is a framework for the implementation of genetic optimization processes. Combining PISA with WolfAMG could give more insight into finding good AMG components.

1see also http://www.tik.ee.ethz.ch/pisa/pisa_report.pdf and the references therein

Automatic searching for parameters would

- support users in finding very good parameters for problems that are computed often, and

- help us to endorse new research directions for investigating the AMG algorithm itself.

Exploit more Dynamic Local Adaptivity Unfortunately, classical AMG may fail for problems where one would expect good convergence behavior. In many cases an expert can adjust the AMG components to the given problem, whereas inexperienced users may



not have enough knowledge about multigrid to do so. Users are mostly interested in black-box software. In fact, most of them would probably like to sacrifice some optimality for more robustness.

Approximate inverses provide more potential for dynamic local adaptivity than other methods. Their performance can in many cases be improved by augmenting the sparsity pattern. Obviously, this fact has to be exploited with great care, because an extended sparsity pattern also means additional required storage. The general idea is to watch the convergence behavior after a few iterative steps and then refine the solution process where necessary. This approach is one of the very promising main ideas for providing more reliable AMG methods. One of the major questions arising in this context is: "what criterion should be taken as a measure for deciding where the sparsity pattern should be augmented?" In the context of smoothers, a natural choice derived from Equation (2.11) may be to use the magnitude of the components of $I - MA$.

Parallel Coarsening In Chapter 6 we have shown that a parallel coarsening scheme that produces coarse grids independent of the number of processors may be feasible. The algorithm produces coarse grids of lesser quality when compared to the coarse grids computed by the traditional Ruge/Stuben coarsening. The result is an unsatisfactory performance of the algorithm for low processor numbers on large grids. Additionally, the algorithm introduces too many sensitive parameters, which are mainly due to the iterative schemes introduced into the coarsening. We now think that an approach that uses less sophisticated schemes, like counting the number of strong connections for each fine node, is easier to gauge and might be a more successful criterion.

Extending WolfAMG There are several standard features, well known from the literature, lacking in WolfAMG.

Cycle Types The well known W-cycle and F-cycle should be added. The cycling strategy should not be added by adding functions to the AMG class, but rather by writing cycle control classes that decide within a general framework whether the multigrid cycle goes up in the level hierarchy or down. This enables the user to easily adapt the cycling strategy, even with more advanced adaptive cycling techniques like the one proposed by Rude (1993).

Interpolation WolfAMG lacks the indirect interpolation introduced early by Ruge and Stuben (1987). Other interpolation approaches, proposed by Wagner (2000), have been shown to be effective for convection-diffusion-reaction equations. Meurant (2001) also proposes new interpolation schemes based on approximate inverses.


Operator truncation Operator truncation helps to lower the memory requirements for very large problems. Truncation can be based solely on comparing entries in a single row or on comparing all entries in a matrix, and could possibly be dependent on the application as well. One can truncate the coarse grid operators as well as the interpolation operators. Stuben (2001) mentions that operator truncation must be used with great care and may cause strong divergence of the AMG iteration.

Documentation After all, good documentation is a strong requirement for a good software package. WolfAMG currently comes with no documentation besides comments within the code.

Optimization The code can easily be optimized in a number of places.

1. The Ruge/Stuben coarsening should be implemented more carefully with a dynamic list $L$ (see Section 7.1). It is currently implemented using a list of static length.

2. The matrix-matrix multiplication is currently realized in the LL matrix format. We have shown fast algorithms for matrix-matrix multiplication in the CSR format in Section 7.2.3 that should be added to the PySparse package.


Bibliography

Adams, M., M. Brezina, J. Hu, and R. Tuminaro (2002). Parallel multigrid smoothing: polynomial versus Gauss-Seidel. Submitted.

Alcouffe, R. E., A. Brandt, J. E. Dendy, and J. W. Painter (1981). The multi-grid method for the diffusion equation with strongly discontinuous coefficients. SIAM J. Sci. Stat. Comput. 2, 430–454.

Anderson, E., Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen (1994). LAPACK Users' Guide - Release 2.0. Philadelphia, PA. (Software and guide are available from Netlib at URL http://www.netlib.org/lapack/).

Ascher, D., P. F. Dubois, K. Hinsen, J. Hugunin, and T. Oliphant (2001). Numerical Python. Lawrence Livermore National Laboratory.

Bader, M., M. Schimper, and C. Zenger (1999). Hierarchical bases for the indefinite Helmholtz equation. Technical Report TUM-19905, Technische Universitat Munchen, Institut fur Informatik.

Bakhvalov, N. S. (1966). On the convergence of a relaxation method under natural constraints on an elliptic operator. Z. Vycisl. Mat. i. Mat. Fiz. 6, 861–883.

Barnard, S. and M. J. Grote (1999). A block version of the SPAI preconditioner. In Proc. of the 9th SIAM Conference on Parallel Processing for Scientific Computing, held in San Antonio, TX, March 1999.

Barrett, R., M. Berry, T. F. Chan, J. Demmel, J. Donato, J. J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst (1994). Templates for the solution of linear systems: building blocks for iterative methods. Philadelphia: SIAM Books.

Beazley, D. and P. Lomdahl (1997). Feeding a large scale physics application to Python. In Proceedings of the 6th International Python Conference, San Jose, California, October 14-17, 1997.

Benson, M. W. and P. O. Frederickson (1982). Iterative solution of large sparse linear systems arising in certain multidimensional approximation problems. Utilitas Mathematica 22, 127–140.


Benzi, M., C. D. Meyer, and M. Tuma (1996, September). A sparse approximate inverse preconditioner for the conjugate gradient method. SIAM Journal on Scientific Computing 17(5), 1135–1149.

Benzi, M. and M. Tuma (1999, June). A comparative study of sparse approximate inverse preconditioners. Applied Numerical Mathematics: Transactions of IMACS 30(2–3), 305–340.

Booch, G. and J. Rumbaugh (1997). Unified Method for Object-Oriented Development Version 1.0. Rational Software Corporation.

Braess, D. (1997). Finite Elements. Theory, Fast Solvers and Applications in Solid Mechanics. Cambridge: Cambridge University Press.

Brandt, A. (1973). Multi-level adaptive technique (MLAT) for fast numerical solution to boundary value problems. In H. Cabannes and R. Teman (Eds.), Proceedings of the Third International Conference on Numerical Methods in Fluid Mechanics, Volume 18 of Lecture Notes in Physics, Berlin, pp. 82–89. Springer-Verlag.

Brandt, A. (1984). Multigrid techniques: 1984 guide with applications to fluid dynamics. GMD-Studien Nr. 85. St. Augustin: Gesellschaft fur Mathematik und Datenverarbeitung.

Brandt, A. (2000). General highly accurate algebraic coarsening. Elect. Trans. Numer. Anal. 10, 1–20.

Brandt, A. and I. Livshits (1997). Wave-Ray multigrid method for standing wave equations. Elect. Trans. Numer. Anal. 6, 162–181.

Briggs, W. L., V. E. Henson, and S. F. McCormick (2000). A Multigrid Tutorial. Philadelphia: SIAM Books. Second edition.

Broker, O. (2003). WolfAMG - Reference & Developer Manual. ETH Zurich. http://www.inf.ethz.ch/~broeker/wolfamg.

Broker, O., M. J. Grote, C. Mayer, and A. Reusken (2000, November). Robust parallel smoothing for multigrid via sparse approximate inverses. Report 2000-13, ETH Zurich, Seminar fur Angewandte Mathematik. Submitted to SIAM J. Sci. Comput.

Ceruzzi, P. E. (1998). A History of Modern Computing. Cambridge, Mass.: MIT Press.

Chow, E. (2001). An unstructured multigrid method based on geometric smoothness. Technical Report UCRL-JC-145075, Lawrence Livermore National Laboratory. Submitted to Num. Lin. Alg. Appl.


Chow, E. and Y. Saad (1998, May). Approximate inverse preconditioners via sparse-sparse iterations. SIAM Journal on Scientific Computing 19(3), 995–1023.

de Zeeuw, P. M. (1990). Matrix–dependent prolongations and restrictions in a blackbox multigrid solver. J. Comput. Appl. Math. 33, 1–27.

Demko, S., W. F. Moss, and P. W. Smith (1984). Decay rates for inverses of band matrices. Math. Comp. 43, 491–499.

Demmel, J. W., J. Gilbert, and X. S. Li (1997). SuperLU users' guide. Technical Report CSD-97-944, University of California, Berkeley.

Douglas, C. C. and M. B. Douglas (1991-2002). MGNet Bibliography. Department of Computer Science and the Center for Computational Sciences, University of Kentucky, Lexington, KY, USA and Department of Computer Science, Yale University, New Haven, CT, USA; see http://www.mgnet.org/mgnet-bib.html.

Elman, H. C., O. G. Ernst, and D. P. O'Leary (2002). A multigrid method enhanced by Krylov subspace iteration for discrete Helmholtz equations. SIAM Journal on Scientific Computing 23(4), 1291–1315.

Fedorenko, R. P. (1961). A relaxation method for solving elliptic difference equations. Z. Vycisl. Mat. i Mat. Fiz. 1, 922–927. Also in U.S.S.R. Comput. Math. and Math. Phys. 1 (1962), pp. 1092–1096.

Frederickson, P. O. (1975). Fast approximate inversion of large sparse linear systems. Technical Report 7-75, Lakehead University.

Füllenbach, T. and K. Stüben (2002). Algebraic multigrid for selected PDE systems. In Proceedings of the Fourth European Conference on Elliptic and Parabolic Problems, Rolduc (The Netherlands) and Gaeta (Italy), 2001, pp. 399–410. World Scientific, New Jersey, London.

Füllenbach, T. and K. Stüben (2003). Algebraic multigrid for industrial semiconductor device simulation. In Proceedings of the First International Conference on Challenges in Scientific Computing, Berlin, Germany, Oct 2–5, 2002. Lecture Notes in Computational Science and Engineering. Heidelberg: Springer. (To appear.)

Füllenbach, T., K. Stüben, and S. Mijalkovic (2000). Application of an algebraic multigrid solver to process simulation problems. In Proceedings of the International Conference on Simulation of Semiconductor Processes and Devices, Seattle (WA), USA, Sep 6–8, 2000, pp. 225–228. IEEE, Piscataway (NJ), USA.

Gamma, E., R. Helm, R. Johnson, and J. Vlissides (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Addison–Wesley.

Geus, R. (2002). The Jacobi-Davidson algorithm for solving large sparse symmetric eigenvalue problems. PhD Thesis No. 14734, ETH Zurich.

Goldberg, M. and T. Spencer (1989, August). Constructing a maximal independent set in parallel. SIAM Journal on Discrete Mathematics 2(3), 322–328.

Grauschopf, T., M. Griebel, and H. Regler (1997). Additive multilevel preconditioners based on bilinear interpolation, matrix dependent geometric coarsening and algebraic multigrid coarsening for second order elliptic PDEs. Appl. Numer. Math. 23, 63–96.

Greenbaum, A. (1997). Iterative methods for solving linear systems. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM).

Griebel, M. (1992). Grid– and point–oriented multilevel algorithms. In Incomplete Decomposition (ILU): Theory, Technique and Application, Proceedings of the Eighth GAMM–Seminar, Kiel, 1992, Volume 41 of Notes on Numerical Fluid Mechanics, pp. 32–46. Braunschweig: Vieweg.

Gropp, W. D., E. Lusk, and A. Skjellum (1994). Using MPI: Portable Parallel Programming with the Message-Passing Interface. Scientific and Engineering Computation. Cambridge, MA: MIT Press.

Grote, M. J. and T. Huckle (1997, May). Parallel preconditioning with sparse approximate inverses. SIAM Journal on Scientific Computing 18(3), 838–853.

Gutknecht, M. H. (1990). The pioneer days of scientific computing in Switzerland. In S. Nash (Ed.), A History of Scientific Computing, pp. 301–313. NY: ACM Press, Addison-Wesley.

Hackbusch, W. (1985). Multigrid Methods and Applications, Volume 4 of Computational Mathematics. Berlin: Springer–Verlag.

Hackbusch, W. (1993). Iterative Solution of Large Sparse Systems of Equations. Berlin: Springer–Verlag.

Hackbusch, W. and G. Wittum (1993). Incomplete Decompositions (ILU) – Algorithms, Theory, and Applications, Volume 41 of Notes on Numerical Fluid Mechanics. Braunschweig: Vieweg.

Henson, V. E. and U. M. Yang (2002). BoomerAMG: A parallel algebraic multigrid solver and preconditioner. Applied Numer. Math. 41(1), 155–177.

Hestenes, M. R. and E. Stiefel (1952). Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand. 49, 409–436.

Hinsen, K. (1997). The molecular modeling toolkit: a case study of a large scientific application in Python. In Proceedings of the 6th International Python Conference, San Jose, California, October 14–17, 1997.

Huang, W. Z. (1991). Convergence of algebraic multigrid methods for symmetric positive definite matrices with weak diagonal dominance. Applied Mathematics and Computation 46(2, part II), 145–164.

Huckle, T. and J. Staudacher (2002). Multigrid preconditioning and Toeplitz matrices. ETNA 13, 81–105.

John, V. and G. Matthies (2002). MooNMD – a program package based on mapped finite element methods. Technical Report 01/2002, Fakultät für Mathematik, Otto-von-Guericke-Universität Magdeburg.

Johnson, D. S., C. R. Aragon, L. A. McGeoch, and C. Schevon (1991). Optimization by simulated annealing: An experimental evaluation; part II, graph coloring and number partitioning. Operations Research 39(3), 378–406.

Joppich, W. (1996). Grundlagen der Mehrgittermethode. Course notes.

Kolotilina, L. Y. and A. Y. Yeremin (1993a, January). Factorized sparse approximate inverse preconditionings. I. Theory. SIAM Journal on Matrix Analysis and Applications 14(1), 45–58.

Krechel, A. and K. Stüben (1999, December). Parallel algebraic multigrid based on subdomain blocking. GMD Report 71, GMD.

Lanczos, C. (1952). Solution of systems of linear equations by minimized iterations. J. Res. Natl. Bur. Stand. 49, 33–53.

Luby, M. (1985). A simple parallel algorithm for the maximal independent set problem. In ACM (Ed.), Proceedings of the seventeenth annual ACM Symposium on Theory of Computing, Providence, Rhode Island, May 6–8, 1985, New York, NY, USA, pp. 1–10. ACM Press.

Matthies, G. and L. Tobiska (2001). The streamline-diffusion method for conforming and nonconforming finite elements of lowest order applied to convection-diffusion problems. Computing 66, 343–364.

McBryan, O. A., P. O. Frederickson, J. Linden, A. Schüller, K. Solchenbach, K. Stüben, C. A. Thole, and U. Trottenberg (1991). Multigrid methods on parallel computers – a survey of recent developments. Impact Comput. Sci. Eng. 3, 1–75.

Meijerink, J. A. and H. A. van der Vorst (1977). An iterative solution method for linear systems of which the coefficient matrix is a symmetric M–matrix. Math. Comp. 31, 148–162.

Message Passing Interface Forum (1994, May). MPI: A message-passing interface standard, version 1.0.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953, June). Equation of state calculations by fast computing machines. Journal of Chemical Physics 21(6), 1087–1092.

Meurant, G. (2001). Numerical experiments with algebraic multilevel preconditioners. Electronic Transactions on Numerical Analysis 12, 1–65.

Nash, S. (Ed.) (1990). A History of Scientific Computing. ACM Press, History Series.

Oeltz, D. (2002). An algebraic multigrid method for linear elasticity. In Proceedings of the 2002 Copper Mountain Conference on Iterative Methods.

Oosterlee, C. and R. Wienands (2002). A genetic search for optimal multigrid components within a Fourier analysis setting. Submitted.

Ortega, J. M. (1988). Matrix Theory: A Second Course. New York, NY, USA: Plenum Press.

Peaceman, D. W. and H. H. Rachford, Jr. (1955, March). The numerical solution of parabolic and elliptic differential equations. Journal of the Society for Industrial and Applied Mathematics 3(1), 28–41.

Prechelt, L. (2000). An empirical comparison of seven programming languages. Computer 33(10), 23–29.

Richardson, L. F. (1910). The approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to the stresses in a masonry dam. Phil. Trans. R. Soc. London A 210, 307–357.

Rüde, U. (1993). Fully adaptive multigrid methods. SIAM J. Numer. Anal. 30, 230–248.

Ruge, J. W. and K. Stüben (1987). Algebraic multigrid (AMG). In S. F. McCormick (Ed.), Multigrid Methods, Volume 3 of Frontiers in Applied Mathematics, pp. 73–130. Philadelphia, PA: SIAM.

Saad, Y. (1996). Iterative methods for sparse linear systems. New York: PWS Publishing.

Saad, Y. and M. H. Schultz (1986). GMRES: A generalized minimum residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 7, 856–869.

Saad, Y. and H. A. van der Vorst (2000). Iterative solution of linear systems in the 20th century. Journal of Computational and Applied Mathematics 123, 1–33.

Saad, Y. and J. Zhang (1999). Diagonal threshold techniques in robust multi-level ILU preconditioners for general sparse linear systems. Numer. Lin. Alg. Appl. 6, 257–280.

Schwarz, H. R. (1981, April/June). The early years of computing in Switzerland. Annals of the History of Computing 3(2), 121–132.

Schwarz, H. R. (1991). Methode der finiten Elemente (3rd revised ed.). Stuttgart: Teubner.

Stoer, J. (1979). Einführung in die Numerische Mathematik I (7th revised ed.), Volume 15 of Heidelberger Taschenbücher. Berlin: Springer-Verlag.

Stoer, J. and R. Bulirsch (1980). Introduction to Numerical Analysis. New York: Springer-Verlag.

Stoer, J. and R. Bulirsch (1990). Einführung in die Numerische Mathematik II (3rd revised ed.). Berlin: Springer.

Stüben, K. (2001). An Introduction to Algebraic Multigrid. Academic Press. Guest contribution in Multigrid by Trottenberg et al. (2001), pp. 413–532.

Tang, W.-P. and W. L. Wan (2000, October). Sparse approximate inverse smoother for multigrid. SIAM Journal on Matrix Analysis and Applications 21(4), 1236–1252.

Tarantino, Q. (1994). Pulp Fiction: A Quentin Tarantino Screenplay. Miramax.

Trottenberg, U. and C. Oosterlee (1996, October). Parallel adaptive multigrid – an elementary introduction. Arbeitspapiere der GMD 1026, GMD, St. Augustin.

Trottenberg, U., C. Oosterlee, and A. Schüller (2001). Multigrid. Academic Press.

Tscherrig, T. (2002). Implementing the Galerkin product in algebraic multigrid. Diploma thesis (in English), ETH Zurich.

van der Vorst, H. A. (1992, March). Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing 13(2), 631–644.

van Rossum, G. and F. L. Drake, Jr. (2002a). Python Library Reference. Python Labs. Release 2.2.2, http://www.python.org.

van Rossum, G. and F. L. Drake, Jr. (2002b). Python Tutorial. Python Labs. Release 2.2.2, http://www.python.org.

van Rossum, G. and F. L. Drake, Jr. (2002c). Python/C API Reference Manual. Python Labs. Release 2.2.2, http://www.python.org.

Varga, R. S. (1962). Matrix Iterative Analysis. Englewood Cliffs, NJ: Prentice–Hall.

Vukelja, L. (2002). Modularizing ANSI-C codes. Diploma thesis (in English), ETH Zurich.

Wagner, C. (1998). Introduction to algebraic multigrid. Course Notes.

Wagner, C. (2000). On the algebraic construction of multilevel transfer operators for convection–diffusion–reaction equations. In Multigrid Methods VI, Volume 14 of Lecture Notes in Computational Science and Engineering, Berlin, pp. 264–270. Springer–Verlag.

Wan, W. L., T. F. Chan, and B. Smith (2000). An energy-minimizing interpolation for robust multigrid methods. SIAM Journal on Scientific Computing 21(4), 1632–1649.

Wang, K. and J. Zhang (2002). Multistep sparse approximate inverse preconditioning strategies. In T. A. Manteuffel and S. F. McCormick (Eds.), Seventh Copper Mountain Conference on Iterative Methods. University of Colorado.

Wesseling, P. (1982). A robust and efficient multigrid method. In W. Hackbusch and U. Trottenberg (Eds.), Multigrid Methods, Volume 960 of Lecture Notes in Mathematics, pp. 614–630. Berlin: Springer-Verlag.

Wesseling, P. (1992). An Introduction to Multigrid Methods. Chichester: John Wiley & Sons. Reprinted by www.MGNet.org.

Wesseling, P. (1993). The role of incomplete LU-factorization in multigrid methods. In Incomplete Decompositions (ILU) – Algorithms, Theory, and Applications, Volume 41 of Notes on Numerical Fluid Mechanics, pp. 202–214. Braunschweig: Vieweg.

Yavneh, I. and E. Olvovsky (1998). Multigrid smoothing for symmetric nine-point stencils. Applied Mathematics and Computation 92(2–3), 229–246.

Young, D. M. (1950). Iterative Methods for Solving Partial Difference Equations of Elliptic Type. Ph. D. thesis, Department of Mathematics, Harvard University.

Zitzler, E., K. Deb, and L. Thiele (2000, April). Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation 8(2), 173–195.

Curriculum Vitae

Person

Name Oliver Bröker
Date of birth May 8, 1971
Place of birth Hannover, Germany
Citizenship German

Education

1998—2003 Ph.D. student at ETH Zurich at the Institute of Scientific Computing in the Numerical Linear Algebra group of Prof. Walter Gander

1991—1998 University of Bonn — Diplom in Computer Science
1981—1990 Clara-Schumann-Gymnasium Bonn, Germany — Abitur
1987/88 Millard-North-High-School, Omaha (NE), USA — High-School-Diploma

Teaching

1998—2002 Assistantship in undergraduate teaching in Numeric and Symbolic Computing and Scientific Computing at the Institute of Scientific Computing at ETH Zurich; total of 7 semesters

2000—2002 Responsible for directing the legislative board concerning all teaching at the Department of Computer Science at ETH Zurich

1999—2001 Active work in the program for promotion of women in computer science at ETH Zurich

2000 Computer Science course at the Kantonsschule Baden (high school)
1996 Assistantship in undergraduate teaching in Computer Graphics at Colorado University in Boulder

Interests

- Scientific computing, parallel computing, programming
- Numerical analysis, especially multigrid algorithms
- Endurance sports and electronic music