

An Approximate Minimum Degree Ordering Algorithm

Patrick R. Amestoy∗ Timothy A. Davis† Iain S. Duff‡

SIAM J. Matrix Analysis & Applic., Vol. 17, No. 4, pp. 886-905, Dec. 1996

Abstract

An Approximate Minimum Degree ordering algorithm (AMD) for preordering a symmetric sparse matrix prior to numerical factorization is presented. We use techniques based on the quotient graph for matrix factorization that allow us to obtain computationally cheap bounds for the minimum degree. We show that these bounds are often equal to the actual degree. The resulting algorithm is typically much faster than previous minimum degree ordering algorithms, and produces results that are comparable in quality with the best orderings from other minimum degree algorithms.

∗ENSEEIHT-IRIT, Toulouse, France. email: [email protected].

†Computer and Information Sciences Department, University of Florida, Gainesville, Florida, USA. phone: (904) 392-1481, email: [email protected]. Technical reports and matrices are available via the World Wide Web at http://www.cis.ufl.edu/~davis, or by anonymous ftp at ftp.cis.ufl.edu:cis/tech-reports. Support for this project was provided by the National Science Foundation (ASC-9111263 and DMS-9223088). Portions of this work were supported by a post-doctoral grant from CERFACS.

‡Rutherford Appleton Laboratory, Chilton, Didcot, Oxon. OX11 0QX England, and European Center for Research and Advanced Training in Scientific Computation (CERFACS), Toulouse, France. email: [email protected]. Technical reports, information on the Harwell Subroutine Library, and matrices are available via the World Wide Web at http://www.cis.rl.ac.uk/struct/ARCD/NUM.html, or by anonymous ftp at seamus.cc.rl.ac.uk/pub.


1 Introduction

When solving large sparse symmetric linear systems of the form Ax = b, it is common to precede the numerical factorization by a symmetric reordering. This reordering is chosen so that pivoting down the diagonal in order on the resulting permuted matrix PAP^T = LL^T produces much less fill-in and work than computing the factors of A by pivoting down the diagonal in the original order. This reordering is computed using only information on the matrix structure without taking account of numerical values, and so may not be stable for general matrices. However, if the matrix A is positive-definite [21], a Cholesky factorization can safely be used. This technique of preceding the numerical factorization with a symbolic analysis can also be extended to unsymmetric systems, although the numerical factorization phase must allow for subsequent numerical pivoting [1, 2, 16]. The goal of the preordering is to find a permutation matrix P so that the subsequent factorization has the least fill-in. Unfortunately, this problem is NP-complete [31], so heuristics are used.

The minimum degree ordering algorithm is one of the most widely used heuristics, since it produces factors with relatively low fill-in on a wide range of matrices. Because of this, the algorithm has received much attention over the past three decades. The algorithm is a symmetric analogue of Markowitz's method [26] and was first proposed by Tinney and Walker [30] as algorithm S2. Rose [27, 28] developed a graph-theoretical model of Tinney and Walker's algorithm and renamed it the minimum degree algorithm, since it performs its pivot selection by choosing from a graph a node of minimum degree. Later implementations have dramatically improved the time and memory requirements of Tinney and Walker's method, while maintaining the basic idea of selecting a node or set of nodes of minimum degree. These improvements have reduced the memory complexity so that the algorithm can operate within the storage of the original matrix, and have reduced the amount of work needed to keep track of the degrees of nodes in the graph (which is the most computationally intensive part of the algorithm). This work includes that of Duff and Reid [10, 13, 14, 15]; George and McIntyre [23]; Eisenstat, Gursky, Schultz, and Sherman [17, 18]; George and Liu [19, 20, 21, 22]; and Liu [25]. More recently, several researchers have relaxed this heuristic by computing upper bounds on the degrees, rather than the exact degrees, and selecting a node of minimum upper bound on the degree. This work includes that of Gilbert, Moler, and Schreiber [24], and Davis and Duff [8, 7]. Davis and Duff use degree bounds in the unsymmetric-pattern multifrontal method (UMFPACK), an unsymmetric Markowitz-style algorithm. In this paper, we describe an approximate minimum degree ordering algorithm based on the symmetric analogue of the degree bounds used in UMFPACK.

Section 2 presents the original minimum degree algorithm of Tinney and Walker in the context of the graph model of Rose. Section 3 discusses the quotient graph (or element graph) model and the use of that model to reduce the time taken by the algorithm. In this context, we present our notation for the quotient graph, and present a small example matrix and its graphs. We then use the notation to describe our approximate degree bounds in Section 4. The Approximate Minimum Degree (AMD) algorithm and its time complexity is presented in Section 5. In Section 6, we first analyse the performance and accuracy of our approximate degree bounds on a set of test matrices from a wide range of disciplines. The AMD algorithm is then compared with other established codes that compute minimum degree orderings.

Throughout this paper, we will use the superscript k to denote a graph, set, or other structure obtained after the first k pivots have been chosen and eliminated. For simplicity, we will drop the superscript when the context is clear.

2 Elimination graphs

The nonzero pattern of a symmetric n-by-n matrix, A, can be represented by a graph G^0 = (V^0, E^0), with nodes V^0 = {1, ..., n} and edges E^0. An edge (i, j) is in E^0 if and only if a_ij ≠ 0 and i ≠ j. Since A is symmetric, G^0 is undirected.

The elimination graph, G^k = (V^k, E^k), describes the nonzero pattern of the submatrix still to be factorized after the first k pivots have been chosen and eliminated. It is undirected, since the matrix remains symmetric as it is factorized. At step k, the graph G^k depends on G^{k-1} and the selection of the kth pivot. To find G^k, the kth pivot node p is selected from V^{k-1}. Edges are added to E^{k-1} to make the nodes adjacent to p in G^{k-1} a clique (a fully connected subgraph). This addition of edges (fill-in) means that we cannot know the storage requirements in advance. The edges added correspond to fill-in caused by the kth step of factorization. A fill-in is a nonzero entry L_ij where (PAP^T)_ij is zero. The pivot node p and its incident edges are then removed from the graph G^{k-1} to yield the graph G^k. Let Adj_{G^k}(i) denote the set of nodes adjacent to i in the graph G^k. When the kth pivot is eliminated, the graph G^k is given by

    V^k = V^{k-1} \ {p}

and

    E^k = ( E^{k-1} ∪ (Adj_{G^{k-1}}(p) × Adj_{G^{k-1}}(p)) ) ∩ (V^k × V^k).

The minimum degree algorithm selects node p as the kth pivot such that the degree of p, t_p ≡ |Adj_{G^{k-1}}(p)|, is minimum (where |·| denotes the size of a set or the number of nonzeros in a matrix, depending on the context). The minimum degree algorithm is a non-optimal greedy heuristic for reducing the number of new edges (fill-ins) introduced during the factorization. We have already noted that finding the optimal ordering is NP-complete [31]. By minimizing the degree, the algorithm minimizes an upper bound on the fill-in caused by the kth pivot. Selecting p as pivot creates at most (t_p^2 − t_p)/2 new edges in G.
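As an illustrative sketch (not the paper's implementation), one elimination-graph pivot step can be written directly from the definitions above, with the graph stored as a dict of adjacency sets:

```python
def eliminate(adj, p):
    """One elimination step: make the neighbors of pivot p a clique
    (adding fill-in edges), then remove p. Returns the fill-in count."""
    nbrs = adj.pop(p)
    for i in nbrs:
        adj[i].discard(p)
    fill = 0
    for i in nbrs:
        for j in nbrs:
            if i < j and j not in adj[i]:   # new edge (i, j) is a fill-in
                adj[i].add(j)
                adj[j].add(i)
                fill += 1
    return fill

# Arrow-shaped example: node 1 is adjacent to 2, 3, 4 (t_p = 3),
# so eliminating it creates at most (3*3 - 3)/2 = 3 new edges.
adj = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
print(eliminate(adj, 1))   # 3
```

This is exactly why repeated elimination on explicit graphs is expensive: the fill edges must be stored, which motivates the quotient graph of the next section.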

3 Quotient graphs

In contrast to the elimination graph, the quotient graph models the factorization of A using an amount of storage that never exceeds the storage for the original graph, G^0 [21]. The quotient graph is also referred to as the generalized element model [13, 14, 15, 29]. An important component of a quotient graph is a clique. It is a particularly economic structure, since a clique is represented by a list of its members rather than by a list of all the edges in the clique. Following the generalized element model, we refer to nodes removed from the elimination graph as elements (George and Liu refer to them as eliminated nodes). We use the term variable to refer to uneliminated nodes.

The quotient graph, 𝒢^k = (V^k, V̄^k, E^k, Ē^k), implicitly represents the elimination graph G^k, where 𝒢^0 = G^0, V^0 = V, V̄^0 = ∅, E^0 = E, and Ē^0 = ∅. For clarity, we drop the superscript k in the following. The nodes in 𝒢 consist of variables (the set V) and elements (the set V̄). The edges are divided into two sets: edges between variables, E ⊆ V × V, and edges between variables and elements, Ē ⊆ V × V̄. Edges between elements are not required, since we could generate the elimination graph from the quotient graph without them. The sets V̄^0 and Ē^0 are empty.


We use the following set notation (A, E, and L) to describe the quotient graph model and our approximate degree bounds. Let A_i be the set of variables adjacent to variable i in 𝒢, and let E_i be the set of elements adjacent to variable i in 𝒢 (we refer to E_i as element list i). That is, if i is a variable in V, then

    A_i ≡ {j : (i, j) ∈ E} ⊆ V,

    E_i ≡ {e : (i, e) ∈ Ē} ⊆ V̄,

and

    Adj_𝒢(i) ≡ A_i ∪ E_i ⊆ V ∪ V̄.

The set A_i refers to a subset of the nonzero entries in row i of the original matrix A (thus the notation A). That is, A_i^0 ≡ {j : a_ij ≠ 0}, and A_i^k ⊆ A_i^{k−1} for 1 ≤ k ≤ n. Let L_e denote the set of variables adjacent to element e in 𝒢. That is, if e is an element in V̄, then we define

    L_e ≡ Adj_𝒢(e) = {i : (i, e) ∈ Ē} ⊆ V.

The edges E and Ē in the quotient graph are represented using the sets A_i and E_i for each variable i in 𝒢, and the sets L_e for each element e in 𝒢. We will use A, E, and L to denote three sets containing all A_i, E_i, and L_e, respectively, for all variables i and all elements e. George and Liu [21] show that the quotient graph takes no more storage than the original graph (|A^k| + |E^k| + |L^k| ≤ |A^0| for all k).

The quotient graph 𝒢 and the elimination graph G are closely related. If i is a variable in 𝒢, it is also a variable in G, and

    Adj_G(i) = ( A_i ∪ ⋃_{e∈E_i} L_e ) \ {i},    (1)

where "\" is the standard set subtraction operator.

When variable p is selected as the kth pivot, element p is formed (variable p is removed from V and added to V̄). The set L_p = Adj_G(p) is found using equation (1). The set L_p represents a permuted nonzero pattern of the kth column of L (thus the notation L). If i ∈ L_p, where p is the kth pivot, and variable i will become the mth pivot (for some m > k), then the entry L_mk will be nonzero.

Equation (1) implies that L_e \ {p} ⊆ L_p for all elements e adjacent to variable p. This means that all variables adjacent to an element e ∈ E_p are adjacent to the element p, and these elements e ∈ E_p are no longer needed. They are absorbed into the new element p and deleted [15], and reference to them is replaced by reference to the new element p. The new element p is added to the element lists, E_i, of all variables i adjacent to element p. Absorbed elements, e ∈ E_p, are removed from all element lists.

The sets A_p and E_p, and L_e for all e in E_p, are deleted. Finally, any entry j in A_i, where both i and j are in L_p, is redundant and is deleted. The set A_i is thus disjoint from any set L_e for e ∈ E_i. In other words, A_i^k is the pattern of those entries in row i of A that are not modified by steps 1 through k of the Cholesky factorization of PAP^T. The net result is that the new graph 𝒢 takes the same, or less, storage than before the kth pivot was selected.
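Equation (1) translates directly into code. The sketch below (illustrative names; a dict-of-sets representation is assumed) recomputes the elimination-graph adjacency of a variable from its quotient-graph sets, using the data of variable 5 from the worked example in Section 3.1:

```python
def adj_elimination(i, A, E, L):
    """Equation (1): Adj_G(i) = (A_i  union  U_{e in E_i} L_e) \ {i}."""
    s = set(A[i])
    for e in E[i]:
        s |= L[e]
    s.discard(i)
    return s

# Variable 5 just before it is eliminated: A_5 = {}, E_5 = {2, 3}.
A = {5: set()}
E = {5: {2, 3}}
L = {2: {5, 6, 9}, 3: {5, 6, 7}}
print(sorted(adj_elimination(5, A, E, L)))   # [6, 7, 9]
```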

The following equations summarize how the sets L, V̄, and A change when pivot p is chosen and eliminated. The new element p is added, old elements are absorbed, and redundant entries are deleted:

    L^k = ( L^{k−1} \ {L_e : e ∈ E_p} ) ∪ {L_p}

    V̄^k = ( V̄^{k−1} \ E_p ) ∪ {p}

    A^k = ( A^{k−1} \ (L_p × L_p) ) ∩ (V^k × V^k).
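These update rules can be combined into one sketch of a pivot step on the quotient graph (natural absorption and redundant-entry deletion only; supervariables, degree updates, and aggressive absorption are omitted; the dict-of-sets representation is an assumption, not the paper's array-based data structure):

```python
def pivot_quotient(p, A, E, L):
    """Eliminate variable p: form element p via equation (1), absorb the
    old elements in E_p, and delete entries of A_i made redundant by L_p."""
    Lp = set(A[p])                 # equation (1)
    for e in E[p]:
        Lp |= L[e]
    Lp.discard(p)
    absorbed = set(E[p])
    for e in absorbed:             # element absorption: L_e \ {p} lies in L_p
        del L[e]
    L[p] = Lp
    for i in Lp:
        E[i] = (E[i] - absorbed) | {p}
        A[i] = (A[i] - Lp) - {p}   # redundant entries deleted
    del A[p], E[p]
    return Lp

# The transformation from G^4 to G^5 in Section 3.1 (pivot on variable 5):
A = {5: set(), 6: set(), 7: {9, 10}, 9: {7, 8, 10}}
E = {5: {2, 3}, 6: {2, 3, 4}, 7: {3, 4}, 9: {2}}
L = {2: {5, 6, 9}, 3: {5, 6, 7}, 4: {6, 7, 8}}
print(sorted(pivot_quotient(5, A, E, L)))   # [6, 7, 9]; elements 2, 3 absorbed
```

Afterwards E_6 = E_7 = {4, 5}, E_9 = {5}, A_7 = {10}, and A_9 = {8, 10}, matching the worked example in Section 3.1.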

3.1 Quotient graph example

We illustrate the sequence of elimination graphs and quotient graphs of a 10-by-10 sparse matrix in Figures 1 and 2. The example is ordered so that a minimum degree algorithm recommends pivoting down the diagonal in the natural order (that is, the permutation matrix is the identity). In Figures 1 and 2, variables and elements are shown as thin-lined and heavy-lined circles, respectively. In the matrices in these figures, diagonal entries are numbered, and unmodified original nonzero entries (entries in A) are shown as solid squares. The solid squares in row i form the set A_i. The variables in current unabsorbed elements (the sets L_e) are indicated by solid circles in the columns of L corresponding to the unabsorbed elements. The solid circles in row i form the set E_i. Entries that do not correspond to edges in the quotient graph are shown as an ×. Figure 1 shows the elimination graph, quotient graph, and the matrix prior to elimination (in the left column) and after the first three steps (from left to right). Figure 2 continues the example for the next four steps.

[Figure 1: Elimination graph, quotient graph, and matrix for the first three steps. Panels: (a) elimination graph, (b) quotient graph, (c) factors and active submatrix.]

[Figure 2: Elimination graph, quotient graph, and matrix for steps 4 to 7. Panels: (a) elimination graph, (b) quotient graph, (c) factors and active submatrix.]

Consider the transformation of the graph 𝒢^2 to the graph 𝒢^3. Variable 3 is selected as pivot. We have L_3 = A_3 = {5, 6, 7} (a simple case of equation (1)). The new element 3 represents the pairwise adjacency of variables 5, 6, and 7. The explicit edge (5, 7) is now redundant, and is deleted from A_5 and A_7.

Also consider the transformation of the graph 𝒢^4 to the graph 𝒢^5. Variable 5 is selected as pivot. The set A_5 is empty and E_5 = {2, 3}. Following equation (1),

    L_5 = (A_5 ∪ L_2 ∪ L_3) \ {5}
        = (∅ ∪ {5, 6, 9} ∪ {5, 6, 7}) \ {5}
        = {6, 7, 9},

which is the pattern of column 5 of L (excluding the diagonal). Since the new element 5 implies that variables 6, 7, and 9 are pairwise adjacent, elements 2 and 3 do not add any information to the graph. They are removed, having been "absorbed" into element 5. Additionally, the edge (7, 9) is redundant, and is removed from A_7 and A_9. In 𝒢^4, we have

    A_6 = ∅             E_6 = {2, 3, 4}
    A_7 = {9, 10}       E_7 = {3, 4}
    A_9 = {7, 8, 10}    E_9 = {2}.

After these transformations, we have in 𝒢^5,

    A_6 = ∅          E_6 = {4, 5}
    A_7 = {10}       E_7 = {4, 5}
    A_9 = {8, 10}    E_9 = {5},

and the new element in 𝒢^5,

    L_5 = {6, 7, 9}.

3.2 Indistinguishable variables and external degree

Two variables i and j are indistinguishable in G if Adj_G(i) ∪ {i} = Adj_G(j) ∪ {j}. They will have the same degree until one is selected as pivot. If i is selected, then j can be selected next without causing any additional fill-in. Selecting i and j together is called mass elimination [23]. Variables i and j are replaced in G by a supervariable containing both i and j, labeled by its principal variable (i, say) [13, 14, 15]. Variables that are not supervariables are called simple variables. In practice, new supervariables are constructed at step k only if both i and j are in L_p (where p is the pivot selected at step k). In addition, rather than checking the graph G for indistinguishability, we use the quotient graph 𝒢, so that two variables i and j are found to be indistinguishable if Adj_𝒢(i) ∪ {i} = Adj_𝒢(j) ∪ {j}. This comparison is faster than determining whether two variables are indistinguishable in G, but may miss some identifications because, although indistinguishability in 𝒢 implies indistinguishability in G, the reverse is not true.
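A sketch of the quotient-graph test (after the redundant entries of A_i inside L_p have been removed, two indistinguishable variables in L_p carry identical sets; names and representation are illustrative):

```python
def indistinguishable(i, j, A, E):
    """Adj(i) U {i} == Adj(j) U {j} in the quotient graph: the variable
    parts (with i, j added back) and the element lists must both agree."""
    return (A[i] | {i}) == (A[j] | {j}) and E[i] == E[j]

# Synthetic example: 7 and 8 differ only by their mutual adjacency.
A = {7: {8, 10}, 8: {7, 10}}
E = {7: {5}, 8: {5}}
print(indistinguishable(7, 8, A, E))   # True
```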

We denote the set of simple variables in the supervariable with principal variable i as ī, and define ī = {i} if i is a simple variable. When p is selected as pivot at the kth step, all variables in p̄ are eliminated. The use of supervariables greatly reduces the number of degree computations performed, which is the most costly part of the algorithm. Non-principal variables and their incident edges are removed from the quotient graph data structure when they are detected. The set notation A and L refers either to a set of supervariables or to the variables represented by the supervariables, depending on the context. In degree computations and when used in representing elimination graphs, the sets refer to variables; otherwise, they refer to supervariables.

In Figure 2, detected supervariables are circled by dashed lines. Non-principal variables are left inside the dashed supervariables. These are, however, removed from the quotient graph. The last quotient graph in Figure 2 represents the selection of pivots 7, 8, and 9, and thus the right column of the figure depicts G^7, 𝒢^9, and the matrix after the ninth pivot step.

The external degree d_i ≡ t_i − |ī| + 1 of a principal variable i is

    d_i = |Adj_G(i) \ ī| = |A_i \ ī| + |( ⋃_{e∈E_i} L_e ) \ ī|,    (2)

since the set A_i is disjoint from any set L_e for e ∈ E_i. At most (d_i^2 − d_i)/2 fill-ins occur if all variables in ī are selected as pivots. We refer to t_i as the true degree of variable i. Selecting the pivot with minimum external degree tends to produce a better ordering than selecting the pivot with minimum true degree [25] (also see Section 6.2).
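Equation (2) can be sketched as follows (dict-of-sets representation; `members[i]` plays the role of ī; the data is the graph G^4 of the earlier example, in which variable 6 has external degree 4):

```python
def external_degree(i, members, A, E, L):
    """Equation (2): d_i = |A_i \ i| + |(U_{e in E_i} L_e) \ i|."""
    mi = members[i]
    union = set()
    for e in E[i]:
        union |= L[e]
    return len(A[i] - mi) + len(union - mi)

members = {6: {6}}
A = {6: set()}
E = {6: {2, 3, 4}}
L = {2: {5, 6, 9}, 3: {5, 6, 7}, 4: {6, 7, 8}}
print(external_degree(6, members, A, E, L))   # 4
```

Note that the union over E_i is what makes this exact computation expensive: it touches every member of every element list, which is the cost in expression (3).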


3.3 Quotient-graph-based minimum degree algorithm

A minimum degree algorithm based on the quotient graph is shown in Algorithm 1. It includes element absorption, mass elimination, supervariables, and external degrees. Supervariable detection is simplified by computing a hash function on each variable, so that not all pairs of variables need be compared [3]. Algorithm 1 does not include two important features of Liu's Multiple Minimum Degree algorithm (MMD): incomplete update [17, 18] and multiple elimination [25]. With multiple elimination, an independent set of pivots with minimum degree is selected before any degrees are updated. If a variable is adjacent to two or more pivot elements, its degree is computed only once. A variable j is outmatched if Adj_G(i) ⊆ Adj_G(j). With incomplete degree update, the degree update of the outmatched variable j is avoided until variable i is selected as pivot. These two features further reduce the amount of work needed for the degree computation in MMD. We will discuss their relationship to the AMD algorithm in the next section.

The time taken to compute d_i using equation (2) by a quotient-graph-based minimum degree algorithm is

    Θ( |A_i| + Σ_{e∈E_i} |L_e| ),    (3)

which is Ω(|Adj_{G^k}(i)|) if all variables are simple.¹ This degree computation is the most costly part of the minimum degree algorithm. When supervariables are present, in the best case the time taken is proportional to the degree of the variable in the "compressed" elimination graph, where all non-principal variables and their incident edges are removed.

4 Approximate degree

Having discussed the data structures and the standard minimum degree implementations, we now consider our approximation for the minimum degree and indicate its lower complexity.

¹Asymptotic complexity notation is defined in [6]. We write f(n) = Θ(g(n)) if there exist positive constants c₁, c₂, and n₀ such that 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n > n₀. Similarly, f(n) = Ω(g(n)) if there exist positive constants c and n₀ such that 0 ≤ cg(n) ≤ f(n) for all n > n₀; and f(n) = O(g(n)) if there exist positive constants c and n₀ such that 0 ≤ f(n) ≤ cg(n) for all n > n₀.


Algorithm 1 (Minimum degree algorithm, based on quotient graph)
    V = {1, ..., n}
    V̄ = ∅
    for i = 1 to n do
        A_i = {j : a_ij ≠ 0 and i ≠ j}
        E_i = ∅
        d_i = |A_i|
        ī = {i}
    end for
    k = 1
    while k ≤ n do
        mass elimination:
            select variable p ∈ V that minimizes d_p
            L_p = ( A_p ∪ ⋃_{e∈E_p} L_e ) \ {p}
        for each i ∈ L_p do
            remove redundant entries: A_i = (A_i \ L_p) \ {p}
            element absorption: E_i = (E_i \ E_p) ∪ {p}
            compute external degree: d_i = |A_i \ ī| + |( ⋃_{e∈E_i} L_e ) \ ī|
        end for
        supervariable detection, pairs found via hash function:
        for each pair i and j ∈ L_p do
            if i and j are indistinguishable then
                remove the supervariable j:
                    ī = ī ∪ j̄
                    d_i = d_i − |j̄|
                    V = V \ {j}
                    A_j = ∅
                    E_j = ∅
            end if
        end for
        convert variable p to element p:
            V̄ = (V̄ ∪ {p}) \ E_p
            V = V \ {p}
            A_p = ∅
            E_p = ∅
        k = k + |p̄|
    end while


Algorithm 2 (Computation of |L_e \ L_p| for all e ∈ V̄)
    assume w(k) < 0, for k = 1, ..., n
    for each supervariable i ∈ L_p do
        for each element e ∈ E_i do
            if (w(e) < 0) then w(e) = |L_e|
            w(e) = w(e) − |ī|
        end for
    end for

We assume that p is the kth pivot, and that we compute the bounds only for supervariables i ∈ L_p. Rather than computing the exact external degree, d_i, our Approximate Minimum Degree algorithm (AMD) computes an upper bound [8, 7],

    d̄_i^k = min ( n − k,
                   d̄_i^{k−1} + |L_p \ ī|,
                   |A_i \ ī| + |L_p \ ī| + Σ_{e∈E_i\{p}} |L_e \ L_p| ).    (4)

The first two terms (n − k, the size of the active submatrix, and d̄_i^{k−1} + |L_p \ ī|, the worst-case fill-in) are usually not as tight as the third term in equation (4). Algorithm 2 computes |L_e \ L_p| for all elements e in the entire quotient graph. The set L_e splits into two disjoint subsets: the external subset L_e \ L_p and the internal subset L_e ∩ L_p. If Algorithm 2 scans element e, the term w(e) is initialized to |L_e| and then decremented once for each variable i in the internal subset L_e ∩ L_p, and, at the end of Algorithm 2, we have w(e) = |L_e| − |L_e ∩ L_p| = |L_e \ L_p|. If Algorithm 2 does not scan element e, the term w(e) is less than zero. Combining these two cases, we obtain

    |L_e \ L_p| = { w(e)    if w(e) ≥ 0
                  { |L_e|   otherwise,      for all e ∈ V̄.    (5)

Algorithm 2 is followed by a second loop to compute our upper bound degree, d̄_i, for each supervariable i ∈ L_p, using equations (4) and (5). The total time for Algorithm 2 is

    Θ( Σ_{i∈L_p} |E_i| ).


The second loop to compute the upper bound degrees takes time

    Θ( Σ_{i∈L_p} (|A_i| + |E_i|) ).    (6)

The total asymptotic time is thus given by expression (6).

Multiple elimination [25] improves the minimum degree algorithm by updating the degree of a variable only once for each set of independent pivots. Incomplete degree update [17, 18] skips the degree update of outmatched variables. We cannot take full advantage of the incomplete degree update, since it avoids the degree update for some supervariables adjacent to the pivot element. With our technique (Algorithm 2), we must scan the element lists of all supervariables i in L_p. If the degree update of one of the supervariables is to be skipped, its element list must still be scanned so that the external subset terms can be computed for the degree update of other supervariables in L_p. The only advantage of multiple elimination or incomplete degree update would be to skip the second loop that computes the upper bound degrees for outmatched variables or supervariables for which the degree has already been computed.

If the total time in expression (6) is amortized across the computation of all supervariables i ∈ L_p, then the time taken to compute d̄_i is

    Θ(|A_i| + |E_i|) = O(|A_i^0|),

which is Θ(|Adj_{𝒢^k}(i)|) if all variables are simple. Computing our bound takes time proportional to the degree of the variable in the quotient graph, 𝒢. This is much faster than the time taken to compute the exact external degree (see expression (3)).

4.1 Accuracy of our approximate degrees

Gilbert, Moler, and Schreiber [24] also use approximate external degrees that they can compute in the same time as our degree bound d̄_i. In our notation, their bound d̂_i is

    d̂_i = |A_i \ ī| + Σ_{e∈E_i} |L_e \ ī|.

Since many pivotal variables are adjacent to two or fewer elements when selected, Ashcraft, Eisenstat, and Lucas [4] have suggested a combination of d̂_i and d_i,

    d̃_i = { d_i    if |E_i| = 2
          { d̂_i   otherwise.

Computing d̃_i takes the same time as d̂_i or d̄_i, except when |E_i| = 2. In this case, it takes O(|A_i| + |L_e|) time to compute d̃_i, whereas computing d̂_i or d̄_i takes Θ(|A_i|) time. In the Yale Sparse Matrix Package [17], the |L_e \ L_p| term for the E_i = {e, p} case is computed by scanning L_e once. It is then used to compute d_i for all i ∈ L_p for which E_i = {e, p}. This technique can also be used to compute d̃_i, and thus the time to compute d̃_i is O(|A_i| + |L_e|) and not Θ(|A_i| + |L_e|).

Theorem 4.1 (Relationship between the external degree and the three approximate degree bounds.) The equality d_i = d̃_i = d̄_i = d̂_i holds when |E_i| ≤ 1. The relation d_i = d̃_i = d̄_i ≤ d̂_i holds when |E_i| = 2. Finally, the relation d_i ≤ d̄_i ≤ d̃_i = d̂_i holds when |E_i| > 2. Consequently, the inequality d_i ≤ d̄_i ≤ d̃_i ≤ d̂_i holds for all values of |E_i|.

Proof. The bound d̂_i is equal to the exact degree when variable i is adjacent to at most one element (|E_i| ≤ 1). The accuracy of the d̂_i bound is unaffected by the size of A_i, since entries are removed from A that fall within the pattern L of an element. Thus, if there is just one element (the current element p, say), the bound d̂_i is tight. If |E_i| is two (the current element, p, and a prior element e, say), we have

    d̂_i = |A_i \ ī| + |L_p \ ī| + |L_e \ ī| = d_i + |(L_e ∩ L_p) \ ī|.

The bound d̂_i counts entries in the set (L_e ∩ L_p) \ ī twice, and so d̂_i will be an overestimate in the possible (even likely) case that a variable j ≠ i exists that is adjacent to both e and p. Combined with the definition of d̃, we have d̂_i = d̃_i = d_i when |E_i| ≤ 1, d̃_i = d_i ≤ d̂_i when |E_i| = 2, and d_i ≤ d̃_i = d̂_i when |E_i| > 2.

If |E_i| ≤ 1, our bound d̄_i is exact for the same reason that d̂_i is exact. If |E_i| is two, we have

    d̄_i = |A_i \ ī| + |L_p \ ī| + |L_e \ L_p| = d_i.

No entry is in both A_i and any element L_e, since these redundant entries are removed from A_i. Any entry in L_p does not appear in the external subset (L_e \ L_p). Thus, no entry is counted twice, and d̄_i = d_i when |E_i| ≤ 2. Finally, consider both d̄_i and d̂_i when |E_i| > 2. We have

    d̄_i = |A_i \ ī| + |L_p \ ī| + Σ_{e∈E_i\{p}} |L_e \ L_p|

and

    d̂_i = |A_i \ ī| + |L_p \ ī| + Σ_{e∈E_i\{p}} |L_e \ ī|.

Since these degree bounds are only used when computing the degree of a supervariable i ∈ L_p, we have ī ⊆ L_p. Thus, d̄_i ≤ d̂_i when |E_i| > 2. □

Note that if a variable i is adjacent to two or fewer elements, then our bound d̄_i is equal to the exact external degree. This is very important, since most variables of minimum degree are adjacent to two or fewer elements.

4.2 Degree computation example

We illustrate the computation of our approximate external degree bound in Figures 1 and 2. Variable 6 is adjacent to three elements in 𝒢^3 and 𝒢^4. All other variables are adjacent to two or fewer elements. In 𝒢^3, the bound d̄_6 is tight, since the two sets L_1 \ L_3 and L_2 \ L_3 are disjoint.

In the graph 𝒢^4, the current pivot element is p = 4. We compute

    d̄_6 = |A_6 \ 6̄| + |L_4 \ 6̄| + ( Σ_{e∈E_6\{4}} |L_e \ L_4| )
         = |∅ \ {6}| + |{6, 7, 8} \ {6}| + ( |L_2 \ L_4| + |L_3 \ L_4| )
         = |{7, 8}| + ( |{5, 6, 9} \ {6, 7, 8}| + |{5, 6, 7} \ {6, 7, 8}| )
         = |{7, 8}| + ( |{5, 9}| + |{5}| )
         = 5.

The exact external degree of variable 6 is d_6 = 4, as can be seen in the elimination graph G^4 on the left of Figure 2(a). Our bound is one more than the exact external degree, since the variable 5 appears in both L_2 \ L_4 and L_3 \ L_4, but is one less than the bound d̂_6, which is equal to 6 in this case. Our bound on the degree of variable 6 is again tight after the next pivot step, since elements 2 and 3 are absorbed into element 5.
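Algorithm 2 and equations (4) and (5) can be sketched and checked against this example. The sets for variable 6 below are those of 𝒢^4; A_8, E_8, and the previous-step bounds d_prev are illustrative placeholders (not from the paper) chosen so that the third term of equation (4) is the minimum:

```python
def approx_degrees(p, n, k, members, A, E, L, d_prev):
    """AMD's bound d-bar of equation (4). The first loop is Algorithm 2:
    after it finishes, w[e] = |L_e \ L_p| for every element seen from L_p."""
    Lp = L[p]
    w = {}
    for i in Lp:                         # Algorithm 2
        for e in E[i] - {p}:
            if e not in w:
                w[e] = len(L[e])         # start from |L_e| ...
            w[e] -= len(members[i])      # ... subtract the internal subset
    d = {}
    for i in Lp:                         # second loop: equation (4)
        mi = members[i]
        ext = sum(w[e] for e in E[i] - {p})
        d[i] = min(n - k,                                 # active submatrix
                   d_prev[i] + len(Lp - mi),              # worst-case fill-in
                   len(A[i] - mi) + len(Lp - mi) + ext)   # usually tightest
    return d

members = {i: {i} for i in (6, 7, 8)}
A = {6: set(), 7: {9, 10}, 8: {9}}       # A_8 is a placeholder
E = {6: {2, 3, 4}, 7: {3, 4}, 8: {4}}    # E_8 is a placeholder
L = {2: {5, 6, 9}, 3: {5, 6, 7}, 4: {6, 7, 8}}
d_prev = {6: 9, 7: 9, 8: 9}              # placeholders: third term dominates
d = approx_degrees(4, 10, 4, members, A, E, L, d_prev)
print(d[6])   # 5, one more than the exact external degree d_6 = 4
```

Note that the scan touches only element lists (cost Θ(Σ |E_i|)), never the full union of element patterns that the exact degree of equation (2) requires.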


5 The approximate minimum degree algorithm

The Approximate Minimum Degree algorithm is identical to Algorithm 1, except that the external degree, d_i, is replaced with d̄_i throughout. The bound on the external degree, d̄_i, is computed using Algorithm 2 and equations (4) and (5). In addition to the natural absorption of elements in E_p, any element with an empty external subset (|L_e \ L_p| = 0) is also absorbed into element p, even if e is not adjacent to p. This aggressive element absorption improves the degree bounds by reducing |E|. For many matrices, aggressive absorption rarely occurs. In some cases, however, up to half of the elements are aggressively absorbed. Consider the matrix

    [ a11   0    a13  a14 ]
    [  0   a22   a23  a24 ]
    [ a31  a32   a33   0  ]
    [ a41  a42    0   a44 ],

where we assume the pivots are chosen down the diagonal in order. The external subset L_1 \ L_2 is empty (L_1 = L_2 = {3, 4}). Element 2 aggressively absorbs element 1, even though element 1 is not adjacent to variable 2 (a_12 is zero).

As in many other minimum degree algorithms, we use a set of n linked lists to assist the search for a variable of minimum degree. A single linked list holds all supervariables with the same degree bound. Maintaining this data structure takes time proportional to the total number of degree computations, or O(|L|) in the worst case.
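The aggressive-absorption test on the 4-by-4 example above reduces to one set operation (a sketch, with element patterns as Python sets):

```python
def aggressively_absorbable(Le, Lp):
    """Element e can be absorbed into the new element p whenever its
    external subset L_e \ L_p is empty, even if e is not adjacent to p."""
    return not (Le - Lp)

L1 = {3, 4}   # pattern of element 1, after pivot 1
L2 = {3, 4}   # pattern of element 2, after pivot 2
print(aggressively_absorbable(L1, L2))   # True: element 2 absorbs element 1
```

In the AMD setting this test is free: |L_e \ L_p| is exactly the quantity w(e) that Algorithm 2 already computes.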

Computing the pattern of each pivot element, L_p, takes a total of O(|L|) time overall, since each element is used in the computation of at most one other element, and the total size of all elements constructed is O(|L|).
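A minimal sketch of the degree lists (Python dicts and sets stand in for the array-based linked lists an actual implementation would use, and `pop_min` scans for the smallest nonempty bucket rather than tracking the minimum incrementally):

```python
from collections import defaultdict

class DegreeLists:
    """Bucket b holds all supervariables whose current degree bound is b."""
    def __init__(self):
        self.bucket = defaultdict(set)   # degree bound -> supervariables
        self.degree = {}                 # supervariable -> degree bound

    def update(self, i, d):
        """Move supervariable i to the bucket for its new bound d."""
        if i in self.degree:
            self.bucket[self.degree[i]].discard(i)
        self.degree[i] = d
        self.bucket[d].add(i)

    def pop_min(self):
        """Remove and return a supervariable of minimum degree bound."""
        for d in sorted(self.bucket):
            if self.bucket[d]:
                i = self.bucket[d].pop()
                del self.degree[i]
                return i, d
        raise KeyError("no variables left")

dl = DegreeLists()
for i, d in [(1, 3), (2, 1), (3, 2)]:
    dl.update(i, d)
print(dl.pop_min())   # (2, 1): variable 2 has the minimum bound
```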

The AMD algorithm is based on the quotient graph data structure used in the MA27 minimum degree algorithm [13, 14, 15]. Initially, the sets A are stored, followed by a small amount of elbow room. When the set Lp is formed, it is placed in the elbow room (or in place of Ap if |Ep| = 0). Garbage collection occurs if the elbow room is exhausted. During garbage collection, the space taken by Ai and Ei is reduced to exactly |Ai| + |Ei| for each supervariable i (which is less than or equal to |A0i|) and the extra space is reclaimed. The space for Ae and Ee for all elements e ∈ V is fully reclaimed, as is the space for Le of any absorbed elements e. Each garbage collection takes time that is proportional to the size of the workspace (normally Θ(|A|)). In practice, elbow room of size n is sufficient.
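The compaction pass can be illustrated with a small sketch. The record layout here (a list of (owner, adjacency-list) pairs) is hypothetical and only mimics the idea: a single pass slides live records to the front of the workspace and discards the storage of absorbed elements.

```python
def garbage_collect(workspace, live):
    """Compact an elbow-room workspace in place.

    `workspace` is a list of (owner, adjacency-list) records, some of
    which belong to absorbed elements (owner not in `live`).  Live
    records slide to the front, mimicking how Ai and Ei storage is
    reduced to exactly |Ai| + |Ei| and absorbed elements reclaimed.
    Runs in time proportional to the workspace size.
    """
    dst = 0
    for rec in workspace:
        if rec[0] in live:
            workspace[dst] = rec
            dst += 1
    del workspace[dst:]      # the reclaimed space becomes new elbow room
    return workspace
```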

During the computation of our degree bounds, we compute the following hash function for supervariable detection [3],

    Hash(i) = ( ( ∑_{j∈Ai} j + ∑_{e∈Ei} e ) mod (n − 1) ) + 1,

which increases the degree computation time by a small constant factor. We place each supervariable i in a hash bucket according to Hash(i), taking O(|L|) time overall. If two or more supervariables are placed in the same hash bucket, then each pair of supervariables i and j in the bucket is tested for indistinguishability. If the hash function results in no collisions, the total time taken by the comparisons is O(|A|).
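The hash-and-compare step can be sketched as follows, with illustrative names. The exact pairwise test shown here (comparing Ai \ {j} with Aj \ {i} and Ei with Ej) is one common form of the indistinguishability check; equal hash values alone do not imply equal structure.

```python
def detect_supervariables(A, E, n, candidates):
    """Group candidate variables by the hash function above, then
    confirm indistinguishability by exact comparison.

    A[i] and E[i] are the sets Ai and Ei.  Only the variables adjacent
    to the current pivot element (`candidates`) are hashed, as in AMD.
    Returns the confirmed indistinguishable pairs.
    """
    buckets = {}
    for i in candidates:
        h = (sum(A[i]) + sum(E[i])) % (n - 1) + 1
        buckets.setdefault(h, []).append(i)

    pairs = []
    for members in buckets.values():
        # pairwise exact check within each bucket
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                if A[i] - {j} == A[j] - {i} and E[i] == E[j]:
                    pairs.append((i, j))
    return pairs
```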

Ashcraft [3] uses this hash function as a preprocessing step on the entire matrix (without the mod (n − 1) term, and with an O(|V| log |V|) sort instead of |V| hash buckets). In contrast, we use this function during the ordering, and hash only those variables adjacent to the current pivot element.

For example, variables 7, 8, and 9 are indistinguishable in G5, in Figure 2(a). The AMD algorithm would not consider variable 8 at step 5, since it is not adjacent to the pivot element 5 (refer to the quotient graph G5 in Figure 2(b)). AMD would not construct 7 = {7, 9} at step 5, since 7 and 9 are distinguishable in G5. It would construct 7 = {7, 8, 9} at step 6, however.

The total number of times the approximate degree d̄i of variable i is computed during the elimination is no more than the number of nonzero entries in row k of L, where variable i is the kth pivot. The total time taken to compute d̄i using Algorithm 2 and equations (4) and (5) is O(|A0i|), or equivalently O(|(PAPT)k∗|), the number of nonzero entries in row k of the permuted matrix. The total time taken by the entire AMD algorithm is thus bounded by the degree computation,

    O( ∑_{k=1}^{n} |Lk∗| · |(PAPT)k∗| ).    (7)

This bound assumes no (or few) supervariable hash collisions and a constant number of garbage collections. In practice these assumptions seem to hold, but the asymptotic time would be higher if they did not. In many problem domains, the number of nonzeros per row of A is a constant, independent of n. For matrices in these domains, our AMD algorithm takes O(|L|) time (with the same assumptions).

6 Performance results

In this section, we present the results of our experiments with AMD on a wide range of test matrices. We first compare the degree computations discussed above (t, d, d̄, d̂, and ď), as well as an upper bound on the true degree, t̄ ≡ d̄ + |i| − 1. We then compare the AMD algorithm with other established minimum degree codes (MMD and MA27).

6.1 Test matrices

We tested all degree bounds and codes on all matrices in the Harwell/Boeing collection of type PUA, RUA, PSA, and RSA [11, 12] (at orion.cerfacs.fr or numerical.cc.rl.ac.uk), all non-singular matrices in Saad's SPARSKIT2 collection (at ftp.cs.umn.edu), all matrices in the University of Florida collection (available from ftp.cis.ufl.edu in the directory pub/umfpack/matrices), and several other matrices from NASA and Boeing. Of those 378 matrices, we present results below on those requiring 500 million or more floating-point operations for the Cholesky factorization, as well as the ORANI678 matrix in the Harwell/Boeing collection and the EX19 matrix in Saad's collection (a total of 26 matrices). The latter two are the best-case and worst-case examples from the set of smaller matrices.

For the unsymmetric matrices in the test set, we first used the maximum transversal algorithm MC21 from the Harwell Subroutine Library [9] to reorder the matrix so that the permuted matrix has a zero-free diagonal. We then formed the symmetric pattern of the permuted matrix plus its transpose. This is how a minimum degree ordering algorithm is used in MUPS [1, 2]. For these matrices, Table 1 lists the statistics for the symmetrized pattern.

Table 1 lists the matrix name, the order (n), the number of nonzeros in the lower triangular part (nz), two statistics obtained with an exact minimum degree ordering (using d), and a description. In column 4, we report the percentage of pivots p such that |Ep| > 2. Column 4 shows that only a small percentage of the pivots selected by an exact minimum degree ordering have more than two elements in their adjacency list. Therefore, we can expect a good quality ordering from an algorithm based on our approximate degree bound. In column 5, we indicate how often a degree di is computed when |Ei| > 2 (as a percentage of the total number of degree updates). Table 1 is sorted according to this degree update percentage. Column 5 thus reports the percentage of "costly" degree updates performed by a minimum degree algorithm based on the exact degree. For matrices with relatively large values in column 5, significant time reductions can be expected with an approximate degree based algorithm.

Table 1: Selected matrices in test set

Matrix      n       nz         %|Ep|>2  %|Ei|>2  Description
RAEFSKY3    21,200    733,784    0.00    13.4    fluid/structure interaction, turbulence
VENKAT01    62,424    827,684    0.71    15.7    unstructured 2D Euler solver
BCSSTK32    44,609    985,046    0.20    27.3    structural eng., automobile chassis
EX19        12,005    123,937    1.57    29.4    2D developing pipe flow (turbulent)
BCSSTK30    28,924  1,007,284    0.66    31.8    structural eng., off-shore platform
CT20STIF    52,329  1,323,067    0.77    33.2    structural eng., CT20 engine block
NASASRB     54,870  1,311,227    0.06    35.0    shuttle rocket booster
OLAF        16,146    499,505    0.41    35.2    NASA test problem
RAEFSKY1     3,242    145,517    0.00    38.9    incompressible flow, pressure-driven pipe
CRYSTK03    24,696    863,241    0.00    40.9    structural eng., crystal vibration
RAEFSKY4    19,779    654,416    0.00    41.4    buckling problem for container model
CRYSTK02    13,965    477,309    0.00    42.0    structural eng., crystal vibration
BCSSTK33     8,738    291,583    0.00    42.6    structural eng., auto steering mech.
BCSSTK31    35,588    572,914    0.60    43.1    structural eng., automobile component
EX11        16,614    540,167    0.04    43.3    CFD, 3D cylinder & flat plate heat exch.
FINAN512    74,752    261,120    1.32    46.6    economics, portfolio optimization
RIM         22,560    862,411    2.34    63.2    chemical eng., fluid mechanics problem
BBMAT       38,744  1,274,141    5.81    64.4    CFD, 2D airfoil with turbulence
EX40         7,740    225,136   17.45    64.7    CFD, 3D die swell problem on square die
WANG4       26,068     75,564   15.32    78.3    3D MOSFET semicond. (30x30x30 grid)
LHR34       35,152    608,830    7.69    78.7    chemical eng., light hydrocarbon recovery
WANG3       26,064     75,552   15.29    79.2    3D diode semiconductor (30x30x30 grid)
LHR71       70,304  1,199,704    8.47    81.1    chemical eng., light hydrocarbon recovery
ORANI678     2,529     85,426    6.68    86.9    Australian economic model
PSMIGR1      3,140    410,781    6.65    91.0    US county-by-county migration
APPU        14,000  1,789,392   15.64    94.4    NASA test problem (random matrix)

Since any minimum degree algorithm is sensitive to tie-breaking, we randomly permuted each matrix and its adjacency lists 21 times (except for the random APPU matrix, which we ran only once). All methods were given the same set of 21 randomized matrices. We also ran each method on the original matrix. For some matrices, the original matrix gives better ordering time and fill-in results for all methods than the best result obtained with the randomized matrices. The overall comparisons do not, however, depend on whether the original or randomized matrices are used. We thus report only the median ordering time and fill-in obtained for the randomized matrices.
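The randomization can be sketched as follows. Since this sketch stores neighbor sets rather than ordered adjacency lists, only the symmetric relabeling P A P^T is shown; the experiments also shuffled the order within each adjacency list.

```python
import random

def randomize(pattern, seed):
    """Apply a random symmetric permutation P A P^T to a symmetric
    sparsity pattern, to exercise tie-breaking in a minimum degree
    ordering.  pattern[i] is the set of neighbors of vertex i."""
    n = len(pattern)
    rng = random.Random(seed)       # seeded so each method sees the same set
    p = list(range(n))
    rng.shuffle(p)                  # p[i] = new label of old vertex i
    new = [set() for _ in range(n)]
    for i in range(n):
        for j in pattern[i]:
            new[p[i]].add(p[j])
    return new
```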

The APPU matrix is a random matrix used in a NASA benchmark and is thus not representative of sparse matrices from real problems. We include it in our test set as a pathological case that demonstrates how well AMD handles a very irregular problem. Its factors are about 90% dense. It was not practical to run the APPU matrix 21 times because the exact degree update algorithms took too much time.

6.2 Comparing the exact and approximate degrees

To make a valid comparison between the degree update methods, we modified our code for the AMD algorithm so that we could compute the exact external degree (d), our bound (d̄), the d̂ bound, the ď bound, the exact true degree (t), and our upper bound on the true degree (t̄). The six codes based on d, d̄, d̂, ď, t, and t̄ (columns 3 to 8 of Table 2) differ only in how they compute the degree. Since aggressive absorption is more difficult with some bounds than others, we switched off aggressive absorption for these six codes. The actual AMD code (in column 2 of Table 2) uses d̄ with aggressive absorption.

Table 2 lists the median number of nonzeros below the diagonal in L (in thousands) for each method. Results 20% or more above the lowest median |L| in the table are underlined. Our upper bound on the true degree (t̄) and the exact true degree (t) give nearly identical results. As expected, using minimum degree algorithms based on external degree noticeably improves the quality of the ordering (compare columns 3 and 7, or columns 4 and 8).

Table 2: Median fill-in results of the degree update methods
(number of nonzeros below diagonal in L, in thousands)

Matrix        AMD      d      d̄      d̂      ď      t      t̄
RAEFSKY3     4709   4709   4709   4709   5114   4992   4992
VENKAT01     5789   5771   5789   5798   6399   6245   6261
BCSSTK32     5080   5081   5079   5083   5721   5693   5665
EX19          319    319    319    318    366    343    343
BCSSTK30     3752   3751   3753   3759   4332   4483   4502
CT20STIF    10858  10758  10801  11057  13367  12877  12846
NASASRB     12282  12306  12284  12676  14909  14348  14227
OLAF         2860   2858   2860   2860   3271   3089   3090
RAEFSKY1     1151   1151   1151   1151   1318   1262   1262
CRYSTK03    13836  13836  13836  13836  17550  15507  15507
RAEFSKY4     7685   7685   7685   7685   9294   8196   8196
CRYSTK02     6007   6007   6007   6007   7366   6449   6449
BCSSTK33     2624   2624   2624   2640   3236   2788   2787
BCSSTK31     5115   5096   5132   5225   6194   6079   6057
EX11         6014   6016   6014   6014   7619   6673   6721
FINAN512     4778   4036   6042  11418  11505   8235   8486
RIM          3948   3898   3952   3955   4645   4268   4210
BBMAT       19673  19880  19673  21422  37820  21197  21445
EX40         1418   1386   1417   1687   1966   1526   1530
WANG4        6547   6808   6548   6566   7871   7779   7598
LHR34        3618   3743   3879  11909  27125   4383   4435
WANG3        6545   6697   6545   6497   7896   7555   7358
LHR71        7933   8127   8499  28241  60175   9437   9623
ORANI678      147    147    146    150    150    147    146
PSMIGR1      3020   3025   3011   3031   3176   2966   2975
APPU        87648  87613  87648  87566  87562  87605  87631

From the inequality d ≤ d̄ ≤ d̂ ≤ ď, we would expect a similar ranking in the quality of the orderings produced by these methods. Table 2 confirms this. The bound d̄ and the exact external degree d produce nearly identical results. Comparing the AMD results and the d̄ column, aggressive absorption tends to result in slightly lower fill-in, since it reduces |E| and thus improves the accuracy of our bound. The d̂ bound is often accurate enough to produce good results, but can fail catastrophically for matrices with a high percentage of approximate pivots (see column 4 in Table 1). The less accurate ď bound produces notably worse results for many matrices.

Comparing all 378 matrices, the median |L| when using d̄ is never more than 9% higher than the median fill-in obtained when using the exact external degree, d (with the exception of the FINAN512 matrix). The fill-in results for d and d̄ are identical for nearly half of the 378 matrices. The approximate degree bound d̄ thus gives a very reliable estimate of the degree in the context of a minimum degree algorithm.

The FINAN512 matrix is highly sensitive to tie-breaking variations. Its graph consists of two types of nodes: "constraint" nodes and "linking" nodes [5]. The constraint nodes form independent sparse subgraphs, connected together via a tree of linking nodes. This matrix is a pathological worst case for any minimum degree method. All constraint nodes should be ordered first, but the linking nodes have low degree and tend to be selected first, which causes high fill-in. Using a tree dissection algorithm, Berger, Mulvey, Rothberg, and Vanderbei [5] obtain an ordering with only 1.83 million nonzeros in L.

Table 3 lists the median ordering time (in seconds on a SUN SPARCstation 10) for each method. Ordering times at least twice the minimum median ordering time listed in the table are underlined. Computing the ď bound is often the fastest, since it requires a single pass over the element lists instead of the two passes required for the d̄ bound. It is, however, sometimes slower than d̄ because it can generate more fill-in, which increases the ordering time (see expression (7)). The ordering time of the two exact degree updates (d and t) increases dramatically as the percentage of "costly" degree updates increases (those for which |Ei| > 2).

Table 3: Median ordering time of the degree update methods
(ordering time, in seconds)

Matrix        AMD        d      d̄       d̂       ď        t      t̄
RAEFSKY3     1.05     1.10    1.09    1.05    1.02     1.15    1.09
VENKAT01     4.07     4.95    4.11    4.47    3.88     4.32    3.85
BCSSTK32     4.67     5.64    4.54    4.91    4.35     5.55    4.48
EX19         0.87     1.12    0.89    1.01    0.86     1.09    0.87
BCSSTK30     3.51     5.30    3.55    3.65    3.51     4.38    3.38
CT20STIF     6.62     8.66    6.54    7.07    6.31     8.63    6.45
NASASRB      7.69    11.03    7.73    9.23    7.78    11.78    7.99
OLAF         1.83     2.56    1.90    2.16    1.83     2.33    1.78
RAEFSKY1     0.27     0.34    0.28    0.32    0.25     0.35    0.28
CRYSTK03     3.30     4.84    3.08    3.68    3.14     5.23    3.30
RAEFSKY4     2.32     2.90    2.18    2.45    2.08     3.12    2.07
CRYSTK02     1.49     2.34    1.55    1.64    1.45     2.04    1.52
BCSSTK33     0.91     1.36    1.05    0.99    0.85     1.62    0.91
BCSSTK31     4.55     7.53    4.92    5.68    4.56     7.41    4.92
EX11         2.70     4.06    2.77    3.00    2.60     4.23    2.89
FINAN512    15.03    34.11   14.45   17.79   15.84    46.49   18.58
RIM          5.74    10.38    5.69    6.12    5.72    10.01    5.58
BBMAT       27.80   115.75   27.44   42.17   23.02   129.32   28.33
EX40         1.04     1.56    1.10    1.09    0.95     1.46    1.12
WANG4        5.45    11.45    5.56    6.98    5.21    11.59    5.88
LHR34       19.56   109.10   25.62   45.36   43.70   125.41   24.73
WANG3        5.02    10.45    5.49    6.52    4.81    11.02    5.02
LHR71       46.03   349.58   58.25  129.85  121.96   389.70   60.40
ORANI678     5.49   196.01    8.13    6.97    7.23   199.01    8.45
PSMIGR1     10.61   334.27   10.07   14.20    8.16   339.28    9.94
APPU        41.75  2970.54   39.83   43.20   40.64  3074.44   38.93

Garbage collection has little effect on the ordering time. In the above runs, we gave each method elbow room of size n. Usually a single garbage collection occurred. At most two garbage collections occurred for AMD, and at most three for the other methods (aggressive absorption reduces the memory requirements).

6.3 Comparing algorithms

In this section, we compare AMD with two other established minimum degree codes: Liu's Multiple Minimum Degree (MMD) code [25] and Duff and Reid's MA27 code [15]. MMD stores the element patterns L in a fragmented manner and requires no elbow room [20, 21]. It uses the exact external degree, d. MMD creates supervariables only when two variables i and j have no adjacent variables and exactly two adjacent elements (Ei = Ej = {e, p} and Ai = Aj = ∅, where p is the current pivot element). A hash function is not required. MMD takes advantage of multiple elimination and incomplete update.

MA27 uses the true degree, t, and the same data structures as AMD. It detects supervariables whenever two variables are adjacent to the current pivot element and have the same structure in the quotient graph (as does AMD). MA27 uses the true degree as the hash function for supervariable detection, and performs aggressive absorption. Neither AMD nor MA27 takes advantage of multiple elimination or incomplete update.

Structural engineering matrices tend to have many rows with identical nonzero pattern. Ashcraft [3] has found that the total ordering time of MMD can be significantly improved by detecting these initial supervariables before starting the elimination. We implemented the pre-compression algorithm used in [3] and modified MMD to allow for initial supervariables. We call the resulting code CMMD ("compressed" MMD). Pre-compression has little effect on AMD, since AMD finds these supervariables when their degrees are first updated. Running AMD on the compressed matrices, together with the cost of pre-compression, was never faster than AMD on the original matrices.

Table 4 lists the median number of nonzeros below the diagonal in L (in thousands) for each code. Results 20% or more above the lowest median |L| in the table are underlined. AMD, MMD, and CMMD find orderings of about the same quality. MA27 is slightly worse because it uses the true degree (t) instead of the external degree (d).

Considering the entire set of 378 matrices, AMD produces a better median fill-in than MMD, CMMD, and MA27 for 62%, 61%, and 81% of the matrices, respectively. AMD never generates more than 7%, 7%, and 4% more nonzeros in L than MMD, CMMD, and MA27, respectively. We have shown empirically that AMD produces an ordering at least as good as these other three methods for this large test set.

Table 4: Median fill-in results of the four codes
(number of nonzeros below diagonal in L, in thousands)

Matrix        AMD    MMD   CMMD   MA27
RAEFSKY3     4709   4779   4724   5041
VENKAT01     5789   5768   5811   6303
BCSSTK32     5080   5157   5154   5710
EX19          319    322    324    345
BCSSTK30     3752   3788   3712   4529
CT20STIF    10858  11212  10833  12760
NASASRB     12282  12490  12483  14068
OLAF         2860   2876   2872   3063
RAEFSKY1     1151   1165   1177   1255
CRYSTK03    13836  13812  14066  15496
RAEFSKY4     7685   7539   7582   8245
CRYSTK02     6007   5980   6155   6507
BCSSTK33     2624   2599   2604   2766
BCSSTK31     5115   5231   5216   6056
EX11         6014   5947   6022   6619
FINAN512     4778   8180   8180   8159
RIM          3948   3947   3914   4283
BBMAT       19673  19876  19876  21139
EX40         1418   1408   1401   1521
WANG4        6547   6619   6619   7598
LHR34        3618   4162   4162   4384
WANG3        6545   6657   6657   7707
LHR71        7933   9190   9190   9438
ORANI678      147    147    147    147
PSMIGR1      3020   2974   2974   2966
APPU        87648  87647  87647  87605

If the apparent slight difference in ordering quality between AMD and MMD is statistically significant, we conjecture that it has more to do with earlier supervariable detection (which affects the external degree) than with the differences between the external degree and our upper bound.

Table 5 lists the median ordering time (in seconds on a SUN SPARCstation 10) for each method. The ordering time for CMMD includes the time taken by the pre-compression algorithm. Ordering times at least twice the minimum median ordering time listed in the table are underlined. On certain classes of matrices, typically those from structural analysis applications, CMMD is significantly faster than MMD. AMD is the fastest method for all but the EX19 matrix. For the other 352 matrices in our full test set, the differences in ordering time between these methods are typically smaller. If we compare the ordering time of AMD with the other methods on all matrices in our test set requiring at least a tenth of a second of ordering time, AMD is slower than MMD, CMMD, and MA27 for only 6, 15, and 8 matrices, respectively. For the full set of matrices, AMD is never more than 30% slower than these other methods. The best and worst cases for the relative run-time of AMD on the smaller matrices are included in Table 5 (the EX19 and ORANI678 matrices).

7 Summary

We have described a new upper bound for the degree of nodes in the elimination graph that can be cheaply computed in the context of a minimum degree algorithm. We have demonstrated that this upper bound on the degree is more accurate than all previously used degree approximations. We have shown experimentally that we can replace an exact degree update with our approximate degree update and obtain almost identical fill-in.

An Approximate Minimum Degree algorithm (AMD) based on this external degree approximation has been described. We have shown that the AMD algorithm is highly competitive with other ordering algorithms. It is typically faster than other minimum degree algorithms, and produces results comparable to MMD (which is also based on external degree) in terms of fill-in. AMD typically produces better results, in terms of both fill-in and computing time, than the MA27 minimum degree algorithm (which is based on true degrees).


Table 5: Median ordering time of the four codes
(ordering time, in seconds)

Matrix        AMD      MMD     CMMD     MA27
RAEFSKY3     1.05     2.79     1.18     1.23
VENKAT01     4.07     9.01     4.50     5.08
BCSSTK32     4.67    12.47     5.51     6.21
EX19         0.87     0.69     0.83     1.03
BCSSTK30     3.51     7.78     3.71     4.40
CT20STIF     6.62    26.00     9.59     9.81
NASASRB      7.69    22.47    11.28    12.75
OLAF         1.83     5.67     4.41     2.64
RAEFSKY1     0.27     0.82     0.28     0.40
CRYSTK03     3.30    10.63     3.86     5.27
RAEFSKY4     2.32     5.24     2.36     2.91
CRYSTK02     1.49     3.89     1.53     2.37
BCSSTK33     0.91     2.24     1.32     1.31
BCSSTK31     4.55    11.60     7.76     7.92
EX11         2.70     7.45     5.05     3.90
FINAN512    15.03   895.23   897.15    40.31
RIM          5.74     9.09     8.11    10.13
BBMAT       27.80   200.86   201.03   134.58
EX40         1.04     2.13     2.04     1.30
WANG4        5.45    10.79    11.60     9.86
LHR34       19.56   139.49   141.16    77.83
WANG3        5.02    10.37    10.62     8.23
LHR71       46.03   396.03   400.40   215.01
ORANI678     5.49   124.99   127.10   124.66
PSMIGR1     10.61   186.07   185.74   229.51
APPU        41.75  5423.23  5339.24  2683.27


8 Acknowledgments

We would like to thank John Gilbert for outlining the d̄i ≤ d̂i portion of the proof of Theorem 4.1, Joseph Liu for providing a copy of the MMD algorithm, and Cleve Ashcraft and Stan Eisenstat for their comments on a draft of this paper.


References

[1] P. R. Amestoy, Factorization of large sparse matrices based on a multifrontal approach in a multiprocessor environment, INPT PhD Thesis TH/PA/91/2, CERFACS, Toulouse, France, 1991.

[2] P. R. Amestoy, M. Dayde, and I. S. Duff, Use of level 3 BLAS in the solution of full and sparse linear equations, in High Performance Computing: Proceedings of the International Symposium on High Performance Computing, Montpellier, France, 22–24 March, 1989, J.-L. Delhaye and E. Gelenbe, eds., Amsterdam, 1989, North Holland, pp. 19–31.

[3] C. Ashcraft, Compressed graphs and the minimum degree algorithm, SIAM J. Sci. Comput., 1995, to appear.

[4] C. Ashcraft, S. C. Eisenstat, and R. F. Lucas, personal communication.

[5] A. Berger, J. Mulvey, E. Rothberg, and R. Vanderbei, Solving multistage stochastic programs using tree dissection, Tech. Report SOR-97-07, Program in Statistics and Operations Research, Princeton University, Princeton, New Jersey, 1995.

[6] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, MIT Press, Cambridge, Massachusetts, and McGraw-Hill, New York, 1990.

[7] T. A. Davis and I. S. Duff, An unsymmetric-pattern multifrontal method for sparse LU factorization, SIAM J. Matrix Anal. Applic., to appear.

[8] T. A. Davis and I. S. Duff, Unsymmetric-pattern multifrontal methods for parallel sparse LU factorization, Tech. Report TR-91-023, CISE Dept., Univ. of Florida, Gainesville, FL, 1991.

[9] I. S. Duff, On algorithms for obtaining a maximum transversal, ACM Trans. Math. Software, 7 (1981), pp. 315–330.

[10] I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices, London: Oxford Univ. Press, 1986.


[11] I. S. Duff, R. G. Grimes, and J. G. Lewis, Sparse matrix test problems, ACM Trans. Math. Software, 15 (1989), pp. 1–14.

[12] I. S. Duff, R. G. Grimes, and J. G. Lewis, Users' guide for the Harwell-Boeing sparse matrix collection (Release 1), Tech. Report RAL-92-086, Rutherford Appleton Laboratory, Didcot, Oxon, England, Dec. 1992.

[13] I. S. Duff and J. K. Reid, A comparison of sparsity orderings for obtaining a pivotal sequence in Gaussian elimination, J. Inst. of Math. Appl., 14 (1974), pp. 281–291.

[14] I. S. Duff and J. K. Reid, MA27 – A set of Fortran subroutines for solving sparse symmetric sets of linear equations, Tech. Report AERE R10533, HMSO, London, 1982.

[15] I. S. Duff and J. K. Reid, The multifrontal solution of indefinite sparse symmetric linear equations, ACM Trans. Math. Software, 9 (1983), pp. 302–325.

[16] I. S. Duff and J. K. Reid, The multifrontal solution of unsymmetric sets of linear equations, SIAM J. Sci. Comput., 5 (1984), pp. 633–641.

[17] S. C. Eisenstat, M. C. Gursky, M. H. Schultz, and A. H. Sherman, Yale sparse matrix package, I: The symmetric codes, Internat. J. Numer. Methods Eng., 18 (1982), pp. 1145–1151.

[18] S. C. Eisenstat, M. H. Schultz, and A. H. Sherman, Algorithms and data structures for sparse symmetric Gaussian elimination, SIAM J. Sci. Comput., 2 (1981), pp. 225–237.

[19] A. George and J. W. H. Liu, A fast implementation of the minimum degree algorithm using quotient graphs, ACM Trans. Math. Software, 6 (1980), pp. 337–358.

[20] A. George and J. W. H. Liu, A minimal storage implementation of the minimum degree algorithm, SIAM J. Numer. Anal., 17 (1980), pp. 282–299.

[21] A. George and J. W. H. Liu, Computer Solution of Large Sparse Positive Definite Systems, Englewood Cliffs, New Jersey: Prentice-Hall, 1981.

[22] A. George and J. W. H. Liu, The evolution of the minimum degree ordering algorithm, SIAM Review, 31 (1989), pp. 1–19.


[23] A. George and D. R. McIntyre, On the application of the minimum degree algorithm to finite element systems, SIAM J. Numer. Anal., 15 (1978), pp. 90–111.

[24] J. R. Gilbert, C. Moler, and R. Schreiber, Sparse matrices in MATLAB: design and implementation, SIAM J. Matrix Anal. Applic., 13 (1992), pp. 333–356.

[25] J. W. H. Liu, Modification of the minimum-degree algorithm by multiple elimination, ACM Trans. Math. Software, 11 (1985), pp. 141–153.

[26] H. M. Markowitz, The elimination form of the inverse and its application to linear programming, Management Science, 3 (1957), pp. 255–269.

[27] D. J. Rose, Symmetric Elimination on Sparse Positive Definite Systems and the Potential Flow Network Problem, PhD thesis, Applied Math., Harvard Univ., 1970.

[28] D. J. Rose, A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations, in Graph Theory and Computing, R. C. Read, ed., New York: Academic Press, 1973, pp. 183–217.

[29] B. Speelpenning, The generalized element method, Tech. Report UIUCDCS-R-78-946, Dept. of Computer Science, Univ. of Illinois, Urbana, IL, 1978.

[30] W. F. Tinney and J. W. Walker, Direct solutions of sparse network equations by optimally ordered triangular factorization, Proc. of the IEEE, 55 (1967), pp. 1801–1809.

[31] M. Yannakakis, Computing the minimum fill-in is NP-complete, SIAM J. Alg. Disc. Meth., 2 (1981), pp. 77–79.