a parallel multilevel metaheuristic for graph partitioning

Journal of Heuristics, 10: 315–336, 2004c© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

A Parallel Multilevel Metaheuristicfor Graph Partitioning

R. BANOSC. GIL∗Departamento de Arquitectura de Computadores y Electronica, Universidad de Almeria,La Canada de San Urbano s/n, 04120 Almeria, Spainemail: [email protected]: [email protected]

J. ORTEGADepartamento de Arquitectura y Tecnologia de Computadores, Universidad de Granada,Campus de Fuentenueva, Granada, Spainemail: [email protected]

F.G. MONTOYADepartamento de Ingenieria Civil, Universidad de Granada, Campus de Fuentenueva, Granada, Spainemail: [email protected]

Submitted in September 2003 and accepted by Enrique Alba in March 2004 after 1 revision

Abstract

One significant problem of optimisation which occurs in many scientific areas is that of graph partitioning. Severalheuristics, which pertain to high quality partitions, have been put forward. Multilevel schemes can in fact improvethe quality of the solutions. However, the size of the graphs is very large in many applications, making it impossibleto effectively explore the search space. In these cases, parallel processing becomes a very useful tool overcomingthis problem. In this paper, we propose a new parallel algorithm which uses a hybrid heuristic within a multilevelscheme. It is able to obtain very high quality partitions and improvement on those obtained by other algorithmspreviously put forward.

Key Words: graph partitioning, parallel optimisation, multilevel optimisation, metaheuristic, simulatedannealing, tabu search

The Graph Partitioning Problem (GPP) occurs in many areas. For example, VLSI design(Banerjee, 1994; Alpert and Kahng, 1995), test pattern generation (Klenke, Williams, andAylor, 1992; Gil et al., 2002), data-mining (Mobasher et al., 1996), efficient storage of greatdata bases (SheKhar and DLiu, 1996), geographical information systems (Guo, Trinidad,and Smith, 2000), etc. The critical issue is finding a partition of the vertices of a graph intoa given number of roughly equal parts, whilst ensuring that the number of edges connectingthe vertices in different sub-graphs is minimised. As the problem is NP-complete (Garey

∗Author to whom all correspondence should be addressed.

316 BANOS ET AL.

and Johnson, 1979), efficient procedures providing high quality solutions in a reasonableamount of time are very useful.

Different strategies have been proposed to solve the GPP. A classification of graph par-titioning algorithms is shown as follows.

• Local vs. Global Methods: If the partitioning algorithm uses a previously obtained initialpartition, it is called a local method (Kernighan and Lin, 1970). However, if the algorithmalso obtain the initial partition, it is called global (Simon, 1991).

• Geometric vs. Coordinate-free Methods: If the algorithm takes into account the spatiallocation of the vertices (Gilbert, Miller, and Teng, 1998), it is called geometric. Otherwise,if only connectivity among vertices is considered, it is called a coordinate-free method(Simon, 1991).

• Static vs. Dynamic Partitioning: The static partitioning model divides the graph onlyonce (Simon, 1991). Sometimes, due to the characteristics of the application where GPPoccurs, the graph structure changes dynamically, thus making it necessary to repeat-edly apply an optimisation. These algorithms (Schloegel, Karypis, and Kumar, 2001) arecalled dynamic.

• Multilevel vs. Non-Multilevel Schemes: If the algorithm directly divides the target graph(Kernighan and Lin, 1970), it is called Non-Multilevel. Otherwise, if the graph is groupedseveral times, divided in the lowest level and then ungrouped up to the target graph, it iscalled Multilevel (Karypis and Kumar, 1998a).

• Serial vs. Parallel Algorithms: The typical approach to solving the GPP is based onalgorithms that run on a single processor (Karypis and Kumar, 1998a). These algorithmsare called serial algorithms. In other cases, parallel processing is used either to speed upthe serial version or to explore different areas of the search space (Karypis and Kumar,1998b).

In this paper, an efficient parallel multilevel algorithm for GPP is presented. This algo-rithm uses a multilevel scheme in the search process, which includes a metaheuristic basedon mixing Simulated Annealing (SA) and Tabu Search (TS). Further, parallel processing isused to allow a cooperative and simultaneous search of the solution space. This results ina global free-coordinate parallel multilevel algorithm for static graph partitioning, whosepartitions often improve those previously obtained by other algorithms.

Section 1 provides a more precise definition of the GPP and describes the cost functionused in the optimisation process. Section 2 describes the proposed metaheuristic to solve theproblem. Section 3 provides the design of the multilevel scheme, including the metaheuristicas described in Section 2. Section 4 offers a detailed explanation of the parallelisation of themultilevel algorithm, while the analysis of the results obtained are provided in Section 5.Finally, Section 6 gives the conclusions and suggests areas for future work.

1. The graph partitioning problem (GPP)

Given a graph G = (V, E), where V is the set of vertices, with |V | = n, and E the set ofedges which determines the connectivity among the vertices, the GPP consists of dividing

A PARALLEL MULTILEVEL METAHEURISTIC FOR GRAPH PARTITIONING 317

Figure 1. Two possible alternatives to divide a graph.

V into K balanced sub-graphs, V1, V2, . . . , VK , so that Vi ∩ Vj = φ for all i �= j ; and∑Kk=1 |Vk | = |V |. The balance condition is defined as the maximum sub-domain weight,

S = max(|Vk |), for k = 1, . . . , K , divided by the perfect balance, n/K . If a definedimbalance, x%, is allowed, then the GPP try to find a partition such that the number of cutsis minimised subject to the constraint that S ≤ n/K ∗((100 + x)/100). Whenever the verticesand edges have weights, |v| denotes the weight of vertex v, and |e| denotes the weight ofedge e. All the test graphs used to evaluate the quality of our algorithm have vertices andedges with weight equal to one (|v| = 1 and |e| = 1). However, our procedure is able toprocess graphs with any weight values.

Figure 1 shows two possible partitions of a given graph. In figure 1(a) the graph is dividedinto two equally balanced sub-graphs, where the number of cuts is not minimised. On theother hand, figure 1(b) shows a partition with the optimal number of cuts. However, this parti-tion does not fulfil the requirement of load balancing. This example clearly shows the conflictof objectives. In order to choose one or other, the following cost function can be considered:

c(s) = α·ncuts(s) + β·K∑

k=1

2imbalance(k) (s) (1)

2. The metaheuristic: Refined mixed simulated annealing and Tabusearch (rMSATS)

In this section, we provide a description of the hybrid heuristic proposed in Gil, Ortega, andMontoya (2000). Next, we detail the two phases of rMSATS.

2.1. Obtaining the initial partition of the graph

The first step to solve the GPP is to make an initial partitioning of the target graph. Inthis step rMSATS uses a procedure known as Graph Growing Algorithm (GGA) (Karypis

318 BANOS ET AL.

and Kumar, 1998a). This algorithm starts from a randomly selected vertex, which is thenassigned to the first sub-graph, as their adjacent vertices. This recursive process is repeateduntil this sub-graph reaches n/K vertices. From this point, the following visited vertices areassigned to a new sub-graph, and the process is repeated until all the vertices are assignedto their respective sub-graph. As the position of the initial vertex determines the structureof the primary partition, its random selection offers a very useful diversity.

2.2. Optimisation by mixing simulated annealing and Tabu search

The application of an algorithm, such as GGA above, is insufficient to obtain a partitionwith adequate quality. Therefore, a refinement phase is required to effectively explore thesearch space. The problem with local search and hill climbing techniques is that the searchmay stop at local optima. In order to overcome this drawback, rMSATS uses SimulatedAnnealing (SA) (Kirkpatrick, Gelatt, and Vecchi, 1983) and Tabu Search (TS) (Gloverand Laguna, 1993). The continued use of both heuristics results in a hybrid strategy andallows the search process to escape from local minima, while simultaneously preventing theoccurrence of cycles. The concept of SA uses a variable called temperature, t , whose valuediminishes in the successive iterations based on a factor termed Tfactor. The variable t isincluded within the Metropolis function and acts, simultaneously, as a control variable forthe number of iterations of the algorithm, and as a probability factor for a definite solution tobe accepted. The decrease of t implies a reduction in the probability of accepting movementswhich worsen the cost function. On the other hand, when the search space is explored, theuse of a limited neighbourhood can cause the appearance of cycles. To avoid this problem,TS complements SA. Thus, when an adverse movement is accepted by the Metropolisfunction, the vertex is moved and included in the tabu list. In the current iteration, the setof vertices included in the tabu list cannot be moved. However, they are removed from thelist at the end of the next one. Experimental outcomes (Gil, Ortega, and Montoya, 2000)indicate that the use of both techniques improves the results obtained when only SA or TSare applied individually. These experiments also obtain good results in comparison withother multilevel algorithms. In many cases, results obtained by rMSATS outperform theones obtained by the METIS library (Karypis and Kumar, 1998a). Algorithm 1 shows thepseudo-code of the rMSATS procedure. Input parameters are required in the first step. Initialpartition is obtained in the second step by using the GGA algorithm. The loop defined inStep 3 continues until the end condition is false, i.e., the number of iterations is smaller toR and t is larger to the established threshold. In each iteration of this loop, the boundaryvertices are evaluated (Step 3.a). For each boundary vertex, rMSATS evaluates the cost ofthe movement of this vertex to the neighbouring sub-graph, as a function of the cost (Step3.a.1), and the value of t . This movement is either accepted or rejected (Step 3.a.2). If themovement is accepted, then rMSATS test two things. Firstly, if the last movement impliesa worsening of the cost function, then the vertex is added to the tabu list (Step 3.a.2.a).Further, after each movement is analysed, rMSATS verifies that the new solution is betterthan the previous one. In this case (Step 3.a.2.b), the solution is saved. Before starting withthe next iteration, the temperature and iteration counter are updated (Step 3.a.3). Finally, inStep 4 the best solution found is returned.


Algorithm 1: refined Mixed Simulated Annealing and Tabu Search (rMSATS).

1) Input: graph, K, max imb, Ti, Tfactor, R;2) Obtain initial partition of graph by applying GGA; t = Ti;3) While (current iteration ≤ R) OR (t ≥ 0.1)

3.a) For each (boundary vertex v) do3.a.1) cost = cost of the movement of v to the neighbour subgraph;3.a.2) If Metropolis (cost, t) is accepted then

perform movement (v, boundary subgraph);3.a.2.a) If cost movement ≥ 0 then

add v to the tabu list;3.a.2.b) If current solution ≤ best solution then

best solution = current solution;3.a.3) t = t∗Tfactor; current iteration = current iteration + 1;

4) Return the best solution;

3. rMSATS within a multilevel scheme: MultiLevel rMSATS (MLrMSATS)

The Multilevel Paradigm has become an effective strategy for GPP. Currently, most of thealgorithms for graph partitioning use a multilevel paradigm in combination with local searchprocedures, usually variants of the Kernighan and Lin algorithm (KL) (Kernighan and Lin,1970). These algorithms (Karypis and Kumar, 1998a; Walshaw and Cross, 2000b) definethe current state of the art trend.

The Multilevel paradigm consists of three different phases. Figure 2 provides a visualdescription of this strategy. In the first phase, the vertices of the target graph are used tobuild clusters that will become the vertices of a new graph. By continuously repeatingthis process, graphs with fewer vertices will be created, until a sufficiently small graphis obtained. The second phase consist of making a first partition of that graph by usingany of the existing procedures. However, the quality of this first partition is low, since thecoarsening phase intrinsically implies a loss of accuracy. This is the reason to perform athird phase in which the graph is projected towards its original configuration, by applyingsimultaneously a refinement algorithm.

In the following, the main characteristics of the multilevel version of rMSATS,MLrMSATS (Banos et al., 2003), are detailed.

3.1. Coarsening phase

MLrMSATS uses a deterministic matching strategy to coarsen the graph. This HEavyMatching (HEM) strategy consists of matching the vertices according to the weight of theedges that connects them. The vertices not matched in this level are taken and matched withothers not yet visited whose common edge has the highest weight. The advantage of thisalternative is that the heaviest edges are hidden during this phase, and the resulting graphis, therefore built by light edges. Thus, the cuts obtained after the graph partitioning will

320 BANOS ET AL.

Figure 2. Multilevel paradigm.

tend to decrease. The coarsening process finishes when the number of vertices becomesless than the given threshold of z · K , where z is a parameter with a properly selected value(z = 15 in our experiments). Some studies (Karypis and Kumar, 1998a) have determinedthat HEM strategy obtains better solutions than others like Random Matching (RM).

A problem, independent of the matching strategy used to coarse the graph, occurs whena vertex does not have any free neighbour with which it can be matched. These verticeswill pass directly to the next level. This creates an imbalance in the weight of the verticesof the new graph. As the union of vertices is made by matching two vertices, this problemwill get worse in the coarser levels. Thus, the weights of the vertices in the level i , Gi , takevalues in the interval [1, 2i]. If all the vertices of the initial graph have weights equal to one,we will have a graph that is grouped in 10 levels. In the inferior level, G10, it is possible tohave a vertex u with weight |u| = 1, and another v with weight |v| = 210. This makes itdifficult to have balanced sub-graphs, mainly in the coarsest levels, since the movement ofvery heavy vertices between neighbouring partitions cannot be accepted by the objectivefunction. In order to solve this problem, MLrMSATS selects the vertices to be matched inascending order of weights. The algorithm first tries to match vertices with lower weights,i.e., those that have been isolated in previous levels.

3.2. Initial partitioning phase

After the coarsening phase, GGA is applied to obtain the first partition in the coarsest graph.GGA gives a total balanced initial partition, but the efficiency of the solution with reference


to the number of cuts is reduced. Thus, it is necessary to apply a final phase to improve thequality of the partition.

3.3. Un-coarsening phase

It is necessary to apply a given technique to improve the quality of the initial partition.As previously stated, most algorithms use local search methods, usually variations of theKernighan and Lin (1970) procedure. However, to improve the initial solution, MLrMSATSapplies rMSATS in all the levels. In each one, the solution obtained in the previous level isoptimised by using rMSATS with the initial annealing values. The final solution is obtainedby repeating this process until the highest-level graph is reached.

Algorithm 2: MultiLevel rMSATS (MLrMSATS).

1) Input: graph, K , max imb, Ti, Tfactor, R, z;2) Coarse graph N levels by using HEM strategy in function of z and K ;3) Obtain initial partition of graph by applying GGA;4) For i = N to 1 repeat

4.a) Re-initialise parameters using input values;4.b) While (current iteration ≤ R) do

4.b.1) Apply rMSATS(current iteration,max imb,t);5) Return the best solution;

Algorithm 2 formally defines MLrMSATS behaviour. Step 2 corresponds to the coars-ening phase, where HEM is applied. In Step 3, GGA is used to obtain the initial partition,while the refinement process is performed in Step 4 using rMSATS heuristic.

Results obtained by MLrMSATS (Banos et al., 2003) in most of the cases offer animprovement to rMSATS, and also to METIS library algorithms (Karypis and Kumar,1998a).

4. Parallel multilevel metaheuristic: Parallel multilevel simulated annealingand Tabu search (PMSATS)

In the previous section, the main characteristics of MLrMSATS have been described. Thismultilevel algorithm has been parallelised in order to explore the search space using severalinitial partitions and annealing parameters. Due to graph partitioning being a NP-completeproblem and the large size of the graphs used in realistic situations, parallel processing is avery useful technique.

Several metaheuristics, necessary to resolve the combinatorial optimisation problems,also need to be parallelised (Cung et al., 2001). Some of these parallel metaheuristics havebeen successfully applied to the GPP (Diekmann et al., 1996; Randall, 1999). On the otherhand, some multilevel approaches have also been used in parallel (Karypis and Kumar,1996; Walshaw and Cross, 2000b) and obtain very high quality partitions in a reasonableamount of time.

322 BANOS ET AL.

The new algorithm, which we propose in this paper, is a parallelisation of the multilevelmetaheuristic algorithm described in the previous section. Our parallelisation is not aimedto reduce the runtime of the serial version (MLrMSATS), but rather to optimise as much aspossible, the quality of the partitions by taking advantage of the characteristics of both, themultilevel paradigm and the metaheuristic rMSATS.

The idea consists of building a set of p solutions. Each solution applies MLrMSATS ina different way. The coarsening phase is performed using the HEM strategy. Then, eachsolution applies GGA, starting at a randomly chosen vertex. Thus, each solution starts therefinement phase with its own initial partition. In the refinement phase, all the p solutionsuses its own annealing values, as we will describe later. In some iterations of the refinementphase, an elitist selection mechanism is applied in order to continue applying the searchwith the best solutions of the set and discarding the worse ones.

Figure 3 provides a graphical comparison amongst rMSATS, MLrMSATS and PMSATS.Figure 3(a) corresponds to rMSATS, where only a solution is optimised, after firstly obtain-ing the initial partition by using GGA, and then, by using rMSATS in the optimisation phase.Figure 3(b) corresponds to MLrMSATS, where again a solution is optimised, but this timeusing the multilevel paradigm. Finally, figure 3(c) corresponds to PMSATS, where a pop-ulation of p solutions is optimised simultaneously using parallel processing. Each solutionapplies HEM in the same way. In each solution, the initial partitioning is performed usingGGA starting from a different initial vertex, thus obtaining different initial partitions beforeperforming the refinement phase. Finally, the refinement phase is independently carriedout for each solution by applying rMSATS with their own annealing values (different fromthe others), although in same iterations there are interactions between the solutions thusenabling an elitist selection. Then, PMSATS continues improving the quality of the bestsolutions and discarding the worst ones. The main characteristics of PMSATS are detailedas follows:

• Each solution applies GGA to the coarsest graph starting from a vertex chosen by randominterval selection, as we detail: Let p be the number of solutions, then solution Pi , i =1 . . . p, makes the first partition starting from a vertex randomly chosen in the interval[( i−1

p · n) + 1, ( ip · n) + 1[. This strategy assures diversity in the initial partitions, even in

the case of irregular graphs.• In order to accommodate the effect of modifying the annealing parameters, each solution

uses a different initial temperature as the identifier, i , as we explain later. An interval ofinitial temperatures [Ti Min, Ti Max], and a fixed number of iterations, R, are established.The solution P1 starts in Ti Min, the solution Pp starts in Ti Max, and the others are equallydistributed along this interval. Then, Tfactor is computed for each solution in functionof R and its initial temperature. In figure 4, a clear example of this strategy is shown.Here, an interval of initial temperatures, Ti = [150, 50], and a fixed number of iterations,R = 1000, has been established. The solution A has the highest value of Ti , and Tfactoris low, thus determining a fast decrease in the temperature. On the other hand, solutionJ has the lowest value of Ti , and Tfactor is high, determining a slow decrease in thetemperature. With this strategy, our algorithm also provides a fair distribution of thework load thus avoiding the loss of efficiency that would occur if each process randomlychose the value of Ti and Tfactor.


Figure 3. Graphical description of rMSATS (a), MLrMSATS (b), and PMSATS (c).

• PMSATS uses an elitist selection mechanism to best utilise the computational resourcesin the optimisation of the best solutions obtained. In some iterations at the refinementphase, the quality of the solutions is competitively compared. In a given iteration, if w

solutions (w ≤ p) are evaluated, the winner of the tournament sends its current solutionto the others, which continue to apply the refinement process using this new solution,and together with their own annealing values. By using this elitist strategy, the algorithmcontinues working on the best solutions with different annealing parameters, insteadof exploring other less efficient solutions. To pursue this idea, we need to resolve twodifferent questions regard- ing the method of selection required to do it. The first questionrequires us to determine the best way to perform the migration amongst processors.

324 BANOS ET AL.

Figure 4. Parallel temperature variation using different values of Ti and Tfactor.

Figure 5. Migration strategies used by PMSATS: (a) STR1 and (b) STR2.

We have used two different strategies, with this purpose in mind. The first one, STR1,(figure 5(a)) is based on specific communication between the solution Pi and its closeneighbours, i.e., with Pi−1 and Pi+1 alternatively. The second strategy, called STR2,(figure 5(b)) is based on broadcasting the best solution of the set to the rest of theprocessors, and continuing the search with this new solution. The second question requiresus to determine the optimum migration frequency. Two different alternatives have beenimplemented, taking into account the characteristics of the heuristic within the multilevelscheme. The first one, M1, consists of migrating the solutions after each refinementstep. After applying rMSATS in the current level and before it is projected to the upper


one, the solutions are selected and migrated by using one of the proposed strategies.The second alternative, M2, allows communication to be held only in the highest level,described as follows: Let C be the number of communications and R the number ofrefinement iterations. Then, the communications are performed in the target graph onlyat the iterations { R

C , 2· RC , . . . , C · R

C }. This strategy makes an independent refinement ofthe solutions at all levels possible, except in the final level, where the solutions are better.In order to compare this strategy with M1, the value of C is set equal to the number ofrefinement levels. Thus, both alternatives have the same amount of communication.

Algorithm 3 presents the PMSATS procedure. In the first step, the input parameters areacquired. Step 2 consists of coarsening the target graph N levels by using the HEM strategy.N depends on z and K as previously stated. Step 3 calculates the annealing parametersfor use in the optimisation process. Step 4 obtains an initial solution (initial partition) ofthe graph by applying GGA using random interval selection in function of the solution’sidentifier. Step 5 is repeated in each one of the un-coarsening iterations until the target graph(graph of level 1) is finally optimised. In order to apply the new optimisation loop in thecurrent level, Step 5.a initialises the annealing parameters with the calculus of Step 3. Step5.b controls the number of iterations of rMSATS refinement process. In each cycle of theloop, the rMSATS algorithm is applied with its own parameters and variables. The elitistselection is performed using the input migration strategy whereby the selected migrationfrequency is M2, the current level is 1 (finest level), and the current iteration is one of theiterations where communication has been established. Once Step 5.b. has finished, and onlyif the selected migration frequency is M1, Step 5.c allows the communication amongst theprocessors. Then, the graph is ungrouped (Step 5); unless the last optimisation had beendone over the graph of level 1, i.e. the target graph. In this case, all the solutions are sent tothe master processor (Step 6), which returns the best of all the received solutions (Step 7).

Algorithm 3: Parallel Multilevel Simulated Annealing and Tabu Search (PMSATS).

1) Input: graph, K, max imb, z, p, Ti min, Ti max, R, mig strat, mig freq;2) Coarse graph N levels by using HEM in function of z and K ;3) Determine Ti and Tfactor for this solution; t = Ti;4) Obtain initial partition of graph by applying GGA;5) For i = N to 1 repeat

5.a) Re-initialise parameters using input values;5.b) While (current iteration ≤ R) do

5.b.1) Apply rMSATS(current iteration, max imb, t);5.b.2) If mig freq == M2 and current level == 1 and

current iteration is a communication iteration thenElitist Tournament Selection(mig strat);

5.c) If mig freq == M1 thenElitist Tournament Selection(mig strat);

6) Send best solution to the master;7) Return best solution received of all the slaves;

326 BANOS ET AL.

5. Experimental results

The PMSATS executions were performed by using a cluster of twelve dual Intel Xeon2.4 GHz processors. The test graphs used have different sizes and topologies. These graphsbelong to a public domain set frequently used to compare and evaluate graph-partitioningalgorithms. Table 1 briefly describes them: number of vertices, number of edges, maximumconnectivity (max) (number of neighbours to the vertex with the highest neighbourhood),minimum connectivity (min), average connectivity (avg) and file size.

Table 1. Set of test graphs used to evaluate the experimental results.

Graph |V| |E| min max avg File size (KB)

add20 2395 7462 1 123 6.23 63

data 2851 15093 3 17 10.59 140

3elt 4720 13722 3 9 5.81 136

uk 4824 6837 1 3 2.83 70

add32 4960 9462 1 31 3.82 90

whitaker3 9800 28989 3 8 5.92 294

crack 10240 30380 3 9 5.93 297

wing nodal 10937 75488 5 28 13.80 768

fe 4elt2 11143 32818 3 12 5.89 341

vibrobox 12328 165250 8 120 26.81 1679

bcsstk29 13992 302748 4 70 43.27 1679

4elt 15606 45878 3 10 5.88 501

fe sphere 16386 49152 4 6 6.00 540

cti 16840 48232 3 6 5.73 532

memplus 17758 54196 1 573 6.10 536

cs4 22499 43858 2 4 3.90 506

bcsstk30 28924 1007284 3 218 69.65 11403

bcsstk31 35588 572914 1 188 32.2 6547

bcsstk32 44609 985046 1 215 44.16 11368

t60k 60005 89440 2 3 2.98 1100

wing 62032 121544 2 4 3.92 1482

brack2 62631 366559 3 32 11.71 4358

finan512 74752 261120 2 54 6.99 3128

fe tooth 78136 452591 3 39 11.58 5413

fe rotor 99617 662431 5 125 13.3 7894

598a 110971 741934 5 26 13.37 9030

fe ocean 143437 409593 1 6 5.71 5242

wave 156317 1059331 3 44 13.55 13479

m14b 214765 1679018 4 40 15.64 21996


These test graphs, together with the best known solutions for them, can be found inWalshaw’s Graph Partitioning Archive (Graph Partitioning Archive (2003)). These solutionsindicate the number of cuts classified by levels of imbalance (0%, 1%, 3% and 5%). Thus,the reduction in the number of cuts is considered as an objective, while the imbalance degree(less than 5% in our experiments) is considered as a restriction. Under these conditions, thecost function described in (1), has parameters α = 1, and β = 0; with imbalance(k) ≤ 5for all k in the interval [1, K ].

5.1. Parameter setting

Figure 6 shows the results obtained by PMSATS by applying GGA, using 1, 5 and 20 differ-ent solutions, with the random interval selection for the initial vertex previously explained.All the solutions use the same temperature values, Ti = 100 and Tfactor = 0.995. Withthese parameters, the number of iterations of the algorithm is set to 1500. As we can see,the use of different initial partitions offers a diversity, which helps to improve the solutionby using fewer partitions.

Once it has been shown that, by using more solutions and by applying GGA from differentinitial vertices, the quality of the solutions often improves. We can analyse the performanceof the algorithm when the number of solutions and iterations has been modified. Figure 7compares the application of PMSATS with p = 10, R = 3000; p = 20, R = 1500; andp = 40, R = 750. Here, the annealing values are T i = 100 and Tfactor = 0.995. Asit can be seen in this figure, the best configuration corresponds to p = 20, R = 1500.The advantage of using more solutions comes from the improvement in the diversity of the

Figure 6. Effect of applying PMSATS with different number of initial partitions.

328 BANOS ET AL.

Figure 7. Effect of modifying the number of solutions and iterations.

searching process. However, the high complexity of the search requires the application ofrMSATS during many iterations. Thus, the selected parameters are p = 20, and R = 1500.

The determination of the interval of initial temperatures in the annealing process posesa problem. Neither the adequate size of the range nor its extreme values are known. Thenumber of solutions (p) is also an important factor, because whenever the value of pincreases it is supposed that the range should also be increased. Furthermore, the irregularityof the test graphs increases the dificulty of selecting adequate values for these parameters. Forthese reasons, the selection of an optimal interval becomes another NP-complete problem.To resolve this issue, we have applied the algorithm using a recursive division strategy.The idea consists of selecting a very large initial range, for example Ti∗ = [500, 2]. Thisinterval is recursively divided until none of the sub-intervals of a certain level improve thesolutions of one of the greater sub-intervals. This interval is then selected as the adequateinterval. Figure 8 shows the average number of cuts obtained by each interval after applyingthe algorithm over the test graphs for some values of K (K = {4, 16, 64}). For a number ofp = 20 solutions, none of the average number of cuts for the four smallest sub-intervals islower than the one obtained in the interval Ti∗ = [500, 250], which obtains the best averageresult. Therefore, we selected this interval for next experiments.

Besides the improvement obtained by using different initial vertices in GGA and dif-ferent annealing parameters, we have found an adequate range of values for Ti. Next,we determine the performance of the algorithm with respect to communication by usingthe migration strategies described in Section 4. In figure 9 the results obtained for thesemigration strategies are shown and the variation of the migration frequencies, is also anal-ysed. In these executions, a population of p = 20 solutions and R = 1500 iterations hasbeen considered, using the range of initial temperatures previously selected, Ti∗ = [500,

250].


Figure 8. Performance of PMSATS considering different temperature ranges.

Figure 9. Performance of PMSATS by using different migration strategies and migration frequencies.

The results obtained indicate that, in most cases, STR1 improves the average number ofcuts with respect to the STR2. The reason for this behaviour is that by sensible broadcastingbreaks, the independence of the search done by each solution is improved. With respectto the migration frequency, results indicate that it is better to use communication only inthe finest level rather than performing migration at each change of level. This behaviour is

330 BANOS ET AL.

similar to the effect of broadcasting migration, i.e., the migration in each change of levelbreaks the independence of the searching process.

5.2. PMSATS versus the previous approaches

Using Ti = 100 and Tfactor = 0.995 as annealing parameters, PMSATS executions obtainvery high quality partitions in comparison with rMSATS and MLrMSATS. For the subsetof test graphs used in the previous comparison, PMSATS obtains better solutions in all ofexecutions than rMSATS (Gil et al., 2002). PMSATS is also more efficient than MLrMSATS(Banos et al., 2003). In 90% of executions PMSATS, improves the results of MLrMSATS,while the same partition is obtained in 6% of them.

5.3. PMSATS versus other public domain packages

In this section, we compare PMSATS against other formerly proposed algorithms. Aswe commented previously, PMSATS uses parallel processing to perform the multilevelalgorithm (MLrMSATS). We therefore compared PMSATS with two versions of JOSTLE, ahigh quality multilevel graph-partitioning algorithm. Further, we compare PMSATS againstParMETIS, a powerful parallel graph partitioning library.

With reference to JOSTLE, we compare PMSATS against two versions of JOSTLE,JOSTLE Evolutionary, and iterated JOSTLE. JOSTLE models use a multilevel paradigmapproach. The principal characteristic of JOSTLE is that it applies a variant of the KL method(Kernighan and Lin, 1970) procedure during the refinement phase, by using two differentbucket sorting structures to perform the movements of boundary vertices in function of theirgains.

The basic idea of JOSTLE Evolutionary (JE) (Soper, 2000) is that each vertex is assigneda bias greater than or equal to zero, depending of its position in reference to the boundary, andeach edge a dependent weight, being a sum of its end vertices. With these values, in the coars-ening phase of JOSTLE, the edges with the highest weights are matched first (as HEM works)and when performing the refinement phase, vertex gains are calculated using the biased edgeweights. The effect is that vertices with small bias are more likely to appear at the boundaryof a sub-domain rather than those with a large bias, and edges with lower weights are morelikely to become cut edges rather than those with higher weight. The evolutionary schemeis based on obtaining successive offspring, from the evolutionary search, whose crossoverand mutation operators are dependent on the biases of the individuals of each generation.

On the other hand, we have also compared PMSATS versus iterated JOSTLE (iJ)(Walshaw, 2001). iJ is based on the alternative application of the multilevel algorithmJOSTLE. In each iteration of iJ, the multilevel process is performed using information fromthe previous iteration, improving the previous solution. The iterative process finishes whenafter a given number of iterations there is no further improvement to the quality of thesolution. Finally, we also compare the results of PMSATS versus ParMETIS. ParMETIS(ParMETIS, 2003) is an MPI-based parallel library which implements a variety of algorithmsfor the partitioning of unstructured graphs, meshes, and for computing fill-reducing order-ings of sparse matrices. ParMETIS extends the functionality provided by METIS (Karypis


and Kumar, 1998a) and includes routines which are especially suited for parallel AMRcomputations and large scale numerical simulations. The set of algorithms implementedin ParMETIS are based on the parallel multilevel k-way graph partitioning algorithms de-scribed in Karypis and Kumar (1996), the adaptive repartitioning algorithm described inSchloegel, Karypis, and Kumar (2000b), and the parallel multi-constrained algorithms de-scribed in Schloegel, Karypis, and Kumar (2000a). Thus, we compare PMSATS against theparallel multilevel k-way graph partitioning algorithm provided by ParMETIS (ParMETISv.3.1.0).

Tables 2 and 3 show the best results obtained by PMSATS versus ParMETIS library(ParMETIS 2003), and also with the best known solutions obtained by other algorithms(Graph Partitioning Archive (2003)), over all the test graphs included in Table 1, and withan imbalance of less than 5%. In comparison with ParMETIS, PMSATS obtains betterresults in 95% of cases, whilst ParMETIS only proves to be better in 1% of executions. Inthe other 4% of the cases, PMSATS and ParMETIS obtain the same partition. On the otherhand, PMSATS obtains better results than the previously best known solutions in 40% ofexecutions, and equals the best known solutions in 12% of cases. Most of these best knownsolutions included in the Graph Partitioning Archive have been obtained by JE and iJ.

The run-times of PMSATS executions can vary from few seconds to several hours,dependent on the graph. In comparison with ParMETIS, the run-times of PMSATS arelarger by approximately two orders of magnitude. Nevertheless, run-times of JE and iJ arelarger to PMSATS (e.g. run-times of JE for large graphs are of several days).

Figure 10 shows the number of the best known solutions obtained by each one of thealgorithms included in Graph Partitioning Archive (2003), with imbalance of less than 5%,over all the test graphs described in Table 1 for K = {2, 4, 8, 16, 32, 64}. If two or moredifferent algorithms obtain the same number of cuts for a certain graph and the value of

Figure 10. Number of best known solutions found by each algorithm included in Walshaw’s Graph PartitioningArchive.

332 BANOS ET AL.

Table 2. Comparison of PMSATS, ParMETIS and best know solutions with imbalance less than 5%.

(K = 2) (K = 4) (K = 8) (K = 16) (K = 32) (K = 64)Graph Algorithm cuts cuts cuts cuts cuts cuts

add20 PMSATS 638 1184 1709 2107 2583 3182ParMETIS 778 1327 2053 3060 3790 3927Graph Part. Archive 618 1184 1705 2186 2785 3266

data PMSATS 189 391 681 1147 1815 2803ParMETIS 225 468 871 1441 4415 7668Graph Part. Archive 196 378 702 1195 1922 2911

3elt PMSATS 87 199 336 567 956 1535ParMETIS 108 266 448 884 3328 4178Graph Part. Archive 87 199 334 566 958 1552

uk PMSATS 21 47 93 166 288 466ParMETIS 24 75 147 334 527 795Graph Part. Archive 18 41 82 154 265 436

add32 PMSATS 10 36 67 150 246 597ParMETIS 10 33 309 462 677 1130Graph Part. Archive 10 33 69 117 212 624

whitaker3 PMSATS 126 381 658 1100 1698 2544ParMETIS 132 489 862 1404 5425 6110Graph Part. Archive 126 380 658 1092 1686 2535

crack PMSATS 182 361 676 1083 1690 2540ParMETIS 209 412 845 1296 1988 2883Graph Part. Archive 183 360 676 1082 1679 2590

wing nodal PMSATS 1669 3564 5378 8332 11814 15789ParMETIS 1908 3887 5929 9164 12787 17374Graph Part. Archive 1970 3566 5387 8316 12024 16102

fe 4elt2 PMSATS 130 349 601 1012 1641 2520ParMETIS 130 392 710 1143 1831 2796Graph Part. Archive 130 349 597 1007 1651 2516

vibrobox PMSATS 10630 20050 24338 33460 41356 47149ParMETIS 12802 21239 28701 37882 45311 51552Graph Part. Archive 10310 19245 24158 31695 41176 50757

bcsstk29 PMSATS 2818 8388 15047 23235 34843 56120ParMETIS 2958 9617 18840 27456 41680 60938Graph Part. Archive 2818 8088 15314 24706 36731 58108

4elt PMSATS 137 322 532 939 1554 2583ParMETIS 163 387 652 1103 1835 2938Graph Part. Archive 137 319 527 916 1537 2581

fe sphere PMSATS 384 776 1193 1719 2575 3623ParMETIS 458 906 1395 2081 2966 4142Graph Part. Archive 384 766 1152 1692 2477 3547

cti PMSATS 318 889 1727 2781 4034 5738ParMETIS 459 1104 2212 3452 4963 6753Graph Part. Archive 318 917 1716 2778 4236 5907


Table 3. Comparison of PMSATS, ParMETIS and best know solutions with imbalance less than 5% (cont.).

(K = 2) (K = 4) (K = 8) (K = 16) (K = 32) (K = 64)Graph Algorithm cuts cuts cuts cuts cuts cuts

memplus PMSATS 5333 9393 11883 13939 15380 16761ParMETIS 6143 10511 12703 15348 19636 21338Graph Part. Archive 5353 9427 11939 13279 14384 17409

cs4 PMSATS 361 979 1535 2236 3210 4317ParMETIS 435 1207 1877 2704 3654 4915Graph Part. Archive 363 936 1472 2126 3080 4196




t60k PMSATS 69 211 483 889 1473 2322ParMETIS 96 247 545 1021 1663 2507Graph Part. Archive 72 211 467 852 1420 2221

wing PMSATS 787 1703 2664 4170 5980 8328ParMETIS 995 1989 3169 4975 7113 9536Graph Part. Archive 778 1636 2551 4015 6010 8161

brack2 PMSATS 660 2749 7156 11858 18005 25929ParMETIS 783 3284 7988 13440 20717 29677Graph Part. Archive 668 2808 7080 11958 17954 26944

finan512 PMSATS 162 324 648 1620 2592 17681ParMETIS 162 324 648 1296 2592 11956Graph Part. Archive 162 324 648 1296 2592 10821

fe tooth PMSATS 3839 6942 11568 17771 25528 34795ParMETIS 4416 8383 13566 20510 28497 39591Graph Part. Archive 3982 7152 12646 18435 26016 36030

fe rotor PMSATS 1956 7757 13651 20674 32616 46366ParMETIS 2238 8242 14838 23548 36769 52236Graph Part. Archive 1974 8097 13184 20773 33686 47852

598a PMSATS 2336 8024 15685 25775 39098 56883ParMETIS 2555 8646 17441 28564 44099 63516Graph Part. Archive 2339 7978 16031 26257 40179 58307

fe ocean PMSATS 312 1805 4548 8060 13007 20709ParMETIS 557 2504 6722 12371 19091 27264Graph Part. Archive 311 1704 4019 7838 12746 21784

wave PMSATS 8610 16681 29292 43029 62585 84419ParMETIS 9847 19028 34945 48716 69916 94018Graph Part. Archive 8868 18058 30583 44625 63725 88383

m14b PMSATS 3842 13401 27468 43501 66942 97143ParMETIS 4219 14607 28689 49184 74007 108724Graph Part. Archive 3866 14013 27711 44174 68468 101385

334 BANOS ET AL.

K , then the winner is the algorithm that hast the most balanced partition. If two or morealgorithms obtain the same number of cuts, and the imbalance is also the same, then thefirst algorithm to find the solution is considered the winner. As we can see in the figure 10,PMSATS is clearly the best algorithm when the number of optimal solutions is considered.In some cases, PMSATS also provides the same best-known partition. JE and iJ also obtainan important number of recognised solutions. The rest of the algorithms only obtain betterresults in a smaller number of cases.

6. Conclusions

In this paper, we present a new parallel multilevel metaheuristic algorithm for static graphpartitioning. This parallel algorithm uses a multilevel scheme that includes a hybrid meta-heuristic which mixes Simulated Annealing and Tabu Search along the search process. Theinclusion of this hybrid metaheuristic within the multilevel scheme, in many cases, outper-forms other multilevel approaches based on the use of the KL algorithm or other variants.The parallel implementation is focused upon improving the quality of the partitions asmuch as possible. For this purpose, the parallel algorithm simultaneously optimises sev-eral solutions. Each solution evolves independently applying the multilevel paradigm withdifferent annealing parameters. Eventually, during the refinement phase, an elitist selectionmechanism is used in order to best utilise the computational resources in the search for thebest solutions. The first conclusion derived from the results is that the diversity of the initialpartitions is essential in the search process. The selection of the best parameters for thehybrid heuristic is made dificult by the characteristics of the problem, and requires differentvalues in parallel. The consequences of modifying the number of solutions and the numberof iterations in the refinement phase of the multilevel algorithm have also been analysed.Furthermore, several migration strategies have been considered, using different migrationfrequencies. As a result of this analysis, we have designed a robust parallel multilevel meta-heuristic algorithm for graph partitioning whose solutions, in most cases, improve or isequal to, those obtained by other previously proposed efficient algorithms.

Acknowledgments

The authors would like to thank anonymous referees for their helpful comments. This workwas supported by project TIC2002-00228 (CICYT, Spain).

References

Alpert, C.J. and A. Kahng. (1995). “Recent Developments in Netlist Partitioning: A Survey.” Integration: the VLSIJournal 19(1/2), 1–81.

Banerjee, P. (1994). Parallel Algorithms for VLSI Computer Aided Design. Prentice Hall: Englewood Cliffs,New Jersey.

Banos, R., C. Gil, J. Ortega, and F.G. Montoya. (2003). “Multilevel Heuristic Algorithm for Graph Partitioning.” InProceedings Third European Workshop on Evolutionary Computation in Combinatorial Optimization. Springer-Verlag, LNCS 2611, pp. 143–153.


Cung, V.D., S.L. Martins, C.C. Ribeiro, and C. Roucairol. (2001). “Strategies for the Parallel Implementation ofMetaheuristics.” In C.C. Ribeiro and P. Hansen (eds.), Essays and Surveys in Metaheuristics. Kluwer, pp. 263–308.

Diekmann, R., R. Luling, B. Monien, and C. Spraner. (1996). “Combining Helpful Sets and Parallel SimulatedAnnealing for the Graph-Partitioning Problem.” Parallel Algorithms and Applications 8, 61–84.

Garey, M.R. and D.S. Johnson. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness.San Francisco: W.H. Freeman & Company.

Gil, C., J. Ortega, and M.G. Montoya. (2000). “Parallel VLSI Test in a Shared Memory Multiprocessors.” Con-currency: Practice and Experience 12(5), 311–326.

Gil, C., J. Ortega, M.G. Montoya, and R. Banos. (2002). “A Mixed Heuristic for Circuit Partitioning.” Computa-tional Optimization and Applications Journal 23(3), 321–340.

Gilbert, J., G. Miller, and S. Teng. (1998). “Geometric Mesh Partitioning: Implementation and Experiments.”SIAM Journal on Scientific Computing 19(6), 2091–2110.

Glover, F. and M. Laguna. (1993). “Tabu Search.” In C.R. Reeves (ed.), Modern Heuristic Techniques for Combi-natorial Problems. London: Blackwell, pp. 70–150.

Graph Partitioning Archive. (2003). http://www.gre.ac.uk/∼c.walshaw/partition/. URL time: August 31th, 2003,2345.

Guo, J., G. Trinidad, and N. Smith. (2000). “MOZART: A Multi-Objective Zoning and AggRegation Tool.” InProceedings 1st Philippine Computing Science Congress. pp. 197–201.

Karypis, G. and V. Kumar. (1996). “Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs.” TechnicalReport TR 96-036, Dept. of Computer Science, University of Minnesota, Minneapolis.

Karypis, G. and V. Kumar. (1998). “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs.”SIAM Journal on Scientific Computing 20(1), 359–392.

Karypis, G. and V. Kumar. (1998). “A Parallel Algorithm for Multilevel Graph Partitioning and Sparse MatrixOrdering.” Journal of Parallel and Distributed Computing 48(1), 71–95.

Kernighan, B.W. and S. Lin. (1970). “An Efficient Heuristic Procedure for Partitioning Graphs.” Bell SystemsTechnical Journal 49(2), 291–307.

Kirkpatrick, S., C.D. Gelatt, and M.P. Vecchi. (1983). “Optimization by Simulated Annealing.” Science 220(4598),671–680.

Klenke, R.H., R.D. Williams, and J.H. Aylor. (1992). “Parallel-Processing Techniques for Automatic Test PatternGeneration.” IEEE Computer 25(1), 71–84.

Mobasher, B., N. Jain, E.H. Han, and J. Srivastava. (1996). “Web Mining : Pattern Discovery from World WideWeb Transactions.” Technical Report TR-96-050, Department of computer science, University of Minnesota,Minneapolis.

ParMETIS. (2003). http://www-users.cs.umn.edu/∼karypis/metis/parmetis/index.html. URL time: September 1st,2003, 0020.

Randall, M. and A. Abramson. (1999). “A Parallel Tabu Search Algorithm for Combinatorial Optimisation Prob-lems.” In Proceedings of the 6th Australasian Conference on Parallel and Real Time Systems. Springer-Verlag,pp. 68–79.

Schloegel, K., G. Karypis, and V. Kumar. (2000). “Parallel Multilevel Algorithms for Multi-constraint GraphPartitioning.” In Proceedings of 6th International Euro-Par Conference. Springer-Verlag, LNCS 1900, pp. 296–310.

Schloegel, K., G. Karypis, and V. Kumar. (2000). “A Unified Algorithm for Load-balancing Adaptive ScientificSimulations.” In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing.

Schloegel, K., G. Karypis, and V. Kumar. (2001). “Wavefront Diffusion and LMSR: Algorithms for DynamicRepartitioning of Adaptive Meshes.” IEEE Transactions on Parallel and Distributed Systems 12(5), 451–466.

Shekhar, S. and D.R. DLiu. (1996). “Partitioning Similarity Graphs: A Framework for Declustering Problems.”Information Systems Journal 21(6), 475–496.

Simon, H.D. (1991). “Partitioning of Unstructured Problems for Parallel Processing.” Computing Systems inEngineering 2–3(2), 135–148.

Soper, A.J., C. Walshaw, and M. Cross. (2000). “A Combined Evolutionary Search and Multilevel OptimisationApproach to Graph Partitioning.” In Proceedings of the Genetic and Evolutionary Computation Conference.pp. 674–681.

336 BANOS ET AL.

Walshaw, C. and M. Cross. (2000). “Mesh Partitioning: A Multilevel Balancing and Refinement Algorithm.” SIAMJournal on Scientific Computing 22(1), 63–80.

Walshaw, C. and M. Cross. (2000). “Parallel Optimisation Algorithms for Multilevel Mesh Partitioning.” ParallelComputing 26(12), 1635–1660.

Walshaw, C. (2001). “Multilevel Refinement for Combinatorial Optimisation Problems.” Technical Report01/IM/73, Computing Mathematical Science, University of Greenwich, London.

a parallel multilevel metaheuristic for graph partitioning

Documents