
Computers and Structures 87 (2009) 1461–1473

Contents lists available at ScienceDirect

Computers and Structures

journal homepage: www.elsevier.com/locate/compstruc

Distributed evolutionary multi-objective mesh-partitioning algorithm for parallel finite element computations

A. Rama Mohan Rao *

Structural Engineering Research Centre, Council of Scientific and Industrial Research, Tarmani, Chennai 600113, India

Article info / Abstract

Article history: Received 9 January 2009; Accepted 6 May 2009; Available online 4 June 2009

Keywords: Mesh partitioning; Evolutionary computing; Multi-objective optimisation; Pareto solutions; Aspect ratio; Unstructured meshes

0045-7949/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.compstruc.2009.05.006

* Tel.: +91 44 22549184; fax: +91 44 22541508. E-mail addresses: [email protected], arm@sercm.csir.res.in

Most mesh-partitioning algorithms attempt to optimise interprocessor communication while balancing the computational load among the processors. However, it is desirable to optimise the submesh aspect ratios simultaneously, since this can significantly improve the convergence characteristics of the domain-decomposition-based preconditioned conjugate gradient (PCG) algorithms used extensively in state-of-the-art parallel finite element codes. Keeping this in view, a new distributed multi-objective mesh-partitioning algorithm based on evolutionary computing techniques is proposed in this paper. The effectiveness of the proposed algorithm is demonstrated by partitioning several unstructured meshes of practical engineering problems as well as benchmark problems.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Domain-decomposition-based parallel finite element method (FEM) and computational fluid dynamics (CFD) applications require efficient mesh-partitioning algorithms that balance the computational load while keeping interprocessor communication minimal. The problem of finite element mesh partitioning is equivalent to partitioning the associate graph [1,2] of the target finite element mesh into subgraphs of roughly equal size such that the partition cuts the fewest edges of the graph. With the additional requirement of minimising the number of cut edges (i.e., minimising the total interface length in the case of FE-mesh partitioning), the problem is NP-complete [3]. Consequently, much attention has been focused on developing suitable heuristics, and several powerful graph (mesh) partitioning methods have been developed [4] and are in use in practical applications.

It is well known that state-of-the-art parallel FEM/CFD algorithms require well-shaped submeshes to improve the convergence characteristics of the solvers and thereby the overall computational performance of the application. At the same time, parallel applications need to minimise interprocessor communication overheads. Existing graph (or mesh) partitioning algorithms optimise the computational load balance of the submeshes while minimising the number of cut edges, whereas some recent works attempt to minimise the submesh aspect ratio [5–7] instead of the cut edges. Even though recursive bisection algorithms deal with submesh shape implicitly while optimising the communication volume, there is no known algorithm that simultaneously optimises both desirable objectives, i.e., submesh shape and cut edges. This paper attempts to optimise both of these mutually conflicting objectives simultaneously using multi-objective evolutionary computing techniques.

1.1. Previous related work

There are a limited number of papers on the development of mesh-partitioning techniques employing evolutionary algorithms (EAs) or genetic algorithms (GAs) [8–12]. Khan and Topping [8,9] used genetic algorithms to partition finite element meshes; however, their approach used a population of cutting planes that bisected the finite element domain. The technique does not seek a well-balanced partition, since it was designed for short run-times and was used to estimate the number of elements that would appear in the final refined submeshes. Soper et al. [13] devised a graph-partitioning technique that combines a multilevel algorithm with an evolutionary search procedure; it is reported to give better quality in terms of cut edges at the cost of a higher runtime. Rama Mohan Rao et al. [14], Kaveh and Rahimi [15] and Chevalier and Pellegrini [16] devised mesh-partitioning algorithms employing genetic algorithms within the framework of multilevel approaches. Another popular metaheuristic, ant colony optimisation (ACO), has also been employed for graph partitioning [17,18], and Kaveh and Shojaee [19] recently combined ACO with genetic algorithms to devise a mesh-partitioning algorithm. However, there is no known approach to mesh partitioning that employs multi-objective evolutionary algorithms.

Multi-objective optimisation in the context of graph (or mesh) partitioning has been studied recently in the literature, e.g., for circuit partitioning, by Adabei et al. [20], Schloegel et al. [21] and Selvakumaran and Karypis [22]. Two general approaches have been proposed for combining multiple objectives during partitioning. The first keeps the objectives separate and couples them by assigning a different priority to each: a solution that best optimises the highest-priority objective is always preferred, and lower-priority objectives serve only as tie-breakers (i.e., to choose among solutions that are equivalent with respect to the higher-priority objectives). The second creates an explicit multi-objective function that numerically combines the individual objective functions, a typical weighted-aggregation approach in which the choice of weights determines the relative importance of the objectives. The major difficulty with this approach is setting an appropriate weight for each objective function; the problem is compounded when the objectives are incommensurable or differ greatly in magnitude.
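The difference between the two approaches can be seen in a minimal Python sketch, assuming each candidate partition is scored by a hypothetical pair (communication volume, maximum aspect ratio), both to be minimised. The function names and numbers are illustrative only, not results from this paper.

```python
from typing import Sequence, Tuple

# Hypothetical objective pair per candidate: (communication volume, max aspect ratio).
Objectives = Tuple[float, float]


def select_by_priority(candidates: Sequence[Objectives]) -> int:
    """Priority approach: the first objective decides and later objectives only
    break ties. Python compares tuples lexicographically, which is exactly this rule."""
    return min(range(len(candidates)), key=lambda i: candidates[i])


def select_by_weighted_sum(candidates: Sequence[Objectives],
                           weights: Sequence[float]) -> int:
    """Weighted-aggregation approach: collapse the objectives into one score."""
    def score(i: int) -> float:
        return sum(w * f for w, f in zip(weights, candidates[i]))
    return min(range(len(candidates)), key=score)


candidates = [(120.0, 1.9), (120.0, 1.4), (135.0, 1.1)]
print(select_by_priority(candidates))                    # 1: tie on volume, AR breaks it
print(select_by_weighted_sum(candidates, (1.0, 1.0)))    # 1: volume dominates the sum
print(select_by_weighted_sum(candidates, (0.01, 10.0)))  # 2: rescaled weights flip the choice
```

The last two calls show how the selected partition depends on the weights, which is the difficulty noted above when the objectives differ greatly in magnitude.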

1.2. Present work

In the present work, mesh partitioning is formulated as a multi-objective optimisation problem, thereby eliminating the need for arbitrary weighting factors to prioritise the objectives. An advantage of the multi-objective formulation is that it yields a set of admissible solutions that constitute trade-offs among the different objectives. These solutions are optimal in the Pareto sense: none of them can be improved in one objective without deteriorating at least one other objective. The points along the Pareto trade-off front therefore provide detailed information about the trade-offs available among the corresponding Pareto-optimal solutions. The set of Pareto-optimal solutions can be obtained using evolutionary algorithms, which have been found to be well suited to multi-objective optimisation problems. In the present work, an evolutionary computing technique is employed to devise a mesh-partitioning algorithm that simultaneously optimises the submesh aspect ratio and the interprocessor communication overhead.
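The Pareto-optimality notion used here can be made concrete with a short Python sketch that filters candidate bisections down to their non-dominated members, assuming both objectives are minimised. The candidate values and function names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective (minimisation)
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the Pareto-optimal (non-dominated) points, O(n^2) comparisons."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


# (communication volume, maximum aspect ratio) for five hypothetical bisections
points = [(100, 1.8), (110, 1.3), (130, 1.1), (120, 1.5), (100, 2.0)]
print(non_dominated(points))  # [0, 1, 2]
```

Point 3 is dominated by point 1 and point 4 by point 0; the remaining three form the trade-off front, and none of them is better than another in both objectives.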

2. Cost functions employed for multi-objective mesh partitioning

The two cost functions considered in the present work are the optimisation of the shape of the generated submeshes and the minimisation of interprocessor communication. To formulate the cost function for submesh shape, it is first necessary to define the submesh aspect ratio (AR). There are several possible definitions; for a given submesh, conventional choices are the ratio of the longest to the shortest boundary edge, $L_{\max}/L_{\min}$, or the ratio of the area of the smallest circle containing the submesh, $\pi R_o^2$, to the area of the largest inscribed circle, $\pi R_i^2$, i.e., $R_o^2/R_i^2$. A detailed discussion of the relative merits of different ways of measuring AR is given by Diekmann et al. [5,6], and similar ideas are used in the present work: AR is defined as the ratio of the perimeter of the submesh to the perimeter of an ideal shape of the same area. On this basis, AR can be evaluated for a submesh $S$ with area $\Omega_S$ for a two-dimensional (2-d) problem ($\Omega_S$ is the volume for a three-dimensional (3-d) problem) and perimeter $\partial S$ for a 2-d problem ($\partial S$ is the surface area for a 3-d problem).

If the ideal shape is taken to be a square for 2-d problems and a cube for 3-d problems, AR can be defined as $\partial S/(4\sqrt{\Omega_S})$ and $\partial S/(6\,\Omega_S^{2/3})$, respectively, since a square of area $\Omega_S$ has perimeter $4\sqrt{\Omega_S}$ and a cube of volume $\Omega_S$ has surface area $6\,\Omega_S^{2/3}$. Similarly, if the ideal shape is chosen as a circle for 2-d problems and a sphere for 3-d problems, AR can be defined as $\partial S/(2\sqrt{\pi\Omega_S})$ and $\partial S/(\pi^{1/3}6^{2/3}\,\Omega_S^{2/3})$, respectively. These lead to the following generalised expression for both 2-d and 3-d problems:

$$\mathrm{AR} = \frac{1}{C}\,\frac{\partial S}{(\Omega_S)^{(d-1)/d}} \qquad (1)$$

where $d$ is the dimension of the problem ($d = 2$ for 2-d or $d = 3$ for 3-d problems) and $C$ is the perimeter (surface area) of the ideal shape with unit area (volume): for 2-d problems, $C = 4$ for the square and $C = 2\pi^{1/2}$ for the circle; for 3-d problems, $C = 6$ for the cube and $C = (36\pi)^{1/3}$ for the sphere.
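A minimal Python sketch of Eq. (1) follows. The function and dictionary names are hypothetical, and the constants are the ideal-shape perimeters (surfaces) for unit area (volume), with the cube value assuming the ideal surface $6\,\Omega_S^{2/3}$.

```python
import math

# C in Eq. (1): perimeter (2-d) or surface area (3-d) of the ideal shape with
# unit area (volume). The cube value assumes the ideal surface 6 * volume**(2/3).
IDEAL_SHAPE_CONSTANT = {
    ("square", 2): 4.0,
    ("circle", 2): 2.0 * math.sqrt(math.pi),
    ("cube", 3): 6.0,
    ("sphere", 3): (36.0 * math.pi) ** (1.0 / 3.0),
}


def aspect_ratio(surface: float, volume: float, d: int, ideal: str = "circle") -> float:
    """AR = surface / (C * volume**((d - 1) / d)), Eq. (1).

    surface: perimeter (d = 2) or surface area (d = 3) of the submesh
    volume:  area (d = 2) or volume (d = 3) of the submesh
    """
    c = IDEAL_SHAPE_CONSTANT[(ideal, d)]
    return surface / (c * volume ** ((d - 1) / d))


print(aspect_ratio(2 * math.pi, math.pi, d=2))      # unit disc vs circle: ~1.0
print(aspect_ratio(4.0, 1.0, d=2))                  # unit square vs circle: ~1.128
print(aspect_ratio(4.0, 1.0, d=2, ideal="square"))  # unit square vs square: 1.0
```

Each ideal shape scores exactly 1 against itself, and any other shape scores higher against the circle or sphere.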

In the present work, the circle/sphere-based formula is preferred, since (by the isoperimetric inequality) it guarantees that the AR of any shape satisfies $\mathrm{AR} \ge 1$. With this, the cost function for generating shape-optimised submeshes can be defined as

$$C_{\mathrm{AR}} = \text{minimise}\;\max_{p}\left[\frac{1}{C}\,\frac{\partial S_p}{(\Omega_{S_p})^{(d-1)/d}}\right], \qquad C = \pi^{1/d}(2d)^{(d-1)/d} \qquad (2)$$

where $C_{\mathrm{AR}}$ is the cost function to be minimised and the subscript $p$ denotes a particular submesh; for $d = 2$ this gives $C = 2\sqrt{\pi}$ and for $d = 3$, $C = (36\pi)^{1/3}$. It should be pointed out that, although the AR definition given here is similar to the one considered in earlier related works [5,7], the cost function is different. For example, Walshaw et al. [7] formulated the cost function to optimise the average aspect ratio, so that work optimises the shape of the generated submeshes only in an average sense. Furthermore, they employed a much simplified cost function by assuming that the volumes of all submeshes are approximately the same, an assumption that need not hold.
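The practical difference between minimising the worst aspect ratio, as in Eq. (2), and minimising the average can be seen in a small Python sketch. It assumes each submesh is given as a hypothetical (surface, volume) pair; the names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Each submesh is described by a hypothetical (surface, volume) pair.
Submesh = Tuple[float, float]


def circle_sphere_constant(d: int) -> float:
    """C = pi**(1/d) * (2d)**((d-1)/d) from Eq. (2): 2*sqrt(pi) in 2-d, (36*pi)**(1/3) in 3-d."""
    return math.pi ** (1.0 / d) * (2.0 * d) ** ((d - 1.0) / d)


def submesh_aspect_ratios(submeshes: Sequence[Submesh], d: int) -> List[float]:
    c = circle_sphere_constant(d)
    return [s / (c * v ** ((d - 1) / d)) for s, v in submeshes]


def max_aspect_ratio_cost(submeshes: Sequence[Submesh], d: int) -> float:
    """Quantity minimised in Eq. (2): the worst (largest) submesh aspect ratio."""
    return max(submesh_aspect_ratios(submeshes, d))


def mean_aspect_ratio_cost(submeshes: Sequence[Submesh], d: int) -> float:
    """Average-AR alternative, shown only for comparison."""
    ratios = submesh_aspect_ratios(submeshes, d)
    return sum(ratios) / len(ratios)


# A 1 x 1 square and a 4 x 0.25 strip: equal areas, very different shapes.
parts = [(4.0, 1.0), (8.5, 1.0)]
print(max_aspect_ratio_cost(parts, d=2))   # ~2.398, set by the strip alone
print(mean_aspect_ratio_cost(parts, d=2))  # ~1.763, the square hides part of the problem
```

Because the maximum is driven by the single worst submesh, it penalises one badly shaped part that an average would partly mask.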

In the majority of traditional mesh-partitioning algorithms, the communication volume is approximated by the edge cut, and this quantity is minimised to optimise interprocessor communication. However, it has been established that the edge cut is not proportional to the total communication volume [23,24] and tends to overestimate it. The basic problem with the edge-cut metric is that several edges can describe the same need for data transfer: a datum needs to be communicated to another processor only once, irrespective of how many vertices on that processor require it. Hence the edge cut overestimates the true volume of communication, and reducing the edge cut need not reduce the true communication volume. Keeping this in view, the present work explicitly minimises the true communication volume. For an unweighted graph, the communication volume is the number of vertices that have neighbours in another partition, which can be obtained by scanning the interface edges of the partitioned graph. Nevertheless, to follow the traditional practice of reporting communication volume as cut edges, this number is referred to as "cut edges" in the tables presented in this paper. Furthermore, the performance of a parallel application is limited by the slowest processor, and even when the computational work is well balanced, the communication volume need not be. Hence, to keep the communication volume uniform, the maximum communication volume of any single processor is minimised.
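The gap between the two metrics is easy to reproduce. The following Python sketch assumes a plain adjacency-set graph and a dictionary mapping each vertex to its part; the toy graph and all names are hypothetical. It computes the edge cut, the communication volume as defined in the text, and the per-part maximum that is minimised.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_cut(adj: Dict[int, Set[int]], part: Dict[int, int]) -> int:
    """Edges whose endpoints lie in different parts, each edge counted once."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])


def communication_volume(adj: Dict[int, Set[int]], part: Dict[int, int]) -> Tuple[int, int]:
    """Vertices with at least one neighbour in another part (the unweighted
    definition used in the text). Returns (total over all parts, max for one part)."""
    per_part: Dict[int, int] = defaultdict(int)
    for u, nbrs in adj.items():
        if any(part[v] != part[u] for v in nbrs):
            per_part[part[u]] += 1
    return sum(per_part.values()), max(per_part.values(), default=0)


# Two parts joined by every cross edge: 6 cut edges but only 5 interface vertices.
edges = [(0, 1), (2, 3), (3, 4)] + [(a, b) for a in (0, 1) for b in (2, 3, 4)]
adj = build_adjacency(edges)
part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}
print(edge_cut(adj, part))              # 6
print(communication_volume(adj, part))  # (5, 3)
```

Each interface vertex here has several cut edges to the same part, so the edge cut counts its data transfer more than once.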


3. Graph representation and its attributes

Any finite element mesh can be represented as a weighted or unweighted associate graph, depending on the requirements of the partitioning algorithm. However, to use Eq. (2) in the optimisation process, some additional attributes must be added to the derived associate graph.

Fig. 1 shows a very simple mesh and its associate graph. Each element $e$ of the mesh corresponds to a vertex $v$ in the graph. The vertices of the graph can be weighted as usual, but in addition each vertex stores the volume and total surface of its corresponding element (for example, $\Omega_{v_1} = \Omega_{e_1}$ and $\partial v_1 = \partial e_1$). Similarly, the edges of the graph are weighted with the size of the surface they correspond to. Thus, in Fig. 1, if $D(b, d)$ denotes the distance between points $b$ and $d$, the weight of edge $(v_1, v_2)$ is set to $D(b, d)$. In this way, for vertices $v_i$ corresponding to elements that have no exterior surface, the sum of the incident edge weights equals the element surface, $\partial v_i = \sum_{(v_i, v_j) \in E} |(v_i, v_j)|$. Thus, for vertex $v_2$,

$$\partial v_2 = \partial e_2 = D(b,d) + D(b,e) + D(e,d) = |(v_2,v_1)| + |(v_2,v_3)| + |(v_2,v_6)|.$$

When elements are combined into submeshes, these properties (volume and surface) can easily be combined. Consider Fig. 1c, where $S_1 = e_1 + e_2$, $S_2 = e_3 + e_4$, $S_3 = e_5 + e_6$ and $S_4 = e_7 + e_8$. The volumes are summed directly; for example, $\Omega_{S_1} = \Omega_{e_1} + \Omega_{e_2} = \Omega_{v_1} + \Omega_{v_2}$. The surface of a combined object (submesh) is the sum of the surfaces of its constituent parts less twice the interior surface, e.g., $\partial S_1 = \partial e_1 + \partial e_2 - 2D(b,d) = \partial v_1 + \partial v_2 - 2|(v_1,v_2)|$. These properties closely parallel those in conventional graph algorithms, where volume combines in the same way as vertex weight and surface combines as a sum of edge weights. Once the volume and surface of each submesh are computed, its AR can be evaluated using the cost function given in Eq. (2).
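The combination rule above can be sketched in a few lines of Python. The dictionary-based representation and function name are hypothetical, and unit-square elements are used only because their areas and perimeters are easy to check by hand.

```python
from typing import Dict, Iterable, Tuple


def submesh_volume_and_surface(members: Iterable[int],
                               volume: Dict[int, float],
                               surface: Dict[int, float],
                               edge_weight: Dict[Tuple[int, int], float]) -> Tuple[float, float]:
    """Combine vertex attributes into the volume and surface of one submesh.

    volume[v], surface[v]: area/volume and perimeter/surface of element v;
    edge_weight[(u, v)]: size of the face shared by elements u and v.
    """
    member_set = set(members)
    vol = sum(volume[v] for v in member_set)
    surf = sum(surface[v] for v in member_set)
    # An interior face is counted once in each of its two elements, so subtract it twice.
    for (u, v), w in edge_weight.items():
        if u in member_set and v in member_set:
            surf -= 2.0 * w
    return vol, surf


# Three unit-square elements in a row (0 - 1 - 2), each pair sharing a unit-length edge.
volume = {0: 1.0, 1: 1.0, 2: 1.0}
surface = {0: 4.0, 1: 4.0, 2: 4.0}
edge_weight = {(0, 1): 1.0, (1, 2): 1.0}
print(submesh_volume_and_surface([0, 1], volume, surface, edge_weight))     # (2.0, 6.0)
print(submesh_volume_and_surface([0, 1, 2], volume, surface, edge_weight))  # (3.0, 8.0)
```

The returned pair can be fed directly into the aspect-ratio formula of Eq. (2).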

4. Problem formulation

One effective way of formulating the mesh-partitioning problem (i.e., partitioning of the associate graph) with evolutionary algorithms is to combine them with a multilevel algorithm [14–16]. When an evolutionary algorithm (EA) is combined with a multilevel algorithm, the EA needs to be applied at each coarsening level. Since EAs are much slower than local refinement heuristics such as KL [25] or Mob, the overall computational time is likely to be large. Moreover, in this formulation the chromosome length equals the number of vertices in the graph, which may lead to memory and convergence problems when solving very large problems, even with multilevel approaches. Apart from this, multilevel algorithms are sensitive to the graph-coarsening method. Keeping all these points in view, an alternative formulation is employed in the present work.

Fig. 1. Graph representation of a simple finite element mesh: (a) a simple finite element mesh; (b) its graph; (c) the mesh partitioned into four submeshes.

The present formulation requires the positional parameters of the vertices. Since the algorithm is applied to finite element meshes, the positional parameters, i.e., the $(x, y, z)$ coordinates of each vertex of the associate graph, can be obtained by averaging the nodal coordinates of the corresponding element. Two arbitrary points are chosen within the extents of the domain, as illustrated in Fig. 2, and are allowed to float around the graph. A field separator is computed for each vertex from the locations of these two points, and the graph is bisected into two equal parts by assigning an equal number of vertices to each partition according to the field separators. The objective is to optimise the positions of the two floating points so that the graph partitioned using the resulting field separators optimises the desired objectives, such as cut edges and shape. An analogy with the static electric field around a charged particle (Coulomb's law) is used to compute the scalar field value $F$ for each vertex, which serves as the field separator for graph partitioning. The two arbitrary points can be visualised as a pair of point charges floating around the domain, each of which (A or B) creates a 3-d scalar field around the domain. The field value at any graph vertex $(x, y, z)$ is determined from its distances to point A $(x_A, y_A, z_A)$ and point B $(x_B, y_B, z_B)$ using the following expression:

$$F(x, y, z) = \frac{K_A}{r_A^2} - \frac{K_B}{r_B^2} \qquad (3)$$

where $r_A$ and $r_B$ are the distances of the vertex from the floating points A and B, given by

$$r_A = \sqrt{(x_A - x)^2 + (y_A - y)^2 + (z_A - z)^2} \qquad (4)$$

$$r_B = \sqrt{(x_B - x)^2 + (y_B - y)^2 + (z_B - z)^2} \qquad (5)$$
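Eqs. (3)–(5) translate directly into a small Python function. The sketch below works with squared distances, since Eq. (3) only needs $r^2$, and adds a small `eps` guard (not part of the text) for a vertex that coincides with a floating point; the positions and constants in the usage lines are hypothetical.

```python
from typing import Sequence


def field_value(vertex: Sequence[float], a: Sequence[float], b: Sequence[float],
                k_a: float, k_b: float, eps: float = 1e-12) -> float:
    """Scalar field of Eq. (3): F = K_A / r_A**2 - K_B / r_B**2.

    r_A**2 and r_B**2 follow Eqs. (4) and (5); eps is a guard (not in the text)
    against a vertex coinciding with one of the floating points.
    """
    r_a2 = max(sum((pa - p) ** 2 for pa, p in zip(a, vertex)), eps)
    r_b2 = max(sum((pb - p) ** 2 for pb, p in zip(b, vertex)), eps)
    return k_a / r_a2 - k_b / r_b2


a, b = (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)
print(field_value((1.0, 0.0, 0.0), a, b, k_a=-1.0, k_b=-1.0))  # negative near A
print(field_value((9.0, 0.0, 0.0), a, b, k_a=-1.0, k_b=-1.0))  # positive near B
```

With these illustrative signs the field changes sign between the two points, so sorting vertices by $F$ separates those near A from those near B.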

Fig. 2. Partitioning of a 3-d domain using a float-encoded evolutionary algorithm (floating points A $(x_A, y_A, z_A)$ and B $(x_B, y_B, z_B)$).


The scalar field value can be used as a separator for partitioning the graph; $K_A$ and $K_B$ are two constants. It should be pointed out that, even though the proposed algorithm falls under the classification of geometric partitioning algorithms, it does not use a line (2-d) or plane (3-d) to cut the mesh, as many geometric partitioning algorithms do [26].

It is evident from the above discussion that only eight parameters ($x_A, y_A, z_A, x_B, y_B, z_B, K_A, K_B$) are needed to obtain separators for partitioning the graph, and the evolutionary algorithm controls these free parameters. The separators computed from them can be used to partition the graph into two equal parts: for a given pair of point charges, a single potential value is computed at every graph vertex, these values are sorted, and the vertices above and below the median are assigned to the first and second submesh, respectively. This satisfies the load-balancing requirement by construction, so it is sufficient to optimise the two remaining objectives simultaneously, i.e., the submesh aspect ratio (shape) defined in Eq. (2) and the communication volume, using multi-objective optimisation techniques. To eliminate one more parameter, $K_A$ is fixed at $-1$ in the present work and $K_B$ is chosen arbitrarily.
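The median-based bisection can be sketched in Python as follows. The genome layout $(x_A, y_A, z_A, x_B, y_B, z_B, K_B)$ with $K_A = -1$ follows the text; the function name, the tie handling for an odd vertex count, and the test coordinates are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def bisect_by_field(coords: Sequence[Point], genome: Sequence[float],
                    k_a: float = -1.0) -> Tuple[List[int], List[int]]:
    """Split vertex indices into two halves by sorting the field values of Eq. (3).

    genome = (xA, yA, zA, xB, yB, zB, KB); K_A is fixed at -1 as in the text.
    Vertices above the median go to the first submesh, the rest to the second.
    """
    xa, ya, za, xb, yb, zb, k_b = genome

    def field(p: Point) -> float:
        r_a2 = (xa - p[0]) ** 2 + (ya - p[1]) ** 2 + (za - p[2]) ** 2
        r_b2 = (xb - p[0]) ** 2 + (yb - p[1]) ** 2 + (zb - p[2]) ** 2
        return k_a / max(r_a2, 1e-12) - k_b / max(r_b2, 1e-12)

    order = sorted(range(len(coords)), key=lambda i: field(coords[i]), reverse=True)
    half = (len(order) + 1) // 2  # part sizes differ by at most one vertex
    return order[:half], order[half:]


coords = [(float(x), 0.0, 0.0) for x in range(10)]
first, second = bisect_by_field(coords, (-5.0, 0.0, 0.0, 15.0, 0.0, 0.0, -1.0))
print(sorted(first), sorted(second))  # [5, 6, 7, 8, 9] [0, 1, 2, 3, 4]
```

Load balance is guaranteed by the median split itself, so the search only has to trade off communication volume against shape.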

5. Multi-objective optimisation using evolutionary algorithms

Evolutionary algorithms are well suited to multi-objective optimisation. They process a set of promising solutions simultaneously and are therefore capable of capturing several points along the Pareto front in a single run. They start from a randomly initialised population of search points in the parameter space, which, by means of selection, mutation and recombination, evolves towards better regions of the search space. A wide choice of multi-objective evolutionary algorithms is currently available in the literature [27–29], and their comparative performance has been reported on several benchmark test suites and on specific problems [29]. Even though no single algorithm can be expected to work efficiently for all classes of problems (the no-free-lunch (NFL) theorem [30]), SPEA2 [31] is generally reported to be consistent in generating well-spread Pareto fronts.
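SPEA2 [31], on which the model below is based, ranks individuals with a strength-based raw fitness plus a density term. The following Python sketch shows one way to compute that fitness for the union of population and archive, using the common choice $k = \sqrt{N}$ for the nearest-neighbour index. The names are hypothetical, and the distances are taken in raw objective space; normalising each objective first would be advisable when they differ greatly in magnitude, but is omitted here.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    """a dominates b (minimisation): no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def spea2_fitness(objs: Sequence[Vector], k: Optional[int] = None) -> List[float]:
    """SPEA2-style fitness (lower is better) for the union of population and archive.

    strength S(i) = number of individuals i dominates
    raw R(i)      = sum of S(j) over all j that dominate i
    density D(i)  = 1 / (sigma_k(i) + 2), sigma_k = distance to k-th nearest neighbour
    """
    n = len(objs)
    if k is None:
        k = max(1, int(math.sqrt(n)))
    strength = [sum(dominates(objs[i], objs[j]) for j in range(n)) for i in range(n)]
    raw = [sum(strength[j] for j in range(n) if dominates(objs[j], objs[i]))
           for i in range(n)]
    fitness = []
    for i in range(n):
        dists = sorted(math.dist(objs[i], objs[j]) for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        fitness.append(raw[i] + 1.0 / (sigma_k + 2.0))
    return fitness


points = [(100, 1.8), (110, 1.3), (130, 1.1), (120, 1.5), (100, 2.0)]
print([round(f, 3) for f in spea2_fitness(points)])
```

Since the density term is at most 0.5, non-dominated individials always score below 1 and dominated ones at least 1, which is what lets the archive favour the Pareto front.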

The proposed multi-objective computational model for partitioning finite element meshes follows the SPEA2 approach [31] and therefore applies the concept of elitism in selection (using a secondary, or elite, population, also known as an external archive) and in the search for optimal solutions on the Pareto front (the individuals of the population are ordered according to their dominance, using the concept of Pareto optimality). This section concentrates on the main points of SPEA2. The initial population, representation and evolutionary operators are standard: uniform initial distribution, float-encoded representation, binary tournament selection, simulated binary crossover (SBX), and a parameter-based mutation operator. A float-encoded representation is used for the design variables because the search takes place over continuous domains.

The real power of SPEA2, however, lies in its elitism-preserving operation. An external set (archive) is created to store primarily non-dominated solutions; it is combined with the current population to form the mating pool from which offspring for the next generation are created. SPEA2 maintains an archive of constant size and measures strength with respect to both the archive and the working population. The SPEA2 fitness measure combines a raw fitness value (the sum of the strength values of the solutions that dominate an individual) with a density term based on a distance metric, which penalises solutions that are close to other individuals.

Two special situations arise when filling the archive. If the number of non-dominated solutions is smaller than the archive size, the archive is filled with dominated solutions taken from the remainder of the population, selected according to the fitness measure defined above. If the number of non-dominated solutions exceeds the archive size, a truncation operator is applied: the solution with the smallest distance to the other solutions is removed from the set; if several solutions share the same minimum distance, the second-nearest distance is considered, and so forth. This is called the k-th nearest distance rule. Fig. 3 shows the scheme of operation of the proposed model.

Fig. 3. Multi-objective evolutionary algorithm.

6. Computational procedure

There are two different approaches for computing a k-way partitioning of the graph associated with a finite element mesh: one is based on recursive bisection and the other on direct k-way partitioning [32]. Direct k-way partitioning is, in general, harder than recursive bisection. Furthermore, the cost function in Eq. (2) is difficult to optimise in a direct k-way setting, and one has to resort to optimising only the average aspect ratio rather than minimising the maximum aspect ratio. In recursive bisection, by contrast, one can easily minimise the maximum of the aspect ratios of the two halves.


Furthermore, a number of studies reported in the literature [18,19,33–35] indicate that direct k-way multilevel partitioning algorithms produce solutions that are generally inferior to those produced by recursive bisection. Keeping these points in view, the proposed mesh-partitioning problem is formulated as a recursive algorithm.
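The recursive structure can be sketched as a small Python driver that takes the bisection routine as a parameter, so the multi-objective search can be plugged in at each level. The median split on a coordinate used below is only a hypothetical stand-in for that search, and all names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Bisector = Callable[[Sequence[int]], Tuple[List[int], List[int]]]


def recursive_partition(vertices: Sequence[int], levels: int, bisect: Bisector) -> List[List[int]]:
    """Produce 2**levels parts by recursive bisection with a user-supplied bisector."""
    if levels == 0:
        return [list(vertices)]
    left, right = bisect(vertices)
    return (recursive_partition(left, levels - 1, bisect)
            + recursive_partition(right, levels - 1, bisect))


# Stand-in bisector: median split on x. In the proposed method each bisection is
# instead the Pareto point chosen from the evolutionary search.
x = {v: float(v) for v in range(16)}


def median_split(vs: Sequence[int]) -> Tuple[List[int], List[int]]:
    ordered = sorted(vs, key=lambda v: x[v])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


print(recursive_partition(range(16), 2, median_split))
```

Because each call sees only two halves, the worst aspect ratio of those two halves can be minimised directly at every level, as argued above.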

The partitioning algorithm operates on the associate graph corresponding to the given finite element mesh. The genome consists of the seven parameters described above ($x_A, y_A, z_A, x_B, y_B, z_B, K_B$), and the initial population is chosen randomly. The bounds for the arbitrary points A and B are fixed from the maximum and minimum values of the positional parameters of the graph vertices, and the initial positions of the two points are chosen randomly within these bounds.
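A possible Python sketch of this initialisation is shown below: vertex positions are element centroids (averaged nodal coordinates), and each genome places A and B uniformly inside the bounding box of those positions. The search range for $K_B$ is not specified in the text, so `kb_range` is a hypothetical parameter, as are all other names.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def element_centroids(nodes: Sequence[Point], elements: Sequence[Sequence[int]]) -> List[Point]:
    """Vertex position of each element = average of its nodal coordinates."""
    out = []
    for elem in elements:
        xs, ys, zs = zip(*(nodes[n] for n in elem))
        out.append((sum(xs) / len(elem), sum(ys) / len(elem), sum(zs) / len(elem)))
    return out


def init_population(coords: Sequence[Point], pop_size: int,
                    kb_range: Tuple[float, float] = (-1.0, 1.0),
                    rng: Optional[random.Random] = None) -> List[Tuple[float, ...]]:
    """Random genomes (xA, yA, zA, xB, yB, zB, KB) inside the bounding box of the vertices.

    kb_range is a hypothetical search range for K_B; the text does not specify one.
    """
    rng = rng or random.Random()
    lo = [min(c[i] for c in coords) for i in range(3)]
    hi = [max(c[i] for c in coords) for i in range(3)]
    population = []
    for _ in range(pop_size):
        a = [rng.uniform(lo[i], hi[i]) for i in range(3)]
        b = [rng.uniform(lo[i], hi[i]) for i in range(3)]
        population.append(tuple(a + b + [rng.uniform(*kb_range)]))
    return population


nodes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
elements = [(0, 1, 2), (1, 3, 2)]
coords = element_centroids(nodes, elements)
population = init_population(coords, pop_size=4, rng=random.Random(0))
print(coords)
print(len(population), len(population[0]))  # 4 7
```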

Fig. 4. Distributed multi-objective evolutionary algorithm.

The field separators are computed using Eq. (3), and the vertices are partitioned based on them. The first objective, the communication volume, is evaluated by scanning the interface vertices of the partitioned subgraphs; the second objective, the aspect ratio, is evaluated using Eq. (2).

Since mesh partitioning is formulated as a recursive algorithm, the multi-objective mesh-partitioning algorithm provides a set of Pareto points (non-dominated solutions) at each level of recursion. To proceed with the next recursive step, a decision has to be taken about which Pareto-optimal point is the most relevant for sorting the vertices and performing the bisection. The present algorithm lets the user control the optimisation process by setting a bias towards a particular objective based on specific requirements, and the corresponding non-dominated solution from the Pareto front is chosen at each recursive step to proceed with the next-level bisection of the graph. For example, if the user sets the bias towards minimising communication volume, the algorithm chooses the solution with the least communication volume from the set of Pareto-optimal solutions as the optimal bisection. Similarly, if the user prefers well-shaped submeshes, the Pareto-optimal point corresponding to the minimal aspect ratio is chosen as the optimal bisection. Alternatively, if the user does not wish to set a bias, an appropriate Pareto point is chosen automatically by the algorithm. This selection is based on the 'min–max optimum' concept, formulated using game theory and used to deal with conflicting objectives [36,37].
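The selection step can be sketched in Python as follows. The bias options mirror the text; for the unbiased case the sketch uses one common normalised min–max rule (scale each objective to [0, 1] over the front, then pick the point whose worst scaled objective is smallest), which is an assumption and may differ in detail from the formulation in [36,37]. All names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Front = Sequence[Tuple[float, float]]  # (communication volume, max aspect ratio)


def choose_pareto_point(front: Front, bias: Optional[str] = None) -> int:
    """Index of the Pareto point used for the next bisection.

    bias="communication" or "shape" picks the best point for that objective;
    bias=None uses a normalised min-max rule (an assumed formulation).
    """
    if bias == "communication":
        return min(range(len(front)), key=lambda i: front[i][0])
    if bias == "shape":
        return min(range(len(front)), key=lambda i: front[i][1])
    if bias is not None:
        raise ValueError(f"unknown bias: {bias!r}")
    m = len(front[0])
    lo = [min(p[j] for p in front) for j in range(m)]
    hi = [max(p[j] for p in front) for j in range(m)]

    def worst(i: int) -> float:
        # Largest normalised objective of point i (0 = best on the front, 1 = worst).
        return max((front[i][j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
                   for j in range(m))

    return min(range(len(front)), key=worst)


front = [(100.0, 1.8), (110.0, 1.3), (130.0, 1.1)]
print(choose_pareto_point(front, "communication"))  # 0
print(choose_pareto_point(front, "shape"))          # 2
print(choose_pareto_point(front))                   # 1
```

The unbiased choice lands on the compromise point, which is neither the best nor the worst in either objective.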

Fig. 5. Unstructured mesh of JOINT.

7. Parallel multi-objective mesh-partitioning algorithm

The proposed multi-objective mesh-partitioning algorithm based on evolutionary computing techniques is computationally more expensive than popular alternatives such as multilevel algorithms, even though the partitions it generates are superior and more rational. The cost grows with problem size, so exploiting the inherent parallelism of the algorithm is an obvious choice. Apart from computational performance, parallel evolutionary algorithms are attractive because they save memory when coping with very large problems, allow the use of larger populations and improve population diversity. It is worth pointing out that the best sequential graph-partitioning algorithms known to date are based on iterative local optimisation, which is difficult to parallelise and does not scale well [16]. Evolutionary algorithms, on the other hand, are well suited to parallel processing and are scalable [38].

In the present work, a distributed version of the proposed mesh-partitioning algorithm is developed using a heterogeneous island model. Each processor runs the evolutionary multi-objective mesh-partitioning algorithm on its own distinct population (called a deme), with its own crossover and mutation probabilities, for a user-specified number of generations called an epoch. At the end of each epoch, all processors are synchronised and migration takes place: each deme communicates with its neighbouring demes to exchange the archived (non-dominated) solutions. Each deme collects the archived solutions received from its neighbours and removes the dominated solutions from the combined archive. If the number of non-dominated solutions still exceeds the specified archive size, the archive truncation procedure of SPEA-2 is used to reduce the size of the archive in each deme independently.
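The migration step can be sketched as below, assuming each archived solution is reduced to its objective vector (cut edges, aspect ratio), both minimised. The truncation follows the spirit of the SPEA-2 nearest-neighbour rule: the member whose sorted list of distances to the other members is lexicographically smallest (the most crowded one) is removed until the archive fits. Normalising both objectives to [0, 1] over the current archive, so that cut edges do not swamp the aspect ratio, is an assumption of this sketch, as are all the names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (cut edges, aspect ratio); both are minimised


def dominates(a: Point, b: Point) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: Sequence[Point]) -> List[Point]:
    unique = list(dict.fromkeys(points))  # the same solution may arrive from several demes
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def truncate(archive: List[Point], size: int) -> List[Point]:
    """Simplified SPEA-2-style truncation: repeatedly drop the most crowded member."""
    archive = list(archive)
    while len(archive) > size:
        lo = [min(p[k] for p in archive) for k in range(2)]
        hi = [max(p[k] for p in archive) for k in range(2)]
        scaled = [tuple((p[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
                        for k in range(2)) for p in archive]

        def crowding(i: int) -> List[float]:
            # Distances to all other members, nearest first; compared lexicographically.
            return sorted(math.dist(scaled[i], scaled[j])
                          for j in range(len(scaled)) if j != i)

        archive.pop(min(range(len(archive)), key=crowding))
    return archive


def migrate(own: List[Point], received: List[List[Point]], size: int) -> List[Point]:
    """Merge a deme's archive with its neighbours' archives, then filter and truncate."""
    combined = own + [p for archive in received for p in archive]
    return truncate(non_dominated(combined), size)


own = [(100, 1.5), (120, 1.2)]
received = [[(90, 1.8), (130, 1.3)], [(100, 1.5), (80, 2.5)]]
print(migrate(own, received, 3))
```

In the usage example, (130, 1.3) is dominated by (120, 1.2) and is discarded, the duplicate (100, 1.5) is merged, and truncation to three members keeps the two extreme points of the front.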

The solution is assumed to have converged if no new non-dominated solution is added to the archive for a specified number of consecutive epochs. For this purpose, a variable called ‘epoch-count’ is defined and initially set to zero. At the end of each epoch, after migration and archive truncation are complete, each deme checks whether its archive has been updated. If the archive of a deme has been updated with new non-dominated solutions, its epoch-count is reset to zero; otherwise it is incremented by one. During migration, the epoch-count value of each deme is also communicated to all neighbouring demes in order to synchronise convergence across all processors, and after migration the epoch-count in each deme is set to the minimum of all the epoch-count values. The algorithm converges when the epoch-count reaches the user-specified convergence limit. The pseudo code of the proposed parallel multi-objective evolutionary algorithm is given in Fig. 4.
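The epoch-count bookkeeping can be simulated in a few lines. This sketch assumes that each deme first applies the local reset/increment rule and then adopts the minimum over its own value and those received from its neighbours; with a fully connected topology, as in the usage example, every deme ends up with the same count. The function names and neighbour lists are hypothetical.

```python
from typing import List, Sequence


def update_epoch_counts(counts: Sequence[int], archive_updated: Sequence[bool],
                        neighbours: Sequence[Sequence[int]]) -> List[int]:
    """One end-of-epoch update of the per-deme 'epoch-count' values."""
    # Local rule: reset if the archive gained a new non-dominated solution, else increment.
    local = [0 if updated else c + 1 for c, updated in zip(counts, archive_updated)]
    # Synchronisation: each deme adopts the minimum over itself and its neighbours.
    return [min([local[d]] + [local[n] for n in neighbours[d]]) for d in range(len(local))]


def has_converged(counts: Sequence[int], limit: int) -> bool:
    return all(c >= limit for c in counts)


# Four fully connected demes; only deme 1 improves its archive, in the second epoch.
nbrs = [[j for j in range(4) if j != i] for i in range(4)]
schedule = [[False] * 4, [False, True, False, False]] + [[False] * 4] * 4
counts = [0, 0, 0, 0]
history = []
for updates in schedule:
    counts = update_epoch_counts(counts, updates, nbrs)
    history.append(counts[0])
print(history, has_converged(counts, 4))  # [1, 0, 1, 2, 3, 4] True
```

A single improvement in any deme resets the shared counter, so convergence is declared only after the whole island system has stalled for the prescribed number of epochs.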

8. Numerical studies

A computer code implementing the multi-objective mesh-partitioning technique described in the earlier sections has been developed for partitioning unstructured finite element meshes. The technique was later implemented on parallel hardware using the distributed multi-objective mesh-partitioning algorithm presented in this paper. The parallel version was developed in the MPI (Message Passing Interface) environment and run on a sixteen-processor SUN TCF (Technical Compute Farm) machine. The codes were validated on some simple test meshes, after which numerical experiments were conducted on several practical-engineering problems. The results obtained with both the sequential and the parallel versions of the multi-objective mesh-partitioning algorithm are summarised here.

For the sequential algorithm, the crossover probability is taken as 0.88 and the mutation probability as 0.005. For the parallel algorithm, each deme has different crossover and mutation probabilities: values ranging from 0.85 to 0.95 for crossover and from 0.005 to 0.009 for mutation are used in the numerical experiments presented in this paper. During each recursive bisection, the multi-objective evolutionary algorithm is assumed to have converged if no new non-dominated solution has been added to the external archive in the last twenty generations. Similarly, the distributed version of the algorithm is assumed to have converged if there has been no archive update in the last four epochs, where an epoch consists of five generations.
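These settings can be collected in a small configuration helper. The paper reports only the ranges of crossover and mutation probabilities used across demes, not how each deme's values were chosen, so the even spacing below is an assumption made for illustration, and the names are hypothetical. The final assertion checks a consequence of the stated criteria: four stalled epochs of five generations each correspond to the same twenty stalled generations used by the sequential algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple

SEQUENTIAL_CROSSOVER = 0.88
SEQUENTIAL_MUTATION = 0.005
STALL_GENERATIONS = 20     # sequential convergence criterion
GENERATIONS_PER_EPOCH = 5  # distributed version
STALL_EPOCHS = 4


@dataclass(frozen=True)
class DemeSettings:
    crossover: float
    mutation: float


def _spread(lo: float, hi: float, i: int, n: int) -> float:
    """i-th of n evenly spaced values between lo and hi (inclusive)."""
    return lo + (hi - lo) * i / (n - 1)


def deme_settings(n_demes: int,
                  pc_range: Tuple[float, float] = (0.85, 0.95),
                  pm_range: Tuple[float, float] = (0.005, 0.009)) -> List[DemeSettings]:
    """Give each deme its own probabilities, evenly spaced over the reported ranges (assumed)."""
    if n_demes < 2:
        return [DemeSettings(SEQUENTIAL_CROSSOVER, SEQUENTIAL_MUTATION)]
    return [DemeSettings(round(_spread(*pc_range, i, n_demes), 4),
                         round(_spread(*pm_range, i, n_demes), 5))
            for i in range(n_demes)]


assert STALL_EPOCHS * GENERATIONS_PER_EPOCH == STALL_GENERATIONS
print(deme_settings(4))
```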

8.1. Numerical example 1

The first numerical example considered is a typical unstructured mesh of a joint of a space frame (JOINT), shown in Fig. 5. The graph corresponding to the unstructured mesh consists of 7860 vertices and 11,212 edges. The graph is partitioned into 2^n partitions, where n ranges from 1 to 6. Table 1 shows the partitioning results obtained using all three options discussed earlier for varied numbers of partitions. The aspect ratio (AR) given in all the results presented in this paper is the average aspect ratio, and the communication volume is represented in the form of cut edges (CE). The parameter ARR (AR_Ratio) shown in the table is the ratio of the aspect ratio to the minimum aspect ratio obtained from all three options for a particular number of partitions. Similarly, CER (CE_Ratio) is the ratio of the cut edges to the minimum cut edges obtained from all three options for a specified number of partitions. Quite understandably, ARR is 1.00 when the proposed algorithm is run with bias towards aspect ratio, and CER is 1.00 when it is executed with bias towards cut edges. Further, CER varies from 1.18 to 2.18 when the bias is set to aspect ratio, and from 1.08 to 1.40 when the ‘min–max’ option is chosen. Similarly, ARR varies from 1.04 to 1.90 over the different numbers of partitions when the bias is set to communication volume, while with the ‘min–max’ option it varies between 1.01 and 1.11. From the results presented in Table 1, it can be observed that each option of the multi-objective mesh-partitioning algorithm generates different partitions, and that the results are clearly biased towards an objective when the user sets the option in favour of that objective.

Table 1. Partitioning results of JOINT using the proposed multi-objective mesh-partitioning algorithm.

NP | With bias to AR: cut edges / AR / CER / ARR | With bias to CE: cut edges / AR / CER / ARR | ‘min–max’ option: cut edges / AR / CER / ARR
2 | 144 / 1.04 / 2.18 / 1.00 | 66 / 1.98 / 1.00 / 1.90 | 92 / 1.12 / 1.40 / 1.08
4 | 214 / 1.20 / 1.45 / 1.00 | 148 / 1.61 / 1.00 / 1.34 | 195 / 1.21 / 1.32 / 1.01
8 | 361 / 1.03 / 1.23 / 1.00 | 294 / 1.18 / 1.00 / 1.15 | 318 / 1.14 / 1.08 / 1.11
16 | 510 / 1.38 / 1.18 / 1.00 | 432 / 1.43 / 1.00 / 1.04 | 496 / 1.42 / 1.15 / 1.03
32 | 780 / 1.34 / 1.25 / 1.00 | 624 / 1.39 / 1.00 / 1.04 | 712 / 1.39 / 1.14 / 1.04
64 | 1112 / 1.29 / 1.31 / 1.00 | 851 / 1.42 / 1.00 / 1.10 | 989 / 1.35 / 1.16 / 1.05

NP: number of partitions; AR: aspect ratio; CER: cut edge ratio; ARR: AR ratio.
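The CER and ARR columns of Table 1 follow directly from the raw cut edges and aspect ratios. A short check using the NP = 4 row of Table 1 (the helper name and the option labels are illustrative):

```python
from typing import Dict, Tuple


def ratios(results: Dict[str, Tuple[int, float]]) -> Dict[str, Tuple[float, float]]:
    """Map option -> (cut edges, average AR) to option -> (CER, ARR), rounded as in the tables."""
    min_ce = min(ce for ce, _ in results.values())
    min_ar = min(ar for _, ar in results.values())
    return {opt: (round(ce / min_ce, 2), round(ar / min_ar, 2))
            for opt, (ce, ar) in results.items()}


joint_np4 = {"bias to AR": (214, 1.20), "bias to CE": (148, 1.61), "min-max": (195, 1.21)}
print(ratios(joint_np4))
# {'bias to AR': (1.45, 1.0), 'bias to CE': (1.0, 1.34), 'min-max': (1.32, 1.01)}
```

The printed values reproduce the CER and ARR entries of that row of Table 1.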

For the ‘min–max’ option, a clear balance between the two objectives can be observed, as reflected in the ARR and CER parameters. The two objectives considered here, i.e., aspect ratio and communication volume, are incommensurable and often conflicting, and the proposed multi-objective mesh-partitioning algorithm handles these conflicting objectives effectively. Fig. 6 shows the contrasting partitions generated using the three options of the proposed algorithm.

Fig. 6. Generation of four submeshes employing the proposed multi-objective mesh-partitioning algorithm.

8.2. Numerical example 2

The second numerical example considered is an unstructured mesh of a multi-joint (MULTI-JOINT), shown in Fig. 7. The corresponding dual graph has 12,900 vertices and 18,880 edges. The partitioning results obtained using the proposed computational model for multi-objective mesh partitioning are shown in Table 2. ARR varies from 1.28 to 2.01 when the bias is towards communication volume, and from 1.02 to 1.12 with the ‘min–max’ option. Similarly, CER varies between 1.20 and 3.00 when the bias is towards aspect ratio, and between 1.03 and 1.38 with the ‘min–max’ option. This clearly indicates that the proposed multi-objective mesh-partitioning model generates partitions with a clear balance between the two conflicting objectives when the ‘min–max’ option is activated. It can also be observed that CER values are high when the bias is set towards optimising the aspect ratio.

Fig. 7. Multi-joint of an off-shore platform.

Fig. 8. Unstructured mesh of a helicopter.

Table 2. Partitioning results of MULTI-JOINT using the proposed multi-objective mesh-partitioning algorithm.

NP | With bias to AR: cut edges / AR / CER / ARR | With bias to CE: cut edges / AR / CER / ARR | ‘min–max’ option: cut edges / AR / CER / ARR
2 | 186 / 1.04 / 3.00 / 1.00 | 62 / 2.11 / 1.00 / 2.01 | 86 / 1.12 / 1.38 / 1.08
4 | 311 / 1.29 / 1.75 / 1.00 | 178 / 1.67 / 1.00 / 1.30 | 210 / 1.44 / 1.18 / 1.12
8 | 482 / 1.34 / 1.40 / 1.00 | 344 / 1.81 / 1.00 / 1.35 | 378 / 1.39 / 1.10 / 1.04
16 | 722 / 1.40 / 1.28 / 1.00 | 565 / 1.94 / 1.00 / 1.39 | 582 / 1.44 / 1.03 / 1.03
32 | 986 / 1.29 / 1.20 / 1.00 | 819 / 1.74 / 1.00 / 1.35 | 889 / 1.43 / 1.09 / 1.11
64 | 1442 / 1.32 / 1.43 / 1.00 | 1070 / 1.69 / 1.00 / 1.28 | 1186 / 1.35 / 1.11 / 1.02

NP: number of partitions; AR: aspect ratio; CER: cut edge ratio; ARR: AR ratio.

Table 3. Partitioning results of HELIC using the proposed multi-objective mesh-partitioning algorithm.

NP | With bias to AR: cut edges / AR / CER / ARR | With bias to CE: cut edges / AR / CER / ARR | ‘min–max’ option: cut edges / AR / CER / ARR
2 | 78 / 2.29 / 1.39 / 1.00 | 56 / 2.40 / 1.00 / 1.05 | 65 / 2.32 / 1.16 / 1.01
4 | 186 / 1.64 / 1.31 / 1.00 | 142 / 2.01 / 1.00 / 1.22 | 164 / 1.77 / 1.16 / 1.08
8 | 337 / 1.59 / 1.28 / 1.00 | 264 / 1.88 / 1.00 / 1.18 | 287 / 1.87 / 1.09 / 1.18
16 | 567 / 1.63 / 1.23 / 1.00 | 461 / 2.04 / 1.00 / 1.25 | 481 / 1.91 / 1.04 / 1.17
32 | 792 / 1.48 / 1.20 / 1.00 | 659 / 1.95 / 1.00 / 1.32 | 743 / 1.77 / 1.13 / 1.20
64 | 1107 / 1.44 / 1.14 / 1.00 | 968 / 1.84 / 1.00 / 1.28 | 1005 / 1.70 / 1.04 / 1.18

NP: number of partitions; AR: aspect ratio; CER: cut edge ratio; ARR: AR ratio.

8.3. Numerical example 3

The third numerical example considered is an unstructured mesh defining the body of a helicopter (HELIC), shown in Fig. 8. The corresponding dual graph consists of 7920 vertices and 11,091 edges. Table 3 shows the partitioning results obtained using the proposed multi-objective mesh-partitioning algorithm. CER varies between 1.14 and 1.39 when the bias is set to optimisation of the submesh aspect ratio, and from 1.04 to 1.16 with the ‘min–max’ option. Similarly, ARR lies between 1.05 and 1.32 when the bias is set to minimisation of communication volume, and varies from 1.01 to 1.20 with the ‘min–max’ option. It can also be observed from Table 3 that the optimised aspect ratios remain on the high side despite the increase in cut edges; this can be attributed to the particular shape of the mesh. One can also observe that the average aspect ratio decreases rapidly as the number of partitions increases when the bias is set to optimisation of the submesh aspect ratio, whereas the variation is much smaller for the other two options.

8.4. Numerical example 4

The final numerical example considered is a three-dimensional finite element mesh of a metal strip (FEMESH), shown in Fig. 9. The significance of this problem is that the mesh is very well graded, passing gradually from very fine to coarser regions. The partitioning results obtained using the proposed multi-objective mesh-partitioning algorithm are given in Table 4. The average aspect ratio varies from 1.30 to 1.67 when the proposed model is run with the option of optimising the submesh aspect ratio, and lies between 1.36 and 2.07 with the ‘min–max’ option. ARR for the partitions obtained with bias towards minimisation of communication volume varies from 1.19 to 1.42, and from 1.05 to 1.24 when the ‘min–max’ option is chosen. Similarly, CER for the partitions generated with the option of optimising the submesh aspect ratio lies between 1.17 and 2.81, while with the ‘min–max’ option it varies from 1.05 to 2.07. It can be concluded from these numerical experiments that the proposed multi-objective mesh-partitioning algorithm is capable of generating partitions with a good balance between the two conflicting objectives when the ‘min–max’ option is chosen.

Fig. 9. A graded 3-d finite element mesh (FEMESH).

Table 5. Details of the benchmark test graphs in the literature.

S. No. | Graph | Number of vertices | Number of edges
1 | 3elt | 9000 | 13,278
2 | Aerofoil | 8034 | 11,813
3 | Big | 30,269 | 44,949
4 | Crack | 20,141 | 30,043
5 | Fe-4elt | 15,606 | 45,878
6 | Brack2 | 62,631 | 366,559
7 | CS4 | 22,499 | 43,858
8 | 4elt2 | 11,143 | 32,818
9 | T60k | 60,005 | 89,440
10 | UK | 4824 | 6837
11 | Wing | 62,032 | 121,544

Table 6. Comparative performance of the proposed mesh-partitioning algorithm with multilevel RSB and Métis in minimising the communication volume, represented in the form of cut edges.

Graph | Algorithm | NP = 2 | NP = 4 | NP = 8 | NP = 16 | NP = 32 | NP = 64
3ELT | Proposed algorithm | 54 | 102 | 168 | 302 | 498 | 787
3ELT | RSBM | 76 | 249 | 430 | 689 | 1173 | 1828
3ELT | PMETIS | 65 | 156 | 243 | 358 | 569 | 907
3ELT | KMETIS | 64 | 128 | 195 | 328 | 566 | 885
AIRFOIL1 | Proposed algorithm | 40 | 82 | 140 | 234 | 421 | 711
AIRFOIL1 | RSBM | 56 | 201 | 357 | 594 | 1042 | 1672
AIRFOIL1 | PMETIS | 39 | 88 | 156 | 285 | 487 | 820
AIRFOIL1 | KMETIS | 39 | 103 | 157 | 276 | 505 | 809
BIG | Proposed algorithm | 71 | 198 | 322 | 608 | 904 | 1403
BIG | RSBM | 422 | 704 | 1231 | 1921 | 2489 | 3083
BIG | PMETIS | 87 | 209 | 358 | 585 | 962 | 1522
BIG | KMETIS | 77 | 189 | 334 | 603 | 901 | 1507
CRACK | Proposed algorithm | 102 | 216 | 358 | 541 | 914 | 1255
CRACK | RSBM | 148 | 491 | 822 | 1302 | 1965 | 2964
CRACK | PMETIS | 101 | 216 | 363 | 634 | 916 | 1382
CRACK | KMETIS | 100 | 224 | 377 | 575 | 891 | 1313
FE-4ELT | Proposed algorithm | 147 | 341 | 612 | 1102 | 1734 | 2803
FE-4ELT | RSBM | 287 | 422 | 707 | 1232 | 1926 | 3087
FE-4ELT | PMETIS | 183 | 401 | 734 | 1101 | 1806 | 2905
FE-4ELT | KMETIS | 151 | 415 | 638 | 1092 | 1819 | 2869
BRACK2 | Proposed algorithm | 719 | 3018 | 7507 | 11,026 | 18,983 | 27,954
BRACK2 | RSBM | 1036 | 4125 | 8776 | 15,404 | 22,890 | 32,999
BRACK2 | PMETIS | 748 | 3230 | 7707 | 12,838 | 20,039 | 29,232
BRACK2 | KMETIS | 794 | 3251 | 8310 | 13,068 | 19,849 | 28,969

NP: number of partitions.
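One way to summarise Table 6 is the percentage reduction in cut edges achieved by the proposed algorithm relative to the best of the three multilevel results for each number of partitions. The sketch below transcribes only two of the six graphs, and the helper name is illustrative. On 3ELT the proposed algorithm is ahead for every NP, whereas on BIG the differences are small and change sign, which matches reading the table as "comparable or superior".

```python
from typing import Dict, List

NP = [2, 4, 8, 16, 32, 64]

# Cut edges transcribed from Table 6 for two of the six benchmark graphs.
CUT_EDGES: Dict[str, Dict[str, List[int]]] = {
    "3ELT": {
        "Proposed": [54, 102, 168, 302, 498, 787],
        "RSBM": [76, 249, 430, 689, 1173, 1828],
        "PMETIS": [65, 156, 243, 358, 569, 907],
        "KMETIS": [64, 128, 195, 328, 566, 885],
    },
    "BIG": {
        "Proposed": [71, 198, 322, 608, 904, 1403],
        "RSBM": [422, 704, 1231, 1921, 2489, 3083],
        "PMETIS": [87, 209, 358, 585, 962, 1522],
        "KMETIS": [77, 189, 334, 603, 901, 1507],
    },
}


def gain_over_best_multilevel(graph: str) -> List[float]:
    """Percentage reduction in cut edges versus the best multilevel result for each NP.

    Positive values mean the proposed algorithm cuts fewer edges than RSBM, PMETIS and KMETIS.
    """
    rows = CUT_EDGES[graph]
    gains = []
    for i in range(len(NP)):
        best = min(cuts[i] for name, cuts in rows.items() if name != "Proposed")
        gains.append(100.0 * (best - rows["Proposed"][i]) / best)
    return gains


for graph in CUT_EDGES:
    gains = gain_over_best_multilevel(graph)
    print(graph, ", ".join(f"NP={n}: {g:+.1f}%" for n, g in zip(NP, gains)))
```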

8.5. Benchmark test problems

Finally, some benchmark test graphs available on the Internet are compiled for testing the proposed multi-objective mesh-partitioning algorithm; details of the test graphs are given in Table 5. The proposed algorithm is employed to generate varied numbers of partitions for all the test meshes given in Table 5, and the partitioning results obtained for the benchmark problems are compared with popular multilevel mesh-partitioning algorithms such as the multilevel recursive spectral bisection algorithm (MRSB) [39], p-Metis [40], k-Metis [34] and JOSTLE [7]. The cut edges obtained for the various algorithms are furnished in Table 6 for generating 2^n submeshes, where n varies between 1 and 6. The partitions are generated using the option ‘bias towards CE’, since the graph-partitioning algorithms used for comparison optimise only a single objective, i.e., the communication volume. A close look at the results presented in Table 6 indicates that the proposed algorithm generates partitions that are either comparable or superior to those of the popular single-objective partitioning algorithms.

Table 4. Partitioning results of FEMESH using the proposed multi-objective mesh-partitioning algorithm.

NP | With bias to AR: cut edges / AR / CER / ARR | With bias to CE: cut edges / AR / CER / ARR | ‘min–max’ option: cut edges / AR / CER / ARR
2 | 714 / 1.67 / 2.81 / 1.00 | 254 / 2.38 / 1.00 / 1.42 | 418 / 2.07 / 1.65 / 1.24
4 | 1092 / 1.56 / 1.47 / 1.00 | 744 / 1.97 / 1.00 / 1.26 | 1546 / 1.72 / 2.07 / 1.10
8 | 2260 / 1.40 / 1.17 / 1.00 | 1936 / 1.72 / 1.00 / 1.23 | 2036 / 1.57 / 1.05 / 1.12
16 | 4132 / 1.34 / 1.23 / 1.00 | 3371 / 1.77 / 1.00 / 1.32 | 3745 / 1.42 / 1.11 / 1.06
32 | 5388 / 1.30 / 1.22 / 1.00 | 4412 / 1.55 / 1.00 / 1.19 | 4609 / 1.36 / 1.05 / 1.05
64 | 6693 / 1.42 / 1.27 / 1.00 | 5256 / 1.82 / 1.00 / 1.28 | 5674 / 1.55 / 1.08 / 1.09

NP: number of partitions; AR: aspect ratio; CER: cut edge ratio; ARR: AR ratio.

Table 10. Performance of the distributed multi-objective mesh-partitioning algorithm on the Sun-TCF machine for various test meshes in optimising submesh aspect ratio (AR) and communication volume (cut edges).

Graph type | User preference option | NP | Four processors: cut edges / AR | Eight processors: cut edges / AR | Sixteen processors: cut edges / AR
JOINT | min–max | 8 | 312 / 1.15 | 304 / 1.10 | 310 / 1.08
MULTI-JOINT | min–max | 8 | 386 / 1.37 | 367 / 1.32 | 362 / 1.26
T60k | min–max | 16 | 1020 / 1.40 | 1028 / 1.34 | 1020 / 1.36
WING | min–max | 16 | 7084 / 1.88 | 7042 / 1.83 | 7038 / 1.78
J1-367 | min–max | 16 | 3050 / 1.59 | 2904 / 1.36 | 2848 / 1.44
J2-407 | min–max | 16 | 3737 / 1.98 | 3629 / 1.89 | 3652 / 1.78
AXLE | min–max | 16 | 9864 / 1.57 | 9843 / 1.54 | 9804 / 1.51
ROTOR | min–max | 16 | 11,685 / 1.31 | 11,358 / 1.27 | 11,354 / 1.19
SPACE-FRAME | min–max | 16 | 1072 / 1.49 | 1046 / 1.44 | 1039 / 1.46

The partitioning results of the benchmark test meshes for 16, 32 and 64 partitions are compared with the results of the JOSTLE software available in the literature [7] and presented in Tables 7–9, respectively. Since the published results are for a single objective, the partitions are generated using the options ‘bias towards AR’ and ‘bias towards CE’ in order to have a fair comparison. One can observe that the proposed multi-objective mesh-partitioning algorithm produces results that are either comparable or superior to those of JOSTLE, which is based on a multilevel implementation. It should be mentioned, however, that JOSTLE aims at optimising a single objective, i.e., either the aspect ratio or the communication volume, whereas the present mesh-partitioning algorithm attempts to generate trade-off solutions by simultaneously optimising both objectives. It is therefore likely that an evolutionary mesh-partitioning technique optimising a single objective would produce qualitatively superior results compared with its multi-objective counterpart run with a bias towards that specific objective.

Table 7. Comparative performance of the proposed multi-objective mesh-partitioning algorithm with JOSTLE in minimising the submesh aspect ratio (AR) and communication volume (cut edges) – generation of 16 submeshes.

Graph | Bias towards AR, JOSTLE: AR / cut edges | Bias towards AR, present: AR / cut edges | Bias towards CE, JOSTLE: AR / cut edges | Bias towards CE, present: AR / cut edges
UK | 1.68 / 198 | 1.54 / 210 | 1.75 / 182 | 1.76 / 176
4elt | 1.32 / 898 | 1.17 / 922 | 1.50 / 602 | 1.74 / 578
T60k | 1.38 / 1026 | 1.14 / 1014 | 1.43 / 1016 | 1.47 / 1026
CS4 | 1.52 / 2627 | 1.46 / 2664 | 1.58 / 2496 | 1.54 / 2504
Wing | 1.60 / 9274 | 1.57 / 9384 | 1.97 / 5008 | 2.11 / 4989

Table 8. Comparative performance of the proposed multi-objective mesh-partitioning algorithm with JOSTLE in minimising the submesh aspect ratio (AR) and communication volume (cut edges) – generation of 32 submeshes.

Graph | Bias towards AR, JOSTLE: AR / cut edges | Bias towards AR, present: AR / cut edges | Bias towards CE, JOSTLE: AR / cut edges | Bias towards CE, present: AR / cut edges
UK | 1.55 / 331 | 1.49 / 362 | 1.65 / 305 | 1.79 / 301
4elt | 1.28 / 1367 | 1.20 / 1347 | 1.28 / 902 | 1.44 / 910
T60k | 1.31 / 1600 | 1.34 / 1604 | 1.34 / 1552 | 1.45 / 1516
CS4 | 1.54 / 3647 | 1.39 / 3612 | 1.621 / 3501 | 1.61 / 3506
Wing | 1.61 / 13,732 | 1.57 / 14,012 | 1.93 / 6866 | 1.95 / 6797

Table 9. Comparative performance of the proposed multi-objective mesh-partitioning algorithm with JOSTLE in minimising the submesh aspect ratio (AR) and communication volume (cut edges) – generation of 64 submeshes.

Graph | Bias towards AR, JOSTLE: AR / cut edges | Bias towards AR, present: AR / cut edges | Bias towards CE, JOSTLE: AR / cut edges | Bias towards CE, present: AR / cut edges
UK | 1.46 / 556 | 1.39 / 579 | 1.53 / 512 | 1.60 / 517
4elt | 1.29 / 1993 | 1.31 / 2010 | 1.31 / 1515 | 1.39 / 1521
T60k | 1.33 / 2514 | 1.19 / 2534 | 1.36 / 2439 | 1.42 / 2440
CS4 | 1.53 / 5017 | 1.50 / 5049 | 1.59 / 4666 | 1.61 / 4612
Wing | 1.61 / 15,668 | 1.57 / 15,919 | 1.92 / 9401 | 1.97 / 9395

8.6. Performance evaluation

Distributed evolutionary algorithms take fewer generations to converge than their sequential counterparts, so super-linear speedup can be obtained with distributed multi-objective evolutionary algorithms. Hence, the performance of the proposed distributed multi-objective mesh-partitioning algorithm is evaluated both in terms of the quality of the partitions generated, with reference to its sequential counterpart, and in terms of its capability to solve large meshes. Table 10 shows the partitioning results for various finite element meshes, ranging from small test meshes such as JOINT, through moderate-size meshes such as J1-367 (36,700 vertices), a refined mesh of an offshore joint, and J2-407 (40,700 vertices), a refined mesh of a space-frame segment, to large 3-d finite element models such as the axle of an automobile (AXLE) with 193,100 vertices, a rotor (ROTOR) with 153,799 vertices and a space frame (SPACE-FRAME) with 138,699 vertices. The finite element meshes of the three large models considered for evaluation, and the four partitions generated for each using the ‘min–max’ option of the proposed distributed algorithm, are shown in Fig. 10.

Although the population size is the same in each processor, the crossover and mutation rates are varied from processor to processor in order to vary the diversity of the generated results. Studies were carried out by varying the migration interval, and the best results obtained are shown in Table 10. The results presented in Table 10 indicate that the proposed distributed multi-objective mesh-partitioning algorithm improves the quality of the partitions when compared to its sequential counterpart. The speedup obtained while solving J1-367 is 4.3, 8.5 and 16.2 on four, eight and sixteen processors, respectively. Similarly, the speedup recorded for J2-407 is 4.5, 8.6 and 16.7 on four, eight and sixteen processors, respectively. The speedup recorded for the large finite element meshes, i.e., SPACE-FRAME, ROTOR and AXLE, varies from 4.2 to 4.5, 8.3 to 8.5, and 16.4 to 16.8 on four, eight and sixteen processors, respectively.
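Dividing each reported speedup S by the number of processors p gives the parallel efficiency E = S/p; values above 1 confirm the super-linear behaviour anticipated above. A sketch using the J1-367 and J2-407 figures quoted in the text (the names are illustrative):

```python
from typing import Dict

# Speedups reported in the text for the two moderate-size meshes.
SPEEDUP: Dict[str, Dict[int, float]] = {
    "J1-367": {4: 4.3, 8: 8.5, 16: 16.2},
    "J2-407": {4: 4.5, 8: 8.6, 16: 16.7},
}


def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency E = S / p; E > 1 signals super-linear speedup."""
    return speedup / processors


for mesh, runs in SPEEDUP.items():
    print(mesh, {p: round(efficiency(s, p), 3) for p, s in runs.items()})
```

Efficiency stays above 1 but falls towards it as p grows (for J1-367, from about 1.08 on four processors to about 1.01 on sixteen), so the super-linear margin narrows with more processors.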


Fig. 10. Generation of four submeshes using ‘min–max’ option of the proposed multi-objective algorithm.


9. Conclusions

In this paper, a mesh-partitioning algorithm is presented for the simultaneous optimisation of interprocessor communication volume and the shape of the generated submeshes, while maintaining approximately equal computational load among the processors. Both objectives, i.e., submesh aspect ratio and communication volume, are vital for the optimal performance of state-of-the-art parallel FEM codes. Since these two objective functions are incommensurable, it is desirable to use multi-objective optimisation techniques to generate optimal submeshes. In the present work, a new approach within the framework of evolutionary computing techniques is proposed to develop a multi-objective mesh-partitioning algorithm. Numerical experiments have been conducted using both the sequential and parallel versions of the mesh-partitioning algorithm to partition the unstructured meshes of several practical-engineering problems as well as several benchmark test meshes from the literature. The following conclusions are drawn from the numerical studies carried out in this paper.

(i) Numerical studies show that the proposed multi-objective mesh-partitioning algorithm can handle dissimilar objectives and provides partitions that are more rational in terms of both maintaining good shape and minimising communication volume.

(ii) The proposed multi-objective mesh-partitioning algorithm can be tuned by a user-supplied preference option in order to control the trade-offs among the different objectives in the generated partitions. Numerical experiments show that the proposed algorithm balances the trade-offs between different objectives better than partitioning with respect to a single objective only. It is also shown that, by modifying the input preference vector, the multi-objective mesh-partitioning algorithm can gracefully improve one objective at the expense of the other.

(iii) Multi-objective optimisation with the ‘min–max’ option provides a trade-off solution with a good balance between both objectives, which is effective for parallel finite element computations.

(iv) Numerical experiments conducted with the distributed multi-objective mesh-partitioning algorithm indicate that it improves the quality of the partitions generated, provided that parameters such as the crossover, mutation and migration rates are tuned appropriately. Although some effort has been made to tune these parameters, they have not been optimised. The quality of the partitioning results is likely to improve further with a systematic tuning of the parameters of the distributed algorithm.

(v) One additional requirement of the proposed algorithm compared with popular multilevel algorithms is that it needs the positional parameters (coordinates) of the graph vertices. However, this cannot be considered a major limitation, since the proposed algorithm is aimed at partitioning finite element meshes, for which the positional parameters are readily available.

(vi) It should be mentioned that the distributed multi-objective mesh-partitioning algorithm presented in this paper will be slower than the parallel versions of Metis and Jostle. However, the overall objective is to minimise the total computing time of the parallel application, which includes both the time taken to partition the mesh and the time taken to solve the resulting partitioned mechanical problem. Several popular parallel applications, e.g., nonlinear dynamic analysis of large mechanical systems and computational fluid dynamics, use iterative solvers (built into the parallel application) in each time step or iteration, and often also require processor interaction during each time step or iteration. For such applications, the proposed algorithm, which produces more rational partitions by optimising both the communication volume (to reduce processor communication overheads) and the submesh shape (to improve the convergence of the solver), will be useful. Since the run time of these applications is usually high for large problems, the marginal overhead associated with the proposed partitioning algorithm becomes negligible. Moreover, the gain from improved partitioning during parallel execution is much larger, which effectively offsets the additional computational time taken for partitioning. The proposed algorithm can therefore be used for a certain class of applications that consume substantial run time on parallel processors (typically nonlinear dynamics and computational fluid dynamics). However, it is certainly not practical for problems such as parallel sparse solutions, which require partitions to be generated rapidly because the run time of the parallel solution is very short.
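The argument in (vi) can be written as a simple break-even condition. The symbols are introduced here only for illustration: T_p and T'_p are the partitioning times of a fast multilevel partitioner and of the proposed algorithm, N is the number of time steps or iterations of the application, and t_s and t'_s are the average costs of one step with the respective partitions, with t'_s < t_s because of the lower communication volume and faster solver convergence.

```latex
\begin{aligned}
T_{\mathrm{total}} &= T_p + N\,t_s, \qquad T'_{\mathrm{total}} = T'_p + N\,t'_s,\\
T'_{\mathrm{total}} < T_{\mathrm{total}}
  &\iff N\,(t_s - t'_s) > T'_p - T_p
  \iff N > \frac{T'_p - T_p}{t_s - t'_s}.
\end{aligned}
```

For long nonlinear dynamic or CFD runs N is large, so the threshold is easily exceeded; for a single fast sparse solve, N is small and the extra partitioning time is unlikely to be recovered, which matches the limitation stated in (vi).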

Acknowledgements

This paper is being published with the permission of the Director, Structural Engineering Research Centre, Chennai. Financial assistance for this work from the Aeronautical Research and Development Board (ARDB), New Delhi, is gratefully acknowledged.

References

[1] Kaveh A. Optimal structural analysis. 2nd ed. Somerset, UK: John Wiley; 2006.[2] Kaveh A, Roosta GR. Domain decomposition for finite element analysis.

Commun Numer Methods Eng 1997;13:61–71.[3] Garey MR, Johnson DS. Computers and Intractability: a guide to the theory of

NP-completeness. NY: Freeman W.H. and Company; 1979.[4] Rama Mohan Rao A. Efficient parallel processing algorithms for nonlinear

dynamic analysis. Ph.D. thesis, Indian Institute of Science, Bangalore, India;2001.

[5] Diekmann R, Preis R, Schlimbach F, Walshaw C. Shape optimised meshpartitioning and Load balancing for parallel adaptive FEM. Parallel Comput2000;26:1555–81.

[6] Diekmann R, Preis R, Schlimbach F, Walshaw C. Aspect ratio for meshpartitioning. In: Pritchard D, Reeve J, editors. Euro-Par’98 parallel processing.LNCS, vol. 1470. Berlin: Springer; 1998. p. 347–51.

[7] Walshaw C, Cross M, Diekmann R, Schlimbach F. Multilevel mesh partitioningfor optimising domain shape. Int J High Perform Comput Appl1999;13(4):334–53.

[8] Khan AI, Topping BHV. Subdomain generation for parallel finite elementanalysis. Comput Syst Eng 1998;4:96–129.

[9] Topping BHV, Khan AI. Parallel finite element computations. Edinburgh,UK: Saxe-coburg Publications; 1996.

[10] Mansour N, Fox GC. Allocating data to distributed memory multiprocessors byGenetic Algorithms. Concurrency: Pract Experience 1994;6:485–504.

[11] Wendl A. A seed based decomposition algorithm employing geneticalgorithms. Report No: EPCC-SS96-01, University of Edinburgh, UK; 1996.

[12] Gil C, Ortega J, Diaz AF, Monotoya MG. Annealing based heuristics and geneticalgorithms for circuit partitioning in parallel test generation. Future GenerComput Syst 1998;14:439–51.

[13] Soper AJ, Walshaw C, Cross M. A combined evolutionary search and multileveloptimisation approach to graph partitioning. Mathematics Research Report00/IM/58, School of Computing and Mathematical Sciences, University ofGreenwich, London, UK; 2001.

[14] Rama Mohan Rao A, Dattaguru B, Appa Rao TVSR. Automatic decomposition ofunstructured meshes employing genetic algorithms for parallel FEMcomputations. Int J Struct Eng Mech 2002;14:625–47.

[15] Kaveh A, Rahimi HAB. A hybrid graph-genetic method for domaindecomposition. Finite Elem Anal Des 2003;29:1237–47.

[16] Chevalier Cédric, Pellegrini François. Improvement of the efficiency of geneticalgorithms for scalable parallel graph partitioning in a multi-level framework.In: Nagel WE et al., editors. Euro-Par 2006. LNCS, vol. 4128. Springer-Verlag;2006. p. 243–52.

[17] Koro Sec P, Silc J, BVorut Robi C. Solving the mesh-partitioning problem withan ant-colony algorithm. Parallel Comput 2004;30:785–801.

[18] Kaveh A, Sharafi P. Ant colony optimization for finding medians of weightedgraphs. Eng Comput 2008;25(2):102–20.

[19] Kaveh A, Shojaee S. Optimal domain decomposition via p-medianmethodology using ACO and hybrid ACGA. Finite Elem Anal Des2008;44(8):505–12.

[20] Ababie C, Selvakkumaran N, Bazargan K, Karypis G. Multi-objective circuitpartitioning for cutsize and path based delay minimisation. In: Proceeding ofICCAD; 2002. <http://www.cs.umn.edu/~karypis>.

[21] Schloegel K, Karypis G, Kumar V. A new algorithm for multi-objective graphpartitioning. In: Proceedings of Europar, vol. 99; 1999. p. 322–31.

[22] Selvakkumaran N, Karypis G. Multi-objective hypergraph partitioningalgorithms for cut and maximum subdomain degree minimization. IEEETrans Comput Aided Des 2005;25:504–17.

[23] Hendrickson B, Kolda TG. Graph partitioning models for parallel computing.Parallel Comput 2000;26:1519–34.

[24] Hendrickson B. Load balancing fictions, falsehoods and fallacies. Appl Math Model 2000;25:99–108.

[25] Kernighan BW, Lin S. An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 1970:291–308.

[26] Gilbert JR, Miller GL, Teng SH. Geometric mesh partitioning: implementation and experiments. SIAM J Sci Comput 1998;19(6):2091–110.

[27] Deb K. Multi-objective optimisation using evolutionary algorithms. Wiley; 2001.

[28] Collette Y, Siarry P. Multi-objective optimization: principles and case studies. Springer; 2003.

[29] Coello Coello CA. List of references on evolutionary multi-objective optimisation; 2006. <http://www.lania.mx/~ccoello/EMOO/EMOObib.html>.

[30] Corne DW, Knowles JD. No free lunch and free leftovers theorems for multiobjective optimisation problems. In: Fonseca CM, Fleming PJ, Zitzler E, Deb K, Thiele L, editors. Evolutionary multi-criterion optimization. Second international conference, EMO 2003, Faro, Portugal, April. Lecture notes in computer science, vol. 2632. Springer; 2003. p. 327–41.

[31] Zitzler E, Laumanns M, Thiele L. SPEA2: improving the strength Pareto evolutionary algorithm. Technical Report 103, Gloriastrasse 35, CH-8092 Zurich, Switzerland; 2001.

[32] Karypis G, Kumar V. Multilevel k-way partitioning scheme for irregular graphs. J Parallel Distr Comput 1998;48(1):96–129.

[33] Cong J, Lim SK. Multiway partitioning with pairwise movement. In: Proceedings of ICCAD; 1998. p. 512–6.


A. Rama Mohan Rao / Computers and Structures 87 (2009) 1461–1473 1473

[34] Karypis G, Kumar V. Multilevel k-way hypergraph partitioning. In: Thirty-sixth annual conference on design automation (DAC’99); 1999. p. 343–8.

[35] Sanchis L. Multiple-way network partitioning. IEEE Trans Comput 1989;38:62–81.

[36] Osyczka A. An approach to multicriterion optimisation problems for engineering design. Comput Methods Appl Mech Eng 1978;15:309–33.

[37] Osyczka A. An approach to multicriterion optimisation problems for structural design. In: Proceedings of international symposium on optimum structural design, University of Arizona; 1981.

[38] Alba E, Tomassini M. Parallelism and evolutionary algorithms. IEEE Trans Evolut Comput 2002;6(5):443–62.

[39] Barnard ST, Simon HD. A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency: Pract Experience 1994;6:101–7.

[40] Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 1999;20:359–92.