a mixed heuristic for circuit partitioning


Computational Optimization and Applications, 23, 321–340, 2002. © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

A Mixed Heuristic for Circuit Partitioning

C. GIL∗ [email protected]
Departamento de Arquitectura de Computadores y Electrónica, Universidad de Almería, La Cañada de San Urbano s/n, 04120 Almería, SPAIN

J. ORTEGA [email protected]
Departamento de Arquitectura y Tecnología de Computadores, Universidad de Granada, Campus de Fuentenueva, Granada, SPAIN

M.G. MONTOYA AND R. BAÑOS [email protected]
Departamento de Arquitectura de Computadores y Electrónica, Universidad de Almería, La Cañada de San Urbano s/n, 04120 Almería, SPAIN

Abstract. As general-purpose parallel computers are increasingly being used to speed up different VLSI applications, the development of parallel algorithms for circuit testing, logic minimization and simulation, HDL-based synthesis, etc. is currently a field of increasing research activity. This paper describes a circuit partitioning algorithm which mixes Simulated Annealing (SA) and Tabu Search (TS) heuristics. The goal of such an algorithm is to obtain a balanced distribution of the target circuit among the processors of the multicomputer, allowing a parallel CAD application for Test Pattern Generation to provide good efficiency. The results obtained indicate that the proposed algorithm outperforms both a pure Simulated Annealing and a pure Tabu Search. Moreover, the usefulness of the algorithm in providing a balanced workload distribution is demonstrated by the efficiency results obtained by a topological partitioning parallel test-pattern generator in which the proposed algorithm has been included. An extended algorithm that works with general graphs has also been included, in order to compare our approach with other state-of-the-art algorithms.

Keywords: circuit partitioning, optimisation, parallel test pattern generation, simulated annealing, Tabu Search

1. Introduction

The circuit partitioning problem arises in many VLSI applications [2, 24]. Due to the increasing complexity of VLSI circuits, the NP-complete [11] character of many VLSI CAD problems makes a "divide and conquer" approach more attractive, both to solve these problems in reasonable periods of time by parallel processing and to handle, on distributed memory multiprocessors, arbitrarily large circuits that may not fit in the memory of standard workstations. The usefulness of parallel processing to speed up the resolution of VLSI CAD problems and to address the circuit storage problems has been considered in the recent literature on circuit testing, logic synthesis, cell placement, etc. [5, 7].

In this way, circuit partitioning has become an important preliminary step in VLSI CAD applications [7]. It appears when trying to exploit the concurrency in the target circuit (data

∗Author to whom correspondence should be addressed.

322 GIL ET AL.

parallelism) instead of exploiting the concurrency of the algorithm (functional parallelism) [24]. In any parallel application, the workload distribution among the processors of a parallel computer is an important factor for efficient use of the parallel computer. For some applications it is difficult to provide a graph model for the processing and communication volumes corresponding to the tasks of a program, and the usefulness of procedures for the workload distribution based on graphs is reduced. In such circumstances, a dynamic load-balancing procedure is required [25]. However, the testing application we are interested in is usually based on applying a given procedure to the different circuit elements (logic gates and connections between them). Thus, as the data structure of the algorithm is defined by the corresponding netlist, it is relatively easy to describe the program by a graph. The volume of processing associated with each node is that corresponding to the application of the algorithm to the elements of the circuit allocated to a given processor, and the communication cost results from transferring data between processors to which interconnected subcircuits have been allocated. Due to these characteristics, it is very useful to possess efficient algorithms for circuit partitioning because these would allow a balanced distribution of the workload among processors. Particularly interesting are those algorithms that can be applied to irregular and sparse graphs, as these are the graphs normally associated with digital circuits.

In this paper we present a procedure for circuit partitioning in the context of parallel test pattern generation. In a moderate time, the algorithm is able to find a partitioning of the circuit graph so that the overall parallel run time of the test generation process is minimised. This implies both maximizing the processors' concurrency and minimizing the communication overhead; thus the objective function that we have used takes these two objectives into account simultaneously.

Several approaches for circuit partitioning have been reported [3, 4, 8, 10, 15, 17–19, 21, 22, 27, 29, 31]. They can be classified as combinatorial or move-based approaches [10, 17, 22], approaches based on geometric representations [15, 18], multilevel and hierarchical clustering [8, 19, 21, 31], and hybrid schemes [4] that combine different types of approaches and can maximize their advantages.

The procedure proposed here belongs to the class of move-based approaches, and it is also a hybrid scheme in which the solution is built iteratively from an initial solution by applying a move or transformation to the current solution. The set of possible transformations that can be applied to a given solution defines the neighbourhood structure of the solution space, which is explored by repeatedly moving from the current solution to a neighbouring one. These move-based procedures are simple to describe and implement, and thus this kind of procedure is the most frequently used, together with a multilevel approach in some cases. The move-based procedures include iterative improvement methods [10, 22, 27], which move from the current solution to the best solution in its neighbourhood, and stochastic hill-descending procedures such as those based on Simulated Annealing (SA) [1], Tabu Search (TS) [16], and Genetic Algorithms (GA) [26], which allow movements towards solutions worse than the current one in order to escape from local minima.

Iterative improvement algorithms such as the algorithms of Kernighan-Lin (KL) [22] and Fiduccia-Mattheyses (FM) [10] for graph bipartitioning, and [19, 21, 27, 31] for partitioning into multiple blocks, are widely applied and have almost become standards, their results frequently being used for comparison with other methods. The stochastic hill-descending


procedures previously indicated are metaheuristics that allow the user to define the time complexity by deciding to trade solution quality for speed. Thus, these metaheuristics are very suitable when it is important to have a good solution in a limited amount of time. This is the situation when the partitioning algorithm is used to distribute the load in a parallel program, because spending a large amount of time to obtain the partition will limit the efficiency.

SA is used in [29] to find near-optimal solutions to the problem of partitioning a digital combinational circuit for pseudo-exhaustive testing. An algorithm for circuit partitioning based on the TS metaheuristic, also applied to pseudo-exhaustive testing, was reported in [3]. Nevertheless, the goals of this application are different from those considered in the present paper. In [3], the partitioning problem involves the division of the target circuit into non-overlapping subcircuits with no more than a given number of inputs each, and is subject to some connectivity constraints. In our case the number of inputs to each subcircuit is not limited.

In [20], SA is compared with iterative algorithms and SA is found to outperform the KL algorithm for geometric and random graphs. However, it is suggested that multiple runs of KL with random initial solutions would be better than SA for the kind of graphs that correspond to circuit netlists. SA is not usually seen as an effective approach for VLSI applications because the computing times are quite large. For example, at low temperatures, many candidate solutions are explored and rejected before an improved solution is accepted. Instead, TS uses a tabu list to avoid cycling near local optima and to enable moves towards worse solutions. Thus, it is usually argued that TS is able to explore the solution space more efficiently than SA because it does not waste time in previously visited regions of the solution space. In any case, these approaches need not necessarily be seen as opposed and can be combined to obtain an improved procedure without the drawbacks characteristic of each method.

Thus, hybrid methods have been proposed, such as [4], in which a heuristic that mixes Tabu Search and genetic algorithms is applied to the circuit partitioning problem, and a classification of hybrid algorithms is also provided. A hybrid method that allows the temperature parameter to be strategically manipulated, rather than progressively diminished, has been shown to yield an improved performance over standard SA approaches [9]. The algorithm proposed here can also be considered as a hybrid heuristic, with additional elements of a Tabu Search in a Simulated Annealing algorithm; thus it is termed the Mixed Simulated Annealing and Tabu Search algorithm (MSATS).

Thus, a large number of graph partitioning schemes exist, and they differ in the edge-cut quality produced, run time, degree of parallelism and applicability to certain kinds of graphs. Often, it is not clear which scheme is better under which scenarios. In [28], these properties are categorized for some graph partitioning algorithms, and in [30] the edge-cut quality is analyzed with different benchmarks for some graph partitioning packages.

In the following, Section 2 gives a more precise definition of the circuit partitioning problem and describes the cost function optimized in this paper. The description of the proposed algorithm (MSATS) is provided in Section 3, along with the experimental results for circuit partitioning in the context of parallel test pattern generation. Section 4 extends the original MSATS to work with graphs that are not restricted to directed acyclic graphs (combinational circuits), and finally Section 5 presents the conclusions of the paper.


2. The circuit partitioning problem

The circuit partitioning problem consists of finding a decomposition of the target circuit into non-overlapping subcircuits with at least one logic gate in each subcircuit. Among the different objectives that may be satisfied by the desired partitioning are (i) the minimization of the number of cuts, (ii) the minimization of the number of subcircuits, and (iii) the minimization of the deviation in the number of elements (inputs, logic gates, outputs and fanout points) assigned to each partition.

Criterion (i) corresponds to minimizing the communication cost, since cutting a line usually implies passing data between the processors to which the subcircuits connected by the cut line have been assigned. Criterion (ii) is used when the goal is to determine the partition consuming the fewest resources (processors in this case). Finally, criterion (iii) corresponds to obtaining subcircuits of similar sizes in order to get a balanced workload distribution among processors. As our goal is to use all the available processors in the multicomputer to generate the test patterns in parallel while trying to keep all the processors working during the whole run time, the number of subcircuits is fixed to be equal to the number of available processors in the machine, and the objectives correspond to criteria (i) and (iii). This means obtaining subcircuits with similar sizes to balance the workload of the processors (considered as proportional to the number of nodes), and minimizing the number of cuts. In the following, a mathematical formulation of the problem is provided.

Let G = (X, A) be the directed acyclic graph associated with a combinational circuit C, where X denotes the set of components (inputs, logic gates, and outputs) and A the set of lines used for signal propagation. The nodes of X can be classified as inputs, logic gates and outputs of circuit C. Thus X is the union of three disjoint sets: the set of inputs E, the set of logic gates P (nodes), and the set of outputs O. Figure 1 shows an example circuit with its graph representation.

Figure 1. Representation of a combinational circuit with 7 logic gates (a) and the directed graph associated with it (b). Both panels show inputs e1–e8, gates p1–p7 and outputs o1–o2.
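Under this graph model, counting the cuts of a partition reduces to counting the elements of A whose endpoints lie in different subsets. The following is a minimal Python sketch, assuming that each directed edge stands for one signal line; the names `count_cuts` and `part_of` are hypothetical, and the toy graph is not the circuit of Figure 1.

```python
def count_cuts(edges, part_of):
    """Count edges (u, v) whose endpoints are assigned to different subcircuits."""
    return sum(1 for u, v in edges if part_of[u] != part_of[v])


# Toy graph (hypothetical): two inputs, two gates, one output.
edges = [("e1", "p1"), ("e2", "p1"), ("p1", "p2"), ("p2", "o1")]
part_of = {"e1": 0, "e2": 0, "p1": 0, "p2": 1, "o1": 1}

# Only the line p1 -> p2 crosses between subcircuit 0 and subcircuit 1.
print(count_cuts(edges, part_of))  # prints 1
```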


The problem is to find a partition of X into a fixed number K of subsets $X_k$ $(k = 1, \ldots, K)$ such that each induced subgraph satisfies the following conditions:

1. $X = \bigcup_{k=1}^{K} X_k$ and $X_k \cap X_h = \emptyset$, $\forall k \neq h$, $(k, h) \in \{1, \ldots, K\}^2$.

2. $p_k = |X_k \cap P| \neq 0$, $\forall k = 1, \ldots, K$.

3. $L_i \le p_k \le L_u$, with $L_i = \lceil n/K \rceil - \lceil n/K \rceil \cdot \theta$ and $L_u = \lceil n/K \rceil + \lceil n/K \rceil \cdot \theta$, $k = 1, \ldots, K$; where $n = |P|$; $\lceil n/K \rceil$ represents the number of gates that should be included in each subcircuit to obtain a partition of similarly sized subcircuits; and $\theta$ is the parameter representing the proportion of gates that is tolerated as an imbalance with respect to $\lceil n/K \rceil$, or perfect balance. In the first part of this work (MSATS), $\theta$ has been set to values between 0.2 and 0.3, and to 0.03 in the second part (RMSATS).

4. $G_k(X_k, A_k)$ is a connected graph, $\forall k = 1, \ldots, K$.
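The four conditions can be checked mechanically. The sketch below is only an illustration under stated assumptions: the target size n/K is rounded up, connectivity is tested while ignoring edge direction, and the names `balance_bounds`, `is_connected`, `is_feasible` and the toy adjacency map are hypothetical.

```python
import math
from collections import deque


def balance_bounds(n_gates, K, theta):
    """Return (L_i, L_u). Rounding n/K up is an assumption of this sketch."""
    target = math.ceil(n_gates / K)
    return target - target * theta, target + target * theta


def is_connected(nodes, adj):
    """Breadth-first search restricted to `nodes`, ignoring edge direction."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def is_feasible(parts, gates, adj, theta):
    """Check conditions 1-4 for a list of node sets `parts`."""
    all_nodes = [x for part in parts for x in part]
    # Condition 1: the subsets are disjoint and cover every node.
    if len(all_nodes) != len(set(all_nodes)) or set(all_nodes) != set(adj):
        return False
    lower, upper = balance_bounds(len(gates), len(parts), theta)
    for part in parts:
        p_k = len(set(part) & gates)
        # Conditions 2 and 3: at least one gate, and a balanced gate count.
        if p_k == 0 or not (lower <= p_k <= upper):
            return False
        # Condition 4: the induced subgraph is connected.
        if not is_connected(part, adj):
            return False
    return True


# Toy circuit (hypothetical): undirected adjacency lists for every node.
adj = {"e1": ["p1"], "e2": ["p2"], "p1": ["e1", "p2"],
       "p2": ["e2", "p1", "o1"], "o1": ["p2"]}
gates = {"p1", "p2"}
print(is_feasible([{"e1", "p1"}, {"e2", "p2", "o1"}], gates, adj, 0.2))  # True
print(is_feasible([{"e1", "p2"}, {"e2", "p1", "o1"}], gates, adj, 0.2))  # False
```

The second partition fails condition 4, because e1 and p2 are not connected inside their own subset.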

In this way, the problem can be formulated as a combinatorial optimization problem, which means:

minimize c(s) (or maximize c(s)), subject to s ∈ S

where S is a discrete set of feasible solutions and c(s) is the cost or objective function. Thus, solving a combinatorial optimization problem implies finding the best or optimal solution among a finite or countably infinite number of possible solutions. In [29] the formulation of the partitioning problem as an integer linear programming problem and as a quadratic transportation problem is provided. Here, given a circuit graph G = (X, A), the cost function to minimize is defined as:

$$c(s) = \alpha \cdot \mathrm{n\_cuts}(s) + \beta \cdot \sum_{k=1}^{K} 2^{\mathrm{deviation}(k)} \qquad (1)$$

where deviation(k) is the amount by which the number of gates in the subcircuit $G_k$ falls outside the bounds $L_u$ or $L_i$:

$$\mathrm{deviation}(k) = \max\{0,\ |X_k \cap P| - L_u,\ L_i - |X_k \cap P|\}$$

n_cuts(s) is the number of cuts of the solution, and s is any solution to the circuit partitioning problem, feasible or not, i.e. satisfying condition 3 above or not. Thus, whenever, for all $k = 1, \ldots, K$ in a given partition s, the deviation in the number of gates of $G_k$ with respect to $\lceil n/K \rceil$ is less than $\theta \cdot \lceil n/K \rceil$, the solution is feasible and the cost is $c(s) = \alpha \cdot \mathrm{n\_cuts}(s) + \beta \cdot K$, since deviation(k) = 0 $(k = 1, \ldots, K)$. The second term in (1) penalizes the deviation from the feasible solution space, and its magnitude is determined by the constant $\beta$. Nevertheless, depending on the relative magnitude of $\alpha$ and $\beta$, a transition to a solution s with a deviation higher than $\theta \cdot \lceil n/K \rceil$ in the number of gates of some subcircuit can still reduce the cost function if the reduction in the number of cuts is sufficiently high. For example, if $\alpha = A \cdot \beta$, a transition to a solution that reduces n_cuts(s) by one will still reduce the cost function whenever $\left(\sum_{k=1}^{K} 2^{\mathrm{deviation}(k)}\right) - K$ is less than the factor A.
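The cost function (1) and the role of the factor A can be illustrated with a short Python sketch. The function names, the bounds 70 and 80, and the values of α and β are hypothetical choices made only for this example.

```python
def deviation(p_k, lower, upper):
    """Amount by which the gate count p_k falls outside [lower, upper]."""
    return max(0, p_k - upper, lower - p_k)


def cost(n_cuts, gate_counts, lower, upper, alpha, beta):
    """Equation (1): alpha * n_cuts + beta * sum over k of 2 ** deviation(k)."""
    penalty = sum(2 ** deviation(p_k, lower, upper) for p_k in gate_counts)
    return alpha * n_cuts + beta * penalty


# Hypothetical bounds L_i = 70, L_u = 80 and alpha = A * beta with A = 2.
alpha, beta = 2, 1

# Feasible solution: every deviation is 0, so the penalty term is beta * K.
print(cost(20, [76, 77], 70, 80, alpha, beta))  # 2*20 + (1 + 1) = 42

# One cut fewer, one gate over the bound: penalty 2 + 1, and 3 - K = 1 < A.
print(cost(19, [69, 81], 70, 80, alpha, beta))  # 2*19 + (2 + 2) = 42
print(cost(19, [70, 81], 70, 80, alpha, beta))  # 2*19 + (1 + 2) = 41

# One cut fewer, two gates off on both sides: penalty 4 + 4, and 8 - K = 6 >= A.
print(cost(19, [68, 82], 70, 80, alpha, beta))  # 2*19 + (4 + 4) = 46
```

In the third call the cost drops from 42 to 41 even though the solution is infeasible, while in the last call the exponential penalty outweighs the saved cut.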


The proposed cost function does not take into account the connection topology of the multicomputer where the parallel program is executed. In the present paper the communication cost has been assumed to depend only on the volume of communications between the processors, thus considering the distance between any pair of processors to be equal. This assumption holds in commercial architectures that use specific hardware or a software layer for message routing, providing homogeneous latency between processors. Moreover, the specific characteristics of the cost function do not influence the evaluation of the proposed algorithm, or the comparison with SA and TS, because, as indicated in Section 1, these are metaheuristics which do not depend on the characteristics of the cost function considered.

In the following sections, the procedure proposed to solve this combinatorial optimization problem is described. As the procedure is based on a mixture of Simulated Annealing and Tabu Search techniques, a brief introduction to the notation and terminology is given first.

3. The MSATS algorithm

As has been said, move-based procedures for solving combinatorial optimization problems, such as the partitioning problem, implement a local search in the solution space S starting from an initial solution s0 ∈ S. At each iteration, a heuristic is used to obtain a new solution s′ in the neighbourhood N(s) of the current solution s, by applying transformations, or moves, to s. Every feasible solution s′ ∈ N(s) is evaluated according to the cost function to be optimized, which determines the change in the value of the cost function, move_value = c(s′) − c(s). The basic local search approach corresponds to the so-called hill-descending algorithms, in which a monotone sequence of improving solutions is examined until a local optimum is found. Hill-descending algorithms always stop at the first local optimum. To avoid this drawback, several metaheuristics have been proposed in the literature, such as Simulated Annealing [1] and Tabu Search [16]. These use mechanisms that allow moves which increase the cost of the current solution in an attempt to escape from local optima. Simulated Annealing and Tabu Search have been widely applied to circuit partitioning and many other combinatorial optimization problems. The MSATS (Mixed Simulated Annealing Tabu Search) procedure described in this section is a hybrid method that takes advantage of both metaheuristics to outperform the results provided by each.

At each iteration of MSATS, admissible moves are applied to the current solution, allowing transitions that increase the cost function as in Simulated Annealing. When a move increasing the cost function is accepted, the reverse move should be forbidden during some iterations in order to avoid cycling, as in Tabu Search. The restrictions on the admissible moves are implemented by using a short term memory function which determines how long a tabu restriction will be enforced and which moves are admissible at each iteration.

The MSATS algorithm is shown in figure 2. It adds the characteristics of the search implemented by Simulated Annealing to the features of Tabu Search, which correspond to a search that centres more on specific zones according to the history and the best movements applied. Thus, a powerful algorithm is provided, as the results given in Subsection 3.1 show.


Figure 2. Algorithm MSATS for circuit partitioning.


In MSATS, the temperature t is used as a parameter to control the probability of accepting a new solution, as in Simulated Annealing. At a given temperature, only the solutions which are selected by using the SA cooling schedule are considered as candidates to produce a transition. Thus, a certain randomness is introduced into a pure TS, in order to explore zones of the solution space that do not appear very promising at first. As the algorithm also has the characteristics of a Tabu Search, it avoids cycles around local minima, allowing a more efficient exploration of the solution space without revisiting solutions, as may occur in a pure SA.
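The interplay between the temperature-controlled acceptance and the tabu restriction can be sketched as follows. This is not the listing of figure 2, which is not reproduced in this transcript: the sketch assumes the usual Metropolis acceptance rule for SA, represents the short term memory as a set of (gate, subcircuit) pairs, and leaves out the expiry of tabu entries. All names are hypothetical.

```python
import math
import random


def sa_accept(move_value, t, rng=random):
    """Metropolis test (assumed): improvements always pass; increases pass
    with probability exp(-move_value / t)."""
    if move_value <= 0:
        return True
    if t <= 0:
        return False
    return rng.random() < math.exp(-move_value / t)


def try_move(gate, src, dst, move_value, t, tabu, rng=random):
    """Apply the tabu filter first, then the SA test.

    Returns True if the move of `gate` from subcircuit `src` to `dst` is taken.
    """
    if (gate, dst) in tabu:
        return False  # moving the gate back to `dst` is currently forbidden
    if not sa_accept(move_value, t, rng):
        return False
    if move_value > 0:
        # A cost-increasing move was accepted: forbid the reverse move.
        tabu.add((gate, src))
    return True


tabu = set()
print(try_move("p5", 1, 2, -1.0, 5.0, tabu))  # improving move: True
print(tabu)                                   # nothing was made tabu: set()
```

Keeping the tabu test separate from the acceptance test makes it easy to see that the tabu list only matters for moves that SA would otherwise accept.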

Two different initial solutions, s1 and s2, have been used in our experiments. They are obtained by fast algorithms that assign n/K nodes to each partition. The initial solution s1 is obtained by an algorithm named Input Partitioning, in which the circuit graph is traversed in a depth-first way starting from the inputs. Solution s2 is provided by an algorithm called Output Partitioning, which processes the graph in a depth-first manner from the output nodes. These partitioning algorithms have been applied for circuit partitioning in parallel logic simulations; they take O(n) time and obtain partitions in which strongly connected components of the graph are assigned to the same partition. As shown in Subsection 3.1, the quality of the best solution found with MSATS (and also with SA and with TS) is sometimes affected by the choice of one or the other initial solution. Nevertheless, the best initial solution depends on the specific circuit considered and on the control parameters. In many cases both initial solutions provide similar results. Figure 3 shows two examples of initial solutions s1 (figure 3(a)) and s2 (figure 3(b)) for the circuit of figure 1.

In the application of MSATS to the partitioning problem, the neighbourhood N(s) of the current solution s contains all solutions s′ which may be obtained from s by transferring only one gate from one subcircuit (source subcircuit) to another subcircuit (destination subcircuit). The gate that is transferred must belong to the boundary of the corresponding subcircuit, which contains all the gates connected to at least one gate belonging to a different subcircuit. These gates are called boundary gates. Moreover, the destination subcircuit must be one of the subcircuits connected to the gate transferred. In this way, condition 4 described in Section 2 is satisfied during the process.

Figure 3. A partition example using the circuit in figure 1(a), for K = 3, with the s1 initial solution (a) and the s2 initial solution (b). Each panel shows the circuit divided into Subcircuit 1, Subcircuit 2 and Subcircuit 3, with cut lines c1 and c2.

As can be seen in figure 2, the set of solutions explored at a given temperature is defined by selecting the boundary gates according to their level in the circuit. Of course, the set of boundary gates might change during the process as the solution s changes. For example, due to a move of another boundary gate, a boundary gate may cease to be one. Furthermore, a non-boundary gate can become a boundary gate if it is connected to a boundary gate which is moved to a different subcircuit. In the first case, the gate will not be selected, and in the case of a new boundary gate, it can be selected in subsequent moves at the present temperature. In any case, as each gate can be selected only once at a given temperature, the number of solutions explored at this temperature is finite. When a boundary gate is selected, it is allocated to the subcircuit having the most lines connected to this boundary gate. Whenever this move is accepted and implies an increase in the cost function, the inverse move, which allocates the gate to the initial subcircuit, is included in the short term memory to tag it as a tabu move. The move is maintained in the short term memory only during the next iteration of the while loop because, after one iteration, it is quite likely that the present solution has changed enough with respect to the solution to which the move was applied.
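The selection of boundary gates and of the destination subcircuit can be sketched in Python as follows. This is only an illustration: it assumes an undirected adjacency map, counts every neighbouring node as one connected line, excludes the gate's own subcircuit as a destination, and breaks ties by the order in which neighbours are listed. The names and the toy data are hypothetical.

```python
from collections import Counter


def boundary_gates(gates, adj, part_of):
    """Gates connected to at least one gate of a different subcircuit."""
    return {g for g in gates
            if any(n in gates and part_of[n] != part_of[g] for n in adj[g])}


def destination(gate, adj, part_of):
    """Other subcircuit with the most lines connected to `gate`, or None."""
    counts = Counter(part_of[n] for n in adj[gate]
                     if part_of[n] != part_of[gate])
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy chain of four gates split into two subcircuits (hypothetical data).
gates = {"p1", "p2", "p3", "p4"}
adj = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1", "p4"], "p4": ["p3"]}
part_of = {"p1": 1, "p2": 1, "p3": 2, "p4": 2}

print(sorted(boundary_gates(gates, adj, part_of)))  # ['p1', 'p3']
print(destination("p1", adj, part_of))              # 2
print(destination("p2", adj, part_of))              # None
```

Because the boundary set depends on `part_of`, it has to be recomputed (or updated incrementally) after each accepted move, which matches the remark that gates can enter or leave the boundary during the process.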

Figure 4 shows an example of boundary gate movements. After selecting the boundary gate, the algorithm determines whether it is possible to move it according to the problem constraints. If a move of the selected gate is allowed, the gate is allocated to the subcircuit with the most gates connected to it. In this example the algorithm begins with p1 and p2 (figure 4(a)). As these gates are not boundary gates, they are not selected, and the algorithm proceeds with gate p3. Gate p3 is a boundary gate, but it cannot be moved because moving it would leave only one gate in subcircuit 2, and each subcircuit must have at least 2 gates. The same happens with gate p4. Gate p5 is a boundary gate that can be moved, and it is allocated to subcircuit 2. In this case, the number of cuts remains the same because one cut appears and one cut disappears. Like p3 and p4, gate p6 cannot be moved, and finally p7 is not a boundary gate. Thus, the circuit after the first iteration of the while loop is the one shown in figure 4(b).

Figure 4. An example of moving the boundary gate p5 from subcircuit 1 (a) to subcircuit 2 (b).


MSATS stops when one of the following conditions is satisfied: (i) the temperature is equal to a final value (in this paper a final temperature equal to zero has been used), (ii) the number of consecutive moves applied without improving the best solution found so far (n_failures) reaches a maximum bound (max_failures), or (iii) the number of iterations reaches the value max_iterations (figure 2).
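The three stop conditions can be combined in a single predicate. The sketch below is an assumption-laden illustration: the function name is hypothetical, the temperature test uses "less than or equal" rather than strict equality to be robust with floating-point values, and the defaults follow the parameter values reported in Subsection 3.1 (max_iterations = 1000, max_failures = 0.25 * max_iterations).

```python
def should_stop(t, n_failures, iteration,
                t_final=0.0, max_failures=250, max_iterations=1000):
    """Return True when any of the stop conditions (i)-(iii) holds."""
    reached_final_temperature = t <= t_final        # condition (i)
    too_many_failures = n_failures >= max_failures  # condition (ii)
    out_of_iterations = iteration >= max_iterations  # condition (iii)
    return reached_final_temperature or too_many_failures or out_of_iterations


print(should_stop(t=3.5, n_failures=10, iteration=400))   # False
print(should_stop(t=3.5, n_failures=250, iteration=400))  # True, condition (ii)
```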

At high temperatures, MSATS behaves almost as a pure SA because most of the transitions are accepted, and it is very unlikely that one of the transitions included in the tabu list needs to be selected. On the other hand, at low temperatures the solutions that increase the cost are rarely selected in an SA, and the effect of Tabu Search is also small. Thus, the difference in the behaviour of MSATS with respect to a pure SA due to the use of tabu moves is more important at intermediate temperatures. The effect of the TS elements included in MSATS allows the reduction of the number of iterations needed to get a solution of a given quality and therefore, in the end, MSATS is able to improve the quality of the solution obtained, as can be seen from the results provided in Subsection 3.1.

3.1. Experimental results

Next, we summarize the results obtained by using the MSATS algorithm. The algorithm was programmed in C and executed on a Power Challenge XL (Silicon Graphics). The circuits used to evaluate the performance of the hybrid algorithm are the ISCAS'85 circuits [6], as they are common benchmark circuits in the context of test pattern generation. The first seven rows of Table 1 (c432, ..., c6288) show the basic characteristics of these circuits. The values of the parameters are set to their best values according to previous experimental results; thus max_failures is set to 0.25*(max_iterations), and tfactor to 0.999. The value of max_iterations is usually taken as 1000, and the initial temperature t0 as 10. For these values, the cost function reaches a stable final value at the end of the 1000 iterations. In most cases, an increase in the number of iterations does not allow us to get better solutions unless the value of tfactor is increased. Nevertheless, this also implies an increase in the run time and, except for extremely large circuits, the new solution does not imply a great improvement.

Table 2 shows the best results obtained by Simulated Annealing, Tabu Search and MSATS compared with the initial solution s1 for partitions into K = 2, 4, 8, 16 and 32 subcircuits. The row "Cuts reduction" indicates the reduction in the number of cuts obtained by MSATS with respect to the best of the solutions provided by SA and TS in each case. As can be seen, the results obtained by MSATS are better than the results obtained with TS and SA in most cases, especially when the circuit size increases. In these circuits, the neighbourhood of a given solution is large, and the effect of considering tabu transitions is more important. Moreover, Table 2 compares the computing times for MSATS, TS and SA. As can be seen, the times for MSATS are similar to those of Simulated Annealing and lower than those of Tabu Search.
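The "Cuts reduction (%)" entries of Table 2 can be reproduced with the following Python sketch. The formula is inferred rather than stated in the text: the percentages in the c432 and c499 rows are consistent with taking the reduction relative to the better of SA and TS and rounding up to the next integer. The function name is hypothetical.

```python
import math


def cut_reduction(sa_cuts, ts_cuts, msats_cuts):
    """Percentage reduction of MSATS over the better of SA and TS.

    Rounding up with math.ceil is an inference from the tabulated values.
    """
    best = min(sa_cuts, ts_cuts)
    return math.ceil(100 * (best - msats_cuts) / best)


# c432, K = 2: SA 21 cuts, TS 22 cuts, MSATS 20 cuts.
print(cut_reduction(21, 22, 20))   # 5
# c499, K = 4: SA 58 cuts, TS 51 cuts, MSATS 54 cuts (MSATS is worse here).
print(cut_reduction(58, 51, 54))   # -5
```

A negative value, as in the second call, marks a case in which one of the pure heuristics found fewer cuts than MSATS.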

With respect to a pure TS algorithm, MSATS accepts a smaller number of solutions that increase the cost function. Thus, although the number of iterations executed by MSATS is similar to that of TS, an iteration consumes more time in the TS algorithm, so TS takes longer to stop. As is shown in figure 4, although a TS algorithm is able to reach solutions with small values of the cost function, it spends a long time trying to improve those solutions without reaching the stop condition, which is determined by the number of moves without obtaining an improvement in the cost function and by a maximum number of iterations.

Table 1. Benchmark graphs.

Graph        |V|      |E|       Max degree  Min degree  Avg degree  File size (KB)
c432         292      432       10          1           2.96        3
c499         334      499       13          1           2.99        4
c880         594      880       9           1           2.96        6
c1355        878      1355      13          1           3.09        10
c3540        2320     3540      17          1           3.05        31
c5315        3414     5315      16          1           3.11        49
c6288        3936     6288      17          1           3.2         58
add20        2395     7462      123         1           6.23        63
data         2851     15093     17          3           10.59       140
3elt         4720     13722     9           3           5.81        136
uk           4824     6837      3           1           2.83        70
add32        4960     9462      31          1           3.82        90
whitaker3    9800     28989     8           3           5.92        294
crack        10240    30380     9           3           5.93        297
wing_nodal   10937    75488     28          5           13.80       768
fe_4elt2     11143    32818     12          3           5.89        341
vibrobox     12328    165250    120         8           26.81       1679
4elt         15606    45878     10          3           5.88        501
fe_sphere    16386    49152     6           4           6.00        540
cti          16840    48232     6           3           5.73        532
memplus      17758    54196     573         1           6.10        536
cs4          22499    43858     4           2           3.90        506
wing         62032    121544    4           2           3.92        1482
fe_tooth     78136    452591    39          3           11.58       5413
598a         110971   741934    26          5           13.37       9030
fe_ocean     143437   409593    6           1           5.71        5242
wave         156317   1059331   44          3           13.55       13479

Furthermore, MSATS is able to obtain solutions with a more balanced distribution of gates among the different subcircuits. For example, in the partition of c432 into two subcircuits, MSATS provides subcircuits of 76 and 77 gates respectively, while TS obtains subcircuits with 65 and 88 gates, and SA subcircuits with 70 and 83.
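The c432 example can be quantified with a small sketch. The measure used here, the excess of the largest subcircuit over the ideal size as a percentage, is one common way to express imbalance and is an assumption of this example; it is not necessarily the deviation term of the paper's cost function.

```python
def imbalance_percent(part_sizes):
    """Excess of the largest part over the ideal (mean) size, in percent."""
    ideal = sum(part_sizes) / len(part_sizes)
    return 100.0 * (max(part_sizes) - ideal) / ideal


# Subcircuit sizes for c432 with K = 2, as quoted in the text.
sizes = {"MSATS": (76, 77), "TS": (65, 88), "SA": (70, 83)}
imbalance = {name: imbalance_percent(parts) for name, parts in sizes.items()}

for name, value in imbalance.items():
    print(f"{name:5s} {value:5.1f}%")
```

With this measure the MSATS bisection is within 1% of perfect balance, whereas the SA and TS bisections are off by several percent.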

Table 2. A summary of results with the best solutions obtained (Cuts) and the run times in seconds (Sec.) for the algorithms: Initial solution (s1), Simulated Annealing (SA), Tabu Search (TS) and MSATS.

| Circuit | Algorithm | K=2 Cuts | K=2 Sec. | K=4 Cuts | K=4 Sec. | K=8 Cuts | K=8 Sec. | K=16 Cuts | K=16 Sec. | K=32 Cuts | K=32 Sec. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c432 | s1 | 66 | 0.2 | 106 | 0.3 | 158 | 0.4 | 186 | 0.5 | 212 | 0.6 |
| | SA | 21 | 2.0 | 45 | 2.2 | 68 | 2.8 | 93 | 3.9 | 142 | 7.1 |
| | TS | 22 | 6.1 | 41 | 9.2 | 62 | 11.2 | 87 | 13.6 | 130 | 15.3 |
| | MSATS | 20 | 2.1 | 40 | 2.3 | 61 | 2.8 | 85 | 3.7 | 128 | 6.9 |
| | Cuts reduction (%) | 5 | | 3 | | 2 | | 3 | | 2 | |
| c499 | s1 | 41 | 0.2 | 94 | 0.3 | 143 | 0.5 | 205 | 0.6 | 225 | 0.7 |
| | SA | 17 | 2.2 | 58 | 2.3 | 86 | 3.2 | 132 | 4.1 | 172 | 7.5 |
| | TS | 18 | 7.2 | 51 | 11.4 | 79 | 15.2 | 132 | 17.1 | 175 | 19.5 |
| | MSATS | 16 | 2.1 | 54 | 2.4 | 78 | 3.1 | 119 | 3.9 | 168 | 7.2 |
| | Cuts reduction (%) | 6 | | -5 | | 2 | | 10 | | 3 | |
| c880 | s1 | 44 | 0.3 | 100 | 0.4 | 155 | 0.6 | 222 | 0.7 | 300 | 0.8 |
| | SA | 19 | 3.0 | 41 | 3.8 | 98 | 5.1 | 129 | 6.3 | 189 | 8.5 |
| | TS | 19 | 18.1 | 43 | 28.5 | 89 | 35.7 | 129 | 42.3 | 185 | 53.3 |
| | MSATS | 17 | 3.1 | 37 | 3.9 | 80 | 4.9 | 129 | 6.1 | 179 | 8.2 |
| | Cuts reduction (%) | 11 | | 10 | | 11 | | 0 | | 4 | |
| c1355 | s1 | 60 | 0.4 | 137 | 0.5 | 240 | 0.7 | 351 | 0.9 | 394 | 1.1 |
| | SA | 28 | 5.3 | 45 | 6.0 | 97 | 8.1 | 98 | 11.2 | 156 | 16.4 |
| | TS | 38 | 21.3 | 46 | 34.6 | 94 | 39.4 | 98 | 47.8 | 148 | 61.9 |
| | MSATS | 17 | 5.6 | 45 | 6.1 | 70 | 7.9 | 98 | 10.2 | 128 | 15.7 |
| | Cuts reduction (%) | 40 | | 0 | | 26 | | 0 | | 14 | |
| c1908 | s1 | 123 | 0.6 | 210 | 0.7 | 298 | 0.9 | 450 | 1.2 | 629 | 1.4 |
| | SA | 55 | 8.0 | 74 | 9.5 | 125 | 11.5 | 168 | 12.9 | 234 | 20.2 |
| | TS | 50 | 54.2 | 76 | 60.2 | 120 | 68.5 | 156 | 78.5 | 225 | 84.9 |
| | MSATS | 35 | 8.2 | 71 | 9.5 | 108 | 10.9 | 125 | 12.1 | 205 | 19.0 |
| | Cuts reduction (%) | 30 | | 15 | | 10 | | 20 | | 9 | |
| c3540 | s1 | 116 | 0.8 | 333 | 0.9 | 535 | 1.2 | 771 | 1.5 | 1023 | 1.9 |
| | SA | 67 | 15.0 | 156 | 16.2 | 250 | 20.9 | 375 | 26.5 | 478 | 42.8 |
| | TS | 96 | 49.2 | 179 | 83.2 | 259 | 76.4 | 377 | 90.7 | 498 | 124.6 |
| | MSATS | 46 | 15.1 | 132 | 16.9 | 221 | 20.1 | 298 | 25.4 | 455 | 40.6 |
| | Cuts reduction (%) | 32 | | 16 | | 12 | | 21 | | 5 | |
| c5315 | s1 | 188 | 1.1 | 332 | 1.5 | 496 | 1.9 | 654 | 2.3 | 939 | 3.1 |
| | SA | 61 | 20.9 | 162 | 22.5 | 350 | 25.8 | 301 | 33.4 | 542 | 52.3 |
| | TS | 88 | 80.3 | 172 | 93.4 | 309 | 116.4 | 402 | 140.3 | 532 | 203.6 |
| | MSATS | 50 | 21.4 | 160 | 22.9 | 174 | 25.8 | 286 | 31.6 | 407 | 48.7 |
| | Cuts reduction (%) | 24 | | 2 | | 44 | | 5 | | 24 | |
| c6288 | s1 | 60 | 1.4 | 155 | 1.8 | 334 | 2.3 | 671 | 2.9 | 947 | 3.8 |
| | SA | 48 | 28.1 | 135 | 30.8 | 315 | 35.1 | 450 | 44.8 | 745 | 70.5 |
| | TS | 46 | 115.3 | 157 | 145.7 | 320 | 168.5 | 550 | 211.8 | 774 | 290.6 |
| | MSATS | 46 | 28.4 | 102 | 30.6 | 301 | 34.2 | 355 | 42.6 | 524 | 66.9 |
| | Cuts reduction (%) | 0 | | 25 | | 5 | | 22 | | 30 | |

3.2. MSATS in a parallel test pattern generator

The MSATS algorithm has been used as a first step in a parallel test-pattern generator [12, 14]. It starts by applying MSATS to partition the circuit under test, after which each processor receives one of the subcircuits obtained. Thus, all the processors can concurrently apply the test generation algorithm described in [13] to determine the test patterns for the stuck-at faults in the nodes of their corresponding subcircuit. Communication between processors is needed to complete the determination of the test equation in each node, and it grows with the number of cuts among subcircuits. Thus, one way to demonstrate the performance of MSATS is to consider the increase in the speedup provided by the parallel test generator as the number of processors grows. If the speedup grows proportionally to the number of processors or, in other words, if the efficiency is more or less constant, the performance of MSATS is adequate according to the conditions given in Section 2. This behaviour is observed in figure 5, which shows the speedup obtained for different circuits of the ISCAS set as the number of processors grows. Speedup results for the ISCAS’85 circuits are provided in Table 3. The parallel test generator was run on an Intel Paragon multicomputer. The speedups obtained with K processors are given in the columns labeled S_K, and the numbers of cuts produced when the circuit is partitioned into K subcircuits are given in the columns C_K of Table 3.

[Figure 5. Speedup with some ISCAS circuits for test pattern generation. Horizontal axis: Processors (2 to 16); vertical axis: Speedup (0 to 16); curves for c432, c499 and c880.]
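The relation between speedup and efficiency used in this argument can be checked directly. The sketch below takes the c432 speedups reported in Table 3 and computes the efficiency as speedup divided by the number of processors, which is the standard definition; the variable and function names are illustrative only.

```python
# Speedups S_K for c432, copied from Table 3 (K = number of processors).
speedup_c432 = {2: 1.69, 4: 3.70, 8: 6.23, 12: 8.76, 16: 10.34}


def efficiency(speedup, processors):
    """Parallel efficiency: speedup per processor."""
    return speedup / processors


efficiency_c432 = {k: efficiency(s, k) for k, s in speedup_c432.items()}

for k, e in efficiency_c432.items():
    print(f"K={k:2d}  speedup={speedup_c432[k]:5.2f}  efficiency={e:.2f}")
```

For this circuit the efficiency stays between roughly 0.65 and 0.93 over the range of 2 to 16 processors.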

Table 3. Speedups (S_K), number of cuts (C_K) and fault coverages obtained with the parallel test-pattern generator.

| Circuit | Faults | S_2 | C_2 | S_4 | C_4 | S_8 | C_8 | S_12 | C_12 | S_16 | C_16 | Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c432 | 864 | 1.69 | 20 | 3.70 | 40 | 6.23 | 61 | 8.76 | 72 | 10.34 | 85 | 98% |
| c499 | 998 | 1.78 | 16 | 3.85 | 54 | 6.34 | 78 | 8.20 | 96 | 11.43 | 119 | 99% |
| c880 | 1760 | 1.82 | 17 | 3.87 | 37 | 7.05 | 80 | 9.34 | 98 | 12.56 | 129 | 100% |
| c1355 | 2710 | 1.90 | 17 | 2.93 | 45 | 6.25 | 70 | 9.45 | 81 | 11.65 | 98 | 98% |
| c1908 | 3816 | 1.97 | 35 | 2.50 | 71 | 6.86 | 108 | 9.67 | 113 | 10.26 | 125 | 98% |
| c3540 | 7080 | 1.78 | 46 | 3.67 | 132 | 6.40 | 221 | 8.30 | 240 | 10.20 | 298 | 98% |
| c6288 | 12570 | 1.67 | 46 | 2.75 | 102 | 6.53 | 301 | 8.50 | 327 | 10.45 | 355 | 97% |

As has been indicated, the cost function used in this paper does not model characteristics of the multicomputer to be used, such as the interconnection topology and the relative communication and computation costs. This is justified because in many cases the communication delays are similar for all the processors in the machine, owing either to suitable dedicated hardware or to a software layer for routing. Also, simple cost functions are more suitable when, as in this case, the speed of the optimization algorithm is important. Nevertheless, the performance obtained by the test pattern generator shows that the model provided by our cost function is good enough.

4. Extensions to other partitioning problems

As discussed above, the MSATS algorithm was proposed as a first step in the application of parallel test pattern generation. Its field of application was thus restricted to combinational circuits or their equivalent representation as acyclic directed graphs. A further requirement was that the partitions must be contiguous (Section 2, condition 4).

Therefore, to compare this technique with other approaches, we extended the original MSATS to create RMSATS (Refinement of MSATS), which can be applied to graphs without the above restrictions. The main differences with respect to MSATS are as follows:

1. The input format used to read and store the graph was adapted to the public domain benchmarks currently used [30] to compare different packages.

2. The initial solution was modified slightly. In MSATS, an input (output) partitioning algorithm is used in which the circuit is traversed depth-first starting from the inputs (outputs). In the new graph configuration this strategy is not possible, because primary inputs (outputs) are not available, or at least are not specified as such. We therefore implemented a growing algorithm that performs a breadth-first traversal of the graph, beginning with a randomly chosen vertex.

3. We included a refinement step at the end of the algorithm. In this step, the balance objective is optimised greedily, without worsening the first objective (edge cut).

4. For comparison with other approaches, the factor θ (imbalance tolerance) was fixed at 0.03, while in MSATS it was fixed in the range of 0.2 to 0.3. The larger tolerance in MSATS reflects the fact that a considerable reduction of the cut weight (interprocessor communication) was considered more important there than node balance.
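The growing step described in item 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the RMSATS implementation: the graph is an adjacency dictionary, vertices are numbered in breadth-first order from a random start (restarting in any unreached component), and that order is cut into K consecutive blocks of nearly equal size. The names `grow_partition` and `grid_adjacency` are hypothetical.

```python
import random
from collections import Counter, deque


def grow_partition(adjacency, k, seed=0):
    """Map each vertex to a part 0..k-1 by slicing a breadth-first ordering."""
    rng = random.Random(seed)
    starts = sorted(adjacency)
    rng.shuffle(starts)  # random start vertex, and random restarts if disconnected
    order, seen = [], set()
    for start in starts:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    n = len(order)
    # Consecutive blocks of the BFS order; block sizes differ by at most one.
    return {v: i * k // n for i, v in enumerate(order)}


def grid_adjacency(rows, cols):
    """Adjacency dictionary of a rows x cols grid graph (toy input)."""
    adj = {r * cols + c: [] for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                adj[v].append(v + 1)
                adj[v + 1].append(v)
            if r + 1 < rows:
                adj[v].append(v + cols)
                adj[v + cols].append(v)
    return adj


part = grow_partition(grid_adjacency(4, 4), k=4, seed=0)
part_sizes = Counter(part.values())
print(sorted(part_sizes.items()))
```

Slicing the traversal order yields balanced parts by construction, which matches the zero imbalance reported for the growing solutions in Table 4, although the cut is typically poor until the search phase improves it.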


4.1. Experimental results

For the RMSATS experimental results, we used the ISCAS’85 benchmarks adapted to the new format, together with public domain benchmarks currently used to compare different approaches. None of the graphs have vertex or edge weights. Table 1 lists all the graphs, their sizes, the maximum, minimum and average degree of the vertices (number of adjacent vertices) and the file size in KB. The test graphs were chosen to form a representative sample of small to large scale real-life problems, and include both 2D and 3D examples of nodal graphs, dual graphs, 3D semi-structured graphs and other non-mesh-like graphs such as combinational circuits (ISCAS and add32).
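The degree statistics listed in Table 1 follow directly from the edge list of an unweighted graph. The sketch below computes them for a toy graph and then checks that the average degree equals 2E/V using the c432 figures from Table 1; the function name and the toy graph are illustrative.

```python
def degree_stats(num_vertices, edges):
    """Return (max, min, average) vertex degree of an undirected graph."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree), min(degree), sum(degree) / num_vertices


# Toy graph: a 4-cycle with one chord between vertices 0 and 2.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
toy_stats = degree_stats(4, toy_edges)
print(toy_stats)

# Each edge contributes to two degrees, so the average degree is 2E / V.
avg_c432 = 2 * 432 / 292  # V = 292 and E = 432 for c432 in Table 1
print(round(avg_c432, 2))
```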

We compared the results of RMSATS with those produced by a public domain partitioning package, METIS 4.0 [21]. The algorithms chosen were pmetis (multilevel recursive bisection) and kmetis (multilevel k-way partitioning).

A large number of experiments were carried out to adjust the parameters of the algorithm, including the initial temperature t0 and tfactor. The values for tfactor depend on the size of the graph and the run time of the algorithm. From small to large graphs, we chose a tfactor in the range of 0.999 to 0.9999 respectively, with an initial temperature of 2. This initial temperature is lower than in MSATS because the graphs are larger, and a lower temperature lets us explore the solution space without wasting time on solutions that only worsen the objective function.
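The effect of these choices can be illustrated numerically. The sketch below assumes the standard Metropolis acceptance rule exp(-delta / t) and geometric cooling; both are conventional for Simulated Annealing but are assumptions here, since the acceptance rule is not restated in this section. The function names are hypothetical.

```python
import math


def acceptance_probability(delta, temperature):
    """Metropolis rule: always accept improvements, sometimes accept worsenings."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


def iterations_to_halve(tfactor):
    """Iterations needed for the temperature to halve under t <- tfactor * t."""
    return math.log(0.5) / math.log(tfactor)


# Probability of accepting a move that adds one cut at the two initial temperatures.
p_msats = acceptance_probability(1, 10.0)   # t0 = 10 (MSATS)
p_rmsats = acceptance_probability(1, 2.0)   # t0 = 2 (RMSATS)
print(f"t0=10: {p_msats:.2f}   t0=2: {p_rmsats:.2f}")

for tfactor in (0.999, 0.9999):
    print(f"tfactor={tfactor}: temperature halves every "
          f"{iterations_to_halve(tfactor):.0f} iterations")
```

Under these assumptions, starting at t0 = 2 accepts noticeably fewer worsening moves from the outset, while raising tfactor from 0.999 to 0.9999 slows the cooling by a factor of about ten.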

Table 4 shows the results obtained using RMSATS for five values of K (number of partitions). The table gives the total weight of cut edges and the imbalance for RMSATS, growing (the initial solution of RMSATS), pmetis and kmetis.
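For unweighted graphs such as these, the total weight of cut edges is simply the number of edges whose endpoints lie in different parts. A minimal sketch on a toy graph (two triangles joined by a single edge; the names are illustrative):

```python
def edge_cut(edges, part):
    """Number of edges whose two endpoints are assigned to different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


# Two triangles (0-1-2 and 3-4-5) connected by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

natural = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}      # one triangle per part
alternating = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}  # same balance, poor cut

print(edge_cut(edges, natural), edge_cut(edges, alternating))
```

Both bisections are perfectly balanced, so the example also shows why balance alone says nothing about the quality of the cut.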

It is important to note that graph partitioning algorithms can usually find higher quality partitions if the balancing constraint is relaxed slightly. Indeed, some of the public domain partitioning packages, such as JOSTLE and METIS, have an in-built, although adjustable, imbalance tolerance of 3%. For this reason, we used the same value in RMSATS, although the algorithm allows a deviation factor (Eq. (1)) of 2%. Thus, the total imbalance tolerance is at worst 5%. However, after the refinement step, this imbalance is reduced to 0% in some cases. Note, too, that since pmetis uses recursive bisection, and thus produces partitions with perfect balance, the cut weight is worse than that achieved by kmetis and RMSATS in most cases.

The results given in Table 4 show that RMSATS improves the total weight of cut edges in many cases, even in larger graphs. On the other hand, run times were longer in every case with RMSATS, by one to three orders of magnitude. Nevertheless, run times were of the order of minutes, or several hours in the worst case (other algorithms take weeks to reach the best cut edge quality [30]). Thus, as indicated in [30], some applications prefer better quality solutions for the cut edge objective even at the cost of a slight increase in the imbalance or in the run time of this phase, since this implies a decrease in the total time required by a parallel application (i.e. a reduction in interprocessor communication time), as in parallel test pattern generation.

It is not easy to identify a trend in the results obtained, other than that RMSATS does particularly well for larger values of K (i.e. 16 or 32 partitions) in most graphs, and does poorly on add32.

Table 4. Number of cuts (Cuts) and imbalance deviation in % (Dev.) for different algorithms (pmetis, kmetis, growing and RMSATS) with all the benchmarks in Table 1 for different numbers of partitions (K).

| Graph | Algorithm | K=2 Cuts | K=2 Dev. | K=4 Cuts | K=4 Dev. | K=8 Cuts | K=8 Dev. | K=16 Cuts | K=16 Dev. | K=32 Cuts | K=32 Dev. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c432 | pmetis | 36 | 0 | 61 | 0 | 80 | 1 | 119 | 10 | 171 | 21 |
| | kmetis | 39 | 1 | 60 | 3 | 87 | 1 | 147 | 37 | 179 | 64 |
| | growing | 98 | 0 | 291 | 0 | 363 | 0 | 397 | 0 | 415 | 0 |
| | RMSATS | 33 | 1 | 56 | 1 | 76 | 3 | 103 | 0 | 147 | 0 |
| c499 | pmetis | 23 | 0 | 61 | 1 | 92 | 5 | 140 | 5 | 188 | 5 |
| | kmetis | 20 | 4 | 62 | 2 | 92 | 3 | 167 | 20 | 211 | 211 |
| | growing | 109 | 0 | 268 | 0 | 387 | 0 | 440 | 0 | 473 | 0 |
| | RMSATS | 20 | 4 | 55 | 5 | 83 | 5 | 131 | 5 | 185 | 1 |
| c880 | pmetis | 24 | 0 | 60 | 1 | 100 | 1 | 153 | 5 | 211 | 8 |
| | kmetis | 44 | 3 | 60 | 3 | 93 | 2 | 152 | 2 | 219 | 24 |
| | growing | 135 | 0 | 331 | 0 | 596 | 0 | 746 | 0 | 802 | 0 |
| | RMSATS | 19 | 1 | 50 | 5 | 84 | 4 | 132 | 3 | 198 | 0 |
| c1355 | pmetis | 24 | 0 | 59 | 0 | 101 | 0 | 134 | 4 | 204 | 6 |
| | kmetis | 22 | 1 | 68 | 2 | 97 | 4 | 144 | 9 | 218 | 13 |
| | growing | 305 | 0 | 627 | 0 | 974 | 0 | 1173 | 0 | 1262 | 0 |
| | RMSATS | 28 | 2 | 59 | 4 | 81 | 5 | 115 | 4 | 184 | 4 |
| c3540 | pmetis | 75 | 0 | 147 | 0 | 289 | 0 | 384 | 1 | 566 | 2 |
| | kmetis | 92 | 3 | 184 | 3 | 264 | 3 | 417 | 3 | 577 | 2 |
| | growing | 300 | 0 | 952 | 0 | 1890 | 0 | 2783 | 0 | 3079 | 0 |
| | RMSATS | 66 | 0 | 141 | 2 | 218 | 4 | 329 | 5 | 490 | 4 |
| c5315 | pmetis | 87 | 0 | 164 | 0 | 250 | 0 | 388 | 1 | 554 | 1 |
| | kmetis | 77 | 1 | 159 | 3 | 273 | 3 | 387 | 4 | 554 | 3 |
| | growing | 531 | 0 | 1316 | 0 | 2896 | 0 | 3980 | 0 | 4426 | 0 |
| | RMSATS | 83 | 0 | 169 | 3 | 229 | 4 | 326 | 5 | 483 | 5 |
| c6288 | pmetis | 153 | 0 | 257 | 0 | 384 | 0 | 504 | 1 | 632 | 2 |
| | kmetis | 161 | 3 | 265 | 1 | 408 | 1 | 507 | 3 | 661 | 2 |
| | growing | 1297 | 0 | 3065 | 0 | 4214 | 0 | 4913 | 0 | 5194 | 0 |
| | RMSATS | 134 | 0 | 246 | 5 | 326 | 2 | 416 | 5 | 561 | 5 |
| add20 | pmetis | 725 | 0 | 1292 | 0 | 1907 | 0 | 2504 | 1 | 3008 | 3 |
| | kmetis | 719 | 3 | 1257 | 3 | 1857 | 5 | 2442 | 7 | 3073 | 74 |
| | growing | 1549 | 0 | 2811 | 0 | 3696 | 0 | 4577 | 0 | 4940 | 0 |
| | RMSATS | 701 | 3 | 1493 | 5 | 1769 | 5 | 2157 | 5 | 2466 | 4 |
| uk | pmetis | 23 | 0 | 67 | 0 | 101 | 0 | 189 | 0 | 316 | 1 |
| | kmetis | 36 | 3 | 64 | 3 | 98 | 2 | 189 | 3 | 302 | 3 |
| | growing | 97 | 0 | 221 | 0 | 469 | 0 | 970 | 0 | 1971 | 0 |
| | RMSATS | 30 | 0 | 69 | 1 | 89 | 4 | 161 | 5 | 259 | 5 |
| add32 | pmetis | 21 | 0 | 42 | 0 | 81 | 0 | 128 | 0 | 288 | 1 |
| | kmetis | 28 | 2 | 44 | 3 | 102 | 3 | 206 | 3 | 352 | 15 |
| | growing | 1067 | 0 | 2968 | 0 | 5166 | 0 | 5866 | 0 | 6169 | 0 |
| | RMSATS | 41 | 1 | 119 | 5 | 191 | 5 | 285 | 5 | 349 | 5 |
| 3elt | pmetis | 108 | 0 | 231 | 0 | 388 | 0 | 665 | 0 | 1093 | 0 |
| | kmetis | 97 | 1 | 213 | 2 | 403 | 2 | 651 | 2 | 1096 | 2 |
| | growing | 239 | 0 | 724 | 0 | 1487 | 0 | 3015 | 0 | 6147 | 0 |
| | RMSATS | 90 | 0 | 201 | 4 | 340 | 5 | 567 | 5 | 959 | 5 |
| data | pmetis | 218 | 0 | 480 | 0 | 842 | 0 | 1370 | 0 | 2060 | 1 |
| | kmetis | 233 | 2 | 454 | 2 | 806 | 3 | 1350 | 3 | 2080 | 2 |
| | growing | 300 | 0 | 770 | 0 | 1493 | 0 | 2988 | 0 | 6018 | 0 |
| | RMSATS | 227 | 2 | 427 | 5 | 727 | 5 | 1168 | 5 | 1818 | 5 |
| whitaker3 | pmetis | 135 | 0 | 406 | 0 | 719 | 0 | 1237 | 0 | 1891 | 0 |
| | kmetis | 133 | 3 | 446 | 3 | 769 | 3 | 1200 | 3 | 1824 | 3 |
| | growing | 304 | 0 | 728 | 0 | 1642 | 0 | 3449 | 0 | 7019 | 0 |
| | RMSATS | 128 | 0 | 385 | 5 | 687 | 5 | 1093 | 5 | 1683 | 5 |
| crack | pmetis | 187 | 0 | 382 | 0 | 773 | 0 | 1255 | 0 | 1890 | 0 |
| | kmetis | 225 | 1 | 408 | 3 | 809 | 3 | 1218 | 3 | 1882 | 3 |
| | growing | 399 | 0 | 935 | 0 | 2048 | 0 | 4196 | 0 | 8403 | 0 |
| | RMSATS | 184 | 0 | 363 | 2 | 685 | 4 | 1098 | 5 | 1689 | 5 |
| fe_4elt2 | pmetis | 130 | 0 | 359 | 0 | 654 | 0 | 1152 | 0 | 1787 | 0 |
| | kmetis | 132 | 0 | 398 | 3 | 684 | 3 | 1149 | 3 | 1770 | 3 |
| | growing | 244 | 0 | 814 | 0 | 1823 | 0 | 3685 | 0 | 7550 | 0 |
| | RMSATS | 130 | 0 | 349 | 0 | 622 | 4 | 1004 | 5 | 1663 | 5 |
| 4elt | pmetis | 154 | 0 | 406 | 0 | 635 | 0 | 1056 | 0 | 1769 | 0 |
| | kmetis | 225 | 0 | 344 | 2 | 614 | 3 | 1099 | 3 | 1784 | 3 |
| | growing | 735 | 0 | 1848 | 0 | 4205 | 0 | 8684 | 0 | 17445 | 0 |
| | RMSATS | 187 | 0 | 437 | 3 | 649 | 5 | 945 | 5 | 1534 | 5 |
| cs4 | pmetis | 414 | 0 | 1154 | 0 | 1746 | 0 | 2538 | 0 | 3579 | 0 |
| | kmetis | 410 | 0 | 1173 | 3 | 1677 | 3 | 2521 | 3 | 3396 | 3 |
| | growing | 1277 | 0 | 3489 | 0 | 7672 | 0 | 15685 | 0 | 27983 | 0 |
| | RMSATS | 377 | 0 | 1018 | 1 | 1484 | 4 | 2245 | 4.6 | 3056 | 4.9 |
| cti | pmetis | 334 | 0 | 1113 | 0 | 2110 | 0 | 3181 | 0 | 4605 | 0 |
| | kmetis | 395 | 2 | 1132 | 1 | 2130 | 2 | 3451 | 3 | 4713 | 3 |
| | growing | 1698 | 0 | 4446 | 0 | 9548 | 0 | 19402 | 0 | 34918 | 0 |
| | RMSATS | 588 | 1 | 1177 | 5 | 1853 | 5 | 2875 | 5 | 4059 | 5 |
| memplus | pmetis | 6337 | 0 | 10559 | 0 | 13110 | 0 | 14942 | 0 | 17303 | 0 |
| | kmetis | 6453 | 3 | 10483 | 3 | 12615 | 3 | 14604 | 6 | 16821 | 6 |
| | growing | 13433 | 0 | 23063 | 0 | 27957 | 0 | 30818 | 0 | 33417 | 0 |
| | RMSATS | 8570 | 4 | 12438 | 5 | 13640 | 5 | 15301 | 5 | 16269 | 5 |
| fe_sphere | pmetis | 440 | 0 | 872 | 0 | 1330 | 0 | 2030 | 0 | 2913 | 0 |
| | kmetis | 444 | 0 | 903 | 3 | 1306 | 3 | 2012 | 2 | 2842 | 2 |
| | growing | 472 | 0 | 1342 | 0 | 2914 | 0 | 6050 | 0 | 12294 | 0 |
| | RMSATS | 386 | 0 | 774 | 4 | 1226 | 4 | 1726 | 5 | 2511 | 5 |
| wing_nodal | pmetis | 1820 | 0 | 4000 | 0 | 6070 | 0 | 9290 | 0 | 13237 | 0 |
| | kmetis | 1855 | 0 | 4355 | 2 | 6337 | 3 | 9465 | 3 | 12678 | 3 |
| | growing | 5368 | 0 | 14118 | 0 | 29960 | 0 | 39425 | 0 | 44408 | 0 |
| | RMSATS | 1723 | 0 | 3816 | 5 | 5419 | 5 | 8392 | 5 | 11741 | 5 |
| wing | pmetis | 950 | 0 | 2086 | 0 | 3205 | 0 | 4666 | 0 | 6700 | 0 |
| | kmetis | 909 | 1 | 1943 | 3 | 3120 | 3 | 4652 | 3 | 6613 | 3 |
| | growing | 2785 | 0 | 7062 | 0 | 15375 | 0 | 31486 | 0 | 63456 | 0 |
| | RMSATS | 1353 | 0 | 2069 | 5 | 2877 | 5 | 4254 | 5 | 5938 | 5 |
| vibrobox | pmetis | 12427 | 0 | 21471 | 0 | 28177 | 0 | 37441 | 0 | 46112 | 0 |
| | kmetis | 11952 | 1 | 23141 | 2 | 29640 | 3 | 38673 | 3 | 45613 | 3 |
| | growing | 27835 | 0 | 64754 | 0 | 88603 | 0 | 105714 | 0 | 118650 | 0 |
| | RMSATS | 11604 | 0 | 20500 | 5 | 30572 | 5 | 36487 | 5 | 44678 | 5 |
| fe_ocean | pmetis | 505 | 0 | 2039 | 0 | 4516 | 0 | 9613 | 0 | 14613 | 0 |
| | kmetis | 536 | 0 | 2194 | 2 | 5627 | 2 | 10253 | 3 | 16604 | 3 |
| | growing | 2376 | 0 | 7938 | 0 | 17855 | 0 | 37430 | 0 | 76065 | 0 |
| | RMSATS | 464 | 0 | 2349 | 1 | 5481 | 5 | 9051 | 5 | 13923 | 5 |
| fe_tooth | pmetis | 4292 | 0 | 8577 | 0 | 13653 | 0 | 19346 | 0 | 29215 | 0 |
| | kmetis | 4262 | 2 | 7835 | 3 | 13544 | 3 | 20455 | 3 | 28572 | 3 |
| | growing | 7785 | 0 | 32262 | 0 | 71076 | 0 | 145539 | 0 | 218804 | 0 |
| | RMSATS | 4086 | 0 | 11817 | 5 | 13152 | 5 | 21562 | 5 | 26981 | 5 |
| 598a | pmetis | 2504 | 0 | 8533 | 0 | 17276 | 0 | 28922 | 0 | 44760 | 0 |
| | kmetis | 2533 | 0 | 8495 | 1 | 17137 | 3 | 28647 | 3 | 44398 | 3 |
| | growing | 26129 | 0 | 69388 | 0 | 145234 | 0 | 181053 | 0 | 243981 | 0 |
| | RMSATS | 3881 | 0 | 12603 | 5 | 22734 | 5 | 27976 | 5 | 40896 | 5 |
| wave | pmetis | 9493 | 0 | 23032 | 0 | 34795 | 0 | 48106 | 0 | 72404 | 0 |
| | kmetis | 9655 | 0 | 21682 | 2 | 33146 | 3 | 48183 | 3 | 67860 | 3 |
| | growing | 32380 | 0 | 85289 | 0 | 184382 | 0 | 381418 | 0 | 514427 | 0 |
| | RMSATS | 12744 | 0 | 21390 | 5 | 38522 | 5 | 51920 | 5 | 64983 | 5 |

5. Conclusions

In this paper we have developed a new algorithm for the circuit partitioning problem in the framework of parallel circuit testing. More specifically, it has been included in a parallel test pattern generator based on partitioning the circuit under test. The problem is formulated as a combinatorial optimization problem by using a cost function comprising the contribution of the number of cuts and the deviation with respect to a balanced distribution of the gates among the different subcircuits.

The new algorithm, called MSATS, has been proposed to obtain good solutions in a small amount of time. It reduces the possibility of cycles in the search process by applying the Tabu Search characteristics to a Simulated Annealing algorithm. The results provided show that MSATS outperforms the TS and SA algorithms when applied to the same cost function. To compare with other state-of-the-art approaches, an extended version has been included, and the results show that this technique is comparable in terms of solution quality and even run time.
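The general idea of adding a tabu list to Simulated Annealing can be sketched on a toy bisection problem. The code below is a generic illustration and not the authors' MSATS: the swap neighbourhood, the tabu list of recently moved vertices, the use of the edge cut alone as cost, and all parameter defaults are assumptions made for this example.

```python
import math
import random
from collections import deque


def edge_cut(edges, side):
    """Number of edges joining the two sides of a bisection."""
    return sum(1 for u, v in edges if side[u] != side[v])


def sa_with_tabu(edges, n, iterations=2000, t0=2.0, tfactor=0.999,
                 tabu_len=5, seed=1):
    """Simulated annealing over vertex swaps, skipping recently moved vertices."""
    rng = random.Random(seed)
    side = [i % 2 for i in range(n)]       # balanced starting bisection
    cost = edge_cut(edges, side)
    best_side, best_cost = side[:], cost
    tabu = deque(maxlen=tabu_len)          # recently moved vertices are tabu
    t = t0
    for _ in range(iterations):
        a, b = rng.randrange(n), rng.randrange(n)
        # Swapping two vertices on opposite sides keeps the bisection balanced.
        if side[a] != side[b] and a not in tabu and b not in tabu:
            side[a], side[b] = side[b], side[a]
            new_cost = edge_cut(edges, side)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                tabu.extend((a, b))        # forbid moving them back immediately
                if cost < best_cost:
                    best_side, best_cost = side[:], cost
            else:
                side[a], side[b] = side[b], side[a]  # undo the rejected swap
        t *= tfactor                       # geometric cooling
    return best_side, best_cost


# Ring of 12 vertices; the alternating start cuts every one of its 12 edges.
n = 12
ring = [(i, (i + 1) % n) for i in range(n)]
initial_cost = edge_cut(ring, [i % 2 for i in range(n)])
best_side, best_cost = sa_with_tabu(ring, n)
print(initial_cost, best_cost)
```

The tabu list prevents a vertex from being swapped straight back, which is one simple way of discouraging the short cycles that a plain annealing run can fall into at low temperature.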

Acknowledgments

We thank Dr. Inmaculada García for critically reading the manuscript and making several useful remarks. This work has been supported by project TIC2000-1348 (CICYT, Spain).

References

1. E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines. A Stochastic Approach to Combinatorial Optimization and Neural Computing, John Wiley & Sons: New York, 1990.

2. C.J. Alpert and A. Kahng, “Recent developments in netlist partitioning: A survey,” Integration: The VLSI Journal, vol. 19, no. 1/2, pp. 1–81, 1995.

3. A.A. Andreatta and C.C. Ribeiro, “A graph partitioning heuristic for the parallel pseudo-exhaustive logical test of VLSI combinational circuits,” Annals of Operations Research, vol. 50, pp. 1–36, 1994.

4. S. Areibi and A. Vannelli, “Advanced search technique for circuit partitioning,” in DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 16, pp. 77–98, 1993.

5. P. Banerjee, Parallel Algorithms for VLSI Computer Aided Design, Prentice Hall: Englewood Cliffs, NJ, 1994.

6. F. Brglez and H. Fujiwara, “Neutral netlist of ten combinational benchmark circuits and a target translator in FORTRAN,” in Proc. IEEE Int. Symp. Circuits Syst., Special Session ATPG, 1985.

7. J.A. Chandy and P. Banerjee, “A parallel circuit-partitioned algorithm for timing-driven standard cell placement,” Journal of Parallel and Distributed Computing, vol. 57, pp. 64–90, 1999.

8. J. Cong and M. Smith, “A parallel bottom-up clustering algorithm with applications to circuit partitioning in VLSI design,” in Proc. ACM/IEEE Design Automation Conference, 1993, pp. 755–760.

9. K.A. Dowsland, “Simulated annealing,” in Modern Heuristic Techniques for Combinatorial Problems, C.R. Reeves (Ed.), Blackwell: London, 1993, pp. 20–69.

10. C. Fiduccia and R. Mattheyses, “A linear time heuristic for improving network partitions,” in Proc. 19th IEEE Design Automation Conference, pp. 175–181, 1982.

11. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman: San Francisco, 1979.

12. C. Gil and J. Ortega, “A parallel test pattern generator based on Reed-Muller spectrum,” in Euromicro Workshop on Parallel and Distributed Processing, IEEE, pp. 199–204, 1997.


13. C. Gil and J. Ortega, “Algebraic test-pattern generation based on the Reed-Muller spectrum,” IEE Proceedings Computers and Digital Techniques, vol. 145, no. 4, pp. 308–316, 1998.

14. C. Gil, J. Ortega, and M.G. Montoya, “Parallel VLSI test in a shared memory multiprocessors,” Concurrency: Practice and Experience, vol. 12, no. 5, pp. 311–326, 2000.

15. J. Gilbert, G. Miller, and S. Teng, “Geometric mesh partitioning: Implementation and experiments,” in Proceedings of International Parallel Processing Symposium, 1995.

16. F. Glover and M. Laguna, “Tabu Search,” in Modern Heuristic Techniques for Combinatorial Problems, C.R. Reeves (Ed.), Blackwell: London, 1993, pp. 70–150.

17. T. Goehring and Y. Saad, “Heuristic algorithms for automatic graph partitioning,” Technical Report UMSI-94-29, University of Minnesota Supercomputing Institute, 1994.

18. S.W. Hadley, B.L. Mark, and A. Vannelli, “An efficient eigenvector approach for finding netlist partitions,” IEEE Trans. on Computer-Aided Design, vol. 11, no. 7, pp. 885–892, 1992.

19. B. Hendrickson and R. Leland, “A multilevel algorithm for partitioning graphs,” in Proceedings Supercomputing ’95, ACM Press, 1995.

20. D.S. Johnson, C.R. Aragon, and L.A. McGeoch, “Optimization by simulated annealing: An experimental evaluation, Part I: Graph partitioning,” Operations Research, vol. 37, pp. 865–892, 1989.

21. G. Karypis and V. Kumar, “Multilevel K-way partitioning scheme for irregular graphs,” Journal of Parallel and Distributed Computing, vol. 48, no. 1, pp. 96–129, 1998.

22. B.W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” The Bell Sys. Tech. Journal, pp. 291–307, 1970.

23. R.H. Klenke, R.D. Williams, and J.H. Aylor, “Parallel-processing techniques for automatic test pattern generation,” IEEE Computer, pp. 71–84, 1992.

24. V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing. Design and Analysis of Algorithms, The Benjamin/Cummings Publishing, 1994.

25. V. Kumar, A. Grama, and V.N. Rao, “Scalable load balancing techniques for parallel computers,” Journal of Parallel and Distributed Computing, vol. 22, pp. 60–79, 1994.

26. C.R. Reeves, “Genetic algorithms,” in Modern Heuristic Techniques for Combinatorial Problems, C.R. Reeves (Ed.), Blackwell: London, 1993, pp. 151–196.

27. L.A. Sanchis, “Multiple-way network partitioning with different cost functions,” IEEE Trans. on Comp., vol. 42, no. 12, pp. 1500–1504, 1993.

28. K. Schloegel, G. Karypis, and V. Kumar, “Graph partitioning for high performance scientific simulations,” in CRPC Parallel Computing Handbook, Morgan Kaufmann: San Mateo, CA, 2000.

29. I. Shperling and E.J. McCluskey, “Circuit segmentation for pseudo-exhaustive testing via simulated annealing,” in International Test Conference, IEEE, 1987.

30. A.J. Soper, C. Walshaw, and M. Cross, “A combined evolutionary search and multilevel optimisation approach to graph partitioning,” Mathematics Research Report 00/IM/58, University of Greenwich, 2000.

31. C. Walshaw and M. Cross, “Mesh partitioning: A multilevel balancing and refinement algorithm,” SIAM J. Sci. Comput., vol. 22, no. 1, pp. 63–80, 2000.