
Interweaving Heterogeneous Metaheuristics Using Harmony Search

Young Choon Lee and Albert Y. Zomaya Advanced Networks Research Group, School of Information Technologies

The University of Sydney NSW 2006, Australia

{yclee,zomaya}@it.usyd.edu.au

Abstract

In this paper, we present a novel parallel-metaheuristic framework that enables a set of heterogeneous metaheuristics to be effectively interwoven and coordinated. The key player in this framework is a coordinator devised using harmony search, a recent soft computing paradigm that mimics the improvisation process of musicians. To validate the applicability of the framework and evaluate its performance, we have implemented a parallel hybrid metaheuristic using the framework for the task scheduling problem on multiprocessor computing systems. Experimental results verify that the proposed framework is a compelling approach to parallelizing heterogeneous metaheuristics.

1. Introduction

Metaheuristics and hyperheuristics are increasingly adopted to deal with optimization problems for which no polynomial-time solution algorithms are known. While the primary intention of traditional heuristics is to find a reasonably good solution as fast as possible, if necessary at the cost of quality, metaheuristics aim to search a large solution space effectively in a non-deterministic, stochastic way, possibly spending more time, to find an optimal or near-optimal solution. Specifically, metaheuristics are characterized by two essential processes: (1) the expansive exploration of the search space (diversification), and (2) the guided exploitation of good solutions (intensification). The use of memory is another important characteristic; generally, the best solutions found during the search are maintained and constantly updated, and the best solution seen is eventually selected.

Due to the time complexity of most metaheuristics, there has been growing interest in recent years in their parallelization [1], [2], [3], [4] (see [5] for further reading on this topic). Although some serious efforts have been made to hybridize metaheuristics for parallelization [6], [7], [8], many parallel implementations use homogeneous algorithms, such as parallel genetic algorithms (GAs) and parallel tabu search (TS) algorithms [9]. These parallel implementations can be classified as fine-grained or coarse-grained depending on the degree of parallelism, and their communication models are synchronous and/or asynchronous.

When designing a parallel metaheuristic with heterogeneous algorithms, the coordinator, or central communicator, is the key component: its effective coordination guides these algorithms toward an optimal solution. To this end, harmony search (HS) [10], a soft computing algorithm inspired by the improvisation process of musicians, is selected for our framework. The coordination process in our framework is analogous to the process by which different musicians, while preparing an improvisation, adjust the sounds they play to reach the best harmony. HS has been applied widely and successfully to various optimization problems [11], [12], [13].

This paper presents a novel parallel-metaheuristic framework (PMF), which is composed of a set of heterogeneous worker metaheuristics (WMHs) and a harmony-search-based coordinator (HSC). The coordinator enables the interwoven heterogeneous metaheuristics to share each other's best solutions to help them escape local optima; hence, their solutions are harmonized with each other. To validate the applicability of the framework and evaluate its performance, we have implemented a parallel hybrid metaheuristic (HS-PHM) using the framework for the task scheduling problem on multiprocessor computing systems (e.g., computer clusters).

1. Set HM to initial (randomly generated) harmonies
2. Create a new harmony h′ from harmonies in HM and randomly generated sounds with a probability of pc
3. for each sound si′ in h′ do
4.     Adjust its pitch with a probability of pa
5. if h′ is better than the worst harmony hw in HM then replace hw with h′
6. Repeat from Step 2 until a termination criterion is reached

Figure 1. Pseudo-code of HS

1. Initialize HM with randomly generated harmonies
2. while all harmonies in HM do not have the same quality do
3.     Wait until a WMH, wmh′, reports its best solution s*
4.     Replace the worst harmony hw in HM with s* if s* is better than hw
5.     Generate new harmonies using harmonies in HM
6.     Adjust pitches of new harmonies
7.     Update HM with new harmonies
8.     Send a set H* of best harmonies to wmh′
9. end while
10. Send a termination signal to all WMHs
11. Select the best harmony in HM

Figure 2. The workings of HSC

We have chosen three representative metaheuristics, a GA, simulated annealing (SA) and an artificial immune system (AIS), as worker components. Our evaluation study has been conducted with an extensive set of experiments. The results clearly show the applicability and practicality of the proposed framework.

The remainder of this paper is organized as follows. The proposed framework is described in detail in Section 2. Section 3 presents an implementation of our framework for the task scheduling problem. In Section 4, the details of our evaluation study and results are presented. We conclude this paper in Section 5.

2. Parallel-metaheuristic framework

In this section, we give a brief description of HS and then present PMF.

2.1. Harmony search

Musical improvisation is the practice of making and playing music spontaneously. As a short preparation for an improvisation, musicians play a series of trials on their instruments to find the best harmony. This preparation, which HS models, consists of a series of processes: harmony generation (randomly or using previous harmonies), pitch adjustment, ensemble inclusion, and dissonance elimination. These processes have similarities to those in other evolutionary algorithms; the pitch adjustment process, for example, can be seen as the mutation operator in GAs. However, HS has several unique features that make it attractive compared with other metaheuristics, including the generation of a harmony from multiple (previous) harmonies, as opposed to two single parent chromosomes in GAs, and the creation of an ensemble between two similar (previous) harmonies.

An HS algorithm repeatedly constructs harmonies and performs pitch adjustment on each of them until a satisfactory harmony is identified. The construction of harmonies makes use of a harmony memory (HM) in which a number of the best harmonies previously found (or random harmonies, initially) are maintained; this HM is similar to the memory of musicians. A new harmony is made by selecting and combining sounds, played by different musicians, from harmonies in the HM; the selected sounds then have their pitches adjusted for a better harmony. The new harmony replaces the worst harmony in HM if it is better. This replacement policy ensures the quality of the harmonies in HM.
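To make this construct-adjust-replace loop concrete, the following is a minimal Python sketch of serial HS; the objective function, domain, and parameter values (pc, pa, HM size) are illustrative placeholders, not the settings used in this paper.

import random

HM_SIZE, P_C, P_A, ITERATIONS = 30, 0.9, 0.3, 10000   # illustrative parameters
N_SOUNDS = 8                                          # sounds per harmony
DOMAIN = range(10)                                    # legal values for a sound

def quality(h):
    # Placeholder objective: lower is better (a makespan would go here).
    return sum((s - 5) ** 2 for s in h)

# Step 1: initialize the harmony memory (HM) with random harmonies.
hm = [[random.choice(DOMAIN) for _ in range(N_SOUNDS)] for _ in range(HM_SIZE)]

for _ in range(ITERATIONS):
    # Step 2: each sound is taken from a harmony in HM with probability P_C,
    # otherwise generated at random.
    h_new = [random.choice(hm)[i] if random.random() < P_C else random.choice(DOMAIN)
             for i in range(N_SOUNDS)]
    # Steps 3-4: pitch-adjust each sound with probability P_A.
    h_new = [random.choice(DOMAIN) if random.random() < P_A else s for s in h_new]
    # Step 5: the new harmony replaces the worst one in HM only if it is better.
    worst = max(range(HM_SIZE), key=lambda i: quality(hm[i]))
    if quality(h_new) < quality(hm[worst]):
        hm[worst] = h_new

best = min(hm, key=quality)
print(best, quality(best))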

A pseudo-code of HS is presented in Figure 1.

2.2. Design description of PMF

The parallelization of heuristics, specifically metaheuristics, has proven to be an effective technique for increasing the rate of convergence and improving the quality of solutions. While the first advantage (fast convergence) is readily obtained in parallel metaheuristics, the second may not be easily achievable, because the search is often trapped in local optima, as in many serial metaheuristics. Since the area of the solution space searched by a parallel metaheuristic is likely to be more extensive, this trapping issue is relieved to a certain degree. The primary design focus of PMF is the provision of good guidance that helps parallel metaheuristics effectively escape local optima; this eventually leads to finding global optima more often.

PMF is essentially capable of integrating heterogeneous metaheuristics. This integration is enabled by its central component, HSC, detailed in Figure 2. The use of heterogeneous metaheuristics implies that the various unique features devised in these metaheuristics can be exploited to make the search more efficient. Clearly, effective coordination between these metaheuristics is the key to achieving good performance. HSC in PMF not only takes charge of this coordination duty, but also accelerates and streamlines the search. Specifically, it initially generates a set of random harmonies and stores them in HM (Step 1). This HM is then updated with the best solutions reported by the WMHs, and harmonies in HM are further manipulated using pitch adjustment to improve their quality (Steps 3–7). Any updates to harmonies in HM are made according to the replacement policy described in Section 2.1. A set of the best harmonies in HM is then selected and sent to the WMHs. HSC terminates when all harmonies in HM have the same quality. Since the WMHs have different convergence rates, HSC uses asynchronous communication.

Figure 3. Illustration of a search coordinated by HSC (panels (a)–(h) show successive search states of the WMHs)
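The following Python sketch shows one way the HSC loop of Figure 2 could be structured. It is our hedged reading of the figure, not the authors' code: queue.Queue objects (reports and replies) stand in for the asynchronous MPI communication of the actual implementation, and the quality and pitch-adjustment functions are placeholders.

import random

HM_SIZE, N_BEST, DIMS = 30, 5, 8       # illustrative sizes

def quality(h):                        # placeholder objective: lower is better
    return sum(h)

def pitch_adjust(h, p=0.1):            # stand-in for HSC's pitch adjustment
    return [s + random.choice((-1, 1)) if random.random() < p else s for s in h]

def hsc(reports, replies):
    """reports: a queue of (wmh_id, best_solution); replies: {wmh_id: queue}."""
    hm = [[random.randint(0, 9) for _ in range(DIMS)] for _ in range(HM_SIZE)]
    while len({quality(h) for h in hm}) > 1:       # Step 2: stop on uniform quality
        wmh_id, s_star = reports.get()             # Step 3: wait for a WMH report
        worst = max(range(HM_SIZE), key=lambda i: quality(hm[i]))
        if quality(s_star) < quality(hm[worst]):   # Step 4: replacement policy
            hm[worst] = list(s_star)
        for _ in range(N_BEST):                    # Steps 5-7: new, adjusted harmonies
            h_new = pitch_adjust([random.choice(hm)[i] for i in range(DIMS)])
            worst = max(range(HM_SIZE), key=lambda i: quality(hm[i]))
            if quality(h_new) < quality(hm[worst]):
                hm[worst] = h_new
        replies[wmh_id].put(sorted(hm, key=quality)[:N_BEST])  # Step 8
    for q in replies.values():                     # Step 10: signal termination
        q.put(None)
    return min(hm, key=quality)                    # Step 11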

Figure 3 illustrates how HSC coordinates WMHs (a GA and an AIS in this illustration) and streamlines their searches. Figure 3a shows the initial search areas of the WMHs. The WMHs report their best solutions once they reach their termination criteria. In our illustration, both fail to find the global optimum (Figure 3b). HSC allows them to share each other's best solutions (Figure 3c). While HSC at first simply sends the best solutions reported by the WMHs back to them without improvement (Figures 3c and 3d), it then finds an improved solution and sends it to the AIS (Figure 3e); this help from HSC significantly accelerates the overall search (Figures 3f and 3g) and enables the WMHs to eventually find the global optimum (Figure 3h).

3. Application of PMF to task scheduling

The problem of scheduling precedence-constrained tasks on heterogeneous computing systems (e.g., computer clusters) is well known for its NP-hardness, which makes it suitable for evaluating the applicability and practicality of PMF. In this section, we define the task scheduling problem and then describe the three WMHs (GA, SA and AIS) developed and adopted as part of HS-PHM. Note that these WMHs are simply our selection from numerous candidate metaheuristics, and their implementations are similar to those presented in our previous work [14].

3.1. Task scheduling

Parallel programs can, in general, be represented by a directed acyclic graph (DAG). A DAG, G = (N, E), consists of a set N of n nodes and a set E of e edges (Figure 4). A DAG is also called a task graph or macro-dataflow graph. In general, the nodes represent tasks partitioned from an application, and the edges represent precedence constraints. An edge (i, j) ∈ E between task ni and task nj also represents inter-task communication; in other words, the output of task ni has to be transmitted to task nj in order for task nj to start its execution. A task with no predecessors is called an entry task, nentry, whereas an exit task, nexit, is one without any successors. Among the predecessors of a task ni, the predecessor that completes its communication latest is called the most influential parent (MIP) of the task, denoted MIP(ni). The longest path of a task graph is the critical path (CP).
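As a concrete illustration (our own toy example in Python, not the graph of Figure 4), a DAG of this kind can be held in two small maps, one for successors and one for edge weights; predecessors, entry tasks and exit tasks then fall out directly:

# succ maps each task to its successors; comm holds edge weights c_{i,j}.
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
comm = {(0, 1): 4, (0, 2): 2, (1, 3): 3, (2, 3): 5}

# Predecessors, entry and exit tasks follow from the successor map.
pred = {n: [m for m in succ if n in succ[m]] for n in succ}
entry_tasks = [n for n in succ if not pred[n]]   # no predecessors -> n_entry
exit_tasks = [n for n in succ if not succ[n]]    # no successors   -> n_exit
print(entry_tasks, exit_tasks)                   # [0] [3]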

The weight wi on a task ni represents the computation cost of the task. The computation cost of the task on a processor pj is denoted by wi,j, and its average computation cost is denoted by w̄i. The weight on an edge, denoted by ci,j, represents the communication cost between two tasks ni and nj. However, a communication cost is only incurred when two tasks are assigned to different processors; the communication cost between tasks assigned to the same processor is taken to be zero.

The target system used in this work consists of a set P of p heterogeneous processors/machines that are fully interconnected. Inter-processor communication is assumed to proceed at the same speed on all links, without contention. It is also assumed that a message can be transmitted from one processor to another while a task is being executed on the recipient processor, which is possible in many systems.

The earliest start time and the earliest finish time of a task ni on a processor pj are defined as

EST(ni, pj) = 0, if ni = nentry;
EST(ni, pj) = EFT(MIP(ni), pk) + cMIP(ni),i, otherwise, where pk ∈ P is the processor to which MIP(ni) is assigned   (1)

EFT(ni, pj) = EST(ni, pj) + wi,j   (2)

Note that the actual start and finish times of a task ni on a processor pj, denoted by AST(ni, pj) and AFT(ni, pj), can differ from its earliest start and finish times, EST(ni, pj) and EFT(ni, pj), if the actual finish time of another task scheduled on the same processor is later than EST(ni, pj).
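A short Python sketch of equations (1) and (2) may help. The graph, the cost table and the fixed task-processor assignment are illustrative assumptions of ours, and the code folds the MIP definition into a max over the predecessors' data-arrival times, which is equivalent.

succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
comm = {(0, 1): 4, (0, 2): 2, (1, 3): 3, (2, 3): 5}
w = {0: [3, 5], 1: [4, 4], 2: [6, 2], 3: [2, 3]}   # w[i][j]: cost of task i on processor j
pred = {n: [m for m in succ if n in succ[m]] for n in succ}
assign = {0: 0, 1: 0, 2: 1, 3: 1}                  # an arbitrary task-processor mapping

def eft(n):
    p = assign[n]
    if not pred[n]:                 # entry task: EST = 0, so EFT = w_{n,p}
        return w[n][p]
    # Data from predecessor m arrives at EFT(m) plus c_{m,n}; the communication
    # cost is zero when both tasks share a processor. Taking the max over all
    # predecessors is exactly EFT(MIP(n)) + c as in equation (1).
    est = max(eft(m) + (0 if assign[m] == p else comm[(m, n)]) for m in pred[n])
    return est + w[n][p]            # equation (2)

print(eft(3))                       # earliest finish time of the exit task: 14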

Figure 4. A simple task graph

The communication to computation ratio (CCR) is a measure that indicates whether a task graph is communication intensive, computation intensive, or moderate. For a given task graph, it is computed as the average communication cost divided by the average computation cost on a target system.

The problem addressed in this paper is the scheduling of a set of precedence-constrained (interdependent) tasks, comprising a parallel program, onto a set of tightly coupled heterogeneous processors. The primary goal of this scheduling is to make as many appropriate task-processor matches as possible without violating precedence constraints (data dependencies), so that the overall completion time of the parallel program, also called its makespan, is minimized. The makespan in this paper is defined as M = max{AFT(nexit)}, the amount of time taken from when the first task starts running to when the last task completes its execution.

3.2. Algorithm description

Since the WMHs of HS-PHM are implemented only as part of our evaluation study (not as the primary contribution of this work), we take a minimalist approach to their implementation. Clearly, the selection of metaheuristics and their design characteristics substantially affect the performance of parallel metaheuristics developed using PMF.

In all three implementations, for a given state, one of two mutation/neighbor-selection methods (swap and point) is selected with a probability of 0.5, as sketched below.
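A minimal Python sketch of the two operators, assuming a schedule is encoded as a list mapping each task index to a processor index (the paper does not spell out the encoding, so this representation is our assumption):

import random

def swap_mutation(schedule):
    # Exchange the processor assignments of two randomly chosen tasks.
    s = list(schedule)
    i, j = random.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def point_mutation(schedule, n_procs):
    # Reassign one randomly chosen task to a random processor.
    s = list(schedule)
    s[random.randrange(len(s))] = random.randrange(n_procs)
    return s

def neighbor(schedule, n_procs):
    # Either operator is applied with probability 0.5, as in the text.
    if random.random() < 0.5:
        return swap_mutation(schedule)
    return point_mutation(schedule, n_procs)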

3.2.1. AIS

The immune system is a complex and sophisticated mechanism consisting of a diverse set of entities, principles, and processes. Antibodies and antigens are two key players, and their behaviors and interactions are characterized by different principles and processes, such as negative/positive selection, clonal selection, immune networks, and the danger model. An AIS, in general, models one or more of these entities and processes to deal with real-world problems.

The AIS presented in this section (Figure 5) is devised using antibodies and clonal selection. The fundamental principle of clonal selection is that, for a particular antibody, the number of its clones is proportional to its affinity (fitness value), and the antibody is mutated at a rate inversely proportional to its affinity. In other words, superior antibodies (solutions) become dominant in the antibody population, most likely retaining their current forms, whereas inferior ones become extinct unless their quality improves significantly after a substantial number of mutations.

The AIS (Figure 5) initially generates a set AB of 10 antibodies at random, and their affinities (makespans) are computed before the main process starts. Each of these antibodies undergoes the clonal selection process, which consists of clonal expansion (Step 7) and affinity maturation (Step 9). The number of clones (NC) and the mutation probability (MP) in the AIS are calculated as follows:

NC = max(|AB| × 2 − AR(abi)², 2),   (3)

MP = AR(abi) / (|AB| × 2),   (4)

where AR(abi) is the affinity rank of antibody abi.
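Plugging illustrative ranks into the two equations shows the intended behavior: many lightly mutated clones for high-affinity antibodies, and few heavily mutated clones for low-affinity ones. The squared term follows our reconstruction of equation (3) from the extracted text, so treat it as a hedged reading.

AB_SIZE = 10                                  # |AB|, as in the text

def clones_and_mutation(rank):
    nc = max(AB_SIZE * 2 - rank ** 2, 2)      # equation (3), as reconstructed
    mp = rank / (AB_SIZE * 2)                 # equation (4)
    return nc, mp

for rank in (1, 5, 10):                       # best, middle and worst antibody
    print(rank, clones_and_mutation(rank))
# rank 1  -> (19, 0.05): many clones, light mutation
# rank 10 -> (2, 0.5):   few clones, heavy mutation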


1. Initialize C with randomly generated chromosomes
2. Let Cbest = Ø
3. repeat
4.     Let Cnew = Ø
5.     Let c* = the initial best chromosome in C
6.     do
7.         while |Cnew| ≠ |C| − |Cbest| do
8.             Select a set Cp of two chromosomes
9.             Generate offspring Cc from Cp using the crossover operator
10.            Perform mutations on each chromosome cci ∈ Cc
11.            Add cci to Cnew
12.        end while
13.        Let C = Cbest + Cnew
14.        Let c′ = the current best chromosome in C
15.    while c′ ≥ c* and c* is the same for 10 times
16.    Let c* = min(c*, c′)
17.    Send c* to HSC
18.    Let Cnew = Ø
19.    Receive a set Cbest of chromosomes from HSC
20. until termination signal is received from HSC

Figure 6. The algorithm description of GA

1. Initialize AB with randomly generated antibodies
2. Compute affinities of the antibodies in AB
3. repeat
4.     Let ab* = the initial best antibody in AB
5.     do
6.         for each antibody abi ∈ AB do
7.             Generate a set Ci of clones of abi proportional to its affinity
8.             for each clone ci,j in Ci do
9.                 Mutate ci,j inversely proportional to its affinity
10.                if ci,j's affinity is higher than abi's then
11.                    Replace abi with ci,j
12.                end if
13.            end for
14.        end for
15.        Let ab′ = the current best antibody in AB
16.    while ab′ ≥ ab* and ab* is the same for 10 times
17.    Let ab* = min(ab*, ab′)
18.    Send ab* to HSC
19.    Receive a set ABbest of antibodies from HSC
20.    Let AB = AB − the worst |ABbest| antibodies in AB + ABbest
21. until termination signal is received from HSC

Figure 5. The algorithm description of AIS

After the affinity maturation process of a clone, if its affinity is higher (i.e., its makespan is shorter) than its original's, the clone replaces the original. Once the entire population finishes undergoing clonal selection, the current best antibody ab′ is identified and compared with the initial best antibody ab*. If ab′ improves on ab*, it is immediately sent to HSC for further improvement and for information exchange with the other WMHs. Otherwise, the entire population is carried over to the next generation. If there is no improvement in ab* over 10 consecutive generations, ab* is simply sent back to HSC. Within a short time interval, HSC sends a set of good-quality antibodies to the AIS; these antibodies replace the worst antibodies in the population (Step 20). The AIS repeats until HSC sends a termination signal.

3.2.2. GA

GAs are inspired by the process of biological evolution, in which natural selection operates on a population of phenotypes, resulting in differential reproduction that transfers the essential structures of the most successful genotypes to the subsequent generation. The core techniques adopted in GAs, inheritance, selection, recombination and mutation, come from evolutionary biology. GAs are probably the single most popular metaheuristic for the task scheduling problem, because they tend to deliver very competitive schedules in a reasonable amount of time.

The GA implementation in this paper (Figure 6) involves typical genetic operators: crossover with roulette-wheel and tournament selection, and mutation. The chromosome population C has a size of 30 and is initialized with randomly generated chromosomes. Chromosomes in a given generation first undergo crossover to produce offspring chromosomes Cnew (Step 9), and then undergo mutation to further diversify and/or improve these offspring (Step 10). The two chromosomes for a crossover operation are selected using roulette-wheel selection combined with tournament selection with a winning rate of 0.8. Specifically, two pairs of chromosomes in the current population are selected using roulette-wheel selection; the selection probability of each of these four chromosomes is its fitness value divided by the sum of the fitness values of all chromosomes in the population. The chromosomes in each pair then compete with each other (tournament selection), and the one with the better fitness value wins with a rate of 0.8. The two winning chromosomes are used for one-point crossover. Each point in a chromosome is accepted for mutation with a probability of 0.05. Similar to the AIS, the GA contacts HSC if either c* improves or it has not improved over 10 consecutive generations. A sketch of this selection scheme follows.
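A Python sketch of that selection scheme; it assumes a higher-is-better fitness function (e.g., the inverse of makespan), which the paper implies but does not state explicitly:

import random

def roulette(pop, fitness):
    # Select one chromosome with probability proportional to its fitness.
    total = sum(fitness(c) for c in pop)
    r, acc = random.uniform(0, total), 0.0
    for c in pop:
        acc += fitness(c)
        if acc >= r:
            return c
    return pop[-1]

def select_parents(pop, fitness, win_rate=0.8):
    # Two roulette-selected pairs; in each pair the fitter chromosome
    # wins its tournament with probability win_rate.
    winners = []
    for _ in range(2):
        a, b = roulette(pop, fitness), roulette(pop, fitness)
        better, worse = (a, b) if fitness(a) >= fitness(b) else (b, a)
        winners.append(better if random.random() < win_rate else worse)
    return winners

def one_point_crossover(p1, p2):
    cut = random.randrange(1, len(p1))        # a single random crossover point
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]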

3.2.3. SA

Annealing in metallurgy is a process in which a material is heated and then slowly cooled in order to make it less brittle and more workable. SA is a probabilistic problem-solving approach based on the physical annealing process of a solid.

1. Initialize sc with a randomly generated state
2. Let S = sb = sc
3. repeat
4.     Compute the energy ec of sc
5.     Set temperature t based on ec
6.     while S ≠ Ø do
7.         repeat
8.             Select a neighbor state sn of sc
9.             Compute the energy en of sn
10.            if en < ec or a random number n < up then
11.                Let sc = sn
12.                Cool t
13.                Go to Step 19
14.            end if
15.        until sc is not improved for some iterations
16.        Let sc = a state si in S
17.        Let S = S − si
18.        Reheat t
19.        Update sb
20.    end while
21.    Send the best state sb to HSC
22.    Receive a set Sbest of best states from HSC
23.    Let S = Sbest
24.    Let sc = the best state in S
25. until termination signal is received from HSC

Figure 7. The algorithm description of SA

The SA in this paper (Figure 7) incorporates reheating and restarting. It maintains a set of good-quality states passed from HSC for restarting. Each time restarting occurs, a state from this set is used instead of a randomly generated state; this keeps the quality of the state at a certain level. Since our SA, like typical SA approaches, deals with a single state throughout its annealing process, this restarting technique has a particularly significant impact on avoiding local optima, which may in turn lead to the global optimum.

The initial state sc is generated at random and is used to set the initial temperature; that is, the initial temperature is set to a tenth of the initial state's energy (makespan). A neighbor state of the current state is then selected using the mutation operator (either swap or point mutation). If the neighbor state's energy en is lower than the current state's energy ec, it is accepted as the new current state (Step 10). Otherwise, it is accepted with the uphill-move probability up, the Boltzmann factor e^(−(en − ec)/t); with probability 1 − up, another neighbor state is selected and evaluated. This neighbor selection may repeat up to 10 times while no better neighbor state is found and the randomly generated number n, between 0 and 1, remains greater than or equal to up. A worse neighbor state is occasionally accepted in order to avoid getting trapped in a local minimum-energy state; hence the move is uphill. Note that the higher the temperature, the more frequently a worse neighbor state is accepted. When a neighbor state is accepted, the temperature decreases by a fixed factor α of 0.95. By doing so, the search space is confined to a narrower region and the move is more likely to be downhill. If 10 neighbor states are all rejected, the temperature doubles (reheating), constrained by a maximum limit, and the current state is replaced with a state in the set Sbest received from HSC (restarting). Again, reheating and restarting are adopted to escape from local minima. The SA sends its best state after 10 restarting tries and receives a set Sbest of states from HSC; this set is used for restarting in the next annealing process. The acceptance rule is sketched below.
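A compact Python sketch of this acceptance rule; the function names and structure are our reading of Figure 7 and the text above, not the authors' code:

import math, random

ALPHA, MAX_TRIES = 0.95, 10                   # cooling factor and rejection limit

def anneal_step(s_c, e_c, t, t_max, neighbor, energy):
    # Try up to MAX_TRIES neighbors (Steps 7-15 of Figure 7).
    for _ in range(MAX_TRIES):
        s_n = neighbor(s_c)
        e_n = energy(s_n)
        up = math.exp(-(e_n - e_c) / t)       # Boltzmann factor
        if e_n < e_c or random.random() < up: # downhill, or uphill with prob. up
            return s_n, e_n, t * ALPHA        # accept and cool (Steps 11-12)
    # All rejected: keep the current state and reheat (Step 18),
    # capped at a maximum temperature.
    return s_c, e_c, min(t * 2, t_max)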

4. Performance evaluation

This section describes the details of our experiments and then presents the results.

4.1. Experimental settings

Experiments were carried out with two extensive sets of random task graphs and well-known application task graphs on an actual IBM Blade server consisting of 10 nodes, each of which has two 2.40 GHz Intel Xeon processors with 1 GB of shared memory.

The three real-world parallel applications used for our experiments were the Laplace equation solver [15], LU decomposition [16] and the fast Fourier transform (FFT) [17]. A large number of variations were made on these task graphs for more comprehensive experiments. In addition to the task graphs, various different processor characteristics were applied to the simulations. Table 1 summarizes the parameters used in our experiments.

Table 1. Experimental parameters

parameter                   value
the number of tasks         U(10, 600)
CCR                         {0.1, 0.2, 1.0, 5.0, 10.0}
the number of processors    {2, 4, 8, 16, 32, 64}
processor heterogeneity     {100, 200, random}

The total number of experiments conducted using different task graphs on the six different numbers of processors is 72,000. Specifically, the random task graph set consisted of 150 base task graphs generated with combinations of 10 graph sizes, five CCRs and three processor heterogeneity settings. For each combination, 20 task graphs were randomly generated, retaining the characteristics of the base task graph. These 3,000 graphs were each investigated with six different numbers of processors, giving 18,000 experiments. Each of the three applications was investigated using the same number of task graphs (i.e., 18,000 each); hence the figure of 72,000.

The computation and communication costs of the tasks in each task graph were randomly selected from a uniform distribution whose mean equals the chosen average computation or communication cost. The processor heterogeneity value was defined as the percentage of the speed difference between the fastest processor and the slowest processor in a given system. For the real-world application task graphs, the matrix sizes and the numbers of input points were varied so that the number of tasks ranged from about 10 to 600.

HS-PHM has been implemented with the support of the message passing interface (MPI) standard [18] for inter-process communication (i.e., communication between HSC and the WMHs). The size of HM in HSC and the number of best harmonies sent to a WMH each time it reports its best solution were set to 30 and 5, respectively. When experimenting with individual WMHs, for the sake of fairness we tuned the number of generations of each WMH to be similar to the average number of generations it underwent in HS-PHM (AIS: 207, GA: 241 and SA: 13,375). That is, for each test case, the AIS and GA were run for 210 and 250 generations, respectively, and the SA was repeated for 15,000 iterations. In addition, with the AIS and GA, the number of best states in a particular generation carried over to the next generation is the same as the number passed from HSC to individual WMHs (i.e., 5); the stand-alone SA, however, uses a randomly generated state for its reheating process. The pitch adjustment technique used in HSC is the same as the point mutation operator in the AIS and GA.

4.2. Results

The results presented in this section (Table 2, and Figures 8 and 9) clearly show that, as claimed, PMF empowers heterogeneous metaheuristics to cooperate effectively with each other to achieve better performance. Specifically, the superior performance of HS-PHM, developed using PMF for the task scheduling problem, over the three individual metaheuristics verifies the capability of PMF to exploit the strengths of different metaheuristics and coordinate them efficiently.

We used makespan as the performance measure in our evaluation study. For a given task graph, we normalize the makespan of the schedule generated by a particular algorithm to a lower bound: the makespan of the tasks along the CP (i.e., the CP tasks) without considering communication costs. This normalized form of makespan is termed the schedule length ratio (SLR). Formally, the SLR value of the makespan M of a schedule generated for a task graph G by a scheduling algorithm is defined as

SLR = M / ( Σ_{ni ∈ CP} min_{pj ∈ P} {wi,j} )   (5)
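For instance, equation (5) can be computed in Python as follows; the CP tasks, the cost table and the makespan value are illustrative, not taken from our experiments:

cp_tasks = [0, 1, 3]                                # tasks on the critical path
w = {0: [3, 5], 1: [4, 4], 3: [2, 3]}               # w[i][j]: cost of task i on processor j

def slr(makespan):
    # Denominator of equation (5): CP cost ignoring communication,
    # with each CP task charged its cheapest processor.
    lower_bound = sum(min(w[i]) for i in cp_tasks)
    return makespan / lower_bound

print(slr(14.0))                                    # 14 / (3 + 4 + 2) ≈ 1.56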

Table 2. Comparative results

DAG set    HS-PHM over AIS    HS-PHM over GA    HS-PHM over SA
Random     8%                 11%               23%
FFT        4%                 6%                16%
Laplace    7%                 14%               39%
LU         15%                13%               58%
Average    9%                 11%               34%

The performance comparison between HS-PHM and the three individual WMHs summarized in Table 2 confirms that HSC plays a crucial role in helping the WMHs in HS-PHM cope with being trapped in local optima; this capability of HSC in turn contributes to shortening makespans regardless of the type of task graph. Figures 8 and 9 present more specific evidence for this claim: HS-PHM consistently outperforms the others.

Figure 8. Results for random DAGs (average SLR versus CCR ∈ {0.1, 0.2, 1.0, 5.0, 10.0} for HS-PHM, AIS, GA and SA)

Figure 9. Average SLR for real-world application DAGs (average SLR versus CCR ∈ {0.1, 0.2, 1.0, 5.0, 10.0} for HS-PHM, AIS, GA and SA). (a) FFT. (b) Laplace. (c) LU.

5. Conclusion

In this paper, we have presented PMF, a novel HS-based parallel-metaheuristic framework that facilitates the development of parallel metaheuristics from heterogeneous metaheuristics. The advantage of PMF is two-fold: competence in streamlining the search process and efficient coordination of heterogeneous metaheuristics. HS has proven to be an outstanding technique for gluing a diverse set of metaheuristics together for parallel execution. We have supported this claim with the performance evaluation results of HS-PHM, our experimental implementation using PMF for the task scheduling problem on heterogeneous computing systems. In our experiments with various settings, HS-PHM consistently outperformed the individual serial versions of the WMHs it integrates. We plan to investigate other application areas of PMF with a broader range of WMHs, such as TS and ant colony optimization (ACO).

6. References

[1] A. Chipperfield and P. Fleming, "Parallel Genetic Algorithms," in Parallel and Distributed Computing Handbook, A. Y. H. Zomaya, Ed. New York: McGraw-Hill, 1996, pp. 1118–1143.
[2] E. Alba and M. Tomassini, "Parallelism and evolutionary algorithms," IEEE Trans. Evolutionary Computation, vol. 6, no. 5, pp. 443–462, Oct. 2002.
[3] D. Janaki Ram, T. H. Sreenivas, and K. Ganapathy Subramaniam, "Parallel simulated annealing algorithms," Journal of Parallel and Distributed Computing, vol. 37, no. 2, pp. 208–212, Sept. 1996.
[4] E. Alba, G. Leguizamon, and G. Ordonez, "Analyzing the behavior of parallel ant colony systems for large instances of the task scheduling problem," in Proc. Int'l Symp. Parallel and Distributed Systems, Apr. 2005.
[5] E. Alba, Ed., Parallel Metaheuristics: A New Class of Algorithms. New Jersey: Wiley-Interscience, Sept. 2005.
[6] V. Nissen, "Solving the quadratic assignment problem with clues from nature," IEEE Trans. Neural Networks, vol. 5, no. 1, pp. 66–72, 1994.
[7] S. Salcedo-Sanz, Y. Xu, and X. Yao, "Hybrid meta-heuristics algorithms for task assignment in heterogeneous computing systems," Computers and Operations Research, vol. 33, no. 3, pp. 820–835, Mar. 2006.
[8] H. Chen, N. S. Flann, and D. W. Watson, "Parallel genetic simulated annealing: a massively parallel SIMD algorithm," IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 2, pp. 126–136, Feb. 1998.
[9] E. G. Talbi, "A taxonomy of hybrid metaheuristics," Journal of Heuristics, vol. 8, no. 5, pp. 541–564, Sept. 2002.
[10] Z. W. Geem, J. H. Kim, and G. V. Loganathan, "A new heuristic optimization algorithm: harmony search," SIMULATION, vol. 76, no. 2, pp. 60–68, 2001.
[11] Z. W. Geem, "Optimal scheduling of multiple dam system using harmony search algorithm," Lecture Notes in Computer Science, vol. 4507, pp. 316–323, 2007.
[12] Z. W. Geem, K. S. Lee, and Y. Park, "Application of harmony search to vehicle routing," American Journal of Applied Sciences, vol. 2, no. 12, pp. 1552–1557, 2005.
[13] K. S. Lee, Z. W. Geem, S.-H. Lee, and K.-W. Bae, "The harmony search heuristic algorithm for discrete structural optimization," Engineering Optimization, vol. 37, no. 7, pp. 663–684, 2005.
[14] Y. C. Lee and A. Y. Zomaya, "A novel state transition method for metaheuristic-based scheduling in heterogeneous computing systems," IEEE Trans. Parallel and Distributed Systems, vol. 19, no. 9, pp. 1215–1223, Sept. 2008.
[15] M.-Y. Wu and D. D. Gajski, "Hypertool: a programming aid for message-passing systems," IEEE Trans. Parallel and Distributed Systems, vol. 1, no. 3, pp. 330–343, July 1990.
[16] R. E. Lord, J. S. Kowalik, and S. P. Kumar, "Solving linear algebraic equations on an MIMD computer," J. ACM, vol. 30, no. 1, pp. 103–117, Jan. 1983.
[17] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. MIT Press, 1990.
[18] Message Passing Interface Forum, "MPI: a message-passing interface standard," Int'l J. Supercomputer Applications and High Performance Computing, vol. 8, p. 165, 1994.