
A Static Task Scheduling Algorithm in Grid Computing

Dan Ma1, Wei Zhang2

1 School of Computer Science, HuaZhong University of Science and Technology, WuHan, 430074, China.

[email protected]

2 WuHan Ordnance N.C.O. Academy of PLA, WuHan, 430075, China

Abstract. Task scheduling in heterogeneous computing environments such as grid computing is a critical and challenging problem. Building on traditional list scheduling, we present a static task scheduling algorithm, LBP (Level and Branch Priority), adapted to the heterogeneous hosts of grid computing. The main contribution of LBP is a new method for determining task priority in the task ready list. Compared with influential algorithms for heterogeneous computing environments, such as HEFT and CPOP, LBP achieves better task scheduling performance at the same time complexity.

1. Introduction

Task scheduling remains one of the most challenging problems urgently awaiting a solution, both in grid computing and in traditional distributed and parallel computing. For homogeneous environments, researchers have explored many heuristic list-based task scheduling algorithms. These algorithms fall into two types. The first is the BNP (Bounded Number of Processors) type, which assumes that all processors are fully connected and that the number of processors is limited. ISH[1], MCP[2] and ETF[2] belong to this type. The second is the APN (Arbitrary Processor Network) type, which assumes that the processor network is arbitrarily connected and that the number of processors is also limited. Because the processor network is not fully connected, APN algorithms must take communication contention into account. MH[3] and DLS[2] belong to this type. Both types work in homogeneous environments. In a heterogeneous system such as a grid, task scheduling is more complex because more factors are involved, including differing processor capacities, the matching of different language codes, and the overhead of communication contention. So far, task scheduling algorithms for heterogeneous conditions are rarely seen in the literature. The influential ones are HEFT[4] and CPOP[4], presented by H. Topcuoglu et al.

This paper presents a static task scheduling algorithm, LBP (Level and Branch Priority), for the grid environment. Under the same conditions, LBP may obtain better performance (viz. a shorter schedule length) than HEFT and CPOP, without increasing the time or space complexity.

2. The basic definition and model

The most common task scheduling model is the Directed Acyclic Graph (DAG), which expresses a parallel program well. In a DAG, the parallel parts of the application program are partitioned into tasks, and data is communicated between some of them. Logically, task scheduling in a grid environment can be seen as two independent stages. The first stage is task mapping (task assignment), in which each task is assigned to a certain host:

$$\mathrm{Assign} := \{(T_i, H_j) \mid T_i \rightarrow H_j,\ i \in \{1,\dots,n\},\ j \in \{1,\dots,k\}\}$$

where Ti denotes an arbitrary task, Hj denotes an arbitrary host, n denotes the number of tasks, and k denotes the number of hosts. The second stage is task scheduling (task ordering), in which the start time of every task already assigned to its host is decided:

$$\mathrm{Scheduling} := \bigcup_{j=1}^{k} \bigcup_{T_i \in H_j} \{(T_i, ST(T_i))\}$$

where ST(Ti) denotes the start time of the task Ti.

A DAG is denoted by a graph G=(V,E). V={Ti | i = 1,2,…,n} denotes the n weighted tasks; E={(Ti,Tj) | Ti,Tj ∈ V} denotes the weighted communication edges. Because the hosts are heterogeneous, the computation workload W(Ti) of the task Ti varies from host to host, so W(Ti:Hk) denotes the computation time of the task Ti when executed on the host Hk. Mij denotes the message size between the task Ti and the task Tj.
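As an illustration of this model, the sketch below stores W(Ti:Hk) and Mij in plain Python dictionaries. The class name TaskGraph, its field names, and the sample numbers are assumptions made for this example; they do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskGraph:
    # w[task][host] holds W(Ti:Hk), the computation time of task Ti on host Hk.
    w: Dict[str, Dict[str, float]]
    # m[(ti, tj)] holds Mij, the message size on the edge Ti -> Tj.
    m: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def children(self, task: str) -> List[str]:
        return [dst for (src, dst) in self.m if src == task]

    def parents(self, task: str) -> List[str]:
        return [src for (src, dst) in self.m if dst == task]


# Hypothetical 4-task, 2-host instance.
g = TaskGraph(
    w={"T1": {"H1": 4, "H2": 6}, "T2": {"H1": 3, "H2": 2},
       "T3": {"H1": 5, "H2": 5}, "T4": {"H1": 2, "H2": 4}},
    m={("T1", "T2"): 1, ("T1", "T3"): 2, ("T2", "T4"): 1, ("T3", "T4"): 4},
)
print(g.children("T1"))  # ['T2', 'T3']
print(g.parents("T4"))   # ['T2', 'T3']
print(g.w["T2"]["H2"])   # 2
```

Keeping the per-host times in a nested dictionary makes the heterogeneity explicit: the same task has one weight per host rather than a single weight.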

Definition 1. A task node without any parent node is called an entrance node. A task node without any child node is called an exit node. If there is more than one exit node, only one of them is designated as the exit node; each of the others is treated as a common node connected to the designated exit node by a void edge with zero communication message.
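Definition 1 can be sketched as a small preprocessing step. The function names and the sample graph below are hypothetical, and the choice of which exit node to keep (here, the last one in task order) is an assumption, since the paper does not specify it.

```python
def entrance_and_exit_nodes(tasks, edges):
    """Return (entrance nodes, exit nodes) of a DAG given as {(src, dst): message size}."""
    has_parent = {dst for (_, dst) in edges}
    has_child = {src for (src, _) in edges}
    entrances = [t for t in tasks if t not in has_parent]
    exits = [t for t in tasks if t not in has_child]
    return entrances, exits


def unify_exits(tasks, edges):
    """Keep one exit node and link every other exit to it by a zero-size void edge."""
    _, exits = entrance_and_exit_nodes(tasks, edges)
    sole_exit = exits[-1]  # assumption: the paper does not say which exit to keep
    unified = dict(edges)
    for t in exits[:-1]:
        unified[(t, sole_exit)] = 0  # void edge with zero communication message
    return unified, sole_exit


# Hypothetical graph with three exit nodes: T2, T3 and T4.
tasks = ["T1", "T2", "T3", "T4"]
edges = {("T1", "T2"): 5, ("T1", "T3"): 3, ("T1", "T4"): 1}
unified, sole_exit = unify_exits(tasks, edges)
print(sole_exit)                                # T4
print(entrance_and_exit_nodes(tasks, unified))  # (['T1'], ['T4'])
```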

Definition 2. The idle hosts selected for scheduling in the grid system are fully interconnected, and their number is limited. All idle hosts can carry out task computation and message passing simultaneously. The message size between tasks scheduled on the same host is zero.

3. The proposed algorithm and analysis on its performance

Most static heuristic task scheduling algorithms are based on the classic list scheduling idea, which can be divided into two independent steps. Step 1. All tasks in the task graph are sorted by a certain priority order to form a task ready list. Step 2. The head node is taken from the task ready list, one at a time, and scheduled to a certain processor by a particular strategy. HEFT is a typical static list scheduling algorithm.

Analyzing the basic idea of list scheduling, we consider the key factor in step 1 to be how the priority of a task node is determined. The two common attributes that determine task priority are T-LEVEL (the T-LEVEL of the task Ti is the length of the longest path from the entrance node to Ti) and B-LEVEL (the B-LEVEL of the task Ti is the length of the longest path from Ti to the exit node). The T-LEVEL of Ti is related to the earliest start time of Ti, and the B-LEVEL of Ti is related to the critical path of the task graph. HEFT uses B-LEVEL as its priority attribute. Unlike in the homogeneous environment, the task execution time used when computing the B-LEVEL is the mean execution time over all the different hosts. In step 2, many algorithms, HEFT among them, adopt a greedy strategy. HEFT further permits inserting the current task into the time gap between two already scheduled tasks. This insertion increases the overhead of the algorithm.
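For comparison with the priority used by LBP, a mean-based B-LEVEL can be sketched as follows. The instance is hypothetical, and the sketch assumes that the communication time of an edge equals its message size Mij (unit bandwidth), which the paper does not state.

```python
from functools import lru_cache

# Hypothetical instance: W[task][host] is W(Ti:Hk), M[(Ti, Tj)] is Mij.
W = {"T1": {"H1": 4, "H2": 6}, "T2": {"H1": 3, "H2": 2},
     "T3": {"H1": 5, "H2": 5}, "T4": {"H1": 2, "H2": 4}}
M = {("T1", "T2"): 1, ("T1", "T3"): 2, ("T2", "T4"): 1, ("T3", "T4"): 4}


def mean_time(task):
    """Mean execution time of a task over all hosts."""
    return sum(W[task].values()) / len(W[task])


@lru_cache(maxsize=None)
def b_level(task):
    """Length of the longest path from the task to the exit node, using mean times."""
    tails = [cost + b_level(dst) for (src, dst), cost in M.items() if src == task]
    return mean_time(task) + max(tails, default=0.0)


print({t: b_level(t) for t in W})
# {'T1': 19.0, 'T2': 6.5, 'T3': 12.0, 'T4': 3.0}
```

In this instance T2 costs 3 on H1 and 2 on H2, yet the B-LEVEL only sees the mean 2.5, which is the loss of information that motivates a priority not based on mean execution times.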

LBP mainly improves the choice of the task priority attribute in step 1. In the homogeneous environment, T-LEVEL and B-LEVEL are the most important task priority attributes; B-LEVEL in particular emphasizes that the tasks on the critical path should be scheduled as early as possible. In the heterogeneous environment, however, the execution time of a task varies across hosts. Computing the B-LEVEL from the mean execution time alone is unwise, because the mean B-LEVEL cannot truly reveal the relationship between a task and the critical path. In view of this heterogeneity, we present a new way of computing the task priority, called Level-Branch Priority.

The task priority is determined as follows. First, compute the Level value of each task. There are two methods to compute the Level value. ① From the entrance to the exit (this method is not introduced here owing to the limit on paper length). ② From the exit to the entrance: when there is more than one entrance node, compute the Level value of every task node in sequence from the exit to the entrance. The value ~Li is the sum of the edges on the longest path from the exit node to the task node Ti. If there are j paths from the exit to the task Ti and the value ~Lij corresponds to the j-th path, then ~Li=max{~Li1,~Li2,…,~Lij}. Define Lmax=max{~L1,~L2,…,~Li,…,~Ln}; then Li=Lmax-~Li.
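The exit-to-entrance method can be sketched as below. This sketch reads "the sum of the edges" as the number of edges on the longest path; that reading, the function name, and the sample graph are assumptions of the example.

```python
from functools import lru_cache

# Hypothetical DAG with a single exit node T4.
TASKS = ["T1", "T2", "T3", "T4"]
EDGES = {("T1", "T2"), ("T1", "T3"), ("T2", "T4"), ("T3", "T4")}


@lru_cache(maxsize=None)
def edges_to_exit(task):
    """~Li: number of edges on the longest path between the task and the exit node."""
    children = [dst for (src, dst) in EDGES if src == task]
    return max((1 + edges_to_exit(c) for c in children), default=0)


l_max = max(edges_to_exit(t) for t in TASKS)             # Lmax
level = {t: l_max - edges_to_exit(t) for t in TASKS}     # Li = Lmax - ~Li
print(level)  # {'T1': 0, 'T2': 1, 'T3': 1, 'T4': 2}
```

Under this reading a parent always has a strictly smaller Li than any of its children, so ordering tasks by increasing Li never places a task before one of its predecessors.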

Then compute the branch value Bi of each task. Bi is the sum of the weights of all out-edges of the task Ti, viz. $B_i = \sum_j M_{ij}$, where the sum runs over the out-edges of Ti, whose number is the out-degree of Ti.
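A short worked example of the branch value, on a hypothetical set of message sizes:

```python
# Hypothetical message sizes M[(Ti, Tj)] = Mij.
M = {("T1", "T2"): 1, ("T1", "T3"): 2, ("T2", "T4"): 1, ("T3", "T4"): 4}
TASKS = ["T1", "T2", "T3", "T4"]

# Bi: sum of the weights of all out-edges of Ti (zero for an exit node).
branch = {t: sum(size for (src, _), size in M.items() if src == t) for t in TASKS}
print(branch)  # {'T1': 3, 'T2': 1, 'T3': 4, 'T4': 0}
```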

Finally, determine the priority of the task Ti from Li and Bi. The priority of Ti is higher than that of Tj if Li < Lj, regardless of whether Bi>Bj or Bi<Bj. If Li=Lj, compare Bi and Bj: if Bi>Bj, the priority of Ti is higher than that of Tj; otherwise it is lower.
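This comparison rule amounts to a two-part sort key: increasing Li first, then decreasing Bi. The Li and Bi values below are hypothetical inputs. The case Li=Lj with Bi=Bj is not covered by the rule; the sketch simply keeps the input order there, which is an assumption.

```python
# Hypothetical Level (Li) and Branch (Bi) values for four tasks.
level = {"T1": 0, "T2": 1, "T3": 1, "T4": 2}
branch = {"T1": 3, "T2": 1, "T3": 4, "T4": 0}

# Smaller Li wins; for equal Li, larger Bi wins.
# sorted() is stable, so full ties keep their input order.
ready_list = sorted(level, key=lambda t: (level[t], -branch[t]))
print(ready_list)  # ['T1', 'T3', 'T2', 'T4']
```

T2 and T3 share the same Level, so T3 is placed first because its branch value is larger.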

The whole LBP algorithm is described informally below:
①. Input the DAG and determine the priority of every task Ti according to Li and Bi.
②. Put every task Ti into the task ready list in decreasing order of priority.
③. While the task ready list is not empty Do
④.   Take the head task Ti from the task ready list to begin scheduling.
⑤.   For each host Hj (j=1,2,…,k) in the idle host set Do
⑥.     Compute the earliest finish time of the task Ti when it is scheduled on the host Hj, without considering insertion of Ti into the time gap between any two scheduled tasks.
⑦.   Endfor
⑧.   Schedule the task Ti on the host on which it finishes earliest.
⑨. Endwhile
⑩. Output the task scheduling Gantt chart.
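The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the instance is hypothetical, the Level is counted in edges, the transfer time between two different hosts is taken to equal Mij (unit bandwidth), and ties between hosts go to the first host in sorted order.

```python
from functools import lru_cache

# Hypothetical instance: W[task][host] is W(Ti:Hk), M[(Ti, Tj)] is Mij.
W = {"T1": {"H1": 4, "H2": 6}, "T2": {"H1": 3, "H2": 2},
     "T3": {"H1": 5, "H2": 5}, "T4": {"H1": 2, "H2": 4}}
M = {("T1", "T2"): 1, ("T1", "T3"): 2, ("T2", "T4"): 1, ("T3", "T4"): 4}


def lbp(w, m):
    tasks = list(w)
    hosts = sorted(next(iter(w.values())))

    @lru_cache(maxsize=None)
    def edges_to_exit(t):
        return max((1 + edges_to_exit(j) for (i, j) in m if i == t), default=0)

    # Steps 1-2: Level and Branch values, then the task ready list.
    l_max = max(edges_to_exit(t) for t in tasks)
    level = {t: l_max - edges_to_exit(t) for t in tasks}
    branch = {t: sum(c for (i, _), c in m.items() if i == t) for t in tasks}
    ready_list = sorted(tasks, key=lambda t: (level[t], -branch[t]))

    host_free = {h: 0 for h in hosts}  # finish time of the last task on each host
    schedule = {}                      # task -> (host, start, finish)
    for t in ready_list:               # steps 3-4
        best = None
        for h in hosts:                # steps 5-7
            # Data from a parent on the same host arrives at no cost (Definition 2).
            data_ready = max((schedule[p][2] + (0 if schedule[p][0] == h else c)
                              for (p, child), c in m.items() if child == t),
                             default=0)
            start = max(data_ready, host_free[h])  # append only, no insertion
            finish = start + w[t][h]
            if best is None or finish < best[2]:
                best = (h, start, finish)
        schedule[t] = best             # step 8: host with the earliest finish time
        host_free[best[0]] = best[2]
    return ready_list, schedule


order, schedule = lbp(W, M)
makespan = max(finish for (_, _, finish) in schedule.values())
print(order)     # ['T1', 'T3', 'T2', 'T4']
print(schedule)  # T1 on H1 [0, 4], T3 on H1 [4, 9], T2 on H2 [5, 7], T4 on H1 [9, 11]
print(makespan)  # 11
```

Because each host only tracks the finish time of its last task, step ⑥ is a constant-time lookup per host plus a scan of the incoming edges, with no search for idle gaps as in insertion-based scheduling.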

Time complexity: The time complexity of HEFT and CPOP is O(e*q), where e is the number of edges in the DAG and q is the number of idle hosts. LBP adopts the same greedy strategy as HEFT and CPOP to select the idle host. The difference is that LBP schedules the current task only after the last scheduled task on the idle host, whereas HEFT also considers the insertion operation. The time complexity of LBP is therefore no greater than that of HEFT, so it is also O(e*q).

Scheduling performance: Simulation experiments were carried out on small-scale random DAGs composed of ten to one hundred task nodes. The CCR (Communication to Computation Ratio) of the DAGs varies from 0.1 to 10. Two important indices were examined: the mean run time and the mean speed-up. The simulation results show that the mean run time of LBP is slightly less than that of HEFT and CPOP when the number of task nodes is large, and that the mean speed-up of LBP is higher than that of HEFT and CPOP when the number of task nodes is small. As the number of task nodes grows, the mean speed-up of LBP converges to that of HEFT and CPOP.
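The paper does not spell out how CCR and speed-up are computed. The sketch below assumes the definitions commonly used in the heterogeneous scheduling literature: CCR as the mean communication cost divided by the mean computation cost, and speed-up as the best single-host sequential time divided by the schedule length. The instance and the schedule length of 11 are hypothetical.

```python
# Hypothetical instance: W[task][host] is W(Ti:Hk), M[(Ti, Tj)] is Mij.
W = {"T1": {"H1": 4, "H2": 6}, "T2": {"H1": 3, "H2": 2},
     "T3": {"H1": 5, "H2": 5}, "T4": {"H1": 2, "H2": 4}}
M = {("T1", "T2"): 1, ("T1", "T3"): 2, ("T2", "T4"): 1, ("T3", "T4"): 4}
MAKESPAN = 11  # hypothetical schedule length for this instance


def ccr(w, m):
    """Assumed definition: mean edge cost over mean (host-averaged) task cost."""
    mean_comm = sum(m.values()) / len(m)
    mean_comp = sum(sum(t.values()) / len(t) for t in w.values()) / len(w)
    return mean_comm / mean_comp


def speed_up(w, makespan):
    """Assumed definition: fastest single-host sequential time over schedule length."""
    hosts = list(next(iter(w.values())))
    sequential = min(sum(w[t][h] for t in w) for h in hosts)
    return sequential / makespan


print(round(ccr(W, M), 3))              # 0.516
print(round(speed_up(W, MAKESPAN), 3))  # 1.273
```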

4. Conclusion

Static task scheduling algorithms aimed at the heterogeneous environment are rarely seen; HEFT and CPOP are the two influential ones. Building on them, this paper presents LBP, a new algorithm for determining task priority and scheduling tasks. Compared with HEFT and CPOP, LBP may obtain better scheduling performance without increasing the time or space complexity.

References

1. H. El-Rewini, T. G. Lewis, H. H. Ali. Task Scheduling in Parallel and Distributed Systems. Englewood Cliffs, New Jersey: Prentice Hall, 1994.

2. R. Buyya. High Performance Cluster Computing: Architectures and Systems (Volume 1), 402-406.

3. H. El-Rewini, T. G. Lewis. Scheduling Parallel Programs onto Arbitrary Target Machines. Journal of Parallel and Distributed Computing, vol. 9(2), 138-153, June 1990.

4. H. Topcuoglu, S. Hariri, M.-Y. Wu. Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing. IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, March 2002.
