[ieee 2008 seventh international conference on grid and cooperative computing (gcc) - shenzhen,...

A Heuristic on Job Scheduling in Grid Computing Environment

Hojjat Baghban Computer Engineering Department

Islamic Azad University, Marvdasht Branch Marvdasht, Iran

[email protected]

Amir Masoud Rahmani Computer Engineering Department

Islamic Azad University, Science and Research Branch Tehran, Iran

[email protected]

Abstract—This paper introduces a model and a job scheduling algorithm for grid computing environments. In grid computing, many applications require numerous resources for execution that are often not available to them, so a scheduling system that allocates resources to input jobs is vital. The resource selection criteria in the proposed algorithm are based on the input jobs, the communication links and the computational capability of the resources. The proposed algorithm is evaluated in a simulated grid environment with job insertion patterns that follow normal, Poisson and exponential distributions. The results show that the proposed algorithm is more efficient than two known algorithms, WQ and DFPLTF.

Keywords—Grid Computing, Job Scheduling, Computation Cost, Transfer Time

I. INTRODUCTION

Grid computing, in its simplest form, refers to the cooperation of multiple processors on multiple machines. Its objective is to boost computational power in fields that require high CPU capacity. In grid computing, multiple servers that use common operating systems and software interact with each other.

According to Foster and Kesselman [5], grid computing is a hardware and software infrastructure that offers inexpensive, distributable, coordinated and reliable access to powerful computational capabilities.

Many applications require numerous resources that are often not available to them, so a scheduling system that allocates resources to input jobs is essential. Because resources in a computational grid are vast and widely separated, scheduling is one of the most important issues in the grid environment [13]. Job scheduling is one of the most fundamental aspects of distributed and parallel computing systems, and extensive research in this area has produced both theory and practical results [7, 8, 18]. However, the emergence of grid computing has led to new scheduling algorithms. The objectives of a scheduling algorithm are to increase system throughput [18] and efficiency and to decrease job completion time.

A. Classifying the scheduling algorithms

Grid scheduling can be classified into three models: centralized, distributed and hierarchical scheduling [9].

In centralized scheduling, a central machine acts as the resource manager and is responsible for scheduling jobs onto the available hosts in the network. Jobs are first submitted to the central scheduler, which then sends them to hosts according to its current policies.

The second model is distributed scheduling. In this model there is no central scheduler for job management and scheduling. Instead, several local schedulers interact with each other to schedule jobs. There are two mechanisms for linking the schedulers together [9]: a direct communication mechanism and an indirect communication mechanism.

In the direct communication mechanism, every local scheduler communicates directly with the other schedulers. Each scheduler keeps a list of the schedulers it can communicate with.

The third model is hierarchical scheduling, which includes a central scheduler and several local schedulers. The central scheduler acts as a meta-scheduler [9] that sends input jobs to the local schedulers. This paper proposes a model and a job scheduling algorithm for grid computing environments that operate as a decentralized scheduler. The algorithm is designed to decrease the “Makespan”, the total time required to compute all the jobs in a set [16].
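The rest of the paper evaluates algorithms by makespan, so a concrete way of measuring it may help. The following Python sketch is illustrative only. The function name makespan and the (submit_time, finish_time) job pairs are hypothetical. Measuring from the earliest submission to the latest completion is one common convention; the paper itself does not specify it.

```python
from typing import Iterable, Tuple


def makespan(jobs: Iterable[Tuple[float, float]]) -> float:
    """Time from the earliest job submission to the latest job completion.

    Each job is a hypothetical (submit_time, finish_time) pair.
    """
    jobs = list(jobs)
    if not jobs:
        return 0.0
    first_submit = min(submit for submit, _ in jobs)
    last_finish = max(finish for _, finish in jobs)
    return last_finish - first_submit
```

For example, three jobs submitted at t = 0, 1 and 2 that finish at t = 5, 9 and 7 give a makespan of 9.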

The rest of this paper is organized as follows. Section II reviews related work on scheduling. Section III presents the proposed scheduling model and algorithm. Section IV discusses simulation results for the proposed algorithm and compares it with other methods. Section V concludes the paper.

II. RELATED WORK

Various algorithms have been proposed in recent years, each with particular features and capabilities. This section reviews several scheduling algorithms proposed for grid environments. In general, these algorithms differ in how they prioritize jobs.

2008 Seventh International Conference on Grid and Cooperative Computing

978-0-7695-3449-7/08 $25.00 © 2008 IEEE

DOI 10.1109/GCC.2008.22

In [4], a scheduling algorithm based on HQ-GTSM is presented. When making scheduling decisions, this algorithm considers both the input jobs and the resource migration time. One of its most important features is that it guarantees grid quality of service.

DIANA [1] and BLBD [17] are two other scheduling algorithms proposed for grid computing environments. In these methods, the scheduler chooses the best resource for job allocation by considering the system load and the cost of resource allocation. BLBD is an adaptive scheduling algorithm that focuses more on guaranteeing quality of service.

Weighted meta-scheduling is presented in [6]. This algorithm takes into account both the system load and the workload in order to enhance resource allocation.

The DFPLTF (Dynamic FPLTF) algorithm [3] is the dynamic form of the FPLTF (Fastest Processor to Largest Task First) algorithm. In FPLTF, the fastest processors are allocated first to the largest jobs [2]. This method has performed well on heterogeneous parallel machines. DFPLTF prioritizes jobs so that the largest job has the highest priority for execution. A notable drawback of DFPLTF is that processor speeds and job lengths must be predicted beforehand, which may cost time.
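The Python sketch below shows a simplified, static version of FPLTF. It assumes that every task length (in million instructions) and every processor speed (in MIPS) is known in advance. That is exactly the information DFPLTF has to predict at run time. The function name and data layout are hypothetical. Tasks are taken largest first, and each task goes to the processor that would finish it earliest, given the work already assigned to that processor.

```python
from typing import Dict, Tuple


def fpltf(task_lengths: Dict[str, float],
          cpu_speeds: Dict[str, float]) -> Tuple[Dict[str, str], float]:
    """Static FPLTF sketch.

    task_lengths: task id -> length in million instructions (assumed known).
    cpu_speeds:   processor id -> speed in MIPS (assumed known).
    Returns the task -> processor mapping and the resulting makespan (s).
    """
    ready_at = {p: 0.0 for p in cpu_speeds}   # when each processor becomes free
    mapping: Dict[str, str] = {}
    # Largest task first ...
    for task in sorted(task_lengths, key=task_lengths.get, reverse=True):
        length = task_lengths[task]
        # ... onto the processor that would complete it earliest.
        best = min(cpu_speeds,
                   key=lambda p: ready_at[p] + length / cpu_speeds[p])
        ready_at[best] += length / cpu_speeds[best]
        mapping[task] = best
    return mapping, max(ready_at.values())
```

Take tasks of 400, 200 and 100 MI on processors of 200 and 100 MIPS:

- The 400 MI task goes to the fast processor and finishes at 2 s.
- The 200 MI task goes to the slow processor and also finishes at 2 s.
- The 100 MI task goes back to the fast processor.

The resulting makespan is 2.5 s.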

WQ (Work Queue) [15] is a classic scheduling method originally designed for parallel and heterogeneous machines. WQ takes input jobs in arbitrary order and gives each job to the next processor that becomes available. As a result, faster processors receive more jobs than slower ones. The algorithm does not predict processor speeds or job lengths.
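For contrast, the following Python sketch simulates Work Queue as described above. Jobs are taken in their given, arbitrary order, and each job goes to whichever processor becomes idle first. The processor speeds only advance the simulated clock; they play no part in choosing a processor. The function name and the speed dictionary are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def work_queue(task_lengths: List[float],
               cpu_speeds: Dict[str, float]) -> Tuple[List[str], float]:
    """Work Queue sketch: no prediction, just 'next idle processor wins'.

    task_lengths: job lengths in million instructions, in arrival order.
    cpu_speeds:   processor id -> speed in MIPS (used only by the clock).
    Returns the processor chosen for each job and the simulated makespan.
    """
    # Min-heap of (time the processor becomes idle, processor id).
    idle = [(0.0, p) for p in cpu_speeds]
    heapq.heapify(idle)
    assignment: List[str] = []
    finish = 0.0
    for length in task_lengths:
        free_at, proc = heapq.heappop(idle)          # next idle processor
        done = free_at + length / cpu_speeds[proc]   # simulated completion
        assignment.append(proc)
        finish = max(finish, done)
        heapq.heappush(idle, (done, proc))
    return assignment, finish
```

With four 100 MI jobs and processors of 100 and 50 MIPS, the fast processor receives three of the four jobs and the simulated makespan is 3 s. Faster processors absorb more work without any prediction.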

III. THE PROPOSED SCHEDULING MODEL

This model has no central scheduler. Instead, several local schedulers are responsible for job scheduling. A local scheduler is not necessarily connected directly to the other schedulers. Each local scheduler keeps a list that stores the necessary information obtained from its adjacent schedulers. The proposed model therefore keeps the advantages of the distributed model while removing its main defects. In the direct communication mechanism, a direct link between every pair of local schedulers removes bottlenecks, but it adds complexity and limits scalability. Compared with the centralized model, the new model has a further advantage: each local scheduler can follow its own policy when deciding on job scheduling. The new model also provides fault tolerance and reliability.

A. The New Scheduling Model

Figure 1. The proposed scheduling architecture

A new decentralized, indirect communication mechanism is proposed for the architecture of the local scheduler in each node. As shown in Fig. 1, this architecture consists of the following components:

1- Local scheduler.

2- Local queues related to each user.

3- Job manager unit.

4- Resource discovery unit.

5- Scheduling decision unit.

6- Dispatcher unit.

As mentioned in the previous section, each local scheduler is responsible for scheduling its input jobs. It schedules jobs locally according to their requirements and allocates appropriate resources to them. Because the grid scheduler operates in a dynamic environment, it encounters unexpected and unpredictable events and changes, such as the failure of a host or a user logging into the system. Job processing time depends partly on these random events. Scheduling algorithms should therefore include a procedure for coping with them. For this reason, some scheduling algorithms use prediction techniques to estimate time. However, predicting such events is time-consuming and complex, and time efficiency is an important issue in job scheduling. The proposed algorithm therefore schedules input jobs without predicting time.

After jobs are inserted, they enter the “resource discovery” unit. This unit identifies the hosts in the grid system and queries them for the necessary information. The most important information is the rate of every processor and the average traffic load of the communication links between the user and each host. The “resource discovery” unit delivers this information to the “scheduling decision” unit. Each input job can then be allocated to the most appropriate host for execution, based on the policy of the proposed algorithm.

-------- Function resource_discover ------
1: request(user_x, r_j)
2: RTT_ij = round trip time
3: total_link_load_x = Σ_{x=i}^{j} link_load_x
4: cpu_rate_j = get_processor_rate(r_j)

After the “scheduling decision” unit chooses the appropriate resource, the “dispatcher” unit sends the job to that resource.

B. The Proposed Scheduling Algorithm

In this algorithm, whenever a user logs into the system, the “resource discovery” unit requests the rate of the processors in each host. It also requests the average traffic load between the user and each host. The resource_discover function in Fig. 2 performs this step.

Sending requests to hosts and receiving their responses takes time. The round trip time (RTT) must therefore be taken into account every time the local scheduler requests the processor rate of a host. Measuring the RTT requires sending a packet to the specified host, and the packet follows the best route chosen by the routing algorithms in the routers. The communication links may carry different amounts of traffic. During job scheduling, the scheduler must therefore consider the total load of the links the packet traverses to reach the host. Because the load of each link can change, this parameter plays a key role in choosing the best host for an input job.
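The steps of resource_discover in Fig. 2 can be sketched in Python as follows. This is not GridSim code, and several pieces are hypothetical:

- send_request stands in for whatever network call the local scheduler uses.
- The per-host record with a path_link_loads list represents the link loads between the user and the host.
- The HostInfo type holds the collected values.

The RTT is timed around the request, the link loads along the path are summed, and the processor rate is read from the reply.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class HostInfo:
    """Information the resource discovery unit gathers about one host r_j."""
    rtt: float              # RTT_ij, seconds
    total_link_load: float  # sum of link_load_x on the user-to-host path, bytes/s
    cpu_rate: float         # cpu_rate_j, MIPS


def resource_discover(hosts: Dict[str, dict],
                      send_request: Callable[[str], dict]) -> Dict[str, HostInfo]:
    """Query every host once and collect RTT, path link load and CPU rate.

    hosts:        host id -> hypothetical record with a 'path_link_loads' list
                  (average load of each link between the user and that host).
    send_request: stand-in for the real network call; assumed to return the
                  host's reply as a dict containing 'cpu_rate'.
    """
    info: Dict[str, HostInfo] = {}
    for host_id, record in hosts.items():
        start = time.perf_counter()
        reply = send_request(host_id)                 # request(user_x, r_j)
        rtt = time.perf_counter() - start             # RTT_ij
        total_load = sum(record["path_link_loads"])   # sum of link_load_x
        info[host_id] = HostInfo(rtt, total_load, reply["cpu_rate"])
    return info
```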

The proposed algorithm takes two features of each input job into account: its number of million instructions (MI) [10] and its size in bytes. Based on the host features, the communication links and the input jobs, the computational cost of an input job on each host is obtained as follows:

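$$\mathit{computational\_cost}(t_i, r_j) = RTT_{ij} + \frac{TS_i}{\sum_{x=i}^{j} \mathit{link\_load}_x} + \frac{PBT_i}{\mathit{cpu\_rate}_j} \tag{1}$$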

The function computational_cost(t_i, r_j) in Fig. 3 computes the cost given by (1). The terms of (1) are defined as follows:

- computational_cost(t_i, r_j) is the cost of executing job t_i on host r_j.
- RTT_ij is the round trip time of the request sent by user_x to host r_j.
- TS_i is the size of input job t_i in bytes.
- $\sum_{x=i}^{j} \mathit{link\_load}_x$ is the total average load, in bytes per second, of the communication links that job t_i must traverse to reach the allocated host.
- PBT_i is the number of million instructions (the processing requirement [14]) of job t_i.
- cpu_rate_j is the processor rate available on host r_j, in MIPS.

One feature of (1) is that it accounts for the cost of transferring a job from one host to another.
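Equation (1) can be transcribed directly into Python; the function and parameter names below are hypothetical. Each term comes out in seconds, so the three terms can be added:

- RTT_ij is already a time.
- Bytes divided by bytes per second gives a time.
- Million instructions divided by MIPS gives a time.

```python
def computational_cost(rtt_ij: float, ts_i: float, total_link_load: float,
                       pbt_i: float, cpu_rate_j: float) -> float:
    """Cost of running job t_i on host r_j, transcribing (1):

        RTT_ij + TS_i / sum(link_load_x) + PBT_i / cpu_rate_j

    rtt_ij          round trip time to host r_j, seconds
    ts_i            size of job t_i, bytes
    total_link_load summed average link load on the path, bytes/s
    pbt_i           processing requirement of t_i, million instructions
    cpu_rate_j      processor rate of host r_j, MIPS
    """
    if total_link_load <= 0 or cpu_rate_j <= 0:
        raise ValueError("link load and CPU rate must be positive")
    return rtt_ij + ts_i / total_link_load + pbt_i / cpu_rate_j
```

For example, take an RTT of 0.05 s, a 2,000,000-byte job sent over links summing to 1,000,000 bytes/s, and a 1000 MI job on a 400 MIPS host. The cost is 0.05 + 2.0 + 2.5 = 4.55 s.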

The first criterion for allocating input jobs to hosts is the computational cost obtained from (1).

In some cases, (1) may indicate a low computational cost for a host even though that host lacks an appropriate CPU load balancing percent and cannot start executing the job immediately. Therefore, another factor, New_Factor, must be considered. According to (2), New_Factor(r_j) is the CPU load balancing percent of host r_j. Note that the proposed algorithm assumes that each host may have more than one processor.

Figure 2. Function resource_discover algorithm

$$\mathit{New\_Factor}(r_j) = \mathit{cpu\_lb}_j \tag{2}$$

Here cpu_lb_j is the CPU load balancing percent of host r_j. Fig. 4 shows the proposed scheduling algorithm. Considering (1) and (2), a host with a lower computational cost and a more appropriate CPU load balancing percent has a higher priority for job allocation.
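The paper does not state exactly how find_best in Fig. 4 combines (1) and (2). The Python sketch below therefore makes explicit assumptions:

- cpu_lb is read as the percentage of a host's CPU that is currently free.
- A hypothetical threshold, min_free_percent, decides whether a host can start the job immediately.
- Among hosts above the threshold, the one with the lowest computational cost is chosen. If no host qualifies, the lowest-cost host overall is used.

The schedule function mirrors the loop of Fig. 4, with the cost, load and assignment steps passed in as callables.

```python
from typing import Callable, Dict, List


def find_best(costs: Dict[str, float], cpu_lb: Dict[str, float],
              min_free_percent: float = 20.0) -> str:
    """Choose a host from per-host computational cost and CPU load value.

    ASSUMPTIONS (not from the paper): cpu_lb[h] is the percentage of host h's
    CPU that is currently free, and min_free_percent is a hypothetical cutoff
    for "can start the job immediately".  Hosts above the cutoff are ranked by
    cost (ties go to the host with more free CPU); if none qualifies, every
    host is ranked the same way.
    """
    eligible = [h for h in costs if cpu_lb[h] >= min_free_percent]
    candidates = eligible or list(costs)
    return min(candidates, key=lambda h: (costs[h], -cpu_lb[h]))


def schedule(job: str, hosts: List[str],
             cost_fn: Callable[[str, str], float],
             get_cpu_lb: Callable[[str], float],
             assign: Callable[[str, str], None]) -> str:
    """Handle one 'received new job' event, following the loop in Fig. 4.

    cost_fn is assumed to perform request(r_j) and evaluate (1) itself.
    """
    costs: Dict[str, float] = {}
    cpu_lb: Dict[str, float] = {}
    for host in hosts:                      # forall j := 1 to R
        costs[host] = cost_fn(job, host)    # computational_cost(t_i, r_j)
        cpu_lb[host] = get_cpu_lb(host)     # get_cpu_lb(r_j)
    best = find_best(costs, cpu_lb)
    assign(job, best)                       # hand t_i to the dispatcher
    return best
```

Suppose R0, R1 and R2 have costs of 3.0, 2.0 and 2.5 and free-CPU values of 90, 10 and 50 percent. R1 is cheapest, but the threshold excludes it, so the job goes to R2. Other combination rules, such as a weighted sum of cost and load, would fit the same loop.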

IV. EXPERIMENTAL RESULTS

This section presents simulation results and analyzes the proposed algorithm. To implement the new algorithm, we use the simulation tool GridSim [14].

GridSim is a Java-based simulation toolkit for grid environments that can simulate distributed and parallel systems. The proposed algorithm is compared with the WQ and DFPLTF algorithms. All tests were run on a system with an AMD 3500+ processor and 1024 MB of DDR2 memory.

Various tests with different job insertion patterns and different numbers of jobs entering the system were performed under the same conditions. The job insertion pattern in each test follows a normal, Poisson or exponential distribution.

Thirteen hosts were used in the simulation; their parameters are shown in Table I. The jobs used in this paper contain 1,000,000 to 2,000,000 instructions.

------- Function computational_cost ------
1: computational_cost(t_i, r_j)
2: computation_cost[j] = RTT_ij + TS_i / Σ_{x=i}^{j} link_load_x + PBT_i / cpu_rate_j
3: return(computation_cost[j])

Figure 3. Computational_cost function

1: R { total number of hosts }
2: r_j { host r_j }
3: cpu_lb_j { CPU load balancing percent of host r_j }
4: computational_cost(t_i, r_j) { returns the computational cost of executing job t_i on host r_j }
5: get_cpu_lb(r_j) { returns the CPU load balancing percent of host r_j }
6: find_best(computation_cost[j], cpu_lb_j) { finds the host with minimum computational cost and the most appropriate CPU load balancing percent }
7: assign(t_i, r_j) { assigns job t_i to host r_j }
-------------- New algorithm ----------------
1: wait(event)
2: if (event = received new job)
3: begin
4:   forall j := 1 to R
5:     request(r_j)
6:     computation_cost[j] = computational_cost(t_i, r_j)
7:     cpu_lb_j = get_cpu_lb(r_j)
8:   end for
9:   r_j = find_best(computation_cost[j], cpu_lb_j)
10:  assign(t_i, r_j)
11: end if

Figure 4. Proposed Scheduling Algorithm

[Plot, image not recoverable: Makespan versus Number of Jobs (10 to 140), Makespan axis 0 to 16000; series WQ, DFPLTF and new algorithm]

A. First test: Pattern of job insertion based on the normal distribution

Most quantitative, continuous indicators follow a normal distribution, and most precise statistical methods are designed for data with such a distribution. According to (3), the important parameters of the normal distribution are μ, the mean of the input jobs, and σ², the variance of the input jobs.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{3}$$
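The following Python sketch shows how normally distributed job arrivals might be generated for such a test, together with the density (3). The paper does not say which quantity the distribution is applied to. Here it is assumed to govern the gaps between successive arrivals, and negative draws are clipped to zero so that arrival times never decrease. The function names are hypothetical.

```python
import math
import random
from typing import List


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density (3) of the normal distribution with mean mu and std. dev. sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def normal_arrival_times(n: int, mu: float, sigma: float, seed: int = 0) -> List[float]:
    """Arrival times of n jobs whose inter-arrival gaps are drawn from N(mu, sigma^2).

    ASSUMPTION: the distribution applies to the gaps between arrivals, and
    negative draws are clipped to 0 so arrival times never decrease.
    """
    rng = random.Random(seed)
    t = 0.0
    times: List[float] = []
    for _ in range(n):
        t += max(0.0, rng.gauss(mu, sigma))
        times.append(t)
    return times
```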

TABLE I. GRID HOST PARAMETERS FOR THE SIMULATION

Host   No. of PEs   MIPS per PE
R0     4            399
R1     4            395
R2     4            395
R3     4            394
R4     4            390
R5     4            388
R6     4            387
R7     4            386
R8     4            385
R9     4            383
R10    4            381
R11    4            380
R12    4            387

In this test, jobs enter the system according to a normal distribution. They are scheduled by three algorithms: WQ, DFPLTF and the proposed algorithm. The results are reported as makespan versus the number of input jobs. As Fig. 5 shows, increasing the number of input jobs at first has little effect on the algorithms, particularly on DFPLTF and the proposed algorithm. However, DFPLTF and WQ are more sensitive to the number of input jobs. For instance, when the number of input jobs exceeds 100, their makespan increases greatly, whereas the proposed algorithm is hardly affected. This shows the better efficiency of the proposed algorithm.

B. Second test: Pattern of job insertion based on the Poisson distribution

In this test, every job enters the grid system according to the Poisson distribution. In general, the Poisson distribution is an appropriate model when a large number of independent, similar jobs enter the system. According to (4), the parameters of this distribution are λ, the average number of input jobs per interval, and x, the number of jobs arriving in an interval.

$$f(x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!} \tag{4}$$
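Poisson arrivals are usually generated as a count of jobs per interval. The Python sketch below evaluates (4) and draws counts with Knuth's multiplication method, because Python's random module does not provide a Poisson sampler. Feeding the simulator with fixed, equal-length intervals is an assumption, and the function names are hypothetical.

```python
import math
import random
from typing import List


def poisson_pmf(x: int, lam: float) -> float:
    """Probability (4) of exactly x arrivals in an interval with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def poisson_sample(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw via Knuth's method: multiply uniforms until the
    product falls to exp(-lam) or below; the number of extra steps is the draw.
    Adequate for the small lam typical of per-interval job counts."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def jobs_per_interval(intervals: int, lam: float, seed: int = 0) -> List[int]:
    """Number of jobs submitted in each of `intervals` consecutive time slots."""
    rng = random.Random(seed)
    return [poisson_sample(lam, rng) for _ in range(intervals)]
```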

In this case, the new algorithm is simulated under the Poisson distribution, and its results are compared with those of WQ and DFPLTF under the same conditions. Fig. 6 shows that WQ is highly sensitive to the number of input jobs. DFPLTF is less sensitive to an increase in jobs, but increasing the input from 60 to 70 jobs has a larger effect on its makespan.

In this test, the proposed algorithm is only slightly sensitive to the increase in input jobs. Its average job makespan is also lower than that of the other two algorithms.

Figure 5. Comparison of the three algorithms under the normal distribution


[Plot, image not recoverable: Makespan versus Number of Jobs (10 to 140), Makespan axis 0 to 10000; series WQ, DFPLTF and new algorithm]

[Plot, image not recoverable: Makespan versus Number of Jobs (10 to 140), Makespan axis 0 to 60000; series WQ, DFPLTF and New Algorithm]

Figure 6. Comparison of the three algorithms under the Poisson distribution

C. Third test: Pattern of job insertion based on the exponential distribution

When job arrival times and service times are random variables, they are often assumed to follow the exponential distribution. A key feature of this distribution is that it is memoryless: the time until the next job arrives does not depend on how long ago the previous job arrived, and the same holds for service times. According to (5), the parameter of the exponential distribution is λ, the average arrival rate of input jobs; the mean time between arrivals is 1/λ.

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{5}$$
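Exponential inter-arrival times can be drawn with random.expovariate from Python's standard library. The sketch below evaluates (5) and accumulates gaps with rate λ into arrival times; the mean gap is 1/λ. The function names are hypothetical. Exponential gaps with rate λ produce a Poisson arrival process with the same rate, so this generator and the Poisson one above describe the same kind of workload in two ways.

```python
import math
import random
from typing import List


def exponential_pdf(x: float, lam: float) -> float:
    """Density (5): lam * exp(-lam * x) for x > 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0


def exponential_arrival_times(n: int, lam: float, seed: int = 0) -> List[float]:
    """Arrival times of n jobs with exponential inter-arrival gaps (rate lam)."""
    rng = random.Random(seed)
    t = 0.0
    times: List[float] = []
    for _ in range(n):
        t += rng.expovariate(lam)   # memoryless gap, mean 1/lam
        times.append(t)
    return times
```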

In this test, jobs are inserted into the system according to an exponential distribution, and job scheduling is simulated with WQ, DFPLTF and the proposed algorithm. As Fig. 7 shows, increasing the number of input jobs affects job completion under the new algorithm. It affects DFPLTF and WQ more, especially when the number of input jobs increases from 40 to 70. The proposed algorithm also behaves more steadily as the number of input jobs changes and has a lower average response time.

Figure 7. Comparison of the three algorithms under the exponential distribution

V. CONCLUSION

Resource sharing in grid environments is inevitable, and scheduling is one of the most important issues in grid computing. This paper presented a model and a job scheduling algorithm for grid environments that aims to reduce the average makespan of input jobs. The algorithm speeds up the execution of input jobs by considering the parameters of the input jobs, the communication links and the computational capability of the resources. Notable features of the proposed algorithm are its scalability and its tolerance of unexpected changes, such as the failure of a host or a new user entering the system. Simulation results under various statistical patterns of job insertion show that it is more efficient than the other two methods and that it reduces the average job completion time.

REFERENCES

[1] A. Anjum, R. McClatchey, H. Stockinger, A. Ali, I. Willers, M. Thomas, M. Sagheer, K. Hasham, and O. Alvi, “DIANA scheduling hierarchies for optimizing bulk job scheduling,” in Proc. Second IEEE International Conference on e-Science and Grid Computing (e-Science’06), 2006.

[2] D. A. Menascé, D. Saha, S. C. D. S. Porto, V. A. F. Almeida, and S. K. Tripathi, “Static and dynamic processor scheduling disciplines in heterogeneous parallel architectures,” Journal of Parallel and Distributed Computing, 28:1–18, 1995.

[3] D. Paranhos, W. Cirne, and F. Brasileiro, “Trading cycles for information: Using replication to schedule bag-of-tasks applications on computational grids,” in Proc. International Conference on Parallel and Distributed Computing (Euro-Par), Lecture Notes in Computer Science, vol. 2790, pp. 169–180, 2003.

[4] H. Zhang, C. Wu, Q. Xiong, L. Wu, and G. Ye, “Research on an effective mechanism of task scheduling in grid environment,” in Proc. Fifth International Conference on Grid and Cooperative Computing (GCC’06), IEEE, 2006.

[5] I. Foster, “What is the Grid?,” Daily News And Information For The Global Grid Community, vol. 1, no. 6, July 22, 2002.

[6] J. Song, C.-K. Koh, S. See, and G. Kheng, “Performance investigation of weighted meta-scheduling algorithm for scientific grid,” in Proc. 4th International Conference on Grid and Cooperative Computing (GCC 2005), LNCS 3795, pp. 1021–1030, 2005.

[7] K. Al-Saqabi, S. Sarwar, and K. Saleh, “Distributed gang scheduling in networks of heterogeneous workstations,” Computer Communications Journal, pp. 338–348, 1997.

[8] M. Maheswaran, S. Ali, H. J. Siegel, et al., “Dynamic mapping of a class of independent tasks onto heterogeneous computing systems,” in Proc. 8th IEEE Heterogeneous Computing Workshop (HCW’99), San Juan, Puerto Rico, Apr. 1999, pp. 30–44.

[9] M. Li and M. Baker, The Grid: Core Technologies, John Wiley & Sons, UK, 2005.

[10] N. Muthuvelu, J. Liu, N. L. Soe, S. Venugopal, A. Sulistio, and R. Buyya, “A dynamic job grouping-based scheduling for deploying applications with fine-grained tasks on global grids,” ACSW Frontiers 2005, pp. 41–48.

[11] N. Fujimoto and K. Hagihara, “A comparison among grid scheduling algorithms for independent coarse-grained tasks,” SAINT 2004 Workshop on High Performance Grid Computing and Networking, IEEE Press, pp. 674–680, Tokyo, Japan, January 26–30, 2004.

[12] O. H. Ibarra and C. E. Kim, “Heuristic algorithms for scheduling independent tasks on nonidentical processors,” Journal of the ACM, 24(2):280–289, 1977.


[13] R. Buyya, “Economic-based distributed resource management and scheduling for grid computing,” PhD thesis, Monash University, Melbourne, Australia, April 12, 2002.

[14] R. Buyya and M. Murshed, “GridSim: a toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing,” John Wiley & Sons, 2002.

[15] R. L. Graham, “Bounds for certain multiprocessing anomalies,” Bell System Technical Journal, 45:1563–1581, 1966.

[16] S. Venugopal, “Scheduling distributed data-intensive applications on global grid,” PhD thesis, University of Melbourne, Australia, July 2006.

[17] T. Wang, X. Zhou, Q. Liu, Z. Yang, and Y. Wang, “An adaptive resource scheduling algorithm for computational grid,” IEEE Asia-Pacific Conference on Services Computing (APSCC’06), 2006.

[18] X. He, X. Sun, and G. von Laszewski, “QoS guided min-min heuristic for grid task scheduling,” Journal of Computer Science and Technology, 18(4):442–451, July 2003.

