
Page 1: [IEEE 2013 4th International Conference on Computer and Communication Technology (ICCCT) - India (2013.09.20-2013.09.22)] 2013 4th International Conference on Computer and Communication

A Genetic Algorithm based Scheduler for Cloud Environment

S.Sindhu Department of Information Science and Technology

Anna University Chennai, India

Dr.Saswati Mukherjee Department of Information Science and Technology

Anna University Chennai, India

Abstract- Cloud Computing is a computing model that is widely accepted in both academia and industry, mainly because it offers resources on demand and provisions them rapidly. Such a provisioning system calls for efficient scheduling mechanisms for the allocation and de-allocation of resources. A good scheduling mechanism should satisfy the QoS requirements of the user and at the same time make efficient use of resources. Scheduling algorithms that are application-centric tend to optimize the performance of each individual application, whereas those that are resource-centric tend to optimize the performance of each resource. Hence there is a need for a scheduling algorithm that optimizes both of these factors. This paper proposes a bi-objective Genetic Algorithm based scheduler for the cloud that optimizes the makespan (application-centric) and the average processor utilization (resource-centric).

Keywords- Cloud computing, Scheduling, Genetic Algorithms, Heuristic, Virtualization

I. INTRODUCTION

Cloud Computing, a new computing model that is gaining momentum, combines several important concepts from distributed computing, grid computing, service-oriented architecture, virtualization and web services. Virtualization is one of the key concepts in the cloud. Cloud Computing relies mainly on virtualized data centers (VDCs) for the provisioning of resources [18]. In such a scenario, job scheduling and resource management pose new challenges. Scheduling is a process that maps and manages the execution of tasks on distributed resources. Proper scheduling can have a significant impact on the performance of the system. Scheduling of jobs in a cloud environment is left to the virtual machine layer through the use of resource virtualization [19]. This opens up a number of research issues, such as fairness in resource allocation, providing the desired Quality of Service to cloud users, and scheduling virtual machines efficiently so that the number of data centers that are up and running at any instant is minimal. Baomin Xu et al. [19] proposed a Berger model for scheduling jobs in a cloud environment that considers fairness in resource allocation. Xiangzhen Kong et al. [18] used a fuzzy prediction method for efficient scheduling of jobs in a virtualized environment. An Activity Based Cost (ABC) model for scheduling jobs is considered in [20], whereby the cost incurred by users for running their applications in a cloud is minimized.

It has been proved that finding an optimal mapping of tasks to distributed resources is an NP-complete problem. For such problems, no known algorithm is able to generate an optimal solution within polynomial time.

Thus there is a need to apply stochastic optimization methods to the scheduling problem [2]. Stochastic optimization methods are optimization methods that generate and use random variables. Examples include simulated annealing, swarm algorithms and evolutionary algorithms. The main advantage of genetic algorithms over traditional optimization methods is that, while most other algorithms are serial in nature, genetic algorithms are parallelizable. Genetic algorithms intrinsically work with many solutions in parallel, which enables them to explore the solution space in multiple directions at once and thereby converge faster [2]. Genetic Algorithms have been widely applied to job scheduling in several fields [11,12,13,14]. This paper explores the applicability of Genetic Algorithms to scheduling tasks in a cloud environment.

Scheduling algorithms that are application-centric tend to optimize the performance of each individual application, whereas those that are resource-centric tend to optimize the performance of each resource [21]. In this work we propose a novel scheduling algorithm based on a Genetic Algorithm that is both application-centric and resource-centric. We formulate a multi-objective Genetic Algorithm that tries to optimize both the makespan (application-centric) and the average processor utilization (resource-centric). In a cloud, the ratio of Virtual Machines (VMs) to hosts in a data center varies with the time of day and the load. A single scheduling algorithm that performs the mapping in the same manner irrespective of the number of VMs cannot schedule efficiently across all these relative changes in the number of VMs with respect to the number of hosts. We address this problem by experimenting with various initial population methods and identifying the ones that give promising solutions.

The rest of the paper is organized as follows. Section II describes related work in this area. Section III presents a formal definition of the problem. Section IV describes the architecture of the proposed work and summarizes its usefulness. Section V describes the experimental setup and the parameters used, and analyzes the simulation results. Conclusions and future work are presented in Section VI.

2013 4th International Conference on Computer and Communication Technology (ICCCT)

978-1-4799-1572-9/13/$31.00 ©2013 IEEE 23


II. RELATED WORK

This section gives a brief overview of related work on scheduling with genetic algorithms in a cloud. Zhongni Zheng, Rui Wang, Hai Zhong and Xuejie Zhang [4] discuss the applicability of a Parallel Genetic Algorithm to finding an optimal placement of virtual machines so as to maximize the utilization of resources. GAN Guo-ning, HUANG Ting-Lei and GAO Shuai [5] proposed a genetic simulated annealing algorithm for optimized task scheduling in cloud computing, which also considers the QoS requirements of different types of tasks. In [6], an algorithm is proposed that finds a fast mapping using genetic algorithms with an “exist if satisfy” condition, which speeds up the mapping process and ensures that all task deadlines are met. Chenhong Zhao, Shanshan Zhang and Qingfeng Liu [7] propose an optimized algorithm based on a Genetic Algorithm to schedule independent and divisible tasks, adapting to different computation and memory requirements. In [8], a genetic algorithm approach to cost-based multi-QoS job scheduling is proposed. A model for the cloud computing environment is also proposed, and some popular genetic crossover operators (PMX, OX and CX) and mutation operators (swap and insertion mutation) are used to produce a better schedule. The algorithm guarantees the best solution in finite time. Sandeep Tayal [9] proposes an optimized algorithm based on Fuzzy-GA optimization, which makes a scheduling decision by evaluating the entire group of tasks in the job queue and is implemented in the Hadoop MapReduce framework.

The work presented in this paper differs from these works in that it considers the applicability of a standard GA without parallelization and optimizes both makespan and average processor utilization. The experiments are conducted in a simulated environment provided by CloudSim.

III. PROBLEM FORMULATION

A good scheduling algorithm should lead to better resource utilization and at the same time minimize the makespan. Our aim is to find an optimal mapping of tasks to virtual machines and of virtual machines to processing elements so as to minimize the makespan and at the same time maximize the resource utilization. In a private cloud, where resources are limited, makespan and resource utilization play an important role in determining the efficiency of a scheduling algorithm.

An instance of the problem can be formally defined as follows:

• A number of independent tasks that have to be scheduled. Let J = {J1, J2, J3, ..., Jn} be the set of tasks to be scheduled. Each task has to be executed entirely on one resource. No pre-emption of tasks is allowed.

• A number of Virtual Machines participating in the planning, V = {V1, V2, ..., Vm}.

• A number of Processing Elements participating in the mapping, P = {P1, P2, ..., Pk}.

• The workload of each job, expressed as an Instruction Count (IC).

• The computing capacity of each Processing Element, expressed in MIPS (Million Instructions per Second).

• The Expected Time to Compute (ETC) matrix: an n x k matrix, where ETC[i][j] represents the expected time to execute Ji on Pj.
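The ETC matrix in the definition above can be illustrated with a short sketch. The paper does not state how ETC entries are obtained; the code assumes the simplest reading, namely that ETC[i][j] is the instruction count of task Ji divided by the MIPS rating of Pj, with instruction counts given in millions of instructions. The function name `build_etc` and the sample workload are hypothetical.

```python
def build_etc(instruction_counts, mips_ratings):
    """Return the n x k ETC matrix: etc[i][j] = time to run task i on PE j.

    Assumption: instruction counts are in millions of instructions,
    so IC / MIPS gives a time in seconds.
    """
    return [[ic / mips for mips in mips_ratings] for ic in instruction_counts]


# Hypothetical workload: three tasks and two processing elements.
etc = build_etc([4000, 8000, 2000], [400, 800])
for row in etc:
    print(row)
```

Each row corresponds to one task and each column to one Processing Element, matching the n x k layout given in the problem definition.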

IV. CLOUD SCHEDULING USING GA

The overall working of a standard GA is described below:

Genetic Algorithm {
Begin
    Generate an initial population GAi.
    Evaluate each individual chromosome in GAi.
    Repeat until (termination condition) is satisfied {
        Select parents from GAi to be copied to GAi+1;
        Perform crossover on the chromosomes of GAi and copy them to GAi+1;
        Mutate elements of GAi and copy them to GAi+1;
        Evaluate the new population GAi+1;
    }
End
}
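The loop above can be sketched in Python as follows. This is a generic generational GA under stated assumptions, not the authors' implementation: the operator callables are supplied by the caller, the termination condition is a fixed number of generations, and the best individual seen so far is tracked separately. The default values for `generations`, `pc` and `pm` are taken from the experiment parameters reported later in the paper; all function names and the toy problem are hypothetical.

```python
import random


def genetic_algorithm(init_population, fitness, select, crossover, mutate,
                      generations=300, pc=0.65, pm=0.5, rng=random):
    """Generic GA loop; the operator callables are supplied by the caller."""
    population = init_population()
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        next_population = []
        while len(next_population) < len(population):
            p1 = select(population, scores, rng)
            p2 = select(population, scores, rng)
            # Apply crossover with probability pc, otherwise copy a parent.
            child = crossover(p1, p2, rng) if rng.random() < pc else list(p1)
            # Apply mutation with probability pm.
            if rng.random() < pm:
                child = mutate(child, rng)
            next_population.append(child)
        population = next_population
        best = max(population + [best], key=fitness)
    return best


# Toy usage (hypothetical): maximize the number of 1s in an 8-bit string.
rng = random.Random(0)
start = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]


def pick(pop, scores, r):
    return r.choice(pop)  # placeholder selection, not roulette wheel


def cross(a, b, r):
    cut = r.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip(c, r):
    i = r.randrange(len(c))
    return c[:i] + [1 - c[i]] + c[i + 1:]


best = genetic_algorithm(lambda: start, sum, pick, cross, flip,
                         generations=20, rng=rng)
print(best, sum(best))
```

Passing the operators as parameters makes it possible to swap initial-population, crossover and mutation methods without touching the loop itself.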

A. Chromosome Representation

Each chromosome represents a solution to the scheduling problem. Table 1 shows the chromosome representation.

Task   J1    J2    J3    J4    ...   Jn-1  Jn
VM     V3    V5    V1    V6    ...   V2    V4
PE     PE1   PE3   PE2   PE1   ...   PE3   PE2

Table 1 : Chromosome Representation

J1, J2, ..., Jn represent the tasks, V1, V2, ..., Vm represent the Virtual Machines, and PE1, PE2, ... represent the Processing Elements.
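One possible in-memory encoding of the chromosome in Table 1 is sketched below. It assumes one gene per task, stored in task order, where each gene holds a VM index and a PE index. The `Gene` type, the zero-based indices and the function name are illustrative choices, not taken from the paper.

```python
import random
from typing import NamedTuple


class Gene(NamedTuple):
    vm: int  # index of the Virtual Machine assigned to the task
    pe: int  # index of the Processing Element assigned to the task


def random_chromosome(n_tasks, n_vms, n_pes, rng=random):
    """Build one chromosome: a list with one (vm, pe) gene per task J1..Jn."""
    return [Gene(rng.randrange(n_vms), rng.randrange(n_pes))
            for _ in range(n_tasks)]


# Hypothetical sizes: six tasks, six VMs, three PEs.
chromosome = random_chromosome(n_tasks=6, n_vms=6, n_pes=3,
                               rng=random.Random(1))
for task, gene in enumerate(chromosome, start=1):
    print(f"J{task} -> V{gene.vm + 1}, PE{gene.pe + 1}")
```

Because the position in the list identifies the task, the task row of Table 1 does not need to be stored explicitly.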

B. Fitness Function

The fitness function used in this work involves the following parameters.

A chromosome represents a schedule, which is defined by the mapping between jobs, Virtual Machines (VMs) and Processing Elements (PEs) (refer to Table 1).

S : set of chromosomes (one generation)


$a_j$ : arrival time of task $j$

$e_{jp}$ : execution time of task $j$ on processing element $p$

$CT_{jp}$ : completion time of task $j$ on processing element $p$

Then the makespan of schedule $s$ is given by

$$ s_\beta = \max_{1 \le j \le n,\; 1 \le p \le k} CT_{jp} \qquad (1) $$

Let the processor utilization of an individual Processing Element $P_i$ be defined as

$$ \mathrm{Utilization}[P_i] = \frac{CT[P_i]}{s_\beta} \qquad (2) $$

where $CT[P_i]$ denotes the completion time of all tasks of processing element $P_i$. Hence the average processor utilization $\mu$ is defined as follows:

$$ \mu = \frac{1}{\mathrm{no\_PE}} \sum_{i=1}^{k} \mathrm{Utilization}[P_i] \qquad (3) $$

where no_PE is the number of Processing Elements. Our objective in this research is to minimize $s_\beta$ and maximize $\mu$. To meet this objective, we define the fitness function $f$ as follows:

$$ f = \frac{1}{s_\beta} + \mu \qquad (4) $$
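Equations (1) to (4) can be evaluated for a single schedule as in the sketch below. Two simplifying assumptions are made that the paper does not state: all tasks arrive at time zero, and the tasks mapped to one Processing Element run back to back, so that CT[P] is the sum of their execution times. The VM gene is carried along but not used. The function name `evaluate` and the sample data are hypothetical.

```python
def evaluate(chromosome, etc, n_pes):
    """Return (makespan, average utilization, fitness) for one schedule.

    chromosome[j] is a (vm, pe) pair for task j; etc[j][p] is the expected
    execution time of task j on PE p. Requires at least one task.
    """
    # Assumption: CT[P] is the sum of execution times of tasks mapped to P.
    completion = [0.0] * n_pes
    for j, (_vm, pe) in enumerate(chromosome):
        completion[pe] += etc[j][pe]
    makespan = max(completion)                           # Eq. (1)
    utilization = [ct / makespan for ct in completion]   # Eq. (2)
    mu = sum(utilization) / n_pes                        # Eq. (3)
    fitness = 1.0 / makespan + mu                        # Eq. (4)
    return makespan, mu, fitness


# Hypothetical instance: three tasks, two PEs.
etc = [[10.0, 5.0], [20.0, 10.0], [5.0, 2.5]]
makespan, mu, f = evaluate([(0, 0), (1, 1), (0, 0)], etc, n_pes=2)
print(makespan, mu, f)
```

In this instance the two PEs finish at 15 and 10 time units, so the makespan is 15, the average utilization is 5/6 and the fitness is 1/15 + 5/6 = 0.9. Note that the utilization term lies between 0 and 1 while 1/makespan depends on the time unit, so the balance between the two terms depends on how makespan is measured.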

C. Initial Population

In this work we have experimented with four different methods of generating the initial population. Three of the methods, namely LCFP, SCFP and MCT, use seeding, where one solution in the initial population is generated using one of these algorithms and the rest are generated randomly. The methods are discussed below.

Random: In this method the chromosomes are generated randomly using a uniform random distribution, with different population sizes [17].

LCFP: LCFP [15] is a heuristic method that assigns the longest cloudlet to the fastest processing element. This method introduces diversity in the initial population, which improves the search process.

SCFP: This heuristic is the inverse of LCFP. Here the shortest cloudlet is assigned to the fastest processing element.

MCT: The MCT heuristic [22] assigns each task to the machine that has the minimum expected completion time for that task.
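The seeding heuristics can be sketched as follows. The paper only gives one-line descriptions, so several details are assumptions: cloudlets sorted by length are dealt cyclically to PEs sorted by speed, MCT processes tasks in their given order and tracks a ready time per PE, execution time is length divided by MIPS, and each schedule maps a task to a PE index only (the VM gene is omitted for brevity). All function names are hypothetical.

```python
import random


def lcfp(lengths, mips, longest_first=True):
    """Pair cloudlets sorted by length with PEs sorted fastest first.

    Assumption: sorted cloudlets are dealt to the sorted PEs cyclically.
    longest_first=False gives the SCFP variant.
    """
    tasks = sorted(range(len(lengths)), key=lambda j: lengths[j],
                   reverse=longest_first)
    pes = sorted(range(len(mips)), key=lambda p: mips[p], reverse=True)
    schedule = [0] * len(lengths)
    for rank, j in enumerate(tasks):
        schedule[j] = pes[rank % len(pes)]
    return schedule


def mct(lengths, mips):
    """Assign each task, in order, to the PE with the earliest finish time."""
    ready = [0.0] * len(mips)
    schedule = []
    for length in lengths:
        finish = [ready[p] + length / mips[p] for p in range(len(mips))]
        best = min(range(len(mips)), key=lambda p: finish[p])
        ready[best] = finish[best]
        schedule.append(best)
    return schedule


def seeded_population(seed, size, n_pes, rng=random):
    """One heuristic seed plus (size - 1) uniformly random schedules."""
    randoms = [[rng.randrange(n_pes) for _ in seed] for _ in range(size - 1)]
    return [list(seed)] + randoms


# Hypothetical workload: four cloudlets, two PEs.
lengths, mips = [4000, 8000, 2000, 6000], [400, 800]
lcfp_seed = lcfp(lengths, mips)
scfp_seed = lcfp(lengths, mips, longest_first=False)
mct_seed = mct(lengths, mips)
population = seeded_population(lcfp_seed, size=5, n_pes=2,
                               rng=random.Random(0))
print(lcfp_seed, scfp_seed, mct_seed)
```

Only the first individual comes from the heuristic; the remaining individuals are random, which matches the seeding description in the text.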

D. Selection

We use the roulette wheel method [23], a generic method for selecting chromosomes for the next population. In this method, the probability of an individual surviving into the next generation is obtained by dividing the fitness value of the individual by the sum of the fitness values of all individuals in the current population.
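The selection rule described above can be written as a short sketch. It assumes all fitness values are non-negative, which holds for the fitness function in Equation (4). The function name `roulette_select` is hypothetical.

```python
import random


def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability fitness / sum(fitnesses)."""
    total = sum(fitnesses)
    pick = rng.random() * total
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


# Hypothetical usage: "b" is three times as fit as "a".
rng = random.Random(0)
draws = [roulette_select(["a", "b"], [1.0, 3.0], rng) for _ in range(10000)]
print(draws.count("b") / len(draws))
```

Over many draws the share of "b" should be close to 3 / (1 + 3) = 0.75, which is the ratio the text describes.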

E. Crossover

Crossover and mutation are the two operators that drive reproduction. Crossover selects genes from the parent chromosomes, applies some variation and creates a new offspring. The probability of crossover is usually high, whereas the probability of mutation is often low. In this work we have experimented with three different crossover methods, namely single crossover, double crossover and uniform crossover.
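The three crossover methods can be sketched as below. The paper does not define them, so the code assumes the usual readings: single crossover as one-point crossover, double crossover as two-point crossover, and uniform crossover as an independent coin flip per gene. Each function returns one child, and the function names are hypothetical.

```python
import random


def single_point_crossover(p1, p2, rng=random):
    """Head of p1 and tail of p2, split at one random cut (needs len >= 2)."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]


def two_point_crossover(p1, p2, rng=random):
    """p1 with a middle segment taken from p2 (needs len >= 3)."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:]


def uniform_crossover(p1, p2, rng=random):
    """Each gene is taken from p1 or p2 with equal probability."""
    return [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]


# Hypothetical parents: task-to-PE assignments for six tasks.
rng = random.Random(3)
p1, p2 = [0] * 6, [1] * 6
print(single_point_crossover(p1, p2, rng))
print(two_point_crossover(p1, p2, rng))
print(uniform_crossover(p1, p2, rng))
```

Because a gene position always refers to the same task, any of these operators yields a valid schedule without a repair step.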

F. Mutation

Mutation is also a reproduction operator; it widens the search space so that the search does not get stuck at local optima. The mutation process consists of randomly perturbing the individuals of the population and is applied with a certain probability pm. Two mutation methods, namely move and swap, are considered here.
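The two mutation methods can be sketched as below. The paper names them without defining them, so the code assumes a common interpretation: move reassigns one randomly chosen task to a different Processing Element, and swap exchanges the assignments of two randomly chosen tasks. The chromosome is simplified to a list of PE indices, and the function names are hypothetical.

```python
import random


def move_mutation(chromosome, n_pes, rng=random):
    """Reassign one randomly chosen task to a different PE."""
    child = list(chromosome)
    j = rng.randrange(len(child))
    choices = [p for p in range(n_pes) if p != child[j]]
    if choices:  # nothing to do when there is only one PE
        child[j] = rng.choice(choices)
    return child


def swap_mutation(chromosome, rng=random):
    """Exchange the PE assignments of two randomly chosen tasks."""
    child = list(chromosome)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


# Hypothetical schedule: five tasks mapped to three PEs.
rng = random.Random(5)
parent = [0, 1, 2, 0, 1]
moved = move_mutation(parent, n_pes=3, rng=rng)
swapped = swap_mutation(parent, rng=rng)
print(moved, swapped)
```

Both functions copy the parent before changing it, so the original individual stays available to the rest of the population.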

V. RESULTS DISCUSSION

The algorithms for simulation are implemented on an Intel Dual Core machine with a 320 GB HDD and 2 GB RAM running CentOS 5.5. The experiments are conducted on a simulated cloud environment provided by CloudSim. CloudSim [10] is a generalized, extensible simulation framework that enables modeling, simulation and experimentation of Cloud computing infrastructures and application services. We have compared our proposed methods against the default scheduling policy in CloudSim, which is a combination of First Come First Served and Round Robin. The speed of each processing element is expressed in MIPS (Million Instructions per Second) and the length of each cloudlet is expressed as the number of instructions to be executed. The simulation environment consists of one data center with three hosts having three, two and three Processing Elements respectively. Each Processing Element is assigned a different computing power (MIPS rating), as shown in Table 2 below.

Host   PE id   MIPS
H1     10      400
H1     11      860
H1     12      800
H2     20      450
H2     21      400
H3     30      900
H3     31      760
H3     32      800

Table 2 : Computing power of various PEs

An initial population of 200 chromosomes was generated randomly. The initial crossover probability was fixed at 0.65 and the mutation probability at 0.5, as shown in Table 3 below. We ran the experiment for 300 iterations.


Parameter                 Value
Initial population size   200
Crossover probability     0.65
Mutation probability      0.5
No. of iterations         300

Table 3 : Experiment Parameters

Fig. 1 shows the performance of the Genetic Algorithm in finding an optimum schedule. FCFS-RR represents the default scheduling policy in CloudSim, which is a combination of First Come First Served and Round Robin. GA-LCFP represents the combination of LCFP for the initial population, double crossover and roulette wheel selection. GA-SCFP represents the combination of SCFP for the initial population, double crossover and roulette wheel selection. GA-MCT represents the combination of the MCT heuristic for the initial population, double crossover and roulette wheel selection. The overall makespan to execute the cloudlets is used as the metric to evaluate the performance of the proposed algorithms.

Fig. 1 Performance of GA

The number of cloudlets used in our experiment was varied from 50 to 250, with variable lengths, and the experiments were also conducted with varying numbers of Virtual Machines. We observed that the genetic algorithm that used the LCFP heuristic for generating the initial population provides an optimal solution and also converges faster than the other combinations. This is because the initial population is seeded with a solution that assigns the longest task to the fastest processing element, which helps to minimize the makespan and maximize the processor utilization. However, when the number of cloudlets was increased to around 250, GA-LCFP still produced an optimal solution but required more time to converge. We also observed that when the number of Processing Elements is almost the same as the number of Virtual Machines, GA-MCT performs better than GA-LCFP. Here the space-shared VM scheduling policy of CloudSim is used. When the number of Processing Elements is almost half the number of Virtual Machines, GA-LCFP performs better than the other methods. Here the time-shared VM scheduling policy is used.

VI. CONCLUSION AND FUTURE WORK

This paper proposed a novel multi-objective Genetic Algorithm that considers both application-specific and resource-specific scheduling objectives. Various combinations of genetic operators were tried, and the one that converges fastest and gives a promising solution was identified. In future, we would like to experiment with a parallel version of the Genetic Algorithm and to include QoS metrics such as deadline and priority.

REFERENCES

[1] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic, “Cloud computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility”, Future Generation Computer Systems, Elsevier Science, Amsterdam, June 2009, volume 25, Number 6, pp 599-616.

[2] Thomas Weise, “Global Optimization Algorithms – Theory and Application”, 2nd edition, June 2009.

[3] Melanie Mitchell, “An Introduction to Genetic Algorithms”, MIT Press, 1998.

[4] Zhongni Zheng, Rui Wang, Hai Zhong, Xuejie Zhang, “An Approach for Cloud Resource Scheduling Based on Parallel Genetic Algorithm”, In: 3rd International Conference on Computer Research and Development (ICCRD), Shanghai, 2011.

[5] GAN Guo-ning, HUANG Ting-Lei, GAO Shuai, “Genetic Simulated Annealing Algorithm for Task Scheduling based on Cloud Computing Environment”, In: 2010 IEEE International Conference on Intelligent Computing and Integrated Systems, China.

[6] Nawfal A. Mehdi, Ali Mamat, Hamidah Ibrahim, Shamala K. Subramaniam, “Impatient Task Mapping in Elastic Cloud using Genetic Algorithm”, Journal of Computer Science, 2011.

[7] Chenhong Zhao, Shanshan Zhang, Qingfeng Liu, “Independent task scheduling based on Genetic Algorithm in Cloud Computing”, In: 5th International Conference on Wireless Communications, Networking and Mobile Computing, China, 2009.

[8] D. Dutta, R. C. Joshi, “A Genetic-Algorithm Approach to Cost-Based Multi-QoS Job Scheduling in Cloud Computing Environment”, In: International Conference & Workshop on Emerging Trends in Technology (ICWET ’11), USA, 2011.

[9] Sandeep Tayal, “Task Scheduling Optimization for the Cloud Computing Systems”, International Journal of Advanced Engineering Sciences and Technologies, Vol. 5.

[10] Rodrigo N. Calheiros, Rajiv Ranjan, Cesar A. F. De Rose, and Rajkumar Buyya, “CloudSim: A Novel Framework for Modeling and Simulation of Cloud Computing Infrastructures and Services”, 2009.

[11] Andrew J. Page, Thomas J. Naughton, “Framework for Task Scheduling in Heterogeneous Distributed Computing using Genetic Algorithms”, In: 15th Artificial Intelligence and Cognitive Science Conference, 2004, Castlebar, Co. Mayo, Ireland.


[12] E. Alba et al., “Efficient Parallel LAN/WAN Algorithms for Optimization. The MALLBA project”, Journal of Parallel Computing, Vol. 32, June 2006.

[13] Carlos Alberto Gonzalez Pico, “Dynamic Scheduling of Computer Tasks using Genetic Algorithms”, In: International Conference on Evolutionary Computation (1994), pp. 829-833.

[14] Jing Liu et al., “The Research of Ant Colony and Genetic Algorithm in Grid Task Scheduling”, In: International Conference on Multimedia and Information Technology (MMIT ’08), 2008.

[15] S. Sindhu, Saswati Mukherjee, “Efficient Task Scheduling Algorithms for Cloud Computing Environment”, In: High Performance Architecture and Grid Computing, Vol. 169, Part 1, 2011, pp. 79-83.

[16] E.G. Talbi, Metaheuristics: From Design to Implementation, Wiley, 2009.

[17] A. Abraham, R. Buyya, and B. Nath, “Nature's heuristics for scheduling jobs on computational grids”, In: The 8th IEEE Int. Conference on Advanced Computing and Communications, India, 2000.

[18] Xiangzhen Kong et al., “Efficient dynamic task scheduling in virtualized data centers with fuzzy prediction”, Journal of Network and Computer Applications, June 2010.

[19] Baomin Xu et al., “Job scheduling algorithm based on Berger model in cloud environment”, Advances in Engineering Software, April 2011.

[20] Qi Cao et al., “An Optimized Algorithm for Task Scheduling Based On Activity Based Costing in Cloud Computing”.

[21] Fangpeng Dong and Selim G. Akl, “Scheduling Algorithms for Grid Computing: State of the Art and Open Problems”.

[22] T. D. Braun, H. J. Siegel, N. Beck, L. L. Bölöni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, and B. Yao, “A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems”, Journal of Parallel and Distributed Computing, Vol. 61, No. 6, 2001, pp. 810-837.

[23] D.E. Goldberg, “Genetic Algorithms in Search, Optimization and Machine Learning”, Addison-Wesley, 1989.
