Cooperative power-aware scheduling in grid computing environments




J. Parallel Distrib. Comput. 70 (2010) 84–91


Cooperative power-aware scheduling in grid computing environments

Riky Subrata, Albert Y. Zomaya *, Bjorn Landfeldt

Centre for Distributed and High Performance Computing, School of Information Technologies, University of Sydney, NSW 2006, Australia

Article info

Article history:
Received 11 April 2008
Received in revised form 5 September 2009
Accepted 15 September 2009
Available online 4 October 2009

Keywords:
Scheduling
Grid computing
Game theory

Abstract

Energy usage and its associated costs have taken on a new level of significance in recent years. Globally, energy costs that include the cooling of server rooms are now comparable to hardware costs, and these costs are on the increase with the rising cost of energy. As a result, there are efforts worldwide to design more efficient scheduling algorithms. Designing such scheduling algorithms for grids is further complicated by the fact that the different sites in a grid system are likely to have different ownerships. As such, it is not enough to simply minimize the total energy usage in the grid; instead one needs to simultaneously minimize energy usage between all the different providers in the grid. Apart from the multitude of ownerships of the different sites, a grid differs from traditional high performance computing systems in the heterogeneity of the computing nodes as well as the communication links that connect the different nodes together. In this paper, we propose a cooperative, power-aware game theoretic solution to the job scheduling problem in grids. We discuss our cooperative game model and present the structure of the Nash Bargaining Solution. Our proposed scheduling scheme maintains a specified Quality of Service (QoS) level and minimizes energy usage between all the providers simultaneously; energy usage is kept at a level that is sufficient to maintain the desired QoS level. Further, the proposed algorithm is fair to all users, and has robust performance against inaccuracies in performance prediction information.

Crown Copyright © 2009 Published by Elsevier Inc. All rights reserved.


1. Introduction

In recent years, the rising cost of energy has resulted in energy costs taking a significant percentage of the overall cost of ownership of processors and related hardware. As a result, energy efficiency, not just in battery powered devices but also in large server farms where many processors are located in close proximity and heat dissipation is a major problem, has gained a new level of importance. Efforts on more energy efficient processors are being tackled on multiple fronts, including the hardware and software aspects.

In this paper we focus on energy efficient scheduling on web-scale distributed systems, namely the grid. The grid is a promising platform that offers an integrated architecture for the sharing and aggregation of geographically distributed resources to solve potentially large, computationally intensive problems. Such platforms are much more cost effective than traditional high performance computing systems. However, computational grids have different constraints and requirements to those of traditional high performance computing systems, such as heterogeneous computing resources and considerable communication delays. Energy efficient

* Corresponding author.
E-mail addresses: [email protected] (R. Subrata), [email protected] (A.Y. Zomaya), [email protected] (B. Landfeldt).

scheduling in grids is further complicated by the fact that the multitude of sites that comprise a grid are likely to have a multitude of ownerships with their own self-interests and priorities. Therefore, a power-aware scheduling algorithm designed for the grid should aim to simultaneously minimize energy usage between all the different providers in the grid, and not just the total energy usage in the grid.

In general, scheduling algorithms can be classified as static or dynamic. Static scheduling algorithms (e.g. [16]) assume that all information governing scheduling decisions, which can include the characteristics of the jobs, the computing nodes and the communication network, is known in advance. Scheduling decisions are made deterministically or probabilistically at compile time and they remain constant during runtime. The assumption that the characteristics of the computing resources and communication network are all known in advance and remain constant may not apply to a grid environment. In contrast, dynamic scheduling algorithms (e.g. [22,32]) attempt to use runtime state information to make more informed scheduling decisions. Undoubtedly, the static approach is easier to implement and has minimal runtime overhead. However, dynamic approaches may result in better performance. One of the major drawbacks of dynamic algorithms is their sensitivity to inaccuracies in the performance prediction information that the algorithm uses for scheduling purposes. Some dynamic scheduling algorithms are more sensitive to the inaccuracies, and can generate extremely poor results even when the information accuracy is only slightly less than 100%; in real grid

0743-7315/$ – see front matter Crown Copyright © 2009 Published by Elsevier Inc. All rights reserved. doi:10.1016/j.jpdc.2009.09.003



environments, however, 100% accuracy in information is very hard to achieve and maintain.

So-called hybrid scheduling is another area that has been receiving some attention. In terms of static and dynamic scheduling, a hybrid scheduler attempts to combine the merits of static and dynamic scheduling algorithms, and by doing so minimizes their relative inherent disadvantages. Note, however, that the distinction between a static and a dynamic scheduling algorithm is in itself not clear cut, and different authors use slightly different definitions of static and dynamic algorithms. A hybrid job scheduling algorithm that combines the merits of static and dynamic algorithms was discussed in [4]. Also, in [2], a hybrid algorithm for adaptive load sharing in distributed systems was studied.

Scheduling schemes can be either centralized or decentralized. In the centralized approach (e.g. [21]), one node in the system acts as a scheduler and makes all the scheduling decisions. Information is sent from the other nodes to this node, and it is therefore the single point of failure. In the decentralized approach (e.g. [30]), multiple nodes in the system are involved in the scheduling decisions. It is very costly for each node to obtain and maintain the dynamic state information of the whole system; most decentralized approaches therefore have each node obtaining and maintaining only partial information locally to make sub-optimal decisions. Decentralized approaches, however, offer better fault tolerance than centralized ones.

One popular power model used in many power-aware studies is the model proposed by Yao [39]. In this model, the processor is assumed to be able to run at any speed s > 0, and the power required by the processor is given by P = s^η, where η ≥ 1. For our purpose, we use η = 2, and we say that a processor can run at any speed s, up to a maximum speed µ depending on the processor. In this respect, we are assuming that the processors are capable of dynamic voltage scaling. It should be noted that dynamic voltage scaling is not a trivial technique that can be simply 'added' to a processor.

In this paper, we propose a cooperative power-aware scheduling algorithm for web-scale distributed systems, namely the grid. Our proposed algorithm directly takes into account the multitude of ownerships of providers in a grid and incentives for cooperation, and simultaneously maximizes the utility of the providers in the grid. The schemes can be considered semi-static, as they respond to changes in system states during runtime. However, they do not use as much information as traditional dynamic schemes; as such, they have relatively low overhead, are relatively insensitive to inaccuracies in performance prediction information, and are stable.

The next section presents an overview of related work, followed by a discussion of the system model, including the grid and communication model that we are using. This is followed by details of the background and development of the cooperative, power-aware scheduling algorithm. Section 6 reports a number of experiments that show the applicability of the proposed approaches.
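The P = s^η power model with η = 2 can be sketched as follows; this is a minimal illustration (the function names are ours, not the paper's), showing why slowing a processor down yields quadratic savings.

```python
# Sketch of the P = s^2 power model (Yao's model with eta = 2): a provider
# running at speed s, up to its maximum mu, draws power s^2, so slowing
# down during light load yields quadratic energy savings.
# Function names are illustrative, not from the paper.

def power(s: float) -> float:
    """Power drawn at speed s under the P = s^2 model."""
    return s ** 2

def power_saving(s: float, mu: float) -> float:
    """Saving per time unit relative to running at full speed mu."""
    if not 0 <= s <= mu:
        raise ValueError("speed must satisfy 0 <= s <= mu")
    return power(mu) - power(s)

# A provider with maximum speed 10 running at half speed saves 75% of the power:
saving = power_saving(5.0, 10.0)   # 100 - 25 = 75
```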

2. Related work

In recent years there has been interest in the use of game theoretic and market oriented models for the design and analysis of distributed systems and networking algorithms [24,27,28]. A so-called G-commerce model was presented in [36] for the control of computational resources in a grid environment; two markets (commodities market and auctions) were studied, and it was concluded that the commodities market is a better choice of model. Under the assumption of exponential service times and Poisson arrivals, two job allocation games for distributed systems were presented in [11,12]. The papers presented analytical models and algorithmic solutions; however, no communication delays were taken into account. The Vickrey–Clarke–Groves (VCG) mechanism was studied in [10] for load balancing in distributed systems; in the model, each computer optimizes its profit by considering the payments and costs of handling particular tasks. In [19], a non-cooperative scheduling framework is presented for bag-of-task applications. The results show that their framework yields a relatively inefficient Nash equilibrium.

More recently, a game theoretic, decentralized load balancing scheme using general service times and taking into account communication delays was presented in [34]. The use of general service times (having finite mean and variance) and an M/G/1 queuing system results in a more realistic model than the usual simplifying assumption of exponential service times – the M/M/1 queuing system. There are good reasons for using general service times: research has shown that process (application/task) lifetimes, web session times, transfer times, user requests (for documents/files), and file sizes tend to have a heavy-tailed distribution, as opposed to an exponential distribution [9,13,20]. This paper uses the Bounded Pareto distribution with the pdf f(x) = α k^α x^(−α−1) / (1 − (k/p)^α), k ≤ x ≤ p, to represent the heavy-tailed property of the Internet.

In terms of decentralization, cooperative games are harder to achieve, due to the need for cooperation and control between the players. However, distributed algorithms for network bandwidth control based on cooperative games have been reported in the literature [35,38]. Note also that better performance may be achieved through cooperation; in non-cooperative games, each player selfishly maximizes its own utility and makes decisions independently of the other players; these independent decisions may be extremely detrimental to the other players. It should be noted that through cooperation a Pareto optimal solution may be obtainable. In this sense, the efficiency of the Nash equilibrium can be measured in terms of how close to the Pareto optimum the Nash equilibrium is. However, in cooperative games one must also take into account hostile players – players that do not cooperate fully. From another perspective, an important question for any cooperative algorithm or scheme designed for the Internet (or web-scale distributed systems) whose sites have different ownerships is: if these sites behave in a selfish manner (instead of in a 'socially responsible' manner as expected), would the stability of the system still hold? Under these conditions the system as a whole would tend toward the Nash equilibrium, and it is therefore preferable to have relatively efficient Nash equilibria (e.g. see [1,18]).

Several power-aware scheduling algorithms have been proposed in recent years [3,5,6,14,29]. In [6], energy efficient scheduling in homogeneous multiprocessors, specifically for real time tasks with a common deadline, was discussed; different power functions were analyzed, and algorithms with and without task migrations were proposed. Shin et al. [29] looked at the use of dynamic voltage scaling for systems with mixed sets of both periodic and aperiodic tasks, and the reduction in energy consumption that can be achieved. However, none of these algorithms takes into account the multitude of ownerships of the sites that comprise a grid, and the conflicts of interest that may arise.

With respect to the above-mentioned previous work, our proposed approach is novel in that we directly take into account the multitude of ownerships of providers in a grid and incentives for cooperation, and simultaneously maximize the utility of the providers in the grid. We will also show the effects of communication delays in a grid environment that may affect the completion time of jobs.
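The Bounded Pareto pdf quoted above can be checked numerically; the sketch below (with illustrative parameters α, k, p of our own choosing) implements the density and verifies by trapezoidal integration that it integrates to approximately 1 over [k, p].

```python
# Bounded Pareto pdf from the text:
#   f(x) = alpha * k^alpha * x^(-alpha - 1) / (1 - (k/p)^alpha),  k <= x <= p.
# Quick numerical sanity check that the density integrates to ~1.
def bounded_pareto_pdf(x: float, alpha: float, k: float, p: float) -> float:
    if not k <= x <= p:
        return 0.0
    return alpha * k**alpha * x**(-alpha - 1) / (1 - (k / p)**alpha)

# Trapezoidal integration over [k, p] with illustrative parameters:
alpha, k, p = 1.5, 1.0, 1000.0
n = 200_000
h = (p - k) / n
total = sum(bounded_pareto_pdf(k + i * h, alpha, k, p) for i in range(n + 1))
total -= 0.5 * (bounded_pareto_pdf(k, alpha, k, p) + bounded_pareto_pdf(p, alpha, k, p))
integral = total * h   # should be close to 1
```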

3. Grid model

A grid consists of a set of sites that are connected throughan underlying communication network, which is heterogeneous



Fig. 1. Relationship of users, brokers, and providers.

in nature. The sites are assumed to be fully interconnected, meaning that there exists a communication path between any two sites in the grid system. For our purpose, we classify a site in the grid as either a user, a broker, or a provider (or a combination of the three) [34]. With this classification, we can model the grid using the hierarchical relationship between users, brokers, and providers (see Fig. 1). As shown in the figure, the grid system consists of p users, n brokers, and m providers. A user generates jobs to be executed by the processors; each user sends a job to a broker to be scheduled for processing (note that a user may send jobs to more than one broker). A broker, on the other hand, receives jobs from a set of users and assigns them to the providers in the grid system; every time a job is received from a user, the broker decides which provider will process the job and sends the job to that provider. Ideally, there would be many more users than brokers in the system; as such, the jobs scheduled by the brokers are an aggregate from many users. Finally, a provider executes and processes jobs sent to it. Each provider has a queue that holds jobs to be executed; each job is then processed on a first-come-first-served (FCFS) basis. Fig. 1 shows the relationship between users, brokers, and providers.

Each user k is assumed to generate jobs with average rate β_k (jobs per second) according to a Poisson process, and independently of the other users. Jobs are then sent by the user to a broker that dispatches them to the providers. Each broker dispatches jobs to the providers at an average rate λ_i, which is an aggregation of the jobs being sent by users to the broker. Depending on the computational power provided, each provider j executes jobs at an average rate µ_j (jobs per second). The job times are assumed to follow an exponential distribution.

Users generate jobs that are sent to the brokers, who in turn send them to the providers. In general, for stability we have the condition/constraint that jobs must not be generated faster than the system can process them (otherwise the queues at the providers will build up to infinity):

∑_{i=1}^{n} λ_i < ∑_{j=1}^{m} µ_j    (1)

where λ_i is the average arrival rate of jobs (in jobs per second) at broker i and µ_j is the average processing rate of jobs at provider j.

Each provider is assumed to have a specified Quality of Service (QoS) target, expressed in terms of an expected job response time T_j. For simplicity, but without loss of generality, we assume that each provider has the same QoS target; that is, T_j = T for all j. In terms of the potential power saving (saving in energy per time unit) that can be achieved, each provider would ideally like to run its system at a speed that exactly meets the QoS target, but does not exceed it, and certainly is not below the target. This means that, when the load is low, the system can be slowed down to save on energy cost, and it is only ramped up to maximum during peak loads.

The incentive/objective of each provider, by cooperating, is to achieve energy savings that it may not otherwise achieve without cooperating. As the providers in the grid are assumed to have different ownerships, it is important to simultaneously maximize the power saving between all the providers, and not just the total power saving in the grid system. Besides energy saving, another incentive for cooperation is to achieve a higher QoS than can otherwise be achieved by each individual provider without cooperation [33].

4. Cooperative game

We model the grid load balancing problem as a cooperative game with m players (for further details on cooperative games refer to [25,26]), where the players are the service providers. Suppose that each player j (1 ≤ j ≤ m) has a performance function f_j(x), where x ∈ X and X ⊆ R^m is a non-empty convex closed set representing the set of possible strategies for the m players. Each player also has a minimum initial performance (without any cooperation) u_j^0; u_j^0 represents the minimum required performance that must be provided by the system in order for the player to enter the game. u^0 = {u_1^0, u_2^0, ..., u_m^0} is called the initial agreement point. In the context of our power-aware scheduling, a solution is defined to be Pareto optimal if it is not possible to increase the utility of a provider without strictly decreasing that of another.

Let U ⊂ R^m represent a non-empty convex closed set of achievable performances. Further, let U_0 = {u ∈ U | u ≥ u^0} denote the non-empty subset of U where the players achieve performance greater than (or equal to) their initial performance. A mapping S : (U, u^0) → R^m is said to be a Nash bargaining point if:

1. The minimum required performance is achievable: S(U, u^0) ∈ U_0.
2. S(U, u^0) is Pareto optimal.
3. S is linearly invariant. That is, if we have a linear mapping φ : R^m → R^m whereby φ_j(u_j) = a_j u_j + b_j, j = 1, ..., m, then S(φ(U), φ(u^0)) = φ(S(U, u^0)).
4. S is independent of irrelevant alternatives. That is, if V ⊂ U and S(U, u^0) ∈ V, then S(U, u^0) = S(V, u^0).
5. S is symmetric, meaning that two providers with the same initial performance u_j^0 and the same performance function f_j(x) will have the same result.

From the above, the point u* = S(U, u^0) is called the Nash bargaining point, and f^{-1}(u*) is called the set of Nash bargaining solutions. From the results of [31,38], the following is a characterization of the Nash bargaining point.

Let the performance functions f_j(x) be concave and upper bounded, where x ∈ X is a non-empty convex closed set. Let X_0 = {x ∈ X | f(x) ≥ u^0} be the subset of strategies that enable the players to achieve at least their initial performances. Let J be the set of players able to achieve a performance strictly superior to their initial performance; that is, J = {j ∈ {1, ..., m} | ∃x ∈ X_0, f_j(x) > u_j^0}. If every f_j(x), j ∈ J, is one-to-one on X_0, then there exists a unique Nash bargaining solution that verifies f_j(x) > u_j^0, j ∈ J, and solves the following optimization problem:

max_x ∏_{j∈J} (f_j(x) − u_j^0),   x ∈ X_0.    (2)



Note that players not in the set J are not considered in the optimization problem. An equivalent optimization problem is the following:

max_x ∑_{j∈J} ln(f_j(x) − u_j^0),   x ∈ X_0.    (3)
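Since ln is strictly increasing, maximizing the product of utility gains and maximizing the sum of their logarithms pick the same strategy. A small numeric sketch (the candidate strategies and utility values below are hypothetical, purely for illustration):

```python
import math

# Illustration: over a discrete candidate set, the argmax of the Nash product
# prod_j (f_j(x) - u0_j) coincides with the argmax of sum_j ln(f_j(x) - u0_j),
# because ln is strictly increasing. All numbers are hypothetical.
u0 = [1.0, 2.0]                      # initial (disagreement) performances
candidates = {                       # candidate strategy x -> (f_1(x), f_2(x))
    "x1": (3.0, 4.0),
    "x2": (5.0, 2.5),
    "x3": (2.5, 6.0),
}

def nash_product(f):
    return (f[0] - u0[0]) * (f[1] - u0[1])

def log_sum(f):
    return math.log(f[0] - u0[0]) + math.log(f[1] - u0[1])

best_prod = max(candidates, key=lambda x: nash_product(candidates[x]))
best_log = max(candidates, key=lambda x: log_sum(candidates[x]))
assert best_prod == best_log   # same bargaining solution either way
```

The log form (Eq. (3)) is usually preferred in practice because sums are numerically better behaved than long products.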

Having defined the structure of the Nash bargaining solution, we are now in a position to state our cooperative, power-aware scheduling algorithm.

5. Power-aware scheduling algorithm

For an M/M/1 queuing system [8], the average processing time of a job, including waiting time in the queue, at a provider j is given by

F_j = 1 / (s_j − φ_j)    (4)

where s_j is the average rate of job execution (in jobs per second), and φ_j is the average arrival rate of jobs at provider j.

The underlying communication networks in grids are typically heterogeneous. We therefore take into account possible communication delays that may be incurred in sending jobs to particular providers. In our model, each provider j is assumed to have a certain bandwidth c_j (in bit/s) available for grid jobs; c_j can be calculated from analytical models or historical information, or dynamically forecasted by facilities such as the Network Weather Service (NWS) [37]. Each job is assumed to require an average of b bits of data to be transferred. The expected transfer time of a job from any player to provider j is therefore given by

L_j = b / c_j.    (5)

This value represents the average communication delay if a job is to be sent to provider j.

Completion of a job involves the execution time of the job, the waiting time in the queue, and the transfer time of the job to the provider. Therefore the average completion time of jobs (average response time) for provider j is given by

D_j = F_j + L_j = 1/(s_j − φ_j) + b/c_j.    (6)
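The completion-time model of Eqs. (4)–(6) can be sketched directly; the rates and link parameters below are illustrative values of our own, not from the paper's experiments.

```python
# Expected completion time at provider j per Eq. (6): M/M/1 queueing plus
# processing delay 1/(s_j - phi_j), plus transfer delay L_j = b/c_j.
# Parameter values are illustrative.
def expected_completion_time(s_j: float, phi_j: float, b: float, c_j: float) -> float:
    if phi_j >= s_j:
        raise ValueError("arrival rate must stay below service rate")
    F_j = 1.0 / (s_j - phi_j)     # M/M/1 response time, Eq. (4)
    L_j = b / c_j                 # average transfer delay, Eq. (5)
    return F_j + L_j              # Eq. (6)

# e.g. s_j = 10 jobs/s, phi_j = 8 jobs/s, 1 Mbit per job over a 10 Mbit/s link:
D = expected_completion_time(10.0, 8.0, 1e6, 1e7)   # 0.5 + 0.1 = 0.6 s
```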

As previously noted, the power requirement for a provider j is related to the speed at which the provider's system is running, and is given by P = s²; the lower the speed, the higher the saving in energy cost. However, the speed at which the system can be run depends on the target expected response time T (the QoS level afforded to the customers). Therefore, rearranging Eq. (6) gives

P_j = s_j² = [ (1 + T·φ_j − L_j·φ_j) / (T − L_j) ]².    (7)

The incentive/objective of each provider, by cooperating, is to achieve a utility superior to the initial utility when its system is running at full speed. That is, for the initial energy cost we have P_j^0 = µ_j². Also note that the cost function P_j is convex and bounded below; multiplying this cost function by −1 results in a utility function that is concave and bounded above. In terms of the cooperative game that we defined earlier, the Nash bargaining solution solves the following optimization problem:

max_φ P = ∑_{j=1}^{m} ln( µ_j² − [ (1 + T·φ_j − L_j·φ_j) / (T − L_j) ]² ).    (8)

In Eq. (8), we are simultaneously maximizing the energy cost saving between all the providers, according to the fairness axioms of the Nash bargaining solution. It is important to simultaneously maximize the savings between all the providers, and not just the total power saving, as the providers are most likely owned by different entities; as such, there needs to be an incentive for each of the providers to cooperate. In addition to Eq. (8), we also have a strong constraint, in that all the jobs generated by the system must naturally be executed; that is:

∑_{i=1}^{n} λ_i = ∑_{j=1}^{m} φ_j.    (9)

For completeness, we also have the constraint that the rate of jobs sent to a provider j must not exceed the rate at which jobs can be executed by provider j (otherwise the queue at provider j will build up to infinity):

φ_j < µ_j    (10)

φ_j ≥ 0    (11)

where φ_j represents the rate of jobs sent to provider j, and obviously cannot be negative. For now we will assume a well-behaved system and ignore constraints (10) and (11). We will further assume (for now) that the overall load in the grid is such that the target QoS (target expected response time T) can be satisfied by the providers.

We note that, within the constraints, the objective P of Eq. (8) satisfies ∂P/∂φ_j ≤ 0 and ∂²P/∂φ_j² ≤ 0. As such, Eq. (8) is a concave function, and the constraints are all linear. Therefore, the first-order Karush–Kuhn–Tucker conditions are necessary and sufficient for optimality [17,23]. The Lagrangian is given by

L = ∑_{j=1}^{m} ln( µ_j² − [ (1 + T·φ_j − L_j·φ_j) / (T − L_j) ]² ) + α [ ∑_{j=1}^{m} φ_j − ∑_{i=1}^{n} λ_i ].    (12)

A necessary condition is

∂L/∂φ_j = 0.    (13)

Solving (13), we get the following:

2(T − L_j)(1 + (T − L_j)φ_j) / [ (1 + (T − L_j)φ_j)² − (T − L_j)²µ_j² ] + α = 0,   1 ≤ j ≤ m.    (14)

Solving for φ_j in Eq. (14) gives the following:

φ_j = −1/α − 1/T − √(1 + α²µ_j²)/α   or   φ_j = −1/α − 1/T + √(1 + α²µ_j²)/α.    (15)

We then choose the solution for φ_j that satisfies all the constraints, giving the following partial solution for φ_j:

φ_j = −1/α − 1/T + √(1 + α²µ_j²)/α.    (16)

The Lagrange multiplier α needs to be chosen so that the equality constraint (9) is satisfied. Using (16) and (9) and some manipulation, we get the following equation that determines α:

( −m + ∑_{j=1}^{m} √(1 + α²µ_j²) ) / α = m/T + ∑_{i=1}^{n} λ_i.    (17)
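Equation (17) has no closed form for α, but its left-hand side is increasing in α, so a simple one-dimensional root search suffices. A minimal sketch, with illustrative rates µ_j, an illustrative aggregate arrival rate, and a target T assumed to exceed the feasibility minimum (all numbers are ours, not from the paper's experiments):

```python
import math

# Sketch: solve Eq. (17) for the Lagrange multiplier alpha by bisection
# (the left-hand side is increasing in alpha), then recover each provider's
# job rate phi_j from Eq. (16). All parameter values are illustrative; we
# assume T is above the minimum for which cooperation is feasible.
mu = [10.0, 8.0, 6.0]          # provider processing rates mu_j (jobs/s)
total_lambda = 12.0            # aggregate broker arrival rate, sum_i lambda_i
T = 1.0                        # target expected response time (QoS), seconds
m = len(mu)

def lhs(alpha: float) -> float:
    # Left-hand side of Eq. (17): (-m + sum_j sqrt(1 + alpha^2 mu_j^2)) / alpha
    return (-m + sum(math.sqrt(1 + alpha**2 * mu_j**2) for mu_j in mu)) / alpha

target = m / T + total_lambda
lo, hi = 1e-9, 1e9             # bracket: lhs(lo) ~ 0, lhs(hi) ~ sum(mu)
for _ in range(200):           # bisection to machine precision
    mid = (lo + hi) / 2
    if lhs(mid) < target:
        lo = mid
    else:
        hi = mid
alpha = (lo + hi) / 2

# Eq. (16): job rate to route to each provider
phi = [-1 / alpha - 1 / T + math.sqrt(1 + alpha**2 * mu_j**2) / alpha
       for mu_j in mu]
# The equality constraint (9) now holds: sum(phi) == total_lambda (numerically)
```

If any φ_j comes out negative, that provider is excluded and the computation repeated, as described in Section 5.1.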



At this point we have determined the amount of jobs that should be sent to each provider (Eq. (16)) so as to simultaneously maximize the cost savings between all the providers. The speed at which each provider should run its system, so as to meet the target expected response time, follows from Eq. (7):

s_j = (1 + T·φ_j − L_j·φ_j) / (T − L_j).    (18)
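Equations (18) and (7) translate an assigned rate into a required speed and power draw; a small sketch with illustrative values:

```python
# Sketch of Eq. (18) and Eq. (7): the speed a provider must run at to meet
# the response-time target T given its assigned rate phi_j and transfer
# delay L_j, and the resulting power draw s_j^2. Values are illustrative.
def required_speed(phi_j: float, T: float, L_j: float = 0.0) -> float:
    if L_j >= T:
        raise ValueError("transfer delay must be below the response-time target")
    return (1 + T * phi_j - L_j * phi_j) / (T - L_j)   # Eq. (18)

def required_power(phi_j: float, T: float, L_j: float = 0.0) -> float:
    return required_speed(phi_j, T, L_j) ** 2          # Eq. (7)

# With negligible transfer delay, Eq. (18) reduces to s_j = phi_j + 1/T:
s = required_speed(4.0, 0.5)    # 4 + 2 = 6 jobs/s
p = required_power(4.0, 0.5)    # 36
```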

The power requirement for each provider is as shown by Eq. (7). The solution and the scheme discussed above are valid for a well-behaved grid system. However, when the system loads are either too low or too high, the solution presented so far breaks down. We therefore take into account these two special cases.

5.1. High loads and low loads

When the computational loads of the system are relatively low, the benefits offered by relatively slow providers would not offset their inclusion costs; including them may result in a negative job rate assignment φ_j as given by Eq. (16), violating constraint (11). For these cases we can simply exclude these providers; it should be noted that, for our problem, constraint (10) is inactive.

On the other hand, when the system's computational loads are relatively high, providers may find that they are not able to satisfy the target expected response time T and/or have no incentive to cooperate. In these cases, the target expected response time T would have to be revised – although ideally, the conditions should be such that the target expected response time T, representing the QoS level to customers, can always be satisfied. Following from Eq. (16), it is clear that α (which is in seconds) would need to increase as the system load increases, for the requirements to be satisfied and cooperation to continue. As α increases, we have

\lim_{\alpha \to \infty} \phi_{j} = \mu_{j} - \frac{1}{T}.  (19)

Using Eq. (9), we obtain the following minimum value of T for which cooperation can continue:

T = \frac{m}{\sum_{j=1}^{m} \mu_{j} - \sum_{i=1}^{n} \lambda_{i}}.  (20)

Note that Eq. (20) says nothing about the savings achieved for the increase in T. As the system is then near maximum load, and the new value of T is above the target QoS level, it may be advantageous to couple the proposed power-aware scheme with another scheme that minimizes the response time at high loads. This of course depends on how the costs of energy are perceived. Later, in Section 6, we show the tradeoff between energy savings and response times.
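Eq. (20) is straightforward to compute. The hypothetical helper below evaluates it; fed the Table 1 service rates at 90% load, it reproduces the roughly 341 s minimum target response time reported in Section 6.2.

```python
def min_feasible_T(mu, lam):
    """Eq. (20): smallest target expected response time for which
    cooperation can continue, T = m / (sum_j mu_j - sum_i lam_i)."""
    slack = sum(mu) - sum(lam)
    if slack <= 0:
        raise ValueError("offered load meets or exceeds total capacity")
    return len(mu) / slack
```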

5.2. Fairness

Another important criterion for a scheduling algorithm is fairness to users. Fairness to users is often measured in the sense that users should receive the same level of utility no matter which brokers are responsible for them. In this sense, fairness is achieved when the average response time of each broker is the same. If one broker has a lower average response time and another a higher one, the scheduling scheme can be considered unfair, as it gives some brokers an advantage while it puts other brokers at a disadvantage. Fairness is thus an important performance measure for scheduling schemes alongside average response time and QoS. A scheduling scheme that leaves a few brokers with extremely long delays may not be preferable, as those few brokers would become ''unwanted'' brokers for the users in the system, however arbitrary their selection may be.

A fairness index, given by

FI = \frac{\left( \sum_{i=1}^{n} T_{i} \right)^{2}}{n \sum_{i=1}^{n} T_{i}^{2}},  (21)

where Ti is the average response time of player i, was discussed in [15] to quantify the fairness of scheduling schemes. If a scheduling scheme is 100% fair, then FI is 1.0; a fairness index close to 1.0 indicates a relatively fair scheduling scheme.

Notice that, since in our strategy all the providers have the same target level of expected response time T (which is our measure of QoS), it is trivial for the brokers to achieve 100% fairness to the users. Each broker can send any amount of jobs to any provider, as long as the provider is willing to accept the jobs (that is, the provider is still receiving jobs below its agreed quota).

Using the above schemes, the system periodically calculates an optimum scheduling strategy. The system remains in equilibrium until there are changes in the system's state. Periodic scheduling by the system ensures that optimum strategies are maintained.
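Eq. (21) is Jain's fairness index from [15]. A direct, hypothetical implementation:

```python
def fairness_index(T):
    """Jain's fairness index, Eq. (21): FI = (sum T_i)^2 / (n * sum T_i^2),
    where T is the list of per-player average response times."""
    n = len(T)
    return sum(T) ** 2 / (n * sum(t * t for t in T))
```

Identical response times give FI = 1.0 (perfect fairness); any dispersion pushes FI below 1.0.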

6. Experiments

In this section we provide some results and insights into the working of our algorithm as well as a few others. In particular, we show the behavior of our algorithm when the system is at either low or very high loads. To this end, a set of networks and applications was generated. The proposed algorithm is labeled PA. In addition, a few other algorithms were also implemented for comparison purposes: (1) the proportional-scheme algorithm shown in [7], (2) the non-cooperative game algorithm discussed in [34], and finally (3) the cooperative game algorithm presented in [33].

The proportional-scheme algorithm allocates jobs to providers in proportion to their computing power (that is, job processing rate); the faster providers are sent more jobs by the brokers. The amount of jobs sent to the providers is given by the following:

\lambda_{i,j} = \frac{\lambda_{i}\,\mu_{j}}{\sum_{j=1}^{m} \mu_{j}}.  (22)
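The proportional allocation of Eq. (22) can be sketched in a few lines (hypothetical helper; `lam_i` is broker i's arrival rate λi and `mu` the provider rates µj):

```python
def proportional_split(lam_i, mu):
    """Eq. (22): broker i sends lam_i * mu_j / sum(mu) jobs/s to provider j,
    so faster providers receive proportionally more jobs."""
    total = sum(mu)
    return [lam_i * mj / total for mj in mu]
```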

The proportional-scheme algorithm is a distributed, decentralized algorithm. Note also that the proportional-scheme algorithm cannot, and does not, take into account the communication delays incurred in transferring jobs from one site to another. In the experiments the proportional-scheme algorithm is labeled PS.

The non-cooperative game algorithm is also a distributed, decentralized algorithm. In this game the players are the brokers, and each player tries to minimize its own average response time independently. Specifically, each scheduler calculates a strategy r_i = \{r_{i,1}, r_{i,2}, \ldots, r_{i,m}\} (where r_{i,j} is the fraction of player i's jobs sent to provider j) such that its average response time is minimized. A strategy r'_i is always preferred over a strategy r_i if it results in a lower average response time. For further information on this scheduling scheme refer to [34]. The non-cooperative game algorithm is labeled NG.

The cooperative game algorithm is a scheduling algorithm that requires cooperation between the providers. Here the players are the providers, and the objective is to maximize the QoS level of all the providers. The algorithm represents a Pareto optimal solution to the QoS objective. For further information on this scheduling scheme refer to [33].

Table 1
Relative processing power of the providers.

Provider                 1     2      3      4      5      6      7      8
Processing rate (job/s)  0.01  0.0111 0.0118 0.0125 0.0133 0.0143 0.0154 0.0167
Provider                 9     10     11     12     13     14     15
Processing rate (job/s)  0.02  0.025  0.0333 0.04   0.05   0.0667 0.1

Table 2
Relative job arrival rate of each broker.

Broker 1 2 3–5 6 7 8 9–10

Relative job arrival rate 0.35 0.2 0.1 0.06 0.05 0.02 0.01

Fig. 2. Average response time versus system load.

The parameters used for the experiments are as follows.
• The generated network contains m = 15 providers with relative processing power as shown in Table 1. Along with the exponential service times for jobs, the values in Table 1 represent the average processing rate of each provider.
• For the proposed algorithm, we use a target expected response time of T = 180 s. Later, we also show the effect of increasing the target expected response time to T = 260 s and T = 340 s (that is, lowering the QoS).
• For the cooperative game (CG) algorithm, we use φ_j^max = 0.99 × µ_j to determine an individual performance guarantee D_j^0 for each provider. That is, instead of a common performance guarantee in the system, each provider sets its own performance guarantee to its customers. To this end, each provider accepts a maximum of 99% utilization of its system, in order to ensure a minimum level of QoS (which each provider, by cooperating, is trying to maximize).
• For simplicity, and without loss of generality, we assume negligible/zero communication delay. That is, the completion time of a job is dominated by the waiting time in the queue and the execution time of the job itself.
• The system has n = 10 brokers that cooperate with each other, with an arbitrary number p of users. Typically, the number of users would be much greater than the number of brokers, that is, p ≫ n, and the jobs being scheduled by each broker are an aggregate from many users. Note that, for experimental purposes, we only need to generate the aggregate number of jobs at each broker, and not individual jobs from each user; there can therefore be an arbitrary number (p) of users. The relative job arrival rate for each broker is shown in Table 2. To obtain the required overall average system loading ρ, the actual arrival rate λ_i of each broker is calculated as follows:

\lambda_{i} = \eta_{i} \cdot \rho \cdot \sum_{j=1}^{m} \mu_{j},  (23)

Fig. 3. Average power versus average queue length.

where ηi is the relative job arrival rate of broker i.
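Eq. (23) maps a desired overall utilisation ρ and the Table 2 weights ηi (which sum to 1) to per-broker arrival rates. A minimal, hypothetical helper:

```python
def broker_arrival_rates(eta, mu, rho):
    """Eq. (23): lam_i = eta_i * rho * sum_j mu_j. With the relative rates
    eta summing to 1, the total offered load equals rho * sum(mu), i.e.
    an overall average system utilisation of rho."""
    total_mu = sum(mu)
    return [e * rho * total_mu for e in eta]
```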

6.1. Normal loads

In this experiment, we vary the average load of the system from 10% to 80%. The arrival rate of jobs for each broker is adjusted to give the required average system load, according to Eq. (23). For each load value, the system's average response time is then calculated using the following equation:

\mathrm{avg} = \frac{1}{\sum_{j=1}^{m} \phi_{j}} \sum_{j=1}^{m} \phi_{j} D_{j}.  (24)
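Eq. (24) is simply the job-rate-weighted mean of the per-provider expected response times D_j. As a hypothetical sketch:

```python
def system_avg_response(phi, D):
    """Eq. (24): average system response time as the phi_j-weighted mean
    of the per-provider expected response times D_j."""
    return sum(p * d for p, d in zip(phi, D)) / sum(phi)
```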

The result of the experiment, using the setup described above, is shown in Fig. 2. As expected, the PA algorithm maintains a constant QoS (in this case a constant expected response time of T = 180 s) over the range of system loads shown. This is done in a way that simultaneously maximizes the savings in energy cost for all the providers. It should be noted that, as described previously, each of the algorithms has its own set of objectives and criteria.

As our objective is the simultaneous maximization of energy cost savings for all the providers, Fig. 3 shows the average power requirement of the different schemes. In the figure, the PA algorithm is shown with three different target expected response times: T = 180, 260, and 340 s. The plot shows average power versus average queue length, with average power generally increasing as the system load increases; greater energy savings can be achieved at the expense of higher average response times, which generally lead to higher average queue lengths. The CG and NG algorithms do not necessarily use 100% power all the time, because at low system loads some providers do not participate. In the PS scheme all the providers participate all the time.

Table 3
Average power of the various schemes.

Scheme                 PA 341  PA 346  PA 350  PS    CG    NG

Average response time  341     346     350     341   291   305
Average power          100%    99.8%   99.7%   100%  100%  100%

Fig. 4. Number of participating providers versus system load.

6.2. High loads

Ideally, the capacity of the system should be such that the target QoS can always be achieved. In this part of the experiment we investigate the performance of the proposed power-aware scheme under a system load of 90%. At this load level, a target response time of T = 180 s cannot be achieved. Higher values of T are therefore used, according to Eq. (20). Table 3 shows the various values of T used for the PA scheme, including a minimum value for T of around 341 s. From the table it can be seen that, at relatively high system loads, the PA scheme has comparable performance to that of the PS scheme; increasing the value of T further results in further energy savings, which may or may not be worth the increase in average response time, especially compared to the CG and NG schemes. As such, depending on the applications and scenarios, there may be merit in combining the PA scheme with the CG scheme, whereby the CG scheme is used during high system loads.

6.3. Low loads

While at high system loads one can expect all providers to benefit from the participation of others, this may not be the case at low system loads, where it may be more advantageous for slower providers not to participate at all. Following the method discussed earlier using Eq. (16), Fig. 4 shows the actual number of providers participating at different system loads in the experiment presented above. As the figure shows, at low system loads slow providers are excluded from participation entirely; only as the system load increases do the slow providers become beneficial. At 70% load, all providers in the system participate. For comparison, the number of participating providers is also shown for the CG scheme, which exhibits a similar general trend. Note that, under the PS scheme, all the providers always participate irrespective of the system load. The NG scheme also allows differing numbers of participating providers; however, the providers and numbers selected by each broker may differ.

7. Conclusion

This paper has discussed a cooperative, power-aware scheduling framework and algorithm for web-scale distributed systems, namely the grid. The proposed algorithm (PA) directly takes into account the multitude of ownerships of providers in a grid and the incentives for cooperation, and simultaneously maximizes the utility of the providers in the grid. By its nature, the proposed algorithm is fair to all users. Experiments show that, depending on the target QoS level, significant energy savings can be achieved, especially at low to normal system loads. At high loads, the inevitable tradeoff between energy cost savings and lower QoS may need to be taken into account, which is application and scenario specific. It should be noted that the proposed algorithm can be coupled with the CG algorithm, especially at high loads, depending on whether each provider finds the tradeoff between energy savings and lower QoS acceptable; this would most likely depend on the applications and scenarios. An interesting direction for future work is devising a decentralized strategy for the proposed cooperative algorithm, as discussed in Section 2. Such a decentralized strategy would make the proposed algorithm more scalable and fault tolerant.

References

[1] A. Akella, R. Karp, C. Papadimitriou, S. Seshan, S. Shenker, Selfish behavior and stability of the Internet: A game-theoretic analysis of TCP, in: Proceedings ACM SIGCOMM, 2002, pp. 117–130, Pittsburgh, Pennsylvania, USA.

[2] M. Avvenuti, L. Rizzo, L. Vicisano, A hybrid approach to adaptive load sharing and its performance, Journal of Systems Architecture 42 (1997) 679–696.

[3] H. Aydi, P. Mejía-Alvarez, D. Mossé, R. Melhem, Dynamic and aggressive scheduling techniques for power-aware real-time systems, in: Proceedings 22nd IEEE Real-Time Systems Symposium, 2001, London.

[4] C. Boeres, A. Lima, V.E.F. Rebello, Hybrid task scheduling: Integrating static and dynamic heuristics, in: Proceedings 15th Symposium on Computer Architecture and High Performance Computing, 2003, pp. 199–206, Sao Paulo, Brazil.

[5] D.P. Bunde, Power-aware scheduling for makespan and flow, in: Proceedings 18th ACM Symposium on Parallelism in Algorithms and Architectures, 2006, pp. 190–196.

[6] J.J. Chen, T.W. Kuo, Multiprocessor energy-efficient scheduling for real-time tasks with different power characteristics, in: Proceedings International Conference on Parallel Processing, 2005, pp. 13–20.

[7] Y.C. Chow, W.H. Kohler, Models for dynamic load balancing in a heterogeneous multiple processor system, IEEE Transactions on Computers 28 (1979) 354–361.

[8] R.B. Cooper, Introduction to Queueing Theory, 2nd ed., Elsevier, North Holland, 1981.

[9] M.E. Crovella, A. Bestavros, Self-similarity in World Wide Web traffic: Evidence and possible causes, IEEE/ACM Transactions on Networking 5 (1996) 835–846.

[10] D. Grosu, A.T. Chronopoulos, Algorithmic mechanism design for load balancing in distributed systems, IEEE Transactions on Systems, Man and Cybernetics 34 (2004) 77–84.

[11] D. Grosu, A.T. Chronopoulos, Noncooperative load balancing in distributed systems, Journal of Parallel and Distributed Computing 65 (2005) 1022–1034.

[12] D. Grosu, A.T. Chronopoulos, M.Y. Leung, Load balancing in distributed systems: An approach using cooperative games, in: Proceedings 16th International Parallel and Distributed Processing Symposium, 2002, pp. 501–510, Fort Lauderdale, Florida.

[13] M. Harchol-Balter, A.B. Downey, Exploiting process lifetime distributions for dynamic load balancing, ACM Transactions on Computer Systems 15 (1997) 253–285.

[14] C. Im, S. Ha, Dynamic voltage scaling for real-time multi-task scheduling using buffers, in: Proceedings ACM SIGPLAN/SIGBED Conference, 2004, pp. 88–94.

[15] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation and Modelling, Wiley-Interscience, New York, 1991.

[16] C. Kim, H. Kameda, An algorithm for optimal static load balancing in distributed computer systems, IEEE Transactions on Computers 41 (1992) 381–384.

[17] H.W. Kuhn, A.W. Tucker, Nonlinear programming, in: Proceedings 2nd Berkeley Symposium on Mathematical Statistics and Probability, 1951, pp. 481–492, Berkeley.

[18] Y.K. Kwok, K. Hwang, S. Song, Selfish grids: Game-theoretic modeling and NAS/PSA benchmark evaluation, IEEE Transactions on Parallel and Distributed Systems 18 (2007) 621–636.

[19] A. Legrand, C. Touati, Non-cooperative scheduling of multiple bag-of-task applications, in: Proceedings IEEE Conference on Computer Communications, INFOCOM, 2007, pp. 427–435.

[20] W.E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the self-similar nature of Ethernet traffic (extended version), IEEE/ACM Transactions on Networking 2 (1994) 1–15.

[21] H.-C. Lin, C.S. Raghavendra, A dynamic load-balancing policy with a central job dispatcher (LBC), IEEE Transactions on Software Engineering (1992) 148–158.

[22] K. Lu, R. Subrata, A.Y. Zomaya, Towards decentralized load balancing in a computational grid environment, in: Proceedings 1st International Conference on Grid and Pervasive Computing (published in Springer Verlag Lecture Notes in Computer Science), 2006, pp. 466–477, Taichung, Taiwan.

[23] D.G. Luenberger, Linear and Nonlinear Programming, 2nd ed., Addison-Wesley, Reading, Massachusetts, 1984.

[24] R. Mahajan, M. Rodrig, D. Wetherall, J. Zahorjan, Experiences applying game theory to system design, in: Proceedings ACM SIGCOMM, 2004, pp. 183–190, Portland, Oregon, USA.

[25] A. Muthoo, Bargaining Theory with Applications, Cambridge University Press, Cambridge, UK, 1999.

[26] J. Nash, The bargaining problem, Econometrica 18 (1950) 155–162.

[27] K. Ranganathan, M. Ripeanu, A. Sarin, I. Foster, Incentive mechanisms for large collaborative resource sharing, in: Proceedings IEEE International Symposium on Cluster Computing and the Grid, 2004, pp. 1–8.

[28] T. Roughgarden, É. Tardos, How bad is selfish routing?, Journal of the ACM 49 (2002) 236–259.

[29] D. Shin, J. Kim, Dynamic voltage scaling of mixed task sets in priority-driven systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25 (2006) 438–453.

[30] N.G. Shivaratri, P. Krueger, M. Singhal, Load distributing for locally distributed systems, Computer (1992) 33–44.

[31] A. Stefanescu, M.V. Stefanescu, The arbitrated solution for multi-objective convex programming, Revue Roumaine de Mathématiques Pures et Appliquées 29 (1984) 593–598.

[32] R. Subrata, A.Y. Zomaya, B. Landfeldt, Artificial life techniques for load balancing in computational grids, Journal of Computer and System Sciences 73 (2007) 1176–1190.

[33] R. Subrata, A.Y. Zomaya, B. Landfeldt, A cooperative game framework for QoS guided job allocation schemes in grids, IEEE Transactions on Parallel and Distributed Systems 57 (2008) 1413–1422.

[34] R. Subrata, A.Y. Zomaya, B. Landfeldt, Game theoretic approach for load balancing in computational grids, IEEE Transactions on Parallel and Distributed Systems 19 (2008) 66–76.

[35] C. Touati, E. Altman, J. Galtier, Generalized Nash bargaining solution for bandwidth allocation, Computer Networks 50 (2006) 3242–3263.

[36] R. Wolski, J.S. Plank, T. Bryan, J. Brevik, G-commerce: Market formulations controlling resource allocation on the computational grid, in: Proceedings IEEE IPDPS 2001, 2001, San Francisco, CA, USA.

[37] R. Wolski, N.T. Spring, J. Hayes, The network weather service: A distributed resource performance forecasting service for metacomputing, Future Generation Computer Systems 15 (1998) 757–768.

[38] H. Yaiche, R.R. Mazumdar, C. Rosenberg, A game theoretic framework for bandwidth allocation and pricing in broadband networks, IEEE/ACM Transactions on Networking 8 (2000) 667–678.

[39] F. Yao, A. Demers, S. Shenker, A scheduling model for reduced CPU energy, in: Proceedings 36th Annual Symposium on Foundations of Computer Science, 1995, pp. 374–382, Milwaukee, Wisconsin.

Riky Subrata received his B.E. (Hons) in Electrical and Electronic Engineering, and his B.Com., in 2000. He subsequently received his Ph.D. degree from the School of Electrical, Electronic and Computer Engineering, University of Western Australia. His current research interests include high performance computing, distributed algorithms, and mobile computing.

Albert Y. Zomaya is currently the Chair Professor of High Performance Computing and Networking in the School of Information Technologies, The University of Sydney. He is also the Director of the newly established Sydney University Centre for Distributed and High Performance Computing. Prior to joining Sydney University he was a Full Professor in the Electrical and Electronic Engineering Department at the University of Western Australia, where he also led the Parallel Computing Research Laboratory during the period 1990–2002. He is the author/co-author of seven books and more than 350 publications in technical journals and conferences, and the editor of eight books and eight conference volumes. He is currently an associate editor for 16 journals, the Founding Editor of the Wiley Book Series on Parallel and Distributed Computing, and a Founding Co-Editor of the Wiley Book Series on Bioinformatics. Professor Zomaya was the Chair of the IEEE Technical Committee on Parallel Processing (1999–2003) and currently serves on its executive committee. He also serves on the Advisory Board of the IEEE Technical Committee on Scalable Computing and the IEEE Systems, Man, and Cybernetics Society Technical Committee on Self-Organization and Cybernetics for Informatics, and is a Scientific Council Member of the Institute for Computer Sciences, Social-Informatics, and Telecommunications Engineering (in Brussels). He received the 1997 Edgeworth David Medal from the Royal Society of New South Wales for outstanding contributions to Australian science. Professor Zomaya is also the recipient of the Meritorious Service Award (in 2000) and the Golden Core Recognition (in 2006), both from the IEEE Computer Society. He is a Chartered Engineer (C.Eng.), a Fellow of the American Association for the Advancement of Science, the IEEE, and the Institution of Electrical Engineers (U.K.), and a Distinguished Engineer of the ACM. His research interests are in the areas of high performance computing, parallel algorithms, mobile computing, and bioinformatics.

Bjorn Landfeldt was born in Sweden, where he studied electrical engineering at the Royal Institute of Technology in Stockholm. In parallel with his studies he also ran a communication systems consulting company. After moving to Australia he completed a Ph.D. in telecommunications at UNSW, before moving back to Europe and working for Ericsson Research, Networks and Systems, for two years. During his time with Ericsson, Professor Landfeldt worked on 3G system aspects, mainly multimedia management and delivery. He returned to Australia in 2001 as a CISCO Senior Lecturer in Networking at the University of Sydney. He has been with the University of Sydney since then and was promoted to Associate Professor in 2008. Professor Landfeldt has published over 80 papers in international journals and conferences on different issues in computer networks and distributed systems, and he also holds over 10 patents in this area. He is also an associate editor for two international journals, and a frequent guest editor, technical program chair, or technical program committee member for international IEEE and ACM conferences and symposia.