[ieee 2008 ieee asia-pacific services computing conference (apscc) - yilan, taiwan...

Using Frequent Workload Patterns in Resource Selection for Grid Jobs

Tyng-Yeu Liang Siou-Ying Wang I-Han Wu Department of Electrical Engineering

National Kaohsiung University of Applied Sciences No.415, Chien-Kung Road, Kaohsiung, R.O.C

{lty, sakula, han}@mail.ee.kuas.edu.tw

Abstract

Resource selection is an important issue of grid

computing. If a grid job can stably gain enough CPU time from the same resources, not only the execution time of the job but also the frequency of resource reallocation is effectively minimized. However, most of the proposed methods are not effective enough to resolve the problem of resource selection in computational grids. The main reason is that these methods usually make use of current workload state or short-term prediction in available CPU time to be the basis of resource selection while most of grid jobs require a long execution time. To address this problem, we propose a novel algorithm of resource selection for computational grids in this paper. The basic concept of this algorithm is to discover the frequent workload patterns of resources, and then select resources for grid jobs according to the long-term prediction of resource availability by using frequent workload patterns. 1. Introduction

Computational grids [1][2][3] provide users with a

uniform interface to aggregate geographically distributed resources available in computer networks to form a single system image for cooperatively resolving their problems. Through the resource broker of computational grids, user jobs are transparently dispatched to remote resources for execution. In other words, users need not to care about where and how to access available resources any more. As a result, they can put all of their effort into the development of applications on computational grids. However, the availability of grid resources provided by computational grids dynamically changes since the resources are not dedicated for grid jobs. When resource availability is degraded to an unacceptable level, the execution performance of user jobs will be

damaged due to not enough resource supply. Therefore, a careful resource selection is essential for user jobs to obtain a good execution performance in computational grids.

Basically speaking, the availability degree of a grid resource is dependent on the workload of the local jobs on this resource. The more lightly loaded is a resource; the more CPU time can be supplied by this resource for grid jobs. Accordingly, many proposed methods [4][5][6] ranked computational nodes based on the current load states of resources, and then selected the nodes with the highest score, i.e., the most lightly-loaded nodes for the execution of grid jobs. However, the selected resources do not always keep the lightly-loaded state during the execution of the grid jobs. When the local jobs increase their consumption on CPU times, the CPU time available for grid jobs is decreased. To prevent the degradation of program performance, dynamic resource reallocation [7][8] is a common reaction to the decrement of resource availability. Nonetheless, the cost of process migration over a wide area network is too expensive to be executed frequently, or the performance of user programs will be seriously damaged by the overhead of resource reallocation. The worse is that resource reallocation may be repeated constantly if future resource availability is not considered into resource reallocation. Therefore, a better solution is to select the resources with the highest future resource availability for the execution of grid jobs.

To achieve this goal, many past works [9][10][11] exploited the service of Network Weather Services (NWS) to address the problem of resource selection in computational grids. NWS [12] is a popular grid resource monitor used for dynamic performance prediction of network and computational resources. This toolkit exploits many different forecasting models such as running average and sliding window average, median etc. and then dynamically chooses the model with the most precise measurement over recent time

2008 IEEE Asia-Pacific Services Computing Conference

978-0-7695-3473-2/08 $25.00 © 2008 IEEE

DOI 10.1109/APSCC.2008.217

807

2008 IEEE Asia-Pacific Services Computing Conference

978-0-7695-3473-2/08 $25.00 © 2008 IEEE

DOI 10.1109/APSCC.2008.217

807

series for the next-step performance prediction. Yang et. al [13] has successfully improved the prediction precision of NWS by giving more weight to more recent history data and considering the descending and ascending tendency of workloads. In addition, Wu et.al [14] proposed a hybrid model which combines autoregressive (AR) model with confidence intervals estimations to support n-step-ahead load prediction. Their performance evaluation showed that this hybrid model performs well even when n is up to 50, i.e., fifty minutes. However, many grid applications usually execute for several hours or days. The resources selected by using the predictions of the previous models are not necessarily the best for this kind of time-consuming grid applications since the prediction time length is not long enough. Therefore, the problem of resource selection in computational grids has not completely resolved by the past work.

As previously discussed, we propose a novel algorithm of resource selection for computational grids in this paper. The basic idea of this algorithm is to discover frequent workload patterns (FWPs) according to the load history of resources by means of the data-mining technique, and then select resources for grid jobs according to the prediction of resource availability in a given time period by using the association rules derived from FWPs. In order to achieve a high prediction precision, our proposed algorithm dynamically adapts its prediction in resource availability by flushing old workload data. We have evaluated the effectiveness of the proposed algorithm in this paper. The evaluation results show that the proposed resource selection algorithm is more effective for improving the performance of time-consuming grid jobs than that considering only current load state of resources.

The rest of this paper is organized as follows. Section 2 is the background of sequential pattern mining and association rules. Section 3 explains how to predict resource availability by using frequent workload patterns. Section 4 introduces the proposed resource selection algorithm. Section 5 discusses the result of performance evaluation. Finally, Section 6 gives a conclusion of this paper and our future work. 2. Background

Wolski et. al [15] have shown that the load of resources has high self-similarity and strong correlation with times. This implies that it is possible to predict future load states from the load history. Therefore, correctly correlating history data with future load states is a key point to make accurate predictions in resource availability.

In this study, we make use of sequential pattern mining to predict the future load states of resources. Sequential pattern mining [16] is one technique used to discover the repeating data patterns hidden in time series. Basically, its process is an iterative operation based on a slide window. For example, assume there is a data sequence “ABC”. The mining process is described as follows. When the size of the slide window is 1, three data patterns including “A”, “B” and “C” are found. When the size of the slide window is 2, tow patterns namely “AB” and “BC” are discovered. Finally, one pattern, i.e., “ABC” is obtained. Since the whole process is aimed to estimating the number of each pattern appearing in time series, the complexity of this data mining process is O(n2) where n is the size of data sequence. Since a computational grid usually is aggregated with a large number of nodes, the cost of applying sequential pattern mining on the load history of grid resources is our important consideration in addition to prediction accuracy.

On the other hand, we apply association rule on the discovered data patterns to show how possibly the same patterns will repeat in future. Association rule [17] is used in data mining to represent the correlation between data patterns. We apply this rule on the workload patterns of grid resources as follows.

Let Support(A) and Support(A^B) denote the numbers of pattern “A” and pattern “AB” appearing in time series, respectively. The confidence of A associating B is Support(A^B)/Support(A). This confidence value implies how possibly the item next to A will be B if A appears. In this paper, we use association rules to predict future load states and availabilities of resources. It is worthy to say that the support values of patterns must be larger than a threshold. In other words, the patterns should frequently appear in time times, otherwise the derived correlation between these patterns is hard to be accepted. 3. Resource availability prediction

Basically, our resource selection algorithm is based on resource availability prediction using frequent workload patterns. In order to discover frequent workload patterns, we divide the CPU load of a resource into ten levels from 0 to 9. The interval between any two consecutive levels is 10%. We also transfer the time series of a resource workload into a chaincode which consists of a list of digits from 0 to 9, and define each digit of a chaincode as the average load of the resource per hour. On the other hand, we define a workload pattern as follows.

808808

Table 1. Workload pattern table

nid the identifier of a computational node.power the power factor of the node related to the

referenced node, and the attribute value is equal to or larger than one.

pattern the chaincode segment of the node in the time period from stime to etime.

hp the head digit of the workload pattern.stime the start time of the workload pattern, and

the range of attribute value is from 0 to 23. etime the end time of the workload pattern, and

the range of attribute value is from 0 to 23. length the time length of the workload pattern.

(hours) support the number of the workload pattern

appearing in the load history of the node. era The expected resource availability of the

node in the time period from stime to etime.

We apply the process of sequential pattern mining as previously described in section 2 on the chaincode of each resource to calculate the support values of all of the workload patterns appearing in the load history of the resource. When the support value of a workload pattern is larger than a threshold, we called this workload pattern as a frequent workload pattern. The era of a frequent workload pattern represents the expected resource availability of the node in the time period from stime to etime. Its value basically is dependent on predicted resource workload. If the local jobs of a resource consume 30~40% CPU time in the next hour, grid jobs will be able to gain 60~70% CPU time from this resource in that hour. Same as the classification of load degree, the availability degree of this resource in that hour is estimated as 6=9-3. Accordingly, the total availability degree of a resource in the next L hours is estimated with subtracting (9*L) by the total load of the resource at the time period. It is worthy to say that since grid nodes usually have different computational power, it is necessary to take the computational power of resources into the evaluation of resource availability.

As previous description, the era value of a frequent workload pattern is calculated as follows. era = confidence * total_availability*power where total_availability is (9*length-total_load). (1)

In this equation, the value of total_load is the summation of the digits of the workload pattern. The value of confidence is calculated with dividing the support value of the workload pattern by the support value of the workload pattern with the same nid, stime and hp but one-hour length. For example, assume there are two frequent workload patterns X (nid=0, power=2,

pattern=’1’, stime=12, length=1, and support=8) and Y (nid=0, power=2, pattern=’123’, stime=12, length=3, support=4). Using the formula (1), the era value of the workload pattern Y is 21=(4/8)*(9*3-(1+2+3))*2. That means that the expected resource availability of node 0 from 12 o’clock to 15 o’clock is predicted as 21. Note that if the length of a frequent workload pattern is one hour, its confidence value is equal to 1. As a result, the era value of the frequent workload pattern X is 16=1*((9*1)-1)*2. It means that the expect resource availability of node 0 from 12 o’clock to 13 o’clock is predicted as 16.

Generally speaking, the proposed prediction model of resource availability performs well when the time series of resource loads keep regular. However, its performance degrades when recent time series is obviously different to the past one. In order to resolve this problem, the proposed prediction model adapts the contents of chaincodes by cyclically flushing old load history data. In other words, it keeps the chaincodes of resources within recent 720 hours by cyclically flushing the first 24 digits of each chaincode every 24 hours. This adaptation in chaincodes has a side effect that is to prevent the cost of sequential pattern mining from growing as fast as the length of chaincodes. 4. Proposed resource selection algorithm

When a user submits a job at t o’clock, and requires

m nodes with the most total resource availability in the next l hours to execute this jobs, the proposed algorithm selects resources for this user job as follows.

Let n and Pitl denote the number of free node and the frequent workload pattern of node i in the time period from t o’clock to (t+l) o’clock.

Step 1: If n < m, reject the request of job submission;

otherwise go to the next step. Step 2: Detect the current state, i.e., csi of each free

node i. Step 3: For each free node i, select the pattern Pitl

where stime is t, and length is l, and hp is csi , and its support value is larger than Threshold and its conference value is the maximal from workload pattern table, and afterwards, calculate the number of frequent workload patterns, nfp.

Step 4: Sort the frequent workload patterns selected by Step 3 in a descendant order according to the values of era.

Step 5: Select the nodes corresponding to the first m frequent workload patterns in the sorted pattern list obtained by Step 4, for executing the user job, and mark the selected node as Busy.

809809

Step 6: If nfp < m, sort the remained free nodes in the ascendant order according to the current workload states, and then select the first (m-nfp) nodes in the sorted node list for executing the user job, and mark the selected nodes as Busy.

Step 7: Return a list of the selected nodes to the broker for job dispatch.

5. Performance evaluation

We have executed a job-scheduling simulation to

evaluate the effectiveness of resource selection based on frequent workload patterns in this paper. There are 50 jobs waiting for execution in this simulation. The number of nodes requested by each job is a Gaussian distribution. The mean value and standard difference of the node-number distribution are 11.56 and 5.37, respectively. In addition, the working hour of each job is ranged from one hour to eight hours. After our statistics, the average and the standard difference of the working hours are 5.44 and 1.1, respectively. On the other hand, there are 500 nodes in this simulation. In order to create regular time series of resource workload, we make each node have an identical chaincode segment everyday. The length of the chaincode segment is 24 hours, and each digit of the chaincode segment is randomly set from 0 to 9. Before the simulation of job scheduling, we use the load history of 30 days for workload patterns mining. The threshold used for the decision of frequent workload patterns is set as 6 according to our experience. In addition, we use the same chaincodes of 30 days to be the time series of resource workloads during the simulation of job scheduling. There is a scheduler responsible to allocate resources for user jobs in this simulation. The scheduling process is described as follows. for(i=0 ; i<50 ; i++) {

if the ith job is finished continue;

if the number of free nodes is equal to or larger than m, i.e., the number of nodes quested by the ith job

then { select m nodes from free nodes for the ith

job; dispatch the ith job onto these m nodes for execution; record the start time of the ith job;

} }

When a job is finished, the scheduler is trigged to record the end time of the job and set the statuses of its execution nodes to be free, and then perform the above scheduling process again. After all of user jobs are finished, the execution time of each job is calculated as the interval time between its start time and end time. In addition to the proposed resource selection algorithm, we apply another resource selection algorithm which considers only the current load states of free nodes in the job scheduler. We evaluate the performance gap between these two different selection algorithms in terms of the average improvement in the execution time of user jobs to show the effectiveness of using frequent workload patterns in resource selection. 5.1 Effectiveness

Table 2 shows that the proposed algorithm creates 9.16% performance improvement for user jobs in average when 50 jobs arrive at the same time. The performance improvement becomes more obvious when the job arrival time increases. The main reason is that the number of free nodes increases as well as the job arrival time, and then it becomes more possible to choose better resources for user jobs. This simulation result shows that using frequent workload patterns in resource selection indeed is significant and effective for improving the performance of grid jobs. Table 2. Performance improvement made by the proposed resource selection algorithm

job arrival time

all together

3 hrs per job

5 hrs per job

10 hrs per job

9.16% 18.91% 20.27% 22.22% Compared to the algorithm with considering only the current load states of resources.

On the other hand, we also have evaluated the

impact of adapting the chaincodes in workload pattern mining on the performance of the proposed resource selection algorithm. In order to achieve this goal, we added noises into the time series of resource workload to create irregular workload patterns. In other words, we randomly changed the digits of chaincodes of resources during the simulation of job scheduling. In addition, we modified the proposed resource selection algorithm by skipping the adaptation of the mined chaincodes, and applied this modified algorithm in the simulation of job scheduling. Figure 1 shows that the performance improvement made by the modified algorithm called as non-adaptive is seriously reduced by noises in all of the test cases. When the amount of noises is increased to 50% and 50 jobs arrives at the same time, the performance of the non-adaptive

810810

algorithm is almost same as that of the algorithm considering only the current states of resources. Compared to the non-adaptive algorithm, the performance degradation of the adaptive algorithm is smaller. The maximal performance gap between the adaptive algorithm and the non-adaptive algorithm reaches to 50%. This simulation result clearly shows that adapting the mined chaincodes is absolutely necessary and important for the proposed resource selection algorithm. Figure 1. Impact of adaptation in the mined chaincodes

0.00%

5.00%

10.00%

15.00%

20.00%

25.00%

non-

adap

tive

adap

tive

non-

adap

tive

adap

tive

non-

adap

tive

adap

tive

non-

adap

tive

adap

tive

arrival

together

3hrs 5hrs 10hrs

job arrival time

perf

orm

ance

im

prov

emen

t

0%noise

10%noise

20%noise

50%noise

5.2 Cost

In this study, we also have evaluated the cost of

applying the proposed resource selection algorithm in job scheduling. We found that almost all of the cost of the proposed algorithm is spent on chaincode mining. Basically, the cost of chaincode mining is dependent on node number, the length of a time slice, the day number of chaincodes, and the regular degrees of chaincodes. As a result, we evaluated the impact of these factors on the cost of chaincode mining. We performed this cost evaluation on a workstation with a 3.2 Ghz Xeon processor and 2GB RAM. The evaluation results are depicted in Tables 3, 4, 5 and 6.

It can be found in Table 3 that the cost of chaincode mining quickly and non-linearly rises when the number of nodes increases. The reason for this result is that chainode mining needs to perform pattern searching, and the cost of pattern searching increases as well as the number of nodes since more nodes creates more workload patterns. For the similar reason, the decrement of time-slice length results in the increment of workload patterns. Consequently, the cost of chaincode mining also grows fast when the length of time slices decreases. In contrast, Table 5 shows that

the cost of chaincode mining increases as linearly as the number of days. The main reason is that the chaincodes are regular. In other words, the number of workload patterns is same in these three cases. As a result, the cost of pattern searching also keeps same in all of these three cases. Conversely, irregular chaincodes increases a lot of amount of workload patterns. Consequently, the cost of chaincode mining hugely increases when the irregular degree of chaincodes becomes significant. The above results show that the cost of chaincode mining is really a big problem of the proposed resource selection algorithm. Distributed chaincode mining is a possible solution for resolving this problem.

Table 3. The impact of node number day no =1; a time slice = 1 hour. nodes 10 50 100time(s) < 1 23 96

Table 4. The impact of time-slice length

node no. = 10; day no.= 1.time slice

length One hour half-hour 15 minutes

time(s) < 1 14 254

Table 5. The impact of day number node no. = 10; a time slice = 1 hour.

days 1 10 30time(s) < 1 4 12

Table 6. The impact of irregular chaincodes

node no.=10; a time slice = 1 hour, and the regular degree of chaincodes per day = 10%.

days 1 10 30time(s) < 1 91 801

6. Conclusions and future work

We have successfully exploited frequent workload

patterns for addressing the resource selection problem of computational grids in this paper. Our evaluation result has proved that the proposed resource selection algorithm based on the resource-availability prediction using frequent workload patterns is effective for improving the performance of grid jobs. Moreover, adapting the mined chaincodes is very important for the effectiveness of the proposed resource selection algorithm. Our simulation has shown that it can effectively reduce the performance degradation of the proposed algorithm when the time series of resource workloads become irregular.

We focused only on processor availability in this paper. We will apply sequential pattern mining to the

811811

prediction of network performance for addressing the resource selection problem of computational and data grids in future. Acknowledgement

We would like to thank National Science Council of

Republic of China for their support to this work under the project with the number of NSC 96-2221-E-151-018-MY3. References [1] I. Foster, “Globus Toolkit Version 4: Software for

Service-Oriented Systems.”, IFIP International Conference on Network and Parallel Computing, Springer-Verlag LNCS 3779, pp. 2-13, 2006.

[2] Geoff Stoker, Brian S. White, Ellen Stackpole, T.J.

Highley, and Marty Humphrey, “Toward Realizable Restricted Delegation in Computational Grids”, Lecture Notes in Computer Science, vol. 2110, pp.24-31, 2001.

[3] Francine D. Berman , Rich Wolski , Silvia Figueira ,

Jennifer Schopf , Gary Shao, “Application-level Dcheduling on Distributed Heterogeneous Networks”, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), pp.39-es, 1996.

[4] Weizhe Zhang, Binxing Fang, Hui He, Hongli Zhang,

Mingzeng Hu, “Multisite Resource Selection and Scheduling Algorithm on Computational Grid”, Parallel and Distributed Processing Symposium, pp:105-es, 2004.

[5] Chuang Liu, Lingyun Yang, Foster Ian, Angulo D,

“Design and Evaluation of a Resource Selection Framework for Grid Applications”, Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing, pp.63-72, 2002.

[6] James Frey, Todd Tannenbaum, Ian Foster, Miron Livny,

and Steven Tuecke, "Condor-G: A Computation Management Agent for Multi-Institutional Grids", Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC10), pp.55-es, 2001.

[7] Sathish Vadhiyar and Jack Dongarra, “Self Adaptability

in Grid Computing”, Concurrrency and Computation: Practice and Experience, vol. 17(2-4), pp. 235-257, 2005.

[8] Po-Cheng Chen, Jyh-Biau Chang, Tyng-Yeu Liang, Ce-

Kuen Shieh, “A Multi-layer Resource Reconfiguration Framework for Grid Computing”, The 4th Workshop on Middleware for Grid Computing, Melbourne, Australia, Monday November 27, 2006.

[9] Po-Cheng Chen, Ce-Kuen Shieh, Hui Shan Chen, Jyh-biau Chang, Yi-Chang Zhuang, Tyng-Yeu Liang , "An Enhanced Heterogeneous Grid Resource Scheduler for Parallel Jobs", The 4th Workshop on Grid Technologies and Applications (WoGTA’07), pp.95-102, 2007.

[10] Chao-Tung Yang, Chuan-Lin Lai, Po-Chi Shih, and

Kuan-Ching Li, “A Resource Broker for Computing Nodes Selection in Grid Environments,” Lecture Notes in Computer Science: Third International Conference on Grid and Cooperative Computing - GCC 2004, vol.3251, pp. 931-934, 2004.

[11] E. Elmroth and J. Tordsson., “Grid Resource Brokering

Algorithms Enabling Advance Reservations and Resource Selection Based on Performance Predictions”, Future Generation Computer Systems, vol. 24, no. 6, pp. 585-593, 2008.

[12] R. Wolski, “Dynamically Forecasting Network

Performance Using the Network Weather Service,” Journal of Cluster Computing, vol.1, pp. 119-132, 1998.

[13] L. Yang, I. Foster, and J.M. Schopf, “Homeostatic and

Tendency-based CPU load Predictions,” International Symposium on Parallel and Distributed Processing (IPDPS'03), pp. 42-50, 2003.

[14] Yongwei Wu, Yulai Yuan, Guangwen Yang Weimin

Zheng, “Load Prediction Using Hybrid Model for Computational Grid”, Proceedings of The 8th IEEE/ACM International Conference on Grid Computing, pp. 235-242, 2007.

[15] R. Wolski, N. Spring, and J. Hayes, “Predicting the CPU

Availability of Time-shared Unix Systems,” Proceedings of 8th IEEE High Performance Distributed Computing Conference (HPDC 1999), pp. 102-115, 1999.

[16] R. Agrawal and R. Srikant, “Mining Sequential Patterns”, IEEE International Conference on Data Engineering, pp.3-14, 1995.

[17] R. Agrawal, T. Imielinski, A. N. Swami, "Mining

Association Rules between Sets of Items in Large Data Bases", Proceedings of the 1993 ACM-SIGMOD International Conference on Management of Data, pp. 207-216, 1993.

812812

[ieee 2008 ieee asia-pacific services computing conference (apscc) - yilan, taiwan...

Documents