Download - [IEEE 2009 International Conference on Information and Communication Technologies (ICICT) - Karachi, Pakistan (2009.08.15-2009.08.16)] 2009 International Conference on Information

A new rough set based approach for optimized detection of available resources in Grid computing

Asgarali Bouyer, Mohd noor MD SAP and Abdul Hanan Abdullah

Abstract— Since the Grid is a dynamic environment, predicting and detecting the resources that will be available in the near future is important for resource scheduling. Economic-based grid management has been viewed as a feasible approach to fair, efficient and reliable scheduling. One key issue in an economic-based grid strategy is obtaining information about available resources. In this paper, we present a novel predictive method for identifying available resources in an economic-based grid. In this method, the scheduler uses rough set analysis to divide resources into groups and then assigns a priority to each group based on the cost price and efficiency of its nodes. The results show that the proposed method has acceptable performance and that it tries to use cheaper, suitable resources for each job in order to decrease the cost price of computation.

Keywords: economic-based grid; availability; resource scheduling; scheduler; rough set

I. INTRODUCTION

The Grid, as a distributed environment, acts as a powerful distributed supercomputer through resource sharing. Grid systems (e.g. CERN [18], TeraGrid [20], AustrianGrid [19]) have emerged as promising next-generation computing platforms for sharing a large number of heterogeneous nodes for cooperative problem solving. Each node can be a desktop PC, a server, a cluster or even a supercomputer [5][6]. The Grid allows computing devices with different computational and communication resources to collaborate, as a virtual organization, on problems that usually cannot be solved by a PC, a cluster or even a supercomputer. The Grid consists of nodes that have different capabilities and configurations and that are managed in multiple administrative domains with their own policies. Every node has particular resources, such as processor and memory, to share with other nodes in the grid. Many challenges need to be solved in the grid, such as discovery of resources, authentication and authorization of messages/jobs, and scheduling of resources. In previous research [1], we proposed an approach for discovering resources and matching tasks to compatible resources in the grid. Here, we extend this method to the detection and prediction of available resources, taking Cost Price and Spent Time into account for job scheduling. The scheduler applies rough set analysis in a centralized manner to achieve fast scheduling.

Asgarali Bouyer, Mohd noor MD SAP and Abdul Hanan Abdullah are with the Faculty of Computer Science and Information System, University Technology of Malaysia, Skudai, Johor, Malaysia. (Email: [email protected], {Mohnoor, hanan}@utm.my)

To address this complex problem, we consider an economic-based resource-detection mechanism, because in a Grid the scheduler needs to allocate and schedule computational resources based on their available attributes while considering cost and time constraints. We need to apply the rule of maximized return and minimized risk to manage heterogeneous resources, especially in critical conditions.

Several thousand computing resources participate in a grid computing system to execute jobs (tasks), and the situation, conditions and behavior of most of these resources are stored by the GIS (Grid Information Service), which is part of the middleware or of related applications. The Information Service in the Grid provides the ability to discover and monitor resources, which is fundamental for the Grid infrastructure [4]. Resource schedulers need information about efficiency, reliability, availability, load and so on to determine, according to GIS information, which node should be used to run a desired task. However, this information is usually out of date and gathered arbitrarily (it is not complete). Therefore, to record more characteristics of nodes and their behavior, we provide a database on each node that records the node's performance.

Before starting job execution, we should find compatible resources for a desired job (the tasks of the job). For this purpose, the scheduler uses rough set analysis to generate rules for classifying nodes based on the job conditions. Rough set (RS) theory was introduced by Pawlak in 1982 [8].

This paper mainly focuses on a new grid resource availability algorithm that uses rough set analysis in a centralized manner for an economic-based grid. The rest of the paper is organized as follows. Section II gives some details on related work. Section III introduces the rough set concepts, and Section IV describes a new approach to online decision making for job scheduling, based on a predictive method with up-to-date information. The simulation results and a comparison with other approaches are presented in Section V. Finally, the last section concludes the paper and outlines future work.

II. RELATED WORKS

In the past, many approaches have been proposed for the detection of resource availability in non-economic grids. In this section, we introduce some of these approaches.

In our previous work we presented a new predictor approach for non-economic grids [1]. This approach used a distributed rough set analysis on all grid nodes. After the analysis, each node sent its results to the scheduler, and the scheduler then assigned a priority to each node. This approach is fast, but it does not apply economic-based factors to available resources.

978-1-4244-4609-4/09/$25.00 ©2009 IEEE

G. Singh et al. have presented a provisioning model in which the resource availability in the Grid can be enumerated as a set of slots [13]. A slot is defined as a number of processors available from a certain start time, for a certain duration, at a certain cost for computational resources. They claim that the set of available slots at any time can be determined by querying the Grid resources. They assume that processor speed and communication overhead between all sites are homogeneous, and contention for network resources is ignored. Alternatively, the resources announce their slot information periodically to the Grid information services (GIS). The slot-based method may not be practical in a grid, because the large number of resources requires complex management by the resource manager. It therefore incurs considerable overhead in time and even in communication.

Yongcai Tao et al. [23] have presented an approach for predicting resource availability. This approach uses a Markov chain based grid node availability prediction model, which can predict the future availability of grid nodes by using idle CPU cycles and without adding significant overhead. Based on this model, they present grid workflow scheduling based on reliability cost (RCGS). The rationale of RCGS is that it computes the reliability of a node during a task's running time and then makes the scheduling decision based on the reliability cost of the task. The performance evaluation results demonstrate that RCGS improves the dependability of workflow execution and the success ratio of tasks with low reliability cost.

Another method was proposed by Hu Zhoujun et al. [16]. This method uses probability theory for the detection of resource availability. To design a predictive method, they consider four criteria as availability metrics: resource local task execution time, resource off-line time, waiting queue length and waiting time. Each measure is computed separately for each node, and the method predicts the availability of nodes from the values obtained. Because it relies on probability theory, this method performs a fast prediction. However, its prediction results are not very reliable, and the number of parameters considered is low.

Much work has recently focused on using agreement-based resource management [3] in order to meet the challenges of heterogeneous and autonomous resources. Using this model, the resource consumer or a broker enters into a contractual agreement with the resource provider about the availability of certain resources for a certain timeframe at a certain cost.

Finally, other prediction techniques have been proposed in [10], [11], [12], [15] and [21]; we refer the reader to those works.

III. ROUGH SET ANALYSIS

Rough Set theory [9], a mathematical tool for dealing with uncertainty and vagueness in data, provides a sound theoretical basis for determining the properties that define similarity [7]. Rough Set theory has often proved to be an excellent tool for analyzing the vagueness and uncertainty inherent in decision making. Vague concepts are replaced by a pair of precise concepts: the lower and the upper approximation of the vague concept. Approximations are the two basic operations in rough set theory. The idea can be presented in the following manner.

Let U be a universe, A be a finite set of attributes, B be a subset of A, X be a subset of the universe, and I be an equivalence relation on U, called an indiscernibility relation. IND(B)(x), in short B(x) or IND(x), is the equivalence class containing an element x. Let us define the two basic operations in rough set theory, the B-lower and the B-upper approximations of a set:

$$B_*(X) = \{\, x \in U : B(x) \subseteq X \,\} \tag{1}$$

$$B^*(X) = \{\, x \in U : B(x) \cap X \neq \emptyset \,\} \tag{2}$$

We usually use the rough membership function, called the confidence function, to define the approximations and the boundary region of a set. The confidence function is defined as:

$$CF(x) = \frac{\mathrm{Num}(\mathrm{IND}(x) \cap X)}{\mathrm{Num}(\mathrm{IND}(x))} \tag{3}$$

where CF(x) ∈ [0,1] denotes the degree to which the element x belongs to the set X in view of the indiscernibility relation I. The lower and the upper approximation can be redefined in terms of the confidence function as follows:

$$B_*(X) = \{\, x \in U : CF(x) = 1 \,\} \tag{4}$$

$$B^*(X) = \{\, x \in U : CF(x) > 0 \,\} \tag{5}$$
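As an illustration of Eqs. (1) to (5), the following minimal Python sketch computes the equivalence classes, the confidence function and both approximations for a small decision table. The node names, attribute names and attribute values are hypothetical toy data, not taken from the experiments in this paper.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Partition object ids into blocks B(x) that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def confidence(table, attrs, target):
    """Return CF(x) = |B(x) & X| / |B(x)| for every object x, as in Eq. (3)."""
    cf = {}
    for block in equivalence_classes(table, attrs):
        value = len(block & target) / len(block)
        for obj in block:
            cf[obj] = value
    return cf


def approximations(table, attrs, target):
    """Lower approximation: CF(x) == 1 (Eq. 4). Upper approximation: CF(x) > 0 (Eq. 5)."""
    cf = confidence(table, attrs, target)
    lower = {x for x, v in cf.items() if v == 1.0}
    upper = {x for x, v in cf.items() if v > 0.0}
    return lower, upper


# Hypothetical toy data: four nodes described by two discretised attributes.
table = {
    "n1": {"cpu_load": "low", "free_mem": "high"},
    "n2": {"cpu_load": "low", "free_mem": "high"},
    "n3": {"cpu_load": "high", "free_mem": "low"},
    "n4": {"cpu_load": "high", "free_mem": "high"},
}
succeeded = {"n1", "n4"}  # X: nodes whose last task succeeded (made-up values)

lower, upper = approximations(table, ["cpu_load", "free_mem"], succeeded)
print(sorted(lower))          # ['n4']
print(sorted(upper))          # ['n1', 'n2', 'n4']
print(sorted(upper - lower))  # boundary region: ['n1', 'n2']
```

Nodes n1 and n2 are indiscernible on the chosen attributes but differ in outcome, so they fall into the boundary region rather than the lower approximation.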

Suppose that A = C ∪ D, in which C = {c1, c2, ..., cm} (m is the number of conditional attributes) is a nonempty set of conditional attributes and D is a finite set of decision attributes. Let IND(B) denote the set of equivalence classes of U with respect to B (with B ⊂ C), and let IND(D) denote the set of equivalence classes of U with respect to D. The positive region of B in IND(D) is defined as:

$$POS_B(D) = \bigcup \{\, B_*(X) : X \in \mathrm{IND}(D) \,\} \tag{6}$$

This paper applies rough set theory to the scheduler application's database in order to find rules that explain associations, such as the characterization of tasks. The purpose is to mine more detailed knowledge that lets the scheduler detect available and unavailable resources at present and in the near future. We use three attributes (cost price, final status of task, and completion time) as decision attributes. Each of these attributes can act either as a condition attribute or as the decision attribute of a decision system. The application uses only one of these attributes as the decision attribute at any moment, and the other two are then treated as conditional attributes. For example, if dependability is the most important factor, the second attribute (final status of task) is used as the decision attribute; if speed is the most important factor, the third attribute (completion time) is used.

IV. PROPOSED RESOURCE AVAILABILITY DETECTION METHOD

Like our previous work [1], this method uses a voluntary announcer: each node reports its status to the scheduler without any inquiry from the scheduler. Each node prepares a message for the scheduler containing the information needed to accept or deny new tasks. Each node must record all information about the grid tasks submitted to it or executed on it, so that sound decisions can be made from previously recorded data.

This approach consists of two applications: the Provider Node Application (PNA) and the Scheduler Application (SA). An overview of these applications is depicted in Figure 1. In an economic-based grid, one important issue is determining the price of provider nodes based on their previous performance. This factor is important for fair and proper scheduling under a cost price constraint. When a provider node gets a new task, it means that this node is more suitable for the job than the nodes that were not selected. The PNA always saves all useful information about a newly submitted task in the local database (DB). Each node records its information about submitted or executed grid tasks in its own local DB. In addition, the scheduler DB stores the current status received from grid nodes. Every time a node connects to the grid, it must update its information in the scheduler's DB.

When the scheduler wants to submit a new task to a desired node, it sends a packet containing important information about this task. This packet is received by the first layer of the PNA. This layer (the Sender and Receiver layer) is responsible for examining the scheduler's requests. First, it checks whether there are sufficient resources to execute the task. If there are not, it returns the task to the scheduler; otherwise it calls the query analyzer. The query analyzer immediately queries the local DB based on the new task's parameters. It must then compute the Success Ratio, the Average of Completion Time (ACT), the average CPU idle, the average free memory, the price charged for the last finished task, and the average price for this node, and it delivers the results to the first layer. Next, the first layer sends this information to the scheduler and immediately calls the Task-recorder section, which is responsible for saving all required information about the submitted task. For every new grid task, the PNA records several important properties at this time: the cost price, CPU load, free memory, size of the new task, priority of the new task (Low, Normal or High), number of all local tasks (system and user tasks), number of local tasks with high priority (above normal, high or real-time), number of all grid tasks (in running, ready or waiting states), the Data Transmission Rate (DTR) of this node in the grid (the DTR may fluctuate at times), start time of task execution, time spent on this task (execution time only, without any waiting time), completion time, and final status of the task (running, success, abort or fail). Some of this information (e.g. cost price, time spent, completion time and final status of the task) is filled in after the scheduler confirms this node and the task finishes. If the scheduler confirms this node, the PNA starts task execution. When the task finishes or aborts, this layer has to update the related fields in the DB.
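The statistics that the query analyzer reports to the scheduler can be sketched as a small aggregation over the node's local task records. The record layout below is a hypothetical subset of the fields listed above, and the exact definitions (for example, computing the Success Ratio over finished tasks only, and CPU idle as 100 minus CPU load) are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """Hypothetical subset of the per-task fields stored in the PNA's local DB."""
    cost_price: float
    cpu_load: float            # percent CPU load when the task was submitted
    free_memory_mb: float
    completion_time_s: float
    final_status: str          # "running", "success", "abort" or "fail"


def summarize(records):
    """Aggregate finished tasks into the values sent to the scheduler."""
    finished = [r for r in records if r.final_status != "running"]
    if not finished:
        return None  # no history yet for this node
    successes = [r for r in finished if r.final_status == "success"]
    return {
        "success_ratio": len(successes) / len(finished),
        "act": mean(r.completion_time_s for r in finished),
        # Assumption: CPU idle is derived as 100 minus the recorded CPU load.
        "avg_cpu_idle": mean(100.0 - r.cpu_load for r in finished),
        "avg_free_memory_mb": mean(r.free_memory_mb for r in finished),
        "last_price": finished[-1].cost_price,
        "avg_price": mean(r.cost_price for r in finished),
    }


# Made-up history: one success, one failure, one task still running.
history = [
    TaskRecord(10.0, 20.0, 512.0, 100.0, "success"),
    TaskRecord(14.0, 40.0, 256.0, 300.0, "fail"),
    TaskRecord(12.0, 60.0, 384.0, 0.0, "running"),
]
summary = summarize(history)
print(summary["success_ratio"], summary["act"], summary["avg_price"])  # 0.5 200.0 12.0
```

Running tasks are excluded because their completion time and final status are only filled in after the task finishes, as described above.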

The SA is responsible for initializing the PNA and sending initial information about incoming tasks to each node. The SA also receives data from provider nodes and saves them in the SA database, which is needed to execute the rough set analysis. The first layer of the SA is responsible for this work. When a provider node is ready to accept a new task, the results it sends are received by the first layer of the SA and inserted into the SA database. The second step in the SA is grouping the provider nodes based on cost price and other index parameters (e.g. completion time). One of the main criteria in an economic-based grid is to execute a job at minimum cost price within a defined completion time.

The SA executes a rough set analysis to obtain useful rules for classifying nodes, in particular rules about the availability of a desired node. The analysis reveals when a node remained available until a task terminated, when the cost price is high, what the success ratio is when the cost price is high, when tasks fail, and which tasks have the highest success on this node. For this purpose, we consider three attributes (cost price, final status of task, completion time) as decision attributes. Note that only one of these attributes is used as the decision attribute at any moment; the other two serve as conditional attributes at that time. A section of the SA called the Nodes' Classifier categorizes the existing nodes into three groups: low-cost, medium-cost and high-cost. Each group consists of four classes:

- Class A: these nodes are ready to accept new tasks now and will also remain available in the near future. The scheduler should prefer the nodes in this class for task scheduling. However, the scheduler may encounter a shortage of resources, in which case it is forced to select nodes from other classes.

- Class B: the nodes in this class are available for new tasks now, but they may become unavailable soon. If a task has a very short deadline to start execution and there are not enough resources in class A, the scheduler is forced to schedule nodes from class B. In this case, since the probability of node failure is high, the scheduler uses a checkpoint technique [17][2] for fault-tolerant scheduling.

- Class C: the nodes in this class are not available at this time, but they may become available in the near future. These nodes are useful for tasks with long deadlines, or they can replace nodes that fail later.

- Class D: the nodes in this class are the worst for scheduling, because they are unavailable at this time and are unlikely to be ready to accept new tasks in the near future. These nodes are used only when the three classes above lack nodes, and only for tasks that can afford a long wait (a long deadline to start execution).
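The two-level categorization above (cost group, then class A to D) can be sketched as a pair of small functions. In this sketch the current and predicted availability of a node are taken as given boolean inputs; in the proposed method they would come from the rules produced by the rough set analysis. The price thresholds and field names are hypothetical.

```python
def cost_group(avg_price, low_max, medium_max):
    """Map a node's average price to one of the three cost groups."""
    if avg_price <= low_max:
        return "low-cost"
    if avg_price <= medium_max:
        return "medium-cost"
    return "high-cost"


def availability_class(available_now, available_soon):
    """Map current and predicted availability to class A, B, C or D."""
    if available_now and available_soon:
        return "A"   # ready now and expected to stay available
    if available_now:
        return "B"   # ready now, but may become unavailable soon
    if available_soon:
        return "C"   # not ready now, may become available soon
    return "D"       # unavailable now and unlikely to become available


def classify(node, low_max=5.0, medium_max=10.0):
    """Return (cost group, class) for a node; thresholds are made-up defaults."""
    return (
        cost_group(node["avg_price"], low_max, medium_max),
        availability_class(node["available_now"], node["available_soon"]),
    )


node = {"avg_price": 12.0, "available_now": True, "available_soon": False}
print(classify(node))  # ('high-cost', 'B')
```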

Another task of the SA is computing the priority of the nodes in each class based on the results gathered from each node. This is done by the Computing Priority section of the SA, and it is vital for the grid scheduler to select proper nodes. The main goal is to obtain the priority of the nodes in Class A; the other three classes are used only as a last resort. CPU speed and the data transmission rate (DTR) of each node in the grid must also be considered. The priority of the ith node is computed as follows:

$$P_i = (\text{Success Ratio})_i + \frac{\text{CPU\_Idle}_i}{100} + \left(1 - \frac{ACT_i}{MAX(ACT)}\right) + \frac{freeRAM_i}{MAX(freeRAM)}$$

MAX(ACT) returns the largest average completion time among all nodes, and MAX(freeRAM) returns the largest free RAM among all nodes. For example, if Pi is larger than Pj, the ith node has a higher priority than the jth node for selection by the scheduler. The nodes in Class A are usually designated as available nodes. When the scheduler receives new tasks, it first scans their priority. If the task priority is low, it tries to select low-priority nodes from the low-cost group. If the task priority is high and there is no cost constraint, it tries to select high-priority nodes from the high-cost group. If there is a cost constraint, it selects high-priority nodes from the medium-cost group. If this group does not contain enough high-priority nodes, it selects the remaining nodes from the low-cost group, again from class A (high-priority nodes).
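The priority computation can be sketched in a few lines of Python. The sketch assumes that the priority formula is read as the sum of four terms: the Success Ratio, CPU idle divided by 100, one minus ACT normalised by MAX(ACT), and free RAM normalised by MAX(freeRAM). The node names and values are hypothetical.

```python
def priority(node, max_act, max_free_ram):
    """P_i under the assumed reading of the priority formula (four additive terms)."""
    return (
        node["success_ratio"]
        + node["cpu_idle"] / 100.0
        + (1.0 - node["act"] / max_act)
        + node["free_ram"] / max_free_ram
    )


def rank(nodes):
    """Return node names ordered from highest to lowest priority."""
    max_act = max(n["act"] for n in nodes.values())
    max_free_ram = max(n["free_ram"] for n in nodes.values())
    return sorted(
        nodes,
        key=lambda name: priority(nodes[name], max_act, max_free_ram),
        reverse=True,
    )


# Made-up class A nodes: success ratio in [0, 1], CPU idle in percent,
# ACT in seconds, free RAM in MB.
nodes = {
    "a": {"success_ratio": 0.9, "cpu_idle": 80.0, "act": 100.0, "free_ram": 2048.0},
    "b": {"success_ratio": 0.5, "cpu_idle": 50.0, "act": 200.0, "free_ram": 1024.0},
}
print(rank(nodes))  # ['a', 'b']
```

Each term lies in [0, 1], so under this reading a node that is reliable, idle, fast and rich in free memory approaches the maximum priority of 4.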

Example: consider two different tasks with the following properties:

Task1 (task priority = high, estimated execution time = 310 sec, memory requirement = 640 KB, maximum deadline to return a response = 30 minutes, fault-tolerance = normal, cost-price = medium). Task2 (task priority = high, estimated execution time = 560 sec, memory requirement = 1005 KB, maximum deadline to return a response = 27 minutes, fault-tolerance = high, cost-price = high). Task1 will be submitted to high-priority nodes in class A of the medium-cost group.

Task2 will be submitted to high-priority nodes in class A of the high-cost group.
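The selection policy applied to Task1 and Task2 can be sketched as follows. The sketch covers only the cases stated in the text (low-priority tasks, and high-priority tasks with or without a cost constraint); treating any cost-price other than high as a cost constraint is an assumption, and the node names and priority values are hypothetical.

```python
def candidate_groups(task_priority, cost_price):
    """Cost groups to draw class A nodes from, in order of preference."""
    if task_priority == "low":
        return ["low"]
    if task_priority == "high":
        if cost_price == "high":       # no cost constraint
            return ["high"]
        return ["medium", "low"]       # cost constraint: medium first, then low
    # The text does not state a policy for other task priorities.
    raise ValueError("no selection policy given for priority " + repr(task_priority))


def select_nodes(task, class_a_by_group, count):
    """Pick up to count class A nodes, walking the candidate groups in order."""
    # Low-priority tasks take low-priority nodes first; others take the best nodes.
    best_first = task["priority"] != "low"
    chosen = []
    for group in candidate_groups(task["priority"], task["cost_price"]):
        ranked = sorted(class_a_by_group.get(group, []),
                        key=lambda pair: pair[1], reverse=best_first)
        for name, _priority in ranked:
            if len(chosen) == count:
                return chosen
            chosen.append(name)
    return chosen


# Made-up class A nodes per cost group as (name, priority P_i) pairs.
class_a = {
    "high": [("h1", 3.1), ("h2", 2.7)],
    "medium": [("m1", 2.9)],
    "low": [("l1", 2.0), ("l2", 1.2)],
}
task1 = {"priority": "high", "cost_price": "medium"}
task2 = {"priority": "high", "cost_price": "high"}
print(select_nodes(task1, class_a, 2))  # ['m1', 'l1']
print(select_nodes(task2, class_a, 2))  # ['h1', 'h2']
```

For Task1 the medium-cost group holds only one class A node in this toy data, so the second node comes from the low-cost group, as described in the fallback rule.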

V. EXPERIMENTAL RESULTS AND DISCUSSION

To evaluate this approach, Gridsim V.4.2 [22], the Alea 2.0 simulator and our own package were used. As mentioned before, to implement our approach we provided the NA application for grid nodes (user nodes) and the SA application for the scheduler (applied in Gridsim).

We considered a task (properties: size = 1.3 MB, execution time = 1600 sec, priority = high, cost-price = high) for comparison with RCGS [23] and TCI [24]. For this task we set a high value for cost-price, which means that the task has no cost limitation. After starting our proposed approach, there were 16 nodes in class A of the third group (the high-cost group). The task was submitted to and executed on 10 of the 250 nodes, 50 times. Next, we changed the value of the cost-price property to medium and then to low. The 10 nodes were first selected by the SA from class A of the third group, then from class A of the second group (the medium-cost group), and finally from class A of the first group (the low-cost group). Note that the nodes in class A of the third group are the best nodes to select. The comparison results are shown in Figure 2. They show that the nodes in class A of the third group are available in 98% of cases.

The cost-price property has no effect on the TCI and RCGS methods. For these methods, Figure 2 reports the average Success Ratio.

Next, another task (size = 640 KB, execution time = 1900 sec, priority = normal, cost-price = medium) was submitted to 100, 160, 200 and 230 of the 250 nodes. The simulation results show that, one hour after the start of execution, our method produces better output than the two other methods. This comparison is depicted in Figure 3.

Finally, we computed the completion time for a task with different priorities and cost-prices. The execution cycle has 9 different states, which are illustrated in Figure 4. For the first three states (high cost), our method has a better completion time, and for the last three states (low cost) our approach is weaker than the others, because the cost-price parameter is more important than the other parameters in our approach. Grid owners tend to participate in high-cost computation. In the other two methods, task priority is more important than cost price.

Fig. 1. An overview of the proposed approach. PNA blocks: Sender/Receiver, Query Analyzer, Task-recorder, Task execution section, DB. SA blocks: Sender/Receiver, Rough set analyzer, Nodes' Classifier, Computing Priority, DB.

Fig. 2. Success Ratio after 50 tests for each group.

Fig. 3. Finished tasks one hour after the start of execution.

VI. CONCLUSION AND FUTURE WORKS

The resources in a grid environment are generally provided voluntarily, and their availability fluctuates highly. In this paper, we proposed a centralized method for predicting, detecting and grouping currently available and future available resources. This method applies rough set analysis as a knowledge extraction tool to obtain useful rules for a desired job (its tasks) while taking cost price into account. By using rough sets, our method can learn even from a small number of samples. The experimental results show that the prediction is effective: thanks to the node classification technique used by the SA, in some cases we obtain a large number of resources that are available more than 98% of the time. As future work, we will design a dynamic resource availability evaluation using rough sets and a Neural Network (NN).

Fig. 4. Completion time for finishing all tasks on grid nodes.

REFERENCES

[1] A. Bouyer, Mohdnoor MD SAP, A. Hanan, "Using Self-announcer Approach for Resource Availability Detection in Grid Environment", The Fourth International Multi-Conference on Computing in the Global Information Technology, France, ICCGI 2009.
[2] S. Baghavathi Priya, M. Prakash, K. K. Dhawan, "Fault Tolerance-Genetic Algorithm for Grid Task Scheduling using Check Point", Proceedings of the Sixth International Conference on Grid and Cooperative Computing (GCC 2007), IEEE, pp. 676–680.
[3] K. Czajkowski, I. Foster, C. Kesselman, "Agreement-based resource management", Proceedings of the IEEE, 93(3), 2005, pp. 631–643.
[4] Yue Chen, Ying Li, Zhengxian Gong, Qiaoming Zhu, "A framework of a tree-based grid information service", 2005 IEEE International Conference on Services Computing, vol. 2, pp. 255–256, 11-15 July 2005.
[5] I. Foster, C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, USA, 2003.
[6] I. Foster, C. Kesselman, S. Tuecke, "The anatomy of the grid: enabling scalable virtual organizations", International Journal of High Performance Computing Applications, 2001 (3), pp. 200–222.
[7] Kun Gao, Kexiong Chen, Meiqun Liu, Jiaxun Chen, "Rough Set Based Data Mining Tasks Scheduling on Knowledge Grid", AWIC 2005, LNAI 3528, pp. 150–155, Springer-Verlag Berlin Heidelberg, 2005.
[8] Z. Pawlak, "Rough sets: Theoretical aspects of reasoning about data", Kluwer, Dordrecht, 1991, 256 pages.
[9] Z. Pawlak, J. Grzymala-Busse, R. Slowinski, W. Ziarko, "Rough Sets", Communications of the ACM, 38(11), 1995, pp. 89–95.
[10] X. Ren, R. Eigenmann, "Empirical studies on the behavior of resource availability in fine-grained cycle sharing systems", International Conference on Parallel Processing, pp. 3–11, 2006.
[11] X. Ren, S. Lee, R. Eigenmann, S. Bagchi, "Prediction of resource availability in fine-grained cycle sharing systems empirical evaluation", Journal of Grid Computing, 5(2), pp. 173–195, 2007.
[12] B. Rood, M. J. Lewis, "Resource Availability Prediction for Improved Grid Scheduling", Fourth IEEE International Conference on eScience, 2008.
[13] G. Singh, C. Kesselman, E. Deelman, "Application-Level Resource Provisioning on the Grid", Second IEEE International Conference on e-Science and Grid Computing (e-Science '06), pp. 83-83, Dec. 2006.
[14] R. Vilalta, C. Apté, J. L. Hellerstein, S. Ma, S. M. Weiss, "Predictive algorithms in the management of computer systems", IBM Systems Journal, 41(3), pp. 461–474, 2002.
[15] Z. Xiong, Y. Yang, X. Zhang, Fu Chen, Li Liu, "Integrating Genetic and Ant Algorithm into P2P Grid Resource Discovery", Third International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP 2007), vol. 2, pp. 83-86, 26-28 Nov. 2007.
[16] Hu Zhoujun, Hu Zhigang, Liu Zhenhua, "Resource Availability Evaluation in Service Grid Environment", The 2nd IEEE Asia-Pacific Service Computing Conference, pp. 232-238, 11-14 Dec. 2007.
[17] University of Wisconsin-Madison, Condor Version 7.0.4 Manual, 2008.
[18] EGEE Team, LCG, http://lcg.web.cern.ch/, 2007.
[19] The Austrian Grid Consortium, http://www.austriangrid.at, 2007.
[20] The TeraGrid Project, http://www.teragrid.org/, 2007.
[21] F. Nadeem, R. Prodan, T. Fahringer, "A framework for resource availability characterization and online prediction in the Grids", pp. 209-224, CoreGRID, Springer, July 2008.
[22] Gridsim: "A Grid Simulation Toolkit for Resource Modeling and Application Scheduling for Parallel and Distributed Computing", http://www.gridbus.org/gridsim/release.html.
[23] Y. Tao, H. Jin, X. Shi, "Grid Workflow Scheduling based on Reliability Cost", Proc. of Infoscale'07, ACM international conference, Suzhou, China, 2007.
[24] H. Mohammadi Fard, H. Deldari, "An Economic Approach for Scheduling Dependent Tasks in Grid Computing", Proc. of the 11th IEEE International Conference on Computational Science and Engineering, 2008.
