University of Windsor (CS-03-60-510)
Survey Report
Architecture of Resource Scheduling on Computational Grids
Submitted to: Dr. Richard Frost
Submitted by: Mona Aggarwal
Table of Contents
Abstract
1. Introduction
1.1. Scheduling Systems
1.2. Cluster Scheduling
1.3. Grid System Taxonomy
1.4. Resource Management System
1.5. Grid Scheduling Problem
2. Grid Scheduling
2.1. Grid Scheduling Model
2.1.1. Application Model
2.1.2. Resource Model
2.1.2.1. The Resource Information Service
2.2. Globus Architecture for Grid Scheduling
2.3. Grid Scheduling Terminology
2.3.1. Jobs: Kendall Notation
2.3.2. Application and Processes
2.3.3. Allocation
2.3.4. Preemptive / Non-Preemptive Assignment
2.3.5. Resource Co-allocation
2.3.6. Migration
2.3.7. Space-sharing
2.3.8. Time-sharing
2.4. Grid Scheduling Approaches
2.4.1. First-Come-First-Serve (FCFS)
2.4.2. Backfilling
2.4.3. Gang Scheduling
2.4.4. Genetic Algorithms (GAs)
2.4.5. Directed Acyclic Graphs (DAG) Scheduling
2.4.6. Priority Queuing
2.5. Types of Jobs on Grid
2.6. Grid Scheduler Performance Criteria
3. Classification of Grid Schedulers
3.1. Organizational Structure Based Schedulers
3.1.1. Hierarchical Schedulers
3.1.2. Centralized Schedulers
3.1.3. Decentralized Schedulers
3.2. Application / Resource Based Schedulers
3.2.1. User-centric Scheduler
3.2.2. Service-provider Centric Scheduler
3.2.3. Economy Based Scheduler
3.3. Other Characterizing Features
3.3.1. Online vs. Batch Schedulers
3.3.2. Preemptive / Non-preemptive Based Schedulers
4. Information Retrieval for Grid Schedulers
4.1. Resource Information Retrieval
4.2. Metacomputing Directory Service (MDS)
4.3. Network Weather Service (NWS)
4.3.1. Architecture of an NWS Resource-set
4.3.1.1. NWS Performance Sensors
5. Existing Grid Schedulers
5.1. Condor
5.2. Condor-G
5.3. AppLeS Scheduler
5.4. Nimrod-G Resource Broker
5.5. GrADS Scheduler
5.6. The Legion Scheduler
5.7. TITAN
6. Concluding Remarks
6.1. Performance Goals for a Scheduler for the Global Grid
6.2. Architecture of a Scheduler for the Global Grid
Appendix – I
Appendix – II
Appendix – III
Appendix – IV
Appendix – V
Appendix – VI
Abstract
This report is a survey of the architecture of schedulers for grid systems. The
schedulers used in existing grid systems have been designed around the
applications that each grid is intended to handle. The survey presents a detailed
study of the schedulers in use: it describes the architectural model in each case
and compares the features of the available schedulers. It covers the
architectures of the available schedulers in detail and presents the architecture
of a scheduler for the global grid as a conceptual model.

A scheduler on a grid has to deal with a continuous load of diverse
applications, drawing on a dynamic, geographically distributed and heterogeneous
set of resources. It is designed to map jobs to resources subject to certain
specified performance goals. The survey describes the methods of obtaining
information about the resources, the performance goals that a scheduler may
attempt to achieve, and the process of mapping jobs to resources.
1. Introduction
“A computational grid is a hardware and software infrastructure that provides dependable,
consistent, pervasive and inexpensive access to high-end computational capabilities” (Foster j,
1998). Users today visualize a grid in a number of different ways. The most common grids are
research grids: a number of academic research organizations have set up grids both for high-
performance computing and for grid research. Most such grids are an inter-connection of
clusters, usually through a dedicated wide-area network. In general, a cluster consists of
homogeneous, dedicated and tightly coupled workstations that may be used to process high-
performance jobs. A grid, on the other hand, consists of heterogeneous and intermittently
available workstations. Whereas a cluster is expected to be located at a single laboratory or
office, the workstations on a grid may be widely distributed geographically.

The structure of a grid is heterogeneous and dynamic, and it is expected to serve a
wide variety of applications. Scheduling of jobs on a grid or a cluster is the task of allocating
jobs to the available nodes. The objectives of grid scheduling are fast turn-around time and
maximum throughput (Wright, 2001). Scheduling in a cluster is a relatively simple task. On a
grid, which is dynamic, heterogeneous and geographically spread out, and which caters to highly
diverse applications, scheduling is an NP-complete problem (Buyya I, 2000). Any user,
whether on a cluster or a grid, requires a low turn-around time. A service provider,
however, may seek high throughput when the grid is heavily loaded; for lightly loaded
grids, the objective may be to distribute the jobs among the nodes in an equitable manner.
Scheduling has to take into account the objectives of the service provider, the needs of the user
and the requirements of the applications.
1.1. Scheduling Systems
Grids have evolved through a natural progression from parallel processing systems and
distributed processing systems to metacomputing systems.
Parallel processing systems can be classified into two types:
1. Shared Memory Processing systems
2. Clusters of workstations
Centralized systems designed to cater to the needs of a large number of users are usually multi-
processor systems with a shared memory.
A distributed system consists of heterogeneous systems working in parallel over a network. A
grid is a dynamic and geographically spread-out distributed system.
There are some common properties that a scheduler has to satisfy, whether it is for a parallel
system, a distributed system or a grid.
A scheduler is designed to satisfy one or more of the following goals:
Maximizing system throughput
Maximizing resource utilization
Maximizing economic gains
Minimizing the turn-around time for an application.
In addition, a scheduler may be designed to follow a fixed or an adaptable scheduling policy.
Optimization of the scheduling process can be attempted once the performance goals and
scheduling policies have been defined.

Since grid technology has developed as an extension of the technologies used in clusters and
distributed systems, the next section describes the characteristics of scheduling in clusters.
1.2. Cluster Scheduling
A cluster consists of homogeneous, dedicated and tightly coupled workstations connected
through a high-speed network and managed by a single administrative entity. A cluster has a
server, which interacts with the users, called clients. The server also holds information about
all the computer nodes, called agents, in the cluster. A client may be connected to the server
through the Internet. The client has to inform the server about the requirements of the
application. The inter-dependencies among the processes of the application also have to be
specified, either through a script or through a GUI for application building. A cluster can run
both sequential and parallel jobs.

According to (Abawajy a, 2003), “The key to making cluster computing work well is the
middleware technologies that can manage the policies, protocols, networks, and job scheduling
across the interconnected set of computing resources.” The two main goals of cluster scheduling
are to maximize utilization and throughput.
The schedulers most commonly used by the high-performance community are the Maui
scheduler (Jackson, 2001) and the PBS scheduler (Goldenberg, 2003).
These schedulers allow the administrator to set the following parameters:
To set multiple queues for different job classes
To set priorities and scheduling policies for each queue.
To treat interactive jobs differently from non-interactive jobs.
Even though the facility to set these parameters is useful to the administrator, when a cluster is
faced with continuously varying applications, optimization of scheduling remains a desirable but
elusive goal.
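As an illustration, the multi-queue arrangement described above can be sketched in a few lines of Python. The job classes, priority values and FCFS tie-break below are illustrative assumptions chosen here, not the configuration model of Maui or PBS.

```python
import heapq

class MultiQueueScheduler:
    """Sketch of a cluster scheduler with one queue per job class.
    Lower class-priority values are served first; within a queue,
    jobs are served in arrival (FCFS) order."""

    def __init__(self, class_priority):
        # class_priority: e.g. {"interactive": 0, "batch": 1}
        self.class_priority = class_priority
        self.queues = {c: [] for c in class_priority}
        self._seq = 0  # global arrival counter for FCFS tie-breaking

    def submit(self, job_class, job):
        heapq.heappush(self.queues[job_class], (self._seq, job))
        self._seq += 1

    def next_job(self):
        # Serve the highest-priority non-empty class first.
        for c in sorted(self.queues, key=self.class_priority.get):
            if self.queues[c]:
                return heapq.heappop(self.queues[c])[1]
        return None

sched = MultiQueueScheduler({"interactive": 0, "batch": 1})
sched.submit("batch", "b1")
sched.submit("interactive", "i1")
print(sched.next_job())  # i1 -- interactive beats the earlier batch job
```

This captures the idea of treating interactive jobs differently from non-interactive ones; a real scheduler would add per-queue policies and resource checks.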
1.3. Grid System Taxonomy
Grid systems can be classified into three types (Buyya c, 2002):
Computational grids
Data grids
Service grids
Figure 1-1: Grid Systems Taxonomy (Rajkumar Buyya, 2002, page: 3)
The aggregate computational power of a computational grid is larger than that of any single
constituent of the grid. High-performance computing problems, which require large
computational power, use such grids for parallel computing (Buyya c, 2002).
A data grid provides large data repositories such that both the data storage as well as the users
may be distributed over a large geographical area. Users may mine the data for different kinds
of studies. As an example, astronomers, as a loose group, have been able to create a large data
grid: large telescope systems continuously scan outer space and feed the data into large data
repositories, which are then used by astronomers from around the world (Buyya c, 2002).
Service grids provide services that are not available from a single machine at a single site.
Examples of such grids are on-demand computing, multimedia computing and collaborative
computing grids.
Grids may also be classified on the basis of the administrative domains they cover. An IntraGrid
may be controlled by a single, multi-location organization. Sharcnet is an example of such a grid:
it consists of large clusters at four universities connected by a dedicated network.

An ExtraGrid may couple two or more grids. Today's ERP systems use such grids, whereby a
business entity may be connected to its suppliers and its sales organizations.

The Global Grid is the kind of grid being conceptualized by the Global Grid Forum (GGF). Such a
grid would have a large number of resource providers connected to the users through the
Internet. Middleware for such a grid is being developed by a number of GGF research groups
and at various universities.
1.4. Resource Management System
Figure 1-2: Resource Management System Abstract Structure (Rajkumar Buyya, 2002, page: 5)
A scheduler is a component of the Resource Management System (RMS) on a grid. Figure 1-2
shows the architecture of a general RMS. Resource information from the service provider's
sensors forms one input of the RMS; the other input is the resource requests from the users.
The resource information interacts with the Discovery module of the RMS. The State Estimation
system converts the information, through the Naming system, into a Resource Information
database. The Resource Monitoring system maintains the Historical Data and the current
Resource Status databases. The Dissemination system may be used to convey the resource
information to users
on the grid. The Request Interpreter may format the request or, in some cases such as the
TITAN system (Spooner a, 2003), it may generate estimates of the execution time of each task
of the job request. The Resource Broker negotiates for the resources; it may use the Resource
Reservation module to save information about reserved resources in the Reservations database.
The Scheduling system puts the job in the Job Queues, and the Execution Manager then sends
the job for execution to the allocated resources. The Job Monitor continuously monitors the job
as it is being executed, storing the state of the job at checkpoints in the Job Status database.
1.5. Grid Scheduling Problem
(Foster j, 1998) defines the responsibilities of a grid scheduler as the “discovery of available
resources for an application, mapping the resources to the application subject to some
performance goals and scheduling policies and loading the application to the resource in
accordance with the best available schedule.”
In addition, a grid scheduler must provide a checkpointing facility so that a job can be migrated,
along with its state, to a different node. Migration may be required when a pre-emptive
scheduling policy is in force, or on the failure of some nodes or of part of the network. Facilities
for suspending, resuming or aborting a job are also usually provided in grid schedulers.
2. Grid Scheduling
Grid scheduling is used to “efficiently and effectively manage the access to and use of grid
resources by the users” (Buyya c, 2002). According to (Casavant, 1988), the grid scheduling
problem consists of three main components:
Consumer(s).
Resource(s).
Policy.
Figure 2-1: Scheduling system (Casavant, 1988, page: 142)
The policy chosen for implementing a Scheduler would affect both the Users as well as the
Service Providers.
2.1. Grid Scheduling Model
A Grid Scheduling Model consists of the following facilities (Berman d, 1999):
Prediction of performance for every job that is to be processed by the grid: Since the
availability of resources may be continuously changing and the jobs may be highly diverse,
performance may be a sensitive function of time. The scheduler of an application and the
resource provider can negotiate the use of resources only if the performance expected when the
application is processed can be predicted with a reasonable degree of confidence.

Use of dynamic information to work out the schedules: In a grid environment, the state of
the available resources is highly dynamic. The scheduler should be able to use the parameters
of the present state and the meta-data about predictions to work out schedules with a greater
degree of stability.
Adaptability of schedules: After negotiation, the user and the service provider may agree on
the basis of the predicted performance, and the scheduler may map the application to the
nodes. However, the available resources may change; the schedule may then have to be
modified and the performance re-estimated. The performance should remain reasonably stable
even when the mapping has to be changed.
2.1.1. Application Model
As defined by (Feitelson a, 1998), grid applications are divided into processes, which may follow
a serial flow or a networked flow. The inter-dependencies have to be specified by the user. The
nature of the flow of processes determines the maximum number of nodes that may be used for
faster completion of the job, if the nodes are available.
The flow can be represented by a directed acyclic graph (DAG) (Feitelson b, 1997), where each
vertex represents a process that may be executed in one node. The computation cost of the
process is called the weight of the vertex. Each edge corresponds to a communication and
inter-dependence constraint between processes. The weight of an edge is the communication
cost, which depends upon the bandwidth and latency of the communication channel and the
amount of data to be transferred.
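The DAG model above can be made concrete with a short sketch: before scheduling, a system must confirm that the specified inter-dependencies actually form a DAG, which a topological sort does directly. The task numbering and edge list below are illustrative.

```python
from collections import defaultdict, deque

def topological_order(edges, n_tasks):
    """Return a topological order of tasks 0..n_tasks-1, or None if the
    dependency graph contains a cycle (i.e. is not a valid DAG)."""
    succ, indeg = defaultdict(list), [0] * n_tasks
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(t for t in range(n_tasks) if indeg[t] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order if len(order) == n_tasks else None

# A 4-task fork-join job: task 0 fans out to 1 and 2, which join at 3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(topological_order(edges, 4))  # [0, 1, 2, 3]
```

In a full model each vertex would also carry a computation weight and each edge a communication weight, as described in the text.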
Applications can be interactive or non-interactive, real-time or non-real-time. Any special
requirements of resources for a particular process may determine its scheduling on grid nodes.
2.1.2. Resource Model
(Berman d, 1999) describes a computational resource in terms of CPU speed, memory size and
the available time slots. Other specialized features can be vector computers, pipelined systems,
etc. Since it may be difficult to specify the computing power of a node directly, one possibility is
to specify it in terms of a benchmark test; thus the supercomputers in the Top 500 list are
ranked on the basis of the Linpack benchmark. Policy considerations for the use of a resource
may be time-shared vs. non-time-shared, dedicated vs. non-dedicated, and pre-emptive vs.
non-pre-emptive.
2.1.2.1. The Resource Information Service
The resource information service in a grid may be centralized, decentralized or hybrid. A
centralized service may contain information about the whole grid, and thus, in theory, may lead
to optimized scheduling. The disadvantage is that a centralized information service is a single
point of failure. An example of a centralized information service is the Globus Meta Directory
Service (MDS) (Foster I, 1997).
A decentralized information service stores the information about each node at the node itself.
Querying the node provides the necessary information to a scheduler, but the querying process
may carry a large overhead. The Network Weather Service (Wolski, 2003) is an example of a
decentralized resource information service.
The hybrid scheme is hierarchical: at the highest level of a group, a centralized entity stores
information about the sources of resource information over the entire system, while the
information about each resource, or group of resources, is stored in a decentralized manner.
2.2. Globus Architecture for Grid Scheduling
The Globus toolkit is widely used, and nearly every scheduler is designed to be compatible with
it. It is therefore important to describe certain parts of the Globus architecture, since these parts
are used as components of the scheduler architecture in most cases. This section describes the
architecture of the Globus resource manager, which includes the allocation component.
The Meta Directory Service (MDS) (Foster I, 1997), which is part of the Globus toolkit, and the
Network Weather Service (NWS) (Wolski, 2003) are used by many schedulers for obtaining
resource information.
Globus uses the Grid Resource Allocation Manager (GRAM) as the Local Resource Manager
(LRM) (Foster I, 1999). The Information Service and its database store the information about the
resources, including the nodes and the network; the Globus MDS (Foster I, 1997) and the NWS
are examples of systems that an Information Service can use.

Figure 2-2-1 shows the Globus Architecture for Reservation and Allocation (GARA). GARA
provides for dynamic discovery, advance and immediate reservation, and management of
resources such as networks, processors and disk storage. An application may have to use
resources at multiple sites; at each site a GRAM is used as the LRM. A co-allocation agent is
used to map multiple resources to an application, and such an agent may use one or more
GRAMs. The resource Information Service may be provided by an MDS or an NWS system.
Figure 2-2-1: GARA Resource Management Architecture (Czajkowski b, 1997, page: 12)
Figure 2-2-2 shows the major components of GRAM. A GRAM client obtains information from
MDS, and the GRAM Reporter sends information about the resource to MDS. The GRAM client
uses the Gatekeeper to load a task onto the local resource set. The Gatekeeper uses the Grid
Security Infrastructure (GSI) for authentication and other security services. The JobManager
parses the tasks using the Resource Specification Language (RSL) library and conveys them to
the Local Resource Manager for final allocation to the local resources. The GRAM client can
obtain the status of the job through the Gatekeeper and the JobManager.
Figure 2-2-2: Major Components of GRAM Implementation (Foster I, 1999, page: 4)
In newer upgrades of the Globus toolkit, the Lightweight Directory Access Protocol (LDAP) has
been adopted as the standard interface for both MDS and NWS. This makes it possible for
GARA to use either MDS or NWS as the source for the resource Information Service.
2.3 Grid Scheduling Terminology
2.3.1 Jobs: Kendall Notation
Kendall notation is used to specify how jobs are distributed over the multiple nodes available on
a grid, for simulation or for testing on a test-bed. The generalized notation uses six parameters,
A/S/N/B/K/SD. A: the distribution of job arrivals. S: the distribution of service time, where
service time is the time spent by a job in the system. For the study of grid systems, the usual
distributions for arrivals and service times are Exponential and Deterministic, indicated by M and
D respectively. N: the number of servers. B: the number of buffers (storage capacity). K: the
population size of jobs. SD: the service discipline, which can be First-Come-First-Serve (FCFS),
Shortest Job First (SJF), Specified Priority (P), Round-Robin (R-R) or Multilevel Queue (MLQ).
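As a small worked example, a six-field Kendall specification can be parsed mechanically. The function and its dictionary keys are illustrative names chosen here, not part of any grid toolkit; "inf" stands for an unbounded buffer pool or job population.

```python
def parse_kendall(spec):
    """Parse a six-field Kendall specification A/S/N/B/K/SD as described
    above. Returns a dictionary with one entry per field."""
    fields = spec.split("/")
    if len(fields) != 6:
        raise ValueError("expected six fields: A/S/N/B/K/SD")
    arrival, service, n, b, k, discipline = fields

    def size(x):
        # "inf" marks an unbounded buffer or population.
        return float("inf") if x == "inf" else int(x)

    return {"arrival": arrival, "service": service, "servers": int(n),
            "buffers": size(b), "population": size(k),
            "discipline": discipline}

# An M/M/4 grid model: Markovian arrivals and service times, four
# servers, unbounded buffers and population, FCFS discipline.
print(parse_kendall("M/M/4/inf/inf/FCFS")["discipline"])  # FCFS
```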
2.3.2 Application and Processes
A grid application is a job that a user wants to process on a grid. The application may consist
of a number of processes or tasks. Inter-dependencies among the processes determine the
number of machines that can be used in parallel to process the application.

2.3.3 Allocation
The term allocation is often used interchangeably with scheduling. Sometimes allocation of
resources is viewed from the perspective of the service provider, whereas scheduling is viewed
from the perspective of the user's application.

2.3.4 Preemptive / Non-Preemptive Assignment
Pre-emptive assignment: a task may be assigned to a new node even after it has been partially
executed at another node. This also requires the transfer of the state of the process. Pre-emption
may be used by Round-Robin or reservation-based scheduling, or to prevent a very long process
from hogging a node. It may be a costly method to implement.
Non-pre-emptive assignment: a process is allocated to a node before its execution starts and
runs there to completion.

2.3.5 Resource Co-allocation
A metacomputing system may require the allocation of more than one resource, from multiple
sites, for a job to be completed. The allocation then depends upon information from multiple
resource managers. An allocation of resources in such an environment is called co-allocation
(Czajkowski b, 1997).

2.3.6 Migration
In this approach the main idea is to “dynamically migrate tasks of a parallel job. Migration delivers
flexibility of adjusting your schedule to avoid fragmentation. Migration is particularly important
when collocation in space and/or time of tasks is necessary” (Zhang a, 2003).

2.3.7 Space-sharing
In the space-sharing approach, jobs run concurrently on different nodes of the grid, but each
node is exclusively assigned to one job. Submitted jobs are kept in a priority queue, which is
traversed according to a priority policy in search of the next job to execute (Zhang a, 2003).

2.3.8 Time-sharing
The time-sharing approach allocates time slices to different jobs. If a job is not completed in its
pre-allocated time slice, the partially executed job is shifted out and the next job is processed
during the next time slice (Zhang a, 2003).
2.4 Grid Scheduling Approaches
2.4.1 First-Come-First-Serve (FCFS)
Job arrival times and arrival rates may vary on a grid. A grid scheduler that follows the FCFS
policy allocates jobs to compute nodes in the order of their arrival. Jobs that arrive when all the
resources are in use wait in a wait queue. “This strategy is known to be inefficient for many
workloads as a large number of jobs waiting for execution can result in unnecessary idle time of
some resources” (Schwiegelshohn b, 2000).
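The FCFS policy can be simulated with a short sketch. The model below (identical nodes, one job per node at a time, perfectly known run times) is a deliberate simplification for illustration, not a description of any particular scheduler.

```python
import heapq

def fcfs_schedule(jobs, n_nodes):
    """Simulate FCFS on n_nodes identical nodes, one job per node.
    jobs: list of (arrival_time, run_time) sorted by arrival time.
    Returns the start time of each job in submission order."""
    free_at = [0] * n_nodes        # when each node next becomes idle
    heapq.heapify(free_at)
    starts = []
    for arrival, run in jobs:
        node_free = heapq.heappop(free_at)
        start = max(arrival, node_free)  # wait if all nodes are busy
        heapq.heappush(free_at, start + run)
        starts.append(start)
    return starts

# Three jobs on two nodes: the third job must wait until t=4, when the
# second node becomes free, even though it arrived at t=1.
print(fcfs_schedule([(0, 10), (0, 4), (1, 2)], 2))  # [0, 0, 4]
```

The idle time the quotation refers to arises because a job late in the queue is never started early, even when a node is free and the job at the head cannot use it.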
2.4.2 Backfilling
Backfilling refers to the case where a task further back in the wait queue is chosen, on the basis
of some criterion, to be processed before the first task in the wait queue. Sometimes when a
compute node becomes free, the first task in the wait queue cannot be allocated to it, owing to
dependencies among the tasks. In such a case, rather than keeping the node idle, an out-of-order
task may be allocated to the free node.

Two common variants are EASY and Conservative backfilling. In Conservative backfilling, the
out-of-order task must be chosen such that no job ahead of it in the wait queue is delayed; EASY
backfilling only guarantees that the first job in the wait queue is not delayed. “The performance of
this algorithm depends upon a sufficiently large backlog” (Schwiegelshohn b, 2000).
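The feasibility test at the heart of backfilling can be sketched as follows. This shows the EASY variant's check against the single head-of-queue reservation; Conservative backfilling would repeat the check against a reservation for every waiting job. The function name, tuple layouts and the two-condition test are illustrative.

```python
def can_backfill(free_nodes, now, candidate, head_reservation):
    """EASY-backfilling feasibility check (simplified sketch).
    candidate: (nodes_needed, est_runtime) of an out-of-order job.
    head_reservation: (start_time, nodes_reserved) promised to the job
    at the head of the wait queue. The candidate may start now only if
    it fits on the idle nodes and cannot delay the head job's start."""
    nodes_needed, est_runtime = candidate
    head_start, head_nodes = head_reservation
    if nodes_needed > free_nodes:
        return False                                   # does not fit now
    finishes_before_head = now + est_runtime <= head_start
    leaves_room_for_head = free_nodes - nodes_needed >= head_nodes
    return finishes_before_head or leaves_room_for_head

# 4 idle nodes; the head job needs 6 nodes and is promised a start at
# t=100. A 2-node, 50-unit candidate finishes well before then.
print(can_backfill(4, now=0, candidate=(2, 50), head_reservation=(100, 6)))
```

The check relies on user-supplied run-time estimates, which is why the quoted performance result depends on the backlog and on estimate quality.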
2.4.3 Gang Scheduling
In this approach the main concept is to “add a time-sharing dimension to space sharing using a
technique called gang scheduling or coscheduling. This technique virtualizes the physical
machine by slicing the time axis into multiple virtual machines. Tasks of a parallel job are co-
scheduled to run in the same time-slices (same virtual machines)” (Zhang a, 2003).
2.4.4 Genetic Algorithms (GAs)
GAs are useful for optimization problems in which the number of parameters affecting the
performance is large. A GA begins with a set of potential solutions, called the population, drawn
from the search space, and a fitness function appropriately defined for the problem at hand. It
then tries to obtain an optimal solution through the processes of selection, cross-over and
mutation. “GAs are widely reckoned as effective techniques in solving numerous optimization
problems because they can potentially locate better solutions at the expense of longer running
time. Another merit of a genetic search is that its inherent parallelism can be exploited to further
reduce its running time” (Kwok, 1999).
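A toy version of such a genetic search, mapping independent tasks to nodes so as to minimize the heaviest node load (the makespan), might look as follows. The population size, one-point crossover and mutation rate are illustrative choices; real grid GAs also model communication costs, deadlines and precedence constraints.

```python
import random

def ga_map(runtimes, n_nodes, pop=30, gens=60, seed=1):
    """Toy GA: a chromosome assigns each task a node index; fitness is
    the makespan (heaviest node load), which we minimize."""
    rng = random.Random(seed)
    n = len(runtimes)

    def makespan(chrom):
        load = [0] * n_nodes
        for task, node in enumerate(chrom):
            load[node] += runtimes[task]
        return max(load)

    popn = [[rng.randrange(n_nodes) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=makespan)              # selection: keep best half
        survivors = popn[:pop // 2]
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:           # mutation
                child[rng.randrange(n)] = rng.randrange(n_nodes)
            children.append(child)
        popn = survivors + children
    best = min(popn, key=makespan)
    return best, makespan(best)

mapping, span = ga_map([4, 3, 3, 2, 2, 2], n_nodes=2)
print(span)  # makespan of the best mapping found (the optimum here is 8)
```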
2.4.5 Directed Acyclic Graphs (DAG) Scheduling
A job that is to be run on a parallel system of computational nodes can be represented by a
DAG with weighted nodes and weighted edges. The weight of a node is the time it would take to
be processed; the directed edges define the dependencies, and their weights may represent the
communication delays.
“The objective of DAG scheduling is to minimize the overall program finish-time by proper
allocation of the tasks to the processors and arrangement of execution sequencing of the tasks.
Scheduling is done in such a manner that the precedence constraints among the program tasks
are preserved. The overall finish-time of a parallel program is commonly called the schedule
length or makespan” (Kwok, 1999).
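A minimal list-scheduling sketch shows how a makespan arises from a weighted DAG: tasks are visited in topological order, and each one starts on the earliest-free processor no sooner than all its predecessors have finished. Communication costs on the edges are ignored here for brevity; a full scheduler would add them to each task's earliest start time.

```python
import heapq
from collections import defaultdict, deque

def dag_makespan(weights, edges, n_procs):
    """List-scheduling sketch for a DAG job. weights[t] is task t's run
    time; edges are (u, v) precedence pairs. Returns the makespan."""
    n = len(weights)
    succ, pred = defaultdict(list), defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        indeg[v] += 1
    # Topological order of the tasks.
    order, queue = [], deque(t for t in range(n) if indeg[t] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Greedy placement on the earliest-free processor.
    finish = [0] * n
    proc_free = [0] * n_procs
    heapq.heapify(proc_free)
    for t in order:
        earliest = max((finish[p] for p in pred[t]), default=0)
        free = heapq.heappop(proc_free)
        finish[t] = max(earliest, free) + weights[t]
        heapq.heappush(proc_free, finish[t])
    return max(finish)

# Fork-join job 0 -> {1, 2} -> 3 on two processors: 2 + 3 + 1 = 6.
print(dag_makespan([2, 3, 3, 1], [(0, 1), (0, 2), (1, 3), (2, 3)], 2))  # 6
```

Preserving the precedence constraints is exactly the topological-order visit; minimizing the makespan beyond this greedy placement is where heuristics such as list-scheduling priorities or GAs come in.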
2.4.6 Priority Queuing
Priority Queuing has two sub-types: Head of Line (HOL), where priorities are fixed, and Dynamic
Queue (DQ), where each arriving job is allocated a priority. A job's priority level determines its
position in the queue (Zhang a, 2003).
2.4.6.1 Priority Queuing Methods: Two Issues
While implementing priority queuing, two issues have to be considered to avoid pitfalls.
1. Starvation: When high-priority processes continuously enter the queue, lower-priority processes may never get processing time. If pre-emption is allowed, even a low-priority process that has started may be pushed out as soon as a high-priority process arrives. In such a situation the lower-priority processes are said to face starvation.
Solution: Increase the priority level of a process depending upon its waiting time in the queue. Thus, over a period of time, an initially low-priority process may come to outrank a high-priority process that has just entered the queue.
2. Deadlocking: A high-priority process may need a resource that can be produced only when a lower-priority process executes. Since the high-priority process is supposed to run first, a deadlock may occur, in which neither the high-priority nor the low-priority process can proceed.
Solution: Priority inheritance: the priority of the lower-priority process is temporarily boosted so that it can execute before the high-priority task is processed. (The underlying hazard, a high-priority process being blocked by a lower-priority one, is known as priority inversion.)
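The aging solution to starvation can be sketched as follows (priorities, rates and job names are invented; lower numbers mean higher priority):

```python
# A dynamic priority queue with aging: each job's effective priority
# improves with waiting time, so low-priority jobs eventually run.

class AgingQueue:
    def __init__(self, aging_rate=1.0):
        self.aging_rate = aging_rate
        self.jobs = []                     # (base_priority, arrival, name)

    def submit(self, name, priority, now):
        self.jobs.append((priority, now, name))

    def pop(self, now):
        # lower number = higher priority; waiting time lowers the number
        def effective(job):
            prio, arrival, _ = job
            return prio - self.aging_rate * (now - arrival)
        best = min(self.jobs, key=effective)
        self.jobs.remove(best)
        return best[2]

q = AgingQueue(aging_rate=1.0)
q.submit("low", priority=10, now=0)
q.submit("high", priority=1, now=8)
# at t=9 "low" has aged to 10-9=1, "high" to 1-1=0, so "high" still wins;
# if "low" waited much longer, it would eventually beat fresh arrivals
print(q.pop(now=9))
```

The aging rate controls how quickly a waiting job catches up with newly arrived high-priority jobs.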
2.5 Types of Jobs on Grid
A grid has heterogeneous dynamically available resources. In addition it handles a great diversity
of jobs. A grid may have three different categories of jobs defined by (Subramani, 2002):
(i) Local jobs: a node or cluster may have to process local jobs, which usually have a higher priority on locally available resources.
(ii) Batch jobs: jobs considered for batch processing, to be processed as resources become available; they may have to follow a queuing discipline, as specified by the administrator.
(iii) Reserved jobs: jobs with reserved time-slots, to be processed at the highest priority during the pre-specified slot, or jobs with pre-specified deadlines.
Each job may consist of a number of processes with a variety of interdependencies. If the interdependency is not serial in character, different processes of a single job may run in parallel on different nodes. In addition, a grid scheduler may handle a large variety of jobs from a multiplicity of users; thus processes from multiple jobs may be mapped onto a single node in succession.
The jobs may be interactive or non-interactive. They may be real-time jobs with deadlines or batch jobs, and they may have different levels of priority. Moreover, each site on the grid may have a different scheduling policy. Such diverse jobs on intermittently available resources make grid scheduling highly complex.
2.6 Grid Scheduler Performance Criteria
Since grid applications are highly diverse, it is not possible to specify a single performance criterion that can be applied to all cases. Depending upon the environment, some of the following criteria may be used to evaluate the performance of a scheduler (Berman d, 1999):
Improvement from the User’s perspective
Improvement from the Service Provider’s perspective
Minimum overhead of the scheduler
Fault tolerance
Scalability
Applicability to a wide diversity of applications and grid environments.
3. Classification of Grid Schedulers
Many scheduling systems for grid environments have been designed. However each of the
schedulers has been designed for a specific grid structure and each has some unique features.
3.1. Organizational structure based scheduler
Grid schedulers can be classified into three categories based on their organizational structure (Hamscher, 2002).
3.1.1. Hierarchical Schedulers
Figure 3-1: Hierarchical scheduling (Hamscher, 2002, page: 4)
A hierarchical structure consists of local schedulers and higher-level schedulers that work in concert. However, such a system may not be able to provide local autonomy for setting local scheduling policies.
3.1.2. Centralized schedulers
Figure 3-2: Centralized scheduling (Hamscher, 2002, page: 3)
In a centralized scheduler, all applications are added to a common queue of the central scheduler. The sites have no queues, and no scheduling is performed at the sites. Since the central scheduler has information about all the jobs and all the resources, it should conceptually be able to provide an optimal solution. However, such a scheduler is slow and does not scale as the grid grows.
3.1.3. Decentralized Schedulers
Figure 3-3: Decentralized scheduling (Hamscher, 2002, page: 5)
A decentralized scheduler has local queues at each site. It responds to requests for service from local users and from other schedulers.
Such a scheduler has no knowledge of the user requests received at other schedulers, nor of the resources at other sites, so its scheduling may not be optimal. But such a system is fault-tolerant, and it can apply the scheduling policies required by the local administrator.
3.2. Application / Resource based Scheduler
3.2.1. User-centric scheduler
The user-centric performance goals can be defined as follows (Berman d, 1999):
Minimize execution time
Minimize the waiting time in the ready-to-process queue
Maximize speed-up on the user’s own platform i.e. minimize the turn-around time
Minimize process slow-down, defined as the ratio of turn-around time to the actual
execution time.
The application-level scheduler AppLeS (Berman b, 2003) is a user-centric scheduler.
3.2.2. Service-provider centric scheduler
The service-provider-centric performance goals can be defined as follows (Berman d, 1999):
Maximize throughput. (Throughput is the number of jobs processed per unit of time.)
Maximize utilization. (Utilization is the percentage of time a resource is busy)
Minimize flow time. (Flow time or session time is the sum of the completion times of all jobs.)
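These service-provider metrics can be computed directly from job records, as the following sketch shows (the arrival/start/finish times and the two-resource assumption are invented for illustration):

```python
# Computing throughput, utilization and flow time from hypothetical
# (arrival, start, finish) records for three completed jobs.

jobs = [
    (0.0, 0.0, 4.0),
    (1.0, 4.0, 6.0),
    (2.0, 4.0, 9.0),
]

horizon = max(f for _, _, f in jobs)          # observation window length
throughput = len(jobs) / horizon              # jobs completed per unit time
busy = sum(f - s for _, s, f in jobs)         # total busy time
utilization = busy / (2 * horizon)            # assuming 2 identical resources
flow_time = sum(f - a for a, _, f in jobs)    # sum of turnaround times

print(throughput, utilization, flow_time)
```

Note that flow time is measured from arrival, not from start, so queueing delay counts against it.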
A service-provider centric scheduler does not try to obtain detailed information about the characteristics of the application; it simply tries to make optimal use of the computing power of the grid.
Condor (Wright, 2001) is an example of a service-provider centric scheduler.
3.2.3. Economy based scheduler
A third class of scheduling is called economy-based. The scheduler mimics a market place in which the user's application must execute within a specified maximum total cost, or must meet certain deadlines. The user is interested in satisfying the requirements of the application at the minimum possible cost, while the goal of the service provider is to obtain the highest price for its services. Thus each resource is characterized by its cost and its computational capacity. The scheduler tries to provide the service to the user at the lowest cost while providing maximum returns to the service provider. Nimrod-G (Buyya g, 2000) is a scheduling system based on such market-like considerations.
3.3. Other characterizing features:
Each of the classes of schedulers described in Sections 3.1 and 3.2 may also have some of the following characteristics.
3.3.1. Online vs. Batch Schedulers
Online schedulers allocate a node to an application as soon as the application is received. A batch scheduler puts received applications in a queue; at regular scheduling events, it allocates the queued applications to nodes according to its scheduling policy. The overhead of an online scheduler is much higher than that of a batch scheduler. Most practical schedulers are batch schedulers.
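The batch-scheduling cycle described above can be sketched as follows (a minimal illustration with invented job and node names; the policy here is plain FCFS):

```python
from collections import deque

# Arriving jobs accumulate in a queue; at each scheduling event, all
# queued jobs are mapped to currently free nodes at once.

def scheduling_event(queue, free_nodes):
    """Assign queued jobs to free nodes in FCFS order; return assignments."""
    assignments = []
    while queue and free_nodes:
        job = queue.popleft()
        node = free_nodes.pop()
        assignments.append((job, node))
    return assignments

queue = deque(["job1", "job2", "job3"])
free = ["nodeA", "nodeB"]
print(scheduling_event(queue, free))   # job3 stays queued until the next event
```

Because allocation happens only at scheduling events, the per-job overhead is amortized, which is why most practical schedulers work this way.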
3.3.2. Preemptive / Non-preemptive based schedulers
Preemptive schedulers allow a job that is being processed on one node to be rescheduled to a different node. However, rescheduling adds a great deal of overhead, since the entire state of a partially processed task has to be transferred. Most practical schedulers are non-preemptive.
4. Information Retrieval for Grid Schedulers
A number of grid scheduling systems have been developed. Each of them is based upon a
particular vision of the grid, with different assumptions and constraints. Each has its own goals
and therefore each has been used in some specific grid application.
The Global Grid Forum has been able to develop generalized definitions and requirements for the
various components of the grid including the scheduler.
The schedulers require information about the availability of resources. There are a number of research projects that aim at gathering this information and making it available to the schedulers.
4.1. Resource Information retrieval
A grid has two major types of resources:
Compute resource
Communication resource
A resource information retrieval system may provide information about only the compute resources or about both types. Such a system may be centralized or distributed. In this section, one example of each type is described.
4.2. Metacomputing Directory Service (MDS)
(Foster I, 1997) defines MDS as a “Metacomputing Directory Service that provides efficient and scalable access to diverse, dynamic, and distributed information about resource structure and state”. It is a protocol for publishing information about grid resources. MDS is part of the Globus toolkit. The resource information is stored in a collection of LDAP servers. For each computing station, the information includes the following:
Operating system
Processor type and speed
Number of processors
Available memory size
Dynamic information like load, process information etc.
Each resource has an associated Grid Resource Information Service (GRIS). GRIS monitors the resource and stores current information about its status. The Grid Index Information Service (GIIS) provides an aggregate directory of the lower-level data received from the GRIS of each resource. Thus a GIIS provides information about multiple GRIS systems and can be queried for information about resources. A GIIS caches information from each GRIS, updating it at long intervals.
A hierarchical structure of GIIS servers can be built, so that a higher-level GIIS contains information from a number of GIIS units at a lower level. The Globus Grid Security Infrastructure (GSI) can be integrated with the querying system (Zhang b, 2003).
Figure 4-1: The MDS 2 Architecture. This architecture is flexible: There can be several levels of GIISs, and any GRIS or GIIS can register with another, making this approach
modular and extensible. (Zhang b, 2003, page: 2)
To index the contents of a GRIS, a GIIS can find the GRIS in one of the following ways:
(i) A GIIS may be configured with the associated GRIS hostnames.
(ii) A GRIS may register with a GIIS during start-up.
(iii) A referral service may be created to link multiple GRIS and/or GIIS servers into a single LDAP namespace. The referral service contains no information about the resources, but it points towards the GIIS and GRIS servers from which the information can be collected; a GIIS may find a GRIS by using this referral service.
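The GRIS/GIIS relationship can be modelled with a toy sketch (this is an illustration of the aggregation idea only, not the real MDS/LDAP protocol; hostnames and attributes are invented):

```python
# A toy model of GRIS/GIIS aggregation: each GRIS holds current status
# for one resource; a GIIS caches entries pushed by registering GRISes
# and answers aggregate queries over them.

class GRIS:
    def __init__(self, hostname, info):
        self.hostname = hostname
        self.info = info              # e.g. OS, CPU count, free memory

class GIIS:
    def __init__(self):
        self.cache = {}               # hostname -> last pushed info

    def register(self, gris):         # option (ii): GRIS registers at start-up
        self.cache[gris.hostname] = gris.info

    def query(self, predicate):
        return [h for h, info in self.cache.items() if predicate(info)]

giis = GIIS()
giis.register(GRIS("nodeA", {"os": "Linux", "cpus": 4, "free_mb": 512}))
giis.register(GRIS("nodeB", {"os": "Linux", "cpus": 2, "free_mb": 128}))
print(giis.query(lambda i: i["free_mb"] >= 256))
```

A higher-level GIIS could be modelled the same way, caching entries registered by lower-level GIIS instances.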
A user can query a GIIS or collect information directly from a GRIS. MDS provides current information about the computing resources only. The advantage of using a standard LDAP interface is that the information can be collected directly from the computing hosts, from SNMP-based systems, or from databases like Oracle. A user can find a GIIS by using DNS server records, a referral tree, or other GIISs.
4.3. Network Weather Service (NWS)
NWS monitors the deliverable performance available from a distributed resource set. Based on data collected through sensors, it forecasts future performance levels using statistical forecasting models. It is adaptive: when queried about a resource set repeatedly, it selects the forecasting model that provides the best prediction.
4.3.1. Architecture of a NWS Resource-set
For NWS, the resource set consists of computer machines and the network links connecting them. Sensors are mounted on each resource and their data is collected. The forecasting subsystem creates forecasts using each of 17 statistical models, and additional models can be added to the system through an API.
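The adaptive model selection can be sketched as follows (two trivial predictors stand in for NWS's 17 statistical models; the load series is invented):

```python
# NWS-style adaptive forecasting: several predictors run over the
# measurement series, and the one with the lowest past error supplies
# the next forecast.

def last_value(history):
    return history[-1]

def running_mean(history):
    return sum(history) / len(history)

MODELS = [last_value, running_mean]

def forecast(series):
    """Pick the model with the smallest mean absolute one-step error."""
    def error(model):
        errs = [abs(model(series[:i]) - series[i])
                for i in range(1, len(series))]
        return sum(errs) / len(errs)
    best = min(MODELS, key=error)
    return best(series), best.__name__

load = [0.2, 0.25, 0.9, 0.3, 0.28]
print(forecast(load))
```

On this noisy series the running mean is the less error-prone predictor; on a smoothly trending series the last-value model would win instead, which is the point of the adaptive scheme.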
Figure 4-2: Network Weather Services. (Wolski b, 1999, page: 4)
The reporting interface responds to command-line commands generated by an NWS index node.
The NWS index node provides an interface between the NWS system as shown in fig 4-2 and an
MDS LDAP request.
4.3.1.1. NWS Performance Sensors
NWS supports a CPU sensor, a real-memory sensor, and a TCP/IP bandwidth-and-latency sensor (Wolski b, 1999). The CPU and real-memory sensors monitor all active processes to determine the available CPU resources and memory. Bandwidth and latency are measured using receive-side end-to-end measurement algorithms. NWS can be extended by adding new sensors, and each sensor produces a time series of measurements.
Performance Comparison:
GRIS provides static and dynamic information about the current status of the computing machines only. NWS can predict the performance at the time when a resource is expected to be used, and it provides information about both the computing machines and the network.
However, a grid scheduler requires a prediction of resource status not only at a specific point in time, but over the duration for which the resource is to be used by the application. Such systems have not yet become available for general use on the grid.
Both NWS-generated information and information from GRIS can be aggregated by an MDS using LDAP.
Resource information retrieval systems can be compared on the following basis:
Does the system provide only the current status, or can it also predict the future state of the resource? If it predicts future status, does it do so for a fixed point in time or for a duration of time?
Does it provide information about both the computer machines and the network?
Ease of use
Scalability
Speed of providing information
5. Existing Grid Schedulers
Many of the available grid schedulers are centralized: they assume that jobs are received at a central point and that the scheduling process is controlled by a scheduler with complete information about all the machines. Condor, Nimrod and GrADS are examples of centralized schedulers.
5.1. Condor
The Condor High Throughput Computing System is a combination of dedicated and opportunistic
scheduling. “Opportunistic scheduling involves placing jobs on non-dedicated resources under
the assumption that the resources might not be available for the entire duration of the jobs”
(Wright, 2001).
Figure 5-1: Architecture of a Condor pool with no jobs running. (Wright, 2001, page: 4)
As shown in Figure 5.1, the Condor architecture uses a daemon called the collector to centrally maintain the list of Condor resources. Updates are sent periodically to the collector by Condor daemons on the computing machines. The collector runs on a machine called the Central Manager, which hosts the negotiator, a daemon that tries to find matches between resource requests and resource offers (Wright, 2001). Each computational resource within the pool is represented by a daemon called startd. Another daemon, called schedd, handles user jobs: all jobs are submitted to schedd, whose main functions are to maintain a job queue, publish resource requests, and negotiate for available resources.
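The negotiator's matchmaking can be illustrated with a toy sketch (the attributes and values are invented; real Condor expresses offers and requests as ClassAds, which are far more expressive than this):

```python
# A toy matchmaker in the spirit of Condor's negotiator: resource offers
# and job requests each carry constraints, and a match requires both
# sides to be satisfied.

offers = [  # startd-style resource offers (hypothetical attributes)
    {"name": "ws1", "memory_mb": 2048, "arch": "x86_64"},
    {"name": "ws2", "memory_mb": 512,  "arch": "x86_64"},
]
requests = [  # schedd-style job requests
    {"job": "j1", "min_memory_mb": 1024, "arch": "x86_64"},
    {"job": "j2", "min_memory_mb": 256,  "arch": "x86_64"},
]

def negotiate(requests, offers):
    matches, free = [], list(offers)
    for req in requests:
        for off in free:
            if (off["memory_mb"] >= req["min_memory_mb"]
                    and off["arch"] == req["arch"]):
                matches.append((req["job"], off["name"]))
                free.remove(off)     # each offer satisfies one request
                break
    return matches

print(negotiate(requests, offers))
```

Each matched pair would then be handed to the corresponding schedd and startd to start the job.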
Condor assumed that jobs to be processed by a grid would be independent, non-real-time jobs, and that resources would be independent workstations. The information about a computing resource only specifies its availability. While a resource is used for grid processing, it is not used for processing local jobs; it is not time-shared. As soon as a local job is loaded on a resource, the job brought in through the grid is pre-empted, that is, moved to another resource available in the pool.
Checkpointing is used to store the state of the processing so that, if required, the job may be moved. Condor does not take into account the overhead of transferring a job.
Condor scheduling is designed to increase the utilization of workstations within a single domain
(Wright, 2001).
5.2. Condor-G
Condor-G is a combination of Condor and the Globus toolkit (Frey, 2002). The Globus toolkit supports resource discovery and resource access in multi-domain systems. Condor-G allows a user to harness multi-domain resources as if they all belonged to a single domain. It also uses the authentication, authorization and secure file transfer facilities provided by the Globus toolkit.
Figure 5-2: Remote execution by Condor-G on Globus-managed resources (Frey, 2002, page: 4)
As shown in Figure 5.2, Condor-G uses the Globus GASS (Global Access to Secondary Storage) file server, GSI (Grid Security Infrastructure) and the GRAM (Grid Resource Allocation and Management) protocol. The user accesses the Condor-G scheduler from the desktop. A local Grid Manager daemon is created along with a GASS server. The Grid Manager authenticates with the Globus toolkit at the job execution site, where a Globus Job Manager daemon is created. The Grid Manager and the Job Manager cooperatively get the job processed using the available grid resources.
The Condor-G scheduler on the user's desktop maintains a persistent job queue, so that after a failure of any part of the grid, the status of the job and the state of the processed part up to the last checkpoint remain available to the user.
5.3. AppLeS Scheduler
AppLeS stands for Application Level Scheduler (Berman b, 2003). It has been designed to meet performance goals that may be specified by the application. Each application has its own scheduling agent, which monitors available resources and generates a schedule for the application.
AppLeS is based on a client/server model. It is also designed for a single domain where
resources are time-shared.
Resource discovery and prediction are implemented through NWS. The user has to specify the necessary information about the application as well as the performance goal that the scheduler should attempt to meet.
A number of possible schedules are developed, along with the expected performance index for each. The schedule that maximizes the performance goal is selected and implemented. The scheduler adaptively learns from every cycle of implementation to refine its operation.
Figure 5-3: Steps in the AppLeS methodology (Berman b, 2003, page: 2)
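The candidate-schedule selection step can be sketched as follows (an illustration with invented machine names, forecast speeds and work size, not the actual AppLeS code):

```python
# AppLeS-style selection: enumerate candidate schedules, predict a
# performance index for each (here, completion time from hypothetical
# NWS-style load forecasts), and pick the best.

WORK = 100.0                                         # total work units
forecast_speed = {"m1": 10.0, "m2": 6.0, "m3": 5.0}  # units/sec per machine

def predicted_time(machines):
    # equal work split across chosen machines; the slowest share dominates
    share = WORK / len(machines)
    return max(share / forecast_speed[m] for m in machines)

candidates = [["m1"], ["m1", "m2"], ["m1", "m2", "m3"]]
best = min(candidates, key=predicted_time)
print(best, predicted_time(best))
```

Even though m3 is slow, adding it shrinks each machine's share enough to reduce the predicted completion time, so the three-machine schedule wins here; with more uneven forecasts the agent would settle on fewer machines.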
28
5.4. Nimrod-G Resource Broker
Nimrod-G was developed at Monash University, Australia. Nimrod mimics the model of a market for a single domain (Buyya g, 2000). Nimrod-G was obtained by redesigning Nimrod to operate with the Globus toolkit, as shown in Figure 5-4.
Figure 5-4: Nimrod/G and Globus Components Interactions (Buyya g, 2000, page: 4)
Resource discovery and job allocation over the grid are implemented through the Globus toolkit, whereas Nimrod handles the market mechanism.
Each Nimrod application has a specified budget and a deadline before which it should be completed. Each resource has a price, which must be paid if the resource is to be used. The scheduler is designed to complete the application within the deadline at minimum cost; similarly, each resource tries to maximize its gain by providing its services. If the budget or the deadline would be exceeded, the user is informed. Every application in Nimrod-G has an associated Job Wrapper, which acts as a mediator between the resource and the application and also sends the required information to the Resource Accounting system.
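The deadline-and-budget constrained selection can be sketched as follows (resource names, prices and speeds are invented; this illustrates the idea, not Nimrod-G's actual algorithm):

```python
# Deadline/budget-constrained resource selection: among resources that
# can finish the work before the deadline, choose the cheapest; if none
# qualifies, report that the constraints cannot be met.

resources = [  # (name, cost per work unit, work units per hour)
    ("cheap_slow", 1.0, 10.0),
    ("mid",        2.0, 25.0),
    ("fast_dear",  5.0, 50.0),
]

def select(work, deadline_h, budget):
    feasible = []
    for name, unit_cost, speed in resources:
        time_h, cost = work / speed, work * unit_cost
        if time_h <= deadline_h and cost <= budget:
            feasible.append((cost, time_h, name))
    if not feasible:
        return None            # the user is informed the constraints fail
    return min(feasible)[2]    # cheapest feasible resource

print(select(work=100, deadline_h=5, budget=300))
```

Tightening the deadline pushes the choice toward faster, dearer resources until the budget cuts it off, which is exactly the cost/time trade-off the market model expresses.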
Nimrod-G is part of a framework called Grid Architecture for Computational Economy (GRACE).
GRACE includes a global scheduler called a broker. The broker works with other components like
bid-manager, directory-server, and Globus tool-kit to maximize the performance goals.
5.5. GrADS Scheduler
The Grid Application Development Software (GrADS) project aims at making it simple for users to
use grid resources (Berman c, 2001). After the application has been prepared in the GrADS
program preparation system, it is delivered to the Scheduler/Service Negotiator system. The
scheduler is a part of the GrADS execution environment.
Figure 5-5: GrADS Program Preparation and Execution Architecture (Berman c, 2001, page: 8)
When the object containing the job and its performance contract is delivered to the Scheduler/Service Negotiator, the negotiator brokers the allocation and scheduling of grid resources for the job. Thereafter the dynamic optimizer is activated to adapt the program to the available resources and to insert sensors for monitoring the status of the job.
The real-time monitor verifies that the requirements of the performance contract are satisfied. If there is a violation, the execution may be interrupted; the optimizer may adapt the program, new resources may be negotiated, or both steps may be taken to ensure compliance with the performance contract. Thus the closed-loop GrADS scheduler ensures Quality of Service (QoS).
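The closed-loop contract check can be sketched as follows (the contracted rate, sensor readings and action label are invented for illustration):

```python
# GrADS-style closed-loop monitoring: sensor readings are checked against
# a performance contract; a violation triggers corrective action
# (re-optimize the program and/or renegotiate resources).

def monitor(readings, contracted_rate):
    actions = []
    for step, rate in enumerate(readings):
        if rate < contracted_rate:         # contract violation at this step
            actions.append((step, "reoptimize_or_renegotiate"))
    return actions

progress_rate = [1.2, 1.1, 0.6, 1.3]       # work units/sec from sensors
print(monitor(progress_rate, contracted_rate=1.0))
```

In the real system the triggered action feeds back into execution, closing the loop; here the sketch only flags the violating step.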
5.6 The Legion Scheduler
The Legion scheduler is part of the object-based grid middleware being developed at the University of Virginia. “The Legion design encompasses ten basic objectives: site autonomy, support for heterogeneity, extensibility, ease-of-use, parallel processing to achieve performance, fault tolerance, scalability, security, multi-language support, and global naming” (Chapin, 1999).
The Legion scheduler is customized for each application and aims at minimizing the turn-around time. Since a customized scheduler is used, the Legion system can cater to a diverse set of jobs on heterogeneous resources.
The Legion system consists of objects, different instances of which interact with one another. Thus an application is an object, and the scheduler uses instances of this object on different resources, as shown in Figure 5-6-1.
Figure 5-6-1: Choices in Resource Management Layering (Chapin, 1999, page: 5)
Legion has a centralized Resource State Information Base containing computational resources and storage resources; network resources are not yet part of the database.
Figure 5-6-2: Use of the Resource Management Infrastructure (Chapin, 1999, page: 7)
The customized scheduler, as shown in Figure 5-6-2, is part of the application object. It interacts with the resource database to work out a schedule that meets the performance goals of the application. The scheduler passes the schedule to the Legion Enactor, which makes the reservations and interacts with the scheduler if any changes are required. The Enactor then places the objects on the selected resources and monitors the status of the application. The Legion system is designed to be secure and scalable.
5.7 TITAN
The TITAN architecture “employs a performance prediction system (PACE) and task distribution brokers to meet user-defined deadlines and improve resource usage efficiency” (Spooner, 2003). Iterative heuristic algorithms are applied for performance prediction in grid job scheduling. Workload managers, distribution brokers and Globus interoperability providers comprise TITAN's hierarchy, with Globus forming the highest level; the lowest level consists of schedulers managing physical resources, as shown in Figure 5-7-1.
Figure 5-7-1: The TITAN architecture (Spooner, 2003, page: 3)
The algorithms Deadline Sort, Deadline Sort with Node Limitation, and a Genetic Algorithm have been tested. The genetic-algorithm approach is found to be less sensitive to the number of processing nodes and to minor changes in resource characteristics; moreover, it can take into account a large number of parameters of the resource system. The genetic algorithm converges quickly while balancing the three objectives of makespan, idle time and the QoS metric of deadline time. TITAN uses the Performance Analysis and Characterization Environment (PACE) as its predictor, together with brokers, to meet the performance objectives of a scheduler.
6. Concluding Remarks
A global grid will consist of heterogeneous computational resources connected through high-
speed network. It may also have many data repositories and code repositories. A scheduler
system will be the interface between a user and the grid resources.
6.1. Performance Goals for a Scheduler for the Global Grid
For a global grid, a scheduler will be hierarchical, batch-type and non-preemptive.
Schedulers have been developed from the perspective of efficient resource usage (Condor and Condor-G), from the perspective of applications (AppLeS, the Legion scheduler, GrADS), or from the perspective of the grid as a market (Nimrod). Since the Globus toolkit is becoming pervasive, schedulers like Condor-G or Nimrod-G may find wider acceptance. On a worldwide grid, an economy-based scheduler will most probably turn out to be the final choice.
An open worldwide grid market may give a user the choice of using resources from a number of resource providers. Hence the scheduling process is likely to be governed by the perspective of the user. In general, a user may be interested in a short turn-around time, or in having the processing completed within a specified deadline and cost. Such schedulers have not yet been built.
6.2. Architecture of a Scheduler for the Global Grid
This survey has described a number of schedulers and the architecture of each. The architecture of a global grid scheduling system is not yet available in the published literature, but ideas for such a system can be garnered in conceptual form from this survey. A user will be able to approach a number of resource providers for processing an application. The scheduling agent of the user application will negotiate with a service provider on the basis of the deadline and the cost. Once the user's agent and the resource provider's scheduling agent agree, processing of the application will begin. The user's agent may also transact through a broker rather than dealing with the resource provider directly. An effective accounting system and many other components of such a scheduler remain subjects of continuing research.
Appendix – I
Bibliography
[1]. (Abawajy a, 2003) J. H. Abawajy, S.P. Dandamudi. Parallel Job Scheduling on Multi-Cluster Computing Systems. Proceedings of IEEE International Conference on Cluster Computing (Cluster 2003), pp.11-18, 2003. Keywords: Cluster computing, Grid computing, dynamic-scheduling policy, space-sharing policy, time-sharing policy, load-redistribution, hybrid-policies, adaptive hierarchical Scheduling, job queue.
[2]. (Abawajy b, 2003) J. H. Abawajy, S.P. Dandamudi. Scheduling Parallel Jobs with CPU and I/O Resource Requirements in Cluster Computing Systems. In Proceedings of the 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunications Systems (MASCOTS'03), IEEE Computer Society Press, pp.336-343, 2003.Keywords: Cluster computing, Parallel Job Scheduling, Parallel I/O, Coordinated CPU-I/O Resources Scheduling, High-Performance Computing.
[3]. (Abawajy c, 2002) J. H. Abawajy, "Job Scheduling Policy for High Throughput Computing Environments," In Proceedings of the 9th IEEE International Conference on Parallel and Distributed Systems (ICPADS2002), pp.605-610, 2002. Keywords: Job Scheduling, Cluster Computing, Opportunistic Job Scheduling, High Throughput Computing, Performance Analysis, Condor.
[4]. (Abawajy d, 2003) J. H. Abawajy. An integrated resource scheduling approach on cluster computing systems. In the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2003), pp. 251- 256, 2003. Keywords: Cluster computing, Job Scheduling, Parallel I/O, Resource Scheduling, High-Throughput Computing, High-Performance Computing.
[5]. (Abawajy e, 2003) J. H. Abawajy. Performance analysis of parallel I/O scheduling approaches on cluster computing systems. In the Proceedings of 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), pp.724- 729, 2003. Keywords: I/O Scheduling, Equi-Partition Policy, Performance Analysis, Lowest Destination Degree First, Highest Destination Degree First, Random policies.
[6]. (Abramson, 2000) D. Abramson, J. Giddy, et al. High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid? In Proceedings of the 14th International Conference on Parallel and Distributed Processing Symposium (IPDPS-00), pp.520–528, Los Alamitos, 2000. IEEE.
Keywords: Parametric Modeling, Computational Grids, Global Grids, Computational Grid, heterogeneous computing, scheduling.
[7]. (Agarwal, 2004) P. Agarwal, Rashmi Bajaj. Improving Scheduling of Tasks in a Heterogeneous Environment. IEEE Transactions on parallel and Distributed systems, v.15 n.2, pp.107-118, 2004.
Keywords: Communication cost, computational cost, directed acyclic graph, heterogeneous environment, network of processors, optimal scheduling, task duplication.
[8]. (Allen, 2001) G. Allen, T. Dramlitsch, et al. Supporting efficient execution in heterogeneous distributed computing environments with Cactus and Globus. In Proceedings of the 2001 ACM/IEEE Conference on Supercomputing, pp. 52–76, Colorado, 2001.
Keywords: Grid resources, Computational Grids, Communication Schedulers, Cactus-G, Globus.
[9]. (Alhusaini, 1999) A.H. Alhusaini, V.K.Prasanna, et al. A unified resource scheduling framework for heterogeneous computing environments. In 8th Heterogeneous Computing Workshop (HCW’99), pp.156-165, 1999.
Keywords: Metacomputing Systems, Quality of service, Resource Scheduling algorithms, AppLes, Unified scheduling.
[10]. (Berman a, 2003) Francine Berman, Walfredo Cirne. When the Herd Is Smart: Aggregate Behavior in the Selection of Job Request. In IEEE Transactions On Parallel and Distributed Systems, v.14, pp.181-192, 2003.
Keywords: Parallel supercomputers, space-shared supercomputers, job scheduling, application scheduling, aggregate behavior.
[11]. (Berman b, 2003) Francine Berman, Richard Wolski, Henri Casanova, et al. Adaptive Computing on the Grid Using AppLes. In IEEE Transactions On Parallel and Distributed Systems, v.14, pp.369-382, 2003.
Keywords: Scheduling, parallel and distributed computing, heterogeneous computing, grid computing.
[12]. (Berman c, 2001) Francine Berman, Andrew Chien, Keith Cooper, et al. The GrADS Project: Software support for high-level Grid application development. The International Journal of High Performance Computing Applications, v.15 n.4, pp.327–344, 2001.
Keywords: Distributed, dynamic, heterogeneous computing, performance monitoring.
[13]. (Berman d, 1999) Francine Berman. High-performance schedulers. In Ian Foster and Carl Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, pp.279–309, Morgan Kaufmann, San Francisco, CA, 1999.
Keywords: Scheduling, high-performance application schedulers, computational grids, performance measure, scheduling policy, performance model.
[14]. (Berman e, 1996) Francine Berman, Richard Wolski, et al. Application Level Scheduling on Distributed Heterogeneous Networks. In Proceedings of Supercomputing 1996, pp.1-28, ACM press, 1996.
Keywords: Heterogeneous systems, application-management system, Prediction, Resource Selection, Scheduling, Memory Availability.
[15]. (Berman f, 1996) F. Berman, R. Wolski. Scheduling From the Perspective of the Application. In Proceedings of the Fifth IEEE Symposium on High Performance Distributed Computing (HPDC '96), pp.100-111, 1996.
Keywords: Application Centric Scheduling, Metacomputing, Resource Selector, AppLeS.
[16]. (Blumofe, 1994) R. D. Blumofe, D. S. Park. Scheduling large-scale parallel computations on networks of workstations. 3rd IEEE International Symposium on High-Performance Distributed Computing, pp.96-105, 1994.
Keywords: Parallel supercomputers, idle-initiated scheduler, gang-scheduler, Macro-level scheduling, Micro-level scheduling, heterogeneous systems.
[17]. (Braun, 1998) T.D. Braun, H.J. Siegel, et al. A Taxonomy for Describing Matching and Scheduling Heuristics for Mixed-machine Heterogeneous Computing Systems. IEEE symposium on Reliable Distributed Systems, pp.330-335, 1998.
Keywords: Taxonomy, Matching, Scheduling, subtask mapping, meta-task mapping, Heterogeneous Computing.
[18]. (Bringmann, 1997) Oliver Bringmann, Wolfgang Rosenstiel. Resource sharing in hierarchical synthesis. Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design, pp.318-325, 1997.
Keywords: Component Matching, Resource sharing, hierarchical schedule, sharing interval, Collision Set.
[19]. (Burke a, 1999) E. K. Burke, A. J. Smith. A memetic algorithm to schedule planned maintenance for the national grid. Journal of Experimental Algorithmics (JEA), v.4, pp.1-13, 1999.
Keywords: Heuristics, memetic algorithms, tabu search, simulated annealing, hill climbing, Maintenance Scheduling.
[20]. (Burke b, 1997) E. K. Burke, A. J. Smith. A memetic algorithm for the maintenance-scheduling problem. In Proceedings of the International Conference on Neural Information Processing and Intelligent Information Systems, v.1, pp.469–472. Springer, 1997.
Keywords: Memetic algorithm, Genetic Representation, Selection, Mutation, Neighbourhood Search, Local Search.
[21]. (Buyya a, 2004) Rajkumar Buyya, Anthony Sulistio, et al. A Taxonomy of Computer-based Simulations and its Mapping to Parallel and Distributed Systems Simulation Tools. International Journal of Software: Practice and Experience, v.34 n.7, pp.653-673, Wiley Press, 2004.
Keywords: taxonomy, simulation tools, parallel system, distributed system, scalability, Symmetric Multiprocessing (SMP) system, Massively Parallel Processing (MPP) system.
[22]. (Buyya b, 2003) Rajkumar Buyya, Alexander Barmouta. GridBank: A Grid Accounting Services Architecture (GASA) for Distributed System Sharing and Integration. In International Parallel and Distributed Processing Symposium. IPDPS2003: IEEE Computer Society Press, pp.1-8, 2003.
Keywords: Computational economy, Grid accounting, Grid Bank, Payment schemes, Grid Scheduling.
[23]. (Buyya c, 2002) Rajkumar Buyya, Muthucumaru Maheswaran, et al. A taxonomy and survey of grid resource management systems for distributed computing. The Journal of Software Practice and Experience, v.32 n.2, pp.135–164, 2002.
Keywords: Metacomputing, Grids, Resource Management, Scheduling, and Internet Computing.
[24]. (Buyya d, 2002) Rajkumar Buyya. Economic-based Distributed Resource Management and Scheduling for Grid Computing. Ph.D. Thesis, Monash University Australia, April 2002.
Keywords: Grid computing, heterogeneous resources, policies, application scheduling, quality-of-service, GRACE, GridSim, optimization strategies.
[25]. (Buyya e, 2002) R Buyya, D Laforenza, et al. Grids and Grid Technologies for Wide-area Distributed Computing. The Journal of Concurrency and Computation: Practice and Experience, v.14, Issue 13-42, 2002.
Keywords: Grid Computing, middleware, resource management, scheduling, distributed applications.
[26]. (Buyya f, 2000) R. Buyya, S. Chapin, D. DiNucci. Architectural Models for Resource Management in the Grid. The First IEEE/ACM International Workshop on Grid Computing (GRID 2000), Springer Verlag LNCS Series, 2000.
Keywords: Grid Systems, resource scheduling, Hierarchical Model, Abstract Owner Model, Computational Market/Economy Model.
[27]. (Buyya g, 2000) R. Buyya, D. Abramson, J. Giddy. Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid. Proceedings of HPC ASIA’2000, the 4th International Conference on High Performance Computing in Asia-Pacific Region, Beijing, China, IEEE Computer Society Press, USA, 2000.
Keywords: Computational Grids, System Architecture, Parametric Engine, Scheduler, Dispatcher, Job-Wrapper, Nimrod/G, Globus Toolkit, Resource allocation, Queuing System.
[28]. (Buyya h, 2000) R. Buyya, D. Abramson, et al. An Evaluation of Economy-based Resource Trading and Scheduling on Computational Power Grids for Parameter Sweep Applications. Workshop on Active Middleware Services (AMS 2000), (in conjunction with Ninth IEEE International Symposium on High Performance Distributed Computing), Kluwer Academic Press, 2000, Pittsburgh, USA.
Keywords: Grid Computing, Computational middleware, Resource Trading, Nimrod/G Resource Broker, scheduling, Parameter Sweep Applications.
[29]. (Buyya i, 2000) Rajkumar Buyya, Baikunth Nath, et al. Nature's Heuristics for Scheduling Jobs on Computational Grids. The 8th IEEE International Conference on Advanced Computing and Communications (ADCOM 2000), 2000, India.
Keywords: Grid Computing, Computational grid, scheduling, resource management, global optimization algorithms, genetic algorithm, simulated annealing, tabu search, nature’s heuristics and hybrid algorithm.
[30]. (Cao, 2002) Junwei Cao, Stephen A. Jarvis, et al. ARMS: An agent-based resource management system for grid computing. Scientific Programming, v.10 n.2, pp.135–148, 2002.
Keywords: Grid Computing, Performance Analysis, homogeneous agents, scheduling systems, scalability, adaptability, scheduling algorithms, PACE toolkit, resource management.
[31]. (Casanova a, 2003) H. Casanova, H. Dail, F. Berman. A Decoupled Scheduling Approach for Grid Application Development Environments. Journal of Parallel and Distributed Computing (JPDC), Special issue on Grid Computing, v.63 n.5, pp.505-524, 2003.
[32]. (Casanova b, 2002) H. Casanova. Distributed computing research issues in grid computing. SIGACT News, v.33 n.3, pp.50-70, 2002. ACM Press.
Keywords: Distributed Computing, Grid Computing, heterogeneous resources, High Performance Computing, Data Grid, Grid Information Service, application-level scheduler.
[33]. (Casanova c, 2001) Henri Casanova. Simgrid: A Toolkit for the Simulation of Application Scheduling. In Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2001), pp. 430-437, 2001.
Keywords: Heterogeneous environment, Application scheduling, heuristics, random searches, scheduling algorithms, time-shared, topology, prediction, microgrid, simgrid.
[34]. (Casanova d, 2001) Henri Casanova, Shava Smallen, Francine Berman. Applying scheduling and tuning to on-line parallel tomography. Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), pp.1-19, 2001.
Keywords: Application Level Scheduling, Application Tunability, Heterogeneous Computing, Computational Grid, Parallel Tomography, On-line Instruments.
[35]. (Casanova e, 2000) H. Casanova, A. Legrand, D. Zagorodnov, et al. Heuristics for Scheduling Parameter Sweep Applications in Grid Environments. In Proceedings of the 9th Heterogeneous Computing Workshop (HCW'00), pp.349-363, 2000.
Keywords: Computational Grid, scheduling, heuristics, parameter sweep, Grid Model, scheduling algorithm, CPU load.
[36]. (Casanova f, 1999) H. Casanova, M. Kim, J. Plank, J. Dongarra. Adaptive Scheduling for Task Farming with Grid Middleware. The International Journal of Supercomputer Applications and High Performance Computing (IJHPCA), v.13, pp.231-240, 1999.
Keywords: Farming, Master-Slave Parallelism, Scheduling, Metacomputing, Grid Computing.
[37]. (Casavant, 1988) T.L. Casavant, J.G. Kuhl. A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems. In IEEE Transactions on Software Engineering, v.14 n.2, pp.141-154, 1988.
Keywords: operating systems, scheduling, distributed computing, resource management, distributed scheduling, distributed processing.
[38]. (Chapin, 1999) S. J. Chapin, D. Katramatos, et al. The Legion Resource Management System. Proceedings of the Job Scheduling Strategies for Parallel Processing, pp.162-178, Springer-Verlag, 1999.
Keywords: parallel and distributed systems, task scheduling, resource management, autonomy.
[39]. (Chen, 2002) Hongtu Chen, Muthucumaru Maheswaran. Distributed Dynamic Scheduling of Composite Tasks on Grid Computing Systems. International Parallel and Distributed Processing Symposium, pp.1-10, 2002.
Keywords: Wide-area network, dynamic scheduling, scheduling algorithms, local queuing policies, task allocation, scalability, flexibility, adaptability, resource management systems.
[40]. (Coddington, 2003) Paul D. Coddington, Lici Lu, et al. Extensible job managers for grid computing. Proceedings of the twenty-sixth Australasian computer science conference on Conference in research and practice in information technology. v.16, pp.151-159, 2003. Australian Computer Society, Inc.
Keywords: Grid computing, cluster management systems, GLOBUS toolkit, PAGIS, job-manager, Java Applications.
[41]. (Compasano, 1990) Raul Camposano, Reinaldo A. Bergamaschi. Synthesis using path-based scheduling: algorithms and exercises. Conference proceedings on 27th ACM/IEEE Design Automation Conference, pp.450-455, 1990.
Keywords: Scheduling algorithms, high-level synthesis, pipelined architecture, control-flow graph, Heuristics.
[42]. (Czajkowski a, 2001) K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. Grid Information Services for Distributed Resource Sharing. Proc. of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, pp.1-4, 2001.
Keywords: Grid systems, architecture, virtual organization, superscheduler, grid information protocol, distributed systems.
[43]. (Czajkowski b, 1997) K. Czajkowski, I. Foster, et al. A resource management architecture for metacomputing systems. The 4th Workshop on Job Scheduling Strategies for Parallel Processing, pp.62–82, Springer-Verlag LNCS, 1997.
Keywords: Metacomputing systems, resource management, site autonomy, co-allocation, heterogeneous substrate, policy extensibility, online control.
[44]. (Dail, 2000) Holly Dail, Graziano Obertelli, et al. Application-Aware Scheduling of a Magnetohydrodynamics Application in the Legion Metasystem. Proceedings of the 9th Heterogeneous Computing Workshop, IEEE Computer Society, pp.216-228, 2000.
Keywords: application scheduling, computational grid, dynamic scheduling, Legion, high-performance computing, distributed computing.
[45]. (Dogan, 2002) A. Dogan, F. Özgüner. Scheduling Independent Tasks with QoS requirements in Grid Computing with Time-Varying Resource Prices. Proc. of Third International Workshop on Grid Computing - GRID 2002, Baltimore, MD, USA, LNCS 2536, pp.58-69, 2002, Springer-Verlag.
Keywords: Grid Computing, Computational Grids, resource management, scheduling, QoS-based scheduling, GLOBUS, heuristics.
[46]. (Dongarra a, 2002) J. Dongarra, A. Yarkhan. Experiments with Scheduling Using Simulated Annealing in a Grid Environment. In M. Parashar, editor, Lecture notes in computer science 2536 Grid Computing – GRID 2002, volume Third International Workshop, pp.232-242, Baltimore, MD, USA, November 2002. Springer Verlag.
Keywords: Grid infrastructure, multiprocessor scheduling, performance model, Greedy methods, Annealing Scheduler.
[47]. (Dongarra b, 2002) J. Dongarra, Sathish S. Vadhiyar. A Metascheduler For The Grid. Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC'02), pp.343-351, 2002.
Keywords: heterogeneous, network of workstations, grid environment, load sharing facility, metascheduling.
[48]. (Dongarra c, 2001) J. Dongarra, M. Shimasaki, B. Tourancheau. Clusters and computational grids for scientific computing. Parallel Computing, v.27 n.11, pp.1401-1402, 2001.
[49]. (Ekmečić, 1996) I. Ekmečić, I. Tartalja, et al. A survey of heterogeneous computing: Concepts and systems. Proceedings of the IEEE, v.84 n.8, pp.1127-1144, 1996.
Keywords: Heterogeneous computing, taxonomy, topologies, graphs, resource allocation, parallelism, algorithms, probabilistic heuristic allocation, homogeneous systems, Queuing Theory, Load Balancing.
[50]. (Feitelson a, 1998) D.G. Feitelson, L. Rudolph. Metrics and Benchmarking for Parallel Job Scheduling. In D.G. Feitelson and L. Rudolph, editors, Proc. of 4th Workshop on Job Scheduling Strategies for Parallel Processing, v.1459, pp.1–24. Springer Verlag, 1998.
Keywords: Metrics, Benchmarking, Job Scheduling, queuing, dynamic partitioning.
[51]. (Feitelson b, 1997) D.G. Feitelson, L. Rudolph. Theory and Practice in Parallel Job Scheduling. In D.G. Feitelson and L. Rudolph, editors, Proc. of 3rd Workshop on Job Scheduling Strategies for Parallel Processing, volume 1291 of Lecture Notes in Computer Science, pp.1–34, Springer Verlag, 1997.
Keywords: distributed computing, algorithmic methods, scheduling, workload distribution, distributed memory, gang scheduling, resource manager.
[52]. (Foster a, 2003) I. Foster, L. Yang, et al. Conservative Scheduling: Using Predicted Variance to Improve Scheduling Decisions in Dynamic Environments. Proceedings of the ACM/IEEE Supercomputing 2003, pp.31-46, 2003.
Keywords: Heterogeneous environment, conservative scheduling, CPU load, load prediction, stochastic scheduling policy, scheduling experiments.
[53]. (Foster b, 2002) I. Foster, C. Kesselman, J. Nick, S. Tuecke. Grid Services for Distributed System Integration. Computer, v.35 n.6, pp.35-46, IEEE Computer Society Press, 2002.
Keywords: Open Grid Service Architecture, distributed, heterogeneous, dynamic, resource sharing, service-provider.
[54]. (Foster c, 2002) I. Foster, C. Kesselman, J. Nick, S. Tuecke. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Open Grid Service Infrastructure WG, Global Grid Forum, pp.1-31, 2002.
Keywords: Grid computing, Grid services, homogeneous, resource management, GLOBUS toolkit, SOAP, WSDL, Web Services.
[55]. (Foster d, 2002) I. Foster. The Grid: A New Infrastructure for 21st Century Science. Physics Today, v.55 n.2, pp.42-48, 2002.
Keywords: Grid architecture, Cluster computing, Data Grids, Globus toolkit, authorization, authentication, monitoring, policies.
[56]. (Foster e, 2002) I. Foster, A. Iamnitchi. Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications. In Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), pp.1-7, 2002.
Keywords: Data-Grids, scheduling algorithms, data-intensive applications, Local scheduler, Dataset scheduler, Replication, resource allocation.
[57]. (Foster f, 2001) I. Foster, C. Kesselman, S. Tuecke. The Anatomy of the Grid - Enabling Scalable Virtual Organizations. Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid, pp.6-31, 2001.
Keywords: virtual organization, grid architecture, intergrid protocols, resource sharing, distributed computing, co- scheduling.
[58]. (Foster g, 2001) I. Foster. The globus toolkit for grid computing. Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid, 2001.
[59]. (Foster h, 2001) I. Foster, G. Lanfermann, et al. The Cactus Worm: Experiments with dynamic resource discovery and allocation in a Grid environment. International Journal of High Performance Computing Applications, v.15 n.4, 2001.
Keywords: Grid environment, Cactus, dynamic, heterogeneous, grid resource selection, scheduling, migration mechanisms, CONDOR, Globus toolkit.
[60]. (Foster i, 1999) I. Foster, C. Kesselman, et al. A Distributed Resource Management Architecture that Supports Advance Reservations and Co-Allocation. International Workshop on Quality of Service, 1999.
Keywords: Dynamic discovery, heterogeneous computing, resource allocation, co-allocation agents, Quality of Service, LDAP, schedulers, heuristics.
[61]. (Foster j, 1998) I. Foster, C. Kesselman, et al. Computational Grids. In The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, 1998.
[62]. (Foster k, 1997) I. Foster, C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, v.11 n.2, pp.115-128, 1997.
Keywords: metacomputing environment, communication, authentication, data access, resource services, heterogeneity, security, scheduling, GLOBUS.
[63]. (Foster l, 1997) S. Fitzgerald, I. Foster, et al. A directory service for configuring high-performance distributed computations. In Sixth IEEE Symposium On High Performance Distributed Computing, pp. 365-375, 1997.
Keywords: distributed systems, Scalability, Uniformity, Expressiveness, Extensibility, Security, Deployability, Dynamic data, metacomputing directory service.
[64]. (Frey, 2002) James Frey, Todd Tannenbaum, et al. Condor-G: A Computation Management Agent for Multi-Institutional Grids. Journal of Cluster Computing, v.5, pp.237-246, 2002.
Keywords: multi-domain resources, job management, resource selection, security, fault tolerance, Condor-G, Globus Toolkit, resource allocation, scheduling, resource discovery.
[65]. (Furmento, 2001) Nathalie Furmento, Anthony Mayer, et al. Optimization of component-based applications within a grid environment. Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), pp.30-30, 2001, Denver, Colorado.
Keywords: Computational grids, intelligent scheduling, application mapper, data objects, dynamic optimization, grid information services, grid deployment services.
[66]. (Goldenberg, 2003) Mark Goldenberg, Paul Lu, et al. TrellisDAG: A System for Structured DAG Scheduling. In Proc. of the 9th International Workshop on Job Scheduling Strategies for Parallel Processing, volume 2862 of Lecture Notes in Computer Science, pp.21–43. Springer, 2003.
Keywords: High-performance computing, pipelines, scheduling policy, queues, high-throughput, peer-to-peer systems, CONDOR, global scheduling, placeholder scheduling.
[67]. (Gonzalez, 1977) M. J. Gonzalez. Deterministic Processor Scheduling. ACM Computing Surveys, v.9 n.3, pp.173-204, 1977.
Keywords: Deterministic scheduling, optimal schedules, multiprocessors, job-shop, flow-shop, graph structures, deadlines, resources, preemption, periodic jobs.
[68]. (Hagerup, 1997) T. Hagerup. Allocating Independent Tasks to Parallel Processors: An Experimental Study. Journal of Parallel and Distributed Computing, v.47, pp.185–197, 1997.
Keywords: Parallel machines, Self-scheduling, system induced variance, coordination, Guided Self-Scheduling, Trapezoid Self-Scheduling, Complexity.
[69]. (Hamscher, 2002) Volker Hamscher, Uwe Schwiegelshohn, et al. Evaluation of Job-Scheduling Strategies for Grid Computing. GRID 2000, pp.191-202, 2002.
Keywords: Computational grids, scheduling algorithms, scheduling policies, job-queue, Single-site scheduling, Multi-site scheduling, Global job scheduling, backfilling algorithms, homogeneous resources, metacomputing.
[70]. (Hovestadt, 2003) Matthias Hovestadt, Odej Kao, et al. Scheduling in HPC Resource Management Systems: Queuing vs. Planning. In Proc. of the 9th International Workshop on Job Scheduling Strategies for Parallel Processing, volume 2862 of Lecture Notes in Computer Science, pp.1–20. Springer, 2003.
Keywords: resource management system, heterogeneous, co-allocation, Globus, queuing systems, planning systems, scheduling policies, requesting resources.
[71]. (Jackson, 2001) D. Jackson, Q. Snell, et al. Core Algorithms of the Maui Scheduler. In D.G. Feitelson and L. Rudolph, editors, Proceedings of 7th Workshop on Job Scheduling Strategies for Parallel Processing, volume 2221 of Lecture Notes in Computer Science, pp.87–103. Springer Verlag, 2001.
Keywords: Maui scheduling, job prioritization, backfill scheduling, quality of service (QoS), FIRSTFIT, BESTFIT, Job prioritization, priority algorithm, metascheduling.
[72]. (Krallmann, 1999) J. Krallmann, U. Schwiegelshohn, et al. On the Design and Evaluation of Job Scheduling Algorithms. In Fifth Annual Workshop on Job Scheduling Strategies for Parallel Processing, IPPS'99, pp.17–42, Springer-Verlag, Lecture Notes in Computer Science LNCS 1659, 1999.
Keywords: Scheduling algorithms, scheduling policy, resource requests, objective function, backfilling, SMART algorithm, list scheduling, randomized workload.
[73]. (Kruskal, 1984) C. Kruskal, A. Weiss. Allocating independent subtasks on parallel processors. IEEE Transactions on Software Engineering, v.11, pp.1001-1016, 1984.
[74]. (Kwok, 1999) Y. K. Kwok, I. Ahmad. Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors. ACM Computing Surveys, v.31 n. 4, pp. 406-471, 1999.
Keywords: Static Scheduling, Task Graphs, DAG, Multiprocessors, Parallel Processing, Software Tools, Automatic Parallelization.
[75]. (Laur, 1998) Rainer Laur, et al. Resource constrained modulo scheduling with global resource sharing. Proceedings of the 11th International Symposium on System Synthesis, IEEE Computer Society, pp.60-65, 1998.
Keywords: Global Resource Sharing, Scheduling algorithms, Modulo Scheduling, authorization functions, real time systems.
[76]. (Lee, 2000) Joseph C. Lee, Dian Rae Lopez, and William A. Royce. Task Allocation on a Network of Processors. In IEEE Transactions On Computers, v.49, pp.1339-1353, 2000.
Keywords: Scheduling Algorithms, parallel/distributed systems, approximation algorithms, optimization.
[77]. (Li, 2000) Keqin Li, Yi Pan. Probabilistic Analysis of Scheduling Precedence Constrained Parallel Tasks on Multicomputers with Contiguous Processor Allocation. In IEEE Transactions On Computers, v.49, pp.1021-1030, 2000.
Keywords: Average-case performance ratio, binary system partitioning, contiguous processor allocation, largest-task-first, parallel task, precedence constraint, probabilistic analysis, task scheduling.
[78]. (Litzkow, 1988) M. Litzkow, M. Livny. Condor - A Hunter of Idle Workstations. In Proc. of the 8th International Conference on Distributed Computing Systems, pp.104-111, University of Wisconsin, Madison, 1988.
Keywords: distributed systems, local workstations, scheduling systems, checkpointing, remote cycles, remote execution, CONDOR, CPU capacity.
[79]. (Liu, 2000) X. Liu, H. J. Song, et al. The MicroGrid: a Scientific Tool for Modeling Computational Grids. In Proceedings of SC’2000, Texas, pp.1-22, 2000.
Keywords: Computational Grids, Micro Grid, virtual Grid, Global Coordination, Information services, Virtualizing Time, Resource Simulation, Clusters, Benchmark, Globus, Legion.
[80]. (Maheswaran a, 1999) M. Maheswaran, S. Ali, et al. Dynamic Matching and Scheduling of a Class of Independent Tasks onto Heterogeneous Computing Systems. In 8th Heterogeneous Computing Workshop (HCW’99), pp.30-45, 1999.
Keywords: Heterogeneous systems, dynamic matching, heuristics, mapping, meta-tasks, batch mode, scheduling, performance metrics, load balancing, queue, machine heterogeneity.
[81]. (Maheswaran b, 1999) M. Maheswaran, Shoukat Ali, et al. Dynamic Mapping of a Class of Independent Tasks onto Heterogeneous Computing Systems. Journal of Parallel and Distributed Computing, v.59 n.2, pp.107-131, 1999.
Keywords: Heterogeneous systems, dynamic mapping, heuristics, scheduling, load balancing, batch mode mapping, independent tasks.
[82]. (Min, 2002) Rui Min, Muthucumaru Maheswaran. Scheduling Co-Reservations with Priorities in Grid Computing Systems. Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, pp.266-267, 2002.
Keywords: Grid Computing, Scheduling, Co-Reservations, Priorities, Quality of Service, Globus Architecture, scheduling algorithms, simulations.
[83]. (Neary, 2002) Michael O. Neary, Peter Cappello. Advanced eager scheduling for Java-based adaptively parallel computing. Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande, pp.56-65, 2002. ACM Press.
Keywords: Branch-and-bound, eager scheduling, fault tolerance, Grid computing, parallel computing, dynamic depth expansion, scalability.
[84]. (Ramamoorthi, 1997) R. Ramamoorthi, A. Rifkin, et al. A general resource reservation framework for scientific computing. In Scientific Computing in Object-Oriented Parallel Environments, pp. 283-290. Springer-Verlag, 1997.
Keywords: Metacomputer, Distributed resource allocation, distributed scheduling algorithms, resource managers, Central Scheduling, hierarchical infrastructure, reservation by attribute.
[85]. (Raman, 1998) R. Raman, M. Livny, et al. Matchmaking: Distributed resource management for high throughput computing. 7th IEEE International Symposium on High Performance Distributed Computing, pp.28-31, 1998.
Keywords: high throughput, distributed resource management, matchmaking algorithms, scheduling, heterogeneity, opportunistic scheduling, authentication, CONDOR.
[86]. (Rosenberg, 2002) Arnold L. Rosenberg. Optimal Schedules for Cycle-Stealing in a Network of Workstations with a Bag-of-Tasks Workload. In IEEE Transactions On Parallel and Distributed Systems, v.13, pp.179-191, 2002.
Keywords: Cycle stealing, bag-of-tasks workloads, heavy-tailed distributions, networks of workstations (NOWs), optimal scheduling, scheduling parallel computations.
[87]. (Schwiegelshohn a, 2002) U. Schwiegelshohn, R. Yahyapour, et al. On Advantages of Grid Computing for Parallel Job Scheduling. In Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CC-GRID 2002), pp.39-46, 2002.
Keywords: Grid computing, scheduling algorithms, throughput, load balancing, job sharing, machine selection, multi-site computing, WAN networks.
[88]. (Schwiegelshohn b, 2000) U. Schwiegelshohn, A. Streit, R. Yahyapour, et al. Evaluation of job-scheduling strategies for grid computing. In Proceedings of 1st IEEE/ACM International Workshop on Grid Computing (Grid 2000), v.1971, pp.191–202, 2000.
Keywords: Grid computing, Job-Scheduling, Machine models, scheduling algorithms, Single-site scheduling, Multi-site scheduling, decentralized scheduling, backfilling algorithms.
[89]. (Schwiegelshohn c, 1998) U. Schwiegelshohn, R. Yahyapour. Analysis of First-Come-First-Serve Parallel Job Scheduling. In Proceedings of the 9th SIAM Symposium on Discrete Algorithms, pp. 629–638, 1998.
Keywords: Parallel Job Scheduling, space sharing, FCFS algorithm, gang scheduling, SMP nodes, scheduling policy.
[90]. (Smith, 1999) W. Smith, I. Foster, et al. Using Run-Time Predictions to Estimate Queue Wait Times and Improve Scheduler Performance. In D.G. Feitelson and L. Rudolph, editors, Proc. of 5th Workshop on Job Scheduling Strategies for Parallel Processing, v.1659 of Lecture Notes in Computer Science, pp.202–219. Springer Verlag, 1999.
Keywords: High performance computing, backfill scheduling algorithms, scheduling algorithms, Run time prediction, predicting Queue, scheduler performance.
[91]. (Song, 2000) Ha Yoon Song, Richard A. Meyer, et al. An empirical study of conservative scheduling. Proceedings of the fourteenth workshop on Parallel and distributed simulation, pp.165-172, 2000, Italy.
Keywords: Conservative scheduling, scheduling algorithms, conservative protocols, conservative overheads, bipartition queue, granularity.
[92]. (Spencer, 2002) Matthew Spencer, Renato Ferreira, et al. Executing multiple pipelined data analysis operations in the grid. Proceedings of the 2002 ACM/IEEE conference on Supercomputing, IEEE Computer Society Press, pp.1-18, 2002.
Keywords: Grid computing, Data analysis, filters, pipelined algorithm, scheduling algorithms, dynamic environment.
[93]. (Spooner a, 2003) D. P. Spooner, S. A. Jarvis, et al. Local Grid Scheduling Techniques using Performance Prediction. IEE Proceedings - Computers and Digital Techniques, v.150 n.2, pp.87-96, 2003.
Keywords: Grid computing, scheduling architecture, dynamic, resource discovery, monitoring, performance prediction, task allocation, GA algorithms.
[94]. (Streit, 2002) A. Streit. A Self-Tuning Job Scheduler Family with Dynamic Policy Switching. In Proc. of the 8th Workshop on Job Scheduling Strategies for Parallel Processing, volume 2537 of Lecture Notes in Computer Science, pp.1–23. Springer, 2002.
Keywords: resource management, gang-scheduling, dynamic policy, scheduling algorithms, self-tuning, workloads, backfilling schedulers, metrics.
[95]. (Su a, 1999) Alan Su, Francine Berman, et al. Using AppLeS to Schedule Simple SARA on the Computational Grid. International Journal of High Performance Computing Applications, v.13 n.3, pp.253-262, 1999.
Keywords: Computational Grids, Application-Level Scheduler, resource selection, agents, dynamic, distributed data applications.
[96]. (Simirni, 2002) Evgenia Smirni, Barry G. Lawson. Multiple-queue backfilling scheduling with priorities and reservations for parallel systems. SIGMETRICS Perform. Eval. Rev., v.29 n.4, pp.40-47, ACM Press, 2002.
Keywords: batch schedulers, computational grids, parallel systems, backfilling schedulers and performance analysis.
[97]. (Subramani, 2002) V. Subramani, et al. Distributed Job Scheduling on Computational Grids using Multiple Simultaneous Requests. 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), pp.359-368, 2002.
Keywords: Computational Grids, Job scheduling, multiple requests, Metascheduling schemes, Hierarchical scheme, performance, k-distributed model, Queue Model, performance evaluation, dual queue scheduling.
[98]. (Takefusa, 1999) A. Takefusa, S. Matsuoka, et al. Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms. In Proc. of 8th IEEE Int. Symposium on High Performance Distributed Computing, pp.97–104, 1999.
Keywords: Global computing systems, scheduling schemes, scheduling algorithms, Brick scheduling, communication model, monitor.
[99]. (Tse, 2003) K. W. Tse, W. K. Lam, et al. Reservation aware operating system for grid economy. SIGOPS Oper. Syst. Rev. v.37 n.3, pp.36-42, 2003, ACM Press.
Keywords: Grid Economy, Grid computing, resource reservation, scheduling, multi-agent system, quality of service, co-reservation, advance reservation system, GLOBUS.
[100]. (Weissman a, 1999) Jon B. Weissman. Prophet: automated scheduling of SPMD programs in workstation networks. Concurrency: Practice and Experience, v.11 n.6, pp.301–321, 1999.
Keywords: network computing, data parallel, automated scheduling, SPMD applications, resource sharing, dynamic schemes, load distribution, topology, heuristics, complexity.
[101]. (Weissman b, 1998) J. Weissman, X. Zhao. Scheduling Parallel Applications in Distributed Networks. Journal of Cluster Computing, pp.109–118, 1998.
Keywords: Distributed Networks, parallel applications, run-time scheduling, SPMD applications, resource heterogeneity, wide-area scheduling, dynamic, data parallel, scheduling algorithms.
[102]. (Wolski a, 2003) Rich Wolski. Experiences with predicting resource performance on-line in computational grid settings. ACM SIGMETRICS Perform. Eval. Rev. v.30 n.4, pp.41-49, 2003, ACM Press.
Keywords: Computational Grids, performance prediction, dynamic, resource allocation, scheduling, storage systems, Network weather forecast services.
[103]. (Wolski b, 1999) R. Wolski, N. T. Spring, et al. The Network Weather Service: A distributed resource performance forecasting service for metacomputing. The Journal of Future Generation Computing Systems, pp.1-19, 1999.
Keywords: Metacomputing, performance prediction, Network weather forecast, network monitoring, network aware, distributed computing.
[104]. (Wright, 2001) D. Wright. Cheap Cycles from the Desktop to the Dedicated Cluster: Combining Opportunistic and Dedicated Scheduling with Condor. Proceedings of HPC Revolution 01, Illinois, 2001.
Keywords: Computational resources, scheduling algorithms, CONDOR, CPU cycles, dedicated resources, dedicated schedulers, Opportunistic scheduling, backfilling, resource reservations.
[105]. (Yang, 2003) Kun Yang, Xin Guo, et al. Towards efficient resource on-demand in Grid Computing. SIGOPS Oper. Syst. Rev. v.37 n.2, pp.37-43, 2003. ACM Press.
Keywords: Grid Computing, Quality of Service (QoS), Resource on Demand (RoD), Efficiency, Active Networks (AN), Policy-based Management (PBM).
[106]. (Zhang a, 2003) Yanyong Zhang, Hubertus Franke, et al. An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling, and Migration. In IEEE Transactions On Parallel and Distributed Systems, v. 14, pp.236-247, 2003.
Keywords: Parallel scheduling, Gang-scheduling, Backfilling, Migration, priority queue, Co-allocation, Performance metrics, scheduling matrix.
[107]. (Zhang b, 2003) Xuehai Zhang, Jeffrey L. Freschl, et al. A Performance Study of Monitoring and Information Services for Distributed Systems. In Proceedings of HPDC-12, pp.1-12, 2003.
Keywords: Distributed Systems, Monitoring, MDS2, Globus Toolkit, GMA, R-GMA, Hawkeye tool, Performance Metrics.
[108]. (Zhao, 2004) Henan Zhao, Rizos Sakellariou. A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems. In Proceedings of Heterogeneous Computing Workshop (HCW)-13, IEEE Computer Society, pp.1-20, 2004.
Keywords: DAG Scheduling, hybrid heuristics, Independent tasks, optimization.
Appendix – II
Annotated Bibliography
[1]. Francine Berman, Andrew Chien, Keith Cooper, et al. The GrADS Project: Software support for high-level Grid application development. The International Journal of High Performance Computing Applications, v.15 n.4, pp.327–344, 2001.
This is a major paper in which the authors outline the vision and strategies of the Grid Application Development Software (GrADS) project. Two key concepts of their approach are: (a) an application is a configurable object; (b) performance contracts specify the expected performance as a function of available resources. The system uses feedback to monitor and adapt the application to the available resources. The architecture includes a Program Preparation System that grid-enables the application. The execution environment includes a Grid Resource Management Service that can reserve, allocate, configure, and manage collections of resources matching an application's needs. A real-time monitor tracks actual performance against the contract guarantees, and a dynamic optimizer runs at load time to tailor the object program to the actual runtime environment. Two software test beds, the MicroGrid and the MacroGrid, are also part of GrADS; together they provide both a configurable and a fixed environment for studying grid behavior. The MicroGrid uses clusters of workstations and software tools to provide a repeatable and observable test bed, while the MacroGrid integrates computational and network resources at the GrADS institutions for a more realistic one.
[2]. Francine Berman, Richard Wolski, Henri Casanova, et al. Adaptive Computing on the Grid Using AppLes. In IEEE Transactions On Parallel and Distributed Systems, v.14, pp.369-382, 2003.
Berman et al. (2003) describe the Application Level Scheduling (AppLeS) project and its underlying adaptive scheduling approach. Its major feature is combining the benefits of static partitioning and dynamic self-scheduling. Depending on computational power and prediction errors, a job can be partitioned and allocated to different resources. The major part of the work is partitioned statically, based on the dependable performance of the resource set; the dependable values are derived from the Network Weather Service's predictions of resource availability and its estimate of the prediction error. The remaining work is dynamically self-scheduled, depending on the actual completion of preceding work segments/tasks. For the example application analyzed, AppLeS-tuned performance was 2.5 times better than hand-tuned scheduling. For another application, involving long-running jobs and file sharing, the authors developed the XSufferage heuristic to take data-transfer costs and data reuse into account. System model: the system assumes independent jobs with no inter-dependencies, the resources constitute a heterogeneous set, and the architecture consists of a central server that allocates jobs to agents.
[3]. Yanyong Zhang, Hubertus Franke, et al. An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling, and Migration. In IEEE Transactions On Parallel and Distributed Systems, v. 14, pp.236-247, 2003.
This paper focuses on the observation that traditional approaches to scheduling a job on a set of dedicated nodes can result in low system utilization and long wait times. The authors discuss three techniques that go beyond traditional space-sharing scheduling, namely backfilling, gang scheduling, and migration, and analyze the effects of combining them with respect to various performance criteria. They first look at conservative backfilling, which fills holes in the schedule by executing smaller jobs out of order from the queue, on the condition that such a job must complete before the scheduled start time of the next scheduled job. Their second approach, gang scheduling, is essentially the idea of time sharing added to that of space sharing; they show that it is quite effective in reducing wait time, while increasing the apparent execution time. Their third approach is migration, i.e., the ability to dynamically move the tasks of a parallel job; it can mitigate fragmentation in the schedule and is important when collocation of tasks in space or time is necessary.
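The conservative-backfilling condition summarized above can be stated precisely: a queued job may jump ahead only if it fits in the currently free nodes and is guaranteed to finish before the next reservation starts. A minimal sketch (function and parameter names are illustrative, not from the paper):

```python
def can_backfill(now, free_nodes, job_nodes, job_runtime, next_start):
    """Conservative backfilling (sketch): a smaller queued job may run out of
    order only if it fits in the currently free nodes AND is guaranteed to
    finish before the scheduled start time of the next reserved job."""
    fits_in_space = job_nodes <= free_nodes
    finishes_in_time = now + job_runtime <= next_start
    return fits_in_space and finishes_in_time
```

Because the backfilled job cannot delay any reservation already made, existing guarantees are preserved while schedule holes are filled.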
[4]. H. Casanova, H. Dail, F. Berman. A Decoupled Scheduling Approach for Grid Application Development Environments. Journal of Parallel and Distributed Computing (JPDC), Special issue on Grid Computing, v.63 n.5, pp.505-524, 2003.
This is a major paper in which the authors develop a search methodology for selecting the resources appropriate for an application and mapping data and tasks to the selected resources. The scheduler is decoupled from application-specific and platform-specific components. The schedule it returns consists of an ordered list of machines and a mapping of data/tasks to those machines. The search procedure groups the machines into Candidate Machine Groups (CMGs) based on connectivity and network latency. Instead of generating all possible combinations of CMGs, it tries to match the resource needs of an application hierarchically: first matching site locations (nearness), then computation-power and memory needs, and finally searching exhaustively for resources within the resulting subset of CMGs. Resource-availability information is obtained from multiple sources such as NWS and MDS. The scheduling methodology was developed in collaboration with the Grid Application Development Software (GrADS) project team.
[5]. Matthias Hovestadt, Odej Kao, et al. Scheduling in HPC Resource Management Systems: Queuing vs. Planning. In Proc. of the 9th International Workshop on Job Scheduling Strategies for Parallel Processing, volume 2862 of Lecture Notes in Computer Science, pp.1-20. Springer, 2003.
Hovestadt et al. discuss the differences between two paradigms for scheduling. In queuing systems, a job is placed in a queue with limits on the number of requested resources and the duration. The requests in the queue are serviced according to the scheduling policy; for example, the policy may dictate First In First Out (FIFO) order. Queue-based servicing neither takes variations in load into consideration nor provides assured service standards: under extensive load on the nodes, a request may wait a long time before it is taken up. In planning systems, resource allocations are reserved in advance. When a job needs to be executed, the planning system gathers information about its resource needs and, based on the availability of resources, reserves a service time for every job.
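The planning-system idea of reserving a start time in advance can be sketched as a search for the earliest gap on a resource's booking list (a minimal sketch under simplifying assumptions: a single resource, non-overlapping bookings; names are illustrative):

```python
def earliest_start(reservations, duration, now=0.0):
    """Planning-system sketch: given already-booked (start, end) slots on a
    resource, return the earliest time a new job of `duration` can be
    reserved without disturbing existing reservations."""
    t = now
    for start, end in sorted(reservations):
        if t + duration <= start:  # the job fits in the gap before this booking
            return t
        t = max(t, end)            # otherwise skip past this booking
    return t                       # after the last booking
```

Unlike queue-based servicing, the returned start time is an assured commitment rather than a best-effort position in a queue.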
[6]. R. D. Blumofe, Park, D. S. Scheduling large-scale parallel computations on network of workstations. 3rd IEEE International Symposium on High-Performance Distributed Computing. pp. 96 -105, 1994.
Blumofe et al. describe how their Phish system implements the idle-initiated scheduler they have developed. The scheduler consists of a macro-level scheduler, which determines which workstations in a network of workstations are idle so that parallel jobs can be allocated to them, and a micro-level scheduler, which allocates the tasks of a job to the participating processors. Phish, named after a band from Burlington, Vermont, is a software package for running parallel applications on a network of workstations. Each workstation runs a Phish JobManager process, and a Phish JobQ maintained at a server receives all the jobs. The JobManager communicates with the JobQ at most every 30 seconds to get a job. If the owner of a workstation is using it, the workstation is not available for parallel jobs; the JobManager checks every 5 minutes whether the processor is idle and, if so, approaches the JobQ for a job allocation. A Clearinghouse keeps track of all the worker programs.
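The idle-initiated decision logic, with the paper's 30-second and 5-minute intervals, can be sketched as follows (an illustrative sketch, not Phish's actual code):

```python
JOBQ_POLL_SECS = 30    # JobManager contacts the JobQ at most every 30 s
IDLE_CHECK_SECS = 300  # idleness of the workstation is re-checked every 5 min

def should_request_job(owner_active, seconds_since_jobq_contact):
    """One decision step of an idle-initiated JobManager: a workstation whose
    owner is active is unavailable for parallel jobs; otherwise the manager
    asks the central JobQ for work at most every JOBQ_POLL_SECS seconds."""
    if owner_active:
        return False
    return seconds_since_jobq_contact >= JOBQ_POLL_SECS
```

The key design point is that the idle node initiates the request (a pull), so the central JobQ never has to track which workstations are free.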
[7]. T.D. Braun, H.J. Siegel, et al. A Taxonomy for Describing Matching and Scheduling Heuristics for Mixed-machine Heterogeneous Computing Systems. IEEE symposium on Reliable Distributed Systems, pp.330-335, 1998.
This is a major paper in which the authors describe the Purdue heterogeneous computing taxonomy. The taxonomy characterizes three major components: (1) the application model and its communication, based on characteristics such as size, type, communication patterns, data availability, deadlines, priorities, and Quality of Service requirements; (2) the platform model, based on characteristics such as analytical benchmarks, interconnection network, machine heterogeneity, and task compatibility; and (3) the mapping strategy, covering application-model support, communication times, execution location, fault tolerance, and static and dynamic mapping.
[8]. Rajkumar Buyya, Anthony Sulistio, et al. A Taxonomy of Computer-based Simulations and its Mapping to Parallel and Distributed Systems Simulation Tools. International Journal of Software: Practice and Experience, v.34 n.7, pp.653-673, Wiley Press, 2004
Buyya et al. outline a taxonomy for computer-based simulations, with a focus on parallel and distributed systems (PDS). These can be parallel systems, such as Symmetric Multi-Processor (SMP) or Massively Parallel Processing (MPP) systems, or distributed systems connected through Internet, intranet, mobile, or telephony networks. Simulation and emulation tools in the PDS area are analyzed and mapped into the proposed taxonomy along the dimensions of PDS type, usage, simulation, and design. A number of open-source simulation tools are listed and compared with one another.
[9]. Rajkumar Buyya, Alexander Barmouta. GridBank: A Grid Accounting Services Architecture (GASA) for Distributed System Sharing and Integration. In International Parallel and Distributed Processing Symposium. IPDPS2003: IEEE Computer Society Press, pp.1- 8, 2003.
Buyya and Barmouta present a Grid Accounting Services Architecture (GASA) along the lines of banking principles, employing (a) pay-before-use, (b) pay-as-you-go, and (c) pay-after-use policies. The paper outlines the GridBank infrastructure for grid usage accounting. The three major components of the architecture are the GridBank, the GridServiceProvider, and the GridServiceConsumer. The paper also covers the protocols for interaction among the three entities and the format of Resource Usage Records, which include user identification as well as grid resource usage such as host type, CPU time, and software services.
[10]. Rajkumar Buyya, Muthucumaru Maheswaran, et al. A taxonomy and survey of grid resource management systems for distributed computing. The Journal of Software Practice and Experience, v.32 n.2, pp.135–164, 2002.
A Resource Management System (RMS) in a grid provides resource discovery, dissemination, and scheduling functions. An RMS interacts with the operating system, with other grid subsystems that provide accounting/billing and security, and with grid-toolkit-based application programs. This taxonomy focuses on the RMS and does not cover application models or target-platform models. The classification considers RMS attributes such as machine organization (flat, cells, or hierarchical), resource model (schema or object), resource-namespace organization (relational, hierarchical, or graph), Quality of Service support (soft, hard, or none), resource discovery and dissemination (network directory or distributed objects), scheduler organization (centralized, hierarchical, or decentralized), state estimation (predictive or non-predictive), rescheduling (periodic or event-driven), and scheduling policy (fixed or extensible).
[11]. Rajkumar Buyya, Baikunth Nath, et al. Nature's Heuristics for Scheduling Jobs on Computational Grids. The 8th IEEE International Conference on Advanced Computing and Communications (ADCOM 2000), 2000, India.
Job scheduling in a grid environment is an NP-complete problem and requires heuristics to search for near-optimal solutions. The authors discuss, in the context of job scheduling, the analogy with genetic algorithms (GA), which use natural-selection heuristics in reproduction, and with simulated annealing (SA) of metals, which achieves a minimum-energy crystalline structure. They develop a hybrid GA-SA approach to grid scheduling that combines the parallelizability of GA with the convergence properties of SA. The authors also employ Tabu Search to guide the heuristic search past local optima.
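The SA half of such a hybrid hinges on a probabilistic acceptance rule: a mutated schedule that worsens the makespan is still accepted occasionally, which is what lets the search escape local optima. A minimal sketch (names and the cost measure are illustrative):

```python
import math
import random

def sa_accept(current_cost, candidate_cost, temperature, rng=random.random):
    """Simulated-annealing acceptance rule (sketch): a better schedule is
    always accepted; a worse one is accepted with probability
    exp(-delta / temperature), so uphill moves become rarer as the
    temperature is lowered."""
    delta = candidate_cost - current_cost
    if delta <= 0:
        return True
    return rng() < math.exp(-delta / temperature)
```

In a GA-SA hybrid, a rule like this can gate whether an offspring schedule replaces its parent in the population.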
[12]. D. Jackson, Q. Snell, et al. Core Algorithms of the Maui Scheduler. In D.G. Feitelson and L. Rudolph, editor, Proceedings of 7th Workshop on Job Scheduling Strategies for Parallel Processing, volume 2221 of Lecture Notes in Computer Science, pp. 87–103. Springer Verlag, 2001.
The Maui Scheduler is a popular scheduler for heterogeneous computing environments. The authors describe the algorithms Maui uses for backfill, job prioritization, and fairshare, including their configurability and parameterization. Based on job priority and a job's estimated run time, reservations are made for the resources; backfill then fills the gaps between reserved slots with smaller, short-running jobs.
[13]. M.Litzkow, M.Livny. Condor -A Hunter of Idle Workstations. In Proc. of the 8th International Conference of Distributed Computing Systems, pp.104-111.University of Wisconsin, Madison, 1988.
This is a milestone paper in which the authors describe the Condor system, which attempts to utilize the idle capacity of workstations. A metric called "leverage" is introduced, defined as the ratio of the workstation capacity a job consumes remotely to the capacity consumed on the home station to support the remote execution. The Up-Down algorithm they use attempts to give heavy users steady access while ensuring fair access for light users: the time a user has waited for remote cycles is weighted in, so a user who has been denied access to the remote service for a long time gets higher priority. This is implemented by maintaining a schedule index.
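The two mechanisms described above reduce to small formulas. A sketch of the leverage ratio, plus an illustrative Up-Down update step (the unit increments and the non-negative floor are assumptions for illustration, not constants from the paper):

```python
def leverage(remote_capacity_used, home_support_capacity):
    """Condor's 'leverage' metric: remote capacity consumed by a job divided
    by the home-workstation capacity spent supporting its remote execution."""
    return remote_capacity_used / home_support_capacity

def updown_step(index, consuming_remote_cycles):
    """One periodic Up-Down update (illustrative): the schedule index rises
    while a user consumes remote cycles and falls while the user waits, so a
    long-waiting light user eventually outranks a heavy user (a lower index
    means higher priority)."""
    return index + 1 if consuming_remote_cycles else max(0, index - 1)
```

A leverage well above 1 means remote execution is worthwhile: the job gets far more capacity than the home station spends supporting it.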
[14]. Mark Goldenberg, Paul Lu, et al. TrellisDAG: A System for Structured DAG Scheduling. In Proc. of the 9th International Workshop on Job Scheduling Strategies for Parallel Processing, volume 2862 of Lecture Notes in Computer Science, pp.21–43. Springer, 2003.
TrellisDAG, described by Goldenberg et al., is part of the Trellis project, which aims at creating a metacomputer. TrellisDAG is layered on top of the placeholder technique to handle dependencies among jobs. It builds on the concept of placeholder scheduling, in which each placeholder represents a potential unit of work. The conceptual model is a central server and a set of computational nodes: the server receives all the jobs, and a set of programs, called services, forms a layer on both the server and the nodes. A placeholder is placed on each of the nodes, and SSH is used for communication across administrative domains. Each placeholder pulls its run-time parameters from the central server instead of having the server push them. Unlike Condor, TrellisDAG execution hosts are loosely coupled and can be deployed quickly. Trellis allows hierarchical relationships among jobs to be modeled: jobs are organized into groups, prologue groups, and epilogue groups. Prologue groups have no subgroups; epilogue groups are executed only after all other groups at the same level have executed; jobs within a group are executed in the order of submission; and a supergroup's jobs are executed only after its subgroups have been executed.
[15]. Henri Casanova. Simgrid: A Toolkit for the Simulation of Application Scheduling. In Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2001), pp. 430-437, 2001.
Casanova outlines the features of Simgrid, a simulation toolkit for studying scheduling algorithms for distributed systems. Simgrid uses event-driven simulation and provides models and abstractions for the rapid prototyping and evaluation of schedulers, along with APIs for defining resources and tasks and for running grid-computing simulations. Scheduling algorithms usually depend on user-supplied predictions of execution time, which may not be accurate; Simgrid can therefore add a statistically generated error to the predicted duration of a task before scheduling begins. If required, Simgrid can simulate a grid application at higher speed by using a coarser time scale. Experimental results show that Simgrid ranks scheduling algorithms well and reports reasonable accuracy.
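The error-injection idea above can be sketched in a few lines: perturb each user-supplied prediction with random relative noise before handing it to the simulated scheduler (an illustrative sketch, not Simgrid's API; the Gaussian noise model is an assumption):

```python
import random

def perturb_prediction(predicted_seconds, stddev_fraction=0.1, rng=random.gauss):
    """Simgrid-style error injection (sketch): add a statistically generated
    relative error to a runtime prediction, so the simulated scheduler must
    cope with inaccurate user estimates."""
    noisy = predicted_seconds * (1.0 + rng(0.0, stddev_fraction))
    return max(0.0, noisy)  # a duration can never be negative
```

Sweeping `stddev_fraction` then shows how robust a scheduling algorithm is to increasingly unreliable estimates.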
[16]. Paul D. Coddington, Lici Lu, et al. Extensible job managers for grid computing. Proceedings of the twenty-sixth Australasian computer science conference on Conference in research and practice in information technology. v.16, pp.151-159, 2003. Australian Computer Society, Inc.
Coddington et al. describe limitations of the Globus Grid Resource Allocation and Management (GRAM) service and propose extensions to it: staging the files necessary to execute a job, submitting groups of jobs together, and passing run-time parameters that encode the execution environment a job needs (such as running a Java program within a JVM).
[17]. K. Czajkowski, I. Foster, et al. Resource management architecture for metacomputing systems. The 4th Workshop on Job Scheduling Strategies for Parallel Processing, pp.62–82, Springer-Verlag LNCS, 1997.
Czajkowski et al. describe the architecture of the Globus Resource Management System (RMS). An extensible Resource Specification Language (RSL), whose syntax is based on the filter specifications of the Lightweight Directory Access Protocol, gives a precise description of resources and is used for communication among the components of the RMS. The Globus Resource Allocation Manager (GRAM) manages resources at the local level and interacts with the Co-allocator. The Metacomputing Directory Service (MDS) provides information about the availability and capability of resources. Applications contact a Broker to get an application processed; the Broker interacts with the Information Service and with the Co-allocator (which allocates resources at different sites simultaneously). Brokers may be associated with a group of resources and may have to negotiate with other Brokers before an application can be scheduled. A Gatekeeper, which uses the Globus Security Infrastructure for authentication, is also part of the architecture.
[18]. Junwei Cao, Stephen A. Jarvis, et al. ARMS: An agent-based resource management system for grid computing. Scientific Programming, v.10 n.2, pp.135–148, 2002.
Cao et al. describe an Agent-based Resource Management System (ARMS) that uses a hierarchy of agents. Each agent maintains Agent Capability Tables (ACTs) recording service information about other agents. When a request exhausts a search without finding a matching service, it either queries an agent higher in the hierarchy or terminates the discovery process. Agents for the more powerful nodes in the grid sit higher in the hierarchy, so when a request's resource needs cannot be met by nodes lower in the agent hierarchy, the request propagates to higher agents.
[19]. D.G. Feitelson, L. Rudolph. Metrics and Benchmarking for Parallel Job Scheduling. In D.G. Feitelson and L. Rudolph, editor, Proc. of 4th Workshop on Job Scheduling Strategies for Parallel Processing, v.1459, pp. 1–24. Springer Verlag, 1998.
Feitelson and Rudolph propose metrics for parallel job-scheduling algorithms and a standard workload for benchmarking performance quantitatively. The workload description has two components: job arrival times (characteristics such as time-of-day and day-of-week variations) and job structure (modeled either by equations describing job behavior or by analytical methods that compute runtime from job structure). The proposed benchmark suite consists of a family of programs whose granularity is controlled by parameters. The authors describe parameters for different types of workloads: rigid jobs (e.g., number of processors and execution time), work pile (no internal job structure), barriers (e.g., number of barriers), and fork-join (e.g., a sequence of parallel loops with different degrees of parallelism).
[20]. D.G. Feitelson, L. Rudolph. Theory and Practice in Parallel Job Scheduling. In D.G. Feitelson and L. Rudolph, editor, Proc. of 3rd Workshop on Job Scheduling Strategies for Parallel Processing, volume 1291 of Lecture Notes in Computer Science, pp.1–34, Springer Verlag, 1997.
This is a major paper in which Feitelson and Rudolph review the theory of parallel job scheduling and contrast it with practice. The authors review models of partition specification (fixed, variable, adaptive, and dynamic), job flexibility (rigid, moldable, evolving, and malleable), level of preemption (none, local, migratable, and gang scheduling), the knowledge available to the scheduler about the workload (none, workload, class, and job), and memory (distributed and shared), and make recommendations for improving scheduler performance. They then propose a PSCHED standard aimed at standardizing the interactions among the components involved in scheduling parallel jobs, such as message-passing libraries, task managers, resource managers, and schedulers. The PSCHED APIs cater to two areas: one to spawn, control, monitor, and signal the tasks of jobs, and a second providing a set of calls for batch job schedulers. PSCHED is envisaged for a metacenter, a computing resource where jobs can be scheduled and run on a set of heterogeneous machines that are physically distributed over a geographical area but connected through a network.
[21]. D. P Spooner, SA Jarvis, et al. Local Grid Scheduling Techniques using Performance Prediction. IEE Proceedings - Computers and Digital Techniques, v.150 n.2, pp.87-96, 2003.
Spooner et al. apply iterative heuristic algorithms with performance prediction to job scheduling in grid environments. Workload managers, distribution brokers, and Globus interoperability providers comprise the TITAN system's hierarchy, with Globus forming the highest level; the lowest level consists of schedulers managing physical resources. Three algorithms are tested: Deadline Sort, Deadline Sort with Node Limitation, and a Genetic Algorithm. The Genetic Algorithm proves less sensitive to the number of processing nodes and to minor changes in resource characteristics, can take a large number of resource-system parameters into account, and converges quickly while balancing the three objectives of makespan, idle time, and the QoS metric of deadline time. TITAN uses the Performance Analysis and Characterization Environment (PACE) as a predictor, together with brokers, to meet the performance objectives of a scheduler.
[22]. Rich Wolski. Experiences with predicting resource performance on-line in computational grid settings. ACM SIGMETRICS Perform. Eval. Rev. v.30 n.4, pp.41-49, 2003, ACM Press.
This is a major paper in which the author describes how, in grid computing environments, the Network Weather Service (NWS) uses a mixture-of-experts approach to predict the performance characteristics of computational resources when a given job is to be executed. A set of forecasting models, each with its own parameters, is configured. Using the performance history, each model generates a forecast for every element of the history from all previous observations; this method of estimating performance from all previous observations is termed "postcasting". Errors are computed by comparing these forecasts with the performance history, and the model that predicts most accurately is chosen for the required forecast. Every forecast is based on the most recent measurements and uses the statistical method that has yielded the least aggregate error up to the instant the forecast is required. NWS thus provides adaptive prediction for grid applications.
[23]. A.H. Alhusaini, V.K.Prasanna, et al. A unified resource-scheduling framework for heterogeneous computing environments. In 8th Heterogeneous Computing Workshop (HCW’99), pp.156-165, 1999.
A unified framework for resource scheduling is presented by Alhusaini et al. (1999) as part of the Management System for Heterogeneous Networks (MSHN) project. The model is of applications competing for resources such as the compute cycles and memory of compute nodes, the communication network, and data repositories or file servers; it assumes that no more than one sub-task can access a data repository at a time, and that communication latencies and per-byte data-transfer times between nodes are pre-specified. The goal of the framework is to minimize the execution time of a group of jobs. Each application is modeled as a Directed Acyclic Graph; a job consists of many sub-tasks, and the resource requirements of each sub-task are specified. The framework considers the bandwidth of the communication links, the selection of an appropriate repository in the case of replicated data repositories, and advance reservation of resources. Simulations were run for four scheduling algorithms based on the unified approach and for a baseline algorithm that schedules each application separately. The framework accounts for Quality of Service factors, although the QoS algorithm itself is not reported in this paper. The simulation results show that scheduling through the unified approach is significantly better than scheduling each type of resource separately.
[24]. Rashmi Bajaj, D. P. Agrawal. Improving Scheduling of Tasks in a Heterogeneous Environment. IEEE Transactions on Parallel and Distributed Systems, v.15 n.2, pp.107-118, 2004.
In this paper Bajaj and Agrawal (2004) extend a task-duplication-based approach for scheduling an application on a homogeneous cluster to a network of heterogeneous systems (TANH). The scheduling algorithm attempts to schedule tasks so as to minimize the makespan, and its performance is compared with that of Best Imaginary Level (BIL) scheduling. TANH has been tested exhaustively with four DAGs of 3000 nodes each and is found to reduce the makespan substantially below the value obtained with BIL. Under the simplifying assumptions and in a homogeneous environment, TANH has polynomial complexity and scales easily.
[25]. J. H. Abawajy. An integrated resource scheduling approach on cluster computing systems. In the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2003), pp. 251- 256, 2003.
Abawajy (2003) presents an Adaptive Distributed Scheduling (ADS) policy for a cluster-computing environment. The ADS policy combines the benefits of the Local policy (executing a job on the site that originated it) and the Affinity policy (executing a job on a site that holds the required data files). The ADS job scheduler initially sends a job to a site that has the data files, but if that site is busy it forwards the request for local execution. The performance metric also accounts for network latency and data-transfer time. Discrete-event simulations show that the ADS policy performs substantially better than either the Local or the Affinity policy.
[26]. J. H. Abawajy. Performance analysis of parallel I/O scheduling approaches on cluster computing systems. In the Proceedings of 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), pp.724- 729, 2003.
I/O represents a growing fraction of application execution time, particularly in large clusters spread over large geographic distances. The system model consists of a central data scheduler with multiple sources of data; every compute node of the cluster sends its I/O requests to the central data scheduler. Abawajy (2003b) presents two algorithms for the data scheduler: an Equi-partition (EQUI) I/O policy and an Adaptive Equi-partition (AEQUI) I/O policy. The average load is computed from the outstanding requests and the number of I/O servers. The Equi-partition policy assigns a set of requests to the server with the fewest outstanding requests, until that server's load equals the average load. The Adaptive Equi-partition algorithm is similar, but also accounts for the backlog of jobs at a server, moving jobs from heavily loaded servers to lightly loaded ones.
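The Equi-partition assignment rule can be sketched as follows: compute the average load across the I/O servers, then route each incoming request to the least-loaded server that is still below that average (a minimal sketch under stated assumptions: unit-cost requests and a single batch; names are illustrative):

```python
def equi_partition(requests, server_loads):
    """EQUI sketch: the average load is computed from outstanding requests
    and the number of I/O servers; each new request goes to the server with
    the fewest outstanding requests whose load is still below the average."""
    loads = list(server_loads)
    avg = (len(requests) + sum(loads)) / len(loads)
    assignment = [[] for _ in loads]
    for req in requests:
        below_avg = [i for i, load in enumerate(loads) if load < avg]
        target = min(below_avg or range(len(loads)), key=lambda i: loads[i])
        assignment[target].append(req)
        loads[target] += 1
    return assignment
```

The adaptive variant (AEQUI) would additionally rebalance by moving queued requests from heavily loaded servers to lightly loaded ones.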
[27]. J. H. Abawajy, S.P. Dandamudi. Scheduling Parallel Jobs with CPU and I/O Resource Requirements in Cluster Computing Systems. In Proceedings of the 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunications Systems (MASCOTS'03), IEEE Computer Society Press, pp.336-343, 2003.
The authors propose an Adaptive Hierarchical Scheduling (AHS) policy for multi-cluster environments in which each cluster is geographically compact and contains a set of workstations. A hierarchy of schedulers is adopted, with a system scheduler at the top, a cluster scheduler at the cluster level, and a local scheduler at the leaf level. Jobs are submitted to the system scheduler and held in a queue until they are assigned to lower levels. When a lower-level scheduler is ready to accept jobs, it updates its parent; this recursive messaging keeps the system scheduler informed about available leaves (self-scheduling). AHS performs affinity scheduling, self-scheduling, and job assignment, and takes co-scheduling of both jobs and I/O into account. AHS is reported to perform substantially better than static space sharing with time-sharing of processors only.
[28]. Francine Berman, Richard Wolski, et al. Application Level Scheduling on Distributed Heterogeneous Networks. In Proceedings of Supercomputing 1996, pp.1-28, ACM Press, 1996.
Berman and Wolski (1996) outline the features of application-centric scheduling, in which systems
are evaluated in terms of their impact on the application, and describe AppLeS scheduling
agents. AppLeS develops a customized scheduler for each application. The AppLeS scheduling
agent interacts with the resource management systems of grid toolkits such as Globus or
Condor to meet the performance goals of the application. Resource information is obtained
through NWS. Experiments show that the AppLeS approach, based on predicting the
availability of resources, outperforms the block-partitioning approach. The active agent
in AppLeS is called the Coordinator. The four subsystems with which the agent works are: the
Resource Selector; the Planner, which generates a schedule for the available resources; the
Performance Estimator, which generates a performance estimate for each schedule
produced by the Planner; and the Actuator, which implements the best schedule.
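The Coordinator's plan-estimate-select loop can be sketched as follows (a toy illustration; the function names and the cost model are assumptions, not the AppLeS implementation):

```python
def coordinate(resources, plan, estimate):
    """AppLeS-style loop (sketch): generate candidate schedules for the
    available resources, estimate each one's performance, and return the
    best schedule for the actuator to implement."""
    candidates = plan(resources)          # Planner
    return min(candidates, key=estimate)  # Performance Estimator picks best

# Toy example: candidate schedules are growing resource subsets, and the
# estimate trades compute speedup against per-resource coordination cost.
plan = lambda rs: [rs[:k] for k in range(1, len(rs) + 1)]
estimate = lambda sched: 100 / len(sched) + 5 * len(sched)

best = coordinate(["a", "b", "c", "d"], plan, estimate)
print(best)
```

In the real agent, `plan` would come from the Planner subsystem, `estimate` from NWS-driven predictions, and the returned schedule would be handed to the Actuator.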
[29]. Francine Berman. High-performance schedulers. In Ian Foster and Carl Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, pp.279–309, Morgan Kaufmann, San Francisco, CA, 1999.
Berman (1999) discusses the characteristics of high-performance schedulers and analyzes
several published schedulers according to the scheduling models (scheduling policy and program
model) they implement. The author also lists challenges such as weighing
portability against performance, scalability, efficiency, repeatability, multi-scheduling (co-
existing with other schedulers) and grid-aware programming. This is a survey of the challenges,
the available solutions and the goals of research in application-centric schedulers.
[30]. Francine Berman, Walfredo Cirne. When the Herd Is Smart: Aggregate Behavior in the Selection of Job Request. In IEEE Transactions On Parallel and Distributed Systems, v.14, pp.181-192, 2003.
The features of the Supercomputer AppLeS (SA) scheduler are outlined in Cirne and Berman
(2003). A moldable application gives the SA all the information required for processing the
job. The SA interacts with the supercomputer's scheduler and determines the kind of
request that would result in the lowest turn-around time. Thus the SA operates as a
request selector among the various possible requests that can be made for a moldable
application, and then makes that request to the supercomputer's scheduler on behalf
of the application. The paper studies the aggregate behavior when a large number of SAs
interact with the same supercomputer scheduler. Statistical analysis of four reference
workload logs is used to understand job moldability. Degradation in performance due to
contention between multiple SAs is observed at low loads, but at medium and heavy loads
application wait times decrease when many instances of SA are present. This is
because supercomputer schedulers usually accept shorter jobs first; with greater competition
among a larger number of SAs, the execution queues have fewer "holes".
[31]. R. Buyya, D. Abramson, et al. An Evaluation of Economy-based Resource Trading and Scheduling on Computational Power Grids for Parameter Sweep Applications. Workshop on Active Middleware Services (AMS 2000), (in conjunction with Ninth IEEE International Symposium on High Performance Distributed Computing), Kluwer Academic Press, 2000, Pittsburgh, USA.
Economy-based resource trading and scheduling is evaluated in Buyya et al. (2000). The
system uses the resource broker Nimrod/G, which supports a simple declarative parametric
modeling language for parametric experiments and has been designed to work with
middleware such as Globus, Legion, Ninf and NetSolve. However, not all middleware provides
equal facilities for online trading. If Nimrod/G and a system-level infrastructure called GRACE
(Grid Architecture for Computational Economy) are both used along with the middleware of
choice, dynamic online trading becomes possible. The paper simulates such a
system for parameter sweep studies under three scheduling algorithms, which
provide (a) time minimization limited by budget, (b) cost minimization limited by a completion
deadline, and (c) no minimization, limited by both deadline and budget.
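The first two strategies can be sketched as a per-job resource selection (a deliberately simplified illustration; Nimrod/G trades over whole experiments and dynamic prices, and the function and tuple layout here are my assumptions):

```python
def select_resource(resources, mode, budget=None, deadline=None):
    """Economy-scheduling sketch. Each resource is (cost_per_job,
    time_per_job). 'time' minimises completion time subject to a budget;
    'cost' minimises cost subject to a deadline."""
    if mode == "time":
        feasible = [r for r in resources if r[0] <= budget]
        return min(feasible, key=lambda r: r[1])
    if mode == "cost":
        feasible = [r for r in resources if r[1] <= deadline]
        return min(feasible, key=lambda r: r[0])
    raise ValueError("unknown mode")

quotes = [(5, 10), (2, 30), (8, 4)]          # (cost, time) per resource
print(select_resource(quotes, "time", budget=6))     # fastest we can afford
print(select_resource(quotes, "cost", deadline=5))   # cheapest that meets deadline
```

The third strategy, (c), would simply check that a candidate assignment satisfies both the budget and the deadline without optimizing either.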
[32]. R. Buyya, D. Abramson, J. Giddy. Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid. Proceeding of the HPC ASIA’2000, the 4th International Conference on High Performance Computing in Asia- Pacific Region, Beijing, China, IEEE Computer Society Press, USA, 2000
The architecture of Nimrod/G, a grid-enabled resource management and scheduling system, is
described in Buyya et al. (2000). The system has a user interface, a parametric engine
(a persistent job-control agent), a scheduler, a dispatcher (which initiates job execution on a
selected resource under the direction of the scheduler), and a job wrapper (for staging tasks and
data and sending the results back). Nimrod/G is designed to work with the Globus toolkit. In
particular, it can take resource-discovery information from the Globus MDS and use it
for scheduling via its economy-based resource trading.
[33]. T.L. Casavant, J.G. Kuhl. A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems. In IEEE Transaction on Software Engineering, v.14 n.2, pp.141-154, 1988.
The authors present a taxonomy of scheduling in a general-purpose distributed computing
systems environment. The taxonomy uses a hierarchical categorization distinguishing local
and global, static and dynamic, distributed and non-distributed, cooperative and non-
cooperative, optimal and sub-optimal, and approximate and heuristic schedulers. The
classification also takes into account other distinctive characteristics of scheduling systems,
such as adaptive versus non-adaptive, load balancing, bidding (a protocol framework
in which a manager node solicits bids from contractor nodes; both managers and contractors
are autonomous in that a manager may assign the job to any one of the bidders and any one of
the contractors may or may not send a bid), probabilistic versus deterministic, and one-time
assignment versus dynamic reassignment. Examples of each case are discussed and an
annotated bibliography is provided.
[34]. I. Foster, L.Yang, et al. Conservative Scheduling: Using Predicted Variance to Improve Scheduling Decisions in Dynamic Environments. Proceedings of the ACM/IEEE Supercomputing 2003, pp.31-46, 2003
In this paper, the authors explore using predicted mean and variance as the measures of the
"capacity" of grid resources (instead of using only the predicted capacity) in
making data-mapping decisions. NWS can predict resource information at some future instant
of time, but grid computing systems require information about a resource over the duration
for which the resource is expected to be used for processing. Five algorithms for estimating the
CPU load during the execution interval of a task are considered: One-Step-ahead CPU-
load-prediction Scheduling (OSS); Predicted Mean Interval Scheduling of the effective CPU
load (PMIS); Conservative Scheduling using the mean and variance of load (CS); History
Mean Scheduling, using the mean of the last 5 minutes of load as the CPU load for the task
about to start (HMS); and History Conservative Scheduling, adding the mean and the
variance of the load over the last 5 minutes (HCS). The results show that these schedulers assign
less load to less-reliable (high-variance) resources. Experimental results show that the
performance improvement from CS is modest but consistent compared with the other four
methods.
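The core CS idea, penalizing a resource by its predicted variability, can be sketched in a few lines (the function names and the exact mean-plus-variance combination here are an illustration of the heuristic, not the paper's precise formula):

```python
def conservative_capacity(mean_load, var_load):
    """CS heuristic sketch: treat a resource's effective load as predicted
    mean plus predicted variance, so high-variance (less reliable)
    resources look more loaded and receive less work."""
    return mean_load + var_load

def pick_resource(predictions):
    """predictions maps resource name -> (predicted mean, predicted variance);
    pick the resource with the lowest conservative effective load."""
    return min(predictions, key=lambda r: conservative_capacity(*predictions[r]))

# A jittery resource with lower mean load loses to a steadier one.
print(pick_resource({"fast_jittery": (0.3, 0.4), "steady": (0.5, 0.05)}))
```

A purely mean-based policy (like HMS) would have chosen the jittery resource; adding the variance term is what makes the choice "conservative".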
[35]. Foster, C. Kesselman, S. Tuecke. The Anatomy of the Grid - Enabling Scalable Virtual Organizations. Proceeding of First IEEE/ACM International Symposium on Cluster Computing and the Grid, pp.6 -31, 2001
Foster et al. (2001) define a grid in the most general terms. The authors work out the
requirements for a grid, including security requirements such as authentication and
authorization, and give an open grid architecture for resource sharing. The fabric layer of the
five-layer architecture deals with local resources such as computational nodes, storage devices
and network resources. The connectivity layer provides interconnection among sites through
Internet protocols. The resource layer provides access to resources and to information about the
state and performance of the system. The collective layer provides the
functionality of resource discovery, brokering and allocation after security checks. The
application layer checkpoints and manages the job, and also deals with failure of resources.
The paper also discusses grid computing's relationship with other technologies such as the
World Wide Web.
[36]. D. Abramson, J. Giddy, et al. High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid? In Proceedings of the 14th International Conference on Parallel and Distributed Processing Symposium (IPDPS-00), pp.520–528, Los Alamitos, 2000. IEEE.
Abramson et al. (2000) describe the conversion of Nimrod, a tool developed for
parametric model studies in a local environment, into Nimrod/G. Nimrod/G is a scheduling and
brokering tool that works with the Globus toolkit on a grid to perform parametric
studies on heterogeneous computing resources, which may be available intermittently on a
global network. An extensive scientific experiment was conducted with
Nimrod/G on the Globus Ubiquitous Supercomputing Testbed Organization (GUSTO). The
results show that computing resources spread over the USA and Australia can be harnessed to
solve a high-performance parametric problem. Nimrod/G, as described, lacks several
features needed on a grid, such as priority rights for local users and
reservation of resources. However, Nimrod/G uses the Globus toolkit for resource discovery
and for security, and can use grid resources to try to meet specified
job-completion deadlines. In essence, the paper describes upgrading Nimrod for
compatibility with the Globus toolkit and demonstrates, through the GUSTO experiment, the
feasibility of solving a parametric problem on a grid.
[37]. Foster, C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, v.11 n.2, pp.115-128, 1997.
Foster and Kesselman outline the features of the Globus metacomputing infrastructure toolkit.
Metacomputers have some similarity to distributed systems that integrate resources of
varying capabilities, but the high-performance demands of metacomputing require radically
different programming models. Metacomputers need tools to schedule highly heterogeneous
and dynamic computational and communication resources to meet performance requirements.
The Globus infrastructure toolkit provides capabilities and interfaces for resource discovery
through the Metacomputing Directory Service (MDS). The resource data is maintained through
the Lightweight Directory Access Protocol (LDAP), and security services are provided by
Generic Security Services (GSS). The toolkit was tested through the I-WAY experiments,
in which 17 research groups, including supercomputing centers, attempted to solve
high-performance problems through aggregation of resources. The objective of the Globus
project, according to the authors, is to develop an Adaptive Wide Area Resource Environment
(AWARE).
[38]. James Frey, Todd Tannenbaum, et al. Condor-G: A Computation Management Agent for Multi-Institutional Grids. Journal of Cluster Computing, v.5, pp.237-246, 2002.
Condor was developed to manage the aggregate computational power of a large number
of resources within a single administrative domain for solving high-performance problems, with
fault tolerance and a user-friendly interface. Flocks of Condors may be used to deal
with multiple Condor systems; however, if resources from across multiple domains are to be
used, the Globus toolkit provides the resource discovery, scheduling and security services.
Condor-G uses the Globus toolkit to enable users to leverage resources in multiple domains,
as described in Frey et al. (2002). Condor-G uses the Grid Security Infrastructure (GSI), the
Grid Resource Allocation and Management (GRAM) protocol with two-phase commit and
fault-tolerance features, the Metacomputing Directory Service (MDS) and the Global Access
to Secondary Storage (GASS) services of Globus. MDS, in turn, uses the Grid Resource
Registration Protocol (GRRP) and the Grid Resource Information Protocol (GRIP). Condor-G
uses its GlideIn mechanism to convey its daemon process to remote resources, retrieving
Condor executables from a central repository via GSI-authenticated GridFTP. The paper
concludes with descriptions of three major successful experiments conducted on multi-domain
systems with Condor-G.
[39]. Kun Yang, Xin Guo, et al. Towards efficient resource on-demand in Grid Computing. SIGOPS Oper. Syst. Rev. v.37 n.2, pp.37-43, 2003. ACM Press.
Grid transactions take place over networks. Therefore, if the Resources on Demand (RoD)
concept is to be realized in grids, QoS guarantees on the Internet must be considered in
addition to management of computational and storage resources. Yang et al. (2003) focus on
resource sharing from the network-technology perspective. The paper applies active network
technology to grid QoS to deliver RoD efficiently. Active network technology transforms the
store-and-forward network into a store, compute and forward network, in which functions can
be provisioned dynamically according to the administered policies. The architecture of the
system uses a Policy Based Management (PBM) system with four components: a Policy
Management Tool, providing an environment for the administrator to create or edit policy; a
Policy Repository, as an LDAP directory; a Policy Decision Point (PDP), as part of
the grid scheduling and monitoring system; and a Policy Enforcement Point (PEP), at the router.
For a hierarchical grid scheduling system, multiple PDPs are coordinated by a PDP
Manager. Communication between PDP and PEP can be handled through SNMP or the
Common Open Policy Service (COPS). The feasibility of the concept has been tested on the
MANTRIP test bed, with IntServ regions at the end points and a DiffServ region between
the two IntServ regions.
[40]. Volker Hamscher, Uwe Schwiegelshohn, et al. Evaluation of Job-Scheduling Strategies for Grid Computing. GRID 2000, pp.191-202, 2000.
Scheduling structures used in computational grids are discussed in Hamscher et al.
Performance evaluation of schedulers is attempted in a simulation environment for
different combinations of job and machine models, using existing workload traces and logs from
real machines. With centralized job submission and single-site scheduling, intra-job
communications are often ignored, whereas with multi-site scheduling the
communication between different parts of a job is taken into consideration. In a hierarchical
scheduler, different policies can be enforced at different levels. With decentralized job
submission, a scheduler that cannot handle a request either (a) redirects the job to a different
machine directly, through direct communication, or (b) forwards it to a central job pool from
which another machine can pick it up. The simulation shows that the simple FCFS mechanism
gives better results than backfilling for the parameters used, while for hierarchical schedulers
backfilling gives better results. Until studies with extensive sets of parameters are completed,
it may not be appropriate to draw generalized conclusions from this limited set of
simulations.
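The backfilling strategy compared here admits a waiting job out of order when it can run without delaying the reserved job at the head of the queue. One common form of that admission test can be sketched as follows (a conservative illustration; the function signature and job representation are mine, not from the paper):

```python
def can_backfill(job, free_cpus, head_start_time, now):
    """Backfilling test (sketch): a waiting job may start immediately if it
    fits on the currently idle processors AND finishes before the reserved
    start time of the job at the head of the queue. `job` = (cpus, runtime)."""
    cpus, runtime = job
    return cpus <= free_cpus and now + runtime <= head_start_time

# Head-of-queue job is reserved to start at t=100; it's now t=95 with 6 idle CPUs.
print(can_backfill((4, 10), free_cpus=6, head_start_time=100, now=95))  # would overrun the reservation
print(can_backfill((4, 4), free_cpus=6, head_start_time=100, now=95))   # fits in the gap
```

FCFS, by contrast, never runs a job ahead of the queue head, which is why the two policies diverge under the different job and machine models simulated.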
Appendix – III
List of leading researchers
Dr. Ian Foster
Professor of Computer Science, The University of Chicago, 1100 E. 58th Street, Ryerson Hall, Room 155, Chicago, IL 60637. E-mail: [email protected]. Web: http://www-fpp.mcs.anl.gov/~foster/
Dr. Karl Czajkowski
Center for Grid Technologies, USC Information Sciences Institute, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292, USA. E-mail: [email protected]. Web: http://www.isi.edu/~karlcz/
Dr. Carl Kesselman
Director, Center for Grid Technologies, and Research Associate Professor, Computer Science, USC/Information Sciences Institute, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292-6695. E-mail: [email protected]. Web: http://www.isi.edu/~carl/
Dr. Rajkumar Buyya
Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, 111 Barry Street, Carlton, Melbourne, VIC 3053, Australia. E-mail: [email protected]. Web: http://www.buyya.com
Dr. Ramin Yahyapour
Computer Engineering Institute, University of Dortmund, 44221 Dortmund, Germany. E-mail: [email protected]
Appendix – IV
List of Recent and Forthcoming conferences
1. 8th International Conference on Parallel and Distributed Computing and Networks (PDCN 2005), February 15-17, 2005, Innsbruck, Austria. http://www.iasted.org/conferences/2005/innsbruck/pdcn.htm
2. 6th International Workshop on Distributed Computing (IWDC 2004), December 27-30, 2004, Kolkata, India. http://www.isical.ac.in/~iwdc2004
3. 12th International Conference on Advanced Computing and Communications (ADCOM 2004), December 15-18, 2004, Ahmedabad, India. http://ewh.ieee.org/r10/gujarat/adcom2004
4. 16th International Conference on Parallel and Distributed Computing and Systems (PDCS 2004), November 8-11, 2004, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA. http://www.iasted.org/conferences/2004/cambridge/pdcs.htm
5. 6th IEEE International Conference on Cluster Computing (Cluster 2004), September 20-23, 2004, San Diego, California, USA. http://grail.sdsc.edu/cluster2004
6. International Conference on Embedded and Ubiquitous Computing (EUC 2004), August 26-28, 2004, Aizu, Japan. http://oscar.u-aizu.ac.jp/EUC2004
7. 4th IEEE International Conference on Peer-to-Peer Computing (P2P 2004), August 25-27, 2004, Zurich, Switzerland. http://femto.org/p2p2004
8. 18th International Symposium on High Performance Computing Systems and Applications (HPCS 2004), May 16-19, 2004, Winnipeg, Manitoba, Canada. http://www.cs.umanitoba.ca/~hpcs04
Appendix – V
Cross Referencing Graph
Representation of the graph: each node contains an author's name (X), and an edge labelled "Referred By:" connects that node to the names of the authors who refer to X.
Appendix – VI
Email Sent to Researchers
From: Henan Zhao <[email protected]>
Subject: Re: Seeking help on your paper: A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems
Date: Thu, 17 Jun 2004 11:28:53 +0100
To: Aggarwal M <[email protected]>
Hello Aggarwal
Sorry about the incorrect link on my web page. There are some problems with that, I will sort it out soon.
The attached file is a copy of the HCW paper. You also can find it in the proceeding of IPDPS'04 conference.
Good luck with your survey and your study.
Cheers
Henan
Aggarwal M wrote:
Dear Henan Zhao,
I am a Master's Student at the University of Windsor, Canada. My research area is Grid Scheduling Techniques. I am conducting a survey on Grid Scheduling. I am interested in reading your paper "A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems".
However I am not able to get a copy. Can you please send it to me electronically?
I am highly thankful to you for your time in reading my mail.
Thanking you again,
Mona Aggarwal
Graduate Student
School of Computer Science
University of Windsor
Attached File: paper.pdf (205Kbytes)