Configuring Resources for the Grid

December 8 & 9, 2005, Austin, TX SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide Configuring Resources for the Grid Jerry Perez Senior Administrator Texas Tech University

Page 1: Configuring Resources for the Grid

December 8 & 9, 2005, Austin, TX
SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide

Configuring Resources for the Grid

Jerry Perez

Senior Administrator
Texas Tech University

Page 2: Configuring Resources for the Grid

Outline
• What is a Job Manager?
• Types of Job Managers
• PBS Pro
• SGE
• LSF
• Condor/Condor-DAGMan
• Rocks + Rolls (Quick overview)

Page 3: Configuring Resources for the Grid

What is a Job Manager?
• A Job Management System is a software component that ensures:
  • Balanced use of cluster resources.
  • Fair allocation of these resources to users' jobs.
  • A process that determines which jobs to run, and when and where to run them.

Page 4: Configuring Resources for the Grid

What is a Job Manager?

Page 5: Configuring Resources for the Grid

Components of a Job Manager

• Resource Management System – a process that maintains the current state of all the resources under its control, including the physical resources of the cluster and account information such as relative priorities and account balances.
• Queuing System – a process that maintains the current state of jobs submitted but not completed.
• Scheduler – a system that assigns jobs to resources.
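
The three components above can be sketched in a few lines of Python. This is only an illustrative model, not code from any of the job managers discussed here; the class names, the `free_cpus` table, and the FIFO placement rule are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpus: int  # processors the job asks for (hypothetical attribute)


@dataclass
class ResourceManager:
    """Tracks the current state of resources: here, free CPUs per node."""
    free_cpus: dict  # node name -> free CPU count

    def allocate(self, node, cpus):
        self.free_cpus[node] -= cpus


@dataclass
class QueuingSystem:
    """Holds jobs that were submitted but have not yet been started."""
    pending: deque = field(default_factory=deque)

    def submit(self, job):
        self.pending.append(job)


def schedule(resources, queue):
    """Assign queued jobs to nodes in FIFO order; stop at the first job that does not fit."""
    placements = []
    while queue.pending:
        job = queue.pending[0]
        node = next(
            (n for n, free in resources.free_cpus.items() if free >= job.cpus),
            None,
        )
        if node is None:
            break  # the head of the queue waits, and nothing behind it jumps ahead
        resources.allocate(node, job.cpus)
        placements.append((queue.pending.popleft().name, node))
    return placements


resources = ResourceManager({"node1": 2, "node2": 4})
queue = QueuingSystem()
for job in (Job("a", 2), Job("b", 4), Job("c", 1)):
    queue.submit(job)
placements = schedule(resources, queue)
print(placements)
```

Job "c" stays in the queue here because both nodes are full once "a" and "b" are placed; a real scheduler would revisit it when resources are released.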

Page 6: Configuring Resources for the Grid

Why do we need a Job Manager?

A Job Management System should always be used for a cluster:
• Operated as a public resource.
• With a large number of users, or users who don't know each other.
• With a large number of nodes and processors that runs a large number of jobs.
• Whose nodes are heterogeneous in terms of memory, speed, number of processors, software licenses, networking, and other features.

Note: Most clusters are homogeneous with respect to hardware and software.

Page 7: Configuring Resources for the Grid

Types of Job Managers

Feature                                                      PBS Pro        SGE            Condor         LSF
Single-process preemptive jobs                               No             Yes            Yes            Yes
Multi-process preemptive jobs                                No             Yes            No             Yes
Single-process interactive jobs                              Yes            Yes            Yes            Yes
Multi-process interactive jobs                               No             Yes            No             Yes
Single-process, preemptive, interactive jobs                 No             Yes            Yes            Yes
Multi-process, preemptive, interactive jobs                  No             Yes            No             Yes
Cost                                                         Free academic  Free academic  Free academic  Commercial
Users’ desktops included in cluster (Cycle Scavenging Grid)  Yes            Yes            Yes            Yes

Page 8: Configuring Resources for the Grid

PBS Pro
Components: PBS Pro is made up of a number of components:
• The server and clients such as user commands.
• A server component manages a number of different objects, such as queues or jobs.
• Each object consists of a number of data items or attributes.
• Scheduling is policy based and operates in a FIFO round-robin type fashion.
• Specific queues can be configured for priority queuing.
• Minimal queue/scheduler configuration.
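
One plausible reading of "FIFO round-robin" is that the scheduler visits each queue in turn and takes the oldest job from it. The sketch below illustrates that reading only; it is not PBS Pro's scheduler, and the queue names and job labels are made up for the example.

```python
from collections import deque


def round_robin_fifo(queues):
    """Visit queues in turn, taking the oldest (first-in) job from each non-empty queue."""
    pending = {name: deque(jobs) for name, jobs in queues.items()}
    order = []
    while any(pending.values()):  # an empty deque is falsy
        for name, jobs in pending.items():
            if jobs:
                order.append((name, jobs.popleft()))
    return order


# Hypothetical queues: jobs are listed in submission order.
queues = {"short": ["s1", "s2", "s3"], "long": ["l1"]}
order = round_robin_fifo(queues)
print(order)
```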

Page 9: Configuring Resources for the Grid

PBS Pro

Page 10: Configuring Resources for the Grid

PBS Pro Graphical User Interface

Page 11: Configuring Resources for the Grid

PBS Pro Graphical User Interface

Page 12: Configuring Resources for the Grid

SGE – Sun Grid Engine
• The SGE version 6 queue configuration allows for a queue to span more than one execution host to provide multiple hosts per queue configuration.
• Uses concept of SGE Master node controlling “pools” of compute clients.
• Can manage up to 10,000 clients per SGE Master node.
• SGE can provide Load Leveling on the fly.
• Scheduling can be policy based or topologically based.
• Addresses the “Backfill” problem. (More on that later.)
• Queue optimization is not automatic. It requires “tuning”.

Page 13: Configuring Resources for the Grid

SGE - Basic Cluster Configuration

• Configured to reflect site dependencies and to influence batch system behavior.

• Site dependencies include valid paths for programs such as mail or xterm.

• A global configuration is provided for the Master Host as well as for every host in the grid engine system pool.

• Can configure the system to use a configuration local to each host to override particular entries in the global configuration.
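
The global/local override described above behaves like a layered lookup: a host's local entries win, and everything else falls through to the global configuration. A minimal sketch of that idea follows; the keys, paths, and host are illustrative assumptions, not values read from a real SGE installation.

```python
def effective_config(global_conf, local_conf):
    """Return the configuration a host sees: global entries, overridden by local ones."""
    merged = dict(global_conf)   # start from the global configuration
    merged.update(local_conf)    # local entries replace matching global entries
    return merged


# Illustrative values only: paths for helper programs such as mail or xterm.
global_conf = {"mailer": "/bin/mail", "xterm": "/usr/bin/X11/xterm"}
local_conf = {"xterm": "/opt/X11/bin/xterm"}  # this host keeps xterm elsewhere

effective = effective_config(global_conf, local_conf)
print(effective)
```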

Page 14: Configuring Resources for the Grid

SGE – Cluster Configuration GUI

Page 15: Configuring Resources for the Grid

SGE – Host Configuration GUI

Page 16: Configuring Resources for the Grid

(L)oad (S)haring (F)acility - LSF

Page 17: Configuring Resources for the Grid

LSF
• Scheduling can be policy based or topologically based.
• Queue optimization is not automatic. It requires “tuning”.
• Topologically based scheduling can use load information to schedule jobs.
• Addresses the “Backfill” problem.
• Jobs in a backfill queue cannot be preempted (a job in a backfill queue might be running in a reserved job slot, and starting a new job in that slot might delay the start of the big parallel job):
  • A backfill queue cannot be preemptable.
  • A preemptive queue whose priority is higher than the backfill queue cannot preempt the jobs in the backfill queue.

Page 18: Configuring Resources for the Grid

LSF - How backfilling works
• LSF assumes that a job will run until its run limit expires.
• Backfill scheduling works most efficiently when all the jobs in the cluster have a run limit.
• Since jobs with a shorter run limit have more chance of being scheduled as backfill jobs, users who specify appropriate run limits in a backfill queue will be rewarded by improved turnaround time.
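
The backfill rule can be reduced to one comparison: a small job may use a reserved slot only if its run limit guarantees it finishes before the reserved job is due to start. The sketch below is a simplified model under that assumption, not LSF's actual algorithm; the function name, the minute-based times, and the treatment of a missing run limit as "never backfill" are choices made for illustration.

```python
from typing import Optional


def can_backfill(now: float, run_limit: Optional[float], reserved_start: float) -> bool:
    """Return True if a job started now is certain to finish before the reservation begins."""
    if run_limit is None:
        return False  # assumption: without a run limit the finish time is unknown
    return now + run_limit <= reserved_start


# Hypothetical scenario: a big parallel job holds a slot reserved to start at minute 60.
now, reserved_start = 0, 60
run_limits = {"short": 30, "long": 90, "unlimited": None}  # minutes
backfillable = [name for name, limit in run_limits.items()
                if can_backfill(now, limit, reserved_start)]
print(backfillable)
```

Only the job with the 30-minute limit fits into the gap, which is why users who state short, accurate run limits tend to see better turnaround.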

Page 19: Configuring Resources for the Grid

LSF - How backfilling works

Page 20: Configuring Resources for the Grid

LSF - How backfilling works

Page 21: Configuring Resources for the Grid

LSF GUI

Page 22: Configuring Resources for the Grid

LSF – Cluster Monitoring GUI

Page 23: Configuring Resources for the Grid

Condor

• Provides a job queuing mechanism
• Scheduling policy
• Priority scheme
• Resource monitoring
• Resource management

Page 24: Configuring Resources for the Grid

• Users submit their serial or parallel jobs to Condor.
• Condor places them into a queue.
• Chooses when and where to run the jobs based upon a policy.
• Carefully monitors their progress.
• Informs the user upon completion.
• Uses FIFO round-robin scheduling out of the box.
• Can use attribute-based scheduling.
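
Attribute-based scheduling means a job is placed only on a machine whose advertised attributes satisfy the job's requirements. The sketch below shows the idea with plain Python dictionaries; it does not use Condor's own requirement syntax, and the attribute names, machine list, and first-match policy are assumptions for illustration.

```python
def match(job_requirements, machines):
    """Return the name of the first machine whose attributes satisfy every requirement."""
    for machine in machines:
        if all(check(machine.get(attr)) for attr, check in job_requirements.items()):
            return machine["name"]
    return None  # no machine qualifies; the job stays queued


# Hypothetical machine attributes.
machines = [
    {"name": "node1", "arch": "x86", "memory_mb": 512},
    {"name": "node2", "arch": "x86", "memory_mb": 2048},
]

# Each requirement is a predicate over one machine attribute.
job_requirements = {
    "arch": lambda value: value == "x86",
    "memory_mb": lambda value: value is not None and value >= 1024,
}

matched = match(job_requirements, machines)
print(matched)
```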

Page 25: Configuring Resources for the Grid

• Condor can be used to build Grid-style computing environments that cross administrative boundaries.

• Condor's "flocking" technology allows multiple Condor compute installations to work together.

• Condor incorporates many of the emerging Grid-based computing methodologies and protocols.

• For instance, Condor-G is fully interoperable with resources managed by Globus.

Page 26: Configuring Resources for the Grid

Condor-DAGMan
• DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for Condor. It manages dependencies between jobs at a higher level than the Condor Scheduler.
• DAGMan is responsible for scheduling, recovery, and reporting for the set of programs submitted to Condor.
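
Managing dependencies with a directed acyclic graph amounts to releasing a job only after all of its parents have finished. The sketch below shows that ordering with Python's standard `graphlib` module (Python 3.9 or later); it is not DAGMan itself, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it may start.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "collect": {"simulate_a", "simulate_b"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()  # also raises CycleError if the graph is not acyclic

waves = []  # each wave holds jobs that could be submitted at the same time
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)  # pretend the whole wave finished successfully

print(waves)
```

The two simulation jobs land in the same wave because neither depends on the other, so a meta-scheduler is free to hand both to the underlying scheduler at once.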

Page 27: Configuring Resources for the Grid

Rocks + Rolls

Page 28: Configuring Resources for the Grid

Rocks + Rolls

The complexity of cluster management (e.g., determining if all nodes have a consistent set of software) often overwhelms part-time cluster administrators, who are usually domain application scientists.

Rocks is a complete clustering solution with a goal to help deliver the computational power of clusters to a wide range of scientific users.

Page 29: Configuring Resources for the Grid

Rocks + Rolls
• Before you install Rocks, be sure you have decided which Rolls you wish to include in your installation.
• You may install whatever you like, but remember that you can choose only one scheduler: LSF, SGE, PBS, or Condor.
• Schedulers do not like being used together due to resource conflicts.

Page 30: Configuring Resources for the Grid

Rocks + Rolls
• Required Rolls:
  • Base
  • Hpc
  • Kernel
  • Web-server
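
The two selection rules from these slides (all required Rolls present, no more than one scheduler) are easy to check before starting an install. The helper below is a hypothetical pre-flight check written for this transcript, not part of Rocks; the lowercase roll names are an assumption based on the names listed in the slides.

```python
REQUIRED_ROLLS = {"base", "hpc", "kernel", "web-server"}
SCHEDULER_ROLLS = {"lsf", "sge", "pbs", "condor"}


def check_rolls(selected):
    """Return a list of problems with a roll selection; an empty list means it looks fine."""
    chosen = {name.lower() for name in selected}
    problems = []
    missing = sorted(REQUIRED_ROLLS - chosen)
    if missing:
        problems.append("missing required rolls: " + ", ".join(missing))
    schedulers = sorted(SCHEDULER_ROLLS & chosen)
    if len(schedulers) > 1:
        problems.append("more than one scheduler: " + ", ".join(schedulers))
    return problems


ok = check_rolls(["Base", "Hpc", "Kernel", "Web-server", "Sge", "Ganglia"])
bad = check_rolls(["Base", "Hpc", "Kernel", "Sge", "Condor"])
print(ok)
print(bad)
```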

Page 31: Configuring Resources for the Grid

Rocks + Rolls

• List of various rolls:
  • Area51 - System security related services and utilities
  • Ganglia - Cluster monitoring system from UCB
  • Grid - Globus 4.0.1 (GT4)
  • Condor Roll
  • Java - Sun Java SDK and JVM
  • Myrinet - Myricom’s Myrinet drivers and MPICH environments
  • Pbs - PBS job queueing system
  • Ninf - Ninf-G, a simple, yet powerful, client-server-based standard RPC mechanism
  • Sge - Sun Grid Engine job queueing system
  • Viz - Support for building visualization clusters
  • LSF - comes with Platform Rocks

Page 32: Configuring Resources for the Grid

Rocks + Rolls

Page 33: Configuring Resources for the Grid

Thank You.

Questions?