Faucets Tutorial. Presented by Esteban Pauli and Greg Koenig, Parallel Programming Lab, UIUC. Charm++ Workshop, October 19, 2005.


October 19, 2005 Charm++ Workshop, 2005 1

Faucets Tutorial

Presented by Esteban Pauli and Greg Koenig

Parallel Programming Lab, UIUC

Outline

- System Overview
- Cluster Scheduler
- Meta Scheduler
- Writing a Scheduling Strategy


Current Situation

[Diagram: four users and four separate clusters. Caption: "Where should I submit my job?"]

Faucets System

[Diagram: three users, the Faucets System, and three clusters. Caption: "I’ll submit my job to Faucets!!!"]

[Diagram sequence: a user, the Faucets System, and three clusters, shown at each step of a job submission]

1. User submits job to Faucets system (the job requirements travel with the request)
2. Faucets forwards requests to clusters meeting minimum requirements
3. Clusters analyze job specs and return bids
4. User selects winner
5. Winner runs job, user monitors progress
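The five steps above can be sketched in a few lines of C++. This is only an illustration of the bidding flow, not Faucets code: the `JobRequirements`, `Cluster`, and `Bid` types, their fields, and the cost formula are all assumptions made for the example, and the "user" here simply picks the cheapest bid.

```cpp
#include <string>
#include <vector>

// Hypothetical types: the slides do not show the real Faucets data structures.
struct JobRequirements {
    int min_nodes;   // fewest nodes the job can run on
    int memory_mb;   // memory needed per node
};

struct Cluster {
    std::string name;
    int free_nodes;
    int memory_mb;
    double price_per_node;  // illustrative basis for a bid
};

struct Bid {
    std::string cluster;
    double cost;
};

// Steps 2 and 3: forward the request only to clusters that meet the minimum
// requirements, and collect one bid from each of them.
std::vector<Bid> collect_bids(const JobRequirements& req,
                              const std::vector<Cluster>& clusters) {
    std::vector<Bid> bids;
    for (const Cluster& c : clusters) {
        if (c.free_nodes >= req.min_nodes && c.memory_mb >= req.memory_mb) {
            bids.push_back({c.name, c.price_per_node * req.min_nodes});
        }
    }
    return bids;
}

// Step 4: the user selects a winner; in this sketch, the cheapest bid.
// Returns nullptr when no cluster submitted a bid.
const Bid* select_winner(const std::vector<Bid>& bids) {
    const Bid* best = nullptr;
    for (const Bid& b : bids) {
        if (best == nullptr || b.cost < best->cost) best = &b;
    }
    return best;
}
```

A caller would build a `JobRequirements`, pass it with the known clusters to `collect_bids`, and hand the result to `select_winner`; step 5 (running and monitoring the job) is outside the scope of the sketch.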

Outline

- System Overview
- Cluster Scheduler
- Meta Scheduler
- Writing a Scheduling Strategy

System Architecture

[Diagram: System Architecture. Components shown: two Users, the Central Server, a Database, and a Cluster with its Cluster Daemon and Scheduler.]

Why a New Scheduler?

- Current schedulers try to maximize throughput – good for showing cluster is busy, but can be bad for users
- Users worry about deadlines, priorities, fairness, etc.
- Profit centers worry about profits
- Need good interface between cluster scheduler and meta scheduler
- Want scheduler that can leverage run-time system (Charm++ checkpoint/restart, shrink/expand)

Scheduler Design

[Diagram: Scheduler Design. Components shown: Scheduler, Database, Request Server, Job Monitor, Cluster Monitor, Strategy, Job scheduler, User, and Cluster.]

Installation

- Install MySQL database
- Install Charm++, MPI
- ./configure; (edit config file); make; make install
- sh sqlwriter.sh | mysql --user=root -p
- Set up node list
- ./startScheduler
- Done!!!

Job Submission

- ufrun (uni-processor), frun (Charm++), frun-mpi (MPI) used for interactive jobs
- ufsub, fsub, fsub-mpi used for batch jobs
- Options: +n, +p, +ppn, -stdout, …
- See manual for full details

Job Control

- Job monitoring: fjobs
  - Options: -user <name>, -full, id
  - Sample output:

    UserId  JobId  # Nodes  Head Node  Status   Name    Time Remaining
    ------  -----  -------  ---------  ------   ----    --------------
    fuser   63503  1        arch017.c  RUNNING  my_exe  0:06:39:28

- Job deletion: fkill
  - Options: -u <name>, id

Outline

- System Overview
- Cluster Scheduler
- Meta Scheduler
- Writing a Scheduling Strategy

Disclaimer

- Current code base developed as a proof of concept
  - Code is not yet production quality
  - Code works, but has not been tested thoroughly
  - Code has some security issues
- Use at your own risk, and please report bugs
- Code will be updated within the next year


Central Server

- Responsible for keeping all information about users and clusters in system
- Responsible for forwarding users’ job requests to clusters
- Responsible for dispute arbitration
- Responsible for keeping account balances

Central Server: Database

Cluster table: contains information about confederated clusters

    mysql> create table Cluster (
             domainName text not null,
             port int not null,
             status text not null,
             acctId int not null );

Central Server: Database (cont.)

User table: contains information about registered users

    mysql> create table users (
             userid text not null,
             password text not null,
             localCluster text not null,
             acctId int not null );

Central Server: Database (cont.)

Account table: keeps account balances for clusters and users

    mysql> create table accounts (
             clusterid text not null,
             acctId int not null,
             balance int not null,
             pbalance int not null );

Central Server: Database (cont.)

Job table: keeps track of all running and completed jobs

    mysql> create table Jobs (
             JobID text not null,
             User text not null,
             Status text not null,
             ClusterID text not null );

Central Server: Installation

- Compilation & Configuration:
  - cd faucets; make
  - Edit faucets/cs/db.properties
  - Get and install JDBC
- Running the central server:
  - cd faucets
  - java -cp .:/path/to/mm.mysql-2.0.8-bin.jar TheServer
- As users and clusters join, update DB


Cluster Daemon

- Purpose: provide interface between central server and cluster scheduler
- No user intervention
- Installation: cd faucets; make
- Usage:

    cd faucets
    java -cp .:./common/TB.jar cd.ClusterDaemon <CS_hostname> <CS_port> /tmp/


Command Client

- Installation: cd faucets/cc; make
- Some common commands
  - Job submission

    java cc.FaucetCLI <central_server_DNS> <port> <application name> [-input file1,file2,...,filen] [<args>]

  - Retrieving output files

    java FaucetCLI GetFile <job-id> <file-name>

Faucets GUI

[Screenshot: Faucets GUI]


Outline

- System Overview
- Cluster Scheduler
- Meta Scheduler
- Writing a Scheduling Strategy

Faucets Scheduling Framework

- In many ways Faucets can be thought of simply as a framework for creating cluster scheduling solutions
- Any Faucets deployment has some scheduling objective that it tries to achieve
  - Traditional FIFO scheduling
  - On-demand scheduling – driven by workloads and the priorities that users have to access resources
  - Resource bartering – driven on an economic basis (“where can this job be run for the least cost?”)

Schedule Strategies

- The scheduling method used by Faucets can readily be changed by writing a scheduling strategy
- The scheduling strategy can do interesting things that take advantage of features in lower level runtime systems (e.g., Charm++ Adaptive Jobs that shrink/expand to better utilize cluster)
- Scheduling strategy code is a C++ class
  - Implement to reflect the scheduling method
  - Recompile the cluster scheduler executable

Schedule Strategy Examples

- PriorityFIFOStrategy
  - Jobs have assigned priorities
  - Jobs of a given priority are scheduled FIFO
  - Jobs of a higher priority can preempt jobs of a lower priority
- LimitFIFOStrategy
  - Jobs are scheduled FIFO
  - The number of short/long jobs that a given user may be running simultaneously is limited to prevent resource domination
- GanttChartStrategy
  - Jobs are scheduled by arranging them on a Gantt Chart
  - Predictions can be made about whether a given job can be completed before a user-specified deadline
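As a rough illustration of the PriorityFIFOStrategy ordering rule (higher priority first, FIFO within a priority), a stable sort on priority is enough. The `QueuedJob` type and the convention that a larger number means higher priority are assumptions made for this sketch; the slides do not show how Faucets represents priorities, and preemption is not modeled here.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical job record; the real Faucets Job class is not shown in the slides.
struct QueuedJob {
    int id;        // arrival order doubles as an identifier in this sketch
    int priority;  // assumed: larger value means higher priority
};

// Reorder the wait queue so higher-priority jobs come first. std::stable_sort
// keeps the arrival order of jobs with equal priority, which gives the
// "FIFO within a priority" behavior.
void order_priority_fifo(std::vector<QueuedJob>& wait_queue) {
    std::stable_sort(wait_queue.begin(), wait_queue.end(),
                     [](const QueuedJob& a, const QueuedJob& b) {
                         return a.priority > b.priority;
                     });
}
```

The stable sort is the design point: an ordinary `std::sort` gives no guarantee about the relative order of equal-priority jobs, so it could break the FIFO property.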

Implementing a Strategy (1)

- To implement a new strategy
  - Inherit from SchedulingStrategy base class
  - Implement four methods

    class PriorityFIFOStrategy : public SchedulingStrategy
    {
      public:
        PriorityFIFOStrategy (int nodes);

        float is_available (Job *job, Job *wait_queue, Job *run_queue);
        void allocate_nodes (Job *wait_queue, Job *run_queue);
        void addjob (char *username, int num_procs);
        void removejob (char *username, int num_procs);
    };

Implementing a Strategy (2)

    float is_available (Job *job, Job *wait_queue, Job *run_queue);

- Method parameters
  - job – a pointer to an incoming job
  - wait_queue – a pointer to the queue of all waiting jobs
  - run_queue – a pointer to the queue of all running jobs
- Method returns a float
  - 0.0 to 1.0 – the utilization of the cluster if the incoming job is accepted for execution (0% to 100%)
  - -1.0 – the incoming job cannot be accepted for immediate execution
- This method is used (indirectly) by the Cluster Daemon to make the choice of target cluster when doing metascheduling
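A minimal sketch of the return-value contract described above, under stated assumptions. The `Job` layout (including the `next` pointer that makes each queue a singly linked list) is hypothetical, and `total_nodes` is passed as an extra argument only to keep the example self-contained; in the class shown earlier the node count is given to the strategy's constructor instead.

```cpp
// Hypothetical Job layout; the real Faucets Job class is not shown in the slides.
struct Job {
    int min_nodes;        // fewest nodes the job can run on
    int max_nodes;        // most nodes the job can use
    int allocated_nodes;  // nodes currently assigned (0 while waiting)
    Job* next;            // assumed: queues are singly linked lists
};

// Returns the cluster utilization (0.0 to 1.0) that would result from
// accepting `job` on its minimum node count, or -1.0 if the job cannot start
// immediately. Waiting jobs are ignored in this simple version.
float is_available(const Job* job, const Job* /*wait_queue*/,
                   const Job* run_queue, int total_nodes) {
    int busy = 0;
    for (const Job* j = run_queue; j != nullptr; j = j->next) {
        busy += j->allocated_nodes;
    }
    if (total_nodes <= 0 || busy + job->min_nodes > total_nodes) {
        return -1.0f;  // not enough free nodes right now
    }
    return static_cast<float>(busy + job->min_nodes) / total_nodes;
}
```

A real strategy would likely also weigh the wait queue, priorities, or deadlines before accepting a job; this version only shows how the two kinds of return value arise.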

Implementing a Strategy (3)

    void allocate_nodes (Job *wait_queue, Job *run_queue);

- Method parameters
  - wait_queue – a pointer to the queue of all waiting jobs
  - run_queue – a pointer to the queue of all running jobs
- Method examines each Job object in the wait_queue and updates it by allocating available cluster nodes to the job so the scheduler can launch it
  - For Charm++ jobs, normally try to allocate max_nodes first and then min_nodes after that to fulfill the request
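The "max_nodes first, then min_nodes" rule might look like the sketch below. As before, the `Job` fields and the linked-list queue are assumptions, `total_nodes` is an added parameter for self-containment, and the function returns the number of nodes left free purely so the result is easy to inspect (the method on the slide returns void). Stopping at the first job that does not fit is one possible FIFO policy, not necessarily what Faucets does.

```cpp
// Hypothetical Job layout; the real Faucets Job class is not shown in the slides.
struct Job {
    int min_nodes;        // fewest nodes the job can run on
    int max_nodes;        // most nodes the job can use
    int allocated_nodes;  // nodes assigned by the strategy (0 while waiting)
    Job* next;            // assumed: queues are singly linked lists
};

// Walk the wait queue in order. For each job, try to grant max_nodes; if that
// does not fit, fall back to min_nodes; if neither fits, stop so that later
// jobs do not jump ahead. Returns the number of nodes still free.
int allocate_nodes(Job* wait_queue, const Job* run_queue, int total_nodes) {
    int free_nodes = total_nodes;
    for (const Job* j = run_queue; j != nullptr; j = j->next) {
        free_nodes -= j->allocated_nodes;
    }
    for (Job* j = wait_queue; j != nullptr; j = j->next) {
        if (j->max_nodes <= free_nodes) {
            j->allocated_nodes = j->max_nodes;
        } else if (j->min_nodes <= free_nodes) {
            j->allocated_nodes = j->min_nodes;
        } else {
            break;  // head of the queue cannot start yet
        }
        free_nodes -= j->allocated_nodes;
    }
    return free_nodes;
}
```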

Implementing a Strategy (4)

    void addjob (char *username, int num_procs);
    void removejob (char *username, int num_procs);

- Method parameters
  - username – the user submitting a new job into the cluster
  - num_procs – the number of processors the new job is allocated
- These methods can be used to enforce some characteristic about the number of processors allocated to any given user (e.g., limit the total number of processors that any given user is allocated)
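One way to use these two hooks for a per-user processor cap is a small bookkeeping class like the one below. The class name `UserLimitTracker`, the `within_limit` query, and the fixed per-user limit are all hypothetical; the slides only state that addjob and removejob receive a username and a processor count.

```cpp
#include <map>
#include <string>

// Hypothetical helper: tracks processors in use per user so a strategy can
// refuse jobs that would push a user over a fixed cap.
class UserLimitTracker {
public:
    explicit UserLimitTracker(int max_procs_per_user)
        : limit_(max_procs_per_user) {}

    // Called when a job is launched for `username` on `num_procs` processors.
    void addjob(const char* username, int num_procs) {
        in_use_[username] += num_procs;
    }

    // Called when one of the user's jobs finishes or is killed.
    void removejob(const char* username, int num_procs) {
        std::map<std::string, int>::iterator it = in_use_.find(username);
        if (it == in_use_.end()) return;
        it->second -= num_procs;
        if (it->second <= 0) in_use_.erase(it);
    }

    // True if the user could take `num_procs` more processors without
    // exceeding the cap.
    bool within_limit(const char* username, int num_procs) const {
        std::map<std::string, int>::const_iterator it = in_use_.find(username);
        int used = (it == in_use_.end()) ? 0 : it->second;
        return used + num_procs <= limit_;
    }

private:
    int limit_;
    std::map<std::string, int> in_use_;
};
```

A strategy could consult `within_limit` from its is_available or allocate_nodes logic and keep the counts current through addjob and removejob.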

Instantiating a Strategy

- The scheduling strategy is instantiated in the Scheduler class constructor located in Scheduler.C

    strategy = new PriorityFIFOStrategy (num_nodes);

- The total number of nodes in the cluster is provided to the strategy’s constructor
- That’s it! The scheduler code calls into your custom Strategy class whenever it needs to make a decision about whether a new job can be scheduled, to allocate nodes to the job, etc.


Questions?


Thanks