Quick Tutorial on MPICH for NIC-Cluster
CS 387 Class Notes

Posted on 21-Dec-2015



NIC CLUSTER OVERVIEW

Start page: http://hpc.mst.edu/

Node Allocation and Usage policy: http://hpc.mst.edu/accessandpolicies/

The Shared NIC Cluster Hardware and Software

The NIC cluster had 64-bit nodes (http://hpc.mst.edu/hardware/) with an Ethernet network and, eventually, an InfiniBand interconnect. It ran the following standard software suite:

• The Torque/PBS scheduler.
• Compilers: GCC, Intel-9, Intel-10, and Intel-11 Compiler Suites.
• Applications and libraries listed at http://hpc.mst.edu/applications/

InfiniBand offers point-to-point bidirectional serial links intended for the connection of processors with high-speed peripherals such as disks. InfiniBand also offers multicast operations.


PBS Job Scripts

The NIC cluster uses PBS (Portable Batch System).

Why?
• It improves overall system efficiency.
• It gives all users fair access, since it maintains a scheduling policy.
• It provides protection against dead nodes.


How PBS works

The user writes a batch script for the job and submits it to PBS with the qsub command.

PBS places the job into a queue based on its resource requests and runs the job when those resources become available.

The job runs until it either completes or exceeds one of its resource request limits.

PBS copies the job’s output into the directory from which the job was submitted and optionally notifies the user via email that the job has ended.
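The lifecycle above can be followed from the login node by polling qstat until the job is gone. This is a minimal sketch, assuming Torque's "qstat jobid" exits with a non-zero status once the server no longer knows the job; if your server keeps finished jobs listed for a while, the loop keeps waiting until they are purged. The function name wait_for_job and the 30-second default interval are hypothetical.

```shell
# Poll PBS until a job is no longer listed, then report it.
# Assumption: "qstat <jobid>" exits non-zero once the job is unknown to the server.
wait_for_job() {
    job_id=$1              # the ID printed by qsub, e.g. 555.nic-p1.srv.mst.edu
    interval=${2:-30}      # seconds between checks (hypothetical default)
    while qstat "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "job $job_id has left the queue"
}
```

Usage: wait_for_job 555.nic-p1.srv.mst.edu, then look for the job's output files in the directory you submitted from.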


Step 1: Login

Off-campus machine: connect to campus using the MST VPN, then

> ssh nic.mst.edu

On-campus machine (VPN is not required):

> ssh nic.mst.edu

Use the following command to set up your MPI path correctly:

$ module load openmpi/gnu

To make this your default, run:

$ savemodules

Visit http://hpc.mst.edu/examples/openmpi/c/ to get information about Open MPI.
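After loading the module, it is worth confirming that the compiler wrapper and the launcher used in these notes can actually be found. A minimal sketch; check_mpi_env is a hypothetical helper that only looks for mpicc and mpirun on PATH.

```shell
# Report whether the MPI tools used in these notes can be found on PATH.
check_mpi_env() {
    status=0
    for tool in mpicc mpirun; do
        # command -v prints the resolved path and fails if the tool is missing
        if path=$(command -v "$tool"); then
            echo "$tool: $path"
        else
            echo "$tool: not found (try: module load openmpi/gnu)" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: run check_mpi_env after module load; a non-zero exit status means at least one tool is missing.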


No DFS Support

DFS files are not directly accessible on the cluster. Use the sftp command to transfer files from your DFS space (S: drive), e.g.:

> sftp minersftp.mst.edu
> get X.c
> quit

You may also use WinSCP on Windows or Fugu on OS X.
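For repeated transfers, the same session can be run non-interactively with sftp's batch mode: -b reads commands from a file, and -b - reads them from standard input. Batch mode needs non-interactive authentication (for example SSH keys), so this is a sketch under that assumption; the helper names dfs_batch and fetch_from_dfs are hypothetical, and file names containing spaces are not handled.

```shell
# Build an sftp batch: one "get" per requested file, then quit.
dfs_batch() {
    for f in "$@"; do
        echo "get $f"
    done
    echo "quit"
}

# Fetch files from DFS space in one non-interactive sftp session.
# Assumption: key-based login to minersftp.mst.edu is already set up.
fetch_from_dfs() {
    dfs_batch "$@" | sftp -b - minersftp.mst.edu
}
```

Usage: fetch_from_dfs X.c copies X.c into the current directory on the cluster.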


Step 2: Compile MPICH Programs

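Syntax:

C:   mpicc -o hello hello.c
C++: mpiCC -o hello hello.cpp

Here -o hello names the executable file that the compiler produces.

Note: Before compiling, make sure the MPI compiler wrappers are on your PATH (e.g., with module load openmpi/gnu as in Step 1), or add them with an export command like the one below:

export PATH=/opt/mpich/gnu/bin:$PATH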
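The two compile lines differ only in the wrapper, so a small helper can pick mpicc or mpiCC from the file extension and name the executable after the source file. A sketch; compile_mpi is a hypothetical name, and it handles only .c and .cpp sources.

```shell
# Compile an MPI source file, choosing the wrapper by extension.
# The executable is named after the source file without its extension.
compile_mpi() {
    src=$1
    exe=${src%.*}                 # hello.c -> hello
    case $src in
        *.c)   mpicc -o "$exe" "$src" ;;
        *.cpp) mpiCC -o "$exe" "$src" ;;
        *)     echo "unsupported source: $src" >&2; return 1 ;;
    esac
}
```

Usage: compile_mpi hello.c runs mpicc -o hello hello.c.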


Step 3: Write PBS batch script file

Ex1: A simple script file (pbs_script)

A job named “HELLO” requests 8 nodes and at most 15 minutes of runtime.

#!/bin/bash
#PBS -N HELLO
#PBS -l walltime=0:15:00
#PBS -l nodes=8
#PBS -q @nic-cluster.mst.edu

mpirun -n 8 /nethome/users/ercal/MPI/hello


Some PBS Directive options

• -N jobname (name the job “jobname”)
• -q @nic-cluster.mst.edu (the cluster address to send the job to)
• -e errfile (redirect standard error to a file named errfile)
• -o outfile (redirect standard output to a file named outfile)
• -j oe (combine standard output and standard error)
• -l walltime=N (request a wall time of N in the form hh:mm:ss)
• -l cput=N (request N seconds of CPU time, or N in the form hh:mm:ss)
• -l mem=N[KMG][BW] (request a total of N {kilo|mega|giga}{bytes|words} of memory across all requested processors together)
• -l nodes=N:ppn=M (request N nodes with M processors per node)
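These directives can also be combined into a job script programmatically, which keeps the process count given to mpirun consistent with the nodes and ppn request. A minimal sketch; make_pbs_script is a hypothetical helper, and it assumes you want one MPI process per requested processor and combined output (-j oe).

```shell
# Print a PBS job script for an MPI executable.
# Arguments: job name, walltime (hh:mm:ss), nodes, processors per node, executable.
make_pbs_script() {
    name=$1 walltime=$2 nodes=$3 ppn=$4 exe=$5
    cat <<EOF
#!/bin/bash
#PBS -N $name
#PBS -l walltime=$walltime
#PBS -l nodes=$nodes:ppn=$ppn
#PBS -j oe

mpirun -n $((nodes * ppn)) $exe
EOF
}
```

Usage: make_pbs_script HELLO 0:15:00 2 2 /nethome/users/ercal/MPI/hello > pbs_script, then qsub pbs_script. The generated mpirun line uses -n 4 because 2 nodes × 2 processors = 4.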


Step 3.1: Submit a Job

Use the PBS command qsub.

Syntax: qsub pbs-job-filename

Example:
> qsub pbs_script

returns the message
555.nic-p1.srv.mst.edu

(555 is the job ID that PBS automatically assigns to your job.)
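In a shell script you can capture the ID that qsub prints and strip the server suffix to get the numeric part used in the output file names. A sketch; job_number is a hypothetical helper.

```shell
# "555.nic-p1.srv.mst.edu" -> "555": keep everything before the first dot.
job_number() {
    echo "${1%%.*}"
}

# Usage (on the cluster):
#   full_id=$(qsub pbs_script)
#   echo "submitted job $(job_number "$full_id")"
```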


Result after job completion

An error file and an output file are created. Their names are usually of the form:

jobfilename.o(jobid)
jobfilename.e(jobid)

Ex: simplejob.o555 contains STDOUT; simplejob.e555 contains STDERR.

If the script uses -j oe (combine standard output and standard error), both streams are written to the single .o file.
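The naming rule can be written as a small helper that lists the files to look for once a job finishes. A sketch; pbs_output_files is hypothetical, and its third argument says whether the script used -j oe.

```shell
# List the files PBS leaves in the submission directory for a finished job.
# Arguments: job name, numeric job ID, and "yes" if the script used -j oe.
pbs_output_files() {
    name=$1 id=$2 joined=${3:-no}
    echo "$name.o$id"            # standard output (plus stderr when joined)
    if [ "$joined" != "yes" ]; then
        echo "$name.e$id"        # standard error
    fi
}
```

Usage: pbs_output_files simplejob 555 prints simplejob.o555 and simplejob.e555.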


Ex2: Another sample batch script (pbs_script)

#PBS -N hello
#PBS -l mem=200mb
#PBS -l walltime=0:15:00
#PBS -l nodes=2:ppn=2
#PBS -j oe
#PBS -m abe

mpirun -n 8 /nethome/users/ercal/MPI/hello

This job “hello” requests 15 minutes of wall time, 2 nodes with 2 processors each (4 processors in total), and 200MB of memory (100MB per node; 50MB per processor). The output and error are written to one file (-j oe), and email is sent when the job aborts, begins, or ends (-m abe).

How many processes are created?
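To work through the question, compare the processor slots the job reserves (nodes × ppn) with the process count given to mpirun -n, and split the memory request the same way the text does. A sketch; check_request is hypothetical and only does the arithmetic. How a launcher treats more processes than reserved processors depends on the MPI implementation and its settings.

```shell
# Summarize a PBS request: processor slots and memory per node / per processor.
# Arguments: nodes, ppn, total memory in MB (as in -l mem=...mb), mpirun -n value.
check_request() {
    nodes=$1 ppn=$2 mem_mb=$3 nprocs=$4
    slots=$((nodes * ppn))
    # integer division, matching the 100MB / 50MB split in the text
    echo "processors=$slots mem_per_node=$((mem_mb / nodes))MB mem_per_proc=$((mem_mb / slots))MB"
    if [ "$nprocs" -gt "$slots" ]; then
        echo "warning: mpirun -n $nprocs exceeds the $slots reserved processors" >&2
    fi
}
```

Usage: check_request 2 2 200 8 prints processors=4 mem_per_node=100MB mem_per_proc=50MB and a warning on standard error, because mpirun -n 8 asks for 8 processes while the job reserves 4 processors.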


Tools

• qstat (jobid), qstat -u (userid), qstat -a
These commands return the status of the specified job(s).

• qdel (jobid)
This command deletes a job.

• size executable_file_name
prints output in the following format:

   text    data     bss     dec     hex filename
1650312   71928 6044136 7766376  768168 hello

(This can help you check memory requirements before submitting a job.)
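The dec column is text + data + bss (1650312 + 71928 + 6044136 = 7766376, which is 768168 in hex), i.e. the program's statically allocated size; it does not include memory allocated at run time on the heap or stack. A sketch that extracts this number from size's output; static_bytes is a hypothetical helper and assumes the default format shown above.

```shell
# Read size output (header line + one line per file) and print each
# file name with text + data + bss, its static memory footprint in bytes.
static_bytes() {
    awk 'NR > 1 { print $6, $1 + $2 + $3 }'
}
```

Usage: size hello | static_bytes prints hello 7766376 for the output above.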


Tips for programming in MPICH

Use compiler flags for faster and cleaner code. Some useful ones are:

• -O2 (moderate optimization)
• -funroll-loops (enables loop-unrolling optimizations)
• -Wall (enables all common warnings)
• -ansi (enables ANSI C/C++ compliance)
• -pedantic (enforces strict language compliance)
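Applied to the hello example, the flags are passed to the MPI compiler wrapper, which forwards them to the underlying compiler. A sketch; OPT_FLAGS and build_optimized are hypothetical names. -ansi and -pedantic are left out here and can be appended if your code is written to strict ANSI C.

```shell
# Flags from the list above (optimization plus warnings).
OPT_FLAGS="-O2 -funroll-loops -Wall"

# Compile an MPI C source file with those flags; the executable gets the
# source file's name without the .c extension.
build_optimized() {
    src=$1
    # $OPT_FLAGS is deliberately unquoted so each flag becomes its own argument.
    mpicc $OPT_FLAGS -o "${src%.c}" "$src"
}
```

Usage: build_optimized hello.c runs mpicc -O2 -funroll-loops -Wall -o hello hello.c.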

Avoid using pointers in your program, unless absolutely necessary.


Do not use scanf in MPICH programs (fscanf is allowed); a batch job has no interactive keyboard input. Instead, pass input to your program as command-line arguments, read through argc and argv, on the mpirun (or mpiexec) line of the PBS script.

Example: mpirun -n 8 /nethome/users/ercal/MPI/cpi …arguments…
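One way to vary the arguments without editing the job script is qsub's -v option, which exports variables into the job's environment; the script then forwards them to mpirun as ordinary command-line arguments that the program reads from argv. A sketch; the variable name INTERVALS and the helper submit_cpi are hypothetical, and what cpi does with its argument depends on your program.

```shell
# In pbs_script, forward the variable as a command-line argument:
#   mpirun -n 8 /nethome/users/ercal/MPI/cpi "$INTERVALS"

# On the login node, submit the same script with a different input value.
submit_cpi() {
    qsub -v "INTERVALS=$1" pbs_script
}
```

Usage: for n in 1000 100000; do submit_cpi "$n"; done submits one job per input value.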

To give your job a high priority, set:

• wall time ≤ 15 minutes (#PBS -l walltime=0:15:00), and
• number of nodes ≤ 32 (e.g., #PBS -l nodes=16:ppn=2, i.e., 16 nodes × 2 processors = 32 processors)
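Both conditions can be checked before submitting. A sketch; walltime_seconds and is_high_priority are hypothetical helpers, the wall time must be given in hh:mm:ss form, and the 32 limit is applied to total processors (nodes × ppn), the stricter reading, since the example nodes=16:ppn=2 requests exactly 32 processors.

```shell
# Convert hh:mm:ss to seconds (awk avoids the shell's octal reading of "08").
walltime_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if walltime <= 15 minutes and nodes * ppn <= 32.
# Arguments: walltime (hh:mm:ss), nodes, processors per node.
is_high_priority() {
    secs=$(walltime_seconds "$1")
    procs=$(($2 * $3))
    [ "$secs" -le 900 ] && [ "$procs" -le 32 ]
}
```

Usage: is_high_priority 0:15:00 16 2 succeeds; is_high_priority 0:20:00 16 2 fails.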
