Post on 21-Dec-2015
Quick Tutorial on MPICH for NIC-Cluster
CS 387 Class Notes
NIC CLUSTER OVERVIEW
Start page: http://hpc.mst.edu/
Node Allocation and Usage policy: http://hpc.mst.edu/accessandpolicies/
The Shared NIC Cluster Hardware and Software
The NIC cluster has 64-bit nodes (http://hpc.mst.edu/hardware/) connected by an Ethernet network, and eventually an InfiniBand interconnect, with the following standard software suite:
• The Torque/PBS scheduler.
• Compilers: GCC, Intel-9, Intel-10, and Intel-11 Compiler Suites.
• Applications and libraries listed at http://hpc.mst.edu/applications/
InfiniBand offers point-to-point bidirectional serial links intended for the connection of processors with high-speed peripherals such as disks. InfiniBand also offers multicast operations.
Cluster pictures
PBS Job Scripts
NIC cluster uses PBS (Portable Batch System)
Why?
• Improves overall system efficiency
• Gives all users fair access, since it maintains a scheduling policy
• Provides protection against dead nodes
How PBS works
User writes a batch script for the job and submits it to PBS with the qsub command.
PBS places the job into a queue based on its resource requests and runs the job when those resources become available.
The job runs until it either completes or exceeds one of its resource request limits.
PBS copies the job’s output into the directory from which the job was submitted and optionally notifies the user via email that the job has ended.
Step 1: Login
Off-campus machine: connect to campus using the MST VPN, then
> ssh nic.mst.edu
On-campus machine (VPN is not required):
> ssh nic.mst.edu
Use the following to set up your MPI path correctly.
$ module load openmpi/gnu
To make this your default, run
$ savemodules
Visit http://hpc.mst.edu/examples/openmpi/c/ to get information about Open MPI
No DFS Support
DFS files are not directly accessible from the cluster.
Use the sftp command to transfer any files from your DFS space (S: drive), e.g.
> sftp minersftp.mst.edu
> get X.c
> quit
You may also use WinSCP on Windows or Fugu on OS X.
Step 2: Compile MPICH Programs
Syntax:
C   : mpicc -o hello hello.c
C++ : mpiCC -o hello hello.cpp
Note: Before compiling, make sure the MPICH path is set, or use the export command as shown below:
export PATH=/opt/mpich/gnu/bin:$PATH
Executable file: hello (the name given after -o in the compile commands)
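The export line in the note above grows PATH every time it is run. A minimal sketch of a repeatable version follows, assuming /opt/mpich/gnu/bin is the MPICH location given in the note; path_prepend is a hypothetical helper name, not something the cluster provides.

```shell
# Prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper name, not a cluster utility.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;     # otherwise put it first so it wins lookups
    esac
    export PATH
}

path_prepend /opt/mpich/gnu/bin

# Report which mpicc the shell would now pick up, if any.
command -v mpicc || echo "mpicc not found on PATH"
```

Placing the call in a shell startup file would then be harmless even if the file is sourced several times.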
Step 3: Write PBS batch script file
Ex1: A simple script file (pbs_script)
A job named “HELLO” requests 8 nodes and at most 15 minutes of runtime.
#!/bin/bash
#PBS -N HELLO
#PBS -l walltime=0:15:00
#PBS -l nodes=8
#PBS -q @nic-cluster.mst.edu
mpirun -n 8 /nethome/users/ercal/MPI/hello
Some PBS Directive options
• -N jobname (name the job “jobname”)
• -q @nic-cluster.mst.edu (the cluster address to send the job to)
• -e errfile (redirect standard error to a file named errfile)
• -o outfile (redirect standard output to a file named outfile)
• -j oe (combine standard output and standard error)
• -l walltime=N (request a walltime of N in the form hh:mm:ss)
• -l cput=N (request N sec of CPU time; or in the form hh:mm:ss)
• -l mem=N[KMG][BW] (request a total of N {kilo|mega|giga}{bytes|words} of memory on all requested processors together)
• -l nodes=N:ppn=M (request N nodes with M processors per node)
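To see how several of these directives fit together in one file, here is a small sketch that prints a batch script from a few parameters. The function name make_pbs_script and its argument order are made up for illustration; only the #PBS options themselves come from the list above.

```shell
# Print a PBS batch script to standard output.
# make_pbs_script and its argument order are hypothetical, for illustration.
# Usage: make_pbs_script JOBNAME NODES PPN WALLTIME COMMAND...
make_pbs_script() {
    mps_name=$1 mps_nodes=$2 mps_ppn=$3 mps_wall=$4
    shift 4                       # the remaining arguments form the command
    cat <<EOF
#!/bin/bash
#PBS -N ${mps_name}
#PBS -l walltime=${mps_wall}
#PBS -l nodes=${mps_nodes}:ppn=${mps_ppn}
#PBS -j oe
#PBS -q @nic-cluster.mst.edu
$*
EOF
}

# Example: 2 nodes x 2 processors, 15 minutes, output and error combined.
make_pbs_script HELLO 2 2 0:15:00 mpirun -n 4 /nethome/users/ercal/MPI/hello
```

Redirecting the output to a file (for example, > pbs_script) would give something that can be handed to qsub.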
Step 3.1: Submit a Job
Use PBS command qsub
Syntax:
qsub pbs-job-filename
Example:
> qsub pbs_script
returns the message
555.nic-p1.srv.mst.edu
(555 is the job ID that PBS automatically assigns to your job)
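Since tools such as qstat and qdel take the job ID, it can be handy to pull the number out of the message that qsub prints. A minimal sketch, using the sample message from these notes; job_number is a hypothetical helper name.

```shell
# Extract the numeric job ID from the identifier that qsub prints.
# job_number is a hypothetical helper name.
job_number() {
    printf '%s\n' "${1%%.*}"     # drop everything from the first dot onward
}

full_id="555.nic-p1.srv.mst.edu"   # sample qsub message from the notes
job_number "$full_id"              # prints 555

# On the cluster, the identifier could be captured at submission time:
#   full_id=$(qsub pbs_script)
```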
Result after job completion
An error file and an output file are created. The names are usually of the form:
jobfilename.o(jobid)
jobfilename.e(jobid)
Ex:
simplejob.e555: contains STDERR
simplejob.o555: contains STDOUT
With -j oe, standard output and standard error are combined into a single file.
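The naming pattern can be written down as a small helper, which is useful when a script needs to find the results afterwards. This sketch assumes the prefix is the job name (the value given with -N, or the script file name when -N is absent); pbs_output_files is a hypothetical name.

```shell
# Build the default output and error file names for a finished job.
# Assumes the prefix is the job name (set with -N, or the script file name
# when -N is absent); pbs_output_files is a hypothetical helper name.
pbs_output_files() {
    pof_id=${2%%.*}              # accept "555" or "555.nic-p1.srv.mst.edu"
    printf '%s.o%s\n%s.e%s\n' "$1" "$pof_id" "$1" "$pof_id"
}

pbs_output_files simplejob 555.nic-p1.srv.mst.edu
# prints:
#   simplejob.o555
#   simplejob.e555
```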
Ex2: Another sample batch script (pbs_script)
#PBS -N hello
#PBS -l mem=200mb
#PBS -l walltime=0:15:00
#PBS -l nodes=2:ppn=2
#PBS -j oe
#PBS -m abe
mpirun -n 8 /nethome/users/ercal/MPI/hello
This job “hello” requests 15 minutes of wall-time, 2 nodes with 2 processors each (4 processors in total), and 200MB of memory (100MB per node; 50MB per processor). The output and error are written to one file (-j oe), and -m abe requests email when the job aborts, begins, and ends.
How many processes are created?
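One way to approach the question is to compare what the resource request grants with what mpirun is asked to start. The sketch below does that arithmetic for Ex2. The helper name allocated_procs is hypothetical, and the default of one processor per node when ppn is omitted is an assumption. Whether mpirun actually accepts more processes than processors depends on the MPI implementation and its settings.

```shell
# Compare the processors granted by "-l nodes=N:ppn=M" with the process
# count given to "mpirun -n". Helper names are hypothetical.
allocated_procs() {
    ap_nodes=${1%%:*}                    # "2:ppn=2" -> "2"
    case $1 in
        *:ppn=*) ap_ppn=${1##*ppn=} ;;   # "2:ppn=2" -> "2"
        *)       ap_ppn=1 ;;             # assumption: ppn defaults to 1
    esac
    echo $((ap_nodes * ap_ppn))
}

slots=$(allocated_procs 2:ppn=2)   # resource request from Ex2
requested=8                        # value passed to mpirun -n in Ex2
echo "allocated processors: $slots, MPI processes requested: $requested"
if [ "$requested" -gt "$slots" ]; then
    echo "oversubscribed: about $((requested / slots)) processes per processor"
fi
```

Inside a running Torque job, the granted processors are also listed one per line in the file named by $PBS_NODEFILE, so counting its lines gives the same total.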
Tools
• qstat (jobid), qstat -u (userid), qstat -a
This command returns the status of the specified job(s).
• qdel (jobid)
This command deletes a job.
• size executable_file_name
gives output in the following format:
text     data   bss      dec      hex     filename
1650312  71928  6044136  7766376  768168  hello
(This can help to check memory requirements before submitting a job)
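As a rough aid for choosing a -l mem value, the text, data, and bss columns can be added up and converted to KiB. The sketch below uses the sample size output from these notes; static_kib is a hypothetical helper name. The result covers only the static part of the program, so memory allocated at run time (heap and stack) has to be added on top.

```shell
# Sum the text, data, and bss columns of "size" output and report the
# static footprint in KiB. static_kib is a hypothetical helper name.
static_kib() {
    # Skip the header line; columns 1-3 are text, data, and bss in bytes.
    awk 'NR > 1 { printf "%d\n", ($1 + $2 + $3) / 1024 }'
}

# Sample output copied from the notes; on the cluster: size hello | static_kib
static_kib <<'EOF'
   text    data     bss     dec     hex filename
1650312   71928 6044136 7766376  768168 hello
EOF
# prints 7584
```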
Tips for programming in MPICH
Use compiler flags for faster, cleaner code. Some of them are:
• -O2 (moderate optimization)
• -funroll-loops (enables loop unrolling optimizations)
• -Wall (enables all common warnings)
• -ansi (enables ANSI C/C++ compliance)
• -pedantic (enables strictness of language compliance)
Avoid using pointers in your program, unless absolutely necessary.
No scanf allowed in MPICH (fscanf is allowed). Instead, pass input to your program using the command line arguments argc and argv, supplied on the mpirun command in the PBS script.
Example:
mpirun -n 8 /nethome/users/ercal/MPI/cpi …arguments…
To give your job a high priority, set
• wall-time ≤ 15 minutes (#PBS -l walltime=0:15:00), and
• number of nodes ≤ 32 (#PBS -l nodes=16:ppn=2)
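The two limits can be checked before submitting. This is only a sketch: the helper names are hypothetical, and it compares the node count N from nodes=N against 32, which is an assumption, since the notes do not say whether the limit counts nodes or processors.

```shell
# Check a request against the high-priority limits from the notes:
# walltime <= 15 minutes and nodes <= 32. Helper names are hypothetical.
walltime_seconds() {
    # Split hh:mm:ss on colons; awk treats each field as a decimal number.
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

is_high_priority() {   # usage: is_high_priority WALLTIME NODES
    [ "$(walltime_seconds "$1")" -le 900 ] && [ "$2" -le 32 ]
}

if is_high_priority 0:15:00 16; then
    echo "0:15:00 on 16 nodes qualifies for high priority"
fi
```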
Tips (cont.)