MPI CUSTOMIZATION IN ROMA3 SITE Antonio Budano Federico Bitelli.
Post on 28-Mar-2015
Slide 1
MPI CUSTOMIZATION IN ROMA3 SITE
Antonio Budano, Federico Bitelli

Slide 2
MPI in ROMA3
Our CE is a CREAM CE and is also used to manage the local queue (jobs submitted with pbs q ).
Worker nodes are essentially of two types:
  16 blades of 8 cores on an HP system
  8 blades of 16+ cores on a SuperMicro system equipped with Infiniband
The PBS nodes file is therefore composed of lines similar to
    wn-02-01-16.cluster.roma3 np=8 lcgpro
    wn-05-01-01.cluster.roma3 np=16 lcgpro infiniband
Goals:
  Each MPI job must go to Infiniband nodes.
  Local MPI jobs should exactly meet the user's requirements (e.g. #PBS -l nodes=3:ppn=6).
  Publish in grid: MPI-Infiniband MPI-START MPICH2 MPICH2-1.6 OPENMPI OPENMPI-1.4.3 MPICH1

Slide 3
Local Jobs
Maui/PBS did not honor the resource requirements of PBS jobs. When a user asked for #PBS -l nodes=3:ppn=6, the system just gave them the maximum available slots on a single WN (16 in our case).
We fixed this by upgrading Maui to maui-3.3.1 and PBS to version 2.5.5 (on the worker nodes too!).
For both upgrades we configured and compiled from the tar.gz files.

Slide 4
Grid Jobs
We just configured the Torque client on the CE (used to submit grid jobs) to use a submit filter:
    [root@ce-02 ~]# cat /var/spool/torque/torque.cfg
    SUBMITFILTER /var/spool/pbs/submit_filter
http://web-cluster.fis.uniroma3.it/CE02/submit_filter

Slide 5
MPI INFINIBAND
To route MPI jobs to worker nodes with Infiniband, we edited /opt/glite/bin/pbs_submit.sh on the CE and added the line
    [ -z "$bls_opt_mpinodes" ] || echo "#PBS -q mpi_ib" >> $bls_tmp_file
That line routes MPI jobs to the queue mpi_ib. We then told Torque that each job in this queue must go to Infiniband nodes:
    set queue mpi_ib resources_default.neednodes = infiniband

Slide 6
MPISTART PROBLEM
We wanted to use and publish our own version of MPICH2 (compiled for Infiniband):
    MPI-START MPICH2 MPICH2-1.6
To do that, the official manual says you should edit, on the WNs, the files /etc/profile.d/mpi_grid_vars.sh (and /etc/profile.d/mpi_grid_vars.csh) and add
    export MPI_MPICH2_MPIEXEC=/usr/mpi/gcc/mvapich2-1.6/bin/mpiexec
    export MPI_MPICH2_PATH=/usr/mpi/gcc/mvapich2-1.6/
    export MPI_MPICH2_VERSION=1.6
and similar lines in /etc/profile.d/grid-env.sh.
But jobs could not start. After some days of troubleshooting we saw that the problem was in the i2g-mpi-start-0.0.64-1 package, in particular in the file /opt/i2g/etc/mpi-start/mpich2.mpi, which contains some bugs.
The corrected version will be made available on the WIKI pages as soon as possible.

Slide 7
BDII Configuration
Remember to publish the information into the BDII. On the CE, edit /opt/glite/etc/gip/ldif/static-file-Cluster.ldif and add, as appropriate:
    GlueHostApplicationSoftwareRunTimeEnvironment: MPI-START
    GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2
    GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2-1.6
    GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI
    GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI-1.4.3
    GlueHostApplicationSoftwareRunTimeEnvironment: MPI-Infiniband
Then run
    /etc/init.d/bdii restart

Slide 8
MPI STATUS
    [root@ce-02 bin]# ldapsearch -xLLL -h egee-bdii.cnaf.infn.it:2170 -b o=grid '(&(objectClass=GlueHostApplicationSoftware)(GlueSubClusterUniqueID='ce-02.roma3.infn.it'))' GlueHostApplicationSoftwareRunTimeEnvironment |grep MPI
    GlueHostApplicationSoftwareRunTimeEnvironment: MPI-Infiniband
    GlueHostApplicationSoftwareRunTimeEnvironment: MPI-START
    GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2
    GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2-1.6
    GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI
    GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI-1.4.3
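
The check on the MPI STATUS slide is done by eye. One way to automate it is a small filter that reads the ldapsearch output and reports any tag that is not yet published. This is only a sketch: the helper name missing_tags and the variable EXPECTED_TAGS are hypothetical and not part of the site setup. The tag list is copied from the BDII Configuration slide.

```shell
#!/bin/sh
# Tags we expect the BDII to publish (copied from the BDII Configuration slide).
EXPECTED_TAGS="MPI-START MPICH2 MPICH2-1.6 OPENMPI OPENMPI-1.4.3 MPI-Infiniband"

# missing_tags (hypothetical helper): reads ldapsearch output on stdin and
# prints every expected tag that is NOT published, one per line.
missing_tags() {
    published=$(cat)
    for tag in $EXPECTED_TAGS; do
        # -F: match the tag literally; -x: match the whole line,
        # so "MPICH2" is not satisfied by the "MPICH2-1.6" line.
        printf '%s\n' "$published" |
            grep -qxF "GlueHostApplicationSoftwareRunTimeEnvironment: $tag" ||
            printf '%s\n' "$tag"
    done
}
```

Piping the output of the ldapsearch command above into missing_tags prints nothing when every tag is visible in the BDII. Any line it does print names a tag that still has to be added to static-file-Cluster.ldif or has not propagated yet.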