
Guide to Building your Linux

High-performance Cluster

Edmund Ochieng

March 2, 2012


Abstract

In the modern day, where computer simulation forms a critical part of research, high-performance clusters have become a necessity in almost every educational or research institution.

This paper aims to give you the instructions you need to set up your own cluster. So if you are looking forward to setting one up, this is the guide for you.

This guide is prepared with climate simulation in mind. However, besides the software required for climate simulation, the steps required to set up the cluster remain more or less the same.

The setup aims to grant you the ability to run modelling, simulation and visualisation applications across multiple processors, probably more than you can have in a single server unit.


Contents

I Master node Configuration

1 Network configuration
1.1 Internal interface configuration
1.2 External interface configuration

2 MAC address acquisition
2.1 System Documentation / Manuals
2.2 Network Traffic Monitoring
2.3 TFTP Configuration

3 DHCP configuration

4 Local Repository

5 EPEL Repository

6 NFS configuration

7 SSH Key Generation Script

II Software and Compiler installation and configuration

8 Torque configuration

9 Maui configuration

10 Compiler Installation
10.1 GCC Compilers
10.2 Intel Compilers

11 OpenMPI installation
11.1 OpenMPI Compiled with GCC Compilers
11.2 OpenMPI Compiled with Intel Compilers

12 Environment Modules installation

13 C3 Tools installation

14 Password Syncing

15 NetCDF, HDF5 and GrADS installation

16 NCL and NCO installation

17 R Statistical package installation

III Computing Node Installation

18 Node OS installation

19 Name resolution


Part I

Master node Configuration


1 Network configuration

1.1 Internal interface configuration

Set the network interface on which the DHCP service will listen for IP address requests to be static and to start on system boot. This should appear similar to the configuration below.

1. With a text editor of your choice, edit your master node's network configuration for the network interface that will be used to communicate with the other nodes in your cluster.

[root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0

# Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet

DEVICE=eth0

#BOOTPROTO=dhcp

BOOTPROTO=static

HWADDR=00:16:36:E7:8B:A3

IPADDR=192.168.10.1

NETMASK=255.255.255.0

ONBOOT=yes

DHCP_HOSTNAME=master.cluster

2. Once the changes have been made, you can save the file and start the interface, as shown below.
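The command itself is not shown in the guide; on CentOS, either of the following brings the interface up (a suggestion, not part of the original text):

[root@master ~]# ifup eth0
[root@master ~]# service network restart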

3. Finally, invoke the ifconfig command to confirm the settings are active, as illustrated below.

[root@master ~]# ifconfig eth0

eth0 Link encap:Ethernet HWaddr 00:16:36:E7:8B:A3

inet addr:192.168.10.1 Bcast:192.168.10.127 Mask:255.255.255.0

UP BROADCAST MULTICAST MTU:1500 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

Interrupt:74 Memory:fdfc0000-fdfd0000

1.2 External interface configuration

The eth1 interface will be connected to the organizational network and will acquire its network configuration via DHCP. So to have the interface working, all that needs to be done is to set the ONBOOT option in /etc/sysconfig/network-scripts/ifcfg-eth1 and connect a cable to the interface; a sketch of such a file follows.
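A minimal sketch of the file, assuming the second interface is named eth1 (an HWADDR line, if present, would carry your own card's address):

# /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=dhcp
ONBOOT=yes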

2 MAC address acquisition

The MAC address acquisition step is important as it allows the master node to uniquely identify the nodes that make up the cluster and, as a result, give them customized configurations.


Each network interface has a unique MAC address, which can be obtained either from the system manuals/documentation or by listening to the network traffic on the master node interface on which the DHCP daemon will be listening.

2.1 System Documentation / Manuals

This could either be printed on the hardware, as is the case on Sun servers and a couple of HP servers I've seen, or in the booklets provided alongside the server. However, this can at times be deceiving. If that is the case, you can always listen on the network to obtain the desired MAC address.

2.2 Network Traffic Monitoring

Using the tcpdump command, we can acquire the hardware interfaces' MAC addresses. For easy identification, power on one node at a time during the MAC address collection process.

From the tcpdump output below, we can identify the network interface MAC address of the first node as 00:1b:24:3d:f1:a3, since the column just before the second "greater than" symbol is 0.0.0.0.68 - which basically means it has no IP address and expects a response on UDP port 68.

[root@master ~]# tcpdump -i eth0 -nn -qtep port bootpc and port bootps \

and ip broadcast

tcpdump: verbose output suppressed, use -v or -vv for full protocol

decode listening on eth0, link-type EN10MB (Ethernet), capture size

96 bytes

00:1b:24:3d:f1:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 590: 0.0.0.0.68 >

255.255.255.255.67: UDP, length 548

00:16:36:e7:8b:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 342: 192.168.10.1

.67 > 255.255.255.255.68: UDP, length 300

Repeat the above process for all nodes to which you would like to issue static IP addresses.

2.3 TFTP Configuration

The TFTP service is essential for a PXE server to work, as it serves a netinstall kernel and a ramdisk to the clients when they attempt a network boot.

By default, tftp, which is managed by xinetd, is disabled. You can enable it by opening the configuration file and changing the value of the "disable" option from yes to no. Your completed configuration file should be similar to the one shown below.

1. Enable tftp which is part of the xinetd stack

[root@master ~]# vi /etc/xinetd.d/tftp

[root@master ~]# cat /etc/xinetd.d/tftp

# default: off

service tftp


{

socket_type = dgram

protocol = udp

wait = yes

user = root

server = /usr/sbin/in.tftpd

server_args = -s /tftpboot

disable = no

per_source = 11

cps = 100 2

flags = IPv4

}

2. Once done, restart the xinetd service so that tftp is started alongside the other services it manages.

[root@master ~]# service xinetd restart

Stopping xinetd: [ OK ]

Starting xinetd: [ OK ]

3. Check that a tftpboot directory has been created in the root of the filesystem, as shown below.

[root@master ~]# file /tftpboot/

/tftpboot/: directory

4. Create a directory tree into which the pxe files shall be placed.

[root@master ~]# mkdir -p /tftpboot/pxe/pxelinux.cfg

5. Copy the netboot kernel image and an initial ramdisk.

[root@master ~]# ls /distro/centos/images/pxeboot/

initrd.img README TRANS.TBL vmlinuz

[root@master ~]# cp /distro/centos/images/pxeboot/{vmlinuz,initrd.img} \
/tftpboot/pxe/

6. Locate the pxelinux.0 file and copy it to the /tftpboot/pxe directory, from where it will be accessible via the tftp daemon.

[root@master ~]# locate pxelinux.0

/usr/lib/syslinux/pxelinux.0

[root@master ~]# cp -av /usr/lib/syslinux/pxelinux.0 /tftpboot/pxe/

‘/usr/lib/syslinux/pxelinux.0’ -> ‘/tftpboot/pxe/pxelinux.0’

NOTE: Keenly note the location of the pxelinux.0 file, as its relative path (i.e. from the tftp root directory - /tftpboot) will be used in the DHCP daemon configuration section.

7. Create a default boot configuration file for machines that may not have a specific boot file in the pxelinux.cfg directory.


[root@master ~]# vi /tftpboot/pxe/pxelinux.cfg/default

[root@master ~]# cat /tftpboot/pxe/pxelinux.cfg/default

# /tftpboot/pxe/pxelinux.cfg/default

prompt 1

timeout 100

default local

label local

LOCALBOOT 0

label install

kernel vmlinuz

append initrd=initrd.img network ip=dhcp lang=en_US keymap=us \

ksdevice=eth0 ks=http://192.168.10.1/ks/node-ks.cfg \

loadramdisk=1 prompt_ramdisk=0 ramdisksize=16384 vga=normal \

selinux=0

8. Get the hexadecimal equivalent of the node's IP address, which is used to create a per-client PXE configuration.

[root@master pxelinux.cfg]# gethostip node01

node01 192.168.10.2 C0A80A02

[root@master pxelinux.cfg]# cp default C0A80A02

9. Copy the default file to a file named with the hex equivalent obtained above. Open the file and change the line "default local" to "default install"; this should commence installation when node01 is rebooted. The same should be done for all other nodes, as sketched below.

[root@master ~]# cp /tftpboot/pxe/pxelinux.cfg/default \
/tftpboot/pxe/pxelinux.cfg/C0A80A02
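For illustration, after the edit the per-node file differs from the default file only in its default label (contents otherwise identical to the file created in step 7):

[root@master ~]# cat /tftpboot/pxe/pxelinux.cfg/C0A80A02
# /tftpboot/pxe/pxelinux.cfg/C0A80A02 - node01 (192.168.10.2)
prompt 1
timeout 100
default install

label local
LOCALBOOT 0

label install
kernel vmlinuz
append initrd=initrd.img network ip=dhcp lang=en_US keymap=us \
ksdevice=eth0 ks=http://192.168.10.1/ks/node-ks.cfg \
loadramdisk=1 prompt_ramdisk=0 ramdisksize=16384 vga=normal \
selinux=0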

3 DHCP configuration

To issue static IP addresses via the DHCP daemon, the network interface hardware (or MAC) addresses collected in the MAC address acquisition section will be necessary.

DHCP daemon configuration for the cluster should be carried out as outlined in the steps below.

1. Enter the name of the interface on which the DHCP daemon will listen.

[root@master ~]# cat /etc/sysconfig/dhcpd

# Command line options here

DHCPDARGS="eth0"

2. Create your DHCP configuration file from the sample file in the location below.


[root@master ~]# cp /usr/share/doc/dhcp-3.0.5/dhcpd.conf.sample \

/etc/dhcpd.conf

cp: overwrite ‘/etc/dhcpd.conf’? y

3. You could edit your configuration to look more or less like mine, issuing addresses to the desired hosts using their MAC addresses as illustrated below.

[root@master ~]# cat /etc/dhcpd.conf

ddns-update-style interim;

ignore client-updates;

allow booting;

allow bootp;

subnet 192.168.10.0 netmask 255.255.255.0 {

# --- default gateway

# option routers 192.168.0.1;

option subnet-mask 255.255.255.0;

# option nis-domain "domain.org";

option domain-name "cluster";

option domain-name-servers 192.168.10.1;

option time-offset 10800; # EAT

# option ntp-servers 192.168.1.1;

# option netbios-name-servers 192.168.1.1;

# range dynamic-bootp 192.168.10.4 192.168.10.20;

default-lease-time 21600;

max-lease-time 43200;

filename "pxe/pxelinux.0";

next-server 192.168.10.1;

# we want the nameserver to appear at a fixed address

host node01 {

hardware ethernet 00:1b:24:3d:f1:a3;

fixed-address 192.168.10.2;

option host-name "node01";

}

host node02 {

hardware ethernet 00:1b:24:3e:05:d1;

fixed-address 192.168.10.3;

option host-name "node02";

}

host node03 {

hardware ethernet 00:1b:24:3e:04:f6;

fixed-address 192.168.10.4;

option host-name "node03";

}

}

4. Finally, save the configuration file and start the server.


[root@master ~]# service dhcpd start

Starting dhcpd: [ OK ]

5. Should starting the DHCP daemon fail, you can look at the logs in /var/log/messages and identify any DHCP-related errors. This could be done with a text editor, but for easier troubleshooting I'd proceed as below.

[root@master ~]# tail -f /var/log/messages
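You could also filter the log for dhcpd messages only, and enable the daemon at boot once it starts cleanly (suggested commands, not in the original text):

[root@master ~]# grep dhcpd /var/log/messages | tail
[root@master ~]# chkconfig dhcpd on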

4 Local Repository

A local repository is crucial in cases of poor Internet connectivity.

1. Create a directory on the system and copy all the contents of the installation disk into it.

[root@master ~]# mkdir -p /distro/centos

[root@master ~]# cp -ar /media/CentOS_5.6_Final/* /distro/centos

2. Create a new repository file that would point to the location created above.

[root@master ~]# cat /etc/yum.repos.d/CentOS-Local.repo

[Local]

name=CentOS-$releasever - Local

baseurl=file:///distro/centos

gpgcheck=0

enabled=1

3. Clear the cache and any other repository information saved locally

[root@master ~]# yum clean all

4. Make a cache of the new available repositories.

[root@master ~]# yum makecache
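As a quick sanity check (not part of the original steps), confirm that the Local repository is now visible:

[root@master ~]# yum repolist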

5 EPEL Repository

The EPEL (Extra Packages for Enterprise Linux) repository was added to facilitate the installation of some of the software needed in the cluster whose installation from source is not a simple process, such as:

1. R - R Statistical package http://www.r-project.org/

2. NCO - NetCDF Operator http://nco.sourceforge.net/

3. CDO - Climate Data Operators

4. NCL - NCAR Command Language http://www.ncl.ucar.edu/Applications/rcm.shtml


5. GrADS - Grid Analysis and Display System http://www.iges.org/

This is done as illustrated below:

[root@master ~]# rpm -Uvh \
http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
Retrieving http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
warning: /var/tmp/rpm-xfer.Ln8ILG: Header V3 DSA signature: NOKEY, key ID 217521f6
Preparing...                ########################################### [100%]
   1:epel-release           ########################################### [100%]

6 NFS configuration

We shall export some of the master node's filesystems over NFS to reduce the need for repetitive configuration.

1. Populate the /etc/exports configuration file with the directories you wish to have exported via NFS.

[root@master ~]# vi /etc/exports

/distro *(ro,root_squash)

/home *(rw,root_squash)

/distro/centos *(ro,root_squash)

/distro/ks *(ro,root_squash)

/opt *(ro,root_squash)

/usr/local *(ro,root_squash)

/scratch *(rw,root_squash)

2. Start the NFS daemon, which should start successfully if your configuration is correct.

[root@master ~]# service nfs start

Starting NFS services: [ OK ]

Starting NFS quotas: [ OK ]

Starting NFS daemon: [ OK ]

Starting NFS mountd: [ OK ]

3. Make the NFS daemon start automatically on system boot, then re-export the shares.

[root@master ~]# chkconfig nfs on

[root@master ~]# exportfs -vra

exporting *:/distro/centos

exporting *:/distro/ks

exporting *:/usr/local

exporting *:/scratch

exporting *:/distro

exporting *:/home

exporting *:/opt


7 SSH Key Generation Script

To allow jobs to be successfully submitted to the cluster, passwordless SSH login should be possible for all users on the cluster. So we need a mechanism that creates a key pair and appends the public key to the authorized_keys file in the .ssh/ directory of each user's home directory.

This shall be automated by the script below, which we place in the system-wide /etc/profile.d directory.

[root@master modulefiles]# cat /etc/profile.d/passwordless-ssh.sh

Listing 1: /etc/profile.d/passwordless-ssh.sh

#!/bin/bash
#
# /etc/profile.d/passwordless-ssh.sh
#

if [ ! -d "${HOME}"/.ssh/ -o ! -f "${HOME}"/.ssh/id_dsa.pub ]
then
    echo -ne "Generating ssh keys:\t"
    ssh-keygen -t dsa -N "" -f "${HOME}"/.ssh/id_dsa

    if [ "$?" -eq 0 ]; then
        echo -e "[\033[32;1m done \033[0m]"
        cat "${HOME}"/.ssh/id_dsa.pub >> "${HOME}"/.ssh/authorized_keys
        chmod -R u+rwX,go= "${HOME}"/.ssh/
    else
        echo -e "[\033[35;1m failed \033[0m]"
    fi
fi


Part II

Software and Compiler installation and configuration


8 Torque configuration

1. Untar the source and execute the configure script with the options shown below.

[root@master src]# tar xvfz torque-2.4.14.tar.gz

[root@master src]# cd torque-2.4.14

[root@master torque-2.4.14]# mkdir build

[root@master torque-2.4.14]# cd build

[root@master build]# ../configure --help

[root@master build]# ../configure --prefix=/opt/torque \
--enable-server --enable-mom --enable-clients --disable-gui \
--with-rcp=scp

2. Compile the code to create the binary files by executing "make", followed by "make install" to install the binaries.

[root@master build]# make

[root@master build]# make install

3. Add the path for the sbin directory to the root user’s .bashrc file.

[root@master torque-2.4.14]# echo "export PATH=/opt/torque/sbin:\$PATH" \
>> /root/.bashrc

[root@master torque-2.4.14]# tail -n 1 ~/.bashrc

export PATH=/opt/torque/sbin:$PATH

4. Copy the pbs_mom init script from the contrib/init.d directory of the installation source to /opt/torque/pbs_mom.init. Open the file in an editor of your choice and amend any erroneous paths.

[root@master torque-2.4.14]# cp contrib/init.d/pbs_mom \

/opt/torque/pbs_mom.init

[root@master torque-2.4.14]# vi /opt/torque/pbs_mom.init

5. Copy the node_install.sh script below into the torque install directory. It will be used to install pbs_mom on the computing nodes.

Listing 2: node_install.sh

#!/bin/bash
# /opt/torque/node_install.sh
# http://epico.escience-lab.org
# mailto: baro@democritos.it

TORQUEHOME=/opt/torque/
TORQUEBIN=$TORQUEHOME/bin
MAUIBIN=/opt/maui/bin
SPOOL=/var/spool/torque

mkdir -vp $SPOOL

cd $SPOOL || exit

#===========================================================#

mkdir -vp aux mom_priv/jobs mom_logs checkpoint spool undelivered

chmod -v 1777 spool undelivered

for s in prologue epilogue
do
    test -e $TORQUEHOME/scripts/$s && \
        ln -sv $TORQUEHOME/scripts/$s $SPOOL/mom_priv/
done

#===========================================================#

cat << EOF > pbs_environment
PATH=/bin:/usr/bin
LANG=C
EOF

#===========================================================#

echo master > server_name

#===========================================================#

cat << EOF > mom_priv/config
\$clienthost master
\$logevent 0x7f
\$usecp *:/u /u
\$usecp *:/home /home
\$usecp *:/scratch /scratch
EOF

#===========================================================#

MOM_INIT=/etc/init.d/pbs_mom

cp -va /opt/torque/pbs_mom.init $MOM_INIT
chmod +x $MOM_INIT

chkconfig --add pbs_mom
chkconfig pbs_mom on

# increase limits for infiniband stuff (pbs_mom is NOT pam_limits aware)
egrep 'ulimit[[:space:]]+.* -l[[:space:]]' $MOM_INIT || \
perl -e 'while (<>) {
    print;
    if (/^[ \t]+start\)/) {
        print << EOF;
#----------------------------------------------------#
# increase limits for infiniband stuff (not pam_limits aware)
# max locked memory, soft and hard limits for all PBS children
ulimit -H -l unlimited
ulimit -S -l 4096000
# stack size, soft and hard limits for all PBS children
ulimit -H -s unlimited
ulimit -S -s 1024000
#----------------------------------------------------#
EOF
    }
}' -i $MOM_INIT

#===========================================================#

cat << EOF > /etc/profile.d/pbs.sh
export PATH=$TORQUEBIN:$MAUIBIN:\$PATH
EOF

#EOF

6. In an editor of your choice, enter the fully qualified domain name of your master node in the file below.

[root@master torque-2.4.14]# vi /var/spool/torque/server_name

master.cluster

7. Add your nodes and their properties to the nodes file as shown below.

[root@master torque-2.4.14]# vi /var/spool/torque/server_priv/nodes

node01 np=4

node02 np=4

node03 np=4

8. Initialize the serverdb and start the TORQUE pbs_server as shown below.

[root@master ~]# pbs_server -t create

[root@master ~]# service pbs_server start

Starting TORQUE Server: [ OK ]

9. Create one or more queues to suit your configuration and make at least one of them the default using the torque qmgr command. An easier way is to create a file as below.

[root@master ~]# vi qmgr.cluster

create queue default

set queue default queue_type = Execution

set queue default Priority = 60

set queue default max_running = 128

set queue default resources_max.walltime = 168:00:00

set queue default resources_default.walltime = 01:00:00

set queue default max_user_run = 12

set queue default enabled = True

set queue default started = True

set server scheduling = True

set server managers = maui@master

set server managers += root@master

set server operators = maui@master

set server operators += root@master

set server default_queue = default


10. Load the file containing the qmgr configuration as illustrated below.

[root@master ~]# qmgr -c < qmgr.cluster

11. A printout of the pbs_server configuration looks as below.

[root@master ~]# qmgr -c ’p s’

#

# Create queues and set their attributes.

#

#

# Create and define queue default

#

create queue default

set queue default queue_type = Execution

set queue default Priority = 60

set queue default max_running = 128

set queue default resources_max.walltime = 168:00:00

set queue default resources_default.walltime = 01:00:00

set queue default max_user_run = 12

set queue default enabled = True

set queue default started = True

#

# Set server attributes.

#

set server scheduling = True

set server acl_hosts = master.cluster

set server managers = maui@master

set server managers += root@master

set server operators = maui@master

set server operators += root@master

set server default_queue = default

set server log_events = 511

set server mail_from = adm

set server query_other_jobs = True

set server scheduler_iteration = 600

set server node_check_rate = 150

set server tcp_timeout = 6

set server next_job_number = 26

12. Restart both the pbs_server on the master node and the pbs_mom on the nodes, then execute pbsnodes to see a printout of all free nodes.
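The restart commands are not shown in the original text; a sketch, assuming pbs_mom has already been installed on the nodes via node_install.sh and root can ssh to them:

[root@master ~]# service pbs_server restart
[root@master ~]# for n in node01 node02 node03; do ssh $n "service pbs_mom restart"; done

After that, pbsnodes should report the nodes as free: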

[root@master ~]# pbsnodes

node01

state = free

np = 2

ntype = cluster

status = rectime=1308321567,varattr=,jobs=,state=free,

netload=1205591,gres=,loadave=0.18,ncpus=4,physmem=4051184

kb,availmem=5021068kb,totmem=5103400kb,idletime=0,nusers=0,

nsessions=? 0,sessions=? 0,uname=Linux node01 2.6.18-238.

el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux


node02

state = free

np = 2

ntype = cluster

status = rectime=1308321569,varattr=,jobs=,state=free,

netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184

kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,

nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.

el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux

node03

state = free

np = 2

ntype = cluster

status = rectime=1308321569,varattr=,jobs=,state=free,

netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184

kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,

nsessions=? 0,sessions=? 0,uname=Linux node03 2.6.18-238.

el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux

9 Maui configuration

1. Untar, configure, make the binaries and install Maui from source as shown in the next sequence of steps.

[root@master ~]# tar xvfz maui-3.3.1.tar.gz

[root@master ~]# cd maui-3.3.1

[root@master maui-3.3.1]# ./configure --help

[root@master maui-3.3.1]# ./configure --prefix=/opt/maui \
--with-spooldir=/var/spool/maui --with-pbs=/opt/torque/

[root@master maui-3.3.1]# make

[root@master maui-3.3.1]# make install

2. Create a system user maui through which maui shall be run

[root@master maui-3.3.1]# useradd -d /var/spool/maui -r -g daemon \

maui

3. Edit the maui.cfg file, changing the SERVERHOST, ADMIN1, ADMIN3 and resource manager definition (RMCFG) as shown in the snippet below.

[root@master maui-3.3.1]# vi /var/spool/maui/maui.cfg

# maui.cfg 3.3.1

SERVERHOST master

# primary admin must be first in list

ADMIN1 maui root

ADMIN3 ALL

# Resource Manager Definition

RMCFG[MASTER] TYPE=PBS


# Allocation Manager Definition

AMCFG[bank] TYPE=NONE

....

EOF

4. Copy the init script from the maui source package to /etc/init.d/ and edit the file, changing MAUI_PREFIX to point to your installation directory.

[root@master maui-3.3.1]# cp contrib/service-scripts/redhat.maui.d \
/etc/init.d/maui

[root@master maui-3.3.1]# vi /etc/init.d/maui

[root@master maui-3.3.1]# cat /etc/init.d/maui

#!/bin/sh

#

# maui This script will start and stop the MAUI Scheduler

#

# chkconfig: 345 85 85

# description: maui

#

ulimit -n 32768

# Source the library functions

. /etc/rc.d/init.d/functions

MAUI_PREFIX=/opt/maui

# let see how we were called

case "$1" in

start)

echo -n "Starting MAUI Scheduler: "

daemon --user maui $MAUI_PREFIX/sbin/maui

echo

;;

stop)

echo -n "Shutting down MAUI Scheduler: "

killproc maui

echo

;;

status)

status maui

;;

restart)

$0 stop

$0 start

;;

*)

echo "Usage: maui {start|stop|restart|status}"

exit 1

esac
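The guide does not show starting the scheduler; you would typically register the init script and bring the service up at this point (a suggested sequence, not from the original text):

[root@master maui-3.3.1]# chkconfig --add maui
[root@master maui-3.3.1]# chkconfig maui on
[root@master maui-3.3.1]# service maui start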

5. Create a file maui.sh in the /etc/profile.d directory, add to it the environment variables PATH, INCLUDE and LD_LIBRARY_PATH, and make it executable; a sketch of the file follows the commands below.


[root@master maui]# vi /etc/profile.d/maui.sh

[root@master maui]# chmod +x /etc/profile.d/maui.sh
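The contents of maui.sh are not listed in the original; a minimal sketch, assuming the /opt/maui prefix used during configuration:

# /etc/profile.d/maui.sh
export PATH=/opt/maui/bin:/opt/maui/sbin:$PATH
export INCLUDE=/opt/maui/include:$INCLUDE
export LD_LIBRARY_PATH=/opt/maui/lib:$LD_LIBRARY_PATH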

10 Compiler Installation

Compilers are necessary in a cluster as they turn source code into executables the computer can run. Of interest are C, C++ and Fortran compilers, the most popular of which are the GCC and Intel compilers. Another option is the PGI compilers, which we shall not install.

10.1 GCC Compilers

From the CentOS repositories, we shall install the GCC compilers using the yum package management utility.

[root@master src]# yum -y install gcc.x86_64 gcc-gfortran.x86_64 \
libstdc++.x86_64 libstdc++-devel.x86_64 libgcj.x86_64 \
compat-libstdc++.x86_64

10.2 Intel Compilers

For the Intel compilers, which may give better results depending on the scenario, we shall proceed with the installation as outlined below:

1. Visit the Intel website in your preferred web browser, register and download the Intel compilers for non-commercial use.

2. Move to the directory into which you downloaded the Intel C and Fortran compilers.

3. Untar the tarballs and change directory into the created directory.

[root@master ~]# tar xvfz l_ccompxe_2011.4.191.tgz

[root@master ~]# cd l_ccompxe_2011.4.191

[root@master l_ccompxe_2011.4.191]# ./install.sh

[root@master ~]# tar xvfz l_fcompxe_2011.4.191.tgz

[root@master ~]# cd l_fcompxe_2011.4.191

[root@master l_fcompxe_2011.4.191]# ./install.sh

4. Execute the install.sh script and proceed as prompted.

11 OpenMPI installation

OpenMPI is an open-source library implementation of the Message Passing Interface (MPI-2) and facilitates communication/message interchange between processes in a high-performance computing environment.


11.1 OpenMPI Compiled with GCC Compilers

1. Untar the sources and configure them.

[root@master src]# tar xvfj openmpi-1.4.2.tar.bz2

[root@master src]# cd openmpi-1.4.2

[root@master openmpi-1.4.2]# mkdir build

[root@master openmpi-1.4.2]# cd build/

[root@master build]# ../configure CC=gcc CXX=g++ FC=gfortran \

F77=gfortran --prefix=/opt/openmpi/1.4.2/gcc/4.1.2 \

--with-tm=/opt/torque/

2. Create binaries by running ”make”

[root@master build]# make

3. Finally, install the binaries into the system

[root@master build]# make install
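As a quick check (not in the original steps), you can confirm the compiler wrappers work and that Torque (tm) support was built in; ompi_info should list a tm component among the plm modules:

[root@master build]# /opt/openmpi/1.4.2/gcc/4.1.2/bin/mpicc --version
[root@master build]# /opt/openmpi/1.4.2/gcc/4.1.2/bin/ompi_info | grep plm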

11.2 OpenMPI Compiled with Intel Compilers

1. Untar and configure the sources as above. However, take keen notice of the values of the variables CC, CXX, FC and F77 compared to the same step when compiling with the GCC compilers above.

[root@master src]# tar xvfj openmpi-1.4.2.tar.bz2

[root@master src]# cd openmpi-1.4.2

[root@master openmpi-1.4.2]# mkdir build

[root@master openmpi-1.4.2]# cd build/

[root@master build]# ../configure CC=icc CXX=icpc FC=ifort \

F77=ifort --prefix=/opt/openmpi/1.4.2/intel/12.0.4 \

--with-tm=/opt/torque/

2. Create binaries by running ”make”

[root@master build]# make

3. Finally, install the binaries into the system

[root@master build]# make install

12 Environment Modules installation

1. Obtain the environment modules source file, uncompress it and change directory into the created directory as below.

[root@master src]# tar xvfz modules-3.2.8a.tar.gz

[root@master src]# cd modules-3.2.8

2. Then configure the sources, specifying a prefix where they should be installed.


[root@master modules-3.2.8]# ./configure --prefix=/opt

Should you be running a 64-bit system and encounter an error indicating that the Tcl lib and include directories cannot be found, proceed as below.

[root@master modules-3.2.8]# ./configure --with-tcl-lib=/usr/lib64/ \
--with-tcl-inc=/usr/include/ --prefix=/opt

3. Then create binaries and install.

[root@master modules-3.2.8]# make

[root@master modules-3.2.8]# make install

4. Finally, copy the init scripts to the /etc/profile.d directory to make the module command available system-wide.

[root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash \
/etc/profile.d/modules.sh
[root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash_completion \
/etc/profile.d/modules_bash_completion.sh
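The guide does not create any modulefiles; purely as an illustration, a minimal modulefile (modulefiles use Tcl syntax) for the GCC-built OpenMPI installed earlier could be saved as /opt/Modules/3.2.8/modulefiles/openmpi/1.4.2-gcc, with the path reusing the --prefix chosen in the OpenMPI section:

#%Module1.0
## openmpi/1.4.2-gcc - illustrative modulefile for the GCC build of OpenMPI
proc ModulesHelp { } {
    puts stderr "Sets up OpenMPI 1.4.2 compiled with GCC"
}
set root /opt/openmpi/1.4.2/gcc/4.1.2
prepend-path PATH            $root/bin
prepend-path LD_LIBRARY_PATH $root/lib
prepend-path MANPATH         $root/share/man

Users would then see it with "module avail" and enable it with "module load openmpi/1.4.2-gcc".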

13 C3 Tools installation

1. Uncompress the C3 tools source package and execute the install script

[root@master src]# tar xvfz c3-4.0.1.tar.gz

[root@master src]# cd c3-4.0.1

[root@master c3-4.0.1]# ./Install-c3

2. Create a c3.conf configuration file defining a cluster name, the master node and the nodes in the cluster.

[root@master c3-4.0.1]# vi /etc/c3.conf

[root@master c3-4.0.1]# cat /etc/c3.conf

cluster cluster1 {

master:master

node0[1-3]

}

3. Create SSH keys to be used for passwordless login to the nodes of the cluster.

[root@master ~]# ssh-keygen -t dsa

Generating public/private dsa key pair.

Enter file in which to save the key (/root/.ssh/id_dsa):

Created directory ’/root/.ssh’.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_dsa.

Your public key has been saved in /root/.ssh/id_dsa.pub.

The key fingerprint is:

46:6d:e5:e5:e2:5c:b5:72:16:bc:04:6f:59:2c:b5:32 root@master

.cluster


4. Copy the ~/.ssh/id_dsa.pub contents to the authorized_keys file of all nodes in the cluster. This is how to do it for a single node.

[root@master ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub root@node01


The authenticity of host ’node01 (192.168.10.2)’ can’t be es

tablished. DSA key fingerprint is fe:8d:bf:6e:de:f4:94:d3:c4:

d7:ee:74:6c:8c:dd:da. Are you sure you want to continue conn-

ecting (yes/no)? yes

Warning: Permanently added ’node01,192.168.10.2’ (RSA) to the

list of known hosts.

root@node01’s password:

Now try logging into the machine, with "ssh ’root@node01’",

and check in:

.ssh/authorized_keys

to make sure we haven’t added extra keys that you weren’t

expecting.

5. Test whether the key was successfully registered by attempting to log in to node01.

[root@master ~]# ssh node01

Last login: Fri Jun 17 12:53:28 2011

[root@node01 ~]# exit

logout
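With passwordless root login in place, the C3 tools can now address every node at once; for example (an illustration, not in the original text):

[root@master ~]# cexec uname -r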

14 Password Syncing

User accounts and passwords should be the same on all nodes forming the cluster; however, we cannot have each user set their password on every machine in the cluster. We shall therefore create a script to effect this. In our case we shall use the cpush command from the C3 tools package installed earlier.

Listing 3: password-push.sh

#!/bin/bash
#
# Sync /etc/passwd, /etc/shadow and /etc/group
# File: /root/bin
# Cron: min hour dom month dow root /etc/password-push.sh

for f in passwd shadow group; do
    /opt/c3-4/cpush /etc/"${f}" > /dev/null
done

However, bear in mind that rsync could be used to achieve the same, as sketched below.
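A rough rsync-over-ssh equivalent of the cpush loop, using the node names from the DHCP section (a sketch, not part of the original script):

#!/bin/bash
# Push the account files to every node with rsync instead of cpush
for h in node01 node02 node03; do
    rsync -a /etc/passwd /etc/shadow /etc/group "${h}":/etc/
done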

15 NetCDF, HDF5 and GrADS installation

GrADS requires NetCDF and HDF5 as dependencies for its installation. Therefore, we shall install them all in one go from the EPEL repositories.


[root@master ~]# yum -y install netcdf hdf5 grads

16 NCL and NCO installation

These too we shall install using the yum package manager, as below.

[root@master ~]# yum -y install ncl nco

17 R Statistical package installation

The R statistical package will be installed from the EPEL repositories to save us from the agony of installing a myriad of dependencies and to allow easy updating of the packages.

[root@master ~]# yum -y install R.x86_64 R-core.x86_64 R-devel.x86_64 \

libRmath.x86_64 libRmath-devel.x86_64


Part III

Computing Node Installation


18 Node OS installation

With the master node setup complete, installation of the nodes should be just a push of a button. However, a little understanding of node-ks.cfg is essential. It marks the packages tftp, openssh-server, openssh, xorg-x11-xauth, mc and strace for installation, and those with a preceding - sign for removal.

Thereafter, the post-installation section is executed, which disables unwanted services, mounts the master node's NFS exports, creates a local repository, runs node_install.sh and installs the GCC compilers available in the CentOS repositories on the nodes.

Listing 4: node-ks.cfg

tftp
openssh-server
openssh
xorg-x11-xauth
mc
strace
-cups
-cups-libs
-bluez-utils
-bluez-gnome
-rp-pppoe
-ppp

%post --log=/root/ks-post.log
MASTER=192.168.10.1

# Delete unwanted services
for i in sendmail; do
    chkconfig --del "${i}"
done

# Remove default repos
tar cvfz yum.repos.d.tar.gz /etc/yum.repos.d
rm -rf /etc/yum.repos.d/*

# Mount /distro from master node
mkdir -p /distro
mount -t nfs $MASTER:/distro /distro

# Add mount to fstab
echo -e "192.168.10.1:/distro\t/distro\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /opt to fstab
echo -e "192.168.10.1:/opt\t/opt\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /home to fstab
echo -e "192.168.10.1:/home\t/home\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Execute the node_install.sh script to install pbs_mom
/opt/torque/node_install.sh

# Create local repo
mkdir -p /distro/centos
echo -e "[Local]\nname=CentOS-$releasever - Local\nbaseurl=file:///distro/centos\ngpgcheck=0\nenabled=1" \
    | tee /etc/yum.repos.d/CentOS-Local.repo

yum clean all
yum makecache

# GCC compilers
yum -y install gcc.x86_64 gcc-gfortran.x86_64 libstdc++.x86_64 \
    libstdc++-devel.x86_64 libgcj.x86_64 compat-libstdc++.x86_64

Once the installation is complete, you can have a look at ks-post.log in root's home directory for any errors from the %post section of the kickstart file.
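For instance, a quick scan for problems (a suggestion, not in the original text):

[root@node01 ~]# grep -iE 'error|fail' /root/ks-post.log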

19 Name resolution

Finally, ensure that all the nodes in the cluster can resolve the names of the other nodes in the cluster. You can either set up DNS on the master node or use the /etc/hosts file.
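If you go the /etc/hosts route, a minimal file matching the addresses assigned in the DHCP section might look like this (the .cluster suffix is assumed from the domain-name option; it can be pushed to the nodes with cpush):

# /etc/hosts
127.0.0.1       localhost.localdomain localhost
192.168.10.1    master.cluster master
192.168.10.2    node01.cluster node01
192.168.10.3    node02.cluster node02
192.168.10.4    node03.cluster node03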

Should you need help setting up a DNS server, post your request in the comments below.
