Transcript
Page 1: OptiPortal Configuration Considerations

OptiPortal Configuration Considerations

Ashley Wright

High Performance Computing and Research Support (QUT)

Page 2: OptiPortal Configuration Considerations

Our OptiPortal

Page 3: OptiPortal Configuration Considerations

Our OptiPortal

6x Dell Precision T3500
Intel Xeon E5520 (2.27GHz)
4GB RAM
nVidia FX 1800
Onboard 1Gb/s network
PCIe 1Gb/s network card (supports Jumbo Frames)
300GB HDD

22x Dell 24” Monitors (4x5 configuration)

Page 4: OptiPortal Configuration Considerations

Considerations

Wish to be able to keep the cluster in a known state.
To be able to recover quickly when something goes wrong.
Need to be able to install applications fast.
Compile code on the OptiPortal.
Fast.
Easy to use.

Page 5: OptiPortal Configuration Considerations

ROCKS with Viz Roll

Fairly easy to install.
Used initially to test the OptiPortal and software which can run on a Vis Wall.
Software was out of date (CentOS 5 vs Fedora 12).
Difficult to customise.
Difficult to install our own software.

Page 6: OptiPortal Configuration Considerations

Similarities to HPC clusters.

Lots of applications.
Each node of the cluster is identical.
Need performance.
Need to minimise downtime.

Page 7: OptiPortal Configuration Considerations

HPC Cluster

Network boot and install.
Shared file system across nodes.
Nodes are generally identical.
Multiple networks for different uses (e.g. management vs MPI).

Page 8: OptiPortal Configuration Considerations

Installing nodes

Network boot and auto-install scripts make reinstalling easy.
Fedora 11 & 12 used.
Cobbler (https://fedorahosted.org/cobbler/) provides:
HTTP/PXE/TFTP
DHCP/DNS
Yum mirror
Also customisation of the install process.

Page 9: OptiPortal Configuration Considerations

Installing nodes - cobbler

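# Install NVIDIA driver: download the installer during the post-install step
pushd /root/
# $http_server is filled in by Cobbler's templating
wget http://$http_server/files/NVIDIA-Linux-x86_64-190.53-pkg2.run -O /root/NVIDIA-Linux-x86_64-190.53-pkg2.run
chmod +x /root/NVIDIA-Linux-x86_64-190.53-pkg2.run
# Fetch the nvidia-install.sh init script and enable it at boot
wget http://$http_server/files/nvidia-install.sh -O /etc/init.d/nvidia-install.sh
chmod +x /etc/init.d/nvidia-install.sh
chkconfig --add nvidia-install.sh
chkconfig nvidia-install.sh on
# Return to the directory we started in
popd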

Page 10: OptiPortal Configuration Considerations

File Server

Hosts non-volatile, shared home directories (/home), software directories (/pkg), and the Fedora mirror.
Built with an old Dell 2900 Server:
6x 1.5TB HDD (RAID 1+0).
4x 1Gb/s aggregated network links.
250MB/s throughput.

Page 11: OptiPortal Configuration Considerations

Keeping nodes in 'sync'

When you change something on one node, you want it the same on the other nodes.
Having a shared home and application directory makes this easy.
Puppet manages files in /etc (http://www.puppetlabs.com/):
Automated configuration management.
Makes sure files and services are in a known state.
If they are not, Puppet fixes them.
Updates every 30 minutes (default).
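The "known state" idea above can be sketched in a few lines of Python. This is not how Puppet is implemented; it is a minimal illustration of the check-then-fix loop, and the function names, file names and the 0600 mode are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def converge_file(reference_path, target_path, mode=0o600):
    """Make target_path match reference_path; return True if a fix was applied."""
    with open(reference_path, "rb") as handle:
        wanted = handle.read()
    content_ok = file_digest(target_path) == hashlib.sha256(wanted).hexdigest()
    mode_ok = content_ok and (os.stat(target_path).st_mode & 0o777) == mode
    if content_ok and mode_ok:
        return False  # already in the known state, nothing to do
    with open(target_path, "wb") as handle:
        handle.write(wanted)
    os.chmod(target_path, mode)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        reference = os.path.join(workdir, "sshd_config.reference")  # hypothetical names
        target = os.path.join(workdir, "sshd_config")
        with open(reference, "w") as handle:
            handle.write("PermitRootLogin no\n")
        print(converge_file(reference, target))  # first pass creates the missing file
        print(converge_file(reference, target))  # second pass finds nothing to change
```

Running the same check repeatedly is safe: once the file matches, later passes make no changes, which is what lets a tool re-apply the configuration on a timer.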

Page 12: OptiPortal Configuration Considerations

Nodes in 'sync' - Puppet

class sshd {
  # Keep sshd_config identical to the copy held on the puppet master
  file { "/etc/ssh/sshd_config":
    owner  => root,
    group  => root,
    mode   => "0600",
    ensure => present,
    source => "puppet:///files/ssh/sshd_config",
  }
  # Reload sshd only when the file above changes
  exec { "/etc/init.d/sshd reload":
    subscribe   => File["/etc/ssh/sshd_config"],
    refreshonly => true,
  }
  # Make sure the service is running
  service { "sshd":
    status => "/etc/init.d/sshd status",
    ensure => running,
  }
}

Page 13: OptiPortal Configuration Considerations

Network

One network for management (DNS/DHCP): onboard network, can network boot.
One network for Internet: PCIe network card, supports jumbo frames.
Internet network is outside the QUT firewall.

Page 14: OptiPortal Configuration Considerations

Performance

Aim to render 10-25 frames per sec.
9600x4800 pixels = 175MB/frame.
Bottlenecks everywhere, mostly I/O (bus, disk and network):
1x PCIe (Gen 2) = 500MB/s
1Gb/s network = 120MB/s
1.5TB hard disk = 150MB/s (maximum)
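The arithmetic behind these figures can be checked with a short script. It assumes 4 bytes per pixel, which the slide does not state but which reproduces the 175MB figure (9600 x 4800 x 4 bytes is about 175 MiB). It treats the slide's MB as the same unit throughout. The function names are illustrative.

```python
WALL_WIDTH_PX = 9600
WALL_HEIGHT_PX = 4800
BYTES_PER_PIXEL = 4  # assumption: 32-bit pixels; the slide does not give the depth
MIB = 1024 * 1024

# Peak rates quoted on the slide, in MB/s.
LINK_RATES_MB_S = {
    "1x PCIe (Gen 2)": 500,
    "1Gb/s network": 120,
    "1.5TB hard disk": 150,
}


def frame_size_mb(width=WALL_WIDTH_PX, height=WALL_HEIGHT_PX,
                  bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one uncompressed full-wall frame."""
    return width * height * bytes_per_pixel / MIB


def required_rate_mb_s(fps, frame_mb):
    """Data rate needed to move full-wall frames at the given frame rate."""
    return fps * frame_mb


def max_fps(link_rate_mb_s, frame_mb):
    """Full-wall frames per second a single link could carry at its peak rate."""
    return link_rate_mb_s / frame_mb


if __name__ == "__main__":
    frame = frame_size_mb()
    print(f"frame size: {frame:.1f} MB")
    for fps in (10, 25):
        print(f"{fps} fps needs {required_rate_mb_s(fps, frame):.0f} MB/s")
    for name, rate in LINK_RATES_MB_S.items():
        print(f"{name}: {max_fps(rate, frame):.2f} full-wall frames/s")
```

Under these assumptions, no single link listed above can carry even three uncompressed full-wall frames per second, which is why I/O dominates the bottlenecks.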

Page 15: OptiPortal Configuration Considerations

Performance - Disk

First file server.
OpenSolaris + ZFS.
RAID-Z (RAID 5-like, across 6 disks).
ZFS makes all reads random seeks.
<100 MB/s read performance.
Single 1Gb/s network.

Page 16: OptiPortal Configuration Considerations

Performance - Disk

Second server.
Fedora 12.
SW RAID 0 (3 pairs) across HW RAID 1 (2 disks).
Reads mostly sequential.
250 MB/s read performance.
4x 1Gb/s network.
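Read figures like the ones above can be roughly reproduced with a sequential read timer. The sketch below is an assumption about how one might measure, not the method used for the slides. Results are inflated if the file is already in the operating system's I/O cache, so it should be run against a file larger than RAM or after dropping caches.

```python
import os
import tempfile
import time


def sequential_read_rate(path, block_size=4 * 1024 * 1024):
    """Read a file front to back; return (bytes_read, rate in MB/s)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__":
    # Small self-contained demo; a real test would point at a large file on the server.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(os.urandom(8 * 1024 * 1024))
        sample = handle.name
    try:
        size, rate = sequential_read_rate(sample)
        print(f"read {size} bytes at {rate:.0f} MB/s")
    finally:
        os.remove(sample)
```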

Page 17: OptiPortal Configuration Considerations

Performance - Compression

Compressing data files reduces disk I/O.
CPU time to decompress negligible.
Better use of I/O cache.
Decompress straight to memory.
Can get you over the line (2x-5x improvement).
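A minimal sketch of "decompress straight to memory", using gzip from the Python standard library. The slides do not say which compression format was used, so gzip is an assumption, and the ratio obtained depends entirely on the data.

```python
import gzip
import os
import tempfile


def save_compressed(path, payload):
    """Write payload to disk as a gzip file."""
    with gzip.open(path, "wb") as handle:
        handle.write(payload)


def load_to_memory(path):
    """Read the compressed file and return the decompressed bytes in memory."""
    with gzip.open(path, "rb") as handle:
        return handle.read()


def disk_read_reduction(path, payload_size):
    """How many times fewer bytes come off disk compared with the raw payload."""
    return payload_size / os.path.getsize(path)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of highly repetitive sample data
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "tile.dat.gz")  # hypothetical file name
        save_compressed(path, payload)
        restored = load_to_memory(path)
        print(restored == payload)
        print(f"{disk_read_reduction(path, len(payload)):.1f}x fewer bytes read from disk")
```

Only the compressed bytes cross the disk and network, and the uncompressed copy never has to be written back out, which is where the I/O saving comes from.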

Page 18: OptiPortal Configuration Considerations

Issues

SSH and Puppet security keys change on rebuild.
Upgrading major OS versions is still a lot of work.
File server needs more RAM (I/O cache).
1 Gb/s is not enough (at times).
Need to remember to add changes to build scripts.

Page 19: OptiPortal Configuration Considerations

Issues - Multiple Networks

Some software does not like multiple networks.
Looks up hostname and will only use that IP address.
Should be able to override in a config file.
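The behaviour described above, and the override one would want, can be sketched as follows. The `bind_address` config key, the function name and the addresses are hypothetical; the point is only that an explicit setting should win over the hostname lookup.

```python
import socket


def pick_bind_address(config, resolve=socket.gethostbyname, hostname=None):
    """Prefer an explicit 'bind_address' from config, else fall back to hostname lookup."""
    override = config.get("bind_address")
    if override:
        return override
    # Fallback is what the problematic software does: one lookup, one address,
    # whichever network the hostname happens to resolve to.
    return resolve(hostname or socket.gethostname())


if __name__ == "__main__":
    # Stand-in resolver so the demo does not depend on local DNS.
    def fake_resolver(name):
        return "192.168.1.5"  # hypothetical management-network address

    print(pick_bind_address({}, resolve=fake_resolver, hostname="node1"))
    print(pick_bind_address({"bind_address": "10.0.0.5"}, resolve=fake_resolver))
```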

Page 20: OptiPortal Configuration Considerations

Questions?

