Martin Bly
RAL CSF Tier 1/A
RAL Tier 1/A Status
HEPiX-HEPNT
NIKHEF, May 2003
CPU Farm – Existing Hardware
• 108 dual processors (450, 600 and 1GHz)
– Up to 1GB RAM
– Desktop towers on warehouse shelves
• 156 dual processor 1400MHz PIII
– 133MHz FSB, 1GB RAM each
– 1U rackmount, remote power switching
– RedHat 7.2
New Hardware – Spring 2003 +
• 80 dual processor 1U rackmount units
– 2 x 2.66GHz P4 Xeons @ 533MHz FSB
– Hyper-Threading
– 2048MB memory
– 2 x 1Gb/s NICs (on-board)
– RedHat 7.3
– 3 racks, remote power switching
• Next delivery expected Summer 2003
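Taken together, the two CPU farm slides give a simple processor count. The tally below is a minimal sketch that only multiplies out the node counts quoted above. It treats the 108 tower nodes as one group because the split between 450MHz, 600MHz and 1GHz machines is not given, and it leaves out the Summer 2003 delivery because its size is not stated.

```python
# Node groups as quoted on the slides: (label, nodes, processors per node).
FARM = [
    ("Desktop towers, 450/600/1000MHz", 108, 2),
    ("1U PIII 1400MHz", 156, 2),
    ("1U P4 Xeon 2.66GHz, spring 2003", 80, 2),
]


def physical_cpus(inventory):
    """Total physical processors across all node groups."""
    return sum(nodes * per_node for _, nodes, per_node in inventory)


def logical_cpus_with_ht(nodes, per_node, threads_per_cpu=2):
    """Logical CPUs the OS reports when Hyper-Threading exposes 2 threads per processor."""
    return nodes * per_node * threads_per_cpu


if __name__ == "__main__":
    for label, nodes, per_node in FARM:
        print(f"{label}: {nodes} nodes, {nodes * per_node} processors")
    print("Total physical processors:", physical_cpus(FARM))  # 216 + 312 + 160
    print("Logical CPUs on the Xeon nodes:", logical_cpus_with_ht(80, 2))
```

This gives 688 physical processors. With Hyper-Threading enabled, the 80 Xeon nodes alone present 320 logical CPUs, twice their physical count.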
Operating Systems
• Operating Systems:
– RedHat 6.2 service will close at the end of May
– RedHat 7.2 service has been in production for BaBar for 6 months
– New RedHat 7.3 service now available for LHC and other experiments
– Testing/benchmarking on new Xeon systems
• The increasing demand for security updates is becoming problematic.
Disk Farm – Existing Hardware
• 2002: 26 servers, each with 2 external RAID arrays, 1.7TB disk per server, RAID 5
– Excellent performance, well-balanced system
– Problems with a bad batch of Maxtor drives: many failures and a high error rate; all 620 drives have now been replaced by Maxtor
– Still outstanding problems with the Accusys controller failing to eject bad drives from the RAID set
Disk Farm – Spring 2003 +
• Recent upgrade to disk farm:
– 11 dual P4 Xeon servers (2.4GHz, 1024MB RAM, PCI-X), each with 2 Infortrend IFT-6300 arrays via Ultra160 SCSI
– 12 Maxtor 200GB DiamondMax Plus 9 drives per array, RAID 5
• Not yet in production, and there have been a few snags:
– The originally tendered Maxtor Maxline Plus II drive was found not to exist!
– The Infortrend array has a 2TB limit per RAID set; we are pushing for a firmware update
– 11 + 1 spare is better than 2 x 6 (5Gb over 11 systems)
• Contact Nick White for more information.
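The "11 + 1 spare better than 2 x 6" trade-off can be checked with RAID 5 arithmetic, since one drive's worth of each set goes to parity. This is a minimal sketch, assuming the 200GB drive size is decimal and reading the 2TB per-set limit as 2000 decimal GB; the function and constant names are illustrative only.

```python
DRIVE_GB = 200          # Maxtor DiamondMax Plus 9 capacity, decimal GB
DRIVES_PER_ARRAY = 12   # drives per Infortrend IFT-6300 array, as on the slide
SET_LIMIT_GB = 2000     # assumption: the 2TB per-RAID-set limit read as 2000 decimal GB


def raid5_usable_gb(members, drive_gb=DRIVE_GB):
    """RAID 5 spends one drive's worth of space on parity: usable = (n - 1) * size."""
    if members < 3:
        raise ValueError("RAID 5 needs at least 3 member drives")
    return (members - 1) * drive_gb


def layout(sets, spares=0):
    """Split one array into equal RAID 5 sets plus hot spares and report the result."""
    members, leftover = divmod(DRIVES_PER_ARRAY - spares, sets)
    if leftover:
        raise ValueError("drives do not divide evenly into the requested sets")
    per_set = raid5_usable_gb(members)
    return {
        "members_per_set": members,
        "usable_gb": sets * per_set,
        "within_limit": per_set <= SET_LIMIT_GB,
        "hot_spares": spares,
    }


if __name__ == "__main__":
    for name, sets, spares in [("1 x 12", 1, 0), ("11 + 1 spare", 1, 1), ("2 x 6", 2, 0)]:
        print(name, layout(sets, spares))
```

Both layouts that respect the limit leave 2000GB usable per array, but the 11 + 1 arrangement keeps a hot spare where 2 x 6 has none, which is one plausible reason for the preference. If the limit is really 2TiB (about 2199GB), a single 12-drive set at 2200GB still just exceeds it.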
New Projects
• Basic fabric performance monitoring (ganglia)
• Resource CPU accounting (based on PBS accounts/MySQL)
• New CA in production
• New batch scheduler (MAUI)
• Deploy new helpdesk (May)
Ganglia
• Live performance and utilisation monitoring was urgently needed:
– RAL Ganglia Monitoring: http://ganglia.gridpp.rl.ac.uk/
• Scalable solution based on multicast
• Very rapidly deployable; reasonable support on all Tier1A hardware
• See: http://ganglia.sourceforge.net/
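Ganglia's gmond daemons share metrics over multicast, and each gmond also writes the cluster state as XML to any client that connects to its TCP port. The sketch below is not part of the RAL setup; it assumes gmond's default port 8649 and the usual GANGLIA_XML / CLUSTER / HOST / METRIC element layout, and pulls one metric per host out of such a dump.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump that gmond writes to a client connecting on its TCP port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after sending the dump
                break
            chunks.append(data)
    return b"".join(chunks)  # bytes, so ElementTree honours any declared encoding


def host_metric(xml_doc, metric="load_one"):
    """Map each HOST NAME to the float value of one METRIC in a gmond XML dump."""
    root = ET.fromstring(xml_doc)
    values = {}
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                values[host.get("NAME")] = float(m.get("VAL"))
    return values
```

On a node running gmond, host_metric(fetch_gmond_xml("localhost")) would return a dictionary mapping each host name to its one-minute load average; passing another metric name selects a different column.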
PBS Accounting Software
• Need to keep track of system CPU and disk usage.
• Home-grown PBS accounting package (Derek Ross):
– Upload PBS and disk stats into MySQL
– Process with a Perl DBI script
– Serve via Apache
• http://www.gridpp.rl.ac.uk/stats
• Contact Derek for more information.
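The accounting pipeline above (PBS logs loaded into a database, then summarised by a script) can be sketched in a few lines. This is not Derek Ross's package: it is written in Python, uses the standard-library sqlite3 module as a stand-in for MySQL so it runs anywhere, and assumes the OpenPBS accounting-log layout "date time;record type;job id;key=value ...", in which job-end (E) records carry user, group, resources_used.cput and resources_used.walltime.

```python
import sqlite3


def hms_to_seconds(hms):
    """Convert a PBS HH:MM:SS duration (hours may exceed 24) to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def parse_end_records(lines):
    """Yield (user, group, cput_s, wall_s) for each job-end ('E') record."""
    for line in lines:
        parts = line.rstrip("\n").split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue  # skip queued/started/deleted records and malformed lines
        attrs = dict(kv.split("=", 1) for kv in parts[3].split() if "=" in kv)
        yield (attrs.get("user"), attrs.get("group"),
               hms_to_seconds(attrs.get("resources_used.cput", "0:0:0")),
               hms_to_seconds(attrs.get("resources_used.walltime", "0:0:0")))


def cpu_hours_by_group(lines):
    """Load end records into an in-memory SQL table and return CPU hours per group."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (user TEXT, grp TEXT, cput_s INTEGER, wall_s INTEGER)")
    db.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", parse_end_records(lines))
    rows = db.execute(
        "SELECT grp, SUM(cput_s) / 3600.0 FROM jobs GROUP BY grp ORDER BY grp").fetchall()
    db.close()
    return dict(rows)
```

Keeping the raw per-job rows in SQL, as the original design does with MySQL, means new reports (per user, per week, CPU against wall-clock) are just different queries over the same table.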
MAUI / PBS
• The Maui scheduler has been in production for the last 4 months.
• Allows extremely flexible scheduling with many features. But…
– Not all of it works; we have done much work with the developers on fixes.
– Major problem: Maui schedules on wall-clock time, not CPU time. Had to bodge it!!
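Why the wall-clock basis matters is easy to see with two hypothetical jobs that each hold a slot for four hours, one I/O-bound and one CPU-bound. The sketch below does not reproduce Maui's fair-share code or the RAL workaround (the slide does not describe it); the Job class and group names are illustrative, and the code only shows how the usage charged to each group changes with the accounting basis.

```python
from dataclasses import dataclass


@dataclass
class Job:
    group: str
    cput_s: int  # CPU seconds actually consumed
    wall_s: int  # wall-clock seconds the job held its slot


def usage_by_group(jobs, basis="wall"):
    """Sum the usage charged to each group on a wall-clock or CPU-time basis."""
    if basis not in ("wall", "cpu"):
        raise ValueError("basis must be 'wall' or 'cpu'")
    totals = {}
    for job in jobs:
        charge = job.wall_s if basis == "wall" else job.cput_s
        totals[job.group] = totals.get(job.group, 0) + charge
    return totals


def efficiency(job):
    """CPU efficiency: the fraction of wall-clock time actually spent on the CPU."""
    return job.cput_s / job.wall_s if job.wall_s else 0.0


jobs = [Job("io_bound", 3600, 14400), Job("cpu_bound", 14400, 14400)]
```

Charged on wall-clock time, the two groups look identical (14400 s each); charged on CPU time, the I/O-bound group has used only a quarter as much, matching its 25% CPU efficiency.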
New Helpdesk Software
• The old helpdesk is email-based and unfriendly.
• With additional staff, we urgently need to deploy a new solution.
• Expect the new system to be based on free software, probably Request Tracker.
• Hope that the deployed system will also meet the needs of the Testbed and may also satisfy Tier 2 sites.
• Expect deployment by the end of May.
• http://requestracker.gridpp.rl.ac.uk
Outstanding issues / worries
• We have to run many distinct services:
– Fermi Linux
– RH 6.2/7.2/7.3…
– EDG testbeds, LCG …
• Farm management is getting very complex. We need better tools and automation.
• Security is becoming a big concern again.