Winnie Lacesso
Bristol Storage
June 2009
2 DPM LCG Storage
• lcgse01 = DPM built in 2005 by Yves Coppens & Pete Gronbech
• SuperMicro X5DPAGG (Streamline Computing), 2 Intel Xeon 2.8GHz, 2GB RAM, 32-bit SL3.x, Adaptec 39320A Ultra320 dual-channel
• May 2006: Transtec T6100 = Infortrend/EonStor A08U-G2421, 8x400GB RAID5 = 2.2TB usable; 673GB = any-VO, 1.527TB = CMS
• May 2007: Transtec PV610S = Infortrend A16U-G2430
• 16x750GB as 2xRAID6 = 8.4TB usable, all CMS-only
• all ext3 filesystems; both RAID arrays nearly full
• Feb-May 2008: intermittent SCSI problems with 16-bay
• June 2008: rebuild lcgse01 as SL4 32-bit; July-Aug: SCSI problems increase, always w/16-bay, causing errors in dpm filesystems :(
• Aug: replace Adaptec SCSI ctlr w/LSI: no help. Add +2GB RAM.
• Sept/Oct/Nov - trying to debug, RAID array rejected 5 disks in 3 months; vendor finally agrees to replace hardware. Arrives in Dec.
• Hardware replaced in January - working excellently since then.
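As a rough cross-check of the capacities quoted on this slide, the textbook RAID formulas (RAID5 gives up one disk per array to parity, RAID6 gives up two) can be sketched in Python. This is only an upper bound under assumptions the slide does not state: decimal GB, no hot spares, and the 16-bay split into two equal 8-disk RAID6 sets. The function name `raid_data_capacity_gb` is made up for illustration.

```python
# Parity disks per array for the two RAID levels mentioned on the slide.
PARITY_DISKS = {"RAID5": 1, "RAID6": 2}


def raid_data_capacity_gb(level, disks, disk_gb, arrays=1):
    """Upper bound on data capacity in decimal GB, before any hot spares
    or filesystem overhead (both are assumptions; the slide states neither)."""
    if disks % arrays != 0:
        raise ValueError("disks must divide evenly across arrays")
    per_array = disks // arrays
    parity = PARITY_DISKS[level]
    if per_array <= parity:
        raise ValueError("too few disks per array for " + level)
    return arrays * (per_array - parity) * disk_gb


if __name__ == "__main__":
    # 8-bay T6100: 8 x 400GB as one RAID5 set
    print(raid_data_capacity_gb("RAID5", 8, 400))
    # 16-bay PV610S: 16 x 750GB as two RAID6 sets (equal split assumed)
    print(raid_data_capacity_gb("RAID6", 16, 750, arrays=2))
```

These bounds come out at 2800GB and 9000GB, above the 2.2TB and 8.4TB usable quoted on the slide; the slide does not say what accounts for the difference.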
3 HPC-LCG Storage
• HPC has used DPM = lcgse01 so far
• HPC uses gpfs, so Jon Wakelin looked into StoRM, which can (supposedly) leverage gpfs for bulk access (instead of going through the server = bottleneck)
• lcgse02 = Viglen 1U, X7DBU mobo, 2 x Intel E5405 = 8 x 2.0GHz, 16GB RAM, 2 x 250GB RAID1 disks, dual PSU
• gridftp01 = identical but only 8GB RAM
• SL4.6 64-bit, gpfs 3.2.1.9 (currently) - kernel versions are constrained by gpfs (currently 2.6.9-67)
• StoRM FrontEnd + Backend on one machine (common config)
• StoRM supports gsiftp, rfio & file protocol
• Passing all OPS, LHCb, CMS SAM tests since forever :)
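Because the kernel is pinned by gpfs (currently 2.6.9-67), a small guard run before or after OS updates could flag a node that has drifted off the supported series. The following is a minimal sketch, not the site's actual procedure: the helper name `kernel_matches`, the matching policy (same series prefix) and the example release strings in the docstring are all assumptions.

```python
import platform
import re

# Kernel series the slide gives as currently required by gpfs 3.2.1.9.
GPFS_SUPPORTED_KERNEL = "2.6.9-67"


def kernel_matches(release, supported=GPFS_SUPPORTED_KERNEL):
    """True if a kernel release string belongs to the supported series,
    e.g. '2.6.9-67.0.1.ELsmp' matches '2.6.9-67' but '2.6.9-78.EL' does not."""
    # The series must be followed by a non-digit or end of string, so that
    # a hypothetical '2.6.9-670' is not mistaken for '2.6.9-67'.
    pattern = re.escape(supported) + r"(\D|$)"
    return re.match(pattern, release) is not None


if __name__ == "__main__":
    running = platform.release()
    status = "ok" if kernel_matches(running) else "NOT in gpfs-supported series"
    print(running, status)
```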
4 GPFS & HPC storage
• Storageless Physics gpfs cluster = {lcgse02,gridftp01} plus 3 test nodes
• Storage gpfs cluster = 4 x DDN I/O servers (filers) & 44TB usable
• Jon got them multiclustered over public network so StoRM can write
• But after Jon left we found out rfio does not work - must be a config problem with ACLs within gpfs, but we can't find it yet
• HPC WN gpfs cluster needs to be multiclustered with Storage gpfs cluster, so LCG jobs on WN can ask lcgse02 for file:/ location of their data and access it over gpfs.
• HPC maintenance outage in May - multiclustering failed with openssl errors - no help from IBM gpfs experts
• New Storage Admin Bob Cregan will debug it!
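The access pattern described for worker nodes (ask lcgse02 for a file:/ location, then read the data directly over gpfs) might look roughly like this on the job side. It is a minimal sketch: how the job obtains the TURL from StoRM is not shown, and the helper names `local_path_from_turl` and `read_over_gpfs` and the example path are hypothetical, not taken from the Bristol setup.

```python
import os
from urllib.parse import urlparse, unquote


def local_path_from_turl(turl):
    """Turn a file: TURL into a local filesystem path.

    Only the file protocol is handled; gsiftp and rfio TURLs would need
    their own clients and are rejected here.
    """
    parts = urlparse(turl)
    if parts.scheme != "file":
        raise ValueError("not a file: TURL: " + turl)
    if parts.netloc not in ("", "localhost"):
        raise ValueError("file: TURL names a remote host: " + parts.netloc)
    return unquote(parts.path)


def read_over_gpfs(turl, nbytes=1024):
    """Read the first nbytes of the file via plain POSIX I/O, which only
    works if the gpfs filesystem is mounted on this node (assumption)."""
    path = local_path_from_turl(turl)
    if not os.path.exists(path):
        raise FileNotFoundError(path + " not visible; is gpfs mounted here?")
    with open(path, "rb") as handle:
        return handle.read(nbytes)


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(local_path_from_turl("file:///gpfs/example/cms/data.root"))
```

The existence check is there because a worker node whose gpfs cluster is not multiclustered with the Storage cluster would not see the path at all.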
5 StoRM SE, GPFS
• New hardware for HPC CE & StoRM SE, also gridftp server & new MON (syslog, Nagios, etc): X7DBU Xeon E5405 with 2GB RAM/core
• HPC CE working well except gpfs timeouts – patchy OPS SAM fails
• Problems with StoRM - gpfs multiclustering not yet working, rfio permission problems (ACLs??) - thought Jon left it in working order but guess not... New Storage Admin (Bob Cregan) will help get gpfs multiclustering working
• Good performance on new hardware!