Winnie Lacesso Bristol Storage June 2009


Winnie Lacesso

Bristol Storage, June 2009

2 DPM LCG Storage

• lcgse01 = DPM built in 2005 by Yves Coppens & Pete Gronbech

• SuperMicro X5DPAGG (Streamline Computing), 2 Intel Xeon 2.8GHz, 2GB RAM, 32-bit SL3.x, Adaptec 39320A Ultra320 dual-channel

• May 2006: Transtec T6100 = Infortrend/EonStor A08U-G2421, 8x400GB RAID5 = 2.2TB usable; 673GB = any-VO, 1.527TB = CMS

• May 2007: Transtec PV610S = Infortrend A16U-G2430

• 16x750GB as 2xRAID6 = 8.4TB usable, all CMS-only

• all ext3 filesystems; both RAID arrays nearly full

• Feb-May 2008: intermittent SCSI problems with 16-bay

• June 2008: rebuild lcgse01 as SL4 32-bit; July-Aug: SCSI problems increase, always w/16-bay, causing errors in dpm filesystems :(

• Aug: replace Adaptec SCSI ctlr w/LSI: no help. Add +2GB RAM.

• Sept/Oct/Nov: trying to debug, RAID array rejected 5 disks in 3 months; vendor finally agrees to replace hardware. Arrives in Dec.

• New hardware installed in January - working excellently since then.
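The capacity figures above can be sanity-checked with a little arithmetic. The sketch below computes only nominal capacity (data disks times disk size). It assumes the 16-bay array is split into two equal 8-disk RAID6 sets, which the slide does not state, and the helper name `raid_nominal_gb` is made up for illustration. The usable figures quoted on the slide (2.2TB and 8.4TB) differ from these nominal values; the slide does not say how they were derived.

```python
def raid_nominal_gb(disks: int, disk_gb: int, parity_disks: int) -> int:
    """Nominal capacity of one RAID set: data disks times disk size."""
    if disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disks - parity_disks) * disk_gb


# 8-bay array: one RAID5 set, so one disk's worth of parity.
bay8_gb = raid_nominal_gb(8, 400, parity_disks=1)

# 16-bay array: ASSUMED to be two equal 8-disk RAID6 sets,
# each with two disks' worth of parity.
bay16_gb = 2 * raid_nominal_gb(8, 750, parity_disks=2)

# VO split on the 8-bay array: any-VO plus CMS, in GB.
split_gb = 673 + 1527

print(bay8_gb, bay16_gb, split_gb)
```

The any-VO and CMS allocations do add up to the quoted 2.2TB usable on the 8-bay array.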

3 HPC-LCG Storage

• HPC has used DPM = lcgse01 so far

• HPC uses gpfs so Jon Wakelin looked into StoRM which can (supposedly) leverage gpfs for bulk access (instead of going thru server = bottleneck)

• lcgse02 = Viglen 1U, X7DBU mobo, 2 x Intel E5405 = 8 x 2.0GHz, 16GB RAM, 2 x 250GB RAID1 disks, dual PSU

• gridftp01 = identical but only 8GB RAM

• SL4.6 64-bit, gpfs 3.2.1.9 (currently) - kernel versions are constrained by gpfs (currently 2.6.9-67)

• StoRM FrontEnd + Backend on one machine (common config)

• StoRM supports gsiftp, rfio & file protocol

• Passing all OPS, LHCb, CMS SAM tests since forever :)

4 GPFS & HPC storage

• Storageless Physics gpfs cluster = {lcgse02,gridftp01} plus 3 test nodes

• Storage gpfs cluster = 4 x DDN I/O servers (filers) & 44TB usable

• Jon got them multiclustered over public network so StoRM can write

• But after Jon left we found out rfio does not work - probably a config problem with ACLs within gpfs, but we haven't found it yet

• HPC WN gpfs cluster needs to be multiclustered with Storage gpfs cluster, so LCG jobs on WN can ask lcgse02 for file:/ location of their data and access it over gpfs.

• HPC maintenance outage in May - multiclustering failed with openssl errors - no help from IBM gpfs experts

• New Storage Admin Bob Cregan will debug it!
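The intended direct-access path (a job gets a file:/ location from lcgse02 and reads the data straight off gpfs) could look roughly like this on the job side. This is a minimal sketch using only the Python standard library; it does not call StoRM or any SRM client. The function name and the example URLs and paths are hypothetical, and it assumes the location is handed to the job as a plain URL string.

```python
from typing import Optional
from urllib.parse import urlparse


def local_path_from_turl(turl: str) -> Optional[str]:
    """Return a local filesystem path if the URL uses the file protocol.

    Returns None for any other protocol (e.g. gsiftp, rfio), in which
    case the job would have to transfer the data through a server.
    """
    parsed = urlparse(turl)
    if parsed.scheme == "file" and parsed.netloc in ("", "localhost"):
        return parsed.path
    return None


# Hypothetical examples; these paths and hosts are not from the slides.
print(local_path_from_turl("file:///gpfs/cms/data.root"))
print(local_path_from_turl("gsiftp://gridftp01.example.org/gpfs/cms/data.root"))
```

The returned path is only readable if the gpfs filesystem is actually mounted on the worker node, which is why the WN cluster has to be multiclustered with the Storage cluster first.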

5 StoRM SE, GPFS

• New hardware for HPC CE & StoRM SE, also gridftp server & new MON (syslog, Nagios, etc): X7DBU Xeon E5405 with 2GB RAM/core

• HPC CE working well except gpfs timeouts - patchy OPS SAM fails

• Problems with StoRM - gpfs multiclustering not yet working, rfio permission problems (ACLs??) - thought Jon left it in working order but guess not... New Storage Admin (Bob Cregan) will help get gpfs multiclustering working

• Good performance on new hardware!