Storage at RAL Tier1A. Jeremy Coles, eScience Centre ([email protected])
TRANSCRIPT
Outline
• Disk
  – Current Status and Plans
  – dCache
• Tape
  – Current Status and History
  – SRM etc.
  – Plans
• Hardware
• Software
Tier1A Disk
• 2002-03 (80 TB)
  – Dual-processor servers
  – Dual-channel SCSI interconnect
  – External IDE/SCSI RAID arrays (Accusys and Infortrend)
  – ATA drives (mainly Maxtor)
  – Cheap and (fairly) cheerful
• 2004 (140 TB)
  – Infortrend EonStor SATA/SCSI RAID arrays
  – 16 x 250 GB Western Digital SATA drives per array
  – Two arrays per server
Implementation
• Used by BaBar and other experiments as well as LHC
• 60 disk servers NFS-exporting their filesystems
  – Potential scaling problems if every CPU node wants to use the same disk
• Servers allocated to VOs, so no contention or interference
• Need a means of pooling servers, so we looked at dCache
Why we tried dCache
• Gives you a virtual file space across many file systems, optionally on several nodes
• Allows replication within the file space to increase redundancy
• Allows a tape system to be interfaced at the back, to further increase redundancy and the storage available
• Data protocols are scalable; one GridFTP interface per server is easy and transparent
• It was the only SRM implementation available for disk pools
dCache Doors
• Doors (interfaces) can be created into the system
  – GridFTP
  – SRM
  – GSIDCAP (GFAL gives you a POSIX interface to this; see the sketch below)
• All of these are GSI-enabled, but Kerberos doors also exist
• Everything remains consistent regardless of the door that is used
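To illustrate the POSIX-style access that GFAL layers over these doors, here is a minimal sketch in C. It assumes the GFAL 1.x client API (gfal_api.h, linked with -lgfal) and a valid grid proxy; the gsidcap host and pnfs path are hypothetical examples, not real RAL endpoints.

    /* Minimal sketch: reading a file in a dCache pool through GFAL's
     * POSIX-style calls. The URL below is hypothetical. */
    #include <stdio.h>
    #include <fcntl.h>
    #include "gfal_api.h"

    int main(void)
    {
        const char *url =
            "gsidcap://dcache.example.ac.uk:22128/pnfs/example.ac.uk/data/dteam/test.dat";
        char buf[4096];
        int fd, n;

        fd = gfal_open(url, O_RDONLY, 0);    /* open through the GSIDCAP door */
        if (fd < 0) {
            fprintf(stderr, "gfal_open failed\n");
            return 1;
        }

        n = gfal_read(fd, buf, sizeof(buf)); /* read the first 4 kB */
        if (n < 0)
            fprintf(stderr, "gfal_read failed\n");
        else
            printf("read %d bytes\n", n);

        gfal_close(fd);
        return 0;
    }

The same calls work unchanged whichever door sits behind the URL, which is the point of the "everything remains consistent" statement above.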
History of dCache at RAL
• Mid 2003
  – We deployed a non-grid version for CMS. It was never used in production.
• End of 2003 / start of 2004
  – RAL offered to package a production-quality dCache.
  – Stalled due to bugs and holidays; went back to the dCache and LCG developers.
• September 2004
  – Redeployed dCache into the LCG system for the CMS and DTeam VOs.
• dCache also deployed within the JRA1 testing infrastructure for gLite I/O daemon testing.
dCache at RAL today
• Now deployed for ATLAS, CMS, DTeam and LHCb.
• 5 disk servers, made up of 16 x 1.7 TB partitions.
• CMS, the only serious users of dCache at RAL, have stored 2.5 TB in the system.
• They are accessing byte ranges via the GSIDCAP POSIX interface (a sketch of such a read follows).
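A byte-range read of this kind might look like the following C sketch using dCache's dcap client library (dcap.h, linked with -ldcap). The dc_* calls are the library's POSIX-like API; the URL, offset and read length are hypothetical.

    /* Minimal sketch: reading a byte range from a file in dCache via
     * the dcap client library. URL, offset and length are hypothetical. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <dcap.h>

    int main(void)
    {
        const char *url =
            "gsidcap://dcache.example.ac.uk:22128/pnfs/example.ac.uk/data/cms/run1.root";
        char buf[1024];
        int fd;
        ssize_t n;

        fd = dc_open(url, O_RDONLY);               /* open through the GSIDCAP door */
        if (fd < 0) {
            fprintf(stderr, "dc_open failed\n");
            return 1;
        }

        if (dc_lseek(fd, 1048576, SEEK_SET) < 0) { /* seek to a 1 MB offset */
            fprintf(stderr, "dc_lseek failed\n");
            dc_close(fd);
            return 1;
        }

        n = dc_read(fd, buf, sizeof(buf));         /* read 1 kB from that offset */
        if (n < 0)
            fprintf(stderr, "dc_read failed\n");
        else
            printf("read %zd bytes\n", n);

        dc_close(fd);
        return 0;
    }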
Pool Group Per VO
• We found we could not apply quotas to file space between VOs.
  – Following advice, dCache was redeployed with a pool group per VO.
  – Still only one SRM front end. Data channels to it will be switched off, as we found that data transfers kill the head node.
• Now unable to publish space per VO…
Current Deployment at RAL
(diagram slide; no text content preserved)
Transfer into dCache
(two diagram slides; no text content preserved)
Extreme Deployment
(diagram slide; no text content preserved)
Current Installation Technique
• The Tier-1 now has its own cookbook to follow but it is not generic at this time.
• Prerequisites
  – VDT for the certificate infrastructure
  – edg-mkgridmap for the grid-mapfile
  – J2RE
  – Host certificate for all nodes with a GSI door
Unanswered Questions
• How do we drain a node for maintenance?
  – CHEP papers and statements from the developers say this is possible.
• How do we support small VOs?
  – 1.7 TB is our standard partition size, and pools fill a whole partition.
Other interfaces
• SRB
  – RAL supports (and develops) SRB for other communities
  – Ran the MCAT for worldwide CMS simulation data
  – SRB is interfaced to the Atlas Datastore
  – Committed to supporting SRB
• xrootd
  – An xrootd interface to the BaBar data held on disk and in the ADS at RAL
  – ~15 TB of data, of which about 10 TB is in the ADS; both will expand to about 70 TB in the next few months
  – BaBar is planning to use xrootd access for background and conditions files for Monte Carlo production on LCG. Basic tests have been run on WAN access to xrootd in Italy, and RAL will be involved in more soon
Tape Overview
• General-purpose, multi-user data archive.
• In use for over 20 years, with four major upgrades.
• Current capacity 1 PB; the largest (non-dedicated) multi-user system in UK academia?
History
• M860: 110 GB
• STK 4400: 1.2 TB
• IBM 3494: 30 TB
• STK 9310: 1 PB
STK 9310
(Architecture diagram, Thursday 4 November 2004. The STK 9310 robot holds 8 x 9940 tape drives, four attached to each of two Brocade FC switches (ADS_switch_1 and ADS_Switch_2). AIX data servers ermintrude, florence, zebedee and dougal connect to disk arrays 1-4; brian (AIX) runs flfsys with catalogue and cache disks; mchenry1 (AIX, test flfsys) and basil (AIX, test data server) form the test system; ADS0CNTR (Redhat, counter), ADS0PT01 (Redhat, pathtape), ADS0SB01 (Redhat, SRB interface), dylan (AIX, import/export) and buxton (SunOS, ACSLS) complete the production system. Users reach the system through SRB (Inq, S commands, MySRB), user pathtape commands and ADS sysreq admin/create/query commands. The legend distinguishes physical FC/SCSI connections, sysreq UDP commands, user SRB commands, VTP and SRB data transfers, STK ACSLS commands and logging; the sysreq, VTP and ACSLS connections shown to dougal also apply to the other data-server machines but are left out for clarity.)
Atlas Datastore Architecture
(Architecture diagram, 28 Feb 03, B Strong. Main elements: the flfsys catalogue process (+libflf) on the catalogue server brian, which holds the catalogue data, a backup catalogue and stats, and accepts flfsys user, farm, tape, admin and import/export commands over sysreq; flfstk, tapeserv and flfaio driving the STK and IBM tape drives; flfscan, flfdoexp, flfdoback, recycling and the datastore script (+libflf) managing copies A, B and C on the cache disk and tape; data transfer over vtp (libvtp) between user programs on user nodes and the servers; the robot server buxton running ACSLS (SSI, CSI, LMU) for mount/dismount control of the tape robot; the pathtape server rusty running servesys and pathtape front and back ends for long- and short-name sysreq commands; the I/E server dylan (importexport); an SE issuing flfsys user commands; cellmgr on the farm server; and flfqryoff, a copy of the flfsys code.)
Hardware upgrade - completed Jun 2003
• STK 9310 “Powderhorn” with 6000 slots (1.2 PB)
• 4 IBM 3590B drives now phased out
  – 10 GB native capacity
  – 10 MB/s transfer
• 8 new STK 9940B drives
  – 200 GB native capacity
  – 30 MB/s transfer per drive (240 MB/s theoretical maximum aggregate bandwidth)
• 4 RS/6000 data servers (+ 4 “others”)
• 1 Gbit networking (expected to become 10 Gbit by 2005)
• Data migration to the new media completed ~Feb 2004
Strategy
• De-couple users and applications from the storage media
• Upgrades and media migration occur “behind the scenes”
• High resilience: very few single points of failure
• High reliability, high availability (99.9986%)
• Constant environmental monitoring linked to alarm/call-out
• Easy to exploit (endless) new technology
• Lifetime data integrity checks in hardware and software
• Fire safe and off-site backups; tested disaster recovery procedures; media migration and recycling
• Technology watch to monitor the future technology path
Supported interfaces
• We have successfully implemented a variety of layers on top of the ADS to support standard interfaces
• FTP, OOFS, Globus IO, SRB, EDG SE, SRM, xrootd
  – so we can probably support others
Overall Storage Goals – GridPP2
• Provide SRM interfaces to:
  – The Atlas Petabyte Storage facility at RAL
  – Disk (for Tier 1 and 2 in the UK)
  – Disk pools (for Tier 1 and 2 in the UK)
• Deploy and support the interface to the Atlas Datastore
• Package and support interfaces to disk
Current status
• EDG-SE interface to the ADS
  – Published as an SE in LCG
  – Supported by edg-rm
• SRM v1.1 interface to the ADS
  – Tested with GFAL (earlier versions, <1.3.7)
  – Tested with srmcp (the dCache client)
  – Based on the EDG Storage Element
  – Also interfaces to disk
• Also working with the RAL Tier 1 on dCache
  – Install and support, including the SRM
(Short Term) Timeline
• Provide a release of SRM to disk and disk array by end of January 2005
• Coincide with the EGEE gLite “release”
• Plan to match the path toward the full gLite release
(Short Term) Strategy
• Currently considering both the EDG SE and dCache
(diagram: an EDG Storage Element and dCache + dCache-SRM, each offering SRM to the ADS, SRM to disk and SRM to disk pools)
• Look at both to meet all goals:
  – Some duplicated effort, but it helps mitigate risk
  – Can fall back to only one (which may be dCache)
  – In the long term, we will probably have a single solution
Acceptance tests
• SRM tests: the SRM interface must work with (see the sketch below)
  – srmcp (the dCache SRM client)
  – GFAL
  – gLite I/O
• Disk pool test: must work with
  – dccp (dCache-specific)
  – plus the SRM interface on top
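As one concrete flavour of the SRM checks above, the sketch below uses the GFAL 1.x C API (one of the required clients) to stat a file through an SRM v1.1 endpoint and report its size. The SURL is hypothetical; a real test would use a file previously written to the RAL SRM.

    /* Minimal sketch of an SRM acceptance-style check via GFAL:
     * stat a file through the SRM interface and report its size.
     * The SURL below is hypothetical. */
    #include <stdio.h>
    #include <sys/stat.h>
    #include "gfal_api.h"

    int main(void)
    {
        struct stat st;
        const char *surl =
            "srm://dcache.example.ac.uk:8443/pnfs/example.ac.uk/data/dteam/testfile";

        if (gfal_stat(surl, &st) < 0) {
            fprintf(stderr, "gfal_stat failed for %s\n", surl);
            return 1;
        }
        printf("%s: %lld bytes\n", surl, (long long) st.st_size);
        return 0;
    }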
Questions
• What is the future of CERN’s DPM?
  – We want to test it
• Should we start implementing SRM 3?
• Will dCache ever go Open Source?
Planned Tape Capacity
TB      2004   2005   2006   2007   2008
LHC            483    1150   1639   1573
Total          882    1500   2100   2100
Don’t believe the 2008 figures; storage will be reviewed in this timeframe.
ADS Plans
• Planning a wider UK role in Data Curation and Storage (potentially 10-20PB by 2014)
• Review software layer – use of Castor possible
• Capacity plans based on adding STK Titanium 1 in 2005/06 and Titanium 2 in 2008/09
Summary
• Working implementation of dCache for disk pools (main user is CMS)
  – Some outstanding questions
  – Plan to involve some Tier-2s shortly
• We will review other implementations as they become available
• RAL ADS supports SRB and xrootd for other communities.