TRANSCRIPT
BNL Facility Status and Service Challenge 3
HEPiX
Karlsruhe, Germany
May 9-13, 2005
Zhenping Liu, Razvan Popescu, and Dantong Yu
USATLAS/RHIC Computing Facility
Brookhaven National Lab
Outline
Lessons learned from SC2
Goals of BNL Service Challenges
Detailed SC3 planning
  Throughput challenge (simple): network upgrade plan, USATLAS dCache system at BNL, MSS, Tier 2 integration planning, File Transfer Service
  Service phase challenge, including ATLAS applications (difficult)
One day of data transfer during SC2
Lessons Learned From SC2
Four file transfer servers with a 1 Gigabit WAN connection to CERN.
Met the performance/throughput target (70~80 MB/second disk to disk).
  Enabled data transfer between dCache/SRM at BNL and the CERN SRM at openlab; designed our own script to control the SRM data transfers.
  Enabled data transfer between BNL GridFTP servers and CERN openlab GridFTP servers controlled by the Radiant software.
Many components needed tuning.
  With a 250 ms RTT and a high packet-drop rate, we had to use multiple TCP streams and multiple concurrent file transfers to fill the network pipe.
  Parallel file I/O on EXT2/EXT3 was sluggish, with many processes stuck in the "D" state; the more concurrent file streams, the worse the file system performed.
  XFS gave a slight improvement, but the file system parameters still need tuning.
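As an illustration (not the exact commands used during SC2; host names and buffer sizes are assumptions), a GridFTP transfer over a ~250 ms path is typically driven with several parallel TCP streams and enlarged TCP buffers:

  # hypothetical example: 8 parallel streams, 2 MB TCP buffers
  sysctl -w net.core.rmem_max=8388608 net.core.wmem_max=8388608
  globus-url-copy -p 8 -tcp-bs 2097152 \
      gsiftp://ftp01.usatlas.bnl.gov/data/sc2/file001 \
      gsiftp://oplapro01.cern.ch/data/sc2/file001

Multiple such transfers are run concurrently to keep the 1 Gbit link full.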
Goals
Network, disk, and tape service
  Sufficient network bandwidth: 2 Gbit/sec.
  Quality of service: 150 MByte/sec to storage and up to 60 MByte/sec to tape, delivered efficiently and effectively.
  Functionality/services: high reliability, data integrity, high performance.
Robust file transfer service
  Storage servers.
  File Transfer Software (FTS).
  Data management software (SRM, dCache).
  Archiving service: tape servers, tape robots, tapes, tape drives.
Sustainability: weeks in a row of uninterrupted 24/7 operation.
Involve ATLAS experiment applications.
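A back-of-the-envelope check of these targets (my own arithmetic, not from the slide):

  # quick sanity check of the SC3 throughput targets (Python)
  wan_gbit = 2.0                    # planned WAN bandwidth, Gbit/s
  wan_mbyte = wan_gbit * 1000 / 8   # ~250 MByte/s of raw capacity
  disk_target = 150                 # MByte/s target to the storage element
  tape_target = 60                  # MByte/s target to tape
  print(wan_mbyte)                  # 250.0
  print(disk_target / wan_mbyte)    # 0.6 -> disk traffic alone uses ~60% of the link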
BNL Network Topology
Network Upgrade Status and Plan
WAN connection: OC-48.
Dual GigE links connect the BNL border router to the ESnet router.
LAN upgrade from 1 GigE to 10 GigE is in progress; target completion: middle of June 2005.
BNL Storage Element: dCache System
Allows transparent access to a large number of data files distributed across disk in dCache pools or stored on HPSS.
Provides users with one unique name space for all data files.
Significantly improves the efficiency of the connected tape storage system through caching, i.e. gather-and-flush and scheduled staging techniques.
Clever selection mechanism:
  The system determines whether a file is already stored on one or more disks or on HPSS.
  The system determines the source or destination dCache pool based on the storage group and the network mask of the client, as well as the CPU load, disk space, and configuration of the dCache pools (see the illustrative pool-selection rules below).
Optimizes throughput to and from data clients and balances the load of the connected disk storage nodes by dynamically replicating files when hot spots are detected.
Tolerant of failures of its data servers.
Various access protocols, including GridFTP, SRM, and dCap.
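As a rough illustration of those selection rules (syntax approximate and names invented; this is not BNL's actual configuration), dCache's PoolManager is driven by statements of this kind:

  # hypothetical PoolManager.conf fragment
  psu create pool wpool01
  psu create pgroup write-pools
  psu addto pgroup write-pools wpool01
  psu create unit -net 130.199.0.0/255.255.0.0
  psu create ugroup onsite-clients
  psu addto ugroup onsite-clients 130.199.0.0/255.255.0.0
  psu create link write-link onsite-clients
  psu set link write-link -writepref=10 -readpref=0 -cachepref=0
  psu add link write-link write-pools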
BNL dCache Architecture
[Architecture diagram: DCap, SRM, and GridFTP doors accept control channels from DCap clients, SRM clients, GridFTP clients, and the Oak Ridge batch system; the PnfsManager and PoolManager steer data channels to the read pools, internal write pools, and external write pools, with HPSS as the tape back end.]
dCache System, Continued
The BNL USATLAS dCache system works as a disk caching front end for the Mass Storage System.
Current configuration: 72 nodes in total with 50.4 TB of disk:
  Core server nodes and a database server.
  Internal/external read pools: 65 nodes, 49.45 TB.
  Internal write pools: 4 nodes, 532 GB.
  External write pools: 2 nodes, 420 GB.
dCache version: V1.2.2.7-2.
Access protocols: GridFTP, SRM, dCap, gsi-dCap.
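For illustration only (door host names and PNFS paths are invented), these protocols are exercised with clients such as dccp and srmcp:

  # hypothetical examples; host names and paths are not BNL's real ones
  dccp dcap://dcdcap.usatlas.bnl.gov:22125/pnfs/usatlas.bnl.gov/data/evgen.root /tmp/evgen.root
  srmcp srm://dcsrm.usatlas.bnl.gov:8443/pnfs/usatlas.bnl.gov/data/evgen.root file:////tmp/evgen.root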
Immediate dCache Upgrade
The existing dCache has 50 TB of data storage.
288 new dual-CPU 3.4 GHz Dell hosts will be on site on May 11, 2005:
  2 x 250 GB SATA drives each.
  2 GB memory and dual on-board Gigabit ports.
These hosts will be split into more than two dCache systems.
One of the systems will be used for SC3; its disk pool nodes will be connected directly to the ATLAS router, which has a 10 Gb uplink.
SL3 (Scientific Linux 3) will be installed on all of these Dell hosts.
File system to be installed: XFS, which needs tuning to improve disk utilization per host (an illustrative tuning sketch follows).
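A minimal sketch of the kind of XFS tuning meant here (values are assumptions, not the parameters BNL settled on):

  # hypothetical example: larger metadata log, no atime updates on pool partitions
  mkfs.xfs -f -l size=64m /dev/sdb1
  mount -o noatime,logbufs=8 /dev/sdb1 /data/pool1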
BNL ATLAS MSS
Two 9940B tape drives; the data transfer rate is between 10 and 30 MB/second. These two drives are already saturated by daily USATLAS production.
200 GB tapes.
We will need to borrow tape drives from other in-house BNL experiments in July to meet the 60 MByte/second performance target (see the estimate below).
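A rough estimate of how many drives the 60 MByte/second target implies (my own arithmetic, using the per-drive rates quoted above):

  # drives needed for 60 MB/s to tape at the quoted 9940B rates (Python)
  target = 60                               # MB/s SC3 target to tape
  per_drive_best, per_drive_worst = 30, 10  # observed MB/s per drive
  print(target / per_drive_best)            # 2.0 drives if each streams at 30 MB/s
  print(target / per_drive_worst)           # 6.0 drives at the pessimistic 10 MB/s
  # The two existing drives are already busy with production, hence the need to borrow.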
File Transfer Service
ATLAS sees benefits in trying gLite FTS as soon as possible:
  To see ASAP whether it meets the data transfer requirements; data transfer requires significant effort to ramp up, as we learned from SC2.
  To help debug gLite FTS.
  Transfers between Tier 0, Tier 1, and a few Tier 2 sites.
  Real usage with Rome production data.
  A uniform low-level file transfer layer that interfaces with several SRM implementations: dCache/SRM, DPM, and even vanilla GridFTP.
Xin deployed the FTS service and has successfully run data transfer tests with it.
We are ready for the prime time of July 1, 2005.
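For illustration only (the FTS endpoint and SURLs are invented placeholders), a transfer job is handed to FTS with the gLite command-line client along these lines:

  # hypothetical example; endpoint and SURLs are placeholders
  glite-transfer-submit -s https://fts.example.cern.ch:8443/sc3/glite-data-transfer-fts/services/FileTransfer \
      srm://srm.example.cern.ch:8443/castor/cern.ch/grid/atlas/sc3/file001 \
      srm://dcsrm.usatlas.bnl.gov:8443/pnfs/usatlas.bnl.gov/sc3/file001
  glite-transfer-status -s https://fts.example.cern.ch:8443/sc3/glite-data-transfer-fts/services/FileTransfer <job-id>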
Tier 2 Plans
Choose two USATLAS Tier 2 sites.
Each site will deploy a DPM server as its storage element, with an SRM interface.
gLite FTS (File Transfer Service) will transfer data from BNL to each of the two chosen sites at 75 MByte/second.
Files will be kept in the BNL Tier 1 dCache until they have been read once by the Tier 2 center.
ATLAS and SC3 Service Phase
September: ATLAS release 11 (mid September)
  Will include use of the conditions database and COOL.
  We intend to use COOL for several sub-detectors; it is not yet clear how many sub-detectors will be ready, nor exactly how we will use COOL (a central COOL database or a distributed COOL database).
  Debug scaling for distributed conditions data access: calibration/alignment, DDM, event data distribution and discovery.
  Tier 0 exercise testing: a dedicated server is requested for the initial ATLAS COOL service; issues around FroNtier are still under discussion and ATLAS is interested. Data can be thrown away.
ATLAS & SC3 Service Phase
April-July: preparation phase
  Test of FTS ("gLite-SRM"); integration of FTS with DDM.
July: scalability tests (commissioning data; Rome physics workshop data).
September: test of new components and preparation for real use of the service
  Intensive debugging of COOL and DDM; prepare for "scalability" running.
Mid-October: use of the service
  Scalability tests of all components (DDM); production of real data (Monte Carlo; Tier-0; ...).
Later: "continuous" production mode
  Re-processing; analysis.
Conclusion
The storage element and network upgrades are going well.
The whole chain of systems will be tuned before the end of May.
We are waiting for the FTS software to control data transfers.
We are talking with USATLAS Tier 2 sites about participating in SC3.
We are discussing how the experiment software can be involved.
Thank You!