
Page 1: BNL Facility Status and Service Challenge 3

HEPiX
Karlsruhe, Germany
May 9-13, 2005

Zhenping Liu, Razvan Popescu, and Dantong Yu
USATLAS/RHIC Computing Facility
Brookhaven National Lab

Page 2: Outline

Lessons learned from SC2

Goals of the BNL Service Challenges

Detailed SC3 planning
  - Throughput challenge (simple):
    - Network upgrade plan
    - USATLAS dCache system at BNL
    - MSS
    - Tier 2 integration planning
    - File Transfer Service
  - Service-phase challenge, including ATLAS applications (difficult)

Page 3: One-Day Data Transfer of SC2

Page 4: Lessons Learned from SC2

Four file transfer servers with a 1 Gigabit WAN connection to CERN.

Met the performance/throughput challenge (70-80 MB/s disk to disk).

Enabled data transfers between the BNL dCache/SRM and the CERN SRM at openlab; designed our own script to control the SRM transfers.

Enabled data transfers between BNL GridFTP servers and CERN openlab GridFTP servers, controlled by the Radiant software.

Many components needed tuning:
  - 250 ms RTT and a high packet-drop rate; multiple TCP streams and multiple concurrent file transfers were required to fill the network pipe (see the back-of-the-envelope calculation after this list).
  - Sluggish parallel file I/O with EXT2/EXT3: many processes stuck in the "D" state, and the more concurrent file streams, the worse the file-system performance.
  - Slight improvement with XFS; file-system parameters still need tuning.
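Why multiple streams were unavoidable follows directly from the bandwidth-delay product of the BNL-CERN path. Below is a minimal back-of-the-envelope sketch using the 1 Gbit/s link and 250 ms RTT quoted above; the 2 MB per-stream TCP window is an illustrative assumption, not a measured value.

```python
# Back-of-the-envelope: how many TCP streams does it take to fill a
# 1 Gbit/s path with a 250 ms round-trip time?

LINK_GBIT = 1.0           # WAN link to CERN (from this slide)
RTT_S = 0.250             # round-trip time (from this slide)
WINDOW_BYTES = 2 * 2**20  # assumed per-stream TCP window of 2 MB (illustrative)

link_bytes_per_s = LINK_GBIT * 1e9 / 8
bdp_bytes = link_bytes_per_s * RTT_S           # bytes that must be "in flight"
per_stream_limit = WINDOW_BYTES / RTT_S        # throughput ceiling of one stream
streams_needed = bdp_bytes / WINDOW_BYTES

print(f"bandwidth-delay product : {bdp_bytes / 2**20:.1f} MB")
print(f"one-stream ceiling      : {per_stream_limit / 2**20:.1f} MB/s")
print(f"streams to fill the link: ~{streams_needed:.0f}")
```

With a roughly 31 MB bandwidth-delay product, a single 2 MB window tops out near 8 MB/s, so on the order of fifteen concurrent streams (or several parallel file transfers) are needed even before packet loss is taken into account, which matches the multi-stream, multi-file approach used in SC2.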

Page 5: Goals

Network, disk, and tape service
  - Sufficient network bandwidth: 2 Gbit/s.
  - Quality of service: 150 MB/s to storage and up to 60 MB/s to tape, delivered efficiently and effectively (a quick sanity check of these targets follows after this list).
  - Functionality/services: high reliability, data integrity, high performance.

Robust file transfer service
  - Storage servers
  - File Transfer Service (FTS)
  - Data management software (SRM, dCache)
  - Archiving service: tape servers, tape robots, tapes, tape drives

Sustainability
  - Weeks in a row of uninterrupted 24/7 operation.

Involve ATLAS experiment applications.
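To check that the quoted rates fit on the planned link, here is a small sketch converting the disk and tape targets into link utilization; the three input numbers are the ones on this slide.

```python
# Do the SC3 throughput targets fit within the planned 2 Gbit/s link?

LINK_GBIT = 2.0    # planned network bandwidth (from this slide)
DISK_MB_S = 150.0  # target rate into storage (from this slide)
TAPE_MB_S = 60.0   # target rate to tape (from this slide)

def mb_s_to_gbit_s(rate_mb_s: float) -> float:
    """Convert a decimal MB/s rate into Gbit/s."""
    return rate_mb_s * 1e6 * 8 / 1e9

disk_gbit = mb_s_to_gbit_s(DISK_MB_S)
tape_gbit = mb_s_to_gbit_s(TAPE_MB_S)

print(f"150 MB/s to storage = {disk_gbit:.2f} Gbit/s "
      f"({disk_gbit / LINK_GBIT:.0%} of the link)")
print(f" 60 MB/s to tape    = {tape_gbit:.2f} Gbit/s")
```

The disk target alone consumes about 60% of the 2 Gbit/s link, which leaves headroom for protocol overhead and retries but little room for competing traffic.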

Page 6: BNL Network Topology

Page 7: Network Upgrade Status and Plan

WAN connection: OC-48.

Dual GigE links connect the BNL border router to the ESnet router.

Working on the LAN upgrade from 1 GigE to 10 GigE; target completion date: mid-June 2005.

Page 8: BNL Storage Element: dCache System

Allows transparent access to a large number of data files distributed over the dCache disk pools or stored on HPSS.

Provides users with one unique name space for all the data files.

Significantly improves the efficiency of the connected tape storage systems through caching, i.e. gather-and-flush and scheduled staging techniques.

Clever selection mechanism (a simplified illustration follows this list):
  - The system determines whether the file is already stored on one or more disks or on HPSS.
  - The system selects the source or destination dCache pool based on the storage group and the network mask of the client, as well as CPU load, disk space, and the configuration of the dCache pools.

Optimizes the throughput to and from data clients and balances the load of the connected disk storage nodes by dynamically replicating files when hot spots are detected.

Tolerant of failures of its data servers.

Various access protocols, including GridFTP, SRM, and dccp.
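The selection mechanism described above can be thought of as a cost-based choice among eligible pools. The sketch below is a deliberately simplified illustration of that idea, not dCache's actual code; the pool names, subnet, fields, and cost weights are invented for the example.

```python
# Simplified illustration of cost-based pool selection, loosely following the
# criteria listed above (storage group, client network, CPU load, free space).
# This is NOT dCache code; names, subnet, and weights are invented.
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class Pool:
    name: str
    storage_groups: set[str]  # storage groups this pool serves
    client_net: str           # client subnet this pool is preferred for
    cpu_load: float           # 0.0 (idle) .. 1.0 (saturated)
    free_fraction: float      # 0.0 (full) .. 1.0 (empty)

def select_pool(pools: list[Pool], storage_group: str, client_ip: str) -> Pool:
    """Return the 'cheapest' pool that is eligible for this request."""
    eligible = [
        p for p in pools
        if storage_group in p.storage_groups
        and ip_address(client_ip) in ip_network(p.client_net)
    ]
    if not eligible:
        raise RuntimeError("no eligible pool for this storage group / client")
    # Lower cost = less busy and more free space.
    return min(eligible, key=lambda p: p.cpu_load + (1.0 - p.free_fraction))

pools = [
    Pool("read-pool-01", {"atlas"}, "192.0.2.0/24", cpu_load=0.7, free_fraction=0.4),
    Pool("read-pool-02", {"atlas"}, "192.0.2.0/24", cpu_load=0.2, free_fraction=0.6),
]
print(select_pool(pools, "atlas", "192.0.2.17").name)  # -> read-pool-02
```

The real PoolManager applies a much richer cost model, but the basic flow (filter by eligibility, then rank the survivors by load and free space) is roughly the behaviour this slide describes.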

Page 9: BNL dCache Architecture

[Architecture diagram] DCap, SRM, and GridFTP doors accept requests from DCap, SRM, and GridFTP clients (including the Oak Ridge batch system) over the control channel; the PnfsManager and PoolManager steer the data channel to the read pools and to the internal and external write pools, with HPSS as the back-end mass store behind the dCache system.

Page 10: dCache System, Continued

The BNL USATLAS dCache system works as a disk caching front end for the Mass Storage System.

Current configuration: 72 nodes with 50.4 TB of disk in total:
  - Core server nodes and a database server
  - Internal/external read pools: 65 nodes (49.45 TB)
  - Internal write pools: 4 nodes (532 GB)
  - External write pools: 2 nodes (420 GB)

dCache version: 1.2.2.7-2

Access protocols: GridFTP, SRM, dCap, gsi-dCap

Page 11: Immediate dCache Upgrade

The existing dCache has 50 TB of data storage.

288 new dual-CPU 3.4 GHz Dell hosts will be on site on May 11, 2005:
  - 2 x 250 GB SATA drives per host
  - 2 GB of memory and dual on-board Gigabit ports

These hosts will be split into more than two dCache systems. One of the systems will be used for SC3; its disk pool nodes will be connected directly to the ATLAS router, which has a 10 Gb/s uplink.

SL3 will be installed on all of these Dell hosts.

File system to be installed: XFS; it still needs tuning to improve the disk utilization per host (a raw-capacity estimate for the new hosts follows).
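For scale, here is a quick estimate of the raw disk capacity arriving with this purchase, using only the numbers on this slide; how much of it ends up in the SC3 instance depends on how the hosts are split.

```python
# Raw disk capacity of the new Dell purchase (numbers from this slide).
HOSTS = 288
DRIVES_PER_HOST = 2
DRIVE_GB = 250
EXISTING_TB = 50  # current dCache capacity (from this slide)

raw_tb = HOSTS * DRIVES_PER_HOST * DRIVE_GB / 1000
print(f"raw capacity of new hosts: {raw_tb:.0f} TB")  # 144 TB
print(f"existing dCache capacity : {EXISTING_TB} TB")
```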

Page 12: BNL ATLAS MSS

Two 9940B tape drives; the data transfer rate is between 10 MB/s and 30 MB/s per drive. These two tape drives are saturated by daily USATLAS production.

200 GB tapes.

We will need to borrow tape drives from other BNL in-house experiments in July to meet the 60 MB/s performance target (a rough estimate follows).
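A rough estimate of what the 60 MB/s target implies in terms of dedicated drives, using the per-drive rates quoted above; the real requirement also depends on mount overhead and on how much drive time production leaves free.

```python
# How many 9940B drives does the 60 MB/s tape target imply?
# Per-drive rates are the range quoted on this slide.
import math

TARGET_MB_S = 60.0
for per_drive_mb_s in (10.0, 20.0, 30.0):  # pessimistic .. optimistic
    drives = math.ceil(TARGET_MB_S / per_drive_mb_s)
    print(f"at {per_drive_mb_s:4.0f} MB/s per drive: {drives} drive(s) needed")
```

Even in the optimistic 30 MB/s case the two existing drives would have to be dedicated entirely to SC3, which is why borrowing drives from other experiments is planned.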

Page 13: File Transfer Service

ATLAS sees benefits in trying gLite FTS as soon as possible (a sketch of driving it is shown below):
  - To see ASAP whether it meets the data transfer requirements.
  - Data transfer requires significant effort to ramp up; learn from SC2.
  - Help debug gLite FTS.
  - Transfers between Tier 0, Tier 1, and a few Tier 2 sites.
  - Real usage with Rome production data.
  - A uniform low-level file transfer layer to interface with several implementations of SRM: dCache/SRM, DPM, even vanilla GridFTP.

Xin deployed the FTS service and has successfully run the data transfer test with FTS.

We are ready for the prime time of July 1, 2005.
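For concreteness, here is a minimal sketch of driving the gLite FTS command-line client from a script. The FTS endpoint and SURLs are hypothetical placeholders, and the exact client options and state names can differ between FTS releases, so treat this as an assumption-laden sketch rather than a recipe.

```python
# Minimal sketch: submit a transfer to gLite FTS and poll its status.
# The endpoint and SURLs below are hypothetical placeholders.
import subprocess
import time

FTS_ENDPOINT = "https://fts.example.org:8443/fts/services/FileTransfer"  # placeholder
SRC = "srm://source.example.org/pnfs/example/file1"                      # placeholder SURL
DST = "srm://dest.example.org/pnfs/example/file1"                        # placeholder SURL

# Submit the job; the client prints the job identifier on stdout.
job_id = subprocess.run(
    ["glite-transfer-submit", "-s", FTS_ENDPOINT, SRC, DST],
    check=True, capture_output=True, text=True,
).stdout.strip()
print("submitted job", job_id)

# Poll until the job leaves the active states.
ACTIVE_STATES = {"Submitted", "Pending", "Ready", "Active"}  # assumed state names
while True:
    state = subprocess.run(
        ["glite-transfer-status", "-s", FTS_ENDPOINT, job_id],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    print("state:", state)
    if state not in ACTIVE_STATES:
        break
    time.sleep(30)
```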

Page 14: Tier 2 Plans

Choose two USATLAS Tier 2 sites.

Each site will deploy a DPM server as its storage element, with an SRM interface.

gLite FTS (File Transfer Service) will transfer data from BNL to each of the two chosen sites at 75 MB/s.

Files will be kept in the BNL Tier 1 dCache until they have been read once to the Tier 2 center.

Page 15: ATLAS and SC3 Service Phase

September: ATLAS release 11 (mid-September)
  - Will include use of the conditions database and COOL.
  - We intend to use COOL for several sub-detectors:
    - Not clear how many sub-detectors will be ready.
    - Not clear either how we will use COOL: a central COOL database or a COOL distributed database.

Debug scaling for distributed conditions data access: calibration/alignment, DDM, event data distribution and discovery.

Tier 0 exercise testing:
  - A dedicated server is requested for the initial ATLAS COOL service.
  - Issues around FroNtier are still under discussion, and ATLAS is interested.
  - Data can be thrown away.

Page 16: ATLAS & SC3 Service Phase

April-July: preparation phase
  - Test of FTS ("gLite-SRM")
  - Integration of FTS with DDM

July: scalability tests (commissioning data; Rome Physics Workshop data)

September: test of new components and preparation for real use of the service
  - Intensive debugging of COOL and DDM
  - Prepare for "scalability" running

Mid-October: use of the service
  - Scalability tests of all components (DDM)
  - Production of real data (Monte Carlo; Tier-0; …)

Later: "continuous" production mode
  - Re-processing
  - Analysis

Page 17: Conclusion

The Storage Element and the network upgrades are going well.

The whole chain of systems will be tuned before the end of May.

Waiting for the FTS software to control data transfers.

Talking with USATLAS Tier 2 sites about participating in SC3.

Discussing how the experiment software can be involved.

Page 18: Thank You!