INFN-T1 site report
Andrea Chierici
On behalf of INFN-T1 staff
28th October 2009
Overview
Infrastructure
Network
Farming
Storage
Infrastructure
                     INFN-T1 2005                        INFN-T1 2009
Racks                40                                  120
Power source         University                          Directly from supplier (15 kV)
Power transformer    1 (~1 MVA)                          3 (~2.5 MVA)
UPS                  1 diesel engine/UPS (~640 kVA)      2 rotary UPS (~3400 kVA) + 1 diesel engine (~640 kVA)
Chiller              1 (~530 kVA)                        7 (~2740 kVA)
[Diagram: electrical and cooling layout: 15,000 V feed from the supplier, UPS capacity up to 3.8 MW, chiller plant, section loads in the 1 to 1.4 MW range, mechanical and electrical surveillance]
Network
INFN CNAF TIER1 Network
[Diagram: INFN CNAF Tier1 network layout]
WAN access via GARR on a Cisco 7600 (2x10 Gb/s) and a Cisco NEXUS 7000
LHC-OPN dedicated 10 Gb/s link: T0-T1 (CERN) and T1-T1 (PIC, RAL, TRIUMF)
General IP 10 Gb/s link: T1-T1s (BNL, FNAL, TW-ASGC, NDGF), T1-T2s and CNAF general-purpose traffic
LHC-OPN T0-T1 backup (10 Gb/s) via the CNAF-KIT, CNAF-IN2P3 and CNAF-SARA links
Core switches: Extreme BD10808 and Extreme BD8810, interconnected at 4x10 Gb/s
Worker nodes attached to Extreme Summit450 and Summit400 rack switches (2x1 Gb/s links), with 4x1 Gb/s or 2x10 Gb/s uplinks towards the core
Storage servers (disk servers, CASTOR stagers) connected to the storage devices through a Fibre Channel SAN and an FC director
In case of network congestion: uplink upgrade from 4x1 Gb/s to 10 Gb/s or 2x10 Gb/s
Farming
New tender
1U twin solution with these specs:
2x Intel Nehalem E5520 @ 2.26 GHz
24 GB RAM
2x 320 GB SATA HD @ 7200 rpm
2x 1 Gbps Ethernet
118 twins, reaching 20500 HEP-SPEC, measured on SLC4 (per-node arithmetic sketched below)
Delivery and installation foreseen within 2009
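As a rough cross-check of the tender figures, a minimal sketch; the two-nodes-per-twin and eight-cores-per-node assumptions are ours, not from the slide:

```python
# Rough per-node/per-core arithmetic for the new tender (illustrative only).
TWINS = 118
NODES = TWINS * 2              # assumption: two independent nodes per 1U twin chassis
CORES_PER_NODE = 8             # assumption: dual quad-core Xeon E5520
TOTAL_HEP_SPEC = 20500         # measured figure quoted above

print(f"HEP-SPEC per node: {TOTAL_HEP_SPEC / NODES:.1f}")                     # ~86.9
print(f"HEP-SPEC per core: {TOTAL_HEP_SPEC / (NODES * CORES_PER_NODE):.1f}")  # ~10.9
```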
Computing resources
Including machines from the new tender, INFN-T1 computing power will reach 42000 HEP-SPEC within 2009
A further increase within January 2010 will bring us to 46000 HEP-SPEC
Within May 2010 we will reach 68000 HEP-SPEC (as we pledged to WLCG); this will basically triple the current computing power
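A quick sanity check of the growth figures, assuming the "current" capacity is the 2009 total minus the new tender (an inference, not a number from the slide):

```python
# Growth of INFN-T1 computing power in HEP-SPEC06, as quoted above.
new_tender = 20500
total_2009 = 42000
current = total_2009 - new_tender      # inferred current capacity, ~21500
jan_2010 = 46000
may_2010 = 68000                       # WLCG pledge

for label, value in [("current (inferred)", current), ("end 2009", total_2009),
                     ("Jan 2010", jan_2010), ("May 2010", may_2010)]:
    print(f"{label}: {value} HEP-SPEC ({value / current:.1f}x current)")
# May 2010 comes out at ~3.2x the inferred current capacity, i.e. roughly triple.
```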
Resource usage per VO
KSI2K pledged vs used
New accounting system
Grid, local and overall job visualization
Tier1/Tier2 separation
Several parameters monitored: avg and max RSS, avg and max Vmem added in latest release
KSI2K/HEP-SPEC accounting (conversion sketched below)
WNoD accounting
Available at: http://tier1.cnaf.infn.it/monitor
Feedback welcome to: [email protected]
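For the KSI2K/HEP-SPEC accounting, a minimal conversion sketch; the factor of 4 HEP-SPEC06 per kSI2K is the commonly quoted WLCG conversion and should be replaced by a locally calibrated value if different:

```python
# kSI2K <-> HEP-SPEC06 conversion as used in WLCG accounting.
HEPSPEC_PER_KSI2K = 4.0   # commonly quoted WLCG factor (assumption for this sketch)

def ksi2k_to_hepspec(ksi2k: float) -> float:
    return ksi2k * HEPSPEC_PER_KSI2K

def hepspec_to_ksi2k(hepspec: float) -> float:
    return hepspec / HEPSPEC_PER_KSI2K

# Example: the 42000 HEP-SPEC expected for end of 2009 corresponds to ~10500 kSI2K.
print(hepspec_to_ksi2k(42000))
```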
New accounting: sample picture
GPU Computing (1)
We are investigating GPU computing
NVIDIA Tesla C1060, used for porting software and performing comparison tests
Meeting with Bill Dally (chief scientist and vice president of NVIDIA): https://agenda.cnaf.infn.it/conferenceDisplay.py?confId=266
GPU Computing (2)
Applications currently tested:
Bioinformatics: CUDA-based paralog filtering in Expressed Sequence Tag clusters
Physics: implementation of a second order electromagnetic particle-in-cell code on the CUDA architecture
Physics: Spin-Glass Monte Carlo simulations
The first two applications showed a more than 10x increase in performance!
GPU Computing (3)
We plan to buy 2 more workstations in 2010, with 2 GPUs each; we are waiting for the Fermi architecture, foreseen for spring 2010
We will continue the activities currently ongoing and will probably test some Monte Carlo simulations for SuperB
We plan to test selection and shared usage of GPUs via grid
Storage
2009-2010 tenders
Disk tender requested
Baseline: 3.3 PB raw (~2.7 PB-N)
1st option: 2.35 PB raw (~1.9 PB-N)
2nd option: 2 PB raw (~1.6 PB-N)
Options to be requested during Q2 and Q3 2010
New disk in production ~end of Q1 2010
4000 tapes (~4 PB) acquired with the library tender
4.9 PB needed at the beginning of 2010; 7.7 PB probably needed by mid-2010
Castor@INFN-T1
To be upgraded to 2.1.7-27
1 SRM v2.2 end-point available
Supported protocols: rfio, gridftp
Still cumbersome to manage: requires frequent interventions in the Oracle DB, lack of management tools
CMS migrated to StoRM for D0T1
WLCG Storage Classes at INFN-T1 today
Storage Class: offers different levels of storage quality (e.g. copy on disk and/or on tape)
DnTm = n copies on disk and m copies on tape (a toy model is sketched below)
Implementation of 3 Storage Classes needed for WLCG (but usable also by non-LHC experiments):
Disk0-Tape1 (D0T1), "custodial nearline": data migrated to tape and deleted from disk when the staging area is full; space managed by the system, disk is only a temporary buffer
Disk1-Tape0 (D1T0), "replica online": data kept on disk, no tape copy; space managed by the VO
Disk1-Tape1 (D1T1), "custodial online": data kept on disk AND one copy kept on tape; space managed by the VO (i.e. if disk is full, the copy fails)
Current implementations: CASTOR and GPFS/TSM + StoRM
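A toy model of the three storage classes described above; this is purely illustrative, and the class and field names are ours, not StoRM or CASTOR terminology:

```python
# Toy representation of the WLCG storage classes (DnTm) listed above.
from dataclasses import dataclass

@dataclass
class StorageClass:
    name: str
    disk_copies: int        # "n" in DnTm
    tape_copies: int        # "m" in DnTm
    space_managed_by: str   # "system" or "VO"

STORAGE_CLASSES = {
    # custodial nearline: disk is only a temporary buffer in front of tape
    "D0T1": StorageClass("custodial nearline", 0, 1, "system"),
    # replica online: data live on disk only, no tape copy
    "D1T0": StorageClass("replica online", 1, 0, "VO"),
    # custodial online: data on disk AND one copy on tape; if disk is full, the copy fails
    "D1T1": StorageClass("custodial online", 1, 1, "VO"),
}

for token, sc in STORAGE_CLASSES.items():
    print(f"{token}: {sc.name}, disk copies={sc.disk_copies}, "
          f"tape copies={sc.tape_copies}, space managed by {sc.space_managed_by}")
```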
YAMSS: present status
Yet Another Mass Storage System
Scripting and configuration layer to interface GPFS & TSM
Can work driven by StoRM or stand-alone: experiments not using the SRM model can work with it
GPFS-TSM (no StoRM) interface ready, with full support for migrations and tape-ordered recalls (see the sketch below)
StoRM in production at INFN-T1 and in other centres around the world for "pure" disk access (i.e. no tape)
Integration with YAMSS for migrations and tape-ordered recalls ongoing (almost completed)
Bulk migrations and recalls tested with a typical use case (stand-alone YAMSS, without StoRM): the weekly production workflow of the CMS experiment
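A minimal sketch of what "tape-ordered recalls" means: recall requests are grouped by tape and sorted by position on tape, so each cartridge is mounted once and read sequentially. This is illustrative Python, not the YAMSS interface; the metadata lookup and the recall call-out are hypothetical placeholders:

```python
# Illustrative tape-ordered recall: group requested files by the tape they
# live on and read each tape in on-tape order, minimising mounts and seeks.
from collections import defaultdict

def lookup_tape_location(path):
    """Hypothetical: return (tape_label, position_on_tape) for a migrated file."""
    raise NotImplementedError

def recall_from_tape(tape, ordered_paths):
    """Hypothetical: mount `tape` once and stage `ordered_paths` back to disk."""
    raise NotImplementedError

def tape_ordered_recall(requested_paths):
    by_tape = defaultdict(list)            # tape label -> [(position, path), ...]
    for path in requested_paths:
        tape, position = lookup_tape_location(path)
        by_tape[tape].append((position, path))

    for tape, entries in by_tape.items():
        entries.sort()                     # read in on-tape order
        recall_from_tape(tape, [path for _, path in entries])
```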
Why GPFS & TSM
Tivoli Storage Manager (developed by IBM) is a tape-oriented storage manager, widely used (also in the HEP world, e.g. FZK)
Built-in functionality is present in both products to implement backup and archiving from GPFS
The development of an HSM solution is based on the combination of features of GPFS (since v3.2) and TSM (since v5.5)
Since GPFS v3.2 the new concept of "external storage pool" extends the use of policy-driven Information Lifecycle Management (ILM) to tape storage (sketched below)
External pools are real interfaces to external storage managers, e.g. HPSS or TSM; HPSS is very complex (no benefits in this sense compared to CASTOR)
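A minimal sketch of the policy-driven migration idea, not GPFS's actual policy language nor the TSM API: when the disk pool fills beyond a high-water mark, the least recently used files are handed to the external (tape) pool until occupancy drops below a low-water mark.

```python
# Illustrative threshold-driven migration in the spirit of a GPFS ILM rule
# ("migrate from the disk pool to the external tape pool when 90% full, down
# to 70%"). The filesystem scan and the tape call-out are placeholders.
import os

HIGH_WATER = 0.90   # start migrating above this occupancy
LOW_WATER = 0.70    # stop migrating below this occupancy

def pool_occupancy(mount_point: str) -> float:
    st = os.statvfs(mount_point)
    return 1.0 - st.f_bavail / st.f_blocks

def candidate_files(mount_point: str):
    """All regular files, least recently accessed first (LRU-like ordering)."""
    files = []
    for root, _, names in os.walk(mount_point):
        for name in names:
            path = os.path.join(root, name)
            files.append((os.stat(path).st_atime, path))
    return sorted(files)

def migrate_to_tape(path: str) -> None:
    """Hypothetical call-out to the external storage manager (e.g. TSM)."""
    raise NotImplementedError

def run_policy(mount_point: str) -> None:
    if pool_occupancy(mount_point) < HIGH_WATER:
        return
    for _, path in candidate_files(mount_point):
        migrate_to_tape(path)
        if pool_occupancy(mount_point) < LOW_WATER:
            break
```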
YAMSS: hardware set-up
[Diagram: YAMSS hardware set-up]
~500 TB for GPFS on a CX4-960, attached to the SAN at 20x4 Gbps
4 GridFTP servers (4x2 Gbps) and 6 NSD servers (6x2 Gbps) on the LAN
3 HSM/STA nodes, with 8x4 Gbps and 3x4 Gbps links towards the SAN and TAN
8 T10KB tape drives on the TAN: 1 TB per tape, ~1 Gbps per drive
TSM server (with its db) on 4 Gbps FC
YAMSS: validation tests
Concurrent access in read/write to the MSS, for transfers and from the farm (StoRM not used in these tests)
3 HSM nodes serving 8 T10KB drives: 6 drives (at maximum) used for recalls, 2 drives (at maximum) used for migrations
Of the order of 1 GB/s of aggregated traffic:
• ~550 MB/s from tape to disk
• ~100 MB/s from disk to tape
• ~400 MB/s from disk to the computing nodes (not shown in this graph)
Questions?