
Page 1: Site Report: Tokyo

Tomoaki Nakamura
ICEPP, The University of Tokyo
2013/12/13

Page 2: ICEPP regional analysis center

WLCG pledge

               2013                              2014                              2015
CPU pledge     16000 [HS06]                      20000 [HS06]                      24000 [HS06]
CPU deployed   43673.6 [HS06, SL5] (2560 cores)  46156.8 [HS06, SL6] (2560 cores)  -
Disk pledge    1600 [TB]                         2000 [TB]                         2400 [TB]
Disk deployed  2000 [TB]                         2000 [TB]                         -

Resource overview: we support only the ATLAS VO in WLCG as a Tier2 site, and the ATLAS-Japan collaboration as a Tier3. The first production system for WLCG was deployed in 2005 after several years of R&D. Almost all of the hardware is procured on three-year leases, so the system is upgraded every three years; the current system is the third generation.

Human resources:
Tetsuro Mashimo (associate professor): fabric operation, procurement, Tier3 support
Nagataka Matsui (technical staff): fabric operation, Tier3 support
Tomoaki Nakamura (project assistant professor): Tier2 operation, Tier3 analysis environment
Hiroshi Sakamoto (professor): site representative, coordination, ADCoS (AP leader)
Ikuo Ueda (assistant professor): ADC coordinator, site contact with ADC
System engineers from a company (2 FTE): fabric maintenance, system setup


Page 3: Computing resource (2013-2015)

2nd system (2010-2012) vs. 3rd system (2013-2015)

Computing nodes
  Total (including service nodes)
    2nd: 720 nodes, 5720 cores; CPU: Intel Xeon X5560 (Nehalem, 2.8 GHz, 4 cores/CPU)
    3rd: 624 nodes, 9984 cores; CPU: Intel Xeon E5-2680 (Sandy Bridge, 2.7 GHz, 8 cores/CPU)
  Non-grid (Tier3)
    2nd: 96 (496) nodes, 768 (3968) cores; memory: 16 GB/node; NIC: 1 Gbps/node; network BW: 20 Gbps/16 nodes; disk: 300 GB SAS x 2
    3rd: 416 nodes, 6656 cores; memory: 16 GB/node (to be upgraded); NIC: 10 Gbps/node; network BW: 40 Gbps/16 nodes; disk: 600 GB SAS x 2
  Tier2
    2nd: 464 (144) nodes, 3712 (1152) cores; memory: 24 GB/node; NIC: 10 Gbps/node; network BW: 30 Gbps/16 nodes; disk: 300 GB SAS x 2
    3rd: 160 nodes, 2560 cores; memory: 32 GB/node (to be upgraded); NIC: 10 Gbps/node; network BW: 80 Gbps/16 nodes; disk: 600 GB SAS x 2

Disk storage
  Total
    2nd: capacity: 5280 TB (RAID6); disk arrays: 120 units (2 TB HDD x 24); file servers: 64 nodes (blade); FC: 4 Gbps/disk, 8 Gbps/FS
    3rd: capacity: 6732 TB (RAID6) + α; disk arrays: 102 units (3 TB HDD x 24); file servers: 102 nodes (1U); FC: 8 Gbps/disk, 8 Gbps/FS
  Non-grid (Tier3)
    2nd: mainly NFS
    3rd: mainly GPFS
  Tier2
    2nd: DPM: 1.36 PB
    3rd: DPM: 2.64 PB

Network bandwidth
  LAN
    2nd: 10 GbE ports in switches: 192; switch interlink: 80 Gbps
    3rd: 10 GbE ports in switches: 352; switch interlink: 160 Gbps
  WAN
    2nd: ICEPP-UTnet: 10 Gbps (+10 Gbps); SINET-USA: 10 Gbps x 2; ICEPP-EU: 10 Gbps
    3rd: ICEPP-UTnet: 10 Gbps (+10 Gbps); SINET-USA: 10 Gbps x 3; ICEPP-EU: 10 Gbps (+10 Gbps); LHCONE
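The quoted usable capacities are consistent with two parity disks per 24-disk RAID6 array. A minimal cross-check, assuming each array is configured as a single RAID6 group (22 data disks out of 24):

```python
# Cross-check of the usable disk capacities quoted above, assuming each
# 24-disk array is configured as a single RAID6 group (22 data + 2 parity).

def usable_tb(n_arrays, disks_per_array, disk_tb, n_parity=2):
    """Usable RAID6 capacity in TB (data disks only, no filesystem overhead)."""
    return n_arrays * (disks_per_array - n_parity) * disk_tb

print(usable_tb(120, 24, 2))  # 2nd system: 5280 TB, matches "5280 TB (RAID6)"
print(usable_tb(102, 24, 3))  # 3rd system: 6732 TB, matches "6732 TB (RAID6)"
```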

Page 4: Evolution of disk storage capacity for Tier2

File-server configuration in each generation:
1st system: 16 x 500 GB HDD per disk array, 5 disk arrays per server, XFS on RAID6, 4 Gbps FC via FC switch, 10 GbE NIC
2nd system: 24 x 2 TB HDD per disk array, 1 disk array per server, XFS on RAID6, 8 Gbps FC via FC switch, 10 GbE NIC
3rd system: 24 x 3 TB HDD per disk array, 2 disk arrays per server, XFS on RAID6, 8 Gbps FC without FC switch, 10 GbE NIC

[Figure: total capacity in DPM, WLCG pledge, deployed capacity (assigned to ATLAS), and the numbers of disk arrays and file servers, across the pilot system for R&D, the 1st system (2007-2009), the 2nd system (2010-2012), and the 3rd system (2013-2015, 2.4 PB). A 4th system (2016-2019) may use 5 TB HDDs.]

Page 5: ATLAS disk and LocalGroupDisk in DPM

2014's pledge has been deployed since Feb. 20, 2013:
DATADISK: 1740 TB (including the old GroupDisk, MCDisk, HotDisk)
PRODDISK: 60 TB
SCRATCHDISK: 200 TB
-------------------------------
Total: 2000 TB

The LocalGroupDisk is kept below 500 TB at present, as long as there are no further requests from users (but that is unlikely...).

[Figure: ATLAS disk and LocalGroupDisk usage in DPM over one year; annotations mark the 500 TB level, a filled level of 1.6 PB, and deletions of one-year-old datasets, half-year-old datasets, and a heavy user's datasets.]

Page 6: Fraction of ATLAS jobs at Tokyo Tier2

[Figure: fraction of completed ATLAS jobs [%] at Tokyo Tier2, per 6-month period, covering the 2nd and 3rd systems; the system migration period is indicated.]

Page 7: SL6 migration

Migration of the worker nodes to SL6 was completed within a few weeks in October 2013. It was done as a rolling transition through a TOKYO-SL6 queue, to minimize downtime and to hedge against the risk of a large-scale misconfiguration.

Performance improvement: the HepSpec06 score increased by about 5%.
SL5, 32-bit compile mode: 17.06 ± 0.02
SL6, 32-bit compile mode: 18.03 ± 0.02

R. M. Llamas, ATLAS S&C workshop Oct. 2013
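These per-core scores are consistent with the deployed CPU totals quoted on the WLCG pledge slide; a minimal cross-check over the 2560 Tier2 cores:

```python
# Cross-check: per-core HepSpec06 scores times the 2560 Tier2 cores (job
# slots) reproduce the deployed totals quoted on the WLCG pledge slide.

cores = 2560
hs06_per_core = {"SL5": 17.06, "SL6": 18.03}

for os_name, score in hs06_per_core.items():
    print(f"{os_name}: {score * cores:.1f} HS06")   # SL5: 43673.6, SL6: 46156.8

gain = (hs06_per_core["SL6"] / hs06_per_core["SL5"] - 1) * 100
print(f"improvement: {gain:.1f} %")                 # about 5-6 %
```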

Page 8: WLCG service instances and middleware

Service        Num.  EMI version  OS        Machinery
CREAM-CE       2     EMI-2        SL6       BM
Worker node    160   EMI-2        SL6       BM
DPM-MySQL      1     EMI-2        SL5       BM
DPM-Disk       40    EMI-2        SL5       BM
Mon (APEL)     1     EMI-2        SL5       BM
WMS            1     EMI-3        SL5       VM
L&B            1     EMI-2        SL5       VM
MyProxy        1     EMI-2        SL5       VM
Site-BDII      1     EMI-2        SL5       VM
Top-BDII       1     EMI-2        SL5       VM
Argus          1     EMI-2        SL6       VM
perfSONAR-PS   2     3.2.2        CentOS 5  BM
Squid          4     -            SL5       BM

(BM = bare metal, VM = virtual machine)

Page 9: Memory upgrade

Memory has been upgraded for half of the worker nodes (Nov. 6, 2013):

lcg-ce01.icepp.jp: 80 nodes (1280 cores), 32 GB/node: 2 GB RAM per core (= job slot), 4 GB vmem per core
lcg-ce02.icepp.jp: 80 nodes (1280 cores), 64 GB/node: 4 GB RAM per core (= job slot), 6 GB vmem per core

• Useful for memory-consuming ATLAS production jobs.
• Sites offering this are rare.
• We might be able to upgrade the remaining half of the worker nodes in 2014.

New Panda queues: TOKYO_HIMEM / ANALY_TOKYO_HIMEM (since Dec. 5, 2013).

Page 10: I/O performance study

The number of CPU cores per worker node increased from 8 to 16 in the new system, so local I/O performance on the data staging area could become a bottleneck. We checked this by comparing a normal worker node with a special worker node equipped with an SSD for local storage, under production conditions with real ATLAS jobs.

Normal worker node
HDD: HGST Ultrastar C10K600, 600 GB SAS, 10k rpm
RAID1: DELL PERC H710P
FS: ext3
Sequential I/O: ~150 MB/s
IOPS: ~650 (fio tool)

Special worker node
SSD: Intel SSD DC S3500, 450 GB
RAID0: DELL PERC H710P
FS: ext3
Sequential I/O: ~400 MB/s
IOPS: ~40000 (fio tool)
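The numbers above were obtained with the fio tool. The sketch below is only a rough stand-in showing how comparable figures (sequential throughput and random-read IOPS of the local staging area) can be estimated; the test path and sizes are assumptions, and it goes through the page cache, so it is less rigorous than fio's direct-I/O modes.

```python
# Rough local-disk micro-benchmark for the job staging area (a stand-in for
# fio; the path and sizes are hypothetical, and the page cache is not
# bypassed, so absolute numbers will be optimistic compared with fio).
import os
import random
import time

PATH = "/data/staging/iotest.bin"   # hypothetical staging-area path
FILE_MB = 1024                      # 1 GiB test file
BLOCK = 1 << 20                     # 1 MiB sequential blocks
RAND_BLOCK = 4096                   # 4 KiB random reads
RAND_OPS = 20000

# Sequential write throughput
buf = os.urandom(BLOCK)
t0 = time.time()
with open(PATH, "wb") as f:
    for _ in range(FILE_MB):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())
seq_mbps = FILE_MB / (time.time() - t0)
print(f"sequential write: {seq_mbps:.0f} MB/s")

# Random-read IOPS
size = os.path.getsize(PATH)
t0 = time.time()
with open(PATH, "rb") as f:
    for _ in range(RAND_OPS):
        f.seek(random.randrange(0, size - RAND_BLOCK))
        f.read(RAND_BLOCK)
iops = RAND_OPS / (time.time() - t0)
print(f"random 4 KiB read: {iops:.0f} IOPS")

os.remove(PATH)
```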

Page 11: Results

[Figure: measured I/O performance of the normal (HDD) and special (SSD) worker nodes under real ATLAS jobs.]

Page 12: Direct mount via XRootD

J. Elmsheuser, ATLAS S&C workshop Oct. 2013

I/O performance (staging to local disk vs. direct I/O from storage): some improvements have been reported, especially for DPM storage.

From the ADC/user's point of view, jobs would be almost freed from constraints on the input file size and the number of input files.

But, it should be checked more precisely…
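For illustration only, a minimal PyROOT sketch of the two access modes being compared (staging with xrdcp versus direct reading over XRootD); the DPM host name and file path are hypothetical, and the reported studies used full ATLAS jobs rather than a toy script.

```python
# Minimal illustration of the two input modes being compared (hypothetical
# DPM host and file path; real measurements used full ATLAS jobs).
import subprocess
import ROOT

URL = "root://dpm.example.icepp.jp//dpm/icepp.jp/home/atlas/datafile.root"

# 1) Staging mode: copy the input to the local worker-node disk first.
subprocess.check_call(["xrdcp", "-f", URL, "/tmp/datafile.root"])
f_local = ROOT.TFile.Open("/tmp/datafile.root")

# 2) Direct I/O mode: read straight from the DPM storage over XRootD.
f_direct = ROOT.TFile.Open(URL)

for f in (f_local, f_direct):
    print(f.GetName(), f.GetSize(), "bytes")
    f.Close()
```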

Page 13: Summary of I/O study and others

Local I/O performance:
• The local I/O performance of our worker nodes was studied by comparing HDD and SSD under a mixture of real ATLAS production and analysis jobs.
• We confirmed that the HDD in the Tokyo Tier2 worker nodes is not a bottleneck for long batch-type jobs, at least with 16 jobs running concurrently.
• The same check should be repeated for the next-generation worker nodes, which will have more than 16 CPU cores, and also for the ext4 and XFS file systems.
• The I/O improvement from directly mounting DPM should be confirmed by ourselves more precisely.
• Local network usage should be examined for the next system design.

DPM:
• The database on the DPM head node has become very large (~40 GB).
• We will add more RAM to the DPM head node, bringing it to 128 GB in total.
• We are also planning to add Fusion-io cards (high-performance NAND flash memory attached via the PCI Express bus) to the DPM head node to improve the maintainability of the database.
• A redundant configuration of the MySQL database should be studied for the daily backup (a minimal backup sketch follows below).
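One possible shape for the daily backup mentioned above is a nightly logical dump; this is only a sketch, and the backup directory, credentials file, and the assumption of InnoDB tables (needed for --single-transaction) are ours, not the site's actual setup.

```python
# Minimal daily-backup sketch for the DPM head-node MySQL database.
# Paths and the credentials file are assumptions; --single-transaction gives
# a consistent snapshot only for InnoDB tables, otherwise a read lock is needed.
import datetime
import gzip
import subprocess

BACKUP_DIR = "/backup/dpm-mysql"           # assumed backup location
stamp = datetime.date.today().isoformat()
outfile = f"{BACKUP_DIR}/dpm-db-{stamp}.sql.gz"

dump = subprocess.Popen(
    ["mysqldump", "--defaults-extra-file=/root/.my.backup.cnf",
     "--single-transaction", "--all-databases"],
    stdout=subprocess.PIPE,
)
with gzip.open(outfile, "wb") as out:
    for chunk in iter(lambda: dump.stdout.read(1 << 20), b""):
        out.write(chunk)
if dump.wait() != 0:
    raise RuntimeError("mysqldump failed")
```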

Wide area network:
• We will migrate all production instances to LHCONE as soon as possible (to be discussed on Monday).

Tier3:
• SL6 migration is ongoing.
• CVMFS will be available for ATLAS_LOCAL_ROOT_BASE and the nightly releases.