CCJ Computing Center in Japan for spin physics at RHIC

T. Ichihara, Y. Watanabe, S. Yokkaichi, O. Jinnouchi, N. Saito, H. En’yo, M. Ishihara, Y. Goto (1), S. Sawada (2), H. Hamagaki (3)
RIKEN, RBRC (1), KEK (2), CNS (3)
Presented on 17th October 2001 at First Joint meeting of the Nuclear Physics Division of APS and JPS (Hawaii 2001)




RIKEN CCJ: Overview

Scope
• Center for the analysis of RHIC spin physics
• Principal site of computing for PHENIX simulation
  - PHENIX CC-J aims to cover most of the simulation tasks of the whole PHENIX experiment
• Regional computing center in Japan

Size
• Data amount: handling 225 TB/year
• Disk storage: ~20 TB
• Tape library: ~500 TB
• CPU performance: 13k SPECint95 (~300 Pentium 3/4 CPUs)

Schedule
• R&D for the CC-J started in April 1998 at RBRC in BNL
• Construction began in April 1999 over a three-year period
• CCJ started operation in June 2000
• CCJ will reach full scale in 2002


Concept of RIKEN CCJ System

[Figure: data flow between the RHIC Computing Facility (RCF) at BNL and PHENIX CC-J. Labelled elements: at RCF, CRS and CAS servers, SMP servers, a 40 TB big disk, HPSS servers and an STK tape robot, with the chain raw → DST → μDST → physics and a track reconstruction step; at CC-J, DST import, HPSS servers and an STK tape robot, SMP servers, a 15 TB big disk and PC farms for analysis and simulation (10k SPECint95), producing μDSTs, physics output and simulated data for export; at both sites, a duplicating facility with tape drive units to duplicate data (50 GB tapes per volume); an APAN/ESnet WAN link; a 20 MB/s rate label.]


System Requirements for CCJ

Annual data amount
• DST: 150 TB
• micro-DST: 45 TB
• Simulated data: 30 TB
• Total: 225 TB

Hierarchical storage system
• Handles a data amount of 225 TB/year
• Total I/O bandwidth: 112 MB/s
• HPSS system

Disk storage system
• 15 TB capacity
• All-RAID system
• I/O bandwidth: 520 MB/s
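The short Python sketch below cross-checks these figures. It converts the annual data amount into the average rate it implies if the data arrived evenly over a year. The decimal units (1 TB = 10^12 bytes) and the 365-day year are assumptions, not statements from the slide. The comparison with the 112 MB/s bandwidth is plain arithmetic, not a design rationale given by the authors.

```python
# Average sustained rate implied by the annual data amount.
# Assumptions (not from the slide): decimal units (1 TB = 10**12 bytes,
# 1 MB = 10**6 bytes) and data arriving evenly over a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600
HPSS_TOTAL_IO_MB_PER_S = 112  # total I/O bandwidth quoted above

# Annual data amount in TB, per data type.
annual_tb = {"DST": 150, "micro-DST": 45, "Simulated data": 30}


def average_rate_mb_per_s(tb_per_year: float) -> float:
    """Convert an annual volume in TB into an average rate in MB/s."""
    return tb_per_year * 10**12 / SECONDS_PER_YEAR / 10**6


total_tb = sum(annual_tb.values())
avg = average_rate_mb_per_s(total_tb)
print(f"total: {total_tb} TB/year -> about {avg:.1f} MB/s on average")
print(f"quoted HPSS I/O bandwidth is {HPSS_TOTAL_IO_MB_PER_S / avg:.1f}x that average")
```

With the listed numbers, the total comes to 225 TB/year, an average of about 7.1 MB/s. The quoted 112 MB/s total I/O bandwidth is roughly 15.7 times that average. The slide does not say how that margin is used.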

CPU (SPECint95)
• Simulation: 8200
• Sim. reconstruction: 1300
• Sim. analysis: 170
• Theoretical model: 800
• Data analysis: 2000
• Total: 12470 SPECint95 (= 120k SPECint2000)

Data accessibility (exchange 225 TB/year)
• Data duplication facility (tape cartridges)
• Wide area network (IMnet/APAN/ESnet)

Software environment
• AFS, Objectivity/DB, batch queuing system
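The CPU total and its SPECint2000 equivalent can be checked in a few lines. The sketch below sums the task budgets and converts them with a flat factor of 10 SPECint2000 per SPECint95. That factor is an assumption inferred from the status table later in this talk, which lists 9000 SPECint95 as 90k SPECint2000 and 13000 as 130k. The two SPEC suites have no exact conversion.

```python
# CPU requirements per task, in SPECint95, as listed above.
cpu_specint95 = {
    "Simulation": 8200,
    "Sim. reconstruction": 1300,
    "Sim. analysis": 170,
    "Theoretical model": 800,
    "Data analysis": 2000,
}

# Assumption: flat ratio inferred from the status table (9000 -> 90k,
# 13000 -> 130k); the two benchmark suites have no exact conversion.
SPECINT2000_PER_SPECINT95 = 10

total_95 = sum(cpu_specint95.values())
total_2000 = total_95 * SPECINT2000_PER_SPECINT95

print(f"total: {total_95} SPECint95 ~ {total_2000 / 1000:.0f}k SPECint2000")
for task, value in cpu_specint95.items():
    # Share of the total budget taken by each task.
    print(f"{task:<20} {value:>5}  {value / total_95:6.1%}")
```

The total matches the slide's 12470 SPECint95, and the converted figure (about 125k) rounds to the quoted 120k. Simulation alone accounts for about 66% of the budget. This is consistent with CC-J's role as the principal site for PHENIX simulation.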


Requirements as a Regional Computing Center

Software environment
• The software environment of the CCJ needs to be compatible with the PHENIX software environment at the RHIC Computing Facility (RCF) at BNL
• AFS (/afs/rhic)
  - Remote access over the network is very slow and unstable (AFS server: BNL)
  - Mirrored daily at CCJ and accessed via NFS from the Linux farms
• Objectivity/DB
  - Remote access over the network is very slow and unstable (AMS server: BNL)
  - Local AMS server at CCJ (DB updated regularly)
• Batch queuing system: Load Sharing Facility (LSF 4.1)
• RedHat Linux 6.1 (kernel 2.2.19 with NFSv3 enabled), gcc-2.95.3, etc.
• Veritas File System (journaling file system on Solaris): no fsck needed on ~TB disks
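The AFS workaround above replaces slow remote reads with a local copy refreshed once a day. The slide does not say which tool performed the mirroring. The Python function below is only a hypothetical sketch of a one-way mirror. It walks a source tree and copies a file when the file is missing at the destination or differs in size or modification time. The function name and the change test are illustrative choices. It does not delete files that disappear from the source.

```python
import os
import shutil


def mirror_tree(src: str, dst: str) -> int:
    """Hypothetical one-way mirror from src to dst; returns files copied.

    A file is copied when it is missing at dst or its size or whole-second
    modification time differs. Files removed from src are left in dst.
    """
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src_file = os.path.join(dirpath, name)
            dst_file = os.path.join(target_dir, name)
            src_stat = os.stat(src_file)
            if os.path.exists(dst_file):
                dst_stat = os.stat(dst_file)
                if (dst_stat.st_size == src_stat.st_size
                        and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                    continue  # unchanged since the last run
            shutil.copy2(src_file, dst_file)  # copy2 also preserves mtime
            copied += 1
    return copied
```

A scheduler such as cron could run such a function once a day. The farm nodes would then read the destination directory over NFS, as the slide describes for the real mirror.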

Data accessibility
• Need to exchange 225 TB/year of data with BNL RCF
• Most of the data exchange is carried out with SD3 tape cartridges
  - Data duplicating facilities at both sites (10 MB/s performance, tapes shipped by air)
• Part of the data exchange is carried out over the wide area network (WAN)
  - CC-J will use the Asia-Pacific Advanced Network (APAN) for the US-Japan connection
  - http://www.apan.net/
  - APAN currently has ~100 Mbps of bandwidth for the Japan-US (STAR TAP) connection
  - 3 MB/s transfer performance was obtained using the bbftp protocol
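The rates on this slide explain why most of the exchange goes on tape. The sketch below estimates how long one year's 225 TB would take over each path at a single sustained rate. It also counts how many 50 GB tape volumes that data fills, using the volume size labelled in the concept figure. Decimal units and uninterrupted transfer, with no mount, shipping, or retry overhead, are simplifying assumptions.

```python
# Time to move one year's data at the rates quoted on this slide.
# Assumptions: decimal units, one sustained stream at the quoted rate,
# and no overhead for tape mounts, shipping or retries.
ANNUAL_BYTES = 225 * 10**12          # 225 TB/year
TAPE_VOLUME_BYTES = 50 * 10**9       # 50 GB per tape volume
SECONDS_PER_DAY = 24 * 3600


def days_to_transfer(total_bytes: int, rate_mb_per_s: float) -> float:
    """Days needed to move total_bytes at a constant rate in MB/s."""
    return total_bytes / (rate_mb_per_s * 10**6) / SECONDS_PER_DAY


cartridges = -(-ANNUAL_BYTES // TAPE_VOLUME_BYTES)  # ceiling division
wan_days = days_to_transfer(ANNUAL_BYTES, 3)    # bbftp over APAN
tape_days = days_to_transfer(ANNUAL_BYTES, 10)  # tape duplication path

print(f"tape volumes per year: {cartridges}")
print(f"WAN at 3 MB/s:   {wan_days:.0f} days")
print(f"tape at 10 MB/s: {tape_days:.0f} days")
```

At 3 MB/s the WAN would need about 868 days, well over a year, to carry one year's data. The 10 MB/s tape path needs about 260 days and 4500 volumes. This is consistent with using the WAN for only part of the exchange.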


Plan and current status of RIKEN CCJ

[Timeline figure, 1998 to 2003: R&D for CC-J at RBRC (BNL) with a prototype of CPU farms and a data duplication facility; CC-J Working Group formed (Oct. 1998); CC-J review at BNL (Dec. 1998); HPSS software/hardware installation (March 1999, supplementary budget); CC-J construction at RIKEN Wako in Phases 1 to 3 (1/3 scale, 2/3 scale, full scale); CC-J starts operation at 1/3 scale (June 2000); full-scale CC-J (Mar. 2002); CC-J front end at BNL; CC-J operation alongside the PHENIX experiment at RHIC; tape robot/drive upgrade.]

Item                          Oct 2001       JFY2002
CPU farm (number)             221            288+
CPU farm (SPECint95)          9000           13000
CPU farm (SPECint2000)        90k            130k
Tape storage size (TB)        115            460
Disk storage size (TB)        21             21+
HPSS tape drives (model)      4 (Redwood)    10 (9940C)
HPSS tape I/O (MB/s)          45             200
Data disk I/O (MB/s)          720            720+
SUN SMP data servers          4              4+
HPSS server units             5              5+


Current configuration of CCJ

[Figure: current CCJ configuration (Info. Bldg. 1F and 2F). Labelled elements: HPSS on an IBM SP2 (AIX 4.3.2) with HPSS server, HPSS controller, tape movers, disk movers and a 4 TB RAID HPSS cache; STK tape robot (100 TB) with 4 RedWood drives, controlled by a SUN ACSLS host; SP router (Ascend GRF) and HIPPI switch with 288 GB LVD RAID; Gigabit Switch #1 and #2 (L3), Alteon 180, Catalyst 2948G and EPS-1000, linked by 1000BaseSX (9 kB MTU), 10/100BaseT and 100BaseT to the RIKEN LAN, the RIKEN supercomputer, the WAN and a private-address network; SUN E450 data servers (100 GB each) with 5.7 TB and 6.0 TB FC RAID, 3.2 TB LVD RAID and 288 GB RAID (work); Compaq DS20 AFS server (AFS01) with 150 GB RAID; RAID storage labelled 21.2 TB; Linux CPU farm (Alta cluster, 14 boxes, RedHat 6.1 Linux, with a cluster control WS) of 32 Pentium III (700 MHz) + 96 Pentium III (850 MHz) + 96 Pentium III (1000 MHz), 512 MB memory per CPU.]


Components of RIKEN CCJ

[Photos of the following components]
• STK tape robot (100 TB [240 TB])
• HPSS server (IBM RS-6000/SP)
• StorageTek tape robot (100 TB [250 TB])
• CPU farm of 224 CPUs
• Gigabit network switch
• 5.7 TB RAID
• 12 TB RAID
• Data server (SUN E450)
• Data servers (SUN E450)


File Transfer over WAN
• RIKEN - (IMnet) - APAN (~100 Mbps) - STAR TAP - ESnet - BNL
• Round trip time (RTT) for RIKEN-BNL: 170 ms
• RFC 1323 (TCP Extensions for High Performance, May 1992) describes the method of using a large TCP window size (> 64 KB)
• FTP performance: 290 KB/s (64 KB window size), 640 KB/s (512 KB window size)
• A large TCP window size is necessary to obtain a high transfer rate
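The need for a large window follows from the standard bound on a single TCP stream. At most one window of unacknowledged data can be in flight per round trip, so throughput cannot exceed window / RTT. The Python sketch below applies that bound to the 170 ms RTT and the two window sizes above. It also computes the bandwidth-delay product, which is the window needed to fill a ~100 Mbps path. It ignores loss, slow start, and protocol overhead, and it treats 1 KB as 1024 bytes. Both are assumptions, so the results are upper bounds rather than predictions.

```python
# Upper bound on single-stream TCP throughput: at most one window of
# unacknowledged data can be in flight per round trip, so rate <= window / RTT.
# Ignores packet loss, slow start and header overhead; 1 KB = 1024 bytes.
RTT_S = 0.170       # RIKEN-BNL round trip time quoted above
LINK_MBPS = 100     # approximate APAN Japan-US bandwidth, megabits/s


def window_limited_kb_per_s(window_bytes: int, rtt_s: float = RTT_S) -> float:
    """Maximum throughput in KB/s for a given TCP window size."""
    return window_bytes / rtt_s / 1024


def bandwidth_delay_product_bytes(link_mbps: float, rtt_s: float = RTT_S) -> float:
    """Window (bytes) needed to keep a link of link_mbps busy for one RTT."""
    return link_mbps * 10**6 / 8 * rtt_s


for window_kb in (64, 512):
    bound = window_limited_kb_per_s(window_kb * 1024)
    print(f"{window_kb:>3} KB window -> at most {bound:.0f} KB/s")

bdp = bandwidth_delay_product_bytes(LINK_MBPS)
print(f"window needed to fill {LINK_MBPS} Mbps: ~{bdp / 1024:.0f} KB")
```

The bound is about 376 KB/s for a 64 KB window, which is close to the measured 290 KB/s. For a 512 KB window it is about 3000 KB/s, well above the measured 640 KB/s, so something other than the window presumably limited the larger-window transfer. Filling the ~100 Mbps APAN path would need a window of roughly 2 MB.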

BBFTP (http://doc.in2p3.fr/bbftp/)
• Software designed to quickly transfer files across a wide area network
• Written for the BaBar experiment in order to transfer big files
• Transfers data through several parallel TCP streams

RIKEN-BNL bbftp performance (10 TCP streams for each session)
• 1.5 MB/s (1 session)
• 3 MB/s (3 sessions)

[Figure: MRTG graph during bbftp transfer between RIKEN and BNL]
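Parallel streams sidestep the per-stream window limit. If each stream is capped at window / RTT, N streams can together approach N times that rate until the link saturates. The Python sketch below shows two ingredients of this in simplified form. It splits a file into contiguous byte ranges (one per stream) and computes the ideal aggregate rate. It is a hypothetical illustration of the general technique, not bbftp's actual implementation. The function names and the 0.5 MB/s per-stream figure are made up for this example.

```python
# Hypothetical sketch of parallel-stream transfer (not bbftp's code):
# each TCP stream carries one contiguous byte range of the file.


def split_ranges(file_size: int, n_streams: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover file_size bytes in n_streams parts."""
    if n_streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(file_size, n_streams)
    ranges = []
    offset = 0
    for i in range(n_streams):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((offset, length))
        offset += length
    return ranges


def aggregate_bound_mb_per_s(n_streams: int, per_stream_mb_per_s: float,
                             link_mb_per_s: float) -> float:
    """Ideal aggregate rate: per-stream rates add up until the link saturates."""
    return min(n_streams * per_stream_mb_per_s, link_mb_per_s)


print(split_ranges(10, 3))
# Illustrative 0.5 MB/s per stream on a 12.5 MB/s (100 Mbps) link.
print(aggregate_bound_mb_per_s(10, 0.5, 12.5))   # link not yet full
print(aggregate_bound_mb_per_s(30, 0.5, 12.5))   # link saturated
```

For example, `split_ranges(10, 3)` returns `[(0, 4), (4, 3), (7, 3)]`. The aggregate bound gives a best case to compare against measured figures such as the 1.5 MB/s and 3 MB/s above.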


Analysis for the first PHENIX experiment with CCJ

Quark Matter 2001 conference (Jan 2001)
• 21 papers using CCJ out of 33 PHENIX papers in total

Simulation
• 100k event simulation/reconstruction (Hayashi)
• DC simulation (Jane)
• Muon Mock Data Challenge (MDC) (Satohiro): 100k Au+Au events (June 2001)

DST production
• 1 TB of raw data (Year-1) was transferred via WAN from BNL to RIKEN
• PHENIX official DST (v03) production (Sawada) (Dec. 2000)
  - about 40% of the DSTs were produced at CCJ and transferred to BNL RCF

Detector calibration/simulation
• TOF (Kiyomochi, Chujo, Suzuki)
• EMCal (Torii, Oyama, Matsumoto)
• RICH (Akiba, Shigaki, Sakaguchi)

Physics/simulation
• Single electron spectrum (Akiba, Hachiya)
• Hadron particle ratio/spectrum (Ohnishi, Chujo)
• Photon [π0 spectrum] (Oyama, Sakaguchi)


CCJ Operation

Operation, maintenance and development of CC-J are carried out under the charge of the CCJ Planning and Coordination Office (PCO).

CCJ Director (Chief Scientist of the Radiation Lab.): H. En’yo

Planning and Coordination Office
• Manager: T. Ichihara (RIKEN and RBRC)
• Technical manager: Y. Watanabe (RIKEN and RBRC)
• Scientific programming coordinators:
  - H. En'yo (RIKEN and RBRC, PHENIX-EC)
  - H. Hamagaki (CNS-U-Tokyo, PHENIX-EC)
• PHENIX liaison: N. Saito (RIKEN and RBRC)
• Computer scientists:
  - S. Yokkaichi (RIKEN)
  - O. Jinnouchi (RIKEN)
  - Y. Goto (RBRC)
  - S. Sawada (KEK)

Technical Management Office
• Manager, data duplication: Y. Watanabe (RIKEN and RBRC)
• System engineer: N. Otaki (IBM Japan)
• Tape duplication operator: TBD


Summary

• Construction of the CCJ started in 1999; CCJ operation started in June 2000 at 1/3 scale
  - 43 user accounts created
• Hardware upgrade and software improvement
  - 224 Linux CPUs (90k SPECint2000), 21 TB disk, 100 TB HPSS tape library
  - Data Duplicating Facility at BNL RCF started operation in Dec 2000
• PHENIX Year-1 experiment: summer 2000
  - Analysis of the first PHENIX experiment with CCJ: official DST (v03) production, simulations, detector calibration/simulation, physics analysis/simulation
  - Quark Matter 2001 conference (Jan 2001): 21 papers using CCJ out of 33 PHENIX papers in total
• PHENIX Year-2 (2001) run started in August 2001
• Spin experiments at RHIC will start in December 2001!