TRANSCRIPT
99.9.25 JPS mtg @ Matsue 1
PHENIX Computing Center in Japan (PHENIX CC-J): Technologies Adopted
S. Sawada (KEK); T. Ichihara, Y. Watanabe (RIKEN / RIKEN BNL Research Center);
Y. Goto, A. Taketani, N. Hayashi (RIKEN); H. En'yo, S. Yokkaichi (Kyoto Univ.); H. Hamagaki (CNS, Univ. of Tokyo)
CC-J Components
– Linux farm
– Data server
– HPSS
– Network
– Misc. software & tools
Linux farm
Two boxes of AltaCluster (http://www.altatech.com/products/clusters.html)
– 16 nodes = 32 CPUs (will be doubled soon)
– Pentium II 450 MHz (18.5 SPECint95/CPU)
– Remote boot, remote monitoring, …
– Linux Red Hat 5.2, kernel 2.2.11 with NFSv3 patch
– PBS batch queuing system
– Memory: 512 MB/node; local disk: 9-14 GB/node
– Benchmark test (Bonnie): write xx MB/s, read xx MB/s
NFS-mounted RAID5 disks on SUN E450
100BaseT NIC on each node & Catalyst 2948G (Gigabit switching hub)
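The PBS batch system mentioned above takes shell scripts with `#PBS` directives. A minimal sketch, generated in Python here so it can be checked; the queue name "ccj", the job name, and the command are hypothetical examples, not actual CC-J settings.

```python
# Sketch of a minimal PBS batch job for the Linux farm. All concrete names
# (queue "ccj", job name, command) are hypothetical illustrations.
def make_pbs_script(job_name, ncpus, command):
    """Return the text of a simple PBS job script."""
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -N {job_name}",     # job name shown by qstat
        f"#PBS -l ncpus={ncpus}",  # CPUs requested on a farm node
        "#PBS -q ccj",             # hypothetical queue name
        "cd $PBS_O_WORKDIR",       # run from the submission directory
        command,
    ])

script = make_pbs_script("reco-run01", 2, "./run_reconstruction")
print(script)
```

Such a script would then be submitted with `qsub` and monitored with `qstat`.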
AltaCluster
Data Server
SUN E450: 400 MHz × 2 CPUs, 1 GB memory, 360 GB RAID disk (one more E450 will be purchased soon)
– General 'home' machine
288 GB RAID5 disk (1.6 TB RAID5 will be purchased soon)
– Working space for users
Alteon Ace 180 Gigabit switch (jumbo-frame operation)
RAID performance measurement
Preliminary measurement on 16 Apr 1999 (T. Ichihara, RIKEN)
Hardware: SUN E450 (dual Ultra-2 SPARC, 400 MHz, 1280 MB memory)

– Bare disk (Seagate 18 GB ST318275LC, internal, Ultra-Wide SCSI)
  read:  13.33 MB/s (200 MB / 15 s), 14.18 MB/s (2000 MB / 141 s)
  write: 14.3 MB/s (200 MB / 14 s), 12.58 MB/s (2000 MB / 159 s)
– Hardware RAID4, 288 GB (IAI SNX-960000ED-3200, 32 MB cache memory)
  read:  13.33 MB/s (200 MB / 15 s), 14.18 MB/s (2000 MB / 141 s)
  write: 13.3 MB/s (200 MB / 15 s), 12.90 MB/s (2000 MB / 155 s)
– Hardware RAID5, 288 GB (IAI SNX-960000ED-3200, 32 MB cache memory)
  read:  16.7 MB/s (200 MB / 12 s), 16.39 MB/s (2000 MB / 122 s)
  write: 14.3 MB/s (200 MB / 14 s), 12.99 MB/s (2000 MB / 154 s)
– Software RAID 0+1 (Solaris 2.6 DiskSuite), working area for users
  read:  25.00 MB/s (200 MB / 8 s), 23.26 MB/s (2000 MB / 86 s)
  write: 11.1 MB/s (200 MB / 18 s), 10.10 MB/s (2000 MB / 198 s)
– Software RAID 5 (Solaris 2.6 DiskSuite), home area for users
  read:  28.57 MB/s (200 MB / 7 s), 27.78 MB/s (2000 MB / 72 s)
  write: 1.9 MB/s (200 MB / 102 s), 1.80 MB/s (2000 MB / 1111 s)
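The rates quoted above are just transfer size divided by elapsed time; a quick sanity check of a few rows (software RAID 5 read and write, and the fastest 200 MB read):

```python
# Recompute MB/s from the (size, time) pairs listed in the measurements above.
def throughput(mb, seconds):
    """Mean transfer rate in MB/s, rounded to two decimals."""
    return round(mb / seconds, 2)

print(throughput(2000, 72))    # software RAID 5 read, 2000 MB
print(throughput(2000, 1111))  # software RAID 5 write, 2000 MB
print(throughput(200, 7))      # software RAID 5 read, 200 MB
```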
NFS performance measurement
Test with Bonnie (bonnie -s 100)
– From a Linux node to the RAID on ccjsun via NFS

ap14 (kernel 2.2.10):
      Sequential Output (write)              Sequential Input (read)       Random
      per char     block       rewrite       per char     block            seeks
  MB  K/sec %CPU   K/sec %CPU  K/sec %CPU    K/sec %CPU   M/sec %CPU       /sec %CPU
 100    374  4.6     483  1.3    523  1.5     9712 100.0  254.0 99.2      571.5  6.1

ap15 (kernel 2.2.10 with the NFSv3 patch):
  MB  K/sec %CPU   K/sec %CPU  K/sec %CPU    K/sec %CPU   M/sec %CPU       /sec %CPU
 100   6559 89.2    6554 16.3   6791 19.1     9650 100.0  252.6 101.1     1262.31
Use NFSv3!
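A quick calculation of what the NFSv3 patch buys, using the sequential block-write figures above:

```python
# Compare NFS block-write throughput between the two farm nodes measured
# above: ap14 (stock kernel 2.2.10 NFS client) vs. ap15 (NFSv3 patch).
ap14_block_write = 483    # KB/s
ap15_block_write = 6554   # KB/s

speedup = ap15_block_write / ap14_block_write
print(f"NFSv3 block-write speedup: {speedup:.1f}x")
```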
HPSS (High Performance Storage System)
Hierarchical storage system
– HPSS server (SP2: 5 nodes, 20 CPUs, with SP switch and Gigabit NIC)
– 144 GB HPSS cache disk (SSA RAID5) + 288 GB work disk (RAID5)
– HPSS 4.1.1, AIX 4.3.2
– STK robot (4 RedWood drives, 100 TB of tape media)
– Alteon Ace 180 Gigabit switch (jumbo-frame operation)
– Gigabit (jumbo frame) network and HiPPI connections to the SUN/Linux nodes
– ftp or 'pftp' (parallel ftp) is used for data access between HPSS and the SUN/Linux nodes
Overview of HPSS-CCJ
HPSS Hardware
HPSS Software Configuration
9076-550 POWER Parallel Server (SP2 nodes):
– Each node: AIX 4.3.2, PSSP 3.1, DCE 2.2, Encina 4.2, HPSS 4.1.1
– Plus, per node: C for AIX 4.4, ssh 1.2.26, tcpwrapper 7.6; two nodes also run Advantape 41.1.7.2, one node Sammi 4.1.2
7024-E30 Control Workstation:
– AIX 4.3.2, PSSP 3.1, C for AIX 4.4
7043-240 Workstation for HPSS Monitor:
– AIX 4.3.2, DCE 2.2, Encina 4.2, HPSS 4.1.1, C for AIX 4.3, ssh 1.2.26, tcpwrapper 7.6
STK Tape Robot
RedWood drives: ~11 MB/s/drive
– Currently we have 4 drives, so a total of about 45 MB/s can be achieved.
– 50 GB/cartridge × 2000 cartridges = 100 TB
– Data (raw data and DSTs) will be transported on tape cartridges between RIKEN and BNL.
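The aggregate rate and library capacity quoted above are simple products of the per-drive and per-cartridge figures:

```python
# Check the tape-robot arithmetic: 4 drives at ~11 MB/s each, and
# 2000 cartridges of 50 GB each.
drive_rate_mb_s = 11   # ~11 MB/s per RedWood drive
n_drives = 4
cartridge_gb = 50
n_cartridges = 2000

total_rate = drive_rate_mb_s * n_drives        # aggregate tape I/O, MB/s
total_tb = cartridge_gb * n_cartridges / 1000  # library capacity, TB
print(total_rate, "MB/s,", total_tb, "TB")
```

4 × 11 = 44 MB/s, consistent with the "about 45 MB/s" on the slide.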
Network
LAN
– Gigabit Ethernet with jumbo frames (9 kB frames vs. the normal 1.5 kB; available on AIX 4.2 or later) and HiPPI
– Gbit performance is similar to HiPPI
– Gbit will be used.
WAN
– HEPNET-J/SINET between Japanese institutions
– APAN between RIKEN and ESnet sites (BNL etc.)
Network Performance
Test with netperf (http://www.netperf.org/netperf/NetperfPage.html)
– More study is needed to approach Gbit performance

Via Gigabit Ethernet (jumbo frames):
  Direction              Recv. socket  Send socket   MB/s   CPU (sys)  CPU (idle)
  core server → CCJSUN   262640        262144        46.8   22.7       76.6
  CCJSUN → core server   262144        262144        52.5   23.7       75.1
  core server → CCJSUN   262640        262144        42.5   27.9       69.8
  CCJSUN → core server   262144        262144        36.9   20         77.7
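For scale, the best result above can be compared with the payload ceiling of Gigabit Ethernet (1000 Mbit/s = 125 MB/s), which is why the slide notes that more tuning is needed:

```python
# Fraction of the Gigabit Ethernet line rate reached by the best
# netperf measurement above (52.5 MB/s, CCJSUN -> core server).
line_rate_mb_s = 1000 / 8   # 1000 Mbit/s expressed in MB/s
measured_mb_s = 52.5        # best measured throughput

fraction = measured_mb_s / line_rate_mb_s
print(f"{fraction:.0%} of line rate")
```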
Data Transfer Performance
Test results with pftp (parallel ftp) between Linux nodes and HPSS
– Does the 100BaseT NIC on the Linux nodes limit the performance?
WAN
http://ccjsun.riken.go.jp/cgi-bin/ping_data_plot.pl
– Remote host: ns.bnl.gov, packet size 100, from Fri Aug 20 00:19:10 JST 1999 to Sun Aug 29 23:49:10 JST 1999 (one time tic per day)
– Remote host: cnsuty.cns.s.u-tokyo.ac.jp, packet size 100, from Fri Aug 20 00:19:09 JST 1999 to Sun Aug 29 23:49:09 JST 1999 (one time tic per day)
Key Software
PBS: batch queuing system
– http://pbs.mrj.com/
– Free package developed mainly at NAS, NASA
AFS: file system with Kerberos authentication
– Important files (source code, libraries, etc.) are kept on AFS at BNL
– Mirrored from BNL
Monitoring: MRTG
– Tracks CPU, memory, and disk usage of each node, as well as network transmission rates
– http://www.ceres.dti.ne.jp/~riocat/webtools/mrtg/
– http://ccjsun.riken.go.jp/~yokkaich/mrtg/resourceWatch/index.html
PHENIX Software
Summary
All the components that make up CC-J are now in place. We are checking the various performance figures of each component and of the system as a whole; by and large the expected performance is being achieved (the requirements will be met once the planned quantities are installed). We will continue to flush out minor bugs and deepen our understanding of the performance so as to meet the initial requirements.

              1999 req.        Current          Planned in 1999        2001 req.
CPU           2400 SPECint95   592 SPECint95    >1184-1776 SPECint95   10700 SPECint95
Disk storage  5 TB             0.3 TB           >1.9 TB                15 TB
Disk I/O      200 MB/s         ~15 MB/s         ~50 MB/s?              600 MB/s
Tape storage  100 TB           100 TB           100 TB                 100 TB
Tape I/O      67.5 MB/s        45 MB/s          67.5 MB/s?             112.5 MB/s
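The CPU column of the summary follows directly from the farm description (16 nodes × 2 CPUs at 18.5 SPECint95 each, with one or two doublings planned):

```python
# Reproduce the SPECint95 totals in the summary table from the
# per-CPU rating of the Linux farm.
specint_per_cpu = 18.5

current = 32 * specint_per_cpu   # 16 nodes x 2 CPUs, as installed
doubled = 64 * specint_per_cpu   # after the announced doubling
tripled = 96 * specint_per_cpu   # upper end of the 1999 plan
print(current, doubled, tripled)
```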