
Belle computing upgrade

Ichiro Adachi

22 April 2005

Super B workshop in Hawaii

2

Belle’s computing goal

• Data processing: 3 months to reprocess the entire data set accumulated so far, using all of KEK computing resources; requires efficient resources and flexibility

• Successful (I think, at least): 1999-2004, all data processed and used for analysis in time for the summer conferences (good or bad?)

Example: DsJ(2317), from David Brown’s CHEP04 talk
  BaBar discovery paper: Feb 2003
  Belle: confirm DsJ(2317): Jun 2003
  Belle: discover B → DsJ(2317)D: Oct 2003
  BaBar: confirm B → DsJ(2317)D: Aug 2004

“How can we keep up the computing power?” Timely processing also validates software reliability.

3

Present Belle computing system

[Diagram of the present system. Two major components: the rental system (under rental contract, started 2001) and Belle’s own system. Hardware shown: Sparc 0.5 GHz, Pentium III 1.26 GHz, Xeon 0.7/2.8/3.2/3.4 GHz, and Athlon 1.67 GHz compute servers; 8 TB, 50 TB IDE, and 155 TB+ disk; HSM with 4 TB disk; tape libraries of 120 TB DTF2, 500 TB DTF2, and 1.29 PB S-AIT]

4

Computing resources evolving

• Purchased what we needed as we accumulated integrated luminosity

• Rental system contract expires in Jan 2006 and has to be replaced with a new one

[Plots: CPU power (GHz), HSM volume (TB), and disk capacity (TB), each growing steadily from Feb 2001 to Oct 2005]

Processing power in 2005: 7 fb-1/day (5 fb-1/day in 2004)
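As a back-of-envelope check, these daily rates can be turned into a reprocessing time against the 3-month goal; the total luminosity used below is a made-up placeholder, not a number from the talk:

```python
# Reprocessing-time estimate from the slide's throughput figures.
# The 450 fb^-1 total is a hypothetical example, not Belle's actual data set.
def reprocess_days(total_fb: float, rate_fb_per_day: float) -> float:
    """Days needed to reprocess `total_fb` fb^-1 at `rate_fb_per_day` fb^-1/day."""
    return total_fb / rate_fb_per_day

print(reprocess_days(450, 7))  # ~64 days at the 2005 rate, within the 3-month goal
print(reprocess_days(450, 5))  # 90.0 days at the 2004 rate
```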

5

New rental system

• Rental period: 6-year contract to Jan 2012 (roughly x6 more data expected over the period); in the middle of the bidding process

• Specifications, based on Oide’s luminosity scenario:
  40,000 SPECint2000_rate of compute servers in 2006
  5 PB tape (1 PB disk) storage system, with extensions
  network fast enough to read/write data at 2-10 GB/s (2 for DST, 10 for physics analysis)
  a user-friendly and efficient batch system that can be used collaboration-wide

In a single 6-year lease contract we hope to double the resources in the middle, assuming Moore’s law in the IT commodity market
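The hoped-for mid-contract doubling follows from a simple Moore’s-law extrapolation; the doubling time below is an assumed illustrative figure, not one stated in the talk:

```python
# Moore's-law growth over a lease, assuming capacity per unit cost
# doubles every ~18 months (an assumed doubling time, for illustration).
def capacity_factor(years: float, doubling_years: float = 1.5) -> float:
    """Factor by which buyable capacity grows after `years` at fixed cost."""
    return 2 ** (years / doubling_years)

# A mid-contract hardware refresh after 3 years could then buy roughly
print(capacity_factor(3))  # 4.0x the capacity for the same money
```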

6

Lessons and remarks

• Data size and access

• Mass storage: hardware, software

• Compute server

7

Data size & access

• Possible considerations:
  rawdata: size scales with integrated luminosity; 1 PB for 1 ab-1 (at least); read once or twice a year; keep in archive
  compact beam data for analysis (“mini-DST”): 60 TB for 1 ab-1; accessed frequently and (almost) randomly; easy access preferable; on disk
  MC: 180 TB for 1 ab-1 (3x the beam data, by Belle’s rule); all data files read by most users; on disk? where to go?

[Plot: Belle rawdata/yr (TB) vs. integrated luminosity/yr (fb-1), 2000-2004]

Detector & accelerator upgrades can change this slope
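The per-ab-1 figures above give a quick storage budget for any target luminosity; the scaling is assumed linear here, which, as the slide notes, detector and accelerator upgrades may change:

```python
# Storage budget per recorded luminosity, using the slide's figures
# (TB per ab^-1); linear scaling is an assumption.
PER_AB = {"rawdata": 1000.0, "mini_dst": 60.0, "mc": 180.0}

def storage_tb(lum_ab: float) -> dict:
    """Return required storage in TB for `lum_ab` ab^-1 of data."""
    return {kind: tb * lum_ab for kind, tb in PER_AB.items()}

# e.g. a hypothetical 2 ab^-1 sample:
print(storage_tb(2.0))  # {'rawdata': 2000.0, 'mini_dst': 120.0, 'mc': 360.0}
```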

8

Mass storage : hardware

• Central system in the coming computing

• Lesson from Belle: we have used SONY DTF drive technology since 1999. SONY DTF2 has no roadmap for future development; a dead end. SONY’s next technology choice is S-AIT, and we have been testing an S-AIT tape library since 2004. Data are already recorded on 5000 DTF2 tapes, which we have to migrate. Watch the vendor’s trend; migration takes cost & time.

• Front-end disks: 18 dual-Xeon PC servers, each with two SCSI channels, behind a 2 Gbit FC switch; 8 (10) of them each connect one 16x 320 (400) GB IDE disk RAID system; total capacity 56 (96) TB

• Back-end S-AIT system: SONY PetaSite tape library in a 7-rack-wide space; main system (12 drives) + 5 cassette consoles, with a total capacity of 1.3 PB (2500 tapes)

9

Mass storage : software

• 2nd lesson: we are moving from direct tape access to a hierarchical storage system (HSM). We have learned that automatic file migration is quite convenient, but we need enough capacity that we do not need operators to mount tapes. Most users go through all of the (MC) data available in the HSM, and each user access is random, not controlled at all. Each access requires a tape reload to copy data onto disk, and the number of reloads per tape is hitting its limit!

In our usage, the HSM is not an archive but a big cache

Need optimization in both HSM control & user I/O; a huge disk may help?
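A toy model of the access pattern described above shows why uncontrolled random reads defeat a small disk cache; the file counts and cache size are made up for illustration:

```python
# Toy model: uniform random reads through an HSM whose disk cache (LRU)
# holds only a fraction of the files on tape. All numbers are illustrative.
import random

def tape_reloads(n_files: int, cache_files: int, n_reads: int, seed: int = 1) -> int:
    """Count tape staging operations under LRU caching with random reads."""
    random.seed(seed)
    cache: list[int] = []          # LRU order, most recently used last
    reloads = 0
    for _ in range(n_reads):
        f = random.randrange(n_files)
        if f in cache:
            cache.remove(f)        # cache hit: refresh LRU position
        else:
            reloads += 1           # cache miss: stage file from tape
            if len(cache) >= cache_files:
                cache.pop(0)       # evict least recently used file
        cache.append(f)
    return reloads

# With a cache covering 10% of the data, roughly 90% of reads still hit tape:
print(tape_reloads(n_files=1000, cache_files=100, n_reads=10_000))
```

A bigger disk raises the hit rate only in proportion to the fraction of the data it holds, which is why the slide concludes that both HSM control and user I/O need optimization.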

10

Compute server

• 40,000 specCINT2000_rate at 2006• Assume Moor’s law is still valid for coming years• Bunch of PC’s is difficult for us to manage

At Belle, limited human resources Belle software distribution

• “Space” problem One floor of Tsukuba exp. hall B3(~10m20m)

2002 cleared and flooring 2005 full ! No more space ! Air condition system should be equipped “electricity” probl

em:~500W for dual 3.5GHz CPUs Moor’s law is not enough to solve this problem
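The electricity point can be made concrete with trivial arithmetic; the server count below is a hypothetical example, not the actual farm size:

```python
# Power draw of a farm of dual-CPU servers at ~500 W each (per the slide).
# The server count is hypothetical.
def farm_power_kw(n_servers: int, watts_each: float = 500.0) -> float:
    """Total compute power draw in kW, before cooling overhead."""
    return n_servers * watts_each / 1000.0

print(farm_power_kw(800))  # 400.0 kW for compute alone, before air conditioning
```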

11

Software

• Simulation & reconstruction: a Geant4 framework for the Super Belle detector is underway; simulation with beam background is being done; for reconstruction, robustness against BG can be a key.

12

Grid

• Distributed computing at Belle: MC production carried out at 20 sites outside KEK; ~45% of MC events produced at remote institutes since 2004

• Infrastructure: Super-SINET 1 Gbps to major universities inside Japan; needs improvements for other sites

• Grid should help us; effort with the KEK computing research center: SRB (Storage Resource Broker), and Gfarm at the Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology (AIST)

13

Summary

• Computing for physics output: try to keep the present goal

• Rental system: renewed from Jan 2006

• Mass storage: at the PB scale, not only the size but also the type of access matters; technology choice and the vendor’s roadmap

• CPU: Moore’s law alone does not solve the “space” problem

• Software: Geant4 simulation underway

• Grid: infrastructure getting better in Japan (Super-SINET)