TRANSCRIPT

Page 1: Title slide

Jean-Yves Nief

CC-IN2P3, Lyon

HEPiX-HEPNT, Fermilab October 22nd – 25th, 2002

Page 2: Talk's outline

1) Overview of BaBar: motivation for a Tier A.

2) Hardware available for the CC-IN2P3 Tier A (servers, storage, batch workers, network).

3) Software issues (maintenance, data import).

4) Resource usage (CPU used…).

5) Problems encountered (hardware, software).

6) BaBar-Grid and future developments.

Page 3: BaBar: a short overview

• Study of CP violation using B mesons; the experiment is located at SLAC.
• Since 1999, more than 88 million B-Bbar events collected; ~660 TB of data stored (real data + simulation).

How is it handled?
• Object-oriented techniques: C++ software and an OO database system (Objectivity).
• For data analysis @ SLAC: 445 batch workers (500 CPUs), 127 Objy servers + ~50 TB of disk + HPSS.

But: heavy user demand (> 500 physicists) => saturation of the system; collaborators spread world-wide (America, Europe).

Idea: create mirror sites where data analysis and simulation production can be done.

Page 4: CC-IN2P3 Tier A: hardware (I)

• 19 Objectivity servers (Sun machines):
  - 8 Sun Netra 1405T (4 CPUs).
  - 2 Sun 4500 (4 CPUs).
  - 1 Sun 1450 (4 CPUs).
  - 8 Sun 250 (2 CPUs).

  Roles:
  - 9 servers for data access for analysis jobs.
  - 2 database catalog servers.
  - 6 servers for database transaction handling.
  - 1 server for Monte-Carlo production.
  - 1 server for data import/export.

• 20 TB of disk.

Page 5: Hardware (II): storage system

• Mass storage system:
  - > 100 TB in HPSS; only 20% of the data is available on disk => automatic staging required.

• Storage for private use:
  - Temporary storage: 200 GB of NFS space.
  - Permanent storage:
    · For small files (log files…): Elliot archiving system.
    · For large files (ntuples…) > 20 GB: HPSS (2% of the total occupancy).

Page 6: Hardware (III): the network

• Massive data import from SLAC (~80 TB in one year).
• Data needs to be available in Lyon within a short time (max: 24-48 hours).
  => Large bandwidth between SLAC and IN2P3 required. 2 routes:
  - CC-IN2P3 -> Renater -> US: 100 Mb/s.
  - CC-IN2P3 -> CERN -> US: 155 Mb/s (until this week).
  - CC-IN2P3 -> Geant -> US: 1 Gb/s (from now on).

• Full potential never reached (not understood).
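As a sanity check on the 24-48 hour requirement, here is a back-of-the-envelope estimate using only the figures quoted on this slide; it is a sketch that optimistically assumes the links can be driven at their nominal rates:

```python
# Back-of-the-envelope check of the 24-48 h requirement against the
# quoted link speeds, assuming (optimistically) full sustained rates.

TB = 1e12  # bytes

daily_volume = 80 * TB / 365  # ~80 TB/year of imports -> ~220 GB/day

for name, mbit_per_s in [("Renater route, 100 Mb/s", 100),
                         ("CERN route, 155 Mb/s", 155),
                         ("Geant route, 1 Gb/s", 1000)]:
    bytes_per_s = mbit_per_s * 1e6 / 8
    hours = daily_volume / bytes_per_s / 3600
    print(f"{name}: ~{hours:.1f} h/day of full-rate transfer needed")

# Even the 100 Mb/s route needs only ~5 h/day on paper, so the 24-48 h
# target is missed only when real throughput stays well below line rate.
```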

Page 7: Hardware (IV): the batch and interactive farms

• The batch farm (shared):

– 20 Sun Ultra 60 dual processor.

– 96 Linux PIII-750 MHz dual processor, NetFinity 4000R.

– 96 Linux PIII-1GHz dual processor, IBM X-series.

  => 424 CPUs in total.

• The interactive farm (shared):

– 4 Sun machines.

– 12 Linux machines.

Page 8: Software (I): BaBar releases, Objectivity

• BaBar releases:
  - Need to keep up with the evolution of the BaBar software at SLAC: new BaBar software releases have to be installed as soon as they are available.

• Objectivity and related issues:
  - Development of tools:
    · to monitor the servers' activity and the HPSS and batch resources;
    · to survey the Objectivity processes on the servers ("sick" daemons, transaction locks…); a minimal sketch of such a survey tool follows below.
  - Maintenance: software upgrades, load balancing of the servers.
  - Debugging Objectivity problems on both the client and server side.
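A minimal sketch of what such a survey tool might look like; the host and daemon names are hypothetical placeholders, since the slide does not name the actual Objectivity processes being watched:

```python
# Minimal sketch of a server-survey script of the kind described above.
# Host and daemon names are hypothetical placeholders.

import subprocess

SERVERS = ["objysrv01", "objysrv02"]   # hypothetical Objy server hosts
DAEMONS = ["ooams", "oolockserver"]    # placeholder daemon names

def daemon_running(host: str, daemon: str) -> bool:
    """Remotely check that a daemon process exists on the server."""
    res = subprocess.run(["ssh", host, "pgrep", "-x", daemon],
                         capture_output=True)
    return res.returncode == 0

for host in SERVERS:
    for daemon in DAEMONS:
        if not daemon_running(host, daemon):
            # a real tool would page an operator and/or restart the daemon
            print(f"ALERT: {daemon} looks sick on {host}")
```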

Page 9: Software (II): data import mechanism

Data catalog available to users through a MySQL database.

Two transfer routes:
(1) SLAC -> CERN -> IN2P3
(2) SLAC -> Renater -> IN2P3

• <size of the dbs> ~500 MB.
• Multi-stream transfer used (bbftp: designed for big files).
• Extraction at SLAC when new or updated dbs are available.
• Import in Lyon launched when the extraction @ SLAC is finished (a sketch of this loop follows below).
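A sketch of this extract-then-import handshake, assuming a hypothetical catalog table that flags databases once their extraction at SLAC is done; the bbftp control commands follow its multi-stream style, but the exact invocation, hosts and schema here are illustrative:

```python
# Sketch of the import loop described above: poll the MySQL data catalog
# for databases whose extraction at SLAC has finished, then pull them
# with a multi-stream bbftp transfer. Table/column names, hosts and the
# exact bbftp commands are illustrative assumptions.

import subprocess
import MySQLdb  # MySQL-python bindings

conn = MySQLdb.connect(host="catalog.example.org", user="babar",
                       db="babar_catalog")
cur = conn.cursor()
cur.execute("SELECT path FROM dbs WHERE state = 'extracted'")

for (path,) in cur.fetchall():
    dest = "/babar/import/" + path.split("/")[-1]
    # bbftp was designed for big files; open several parallel streams
    ctl = f"setnbstream 10; get {path} {dest}"
    subprocess.run(["bbftp", "-u", "babar", "-e", ctl,
                    "bbftp.slac.stanford.edu"], check=True)
    cur.execute("UPDATE dbs SET state = 'imported' WHERE path = %s",
                (path,))
    conn.commit()
```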

Page 10: Resource usage (I)

Tier A officially opened last fall.

• ~200-250 analysis jobs running in parallel (the batch system can handle up to 600 jobs in parallel).
• ~60-70 MC production jobs running in parallel:
  - already ~50 million events produced in Lyon;
  - now represents ~10-15% of the total weekly BaBar MC production.
• ~1/3 of the jobs running are BaBar jobs.
• Up to 4500 jobs in queue during the busiest periods.

Page 11: Resource usage (II)

• BaBar: top CPU-consuming group at IN2P3 in the last 4 months.
• Second-largest CPU consumer since the beginning of the year.
• MC production represents 25-30% of the total CPU time used.
• ~25-30% of the CPU for analysis is used by remote users.

(*) 1 unit = 1/8 hour on a PIII, 1 GHz; e.g. 8,000 units correspond to 1,000 PIII-1 GHz hours.

Page 12: Resource usage (III)

• 20% of the data on disk => dynamic staging via HPSS (RFIO interface); a sketch of one such request follows below.
  – ~80 s for a staging request.
  – Up to 3000 staging requests possible per day.
  => Not a limitation for CPU efficiency.
  => Needs less disk space, which saves money.
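A sketch of a single staging request through the RFIO interface; rfcp is the usual RFIO copy client, but the HPSS host and paths shown are hypothetical:

```python
# Sketch of a staging request through the RFIO interface to HPSS.
# 'rfcp' is the usual RFIO copy client; host and paths are placeholders.

import subprocess, time

def stage(hpss_path: str, local_path: str) -> float:
    """Copy a database from HPSS to local disk, returning the elapsed
    seconds (the slide quotes ~80 s per staging request)."""
    t0 = time.time()
    subprocess.run(["rfcp", f"cchpss.in2p3.fr:{hpss_path}", local_path],
                   check=True)
    return time.time() - t0

elapsed = stage("/hpss/in2p3.fr/babar/databases/db001.db",  # hypothetical
                "/scratch/babar/db001.db")
print(f"staged in {elapsed:.0f} s")
```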

Page 13: Problems encountered

• A few problems with the availability of data in Lyon due to the complexity of the export/import procedure.
• Network bandwidth for data import a bit erratic; the maximum was never reached.
• Objectivity-related bugs (most of them due to Objy server problems).
• Some HPSS outages with the system overloaded (software related + hardware limitations): solved => better performance now.
• During peak activity (e.g. before the summer conference), huge backlog on the batch system.

Page 14: The Tier A and the outer world: BaBar Grid @ IN2P3

• BaBar is involved in using Grid technologies.
• Storage Resource Broker (SRB) and MetaCatalog (MCAT) software installed and tested @ IN2P3:
  – Allows access to data sets and resources based on their attributes rather than their physical locations (see the sketch below).
  – The future for data distribution between SLAC and IN2P3.
• Tests @ IN2P3 of the EDG software using BaBar analysis applications: possible to submit a job remotely from IN2P3 to RAL and SLAC. Prototype of a tool to remotely submit jobs: December 2002.
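A minimal sketch of location-independent access through SRB, assuming the standard S-commands (Sinit, Sget, Sexit) are installed; the logical collection path below is hypothetical:

```python
# Minimal sketch of fetching a database through SRB by logical name:
# MCAT resolves which physical replica (SLAC, Lyon, ...) to serve.
# The collection path below is hypothetical.

import subprocess

subprocess.run(["Sinit"], check=True)  # open an SRB session (uses the user's SRB config)
subprocess.run(["Sget",
                "/home/babar.slac/databases/db001.db",  # logical name, hypothetical
                "/scratch/babar/"], check=True)
subprocess.run(["Sexit"], check=True)  # close the session
```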

Page 15: CC-IN2P3 Tier A: future developments

• 2 new Objy servers + new disks (near future):
  – 1 allocated to MC production; goal: double the MC production.
  – Fewer staging requests to HPSS.
• 72 new Linux batch workers (PIII, 1.4 GHz) => CPU power increased by 50% (shared with others).
• Compression of the databases on disk (client- or server-side decompression on the fly) => HPSS load decreased (concept sketch below).
• Installation of a dynamic load-balancing system on the Objy servers => more efficient (next year).
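A concept sketch of decompression on the fly; the slide does not specify the codec or where it hooks into the Objectivity/HPSS chain, so zlib stands in here purely for illustration:

```python
# Concept sketch of client-side decompression on the fly: databases are
# stored compressed (fewer bytes through HPSS) and expanded only when
# read. zlib stands in for whatever codec was actually used.

import zlib

def read_compressed_db(path: str) -> bytes:
    """Read a database file stored compressed on disk/HPSS and expand it
    in memory while reading."""
    with open(path, "rb") as f:
        return zlib.decompress(f.read())

# At a ~2:1 ratio, the same set of databases moves roughly half as many
# bytes through HPSS, which is the load reduction aimed at above.
```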

Page 16: Conclusion

• BaBar Tier A in Lyon running at full steam.
• ~25-30% of the CPU consumed by analysis jobs is used by remote users.
• Significant resources at CC-IN2P3 dedicated to BaBar (CPU: 2nd biggest user this year; HPSS: first staging requester).
• Contribution to the overall BaBar effort increasing thanks to:
  – New Objy servers and disk space.
  – New batch workers (72 new Linux machines this year, ~200 next year).
  – New HPSS tape drives.
  – Database compression and dynamic load balancing of the servers.