LHCb Distributed Computing and the Grid
V. Vagnoni (INFN Bologna)
BEAUTY 2002, 18 June 2002

D. Galli, U. Marconi, V. Vagnoni (INFN Bologna); N. Brook (Bristol); K. Harrison (Cambridge); E. Van Herwijnen, J. Closier, P. Mato (CERN); A. Khan (Edinburgh); A. Tsaregorodtsev (Marseille); H. Bulten, S. Klous (Nikhef); F. Harris, I. McArthur, A. Soroko (Oxford); G. N. Patrick, G. Kuznetsov (RAL)
Overview of presentation
• Current organisation of LHCb distributed computing
• The Bologna Beowulf cluster and its performance in a distributed environment
• Current use of Globus and EDG middleware
• Planning for the data challenges and the use of the Grid
• Current LHCb Grid/applications R&D
• Conclusions
History of distributed MC production
• The distributed system has been running for 3+ years and has processed many millions of events for the LHCb design.
• Main production sites: CERN, Bologna, Liverpool, Lyon, NIKHEF & RAL
• Globus already used for job submission to RAL and Lyon
• System interfaced to the Grid and demonstrated at the EU DataGrid Review and the NeSC/UK opening
• For the 2002 Data Challenges, adding new institutes: Bristol, Cambridge, Oxford, ScotGrid
• In 2003, add Barcelona, Moscow, Germany, Switzerland & Poland
LOGICAL FLOW (workflow diagram): submit jobs remotely via Web → execute on farm → data quality check → transfer data to mass store → update bookkeeping database → analysis
Monitoring and Control of MC jobs
• LHCb has adopted PVSS II as the prototype control and monitoring system for MC production.
– PVSS is a commercial SCADA (Supervisory Control And Data Acquisition) product developed by ETM.
– Adopted as the control framework for the LHC Joint Controls Project (JCOP).
– Available for Linux and Windows platforms.
Example of LHCb computing facility: the Bologna Beowulf cluster
• Set up at INFN-CNAF
– ~100 CPUs hosted in dual-processor machines (ranging from 866 MHz to 1.2 GHz PIII), 512 MB RAM
– 2 Network Attached Storage systems
• 1 TB in RAID 5, with 14 IDE disks + hot spare
• 1 TB in RAID 5, with 7 SCSI disks + hot spare
• Linux disk-less processing nodes with the OS centralised on a file server (root file-system mounted over NFS)
• Use of private network IP addresses and an Ethernet VLAN
– High level of network isolation
– Access to external services (AFS, mccontrol, bookkeeping DB, Java servlets of various kinds, …) provided by means of a NAT mechanism on a gateway (GW) node (see the sketch below)
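A minimal sketch of what the NAT (IP masquerading) setup on the gateway node might look like with iptables on the 2.4 kernel; the interface names and the private subnet are assumptions, not taken from the slides:

# enable forwarding and masquerade the private farm VLAN behind the public address
# (eth0 = public VLAN uplink, eth1 = private VLAN; hypothetical interface names)
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE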
Farm Configuration
(Diagram) Gateway node (Red Hat 7.2, kernel 2.4.18) providing DNS and NAT (IP masquerading) between the public and private VLANs; disk-less control node (CERN Red Hat 6.1, kernel 2.2.18) running the PBS master, the MC control server and farm monitoring; disk-less processing nodes 1…n (CERN Red Hat 6.1, kernel 2.2.18) running PBS slaves; OS file-systems master server (Red Hat 7.2, mirrored RAID 1 disks) serving home directories, PXE remote boot, DHCP and NIS; two 1 TB RAID 5 NAS units; all interconnected through a Fast Ethernet switch with an uplink to the public VLAN, plus an Ethernet-controlled power distributor for remote power control.
(Photos: Fast Ethernet switch, 1 TB NAS, Ethernet-controlled power distributor, rack of 1U dual-processor motherboards)
Farm performance
• The farm is capable of simulating and reconstructing about (700 LHCb events/day) × (100 CPUs) = 70,000 LHCb events/day
• Data transfer over the WAN to the CASTOR tape library at CERN is realised using bbftp (see the sketch below)
– Very good throughput (up to 70 Mbit/s over the currently available 100 Mbit/s)
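A hedged sketch of what such a bbftp transfer to CASTOR could look like; the host name, CASTOR path and file name are placeholders, and the options shown (-u for the remote user, -p for parallel TCP streams, -e for the transfer command) are assumptions about the invocation, not the actual production script:

# hypothetical transfer of one output file to the CASTOR name space with 4 parallel streams
bbftp -u lhcbprod -p 4 -e 'put job1600061.dst /castor/cern.ch/lhcb/mc/job1600061.dst' bbftp.cern.ch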
Current use of Grid middleware in the development system
• Authentication
– grid-proxy-init
• Job submission to DataGrid
– dg-job-submit
• Monitoring and control
– dg-job-status
– dg-job-cancel
– dg-job-get-output
• Data publication and replication
– globus-url-copy, GDMP
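A minimal sketch of how the authentication and submission commands above chain together for a single job; the JDL file name and output directory are placeholders (Example 1 below shows a real JDL from the production system):

# create a Grid proxy from the user's X.509 certificate
grid-proxy-init
# submit a job described by a JDL file to the EDG resource broker and
# store the returned job identifier under ./joblogs/
dg-job-submit myjob.jdl -o ./joblogs/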
Example 1: Job Submission
dg-job-submit /home/evh/sicb/sicb/bbincl1600061.jdl -o /home/evh/logsub/
bbincl1600061.jdl:
#
Executable = "script_prod";
Arguments = "1600061,v235r4dst,v233r2";
StdOutput = "file1600061.output";
StdError = "file1600061.err";
InputSandbox = {"/home/evhtbed/scripts/x509up_u149","/home/evhtbed/sicb/mcsend","/home/evhtbed/sicb/fsize","/home/evhtbed/sicb/cdispose.class","/home/evhtbed/v235r4dst.tar.gz","/home/evhtbed/sicb/sicb/bbincl1600061.sh","/home/evhtbed/script_prod","/home/evhtbed/sicb/sicb1600061.dat","/home/evhtbed/sicb/sicb1600062.dat","/home/evhtbed/sicb/sicb1600063.dat","/home/evhtbed/v233r2.tar.gz"};
OutputSandbox = {"job1600061.txt","D1600063","file1600061.output","file1600061.err","job1600062.txt","job1600063.txt"};
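Once the job has been submitted, its status can be followed and the OutputSandbox retrieved with the monitoring commands listed earlier; a hedged sketch, where the job identifier is a placeholder for the value returned by dg-job-submit (its exact format depends on the EDG release):

# store the job identifier returned by dg-job-submit (placeholder value)
JOBID="identifier-returned-by-dg-job-submit"
# query the current state of the job
dg-job-status $JOBID
# retrieve the OutputSandbox files (job*.txt, file1600061.output, ...) once the job has finished
dg-job-get-output $JOBID
# or abort the job if something goes wrong
dg-job-cancel $JOBID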
Example 2: Data Publishing & Replication
(Diagram) A job running on a Compute Element of the CERN testbed writes data to local disk and copies it to a Storage Element with globus-url-copy; the file is registered (register-local-file) and published to the Replica Catalogue; a remote site (NIKHEF, Amsterdam) then pulls a copy with replica-get onto its own Storage Element and MSS, and the rest of the Grid can do the same.
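A hedged sketch of the copy-and-publish step in this flow; the file name, Storage Element host and paths are placeholders, and the GDMP steps are described only in comments because the exact GDMP client commands and options depend on the installed release:

# copy the job output from local disk to a Storage Element over GridFTP
globus-url-copy file:///tmp/job1600061.dst gsiftp://se.cern.ch/flatfiles/lhcb/job1600061.dst
# the file is then registered with GDMP ("register-local-file" in the diagram), published
# to the Replica Catalogue ("publish"), and remote sites such as NIKHEF pull their own
# copy with the GDMP replica-get operation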
LHCb Data Challenge 1 (July-September 2002)
• Physics Data Challenge (PDC) for detector, physics and trigger evaluations
– Based on the existing MC production system, with a small amount of Grid technology to start with
– Generate ~3×10^7 events (signal + specific background + generic b and c + minimum bias)
• Computing Data Challenge (CDC) for checking developing software
– Will make more extensive use of Grid middleware
• Components will be incorporated into the PDC once proven in the CDC
GANGA: Gaudi ANd Grid Alliance
Joint Atlas (C. Tull) and LHCb (P. Mato) project, formally supported by GridPP/UK with 2 joint Atlas/LHCb research posts at Cambridge and Oxford
(Diagram) The GANGA GUI sits between the user and the GAUDI program: it handles JobOptions and Algorithms on the input side, Histograms, Monitoring and Results on the output side, and talks to the Collective & Resource Grid Services.
• An application facilitating the use of Grid services by end-user physicists and production managers for running Gaudi/Athena jobs.
• A GUI-based application that should help throughout the complete job lifetime: job preparation and configuration, resource booking, job submission, job monitoring and control.
Required functionality
• Before the Gaudi/Athena program starts:
– Security (obtaining certificates and credentials)
– Job configuration (algorithm configuration, input data selection, ...)
– Resource booking and policy checking (CPU, storage, network)
– Installation of required software components
– Job preparation and submission
• While the Gaudi/Athena program is running:
– Job monitoring (generic and specific)
– Job control (suspend, abort, ...)
• After the program has finished:
– Data management (registration)
Conclusions
• LHCb already has distributed MC production using Grid facilities for job submission
• We are embarking on large-scale data challenges commencing July 2002, and we are developing our analysis model
• Grid middleware will be progressively integrated into our production environment as it matures (starting with EDG, and looking forward to GLUE)
• R&D projects are in place
– for interfacing users (production + analysis) and the Gaudi/Athena software framework to Grid services
– for putting the production system into an integrated Grid environment with monitoring and control
• All work is being conducted in close participation with the EDG and LCG projects
– Ongoing evaluations of EDG middleware with physics jobs
– Participation in LCG working groups, e.g. report on 'Common use cases for a HEP Common Application layer', http://cern.ch/fca/HEPCAL.doc