LHCb Distributed Computing and the Grid
V. Vagnoni (INFN Bologna)
BEAUTY 2002, 18 June 2002

D. Galli, U. Marconi, V. Vagnoni (INFN Bologna); N. Brook (Bristol); K. Harrison (Cambridge); E. Van Herwijnen, J. Closier, P. Mato (CERN); A. Khan (Edinburgh); A. Tsaregorodtsev (Marseille); H. Bulten, S. Klous (Nikhef); F. Harris, I. McArthur, A. Soroko (Oxford); G. N. Patrick, G. Kuznetsov (RAL)
Overview of presentation
• Current organisation of LHCb distributed computing
• The Bologna Beowulf cluster and its performance in a distributed environment
• Current use of Globus and EDG middleware
• Planning for the data challenges and the use of the Grid
• Current LHCb Grid/applications R&D
• Conclusions
History of distributed MC production
• The distributed system has been running for 3+ years and has processed many millions of events for the LHCb design.
• Main production sites: CERN, Bologna, Liverpool, Lyon, NIKHEF & RAL
• Globus already used for job submission to RAL and Lyon
• System interfaced to the Grid and demonstrated at the EU DataGrid Review and the NeSC/UK opening
• For the 2002 Data Challenges, adding new institutes: Bristol, Cambridge, Oxford, ScotGrid
• In 2003, add Barcelona, Moscow, Germany, Switzerland & Poland
LOGICAL FLOW (workflow diagram): submit jobs remotely via Web → execute on farm → data quality check → transfer data to mass store → update bookkeeping database → analysis
Monitoring and Control of MC jobs
• LHCb has adopted PVSS II as the prototype control and monitoring system for MC production.
– PVSS is a commercial SCADA (Supervisory Control And Data Acquisition) product developed by ETM.
– Adopted as the control framework for the LHC Joint Controls Project (JCOP).
– Available for Linux and Windows platforms.
Example of LHCb computing facility: the Bologna Beowulf cluster
• Set up at INFN-CNAF
– ~100 CPUs hosted in dual-processor machines (ranging from 866 MHz to 1.2 GHz PIII), 512 MB RAM
– 2 Network Attached Storage systems
• 1 TB in RAID 5, with 14 IDE disks + hot spare
• 1 TB in RAID 5, with 7 SCSI disks + hot spare
• Linux disk-less processing nodes with the OS centralised on a file server (root file-system mounted over NFS)
• Use of private network IP addresses and an Ethernet VLAN
– High level of network isolation
– Access to external services (AFS, mccontrol, bookkeeping DB, Java servlets of various kinds, …) provided by means of a NAT mechanism on a gateway (GW) node (see the sketch below)
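A minimal sketch of what the NAT (IP masquerading) setup on the gateway node might look like with iptables on the 2.4 kernel; the interface names and the private subnet are assumptions, not taken from the slides:

# enable forwarding and masquerade the private farm VLAN behind the public address
# (eth0 = public VLAN uplink, eth1 = private VLAN; hypothetical interface names)
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE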
Farm Configuration
(Diagram) Gateway node (Red Hat 7.2, kernel 2.4.18) providing DNS and NAT (IP masquerading) between the public and private VLANs; disk-less control node (CERN Red Hat 6.1, kernel 2.2.18) running the PBS master, the MC control server and farm monitoring; disk-less processing nodes 1…n (CERN Red Hat 6.1, kernel 2.2.18) running PBS slaves; OS file-systems master server (Red Hat 7.2, mirrored RAID 1 disks) serving home directories, PXE remote boot, DHCP and NIS; two 1 TB RAID 5 NAS units; all interconnected through a Fast Ethernet switch with an uplink to the public VLAN, plus an Ethernet-controlled power distributor for remote power control.
(Photos: Fast Ethernet switch, 1 TB NAS, Ethernet-controlled power distributor, rack of 1U dual-processor motherboards)
Farm performance
• The farm is capable of simulating and reconstructing about (700 LHCb events/day) × (100 CPUs) = 70,000 LHCb events/day
• Data transfer over the WAN to the CASTOR tape library at CERN is realised using bbftp (see the sketch below)
– Very good throughput (up to 70 Mbit/s over the currently available 100 Mbit/s)
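A hedged sketch of what such a bbftp transfer to CASTOR could look like; the host name, CASTOR path and file name are placeholders, and the options shown (-u for the remote user, -p for parallel TCP streams, -e for the transfer command) are assumptions about the invocation, not the actual production script:

# hypothetical transfer of one output file to the CASTOR name space with 4 parallel streams
bbftp -u lhcbprod -p 4 -e 'put job1600061.dst /castor/cern.ch/lhcb/mc/job1600061.dst' bbftp.cern.ch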
Current use of Grid middleware in the development system
• Authentication
– grid-proxy-init
• Job submission to DataGrid
– dg-job-submit
• Monitoring and control
– dg-job-status
– dg-job-cancel
– dg-job-get-output
• Data publication and replication
– globus-url-copy, GDMP
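A minimal sketch of how the authentication and submission commands above chain together for a single job; the JDL file name and output directory are placeholders (Example 1 below shows a real JDL from the production system):

# create a Grid proxy from the user's X.509 certificate
grid-proxy-init
# submit a job described by a JDL file to the EDG resource broker and
# store the returned job identifier under ./joblogs/
dg-job-submit myjob.jdl -o ./joblogs/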
Example 1: Job Submission
dg-job-submit /home/evh/sicb/sicb/bbincl1600061.jdl -o /home/evh/logsub/
bbincl1600061.jdl:
#
Executable = "script_prod";
Arguments = "1600061,v235r4dst,v233r2";
StdOutput = "file1600061.output";
StdError = "file1600061.err";
InputSandbox = {"/home/evhtbed/scripts/x509up_u149","/home/evhtbed/sicb/mcsend","/home/evhtbed/sicb/fsize","/home/evhtbed/sicb/cdispose.class","/home/evhtbed/v235r4dst.tar.gz","/home/evhtbed/sicb/sicb/bbincl1600061.sh","/home/evhtbed/script_prod","/home/evhtbed/sicb/sicb1600061.dat","/home/evhtbed/sicb/sicb1600062.dat","/home/evhtbed/sicb/sicb1600063.dat","/home/evhtbed/v233r2.tar.gz"};
OutputSandbox = {"job1600061.txt","D1600063","file1600061.output","file1600061.err","job1600062.txt","job1600063.txt"};
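Once the job has been submitted, its status can be followed and the OutputSandbox retrieved with the monitoring commands listed earlier; a hedged sketch, where the job identifier is a placeholder for the value returned by dg-job-submit (its exact format depends on the EDG release):

# store the job identifier returned by dg-job-submit (placeholder value)
JOBID="identifier-returned-by-dg-job-submit"
# query the current state of the job
dg-job-status $JOBID
# retrieve the OutputSandbox files (job*.txt, file1600061.output, ...) once the job has finished
dg-job-get-output $JOBID
# or abort the job if something goes wrong
dg-job-cancel $JOBID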
Example 2: Data Publishing & Replication
(Diagram) A job running on a Compute Element of the CERN testbed writes data to local disk and copies it to a Storage Element with globus-url-copy; the file is registered (register-local-file) and published to the Replica Catalogue; a remote site (NIKHEF, Amsterdam) then pulls a copy with replica-get onto its own Storage Element and MSS, and the rest of the Grid can do the same.
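A hedged sketch of the copy-and-publish step in this flow; the file name, Storage Element host and paths are placeholders, and the GDMP steps are described only in comments because the exact GDMP client commands and options depend on the installed release:

# copy the job output from local disk to a Storage Element over GridFTP
globus-url-copy file:///tmp/job1600061.dst gsiftp://se.cern.ch/flatfiles/lhcb/job1600061.dst
# the file is then registered with GDMP ("register-local-file" in the diagram), published
# to the Replica Catalogue ("publish"), and remote sites such as NIKHEF pull their own
# copy with the GDMP replica-get operation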
LHCb Data Challenge 1 (July-September 2002)
• Physics Data Challenge (PDC) for detector, physics and trigger evaluations
– Based on the existing MC production system, with a small amount of Grid technology to start with
– Generate ~3×10^7 events (signal + specific background + generic b and c + minimum bias)
• Computing Data Challenge (CDC) for checking developing software
– Will make more extensive use of Grid middleware
• Components will be incorporated into the PDC once proven in the CDC
GANGA: Gaudi ANd Grid Alliance
Joint Atlas (C. Tull) and LHCb (P. Mato) project, formally supported by GridPP/UK with 2 joint Atlas/LHCb research posts at Cambridge and Oxford
(Diagram) The GANGA GUI sits between the user and the GAUDI program: it handles JobOptions and Algorithms on the input side, Histograms, Monitoring and Results on the output side, and talks to the Collective & Resource Grid Services.
• An application facilitating the use of Grid services by end-user physicists and production managers for running Gaudi/Athena jobs.
• A GUI-based application that should help throughout the complete job lifetime: job preparation and configuration, resource booking, job submission, job monitoring and control.
Required functionality
• Before the Gaudi/Athena program starts:
– Security (obtaining certificates and credentials)
– Job configuration (algorithm configuration, input data selection, ...)
– Resource booking and policy checking (CPU, storage, network)
– Installation of required software components
– Job preparation and submission
• While the Gaudi/Athena program is running:
– Job monitoring (generic and specific)
– Job control (suspend, abort, ...)
• After the program has finished:
– Data management (registration)
Conclusions
• LHCb already has distributed MC production using Grid facilities for job submission
• We are embarking on large-scale data challenges commencing July 2002, and we are developing our analysis model
• Grid middleware will be progressively integrated into our production environment as it matures (starting with EDG, and looking forward to GLUE)
• R&D projects are in place
– for interfacing users (production + analysis) and the Gaudi/Athena software framework to Grid services
– for putting the production system into an integrated Grid environment with monitoring and control
• All work is being conducted in close participation with the EDG and LCG projects
– Ongoing evaluations of EDG middleware with physics jobs
– Participation in LCG working groups, e.g. report on 'Common use cases for a HEP Common Application layer', http://cern.ch/fca/HEPCAL.doc