Gridifying the LHCb Monte Carlo production system
Gridifying the LHCb Monte Carlo production system
Eric van Herwijnen, CERN, [email protected]
Tuesday, 19 February 2002
Talk given at GGF4, Toronto
Contents
• LHCb
• LHCb distributed computing environment
• Current GRID involvement
• Functionality of current Monte Carlo system
• Integration of DataGrid middleware
• Monitoring and control
• Requirements of DataGrid middleware
LHCb
• LHC collider experiment
• 10^9 events * 1 MB = 1 PB
• Problems of data storage, access and computation
• Monte Carlo simulation very important for detector design
• Need a distributed model
• Create, distribute and keep track of data automatically
LHCb distributed computing environment
• 15 countries (13 European + Brazil, China), 50 institutes
• Tier-0: CERN
• Tier-1: RAL, IN2P3 (Lyon), INFN (Bologna), Nikhef, CERN + ?
• Tier-2: Liverpool, Edinburgh/Glasgow, Switzerland + ? (grow to ~10)
• Tier-3: 50 throughout collaboration
• Ongoing negotiations for Tier-1/2/3 centres: Germany, Russia, Poland, Spain, Brazil
Current GRID involvement
• EU DataGrid project (involves HEP, Biology, Medicine and Earth Observation sciences)
• Active in WP8 (HEP applications) of DataGrid
• Use “middleware” (WP1-5) + Testbed (WP6) + Network (WP7)
• Current distributed system has been working for some time; LHCb is:
  Grid enabled, but not Grid dependent
MC production facilities MC production facilities (summer 2001)(summer 2001)
CentreCentre Max. (av.) # of Max. (av.) # of CPUs available CPUs available simultaneouslysimultaneously
Batch Batch SystemSystem
Typical Typical weekly weekly productionproduction
% submitted through % submitted through GRIDGRID
CERNCERN 315 (60)315 (60) LSFLSF 85 k85 k 10%10%
RALRAL 100 (60)100 (60) PBSPBS 35k 35k 100%100%
IN2P3IN2P3 225 (60)225 (60) BQSBQS 35k35k 100%100%
LiverpoolLiverpool 300 (250)300 (250) CustomCustom 150k150k 0%0%
BolognaBologna 20 (20)20 (20) PBSPBS 35k35k 0%0%
NikhefNikhef 40 (40)40 (40) PBSPBS 35k35k 0%0%
BristolBristol 10 (10)10 (10) PBSPBS 15k15k 0%0%
Functionality of current Monte Carlo system (workflow, see the sketch below):
• Submit jobs remotely via Web
• Execute on farm
• Monitor performance of farm via Web
• Transfer data to mass store
• Data quality check
• Update bookkeeping database
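A minimal sketch of how this cycle could be orchestrated is shown below; the function names, the site name and the event count are hypothetical placeholders standing in for the real Web, farm and mass-store machinery, not the actual LHCb scripts.

```python
"""Hypothetical sketch of the current production cycle; every helper below is
a placeholder for the real Web, farm and mass-store machinery."""

def submit_via_web(site, n_events):
    print(f"submitting a {n_events}-event job to {site} via the Web interface")
    return "job-001"                      # placeholder job identifier

def execute_on_farm(site, job_id):
    print(f"{job_id} running on the {site} farm (performance monitored via the Web)")
    return ["run_001.dst"]                # placeholder output files

def transfer_to_mass_store(files):
    print(f"transferring {files} to the mass store")

def data_quality_ok(files):
    print(f"running the data quality check on {files}")
    return True

def update_bookkeeping(job_id, files):
    print(f"registering {files} from {job_id} in the bookkeeping database")

def production_cycle(site, n_events):
    job_id = submit_via_web(site, n_events)
    files = execute_on_farm(site, job_id)
    transfer_to_mass_store(files)
    if data_quality_ok(files):
        update_bookkeeping(job_id, files)

production_cycle("RAL", 50000)
```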
GRID-enabling production
• Construct job script and submit via Web (dg-authentication, dg-job-submit)
• Run MC executable; write log to Web
• Copy data to local mass store (dg-data-copy)
• Call CERN servlet
• FTP servlet copies data from the local mass store to the CERN mass store (dg-data-replication)
• Update bookkeeping db (LDAP? now Oracle)
A sketch of these steps follows.
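Below is a minimal sketch of how the Grid-enabled steps could be driven from a script, using the DataGrid tools named on the slide (dg-job-submit, dg-data-copy). The JDL contents, the command arguments, the servlet URL and all function names are illustrative assumptions, not the real LHCb production code.

```python
"""Sketch of the Grid-enabled production steps, assuming the DataGrid
command-line tools named on the slide; arguments and URLs are illustrative."""
import subprocess
import urllib.request

# Illustrative job description using input/output sandboxes; file names made up.
JDL = """\
Executable    = "run_mc.sh";
InputSandbox  = {"run_mc.sh", "mc_settings.dat"};
OutputSandbox = {"job.log"};
"""

def submit_job(jdl_path="mcjob.jdl"):
    """Construct the job description and submit it (after dg-authentication)."""
    with open(jdl_path, "w") as f:
        f.write(JDL)
    subprocess.run(["dg-job-submit", jdl_path], check=True)

def copy_to_local_mass_store(local_file, mass_store_path):
    """Copy the MC output to the local mass store (argument order assumed)."""
    subprocess.run(["dg-data-copy", local_file, mass_store_path], check=True)

def notify_cern(servlet_url, dataset):
    """Call the CERN servlet that replicates the data to the CERN mass store
    (FTP servlet / dg-data-replication) and updates the bookkeeping db."""
    urllib.request.urlopen(f"{servlet_url}?dataset={dataset}")
```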
Gridifying the MC production system
• Provide a convenient tool for DataGrid Testbed validation tests
• Feed back improvements into the MC system currently in production
• Clone current system, replace commands by DataGrid middleware
• Report back to WP8 and other workpackages as required
Monitoring and control of running jobs
• Control system to monitor distributed production (based on PVSS, author: Clara Gaspar)
• Initially for MC production, later all Grid computing
• Automatic quality checks on final data samples
• Online histograms and comparisons between histograms (see the sketch after this list)
• Use DataGrid monitoring tools
• Feed back improvements into production MC system
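As a simple illustration of an automatic quality check, the sketch below compares a histogram from a new data sample with a reference histogram using a per-bin chi-square. It is a generic example, not the PVSS-based control system itself, and the acceptance threshold is an arbitrary assumption.

```python
"""Illustrative automatic quality check: flag a sample whose histogram
disagrees too much with a reference histogram (equal binning assumed)."""

def chi2_per_bin(hist, ref):
    """Mean chi-square per non-empty bin between two binned distributions."""
    assert len(hist) == len(ref)
    chi2, nbins = 0.0, 0
    for n, r in zip(hist, ref):
        if n + r > 0:                        # skip empty bins
            chi2 += (n - r) ** 2 / (n + r)
            nbins += 1
    return chi2 / max(nbins, 1)

def sample_ok(hist, ref, threshold=2.0):
    """Accept the sample if the histograms agree; threshold is arbitrary."""
    return chi2_per_bin(hist, ref) < threshold

# Example: compare a produced spectrum against a reference spectrum.
print(sample_ok([98, 205, 310, 190, 97], [100, 200, 300, 200, 100]))
```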
Requirements on DataGrid middleware
• Security: single user logon
• Job submission: use “sandboxes” to package environment so that use of AFS is unnecessary
• Monitoring: integrate with WP3 tools where possible for farm monitoring, use own tools for data quality monitoring
• Data moving: use a single API to move data (see the sketch below)
• We are in a cycle of requirements, design, implementation and testing
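The “single API to move data” requirement could look like the sketch below: production code calls one function, and the middleware command behind it can be swapped without touching the scripts. The command names come from these slides, but their argument syntax and the example paths are assumptions for illustration.

```python
"""Sketch of a single data-moving API that hides the middleware command."""
import subprocess

DATA_MOVE_COMMAND = "dg-data-copy"   # could be switched to another tool, e.g. dg-data-replication

def move_data(source, destination, command=None):
    """Move a file from source to destination through the configured middleware tool."""
    cmd = command or DATA_MOVE_COMMAND
    subprocess.run([cmd, source, destination], check=True)

# Production scripts only ever call move_data(); changing middleware means
# changing DATA_MOVE_COMMAND, not every script.  (Paths below are made up.)
# move_data("mc_run_42.dst", "castor:/lhcb/mc/run_42.dst")
```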