CERN and the LHC Computing Grid
Ian Bird, IT Department
CERN, Geneva, Switzerland
HP Puerto Rico, 9 February 2004
What is CERN?
• CERN is the world's largest particle physics centre, funded by 20 European member states
• Particle physics is about:
- the elementary particles of which all matter in the universe is made
- the fundamental forces which hold matter together
• Particle physics requires special tools to create and study new particles
• CERN has:
- 2500 staff scientists (physicists, engineers, …)
- some 6500 visiting scientists (half of the world's particle physicists), coming from 500 universities and representing 80 nationalities
… and is located in Geneva, Switzerland.
(Aerial photo: Mont Blanc, 4810 m, and downtown Geneva)
What is CERN?
The special tools for particle physics are:
• ACCELERATORS: huge machines able to accelerate particles to very high energies before colliding them with other particles
• DETECTORS: massive instruments which register the particles produced when the accelerated particles collide
What is LHC?
• LHC will collide beams of protons at an energy of 14 TeV
• Using the latest superconducting technologies, it will operate at about −270 °C, just above the absolute zero of temperature
• With its 27 km circumference, the accelerator will be the largest superconducting installation in the world
• LHC is due to switch on in 2007
• Four experiments, with detectors as 'big as cathedrals': ALICE, ATLAS, CMS, LHCb
The LHC Data Challenge
• A particle collision = an event
• Each event is independent of the others, which provides trivial parallelism and hence allows the use of simple PC farms (see the sketch below)
• The physicist's goal is to count, trace and characterise all the particles produced and fully reconstruct the process
• Among all the tracks, the presence of "special shapes" is the sign of the occurrence of interesting interactions
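Because events are independent, event-level parallelism needs no communication between workers: a farm simply maps a reconstruction function over the events. A minimal illustrative sketch in Python follows; the event contents and the reconstruct() logic are invented placeholders, not LCG code.

    # Illustrative sketch (not LCG code): each collision event is independent,
    # so a "farm" is just a pool of workers mapped over the events.
    # Event contents and the reconstruct() logic are invented placeholders.
    from multiprocessing import Pool
    from random import Random

    def reconstruct(event_id: int) -> int:
        """Stand-in for per-event reconstruction: works on one event only,
        touching no shared state."""
        rng = Random(event_id)                   # each event has its own data
        hits = [rng.random() for _ in range(1000)]
        return sum(1 for h in hits if h > 0.99)  # count "interesting" hits

    if __name__ == "__main__":
        events = range(10_000)                   # independent events
        with Pool() as farm:                     # one worker per CPU core
            results = farm.map(reconstruct, events)
        print(f"processed {len(results)} events independently")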
Starting from this event… you are looking for this "signature".
• Selectivity: 1 in 10^13
• Like looking for one person in a thousand world populations!
• Or for a needle in 20 million haystacks!
LHC data (simplified)
• 40 million collisions per second
• After filtering, 100 collisions of interest per second
• A Megabyte of digitised information for each collision = a recording rate of 0.1 Gigabytes/sec
• 10^10 collisions recorded each year = 10 Petabytes/year of data (checked in the snippet below)
(Data comes from the four experiments: CMS, LHCb, ATLAS, ALICE)
For scale:
• 1 Megabyte (1 MB) – a digital photo
• 1 Gigabyte (1 GB) = 1000 MB – a DVD movie
• 1 Terabyte (1 TB) = 1000 GB – world annual book production
• 1 Petabyte (1 PB) = 1000 TB – 10% of the annual production by the LHC experiments
• 1 Exabyte (1 EB) = 1000 PB – world annual information production
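These figures are simple arithmetic, and it is worth checking that they hang together. A quick back-of-the-envelope verification, using the decimal units of the scale list above:

    # Back-of-the-envelope check of the rates and volumes quoted above,
    # using decimal units as in the scale list.
    MB, GB, PB = 1e6, 1e9, 1e15                       # bytes

    rate = 100 * (1 * MB) / GB                        # 100 events/s at 1 MB each
    print(f"recording rate = {rate:.1f} GB/s")        # -> 0.1 GB/s

    volume = 1e10 * (1 * MB) / PB                     # 10^10 events a year
    print(f"annual volume  = {volume:.0f} PB/year")   # -> 10 PB/year

    # Selectivity of 1 in 10^13: roughly one person in a thousand world
    # populations (taking ~6.5e9 people per "world").
    print(f"1000 worlds = {1000 * 6.5e9:.1e} people")  # -> 6.5e+12, order 10^13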
Expected LHC computing needs
[Charts: estimated disk capacity at CERN (TeraBytes, 1998–2010); estimated mass storage at CERN (PetaBytes, 1998–2010, LHC vs. other experiments); estimated CPU capacity at CERN (kSI95, 1998–2010, against Moore's law based on 2000 data, with "today" marked)]
• Data: ~15 Petabytes a year
• Processing: ~100,000 of today's PCs
• Networking: 10–40 Gb/s to all big centres
Computing at CERN today
• High-throughput computing based on reliable "commodity" technology
• More than 1500 dual-processor PCs
• More than 3 Petabytes of data on disk (10%) and tape (90%)
Nowhere near enough!
The new computer room is being populated…
(Photos: CPU servers, disk servers, tape silos and servers)
…while the existing computer centre is being cleared for renovation…
(Photos: CPU servers, disk servers)
…and an upgrade of the power supply from 0.5 MW to 2.5 MW is underway.
Computing for LHC
• Problem: even with the computer centre upgrade, CERN can only provide a fraction of the necessary resources
• Solution: computing centres, which were isolated in the past, will now be connected, uniting the computing resources of particle physicists around the world using Grid technologies!
• Europe: ~270 institutes, ~4500 users
• Elsewhere: ~200 institutes, ~1600 users
LHC Computing Grid Project
• The LCG Project is a collaboration of:
- the LHC experiments
- the Regional Computing Centres
- physics institutes
… working together to prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data.
• This includes support for applications:
- provision of common tools, frameworks, environment, data persistency
• … and the development and operation of a computing service:
- exploiting the resources available to LHC experiments in computing centres, physics institutes and universities around the world
- presenting this as a reliable, coherent environment for the experiments
LCG Project
• Applications Area (Torre Wenaus): development environment; joint projects; data management; distributed analysis
• Middleware Area (Frédéric Hemmer): provision of a base set of grid middleware (acquisition, development, integration); testing, maintenance and support
• CERN Fabric Area (Bernd Panzer): large cluster management; data recording; cluster technology; networking; the computing service at CERN
• Grid Deployment Area (Ian Bird): establishing and managing the Grid Service – middleware certification, security, operations, registration, authorisation and accounting
• Technology Office (David Foster): overall coherence of the project; pro-active technology watch; long-term grid technology strategy; computing models
Project Management Board
• Project Management: the management team; the SC2 and GDB chairs
• Experiment delegates
• External projects: EDG, GridPP, INFN Grid, VDT, Trillium
• Other resource suppliers: IN2P3, Germany, CERN-IT
• Architects' Forum: the Applications Area manager, experiment architects, computing coordinators
• Grid Deployment Board (GDB): experiment delegates, national and regional centre delegates
The PEB deals directly with the Fabric and Middleware areas. The GDB negotiates and agrees operational and security policy, resource allocation, etc.
LCG-1 components (schematic)
• Hardware: computing cluster; network resources; data storage (closed systems? HPSS, CASTOR, …)
• System software: operating system (RedHat Linux); local scheduler (PBS, Condor, LSF, …); file system (NFS, …)
• "Passive" services: user access; security; data transfer; information schema (illustrated below) – provided by VDT (Globus, GLUE)
• "Active" services: global scheduler; data management; information system – provided by the EU DataGrid
• High-level services: user interfaces; applications – provided by LCG and the experiments
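To make the "information schema" layer concrete, the sketch below shows the kind of per-site record such a schema publishes. This is a deliberately simplified, invented example with a hypothetical site name and numbers; the real GLUE schema is far richer.

    # A much-simplified, invented illustration of the kind of per-site record
    # an information schema like GLUE publishes; the real schema is far
    # richer. The site name and all numbers are hypothetical.
    site_info = {
        "site": "EXAMPLE-LCG1",
        "computing_element": {
            "total_cpus": 400,
            "free_cpus": 120,
            "local_scheduler": "PBS",        # PBS, Condor, LSF, ...
        },
        "storage_element": {
            "total_tb": 160,
            "free_tb": 40,
            "mass_storage": "CASTOR",        # HPSS, CASTOR, ...
        },
        "supported_vos": ["alice", "atlas", "cms", "lhcb"],
    }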
Elements of a Production Grid Service
• Middleware: the systems software that interconnects the computing clusters at the regional centres to provide the illusion of a single computing facility
- information publishing and finding, a distributed data catalogue, data management tools, a work scheduler (see the toy matchmaker below), performance monitors, etc.
• Operations:
- grid infrastructure services
- registration, accounting, security
- regional centre and network operations
- grid operations centre(s) – trouble and performance monitoring, problem resolution – 24x7 around the world
• Support:
- middleware and systems support for computing centres
- applications integration, production
- user support – call centres/helpdesk – global coverage; documentation; training
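As a rough illustration of the "work scheduler" element, the toy matchmaker below picks a site for a job from published site information, in the spirit of a grid resource broker. All site names and numbers are invented; a real broker also weighs data location, ranking expressions, credentials, retries and much more.

    # Toy resource broker (invented data): match a job against published site
    # information and pick the least-loaded site that satisfies it.
    SITES = [
        {"name": "SiteA", "free_cpus": 120, "free_tb": 40, "vos": {"cms", "atlas"}},
        {"name": "SiteB", "free_cpus": 300, "free_tb": 10, "vos": {"cms", "lhcb"}},
        {"name": "SiteC", "free_cpus": 50,  "free_tb": 90, "vos": {"cms"}},
    ]

    def match(job):
        """Return the matching site with the most free CPUs, or None."""
        candidates = [
            s for s in SITES
            if job["vo"] in s["vos"]
            and s["free_cpus"] >= job["need_cpus"]
            and s["free_tb"] >= job["need_tb"]
        ]
        return max(candidates, key=lambda s: s["free_cpus"], default=None)

    job = {"vo": "cms", "need_cpus": 100, "need_tb": 5}
    chosen = match(job)
    print(chosen["name"] if chosen else "no matching site")   # -> SiteB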
LCG Service
• Certification and distribution process established
• Middleware package – components from:
- the European DataGrid (EDG)
- the US projects (Globus, Condor, PPDG, GriPhyN) – the Virtual Data Toolkit
• Agreement reached on the principles for registration and security
• Rutherford Lab (UK) to provide the initial Grid Operations Centre
• FZK (Karlsruhe) to operate the Call Centre
• The first "certified" release was made available to 14 centres on 1 September: Academia Sinica Taipei, BNL, CERN, CNAF, Cyfronet Cracow, FNAL, FZK, IN2P3 Lyon, KFKI Budapest, Moscow State Univ., Prague, PIC Barcelona, RAL, Univ. Tokyo
LCG Service – Next Steps
• Deployment status:
- 12 sites were active when the service opened on 15 September
- 28 sites are active now
- HP, Pakistan, Australia, Korea, China, … are preparing to join
• Now deploying LCG-2, the upgrade for 2004, adding new functionality:
- VO management system (sketched below)
- integration of mass storage systems
• The experiments are now starting their tests on LCG-2 in preparation for the Data Challenges
- the CMS target is to have 80% of their production on the grid before the end of the pre-challenge production (PCP) of DC04
- it is essential that the experiments use all the features (including, especially, data management) and exercise the grid model even where it is not needed for the short-term challenges
• Capacity will follow the readiness of the experiments
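As a gloss on the "VO management system" item above: such a system maps an authenticated user, identified by a certificate subject, to a Virtual Organisation, and sites grant access per VO rather than per user. A deliberately minimal sketch with invented DNs and policy; the real system issues signed VO membership credentials rather than consulting a flat table.

    # Minimal sketch of VO-based authorisation (invented DNs and policy):
    # map a certificate subject to a Virtual Organisation, then check the VO
    # against site policy. A real VO management system issues signed
    # membership credentials instead of consulting a flat table.
    VO_MEMBERS = {
        "/C=CH/O=Example/CN=Jane Physicist": "atlas",
        "/C=UK/O=Example/CN=John Analyst":   "cms",
    }
    SITE_POLICY = {"atlas", "cms", "lhcb"}   # VOs this site agrees to serve

    def authorise(subject_dn):
        """Grant access only if the DN belongs to a VO the site supports."""
        vo = VO_MEMBERS.get(subject_dn)
        return vo is not None and vo in SITE_POLICY

    print(authorise("/C=CH/O=Example/CN=Jane Physicist"))  # True
    print(authorise("/C=XX/O=Nowhere/CN=Stranger"))        # False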
Resources committed for 1Q04
• Resources planned for the period of the data challenges in 2004
• CERN provides ~12% of the total capacity (checked below)
• The numbers still have to be refined – different standards are used by different countries
• Efficiency of use is a major question mark:
- reliability
- efficient scheduling
- sharing between Virtual Organisations (user groups)

Resources in Regional Centres:

    Country          CPU (kSI2K)   Disk (TB)   Support (FTE)   Tape (TB)
    CERN                     700         160            10.0        1000
    Czech Republic            60           5             2.5           5
    France                   420          81            10.2         540
    Germany                  207          40             9.0          62
    Holland                  124           3             4.0          12
    Italy                    507          60            16.0         100
    Japan                    220          45             5.0         100
    Poland                    86           9             5.0          28
    Russia                   120          30            10.0          40
    Taiwan                   220          30             4.0         120
    Spain                    150          30             4.0         100
    Sweden                   179          40             2.0          40
    Switzerland               26           5             2.0          40
    UK                      1656         226            17.3         295
    USA                      801         176            15.5        1741
    Total                   5600        1169           120.0        4223
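The "~12%" CERN share quoted above follows directly from the table's own totals; a quick check:

    # Quick check of the CERN share quoted above, from the table's totals.
    cern_cpu, total_cpu = 700, 5600
    cern_disk, total_disk = 160, 1169
    print(f"CERN CPU share : {cern_cpu / total_cpu:.1%}")    # -> 12.5%
    print(f"CERN disk share: {cern_disk / total_disk:.1%}")  # -> 13.7%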
LCG Service Time-line
Timeline 2003–2007: agree the specification of the initial service (LCG-1); open LCG-1 (scheduled for 1 July, achieved 1 September); use it for simulated event productions; first data and the physics computing service follow in 2007.
Level 1 Milestone – opening of the LCG-1 service:
• two-month delay, lower functionality than planned
• use by the experiments is only starting now (planned for end August)
• the decision on the final set of middleware for the 1H04 data challenges will be taken without experience of production running
• reduced time for integrating and testing the service with the experiments' systems before the data challenges start next spring
• additional functionality will have to be integrated later
Looking further ahead on the same time-line (2003–2007):
• open LCG-1 (achieved 1 September); used for simulated event productions
• LCG-2 – upgraded middleware, management and operations tools; the principal service for the LHC data challenges
• computing model TDRs*; LCG-3 – second-generation middleware; validation of the computing models
• TDR for the Phase 2 grid; Phase 2 service acquisition, installation, commissioning; experiment setup and preparation
• Phase 2 service in production; first data; physics computing service
* TDR – technical design report
LCG and EGEE
• An EU project has been approved to provide partial funding for the operation of a general e-Science grid in Europe, including the supply of suitable middleware – Enabling Grids for e-Science in Europe (EGEE)
- EGEE provides funding for 70 partners, the large majority of which have strong HEP ties
• Similar funding is being sought in the US
• LCG and EGEE work closely together, sharing the management and responsibility for:
- Middleware – sharing out the work to implement the recommendations of HEPCAL II and ARDA
- Infrastructure operation – LCG will be the core from which the EGEE grid develops; this ensures compatibility and provides useful funding at many Tier 1, Tier 2 and Tier 3 centres
- Deployment of HEP applications – a small amount of funding provided for testing and integration with the LHC experiments
Middleware - Next 15 months
• Work closely with the experiments on developing experience with early distributed analysis models using the grid:
- multi-tier model
- data management, localisation, migration
- resource matching and scheduling
- performance, scalability
• Evolutionary introduction of new software – rapid testing and integration into the mainline services – while maintaining a stable service for the data challenges!
• Establish a realistic assessment of the grid functionality that we will be able to depend on at LHC startup – a fundamental input for the Computing Model TDRs due at the end of 2004
Grids - Maturity is some way off
• Research still needs to be done in all the key areas of importance to LHC, e.g. data management, resource matching/provisioning, security, etc.
• Our life would be easier if standards were agreed and solid implementations were available – but they are not
• We are only now entering the second phase of development:
- everyone agrees on the overall direction, based on Web services
- but these are not simple developments, and we are still learning how best to approach many of the problems of a grid
- there will be multiple and competing implementations – some for sound technical reasons
- we must try to follow these developments and influence the standardisation activities of the Global Grid Forum (GGF)
• It has become clear that LCG will have to live in a world of multiple grids – but there is no agreement on how grids should inter-operate:
- common protocols?
- federations of grids inter-connected by gateways?
- regional centres connecting to multiple grids?
Running a service in this environment will be a challenge!