Grid Computing at the Large Hadron Collider: Massive Computing at the Limit of Scale, Space, Power and Budget
TRANSCRIPT
Grid Computing at the Large Hadron Collider:
Massive Computing at the Limit of Scale, Space, Power and Budget
Dr Helge Meinhard, CERN IT Department
02-Jul-2009
CERN (1)
Conseil européen pour la recherche nucléaire – aka European Laboratory for Particle Physics
Facilities for fundamental research
Between Geneva and the Jura mountains, straddling the Swiss-French border
Founded in 1954
CERN (2)
20 member states
~3400 staff members, fellows, students, apprentices
9000 users registered (~6500 on site) from more than 550 institutes in more than 80 countries
~910 MCHF (~550 MEUR) annual budget
http://cern.ch/
Physics at the LHC (1)
Matter particles: fundamental building blocks
Force particles: bind matter particles
Physics at the LHC (2)
Four known forces: strong force, weak force, electromagnetism, gravitation
Standard Model unifies three of them
Verified to the 0.1 percent level
Too many free parameters, e.g. particle masses
Higgs particle: the Higgs condensate fills the vacuum
Acts like ‘molasses’: slows other particles down, gives them mass
Physics at the LHC (3)
Open questions in particle physics:
Why do the parameters have the sizes we observe?
What gives the particles their masses?
How can gravity be integrated into a unified theory?
Why is there only matter and no anti-matter in the universe?
Are there more space-time dimensions than the four we know of?
What are dark energy and dark matter, which make up 98% of the universe?
Finding the Higgs and possible new physics with the LHC will give the answers!
The Large Hadron Collider (1)
Accelerator colliding protons against protons at 14 TeV collision energy
By far the world’s most powerful accelerator
Tunnel of 27 km circumference, 4 m diameter, 50…150 m below ground
Detectors at four collision points
The Large Hadron Collider (2)
Approved in 1994, first circulating beams on 10 September 2008
Protons are bent by superconducting magnets (8 tesla, operating at 2 K = –271 °C) all around the tunnel
Each beam: 3000 bunches of 100 billion protons each
Up to 40 million bunch collisions per second at the centre of each of the four detectors
The Large Hadron Collider (3)
Incident on 19 September 2008:
During an attempt to ramp the beam energy up to 7 TeV, a leak occurred in the cold mass, causing significant loss of helium
Repair work is ongoing
Instrumentation for detecting this kind of problem is being added
Schedule: beam around mid-October 2009, collisions around mid-November 2009, running until autumn 2010
Collision energy: 5 + 5 TeV
Short technical stop at Christmas 2009
LHC Detectors (1)
ATLAS
CMS
LHCb
LHC Detectors (2)
2’200 physicists (including 450 students) from 167 institutes in 37 countries
LHC Data (1)
The accelerator generates 40 million bunch collisions (“events”) every second at the centre of each of the four experiments’ detectors
LHC Data (2)
Reduced by online computers that filter out a few hundred “good” events per second …
… which are recorded on disk and magnetic tape at 100…1’000 Megabytes/sec
15 Petabytes per year for four experiments
15’000 Terabytes = 3 million DVDs
1 event = few Megabytes
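The figures above can be sanity-checked with a quick back-of-the-envelope sketch; the exact event rate, event size and beam time below are illustrative picks from the ranges quoted on this slide, not official numbers:

```python
# Back-of-the-envelope check of the slide's data-rate figures.
events_per_sec = 200      # "a few hundred" good events per second (assumed)
event_size_mb = 1.5       # "a few megabytes" per event (assumed)
seconds_per_year = 1e7    # a typical accelerator year of beam time (assumed)

rate_mb_s = events_per_sec * event_size_mb       # recording rate in MB/s
volume_pb = rate_mb_s * seconds_per_year / 1e9   # annual volume in petabytes
print(rate_mb_s, volume_pb)  # 300.0 3.0
```

With these numbers one experiment records ~3 PB per year, so four experiments land in the region of the 15 Petabytes per year quoted above.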
LHC Data (3)
Summary of Computing Resource Requirements – All experiments, 2008
(From LCG TDR – June 2005)

                        CERN   All Tier-1s   All Tier-2s   Total
CPU (MSPECint2000s)       25            56            61     142
Disk (Petabytes)           7            31            19      57
Tape (Petabytes)          18            35             –      53

Shares of the totals: CPU – CERN 18%, all Tier-1s 39%, all Tier-2s 43%; Disk – CERN 12%, all Tier-1s 55%, all Tier-2s 33%; Tape – CERN 34%, all Tier-1s 66%.

30’000 CPU servers, 110’000 disks: far too much for CERN!
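The percentage shares quoted for CERN follow directly from the resource requirements above; a small sketch to cross-check (rounded to whole percent, as in the original pie charts):

```python
# Cross-check of CERN's share of the 2008 resource requirements.
totals = {"CPU": 142, "Disk": 57, "Tape": 53}   # all experiments, 2008
cern = {"CPU": 25, "Disk": 7, "Tape": 18}       # CERN (Tier 0) portion

shares = {k: round(100 * cern[k] / totals[k]) for k in totals}
print(shares)  # {'CPU': 18, 'Disk': 12, 'Tape': 34}
```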
Worldwide LHC Computing Grid (1)
Tier 0: CERN – data acquisition and initial processing, data distribution, long-term curation
Tier 1: 11 major centres – managed mass storage, data-heavy analysis, dedicated 10 Gbps lines to CERN
Tier 2: more than 200 centres in more than 30 countries – simulation, end-user analysis
Tier 3: physicists’ desktops
(Diagram: the tier hierarchy – CERN Tier 0; Tier 1 centres in Germany, USA, UK, France, Italy, Spain, Taiwan, the Nordic countries and the Netherlands; Tier 2 labs and universities forming grids for regional groups and physics study groups; Tier 3 physics-department desktops.)
Worldwide LHC Computing Grid (2)
Grid middleware for “seamless” integration of services
Aim: looks like a single huge compute facility
Projects: EDG/EGEE, OSG
Big step from proof of concept to stable, large-scale production
Centres are autonomous, but with lots of commonalities:
Commodity hardware (e.g. x86 processors)
Linux (Red Hat Enterprise Linux variant)
CERN Computer Centre
Functions:
WLCG: Tier 0, some T1/T2
Support for smaller experiments at CERN
Infrastructure for the laboratory
…
Requirements and Boundaries (1)
High Energy Physics applications require mostly integer processor performance
Large amounts of processing power and storage needed for aggregate performance
No need for parallelism / low-latency high-speed interconnects
Can use large numbers of components with performance below the optimum level (“coarse-grained parallelism”)
Infrastructure (building, electricity, cooling) is a concern
Refurbished two machine rooms (1500 + 1200 m²) for a total air-cooled power consumption of 2.5 MW
Will run out of power in about 2011…
Requirements and Boundaries (2)
Major boundary condition: cost
Getting maximum resources with a fixed budget…
… then dealing with cuts to the “fixed” budget
Only choice: commodity equipment as far as possible, minimising TCO / performance
This is not always the solution with the cheapest investment cost!
Purchased in 2004, now retired
The Bulk Resources – Event Data
(Diagram, simplified network topology: tape servers, disk servers and CPU servers attached through routers to an Ethernet backbone of multiple 10GigE links, with 10GigE uplinks.)
Permanent storage on tape
Disk as a temporary buffer
Data paths: tape ↔ disk, disk ↔ CPU
CERN CC currently (July 2009)
5’700 systems, 34’600 processing cores (CPU servers, disk servers, infrastructure servers)
13’900 TB usable on 41’500 disk drives
34’000 TB on 45’000 tape cartridges (56’000 slots), 160 tape drives
Tenders in progress or planned (estimates):
3’000 systems, 20’000 processing cores
19’000 TB usable on 21’000 disk drives
CPU Servers (1)
Simple, stripped-down, “HPC-like” boxes
No fast low-latency interconnects
EM64T or AMD64 processors (usually 2), 2 GB/core, 1 disk/processor
Open to multiple systems per enclosure
Adjudication based on total performance (SPECint2000, moving to SPECcpu2006 – all_cpp subset)
Power consumption taken into account
CPU Servers (2)
The Power Challenge (1)
Infrastructure limitations, e.g. CERN: 2.5 MW for IT equipment
Clearly insufficient – need to fit maximum capacity into the given power envelope
Additional creative measures required (water-cooled racks in an air-cooled room)
Electricity costs money
Electricity costs are likely to rise (steeply) over the next few years
Saving 10 W is saving 88 kWh per year
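The 88 kWh figure is simple arithmetic, since a server runs around the clock:

```python
# "Saving 10 W is saving 88 kWh per year" – verified for 24/7 operation.
watts_saved = 10
hours_per_year = 365.25 * 24             # ≈ 8766 hours
kwh_per_year = watts_saved * hours_per_year / 1000
print(round(kwh_per_year, 1))            # 87.7, i.e. ~88 kWh per year
```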
The Power Challenge (2)
IT is responsible for a significant fraction of world energy consumption
Server farms: 180…280 billion kWh per year (20…31 million kW)
CERN’s data centre is 0.1 per mille of this…
1…2% of the world’s energy consumption, annual growth rate: 16…23%
Responsibility towards mankind demands using the energy as efficiently as possible
Saving a few percent of energy consumption makes a big difference
Server Energy Consumption
Power supply
Fans
Processors
Chipset
Memory modules
Disk drives
VRMs (Voltage Regulator Modules)
RAID controllers
…
What should we start looking at?
CERN’s Approach
Measure apparent (VA) power consumption in the primary AC circuit
CPU servers: weighted 80% full load, 20% idle
Infrastructure servers: 50% full load, 50% idle
Add an element reflecting power consumption to the purchase price
Currently about 6.50 EUR per VA
Adjudicate on the sum of the purchase price and the power adjudication element
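A minimal sketch of this adjudication rule: the 80/20 load weighting and the ~6.50 EUR/VA element are from the slide, while the function names and the two example offers are hypothetical.

```python
# Sketch of CERN's power-aware adjudication as described above.

def weighted_va(full_load_va, idle_va, full_frac=0.80):
    """Apparent power averaged over the assumed duty cycle (80/20 for CPU servers)."""
    return full_frac * full_load_va + (1.0 - full_frac) * idle_va

def adjudicated_price(purchase_eur, full_load_va, idle_va,
                      full_frac=0.80, eur_per_va=6.50):
    """Purchase price plus the power-consumption element (~6.50 EUR per VA)."""
    return purchase_eur + eur_per_va * weighted_va(full_load_va, idle_va, full_frac)

# Two hypothetical CPU-server offers:
offer_a = adjudicated_price(2000.0, 350.0, 250.0)  # cheaper, power-hungry
offer_b = adjudicated_price(2200.0, 280.0, 200.0)  # pricier, more efficient
print(round(offer_a), round(offer_b))  # offer B wins despite the higher sticker price
```

The point of the scheme: an offer that is cheaper to buy but draws more power can lose the tender once its lifetime electricity cost is folded into the comparison.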
Power Efficiency: Lessons Learned (1)
Power efficiency increased by a factor of 12 in just a little over 4 years
Power efficiency = performance / power consumption
Quantum steps:
Microarchitecture: Netburst to Core to Nehalem
Multi-core: 1 to 2 to 4 cores per CPU
For Core: 5000P (Blackford) to 5100 (San Clemente) chipset
Much improved PSU efficiencies
CERN retires servers aggressively after the end of the warranty period
Power Efficiency: Lessons Learned (2)
Need to benchmark concrete servers; generic statements on a platform are void
Beginning of 2007: different servers with the same CPU, same chipset and same memory configuration resulted in proposals with a 50% spread in power efficiency
Fostering energy-efficient solutions makes a difference
Summer 2008: different techniques in response to the same call for tender differed by 60% in power efficiency
Power Efficiency: Lessons Learned (3)
Solutions with power supplies feeding more than one system are usually more power-efficient
There are more options than just blades…
Redundant power supplies are inefficient
Summer 2008: a 1U server running on one PSU module used 8.5% less power than running on two modules
Difference even larger in idle mode
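For scale, a rough sketch of what that 8.5% is worth per server per year; the 300 W baseline draw is an assumption for illustration, not a figure from the talk.

```python
# Rough annual saving from running a 1U server on a single PSU module.
server_watts = 300.0        # assumed dual-module draw of a loaded 1U server
saving_fraction = 0.085     # 8.5% less power on one module (from the slide)
hours_per_year = 365.25 * 24

kwh_saved = server_watts * saving_fraction * hours_per_year / 1000
print(round(kwh_saved))     # ~224 kWh per server per year
```

Multiplied across thousands of servers, this is why PSU configuration is weighed in the tenders.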
Future (1)
Is IT growth sustainable?
Demands continue to rise exponentially
Even if Moore’s law continues to apply, data centres will need to grow in number and size
IT is already consuming 2% of the world’s energy – where do we go?
How to handle growing demands within a given data centre?
Demands evolve very rapidly, technologies less so, infrastructure at an even slower pace – how best to match these three?
Future (2)
IT: an ecosystem of hardware, OS software and tools, and applications
Evolving at different paces: hardware fastest, applications slowest
How to make sure at any given time that they match reasonably well?
Future (3)
Example: single-core to multi-core to many-core
Most HEP applications currently single-threaded
Consider a server with two quad-core CPUs as eight independent execution units
Model does not scale much further
Need to adapt applications to many-core machines
Large, long effort
Summary
The Large Hadron Collider (LHC) and its experiments form a very data- (and compute-)intensive project
The LHC has triggered or pushed new technologies, e.g. Grid middleware, WANs
High-end or bleeding-edge technology is not necessary everywhere
That’s why we can benefit from the cost advantages of commodity hardware
Scaling computing to the requirements of the LHC is hard work
IT power consumption/efficiency is a primordial concern
We had the first circulating beams on 10-Sep-2008, and have the capacity in place for the initial needs
We are on track for further ramp-ups of computing capacity for future requirements
Thank you