LHCC Referees Meeting Richard P Mount March 5, 2014
ATLAS Computing Status and Plans
Richard P Mount
SLAC National Accelerator Laboratory
Topics
• Organization from March 1
• Status – the last 12 months
• Plans – Preparation for Run 2
• Vision of the Future
Software and Computing Organization – From March 1
Software and Computing Coordination: Richard Mount, Eric Lancon (deputy)
Distributed Computing (ADC): Simone Campana and Torre Wenaus
Software: Rolf Seuster and Markus Elsing
Major software effort underway preparing for Run 2:
• Implementation Task Forces
• Reorganization and rationalization of existing activities
CPU Usage March 2013 to January 2014
Tier 1s:
• Consistent above-pledge performance
• Saturation most of the time
Tier 2s:
• Consistent delivery of above-pledge and opportunistic resources
• Saturation most of the time
[Plot legend: MC Simulation, User Analysis, MC Reco, Group Prod, Group Analy]
High Level Trigger Farm Exploitation
[Plots: CERN T0 and CAF usage for grid jobs; ATLAS HLT usage for grid jobs (bursts of over 15k jobs)]
• The HLT has about 10% of the total ATLAS CPU capacity
• Its time-averaged availability for simulation is expected to be no more than 30%
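Taken together, the two figures above bound what the HLT farm can add on average. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the only inputs are the 10% and 30% quoted on the slide.

```python
def hlt_effective_share(capacity_share: float, availability: float) -> float:
    """Time-averaged share of total CPU delivered by a farm that is only sometimes available."""
    if not (0.0 <= capacity_share <= 1.0 and 0.0 <= availability <= 1.0):
        raise ValueError("both inputs must be fractions between 0 and 1")
    return capacity_share * availability


# Figures from the slide: about 10% of capacity, at most 30% availability.
upper_bound = hlt_effective_share(0.10, 0.30)
print(f"HLT adds at most about {upper_bound:.0%} of total ATLAS CPU, time-averaged")
```

The product is an upper bound, because the slide gives 30% as a ceiling on availability rather than an expected value.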
Pending Jobs and Volume of Data Processed
[Plot legend: MC Simulation, User Analysis, MC Reco, Group Prod, Group Analy]
Total: > 1 Exabyte
• Simulation must be limited to keep analysis turnaround acceptable
• There are always many pending requests; priority is set via physics coordination
• Analysis is the main driver of storage and network I/O capacity
Disk Space
[Plots: Tier 1 and Tier 2 disk usage by category (MC Production, Real Data, Group Analysis, User Analysis), split into Primary (pinned), Default (pinned), Secondary (dynamically managed) and Input]
T1 and T2 disks are full, requiring regular deletion of less-recently-accessed data
T1 dynamically managed space is currently too small (need to pin less data)
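The deletion policy described above can be sketched as a least-recently-accessed selection that skips pinned data. This is an illustration only, not the ATLAS data management code: the `Replica` fields, dataset names and sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size_tb: float
    last_access_day: int  # larger means more recently accessed
    pinned: bool          # pinned replicas are never deleted


def select_for_deletion(replicas, space_needed_tb):
    """Pick unpinned replicas, least recently accessed first, until enough space is freed."""
    candidates = sorted(
        (r for r in replicas if not r.pinned),
        key=lambda r: r.last_access_day,
    )
    chosen, freed = [], 0.0
    for r in candidates:
        if freed >= space_needed_tb:
            break
        chosen.append(r)
        freed += r.size_tb
    return chosen, freed


# Hypothetical site contents.
replicas = [
    Replica("data.A", 40.0, last_access_day=300, pinned=True),
    Replica("mc.B", 25.0, last_access_day=120, pinned=False),
    Replica("group.C", 10.0, last_access_day=10, pinned=False),
    Replica("user.D", 5.0, last_access_day=200, pinned=False),
]
chosen, freed = select_for_deletion(replicas, space_needed_tb=30.0)
print([r.name for r in chosen], freed)
```

In this toy setup half of the disk is pinned, so only 40 TB can ever be reclaimed. That is the sense in which pinning less data enlarges the dynamically managed space.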
Tier 1 Tape
On track to saturate the 41 PB pledge
Simulated Hits to be kept for ~1 year in future
ESD no longer written in most cases
Expect major growth of Group Data on tape.
[Plot legend: Raw Data, Simulated Hits, AOD, ESD, NTUP]
Preparation for Run 2
Guiding Principle:
Maximize physics capability while requiring resources that grow more slowly than LHC luminosity
Key disk and CPU efficiency improvements:
• Improve reconstruction efficiency (target factor 2 to 3 in speed)
• Improve full simulation efficiency
• Implement the Integrated Simulation Framework supporting an optimal mix of full and fast simulation
• Rationalize analysis workflow (less CPU/luminosity and less Disk/luminosity [1] for the same physics)
[1] Smaller data formats, fewer versions of the largest datasets
Major improvements to analysis environment
Task forces
• TF1, TF4: Define and implement new Event Data Model (xAOD); migrate to new vector/matrix library Eigen
• TF2: Define and implement Reduction framework and train model
• TF3: New Analysis framework and tools; generic tools for “Combined Performance” recommendations.
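The slide does not spell out the "train model" in TF2. One common reading, assumed here, is that several reduction selections ("carriages") share a single pass over the input, so each event is read once however many reduced outputs are produced. The sketch below shows that idea; the event fields, selections and function names are hypothetical.

```python
def run_train(events, carriages):
    """Read each event once and offer it to every carriage (one selection per output)."""
    outputs = {name: [] for name in carriages}
    reads = 0
    for event in events:  # single pass over the input
        reads += 1
        for name, keep in carriages.items():
            if keep(event):
                outputs[name].append(event)
    return outputs, reads


# Hypothetical events and selections.
events = [
    {"id": 1, "n_muons": 2, "n_jets": 1},
    {"id": 2, "n_muons": 0, "n_jets": 4},
    {"id": 3, "n_muons": 1, "n_jets": 5},
]
carriages = {
    "dimuon": lambda e: e["n_muons"] >= 2,
    "multijet": lambda e: e["n_jets"] >= 4,
}
outputs, reads = run_train(events, carriages)
print({name: [e["id"] for e in kept] for name, kept in outputs.items()}, reads)
```

Running each selection as a separate job would read the input once per output. In the train, the number of reads stays equal to the number of events.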
Preparation for Run 2 (continued)
Other key improvements:
• Rucio – new scalable distributed data management system
• ProdSys-2 (DEFT and JEDI)
  • More formalized and automated production management
  • Jobs automatically defined to meet the needs of computing resources
Ongoing developments
• Federated ATLAS Xrootd (and http) access
  • Potential optimization of disk use – test in DC14
• Radical reduction of pinned disk data
  • Test during 2014
  • Stay clear of thrashing the tape system
• “Event Server” technology to facilitate exploitation of opportunistic resources with unpredictable availability.
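One way to see why event-level dispatch suits resources that can vanish without warning is the toy model below. It assumes work is handed out in small event ranges and that a worker reclaimed mid-range loses only that range. The slide does not describe the actual design, so the function, its parameters and the numbers are all hypothetical.

```python
from collections import deque


def run_event_service(n_events, chunk, worker_lifetimes):
    """Hand out small event ranges; a worker that vanishes wastes less than one range of work."""
    queue = deque((s, min(s + chunk, n_events)) for s in range(0, n_events, chunk))
    done = 0
    wasted = 0
    for lifetime in worker_lifetimes:  # events a worker can process before it is reclaimed
        budget = lifetime
        while queue:
            start, stop = queue.popleft()
            size = stop - start
            if size <= budget:
                budget -= size
                done += size
            else:
                # Worker reclaimed mid-range: requeue the whole range, partial work is wasted.
                queue.appendleft((start, stop))
                wasted += budget
                break
    return done, wasted, len(queue)


result = run_event_service(n_events=100, chunk=10, worker_lifetimes=[35, 42, 50])
print(result)
```

In this model the work wasted per reclaimed worker is always smaller than `chunk`, so smaller ranges cut the loss from preemption at the price of more dispatch overhead.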
ATLAS Computing Longer Term Vision
Computing:
• Major shifts in relative costs of CPU/Disk/Tape/Networks will continue
• Need to be flexible in “store versus recompute” and “store locally versus get quickly or access directly from somewhere else”
Software:
• Multi-threading (100s of threads) seems inevitable
• Quality, intelligibility and supportability of software will be vital
• Software for the Upgrade(s)
And Finally:
• Beware of optimizing away our ability to discover the unexpected