the alice computing f.carminati may 4, 2006 madrid, spain
Post on 05-Jan-2016
216 Views
Preview:
TRANSCRIPT
The ALICE Computing
F.CarminatiMay 4, 2006
Madrid, Spain
24/5/2006 fca @ CIEMAT
level 0 - special hardware8 kHz (160 GB/sec)
level 1 - embedded processors
level 2 - PCs
200 Hz (4 GB/sec)
30 Hz (2.5 GB/sec)
30 Hz
(1.25 GB/sec)
data recording &
offline analysis
Total weight 10,000tOverall diameter 16.00mOverall length 25mMagnetic Field 0.4Tesla
ALICE Collaboration ~ 1/2 ATLAS, CMS, ~ 2x LHCb ~1000 people, 30 countries, ~
80 Institutes
34/5/2006 fca @ CIEMAT
The history
• Developed since 1998 along a coherent line
• Developed in close collaboration with the ROOT team
• No separate physics and computing team– Minimise communication problems– May lead to “double counting” of people
• Used for the TDR’s of all detectors and Computing TDR simulations and reconstructions
44/5/2006 fca @ CIEMAT
The framework
ROOT
AliRoot
STEER
Virtual MC
G3 G4 FLUKA
HIJING
MEVSIM
PYTHIA6
EVGEN
HBTP
HBTAN
ISAJET
AliE
n +
LC
G
EMCAL ZDCITS PHOSTRD TOF RICH
ESD
AliAnalysis
AliReconstruction
PMD
CRT FMD MUON TPCSTART RALICESTRUCT
AliSimulation
JETAN
54/5/2006 fca @ CIEMAT
The code
• 0.5MLOC C++• 0.5MLOC “vintage” FORTRAN code• Nightly builds• Strict coding conventions• Subset of C++ (no templates, STL or exceptions!)
– “Simple” C++, fast compilation and link (see R.Brun’s talk)
– No configuration management tools (only cvs)– aliroot is a single package to install
• Maintained on several systems– DEC-Tru64, Mac OSX, Linux RH/SLC/Fedora
(i32:i64:AMD), Sun Solaris
• 30% developed at CERN and 70% outside
64/5/2006 fca @ CIEMAT
The tools
• Coding convention checker• Reverse engineering• Smell detection• Branch instrumentation• Genetic testing (in preparation)• Aspect Oriented Programming (in
preparation)
74/5/2006 fca @ CIEMAT
The Simulation
User Code
VMC
Geometrical Modeller
G3 G3 transport
G4 transportG4
FLUKA transportFLUKA
Reconstruction
Visualisation
Generators
See A.Morsch’s talk
84/5/2006 fca @ CIEMAT
QuickTime™ and a decompressor
are needed to see this picture.
94/5/2006 fca @ CIEMAT
0 10 20 30
microsec/point (1 milion
Gexam1
Gexam3
Gexam4
ATLAS
CMS
BRAHMS
CDF
MINOS_NEAR
BTEV
TESLA
Performance for "Where am I" - physics case (G3 geometries collected in 2002)
ROOT
G3
TGeo modeller
104/5/2006 fca @ CIEMAT
The reconstruction
• Incremental process– Forward propagation
towards to the vertex TPCITS
– Back propagation ITSTPCTRDTOF
– Refit inward TOFTRDTPCITS
• Continuous seeding– Track segment finding
in all detectors
Best track 1 Best track 2
Conflict !
TRD
TPC
ITS
TOF
• Combinatorial tracking in ITS– Weighted two-tracks 2 calculated – Effective probability of cluster
sharing– Probability not to cross given layer
for secondary particles
See P.Hristov’s talk
114/5/2006 fca @ CIEMAT
Calibration
DAQ
Trigger
DCS
ECS
Physics
data
DCDB
AliEn+LCGmetadatafile store
calibration procedures
calibration files
AliRoot
Calibration classes
API
API
API
API
API
filesFrom URs:
Source, volume, granularity, update frequency, access pattern, runtime environment and dependencies
API – Application Program Interface
API
APIHLT
shuttle
124/5/2006 fca @ CIEMAT
Alignment
Simulation
Ideal Geometry
Misalignment
Reconstruction
Raw data
File from survey
Ideal Geometry
Alignment procedure
134/5/2006 fca @ CIEMAT
Tag architecture
ev#guid
Tag1, tag2, tag3…
ev#guid
Tag1, tag2, tag3…
ev#guid
Tag1, tag2, tag3…
ev#guid
Tag1, tag2, tag3…
Reconstruction
Bitmap Index
Index builder
Analysis job
Selection List of ev#guid’s
proof#1
proof#2
proof#3
…
proof#n
guid#{ev1…evn}
guid#{ev1…evn}
guid#{ev1…evn}
guid#{ev1…evn}
144/5/2006 fca @ CIEMAT
Visualisation
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
See M.Tadel’s talk
154/5/2006 fca @ CIEMAT
ALICE Analysis Basic Concepts• Analysis Models
– Prompt reco/analysis at T0 using PROOF infrastructure
– Batch Analysis using GRID infrastructure
– Interactive Analysis using PROOF(+GRID) infrastructure
• User Interface– ALICE User access any GRID
Infrastructure via AliEn or ROOT/PROOF UIs
• AliEn– Native and “GRID on a GRID”
(LCG/EGEE, ARC, OSG)– integrate as much as possible
common components• LFC, FTS, WMS, MonALISA ...
• PROOF/ROOT– single + multitier static and
dynamic PROOF cluster– GRID API class
TGrid(virtual)TAliEn(real)
p
164/5/2006 fca @ CIEMAT
If you thought this was difficult ...
NA49 experiment:
A Pb-Pb event
174/5/2006 fca @ CIEMAT
ALICE Pb-Pb central event
Nch(-0.5<<0.5)=8000
… then what about this!
184/5/2006 fca @ CIEMAT
ALICE Collaboration
UKPORTUGAL
JINR
GERMANY
SWEDENCZECH REP.
HUNGARYNORWAY
SLOVAKIA
POLANDNETHERLANDS
GREECE
DENMARKFINLAND
SWITZERLAND
RUSSIA CERN
FRANCE
MEXICOCROATIA ROMANIA
CHINA
USAARMENIA
UKRAINE
INDIA
ITALYS. KOREA
~ 1000 Members
(63% from CERN MS)
~30 Countries
~80 Institutes
0
200
400
600
800
1000
1200
1990 1992 1994 1996 1998 2000 2002 2004
ALICE Collaboration statistics
LoI
MoU
TP
TRD
194/5/2006 fca @ CIEMAT
CERN computing power
• “High throughput” computing based on reliable commercial components
• More tha 1500 double CPU PC’s– 5000 in 2007
• More than 3 PB of data on disks & tapes– > 15 PB in 2007
Far from enough!
204/5/2006 fca @ CIEMAT
EGEE production service
• >180 sites• >15 000 CPUs • ~14 000 jobs
completed per day
• 20 VOs • >800
registered users that represent thousand of scientists
http://gridportal.hep.ph.ic.ac.uk/rtm/
Situation 20 September 2005
214/5/2006 fca @ CIEMAT
ALICE view on the current situation
EDG
AliEn
Exp specific services
LCGAliEn arch + LCG code
EGEE
Exp specific services (AliEn’ for ALICE)
EGEE, ARC, OSG…
224/5/2006 fca @ CIEMAT
Job 1.1 lfn1
Job 1.2 lfn2
Job 1.3 lfn3, lfn4
Job 2.1 lfn1, lfn3
Job 2.1 lfn2, lfn4
Job 3.1 lfn1, lfn3
Job 3.2 lfn2
Site
ALICE central services
ALICE Grid
Optimizer
ComputingAgent
RB
CE
WN
Execs agent
Submits job UserALICE Job Catalogue
VO-Box
LCG
User Job
ALICE catalogues
Registers output
lfn guid
{se’s}
lfn guid
{se’s}
lfn guid
{se’s}
lfn guid
{se’s}
lfn guid
{se’s}
ALICE File Catalogue
packman
SA
xrootdGUID
LFC
SRM
MSS
File accessWorkloadrequest
SURL
234/5/2006 fca @ CIEMAT
File Catalogue query
CE and SE
processing
User job (many events)
Data set (ESD’s, AOD’s)
Job Optimizer
Sub-job 1 Sub-job 2 Sub-job n
CE and SEprocessin
g
CE and SE
processing
Job Broker
Grouped by SE files location
Submit to CE with closest SE
Output file 1
Output file 2
Output file n
File merging job
Job output
Distributed analysis
processin
g
processin
g
244/5/2006 fca @ CIEMAT
Data Challenge
• Last (!) exercise before data taking
• Test of the system started with simulation
• Up to 3600 jobs running in parallel
• Next will be reconstruction and analysis
254/5/2006 fca @ CIEMAT
T1 T2 T1 T2 T1 T2 T1 T2
TDR requirement (MSI 2K) 4.9 5.8 12.3 14.4 16.0 18.7 20.9 24.3
Missing % -44% -40% -43% -56% -29% -41% -24% -53%
TDR requirement (PB) 3.1 1.5 7.9 3.7 10.2 4.8 13.3 6.2
Missing % -61% -35% -61% -48% -51% -28% -46% -32%
TDR requirement (TB) 2779 - 6947 - 9031 - 11740 -
Missing % -45% - -45% - -15% - -9% -
CPU
Disk
MS
Pledged by external sites versus required MoU
2007 2008 2009 2010
ALICE computing model• For pp similar to the other experiments
– Quasi-online data distribution and first reconstruction at T0– Further reconstructions at T1’s
• For AA different model– Calibration, alignment, pilot reconstructions and partial data export during data taking– Data distribution and first reconstruction at T0 in the four months after AA run (shutdown)– Further reconstructions at T1’s
• T0: First pass reconstruction, storage of RAW, calibration data and first-pass ESD’s• T1: Subsequent reconstructions and scheduled analysis, storage of a collective copy of RAW
and one copy of data to be safely kept, disk replicas of ESD’s and AOD’s• T2: Simulation and end-user analysis, disk replicas of ESD’s and AOD’s
264/5/2006 fca @ CIEMAT
Production Environment
Coord.
• Production environment (simulation, reconstruction & analysis)
• Distributed computing environment
• Database organisation
DetectorProjects
Framework & Infrastructure
Coord.
• Framework development (simulation, reconstruction & analysis)
• Persistency technology
• Computing data challenges
• Industrial joint projects
• Tech. Tracking• Documentation
Simulation Coord.
• Detector Simulation• Physics simulation• Physics validation• GEANT 4 integration• FLUKA integration• Radiation Studies• Geometrical modeler
International Computing
Board
DAQ
Reconstruction & Physics Soft
Coord.
• Tracking• Detector
reconstruction• Global
reconstruction• Analysis tools• Analysis algorithms• Physics data
challenges• Calibration &
alignment algorithms
Management Board
Regional Tiers
Offline BoardChair: Comp Coord
Software Projects
HLTLCG
SC2, PEB, GDB, POB
Core Computing and Software
EU Gridcoord.
US Gridcoord.
Offline Coordination
• Resource planning• Relation with funding agencies• Relations with C-RRB
Offline Coord.(Deputy PL)
274/5/2006 fca @ CIEMAT
Conclusions
• ALICE has followed a single evolution line since eight years
• Most of the initial choices have been validated by our experience
• Some parts of the framework still have to be populated by the sub-detectors
• Wish us good luck!
284/5/2006 fca @ CIEMAT
top related