analysis software benchmark
ROOT analysis and implications for the analysis model in ATLAS
Akira Shibata, New York University @ ACAT 08 in Erice
Nov 05, 2008
[email protected] - November 5, 2008
Are we ready to face data from LHC collisions? Grid computing? Do we have enough CPU? Tape? Disks? RAM? Do we need T1? T2? T3? AF? Do we need backdoor access? Are the machines maintained? Is it scary? Are they online? Do we have enough bandwidth? Can we copy data across the world? Can we reach the data we need? Can we reduce the data size? ESD? AOD? D1PD? D2PD? D3PD? Can we download them? Do we need interactive access? How do we write an analysis? How fast do they run? Do we need to buy more disk? How big is my ntuple? Do we need to buy more CPU? Disks? RAM? Are we up to date? Do I look cool if I buy a Mac? Is a virtual machine useful? Why do we use ROOT? What is PROOF? Is Python fast enough? Is it easy to code? How often will I need to process my data? How fast will my analysis run? What can I do to get faster? What are the options? What is the future technology?
Analysis in the Era of Grid Computing
A tiered computing model. A leveled approach is needed to optimize the system. Above all, how well does it work from the physicists' point of view?
[Figure: data flow through the tiered computing model. Tiers: T1, T2, T3, Desktop. Data formats: ESD, AOD, D1PD, D2PD, D3PD, user ntuple, histograms. Stages: central AOD/DPD making, Grid analysis/DPD making, ROOT/ARA analysis at the institute, local ROOT analysis. Legend: ROOT native, ROOT + POOL. Rough size estimates shown: ~500 kB/evt, ~100 kB/evt, 30-80 kB/evt, 10-50 kB/evt, 1-10 kB/evt, ~1 kB/evt.]
Derived Physics Data
• DPDs are created using the following operations:
  • Skimming: selecting the events one needs
  • Thinning: selecting the objects one needs
  • Slimming: removing information from objects
• ESDs hold the full information from reconstruction. AODs and DnPDs are derived from them with an increasing level of derivation.
• The primary purpose of D1PD is to give access to parts of the ESD information that are otherwise difficult to get to.
• D1/2PD are in POOL format. D3PD refers to any DPD in ntuple format.
• ESD, AOD and D1PD contents are defined by groups. Several types of D1PD are defined by performance groups. D2PD and D3PD are defined by users.
• First-level analysis (variable calculation, object reconstruction, etc.) may be done when D2/3PD are created.
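The three DPD operations can be illustrated with a toy sketch in plain Python. This is not ATLAS code: the event layout (a dict holding a list of electron dicts) and every field name and cut value are assumptions made up for the illustration.

```python
# Toy event model: each event is a dict with a list of electron dicts.
# All field names and thresholds are hypothetical, not ATLAS EDM names.

def skim(events, keep_event):
    """Skimming: keep only the events that pass a selection."""
    return [evt for evt in events if keep_event(evt)]

def thin(events, container, keep_object):
    """Thinning: keep only the selected objects inside one container."""
    return [
        {**evt, container: [obj for obj in evt[container] if keep_object(obj)]}
        for evt in events
    ]

def slim(events, container, keep_fields):
    """Slimming: drop unwanted attributes from every object in a container."""
    return [
        {**evt, container: [{k: obj[k] for k in keep_fields} for obj in evt[container]]}
        for evt in events
    ]

events = [
    {"run": 1, "electrons": [{"pt": 30.0, "eta": 0.1, "isEM": 0},
                             {"pt": 8.0, "eta": 1.2, "isEM": 0}]},
    {"run": 1, "electrons": []},
    {"run": 1, "electrons": [{"pt": 45.0, "eta": -0.7, "isEM": 0}]},
]

dpd = skim(events, lambda evt: len(evt["electrons"]) > 0)   # drop empty events
dpd = thin(dpd, "electrons", lambda el: el["pt"] > 20.0)    # drop soft electrons
dpd = slim(dpd, "electrons", ["pt", "eta"])                 # drop the isEM field
```

Each step returns new lists and dicts, so the input events are left untouched, which mirrors the idea that a DPD is a derived copy rather than a modification of its parent format.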
Motivation for Profiling ROOT Analysis
• The primary use of the Grid is event reconstruction, storage and production of reduced data. This is done using the ATLAS software, Athena. Some analysis happens here too.
• However, post-Grid (non-Athena) ROOT analysis is the main stage for physics analysis.
• The choice of analysis mode is mostly a user-level decision due to the private nature of physics analysis, but:
  • the situation is becoming more complex due to the availability of new technology;
  • no good summary exists comparing the available options;
  • it is an important ingredient for an efficient analysis model;
  • it is needed for estimating resource requirements.
• Technical discussions do not always answer practical questions. This study benchmarks analysis “modes” in realistic settings based on wall-time measurements.
“Flat” vs POOL Persistency
• Much of the complexity in the current situation is due to the POOL technology (an additional layer on top of the ROOT persistency technology) used in ATLAS. POOL supports:
  • Metadata lookup, used by TAG to access events in a large file without having to read the full contents.
  • More flexibility in writing out complex objects. It has its own way of handling T/P (transient/persistent) separation and schema evolution.
• When the decision was made, ROOT persistency was not as good as it is now:
  • Problems writing out STL objects.
  • Problems referring to objects in different trees/files.
• ROOT persistency has improved and now has fewer issues.
• ARA enables reading POOL objects from ROOT by calling POOL converters on demand (P->T conversion). This takes extra read time.
Summary of Existing Analysis Modes
Mode | Draw | CINT | ACLiC | PyRoot | g++ | Athena
Ntuple | ◎◎ | ◎◎ | ◎◎ | ◎◎ | ◎◎ | ◎
POOL | ◎ | ◎ | ╳ | ◎◎ | ◎◎ | ◎◎◎
Compiled/Interpreted | Interpreted | Interpreted | Compiled | Interpreted | Compiled | Both
Language | C++, Python | (C++)-- | C++ | Python | C++ | C++, Python
Interactive | ◎◎ | ◎◎ | ╳ | ◎◎ | ╳ | ◎
Additional packages | - | MakeClass | MakeSelector | SPyroot | SFrame, AMA | -
Standard dev env | - | - | ╳ | - | ◎◎ | ◎◎
Athena components | ╳ | ╳ | ╳ | ◎ | ◎ | ◎◎◎
The most common options were implemented. All code is available in ATLAS CVS: users/ashibata/RootBenchmark
Benchmark Analysis Contents
• A simple Zee reconstruction analysis implemented for every mode:
  1. Access the electron container (POOL) / electron kinematics branches (Ntuple)
  2. Select electrons using isEM, pT and charge
  3. Fill histograms with electron kinematics (pT and multiplicity)
  4. Combine electrons to reconstruct the Z
  5. Fill a histogram with the Z mass
  6. Write the histograms out in finalize
• The above was repeated 10 times.
• Not complex enough for a real analysis, but not entirely trivial.
• For Draw, plot electrons after the isEM/pT/charge selection. No four-vector arithmetic.
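The selection and Z reconstruction steps listed above can be sketched in plain Python, without ROOT or Athena. This is only an illustration of the logic, not the benchmark code itself: the dictionary layout, the 20 GeV pT cut, the convention that isEM == 0 means "all cuts passed", and the use of Counter objects in place of histograms are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations

ELECTRON_MASS = 0.000511  # GeV

def four_vector(pt, eta, phi, mass=ELECTRON_MASS):
    """Return (E, px, py, pz) in GeV from pT, eta and phi."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return (energy, px, py, pz)

def invariant_mass(v1, v2):
    """Invariant mass of the sum of two four-vectors."""
    energy, px, py, pz = (a + b for a, b in zip(v1, v2))
    return math.sqrt(max(energy**2 - px**2 - py**2 - pz**2, 0.0))

def select_electrons(electrons, pt_min=20.0):
    """Step 2: keep electrons passing the (assumed) isEM and pT cuts."""
    return [el for el in electrons if el["isEM"] == 0 and el["pt"] > pt_min]

def z_masses(electrons):
    """Step 4: combine opposite-charge electron pairs into Z candidates."""
    masses = []
    for first, second in combinations(electrons, 2):
        if first["charge"] * second["charge"] < 0:
            masses.append(invariant_mass(
                four_vector(first["pt"], first["eta"], first["phi"]),
                four_vector(second["pt"], second["eta"], second["phi"]),
            ))
    return masses

def analyze(events, bin_width=1.0):
    """Steps 3 and 5: fill multiplicity and Z mass counters (stand-ins for histograms)."""
    multiplicity = Counter()
    z_mass_hist = Counter()
    for electrons in events:
        selected = select_electrons(electrons)
        multiplicity[len(selected)] += 1
        for mass in z_masses(selected):
            z_mass_hist[int(mass // bin_width)] += 1
    return multiplicity, z_mass_hist

# One made-up event: two back-to-back electrons plus one soft electron.
events = [[
    {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": -1, "isEM": 0},
    {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": +1, "isEM": 0},
    {"pt": 5.0, "eta": 1.0, "phi": 1.0, "charge": +1, "isEM": 0},
]]
multiplicity, z_mass_hist = analyze(events)
```

In the made-up event the soft electron fails the pT cut, and the remaining back-to-back pair gives a mass of twice the pT, close to the Z mass.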
Obtaining Reliable Results
• Using POSIX measurements as much as possible. Wall time is taken from the time module.
• Avoiding the somewhat unstable measurement with TStopwatch.
• Measurements are affected by other activities on the machine. This is overcome by taking multiple measurements.
• Machine: Acas (BNL) node with normal load, 3.34 GB memory, 2 Xeon cores @ 2.00 GHz, data on NFS.
• Disk cache leads to misleading results: CPU time = wall time once the data is in memory.
  • Force disk reads by flushing RAM. Do not re-read a file until all other files have been read. Alternate between AOD and ntuple analyses.
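A minimal sketch of repeated wall-time measurement with Python's time module is given below. The function name measure_wall_time and the number of repeats are assumptions for illustration, and the sketch makes no attempt to flush the disk cache between runs, which the talk handles separately.

```python
import statistics
import time

def measure_wall_time(job, n_repeats=5):
    """Run job() n_repeats times and return (mean, stdev) of the wall time in seconds."""
    samples = []
    for _ in range(n_repeats):
        start = time.time()          # wall-clock time, not CPU time
        job()
        samples.append(time.time() - start)
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread

# Usage with a stand-in job; a real job would run one analysis mode on one input file.
mean_s, spread_s = measure_wall_time(lambda: time.sleep(0.02), n_repeats=3)
```

On current Python versions time.perf_counter() is usually the better choice for interval timing; time.time() is used here only because it is the plain wall clock.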
Methodology
[Figure: wall time (s) versus number of events (0 to 50,000) for AOD input, with a straight-line fit for each mode.]
Fit results shown in the legend (mode: init, rate):
gpp: 66.4 s, 535 Hz
SFrame: 36.2 s, 315 Hz
Draw: 46.2 s, 125 Hz
PyAthena: 27.4 s, 96.5 Hz
Athena: 30.8 s, 68.6 Hz
CINT: 52.5 s, 18.5 Hz
PyRoot: 2.50 s, 12.4 Hz
1. Measure the time taken to process an increasing number of events.
2. Repeat the measurements and take the average for each point.
3. Fit a straight line to obtain the overhead (offset) and the rate (evt/sec).
4. Calculate errors from the standard deviation.
Only the rate is used in comparing the modes. Overhead varies from a fraction of a second to tens of seconds.
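The straight-line fit can be sketched as follows, modelling wall time as t = overhead + N / rate. This is an unweighted least-squares fit written out by hand; whether the original study weighted the points by their errors is not stated, and the data points below are synthetic, generated from the gpp legend values (66.4 s, 535 Hz) purely to exercise the code.

```python
def fit_line(xs, ys):
    """Unweighted least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def overhead_and_rate(n_events, wall_times):
    """Return (overhead in s, rate in Hz) from wall time versus number of events."""
    slope, intercept = fit_line(n_events, wall_times)
    return intercept, 1.0 / slope   # the slope is seconds per event

# Synthetic points, NOT measured data: t = 66.4 s + N / 535 Hz.
n_events = [10000, 20000, 30000, 40000, 50000]
wall_times = [66.4 + n / 535.0 for n in n_events]
overhead_s, rate_hz = overhead_and_rate(n_events, wall_times)
```

With real measurements each wall_times entry would be the average over repeated runs at that number of events.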
Data and Format
Contents | POOL | Ntuple
Full contents | AOD, 144.22 kB/evt | CBNT? (not tried)
DPD contents (Trigger/Jets/Leptons etc.) | TopD1PD, 31.42 kB/evt | TopD3PD, 4.87 kB/evt
Small DPD contents (Tracks + Electrons) | SmallD2PD, 18.74 kB/evt | SmallD3PD, 0.71 kB/evt
Very small DPD (Electrons) | VerySmallD2PD, 1.06 kB/evt | VerySmallD3PD, 0.37 kB/evt
All derived from FDR2 AODs. All produced on PANDA (except AOD and D1PD). Around 10,000 events per file. The total sample size for one data type ranges from 1 GB to 100 GB. This is a use-case driven comparison: input file sizes are different.
AOD Analysis Results
[Figure: bar chart of execution rate (Hz) per mode for AOD input.]
Mode (rate, error):
PyRoot (17 Hz, 18%)
TSelector (19 Hz, 2%)
CINT (21 Hz, 15%)
PyAthena (95 Hz, 11%)
Athena (98 Hz, 8%)
Draw (138 Hz, 35%)
SFrame (321 Hz, 13%)
gpp (535 Hz, 3%)
Seems to be reading all containers in the files
Only small difference between C++/Python in Athena.
Compiled non-framework analysis is the fastest.
CINT by far the slowest.
D1PD Level Comparison
[Figure: bar charts of execution rate (Hz) per mode for Top D1PD input and Top D3PD input.]
Mode | Top D1PD input (rate, error) | Top D3PD input (rate, error) | Ntuple/POOL
TSelector | 22 Hz, 2% | 39 Hz, 3% | 1.8
CINT | 26 Hz, 6% | 32 Hz, 2% | 1.2
PyRoot | 43 Hz, 9% | 300 Hz, 21% | 7.1
PyAthena | 204 Hz, 4% | 242 Hz, 30% | 1.2
Draw | 298 Hz, 55% | 2343 Hz, 15% | 7.9
Athena | 313 Hz, 6% | 838 Hz, 1% | 2.7
SFrame | 721 Hz, 17% | 9453 Hz, 19% | 13.1
gpp | 1130 Hz, 15% | 45869 Hz, 21% | 40.6
TSelector_ACLiC | - | 18551 Hz, 18% | -
ACLiC | - | 48494 Hz, 20% | -
ACLiC_Opt | - | 58719 Hz, 16% | -
An order of magnitude advantage for using ntuple for g++ analysis. Much less difference with non-compiled modes.
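The Ntuple/POOL ratios quoted on this slide can be reproduced from the rates listed for Top D1PD and Top D3PD input. The short sketch below does this for the modes that appear in both lists; the dictionary names are made up for the illustration, and the numbers are copied from the slide.

```python
# Rates in Hz, copied from the slide.
pool_rates = {      # Top D1PD input
    "TSelector": 22, "CINT": 26, "PyRoot": 43, "PyAthena": 204,
    "Draw": 298, "Athena": 313, "SFrame": 721, "gpp": 1130,
}
ntuple_rates = {    # Top D3PD input
    "CINT": 32, "TSelector": 39, "PyAthena": 242, "PyRoot": 300,
    "Athena": 838, "Draw": 2343, "SFrame": 9453, "gpp": 45869,
}

# Ratio of ntuple rate to POOL rate for every mode measured on both inputs.
ratios = {mode: ntuple_rates[mode] / pool_rates[mode] for mode in pool_rates}

for mode, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{mode:10s} {ratio:5.1f}")
```

Because the rates printed on the slide are already rounded, a recomputed ratio can differ from the quoted one in the last digit; PyRoot, for example, comes out at 7.0 here against a quoted 7.1.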
D2PD Level Comparison
[Figure: bar charts of execution rate (Hz) per mode for Small D2PD input and Small D3PD input.]
Mode | Small D2PD input (rate, error) | Small D3PD input (rate, error) | Ntuple/POOL
TSelector | 23 Hz, 1% | 40 Hz, 2% | 1.7
CINT | 29 Hz, 4% | 32 Hz, 1% | 1.1
PyRoot | 100 Hz, 10% | 382 Hz, 22% | 3.8
Draw | 300 Hz, 29% | 6358 Hz, 17% | 21.2
PyAthena | 326 Hz, 4% | 367 Hz, 28% | 1.1
Athena | 596 Hz, 5% | 855 Hz, 3% | 1.4
SFrame | 1679 Hz, 29% | 14597 Hz, 26% | 8.7
gpp | 2132 Hz, 6% | 71003 Hz, 7% | 33.3
TSelector_ACLiC | - | 33579 Hz, 23% | -
ACLiC_Opt | - | 58223 Hz, 18% | -
POOL analysis is faster than with AOD input by a factor of 4. Larger difference between Athena and PyAthena with smaller input files. Why?
Very Small Input Comparison
[Figure: bar charts of execution rate (Hz) per mode for Very Small D2PD input and Very Small D3PD input.]
Mode | Very Small D2PD input (rate, error) | Very Small D3PD input (rate, error) | Ntuple/POOL
CINT | 31 Hz, 0% | 32 Hz, 1% | 1.0
Draw | 294 Hz, 47% | 6777 Hz, 16% | 23.0
PyAthena | 307 Hz, 14% | 343 Hz, 28% | 1.1
PyRoot | 416 Hz, 19% | 331 Hz, 25% | 0.8
Athena | 667 Hz, 8% | 854 Hz, 5% | 1.3
SFrame | 2519 Hz, 12% | 13751 Hz, 28% | 5.5
gpp | 2798 Hz, 5% | 48516 Hz, 17% | 17.3
TSelector | - | 40 Hz, 1% | -
TSelector_ACLiC | - | 34201 Hz, 22% | -
ACLiC_Opt | - | 63555 Hz, 9% | -
D2PD rates approach D3PD rates even more closely. A few thousand Hz is possible with ARA. Ntuple mode is still a factor of 5-10 faster in C++ modes.
I/O Dependency Comparison
[Figure: POOL analysis. Event size × execution rate (kB/s, log scale) versus event size (kB, 0 to 160) for Athena, PyRoot, PyAthena, Draw, gpp, CINT, SFrame and TSelector.]
Clear I/O constraint above 20 kB/evt in POOL analysis, coming from file size, NOT read-out size. Ntuples are usually smaller than 20 kB/evt.
[Figure: Ntuple analysis. Event size × execution rate (kB/s, log scale) versus event size (kB, 0 to 5) for ACLiC, gpp, PyAthena, TSelector, ACLiC_Opt, CINT, TSelector_ACLiC, Athena, PyRoot, SFrame and Draw.]
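The quantity on the vertical axis of these plots, event size times execution rate, is an effective data throughput in kB/s. As a rough illustration, the sketch below computes it for the gpp mode from the POOL event sizes and gpp rates quoted on the earlier slides. The variable names are made up, and the transcript does not allow checking whether these values coincide with the plotted points.

```python
# Event sizes in kB/evt for the POOL formats, from the "Data and Format" slide.
event_size_kb = {
    "AOD": 144.22, "TopD1PD": 31.42, "SmallD2PD": 18.74, "VerySmallD2PD": 1.06,
}
# gpp execution rates in Hz quoted for the same inputs on the results slides.
gpp_rate_hz = {
    "AOD": 535, "TopD1PD": 1130, "SmallD2PD": 2132, "VerySmallD2PD": 2798,
}

def throughput_kb_per_s(size_kb, rate_hz):
    """Effective throughput: file data covered per second of wall time."""
    return size_kb * rate_hz

throughput = {
    name: throughput_kb_per_s(event_size_kb[name], gpp_rate_hz[name])
    for name in event_size_kb
}

for name, value in throughput.items():
    print(f"{name:14s} {event_size_kb[name]:7.2f} kB/evt {value:10.0f} kB/s")
```

Note that this uses the on-file event size, not the amount of data the analysis actually reads out, which is the distinction the slide draws.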
Summary
• Very clear performance advantage for the ROOT native ntuple format: an order of magnitude difference. Ballpark figure: thousands of events/sec vs hundreds of Hz. These numbers should be taken as upper limits; real analyses would be more complex.
• Compiled mode is ~two orders of magnitude faster than non-compiled options.
• Use of frameworks, even quite simple ones, can slow things down, though any realistic analysis would require some infrastructure. Choose/write frameworks wisely!
• With Athena, the framework overhead seems large, though typical DPD jobs can be highly CPU intensive.
• The effect of file caching by the system ties the input file size to the execution rate (regardless of the actual read-out). Above 20 kB/evt, the analysis is bound by this effect. This is a very tight slimming/thinning requirement for D1/2PD. It may be possible to improve this with high-performance disks.
Acknowledgement
I have bothered a lot of people with this project, including (in random order): Scott Snyder, Wim Lavrijsen, Sebastien Binet, Emil Obrekov, David Quarrie, Kyle Cranmer, David Adams, Sven Menke, Shuwei Ye, Sergey Panitkin, Stephanie Majeski, Hong Ma, Tadashi Maeno, Attila Krasznahorkay, Jim Cochran, roottalk, Paolo Califiura
Many thanks.