Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction
DESCRIPTION
Status of my PhD work, as presented at the GPU Technology Conference.
TRANSCRIPT
Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft)
1
GPU Technology Conference 2014
26 March 2014, Andreas Herten (Institute for Nuclear Physics, Forschungszentrum Jülich, Germany)
Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction
Outline
High Energy Physics
PANDA Experiment
Particle Tracking
GPUs at PANDA
Algorithms
  Hough Transform
  Riemann Track Finder
  Triplet Finder
2
HEP: High Energy Physics
3
High Energy Physics
High Energy Physics (HEP) in a nutshell:
4
HEP Recipe
1. Accelerate particles (e, p, ...)
2. Accelerate particles more!
3. Smash into each other
4. Look at resulting particles
5. Understand world
E = mc²
GPUs are interesting for HEP
  Many events due to high collision rate
  Events independent, dividable into subsets
  Many features extractable (computationally intensive)
PANDA
5
PANDA FAIR
Anti Proton Annihilation at Darmstadt
FAIR: Facility for Antiproton and Ion Research
  Accelerator complex at GSI Darmstadt
  Currently under construction
6
PANDA The Experiment
7
[Figure: the PANDA detector, 13 m (43 ft) long; labels: p̄, p, Magnet, STT, MVD]
PANDA Event Reconstruction
Continuous read-out
  Background & signal similar
  Novel feature
8
Event Rate: 2 × 10^7/s
Raw Data Rate: 200 GB/s
Reduce by ~1/1000 (reject background events, save interesting physics events)
Disk Storage Space for Offline Analysis: 2 PB/y
GPUs
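9
PANDA Online Tracking Example
The physics side: antiproton-proton event
p̄p → ... + -
... → e+e-
[Figure: event sketch with final-state particles labeled +, -, e+, e-; the remaining particle symbols are missing from the transcript]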
10
PANDA Online Tracking Example
The detector side: everything in reverse
Particle tracks are curves*
  Find curves connecting hit points!
Sort by track quality
  Hits well matched? How many hits?
Identify final particles
  Curvature, length
Identify intermediate particles
  Mass constraints, geometry
Identify process: p̄p → e+e- + -
* actually: 3D helices
[Figure: reconstructed tracks labeled +, -, e+, e-, with the intermediate particle marked "?"]
11
PANDA Triggering
Usual HEP experiment: Trigger
  Fast detector layer(s) trigger data acquisition
PANDA: Online Tracking!
[Figure: read-out of a usual HEP experiment compared with PANDA; tracks labeled +, -, e+, e-]
GPUS AT PANDA
12
GPUs @ PANDA: Online Tracking
Port tracking algorithms to GPU
  Serial → parallel
  C++ → CUDA
Investigate suitability for online performance
But also: Find & invent tracking algorithms
Under investigation:
  Hough Transformation
  Riemann Track Finder
  Triplet Finder
13
ALGORITHMS #1
14
Hough Transform
Riemann Track Finder
Triplet Finder
Hough Transform
Established method for edge detection in images (from 1970s HEP experiments!)
New challenges for particle tracking algorithm
  Only limited pixels per edge
Easily parallelizable method
15
Original algorithm by Hough, adapted by Duda & Hart
Hough Transform Method
Idea: Transform $(x,y)_i \to (\alpha,r)_{ij}$, find lines via $(\alpha,r)$ space
Solve $r_{ij}$ line equation for
  Many hits $(x,y)_i$
  Many $\alpha_j \in [0^\circ, 360^\circ)$ each
Fill histogram
Extract track parameters
16
[Figure: hits in the x-y plane]
More: Hough Transform Principle
  Bin with highest multiplicity gives track parameters
  $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
i: ~100 hits/event (STT); j: steps of 0.2°; $r_{ij}$: 180 000
$r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i$
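The histogram filling and peak extraction described on this slide can be sketched in serial C++ as follows. This is a minimal illustration under stated assumptions, not the Thrust or CUDA implementation from the talk: hits are treated as points (no additional per-hit term), the angle is called alpha because the Greek letters are missing from the transcript, angles are scanned over [0°, 180°) with a signed r, and all type and function names (Hit, HoughPeak, hough_peak) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Hit { double x, y; };

struct HoughPeak {
    double alpha_deg; // angle of the most populated bin, in degrees
    double r;         // r value at the center of that bin
    int votes;        // number of entries in that bin
};

// Fill an (alpha, r) histogram with r = cos(alpha) * x + sin(alpha) * y for
// every hit and every angle step, then return the bin with the most entries.
HoughPeak hough_peak(const std::vector<Hit>& hits, int n_alpha, int n_r, double r_max) {
    const double pi = std::acos(-1.0);
    std::vector<int> histogram(static_cast<std::size_t>(n_alpha) * static_cast<std::size_t>(n_r), 0);

    for (const Hit& hit : hits) {
        for (int j = 0; j < n_alpha; ++j) {
            const double alpha = pi * j / n_alpha; // alpha_j in [0, 180 degrees)
            const double r = std::cos(alpha) * hit.x + std::sin(alpha) * hit.y;
            const int r_bin = static_cast<int>(std::floor((r + r_max) / (2.0 * r_max) * n_r));
            if (r_bin < 0 || r_bin >= n_r) continue; // outside the histogram range
            ++histogram[static_cast<std::size_t>(j) * n_r + r_bin];
        }
    }

    HoughPeak best = {0.0, 0.0, 0};
    for (int j = 0; j < n_alpha; ++j) {
        for (int b = 0; b < n_r; ++b) {
            const int votes = histogram[static_cast<std::size_t>(j) * n_r + b];
            if (votes > best.votes) {
                best.alpha_deg = 180.0 * j / n_alpha;
                best.r = -r_max + (b + 0.5) * 2.0 * r_max / n_r;
                best.votes = votes;
            }
        }
    }
    return best;
}
```

Each (hit, angle) pair is independent of all others, which is what makes the method easy to parallelize: on a GPU, one thread per pair can compute its r value and increment the histogram atomically.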
17
Hough Transform Example
[Figure: Hough-transformed histogram, r versus angle (0° to 180°); PANDA STT, 180 × 180 grid, 10 (x,y) points; 324000 entries]
[Figure: Hough-transformed histogram, r versus angle (0° to 180°); PANDA STT+MVD, 1800 × 1800 grid, 68 (x,y) points; 2.2356e+08 entries]
Hough Transform Remarks
18
Two Implementations

Thrust
  Performance: 3 ms/event
    Independent of angular granularity
  Reduced to set of standard routines
    Fast (uses Thrust's optimized algorithms)
    Inflexible (has its limits, hard to customize)
  No peakfinding included
    Even possible?
    Adds to time!

Plain CUDA
  Performance: 0.5 ms/event
  Built completely for this task
    Fitting to every problem
    Customizable
    A bit more complicated at parts
  Simple peakfinder implemented (threshold)
  Using: Dynamic Parallelism, Shared Memory
19
ALGORITHMS #2
Hough Transform
Riemann Track Finder
Triplet Finder
20
Riemann Track Finder
Algorithm in use in PANDA's offline analysis framework for a long time
  Good results
  Well-understood
  Handling of uncertainties
Work by Jonathan Timcheck
  Summer student at Jülich
Based on work by Strandlie et al.
21
Riemann Track Finder Method
Idea: Don't fit lines (in 2D), fit planes (in 3D)!
Create seeds
  All possible three-hit combinations
Grow seeds to tracks
  Continuously test next hit if it fits
  Use mapping to Riemann paraboloid
[Figure: hits in the x-y plane and their mapping onto the Riemann paraboloid (axes x, y, z)]
More on: Seeds; Growing
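The idea of fitting planes instead of circles can be illustrated for the simplest case, a three-hit seed. A circle $(x-a)^2 + (y-b)^2 = R^2$ becomes the plane $z = 2a\,x + 2b\,y + (R^2 - a^2 - b^2)$ once every hit is lifted to $z = x^2 + y^2$. The sketch below is an assumption-laden illustration (exact plane through three points via Cramer's rule, hypothetical names Point2D, Circle, circle_from_seed); the track finder from the talk fits planes to more than three hits and handles uncertainties, which is not shown here.

```cpp
#include <array>
#include <cmath>

struct Point2D { double x, y; };
struct Circle { double cx, cy, radius; };

using Matrix3 = std::array<std::array<double, 3>, 3>;

// Determinant of a 3x3 matrix stored row by row.
double det3(const Matrix3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Lift three hits onto the paraboloid z = x^2 + y^2, find the plane
// z = p*x + q*y + c through the lifted points and convert it back to a
// circle in the x-y plane. Returns false if the hits are collinear.
bool circle_from_seed(const std::array<Point2D, 3>& hits, Circle& out) {
    Matrix3 a{};
    std::array<double, 3> z{};
    for (int i = 0; i < 3; ++i) {
        a[i][0] = hits[i].x;
        a[i][1] = hits[i].y;
        a[i][2] = 1.0;
        z[i] = hits[i].x * hits[i].x + hits[i].y * hits[i].y;
    }
    const double d = det3(a);
    if (std::abs(d) < 1e-12) return false;

    // Cramer's rule: replace one column by z to solve for p, q and c.
    auto solve_column = [&](int col) -> double {
        Matrix3 m = a;
        for (int i = 0; i < 3; ++i) m[i][col] = z[i];
        return det3(m) / d;
    };
    const double p = solve_column(0);
    const double q = solve_column(1);
    const double c = solve_column(2);

    out.cx = p / 2.0;
    out.cy = q / 2.0;
    out.radius = std::sqrt(c + out.cx * out.cx + out.cy * out.cy);
    return true;
}
```

A candidate hit can then be tested by lifting it onto the paraboloid as well and checking its distance to the plane, which is a linear operation.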
22
Riemann Algorithm GPU Version
GPU Optimization: Unfolding loops
100× faster than CPU version

The nested loops
    for () { for () { for () { } } }
are replaced by a single flat thread index:
    int ijk = threadIdx.x + blockIdx.x * blockDim.x;

$n_{\mathrm{Layer}\,x} = \frac{1}{2}\left(\sqrt{8x+1} - 1\right)$
$\mathrm{pos}(n_{\mathrm{Layer}\,x}) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2-1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2-1} + 27x}} - 1$

Time for one event (Tesla K20X)
Time(%)  Time      Calls  Avg       Min       Max       Name
75.55%   439.49us  1      439.49us  439.49us  439.49us  extend_cut_hit_triplets_k
5.96%    34.656us  4      8.6640us  2.3360us  22.432us  [CUDA memcpy DtoH]
4.36%    25.344us  1      25.344us  25.344us  25.344us  cut_hit_triplets_k
4.26%    24.800us  6      4.1330us  3.7760us  5.3440us  [CUDA memset]
2.57%    14.976us  1      14.976us  14.976us  14.976us  generate_hit_triplet
2.44%    14.176us  1      14.176us  14.176us  14.176us  generate_layer_triplets
1.30%    7.5520us  1      7.5520us  7.5520us  7.5520us  void thrust
1.11%    6.4640us  1      6.4640us  6.4640us  6.4640us  void thrust
1.11%    6.4640us  1      6.4640us  6.4640us  6.4640us  void thrust
0.89%    5.1520us  5      1.0300us  928ns     1.3440us  [CUDA memcpy HtoD]
0.45%    2.6240us  1      2.6240us  2.6240us  2.6240us  project_onto_paraboloid_k
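One way to unfold three nested loops over hit combinations into a single flat index is to invert the triangular and tetrahedral numbers, which is what closed-form expressions with square and cube roots achieve. The following host-side sketch shows the principle for triples i < j < k; it is an illustration only, the enumeration order and the names choose2, choose3 and unfold_triplet are assumptions, and the floating-point estimates are corrected with integer loops rather than trusted directly.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Number of pairs and triples that can be formed from n items.
constexpr std::uint64_t choose2(std::uint64_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}
constexpr std::uint64_t choose3(std::uint64_t n) {
    return n < 3 ? 0 : n * (n - 1) * (n - 2) / 6;
}

// Map a flat index (for example a global thread index) to the triple
// (i, j, k) with i < j < k, enumerated so that
// idx = choose3(k) + choose2(j) + i.
std::array<std::uint64_t, 3> unfold_triplet(std::uint64_t idx) {
    // Estimate k from the inverse of k^3 / 6, then fix rounding errors.
    std::uint64_t k = static_cast<std::uint64_t>(std::cbrt(6.0 * static_cast<double>(idx))) + 1;
    while (choose3(k) > idx) --k;
    while (choose3(k + 1) <= idx) ++k;

    // Estimate j from the inverse triangular number, then fix rounding errors.
    const std::uint64_t rem = idx - choose3(k);
    std::uint64_t j = static_cast<std::uint64_t>(
        (std::sqrt(8.0 * static_cast<double>(rem) + 1.0) + 1.0) / 2.0);
    while (choose2(j) > rem) --j;
    while (choose2(j + 1) <= rem) ++j;

    const std::uint64_t i = rem - choose2(j);
    return {{i, j, k}};
}
```

With such a mapping every thread derives its own combination from its index alone, so no thread has to iterate over the nested loops.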
23
ALGORITHMS #3
Hough Transform
Riemann Track Finder
Triplet Finder
24
Triplet Finder
Algorithm specifically designed for the PANDA Straw Tube Tracker (STT)
Original algorithm by Marius Mertens et al.
[Figure: the STT, 1.5 m]
Ported to GPU by Andrew Adinetz
  NVIDIA Application Lab Jülich
  http://www.fz-juelich.de/ias/jsc/
  CUDA, Dynamic Parallelism, Thrust
25
Triplet Finder
Idea: Use only subset of detector as seed
  Combine 3 hits to Triplet
  Calculate circle from 3 Triplets (no fit)
Features
  Fast & robust algorithm, no t0
  Many tuning possibilities
More
Triplet Finder Display
26
[Figure: event display; legend: Triplet, Isochrone early, Isochrone early & skewed, Isochrone close, Isochrone late, MVD hit, Track timed out, Track current]
27
Triplet Finder Times
Triplet Finder Optimizations
Bunching Wrapper
  Hits from one event have similar timestamp
  Combine hits to sets (bunches) which occupy GPU best
28
[Figure: hits grouped into events, events grouped into bunches; complexity labels O(N²) and O(N)]
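The bunching idea can be sketched as a simple grouping of time-ordered hits. The code below is a hypothetical illustration, not the bunching wrapper from the talk: the function name make_bunches and the parameters min_hits and event_gap are assumptions, and a bunch is closed at the first sufficiently large time gap once it has reached the desired size, so that hits with similar timestamps stay together.

```cpp
#include <cstddef>
#include <vector>

// Split time-ordered hit timestamps into bunches. A bunch is closed at the
// first gap longer than event_gap once it holds at least min_hits hits.
std::vector<std::vector<double>> make_bunches(const std::vector<double>& sorted_times,
                                              std::size_t min_hits,
                                              double event_gap) {
    std::vector<std::vector<double>> bunches;
    std::vector<double> current;
    for (double t : sorted_times) {
        const bool big_enough = !current.empty() && current.size() >= min_hits;
        if (big_enough && t - current.back() > event_gap) {
            bunches.push_back(current);
            current.clear();
        }
        current.push_back(t);
    }
    if (!current.empty()) bunches.push_back(current);
    return bunches;
}
```

Choosing min_hits is then a tuning question: bunches should be large enough to keep the GPU busy but small enough that any step with quadratic cost in the number of hits stays cheap.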
29
Triplet Finder Bunching Performance
Triplet Finder Optimizations
30
Sector Row testing
After found track: Hit association not with all hits of current window, but only with subset (first test rows of sector, then hits of row)
More
31
Triplet Finder Sector Rows
Preliminary (in publication)
Triplet Finder Optimizations
Compare kernel launch strategies
32
[Figure: three launch strategies for the four Triplet Finder stages (TF Stage #1 to #4), each shown with its CPU and GPU part:
  Dynamic Parallelism: calling kernel, 1 thread/bunch
  Joined Kernel: joined kernel, 1 block/bunch
  Host Streams: calling and combining streams, 1 stream/bunch]
33
Triplet Finder Kernel Launches
Explanation
Preliminary (in publication)
Triplet Finder Optimizations
Impact of chipset
34
                         Tesla K40     Tesla K20X
Peak double performance  1.46 TFLOPS   1.31 TFLOPS
Peak single performance  4.29 TFLOPS   3.95 TFLOPS
GPU Chipset              GK110B        GK110
# CUDA Cores             2880          2688
Memory size              12 GB         6 GB
Memory bandwidth         288 GByte/s   250 GByte/s
Source: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf
35
Triplet Finder Clock Speed / GPU
Preliminary (in publication)
      Memory Clock  Core Clock  GPU Boost
K40   3004 MHz      745 MHz     875 MHz
K20X  2600 MHz      732 MHz     784 MHz
Triplet Finder Optimizations
Many optimizations possible
  Most important: Bunching wrapper
  More float, less double: cards à la K10 a viable alternative
Best performance: 20 µs/event
  Online Tracking a feasible technique for PANDA
  Multi GPU system needed: O(100) GPUs
36
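The order of magnitude of the required GPU count follows from the event rate quoted earlier in the talk (2 × 10^7 events/s) and the per-event time. The sketch below assumes the best performance reads as 20 µs per event, that is 50 000 events per second and GPU, and that GPUs work independently without any overhead; the function name gpus_needed is illustrative.

```cpp
#include <cmath>

// Number of GPUs needed to keep up with a given event rate when every GPU
// processes events_per_second_per_gpu on average and GPUs work independently.
inline long gpus_needed(double events_per_second, double events_per_second_per_gpu) {
    return static_cast<long>(std::ceil(events_per_second / events_per_second_per_gpu));
}
```

Calling gpus_needed(2e7, 5e4) gives 400, a few hundred GPUs and therefore consistent with the order of magnitude stated on the slide.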
Summary
GPUs are very interesting for HEP
PANDA investigates GPUs as central element in experiment's design
Algorithms in active evaluation and optimization
Collaboration with NVIDIA Application Lab
37
Thank you!
Andreas Herten
@AndiH #GTC14
List of Resources Used
#4: Earth icon by Francesco Paleari from The Noun Project
#4: Einstein icon by Roman Rusinov from The Noun Project
#6: FAIR vector logo from official FAIR website
#6: FAIR rendering from official website
#11: Flare Gun icon by Jop van der Kroef from The Noun Project
#27: STT event animation by Marius C. Mertens
#35: Graphics cards images by NVIDIA promotion
#35: GPU Specifications
  Tesla K20X Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K20X-BD-06397-001-v07.pdf
  Tesla K40 Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K40-Active-Board-Spec-BD-06949-001_v03.pdf
  Tesla Family Overview: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf
38
BACKUP
39
Hough Transform Principle
40
[Figure: step-by-step construction in the x-y plane; lines through each hit are described by an angle and a distance r, with example parameter pairs $(r, \alpha)_1$ and $(r, \alpha)_2$ entered into the histogram]
$r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
Bin with highest multiplicity gives track parameters
Back
41
Riemann Algorithm Procedure
1. Create triplet of hit points
  All possible three-hit combinations need to become triplets
2. Grow triplets to tracks: continuously test next hit if it fits to triplet track
  Use Riemann paraboloid to circle-fit track
  Test closeness of new hit: good → add hit; bad → dismiss hit
  Continue with next hit
Helix fit: arc length s vs. z position
42
Riemann Algorithm 1: Triplets
[Figure: hit triplets formed across detector layers; axis: layer number 1 to 5]
Back
43
Riemann Algorithm 2: Expansion
[Figure: hits in the x-y plane are expanded to z and mapped onto the Riemann surface (paraboloid)]
Back
Triplet Finder Method
44
STT hit in pivot straw
Find surrounding hits
Create virtual hit (triplet) at center of gravity (cog)
Combine with
  1. Second STT pivot-cog virtual hit
  2. Interaction point
Calculate circle through three points
Track Candidate
[Figure: STT cross section with the interaction point marked]
More
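The two geometric steps named on this slide, the virtual hit at the center of gravity and the circle through three points, can be sketched as follows. This is an illustrative host-side version with hypothetical names (Point, Circle, center_of_gravity, circle_through); it uses the standard circumcenter formula and leaves out everything detector-specific, such as selecting pivot straws or using drift times.

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y; };
struct Circle { Point center; double radius; };

// Virtual hit: center of gravity of a pivot hit and its surrounding hits.
// The caller is assumed to pass at least one hit.
Point center_of_gravity(const std::vector<Point>& hits) {
    Point cog = {0.0, 0.0};
    for (const Point& h : hits) {
        cog.x += h.x;
        cog.y += h.y;
    }
    cog.x /= static_cast<double>(hits.size());
    cog.y /= static_cast<double>(hits.size());
    return cog;
}

// Circle through three points (circumcircle), computed directly without a fit.
// Returns false if the points are collinear.
bool circle_through(const Point& a, const Point& b, const Point& c, Circle& out) {
    const double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::abs(d) < 1e-12) return false;
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    out.center.x = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    out.center.y = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    out.radius = std::hypot(a.x - out.center.x, a.y - out.center.y);
    return true;
}
```

A track candidate in this picture is the circle through the interaction point and two such virtual hits; because it is computed in closed form, the step involves no iterative fitting.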
Triplet Finder Optimizations
Sector Row testing
  Thicken track; shrink sector row layer to line
  Find intersection
45
[Figure: Sector-Row Testing, a track intersecting a sector row; from slide 12 of a presentation by Andrew V. Adinetz, 11.12.2013]
Back
Triplet Finder Kernel Launch Strategies
Joined Kernel (JK): slowest
  High # registers → low occupancy
Dynamic Parallelism (DP) / Host Streams (HS): comparable performance
Performance
  HS faster for small # processed hits, DP faster for > 45000 hits
  HS stagnates there, while DP continues rising
Limiting factor
  High # of required kernel calls
  Kernel launch latency
  Memcopy
HS more affected by this, because
  More PCI-E transfers (launch configurations for kernels)
  Less launch throughput, kernel launch latency gets more important
  False dependencies of launched kernels
    Single CPU thread handles all CUDA streams (multi-thread possible, but synchronization overhead too high for good performance)
    Grid scheduling done on hardware (Grid Management Unit) (DP: software)
    False dependencies when N(streams) > N(device connections) = 32 (compute capability 3.5)
46
Back
47
Triplet Finder Host Stream Connections
Preliminary (in publication)
48
Triplet Finder Bunch Sizes
Preliminary (in publication)
49
Forschungszentrum Jülich & Me
[Figure: map of Germany showing Berlin, Munich, Cologne and Jülich]
Research Center
  Founded 1956; federal center
  Budget: 730 million USD/year
  5300 employees, thereof 1700 scientists (600 PhD students)
  Topics: Health, Energy, Environment; Physics; Supercomputing; many large-scale facilities
Me
  Diploma in physics from RWTH Aachen University (CMS experiment)
  PhD researcher since 2011: GPU Online Tracking for PANDA