telescope array fluorescence detector sim on gpus | gtc 2013 · previous slide shows a 1d...
TRANSCRIPT
![Page 1: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/1.jpg)
1
The Telescope Array Fluorescence Detector Simulation on GPUs
Tareq AbuZayyadUniversity of Utah
GTC 2013San Jose, CA03/21/2013
![Page 2: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/2.jpg)
2
Overview
● A brief introduction:● What we study:
– (Ultra) High energy cosmic rays: Matter, mainly atomic nuclei, that has been accelerated to ~speed-of-light by powerful astrophysical sources. Cosmic rays impinging on the top of the atmosphere generate extensive air showers (of secondary particles)
● Who we are:– The Telescope Array collaboration: Multi-National, more than 100 people
from some 20 institutions. Me, I am a Research Assistant Professor at the University of Utah.
● How we do it:– Surface Detector / Fluorescence Telescopes. The surface array provides
the highest statistics at the highest energies. The fluorescence telescopes provide lower energy reach, calibration for the surface detector, and hybrid observation. In this talk I will not discuss the surface detector!
![Page 3: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/3.jpg)
3
Overview (cont'd)
● The Main Topic:● Fluorescence Detector (FD) Response Simulation:
– Overview: We simulate the development of an extensive air shower in the atmosphere. The light production and propagation to the detector, and the collection and processing of this light by the detector to produce an “event”.
● GPU Based Simulation:– How the simulation is done using CUDA. Design goals, code
implementation and kernel launch configuration.
● GPU Performance Analysis:– Results on accuracy and speedup are shown for a few test cases.
More details if time allows. See backup slides otherwise.
● Summary and Outlook.
![Page 4: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/4.jpg)
4
High Energy Cosmic Rays
![Page 5: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/5.jpg)
5
High Energy Cosmic Rays● Cosmic Rays (CR) flux
extends across many decades in energy.
● Mainly protons (hydrogen nucleus) and nuclei of heavier elements.
● Acceleration up to 1015 eV in supernovae. Sources of highest energy CR still unknown.
● Flux falls off rapidly. At some energy direct observation is no longer feasible.
![Page 6: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/6.jpg)
6
Indirect Observation- Extensive Air Showers
● High energy CR primary particle collides with an “air molecule” near the top of the atmosphere.
● Collision initiates a cascade of lower energy particles moving at the speed of light towards the ground.
● Shower Development is complex. Specialized MC programs are used to simulate the high energy physics processes involved.
![Page 7: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/7.jpg)
7
Indirect Observation- Extensive Air Showers
● While complex, the average shower behavior can be parameterized for use in Detector Response simulations.
● Most of the primary cosmic ray particle's energy ends up in the electromagnetic component.
● Shower electrons excite Nitrogen Molecules in the atmosphere. The molecules give back the excess energy in the form of scintillation light which can be observed by our detectors.
● Cerenkov radiation is also emitted by shower particles and depending on geometry can contribute to the observed signal.
![Page 8: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/8.jpg)
8
The Telescope Array Experiment
![Page 9: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/9.jpg)
9
Telescope Array ExperimentT Abu-Zayyad1, R Aida2, M Allen1, R Azuma3, E Barcikowski1, JW Belz1, T Benno4, DR Bergman1,
SA Blake1, O Brusova1, R Cady1, BG Cheon6, J Chiba7, M Chikawa4, EJ Cho6, LS Cho8, WR Cho8, F Cohen9,K Doura4, C Ebeling1, H Fujii10, T Fujii11, T Fukuda3, M Fukushima9,22, D Gorbunov12, W Hanlon1,
K Hayashi3, Y Hayashi11, N Hayashida9, K Hibino13, K Hiyama9, K Honda2, G Hughes5, T Iguchi3, D Ikeda9, K Ikuta2, SJJ Innemee5, N Inoue14, T Ishii2, R Ishimori3, D Ivanov5, S Iwamoto2, CCH Jui1,
K Kadota15, F Kakimoto3, O Kalashev12, T Kanbe2, H Kang16, K Kasahara17, H Kawai18, S Kawakami11, S Kawana14, E Kido9, BG Kim19, HB Kim6, JH Kim6, JH Kim20, A Kitsugi9, K Kobayashi7, H Koers21,
Y Kondo9, V Kuzmin12, YJ Kwon8, JH Lim16, SI Lim19, S Machida3, K Martens22, J Martineau1, T Matsuda10, T Matsuyama11, JN Matthews1, M Minamino11, K Miyata7, H Miyauchi11, Y Murano3, T Nakamura23,
SW Nam19, T Nonaka9, S Ogio11, M Ohnishi9, H Ohoka9, T Okuda11, A Oshima11, S Ozawa17, IH Park19, D Rodriguez1, SY Roh20, G Rubtsov12, D Ryu20, H Sagawa9, N Sakurai9, LM Scott5, PD Shah1, T Shibata9, H Shimodaira9, BK Shin6, JD Smith1, P Sokolsky1, TJ Sonley1, RW Springer1, BT Stokes5, SR Stratton5,
S Suzuki10, Y Takahashi9, M Takeda9, A Taketa9, M Takita9, Y Tameda3, H Tanaka11, K Tanaka24, M Tanaka10, JR Thomas1, SB Thomas1, GB Thomson1, P Tinyakov12,21, I Tkachev12, H Tokuno9, T Tomida2,
R Torii9, S Troitsky12, Y Tsunesada3, Y Tsuyuguchi2, Y Uchihori25, S Udo13, H Ukai2, B Van Klaveren1, Y Wada14, M Wood1, T Yamakawa9, Y Yamakawa9, H Yamaoka10, J Yang19, S Yoshida18, H Yoshii26, Z Zundel1
1University of Utah, 2University of Yamanashi, 3Tokyo Institute of Technology, 4Kinki University, 5Rutgers University, 6Hanyang University, 7Tokyo University of Science, 8Yonsei University,
9Institute for Cosmic Ray Research, University of Tokyo, 10Institute of Particle and Nuclear Studies, KEK, 11Osaka City University, 12Institute for Nuclear Research of the Russian Academy of Sciences,
13Kanagawa University, 14Saitama University, 15Tokyo City University, 16Pusan National University, 17Waseda University, 18Chiba University 19Ewha Womans University, 20Chungnam National University,
21University Libre de Bruxelles, 22University of Tokyo, 23Kochi University, 24Hiroshima City University, 25National Institute of Radiological Science, Japan, 26Ehime University
![Page 10: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/10.jpg)
10
Telescope Array Experiment
● TA is located in Millard County, Utah, ~200 km southwest of Salt Lake City.
● Surface Detector: 507 scintillation counters 1.2 km spacing. (run 24/7)
● Three Fluorescence Detectors overlooking SD (run only during moonless nights):● Middle Drum (MD)
● Black Rock (BR)
● Long Ridge (LR)
![Page 11: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/11.jpg)
1111
Long Ridge Black Rock Mesa
Middle DrumRefurbishedfrom HiRes
~30km New FDs
6.8 m2 ~1 m2
14 cameras/station256 PMTs/camera
5.2 m2
TA Fluorescence Detectors
Observation started Dec. 2007
Observation started Nov. 2007
Observation started Jun. 2007
256 PMTs/cameraHAMAMATSU R9508
FOV~15x18deg12 cameras/station
![Page 12: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/12.jpg)
12
Middle Drum TA FD Observatory Site (14 Telescopes)
![Page 13: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/13.jpg)
13
Middle Drum TA/TALE FD Observatory Site (14 + 10 Telescopes)
![Page 14: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/14.jpg)
14
Middle Drum TA/TALE Viewing Range
● TAMD + TALE
● 14 lower telescopes make up TA (Middle Drum) Detector.
● 10 higher telescope (new addition) make up the TA-Low Energy extension Detector.
![Page 15: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/15.jpg)
15
Fluorescence Detector Response Simulation
![Page 16: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/16.jpg)
16
Main components of Simulation
● Atmosphere (Medium)● Extensive Air Shower (Light Source)● Detector (Instrument)● Physics (Interactions)● Auxiliary
● RNG● Coordinates System● Event Management● I/O
![Page 17: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/17.jpg)
17
Coordinates System• Defines the origin of a Cartesian system (x, y, z),
as a point on the surface of the Earth (in geodetic coordinates).
• Provides a function to convert (x,y,z) to height (a.s.l.) using a spherical earth approximation.● Also (x,y,z) to height above ground (for aerosols
density calculation)
• Coordinate transformations to/from alternative systems.
• Single instance per simulation (a service class).
![Page 18: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/18.jpg)
18
The Atmosphere● Medium in which shower
develops and deposits it's energy, and where light production occurs.
● Single instance per simulation (a service class).
● Pressure, Density, and Temperature profiles:
● All functions of altitude.
● Also Contains an Ozone Layer and Aerosols near the ground:
● Altitude dependent as well.
![Page 19: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/19.jpg)
19
Physics• Different models (experimental measurements) of
Fluorescence yield, wavelength spectrum and shower dE/dx.● User selectable.
• Functions / data to calculate Cerenkov light production and wavelength spectrum.
• Single instance per simulation (a service class).
![Page 20: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/20.jpg)
20
Extensive Air Shower Parameters
● Primary particle energy:
● Either explicitly set or randomly sampled from a power law.● Determines total # of particles in the shower.
● Shower Geometry (Randomly Sampled):
● Shower Track can be at any distance from the detector, and can have any downward direction.
● Shower Longitudinal Development Curve (Randomly Sampled):
● Parametrized; parameters selected randomly from shower library.
● Shower Lateral extent:
● Parameterized; average shower behavior.
![Page 21: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/21.jpg)
21
EAS Parameters (cont'd)
N x =N max× x−x o
xmax− xo xmax−xo
×e
xmax− x
● Primary particle energy (E)
● Shower Geometry:
● A line segment from the top of the atmosphere, ~47km, to the ground. Rp_vec[3], ut_hat[3]
● Shower Longitudinal Profile:
● Parametrization given by a Gaisser-Hillas function with four parameters (N_max, x_0, x_max, lambda)
● x [g/cm2] is the atmospheric slant depth along the shower track.
● N(x) is the number of charged particles at depth x
● A total of eight free parameters:
● E → N_max
● ut_hat is a unit vector & is normal to Rp_vec.
![Page 22: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/22.jpg)
22
EAS Parameters (cont'd)
r =N
r 2 f s , rr M
● Real showers are 3D.
● Previous slide shows a 1D description of longitudinal development.
● Lateral spread is described by a parameterization based on NKG function:
● r radial distance.
● rM Moliere Radius (air
density dependent)● s: shower age parameter.● f: NKG function
(normalized)
![Page 23: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/23.jpg)
23
Light Production● Air Fluorescence
● Shower Particles excite Nitrogen Molecules in the atmosphere
● the molecules give back the energy in the form of UV light.
![Page 24: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/24.jpg)
24
Light Production (cont'd)
● Cerenkov Radiation.● Shower particles traveling at
speeds faster than the speed of light in air emit Cerenkov radiation.
● We observe direct Cerenkov light.
● We also see scattered light● Scattering by air molecules
● Scattering by Aerosols
![Page 25: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/25.jpg)
25
Light Propagation
● Light Scattering● Air Molecules (Rayleigh)● Aerosols (Empirical
Model)
● Ozone Absorption.● Function of:
● Wavelength● Altitude (density)
![Page 26: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/26.jpg)
26
Light Detection
● Spherical mirror collects photons and focuses them to a PMT cluster.
● Photo-Multiplier Tubes convert photons to photoelectrons (electrical signal)
![Page 27: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/27.jpg)
27
Detector Efficiency
● Mirror Reflectivity● UV Filter Transmission● PMT Quantum
efficiency● are all wavelength
dependent.
● PMT efficiency profile is also accounted for in the simulation.
![Page 28: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/28.jpg)
28
Electronics
● Signal processing for PMT trigger:● RC filter● LC filter + threshold test● Integrate pulse if trigger.
● Record data.● “Mirror Trigger Logic”
for event formation done on CPU.
![Page 29: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/29.jpg)
29
GPU Based Simulation
![Page 30: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/30.jpg)
30
Opportunities for Parallelism
● Simulate a large number O(106) of independent events.● 1 GB of GPU card memory can accommodate ~500 showers
simultaneously. (when using floats)
● A shower is treated as an extended light source.● divided into ~1000 point sources.● 32 wavelength bins.
● Ray-trace collected photons independently.● 256 PMT's (pixels) per telescope; multiple telescopes per event.● Signal processing: ~10000 time bins per pixel
![Page 31: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/31.jpg)
31
Design Goals and Constraints● CUDA Capable GPU may or may not be available on a given system, so CUDA support has to be optional.● To insure the program produces the same results with or without acceleration, as much code as possible is written to run on either: using __host__ __device__ qualifier.
● This is needed most if/when someone updates the code in the future; … chances are they will update one path and not the other! (speaking from experience)
● To get best performance on GeForce GPU's we also require that as much as code as possible is written using C++ templates to support both single and double precision calculations.
![Page 32: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/32.jpg)
32
Basic Code structure
• Can compile on a machine with/without CUDA environment.
• Can compile to run with/without acceleration.
• Essentially all code can run on CPU and GPU.
• As float or double.
• Aside: Can also build a Root dictionary and use interactively in a root session(http://root.cern.ch/)
![Page 33: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/33.jpg)
33
Hide Separate Code Path
• Create wrapper functions for cases where CPU/GPU require different functions:
• CPU code calls as random_uniform()
• GPU code calls as random_uniform(x,a)
• x, a unique per thread
![Page 34: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/34.jpg)
34
Service Classes
• Simulation Parameters:● Coordinates● Atmosphere● Physics● Detector
• Single initialization point (cpu double); copy to cpu float, gpu float / double.
• Accessed through a common interface with a dummy argument to resolve data type.
![Page 35: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/35.jpg)
35
Memory Layout
● PMT signal processing and trigger logic has to be done in serial.
● To achieve efficient memory access pattern on the gpu:● Reverse memory layout for 2D
arrays used for electronics signal processing
● Use macro to select the appropriate class in __host__ __device__ code.
![Page 36: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/36.jpg)
36
Kernel Launch Configuration
• Table shows kernel launch configuration on Fermi card with 1GB of memory.
![Page 37: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/37.jpg)
37
Kernel Launch Configuration
• All kernels were launched using a similar setup.
• All prefer L1 Cache over User managed Shared Memory.
![Page 38: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/38.jpg)
38
GPU Performance Analysis
![Page 39: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/39.jpg)
39
Performance comparison – running time
E (eV) 100 evt (cpu)t in sec.
100 evt (gpu) t in sec.
1000 evt (cpu) t in sec.
1000 evt (gpu) t in sec.
1017 13.91 0.759 (18x) 140.97 2.860 (49x)
1018 18.76 0.875 (21x) 164.59 3.348 (49x)
1019 24.64 1.006 (24x) 196.99 4.230 (47x)
1020 40.72 1.297 (31x) 364.44 6.834 (53x)
• Desktop CPU vs. Desktop GPU
![Page 40: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/40.jpg)
40
Nvidia Virtual Profiler results
• Fraction of time spent in each function.
![Page 41: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/41.jpg)
41
Nvidia Virtual Profiler results
![Page 42: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/42.jpg)
42
Nvidia Virtual Profiler results
![Page 43: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/43.jpg)
43
Nvidia Virtual Profiler results
![Page 44: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/44.jpg)
44
Summary and Outlook
● An entire software suite used for the detector simulation of a real astrophysics experiment is ported to CUDA.
● Verified float precision is adequate for the MC● GPU provides >10x performance of a quad core
CPU.● Next is to do event reconstruction, inverse Monte
Carlo, on the GPU. This would be big!● Need to implement minimization routine on GPU.
● Looking forward for running on Kepler as well.
![Page 45: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/45.jpg)
45
Backup Slides
![Page 46: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/46.jpg)
46
Cuda 5.0
• Unfortunately:● Code would not compile under 5.0! cicc hangs.● After some testing I isolated the problem:
– A function with a large “const array”● Code compiles, but I still need to implement a work-
around for offending function.
• Did you submit a bug report?● I had every intention to, but got sidetracked. Sorry!
• Utah CHPC just added a node with 2 Kepler GPU's, will be testing soon.
![Page 47: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/47.jpg)
47
floats vs. double precision
● NOTE: in the following comparisons I mention the function xyz2h()
● This function calculates the height of a point in (x,y,z) coordinates above sea level. The calculation involves the “Earth Radius”:
● RE >> z causes the calculation to be inaccurate
when using floats.
![Page 48: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/48.jpg)
48
Quick Test Set
• 150 Thrown showers: 1017 eV proton.
• Run once without GPU acceleration, using double precision.
• Run once with GPU acceleration, using single precision math on the GPU.
• Write out intermediate calculation results for comparison.
• Plot results
![Page 49: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/49.jpg)
49
Shower Simulation Step 1: Generate track segments
• Subdivide shower track to 832 segments; (~1-2 g/cm2.
• For each segment calculate:● Height at center of segment.● Atmospheric Density.● Segment length (meters & g/cm2)● Slant depth at center of segment (g/cm2)● ...
• Comparison follows.
![Page 50: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/50.jpg)
50
Track Segments (All floats)
• Each data point corresponds to a track segment in one of the simulated showers which was in MD FOV.
• Height
• Density
• Segment Length
• Slant Depth
![Page 51: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/51.jpg)
51
Track Segments (double xyz2h)
• Each data point corresponds to a track segment in one of the simulated showers which was in MD FOV.
• Height
• Density
• Segment Length
• Slant Depth
![Page 52: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/52.jpg)
52
Track Segments (double xyz2h)• Error on
Slant Depth Calculation (floats) vs Slant Depth calculated with double precision.
![Page 53: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/53.jpg)
53
Track Segments (Kahn Summation)
• Error on Slant Depth Calculation (floats) after use of Kahn Summation to sum up individual track segments length.
![Page 54: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/54.jpg)
54
Track Segments (Kahn Summation)
• Error on Slant Depth Calculation (floats) after use of Kahn Summation to sum up individual track segments length.
![Page 55: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/55.jpg)
55
Shower Simulation Step 2: Generate shower profile
• For each track segment calculate:
● Shower Size Ne(x).
● Shower age.● Moliere radius. (used for lateral spread calculation)● Total amount of Fluorescence light production.● Total amount of Cerenkov light production.
• Comparison follows.
![Page 56: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/56.jpg)
56
Shower profile
• One data point for each track segment:● Shower size● Shower age● Moliere Radius
![Page 57: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/57.jpg)
57
Shower profile (cont'd)
• One data point for each track segment:● Total
Fluorescence ● Total
Cerenkov
![Page 58: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/58.jpg)
58
Shower Simulation Step 3: Generate shower photons
• For each track segment calculate:● Total amount of Fluorescence light; divided into
wavelength bins … currently using FLASH.● Total amount of Cerenkov light; divided into
wavelength bins.● Rayleigh & Aerosols scattered Cerenkov.● Total Cerenkov (Accumulates along shower
development)
• Comparison follows.
![Page 59: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/59.jpg)
59
Shower photons (Cerenkov)
• One data point for each track segment (summed over wavelength):● Accumulated C'kov
● Rayleigh scattered
● Aerosols scattered
● Ozone absorption
![Page 60: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/60.jpg)
60
Shower Simulation Step 4: Generate Mirror photons (pe's)
• Each event in the FOV of the detector can be seen by one or more mirrors (telescopes).
• A telescope will see a portion of the shower (A subset of the shower track segments).
• For each track segment (in telescope FOV) calculate:● Total amount of Fluorescence pe's (flux x mirror_area)● Total amount of Cerenkov pe's including
– Rayleigh & Aerosols scattered Cerenkov.
– Direct Cerenkov.
• Note: This includes atmospheric transmission and telescope response at relevant wavelengths.
• Comparison follows.
![Page 61: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/61.jpg)
61
Mirror photo-electrons
• One data point for each track segment:● Fluorescence
pe's
● Direct Cerenkov
● Rayleigh Scat. Ckov.
● Aerosols Scat. Ckov.
![Page 62: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/62.jpg)
62
Mirror photo-electrons
• One data point for each track segment:● Total # pe's
• End of Physics.
![Page 63: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/63.jpg)
63
Mirror photo-electrons
• One data point for each track segment:● Total # pe's
• Largest difference at slant depth ~<200 g/cm2
![Page 64: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/64.jpg)
64
Mirror photo-electrons
• How much error can we tolerate?● Next step in the simulation is ray-tracing. The
variable “ctotal” is used as the mean of a Poisson distribution:
● int raytrace_npe = random_poisson(ctotal);
• As long as we are within 0.1% (10-3), I highly doubt we can tell the difference.
• We get O(10-5)
![Page 65: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/65.jpg)
65
Mirror photo-electrons
• Needed for ray tracing:
• transformed coordinates:● rvec_x,
rvec_y, rvec_z
● Angle made with mirror axis.
![Page 66: Telescope Array Fluorescence Detector Sim on GPUs | GTC 2013 · Previous slide shows a 1D description of longitudinal development. Lateral spread is described by a parameterization](https://reader033.vdocuments.mx/reader033/viewer/2022052018/603183c38407327f0d50e09d/html5/thumbnails/66.jpg)
66
More comparisons ...
● Still need to verify● Ray tracing accuracy, including sampling of shower lateral spread.● Random number generator works as expected.● Electronics simulation is correct.
● Harder to do because of the use of different random number generators and therefore different random sequences.
● Can be done by using statistical averages.
● The end result of the simulation (triggered event data) has been checked and the cpu/gpu are in excellent agreement. Still, would be nice to have the Individual checks listed above.