Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction

GPU Technology Conference 2014, 26 March 2014
Andreas Herten (Institute for Nuclear Physics, Forschungszentrum Jülich, Germany)
Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft)


DESCRIPTION

Status of my PhD work, as presented at the GPU Technology Conference.

TRANSCRIPT

  • Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction (slide 1)

    GPU Technology Conference 2014, 26 March 2014
    Andreas Herten (Institute for Nuclear Physics, Forschungszentrum Jülich, Germany)

  • Outline (slide 2)

    High Energy Physics
    PANDA Experiment
    Particle Tracking
    GPUs at PANDA
    Algorithms
      Hough Transform
      Riemann Track Finder
      Triplet Finder

  • HEP: High Energy Physics (slide 3)

  • High Energy Physics (slide 4)

    High Energy Physics (HEP) in a nutshell:

    HEP Recipe
      1. Accelerate particles (e, p, …)
      2. Accelerate particles more!
      3. Smash into each other
      4. Look at resulting particles
      5. Understand world

    $E = mc^2$

    GPUs are interesting for HEP
      Many events due to high collision rate
      Events independent, dividable into subsets
      Many features extractable (computationally intensive)

  • PANDA (slide 5)

  • PANDA: FAIR (slide 6)

    PANDA: Antiproton Annihilation at Darmstadt
    FAIR: Facility for Antiproton and Ion Research
      Accelerator complex at GSI, Darmstadt
      Currently under construction

  • PANDA: The Experiment (slide 7)

    [Figure: PANDA detector with scale label 13 m (43 ft); beam and target labeled p̄ and p; marked components: Magnet, STT, MVD]

  • PANDA: Event Reconstruction (slide 8)

    Continuous read-out
    Background & signal similar
    → Novel feature

    Event rate: $2 \times 10^7$/s
    Raw data rate: 200 GB/s
    Online reduction by ~1/1000 on GPUs (reject background events, save interesting physics events)
    Disk storage space for offline analysis: 2 PB/y
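The figures on this slide can be related to each other with a little arithmetic. The sketch below is not from the talk: it assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and takes the quoted numbers at face value; all constant and function names are made up for illustration.

```cpp
// Figures quoted on the slide; decimal units are an assumption of this sketch.
constexpr double kEventRatePerSec    = 2.0e7;    // events per second
constexpr double kRawRateBytesPerSec = 200.0e9;  // 200 GB/s
constexpr double kReductionDivisor   = 1000.0;   // "reduce by ~1/1000"
constexpr double kDiskBytesPerYear   = 2.0e15;   // 2 PB/y

// Average raw data volume per event, in bytes.
constexpr double bytes_per_event() {
    return kRawRateBytesPerSec / kEventRatePerSec;
}

// Data rate that remains after the online reduction, in bytes per second.
constexpr double reduced_rate_bytes_per_sec() {
    return kRawRateBytesPerSec / kReductionDivisor;
}

// Seconds of data taking that fill the yearly disk budget at the reduced rate.
constexpr double seconds_to_fill_disk() {
    return kDiskBytesPerYear / reduced_rate_bytes_per_sec();
}
```

Under these assumptions an average event carries about 10 kB, the reduced stream is about 200 MB/s, and the 2 PB yearly budget corresponds to roughly 10^7 seconds of data taking.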

  • PANDA: Online Tracking Example (slide 9)

    The physics side: antiproton-proton event

    p̄p annihilation with final-state particles e⁺, e⁻ and a second oppositely charged pair (labeled + and −)

    [Figure: event display with tracks labeled +, −, e⁺, e⁻]

  • PANDA: Online Tracking Example (slide 10)

    The detector side: everything in reverse

    Particle tracks are curves (actually: 3D helices) → find curves connecting hit points!
    Sort by track quality: Hits well matched? How many hits?
    Identify final particles: curvature, length
    Identify intermediate particles: mass constraints, geometry
    Identify process: p̄p → e⁺e⁻ plus a second oppositely charged pair (labeled + and −)

    [Figure: event display with tracks labeled +, −, e⁺, e⁻ and an intermediate state marked "?"]
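As a small illustration of why curvature helps to identify final particles: for a charged particle in a uniform magnetic field, the radius of the track circle fixes the transverse momentum through the standard relation p_T [GeV/c] = 0.2998 · |q| · B [T] · R [m]. The sketch below only encodes that relation; the function name and the field value used in the usage note are illustrative assumptions, not numbers from the talk.

```cpp
// Conversion constant: GeV/c per (tesla * metre) for a unit charge.
constexpr double kGeVPerTeslaMetre = 0.299792458;

// Transverse momentum in GeV/c of a particle with charge |q| (in units of e)
// moving on a circle of radius `radius_m` (metres) in a field `field_t` (tesla).
constexpr double transverse_momentum(double radius_m, double field_t,
                                     double charge_e = 1.0) {
    return kGeVPerTeslaMetre * charge_e * field_t * radius_m;
}
```

For example, with an assumed field of 2 T, a unit-charge track of 1 m radius would carry about 0.6 GeV/c of transverse momentum.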

  • PANDA: Triggering (slide 11)

    Usual HEP experiment: Trigger
      Fast detector layer(s) trigger data acquisition
    PANDA: Online Tracking!

    [Figure: event display with tracks labeled +, −, e⁺, e⁻]

  • GPUs at PANDA (slide 12)

  • GPUs @ PANDA: Online Tracking (slide 13)

    Port tracking algorithms to GPU
      Serial → parallel
      C++ → CUDA
    Investigate suitability for online performance
    But also: find & invent tracking algorithms
    Under investigation:
      Hough Transformation
      Riemann Track Finder
      Triplet Finder

  • Algorithms #1: Hough Transform (slide 14)

    Hough Transform, Riemann Track Finder, Triplet Finder

  • Hough Transform (slide 15)

    Established method for edge detection in images (from 1970s HEP experiments!)
      Original algorithm by Hough, adapted by Duda & Hart
    New challenges for particle tracking algorithm
      Only limited pixels per edge
    Easily parallelizable method

  • Hough Transform: Method (slide 16)

    Idea: Transform $(x, y)_i \to (\alpha, r)_{ij}$, find lines via $(\alpha, r)$ space
    Solve the line equation for $r_{ij}$, for
      many hits $(x, y)_i$
      many angles $\alpha_j \in [0^\circ, 360^\circ)$ each
    Fill histogram
    Extract track parameters: bin with highest multiplicity gives track parameters

    $r_{ij} = \cos\alpha_j \, x_i + \sin\alpha_j \, y_i + \rho_i$ (with a per-hit term $\rho_i$)
    $r_{ij} = \cos\alpha_j \, x_i + \sin\alpha_j \, y_i$

    $i$: ~100 hits/event (STT)
    $j$: angle in steps of 0.2°
    $r_{ij}$: 180 000 values

    [Figure: hits in the x-y plane and the corresponding lines]
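The method above can be sketched in serial C++ as follows. This is a minimal illustration, not the talk's Thrust or CUDA code: it uses the simpler line equation without the per-hit term, scans angles over [0°, 180°) with a signed r, and all type names, function names and binning parameters are assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Hit { double x, y; };

// Accumulator over (angle, r) bins, stored row-major: angle bin * n_r + r bin.
struct HoughHistogram {
    std::size_t n_angle, n_r;
    double r_min, r_max;
    std::vector<unsigned> counts;

    HoughHistogram(std::size_t na, std::size_t nr, double rmin, double rmax)
        : n_angle(na), n_r(nr), r_min(rmin), r_max(rmax), counts(na * nr, 0u) {}
};

// For every hit i and every angle alpha_j compute
// r_ij = cos(alpha_j) * x_i + sin(alpha_j) * y_i and count the (alpha_j, r_ij) bin.
void fill_hough(const std::vector<Hit>& hits, HoughHistogram& h) {
    const double pi = std::acos(-1.0);
    for (const Hit& hit : hits) {
        for (std::size_t j = 0; j < h.n_angle; ++j) {
            const double alpha = pi * static_cast<double>(j) / static_cast<double>(h.n_angle);
            const double r = std::cos(alpha) * hit.x + std::sin(alpha) * hit.y;
            if (r < h.r_min || r >= h.r_max) continue;  // outside the histogram
            const auto bin = static_cast<std::size_t>(
                (r - h.r_min) / (h.r_max - h.r_min) * static_cast<double>(h.n_r));
            ++h.counts[j * h.n_r + std::min(bin, h.n_r - 1)];
        }
    }
}

// Flat index of the fullest bin; its angle and r are the line parameters.
std::size_t peak_bin(const HoughHistogram& h) {
    return static_cast<std::size_t>(
        std::max_element(h.counts.begin(), h.counts.end()) - h.counts.begin());
}
```

Every (hit, angle) pair is independent of all others, which is what makes the fill step easy to spread over GPU threads; on a GPU the histogram increment would need an atomic operation or a sort-and-count step.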

  • Hough Transform: Example (slide 17)

    [Figure: Hough-transformed histogram, r versus angle (0° to 180°); PANDA STT, 180 × 180 grid, 10 (x,y) points; 324 000 entries]

    [Figure: Hough-transformed histogram, r versus angle (0° to 180°); PANDA STT+MVD, 1800 × 1800 grid, 68 (x,y) points; 2.2356e+08 entries]

  • Hough Transform: Remarks (slide 18)

    Two implementations

    Thrust
      Performance: 3 ms/event
      Independent of angular granularity
      Reduced to a set of standard routines
      Fast (uses Thrust's optimized algorithms)
      Inflexible (has its limits, hard to customize)
      No peak finding included (even possible? Adds to time!)

    Plain CUDA
      Performance: 0.5 ms/event
      Built completely for this task
      Fitting to every problem
      Customizable
      A bit more complicated in parts
      Simple peak finder implemented (threshold)
      Using: Dynamic Parallelism, Shared Memory
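The slide only says that the plain CUDA version has a simple threshold peak finder. A minimal serial sketch of what such a step could look like is given below; the data layout (flat, row-major histogram) and all names are assumptions for illustration, not the talk's implementation.

```cpp
#include <cstddef>
#include <vector>

struct Peak {
    std::size_t angle_bin;  // row of the histogram
    std::size_t r_bin;      // column of the histogram
    unsigned count;         // number of entries in that bin
};

// Report every bin of a row-major (angle, r) histogram whose count reaches
// `threshold`. `n_r` is the number of r bins per angle row.
std::vector<Peak> find_peaks(const std::vector<unsigned>& counts,
                             std::size_t n_r, unsigned threshold) {
    std::vector<Peak> peaks;
    for (std::size_t idx = 0; idx < counts.size(); ++idx) {
        if (counts[idx] >= threshold) {
            peaks.push_back({idx / n_r, idx % n_r, counts[idx]});
        }
    }
    return peaks;
}
```

A plain threshold can report several neighbouring bins for one track, so a real peak finder would probably merge adjacent candidates.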

  • Algorithms #2: Riemann Track Finder (slide 19)

    Hough Transform, Riemann Track Finder, Triplet Finder

  • Riemann Track Finder (slide 20)

    Algorithm in use in PANDA's offline analysis framework for a long time
      Good results
      Well-understood
      Handling of uncertainties
    Based on work by Strandlie et al.
    Work by Jonathan Timcheck, summer student at Jülich

  • Riemann Track Finder: Method (slide 21)

    Idea: Don't fit lines (in 2D), fit planes (in 3D)!
    Create seeds
      All possible three-hit combinations
    Grow seeds to tracks
      Continuously test next hit if it fits
      Use mapping to Riemann paraboloid

    [Figure: hits (x) in the x-y plane and the same hits mapped onto a paraboloid in x-y-z space]
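The idea of fitting planes instead of lines can be illustrated with the paraboloid z = x² + y²: a circle in the x-y plane maps onto the intersection of the paraboloid with a plane, so three lifted hits define a plane and thereby a circle. The sketch below assumes this particular paraboloid and an unweighted three-point seed; the names and the distance test used for growing are illustrative guesses, not the PANDA code.

```cpp
#include <cmath>

struct Hit2D { double x, y; };
struct Plane { double nx, ny, nz, c; };  // points p on the plane satisfy n . p + c = 0
struct Circle { double cx, cy, r; };

// Plane through three hits after lifting each to (x, y, x^2 + y^2).
Plane seed_plane(const Hit2D& a, const Hit2D& b, const Hit2D& s) {
    const double az = a.x * a.x + a.y * a.y;
    const double ux = b.x - a.x, uy = b.y - a.y, uz = b.x * b.x + b.y * b.y - az;
    const double vx = s.x - a.x, vy = s.y - a.y, vz = s.x * s.x + s.y * s.y - az;
    Plane p{uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx, 0.0};
    p.c = -(p.nx * a.x + p.ny * a.y + p.nz * az);
    return p;
}

// Circle in the x-y plane that corresponds to the plane.
// Returns false for a vertical plane (the three hits were collinear).
bool circle_from_plane(const Plane& p, Circle& out) {
    if (std::fabs(p.nz) < 1e-12) return false;
    out.cx = -p.nx / (2.0 * p.nz);
    out.cy = -p.ny / (2.0 * p.nz);
    out.r = std::sqrt(out.cx * out.cx + out.cy * out.cy - p.c / p.nz);
    return true;
}

// Distance of a further lifted hit from the plane; a seed could be grown by
// accepting hits whose distance stays below a cut.
double distance_to_plane(const Plane& p, const Hit2D& h) {
    const double z = h.x * h.x + h.y * h.y;
    const double norm = std::sqrt(p.nx * p.nx + p.ny * p.ny + p.nz * p.nz);
    return std::fabs(p.nx * h.x + p.ny * h.y + p.nz * z + p.c) / norm;
}
```

With more than three hits the plane would be obtained from a fit rather than from a cross product, which is where the handling of uncertainties enters.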

  • Riemann Algorithm: GPU Version (slide 22)

    GPU optimization: unfolding loops
      Three nested loops, for () { for () { for () { } } }, are replaced by one linear thread index:
      int ijk = threadIdx.x + blockIdx.x * blockDim.x;

    $\mathrm{nLayer}_x = \frac{1}{2}\left(\sqrt{8x + 1} - 1\right)$

    $\mathrm{pos}(\mathrm{nLayer}_x) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2 - 1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2 - 1} + 27x}} - 1$

    100× faster than CPU version

    Time for one event (Tesla K20X)

    Time(%)  Time      Calls  Avg       Min       Max       Name
    75.55%   439.49us  1      439.49us  439.49us  439.49us  extend_cut_hit_triplets_k
     5.96%   34.656us  4      8.6640us  2.3360us  22.432us  [CUDA memcpy DtoH]
     4.36%   25.344us  1      25.344us  25.344us  25.344us  cut_hit_triplets_k
     4.26%   24.800us  6      4.1330us  3.7760us  5.3440us  [CUDA memset]
     2.57%   14.976us  1      14.976us  14.976us  14.976us  generate_hit_triplet
     2.44%   14.176us  1      14.176us  14.176us  14.176us  generate_layer_triplets
     1.30%   7.5520us  1      7.5520us  7.5520us  7.5520us  void thrust
     1.11%   6.4640us  1      6.4640us  6.4640us  6.4640us  void thrust
     1.11%   6.4640us  1      6.4640us  6.4640us  6.4640us  void thrust
     0.89%   5.1520us  5      1.0300us  928ns     1.3440us  [CUDA memcpy HtoD]
     0.45%   2.6240us  1      2.6240us  2.6240us  2.6240us  project_onto_paraboloid_k
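One way to read the two closed-form expressions on this slide is as inverses of the triangular and tetrahedral numbers, which is what is needed to turn a single linear thread index into the three indices of a nested loop over hit combinations. The sketch below follows that reading in serial C++; the enumeration order, the function names and the integer correction loops (added to guard against floating-point rounding) are assumptions of this example, not the talk's kernel code.

```cpp
#include <cmath>

struct Triple { long long i, j, k; };  // hit indices with i < j < k

long long triangular(long long n)  { return n * (n + 1) / 2; }
long long tetrahedral(long long n) { return n * (n + 1) * (n + 2) / 6; }

// Largest n with triangular(n) <= x, seeded by n = (sqrt(8x + 1) - 1) / 2.
long long inverse_triangular(long long x) {
    long long n = static_cast<long long>((std::sqrt(8.0 * x + 1.0) - 1.0) / 2.0);
    while (triangular(n + 1) <= x) ++n;  // repair rounding in either direction
    while (triangular(n) > x) --n;
    return n;
}

// Largest n with tetrahedral(n) <= x, seeded by the closed-form cube-root expression.
long long inverse_tetrahedral(long long x) {
    if (x <= 0) return 0;
    const double xd = static_cast<double>(x);
    const double u = std::cbrt(27.0 * xd + std::sqrt(3.0) * std::sqrt(243.0 * xd * xd - 1.0));
    long long n = static_cast<long long>(u / std::cbrt(9.0) + 1.0 / (std::cbrt(3.0) * u) - 1.0);
    while (tetrahedral(n + 1) <= x) ++n;
    while (tetrahedral(n) > x) --n;
    return n;
}

// Map a linear index x = 0, 1, 2, ... onto the x-th combination i < j < k.
// On a GPU, x would be the thread index ijk from the slide.
Triple unfold(long long x) {
    const long long a = inverse_tetrahedral(x);
    x -= tetrahedral(a);
    const long long b = inverse_triangular(x);
    x -= triangular(b);
    return {x, b + 1, a + 2};
}
```

For n hits, indices 0 to n(n-1)(n-2)/6 - 1 enumerate every three-hit combination exactly once, so each GPU thread can pick its own combination without any loop.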

  • Algorithms #3: Triplet Finder (slide 23)

    Hough Transform, Riemann Track Finder, Triplet Finder

  • Triplet Finder (slide 24)

    Algorithm specifically designed for the PANDA Straw Tube Tracker (STT)
      Original algorithm by Marius Mertens et al.
    Ported to GPU by Andrew Adinetz, NVIDIA Application Lab Jülich
      CUDA, Dynamic Parallelism, Thrust
    http://www.fz-juelich.de/ias/jsc/

    [Figure: Straw Tube Tracker, scale label 1.5 m]

  • Triplet Finder (slide 25)

    Idea: Use only a subset of the detector as seed
      Combine 3 hits to a Triplet
      Calculate circle from 3 Triplets (no fit)
    Features
      Fast & robust algorithm, no t0
      Many tuning possibilities
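Calculating a circle from three points without a fit has a closed-form solution. The sketch below shows it under two assumptions that are not stated on the slide: a Triplet is represented by the centroid of its three hits, and the circle is the circumcircle of three such positions. All names are made up for the example.

```cpp
#include <cmath>

struct Point { double x, y; };
struct Circle { double cx, cy, r; };

// Assumed Triplet position: centroid of its three hits.
Point triplet_center(const Point& a, const Point& b, const Point& c) {
    return {(a.x + b.x + c.x) / 3.0, (a.y + b.y + c.y) / 3.0};
}

// Circumcircle through three points, computed directly (no fit).
// Returns false if the points are (nearly) collinear.
bool circle_through(const Point& a, const Point& b, const Point& c, Circle& out) {
    const double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-12) return false;
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    out.cx = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    out.cy = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    out.r = std::hypot(a.x - out.cx, a.y - out.cy);
    return true;
}
```

Because the result is a handful of multiplications and one division, many circle candidates can be evaluated per event at low cost.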

  • Triplet Finder: Display (slide 26)

    [Figure: STT event display. Legend: Triplet; Isochrone early; Isochrone early & skewed; Isochrone close; Isochrone late; MVD hit; Track timed out; Track current]

  • Triplet Finder: Times (slide 27)

    [Figure: STT event animation]

  • Triplet Finder: Optimizations (slide 28)

    Bunching Wrapper
      Hits from one event have similar timestamps
      Combine hits to sets (bunches) which occupy the GPU best

    [Figure: hits grouped into events and events grouped into bunches; complexity labels O(N²) and O(N)]
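A simple way to picture the bunching wrapper is to sort the hit stream by timestamp and cut it into time windows. The sketch below does that with a fixed window length; the fixed length, the `Hit` fields and the function names are assumptions for illustration, since the talk does not spell out how bunches are formed. The second function counts hit-pair tests to show why working per bunch scales better than working on all hits at once.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Hit {
    double t_ns;  // timestamp in nanoseconds
    int tube;     // detector channel (unused here, kept for realism)
};

// Sort hits by time and cut the stream into consecutive windows ("bunches")
// of at most `bunch_length_ns`, each starting at its first hit.
std::vector<std::vector<Hit>> make_bunches(std::vector<Hit> hits, double bunch_length_ns) {
    std::sort(hits.begin(), hits.end(),
              [](const Hit& a, const Hit& b) { return a.t_ns < b.t_ns; });
    std::vector<std::vector<Hit>> bunches;
    double window_start = 0.0;
    for (const Hit& h : hits) {
        if (bunches.empty() || h.t_ns >= window_start + bunch_length_ns) {
            window_start = h.t_ns;  // open a new bunch at this hit
            bunches.emplace_back();
        }
        bunches.back().push_back(h);
    }
    return bunches;
}

// Number of hit pairs among n hits, a proxy for combinatorial work.
std::size_t pair_tests(std::size_t n) { return n * (n - 1) / 2; }
```

With bunches of bounded size, the total number of pair tests grows linearly with the number of hits instead of quadratically, and each bunch can be handed to its own thread, block or stream.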

  • Triplet Finder: Bunching Performance (slide 29)

  • Triplet Finder: Optimizations (slide 30)

    Sector Row Testing
      After a track is found: hit association not with all hits of the current window, but only with a subset (first test rows of sector, then hits of row)

  • Triplet Finder: Sector Rows (slide 31)

    Preliminary (in publication)

  • Triplet Finder: Optimizations (slide 32)

    Compare kernel launch strategies
      Dynamic Parallelism: calling kernel with 1 thread/bunch; Triplet Finder stages (TF Stage #1 to #4) launched on the GPU
      Joined Kernel: 1 block/bunch; Triplet Finder stages joined in one kernel
      Host Streams: 1 stream/bunch (calling stream, combining streams); TF Stage #1 to #4 launched from the CPU

    [Figure: launch diagrams for the three strategies, split into CPU and GPU parts]

  • Triplet Finder: Kernel Launches (slide 33)

    Preliminary (in publication)

  • Triplet Finder: Optimizations (slide 34)

    Impact of chipset

                               Tesla K40      Tesla K20X
    Peak double performance    1.46 TFLOPS    1.31 TFLOPS
    Peak single performance    4.29 TFLOPS    3.95 TFLOPS
    GPU chipset                GK110B         GK110
    # CUDA cores               2880           2688
    Memory size                12 GB          6 GB
    Memory bandwidth           288 GByte/s    250 GByte/s

    Source: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf

  • Triplet Finder: Clock Speed / GPU (slide 35)

    Preliminary (in publication)

            Memory clock    Core clock    GPU Boost
    K40     3004 MHz        745 MHz       875 MHz
    K20X    2600 MHz        732 MHz       784 MHz

  • Triplet Finder: Optimizations (slide 36)

    Many optimizations possible
      Most important: bunching wrapper
      More float, less double → cards à la K10 a viable alternative
    Best performance: 20 µs/event
      → Online tracking a feasible technique for PANDA
      → Multi-GPU system needed, O(100) GPUs
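The order of magnitude of the GPU count can be checked with a back-of-the-envelope estimate. The sketch below assumes that the best per-event time is the sustained processing time of one GPU, that the load scales linearly across GPUs, and that the Triplet Finder is the only work to be done; these are simplifications of this example, and the function name is made up.

```cpp
// Number of GPUs needed to keep up with an event stream, assuming each GPU
// processes one event every `seconds_per_event` and scaling is linear.
constexpr double gpus_needed(double events_per_second, double seconds_per_event) {
    return events_per_second * seconds_per_event;
}
```

With the event rate of 2 × 10^7 per second quoted earlier in the talk and 20 µs per event, `gpus_needed(2.0e7, 20.0e-6)` gives about 400, a number of order 10², in line with the multi-GPU system mentioned on the slide.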

  • Summary (slide 37)

    GPUs are very interesting for HEP
    PANDA investigates GPUs as a central element in the experiment's design
    Algorithms in active evaluation and optimization
    Collaboration with NVIDIA Application Lab

  • Thank you!

    Andreas Herten
    [email protected]
    @AndiH #GTC14

  • List of Resources Used (slide 38)

    #4: Earth icon by Francesco Paleari from The Noun Project
    #4: Einstein icon by Roman Rusinov from The Noun Project
    #6: FAIR vector logo from official FAIR website
    #6: FAIR rendering from official website
    #11: Flare Gun icon by Jop van der Kroef from The Noun Project
    #27: STT event animation by Marius C. Mertens
    #35: Graphics cards images by NVIDIA promotion
    #35: GPU Specifications
      Tesla K20X Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K20X-BD-06397-001-v07.pdf
      Tesla K40 Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K40-Active-Board-Spec-BD-06949-001_v03.pdf
      Tesla Family Overview: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf

  • Backup (slide 39)

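  • Hough Transform Principle

    [Figure: hit points (*) in the x-y plane with a line at distance r from the origin; parameter pairs (r, α)_1 and (r, α)_2]

    $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$

    → Bin with highest multiplicity gives track parameters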
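The voting step behind this slide can be sketched in a few lines. This is only an illustration under assumptions: the formula is read as r_ij = cos(α_j)·x_i + sin(α_j)·y_i + ρ_i with ρ_i a per-hit offset, the angle range is taken as [0°, 180°), and the function names, the number of angle steps and the r bin width are hypothetical choices, not values from the talk.

```python
import math
from collections import Counter


def hough_votes(hits, n_angles=180, r_bin=0.5):
    """Cast one vote per (hit i, angle j) into an (angle index, r bin) accumulator."""
    acc = Counter()
    for x, y, rho in hits:
        for j in range(n_angles):
            alpha = math.pi * j / n_angles  # assumed angle grid over [0, pi)
            r = math.cos(alpha) * x + math.sin(alpha) * y + rho
            acc[(j, round(r / r_bin))] += 1
    return acc


def best_line(hits, n_angles=180, r_bin=0.5):
    """Return (angle in degrees, r, votes) of the accumulator bin with most votes."""
    acc = hough_votes(hits, n_angles, r_bin)
    (j, r_idx), votes = acc.most_common(1)[0]
    return math.degrees(math.pi * j / n_angles), r_idx * r_bin, votes


# Hypothetical hits on the vertical line x = 2, all with offset rho = 0.
hits = [(2.0, float(y), 0.0) for y in range(-50, 51, 10)]
angle, r, votes = best_line(hits)
```

Every (i, j) pair is computed independently of all others, which is what makes this step a natural fit for one GPU thread per pair; only the accumulator update needs coordination.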

  • Riemann Algorithm Procedure

    1. Create triplets of hit points: all possible three-hit combinations need to become triplets

    2. Grow triplets to tracks: continuously test whether the next hit fits the triplet track

    Use the Riemann paraboloid to circle-fit the track

    Test closeness of the new hit: good → add hit; bad → dismiss hit

    Continue with the next hit

    Helix fit: arc length s vs. z position

  • Riemann Algorithm Triplets (step 1)

    [Figure: hits arranged by layer number (1 to 5); example triplets appear in the extracted text as "2111 31", "3111 41" and "3111 32"]
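As a small illustration of the triplet-building step, the sketch below enumerates every three-hit combination from a list of hits. The (layer, hit index) labels are hypothetical stand-ins for real hit objects, and the function name is made up for this example.

```python
from itertools import combinations
from math import comb


def make_triplets(hits):
    """Return every unordered combination of three distinct hits."""
    return list(combinations(hits, 3))


# Hypothetical hits, labelled as (layer number, hit index within the layer).
hits = [(1, 1), (2, 1), (3, 1), (3, 2), (4, 1)]
triplets = make_triplets(hits)
n_expected = comb(len(hits), 3)  # binomial coefficient "n choose 3"
```

The number of triplets grows roughly with the cube of the number of hits (100 hits already give 161700 combinations), which is why this step benefits from parallel hardware.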

  • Riemann Algorithm Expansion (step 2)

    Expand to z: hits in the x-y plane are projected onto the Riemann surface (paraboloid)

    [Figure: hit points (×) in the x-y plane and their images on the paraboloid along the z axis]
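The idea of the paraboloid mapping can be sketched for the simplest case of exactly three hits. Lifting each hit (x, y) to (x, y, x² + y²) turns a circle in the plane into a plane in 3D: points on a circle with centre (a, b) and radius R satisfy w = 2a·x + 2b·y + (R² − a² − b²). The code below solves for that plane with Cramer's rule and then tests whether a further hit is close to the resulting circle. It is an assumption-laden illustration: a fit over more than three hits would need a least-squares plane fit instead, and all function names and the tolerance are hypothetical.

```python
import math


def lift(x, y):
    """Map a hit in the x-y plane onto the paraboloid w = x^2 + y^2."""
    return (x, y, x * x + y * y)


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circle_from_triplet(p1, p2, p3):
    """Circle (a, b, R) from the plane w = A*x + B*y + C through three lifted hits."""
    pts = [lift(*p) for p in (p1, p2, p3)]
    m = [[x, y, 1.0] for x, y, _ in pts]
    w = [p[2] for p in pts]
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("hits are collinear; no finite circle")
    # Cramer's rule: replace one column of m by w for each unknown.
    coeffs = []
    for col in range(3):
        mc = [[w[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        coeffs.append(det3(mc) / d)
    A, B, C = coeffs
    a, b = A / 2.0, B / 2.0
    return a, b, math.sqrt(C + a * a + b * b)


def hit_is_close(circle, hit, tol):
    """True if the hit lies within tol of the circle line."""
    a, b, radius = circle
    return abs(math.hypot(hit[0] - a, hit[1] - b) - radius) <= tol


# Three hypothetical hits on a circle with centre (1, 2) and radius 5.
circle = circle_from_triplet((6.0, 2.0), (1.0, 7.0), (-4.0, 2.0))
```

A hit that passes `hit_is_close` would be added to the track candidate; one that fails would be dismissed, matching the good/bad decision in the procedure.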

  • Triplet Finder Method

    STT hit in pivot straw

    Find surrounding hits

    Create virtual hit (triplet) at center of gravity (cog)

    Combine with

    1. Second STT pivot-cog virtual hit

    2. Interaction point

    Calculate circle through three points → Track Candidate

    [Figure: STT cross-section with the interaction point]
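The geometric core of these steps can be sketched as follows: average a pivot hit and its neighbours into a virtual hit, then compute the circle through two such virtual hits and the interaction point. This is a sketch under assumptions: hits are represented by 2D straw-centre coordinates, the interaction point is placed at the origin, and all names and sample coordinates are hypothetical.

```python
import math


def center_of_gravity(points):
    """Average position of a group of hits (the virtual hit)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def circle_through(p1, p2, p3):
    """Centre and radius of the circle through three points (circumcircle)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; no finite circle")
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def track_candidate(interaction_point, triplet_a, triplet_b):
    """Circle through the interaction point and the two virtual hits."""
    return circle_through(
        interaction_point, center_of_gravity(triplet_a), center_of_gravity(triplet_b)
    )


# Hypothetical example: two groups of three hits and the origin as interaction point.
inner = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]   # virtual hit at (2, 0)
outer = [(-1.0, 2.0), (0.0, 2.0), (1.0, 2.0)]  # virtual hit at (0, 2)
cx, cy, radius = track_candidate((0.0, 0.0), inner, outer)
```

Using the interaction point as the third point assumes the track originates there, which is what lets two virtual hits suffice for a circle.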

  • Triplet Finder Optimizations

    Sector-row testing

    Thicken the track; shrink the sector-row layer to a line

    Find the intersection

    [Figure: "Sector-Row Testing", showing a track and a sector row; from Andrew V. Adinetz, 11.12.2013, slide 12]
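One way to read "thicken track; shrink sector row to line; find intersection" is as an overlap test between a ring (the track circle widened by a tolerance) and a line segment (the sector row). The sketch below implements that reading only; it is not taken from the talk, and the function names, the segment representation and the half-width parameter are assumptions.

```python
import math


def dist_point_segment(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        t = 0.0
    else:
        # Projection parameter, clamped so the foot point stays on the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def track_crosses_sector_row(center, radius, half_width, seg_a, seg_b):
    """True if the thickened circle overlaps the sector-row segment.

    The distance from the circle centre varies continuously along the segment,
    so the ring [radius - half_width, radius + half_width] is hit exactly when
    the nearest point is not beyond it and the farthest point is not inside it.
    """
    nearest = dist_point_segment(center, seg_a, seg_b)
    farthest = max(
        math.hypot(seg_a[0] - center[0], seg_a[1] - center[1]),
        math.hypot(seg_b[0] - center[0], seg_b[1] - center[1]),
    )
    return nearest <= radius + half_width and farthest >= radius - half_width


# Hypothetical track: circle around the origin with radius 5, thickened by 0.5.
crossing = track_crosses_sector_row((0.0, 0.0), 5.0, 0.5, (4.0, -1.0), (6.0, 1.0))
inside = track_crosses_sector_row((0.0, 0.0), 5.0, 0.5, (0.0, 1.0), (1.0, 0.0))
outside = track_crosses_sector_row((0.0, 0.0), 5.0, 0.5, (7.0, 0.0), (8.0, 0.0))
```

A test like this lets whole sector rows be skipped before individual hits are compared with a track candidate.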

  • Triplet Finder Kernel Launch Strategies

    Joined Kernel (JK): slowest

    High number of registers → low occupancy

    Dynamic Parallelism (DP) / Host Streams (HS): comparable performance

    Performance: HS is faster for a small number of processed hits, DP is faster for > 45000 hits; HS stagnates there, while DP continues rising

    Limiting factor: high number of required kernel calls

    Kernel launch latency

    Memcopy

    HS is more affected by this, because of

    More PCI-E transfers (launch configurations for kernels)

    Less launch throughput, so kernel launch latency becomes more important

    False dependencies of launched kernels

    A single CPU thread handles all CUDA streams (multi-threading is possible, but the synchronization overhead is too high for good performance)

    Grid scheduling is done in hardware (Grid Management Unit) (DP: software)

    False dependencies occur when N(streams) > N(device connections) = 32 (compute capability 3.5)

  • Triplet Finder Host Stream Connections

    [Plot] Preliminary (in publication)

  • Triplet Finder Bunch Sizes

    [Plot] Preliminary (in publication)

  • Forschungszentrum Jülich & Me

    [Map of Germany: Berlin, Munich, Cologne, Jülich]

    Forschungszentrum Jülich

    Research center, founded 1956; federal center

    Budget: 730 million USD/year

    5300 employees, thereof 1700 scientists (600 PhD students)

    Topics: Health, Energy, Environment

    Physics; Supercomputing

    Many large-scale facilities

    Me

    Diploma in physics from RWTH Aachen University (CMS experiment)

    PhD researcher since 2011: GPU Online Tracking for PANDA