Parallelisation of Random Number Generation in PLACET Approaches of parallelisation in PLACET Martin Blaha University of Vienna AT CERN 25.09.2013

Parallelisation of Random Number Generation in PLACET

Approaches of parallelisation in PLACET

Martin Blaha, University of Vienna, AT

CERN, 25.09.2013

Additions to centralised RNG

• Tcl command RandomReset
  o Sets seeds for all streams individually, e.g. RandomReset -stream Misalignments -seed 1234
  o Sets default seeds (= reset) if called without argument
  o Sets generators
  o Replaces redundancy in Tcl commands that set seeds (e.g. Groundmotion_init)
  o Help that lists all streams

• Benchmarks on non-parallelised code
  o GSL causes a slowdown of max. 3%, depending on the generator

Motivation for parallel execution

Simulation runtimes are slow!

“Low-performance” functions:

● SBEND

● QUADRUPOLE

● MULTIPOLE

● ELEMENT

→ they all call RNGs through synchrotron radiation emission

Profile by Yngve Levinsen, Feb. 2013

Parallel Random Number Generation

Problems

requesting random numbers from a sequential stream for parallel use is uncontrollable

needed: controllable and reproducible generation

GSL random number generators do not support parallel generation by themselves

Methods for parallel random number generation

● centralized generation

● replicated generation

● distributed generation

● existing libraries

Centralized RNG

One generator produces all numbers

Advantages:

only one RNG with good sequence

easy implementation

Disadvantages:

race conditions occur

fair play not guaranteed, or the programme crashes (not stable)

slow if queueing (even slower than single thread)
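
The centralized approach can be sketched as follows, in Python for brevity (PLACET itself is C++ using GSL generators). This is only an illustration under stated assumptions: the function name `run_centralized`, the seed and the thread counts are hypothetical. One shared generator is guarded by a lock, which avoids race conditions but makes every draw queue behind the others; which thread receives which number still depends on scheduling.

```python
import random
import threading


def run_centralized(seed=1234, n_threads=4, n_draws=1000):
    """Let several threads draw from ONE shared generator (hypothetical sketch)."""
    rng = random.Random(seed)      # the single central generator
    lock = threading.Lock()        # serialises access to avoid race conditions
    out = [None] * n_threads       # one result slot per thread

    def worker(index):
        chunk = []
        for _ in range(n_draws):
            with lock:             # threads queue here: this is the bottleneck
                chunk.append(rng.random())
        out[index] = chunk

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Taken together, the threads consume exactly the leading part of the sequential stream, but the split between threads is not guaranteed to be the same from run to run.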

Replicated RNG

Initial RNG is copied for each thread

Advantages:

more efficient

easy implementation

Disadvantage:

can suffer from correlations between threads
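
A minimal Python sketch of why replication can lead to correlations (illustrative only; `replicate` and the seed are hypothetical names and values, not PLACET code). If the initial generator is copied without any further reseeding or advancing, every copy starts from the same internal state, which is the extreme case of correlated streams.

```python
import random


def replicate(seed, n_threads):
    """Copy one initial generator for each thread (hypothetical sketch)."""
    initial = random.Random(seed)
    copies = []
    for _ in range(n_threads):
        rng = random.Random()
        rng.setstate(initial.getstate())   # exact copy of the internal state
        copies.append(rng)
    return copies


copies = replicate(1234, 3)
streams = [[rng.random() for _ in range(5)] for rng in copies]
# Every copy starts from the same state, so the streams coincide exactly.
identical = all(stream == streams[0] for stream in streams)
```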

Distributed RNG

Each thread has its own generator

Advantages:

efficient - each thread can work stand alone

thread-safe

reproducible

Disadvantage:

can suffer from correlations

Existing Libraries

SPRNG - Florida State University

hard to find “good” documentation on how to combine it with parallel code, e.g. OpenMP

PRAND

for CUDA environment on GPU and CPU

good documentation on RNGs in general

Disadvantage: yet another library

Distributed RNG

Summary:

distributed generation is considered the best fit for our needs

Common methods known to produce satisfactory results:

1. Random Tree Method

2. Block Splitting

3. Leapfrog Method

Random Tree Method

• Global RNG for seeding

• Standalone RNG per thread

• Reproducible for known number of threads

New Tcl command to set the number of threads

→ only plays fair for the same number of threads, not for dynamic thread assignment
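
The random tree idea can be sketched in a few lines of Python (a hedged illustration, not the PLACET implementation; `make_thread_rngs`, `simulate` and the static assignment `i % n_threads` are assumptions made for the example). A global generator is used only to seed one standalone generator per thread. The loop below runs sequentially and merely emulates which thread would handle which particle.

```python
import random


def make_thread_rngs(global_seed, n_threads):
    """Seed one standalone generator per thread from a global seeding RNG."""
    seeder = random.Random(global_seed)
    return [random.Random(seeder.getrandbits(64)) for _ in range(n_threads)]


def simulate(global_seed, n_threads, n_particles):
    """Emulate a static assignment: particle i is handled by thread i % n_threads."""
    rngs = make_thread_rngs(global_seed, n_threads)
    return [rngs[i % n_threads].random() for i in range(n_particles)]


same_a = simulate(global_seed=1, n_threads=4, n_particles=20)
same_b = simulate(global_seed=1, n_threads=4, n_particles=20)
other = simulate(global_seed=1, n_threads=2, n_particles=20)
```

Here `same_a` and `same_b` agree because seed and thread count are identical, while `other` differs because the particles are distributed over a different number of generators.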

Block Splitting

Splits a sequence of random numbers (RNs) into blocks

Advantages:

no overlap in random numbers

plays fair

Disadvantages:

allocates a huge array of numbers

number of RNs has to be known in advance
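
A minimal Python sketch of block splitting, under the assumption that each thread needs the same, known number of random numbers (`block_split` and its parameters are hypothetical names). The whole sequence is generated up front and cut into consecutive, non-overlapping blocks, which makes both disadvantages visible: the array must be allocated in full, and its size must be fixed in advance.

```python
import random


def block_split(seed, n_threads, block_size):
    """Pre-generate the sequence and hand each thread one contiguous block."""
    rng = random.Random(seed)
    sequence = [rng.random() for _ in range(n_threads * block_size)]  # full array
    return [sequence[t * block_size:(t + 1) * block_size] for t in range(n_threads)]


blocks = block_split(seed=1234, n_threads=4, block_size=5)
```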

Leapfrog Method

Distributes a sequence of RNs over several threads one by one

Advantages:

number of RNs need not be known in advance

guarantees no overlap of RNs

plays fair, but calls can still be permuted

Disadvantage:

costly call of random numbers
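
A hedged Python sketch of the leapfrog method (`LeapfrogStream` is a hypothetical class, not part of PLACET or GSL). Every thread owns a generator started from the same seed; thread t takes elements t, t + N, t + 2N, ... of the common sequence for N threads. With a generator that has no cheap skip-ahead, the numbers belonging to other threads have to be drawn and thrown away, which is where the cost of each call comes from.

```python
import random


class LeapfrogStream:
    """Thread-local view on every n_threads-th element of one sequence."""

    def __init__(self, seed, thread_id, n_threads):
        self._rng = random.Random(seed)
        self._stride = n_threads
        for _ in range(thread_id):           # advance to this thread's first element
            self._rng.random()

    def next(self):
        value = self._rng.random()
        for _ in range(self._stride - 1):    # discard numbers owned by other threads
            self._rng.random()
        return value


streams = [LeapfrogStream(seed=1234, thread_id=t, n_threads=3) for t in range(3)]
draws = [[s.next() for _ in range(4)] for s in streams]
```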

Block splitting vs. Leapfrog

Block splitting and leapfrog play fair with dynamic thread assignment

Problem: implementation in a distributed, non-centralised way

Period per thread is (period of RNG) / (number of threads)

Testing parallel RNG methods

SPEEDUP: up to -33.3% in runtime for the random tree method

only overhead for nosynrad and small numbers of particles

SLOWDOWN: up to +120% in runtime for the leapfrog method

due to drawing more numbers than needed

Testing via test-bds-track for 300 000 particles, with quadrupoles and multipoles

Preparation

Tool for parallelisation: OpenMP

easy implementation

control of variable scope, assignment schedule, critical sections

Preparation: Centralising synrad functions

2 functions calculate synrad emission:

synrad.cc

photon_spectrum.cc

Centralised for easier and reproducible use of parallel RNG

synrad.cc has been removed

Tested via test-bds-track for 3e5 particles, same outcome

Implementation of new class

New class PARALLEL_RNG

Inherits all methods from RANDOM_NEW

Always initialises the parallel RNG on the max. number of available threads

New Tcl command ParallelThreads -num val to choose the number of threads

The RNG stream Radiation now runs completely in parallel by default

Testing – BDS tracking

Covariance Matrix of test-bds-track

Testing via test-bds-track for 300 000 particles, with quadrupoles and multipoles

Testing – CLIC beam tracking

[Figure panels: beam tracking with no correction; beam tracking with simple correction]

Testing via test-clic-3 for 3500 machines

Time Profile

Total runtime on 32 cores: 27 sec

Total runtime on 1 core: 1 min 21 sec

Total runtime on PLACET: 58 sec

BDS tracking:

PLACET: 39 sec

PLACET-NEW: ~9 sec

BDS TRACKING 3.5 times faster

Time profile for BDS tracking for 300 000 particles

2x Intel Xeon E5-2650 2.00 GHz 8-Core (16 w/ hyper-threading) (95W 20MB 2.8GHz Turbo Sandy Bridge EP)

Profiling

BDS:

Sbend, elements, multipole, quadrupole are still the most time-consuming functions

Linac:

OMP library causes slowdown in simple-correction routines (e.g. test-clic-4)

76% of time consumption caused by OpenMP in wait_sleep

It was necessary to find a compromise!

Conclusion

BDS runs ~30 % faster (total runtime)

CLIC 4 runs ~13 % faster

Compared to current PLACET in the trunk

OpenMP is a quick and easy way to parallelise existing functions.

Future Plan

• Need to understand the overhead when running sequentially

• Benchmark performance of quick functions e.g. dipoles, drifts, step-in, BPMs

• Adjust automatically to current configuration

• Write technical/user documentation

• Merge into trunk
