TRANSCRIPT
Mathematical Institute, University of Oxford
Numerical simulations using approximate random numbers
Oliver [email protected]
Thursday 6th February 2020
Supervisors: Prof. Michael Giles (Oxford), Dr Christopher Goodyer (Arm)
EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling
Some HPC dogmas
Of course my code will run faster if it’s vectorised.
No
Of course my code will run faster if I work in a lower precision.
No
Well my compiler should be clever enough to decide what’s best.
No
Half-precision is useless for anyone who wants accurate answers!
No
You can’t design code that performs well on arbitrary vector lengths.
No
Overview
1 What is an approximate random variable?
2 When can we use them?
3 Do we still get the right answer?
4 What precisions can we use, and when?
5 Conclusions
Stochastic simulations
Weather: Tomorrow’s temperature given today’s weather.
Finance: Value of contract given today’s prices.
Traffic: Rush hour traffic given morning congestion.
Health: Risk of later secondary condition given current health.
Technology: Expected number of online visitors given current search trends.
Mathematical simulations
Probability density $f(\cdot)$, cumulative distribution function $F(\cdot)$, and inverse CDF $F^{-1}(u)$.
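Sampling through the inverse CDF can be sketched as follows. This is a minimal illustration, assuming a standard Gaussian target and Python's standard library `statistics.NormalDist`; the helper name `sample_by_inversion` is hypothetical and not from the talk.

```python
import random
from statistics import NormalDist


def sample_by_inversion(inverse_cdf, n, seed=0):
    """Draw n samples X = F^{-1}(U), with U uniform on the open interval (0, 1)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        u = rng.random()
        if u > 0.0:  # the inverse CDF is only defined on (0, 1)
            samples.append(inverse_cdf(u))
    return samples


# Standard Gaussian example: F^{-1} is the inverse Gaussian CDF.
gaussian_inverse_cdf = NormalDist().inv_cdf
z = sample_by_inversion(gaussian_inverse_cdf, 50_000)

sample_mean = sum(z) / len(z)
sample_variance = sum((x - sample_mean) ** 2 for x in z) / (len(z) - 1)
```

The sample mean and variance should be close to 0 and 1 respectively. The cost is dominated by the call to the inverse CDF, which is the part the approximations in this talk replace.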
The inverse Gaussian CDF $\Phi^{-1}(\cdot)$
[Figure: $\Phi^{-1}(x)$ against $x \in (0, 1)$; vertical axis from $-5$ to $5$.]
Tails kill performance in SVE and FP16
[Figure: $\Phi^{-1}(x)$ against $x \in (0, 1)$; vertical axis from $-5$ to $5$.]
Piecewise constant approximation
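[Figure: Uniform piecewise constant approximation of $\Phi^{-1}(\cdot)$, showing $\Phi^{-1}(x)$ and $\tilde{\Phi}^{-1}(x)$ for $x \in (0, 1)$.]
$\mathbb{V}(Z - \tilde{Z}) \approx 1.6 \times 10^{-4}$ for 1024 partitions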
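A lookup table of this kind can be sketched as below. This is an illustration under stated assumptions, not the talk's implementation: each bin is assumed to store the mean of $\Phi^{-1}$ over that bin, and the names `build_table` and `approx_inverse_cdf` are hypothetical. The bin means use the identity $\int_a^b \Phi^{-1}(u)\,du = \varphi(\Phi^{-1}(a)) - \varphi(\Phi^{-1}(b))$, where $\varphi$ is the Gaussian density.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _pdf_at_quantile(u):
    """phi(Phi^{-1}(u)), extended by its limit 0 at u = 0 and u = 1."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(u))


def build_table(n_bins=1024):
    """Entry k is the mean of Phi^{-1} over the bin [k/n_bins, (k+1)/n_bins)."""
    return [
        n_bins * (_pdf_at_quantile(k / n_bins) - _pdf_at_quantile((k + 1) / n_bins))
        for k in range(n_bins)
    ]


def approx_inverse_cdf(u, table):
    """Piecewise constant approximation: a single table lookup."""
    k = min(int(u * len(table)), len(table) - 1)
    return table[k]


table = build_table(1024)

# With bin means, E[Z * Z~] = E[Z~^2] and E[Z - Z~] = 0,
# so V(Z - Z~) = E[Z^2] - E[Z~^2] = 1 - E[Z~^2].
error_variance = 1.0 - sum(v * v for v in table) / len(table)
```

Evaluating `approx_inverse_cdf` needs one multiplication, one integer conversion and one memory read, with no branching on whether the input lies in the tails.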
Piecewise linear dyadic approximation
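[Figure: Piecewise linear dyadic approximation of $\Phi^{-1}(\cdot)$, showing $\Phi^{-1}(x)$ and $\tilde{\Phi}^{-1}(x)$ for $x \in (0, 1)$.]
$\mathbb{V}(Z - \tilde{Z}) \approx 4.3 \times 10^{-5}$ for 31 intervals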
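One way to build a dyadic piecewise linear approximation is sketched below. Every design choice here is an assumption made for illustration: 16 dyadic intervals on each half of $(0, 1)$, a least-squares line on each interval, symmetry about $1/2$, and interval lookup through the binary exponent returned by `math.frexp`. The interval layout therefore differs from the 31 intervals quoted above, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
N_HALF = 16  # assumed number of dyadic intervals on (0, 1/2]


def _moments(u):
    """Integrals over (0, u] of Phi^{-1}(t) dt and of t * Phi^{-1}(t) dt."""
    if u <= 0.0:
        return 0.0, 0.0
    z = STD_NORMAL.inv_cdf(u)
    pdf = STD_NORMAL.pdf(z)
    # Phi(sqrt(2) z) / (2 sqrt(pi)), written with erfc to stay accurate in the tail.
    return -pdf, -u * pdf + 0.25 * math.erfc(-z) / math.sqrt(math.pi)


def fit(a, b):
    """Least-squares line c0 + c1 * (u - mid) for Phi^{-1} on (a, b)."""
    (a0, a1), (b0, b1) = _moments(a), _moments(b)
    width, mid = b - a, 0.5 * (a + b)
    i0, i1 = b0 - a0, b1 - a1
    return i0 / width, 12.0 * (i1 - mid * i0) / width**3, mid, width


def build_coefficients(n_half=N_HALF):
    """Interval k is [2^-(k+2), 2^-(k+1)); the last one is (0, 2^-n_half)."""
    edges = [2.0 ** -(k + 1) for k in range(n_half)] + [0.0]
    return [fit(edges[k + 1], edges[k]) for k in range(n_half)]


def approx_inverse_cdf(u, coeffs):
    """Evaluate the approximation for u in (0, 1), using symmetry about 1/2."""
    v = min(u, 1.0 - u)
    last = len(coeffs) - 1
    # The binary exponent of v identifies its dyadic interval.
    k = max(0, min(-math.frexp(v)[1] - 1, last)) if v > 0.0 else last
    c0, c1, mid, _ = coeffs[k]
    value = c0 + c1 * (v - mid)
    return value if u <= 0.5 else -value


coeffs = build_coefficients()
# Each fit is an orthogonal projection, so V(Z - Z~) = 1 - E[Z~^2].
error_variance = 1.0 - 2.0 * sum(
    w * c0 * c0 + c1 * c1 * w**3 / 12.0 for c0, c1, _, w in coeffs
)
```

The dyadic layout places narrow intervals where $\Phi^{-1}$ is steepest, near 0 and 1, so a few dozen intervals can give an error variance comparable to that of a much larger uniform table.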
Polynomial approximation
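[Figure: Cubic approximation of $\Phi^{-1}(\cdot)$, showing $\Phi^{-1}(x)$ and $\tilde{\Phi}^{-1}(x)$ for $x \in (0, 1)$.]
$\mathbb{V}(Z - \tilde{Z}) \approx 2.6 \times 10^{-3}$ for a 7th order polynomial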
Speed of execution
Average speed (clock cycles)
Intel MKL              8
Lookup table           2
GNU GSL              128
Cephes                83
NAG                  117
GSL (optimised)       17
Giles [1] (NVIDIA?)   12
Accuracy and precision
Multilevel Monte Carlo
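$\mathbb{E}(P) \approx \mathbb{E}(\hat{P}_{\text{Accurate}}) = \mathbb{E}(\hat{P}_{\text{Crude}}) + \mathbb{E}(\hat{P}_{\text{Accurate}} - \hat{P}_{\text{Crude}})$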
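The two-term splitting can be tried out numerically. The sketch below is a minimal illustration under stated assumptions: the payoff $\max(Z, 0)$, the midpoint lookup table used as the crude proxy, the sample counts and all function names are hypothetical choices, not taken from the talk.

```python
import random
from statistics import NormalDist

INV_CDF = NormalDist().inv_cdf
N_BINS = 1024
# Assumed crude proxy: the exact inverse CDF evaluated at each bin midpoint.
TABLE = [INV_CDF((k + 0.5) / N_BINS) for k in range(N_BINS)]


def payoff(z):
    """Hypothetical payoff; E[max(Z, 0)] = 1/sqrt(2*pi) for standard normal Z."""
    return max(z, 0.0)


def crude(u):
    """Cheap sample: one table lookup."""
    return payoff(TABLE[min(int(u * N_BINS), N_BINS - 1)])


def accurate(u):
    """Expensive sample: full inverse CDF evaluation."""
    return payoff(INV_CDF(u))


def uniform(rng):
    """Uniform draw on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def two_level_estimate(n_crude, n_correction, seed=0):
    """Estimate E[P] as E[P_crude] + E[P_accurate - P_crude]."""
    rng = random.Random(seed)
    crude_mean = sum(crude(uniform(rng)) for _ in range(n_crude)) / n_crude
    # Both terms of the correction share the same uniform, so the difference
    # has a small variance and needs far fewer samples.
    correction = 0.0
    for _ in range(n_correction):
        u = uniform(rng)
        correction += accurate(u) - crude(u)
    return crude_mean + correction / n_correction


estimate = two_level_estimate(n_crude=200_000, n_correction=2_000)
```

Most of the samples use only the cheap proxy, while the small number of coupled correction samples removes the bias that the proxy introduces.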
Nested multilevel Monte Carlo I
Monte Carlo (MC)
Multilevel Monte Carlo (MLMC)
Quantised multilevel Monte Carlo (QMLMC)
Reduced precision quantised multilevel Monte Carlo (RPQMLMC)
Monte Carlo
Temporal discretisation
Quantised distribution
Reduced (half) precision
Discretisation, quantisation, and roundoff
$\text{Correction} = \text{Discretisation} \times \text{Quantisation} + \underbrace{\text{Roundoff}}_{1 / \text{Discretisation}}$
When can’t we use half-precision?
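[Figure: Reduced precision quantised multilevel Monte Carlo. Variance, on a vertical axis from $2^{-50}$ to $2^{0}$, against level $l$ with $N = 2^l$, for $l$ from 0 to 30. Curves:
$\mathbb{V}(\hat{X}^f_{64} - \hat{X}^c_{64} - \tilde{X}^f_{16} + \tilde{X}^c_{16})$,
$\mathbb{V}(\hat{X}^f_{64} - \hat{X}^c_{64} - \tilde{X}^f_{32} + \tilde{X}^c_{32})$,
$\mathbb{V}(\hat{X}^f_{64} - \hat{X}^c_{64} - \tilde{X}^f_{64} + \tilde{X}^c_{64})$.]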
Numerical results
Running our QMLMC estimator with a single quantisation level (1024 bins) for finely discretised paths gives the following average time per path:
Relative accuracy $\varepsilon = 10^{-3}$

Time per path ($10^{-4}$ s)   Memory intensive   Work intensive
Original MLMC                 24.8               17.0
Quantised MLMC                13.2               3.99
Speed-up                      ×2                 ×4

Level        Paths
Original     1 920 000
Quantised    1 980 000
Correction   14 000
Conclusions
Errors from using a cheap proxy distribution can be quantified and controlled by the introduction of a nested multilevel Monte Carlo framework.
There is a degree of freedom in the construction of this proxy. Put the results in the low-level cache, or use a very cheap (piecewise) polynomial.
The resultant approximations converge.
The approximate schemes scale as we move to wider vectors (SVE) and lower precisions (FP16), benefiting from greater SIMD parallelisation and faster FP16 calculations.
Half-precision can be used in the coarsest calculations (which consume most of the computer time). (BFloat16 may require Kahan summation...)
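The remark on Kahan summation can be illustrated with a small experiment. This sketch makes two assumptions: IEEE half precision, emulated by rounding through the `struct` module's `"e"` format, stands in for BFloat16 (which carries fewer significand bits, so the effect would appear sooner), and the input data are an arbitrary example. The function names are hypothetical.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def naive_sum_half(values):
    """Accumulate in half precision, rounding after every addition."""
    total = 0.0
    for v in values:
        total = to_half(total + to_half(v))
    return total


def kahan_sum_half(values):
    """Kahan (compensated) summation, with every operation rounded to half precision."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = to_half(to_half(v) - compensation)
        t = to_half(total + y)
        compensation = to_half(to_half(t - total) - y)
        total = t
    return total


values = [0.001] * 10_000  # the exact sum is 10
naive = naive_sum_half(values)
kahan = kahan_sum_half(values)
```

The naive accumulation stagnates once each increment falls below half a unit in the last place of the running total, whereas the compensated sum stays close to 10 at the cost of three extra additions per term.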
References
[1] Mike Giles. Approximating the erfinv function. In GPU Computing Gems, Jade Edition, volume 2, pages 109–116. Elsevier, 2011.