
Mathematical Institute, University of Oxford

Numerical simulations using approximate random numbers

Oliver [email protected]

Thursday 6th February 2020

Supervisors: Prof. Michael Giles (Oxford), Dr Christopher Goodyer (Arm)

EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling

Some HPC dogmas

Of course my code will run faster if it’s vectorised.

No

Of course my code will run faster if I work in a lower precision.

No

Well my compiler should be clever enough to decide what’s best.

No

Half-precision is useless for anyone who wants accurate answers!

No

You can’t design code that performs well on arbitrary vector lengths.

No


Overview

1 What is an approximate random variable?

2 When can we use them?

3 Do we still get the right answer?

4 What precisions can we use, and when?

5 Conclusions


Stochastic simulations

Weather: Tomorrow’s temperature given today’s weather.
Finance: Value of contract given today’s prices.
Traffic: Rush hour traffic given morning congestion.
Health: Risk of later secondary condition given current health.
Technology: Expected number of online visitors given current search trends.


Mathematical simulations

$f(\cdot)$

$F(\cdot)$

$F^{-1}(u)$
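The label $F^{-1}(u)$ points at inverse transform sampling: a uniform random number $u$ is pushed through the inverse CDF to obtain a sample from the target distribution. A minimal Python sketch follows, assuming an exponential distribution purely as a worked example; the distribution and the names `inverse_cdf` and `sample` are illustrative and do not come from the slides.

```python
import math
import random


def inverse_cdf(u, rate=1.0):
    """F^{-1}(u) for the exponential distribution, F(x) = 1 - exp(-rate * x)."""
    return -math.log1p(-u) / rate


def sample(n, rate=1.0, seed=0):
    """Draw n samples by pushing uniforms on [0, 1) through F^{-1}."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random(), rate) for _ in range(n)]


if __name__ == "__main__":
    xs = sample(100_000, rate=2.0)
    # The sample mean should be close to the exact mean 1 / rate.
    print(sum(xs) / len(xs))
```

For the Gaussian case the same recipe applies with $\Phi^{-1}$ in place of the closed-form inverse, which is where the cost of evaluating the inverse CDF becomes the bottleneck.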


The inverse Gaussian CDF $\Phi^{-1}(\cdot)$

[Figure: $\Phi^{-1}(x)$ against $x \in [0, 1]$; vertical axis from $-5$ to $5$.]


Tails kill performance in SVE and FP16

[Figure: $\Phi^{-1}(x)$ against $x \in [0, 1]$; vertical axis from $-5$ to $5$.]


Piecewise constant approximation

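[Figure: Uniform piecewise constant approximation of $\Phi^{-1}(\cdot)$; curves $\Phi^{-1}(x)$ and $\tilde{\Phi}^{-1}(x)$ against $x \in [0, 1]$.]

$\mathbb{V}(Z - \tilde{Z}) \approx 1.6 \times 10^{-4}$ for 1024 partitions.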
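A uniform piecewise constant approximation amounts to a lookup table indexed by the bin that $u$ falls in. The Python sketch below assumes each bin stores the exact inverse CDF at the bin midpoint; the slides do not say how the constants are chosen, so the computed error need not match the variance quoted above. The names `build_table`, `approx_inv_cdf` and `mean_square_error` are hypothetical.

```python
from statistics import NormalDist

inv_cdf_exact = NormalDist().inv_cdf  # double precision reference


def build_table(n_bins=1024):
    """One constant per bin [k/n, (k+1)/n): the exact value at the bin midpoint (assumption)."""
    return [inv_cdf_exact((k + 0.5) / n_bins) for k in range(n_bins)]


def approx_inv_cdf(u, table):
    """Piecewise constant approximation: a single table lookup."""
    k = min(int(u * len(table)), len(table) - 1)
    return table[k]


def mean_square_error(table, sub=32):
    """Midpoint-rule estimate of E[(Z - Z~)^2] for U uniform on (0, 1)."""
    m = len(table) * sub
    total = 0.0
    for j in range(m):
        u = (j + 0.5) / m
        total += (inv_cdf_exact(u) - approx_inv_cdf(u, table)) ** 2
    return total / m


if __name__ == "__main__":
    table = build_table(1024)
    print(approx_inv_cdf(0.975, table), inv_cdf_exact(0.975))
    print(mean_square_error(table))
```

The lookup has no tail-specific branch, which is what makes this form attractive for vector hardware: every lane does the same work regardless of where its uniform lands.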


Piecewise linear dyadic approximation

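[Figure: Piecewise linear dyadic approximation of $\Phi^{-1}(\cdot)$; curves $\Phi^{-1}(x)$ and $\tilde{\Phi}^{-1}(x)$ against $x \in [0, 1]$.]

$\mathbb{V}(Z - \tilde{Z}) \approx 4.3 \times 10^{-5}$ for 31 intervals.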
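With dyadic intervals, the interval index can be read off the floating-point exponent of $u$, so narrow intervals near the tails cost no more to locate than wide ones near the centre. The Python sketch below makes several assumptions not stated in the slides: the intervals are $[2^{-(k+2)}, 2^{-(k+1)}]$ on the lower half and mirrored on the upper half, each line interpolates the exact inverse CDF at the interval end points (rather than being an optimal fit), and the last interval simply extends its line down to 0. The function names are hypothetical.

```python
import math
from statistics import NormalDist

inv_cdf_exact = NormalDist().inv_cdf  # double precision reference


def build_coefficients(n_intervals=16):
    """(intercept, slope) for each dyadic interval [2^-(k+2), 2^-(k+1)] of (0, 1/2]."""
    coeffs = []
    for k in range(n_intervals):
        lo, hi = 2.0 ** -(k + 2), 2.0 ** -(k + 1)
        slope = (inv_cdf_exact(hi) - inv_cdf_exact(lo)) / (hi - lo)
        coeffs.append((inv_cdf_exact(lo) - slope * lo, slope))
    return coeffs


def approx_inv_cdf(u, coeffs):
    """Piecewise linear approximation; the interval index comes from the exponent."""
    w = u if u < 0.5 else 1.0 - u  # fold onto the lower half by symmetry
    if w > 0.0:
        exponent = math.frexp(w)[1]  # w = m * 2**exponent with 0.5 <= m < 1
        k = min(max(-exponent - 1, 0), len(coeffs) - 1)
    else:
        k = len(coeffs) - 1  # u is exactly 0 or 1: use the outermost line
    intercept, slope = coeffs[k]
    z = intercept + slope * w
    return z if u < 0.5 else -z


if __name__ == "__main__":
    coeffs = build_coefficients(16)
    for u in (0.001, 0.1, 0.3, 0.5, 0.9):
        print(u, approx_inv_cdf(u, coeffs), inv_cdf_exact(u))
```

In this sketch, 16 intervals per half give 31 distinct pieces, because the two central lines both pass through the origin of the symmetric curve and therefore coincide.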


Polynomial approximation

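[Figure: Cubic approximation of $\Phi^{-1}(\cdot)$; curves $\Phi^{-1}(x)$ and $\tilde{\Phi}^{-1}(x)$ against $x \in [0, 1]$.]

$\mathbb{V}(Z - \tilde{Z}) \approx 2.6 \times 10^{-3}$ for a 7th order polynomial.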


Speed of execution

Implementation          Average speed (clock cycles)
Intel MKL               8
Lookup table            2
GNU GSL                 128
Cephes                  83
NAG                     117
GSL (optimised)         17
Giles [1] (NVIDIA?)     12


Accuracy and precision


Multilevel Monte Carlo

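$$\mathbb{E}(P) \approx \mathbb{E}\big(\hat{P}_{\text{Accurate}}\big) = \mathbb{E}\big(\hat{P}_{\text{Crude}}\big) + \mathbb{E}\big(\hat{P}_{\text{Accurate}} - \hat{P}_{\text{Crude}}\big)$$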
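The telescoping identity suggests estimating the crude term with many cheap samples and the correction term with far fewer expensive ones, provided the two payoffs in the correction are driven by the same uniform so that their difference has small variance. The Python sketch below is a toy two-level estimator under assumptions of my own choosing: the payoff is $Z^2$ (so the exact answer is 1), the crude normal is a 1024-bin midpoint lookup table, and the sample counts are fixed by hand rather than optimised.

```python
import random
from statistics import NormalDist

inv_cdf_exact = NormalDist().inv_cdf
N_BINS = 1024
TABLE = [inv_cdf_exact((k + 0.5) / N_BINS) for k in range(N_BINS)]


def crude_normal(u):
    """Cheap approximate normal: piecewise constant lookup (assumed midpoint values)."""
    return TABLE[min(int(u * N_BINS), N_BINS - 1)]


def payoff(z):
    """Toy quantity of interest; E[Z^2] = 1 exactly."""
    return z * z


def two_level_estimate(n_crude, n_correction, seed=0):
    rng = random.Random(seed)
    # Term 1: many cheap samples of the crude payoff.
    crude = sum(payoff(crude_normal(rng.random())) for _ in range(n_crude)) / n_crude
    # Term 2: few samples of the correction; both payoffs share the same uniform.
    correction = 0.0
    for _ in range(n_correction):
        u = max(rng.random(), 2.0 ** -53)  # keep u inside (0, 1) for the exact inverse
        correction += payoff(inv_cdf_exact(u)) - payoff(crude_normal(u))
    return crude + correction / n_correction


if __name__ == "__main__":
    print(two_level_estimate(50_000, 2_000))
```

Only the second loop calls the exact inverse CDF, so the expensive function is evaluated on a small fraction of the samples while the estimator remains unbiased for the accurate payoff.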


Nested multilevel Monte Carlo I

Monte Carlo (MC)
Multilevel Monte Carlo (MLMC)
Quantised multilevel Monte Carlo (QMLMC)
Reduced precision quantised multilevel Monte Carlo (RPQMLMC)

Monte Carlo
Temporal discretisation
Quantised distribution
Reduced (half) precision


Discretisation, quantisation, and roundoff

$$\text{Correction} = \text{Discretisation} \times \text{Quantisation} + \underbrace{\text{Roundoff}}_{\frac{1}{\text{Discretisation}}}$$
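One way to see why roundoff matters more as the discretisation becomes finer is to accumulate many small increments in half precision: once an increment drops below half the spacing of the running total, it is rounded away entirely. The Python sketch below emulates IEEE 754 half precision with the standard library's `struct` format code `e`; it is an illustration of the effect, not the experiment behind the slides, and the function names are hypothetical.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_precision_sum(n_steps):
    """Add n_steps increments of size 1/n_steps, rounding to half precision each step."""
    dx = to_half(1.0 / n_steps)
    total = 0.0
    for _ in range(n_steps):
        total = to_half(total + dx)
    return total


if __name__ == "__main__":
    # The exact answer is 1.0 at every level; finer levels lose increments.
    for level in (4, 8, 10, 12, 14):
        print(level, half_precision_sum(2 ** level))
```

The sum is exact for coarse levels and stagnates well short of 1 for fine ones, which is the behaviour that limits half precision to the coarser levels of the hierarchy.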


When can’t we use half-precision?

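[Figure: Reduced precision quantised multilevel Monte Carlo. Horizontal axis: level $l$ with $N = 2^l$, from 0 to 30; vertical axis ticks from $2^{-50}$ to $2^{0}$. Curves:
$\mathbb{V}\big(\hat{X}^f_{64} - \hat{X}^c_{64} - \tilde{X}^f_{16} + \tilde{X}^c_{16}\big)$,
$\mathbb{V}\big(\hat{X}^f_{64} - \hat{X}^c_{64} - \tilde{X}^f_{32} + \tilde{X}^c_{32}\big)$,
$\mathbb{V}\big(\hat{X}^f_{64} - \hat{X}^c_{64} - \tilde{X}^f_{64} + \tilde{X}^c_{64}\big)$.]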


Numerical results

Running our QMLMC estimator with a single quantisation level (1024 bins) for finely discretised paths gives the following average time per path:

Relative accuracy $\varepsilon = 10^{-3}$

Time per path ($10^{-4}$ s)   Memory intensive   Work intensive
Original MLMC                24.8               17.0
Quantised MLMC               13.2               3.99

Level         Paths
Original      1 920 000
Quantised     1 980 000
Correction    14 000


Speed-up of Quantised MLMC over Original MLMC: approximately ×2 (memory intensive) and ×4 (work intensive).




Conclusions

Errors from using a cheap proxy distribution can be quantified and controlled by the introduction of a nested multilevel Monte Carlo framework.

There is a degree of freedom in the construction of this proxy. Put the results in the low level cache, or use a very cheap (piecewise) polynomial.

The resultant approximations converge.

The approximate schemes scale as we move to wider vectors (SVE) and lower precisions (FP16), benefiting from greater SIMD parallelisation and faster FP16 calculations.

Half-precision can be used in the coarsest calculations (which consume most of the computer time). (BFloat16 may require Kahan summation...)
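Kahan (compensated) summation, mentioned above for BFloat16, carries the rounding error of each addition forward into the next one. The Python sketch below shows the technique in double precision, where the effect is small but measurable; in a format with as few significand bits as BFloat16 the loss from naive accumulation would be far larger. The function names are illustrative.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0
    for x in values:
        y = x - compensation            # apply the error carried from the last step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


if __name__ == "__main__":
    values = [0.1] * 1_000_000
    reference = math.fsum(values)  # correctly rounded sum from the standard library
    print(abs(naive_sum(values) - reference))
    print(abs(kahan_sum(values) - reference))
```

The cost is three extra floating-point operations per term, which is often acceptable when the alternative is moving the whole accumulation to a wider format.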


References

[1] Mike Giles. Approximating the erfinv function. In GPU Computing Gems, Jade Edition, volume 2, pages 109–116. Elsevier, 2011.

