Electronic supplement for Chapter 4
“Random Number Generation”
John Keightley
National Physical Laboratory, Hampton Road, Teddington, Middlesex, TW11 0LW, United Kingdom
Introduction
The remit of this document is to provide a relative novice with enough information to create
stand-alone software to simulate random variates from any desired probability distribution,
and to incorporate these into a scheme to effectively propagate uncertainty distributions
through a given mathematical measurement model, utilising Monte-Carlo techniques, as
described in GUM Supplement 1 (JCGM 101:2008). It is provided as a supplement to the
chapter “Example of Monte Carlo uncertainty assessment in the field of radionuclide
metrology” (Chapter 4 of the special issue on uncertainty evaluation in radionuclide
metrology).
There is a wealth of available information on the simulation from both continuous and
discrete probability distributions, and the topic is covered in most introductory books on
probability and statistics. The author found the books of Bevington and Robinson (2003),
Ross (2000), Kroese et al. (2011) and Knuth (1998) to be excellent sources of information.
This supplement describes the generation of uniform random variates on the interval [0, 1], and
suggests an extremely compact and transportable uniform random number generation
algorithm, along with example code in C/C++, Matlab and Visual Basic for Applications.
Methods for transformation of these into variates belonging to a user defined distribution are
then discussed, along with some elementary methods for testing the validity of these
transformations (via provision of the expected lower order moments of the simulated
distributions). Only the simple cases of continuous rectangular, triangular and trapezoidal
distributions are considered here as examples, and the reader is left to implement algorithms
as required for other distributions.
Also discussed are methods for simulating radioactive decay processes, with particular
emphasis on the arrival-times of events and inter-arrival distributions. Such techniques are
highly valuable in the simulation of “list-mode” data files, commonly used in radionuclide
metrology in recent years (Keightley and Park, 2007).
1 Random Number Generation: Uniform Distribution on the Interval [0, 1]
As discussed in the main text of Chapter 4, there exists a variety of unit uniform random
number generation algorithms of varying quality. For the purposes of uncertainty assessment
via the propagation of uncertainty distributions through a given mathematical measurement
model (following the guidance of GUM Supplement 1, JCGM 101:2008), a fairly simple
random number generator may suffice. Indeed, many software packages provide in-built
random number generators, and these may be more than suitable.
It is however instructive to provide the reader with a portable algorithm which may be used
with confidence, and may be either programmed to reside in a library file and called directly
from the user’s code, or programmed as a function which resides directly in the user’s code.
The author has some positive experience with a compact and readily programmable pseudo-
random number generator routine (Wichmann and Hill, 2006), incorporating the addition of
four simple multiplicative congruential generators, each using a prime number for its
modulus and a primitive root for its multiplier. The fractional part of this sum is taken as the
random variate on the interval [0, 1]. The period of the generator is approximately 2.6E+36.
This algorithm has been subjected to the ‘DIEHARD’ series of tests (Marsaglia, 1985) and
has passed the most stringent battery of tests (Big Crush) contained in the
TestU01 package (L’Ecuyer and Simard, 2005).
1.1 C/C++ implementations
An implementation of the Wichmann-Hill (2006) algorithm in C/C++ is shown below.
double random64() {
    // Author : John Keightley, NPL (UK)
    double U;
    static long long int IX = 1LL;
    static long long int IY = 1LL;
    static long long int IZ = 1LL;
    static long long int IT = 1LL;

    IX = (11600LL * IX) % 2147483579LL;
    IY = (47003LL * IY) % 2147483543LL;
    IZ = (23000LL * IZ) % 2147483423LL;
    IT = (33000LL * IT) % 2147483123LL;

    U = (double)IX / 2147483579.0 + (double)IY / 2147483543.0
      + (double)IZ / 2147483423.0 + (double)IT / 2147483123.0;

    while (U >= 1.0) U -= 1.0;
    return U;
}
The “random64()” function above returns a double precision floating point random variate U
on the interval [0, 1). Note that since the fractional part of the sum is returned, the return
value will never exactly equal 1.0.
The static keyword indicates that the values of the 64-bit integers IX, IY, IZ and IT are retained
between subsequent calls to the random64() function, and the four initialisations are only
executed on the first call to this function. This speeds up execution, as these
parameters are not passed at each subsequent function call. However, if one requires the
random number generator to restart from the same starting point (for some reason), then the
function requires modification to explicitly pass IX, IY, IZ and IT, maintaining
their values via the use of global variables, or by passing the parameters as pointers to/from
the calling routine.
The above routine was successfully compiled using Microsoft Visual Studio 2013. Note that
this routine “random64()” requires the use of 64-bit arithmetic. If a particular compiler is
unable to manipulate 64-bit integers, then Wichmann and Hill (2006) provide another
algorithm utilising 32-bit arithmetic, but essentially performing the same calculations.
Depending on the operating system and processor employed, there may be some speed
benefits in employing the 32-bit version of the algorithm if there are no native machine code
instructions for performing division and modulus operations on 64-bit integers.
A faster C/C++ implementation of the Wichmann-Hill (2006) algorithm utilising 32-bit logic
follows:
double random() {
    // Author : John Keightley, NPL (UK)
    // 32-bit version : performs the same operations as the random64()
    // function, but utilises 32-bit integer logic, which is around 25%
    // faster (due to the optimised modulus (%) operator)
    double U;
    static long int IX_32 = 1L;
    static long int IY_32 = 1L;
    static long int IZ_32 = 1L;
    static long int IT_32 = 1L;

    IX_32 = 11600L * (IX_32 % 185127L) - 10379L * (IX_32 / 185127L);
    IY_32 = 47003L * (IY_32 % 45688L)  - 10479L * (IY_32 / 45688L);
    IZ_32 = 23000L * (IZ_32 % 93368L)  - 19423L * (IZ_32 / 93368L);
    IT_32 = 33000L * (IT_32 % 65075L)  - 8123L  * (IT_32 / 65075L);

    if (IX_32 < 0) IX_32 += 2147483579L;
    if (IY_32 < 0) IY_32 += 2147483543L;
    if (IZ_32 < 0) IZ_32 += 2147483423L;
    if (IT_32 < 0) IT_32 += 2147483123L;

    U = (double)IX_32 / 2147483579.0 + (double)IY_32 / 2147483543.0
      + (double)IZ_32 / 2147483423.0 + (double)IT_32 / 2147483123.0;

    while (U >= 1.0) U -= 1.0;
    return U;
}
Thus, it is recommended to use the faster 32-bit function “random()”. A C++ header file may
be provided, as well as a Dynamic Link Library (DLL) containing these functions, by
contacting the author.
1.2 Microsoft Visual Basic for Applications (VBA) implementation
The following code may be utilised within a “module” of an Excel Spreadsheet. Note that
since the “Mod” operator of VBA cannot handle large integers, a replacement function
“mod_DBL” was created by the author instead.
Option Explicit

Public Function random() As Double
    ' Author : John Keightley, NPL (UK)
    Dim U As Double
    Static IX, IY, IZ, IT As Long

    ' NB: In VBA, Static variables are set to zero by default, and to the
    ' author's knowledge, there is no mechanism to initialise a static
    ' variable. The following conditional statement will only execute on the
    ' first call to this function, until the calling program is closed.
    If ((IX = 0) And (IY = 0) And (IZ = 0) And (IT = 0)) Then
        IX = 1
        IY = 1
        IZ = 1
        IT = 1
    End If

    IX = mod_DBL(CDbl(11600) * CDbl(IX), CDbl(2147483579))
    IY = mod_DBL(CDbl(47003) * CDbl(IY), CDbl(2147483543))
    IZ = mod_DBL(CDbl(23000) * CDbl(IZ), CDbl(2147483423))
    IT = mod_DBL(CDbl(33000) * CDbl(IT), CDbl(2147483123))

    U = CDbl(IX) / CDbl(2147483579) + CDbl(IY) / CDbl(2147483543) + _
        CDbl(IZ) / CDbl(2147483423) + CDbl(IT) / CDbl(2147483123)

    While (U >= 1)
        U = U - 1
    Wend

    random = U
End Function
Private Function mod_DBL(a As Double, b As Double) As Double
    ' The VBA native Mod operator cannot operate with large integers.
    ' This routine correctly operates on large numbers.
    a = Int(Abs(a))
    b = Int(Abs(b))
    mod_DBL = a - (Int(a / b) * b)
End Function
A VBA implementation (as a BAS file) may be provided by contacting the author.
1.3 MATLAB implementation
The following code may be utilised in MATLAB:
%-----------------------------------------------------------
% random.m
% ----------------------------------------------------------
% Author John Keightley, NPL
% ----------------------------------------------------------
function [U] = random()
persistent IX IY IZ IT;
if (isempty(IX) && isempty(IY) && isempty(IZ) && isempty(IT))
IX = 1;
IY = 1;
IZ = 1;
IT = 1;
end
IX = 11600*(mod(IX, 185127))- 10379*fix(IX/185127);
IY = 47003*(mod(IY, 45688)) - 10479*fix(IY/45688);
IZ = 23000*(mod(IZ, 93368)) - 19423*fix(IZ/93368);
IT = 33000*(mod(IT, 65075)) - 8123 *fix(IT/65075);
% If negative, we add the complement :
if (IX < 0) IX = IX + 2147483579; end
if (IY < 0) IY = IY + 2147483543; end
if (IZ < 0) IZ = IZ + 2147483423; end
if (IT < 0) IT = IT + 2147483123; end
U = IX/2147483579 + IY/2147483543 + IZ/2147483423 + IT/2147483123;
while (U >= 1) U = U-1; end
% End random.m
A MATLAB script may be provided by contacting the author.
Regardless of the implementation of the algorithm (or computer language) used, one should
check that the first (say) 5 random numbers returned (on the first set of calls to the random()
function) are:
0.0000533661866319
0.8448766521181464
0.6367129108205449
0.3023666398238367
0.0662462213575785
2 Simulation of random numbers from other (continuous) distributions.
The general concept is to utilise techniques which operate on uniform random variates on the
interval [0, 1] (denoted $U$) to generate random variates from any user-defined
distribution.
2.1 Inverse transform sampling technique.
Let $X$ denote a random variable with a probability density function (PDF) $f(x)$ and
cumulative distribution function (CDF)

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(x')\,dx' .$$

In order to randomly sample a value according to this distribution of $X$:

1) Sample $U$ from a uniform distribution on the interval [0, 1], and
2) Compute the value $x_i$ such that $F_X(x_i) = U$.

In other words, the inverse of the CDF, $X = F_X^{-1}(U)$, exhibits the distribution $F_X(x)$. This
inverse is often referred to as the quantile function.
The “randomness” of $X$ is guaranteed by that of $U$. This method is very simple, and is
indeed exact, provided the inverse of the CDF can be calculated. The expression $F_X(x) = U$
is often referred to as the “sampling equation”, when expressed in the form $X = F_X^{-1}(U)$.
The inverse transform sampling technique relies on the monotonically increasing nature of
the CDF of a probability distribution, and on the one-to-one correspondence between values
of $X$ and $U$.
2.2 Rejection sampling techniques
In cases where the sampling equation, $X = F_X^{-1}(U)$, is not readily calculable, the inverse
transform sampling technique is not generally applicable. However, one can sample
uniformly from the region under the PDF to generate sampled values.
In order to randomly sample a value $x_i$ according to the desired distribution of $X$, two
uniformly distributed random variates $U_1$ and $U_2$ on the interval [0, 1] are required. Let
$f_{\max}$ denote the maximum value of the PDF over the permissible interval $[x_{\min}, x_{\max}]$:
1) Sample a point $x_i = x_{\min} + U_1 (x_{\max} - x_{\min})$ from the abscissa of the PDF, where $x_{\max}$ and $x_{\min}$
are the upper and lower bounds of the permissible values.
2) Calculate $f(x_i)$ for this value of $x_i$.
3) If $U_2 \, f_{\max} > f(x_i)$, then reject $x_i$ and return to step 1).
4) Else accept $x_i$ as belonging to the desired distribution.
3 Testing the simulated distribution
Having simulated data according to a desired distribution, some rudimentary calculations of
the observed lower-order moments should be performed, at least as a simple check that the
code performing the transformation is correct. In all cases, as a minimum, the mean and
variance of the observed distribution should be in excellent agreement with the theoretically
predicted values, provided a suitably large number of variates is generated.
For each distribution of a continuous random variable $X$ discussed in the following sections,
the probability density function, cumulative distribution function and the “sampling equation” are
provided.
In order to test the validity of the transforms performed in the sampling equation, the moment
generating function, 1st ordinary moment ($m_1$ = mean), 2nd ordinary moment ($m_2$) and the 2nd
central moment ($\mu_2$ = variance) are also listed:

Probability density function (PDF): $f(x)$
Cumulative distribution function (CDF): $F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(x')\,dx'$
Sampling equation: $U = F_X(x)$, rearranged to find $X$ in terms of $U$
Moment generating function: $M_X(t) = E[e^{tX}] = \int e^{tx} f(x)\,dx$
1st ordinary moment (mean): $m_1 = E[X] = \lim_{t \to 0} \dfrac{d}{dt} M_X(t) = \int x\,f(x)\,dx$
$r$th ordinary moment: $m_r = E[X^r] = \lim_{t \to 0} \dfrac{d^r}{dt^r} M_X(t) = \int x^r f(x)\,dx$
2nd central moment (variance): $\mu_2 = E[(X - m_1)^2] = m_2 - m_1^2$

For $N$ simulated values $x_i$, one can readily compare the population mean $\bar{x}$ and
variance $\hat{\mu}_2$ with the above predicted values, $E[X]$ and $\mu_2$ respectively, recalling:

Population mean: $\bar{x} = \dfrac{1}{N} \sum_{i=1}^{N} x_i$
Population variance: $\hat{\mu}_2 = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 = \dfrac{1}{N} \sum_{i=1}^{N} x_i^2 - \bar{x}^2$

If one desires, one can readily calculate the predicted higher-order central moments (and thus
the “skewness” and “kurtosis”) of the distribution of $X$, using the following generally
applicable relationship between the central and ordinary moments:

$r$th central moment: $\mu_r = \sum_{k=0}^{r} \binom{r}{k} (-1)^{r-k}\, m_k\, m_1^{r-k}$, where $m_0 = E[X^0] = E[1] = 1$

i.e.: $\mu_2 = m_2 - m_1^2$ as stated before
$\mu_3 = 2 m_1^3 - 3 m_1 m_2 + m_3$
$\mu_4 = -3 m_1^4 + 6 m_1^2 m_2 - 4 m_1 m_3 + m_4$
...

Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}}$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}}$

For $N$ simulated values $x_i$, compare the predicted results to those observed from the
resulting population.

Population skewness: $\dfrac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^3}{\hat{\mu}_2^{3/2}}$
Population kurtosis: $\dfrac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^4}{\hat{\mu}_2^{2}}$
4 Uniform Distributions
4.1 General uniform distribution on interval [a, b]

Probability density function: $f(x \mid a, b) = \dfrac{1}{b-a}$ for $a \le x \le b$; $0$ elsewhere
Cumulative distribution function: $F_X(x \mid a, b) = \dfrac{X - a}{b - a}$
Sampling equation: $X = a + (b - a)\,U$
Moment generating function: $M_X(t) = \dfrac{e^{bt} - e^{at}}{t\,(b-a)}$
Mean: $m_1 = \dfrac{a + b}{2}$
Second ordinary moment: $m_2 = \dfrac{b^3 - a^3}{3\,(b-a)}$
$r$th ordinary moment: $m_r = \dfrac{b^{r+1} - a^{r+1}}{(r+1)\,(b-a)}$
Variance: $\mu_2 = \dfrac{(b-a)^2}{12}$
$r$th central moment: $\mu_r = \dfrac{(a-b)^r + (b-a)^r}{2^{r+1}\,(r+1)}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{9}{5}$
4.2 Uniform distribution on interval [0, 1]
The results may be found directly from the above expressions in Section 4.1 by simply
substituting a = 0 and b = 1.

Probability density function: $f(x) = 1$ for $0 \le x \le 1$; $0$ elsewhere
Cumulative distribution function: $F_X(x) = X$
Sampling equation: $X = U$
Moment generating function: $M_X(t) = \dfrac{e^{t} - 1}{t}$
Mean: $m_1 = \dfrac{1}{2}$
Second ordinary moment: $m_2 = \dfrac{1}{3}$
$r$th ordinary moment: $m_r = \dfrac{1}{r+1}$
Variance: $\mu_2 = \dfrac{1}{12}$
$r$th central moment: $\mu_r = \dfrac{(-1)^r + 1}{2^{r+1}\,(r+1)}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{9}{5}$
4.3 Uniform distribution with expectation “A” and semi-width “w/2”
The results may be found directly from the above expressions in Section 4.1 for a general
uniform distribution, by simply substituting $a = A - w/2$ and $b = A + w/2$.

Probability density function: $f(x \mid A, w) = \dfrac{1}{w}$ for $A - \dfrac{w}{2} \le x \le A + \dfrac{w}{2}$; $0$ elsewhere
Cumulative distribution function: $F_X(x \mid A, w) = \dfrac{1}{w}\left(X - A + \dfrac{w}{2}\right)$
Sampling equation: $X = A + \dfrac{w\,(2U - 1)}{2}$
Moment generating function: $M_X(t) = \dfrac{e^{(A + w/2)t} - e^{(A - w/2)t}}{w\,t}$
Mean: $m_1 = A$
Second ordinary moment: $m_2 = A^2 + \dfrac{w^2}{12}$
$r$th ordinary moment: $m_r = \dfrac{(A + w/2)^{r+1} - (A - w/2)^{r+1}}{w\,(r+1)}$
Variance: $\mu_2 = \dfrac{w^2}{12}$
$r$th central moment: $\mu_r = \dfrac{(-w)^r + w^r}{2^{r+1}\,(r+1)}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{9}{5}$
5 Triangular Distributions
5.1 General triangular distribution on interval [a, b] with apex at c

Probability density function: $f(x \mid a, b, c) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} & a \le x \le c \\ \dfrac{2(b-x)}{(b-a)(b-c)} & c < x \le b \\ 0 & \text{elsewhere} \end{cases}$

Cumulative distribution function: $F_X(x \mid a, b, c) = \begin{cases} \dfrac{(X-a)^2}{(b-a)(c-a)} & a \le x \le c \\ 1 - \dfrac{(b-X)^2}{(b-a)(b-c)} & c < x \le b \end{cases}$

Sampling equation: $X = \begin{cases} a + \sqrt{U\,(b-a)(c-a)} & 0 \le U \le \dfrac{c-a}{b-a} \\ b - \sqrt{(1-U)(b-a)(b-c)} & \dfrac{c-a}{b-a} < U \le 1 \end{cases}$

Moment generating function: $M_X(t) = \dfrac{2}{t^2\,(b-a)} \left[ \dfrac{e^{bt} - e^{ct}}{b-c} - \dfrac{e^{ct} - e^{at}}{c-a} \right]$
Mean: $m_1 = \dfrac{a + b + c}{3}$
Second ordinary moment: $m_2 = \dfrac{a^2 + b^2 + c^2 + ab + ac + bc}{6}$
Variance: $\mu_2 = \dfrac{a^2 + b^2 + c^2 - ab - ac - bc}{18}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{\sqrt{2}\,(a + b - 2c)(2a - b - c)(a - 2b + c)}{5\,(a^2 + b^2 + c^2 - ab - ac - bc)^{3/2}}$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{12}{5}$
5.2 Isosceles triangle distribution on interval [a, b]
The results from the previous section may be specialised to the case of an isosceles triangle
on the interval [a, b], where the peak resides in the middle of the interval. Substitution of
$c = (a+b)/2$ into the above expressions yields:

Probability density function: $f(x \mid a, b) = \begin{cases} \dfrac{4(x-a)}{(b-a)^2} & a \le x \le \dfrac{a+b}{2} \\ \dfrac{4(b-x)}{(b-a)^2} & \dfrac{a+b}{2} < x \le b \\ 0 & \text{elsewhere} \end{cases}$

Cumulative distribution function: $F_X(x \mid a, b) = \begin{cases} \dfrac{2(X-a)^2}{(b-a)^2} & a \le x \le \dfrac{a+b}{2} \\ 1 - \dfrac{2(b-X)^2}{(b-a)^2} & \dfrac{a+b}{2} < x \le b \end{cases}$

Sampling equation: $X = \begin{cases} a + (b-a)\sqrt{U/2} & 0 \le U \le 0.5 \\ b - (b-a)\sqrt{(1-U)/2} & 0.5 < U \le 1 \end{cases}$

Moment generating function: $M_X(t) = \dfrac{4\left(e^{bt} - 2e^{(a+b)t/2} + e^{at}\right)}{t^2\,(b-a)^2}$
Mean: $m_1 = \dfrac{a + b}{2}$
Second ordinary moment: $m_2 = \dfrac{7a^2 + 10ab + 7b^2}{24}$
Variance: $\mu_2 = \dfrac{(b-a)^2}{24}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{12}{5}$
5.3 Isosceles distribution with expectation “A” and semi-width “w/2”
The results from Section 5.2 may be specialised to the case of an isosceles triangle with
expectation A and semi-width w/2. Substitution of $a = A - w/2$ and $b = A + w/2$ into the
expressions in Section 5.2 yields:

Probability density function: $f(x \mid A, w) = \begin{cases} \dfrac{4\,(x - A + w/2)}{w^2} & A - \dfrac{w}{2} \le x \le A \\ \dfrac{4\,(A + w/2 - x)}{w^2} & A < x \le A + \dfrac{w}{2} \\ 0 & \text{elsewhere} \end{cases}$

Cumulative distribution function: $F_X(x \mid A, w) = \begin{cases} \dfrac{2\,(X - A + w/2)^2}{w^2} & A - \dfrac{w}{2} \le x \le A \\ 1 - \dfrac{2\,(A + w/2 - X)^2}{w^2} & A < x \le A + \dfrac{w}{2} \end{cases}$

Sampling equation: $X = \begin{cases} A + \dfrac{w}{2}\left(\sqrt{2U} - 1\right) & 0 \le U \le 0.5 \\ A + \dfrac{w}{2}\left(1 - \sqrt{2(1-U)}\right) & 0.5 < U \le 1 \end{cases}$

Moment generating function: $M_X(t) = \dfrac{4\left(e^{(A + w/2)t} - 2e^{At} + e^{(A - w/2)t}\right)}{t^2\,w^2}$
Mean: $m_1 = A$
Second ordinary moment: $m_2 = A^2 + \dfrac{w^2}{24}$
Variance: $\mu_2 = \dfrac{w^2}{24}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{12}{5}$
6 Trapezoidal distributions
6.1 Generalised trapezoidal distribution on interval [a, b] with nodes at n1 and n2
The sampling equations for simple geometrical shapes may be taken as specific instances of a
generalised trapezoidal distribution. A random variate $X$ with a trapezoidal PDF $f(x)$ on
the interval $[a, b]$ with nodes at $n_1$ and $n_2$ is shown below. Since the area under the PDF must
be unity, the height of the PDF is given by the expression

$$h = \frac{2}{(b - a) + (n_2 - n_1)} .$$

[Figure: trapezoidal PDF $f(x)$ of height $h$ on the interval $[a, b]$, rising from $a$ to node $n_1$, flat between $n_1$ and $n_2$, and falling from $n_2$ to $b$.]

Probability density function: $f(x \mid a, b, n_1, n_2) = \begin{cases} h\,\dfrac{x-a}{n_1-a} & a \le x \le n_1 \\ h & n_1 < x \le n_2 \\ h\,\dfrac{b-x}{b-n_2} & n_2 < x \le b \\ 0 & \text{elsewhere} \end{cases}$

Cumulative distribution function: $F_X(x \mid a, b, n_1, n_2) = \begin{cases} \dfrac{h\,(X-a)^2}{2\,(n_1-a)} & a \le x \le n_1 \\ \dfrac{h\,(n_1-a)}{2} + h\,(X - n_1) & n_1 < x \le n_2 \\ 1 - \dfrac{h\,(b-X)^2}{2\,(b-n_2)} & n_2 < x \le b \end{cases}$

Sampling equation: $X = \begin{cases} a + \sqrt{\dfrac{2U\,(n_1-a)}{h}} & 0 \le U \le \dfrac{h\,(n_1-a)}{2} \\ n_1 + \dfrac{1}{h}\left(U - \dfrac{h\,(n_1-a)}{2}\right) & \dfrac{h\,(n_1-a)}{2} < U \le 1 - \dfrac{h\,(b-n_2)}{2} \\ b - \sqrt{\dfrac{2(1-U)(b-n_2)}{h}} & 1 - \dfrac{h\,(b-n_2)}{2} < U \le 1 \end{cases}$

Moment generating function: $M_X(t) = \dfrac{h}{t^2}\left[\dfrac{e^{bt} - e^{n_2 t}}{b - n_2} - \dfrac{e^{n_1 t} - e^{at}}{n_1 - a}\right]$

Mean: $m_1 = \dfrac{h}{6}\left[\dfrac{b^3 - n_2^3}{b - n_2} - \dfrac{n_1^3 - a^3}{n_1 - a}\right]$

Second ordinary moment: $m_2 = \dfrac{h}{12}\left[\dfrac{b^4 - n_2^4}{b - n_2} - \dfrac{n_1^4 - a^4}{n_1 - a}\right]$

$r$th ordinary moment: $m_r = \dfrac{h}{(r+1)(r+2)}\left[\dfrac{b^{r+2} - n_2^{r+2}}{b - n_2} - \dfrac{n_1^{r+2} - a^{r+2}}{n_1 - a}\right]$

Variance: $\mu_2 = m_2 - m_1^2$, with $m_1$ and $m_2$ as given above

Skewness (*): $\gamma_1 = \dfrac{m_3 - 3 m_1 m_2 + 2 m_1^3}{\left(m_2 - m_1^2\right)^{3/2}}$

Kurtosis (*): $\gamma_2 = \dfrac{m_4 - 4 m_1 m_3 + 6 m_1^2 m_2 - 3 m_1^4}{\left(m_2 - m_1^2\right)^{2}}$

(*) Closed-form expressions for the higher-order moments and coefficients of skewness and
kurtosis are difficult to deduce. The numerical values are readily calculated via the
expressions for the moments about the origin $m_r$, and the general relationships between
central and ordinary moments as detailed in Section 3.
The interested reader may like to confirm that these reduce to the triangular case (for
$n_1 = n_2 = c$) as described in Section 5.1 and the rectangular case (for $n_1 = a$ and $n_2 = b$) as
described in Section 4.1. Further details may be found in the articles of Kacker and Lawrence
(2007) and van Dorp and Kotz (2003).
7 Normal distribution
Due to the central limit theorem, the normal distribution is omnipresent.
Any suite of Monte Carlo simulation software should be able to generate variates from a
normal distribution. First, we discuss methods for generating variates from the standard
normal distribution.
7.1 Standard normal distribution with expectation 0 and standard deviation 1

Probability density function: $f(x) = \dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$
Cumulative distribution function: $F_X(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-x'^2/2}\,dx' = \dfrac{1}{2}\left[1 + \operatorname{erf}\!\left(\dfrac{x}{\sqrt{2}}\right)\right]$, where $\operatorname{erf}(z) = \dfrac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$ (see notes below)
Sampling equation: does not exist in closed form. See notes below.
Moment generating function: $M_X(t) = e^{t^2/2}$
Mean: $m_1 = 0$
Second ordinary moment: $m_2 = 1$
Variance: $\mu_2 = 1$
Skewness: $\gamma_1 = 0$
Kurtosis: $\gamma_2 = 3$
The integral of the PDF evaluated between $-\infty$ and $x$ (i.e. the CDF, $F_X(x) = P(X \le x)$)
cannot be evaluated analytically. One solution is to perform a Taylor series expansion of the
PDF and to integrate each successive term until some required precision is reached, or to
perform numerical integration.
Thus, the inverse of the CDF, or sampling equation, $X = F_X^{-1}(U)$, does not exist in closed
form, and the simulation of normal random variates requires methods other than the inverse
transform sampling technique. A useful and compact rejection technique is the polar form of
the well-known Box-Müller transform:
1) Create 2 uniform random variates $U_1, U_2$ on the interval [0, 1].
2) Transform these to two random variables $V_1$ and $V_2$ on the interval [-1, 1],
i.e.: $V_1 = 2U_1 - 1$ and $V_2 = 2U_2 - 1$.
3) Calculate $s = V_1^2 + V_2^2$.
4) If $s = 0$ or $s \ge 1$ then go to step 1).
5) Else $X_1 = V_1 \sqrt{\dfrac{-2 \ln s}{s}}$.
6) Return the value $X_1$.
o Note that this algorithm can also return a second independent standard normal
variate (for free), if required: $X_2 = V_2 \sqrt{\dfrac{-2 \ln s}{s}}$.
o This algorithm may also be adapted to return a random variate from the
standard Cauchy (or Lorentzian) distribution, if one returns the ratio $X_1 / X_2$.
7.2 Normal distribution with expectation $\mu$ and variance $\sigma^2$
The results for the standard normal distribution may be readily generalised to a normal
distribution with mean $\mu$ and standard deviation $\sigma$. In the PDF, the replacement of $x$ with
the dimensionless term $\dfrac{x - \mu}{\sigma}$ is required, noting that for the PDF to be normalised, a
multiplicative factor of $\dfrac{1}{\sigma}$ is also required.

Probability density function: $f(x \mid \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2 / (2\sigma^2)}$
Cumulative distribution function: $F_X(x \mid \mu, \sigma) = \dfrac{1}{2}\left[1 + \operatorname{erf}\!\left(\dfrac{x - \mu}{\sigma\sqrt{2}}\right)\right]$, where $\operatorname{erf}(z) = \dfrac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$
Sampling equation: does not exist in closed form; see the notes in Section 7.1.
Moment generating function: $M_X(t) = e^{\mu t + \sigma^2 t^2 / 2}$
Mean: $m_1 = \mu$
Second ordinary moment: $m_2 = \mu^2 + \sigma^2$
Variance: $\mu_2 = \sigma^2$
Skewness: $\gamma_1 = 0$
Kurtosis: $\gamma_2 = 3$

To generate normal random variates with mean $\mu$ and variance $\sigma^2$, one can readily use the
convenient scaling properties of the normal distribution to simply generate random variates
$X_{0,1}$ from the standard normal distribution (as described in Section 7.1), and apply the
transform $X_{\mu,\sigma} = \mu + \sigma X_{0,1}$.
8 Poisson process: exponential intervals with rate $\lambda$
It is often useful to simulate a Poisson process with rate $\lambda$, when considering the effects of
dead-time on a particular counting experiment. The “memory-less” nature of a Poisson
process implies that the inter-arrival times between successive events are independent,
identically distributed exponential variables. The fundamental properties of counting statistics
in radionuclide metrology are readily derived from the “inter-arrival” distribution of pulses
from radiation detectors. A great deal of information on this topic has been published
elsewhere (Müller, 1973) and (ICRU, 1994).
A means of simulating an exponential distribution, as well as some of the expected moments,
are given below:

Probability density function: $f(x \mid \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$ and $\lambda > 0$
Cumulative distribution function: $F_X(x \mid \lambda) = 1 - e^{-\lambda x}$ for $x \ge 0$
Sampling equation: $X = \dfrac{-\ln(1 - U)}{\lambda} = \dfrac{-\ln U}{\lambda}$, since both $U$ and $1 - U$ are uniform random variates
Moment generating function: $M_X(t) = \dfrac{\lambda}{\lambda - t}$
Mean: $m_1 = \dfrac{1}{\lambda}$
Second ordinary moment: $m_2 = \dfrac{2}{\lambda^2}$
$r$th ordinary moment: $m_r = \dfrac{r!}{\lambda^r}$
Variance: $\mu_2 = \dfrac{1}{\lambda^2}$
$r$th central moment: $\mu_r = \dfrac{r!}{\lambda^r} \sum_{j=0}^{r} \dfrac{(-1)^j}{j!}$
Skewness: $\gamma_1 = 2$
Kurtosis: $\gamma_2 = 9$

(Note that for an exponential distribution, the standard deviation equals the mean.)
However, if simulating data in “list-mode” (where the recorded arrival times of events are
quantised according to the sampling frequency of an ADC), there will be truncations of the
simulated arrival times, perturbing the exponential distribution and creating a loss of the
“memory-less” nature of the input Poisson Process. Such truncations in effect impose an
extendable dead-time on the simulated pulse train, with the duration of the dead-time equal to
1.5 times the period of the sampling clock.
9 Poisson process: gamma distribution (multiple intervals) with rate $\lambda$
From the previous section we saw that the inter-arrival times $X_1, X_2, \ldots$ between successive
events for a Poisson process with rate $\lambda$ form a sequence of independent exponentially
distributed random variables, with the PDF $\lambda e^{-\lambda x}$. The $k$th arrival time is thus the sum of the
first $k$ inter-arrival times, i.e.:

$$T_k = \sum_{i=1}^{k} X_i = -\frac{1}{\lambda} \sum_{i=1}^{k} \ln U_i .$$

The probability density function for the $k$th arrival time is thus a $k$-fold convolution of the
exponentially distributed inter-arrivals, and forms a gamma distribution with shape parameter
$k$ and rate $\lambda$. Since the shape parameter in this case is an integer, this is also often referred
to as an “Erlang” distribution.

Probability density function: $f(x \mid k, \lambda) = \dfrac{\lambda^k x^{k-1} e^{-\lambda x}}{(k-1)!}$ for $x \ge 0$ and $\lambda > 0$
Cumulative distribution function: $F(x \mid k, \lambda) = \dfrac{\gamma(k, \lambda x)}{(k-1)!} = \sum_{i=k}^{\infty} \dfrac{(\lambda x)^i}{i!}\,e^{-\lambda x}$, where $\gamma(k, x) = \int_0^x z^{k-1} e^{-z}\,dz$ is the lower incomplete gamma function
Sampling equation: $X = -\dfrac{1}{\lambda} \sum_{i=1}^{k} \ln U_i = -\dfrac{1}{\lambda} \ln\left(\prod_{i=1}^{k} U_i\right)$ (see notes below, for large $k$)
Moment generating function: $M_X(t) = \left(\dfrac{\lambda}{\lambda - t}\right)^{k}$
Mean: $m_1 = \dfrac{k}{\lambda}$
Second ordinary moment: $m_2 = \dfrac{k(k+1)}{\lambda^2}$
$r$th ordinary moment: $m_r = \dfrac{1}{\lambda^r} \prod_{i=0}^{r-1} (k + i)$
Variance: $\mu_2 = \dfrac{k}{\lambda^2}$
Skewness: $\gamma_1 = \dfrac{2}{\sqrt{k}}$
Kurtosis: $\gamma_2 = \dfrac{3(k+2)}{k}$

The sampling equation is presented in two forms. The first is expressed as the summation of
the independent exponentially distributed terms representing the intervals between successive
events. When $k$ is large, this sampling equation will require taking the logarithm of a large
number of random variates within the loop over $k$ events. One may be tempted to use the
second form of the sampling equation presented above, $X = -\frac{1}{\lambda}\ln\left(\prod_{i=1}^{k} U_i\right)$, as the logarithm is
taken outside of the loop over $k$ terms. However, numerical stability issues may occur due to
the rapid convergence of the product towards zero. More information on this issue is
presented in the next section on the generation of Poisson variates.
Several algorithms devoted to gamma variate simulation have been published: Ahrens and
Dieter (1974), Cheng and Feast (1979), Best (1983), Marsaglia and Tsang (2000), and these
have been incorporated into a variety of software packages. Particularly when $k$ is large, it is
recommended to utilise high-quality, verified algorithms via calls to external code. One
example is the function G05SJF from Mark 24 of the Numerical Algorithms Group library
(NAG, 2013). This generates a vector of pseudorandom numbers taken from a gamma
distribution with high precision for large $k$, and may be readily called from a variety of
programming languages. However, use of this library requires a (paid-for) licence. It appears
that a similar C/C++ function “gsl_ran_gamma” is available from the GNU Scientific Library
(GNU, 2013), which is free software under the GNU General Public License. MATLAB also
includes the “gamrnd” function.
10 Poisson distribution
In radionuclide metrology, it is often also of great interest to consider the probability $p(k)$ of
observing $k$ events from a Poisson process of rate $\lambda$, in a time interval $t$. The variates $k$
form a Poisson distribution on the interval $[0, \infty)$ with expectation $E[k] = \mu$, where $\mu = \lambda t$.

Probability mass function: $p(k \mid \mu) = P(X = k) = \dfrac{\mu^k e^{-\mu}}{k!}$ for $k = 0, 1, 2, \ldots$
Cumulative distribution function: $P(X \le k \mid \mu) = e^{-\mu} \sum_{i=0}^{k} \dfrac{\mu^i}{i!}$
Sampling equation: see discussion below
Moment generating function: $M_k(t) = e^{\mu\left(e^{t} - 1\right)}$
Mean: $m_1 = \mu$
Second ordinary moment: $m_2 = \mu + \mu^2$
$r$th ordinary moment: $m_r = \sum_{j=1}^{r} \dfrac{\mu^j}{j!} \sum_{i=0}^{j} (-1)^i \binom{j}{i} (j - i)^r$
Variance: $\mu_2 = \mu$
$r$th central moment: $\mu_r = \mu \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_j$, where $\mu_0 = 1$ and $\mu_1 = 0$
Skewness: $\gamma_1 = \dfrac{1}{\sqrt{\mu}}$
Kurtosis: $\gamma_2 = 3 + \dfrac{1}{\mu}$
In order for exactly $k$ events to be recorded in a time interval $t$, we require by necessity that
$t_k \le t < t_{k+1}$. One can readily simulate such Poisson distributed variates by forming the
summation of simulated inter-arrival times in the time interval $[0, t]$ (as described in the
previous section) and returning the number $k$ when the condition $t_k \le t < t_{k+1}$ is reached.
This may also be expressed as the maximum value of $n$ for which $-\frac{1}{\lambda}\sum_{i=1}^{n} \ln U_i \le t$, i.e.:

$$k = \max\left\{ n : -\sum_{i=1}^{n} \ln U_i \le \mu \right\} .$$

In other words, the summation is stopped at the first instance when $-\sum_{i=1}^{n} \ln U_i > \mu$, and the
value $k = n - 1$ represents the Poisson random variate of interest. However, each simulated
inter-arrival time in the summation requires the computation of a natural logarithm, and for
large $\mu$, the number of iterations required may be excessive.
A “snippet” of C/C++ code illustrates this algorithm:
unsigned long int Poisson_Example_1(double lambda)
{
    unsigned long int k = 0;
    double sum_intervals = 0.0;
    do {
        k++;
        sum_intervals += -log(random());
    } while (sum_intervals <= lambda);
    return (k - 1);
}
One may be tempted to use the essentially equivalent (and at first glance more
computationally efficient) method of removing the logarithm from the expression within the
loop. Thus only a single exponential, $e^{-\mu}$, is calculated, and only the product of the uniform
random variates $U_i$ is required:

$$k = \min\left\{ n : \prod_{i=1}^{n} U_i < e^{-\mu} \right\} - 1 .$$

In this instance, the algorithm is stopped at the first instance where $\prod_{i=1}^{n} U_i < e^{-\mu}$, and the value
$k = n - 1$ represents the Poisson random variate. A “snippet” of C/C++ code illustrates this
algorithm:
unsigned long int Poisson_Example_2(double lambda)
{
    double L = exp(-lambda);
    unsigned long int k = 0;
    double product_U = 1.0;
    do {
        k++;
        product_U *= random();
    } while (product_U > L);
    return (k - 1);
}
However, the use of the above algorithm requires a great deal of caution. If $\mu = \lambda t$ is large,
then the value $e^{-\mu}$ rapidly approaches zero. Depending on the machine precision available, a
great deal of numerical instability is to be expected. If using double precision floating point
variables, a value for $\mu$ as low as 700 already approaches the smallest possible double
precision floating point number (at least on the author’s computer system).
As stated in the previous section, it is recommended in such cases to utilise high-quality,
verified algorithms to perform the simulations. Examples include the Numerical
Algorithms Group function “G05TJF” (NAG, 2013), the C/C++ implementation in the GNU
Scientific Library, “gsl_ran_poisson”, or the MATLAB function “poissrnd”.
However, since we can express $k$ in terms of the sum of independent, identically distributed
random variables (intervals), we can apply the central limit theorem for large $\mu$. This
states that the Poisson distribution can be safely approximated by a normal distribution with
expectation $E[k] = m_1 = \mu$ and variance $\mu_2 = \mu$, and for (say) $\mu \ge 25$ the approximation is
rather good.
However, since the Poisson distribution is a discrete distribution and the normal distribution
is continuous, a continuity correction is required, whereby a discrete value $k$ from the Poisson
distribution is represented by the interval from $k - 0.5$ to $k + 0.5$. Thus, the area under the
normal PDF between the limits $k - 0.5$ and $k + 0.5$ represents the probability of the discrete
whole number $k$, and this correction should be made.
References
Ahrens, J.H. and Dieter, U., (1974) Computer methods for sampling from gamma, beta,
Poisson, and binomial distributions. Computing, 12(3):223-246.
Best, D.J., (1983) A note on gamma variate generators with shape parameters less than
unity. Computing, 30(2):185-188.
Bevington, P.R. and Robinson, D.K., (2003). Data reduction and error analysis for the
physical sciences. McGraw Hill, New York, USA.
Cheng, R. C. H. and Feast, G.M., (1979) Some simple gamma variate generators.
Computing, 28(3):290-295.
GNU., (2013). GNU Scientific Library. Version GSL-1.16, released on 19 July 2013.
http://www.gnu.org/software/gsl/.
ICRU Report 52., (1994). Particle counting in radioactivity measurements. Bethesda, MD.
JCGM 101:2008. Evaluation of measurement data — Supplement 1 to the “Guide to the expression of
uncertainty in measurement” — Propagation of distributions using a Monte Carlo method. 2008.
Kacker, R.N. and Lawrence, J. F., (2007) Trapezoidal and triangular distributions for Type B
evaluation of standard uncertainty. Metrologia. 44, 117-127.
Knuth, D.E., 1998. The Art of Computer Programming (2nd Edition): Volume 2: Seminumerical
Algorithms. Addison Wesley.
L’Ecuyer, P. and Simard, R., (2005). TestU01: A software library in ANSI C for empirical testing of
random number generators. Laboratoire de simulation et d’optimisation, Université de Montréal IRO.
Version 6.0, dated January 14, 2005. Available at http://simul.iro.umontreal.ca/testu01/tu01.html
D.P. Kroese, T. Taimre, Z.I. Botev. (2011) Handbook of Monte Carlo Methods. John Wiley & Sons.
Marsaglia, G. and Tsang, W. (2000) A simple method for generating gamma variables. ACM
Transactions on Mathematical Software, 26(3):363-372.
Marsaglia, G., 1985. The Marsaglia random number CDROM, with the Diehard battery of tests of
randomness, produced at Florida State University under a grant from The National Science
Foundation. Available at http://www.stat.fsu.edu/pub/diehard
Müller, J. W., (1973). Dead time problems. Nucl. Inst. Meth. 112, 47-57.
NAG (2013). The NAG Library, The Numerical Algorithms Group (NAG), Oxford, United Kingdom.
www.nag.com
Ross, S.M., (2000). Introduction to probability models. 7th edition. Harcourt Academic
Press. San Diego
Stuart, A. and Ord, J.K., 1994. Kendall’s advanced theory of statistics : Volume 1.
Distribution theory. Sixth Edition. Arnold, London.
Van Dorp, J. R. and Kotz, S., (2003) Generalized trapezoidal distributions. Metrika, 58, 85-87.
Wichmann, B. A. and Hill, I. D., (2006) Generating good pseudo-random numbers. Computational
Statistics and Data Analysis. 51, 1614–1622.