Electronic supplement for Chapter 4
“Random Number Generation”
John Keightley
National Physical Laboratory, Hampton Road, Teddington, Middlesex, TW11 0LW, United Kingdom
Introduction
The remit of this document is to provide a relative novice with enough information to create
stand-alone software to simulate random variates from any desired probability distribution,
and to incorporate these into a scheme to effectively propagate uncertainty distributions
through a given mathematical measurement model, utilising Monte-Carlo techniques, as
described in GUM Supplement 1 (JCGM 101:2008). It is provided as a supplement to the
chapter “Example of Monte Carlo uncertainty assessment in the field of radionuclide
metrology” (Chapter 4 of the special issue on uncertainty evaluation in radionuclide
metrology).
There is a wealth of available information on the simulation from both continuous and
discrete probability distributions, and the topic is covered in most introductory books on
probability and statistics. The author found the books of Bevington and Robinson (2003),
Ross (2000), Kroese et al. (2011) and Knuth (1998) to be excellent sources of information.
This supplement describes the generation of uniform random variates on the interval [0, 1], and
suggests an extremely compact and transportable uniform random number generation
algorithm, along with example code in C/C++, Matlab and Visual Basic for Applications.
Methods for transformation of these into variates belonging to a user defined distribution are
then discussed, along with some elementary methods for testing the validity of these
transformations (via provision of the expected lower order moments of the simulated
distributions). Only the simple cases of continuous rectangular, triangular and trapezoidal
distributions are considered here as examples, and the reader is left to implement algorithms
as required for other distributions.
Also discussed are methods for simulating radioactive decay processes, with particular
emphasis on the arrival-times of events and inter-arrival distributions. Such techniques are
highly valuable in the simulation of “list-mode” data files, commonly used in radionuclide
metrology in recent years (Keightley and Park, 2007).
1 Random Number Generation: Uniform Distribution on the Interval [0, 1]
As discussed in the main text of Chapter 4, there exists a variety of unit uniform random
number generation algorithms of varying quality. For the purposes of uncertainty assessment
via the propagation of uncertainty distributions through a given mathematical measurement
model (following the guidance of GUM Supplement 1, JCGM 101:2008), a fairly simple
random number generator may suffice. Indeed, many software packages provide in-built
random number generators, and these may be more than suitable.
It is however instructive to provide the reader with a portable algorithm which may be used
with confidence, and may be either programmed to reside in a library file and called directly
from the user’s code, or programmed as a function which resides directly in the user’s code.
The author has some positive experience with a compact and readily programmable pseudo-
random number generator routine (Wichmann and Hill, 2006), incorporating the addition of
four simple multiplicative congruential generators, each using a prime number for its
modulus and a primitive root for its multiplier. The fractional part of this sum is taken as the
random variate on the interval [0, 1]. The period of the generator is approximately 2.6E+36.
This algorithm has been subjected to the ‘DIEHARD’ series of tests (Marsaglia, 1985) and
has passed the most stringent battery of tests (Big Crush) contained in the
TestU01 package (L’Ecuyer and Simard, 2005).
1.1 C/C++ implementations
An implementation of the Wichmann-Hill (2006) algorithm in C/C++ is shown below.
double random64() {
    // Author : John Keightley, NPL (UK)
    double U;
    static long long int IX = 1LL;
    static long long int IY = 1LL;
    static long long int IZ = 1LL;
    static long long int IT = 1LL;

    IX = (11600LL * IX) % 2147483579LL;
    IY = (47003LL * IY) % 2147483543LL;
    IZ = (23000LL * IZ) % 2147483423LL;
    IT = (33000LL * IT) % 2147483123LL;

    U = (double)IX / 2147483579.0 + (double)IY / 2147483543.0
      + (double)IZ / 2147483423.0 + (double)IT / 2147483123.0;

    while (U >= 1.0) U -= 1.0;
    return U;
}
The “random64()” function above returns a double precision floating point random variate U
on the interval [0, 1). Note that since the fractional part of the sum is returned, the return
value will never exactly equal 1.0.
The static keyword indicates that the values of the 64-bit integers IX, IY, IZ and IT are retained
between subsequent calls to the random64() function, and the four initialisations are only
executed on the first call to this function. This speeds up execution, as these
parameters are not passed at each subsequent function call. However, if one requires the
random number generator to restart from the same starting point (for some reason), then the
function requires modification to explicitly pass IX, IY, IZ and IT, maintaining
their values via the use of global variables, or by passing the parameters as pointers to/from
the calling routine.
The above routine was successfully compiled using Microsoft Visual Studio 2013. Note that
this routine “random64()” requires the use of 64-bit arithmetic. If a particular compiler is
unable to manipulate 64-bit integers, then Wichmann and Hill (2006) provide another
algorithm utilising 32-bit arithmetic, but essentially performing the same calculations.
Depending on the operating system and processor employed, there may be some speed
benefits in employing the 32-bit version of the algorithm if there are no native machine code
instructions for performing division and modulus operations on 64-bit integers.
A faster C/C++ implementation of the Wichmann-Hill (2006) algorithm utilising 32-bit logic
follows:
double random() {
    // Author : John Keightley, NPL (UK)
    // 32-bit version : performs the same operations as the random64()
    // function, but utilises 32-bit integer logic, which is around 25%
    // faster (due to the optimised modulus (%) operator)
    double U;
    static long int IX_32 = 1L;
    static long int IY_32 = 1L;
    static long int IZ_32 = 1L;
    static long int IT_32 = 1L;

    IX_32 = 11600L * (IX_32 % 185127L) - 10379L * (IX_32 / 185127L);
    IY_32 = 47003L * (IY_32 % 45688L)  - 10479L * (IY_32 / 45688L);
    IZ_32 = 23000L * (IZ_32 % 93368L)  - 19423L * (IZ_32 / 93368L);
    IT_32 = 33000L * (IT_32 % 65075L)  - 8123L  * (IT_32 / 65075L);

    if (IX_32 < 0) IX_32 += 2147483579L;
    if (IY_32 < 0) IY_32 += 2147483543L;
    if (IZ_32 < 0) IZ_32 += 2147483423L;
    if (IT_32 < 0) IT_32 += 2147483123L;

    U = (double)IX_32 / 2147483579.0 + (double)IY_32 / 2147483543.0
      + (double)IZ_32 / 2147483423.0 + (double)IT_32 / 2147483123.0;

    while (U >= 1.0) U -= 1.0;
    return U;
}
Thus, it is recommended to use the faster 32-bit function “random()”. A C++ header file may
be provided, as well as a Dynamic Link Library (DLL) containing these functions, by
contacting the author.
1.2 Microsoft Visual Basic for Applications (VBA) implementation
The following code may be utilised within a “module” of an Excel Spreadsheet. Note that
since the “Mod” operator of VBA cannot handle large integers, a replacement function
“mod_DBL” was created by the author instead.
Option Explicit

Public Function random() As Double
    ' Author : John Keightley, NPL (UK)
    Dim U As Double
    Static IX, IY, IZ, IT As Long

    ' NB: In VBA, Static variables are set to zero by default, and to the
    ' author's knowledge, there is no mechanism to initialise a static
    ' variable. The following conditional statement will only execute on the
    ' first call to this function, until the calling program is closed.
    If ((IX = 0) And (IY = 0) And (IZ = 0) And (IT = 0)) Then
        IX = 1
        IY = 1
        IZ = 1
        IT = 1
    End If

    IX = mod_DBL(CDbl(11600) * CDbl(IX), CDbl(2147483579))
    IY = mod_DBL(CDbl(47003) * CDbl(IY), CDbl(2147483543))
    IZ = mod_DBL(CDbl(23000) * CDbl(IZ), CDbl(2147483423))
    IT = mod_DBL(CDbl(33000) * CDbl(IT), CDbl(2147483123))

    U = CDbl(IX) / CDbl(2147483579) + CDbl(IY) / CDbl(2147483543) + _
        CDbl(IZ) / CDbl(2147483423) + CDbl(IT) / CDbl(2147483123)

    While (U >= 1)
        U = U - 1
    Wend

    random = U
End Function
Private Function mod_DBL(a As Double, b As Double) As Double
    ' The VBA native Mod operator cannot operate with large integers.
    ' This routine correctly operates on large numbers.
    a = Int(Abs(a))
    b = Int(Abs(b))
    mod_DBL = a - (Int(a / b) * b)
End Function
A VBA implementation (as a BAS file) may be provided by contacting the author.
1.3 MATLAB implementation
The following code may be utilised in MATLAB:
%-----------------------------------------------------------
% random.m
% ----------------------------------------------------------
% Author John Keightley, NPL
% ----------------------------------------------------------
function [U] = random()
persistent IX IY IZ IT;
if (isempty(IX) && isempty(IY) && isempty(IZ) && isempty(IT))
IX = 1;
IY = 1;
IZ = 1;
IT = 1;
end
IX = 11600*(mod(IX, 185127))- 10379*fix(IX/185127);
IY = 47003*(mod(IY, 45688)) - 10479*fix(IY/45688);
IZ = 23000*(mod(IZ, 93368)) - 19423*fix(IZ/93368);
IT = 33000*(mod(IT, 65075)) - 8123 *fix(IT/65075);
% If negative, we add the complement :
if (IX < 0) IX = IX + 2147483579; end
if (IY < 0) IY = IY + 2147483543; end
if (IZ < 0) IZ = IZ + 2147483423; end
if (IT < 0) IT = IT + 2147483123; end
U = IX/2147483579 + IY/2147483543 + IZ/2147483423 + IT/2147483123;
while (U >= 1) U = U-1; end
% End random.m
A MATLAB script may be provided by contacting the author.
Regardless of the implementation of the algorithm (or computer language) used, one should
check that the first (say) 5 random numbers returned (on the first set of calls to the random()
function) are:
0.0000533661866319
0.8448766521181464
0.6367129108205449
0.3023666398238367
0.0662462213575785
2 Simulation of random numbers from other (continuous) distributions.
The general concept is to utilise techniques which operate on uniform random variates on the
interval [0, 1] (denoted $U$) to generate random variates from any user-defined
distribution.
2.1 Inverse transform sampling technique.
Let $X$ denote a random variable with a probability density function (PDF) $f(x)$ and
cumulative distribution function (CDF)

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(x')\,dx' .$$

In order to randomly sample a value according to this distribution of $X$:

1) Sample $U$ from a uniform distribution on the interval [0, 1], and
2) Compute the value $x_i$ such that $F_X(x_i) = U$.

In other words, the inverse of the CDF, $X = F_X^{-1}(U)$, exhibits the distribution $F_X(x)$. This
inverse is often referred to as the quantile function.
The “randomness” of $X$ is guaranteed by that of $U$. This method is very simple, and is
indeed exact, provided the inverse of the CDF can be calculated. The expression $F_X(x) = U$
is often referred to as the “sampling equation”, when expressed in the form $X = F_X^{-1}(U)$.
The inverse transform sampling technique relies on the monotonically increasing nature of
the CDF of a probability distribution, and on the one-to-one correspondence between values
of $X$ and $U$.
2.2 Rejection sampling techniques
In cases where the sampling equation, $X = F_X^{-1}(U)$, is not readily calculable, the inverse
transform sampling technique is not generally applicable. However, one can sample
uniformly from the region under the PDF to generate sampled values.
In order to randomly sample a value $x_i$ according to the desired distribution of $X$, two
uniformly distributed random variates $U_1$ and $U_2$ on the interval [0, 1] are required. Let
$f_{\max}$ denote the maximum value of the PDF over the permissible interval $[x_{\min}, x_{\max}]$:
1) Sample a point $x_i = x_{\min} + U_1 (x_{\max} - x_{\min})$ from the abscissa of the PDF, where $x_{\max}$ and $x_{\min}$
are the upper and lower bounds of the permissible values.
2) Calculate $f(x_i)$ for this value of $x_i$.
3) If $U_2 \, f_{\max} > f(x_i)$, then reject $x_i$ and return to step 1).
4) Else accept $x_i$ as belonging to the desired distribution.
3 Testing the simulated distribution
Having simulated data according to a desired distribution, some rudimentary calculations of
the observed lower-order moments should be performed, at least as a simple check that the
code performing the transformation is correct. In all cases, as a minimum, the mean and
variance of the observed distribution should be in excellent agreement with the theoretically
predicted values, provided a suitably large number of variates is generated.
For each distribution of a continuous random variable $X$ discussed in the following sections,
the probability density function, cumulative distribution function and the “sampling equation” are
provided.
In order to test the validity of the transforms performed in the sampling equation, the moment
generating function, 1st ordinary moment ($m_1$ = mean), 2nd ordinary moment ($m_2$) and the 2nd
central moment ($\mu_2$ = variance) are also listed:

Probability density function (PDF): $f(x)$
Cumulative distribution function (CDF): $F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(x')\,dx'$
Sampling equation: $U = F_X(x)$, rearranged to find $X$ in terms of $U$
Moment generating function: $M_X(t) = E[e^{tX}] = \int e^{tx} f(x)\,dx$
1st ordinary moment (mean): $m_1 = E[X] = \lim_{t \to 0} \dfrac{d}{dt} M_X(t) = \int x\,f(x)\,dx$
$r$th ordinary moment: $m_r = E[X^r] = \lim_{t \to 0} \dfrac{d^r}{dt^r} M_X(t) = \int x^r f(x)\,dx$
2nd central moment (variance): $\mu_2 = E[(X - m_1)^2] = m_2 - m_1^2$

For $N$ simulated values $x_i$, one can readily compare the population mean $\bar{x}$ and
variance $\hat{\mu}_2$ with the above predicted values, $E[X]$ and $\mu_2$ respectively, recalling:

Population mean: $\bar{x} = \dfrac{1}{N} \sum_{i=1}^{N} x_i$
Population variance: $\hat{\mu}_2 = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 = \dfrac{1}{N} \sum_{i=1}^{N} x_i^2 - \bar{x}^2$

If one desires, one can readily calculate the predicted higher-order central moments (and thus
the “skewness” and “kurtosis”) of the distribution of $X$, using the following generally
applicable relationship between the central and ordinary moments:

$r$th central moment: $\mu_r = \sum_{k=0}^{r} \binom{r}{k} (-1)^{r-k}\, m_k\, m_1^{r-k}$, where $m_0 = E[X^0] = E[1] = 1$

i.e.: $\mu_2 = m_2 - m_1^2$ as stated before
$\mu_3 = 2 m_1^3 - 3 m_1 m_2 + m_3$
$\mu_4 = -3 m_1^4 + 6 m_1^2 m_2 - 4 m_1 m_3 + m_4$
...

Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}}$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}}$

For $N$ simulated values $x_i$, compare the predicted results to those observed from the
resulting population.

Population skewness: $\dfrac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^3}{\hat{\mu}_2^{3/2}}$
Population kurtosis: $\dfrac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^4}{\hat{\mu}_2^{2}}$
4 Uniform Distributions
4.1 General uniform distribution on interval [a, b]

Probability density function: $f(x \mid a, b) = \dfrac{1}{b-a}$ for $a \le x \le b$; $0$ elsewhere
Cumulative distribution function: $F_X(x \mid a, b) = \dfrac{X - a}{b - a}$
Sampling equation: $X = a + (b - a)\,U$
Moment generating function: $M_X(t) = \dfrac{e^{bt} - e^{at}}{t\,(b-a)}$
Mean: $m_1 = \dfrac{a + b}{2}$
Second ordinary moment: $m_2 = \dfrac{b^3 - a^3}{3\,(b-a)}$
$r$th ordinary moment: $m_r = \dfrac{b^{r+1} - a^{r+1}}{(r+1)\,(b-a)}$
Variance: $\mu_2 = \dfrac{(b-a)^2}{12}$
$r$th central moment: $\mu_r = \dfrac{(a-b)^r + (b-a)^r}{2^{r+1}\,(r+1)}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{9}{5}$
4.2 Uniform distribution on interval [0, 1]
The results may be found directly from the above expressions in Section 4.1 by simply
substituting a = 0 and b = 1.

Probability density function: $f(x) = 1$ for $0 \le x \le 1$; $0$ elsewhere
Cumulative distribution function: $F_X(x) = X$
Sampling equation: $X = U$
Moment generating function: $M_X(t) = \dfrac{e^{t} - 1}{t}$
Mean: $m_1 = \dfrac{1}{2}$
Second ordinary moment: $m_2 = \dfrac{1}{3}$
$r$th ordinary moment: $m_r = \dfrac{1}{r+1}$
Variance: $\mu_2 = \dfrac{1}{12}$
$r$th central moment: $\mu_r = \dfrac{(-1)^r + 1}{2^{r+1}\,(r+1)}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{9}{5}$
4.3 Uniform distribution with expectation “A” and semi-width “w/2”
The results may be found directly from the above expressions in Section 4.1 for a general
uniform distribution, by simply substituting $a = A - w/2$ and $b = A + w/2$.

Probability density function: $f(x \mid A, w) = \dfrac{1}{w}$ for $A - \dfrac{w}{2} \le x \le A + \dfrac{w}{2}$; $0$ elsewhere
Cumulative distribution function: $F_X(x \mid A, w) = \dfrac{1}{w}\left(X - A + \dfrac{w}{2}\right)$
Sampling equation: $X = A + \dfrac{w\,(2U - 1)}{2}$
Moment generating function: $M_X(t) = \dfrac{e^{(A + w/2)t} - e^{(A - w/2)t}}{w\,t}$
Mean: $m_1 = A$
Second ordinary moment: $m_2 = A^2 + \dfrac{w^2}{12}$
$r$th ordinary moment: $m_r = \dfrac{(A + w/2)^{r+1} - (A - w/2)^{r+1}}{w\,(r+1)}$
Variance: $\mu_2 = \dfrac{w^2}{12}$
$r$th central moment: $\mu_r = \dfrac{(-w)^r + w^r}{2^{r+1}\,(r+1)}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{9}{5}$
5 Triangular Distributions
5.1 General triangular distribution on interval [a, b] with apex at c

Probability density function: $f(x \mid a, b, c) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} & a \le x \le c \\ \dfrac{2(b-x)}{(b-a)(b-c)} & c < x \le b \\ 0 & \text{elsewhere} \end{cases}$

Cumulative distribution function: $F_X(x \mid a, b, c) = \begin{cases} \dfrac{(X-a)^2}{(b-a)(c-a)} & a \le x \le c \\ 1 - \dfrac{(b-X)^2}{(b-a)(b-c)} & c < x \le b \end{cases}$

Sampling equation: $X = \begin{cases} a + \sqrt{U\,(b-a)(c-a)} & 0 \le U \le \dfrac{c-a}{b-a} \\ b - \sqrt{(1-U)(b-a)(b-c)} & \dfrac{c-a}{b-a} < U \le 1 \end{cases}$

Moment generating function: $M_X(t) = \dfrac{2}{t^2\,(b-a)} \left[ \dfrac{e^{bt} - e^{ct}}{b-c} - \dfrac{e^{ct} - e^{at}}{c-a} \right]$
Mean: $m_1 = \dfrac{a + b + c}{3}$
Second ordinary moment: $m_2 = \dfrac{a^2 + b^2 + c^2 + ab + ac + bc}{6}$
Variance: $\mu_2 = \dfrac{a^2 + b^2 + c^2 - ab - ac - bc}{18}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{\sqrt{2}\,(a + b - 2c)(2a - b - c)(a - 2b + c)}{5\,(a^2 + b^2 + c^2 - ab - ac - bc)^{3/2}}$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{12}{5}$
5.2 Isosceles triangle distribution on interval [a, b]
The results from the previous section may be specialised to the case of an isosceles triangle
on the interval [a, b], where the peak resides in the middle of the interval. Substitution of
$c = (a+b)/2$ into the above expressions yields:

Probability density function: $f(x \mid a, b) = \begin{cases} \dfrac{4(x-a)}{(b-a)^2} & a \le x \le \dfrac{a+b}{2} \\ \dfrac{4(b-x)}{(b-a)^2} & \dfrac{a+b}{2} < x \le b \\ 0 & \text{elsewhere} \end{cases}$

Cumulative distribution function: $F_X(x \mid a, b) = \begin{cases} \dfrac{2(X-a)^2}{(b-a)^2} & a \le x \le \dfrac{a+b}{2} \\ 1 - \dfrac{2(b-X)^2}{(b-a)^2} & \dfrac{a+b}{2} < x \le b \end{cases}$

Sampling equation: $X = \begin{cases} a + (b-a)\sqrt{U/2} & 0 \le U \le 0.5 \\ b - (b-a)\sqrt{(1-U)/2} & 0.5 < U \le 1 \end{cases}$

Moment generating function: $M_X(t) = \dfrac{4\left(e^{bt} - 2e^{(a+b)t/2} + e^{at}\right)}{t^2\,(b-a)^2}$
Mean: $m_1 = \dfrac{a + b}{2}$
Second ordinary moment: $m_2 = \dfrac{7a^2 + 10ab + 7b^2}{24}$
Variance: $\mu_2 = \dfrac{(b-a)^2}{24}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{12}{5}$
5.3 Isosceles distribution with expectation “A” and semi-width “w/2”
The results from Section 5.2 may be specialised to the case of an isosceles triangle with
expectation A and semi-width w/2. Substitution of $a = A - w/2$ and $b = A + w/2$ into the
expressions in Section 5.2 yields:

Probability density function: $f(x \mid A, w) = \begin{cases} \dfrac{4\,(x - A + w/2)}{w^2} & A - \dfrac{w}{2} \le x \le A \\ \dfrac{4\,(A + w/2 - x)}{w^2} & A < x \le A + \dfrac{w}{2} \\ 0 & \text{elsewhere} \end{cases}$

Cumulative distribution function: $F_X(x \mid A, w) = \begin{cases} \dfrac{2\,(X - A + w/2)^2}{w^2} & A - \dfrac{w}{2} \le x \le A \\ 1 - \dfrac{2\,(A + w/2 - X)^2}{w^2} & A < x \le A + \dfrac{w}{2} \end{cases}$

Sampling equation: $X = \begin{cases} A + \dfrac{w}{2}\left(\sqrt{2U} - 1\right) & 0 \le U \le 0.5 \\ A + \dfrac{w}{2}\left(1 - \sqrt{2(1-U)}\right) & 0.5 < U \le 1 \end{cases}$

Moment generating function: $M_X(t) = \dfrac{4\left(e^{(A + w/2)t} - 2e^{At} + e^{(A - w/2)t}\right)}{t^2\,w^2}$
Mean: $m_1 = A$
Second ordinary moment: $m_2 = A^2 + \dfrac{w^2}{24}$
Variance: $\mu_2 = \dfrac{w^2}{24}$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\mu_2^{2}} = \dfrac{12}{5}$
6 Trapezoidal distributions
6.1 Generalised trapezoidal distribution on interval [a, b] with nodes at n1 and n2
The sampling equations for simple geometrical shapes may be taken as specific instances of a
generalised trapezoidal distribution. A random variate $X$ with a trapezoidal PDF $f(x)$ on
the interval $[a, b]$ with nodes at $n_1$ and $n_2$ is shown below. Since the area under the PDF must
be unity, the height of the PDF is given by the expression

$$h = \frac{2}{(b - a) + (n_2 - n_1)} .$$

[Figure: trapezoidal PDF $f(x)$ of height $h$ on the interval $[a, b]$, rising from $a$ to node $n_1$, flat between $n_1$ and $n_2$, and falling from $n_2$ to $b$.]

Probability density function: $f(x \mid a, b, n_1, n_2) = \begin{cases} h\,\dfrac{x-a}{n_1-a} & a \le x \le n_1 \\ h & n_1 < x \le n_2 \\ h\,\dfrac{b-x}{b-n_2} & n_2 < x \le b \\ 0 & \text{elsewhere} \end{cases}$

Cumulative distribution function: $F_X(x \mid a, b, n_1, n_2) = \begin{cases} \dfrac{h\,(X-a)^2}{2\,(n_1-a)} & a \le x \le n_1 \\ \dfrac{h\,(n_1-a)}{2} + h\,(X - n_1) & n_1 < x \le n_2 \\ 1 - \dfrac{h\,(b-X)^2}{2\,(b-n_2)} & n_2 < x \le b \end{cases}$

Sampling equation: $X = \begin{cases} a + \sqrt{\dfrac{2U\,(n_1-a)}{h}} & 0 \le U \le \dfrac{h\,(n_1-a)}{2} \\ n_1 + \dfrac{1}{h}\left(U - \dfrac{h\,(n_1-a)}{2}\right) & \dfrac{h\,(n_1-a)}{2} < U \le 1 - \dfrac{h\,(b-n_2)}{2} \\ b - \sqrt{\dfrac{2(1-U)(b-n_2)}{h}} & 1 - \dfrac{h\,(b-n_2)}{2} < U \le 1 \end{cases}$

Moment generating function: $M_X(t) = \dfrac{h}{t^2}\left[\dfrac{e^{bt} - e^{n_2 t}}{b - n_2} - \dfrac{e^{n_1 t} - e^{at}}{n_1 - a}\right]$

Mean: $m_1 = \dfrac{h}{6}\left[\dfrac{b^3 - n_2^3}{b - n_2} - \dfrac{n_1^3 - a^3}{n_1 - a}\right]$

Second ordinary moment: $m_2 = \dfrac{h}{12}\left[\dfrac{b^4 - n_2^4}{b - n_2} - \dfrac{n_1^4 - a^4}{n_1 - a}\right]$

$r$th ordinary moment: $m_r = \dfrac{h}{(r+1)(r+2)}\left[\dfrac{b^{r+2} - n_2^{r+2}}{b - n_2} - \dfrac{n_1^{r+2} - a^{r+2}}{n_1 - a}\right]$

Variance: $\mu_2 = m_2 - m_1^2$, with $m_1$ and $m_2$ as given above

Skewness (*): $\gamma_1 = \dfrac{m_3 - 3 m_1 m_2 + 2 m_1^3}{\left(m_2 - m_1^2\right)^{3/2}}$

Kurtosis (*): $\gamma_2 = \dfrac{m_4 - 4 m_1 m_3 + 6 m_1^2 m_2 - 3 m_1^4}{\left(m_2 - m_1^2\right)^{2}}$

(*) Closed-form expressions for the higher-order moments and coefficients of skewness and
kurtosis are difficult to deduce. The numerical values are readily calculated via the
expressions for the moments about the origin $m_r$, and the general relationships between
central and ordinary moments as detailed in Section 3.
The interested reader may like to confirm that these reduce to the triangular case (for
$n_1 = n_2 = c$) as described in Section 5.1 and the rectangular case (for $n_1 = a$ and $n_2 = b$) as
described in Section 4.1. Further details may be found in the articles of Kacker and Lawrence
(2007) and van Dorp and Kotz (2003).
7 Normal distribution
Due to the central limit theorem, the normal distribution is omnipresent.
Any suite of Monte Carlo simulation software should be able to generate variates from a
normal distribution. First, we discuss methods for generating variates from the standard
normal distribution.
7.1 Standard normal distribution with expectation 0 and standard deviation 1

Probability density function: $f(x) = \dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$
Cumulative distribution function: $F_X(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-x'^2/2}\,dx' = \dfrac{1}{2}\left[1 + \operatorname{erf}\!\left(\dfrac{x}{\sqrt{2}}\right)\right]$, where $\operatorname{erf}(z) = \dfrac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$ (see notes below)
Sampling equation: does not exist in closed form. See notes below.
Moment generating function: $M_X(t) = e^{t^2/2}$
Mean: $m_1 = 0$
Second ordinary moment: $m_2 = 1$
Variance: $\mu_2 = 1$
Skewness: $\gamma_1 = 0$
Kurtosis: $\gamma_2 = 3$
The integral of the PDF evaluated between $-\infty$ and $x$ (i.e. the CDF, $F_X(x) = P(X \le x)$)
cannot be evaluated analytically. One solution is to perform a Taylor series expansion of the
PDF and to integrate each successive term until some required precision is reached, or to
perform numerical integration.
Thus, the inverse of the CDF, or sampling equation, $X = F_X^{-1}(U)$, does not exist in closed
form, and the simulation of normal random variates requires methods other than the inverse
transform sampling technique. A useful and compact rejection technique is the polar form of
the well-known Box-Müller transform:
1) Create 2 uniform random variates $U_1, U_2$ on the interval [0, 1].
2) Transform these to two random variables $V_1$ and $V_2$ on the interval [-1, 1],
i.e.: $V_1 = 2U_1 - 1$ and $V_2 = 2U_2 - 1$.
3) Calculate $s = V_1^2 + V_2^2$.
4) If $s = 0$ or $s \ge 1$ then go to step 1).
5) Else $X_1 = V_1 \sqrt{\dfrac{-2 \ln s}{s}}$.
6) Return the value $X_1$.
o Note that this algorithm can also return a second independent standard normal
variate (for free), if required: $X_2 = V_2 \sqrt{\dfrac{-2 \ln s}{s}}$.
o This algorithm may also be adapted to return a random variate from the
standard Cauchy (or Lorentzian) distribution, if one returns the ratio $X_1 / X_2$.
7.2 Normal distribution with expectation $\mu$ and variance $\sigma^2$
The results for the standard normal distribution may be readily generalised to a normal
distribution with mean $\mu$ and standard deviation $\sigma$. In the PDF, the replacement of $x$ with
the dimensionless term $\dfrac{x - \mu}{\sigma}$ is required, noting that for the PDF to be normalised, a
multiplicative factor of $\dfrac{1}{\sigma}$ is also required.

Probability density function: $f(x \mid \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2 / (2\sigma^2)}$
Cumulative distribution function: $F_X(x \mid \mu, \sigma) = \dfrac{1}{2}\left[1 + \operatorname{erf}\!\left(\dfrac{x - \mu}{\sigma\sqrt{2}}\right)\right]$, where $\operatorname{erf}(z) = \dfrac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$
Sampling equation: does not exist in closed form; see the notes in Section 7.1.
Moment generating function: $M_X(t) = e^{\mu t + \sigma^2 t^2 / 2}$
Mean: $m_1 = \mu$
Second ordinary moment: $m_2 = \mu^2 + \sigma^2$
Variance: $\mu_2 = \sigma^2$
Skewness: $\gamma_1 = 0$
Kurtosis: $\gamma_2 = 3$

To generate normal random variates with mean $\mu$ and variance $\sigma^2$, one can readily use the
convenient scaling properties of the normal distribution to simply generate random variates
$X_{0,1}$ from the standard normal distribution (as described in Section 7.1), and apply the
transform $X_{\mu,\sigma} = \mu + \sigma X_{0,1}$.
8 Poisson process: exponential intervals with rate $\lambda$
It is often useful to simulate a Poisson process with rate $\lambda$, when considering the effects of
dead-time on a particular counting experiment. The “memory-less” nature of a Poisson
process implies that the inter-arrival times between successive events are independent,
identically distributed exponential variables. The fundamental properties of counting statistics
in radionuclide metrology are readily derived from the “inter-arrival” distribution of pulses
from radiation detectors. A great deal of information on this topic has been published
elsewhere (Müller, 1973) and (ICRU, 1994).
A means of simulating an exponential distribution, as well as some of the expected moments,
are given below:

Probability density function: $f(x \mid \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$ and $\lambda > 0$
Cumulative distribution function: $F_X(x \mid \lambda) = 1 - e^{-\lambda x}$ for $x \ge 0$
Sampling equation: $X = \dfrac{-\ln(1 - U)}{\lambda} = \dfrac{-\ln U}{\lambda}$, since both $U$ and $1 - U$ are uniform random variates
Moment generating function: $M_X(t) = \dfrac{\lambda}{\lambda - t}$
Mean: $m_1 = \dfrac{1}{\lambda}$
Second ordinary moment: $m_2 = \dfrac{2}{\lambda^2}$
$r$th ordinary moment: $m_r = \dfrac{r!}{\lambda^r}$
Variance: $\mu_2 = \dfrac{1}{\lambda^2}$
$r$th central moment: $\mu_r = \dfrac{r!}{\lambda^r} \sum_{j=0}^{r} \dfrac{(-1)^j}{j!}$
Skewness: $\gamma_1 = 2$
Kurtosis: $\gamma_2 = 9$

(Note that for an exponential distribution, the standard deviation equals the mean.)
However, if simulating data in “list-mode” (where the recorded arrival times of events are
quantised according to the sampling frequency of an ADC), there will be truncations of the
simulated arrival times, perturbing the exponential distribution and creating a loss of the
“memory-less” nature of the input Poisson Process. Such truncations in effect impose an
extendable dead-time on the simulated pulse train, with the duration of the dead-time equal to
1.5 times the period of the sampling clock.
9 Poisson process: gamma distribution (multiple intervals) with rate $\lambda$
From the previous section we saw that the inter-arrival times $X_1, X_2, \ldots$ between successive
events for a Poisson process with rate $\lambda$ form a sequence of independent exponentially
distributed random variables, with the PDF $\lambda e^{-\lambda x}$. The $k$th arrival time is thus the sum of the
first $k$ inter-arrival times, i.e.:

$$T_k = \sum_{i=1}^{k} X_i = -\frac{1}{\lambda} \sum_{i=1}^{k} \ln U_i .$$

The probability density function for the $k$th arrival time is thus a $k$-fold convolution of the
exponentially distributed inter-arrivals, and forms a gamma distribution with shape parameter
$k$ and rate $\lambda$. Since the shape parameter in this case is an integer, this is also often referred
to as an “Erlang” distribution.

Probability density function: $f(x \mid k, \lambda) = \dfrac{\lambda^k x^{k-1} e^{-\lambda x}}{(k-1)!}$ for $x \ge 0$ and $\lambda > 0$
Cumulative distribution function: $F(x \mid k, \lambda) = \dfrac{\gamma(k, \lambda x)}{(k-1)!} = \sum_{i=k}^{\infty} \dfrac{(\lambda x)^i}{i!}\,e^{-\lambda x}$, where $\gamma(k, x) = \int_0^x z^{k-1} e^{-z}\,dz$ is the lower incomplete gamma function
Sampling equation: $X = -\dfrac{1}{\lambda} \sum_{i=1}^{k} \ln U_i = -\dfrac{1}{\lambda} \ln\left(\prod_{i=1}^{k} U_i\right)$ (see notes below, for large $k$)
Moment generating function: $M_X(t) = \left(\dfrac{\lambda}{\lambda - t}\right)^{k}$
Mean: $m_1 = \dfrac{k}{\lambda}$
Second ordinary moment: $m_2 = \dfrac{k(k+1)}{\lambda^2}$
$r$th ordinary moment: $m_r = \dfrac{1}{\lambda^r} \prod_{i=0}^{r-1} (k + i)$
Variance: $\mu_2 = \dfrac{k}{\lambda^2}$
Skewness: $\gamma_1 = \dfrac{2}{\sqrt{k}}$
Kurtosis: $\gamma_2 = \dfrac{3(k+2)}{k}$

The sampling equation is presented in two forms. The first is expressed as the summation of
the independent exponentially distributed terms representing the intervals between successive
events. When $k$ is large, this sampling equation will require taking the logarithm of a large
number of random variates within the loop over $k$ events. One may be tempted to use the
second form of the sampling equation presented above, $X = -\frac{1}{\lambda}\ln\left(\prod_{i=1}^{k} U_i\right)$, as the logarithm is
taken outside of the loop over $k$ terms. However, numerical stability issues may occur due to
the rapid convergence of the product towards zero. More information on this issue is
presented in the next section on the generation of Poisson variates.
Several algorithms devoted to gamma variate simulation have been published: Ahrens and
Dieter (1974), Cheng and Feast (1979), Best (1983), Marsaglia and Tsang (2000), and these
have been incorporated into a variety of software packages. Particularly when $k$ is large, it is
recommended to utilise high-quality, verified algorithms via calls to external code. One
example is the function G05SJF from Mark 24 of the Numerical Algorithms Group library
(NAG, 2013). This generates a vector of pseudorandom numbers taken from a gamma
distribution with high precision for large $k$, and may be readily called from a variety of
programming languages. However, use of this library requires a (paid-for) licence. It appears
that a similar C/C++ function “gsl_ran_gamma” is available from the GNU Scientific Library
(GNU, 2013), which is free software under the GNU General Public License. MATLAB also
includes the “gamrnd” function.
10 Poisson distribution
In radionuclide metrology, it is often also of great interest to consider the probability $p(k)$ of
observing $k$ events from a Poisson process of rate $\lambda$, in a time interval $t$. The variates $k$
form a Poisson distribution on the interval $[0, \infty)$ with expectation $E[k] = \mu$, where $\mu = \lambda t$.

Probability mass function: $p(k \mid \mu) = P(X = k) = \dfrac{\mu^k e^{-\mu}}{k!}$ for $k = 0, 1, 2, \ldots$
Cumulative distribution function: $P(X \le k \mid \mu) = e^{-\mu} \sum_{i=0}^{k} \dfrac{\mu^i}{i!}$
Sampling equation: see discussion below
Moment generating function: $M_k(t) = e^{\mu\left(e^{t} - 1\right)}$
Mean: $m_1 = \mu$
Second ordinary moment: $m_2 = \mu + \mu^2$
$r$th ordinary moment: $m_r = \sum_{j=1}^{r} \dfrac{\mu^j}{j!} \sum_{i=0}^{j} (-1)^i \binom{j}{i} (j - i)^r$
Variance: $\mu_2 = \mu$
$r$th central moment: $\mu_r = \mu \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_j$, where $\mu_0 = 1$ and $\mu_1 = 0$
Skewness: $\gamma_1 = \dfrac{1}{\sqrt{\mu}}$
Kurtosis: $\gamma_2 = 3 + \dfrac{1}{\mu}$
In order for exactly $k$ events to be recorded in a time interval $t$, we require by necessity that
$t_k \le t < t_{k+1}$. One can readily simulate such Poisson distributed variates by forming the
summation of simulated inter-arrival times in the time interval $[0, t]$ (as described in the
previous section) and returning the number $k$ when the condition $t_k \le t < t_{k+1}$ is reached.
This may also be expressed as the maximum value of $n$ for which $-\frac{1}{\lambda}\sum_{i=1}^{n} \ln U_i \le t$, i.e.:

$$k = \max\left\{ n : -\sum_{i=1}^{n} \ln U_i \le \mu \right\} .$$

In other words, the summation is stopped at the first instance when $-\sum_{i=1}^{n} \ln U_i > \mu$, and the
value $k = n - 1$ represents the Poisson random variate of interest. However, each simulated
inter-arrival time in the summation requires the computation of a natural logarithm, and for
large $\mu$, the number of iterations required may be excessive.
A “snippet” of C/C++ code illustrates this algorithm:
unsigned long int Poisson_Example_1(double lambda)
{
    unsigned long int k = 0;
    double sum_intervals = 0.0;
    do {
        k++;
        sum_intervals += -log(random());
    } while (sum_intervals <= lambda);
    return (k - 1);
}
One may be tempted to use the essentially equivalent (and at first glance more
computationally efficient) method of removing the logarithm from the expression within the
loop. Thus only a single exponential, $e^{-\mu}$, is calculated, and only the product of the uniform
random variates $U_i$ is required:

$$k = \min\left\{ n : \prod_{i=1}^{n} U_i < e^{-\mu} \right\} - 1 .$$

In this instance, the algorithm is stopped at the first instance where $\prod_{i=1}^{n} U_i < e^{-\mu}$, and the value
$k = n - 1$ represents the Poisson random variate. A “snippet” of C/C++ code illustrates this
algorithm:
unsigned long int Poisson_Example_2(double lambda)
{
    double L = exp(-lambda);
    unsigned long int k = 0;
    double product_U = 1.0;
    do {
        k++;
        product_U *= random();
    } while (product_U > L);
    return (k - 1);
}
However, the use of the above algorithm requires a great deal of caution. If $\mu = \lambda t$ is large,
then the value $e^{-\mu}$ rapidly approaches zero. Depending on the machine precision available, a
great deal of numerical instability is to be expected. If using double precision floating point
variables, a value for $\mu$ as low as 700 already approaches the smallest possible double
precision floating point number (at least on the author’s computer system).
As stated in the previous section, it is recommended in such cases to utilise high-quality,
verified algorithms to perform the simulations. Examples include the Numerical
Algorithms Group function “G05TJF” (NAG, 2013), the C/C++ implementation in the GNU
Scientific Library, “gsl_ran_poisson”, or the MATLAB function “poissrnd”.
However, since we can express $k$ in terms of the sum of independent, identically distributed
random variables (intervals), we can apply the central limit theorem for large $\mu$. This
states that the Poisson distribution can be safely approximated by a normal distribution with
expectation $E[k] = m_1 = \mu$ and variance $\mu_2 = \mu$, and for (say) $\mu \ge 25$ the approximation is
rather good.
However, since the Poisson distribution is a discrete distribution and the normal distribution
is continuous, a continuity correction is required, whereby a discrete value $k$ from the Poisson
distribution is represented by the interval from $k - 0.5$ to $k + 0.5$. Thus, the area under the
normal PDF between the limits $k - 0.5$ and $k + 0.5$ represents the probability of the discrete
whole number $k$, and this correction should be made.
References
Ahrens, J.H. and Dieter, U., (1974) Computer methods for sampling from gamma, beta,
Poisson, and binomial distributions. Computing, 12(3):223-246.
Best, D.J., (1983) A note on gamma variate generators with shape parameters less than
unity. Computing, 30(2):185-188.
Bevington, P.R. and Robinson, D.K., (2003). Data reduction and error analysis for the
physical sciences. McGraw Hill, New York, USA.
Cheng, R. C. H. and Feast, G.M., (1979) Some simple gamma variate generators.
Computing, 28(3):290-295.
GNU., (2013). GNU Scientific Library. Version GSL-1.16, released on 19 July 2013.
http://www.gnu.org/software/gsl/.
ICRU Report 52., (1994). Particle counting in radioactivity measurements. Bethesda, MD.
JCGM 101:2008. Evaluation of measurement data — Supplement 1 to the “Guide to the expression of
uncertainty in measurement” — Propagation of distributions using a Monte Carlo method. 2008.
Kacker, R.N. and Lawrence, J. F., (2007) Trapezoidal and triangular distributions for Type B
evaluation of standard uncertainty. Metrologia. 44, 117-127.
Knuth, D.E., 1998. The Art of Computer Programming (2nd Edition): Volume 2: Seminumerical
Algorithms. Addison Wesley.
L’Ecuyer, P. and Simard, R., (2005). TestU01: A software library in ANSI C for empirical testing of
random number generators. Laboratoire de simulation et d’optimisation, Université de Montréal IRO.
Version 6.0, dated January 14, 2005. Available at http://simul.iro.umontreal.ca/testu01/tu01.html
D.P. Kroese, T. Taimre, Z.I. Botev. (2011) Handbook of Monte Carlo Methods. John Wiley & Sons.
Marsaglia, G. and Tsang, W. (2000) A simple method for generating gamma variables. ACM
Transactions on Mathematical Software, 26(3):363-372.
Marsaglia, G., 1985. The Marsaglia random number CDROM, with the Diehard battery of tests of
randomness, produced at Florida State University under a grant from The National Science
Foundation. Available at http://www.stat.fsu.edu/pub/diehard
Müller, J. W., (1973). Dead time problems. Nucl. Inst. Meth. 112, 47-57.
NAG (2013). The NAG Library, The Numerical Algorithms Group (NAG), Oxford, United Kingdom.
www.nag.com
Ross, S.M., (2000). Introduction to probability models. 7th edition. Harcourt Academic
Press. San Diego
Stuart, A. and Ord, J.K., 1994. Kendall’s advanced theory of statistics : Volume 1.
Distribution theory. Sixth Edition. Arnold, London.
Van Dorp, J. R. and Kotz, S., (2003) Generalized trapezoidal distributions. Metrika, 58, 85-87.
Wichmann, B. A. and Hill, I. D., (2006) Generating good pseudo-random numbers. Computational
Statistics and Data Analysis. 51, 1614–1622.