General Principles in Random Variates Generation
E. Moulines and G. Fort
Telecom ParisTech
June 2015
Bibliography: Luc Devroye, Non-Uniform Random Variate Generation, Springer-Verlag (1986), available on the author's webpage: http://luc.devroye.org/books-luc.html
Outline

Uniform Random number generators: Pseudo-Random sequences
Inversion Methods for distributions on R: Cumulative distribution function; Algorithm; Sampling real-valued random variables; When F−1(u) is not explicit
Simulation of Gaussian variables: Box-Muller; Marsaglia-Bray; Inversion method
The rejection method (on R^d): Sampling under a curve; Conditional distribution; Algorithm; Design parameters; Example 1: a normal generator based on the rejection algorithm; Example 2: Gaussian from Cauchy
Uniform Random number generators
Pseudo-Random sequences
Pseudo-random sequences (1/3)
The key ingredient for Monte Carlo methods is a generator of random numbers. We will see in this lesson that a generator of uniform random variables on [0, 1] is a central tool for sampling more general distributions.

Therefore, how do we sample from a uniform distribution on [0, 1]? That is, how do we obtain a sequence of numbers u1, · · · , un, · · · that can be considered as a path of the random sequence (U1, · · · , Un, · · · ) where the r.v. (Un)n are independent and with the same distribution U([0, 1])?

In practice, we are only able to produce from a machine a sequence (u1, · · · , un, · · · ) which is a pseudo-random sequence.
Pseudo-random sequences (2/3)

Let us start with the question of sampling a sequence of {0, 1}-valued r.v. with a (fair) coin.

Examples (L = 12):

0 0 0 · · · 0 0 0 0
0 1 0 1 · · · 0 1 0 1
0 1 1 0 1 1 1 0 0 1 0 1

All of these sequences of length L occur with probability 1/2^L. But are they random?
Champernowne sequence: concatenate the binary expansions of the successive integers 0, 1, 2, 3, 4, 5, ...

0  1  1 0  1 1  1 0 0  1 0 1  · · ·
      (2)  (3)   (4)    (5)
Pseudo-random sequences (3/3)
For any length L and any binary motif (a1, · · · , aL), the Law of Large Numbers for i.i.d. r.v. implies

(1/n) ∑_{k=1}^{n} 1_{(a1,··· ,aL)}(U_{(k−1)L+1}, · · · , U_{kL}) → 1/2^L  w.p. 1, as n → ∞.
A sequence (un)n is said to be ∞-uniform if for any L ≥ 1 and any (a1, · · · , aL) ∈ {0, 1}^L,

(1/n) ∑_{k=1}^{n} 1_{(a1,··· ,aL)}(u_{(k−1)L+1}, · · · , u_{kL}) → 1/2^L.
It can be proved that the Champernowne sequence is ∞-uniform ...
Pseudo-random generator
The definition of a generator consists in:

a finite set of states (x1, · · · , xM),
a mapping T for the state update: xk+1 = T(xk),
a mapping S for the output: uk = S(xk),
an output sequence (un)n.

The output sequence is called the pseudo-random sequence produced by this generator.
Example: the best known and still most widely used generators are the simple linear congruential generators:

x0 (seed),  xk = (a xk−1 + b) mod M,  uk = xk / M.

The properties (e.g. cycle length) of this generator depend on a, b, M (e.g. M = 2^32).
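As a concrete illustration, here is a minimal Python sketch of such a generator. The constants a, b, M below are one classical choice of parameters, given purely for illustration; they are not prescribed by the slides.

```python
from itertools import islice

def lcg(seed, a=1664525, b=1013904223, M=2**32):
    """Linear congruential generator: yields u_k = x_k / M in [0, 1)."""
    x = seed
    while True:
        x = (a * x + b) % M   # state update: x_k = T(x_{k-1}) = (a x_{k-1} + b) mod M
        yield x / M           # output map:   u_k = S(x_k) = x_k / M

sample = list(islice(lcg(seed=12345), 5))  # five pseudo-random numbers in [0, 1)
```

Note that the sequence is entirely determined by the seed: rerunning with the same seed reproduces it exactly, which is exactly why it is "pseudo"-random.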
Bibliography
More complex generators exist, with improved properties. See e.g.

E. Moulines' lecture notes, on the course webpage of MDI 345, sections "Introduction aux methodes de Monte Carlo et a la Simulation" and "Methodes de simulation de v.a. uniformes".

B. Ycart's lecture notes "Methodes de Monte Carlo", http://ljk.imag.fr/membres/Bernard.Ycart/polys/polys.html

The book by S. Asmussen and P.W. Glynn, Stochastic Simulation, Springer. Chapter II, Section 2.1.

The book by P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer. Chapter II, Section 2.1.
Inversion Methods for distributions on R
Cumulative distribution function
Let F be a cumulative distribution function on R,
F(x) = P(X ≤ x).
Properties of F:

The function x ↦ F(x) is non-decreasing.
At every point, F has left limits: lim_{y→x−} F(y) exists.
F is continuous to the right: F(x) = lim_{y→x+} F(y).
For any x ∈ R, F(x+) − F(x−) = P(X = x).
The quantile function
Here are examples of cumulative distribution functions F :
[Figure: three example cumulative distribution functions, each ranging over [0, 1]; left: support R, center: support (0, ∞), right: support roughly (2, 8).]
F is not necessarily invertible. The generalized inverse of the function F, denoted F−1, is defined on (0, 1) by

F−1(p) = inf{y ∈ R : F(y) ≥ p},  0 < p < 1.

This function is called the quantile function. On the figures above: [left] F−1 : (0, 1) → R; [center] F−1 : (0, 1) → (0, ∞); [right] F−1 : (0, 1) → (2, 8).
Properties of the quantile function
F−1(p) = inf{y ∈ R : F(y) ≥ p},  0 < p < 1.
F−1 is a proper inverse if and only if F is continuous and strictly increasing.
Properties:

F−1 is non-decreasing.
F ◦ F−1(p) ≥ p; equality can fail only if F is discontinuous at F−1(p).
F−1 ◦ F(x) ≤ x; equality fails iff x is in the interior or at the right end of a "flat" of F.
F−1(p) ≤ x if and only if p ≤ F(x).
[Figure: a cdf F illustrating, for p = F(x), the quantities F−1(p) = F−1(F(x)) and F(F−1(p)).]
Simulation using the quantile transformation
Theorem (Inverse CDF method)
Let F be a cumulative distribution function and U ∼ U([0, 1]). Then the cumulative distribution function of the r.v. F−1(U) is F.
This is called the inverse transform sampling method (also known as the inverse probability integral transform).

The interest of this method stems from the fact that many programming languages can generate pseudo-random numbers which are (almost) distributed as i.i.d. standard uniform r.v.; see the first section of these slides, and references therein.
Theorem
If the cumulative distribution function F of the r.v. X is continuous, then F(X) ∼ U([0, 1]).

This is called the probability integral transformation.
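The inverse-cdf theorem can be sketched in Python with the exponential distribution, whose quantile function is explicit: F(x) = 1 − e^{−λx} gives F−1(u) = −log(1 − u)/λ. The choice λ = 2 is illustrative.

```python
import math
import random

def exponential_quantile(u, lam):
    """Quantile function of Exp(lam): F^{-1}(u) = -log(1 - u) / lam."""
    return -math.log(1.0 - u) / lam

# Feed the quantile function with uniforms: the output is Exp(lam)-distributed.
rng = random.Random(0)
xs = [exponential_quantile(rng.random(), lam=2.0) for _ in range(100_000)]
mean = sum(xs) / len(xs)   # should be close to E[X] = 1/lam = 0.5
```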
Sampling discrete random variables
Case 1 : X takes values in {a1, · · · , an} with probabilities p1, · · · , pn
F(x) =
  0 if x < a1
  p1 + · · · + pj−1 if aj−1 ≤ x < aj
  1 if x ≥ an

and for any p ∈ (0, 1),

F−1(p) =
  a1 if 0 < p ≤ p1
  aj if p1 + · · · + pj−1 < p ≤ p1 + · · · + pj
Hence, the algorithm is
Draw U ∼ U([0, 1])
Find J ∈ {1, · · · , n} such that p1 + · · ·+ pJ−1 < U ≤ p1 + · · ·+ pJ .
Return aJ .
Case 2: X takes values in the countable set {a1, · · · , aj , · · · } with probabilities p1, · · · , pj , · · · . The algorithm is:
Draw U ∼ U([0, 1])
Find J ≥ 1 such that p1 + · · ·+ pJ−1 < U ≤ p1 + · · ·+ pJ .
Return aJ .
Remark: strategies have been developed to solve efficiently the step "find the index J such that p1 + · · · + pJ−1 < U ≤ p1 + · · · + pJ" (see the exercise sessions and the book Non-Uniform Random Variate Generation by L. Devroye).
Case 3: Sampling random variables with a continuous (and explicit) cdf
Distribution   support      f(x)                     F(x)                      X = F−1(U)
Exponential    x > 0        λ e^{−λx}                1 − e^{−λx}               −λ^{−1} log(U)
Cauchy         x ∈ R        σ/(π(x² + σ²))           1/2 + (1/π) atan(x/σ)     σ tan(πU)
Rayleigh       x > 0        (x/σ²) e^{−x²/(2σ²)}     1 − e^{−x²/(2σ²)}         σ √(−2 log(U))
Pareto         x ≥ b > 0    a b^a / x^{a+1}          1 − (b/x)^a               b / U^{1/a}

(Since U and 1 − U have the same distribution, 1 − U has been replaced by U in the last column.)
The cdf of a Gaussian random variable does not have an explicit expression. We will see later how to sample such a distribution from a generator of uniform random variables.
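Two rows of the table above, sketched in Python. The parameter values (σ = 1, a = 3, b = 2) are illustrative; the Cauchy check uses the fact that P(|X| ≤ σ) = 1/2 for a Cauchy with scale σ.

```python
import math
import random

def cauchy_inverse(u, sigma):
    """Cauchy row of the table: X = sigma * tan(pi * U)."""
    return sigma * math.tan(math.pi * u)

def pareto_inverse(u, a, b):
    """Pareto row of the table: X = b / U^(1/a)."""
    return b / u ** (1.0 / a)

rng = random.Random(42)
cauchy = [cauchy_inverse(rng.random(), sigma=1.0) for _ in range(100_000)]
frac_central = sum(abs(x) <= 1.0 for x in cauchy) / len(cauchy)  # ~ 1/2
pareto = [pareto_inverse(rng.random(), a=3.0, b=2.0) for _ in range(10_000)]
```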
When F−1 is not explicit (1/2)
The inversion method is exact when an explicit form of F−1 is known.
In other cases, we must solve the equation F(x) = p, and this requires an infinite amount of time if F is continuous. Any stopping rule that we use with the numerical method necessarily leads to an inexact algorithm.
When F−1 is not explicit (2/2): the bisection method

When F−1(u) is not explicit, the solution x of F(x) = u can be numerically approximated by the bisection method.
Algorithm:

Find an initial interval [a, b] to which the solution belongs
repeat
  X ← (a + b)/2
  if F(X) ≤ u then a ← X else b ← X
until b − a ≤ 2δ
Return X
Remark: this algorithm may not work if, for a fixed u, there is an interval of solutions to F(x) = u. Nevertheless, it can be proved that the set

{u ∈ (0, 1) : ∃ x < y s.t. F(x) = F(y) = u}

has null Lebesgue measure.
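The bisection inversion above can be sketched in Python. As an illustration it is applied to the standard normal cdf Φ, written here with math.erf; any computable cdf would do.

```python
import math

def quantile_by_bisection(F, u, a, b, delta=1e-9):
    """Approximate the solution x of F(x) = u by bisection on [a, b],
    assuming F(a) < u <= F(b)."""
    while b - a > 2.0 * delta:
        x = (a + b) / 2.0
        if F(x) <= u:      # solution lies to the right of x
            a = x
        else:              # solution lies to the left of x
            b = x
    return (a + b) / 2.0

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

q = quantile_by_bisection(Phi, 0.975, -10.0, 10.0)  # the 97.5% normal quantile
```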
Simulation of Gaussian variables
Box-Muller algorithm (1/2)

Define Z1, Z2, R, Θ such that

Z1 = R cos(Θ),  Z2 = R sin(Θ).

The algorithm is based on the following property: (Z1, Z2) ∼ N(0, I2) iff

(R, Θ) are independent random variables,
R is distributed as the square root of a chi-square with two degrees of freedom or, equivalently, R has a Rayleigh distribution with parameter σ = 1,
Θ has the uniform distribution on [0, 2π].

Algorithm

Generate a pair (U1, U2) of independent variables uniform on [0, 1].
R ← √(−2 log(U1))
V ← 2π U2
Z1 ← R cos(V), Z2 ← R sin(V)
Return Z1, Z2.
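A direct Python transcription of the algorithm. The only liberty taken is using log(1 − U1) instead of log(U1), which avoids a draw U1 = 0; U1 and 1 − U1 have the same law.

```python
import math
import random

def box_muller(rng):
    """One Box-Muller step: returns two independent N(0, 1) variates."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))   # Rayleigh(1) radius
    v = 2.0 * math.pi * u2                     # uniform angle on [0, 2*pi)
    return r * math.cos(v), r * math.sin(v)

rng = random.Random(0)
zs = [z for _ in range(50_000) for z in box_muller(rng)]
mean = sum(zs) / len(zs)              # should be close to 0
var = sum(z * z for z in zs) / len(zs)  # should be close to 1
```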
Box-Muller algorithm (2/2)
Show that:

1 R has a Rayleigh distribution.
2 V is uniformly distributed on [0, 2π].
3 R and V are independent.

and prove that (Z1, Z2) are independent r.v. with distribution N(0, 1).

The main drawback of this method is that it requires the sine and cosine functions, which can be expensive to evaluate. The following method is a way to overcome this problem.
Marsaglia-Bray algorithm (1/2)
repeat
  Sample two independent uniform [0, 1] random variables U1 and U2
  U1 ← 2U1 − 1, U2 ← 2U2 − 1
until U1² + U2² ≤ 1
Y ← √(−2 log(U1² + U2²))
Z1 ← Y U1 / √(U1² + U2²),  Z2 ← Y U2 / √(U1² + U2²)
Return Z1, Z2.
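A Python sketch of the Marsaglia-Bray (polar) algorithm; note that no trigonometric function is called.

```python
import math
import random

def marsaglia_bray(rng):
    """Polar rejection variant of Box-Muller: two independent N(0, 1) variates."""
    while True:
        u1 = 2.0 * rng.random() - 1.0
        u2 = 2.0 * rng.random() - 1.0
        s = u1 * u1 + u2 * u2
        if 0.0 < s <= 1.0:            # accept: (u1, u2) uniform in the unit disk
            break
    y = math.sqrt(-2.0 * math.log(s))
    r = math.sqrt(s)
    return y * u1 / r, y * u2 / r     # (u1/r, u2/r) plays the role of (cos, sin)

rng = random.Random(0)
zs = [z for _ in range(50_000) for z in marsaglia_bray(rng)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
```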
Marsaglia-Bray algorithm (2/2)
Set

T = U1 / √(U1² + U2²),  W = U1² + U2².

1 What is the distribution of (U1, U2) at the end of the "repeat/until" loop?
2 Show that T and W are independent, W is uniformly distributed on [0, 1], and T has the same distribution as cos(Θ) when Θ ∼ U([0, 2π]).
3 By using the result of the Box-Muller algorithm, show that Z1, Z2 are independent r.v. with normal distribution N(0, 1).
4 What is the acceptance probability of the "repeat/until" test? rmk: π/4 ≈ 0.785

This method avoids the calls to the sine and cosine functions: instead of computing (cos(Θ), sin(Θ)) from Θ, a point is drawn uniformly in the unit disk, projected onto the unit circle, and its Cartesian coordinates are used.
Inversion Method
The cumulative distribution function Φ of a standard Gaussian random variable satisfies

Φ−1(1 − u) = −Φ−1(u),  0 < u < 1,

so it suffices to approximate Φ−1 on [1/2, 1].

We may approximate Φ−1 by a rational function

Φ−1(u) ≈ (∑_{n=0}^{3} an (u − 1/2)^{2n+1}) / (∑_{n=0}^{3} bn (u − 1/2)^{2n}),  0.5 ≤ u ≤ 0.92,

with a precision better than 10^{−5}, using the coefficients computed in Beasley and Springer (1977).

For 0.92 ≤ u < 1, Moro suggests approximating Φ−1 by

Φ−1(u) ≈ ∑_{n=0}^{8} cn [log(−log(1 − u))]^n.
The rejection method (on R^d)
Tools 1 and 2 : Sampling under the curve
Theorem
(i) Let X be a random vector with density g on R^d, and let U be a uniform r.v. on [0, 1], independent of X. Let c > 0 be an arbitrary constant. Then (X, cUg(X)) is uniformly distributed on the set

A = {(x, v) : x ∈ R^d, 0 ≤ v ≤ c g(x)}.

(ii) Conversely, if (X, V) is a random vector in R^{d+1} uniformly distributed on this set A, then X has density g on R^d.
[Figure: graphs of x ↦ g(x) and x ↦ cg(x), with a point of coordinates (X, cUg(X)) lying under the curve cg.]
Tool 3 : First draw hitting a subset (1/2)
Theorem
Let X1, X2, . . . be a sequence of i.i.d. random vectors taking values in A ⊆ R^d, and let B ⊆ A be a Borel set such that P(X1 ∈ B) = p > 0. Let Y be the first r.v. Xi taking values in B. Then

(i) The distribution of Y is given by, for any Borel set C,

P(Y ∈ C) = P(X1 ∈ C | X1 ∈ B).

(ii) Let I = inf{i ≥ 1 : Xi ∈ B}. Then I is a geometric r.v. with parameter p, i.e.

P(I = k) = (1 − p)^{k−1} p,  k ≥ 1.
Tool 3 : First draw hitting a subset (2/2)
repeat
  Draw i.i.d. samples Xk with distribution π
until Xk ∈ B
Return Xk.

1 If π is the uniform distribution on A, then the algorithm returns a sample with a uniform distribution on B.
2 The number of loops needed to return a value is a geometric random variable with parameter p (and expected value 1/p).
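For instance, here is a Python sketch with A the square [−1, 1]² and B the unit disk, π uniform on A: the accepted point is then uniform on the disk, and the acceptance probability is p = π/4.

```python
import random

def uniform_on_disk(rng):
    """Draw uniform points in the square [-1, 1]^2 until one hits the unit
    disk; the accepted point is uniform on the disk."""
    while True:
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        if x * x + y * y <= 1.0:
            return x, y

rng = random.Random(0)
pts = [uniform_on_disk(rng) for _ in range(50_000)]
# Sanity check: for a uniform point on the disk, P(X^2 + Y^2 <= 1/2) = 1/2.
frac_inner = sum(x * x + y * y <= 0.5 for (x, y) in pts) / len(pts)
```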
Rejection method for sampling from the distribution f on X ⊆ R^d

(below, either f, g are densities w.r.t. the Lebesgue measure on R^d, or f, g are probability mass functions on an (at most) countable subset X of R^d)

The basic version of the rejection algorithm assumes

the existence of a distribution g on R^d,
and the knowledge of a constant c ≥ 1 such that

f(x) ≤ c g(x) for all x ∈ X.

Algorithm:

repeat
  Generate independently X with distribution g and U ∼ U([0, 1])
until U ≤ f(X) / (c g(X))
Return X.

Key result: The distribution of the returned r.v. X is f.
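A generic Python sketch of the algorithm. The example target f(x) = 2x on [0, 1], dominated by c·g with g uniform and c = 2, is an illustrative choice, not taken from the slides.

```python
import random

def rejection_sample(f, g, c, sample_g, rng):
    """Draw X ~ g and U ~ U([0, 1]) until U <= f(X) / (c g(X)); return X ~ f."""
    while True:
        x = sample_g(rng)
        if rng.random() <= f(x) / (c * g(x)):
            return x

# Example: f(x) = 2x on [0, 1], g uniform on [0, 1], c = 2 so that f <= c g.
rng = random.Random(0)
xs = [rejection_sample(lambda x: 2.0 * x, lambda x: 1.0, 2.0,
                       lambda r: r.random(), rng) for _ in range(50_000)]
mean = sum(xs) / len(xs)   # E[X] = 2/3 under this f
```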
On the design parameters
The three things we need before running the rejection algorithm are
1 a dominating distribution g
2 a simple method for generating random variates with distribution g
3 knowledge of c.
Basically, g must have heavier tails and sharper peaks than f. The dominating density should be chosen with care!

The number of iterations needed to return a value X is a geometric r.v. with parameter 1/c: we should keep c as small as possible!
Development of good rejection algorithms
Generally speaking, g is chosen from a class of "classical" densities. This class includes the uniform density, triangular densities, and most densities that can be generated quickly by the inversion method.

One generally starts with a family of dominating densities (say a parametric family {gθ, θ ∈ Θ}) and chooses the density within this class for which c is the smallest. This approach sometimes leads to a difficult optimization problem.
Example 1: A normal generator by rejection from the Laplace density

The Laplace density is given by

g(x) ∝ exp(−|x|),  x ∈ R,

and the normal density is given by

f(x) = (1/√(2π)) exp(−x²/2),  x ∈ R.

Show that the following algorithm is a normal generator by rejection from the Laplace density:

repeat
  Generate independently:
    an exponential random variate X (with parameter λ = 1),
    two r.v. U and V with distribution U([0, 1]).
  If U < 1/2, set X ← −X.
until V e^{1/2 − |X|} ≤ e^{−X²/2}
Return X
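A direct Python transcription of this algorithm; the exponential variate is itself produced by inversion.

```python
import math
import random

def normal_from_laplace(rng):
    """Rejection sampler for N(0, 1) with a Laplace dominating density."""
    while True:
        x = -math.log(1.0 - rng.random())   # Exp(1) by inversion
        u, v = rng.random(), rng.random()
        if u < 0.5:                         # random sign -> Laplace variate
            x = -x
        if v * math.exp(0.5 - abs(x)) <= math.exp(-x * x / 2.0):
            return x                        # accepted: x ~ N(0, 1)

rng = random.Random(0)
zs = [normal_from_laplace(rng) for _ in range(50_000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
```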
Example 2: Gaussian from Cauchy (1/2)

We want to sample a standard Gaussian distribution on R.

The family of dominating densities is the Cauchy family with scale parameter θ:

gθ(x) = (θ/π) · 1/(θ² + x²).

There is no need to consider a translation parameter as well, because both f and gθ are unimodal with peak at 0.

The optimal rejection constant is defined by

cθ = sup_x f(x)/gθ(x).

It is given by

cθ = (√(2π)/(eθ)) e^{θ²/2} if θ < √2,  cθ = θ √(π/2) if θ ≥ √2.

The function θ ↦ cθ has only one minimum, at θ = 1, and the minimal value is c1 = √(2π/e).
An example: Gaussian from Cauchy (2/2)

Show that the following algorithm is a normal generator by rejection from the Cauchy distribution:

Set α ← √e / 2
repeat
  Sample independently the r.v. U and V with uniform distribution on [0, 1].
  Set X ← tan(πV)
until U ≤ α (1 + X²) e^{−X²/2}
Return X

The rejection constant is near 1.52; it is no match for the better normal generators developed later.
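A Python transcription of this algorithm, with the Cauchy variate produced by inversion (X = tan(πV), from the table of explicit quantile functions).

```python
import math
import random

ALPHA = math.sqrt(math.e) / 2.0   # alpha = sqrt(e)/2, as in the algorithm above

def normal_from_cauchy(rng):
    """Rejection sampler for N(0, 1) with a standard Cauchy dominating density."""
    while True:
        u, v = rng.random(), rng.random()
        x = math.tan(math.pi * v)     # standard Cauchy by inversion
        if u <= ALPHA * (1.0 + x * x) * math.exp(-x * x / 2.0):
            return x                  # accepted: x ~ N(0, 1)

rng = random.Random(0)
zs = [normal_from_cauchy(rng) for _ in range(50_000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
```

The acceptance bound α(1 + x²)e^{−x²/2} attains its maximum value 1 at x² = 1, consistent with α being the optimal constant for θ = 1.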