General Principles in Random Variates Generation
E. Moulines and G. Fort
Telecom ParisTech
June 2015
Bibliography: Luc Devroye, Non-Uniform Random Variate Generation, Springer-Verlag (1986), available on the author's webpage: http://luc.devroye.org/books-luc.html
Outline

Uniform Random number generators: Pseudo-Random sequences
Inversion Methods for distributions on R: Cumulative distribution function; Algorithm; Sampling real-valued random variables; When F−1(u) is not explicit
Simulation of Gaussian variables: Box-Muller; Marsaglia-Bray; Inversion method
The rejection method (on R^d): Sampling under a curve; Conditional distribution; Algorithm; Design parameters; Example 1: a normal generator based on the rejection algorithm; Example 2: Gaussian from Cauchy
Uniform Random number generators
Pseudo-Random sequences
Pseudo-random sequences (1/3)
The key ingredient for Monte Carlo methods is a generator of random numbers. We will see in this lesson that a generator of uniform random variables on [0, 1] is a central tool for sampling more general distributions.

Therefore, how do we sample from a uniform distribution on [0, 1]? That is, how do we obtain a sequence of numbers u1, · · · , un, · · · that can be considered as a path of the random sequence (U1, · · · , Un, · · · ) where the r.v. (Un)n are independent and with the same distribution U([0, 1])?

In practice, we are only able to produce from a machine a sequence (u1, · · · , un, · · · ) which is a pseudo-random sequence.
Pseudo-random sequences (2/3)

Let us start with the question of sampling a sequence of {0, 1}-valued r.v. with a (fair) coin.

Examples (L = 12):

0 0 0 · · · 0 0 0 0
0 1 0 1 · · · 0 1 0 1
0 1 1 0 1 1 1 0 0 1 0 1

All of these sequences of length L occur with probability 1/2^L. But are they random?
Champernowne sequence: concatenate the binary expansions of the successive integers 0, 1, 2, 3, 4, 5, ...

0  1  1 0  1 1  1 0 0  1 0 1  · · ·
      (2)  (3)   (4)    (5)
Pseudo-random sequences (3/3)
For any length L and any binary motif (a1, · · · , aL), the Law of Large Numbers for i.i.d. r.v. implies

(1/n) ∑_{k=1}^{n} 1_{(a1,··· ,aL)}(U_{(k−1)L+1}, · · · , U_{kL}) → 1/2^L  w.p. 1, as n → ∞.
A sequence (un)n is said to be ∞-uniform if for any L ≥ 1 and any (a1, · · · , aL) ∈ {0, 1}^L,

(1/n) ∑_{k=1}^{n} 1_{(a1,··· ,aL)}(u_{(k−1)L+1}, · · · , u_{kL}) → 1/2^L.
It can be proved that the Champernowne sequence is ∞-uniform ...
Pseudo-random generator
The definition of a generator consists in:

a finite set of states (x1, · · · , xM),
a mapping T for the state update: xk+1 = T(xk),
a mapping S for the output: uk = S(xk),
an output sequence (un)n.

The output sequence is called the pseudo-random sequence produced by this generator.
Example: the best known and still most widely used generators are the simple linear congruential generators:

x0 (seed),  xk = (a xk−1 + b) mod M,  uk = xk / M.

The properties (e.g. cycle length) of this generator depend on a, b, M (e.g. M = 2^32).
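As a concrete illustration, here is a minimal Python sketch of such a generator. The constants a, b, M below are one classical choice of parameters, given purely for illustration; they are not prescribed by the slides.

```python
from itertools import islice

def lcg(seed, a=1664525, b=1013904223, M=2**32):
    """Linear congruential generator: yields u_k = x_k / M in [0, 1)."""
    x = seed
    while True:
        x = (a * x + b) % M   # state update: x_k = T(x_{k-1}) = (a x_{k-1} + b) mod M
        yield x / M           # output map:   u_k = S(x_k) = x_k / M

sample = list(islice(lcg(seed=12345), 5))  # five pseudo-random numbers in [0, 1)
```

Note that the sequence is entirely determined by the seed: rerunning with the same seed reproduces it exactly, which is exactly why it is "pseudo"-random.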
Bibliography
More complex generators exist, with improved properties. See e.g.

E. Moulines' lecture notes, on the course webpage of MDI 345, sections "Introduction aux methodes de Monte Carlo et a la Simulation" and "Methodes de simulation de v.a. uniformes".

B. Ycart's lecture notes "Methodes de Monte Carlo", http://ljk.imag.fr/membres/Bernard.Ycart/polys/polys.html

The book by S. Asmussen and P.W. Glynn, Stochastic Simulation, Springer. Chapter II, Section 2.1.

The book by P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer. Chapter II, Section 2.1.
Inversion Methods for distributions on R
Cumulative distribution function
Let F be a cumulative distribution function on R,
F(x) = P(X ≤ x).
Properties of F:

The function x ↦ F(x) is non-decreasing.
At every point, F has left limits: lim_{y→x−} F(y) exists.
F is continuous to the right: F(x) = lim_{y→x+} F(y).
For any x ∈ R, F(x+) − F(x−) = P(X = x).
The quantile function
Here are examples of cumulative distribution functions F :
[Figure: three example cumulative distribution functions, each ranging over [0, 1]; left: support R, center: support (0, ∞), right: support roughly (2, 8).]
F is not necessarily invertible. The generalized inverse of the function F, denoted F−1, is defined on (0, 1) by

F−1(p) = inf{y ∈ R : F(y) ≥ p},  0 < p < 1.

This function is called the quantile function. On the figures above: [left] F−1 : (0, 1) → R; [center] F−1 : (0, 1) → (0, ∞); [right] F−1 : (0, 1) → (2, 8).
Properties of the quantile function
F−1(p) = inf{y ∈ R : F(y) ≥ p},  0 < p < 1.
F−1 is a proper inverse if and only if F is continuous and strictly increasing.
Properties:

F−1 is non-decreasing.
F ◦ F−1(p) ≥ p; equality can fail only if F is discontinuous at F−1(p).
F−1 ◦ F(x) ≤ x; equality fails iff x is in the interior or at the right end of a "flat" of F.
F−1(p) ≤ x if and only if p ≤ F(x).
[Figure: a cdf F illustrating, for p = F(x), the quantities F−1(p) = F−1(F(x)) and F(F−1(p)).]
Simulation using the quantile transformation
Theorem (Inverse CDF method)
Let F be a cumulative distribution function and U ∼ U([0, 1]). Then the cumulative distribution function of the r.v. F−1(U) is F.
This is called the inverse transform sampling method (also known as the inverse probability integral transform).

The interest of this method stems from the fact that many programming languages can generate pseudo-random numbers which are (almost) distributed as i.i.d. standard uniform r.v.; see the first section of these slides, and references therein.
Theorem
If the cumulative distribution function F of the r.v. X is continuous, then F(X) ∼ U([0, 1]).

This is called the probability integral transformation.
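The inverse-cdf theorem can be sketched in Python with the exponential distribution, whose quantile function is explicit: F(x) = 1 − e^{−λx} gives F−1(u) = −log(1 − u)/λ. The choice λ = 2 is illustrative.

```python
import math
import random

def exponential_quantile(u, lam):
    """Quantile function of Exp(lam): F^{-1}(u) = -log(1 - u) / lam."""
    return -math.log(1.0 - u) / lam

# Feed the quantile function with uniforms: the output is Exp(lam)-distributed.
rng = random.Random(0)
xs = [exponential_quantile(rng.random(), lam=2.0) for _ in range(100_000)]
mean = sum(xs) / len(xs)   # should be close to E[X] = 1/lam = 0.5
```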
Sampling discrete random variables
Case 1 : X takes values in {a1, · · · , an} with probabilities p1, · · · , pn
F(x) =
  0 if x < a1
  p1 + · · · + pj−1 if aj−1 ≤ x < aj
  1 if x ≥ an

and for any p ∈ (0, 1),

F−1(p) =
  a1 if 0 < p ≤ p1
  aj if p1 + · · · + pj−1 < p ≤ p1 + · · · + pj
Hence, the algorithm is
Draw U ∼ U([0, 1])
Find J ∈ {1, · · · , n} such that p1 + · · ·+ pJ−1 < U ≤ p1 + · · ·+ pJ .
Return aJ .
Case 2: X takes values in the countable set {a1, · · · , aj , · · · } with probabilities p1, · · · , pj , · · · . The algorithm is:
Draw U ∼ U([0, 1])
Find J ≥ 1 such that p1 + · · ·+ pJ−1 < U ≤ p1 + · · ·+ pJ .
Return aJ .
Remark: strategies have been developed to solve efficiently the step "find the index J such that p1 + · · · + pJ−1 < U ≤ p1 + · · · + pJ" (see the exercise sessions and the book Non-Uniform Random Variate Generation by L. Devroye).
Case 3: Sampling random variables with a continuous (and explicit) cdf
Distribution   support      f(x)                     F(x)                      X = F−1(U)
Exponential    x > 0        λ e^{−λx}                1 − e^{−λx}               −λ^{−1} log(U)
Cauchy         x ∈ R        σ/(π(x² + σ²))           1/2 + (1/π) atan(x/σ)     σ tan(πU)
Rayleigh       x > 0        (x/σ²) e^{−x²/(2σ²)}     1 − e^{−x²/(2σ²)}         σ √(−2 log(U))
Pareto         x ≥ b > 0    a b^a / x^{a+1}          1 − (b/x)^a               b / U^{1/a}

(Since U and 1 − U have the same distribution, 1 − U has been replaced by U in the last column.)
The cdf of a Gaussian random variable does not have an explicit expression. We will see later how to sample such a distribution from a generator of uniform random variables.
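Two rows of the table above, sketched in Python. The parameter values (σ = 1, a = 3, b = 2) are illustrative; the Cauchy check uses the fact that P(|X| ≤ σ) = 1/2 for a Cauchy with scale σ.

```python
import math
import random

def cauchy_inverse(u, sigma):
    """Cauchy row of the table: X = sigma * tan(pi * U)."""
    return sigma * math.tan(math.pi * u)

def pareto_inverse(u, a, b):
    """Pareto row of the table: X = b / U^(1/a)."""
    return b / u ** (1.0 / a)

rng = random.Random(42)
cauchy = [cauchy_inverse(rng.random(), sigma=1.0) for _ in range(100_000)]
frac_central = sum(abs(x) <= 1.0 for x in cauchy) / len(cauchy)  # ~ 1/2
pareto = [pareto_inverse(rng.random(), a=3.0, b=2.0) for _ in range(10_000)]
```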
When F−1 is not explicit (1/2)
The inversion method is exact when an explicit form of F−1 is known.
In other cases, we must solve the equation F(x) = p, and this requires an infinite amount of time if F is continuous. Any stopping rule that we use with the numerical method necessarily leads to an inexact algorithm.
When F−1 is not explicit (2/2): the bisection method

When F−1(u) is not explicit, the solution x of F(x) = u can be numerically approximated by the bisection method.
Algorithm:

Find an initial interval [a, b] to which the solution belongs
repeat
  X ← (a + b)/2
  if F(X) ≤ u then a ← X else b ← X
until b − a ≤ 2δ
Return X
Remark: this algorithm may not work if, for a fixed u, there is an interval of solutions to F(x) = u. Nevertheless, it can be proved that the set

{u ∈ (0, 1) : ∃ x < y s.t. F(x) = F(y) = u}

has null Lebesgue measure.
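The bisection inversion above can be sketched in Python. As an illustration it is applied to the standard normal cdf Φ, written here with math.erf; any computable cdf would do.

```python
import math

def quantile_by_bisection(F, u, a, b, delta=1e-9):
    """Approximate the solution x of F(x) = u by bisection on [a, b],
    assuming F(a) < u <= F(b)."""
    while b - a > 2.0 * delta:
        x = (a + b) / 2.0
        if F(x) <= u:      # solution lies to the right of x
            a = x
        else:              # solution lies to the left of x
            b = x
    return (a + b) / 2.0

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

q = quantile_by_bisection(Phi, 0.975, -10.0, 10.0)  # the 97.5% normal quantile
```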
Simulation of Gaussian variables
Box-Muller algorithm (1/2)

Define Z1, Z2, R, Θ such that

Z1 = R cos(Θ),  Z2 = R sin(Θ).

The algorithm is based on the following property: (Z1, Z2) ∼ N(0, I2) iff

(R, Θ) are independent random variables,
R is distributed as the square root of a chi-square with two degrees of freedom or, equivalently, R has a Rayleigh distribution with parameter σ = 1,
Θ has the uniform distribution on [0, 2π].

Algorithm

Generate a pair (U1, U2) of independent variables uniform on [0, 1].
R ← √(−2 log(U1))
V ← 2π U2
Z1 ← R cos(V), Z2 ← R sin(V)
Return Z1, Z2.
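A direct Python transcription of the algorithm. The only liberty taken is using log(1 − U1) instead of log(U1), which avoids a draw U1 = 0; U1 and 1 − U1 have the same law.

```python
import math
import random

def box_muller(rng):
    """One Box-Muller step: returns two independent N(0, 1) variates."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))   # Rayleigh(1) radius
    v = 2.0 * math.pi * u2                     # uniform angle on [0, 2*pi)
    return r * math.cos(v), r * math.sin(v)

rng = random.Random(0)
zs = [z for _ in range(50_000) for z in box_muller(rng)]
mean = sum(zs) / len(zs)              # should be close to 0
var = sum(z * z for z in zs) / len(zs)  # should be close to 1
```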
Box-Muller algorithm (2/2)
Show that:

1 R has a Rayleigh distribution.
2 V is uniformly distributed on [0, 2π].
3 R and V are independent.

and prove that (Z1, Z2) are independent r.v. with distribution N(0, 1).

The main drawback of this method is that it requires the sine and cosine functions, which can be expensive to evaluate. The following method is a way to overcome this problem.
Marsaglia-Bray algorithm (1/2)
repeat
  Sample two independent uniform [0, 1] random variables U1 and U2
  U1 ← 2U1 − 1, U2 ← 2U2 − 1
until U1² + U2² ≤ 1
Y ← √(−2 log(U1² + U2²))
Z1 ← Y U1 / √(U1² + U2²),  Z2 ← Y U2 / √(U1² + U2²)
Return Z1, Z2.
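A Python sketch of the Marsaglia-Bray (polar) algorithm; note that no trigonometric function is called.

```python
import math
import random

def marsaglia_bray(rng):
    """Polar rejection variant of Box-Muller: two independent N(0, 1) variates."""
    while True:
        u1 = 2.0 * rng.random() - 1.0
        u2 = 2.0 * rng.random() - 1.0
        s = u1 * u1 + u2 * u2
        if 0.0 < s <= 1.0:            # accept: (u1, u2) uniform in the unit disk
            break
    y = math.sqrt(-2.0 * math.log(s))
    r = math.sqrt(s)
    return y * u1 / r, y * u2 / r     # (u1/r, u2/r) plays the role of (cos, sin)

rng = random.Random(0)
zs = [z for _ in range(50_000) for z in marsaglia_bray(rng)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
```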
Marsaglia-Bray algorithm (2/2)
Set

T = U1 / √(U1² + U2²),  W = U1² + U2².

1 What is the distribution of (U1, U2) at the end of the "repeat/until" loop?
2 Show that T and W are independent, W is uniformly distributed on [0, 1], and T has the same distribution as cos(Θ) when Θ ∼ U([0, 2π]).
3 By using the result of the Box-Muller algorithm, show that Z1, Z2 are independent r.v. with normal distribution N(0, 1).
4 What is the acceptance probability of the "repeat/until" test? rmk: π/4 ≈ 0.785

This method avoids the calls to the sine and cosine functions: instead of computing (cos(Θ), sin(Θ)) from Θ, a point is drawn uniformly in the unit disk, projected onto the unit circle, and its Cartesian coordinates are used.
Inversion Method
The cumulative distribution function Φ of a standard Gaussian random variable satisfies

Φ−1(1 − u) = −Φ−1(u),  0 < u < 1,

so it suffices to approximate Φ−1 on [1/2, 1].

We may approximate Φ−1 by a rational function

Φ−1(u) ≈ (∑_{n=0}^{3} an (u − 1/2)^{2n+1}) / (∑_{n=0}^{3} bn (u − 1/2)^{2n}),  0.5 ≤ u ≤ 0.92,

with a precision better than 10^{−5}, using the coefficients computed in Beasley and Springer (1977).

For 0.92 ≤ u < 1, Moro suggests approximating Φ−1 by

Φ−1(u) ≈ ∑_{n=0}^{8} cn [log(−log(1 − u))]^n.
The rejection method (on R^d)
Tools 1 and 2 : Sampling under the curve
Theorem
(i) Let X be a random vector with density g on R^d, and let U be a uniform r.v. on [0, 1], independent of X. Let c > 0 be an arbitrary constant. Then (X, cUg(X)) is uniformly distributed on the set

A = {(x, v) : x ∈ R^d, 0 ≤ v ≤ c g(x)}.

(ii) Conversely, if (X, V) is a random vector in R^{d+1} uniformly distributed on this set A, then X has density g on R^d.
[Figure: graphs of x ↦ g(x) and x ↦ cg(x), with a point of coordinates (X, cUg(X)) lying under the curve cg.]
Tool 3 : First draw hitting a subset (1/2)
Theorem
Let X1, X2, . . . be a sequence of i.i.d. random vectors taking values in A ⊆ R^d, and let B ⊆ A be a Borel set such that P(X1 ∈ B) = p > 0. Let Y be the first r.v. Xi taking values in B. Then

(i) The distribution of Y is given by, for any Borel set C,

P(Y ∈ C) = P(X1 ∈ C | X1 ∈ B).

(ii) Let I = inf{i ≥ 1 : Xi ∈ B}. Then I is a geometric r.v. with parameter p, i.e.

P(I = k) = (1 − p)^{k−1} p,  k ≥ 1.
Tool 3 : First draw hitting a subset (2/2)
repeat
  Draw i.i.d. samples Xk with distribution π
until Xk ∈ B
Return Xk.

1 If π is the uniform distribution on A, then the algorithm returns a sample with a uniform distribution on B.
2 The number of loops needed to return a value is a geometric random variable with parameter p (and expected value 1/p).
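For instance, here is a Python sketch with A the square [−1, 1]² and B the unit disk, π uniform on A: the accepted point is then uniform on the disk, and the acceptance probability is p = π/4.

```python
import random

def uniform_on_disk(rng):
    """Draw uniform points in the square [-1, 1]^2 until one hits the unit
    disk; the accepted point is uniform on the disk."""
    while True:
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        if x * x + y * y <= 1.0:
            return x, y

rng = random.Random(0)
pts = [uniform_on_disk(rng) for _ in range(50_000)]
# Sanity check: for a uniform point on the disk, P(X^2 + Y^2 <= 1/2) = 1/2.
frac_inner = sum(x * x + y * y <= 0.5 for (x, y) in pts) / len(pts)
```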
Rejection method for sampling from the distribution f on X ⊆ R^d

(below, either f, g are densities w.r.t. the Lebesgue measure on R^d, or f, g are probability mass functions on an (at most) countable subset X of R^d)

The basic version of the rejection algorithm assumes

the existence of a distribution g on R^d,
and the knowledge of a constant c ≥ 1 such that

f(x) ≤ c g(x) for all x ∈ X.

Algorithm:

repeat
  Generate independently X with distribution g and U ∼ U([0, 1])
until U ≤ f(X) / (c g(X))
Return X.

Key result: The distribution of the returned r.v. X is f.
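A generic Python sketch of the algorithm. The example target f(x) = 2x on [0, 1], dominated by c·g with g uniform and c = 2, is an illustrative choice, not taken from the slides.

```python
import random

def rejection_sample(f, g, c, sample_g, rng):
    """Draw X ~ g and U ~ U([0, 1]) until U <= f(X) / (c g(X)); return X ~ f."""
    while True:
        x = sample_g(rng)
        if rng.random() <= f(x) / (c * g(x)):
            return x

# Example: f(x) = 2x on [0, 1], g uniform on [0, 1], c = 2 so that f <= c g.
rng = random.Random(0)
xs = [rejection_sample(lambda x: 2.0 * x, lambda x: 1.0, 2.0,
                       lambda r: r.random(), rng) for _ in range(50_000)]
mean = sum(xs) / len(xs)   # E[X] = 2/3 under this f
```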
On the design parameters
The three things we need before running the rejection algorithm are
1 a dominating distribution g
2 a simple method for generating random variates with distribution g
3 knowledge of c.
Basically, g must have heavier tails and sharper peaks than f. The dominating density should be chosen with care!

The number of iterations needed to return a value X is a geometric r.v. with parameter 1/c: we should keep c as small as possible!
Development of good rejection algorithms
Generally speaking, g is chosen from a class of "classical" densities. This class includes the uniform density, triangular densities, and most densities that can be generated quickly by the inversion method.

One generally starts with a family of dominating densities (say a parametric family {gθ, θ ∈ Θ}) and chooses the density within this class for which c is the smallest. This approach sometimes leads to a difficult optimization problem.
Example 1: A normal generator by rejection from the Laplace density

The Laplace density is given by

g(x) ∝ exp(−|x|),  x ∈ R,

and the normal density is given by

f(x) = (1/√(2π)) exp(−x²/2),  x ∈ R.

Show that the following algorithm is a normal generator by rejection from the Laplace density:

repeat
  Generate independently:
    an exponential random variate X (with parameter λ = 1),
    two r.v. U and V with distribution U([0, 1]).
  If U < 1/2, set X ← −X.
until V e^{1/2 − |X|} ≤ e^{−X²/2}
Return X
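A direct Python transcription of this algorithm; the exponential variate is itself produced by inversion.

```python
import math
import random

def normal_from_laplace(rng):
    """Rejection sampler for N(0, 1) with a Laplace dominating density."""
    while True:
        x = -math.log(1.0 - rng.random())   # Exp(1) by inversion
        u, v = rng.random(), rng.random()
        if u < 0.5:                         # random sign -> Laplace variate
            x = -x
        if v * math.exp(0.5 - abs(x)) <= math.exp(-x * x / 2.0):
            return x                        # accepted: x ~ N(0, 1)

rng = random.Random(0)
zs = [normal_from_laplace(rng) for _ in range(50_000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
```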
Example 2: Gaussian from Cauchy (1/2)

We want to sample a standard Gaussian distribution on R.

The family of dominating densities is the Cauchy family with scale parameter θ:

gθ(x) = (θ/π) · 1/(θ² + x²).

There is no need to consider a translation parameter as well, because both f and gθ are unimodal with peak at 0.

The optimal rejection constant is defined by

cθ = sup_x f(x)/gθ(x).

It is given by

cθ = (√(2π)/(eθ)) e^{θ²/2} if θ < √2,  cθ = θ √(π/2) if θ ≥ √2.

The function θ ↦ cθ has only one minimum, at θ = 1, and the minimal value is c1 = √(2π/e).
An example: Gaussian from Cauchy (2/2)

Show that the following algorithm is a normal generator by rejection from the Cauchy distribution:

Set α ← √e / 2
repeat
  Sample independently the r.v. U and V with uniform distribution on [0, 1].
  Set X ← tan(πV)
until U ≤ α (1 + X²) e^{−X²/2}
Return X

The rejection constant is near 1.52; it is no match for the better normal generators developed later.
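A Python transcription of this algorithm, with the Cauchy variate produced by inversion (X = tan(πV), from the table of explicit quantile functions).

```python
import math
import random

ALPHA = math.sqrt(math.e) / 2.0   # alpha = sqrt(e)/2, as in the algorithm above

def normal_from_cauchy(rng):
    """Rejection sampler for N(0, 1) with a standard Cauchy dominating density."""
    while True:
        u, v = rng.random(), rng.random()
        x = math.tan(math.pi * v)     # standard Cauchy by inversion
        if u <= ALPHA * (1.0 + x * x) * math.exp(-x * x / 2.0):
            return x                  # accepted: x ~ N(0, 1)

rng = random.Random(0)
zs = [normal_from_cauchy(rng) for _ in range(50_000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
```

The acceptance bound α(1 + x²)e^{−x²/2} attains its maximum value 1 at x² = 1, consistent with α being the optimal constant for θ = 1.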