
CHAPTER 4

PROBABILITY DISTRIBUTIONS

This chapter lists the major discrete and continuous probability distributions used in Monte Carlo simulation, along with their main properties and specific algorithms for random variable generation. For general random variable generation procedures, see Chapter 3. Further information on families of distributions and their properties can be found in Section D.1 (exponential families), Section D.2 (tail and stability properties), and Section 3.1.2 (transformations and location-scale families).

4.1 DISCRETE DISTRIBUTIONS

We list various discrete distributions in alphabetical order. Recall that a discrete distribution is completely specified by its discrete pdf.

4.1.1 Bernoulli Distribution

The pdf of the Bernoulli distribution is given by

f(x; p) = p^x (1 - p)^{1-x},   x ∈ {0, 1},

where p ∈ [0, 1] is the success parameter. We write the distribution as Ber(p). The Bernoulli distribution is used to describe experiments with only two outcomes: 1 (success) or 0 (failure). Such an experiment is called a Bernoulli trial. A sequence of iid Bernoulli random variables, X_1, X_2, ... ~ Ber(p), is called a Bernoulli process. Such a process is a model for the random experiment where a biased coin is tossed repeatedly; see also Example A.5.

Table 4.1 Moment properties of the Ber(p) distribution.

Property                           Expression       Condition

Expectation                        p

Variance                           p(1 - p)

Probability generating function    1 - p + zp       |z| ≤ 1

The inverse-transform method leads to the following generation algorithm.

Algorithm 4.1 (Ber(p) Generator)

1. Generate U ~ U(0,1).

2. If U ≤ p, return X = 1; otherwise, return X = 0.

■ EXAMPLE 4.1 (Bernoulli Generation)

The following MATLAB code generates one hundred Ber(0.25) random variables, and plots a bar graph of the binary data.

X = (rand(1,100) <= 0.25); bar(X)

4.1.2 Binomial Distribution

The pdf of the binomial distribution is given by

f(x; n, p) = P(X = x) = binom(n, x) p^x (1 - p)^{n-x},   x = 0, 1, ..., n,

where 0 ≤ p ≤ 1. We write the distribution as Bin(n, p). The binomial distribution is used to describe the total number of successes in a sequence of n independent Bernoulli trials. That is, a Bin(n, p)-distributed random variable X can be written as the sum X = B_1 + ... + B_n of independent Ber(p) random variables {B_i}. Examples of the graph of the pdf are given in Figure 4.1.


Figure 4.1 The pdfs of the Bin(20,0.1) (solid dot) and Bin(20,0.5) (circle) distributions.

Table 4.2 Moment and tail properties of the Bin(n,p) distribution.

Property                           Expression           Condition

Expectation                        np

Variance                           np(1 - p)

Probability generating function    (1 - p + zp)^n       |z| ≤ 1

Tail probability P(X > x)          I_p(x + 1, n - x)    x = 0, 1, ..., n

Here, I_x(α, β) denotes the incomplete beta function. Other properties and relations are:

1. Order statistics: Let U_1, ..., U_n ~ U(0,1), and denote by U_(1) < ... < U_(n) the order statistics. Let X = max{i : U_(i) < p}. Then, X ~ Bin(n, p).

2. Number of failures: Let X ~ Bin(n, p) denote the number of successes in n Bernoulli experiments and Y = n - X the number of failures. Then, Y ~ Bin(n, 1 - p).

3. Sum of binomial random variables: Let X_k ~ Bin(n_k, p), k = 1, ..., K, independently. Then,

Σ_{k=1}^K X_k ~ Bin(Σ_{k=1}^K n_k, p).

4. Convergence to the normal distribution: Let X_n ~ Bin(n, p). A direct consequence of the central limit theorem is

(X_n - np)/√(np(1 - p)) → N(0, 1) in distribution as n → ∞.

An often used rule of thumb is that for finite n the corresponding approximation is accurate if both np and n(1 - p) are greater than 5. For p close to 1/2 the cdf of N(np - 1/2, np(1 - p)) approximates the cdf of X_n even better. This is called the continuity correction.

5. Convergence to the Poisson distribution: Let X_n ~ Bin(n, λ/n). Then,

X_n → Y ~ Poi(λ) in distribution as n → ∞.

6. Geometric distribution: Let Y_0, Y_1, ... ~ Geom(p), iid. Then,

X = min{k : Σ_{i=0}^k Y_i > n} ~ Bin(n, p).

7. Exponential distribution: Let Y_0, Y_1, ... ~ Exp(1), iid. Then,

X = min{k : Σ_{i=0}^k Y_i/(n - i) > -ln(1 - p)} ~ Bin(n, p).

The fact that binomial random variables can be viewed as sums of Bernoulli random variables leads to the following generation algorithm.

Algorithm 4.2 (Bin(n, p) Generator)

1. Generate X_1, ..., X_n ~ Ber(p), iid.

2. Return X = Σ_{i=1}^n X_i.

Since the execution time of Algorithm 4.2 is proportional to n, one is motivated to use alternative methods for large n. When n is large but p small such that np is moderate (say ≤ 10), the following method, which is based on Property 6 above, provides fast random variable generation. The algorithm can also be used when p is close to 1, by generating Y = n - X.

Algorithm 4.3 (Bin(n, p) Generator via the Geometric Method)

1. Set X = 0, c = ln(1 - p), and S = ⌈ln(U)/c⌉, where U ~ U(0,1).

2. While S < n + 1, generate U ~ U(0,1), and set X = X + 1 and S = S + ⌈ln(U)/c⌉.

3. Return X.

■ EXAMPLE 4.2 (Binomial Generation)

A MATLAB implementation of Algorithm 4.3 is given next. In the last part of the code the observed counts are compared with the expected counts.

%bingen.m
n = 100; p = 0.1; mu = n*p; N = 10^5;
x = zeros(1,N); c = log(1-p);
for i=1:N
    s = ceil(log(rand)/c);
    while s < n + 1
        x(i) = x(i) + 1;
        s = s + ceil(log(rand)/c);
    end
end
xx = [floor(mu - 4*sqrt(mu)) : 1 : ceil(mu + 4*sqrt(mu))];
count = hist(x,xx);
ex = binopdf(xx,n,p)*N;
hold on
plot(xx,count,'or')
plot(xx,ex,'.b')
hold off

Finally, as a result of the central limit theorem for the binomial distribution (see Property 4 above), the distribution of X ~ Bin(n, p) is close to that of Y ~ N(np - 1/2, np(1 - p)) as n becomes large. This leads to the following approximate generation algorithm.

Algorithm 4.4 (Bin(n, p) Generator via the Normal Approximation)

1. Generate Z ~ N(0,1).

2. Return X = max{0, ⌊np + 1/2 + Z√(np(1 - p))⌋}.
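For concreteness, here is a small MATLAB sketch of Algorithm 4.4 (this code is our illustration, not the book's; the values n = 1000, p = 0.3, and the sample size N are arbitrary choices). Remember that the output is only approximately Bin(n, p) distributed.

% approximate Bin(n,p) samples via the normal approximation (Algorithm 4.4)
n = 1000; p = 0.3; N = 10^5;            % assumed example values
Z = randn(1,N);                          % standard normal draws
X = max(0, floor(n*p + 1/2 + Z*sqrt(n*p*(1-p))));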

As the approximation can be quite inaccurate for certain choices of n and p, we recommend the following exact recursive methods instead for n > 10.

The first method relies on the fast NegBin(r, p) generator given in Algorithm 4.10 and Property 2 on Page 95.

Algorithm 4.5 (Recursive Bin(n, p) Generator (I))

1. If n ≤ 10, output X ~ Bin(n, p) using Algorithm 4.2; otherwise, proceed with the next step.

2. Set k = ⌈np⌉. Generate Y ~ NegBin(k, p) via Algorithm 4.10 and set T = Y + k.

3. If T ≤ n, generate Z ~ Bin(n - T, p) and output

X = k + Z;

otherwise (that is, if T > n), generate Z ~ Bin(T - n, p) and output

X = k - Z.

The following MATLAB code implements the algorithm. With n = 10^10 and p = 1/2, the number of recursive calls to the function binomialrnd.m is typically 4 or 5.

function x=binomialrnd(n,p)
% recursive binomial generator
if n<=10
    x=sum(rand(1,n)<p);
else
    k=ceil(n*p); Y=nbinrnd(k,p); % generate NegBin(k,p)
    T=k+Y;
    if T<=n
        x=k+binomialrnd(n-T,p);
    else
        x=k-binomialrnd(T-n,p);
    end
end

The second recursive method relies on a fast Beta(α, β) generator (for example, Algorithm 4.25) and is based on Property 1 on Page 87 and Property 6 on Page 104.

Algorithm 4.6 (Recursive Bin(n, p) Generator (II))

1. If n ≤ 10, output X ~ Bin(n, p) using Algorithm 4.2; otherwise, proceed with the next step.

2. Set k = ⌈np⌉. Generate U_(k) ~ Beta(k, n + 1 - k) via Algorithm 4.25.

3. If U_(k) ≤ p, generate Z ~ Bin(n - k, (p - U_(k))/(1 - U_(k))) and output

X = k + Z;

otherwise (that is, if U_(k) > p), generate Z ~ Bin(k - 1, (U_(k) - p)/U_(k)) and output

X = k - Z.

The following MATLAB code implements the algorithm. With n = 10^10 and p = 1/2, the number of recursive calls to the function binomrnd_beta.m is typically 6 or 7, but despite this the recursive algorithm based on the Beta(α, β) generator is faster than the one based on the NegBin(r, p) generator.

function x=binomrnd_beta(n,p)
% recursive binomial generator based on Beta dist.
if n<=10
    x=sum(rand(1,n)<p);
else
    k=ceil(n*p); Uk=betarnd(k,n+1-k); % generate beta r.v.
    if Uk<p
        x=k+binomrnd_beta(n-k,(p-Uk)/(1-Uk));
    else
        x=k-binomrnd_beta(k-1,(Uk-p)/Uk);
    end
end

4.1.3 Geometric Distribution

The pdf of the geometric distribution is given by

f(x; p) = (1 - p)^{x-1} p,   x = 1, 2, 3, ...,   (4.1)

where 0 ≤ p ≤ 1. We write the distribution as Geom(p). The geometric distribution is used to describe the time of first success in an infinite sequence of independent Bernoulli trials with success probability p. Examples of the graph of the pdf are given in Figure 4.2.


Figure 4.2 The pdfs of the Geom(0.3) (solid dot) and Geom(0.6) (circle) distributions.

Remark 4.1.1 (Alternative Definition) There is another, nonequivalent, definition of the geometric distribution, where the pdf is given by

f(x; p) = (1 - p)^x p,   x = 0, 1, 2, ...,

that is, a shifted version of (4.1). This alternate definition describes the number of trials before the first success in an infinite sequence of independent Bernoulli trials. We will always assume the definition in (4.1), unless otherwise specified. To distinguish the two cases, we denote the geometric distribution starting at 0 by Geom0(p). If X ~ Geom(p), then (X - 1) ~ Geom0(p).

Table 4.3 Moment and tail properties of the Geom(p) distribution.

Property                           Expression             Condition

Expectation                        1/p

Variance                           (1 - p)/p²

Probability generating function    zp/(1 - z(1 - p))      |z| ≤ 1

Tail probability P(X > x)          (1 - p)^x              x = 1, 2, ...

Other properties and relations are:

1. Memoryless property: P(X > x + y | X > x) = P(X > y) for x, y = 1, 2, ....

2. Sum: If X_1, ..., X_n ~ Geom0(p), iid, then

Σ_{k=1}^n X_k ~ NegBin(n, p).

3. Exponential distribution: Let Y ~ Exp(λ), with λ = -ln(1 - p). Then, ⌈Y⌉ ~ Geom(p).

4. Convergence to exponential distribution: Let X_n ~ Geom(p_n), with p_n → 0 as n → ∞. Then,

p_n X_n → Y ~ Exp(1) in distribution as n → ∞.

The relationship between the exponential and geometric distributions described in Property 3 yields the following generator.

Algorithm 4.7 (Geom(p) Generator (I))

1. Generate Y ~ Exp(-ln(1 - p)).

2. Output X = ⌈Y⌉.

Writing Algorithm 4.7 directly in terms of uniform random variables gives:

Algorithm 4.8 (Geom(p) Generator (II))

1. Generate U ~ U(0,1).

2. Output X = ⌈ln(U)/ln(1 - p)⌉.
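In MATLAB, Algorithm 4.8 is a one-liner; the following sketch (our illustration, with p = 0.3 and the sample size N chosen arbitrarily) draws N iid Geom(p) variables at once:

% Geom(p) via inverse transform (Algorithm 4.8)
p = 0.3; N = 10^5;
X = ceil(log(rand(1,N))/log(1-p));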

4.1.4 Hypergeometric Distribution

The hypergeometric distribution has pdf

f(x; n, r, N) = binom(r, x) binom(N - r, n - x) / binom(N, n),   max{0, r + n - N} ≤ x ≤ min{n, r},

where N, n ≤ N, and r ≤ N are positive integers. We write the distribution as Hyp(n, r, N). The hypergeometric distribution is used in the following situation. Consider an urn with N balls, r of which are red. Draw n balls from the urn at random without replacement. Then, the number of red balls among the n chosen balls has a Hyp(n, r, N) distribution. A consequence is that the sequence X_1, X_2, ... with X_N ~ Hyp(n, pN, N) converges in distribution to Bin(n, p) as N → ∞.

Examples of the graph of the pdf are given in Figure 4.3. Here, the same values for p are used as in Figure 4.1, namely, p = 0.1 (solid dot) and p = 0.5 (circle).


Figure 4.3 The pdfs of the Hyp(20,30,300) (solid dot) and Hyp(20,150,300) (circle) distributions.

Table 4.4 Moment properties of the Hyp(n, r, N) distribution.

Property              Expression                                                       Condition

Expectation           n r/N

Variance              n (r/N)(1 - r/N)(N - n)/(N - 1)

Prob. gen. function   [binom(N-r, n)/binom(N, n)] ₂F₁(-n, -r; N - n - r + 1; z)        |z| ≤ 1

Here, ₂F₁ is the hypergeometric function (see Section D.9.7). The urn description of the Hyp(n, r, N) distribution motivates the following generation algorithm.

Algorithm 4.9 (Hyp(n, r, N) Generator) Consider an urn with N balls, numbered 1, ..., N. The first r of these are red.

1. Draw n balls uniformly without replacement from the N numbered balls.

2. Let X be the number of red balls.

■ EXAMPLE 4.3 (Hypergeometric Generation)

The following MATLAB code gives an implementation of the above algorithm.

"/.hyperg N = n = r = w = w(l K = x = for

end

100 20; 30;

.m ; ’/»total number of balls % take n 7, number

zeros(l,N); r) 10"

= 1; 5 ; 7,sampl(

zeros(l,K); i=l [s, x(i

:K

balls of red balls

3 size

ix] = sort(rand(1,N)); ) = sum(w (ix(l:n)));

Algorithm 4.9 is not efficient for large N. A universally fast, but much more complicated generation algorithm can be found in [8, Page 545].

4.1.5 Negative Binomial Distribution

The negative binomial distribution has pdf

f(x; r, p) = [Γ(r + x)/(Γ(r) x!)] p^r (1 - p)^x,   x = 0, 1, 2, ...,

where r ≥ 0 and 0 < p < 1. We write the distribution as NegBin(r, p). In many applications r is a positive integer, r = n. The distribution is then also known as the Pascal distribution, in which case the pdf can be written as

f(x; n, p) = binom(n + x - 1, x) p^n (1 - p)^x,   x = 0, 1, 2, ....

The Pascal distribution describes the number of failures before the n-th success in a Bernoulli process. The dependence of the distribution on r, for a fixed mean, is illustrated in Figure 4.4.


Figure 4.4 The pdfs of the NegBin(0.5,1/21) (solid dot) and NegBin(10,0.5) (circle) distributions.

Table 4.5 Moment and tail properties of the NegBin(r, p) distribution.

Property                           Expression               Condition

Expectation                        r(1 - p)/p

Variance                           r(1 - p)/p²

Probability generating function    (p/(1 - z(1 - p)))^r     |z| ≤ 1

Tail probability P(X > x)          I_{1-p}(x + 1, r)        x = 0, 1, 2, ...

Here, I_x(α, β) denotes the incomplete beta function. Other properties and relations are:

1. Bernoulli process: Consider an infinite sequence of independent Bernoulli trials with success parameter p. Let Y_n denote the number of failures before the n-th success and let N_t denote the number of successes in the first t trials. Then,

{Y_n ≤ x} ⟺ {N_{x+n} ≥ n} for all n and x ∈ {0, 1, ...}.

Moreover, N_{x+n} ~ Bin(x + n, p) and Y_n ~ NegBin(n, p).

2. Geometric distribution: If X_1, ..., X_n ~ Geom0(p), iid, then

Σ_{k=1}^n X_k ~ NegBin(n, p).

3. Negative binomial distribution: Let X_k ~ NegBin(r_k, p), k = 1, ..., n, independently. Then,

Σ_{k=1}^n X_k ~ NegBin(Σ_{k=1}^n r_k, p).

4. Convergence to the Poisson distribution: Let X_n ~ NegBin(n, 1 - λ/n). Then,

X_n → Y ~ Poi(λ) in distribution as n → ∞.

5. Gamma-Poisson mixture: If Λ ~ Gamma(n, p/(1 - p)) and (X | Λ = λ) ~ Poi(λ), then X ~ NegBin(n, p).

For small integer-valued r (r = n), we can generate a NegBin(n, p) random variable via the sum of n Geom0(p) random variables (see Property 2 above), leading to the following algorithm.

Algorithm 4.10 (NegBin(n, p) Generator via Geometric Distribution)

1. Generate U_1, ..., U_n ~ U(0,1), iid.

2. Set c = ln(1 - p) and Y_i = ⌊ln(U_i)/c⌋, i = 1, ..., n.

3. Return X = Σ_{i=1}^n Y_i.
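A vectorized MATLAB sketch of Algorithm 4.10 (our illustration; n, p, and the sample size N are arbitrary example values):

% NegBin(n,p) as a sum of n Geom0(p) variables (Algorithm 4.10)
n = 5; p = 0.4; N = 10^5;
c = log(1-p);
Y = floor(log(rand(n,N))/c);   % each column holds n Geom0(p) draws
X = sum(Y,1);                  % column sums are NegBin(n,p)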

This direct approach becomes inefficient for large r and does not work for non-integer values of r. We recommend the following uniformly fast generator (valid for all values of r and p), based on Property 5 above.

Algorithm 4.11 (Uniformly Fast NegBin(r,p) Generator)

1. Generate Λ ~ Gamma(r, p/(1 - p)).

2. Conditional on Λ, generate and output X ~ Poi(Λ).
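A MATLAB sketch of Algorithm 4.11 using the Statistics Toolbox functions gamrnd and poissrnd (the parameter values are our own example). Note that gamrnd takes a shape and a scale parameter, so the rate p/(1 - p) corresponds to scale (1 - p)/p:

% NegBin(r,p) via the gamma-Poisson mixture (Algorithm 4.11)
r = 2.5; p = 0.3; N = 10^5;
Lambda = gamrnd(r, (1-p)/p, 1, N);  % Gamma(r, p/(1-p)) with rate convention
X = poissrnd(Lambda);               % conditionally Poi(Lambda)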

4.1.6 Phase-Type Distribution (Discrete Case)

The discrete phase-type distribution has pdf

f(x; α, A) = α A^{x-1}(I - A)1,   x = 1, 2, ...,

where α is a 1 × m probability vector, 1 is an m × 1 vector of 1s, and A is an m × m matrix such that I - A has an inverse and such that

P = ( A   A_0 )
    ( 0    1  )

is the transition matrix of a Markov chain, where A_0 = (I - A)1. We write the distribution as DPH(α, A). A random variable X ~ DPH(α, A) can be thought of as the time of absorption in state m + 1 of a discrete-time Markov chain on {1, ..., m, m + 1} with transition matrix P and initial distribution (α, 0).

Table 4.6 Moment and tail properties of the DPH(α, A) distribution.

Property                           Expression                     Condition

Expectation (μ)                    α(I - A)^{-1}1

Variance                           2αA(I - A)^{-2}1 + μ - μ²

Probability generating function    zα(I - zA)^{-1}A_0             |z| ≤ 1

Tail probability P(X > x)          αA^x 1                         x = 1, 2, ...

Other properties and relations (see, for example, [26]) are:

1. Geometric distribution: The geometric distribution Geom(p) is the simplest discrete phase-type distribution, with m = 1, A = 1 - p, and α = 1.

2. Negative binomial: If X ~ NegBin(n, p), then X + n ~ DPH(α, A), with α the 1 × n vector (1, 0, ..., 0) and A the n × n matrix

A = ( 1-p   p    0   ...   0  )
    (  0   1-p   p   ...   0  )
    (  .          .    .   .  )
    (  0   ...   0   1-p   p  )
    (  0   ...   0    0   1-p )

3. Sum: The sum of a finite number of independent phase-type random variables has again a phase-type distribution.

4. Mixture: The mixture of a finite number of phase-type distributions is again a phase-type distribution.

The fact that X ~ DPH(α, A) can be viewed as the time of absorption of a Markov chain leads to the following algorithm, where P(y, ·) denotes the y-th row of the transition matrix P and α denotes the initial distribution.

Algorithm 4.12 (DPH(α, A) Generator)

1. Draw Y_0 ~ α. Set t = 0.

2. While Y_t ≠ m + 1, draw Y_{t+1} ~ P(Y_t, ·) and set t = t + 1.

3. Return X = t .

■ EXAMPLE 4.4 (NegBin as a DPH Distribution)

The negative binomial distribution can be seen as a phase-type distribution — see Property 2 above. The following MATLAB program implements Algorithm 4.12 for the corresponding phase-type distribution with m = 4 and p = 0.2 and compares the samples (with m subtracted) with the true, NegBin(m,p), distribution.

%negbin.m
alpha = [1 0 0 0]; p = 0.2; m = 4;
A = [1-p, p, 0, 0; 0, 1-p, p, 0; 0, 0, 1-p, p; 0, 0, 0, 1-p];
A0 = abs((eye(m) - A)*ones(m,1));
P = [A A0; zeros(1,m) 1];
N = 10^5;
x = zeros(N,1);
for i=1:N
    X = 0;
    Y = min(find(cumsum(alpha) > rand));
    while Y ~= m+1
        Y = min(find(cumsum(P(Y,:)) > rand));
        X = X+1;
    end
    x(i) = X;
end
x = x - m; % shifted samples come from a NegBin(m,p) distribution
xx = [0 : 2*m/p];
count = hist(x,xx);
ex = nbinpdf(xx,m,p)*N;
hold on
plot(xx,count,'.r')
plot(xx,ex,'ob')
hold off

4.1.7 Poisson Distribution

The pdf of the Poisson distribution is given by

f(x; λ) = (λ^x/x!) e^{-λ},   x = 0, 1, 2, ...,

where λ > 0 is the rate parameter. We write the distribution as Poi(λ). Examples of the graph of the pdf are given in Figure 4.5.


Figure 4.5 The pdfs of the Poi(0.7) (solid dot), Poi(4) (circle), and Poi(11) (plus) distributions.

The Poisson distribution is often used to model the number of arrivals of some sort during a fixed period of time. The Poisson distribution is closely related to the exponential distribution via the Poisson process; see Section 5.4.

Table 4.7 Moment and tail properties of the Poi(λ) distribution.

Property                           Expression        Condition

Expectation                        λ

Variance                           λ

Probability generating function    e^{-λ(1-z)}       |z| ≤ 1

Tail probability P(X > x)          P(⌊x + 1⌋, λ)     x = 0, 1, 2, ...

Here, P(a, x) is the incomplete gamma function. Other properties and relations are:

1. Binomial limit: Let X_n ~ Bin(n, λ/n). Then,

X_n → Y ~ Poi(λ) in distribution as n → ∞.

2. Sum of Poisson random variables: Let X ~ Poi(λ) and Y ~ Poi(ν) be independent. Then, Z = X + Y ~ Poi(λ + ν).

3. Multinomial distribution: Let 0 ≤ p_i ≤ 1, i = 1, ..., m, and p = (p_1, ..., p_m). If N ~ Poi(λ) and ((X_1, ..., X_m) | N) ~ Mnom(N, p), then

X_i ~ Poi(λp_i), i = 1, ..., m, independently.

4. Normal approximation: X is approximately N(λ, λ) distributed for large λ.

5. Binomial conditional distribution: Let X ~ Poi(λ) and Y ~ Poi(μ) be independent, and let Z = X + Y. Then,

(X | Z = z) ~ Bin(z, λ/(λ + μ)).

6. Exponential distribution: Let {Y_j} ~ Exp(λ), iid. Then,

X = max{n : Σ_{j=1}^n Y_j ≤ 1} ~ Poi(λ).   (4.2)

That is, the Poisson random variable X can be interpreted as the maximal number of iid exponential variables whose sum does not exceed 1.

Let {U_i} ~ U(0,1), iid. Rewriting (4.2), we see that

X = max{n : Σ_{j=1}^n -ln U_j ≤ λ}
  = max{n : ln(Π_{j=1}^n U_j) ≥ -λ}
  = max{n : Π_{j=1}^n U_j ≥ e^{-λ}}   (4.3)

has a Poi(λ) distribution. This leads to the following algorithm.

Algorithm 4.13 (Poi(λ) Generator)

1. Set n = 1 and a = 1.

2. Generate U_n ~ U(0,1) and set a = aU_n.

3. If a ≥ e^{-λ}, set n = n + 1 and go to Step 2.

4. Otherwise, return X = n - 1 as a random variable from Poi(λ).
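A direct MATLAB implementation of Algorithm 4.13 might look as follows (our sketch; λ = 4 is an arbitrary choice). It multiplies uniforms until the running product drops below e^{-λ}:

% Poi(lambda) via products of uniforms (Algorithm 4.13)
lambda = 4;
n = 1; a = rand;
while a >= exp(-lambda)
    n = n + 1; a = a*rand;
end
X = n - 1;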

As e^{-λ} is small for large λ, for large λ this algorithm becomes slow, and many random numbers U_j are required before Π_{j=1}^n U_j < e^{-λ} is satisfied. However, for large n we can skip a considerable number of steps in Algorithm 4.13 (say m = an steps for some a less than but close to 1) by drawing Σ_{j=1}^m -ln U_j directly from a Gamma(m, 1) distribution. This observation gives rise to the following recursive algorithm, due to Ahrens and Dieter [2], for generating from a Poisson distribution with large λ, say λ > 100.

Algorithm 4.14 (Poi(λ) Generator for Large λ)

1. Set m = ⌈(7/8)λ⌉.

2. Generate Y ~ Gamma(m, 1).

3. If Y ≤ λ, generate Z ~ Poi(λ - Y) and set X = m + Z; otherwise, generate X ~ Bin(m - 1, λ/Y).

Algorithm 4.14 relies on fast generation of Gamma(n, 1) and Bin(n, p) random variables for large n and p close to 1. Atkinson [3] proposes an acceptance-rejection algorithm for generating Poisson random variables using the Logistic(μ, σ) distribution, with σ = π/√(3λ) and μ = λσ, as the proposal distribution. Although the logistic distribution is continuous, it can be related to a discrete distribution via the floor function ⌊·⌋. The details of the algorithm are as follows.

Algorithm 4.15 (Poi(λ) Generator via Logistic(μ, σ))

1. Set σ = π/√(3λ), μ = λσ, c = 0.767 - 3.36/λ, and k = ln c - λ - ln σ.

2. Until X > -0.5, keep generating

X = (μ - ln((1 - U)/U))/σ,   U ~ U(0,1).

3. Let N = ⌊X + 0.5⌋ and V ~ U(0,1). If

μ - σX + ln(V/(1 + e^{μ-σX})²) ≤ k + N ln λ - ln(N!),

accept N as a random variable from Poi(λ); otherwise, repeat from Step 2.

In cases where a large number of independent Poisson random variables are desired, one can generate them in batches as in the following algorithm, based on Property 3 above.

Algorithm 4.16 (Generating Poisson Random Variables in Batches)

1. Draw N ~ Poi(λ).

2. Conditional on N, draw X = (X_1, ..., X_m) ~ Mnom(N, p). Output X, where X_i ~ Poi(λp_i), i = 1, ..., m, independently.

4.1.8 Uniform Distribution (Discrete Case)

The discrete uniform distribution has pdf

f(x; a, b) = 1/(b - a + 1),   x ∈ {a, ..., b},

where a, b ∈ ℤ, b ≥ a, are parameters. The discrete uniform distribution is used as a model for choosing a random element from {a, ..., b} such that each element is equally likely to be drawn. We denote this distribution by DU(a, b). The uniform distribution can also be defined on any finite set 𝒳, in which case

f(x; |𝒳|) = 1/|𝒳|,   x ∈ 𝒳,

where |𝒳| denotes the number of elements in 𝒳. We write the distribution as DU(𝒳) or simply U(𝒳).

As with its continuous counterpart, generating uniform draws from a discrete uniform distribution is central in Monte Carlo simulation. There are numerous specialized algorithms for doing so when the set 𝒳 is nontrivial. Such nontrivial sets include the set of all permutations of the integers {1, 2, ..., n}, the set of all (free) trees, and the set of k-regular bipartite graphs with m and n vertices for the two sets in the partition. See also Section 3.3.

Table 4.8 Moment properties of the DU(a, b) distribution.

Property                           Expression                               Condition

Expectation                        (a + b)/2

Variance                           (b - a)(b - a + 2)/12

Probability generating function    (z^a - z^{b+1})/((b - a + 1)(1 - z))     |z| ≤ 1

Drawing from a discrete uniform distribution on {a, ..., b}, where a and b are integers, is carried out via a simple table lookup method.

Algorithm 4.17 (DU(a, b) Generator)

Draw U ~ U(0,1) and output X = ⌊a + (b + 1 - a)U⌋.
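In MATLAB (a sketch; the endpoints a, b and sample size N are arbitrary example values):

% DU(a,b) via inverse transform (Algorithm 4.17)
a = -3; b = 7; N = 10^5;
X = floor(a + (b + 1 - a)*rand(1,N));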

4.2 CONTINUOUS DISTRIBUTIONS

We list various continuous distributions in alphabetical order. Recall that an absolutely continuous distribution is completely specified by its pdf.

4.2.1 Beta Distribution

The beta distribution has pdf

f(x; α, β) = x^{α-1}(1 - x)^{β-1}/B(α, β),   x ∈ [0, 1],

where α > 0 and β > 0 are called shape parameters and B is the beta function:

B(α, β) = Γ(α)Γ(β)/Γ(α + β).

We write the distribution as Beta(α, β). The dependence of the beta distribution on its shape parameters is illustrated in Figure 4.6.

Figure 4.6 Various pdfs of the beta distribution.

The Beta(1/2, 1/2) distribution is known as the arcsine distribution, which we will denote as Arcsine. A related distribution is the beta-prime or inverted beta distribution, obtained from the standard beta distribution via Y = X/(1 - X).

For a multivariate generalization of the beta distribution, see the Dirichlet dis-tribution in Section 4.3.1.

Table 4.9 Moment and tail properties of the Beta(α, β) distribution.

Property                      Expression                        Condition

Expectation                   α/(α + β)

Variance                      αβ/((α + β)²(α + β + 1))

Moment E X^k                  B(α + k, β)/B(α, β)

Characteristic function       ₁F₁(α; α + β; it)

Tail probability P(X > x)     I_{1-x}(β, α)                     x ∈ [0, 1]

Here, I_x(α, β) denotes the incomplete beta function and ₁F₁(α; γ; x) the confluent hypergeometric function. Other properties and relations are:

1. Uniform distribution: Beta(1, 1) = U(0, 1).

2. Arcsine distribution: If U ~ U(0,1), then

X = 1/2 + cos(πU)/2 = cos²(πU/2) ~ Beta(1/2, 1/2) = Arcsine.

3. Student's t distribution: If X ~ Beta(α, α), then

√(2α)(X - 1/2)/√(X(1 - X)) ~ t_{2α}.

In addition, if X ~ Beta(1/2, ν/2) and B ~ Ber(1/2) independently, then

(2B - 1)√(νX/(1 - X)) ~ t_ν.

4. Gamma distribution: Let X ~ Gamma(α, θ) be independent of Y ~ Gamma(β, θ). Then,

X/(X + Y) ~ Beta(α, β).

More generally, suppose X_k ~ Gamma(α_k, 1), k = 1, ..., n, independently. Then, the random variables

Y_k = (X_1 + ... + X_k)/(X_1 + ... + X_{k+1}),   k = 1, ..., n - 1,

and S_n = X_1 + ... + X_n are independent. Moreover,

Y_k ~ Beta(α_1 + ... + α_k, α_{k+1})   and   S_n ~ Gamma(α_1 + ... + α_n, 1).

5. Uniform distribution: Let U, V ~ U(0,1), iid. Then, conditional on U^{1/α} + V^{1/β} ≤ 1, we have

U^{1/α}/(U^{1/α} + V^{1/β}) ~ Beta(α, β)   and   U^{1/α} ~ Beta(α, β + 1).

6. Order statistics: Let X_1, ..., X_n ~ U(0,1), iid, and let X_(1) ≤ X_(2) ≤ ... ≤ X_(n) be the order statistics. Then,

X_(i) ~ Beta(i, n + 1 - i).

7. Moments: Let X_1, ..., X_n ~ Beta(α, β), iid. If γ > -min{1/n, α/(n - 1), β/(n - 1)}, then (see [10])

E Π_{1≤i<j≤n} |X_i - X_j|^{2γ} = Π_{j=0}^{n-1} [Γ(α + jγ)Γ(β + jγ)Γ(1 + (j + 1)γ)Γ(α + β)] / [Γ(α)Γ(β)Γ(α + β + (n + j - 1)γ)Γ(1 + γ)].


Depending on the values of the parameters of the beta distribution, we have the following algorithms, the last being the most generally applicable.

If α or β equals 1, the inverse-transform method yields:

Algorithm 4.18 (Beta(α, 1) Generator)

Draw U ~ U(0,1) and output X = U^{1/α}.

Algorithm 4.19 (Beta(1, β) Generator)

Draw U ~ U(0,1) and output X = 1 - U^{1/β}.

From Property 2 above we have:

Algorithm 4.20 (Beta(1/2, 1/2) = Arcsine Generator)

Draw U ~ U(0,1) and output X = cos²(πU/2).

For generating from a symmetric beta distribution X ~ Beta(α, α), with α > 1/2, the polar method (see Section 3.1.2.7) can be employed. In particular, using f_R(r) = 2cr(1 - r²)^{c-1} with c = α - 1/2, we find that the pdf of X = R cos Θ is that of 2B - 1 with B ~ Beta(α, α). Since R can be written as √(1 - U^{1/c}), with U ~ U(0,1), we obtain the following algorithm.

Algorithm 4.21 (Beta(α, α) Generator (I), α > 1/2)

Draw U_1, U_2 ~ U(0,1), iid, and output B = ½(1 + √(1 - U_1^{1/c}) cos(2πU_2)), where c = α - 1/2.

The evaluation of the cosine function can be avoided by a rejection step as in the following algorithm.

Algorithm 4.22 (Beta(α, α) Generator (II), α > 1/2)

1. Draw U ~ U(0,1) and V ~ U(-1, 1), independently. Set S = U² + V².

2. If S > 1, repeat from Step 1; otherwise, output

X = ½(1 + (V/√S)√(1 - S^{1/c})), where c = α - 1/2.

Another generation method for symmetric beta distributions, which holds for general α > 0, follows from Property 3 above:

Algorithm 4.23 (Beta(α, α) Generator)

Draw T ~ t_{2α} and return X = ½(1 + T/√(2α + T²)).

An alternative technique for generating from the symmetric Beta(α, α) distribution that uses approximate inversion can be found in [22]. Unlike Algorithms 4.21 and 4.22 it applies when α ≤ 1/2, and can be more efficient for small α.

For integer α = m and β = n, Property 6 yields the following algorithm.

Algorithm 4.24 (Beta(α, β) Generator with Integer α = m and β = n)

1. Generate U_1, ..., U_{m+n-1} ~ U(0,1), iid.

2. Return the m-th order statistic U_(m) as a random variable from Beta(m, n).

It can be shown that the total number of comparisons needed to find U_(m) is (m/2)(m + 2n - 1), so that this procedure loses efficiency for large m and n.

Property 4, however, yields the most generally applicable algorithm.

Algorithm 4.25 (Beta(α, β) Generator)

1. Generate independently Y_1 ~ Gamma(α, 1) and Y_2 ~ Gamma(β, 1).

2. Return X = Y_1/(Y_1 + Y_2) as a random variable from Beta(α, β).
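A MATLAB sketch of Algorithm 4.25 using the Statistics Toolbox gamma generator (the parameter values are our own illustration):

% Beta(alpha,beta) via two gamma variables (Algorithm 4.25)
alpha = 2.5; beta = 0.7; N = 10^5;
Y1 = gamrnd(alpha, 1, 1, N);
Y2 = gamrnd(beta, 1, 1, N);
X = Y1./(Y1 + Y2);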

4.2.2 Cauchy Distribution

The Cauchy distribution has pdf

f(x) = 1/(π(1 + x²)),   x ∈ ℝ.

A location-scale version of this distribution is given by the pdf f(x; μ, σ) = f((x - μ)/σ)/σ; we write Cauchy(μ, σ) for the corresponding distribution. The graph of the pdf of the Cauchy(0,1) distribution is given in Figure 4.7.


Figure 4.7 The pdf of the standard Cauchy distribution.

Table 4.10 Moment and tail properties of the Cauchy(0,1) distribution.

Property                      Expression

Expectation                   undefined (∞ - ∞)

Characteristic function       e^{-|t|}

Tail probability P(X > x)     1/2 - arctan(x)/π

Other properties and relations are:

1. Student's t distribution: Cauchy(0,1) ≡ t_1.

2. Normal distribution: Let Y_1, Y_2 ~ N(0,1), iid. Then,

Y_1/Y_2 ~ Cauchy(0,1).

In addition,

(Y_1 + Y_2)/(Y_1 - Y_2) ~ Cauchy(0,1).

3. Reciprocal: If X ~ Cauchy(μ, σ), then 1/X ~ Cauchy(μ/(μ² + σ²), σ/(μ² + σ²)).

4. Sum: Let X_i ~ Cauchy(μ_i, σ_i), i = 1, 2, ..., n, be independent. Then,

Σ_{i=1}^n X_i ~ Cauchy(Σ_{i=1}^n μ_i, Σ_{i=1}^n σ_i).

Since Cauchy(μ, σ) forms a location-scale family, we only consider generation from Cauchy(0,1). The following algorithm is a direct consequence of the inverse-transform method and the fact that cot(πx) = -tan(πx - π/2).

Algorithm 4.26 (Cauchy(0,1) Generator)

Draw U ~ U(0,1) and output X = cot(πU) (or X = tan(πU - π/2)).

The ratio of uniforms method yields the following algorithm.

Algorithm 4.27 (Cauchy(0,1) Generator via Ratio of Uniforms)

1. Generate U, V ~ U(0,1), iid. Set V = V - 1/2.

2. If U² + V² ≤ 1, set X = V/U and return X. Otherwise, repeat from Step 1.

Finally, Property 2 above leads to the following algorithm.

Algorithm 4.28 (Cauchy(0,1) Generator via Ratio of Normals)

1. Generate Y_1, Y_2 ~ N(0,1), iid.

2. Return X = Y_1/Y_2.
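Each of Algorithms 4.26 and 4.28 is a one-liner in MATLAB; a sketch (sample size N is our choice):

% Cauchy(0,1) via inverse transform and via a ratio of normals
N = 10^5;
X1 = tan(pi*(rand(1,N) - 1/2));   % Algorithm 4.26
X2 = randn(1,N)./randn(1,N);      % Algorithm 4.28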

1 0 8 PROBABILITY DISTRIBUTIONS

4.2.3 Exponential Distribution

The exponential distribution has pdf

f(x; λ) = λe^{-λx},   x ≥ 0,

where λ > 0 is the rate parameter. We write the distribution as Exp(λ). The exponential distribution can be viewed as a continuous version of the geometric distribution. It plays a central role in the theory and application of Markov jump processes, and in stochastic modeling in general, due to its memoryless property (see Property 1 below). Graphs of the pdf for various values of λ are given in Figure 4.8.

Figure 4.8 Pdfs of the Exp(λ) distribution for various values of λ.

Table 4.11 Moment and tail properties of the Exp(λ) distribution.

Property                      Expression        Condition

Expectation                   1/λ

Variance                      1/λ²

Moment generating function    λ/(λ - t)         t < λ

Tail probability P(X > x)     e^{-λx}           x ≥ 0

Other properties and relations are:

1. Memoryless property: If X ~ Exp(λ), then

P(X > s + t | X > s) = P(X > t),   s, t ≥ 0.

2. Sum: If X_1, ..., X_n ~ Exp(λ), iid, then Σ_{k=1}^n X_k ~ Gamma(n, λ).


3. Minimum: Let X_i ~ Exp(λ_i), i = 1, ..., n, independently. Then, M = min{X_1, ..., X_n} ~ Exp(λ_1 + ... + λ_n). In addition, P(M = X_i) = λ_i/(λ_1 + ... + λ_n).

4. Poisson and geometric distributions: Let μ > 0 be an arbitrary constant and Z ≥ 1 be a truncated Poi(μ) random variable with pdf P(Z = z) = μ^z/(z!(e^μ - 1)). Let G ~ Geom0(1 - e^{-μ}) and U_1, U_2, ... ~ U(0,1), iid. Then,

X = μ(G + min{U_1, ..., U_Z}) ~ Exp(1).

Noting that U ~ U(0,1) implies 1 — U ~ U(0,1), we obtain the following inverse-transform algorithm.

Algorithm 4.29 (Exp(λ) Generator)

Draw U ~ U(0,1) and output X = -(1/λ) ln U.
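In MATLAB (a sketch, with λ = 2 as an arbitrary rate):

% Exp(lambda) via inverse transform (Algorithm 4.29)
lambda = 2; N = 10^5;
X = -log(rand(1,N))/lambda;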

There are many alternative procedures for generating variables from the exponential distribution. The interested reader is referred to [8]. Note that Exp(λ) forms a scale family. Hence, it suffices to specify how to generate Y ~ Exp(1) and then return Y/λ as a random variable from Exp(λ).

4.2.4 F Distribution

The F or Fisher-Snedecor distribution has pdf

f(x; m, n) = [Γ((m + n)/2) (m/n)^{m/2} x^{(m-2)/2}] / [Γ(m/2) Γ(n/2) (1 + (m/n)x)^{(m+n)/2}],   x ≥ 0,

where m and n are positive integer parameters known as the degrees of freedom. We write this distribution as F(m, n). Graphs of the pdf for various values of the parameters are given in Figure 4.9. The F distribution arises frequently in statistics in the context of hypothesis testing and analysis of variance; see also Section B.1.4.

Figure 4.9 Pdfs of the F distribution for various values of the parameters.

Table 4.12 Moment and tail properties of the F(m, n) distribution.

Property                      Expression                                Condition

Expectation                   n/(n - 2)                                 n > 2

Variance                      2n²(m + n - 2)/(m(n - 2)²(n - 4))         n > 4

Tail probability P(X > x)     I_{n/(n+mx)}(n/2, m/2)                    x ≥ 0

Here, I_x(α, β) denotes the incomplete beta function. Other properties and relations are:

1. Expectation and variance: In addition to the cases listed above, for n = 1, 2 the expectation is ∞ and the variance does not exist. For n = 3, 4 the variance is ∞.

2. Reciprocal: If X ~ F(m, n), then 1/X ~ F(n, m).

3. Chi-square distribution: Let X ~ χ²_m and Y ~ χ²_n be independent. Then,

(X/m)/(Y/n) ~ F(m, n).

4. Exponential distribution: If X_1, X_2 ~ Exp(1), iid, then Z = X_1/X_2 ~ F(2, 2). That is, f_Z(z) = 1/(1 + z)², z ≥ 0.

A simple generator is based on the defining Property 3 above.

Algorithm 4.30 (F(m, n) Generator)

1. Draw X ~ χ²_m and Y ~ χ²_n independently.

2. Output (X/m)/(Y/n) as a random variable from F(m, n).

Let X and Y be as in Algorithm 4.30. Using the fact that χ²_d = Gamma(d/2, 1/2) and Property 4 of the beta distribution on Page 104, we can see that B = X/(X + Y) ~ Beta(m/2, n/2), which leads to the following algorithm.

Algorithm 4.31 (F(m, n) Generator via Beta Distribution)

1. Draw B ~ Beta(m/2, n/2).

2. Output

X = nB/(m(1 - B)).
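A MATLAB sketch of Algorithm 4.31, using the Statistics Toolbox function betarnd (the degrees of freedom are example values of our own):

% F(m,n) via a Beta(m/2,n/2) variable (Algorithm 4.31)
m = 4; n = 10; N = 10^5;
B = betarnd(m/2, n/2, 1, N);
X = n*B./(m*(1 - B));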


4.2.5 Fréchet Distribution

The Fréchet or type II extreme value distribution has pdf

f(x; α) = αx^{-α-1} e^{-x^{-α}},   x > 0,

where α > 0 is the shape parameter. Graphs of the pdf for various values of α are given in Figure 4.10. A location-scale version of this distribution is given by the pdf f(x; α, μ, σ) = f((x - μ)/σ; α)/σ, x ≥ μ; we write this distribution as Fréchet(α, μ, σ).

Figure 4.10 Pdfs of the standard Fréchet distribution for various values of the shape parameter α.

The Fréchet distribution is one of three possible limiting distributions for the maximum of iid random variables. Similarly, the "reflected" distribution, with pdf f(-x), x < 0, is one of the three possible limiting distributions for the minimum of iid random variables.

Table 4.13 Moment and tail properties of the Fréchet(α, 0, 1) distribution.

Property                      Expression                             Condition

Expectation                   Γ(1 - α^{-1})                          α > 1

Variance                      Γ(1 - 2α^{-1}) - Γ(1 - α^{-1})²        α > 2

Tail probability P(X > x)     1 - exp(-x^{-α})                       x > 0

Since Fréchet(α, μ, σ) forms a location-scale family, it suffices to describe how to generate from the standard Fréchet distribution. This can be done directly via the inverse-transform method.

Algorithm 4.32 (Fréchet(α, 0, 1) Generator)

Draw U ~ U(0,1) and output X = (-ln U)^{-1/α}.
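A MATLAB sketch of Algorithm 4.32; the location-scale transform μ + σX then gives Fréchet(α, μ, σ) (the parameter values are our own):

% Frechet(alpha,mu,sigma) via inverse transform (Algorithm 4.32)
alpha = 2; mu = 0; sigma = 1; N = 10^5;
X = mu + sigma*(-log(rand(1,N))).^(-1/alpha);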

4.2.6 Gamma Distribution

The gamma distribution has pdf

f(x; α, λ) = λ^α x^{α-1} e^{-λx}/Γ(α),   x ≥ 0,   (4.4)

where α > 0 is called the shape parameter and λ > 0 the scale parameter. In the formula for the pdf, Γ is the gamma function (see Section D.9.5). We write the distribution as Gamma(α, λ).

An important special case is the Gamma(n/2, 1/2) distribution with n ∈ {1, 2, ...}, which is called a chi-square distribution; the parameter n is then referred to as the number of degrees of freedom. The distribution is written as χ²_n. A graph of the pdf of the χ²_n distribution, for various n, is given in Figure 4.11.

Figure 4.11 Pdfs of the χ²_n distribution for various degrees of freedom n.

Another well-known special case is the Gamma(n, λ) distribution, where n is a strictly positive integer. In this case, the distribution is known as an Erlang distribution with shape parameter n and scale parameter λ. It is denoted by Erl(n, λ).

Table 4.14 Moment and tail properties of the Gamma(α, λ) distribution.

Property                      Expression          Condition

Expectation                   α/λ

Variance                      α/λ²

Moment generating function    (λ/(λ - t))^α       t < λ

Tail probability P(X > x)     1 - P(α, λx)

Here, P(a, x) is the incomplete gamma function. Other properties and relations are:

1. Exponential distribution: Gamma(1, λ) = Exp(λ).

2. Sums: If X_1, ..., X_n ~ Gamma(α, λ), iid, then

Σ_{k=1}^n X_k ~ Gamma(nα, λ).

3. Beta distribution (I): Let Z ~ Exp(1) and Y ~ Beta(α, 1 - α) for 0 < α < 1 be independent. Then,

YZ ~ Gamma(α, 1).

4. Beta distribution (II): Let Y_k ~ Beta(α_1 + ... + α_k, α_{k+1}), α_k > 0, k = 1, ..., n - 1, independently. Let Y_n ~ Gamma(α_1 + ... + α_n, 1) be independent of the {Y_k}. Then, the random variables

X_k = (1 - Y_{k-1}) Π_{j=k}^n Y_j,   k = 1, ..., n,

where Y_0 = 0, are independent and X_k ~ Gamma(α_k, 1), k = 1, ..., n.

5. F distribution: Let U ~ χ²_m and V ~ χ²_n be independent. Then,

(U/m)/(V/n) ~ F(m, n).

6. Dirichlet distribution: Let X_1, ..., X_{n+1} be independent random variables with X_k ~ Gamma(α_k, λ), k = 1, ..., n + 1. Then, with α = (α_1, ..., α_{n+1}) and X = (X_1, ..., X_n)^T,

X / Σ_{k=1}^{n+1} X_k ~ Dirichlet(α).

Since Gamma(α, λ) is a scale family, it suffices to only give algorithms for generating random variables X ~ Gamma(α, 1), because X/λ ~ Gamma(α, λ). Since the cdf of the gamma distribution does not generally exist in explicit form, the inverse-transform method cannot always be applied to generate random variables from this distribution. Thus, alternative methods are called for. The following algorithm, by Marsaglia and Tsang [25], provides a highly efficient acceptance-rejection method for generating Gamma(α, 1) random variables with α ≥ 1; see also [28, Pages 60-61] for a detailed explanation.

Algorithm 4.33 (Gamma(α, 1) Generator for α ≥ 1)

1. Set d = α - 1/3 and c = 1/√(9d).

2. Generate Z ~ N(0,1) and U ~ U(0,1) independently.

3. If Z > -1/c and ln U < ½Z² + d - dV + d ln V, where V = (1 + cZ)³, return X = dV; otherwise, go back to Step 2.
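A MATLAB sketch of Algorithm 4.33 (our own implementation, assuming α ≥ 1 as the algorithm requires):

function x = gamma_mt(alpha)
% Gamma(alpha,1) via Marsaglia-Tsang acceptance-rejection (Algorithm 4.33)
d = alpha - 1/3; c = 1/sqrt(9*d);
while true
    Z = randn; U = rand;
    V = (1 + c*Z)^3;
    if Z > -1/c && log(U) < 0.5*Z^2 + d - d*V + d*log(V)
        x = d*V; return
    end
end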

Ahrens and Dieter [1] propose the following acceptance-rejection procedure that does not rely on a normal generator for α > 1.

Algorithm 4.34 (Gamma(α, 1) Generator for α > 1)

1. Set b = α - 1 and c = α + b.

2. Generate U ~ U(0,1). Set Y = √c tan(π(U - 1/2)) and X = b + Y.

3. If X < 0, repeat from Step 2; otherwise, proceed to the next step.

4. Generate V ~ U(0,1). If

V > exp(b ln(X/b) - Y + ln(1 + Y²/c)),

repeat from Step 2; otherwise, output X.

Ahrens and Dieter [1] show that the expected number of trials before a sample is generated from Gamma(α, 1) decreases from π (when α is close to 1) to √π (when α → ∞).

Cheng and Feast [7] propose a slightly longer algorithm based on the acceptance-rejection method with proposal given by the Burr distribution, with pdf

f(x; μ, λ) = λμx^{λ-1}/(μ + x^λ)²,   x > 0, μ > 0, λ > 0.

The algorithm does not involve evaluation of the tan(·) function.

Algorithm 4.35 (Gamma(α, 1) Generator for α > 1)

1. Set c_1 = α - 1, c_2 = (α - 1/(6α))/c_1, c_3 = 2/c_1, c_4 = 1 + c_3, and c_5 = α^{-1/2}.

2. Generate U, U_2 ~ U(0,1), independently. If α > 2.5, set

U_1 = U_2 + c_5(1 - 1.86U);

otherwise, set U_1 = U.

3. If 0 < U_1 < 1, set W = c_2 U_2/U_1 and proceed to Step 4; otherwise, repeat from Step 2.

4. If

c_3 U_1 + W + W^{-1} ≤ c_4   or   c_3 ln U_1 - ln W + W ≤ 1,

output X = c_1 W; otherwise, repeat from Step 2.

For the case α < 1 one can use the fact that if X ~ Gamma(1 + α, 1) and U ~ U(0,1) are independent, then XU^{1/α} ~ Gamma(α, 1). Alternatively, one can use the following algorithm based on Property 3.

Algorithm 4.36 (Gamma(α, 1) Generator for α < 1)

1. Generate Y ~ Beta(α, 1 - α) and Z ~ Exp(1) independently.

2. Output X = YZ as a random variable from Gamma(α, 1).

If a fast beta generator is not available, then one could use the following acceptance-rejection algorithm by Best [5], based on [1].

Algorithm 4.37 (Gamma(α, 1) Generator for α < 1)

1. Set d = 0.07 + 0.75√(1 - α) and b = 1 + e^{-d}α/d.

2. Generate U_1, U_2 ~ U(0,1), iid, and set V = bU_1.

3. If V ≤ 1, then set X = dV^{1/α}. Check whether U_2 ≤ (2 - X)/(2 + X). If true, return X; otherwise, check whether U_2 ≤ e^{-X}. If true, return X; otherwise, go back to Step 2.

If V > 1, then set X = -ln(d(b - V)/α) and Y = X/d. Check whether U_2(α + Y(1 - α)) ≤ 1. If true, return X; otherwise, check if U_2 ≤ Y^{α-1}. If true, return X; otherwise, go back to Step 2.

■ EXAMPLE 4.5 (Best's Gamma(α, 1) Generator for α < 1)

The following MATLAB code is an implementation of Algorithm 4.37.

%gamma_best.m
N = 10^5; alpha = 0.3;
d = 0.07 + 0.75*sqrt(1-alpha);
b = 1 + exp(-d)*alpha/d;
x = zeros(N,1);
for i = 1:N
    cont = true;
    while cont
        U1 = rand; U2 = rand;
        V = b*U1;
        if V <= 1
            X = d*V^(1/alpha);
            if U2 <= (2-X)/(2+X)
                cont = false; break;
            else
                if U2 <= exp(-X)
                    cont = false; break;
                end
            end
        else
            X = -log(d*(b-V)/alpha);
            y = X/d;
            if U2*(alpha + y*(1-alpha)) < 1
                cont = false; break;
            else
                if U2 <= y^(alpha - 1)
                    cont = false; break;
                end
            end
        end
    end
    x(i) = X;
end

A random variable X ~ Gamma(n, 1) for integer n can be viewed as the sum of n independent exponential random variables (see Property 2), resulting in the following algorithm.

Algorithm 4.38 (Gamma(n, 1) Generator with Integer n)

1. Generate U_1, ..., U_n ~ U(0,1), iid.

2. Return X = -ln Π_{k=1}^n U_k.

A random variable X ~ χ²_1 = Gamma(1/2, 1/2) can simply be generated as the square of a standard normal random variable (Property 3 on Page 123).

Algorithm 4.39 (Gamma(1/2, 1/2) Generator)

Draw Z ~ N(0,1) and output X = Z².

4.2.7 Gumbel Distribution

The Gumbel distribution or type I extreme value distribution has pdf

f(x) = e^{-x-e^{-x}},   x ∈ ℝ.

A location-scale version of this distribution is given by the pdf f(x; μ, σ) = f((x - μ)/σ)/σ; we write Gumbel(μ, σ) for the corresponding distribution. A graph of the standard Gumbel pdf is given in Figure 4.12.


Figure 4.12 The pdf of the standard Gumbel distribution.

The Gumbel distribution is one of three possible limiting distributions for the maximum of iid random variables. For example, the maximum of iid standard normal random variables, when properly scaled, converges in distribution to the Gumbel distribution [16, Page 275]. Similarly, the Gumbel distribution reflected around 0, that is, with pdf f(-x), is one of the three possible limiting distributions for the minimum of iid random variables. Note also that if X ~ Exp(1), then -ln X ~ Gumbel(0, 1).

Table 4.15 Moment and tail properties of the Gumbel(0,1) distribution.

Property                      Expression                  Condition

Expectation                   -Γ'(1) = 0.577216...

Variance                      π²/6

Moment generating function    Γ(1 - t)                    t < 1

Tail probability P(X > x)     1 - e^{-e^{-x}}

As the class of Gumbel(μ, σ) forms a location-scale family, it suffices to only consider generating from the Gumbel(0,1) distribution. The following algorithm is a direct consequence of the inverse-transform method, as the cdf of the Gumbel(0,1) distribution is F(x) = exp(-exp(-x)).

Algorithm 4.40 (Gumbel(0,1) Generator)

Draw U ~ U(0,1) and output X = -ln(-ln U).
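In MATLAB (a sketch; the sample size is arbitrary):

% Gumbel(0,1) via inverse transform (Algorithm 4.40)
N = 10^5;
X = -log(-log(rand(1,N)));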


118 PROBABILITY DISTRIBUTIONS

4.2.8 Laplace Distribution

The Laplace distribution, also called the double-exponential distribution, is defined via the pdf

f(x) = ½e^{-|x|},   x ∈ ℝ.

A location-scale version of this distribution is given by the pdf f(x; μ, σ) = f((x - μ)/σ)/σ; we write Laplace(μ, σ) for the corresponding distribution. The pdf of the standard Laplace distribution is given in Figure 4.13.


Figure 4.13 The pdf of the standard Laplace distribution.

Table 4.16 Moment and tail properties of the Laplace(0,1) distribution.

Property                      Expression                            Condition

Expectation                   0

Variance                      2

Moment generating function    1/(1 - t²)                            |t| < 1

Tail probability P(X > x)     ½(sgn(x)(e^{-|x|} - 1) + 1)

Since Laplace(μ, σ) forms a location-scale family, we only consider generation from Laplace(0,1). The latter is obtained by assigning a random sign to an Exp(1) random variable, leading to the following algorithm.

Algorithm 4.41 (Laplace(0,1) Generator (I))

1. Generate Y ~ Exp(1) and B ~ Ber(1/2) independently.

2. Output X = (2B - 1)Y.

The above method requires the generation of two independent random variables (one exponential and one Bernoulli random variable). However, as the next algorithm illustrates, it is sufficient to use only one random variable.

Algorithm 4.42 (Laplace(0,1) Generator (II))

1. Generate U ~ U(-1/2, 1/2).

2. Output X = sgn(U) ln(1 - 2|U|).
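A MATLAB sketch of Algorithm 4.42 (sample size our choice):

% Laplace(0,1) from a single uniform (Algorithm 4.42)
N = 10^5;
U = rand(1,N) - 1/2;              % U ~ U(-1/2,1/2)
X = sign(U).*log(1 - 2*abs(U));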

An alternative is based on the fact that if V and W are independent Exp(1) random variables, then V - W ~ Laplace(0,1).

Algorithm 4.43 (Laplace(0,1) Generator (III))

1. Generate V, W ~ Exp(1), iid.

2. Output X = V - W.

Example 3.12 provides yet another generation method.

Algorithm 4.44 (Laplace(0,1) Generator (IV))

1. Generate E ~ Exp(1) and Y ~ N(0,1) independently.

2. Output X = Y√(2E).


4.2.9 Logistic Distribution

The logistic distribution has pdf

f(x) = e^{-x}/(1 + e^{-x})²,   x ∈ ℝ.

A location-scale version of this distribution is given by the pdf f(x; μ, σ) = f((x - μ)/σ)/σ. We write this distribution as Logistic(μ, σ). The distribution derives its name from the fact that the cdf F(x) = 1/(1 + e^{-x}) satisfies the logistic differential equation f(x) = F'(x) = F(x)(1 - F(x)), so that x = ln[F(x)/(1 - F(x))]. The graph of the pdf of the standard logistic distribution is given in Figure 4.14. Note that the pdf is symmetric around 0.

Figure 4.14 The pdf of the standard logistic distribution.

Table 4.17 Moment and tail properties of the Logistic(0,1) distribution.

Property                      Expression                            Condition

Expectation                   0

Variance                      π²/3

Moment generating function    Γ(1 - t)Γ(1 + t) = πt/sin(πt)         |t| < 1

Tail probability P(X > x)     1/(1 + e^x)

Other properties and relations are:

1. Uniform: If U ~ U(0,1), then

ln(U/(1 - U)) ~ Logistic(0, 1).

2. Exponential quotient: Let X, Y ~ Exp(1), iid. Then,

ln(X/Y) ~ Logistic(0, 1).

Since Logistic(μ, σ) forms a location-scale family, we only consider generation from Logistic(0,1). The following algorithm follows directly from the inverse-transform method.

Algorithm 4.45 (Logistic(0,1) Generator)

Generate U ~ U(0,1) and output X = ln(U/(1 - U)).
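In MATLAB (a sketch):

% Logistic(0,1) via inverse transform (Algorithm 4.45)
N = 10^5; U = rand(1,N);
X = log(U./(1 - U));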

4.2.10 Log-Normal Distribution

The log-normal distribution with scale parameter σ > 0 and location parameter μ ∈ ℝ is defined via the pdf

f(x; μ, σ) = (1/(xσ√(2π))) exp(-(ln x - μ)²/(2σ²)),   x > 0.

We write LogN(μ, σ²) for the distribution. The characterizing property of the distribution is that if X ~ LogN(μ, σ²), then ln X ~ N(μ, σ²). As a consequence, it arises as the scaled limit of products of iid random variables, in the same way that the normal distribution arises for sums via the central limit theorem. Note that the log-normal distribution is not determined uniquely by its moments; see [13]. The dependence of the log-normal distribution on the scale parameter for μ = 0 is illustrated in Figure 4.15.


Figure 4.15 Pdfs of the log-normal distribution for μ = 0 and various values of the scale parameter σ.

Table 4.18 Moment and location properties of the LogN(μ, σ²) distribution.

Property          Expression

Expectation       e^{μ+σ²/2}

Variance          e^{2μ+σ²}(e^{σ²} - 1)

Moment E X^k      e^{kμ+k²σ²/2}

Mode              e^{μ-σ²}

Median            e^{μ}

Other properties and relations are:

1. Normal distribution: If X ~ N(μ, σ²), then e^X ~ LogN(μ, σ²).

2. Power transformation: If Y ~ LogN(μ, σ²), then e^a Y^b ~ LogN(a + bμ, (bσ)²).

3. Product: If X_i ~ LogN(μ_i, σ_i²) independently, i = 1, ..., n, then

Π_{i=1}^n X_i ~ LogN(Σ_{i=1}^n μ_i, Σ_{i=1}^n σ_i²).

4. Reciprocal: If X ~ LogN(μ, σ²), then 1/X ~ LogN(-μ, σ²).

The generation of a LogN(μ, σ²) distributed random variable follows immediately from its relation to the normal distribution as given in Property 1 above.

Algorithm 4.46 (LogN(μ, σ²) Generator)

1. Generate Y ~ N(μ, σ²).

2. Output X = e^Y.


4.2.11 Normal Distribution

The standard normal or standard Gaussian distribution has pdf

f(x) = (1/√(2π)) e^{-x²/2},   x ∈ ℝ.

The corresponding location-scale family of pdfs is therefore

f(x; μ, σ²) = (1/(σ√(2π))) e^{-(x-μ)²/(2σ²)},   x ∈ ℝ.   (4.5)

We write the distribution as N(μ, σ²). We denote the pdf and cdf of the N(0,1) distribution by φ and Φ, respectively. Here, Φ(x) = ∫_{-∞}^x φ(t) dt = ½ + ½erf(x/√2), where erf(x) is the error function.

The normal distribution plays a central role in statistics and arises naturally as the limit of the sum of iid random variables via the central limit theorem (see Section A.8.3). Its crucial property is that any affine combination of independent normal random variables is again normal. In Figure 4.16 the probability densities for three different normal distributions are depicted. More information on the Gaussian distribution, especially regarding the multidimensional case, can be found in Section 4.3.3.

Figure 4.16 Pdfs of the normal distribution for various values of the parameters.

Table 4.19 Moment and tail properties of the N(μ, σ²) distribution.

Property                      Expression

Expectation                   μ

Variance                      σ²

Moment generating function    e^{tμ+t²σ²/2}

Tail probability P(X > x)     ½ - ½erf((x - μ)/(σ√2))


Other properties and relations are:

1. Affine combinations: Let X_1, X_2, ..., X_n be independent with X_i ~ N(μ_i, σ_i²), i = 1, ..., n. Then,

a + Σ_{i=1}^n b_i X_i ~ N(a + Σ_{i=1}^n b_i μ_i, Σ_{i=1}^n b_i²σ_i²).

2. Central limit theorem: Let X_1, ..., X_n be independent and identically distributed with expectation μ and variance σ² < ∞. Let S_n = X_1 + ... + X_n. Then,

(S_n - nμ)/(σ√n) → N(0, 1) in distribution as n → ∞.

3. Square: If X ~ N(0,1), then X² ~ Gamma(1/2, 1/2).

4. Quotient: Let X_1, X_2 ~ N(0,1), iid. Then, X_1/X_2 ~ Cauchy(0,1).

5. Exponential: If X ~ N(μ, σ²), then e^X ~ LogN(μ, σ²).

6. Stable distribution: N(0, 2) ≡ Stable(2, β). In addition, if X ~ N(0,1), then X^{-2} ~ Stable(1/2, 1) ≡ Lévy.

7. Fractional moments: If X ~ N(0,1), then

E|X|^a = Γ((a + 1)/2) 2^{a/2}/√π,   a > 0.

In addition, if X_1, ..., X_n ~ N(0,1), iid, then (see [10])

E Π_{1≤i<j≤n} |X_i - X_j|^{2γ} = Π_{j=1}^n Γ(1 + jγ)/Γ(1 + γ),

and for c ≥ 0,

E[Π_{i=1}^n |X_i|^{2c} Π_{1≤i<j≤n} |X_i² - X_j²|^{2a}] = Π_{j=0}^{n-1} [Γ(1 + 2c + 2ja)Γ(1 + (j + 1)a)] / [2^{c+ja} Γ(1 + c + ja)Γ(1 + a)].

Since N(μ, σ²) forms a location-scale family, we only consider generation from N(0,1). The most prominent application of the polar method (see Section 3.1.2.7) lies in the generation of standard normal random variables, leading to the celebrated Box-Muller method.

Algorithm 4.47 (N(0,1) Generator, Box-Muller Approach)

1. Generate U_1, U_2 ~ U(0,1), iid.

2. Return two independent standard normal variables, X and Y, via

X = √(-2 ln U_1) cos(2πU_2),
Y = √(-2 ln U_1) sin(2πU_2).   (4.6)
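A vectorized MATLAB sketch of the Box-Muller method (sample size our choice):

% N(0,1) pairs via Box-Muller (Algorithm 4.47)
N = 10^5;
U1 = rand(1,N); U2 = rand(1,N);
R = sqrt(-2*log(U1));
X = R.*cos(2*pi*U2); Y = R.*sin(2*pi*U2);  % two independent N(0,1) samples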

Marsaglia [23] (see also [24]) developed an alternative generation method for N(0,1) random variables which is also based on the polar method, but which avoids the evaluation of the relatively costly cos and sin functions. The method uses a simple acceptance-rejection approach to generate random vectors within the unit circle: draw V_1, V_2 ~ U(-1,1), iid, and accept (V_1, V_2)/√S if S = V_1² + V_2² ≤ 1. As in the Box-Muller method, a pair of independent N(0,1)-distributed random variables is obtained by multiplying the coordinates of the generated point with the radius R = √(-2 ln U), where U ~ U(0,1). This leads to the following algorithm.

Algorithm 4.48 (N(0,1) Generator, Polar/Acceptance-Rejection)

1. Generate U_1, U_2 ~ U(0,1), iid. Set V_i = 2U_i - 1, i = 1, 2, and S = V_1² + V_2².

2. If S ≤ 1, draw U ~ U(0,1) and return

X = V_1 √(-2 ln U / S),
Y = V_2 √(-2 ln U / S).

Otherwise, go back to Step 1.

The following method [8, Pages 197-199] is based on the ratio of uniforms method as illustrated in Example 3.18 and uses a squeeze function to further speed up the calculation.

Algorithm 4.49 (N(0,1) Generator, Ratio of Uniforms with Squeezing)

1. Set a_+ = √(2/e) and a_- = -√(2/e).

2. Generate U ~ U(0,1) and V ~ U(a_-, a_+) independently, and set X = V/U.

3. If X² ≤ 6 - 8U + 2U², return X.

4. Else if X² > 2/U - 2U, repeat from Step 2.

5. Else if X² ≤ -4 ln U, return X. Otherwise, restart from Step 2.

Finally, the following algorithm uses acceptance-rejection with an exponential proposal distribution. This gives a probability of acceptance of √(π/(2e)) ≈ 0.76. The theory behind it is given in Example 3.15.

Algorithm 4.50 (N(0,1) Generator, Acceptance-Rejection from Exp(1))

1. Generate X ~ Exp(1) and U' ~ U(0,1), independently.

2. If U' ≤ e^{-(X-1)²/2}, generate U ~ U(0,1) and output Z = (1 - 2·1{U ≤ 1/2})X; otherwise, repeat from Step 1.


4.2.12 Pareto Distribution

The Pareto (type II) distribution, sometimes known as the Lomax distribution [20], with scale parameter λ > 0 and shape parameter α > 0 is defined via the pdf

f(x; α, λ) = αλ(1 + λx)^{-(α+1)},   x ≥ 0.

We write the distribution as Pareto(α, λ). The effect of the parameter α on the shape of the distribution is illustrated in Figure 4.17.

Figure 4.17 Pdfs of the Pareto distribution for two values of the shape parameter α and fixed scale parameter λ = 1.

Remark 4.2.1 (Alternative Definition) Also commonly used is the Pareto (type I) distribution, which has pdf

f(x; α, x_0) = (α/x_0)(x/x_0)^{-(α+1)},   x ≥ x_0,

with shape parameter α > 0 as before, and scale parameter x_0 > 0. We will denote this distribution by Pareto1(α, x_0). It is related to the type II distribution for any λ > 0 via X ~ Pareto(α, λ) ⟺ (X + λ^{-1}) ~ Pareto1(α, λ^{-1}).

Table 4.20 Moment and tail properties of the Pareto(α, λ) distribution.

Property                      Expression                          Condition

Expectation                   1/(λ(α - 1))                        α > 1

Variance                      α/(λ²(α - 1)²(α - 2))               α > 2

Moment E X^k                  k! Γ(α - k)/(λ^k Γ(α))              α > k

Tail probability P(X > x)     (1 + λx)^{-α}                       x ≥ 0


Other properties and relations are:

1. Expectation and variance: In addition to the cases listed above, for 0 < α ≤ 1 the expectation is ∞ and the variance does not exist. For 1 < α ≤ 2 the variance is ∞.

2. Exponential: If Y ~ Exp(1), then

e^{Y/α} − 1 ~ Pareto(α, 1) .

3. Gamma mixture: If Y ~ Gamma(α, 1) and (X | Y) ~ Exp(Y), then X ~ Pareto(α, 1).

The Pareto(α, λ) family of distributions is a scale family. As a consequence, if X ~ Pareto(α, 1), then X/λ ~ Pareto(α, λ). Generating from Pareto(α, 1) is easily established via the inverse-transform method, using the fact that the cdf is given by F(x) = 1 − (1 + x)^{−α}, x ≥ 0.

Algorithm 4.51 (Pareto(α, 1) Generator via Inverse-Transform)

Generate U ~ U(0,1) and return X = U^{−1/α} − 1.
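Combining Algorithm 4.51 with the scale property, a minimal MATLAB sketch for Pareto(α, λ) is as follows (the parameter values are illustrative):

% Inverse-transform generation of N Pareto(alpha, lambda) samples,
% using X/lambda ~ Pareto(alpha, lambda) for X ~ Pareto(alpha, 1)
alpha = 2.5; lambda = 1; N = 10^5;          % illustrative choices
X = (rand(N,1).^(-1/alpha) - 1)/lambda;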

An equivalent algorithm (see also Property 2 above) is the following.

Algorithm 4.52 (Pareto(α, 1) Generator)

Generate Y ~ Exp(1) and return X = e^{Y/α} − 1.

Property 3 results in the next algorithm.

Algorithm 4.53 (Pareto(α, 1) Generator)

1. Generate Y ~ Gamma(α, 1).

2. Given Y, draw and return X ~ Exp(Y).

4.2.13 Phase-Type Distribution (Continuous Case)

The continuous phase-type distribution has pdf

f(x; α, A) = −α e^{Ax} A 1 ,  x ≥ 0 ,

where α is a 1 × m probability vector and A is an m × m invertible matrix such that

Q = ( A    −A1
      0^T    0 )      (4.7)

is the Q-matrix (infinitesimal generator) of a Markov jump process. Above, 0 and 1 are the m × 1 vectors of zeros and ones, respectively, and e^{Ax} is a matrix exponential, defined as

e^{Ax} = Σ_{n=0}^{∞} x^n A^n / n! .


We write the distribution as PH(α, A). In the table below, ϱ(A) < 0 denotes the largest eigenvalue of A.

Table 4.21 Moment and tail properties of the PH(α, A) distribution.

Property Condition

Expectation −αA^{−1}1

Variance 2αA^{−2}1 − (αA^{−1}1)^2

Moment E X^k  (−1)^k k! αA^{−k}1

Moment generating function α(tI + A)^{−1}A1  t < −ϱ(A)

Tail probability P(X > x)  α e^{Ax} 1

Other properties and relations (see, for example, [26]) are:

1. Absorption time: A random variable X ~ PH(α, A) can be thought of as the time to absorption in state m + 1 of a Markov jump process {Y_t, t ≥ 0} taking values in {1, ..., m, m + 1}. The Markov jump process has Q-matrix Q and initial distribution (α, 0).

2. Exponential distribution: The exponential distribution Exp(λ) is the simplest continuous phase-type distribution, with m = 1, A = −λ, and α = 1.

3. Generalized Erlang distribution: A random variable with a generalized Erlang distribution can be thought of as the sum of n independent exponential random variables with rates λ_1, ..., λ_n. The distribution is of phase-type with m = n, α = (1, 0, ..., 0), and

A = ( −λ_1   λ_1     0    ···      0
        0   −λ_2    λ_2   ···      0
        ⋮            ⋱     ⋱       ⋮
        0     0     ···  −λ_{n−1}  λ_{n−1}
        0     0     ···     0     −λ_n ) .

If λ_i = λ for all i, then we have the ordinary Erlang distribution, Erl(n, λ) = Gamma(n, λ).

Algorithm 16.2 describes the computation of the cdf of the generalized Erlang distribution for the case where the {λ_k} are distinct and sorted in descending order: λ_1 > λ_2 > ··· > λ_n.

4. Hyperexponential distribution: The hyperexponential distribution is the mixture of n exponential distributions, with parameters λ_i > 0 and mixing probabilities α_i > 0, i = 1, 2, ..., n. It can be represented as the phase-type distribution with α = (α_1, ..., α_n) and A = diag(−λ_1, ..., −λ_n).



5. Coxian distribution: The Coxian distribution is a phase-type distribution with α = (1, 0, ..., 0) and

A = ( −λ_1  p_1λ_1     0    ···       0
        0    −λ_2   p_2λ_2  ···       0
        ⋮              ⋱      ⋱        ⋮
        0      0      ···  −λ_{n−1}  p_{n−1}λ_{n−1}
        0      0      ···     0      −λ_n ) .

6. Sum: The sum of a finite number of independent phase-type distributed random variables has again a phase-type distribution.

7. Mixture: The mixture of a finite number of phase-type distributions is again a phase-type distribution.

Let Q be the Q-matrix as in (4.7). Denote the transition matrix of the corresponding jump chain (see Property 1) by K and the exponential holding-time parameters by {q_i, i = 1, ..., m}. Let K(y, ·) denote the y-th row of K and α the initial distribution.

Algorithm 4.54 (PH(α, A) Generator)

1. Draw Y ~ α. Set T = 0.

2. Draw S ~ Exp(q_Y). Set T = T + S.

3. Draw Y ~ K(Y, ·).

4. If Y = m + 1, return X = T; otherwise, go back to Step 2.

■ EXAMPLE 4.6 (Coxian Distribution Generation)

Consider generation from the Coxian distribution given in Property 5 above. The following MATLAB program implements Algorithm 4.54 for the case m = 4, λ_i = 3, p_i = 0.9, i = 1, ..., m, and α = (1, 0, 0, 0). The sample mean and variance are compared with the true expectation of 1.1463 and true variance of 0.4961 (rounded).

%coxian_ex.m
alpha = [1 0 0 0];
p = 0.9;
m = 4;
lam = 3;
A = [-lam, lam*p, 0, 0; ...
     0, -lam, lam*p, 0; ...
     0, 0, -lam, lam*p; ...
     0, 0, 0, -lam];
A1 = -A*ones(m,1);              % exit rates to the absorbing state
q = -diag(A);                   % holding-time parameters
Q = [A A1; zeros(1,m) 0];       % full Q-matrix, see (4.7)
K = diag([1./q; 1])*(Q + diag([q; 0]));  % jump chain transition matrix
K(m+1,m+1) = 1;
N = 10^5;
x = zeros(N,1);
for i=1:N
    T = 0; Y = min(find(cumsum(alpha) > rand));
    while Y ~= m+1
        T = T - log(rand)/q(Y);
        Y = min(find(cumsum(K(Y,:)) > rand));
    end
    x(i) = T;
end
truemean = -alpha*inv(A)*ones(m,1)
mx = mean(x)
truevar = alpha*inv(A)*(2*eye(m,m)-ones(m,1)*alpha)*inv(A)*ones(m,1)
vx = var(x)
hist(x,100)

4.2.14 Stable Distribution

The stable (also called α-stable, Lévy α-stable, or Lévy skew α-stable) distribution has no closed-form pdf in general. Instead, it is defined in terms of its characteristic function [12, 27, 32], with

φ(t) = E e^{itX} = { exp(−|t|^α (1 − iβ sgn(t) tan(πα/2))) ,   α ≠ 1 ,
                   { exp(−|t| (1 + iβ (2/π) sgn(t) ln |t|)) ,   α = 1 ,      (4.8)

where 0 < α ≤ 2 is the characteristic exponent and −1 ≤ β ≤ 1 the skewness parameter. Here, sgn(t) denotes the sign of t; that is, sgn(t) = −1, 0, and 1 for t < 0, t = 0, and t > 0, respectively. Stable distributions arise naturally in the study of Lévy processes; see Section 5.13.

The pdf can be obtained via numerical inverse-Fourier transform techniques. In particular (note that ℜ[φ(−t)] = ℜ[φ(t)] and ℑ[φ(−t)] = −ℑ[φ(t)]),

f(x) = (1/π) ∫_0^∞ ℜ[e^{−itx} φ(t)] dt .      (4.9)

The support of the distribution is ℝ, except in the cases β = 1, α < 1 (where the support is [0, ∞)) and β = −1, α < 1 (where the support is (−∞, 0]); see, for example, [32]. When |β| = 1 the stable law is said to be extremal. When β = 0 the distribution is symmetric around 0.

A location-scale version of this distribution is given by the transformation Y = μ + σX, where X has the characteristic function above. The defining property of a stable distribution is that any affine combination of independent stable random variables is again a stable random variable. We write the distribution as Stable(α, β, μ, σ). The standard stable distributions Stable(α, β, 0, 1) are denoted by Stable(α, β). Graphs of the standard stable pdf for various values of the parameters are given in Figure 4.18.



Figure 4.18 Pdfs of the standard stable distribution for various values of α and β.

Three special cases of stable distributions for which the pdf has an easy form are: (1) the Stable(2, β) distribution, which is equal to the N(0, 2) distribution; (2) the Stable(1, 0) distribution, which is equal to the standard Cauchy distribution; (3) the Stable(1/2, 1) distribution, which is known as the Lévy distribution (denoted here as Lévy), with pdf

f(x) = (1/√(2π)) x^{−3/2} e^{−1/(2x)} ,  x > 0 .

Other properties and relations include:

1. Expectation and variance: The expectation of the Stable(α, β) distribution is 0 for 1 < α ≤ 2 and ∞ otherwise. The second moment is ∞ except when α = 2, where it is 2.

2. Sum: If X_i ~ Stable(α, β_i), i = 1, 2, are independent, then

(X_1 + X_2) / 2^{1/α} ~ Stable(α, (β_1 + β_2)/2) .

3. Extremals: If X_1 ~ Stable(α, 1) and X_2 ~ Stable(α, −1) are independent, then

((1 + β)/2)^{1/α} X_1 + ((1 − β)/2)^{1/α} X_2 ~ Stable(α, β) .

4. Normal distribution: If X ~ N(0,1), then 1/X^2 ~ Lévy.

5. Reflection: If X ~ Stable(α, β), then −X ~ Stable(α, −β).


Zolotarev [32, Page 78] derives integral expressions for the pdf and the cdf of the Stable(α, β) distribution from (4.9) containing only real-valued (trigonometric) functions. Based on this, Chambers et al. [6] developed the following generation method for general Stable(α, β) distributions. For details and proofs we refer to [32] and [31] (the latter is accompanied by a small correction).

Algorithm 4.55 (Stable(α, β) Generator, α ≠ 1)

1. Set B = arctan(β tan(πα/2))/α.

2. Generate V ~ U(−π/2, π/2) and W ~ Exp(1) independently.

3. Output

X = [sin(α(V + B)) / (cos(αB) cos(V))^{1/α}] · [cos(V − α(V + B)) / W]^{(1−α)/α} .
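A minimal vectorized MATLAB sketch of Algorithm 4.55 (the Chambers-Mallows-Stuck method); the parameter values below are illustrative:

% Stable(alpha, beta) samples for alpha ~= 1 via Algorithm 4.55
alpha = 1.5; beta = 0.5; N = 10^5;        % illustrative choices
B = atan(beta*tan(pi*alpha/2))/alpha;
V = pi*(rand(N,1) - 0.5);                 % V ~ U(-pi/2, pi/2)
W = -log(rand(N,1));                      % W ~ Exp(1)
X = sin(alpha*(V + B))./(cos(alpha*B)*cos(V)).^(1/alpha) ...
    .* (cos(V - alpha*(V + B))./W).^((1 - alpha)/alpha);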

When α = 1 one can instead use the following algorithm.

Algorithm 4.56 (Stable(1, β) Generator)

1. Generate V ~ U(−π/2, π/2) and W ~ Exp(1) independently.

2. Output

X = (2/π) [ (π/2 + βV) tan(V) − β ln( (π/2) W cos(V) / (π/2 + βV) ) ] .

For Stable(2, β) = N(0, 2) and Stable(1, 0) = Cauchy, we refer to the corresponding random variable generation sections. For the Lévy distribution, Property 4 above yields the following algorithm.

Algorithm 4.57 (Stable(1/2, 1) = Lévy Generator)

Generate Y ~ N(0,1) and output X = 1/Y^2 .

4.2.15 Student's t Distribution

Student's t distribution or simply the t distribution has pdf

f(x; ν) = [Γ((ν + 1)/2) / (√(νπ) Γ(ν/2))] (1 + x^2/ν)^{−(ν+1)/2} ,  x ∈ ℝ ,      (4.10)

where ν > 0. We write the distribution as t_ν. If the parameter ν takes integer values, then it is referred to as the degrees of freedom of the t distribution. A location-scale version of this distribution is given by the pdf f(x; μ, σ) = f((x − μ)/σ)/σ; we write t_ν(μ, σ^2) for the corresponding distribution.

The distribution arises in statistics in the problem of estimating the mean of a normally distributed population when the population variance needs to be estimated and the sample size is small. In particular, let X_1, ..., X_n ~iid N(μ, σ^2), with sample mean X̄_n = (X_1 + ··· + X_n)/n and sample variance S_n^2 = Σ_{i=1}^{n} (X_i − X̄_n)^2/(n − 1). Then,

T = (X̄_n − μ) / (S_n/√n) ~ t_{n−1} .

The effect of the parameter ν on the shape of the distribution is illustrated in Figure 4.19.

Figure 4.19 Pdfs of the t_ν distribution for various values of ν.

Table 4.22 Moment properties of the t_ν distribution.

Property Condition

Expectation 0  ν > 1

Variance ν/(ν − 2)  ν > 2

Characteristic function 2^{1−ν/2} (√ν |t|)^{ν/2} K_{ν/2}(√ν |t|) / Γ(ν/2)  ν > 0

Tail probability P(X > x)  (1/2) I_{ν/(ν+x^2)}(ν/2, 1/2)  x ≥ 0

Here, K_ν(x) denotes the modified Bessel function of the third kind and I_x(α, β) denotes the incomplete beta function. Other properties and relations are:

1. Cauchy distribution: t_1 = Cauchy(0, 1).

2. Normal distribution: If X_ν ~ t_ν, then X_ν → Z ~ N(0,1) as ν → ∞.

3. Normal and gamma mixture: If X ~ N(0,1) and Y ~ χ^2_ν = Gamma(ν/2, 1/2) are independent, then

X / √(Y/ν) ~ t_ν .


4. Beta distribution: If T ~ t_{2α}, then (equivalent to Property 3 of the beta distribution on Page 104)

B = (1/2) (1 + T/√(2α + T^2)) ~ Beta(α, α) .

5. Gamma distribution: Let X ~ Gamma(1/2, 1) and Y ~ Gamma(ν/2, 1), independently. If B ~ Ber(1/2) is independent of X and Y, then

(2B − 1) √(νX/Y) ~ t_ν .

In addition, if Z ~ Gamma(ν/2, 1), independently of Y, then

√ν (Y − Z) / (2√(YZ)) ~ t_ν .

6. Square: If X ~ t_n, then X^2 ~ F(1, n).

7. F distribution: If X ~ F(n, n), then (√n/2)(√X − 1/√X) ~ t_n.

Since t_ν(μ, σ^2) forms a location-scale family, we only consider generation from t_ν. There are many simple and effective generation algorithms for the t distribution. Property 3 above, which is fundamental in classical statistics, yields the following algorithm.

Algorithm 4.58 (t_ν Generator via Gamma)

1. Generate Z ~ N(0,1) and Y ~ Gamma(ν/2, 1/2) independently.

2. Output X = Z / √(Y/ν) .
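A minimal MATLAB sketch of Algorithm 4.58. Note that gamrnd (Statistics Toolbox) uses a shape/scale parameterization, so Gamma(ν/2, 1/2) in the rate notation used here corresponds to gamrnd(nu/2, 2); the parameter values are illustrative.

% t_nu samples via normal and gamma (Algorithm 4.58)
nu = 5; N = 10^5;                    % illustrative choices
Z = randn(N,1);
Y = gamrnd(nu/2, 2, N, 1);           % Y ~ Gamma(nu/2, 1/2) = chi^2_nu
X = Z./sqrt(Y/nu);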

An alternative approach is to use the relation between the Student and beta distributions as in Property 4 above, for example.

Algorithm 4.59 (t_ν Generator via Beta)

1. Generate Y ~ Beta(ν/2, ν/2).

2. Output

X = √ν (Y − 1/2) / √(Y(1 − Y)) .

The following simple algorithm, from [14], is based on the ratio of uniforms method.

Algorithm 4.60 (t_ν Generator via Ratio of Uniforms)

1. Generate U ~ U(−√ν, √ν) and V ~ U(0,1).

2. Set W = V^{1/ν}.

3. If W^2 + U^2/ν ≤ 1, return X = U/W. Otherwise, restart from Step 1.

The polar method (see Section 3.1.2.7), with radial pdf

f_R(r) = r (1 + r^2/ν)^{−1−ν/2} ,  r ≥ 0 ,

results in the following algorithm.


Algorithm 4.61 (t_ν Generation via Polar Method (I))

1. Generate U, V ~iid U[0, 1].

2. Set Θ = 2πU and R = √(ν(V^{−2/ν} − 1)).

3. Return X = R cos(Θ) and Y = R sin(Θ). Both X and Y are t_ν random variables, but they are not independent.

The following algorithm, from [4], is again based on the polar method, but replaces trigonometric function calculation with (faster) acceptance-rejection.

Algorithm 4.62 (t_ν Generator via Polar Method (II))

1. Generate U, V ~iid U[−1, 1].

2. Set W = U^2 + V^2. If W > 1, return to Step 1.

3. Set C = U^2/W and R = ν(W^{−2/ν} − 1).

4. Return X = sgn(U) √(RC).
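A minimal vectorized MATLAB sketch of Algorithm 4.62; since the acceptance step is applied as a filter, the number of returned samples is random (acceptance probability π/4). The parameter values are illustrative.

% Polar method (II) for t_nu (Algorithm 4.62)
nu = 5; N = 10^5;
U = 2*rand(N,1) - 1; V = 2*rand(N,1) - 1;
W = U.^2 + V.^2;
keep = W <= 1;                       % accept points inside the unit circle
U = U(keep); W = W(keep);
C = U.^2./W;
R = nu*(W.^(-2/nu) - 1);
X = sign(U).*sqrt(R.*C);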

Finally, Kinderman et al. [19] proposed the following acceptance-rejection algorithm.

Algorithm 4.63 (t_ν Generator via Acceptance-Rejection)

1. Generate U_1, U_2 ~ U(0,1).

2. If U_1 < 0.5, set X = 1/(4U_1 − 1) and V = U_2/X^2; otherwise, set X = 4U_1 − 3 and V = U_2.

3. If V < 1 − |X|/2 or V < (1 + X^2/ν)^{−(ν+1)/2}, accept X as a random variable from t_ν; otherwise, repeat from Step 1.

4.2.16 Uniform Distribution (Continuous Case)

The uniform distribution on the interval [a, b] has pdf

f(x; a, b) = 1/(b − a) ,  a ≤ x ≤ b .

We write the distribution as U[a, b]. A graph of the pdf is given in Figure 4.20.

Figure 4.20 The pdf of the uniform distribution on [a, b].


The uniform distribution is used as a model for choosing a point randomly from the interval [a, b], such that each point is equally likely to be drawn. The uniform distribution on an arbitrary Borel set B in ℝ^n with non-zero Lebesgue measure (for example, area, volume) |B| is defined similarly: its pdf is constant, taking value 1/|B| on B and 0 otherwise. We write U(B) or simply U_B. The U[a, b] distribution is a location-scale family, as Z ~ U[a, b] has the same distribution as a + (b − a)X, with X ~ U[0, 1].

Table 4.23 Moment and tail properties of the U(a, b) distribution.

Property Condition

Expectation (a + b)/2

Variance (b − a)^2/12

Moment generating function (e^{bt} − e^{at}) / ((b − a)t)  t ≠ 0

Tail probability P(X > x)  (b − x)/(b − a)  x ∈ (a, b)

The generation of U(0,1) random variables, crucial for any Monte Carlo method, is discussed in detail in Chapter 1. U(a, b) random variable generation follows immediately from the inverse-transform method.

Algorithm 4.64 (U(a, b) Generation)

Generate U ~ U(0,1) and return X = a + (b − a)U.

4.2.17 Wald Distribution

The Wald or inverse Gaussian distribution has pdf

f(x; μ, λ) = √(λ/(2πx^3)) e^{−λ(x−μ)^2/(2μ^2 x)} ,  x > 0 ,

where μ > 0 is a location parameter and λ > 0 is a scale parameter. We write this distribution as Wald(μ, λ). The Wald distribution arises as the first time that a Brownian motion with positive drift hits a positive level. More specifically, if {B_t, t ≥ 0} is a Brownian motion with drift μ > 0 and diffusion coefficient σ^2 > 0, and if τ_a = inf{t ≥ 0 : B_t ≥ a}, then τ_a ~ Wald(a/μ, a^2/σ^2). Graphs of the pdf for various values of the parameters are given in Figure 4.21.


Figure 4.21 Pdfs of the Wald distribution for various values of the parameters.

Table 4.24 Moment and tail properties of the Wald(μ, λ) distribution.

Property Condition

Expectation μ

Variance μ^3/λ

Moment gen. function exp( (λ/μ)(1 − √(1 − 2μ^2 t/λ)) )  t ≤ λ/(2μ^2)

Tail probability P(X > x)  Φ(√(λ/x)(1 − x/μ)) − Φ(−√(λ/x)(1 + x/μ)) e^{2λ/μ}  x > 0

Here, Φ(x) is the cdf of the standard normal distribution. Other properties and relations are:

1. Chi-square distribution: If X ~ Wald(μ, λ), then

λ(X − μ)^2 / (μ^2 X) ~ χ^2_1 .

2. Inverse gamma distribution: Let X_μ ~ Wald(μ, λ). Then,

X_μ → Y ~ InvGamma(1/2, λ/2) as μ → ∞ .

3. Sum: If X_i ~ Wald(μ a_i, λ a_i^2), i = 1, ..., n, independently, then

Σ_{i=1}^{n} X_i ~ Wald(μa, λa^2) ,  with a = Σ_{i=1}^{n} a_i .

A simple generator can be derived from the following transformation. Let X ~ Wald(μ, λ) and g(x) = λ(x − μ)^2/(μ^2 x). For each y > 0 there are two solutions, say x_1 and x_2, to g(x) = y. It is easy to check that x_1 ≥ μ and x_2 ≤ μ are the roots of a quadratic equation, and that their product is simply μ^2. So we can set x_1 = x and x_2 = μ^2/x, with x ≥ μ satisfying g(x) = y. A simple modification of the standard transformation rule (A.33) yields that the pdf of Y satisfies

f_Y(y) = f_X(μ^2/x) / (−g′(μ^2/x)) + f_X(x) / g′(x) ,

where f_X is the pdf of X, g′(x) = λ(μ^{−2} − x^{−2}), and x = μ + μ^2 y/(2λ) + (μ/(2λ)) √(4μλy + μ^2 y^2). Evaluation of the above expression yields f_Y(y) = e^{−y/2} (2πy)^{−1/2} for all y > 0, which is the pdf of the χ^2_1 distribution, so that we have derived Property 1 above.

In order to generate the appropriate X-value for a given y-value, we have to choose the appropriate root (μ^2/x or x) with probabilities in the proportion

[f_X(μ^2/x) / f_X(x)] × [g′(x) / (−g′(μ^2/x))] = (x^3/μ^3) × (μ^2/x^2) = x/μ

to 1. This leads to the following algorithm.

Algorithm 4.65 (Wald(μ, λ) Generator)

1. Generate W ~ N(0,1) and set Y = W^2.

2. Set

Z = μ + μ^2 Y/(2λ) + (μ/(2λ)) √(4μλY + μ^2 Y^2) .

3. Generate B ~ Ber(μ/(μ + Z)). If B = 1, output X = Z; otherwise, output X = μ^2/Z.

4.2.18 Weibull Distribution

The Weibull distribution has pdf

f(x; α, λ) = αλ(λx)^{α−1} e^{−(λx)^α} ,  x ≥ 0 ,

where λ > 0 is called the scale parameter and α > 0 the shape parameter. We write the distribution as Weib(α, λ). A location-scale version of this distribution is given by the pdf f(x; α, μ, σ) = f((x − μ)/σ; α, 1)/σ, x ≥ μ; we write this distribution as Weib(α, μ, σ). Note that Weib(α, λ) = Weib(α, 0, λ^{−1}).

In reliability theory the Weibull distribution is often used as the distribution of a lifetime of a component. The reflection of this distribution, with pdf f(−x; α, μ, σ), which we will call the reversed Weibull distribution, is one of the three extreme value type distributions (type III), denoted here by −Weib(α, μ, σ).

A special case is the Weib(2, 1/(σ√2)) distribution, known as the Rayleigh distribution, denoted by Rayleigh(σ). The dependence of the Weibull distribution on its parameters is illustrated in Figure 4.22.


Figure 4.22 Pdfs of the Weib(α, λ) distribution for various values of its parameters.

Table 4.25 Moment and tail properties of the Weib(α, λ) distribution.

Property Condition

Expectation λ^{−1} Γ(1 + α^{−1})

Variance λ^{−2} (Γ(1 + 2α^{−1}) − Γ(1 + α^{−1})^2)

Moment E X^k  λ^{−k} Γ(1 + kα^{−1})

Tail probability P(X > x)  e^{−(λx)^α}  x ≥ 0

Other properties and relations are:

1. Exponential distribution: If Y ~ Exp(λ) and β > 0, then Y^β ~ Weib(β^{−1}, λ^β).

2. Normal distribution: If X, Y ~iid N(0, σ^2), then √(X^2 + Y^2) ~ Rayleigh(σ).

3. Gumbel distribution: If X ~ Weib(α, λ), then −ln X ~ Gumbel(ln λ, 1/α).

Since Weib(α, μ, σ) defines a location-scale family, it suffices to consider generating Weib(α, λ) random variables only. This is easily established via the inverse-transform method, as the cdf satisfies F(x) = 1 − exp(−(λx)^α), x ≥ 0.

Algorithm 4.66 (Weib(α, λ) Generator)

Generate U ~ U(0,1) and output X = (1/λ)(−ln U)^{1/α}.
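A one-line vectorized MATLAB sketch of Algorithm 4.66 (parameter values illustrative):

% Inverse-transform generation of N Weib(alpha, lambda) samples
alpha = 1.5; lambda = 2; N = 10^5;       % illustrative choices
X = (-log(rand(N,1))).^(1/alpha)/lambda;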

4.3 MULTIVARIATE DISTRIBUTIONS

We list various multivariate distributions in alphabetical order. Recall that an absolutely continuous or discrete distribution is completely specified by its pdf.


Remark 4.3.1 (Singular Continuous Distributions) Many practically encountered multivariate singular continuous distributions involve absolutely continuous random vectors that are mapped to a lower-dimensional manifold. For example, the uniform distribution on the perimeter of a circle is singular with respect to the Lebesgue measure on ℝ^2 and so its two-dimensional pdf is zero. However, it is readily thought of as the distribution of a one-dimensional uniform random variable bent into a circle.

4.3.1 Dirichlet Distribution

The standard Dirichlet (or type I Dirichlet) distribution has pdf

f(x; α) = [Γ(Σ_{i=1}^{n+1} α_i) / ∏_{i=1}^{n+1} Γ(α_i)] ∏_{i=1}^{n} x_i^{α_i − 1} (1 − Σ_{i=1}^{n} x_i)^{α_{n+1} − 1} ,  x_i ≥ 0, i = 1, ..., n, Σ_{i=1}^{n} x_i ≤ 1 ,

where α_i > 0, i = 1, ..., n + 1, are shape parameters. We write this distribution as Dirichlet(α_1, ..., α_{n+1}) or Dirichlet(α), with α = (α_1, ..., α_{n+1})^T. The Dirichlet distribution can be regarded as a multivariate generalization of the beta distribution (see Section 4.2.1), in the sense that each marginal X_k has a beta distribution (a more general statement is made in Property 6 below).

In Bayesian statistics, the Dirichlet distribution is the conjugate prior for the success probabilities of the multinomial distribution (see Section 4.3.2). A graph of the pdf for the two-dimensional case is given in Figure 4.23.

Figure 4.23 The Dirichlet pdf in two dimensions, with parameter vector α = (1.5, 1.5, 3)^T.


Table 4.26 Moment properties of the Dirichlet(α) distribution.

Property

Expectation vector ᾱ/c

Covariance matrix (c diag(ᾱ) − ᾱ ᾱ^T) / (c^2 (c + 1))

Here, ᾱ = (α_1, ..., α_n)^T and c = Σ_{i=1}^{n+1} α_i. Other properties and relations are:

1. Gamma to Dirichlet: Let Y_1, ..., Y_{n+1} be independent random variables with Y_k ~ Gamma(α_k, 1), k = 1, ..., n + 1, and define

X_k = Y_k / Σ_{i=1}^{n+1} Y_i ,  k = 1, ..., n .

Then, X = (X_1, ..., X_n) ~ Dirichlet(α).

2. Dirichlet to gamma: Let Y ~ Gamma(Σ_{i=1}^{n+1} α_i, 1) be a random variable independent of X = (X_1, ..., X_n) ~ Dirichlet(α). Define

Y_i = Y X_i ,  i = 1, ..., n ,  and  Y_{n+1} = Y (1 − Σ_{i=1}^{n} X_i) .

Then, Y_1, ..., Y_{n+1} are independent with Y_i ~ Gamma(α_i, 1), i = 1, ..., n + 1.

3. Beta to Dirichlet: Let Y_1, ..., Y_n be independent random variables with Y_i ~ Beta(α_i, Σ_{j=i+1}^{n+1} α_j), i = 1, ..., n. Define

X_i = Y_i ∏_{j=1}^{i−1} (1 − Y_j) ,  i = 1, ..., n .

Then, X = (X_1, ..., X_n) ~ Dirichlet(α).

4. Dirichlet to beta: If X = (X_1, ..., X_n) ~ Dirichlet(α) and

Y_i = X_i / (1 − X_1 − ··· − X_{i−1}) ,  i = 1, ..., n ,

then Y_1, ..., Y_n are independent with Y_i ~ Beta(α_i, Σ_{j=i+1}^{n+1} α_j), i = 1, ..., n.

5. Uniform: The n-dimensional Dirichlet(1, ..., 1) distribution has a constant density on the unit simplex {x ∈ ℝ^n : x_i ≥ 0, i = 1, ..., n, Σ_{i=1}^{n} x_i ≤ 1} and thus corresponds to the uniform distribution on that set.

6. Beta distribution of marginals: As a consequence of Property 4, X_1 ~ Beta(α_1, Σ_{i≠1} α_i), and (by relabeling) X_k ~ Beta(α_k, Σ_{i≠k} α_i) in general.


7. Subvectors: From Property 1 it follows immediately that for any choice {i_1, ..., i_k} of distinct indices from {1, ..., n},

(X_{i_1}, ..., X_{i_k}) ~ Dirichlet(α_{i_1}, ..., α_{i_k}, β) ,

with β = Σ_{i=1}^{n+1} α_i − Σ_{j=1}^{k} α_{i_j}.

8. Moments: Let X ~ Dirichlet(α), with α = (a, a, ..., a, β)^T. Then, for γ ≥ 0, we have (see [10])

E ∏_{1≤i<j≤n} |X_i − X_j|^{2γ} = [Γ(an + β) / (Γ^n(a) Γ(an + β + n(n − 1)γ))] ∏_{j=0}^{n−1} [Γ(a + jγ) Γ(1 + (j + 1)γ) / Γ(1 + γ)] .

The fundamental relation between the Dirichlet and the gamma distribution given in Property 1 provides the following generation method.

Algorithm 4.67 (Dirichlet(α) Generator)

1. Generate Y_k ~ Gamma(α_k, 1), k = 1, ..., n + 1, independently.

2. Output X = (X_1, ..., X_n), where

X_k = Y_k / Σ_{i=1}^{n+1} Y_i ,  k = 1, ..., n .
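A minimal MATLAB sketch of Algorithm 4.67; gamrnd requires the Statistics Toolbox, and the parameter vector is illustrative.

% Dirichlet(alpha) sample via independent gammas (Algorithm 4.67)
alpha = [1.5 1.5 3];                 % (alpha_1, ..., alpha_{n+1}), illustrative
Y = gamrnd(alpha, 1);                % Y_k ~ Gamma(alpha_k, 1)
X = Y(1:end-1)/sum(Y);               % first n coordinates of the Dirichlet vector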

An alternative generation mechanism is based on the relation between the Dirichlet and the beta distribution; see Property 3 above. In general, beta generators require a gamma generator, unless the parameter values are integers, in which case one can use order statistics (see Algorithm 4.24 on Page 106). The following algorithm can be useful when the parameters are small and integer-valued.

Algorithm 4.68 (Dirichlet(α) Generator via Beta Random Variables)

1. Generate Y_i ~ Beta(α_i, Σ_{j=i+1}^{n+1} α_j), i = 1, ..., n, independently.

2. Output X = (X_1, ..., X_n), where

X_i = Y_i ∏_{j=1}^{i−1} (1 − Y_j) ,  i = 1, ..., n .

4.3.2 Multinomial Distribution

The multinomial distribution has joint pdf given by

f(x; n, p) = n! ∏_{i=1}^{k} p_i^{x_i} / x_i!  for all x ∈ {0, ..., n}^k for which Σ_{i=1}^{k} x_i = n ,



where k and n are strictly positive integers, each p_i > 0, and Σ_{i=1}^{k} p_i = 1. Summarizing the {p_i} in a k-dimensional (column) vector p, we write the distribution as Mnom(n, p).

The multinomial distribution arises in the experiment in which n balls are thrown into k bins, where the probability of a ball falling in the i-th bin is p_i. This situation arises, for example, when a histogram is constructed for an iid sample of size n from some distribution. The multinomial distribution can be seen as a generalization of the binomial distribution. In Bayesian statistics, the Dirichlet distribution is the conjugate prior for the probability vector p of the multinomial distribution (see Section 4.3.1). A graph of the pdf for a three-dimensional case is given in Figure 4.24.


Figure 4.24 The multinomial pdf with parameters n = 10 and p = (0.2, 0.5, 0.3)^T. Note that x_3 = n − x_1 − x_2 is not shown.

Table 4.27 Moment properties of the Mnom(n, p) distribution.

Property Condition

Expectation vector n p

Covariance matrix n (diag(p) − p p^T)

Probability generating function E z_1^{X_1} ··· z_k^{X_k} = (Σ_{i=1}^{k} p_i z_i)^n  |z_i| ≤ 1, i = 1, ..., k


Other properties are:

1. Binomial: If (X_1, ..., X_k) ~ Mnom(n, p), then X_i ~ Bin(n, p_i), i = 1, ..., k.

2. Conditionals: (X_{t+1} | X_1, ..., X_t) ~ Bin(n − Σ_{i=1}^{t} X_i, p_{t+1}/(1 − p_1 − ··· − p_t)).

3. Subvectors: For any choice {i_1, ..., i_m} of distinct indices from {1, ..., k},

(X_{i_1}, ..., X_{i_m}, n − Σ_{j=1}^{m} X_{i_j}) ~ Mnom(n, p̃) ,

where p̃ = (p_{i_1}, ..., p_{i_m}, 1 − Σ_{j=1}^{m} p_{i_j})^T.

The following algorithm for generating multinomial vectors mimics throwing n balls into k bins with probabilities p_1, ..., p_k.

Algorithm 4.69 (Direct Mnom(n, p) Generation)

1. Set X = (X_1, ..., X_k) = (0, ..., 0).

2. For i = 1, ..., n, draw Z_i ~ p and set X_{Z_i} = X_{Z_i} + 1.

3. Return X .
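A minimal MATLAB sketch of Algorithm 4.69 in base MATLAB; the parameters are illustrative.

% Direct multinomial generation (Algorithm 4.69): one Mnom(n, p) draw
n = 10; p = [0.2 0.5 0.3];           % illustrative parameters
k = numel(p);
X = zeros(1, k);
cp = cumsum(p);
for i = 1:n
    Z = find(cp > rand, 1);          % Z ~ p: ball i lands in bin Z
    X(Z) = X(Z) + 1;
end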

From Property 2, the conditional densities of a multinomially distributed vector are binomial. Hence, the following algorithm can be used to generate a random vector X = (X_1, ..., X_k) ~ Mnom(n, p).

Algorithm 4.70 (Mnom(n, p) Generator)

1. Set s = 0, q = 1, and t = 1.

2. While t ≤ k, generate

X_t ~ Bin(n − s, p_t/q) ,

and set s = s + X_t, q = q − p_t, and t = t + 1.

3. Output X = (X_1, ..., X_k).

4.3.3 Multivariate Normal Distribution

The standard multivariate normal or standard multivariate Gaussian distribution in n dimensions has pdf

f(x) = (2π)^{−n/2} e^{−x^T x/2} ,  x ∈ ℝ^n .      (4.11)

All marginals of X = (X_1, ..., X_n)^T are iid standard normal random variables.

Suppose that Z has an m-dimensional standard normal distribution. If A is an n × m matrix and μ is an n × 1 vector, then the affine transformation

X = μ + AZ


is said to have a multivariate normal or multivariate Gaussian distribution with mean vector μ and covariance matrix Σ = AA^T. We write the distribution as N(μ, Σ).

The covariance matrix Σ is always symmetric and positive semidefinite. When the matrix Σ is singular (that is, det(Σ) = 0), the distribution of X is singular with respect to the Lebesgue measure on ℝ^n. When A is of full row rank (that is, rank(A) = n), the covariance matrix Σ is positive definite. In this case Σ has an inverse and the distribution of X has pdf

f(x; μ, Σ) = (1/√((2π)^n det(Σ))) e^{−(x−μ)^T Σ^{−1} (x−μ)/2} ,  x ∈ ℝ^n .      (4.12)

The matrix Λ = Σ^{−1} is called the precision matrix. A random vector X is thus multivariate normal with precision matrix Λ and mean vector μ if and only if the logarithm of its pdf f̃(x; μ, Λ) = f(x; μ, Σ) satisfies

ln f̃(x; μ, Λ) = −(1/2)(x^T Λ x − 2 x^T Λ μ) + constant ,  x ∈ ℝ^n .      (4.13)

The multivariate normal distribution is a natural extension of the normal distribution (see Section 4.2.11) and plays a correspondingly important role in multivariate statistics. This multidimensional counterpart also has the property that any affine combination of independent multivariate normal random variables is again multivariate normal. A graph of the standard normal pdf for the two-dimensional case is given in Figure 4.25.

Figure 4.25 The standard multivariate normal pdf in two dimensions.


Table 4.28 Moment properties of the Ν(μ, Σ) distribution.

Property Condition

Expectation vector μ

Covariance matrix Σ

Moment generating function E e^{t^T X} = exp(t^T μ + (1/2) t^T Σ t)  t ∈ ℝ^n

Other properties and relations are:

1. Affine combinations: Let X_1, X_2, ..., X_r be independent m_i-dimensional normal variables, with X_i ~ N(μ_i, Σ_i), i = 1, ..., r. Let a be an n × 1 vector and let each A_i be an n × m_i matrix for i = 1, ..., r. Then,

a + Σ_{i=1}^{r} A_i X_i ~ N( a + Σ_{i=1}^{r} A_i μ_i , Σ_{i=1}^{r} A_i Σ_i A_i^T ) .

In other words, any affine combination of independent multivariate normal random variables is again multivariate normal.

2. Standardization (whitening): A particular case of the affine combinations property is the following. Suppose X ~ N(μ, Σ) is an n-dimensional normal random variable with det(Σ) > 0. Let A be the Cholesky factor of the matrix Σ. That is, A is an n × n lower triangular matrix of the form

A = ( a_11   0    ···   0
      a_21  a_22  ···   0
       ⋮     ⋮     ⋱    ⋮
      a_n1  a_n2  ···  a_nn ) ,      (4.14)

and such that Σ = AA^T. It follows that

A^{−1}(X − μ) ~ N(0, I) .

Note that the Cholesky factor can be obtained efficiently via the Cholesky square root method (see Section D.3).

3. Marginal distributions: Let X be an n-dimensional normal variable, with X ~ N(μ, Σ). Separate the vector X into a part of size p and one of size q = n − p and, similarly, partition the mean vector and covariance matrix:

X = ( X_p      μ = ( μ_p      Σ = ( Σ_p    Σ_r
      X_q ) ,        μ_q ) ,        Σ_r^T  Σ_q ) ,      (4.15)

where Σ_p is the upper left p × p corner of Σ, Σ_q is the lower right q × q corner of Σ, and Σ_r is the p × q upper right block of Σ. Then, the distributions of the marginal vectors X_p and X_q are also multivariate normal, with X_p ~ N(μ_p, Σ_p) and X_q ~ N(μ_q, Σ_q).


Note that an arbitrary selection of p and q elements can be achieved by first performing a linear transformation Z = AX, where A is the n × n permutation matrix that appropriately rearranges the elements in X.

4. Conditional distributions: Suppose we again have an n-dimensional vector X ~ N(μ, Σ), partitioned as for the marginal distribution property above, but with det(Σ) > 0. Then, we have the conditional distributions

(X_p | X_q = x_q) ~ N( μ_p + Σ_r Σ_q^{−1}(x_q − μ_q) , Σ_p − Σ_r Σ_q^{−1} Σ_r^T ) ,

and

(X_q | X_p = x_p) ~ N( μ_q + Σ_r^T Σ_p^{−1}(x_p − μ_p) , Σ_q − Σ_r^T Σ_p^{−1} Σ_r ) .

As with the marginals property, arbitrary conditioning can be achieved by first permuting the elements of X by way of an affine transformation using a permutation matrix.

The conditional distributions can also be conveniently characterized via the precision matrix. In particular, if the latter is decomposed as

Λ = ( Λ_p    Λ_r
      Λ_r^T  Λ_q ) ,      (4.16)

then the logarithm of the conditional pdf of X_p given X_q = x_q is given by

ln f(x_p | x_q) = −(1/2) [ (x_p − μ_p)^T Λ_p (x_p − μ_p) + 2 (x_p − μ_p)^T Λ_r (x_q − μ_q) ] + constant

= −(1/2) [ x_p^T Λ_p x_p − 2 x_p^T (Λ_p μ_p + Λ_r μ_q − Λ_r x_q) ] + constant

= −(1/2) [ x_p^T Λ_p x_p − 2 x_p^T Λ_p μ_{p|q} ] + constant

for some p-dimensional vector μ_{p|q}. This shows that, conditional on X_q = x_q, the random vector X_p has a multivariate Gaussian distribution with precision matrix Λ_p and mean vector μ_{p|q}. Moreover, this mean vector can be found by solving the linear equation

Λ_p (μ_{p|q} − μ_p) = Λ_r (μ_q − x_q) .      (4.17)

5. Square: If X = (X_1, ..., X_n)^T ~ N(0, I), then X^T X ~ Gamma(n/2, 1/2) = χ^2_n. More generally, combining with the standardization property, if X ~ N(μ, Σ) is n-dimensional normal with det(Σ) > 0, then (X − μ)^T Σ^{−1} (X − μ) ~ χ^2_n.

Property 2 is the key to generating a multivariate normal random vector X ~ Ν(μ, Σ) , and leads to the following algorithm.

Algorithm 4.71 (N(μ, Σ) Generator)

1. Derive the Cholesky decomposition Σ = AA^T.

2. Generate Z_1, ..., Z_n ~iid N(0,1) and let Z = (Z_1, ..., Z_n)^T.

3. Output X = μ + AZ.
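A minimal MATLAB sketch of Algorithm 4.71 (the mean vector and covariance matrix are illustrative):

% N(mu, Sigma) generator (Algorithm 4.71)
mu = [1; 2]; Sigma = [2 1; 1 2];     % illustrative parameters
A = chol(Sigma, 'lower');            % Cholesky factor, Sigma = A*A'
X = mu + A*randn(2,1);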


4.3.4 Multivariate Student's t Distribution

The multivariate Student's t distribution in n dimensions has pdf

f(x; ν) = [Γ((ν + n)/2) / (Γ(ν/2) (νπ)^{n/2})] (1 + x^T x/ν)^{−(ν+n)/2} ,  x ∈ ℝ^n ,      (4.18)

where ν > 0 is the degrees of freedom or shape parameter. We write the distribution as t_ν. Suppose Y = (Y_1, ..., Y_m)^T has an m-dimensional t_ν distribution. If A is an n × m matrix and μ is an n × 1 vector, then the affine transformation

X = μ + AY

is said to have a multivariate Student's or multivariate t distribution with mean vector μ and scale matrix Σ = AA^T. We write the distribution as t_ν(μ, Σ). If Σ is positive definite (see Section 4.3.3), then X has pdf

f(x; ν, μ, Σ) = [Γ((ν + n)/2) / (Γ(ν/2) (νπ)^{n/2} √(det(Σ)))] (1 + (x − μ)^T Σ^{−1} (x − μ)/ν)^{−(ν+n)/2} .      (4.19)

The multivariate t distribution is a radially symmetric extension of the univariate t distribution and plays an important role as a proposal distribution in Markov chain Monte Carlo algorithms and Bayesian statistical modeling; see Example 6.2. In particular, it arises as the posterior distribution of the mean of a multivariate normal distribution [21]. Note that an alternative product form extension of the multivariate t distribution is also used in, for example, kernel density estimation [30, Page 103]. In this alternative extension, the pdf of the multivariate t distribution in n dimensions is defined as the product of n univariate pdfs: ∏_{i=1}^{n} f(x_i; ν), where f(x_i; ν) is the Student's t pdf in (4.10).

Table 4.29 Moment properties of the t_ν(μ, Σ) distribution.

Property Condition

Expectation vector μ  ν > 1

Covariance matrix (ν/(ν − 2)) Σ  ν > 2

Characteristic function e^{i t^T μ} (√ν ||Σ^{1/2} t||)^{ν/2} K_{ν/2}(√ν ||Σ^{1/2} t||) / (2^{ν/2−1} Γ(ν/2))  t ∈ ℝ^n

Here, K_ν(x) denotes the modified Bessel function of the second kind. Other properties and relations are:

1. Special cases: Let X ~ t_ν(μ, Σ). Then,

X → Z ~ N(μ, Σ) as ν → ∞ .

If ν = 1, then (4.18) gives the pdf of the multivariate radially symmetric Cauchy distribution.


2. Normal distribution: Let Z ~ N(0, I) and S ~ Gamma(ν/2, 1/2) be independent. Then,

X = √(ν/S) Z ~ t_ν .

3. Wishart distribution: Let V be an n × n random matrix with a Wishart(ν + n − 1, I) distribution. If Y ~ N(0, I), then

X = √ν (V^{1/2})^{−1} Y ~ t_ν ,

where V^{1/2} is a symmetric matrix such that V^{1/2} V^{1/2} = V.

4. Marginal distributions: If X ~ t_ν(μ, Σ) and the vector X is decomposed as in (4.15), then

X_p ~ t_ν(μ_p, Σ_p) and X_q ~ t_ν(μ_q, Σ_q) .

5. Product moments: If X ~ t_ν, then [21, Page 11]

E ∏_{i=1}^{n} X_i^{2k_i} = (ν/2)^K [Γ(ν/2 − K)/Γ(ν/2)] ∏_{i=1}^{n} (2k_i)! / (2^{k_i} k_i!) ,  where K = Σ_{i=1}^{n} k_i and ν > 2K .

Generation from the multivariate t distribution follows from its relation to the multivariate normal distribution in Property 2 above.

Algorithm 4.72 (t_ν(μ, Σ) Generator)

1. Draw the random column vector Z ~ N(0, I).

2. Draw S ~ Gamma(ν/2, 1/2) = χ^2_ν.

3. Compute Y = √(ν/S) Z.

4. Return X = μ + AY, where A is the Cholesky factor of Σ, so that AA^T = Σ.
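A minimal MATLAB sketch of Algorithm 4.72 (gamrnd requires the Statistics Toolbox; the parameter values are illustrative):

% t_nu(mu, Sigma) generator (Algorithm 4.72)
nu = 5; mu = [1; 2]; Sigma = [2 1; 1 2];   % illustrative parameters
Z = randn(2,1);
S = gamrnd(nu/2, 2);                       % S ~ Gamma(nu/2, 1/2) = chi^2_nu
Y = sqrt(nu/S)*Z;
A = chol(Sigma, 'lower');                  % Cholesky factor of Sigma
X = mu + A*Y;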

4.3.5 Wishart Distribution

Let x = (x_{ij}, i, j = 1, ..., n) denote an n × n positive definite matrix; the latter property is written as x ≻ 0. Note that x is necessarily symmetric. The Wishart distribution on the space of such matrices has pdf

f(x) = c^{−1} det(x)^{(ν−n−1)/2} exp( −(1/2) tr(Σ^{−1} x) ) ,  x ≻ 0 ,

where ν ≥ n is the number of degrees of freedom, Σ ≻ 0 is an n × n covariance matrix, and the normalization constant c is

c = det(Σ)^{ν/2} 2^{νn/2} π^{n(n−1)/4} ∏_{j=1}^{n} Γ((ν − j + 1)/2) .


We denote the distribution by Wishart(ν, Σ). The Wishart distribution is closely related to the multivariate normal distribution. In particular, if Y_1, ..., Y_r ~iid N(0, Σ) are n-dimensional normal variables with det(Σ) > 0, then

Σ_{k=1}^{r} Y_k Y_k^T ~ Wishart(r, Σ) .

In Bayesian statistics, the Wishart distribution is the conjugate prior for the precision matrix Σ^{−1} of a N(μ, Σ) random variable, when μ is known.

Table 4.30 Moment properties of the Wishart(ν, Σ) distribution.

Property Condition

Expectation matrix ν Σ

Moment gen. function E e^{tr(TX)} = det(I − 2ΣT)^{−ν/2}  (Σ^{−1} − 2T) ≻ 0

Other properties and relations are:

1. Sums: If X_1 ~ Wishart(ν_1, Σ) and X_2 ~ Wishart(ν_2, Σ) are independent, then

X_1 + X_2 ~ Wishart(ν_1 + ν_2, Σ) .

2. χ^2 distribution: Wishart(r, 1) = χ^2_r if r is a positive integer. Thus, the Wishart distribution is the multivariate analogue of the χ^2 distribution.

3. Scaling: If X ~ Wishart(ν, Σ) is an n × n random matrix and A is an n × m constant matrix, then A^T X A is an m × m random matrix with

A^T X A ~ Wishart(ν, A^T Σ A) .

The special case where A = a is an n × 1 vector (m = 1) and ν = r is a positive integer gives

a^T X a / (a^T Σ a) ~ χ^2_r .

4. Bartlett decomposition: Let Σ = CC^T be the Cholesky factorization of the n × n matrix Σ, and let A be an n × n lower triangular matrix of the form given in (4.14), such that a_{ij} ~iid N(0,1) for all j < i and a_{ii} = √(Y_i), where Y_i ~ χ^2_{r−i+1}, i = 1, ..., n, are independent. Then, we have

X = CAA^T C^T ~ Wishart(r, Σ) .

Generation from the Wishart(r, Σ) distribution for small integer values of r ≥ n follows directly from its defining property.


Algorithm 4.73 (Wishart(r, Σ) Generator)

1. Draw the random column vectors Y_1, ..., Y_r ~iid N(0, Σ).

2. Return the matrix X = Σ_{k=1}^{r} Y_k Y_k^T .

The above algorithm becomes inefficient for large values of r. An alternative and more efficient algorithm is based on the Bartlett decomposition; see Property 4 and [29].

Algorithm 4.74 (Wishart(r, Σ) Generator via Bartlett Decomposition)

1. Compute C in the Cholesky factorization Σ = CC^T of the n × n matrix Σ.

2. Generate the lower triangular matrix A in (4.14), with a_{ij} ~iid N(0,1) for all j < i and a_{ii} = √(Y_i), where Y_i ~ χ^2_{r−i+1}, i = 1, ..., n, are independent.

3. Output the matrix X = CAA^T C^T .
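A minimal MATLAB sketch of Algorithm 4.74. Chi-square draws are obtained here via gamrnd (Statistics Toolbox), using χ^2_k = Gamma(k/2, 1/2), that is, gamrnd(k/2, 2) in shape/scale form; the parameter values are illustrative.

% Wishart(r, Sigma) via the Bartlett decomposition (Algorithm 4.74)
r = 10; Sigma = [2 1; 1 2]; n = size(Sigma, 1);     % illustrative choices
C = chol(Sigma, 'lower');                           % Sigma = C*C'
A = tril(randn(n), -1);                             % a_ij ~ N(0,1), j < i
A = A + diag(sqrt(gamrnd((r - (1:n) + 1)/2, 2)));   % a_ii = sqrt(Y_i), Y_i ~ chi^2_{r-i+1}
X = C*(A*A')*C';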

Further Reading

Further information on discrete and continuous distributions may be found in [9, 15] and [9, 16, 17, 18, 20], respectively. Phase-type distributions and their applications are discussed in [26]. Details on stable distributions are given in [11, 12, 27, 32]. For a comprehensive reference on random variable generation algorithms, see [8].

REFERENCES

1. J. H. Ahrens and U. Dieter. Computer methods for sampling from gamma, beta, Poisson, and binomial distributions. Computing, 12(3):223-246, 1974.

2. J. H. Ahrens and U. Dieter. Sampling from binomial and Poisson distributions: A method with bounded computation times. Computing, 25(3):193-208, 1980.

3. A. C. Atkinson. The computer generation of Poisson random variables. Journal of the Royal Statistical Society, Series C, 28(l):29-35, 1979.

4. R. W. Bailey. Polar generation of random variates with the t-distribution. Mathematics of Computation, 62(206):779-781, 1994.

5. D. J. Best. A note on gamma variate generators with shape parameters less than unity. Computing, 30(2):185-188, 1983.

6. J. M. Chambers, C. L. Mallows, and B. W. Stuck. A method for simulating stable random variables. Journal of the American Statistical Association, 71(354):340-344, 1976.

7. R. C. H. Cheng and G. M. Feast. Some simple gamma variate generators. Computing, 28(3):290-295, 1979.

8. L. Devroye. Non-Uniform Random Variate Generation. Springer-Verlag, New York, 1986.

9. M. Evans, N. Hastings, and B. Peacock. Statistical Distributions. John Wiley & Sons, New York, second edition, 1993.


10. P. J. Forrester and S. O. Warnaar. The importance of the Selberg integral. Bulletin of the American Mathematical Society, 45(4):489-534, 2008.

11. B. V. Gnedenko and A. N. Kolmogorov. Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Reading, Massachusetts, 1954.

12. P. Hall. A comedy of errors: The canonical form for a stable characteristic function. The Bulletin of the London Mathematical Society, 13(1):23-27, 1981.

13. C. C. Heyde. On a property of the lognormal distribution. Journal of the Royal Statistical Society, Series B, 25(2):392-393, 1963.

14. W. Hörmann. A simple generator for the t distribution. Computing, 81(4):317-322, 2007.

15. N. L. Johnson and S. Kotz. Distributions in Statistics: Discrete Distributions. Houghton Mifflin Company, New York, 1969.

16. N. L. Johnson and S. Kotz. Distributions in Statistics: Continuous Univariate Distributions, Volume 1. Houghton Mifflin Company, New York, 1970.

17. N. L. Johnson and S. Kotz. Distributions in Statistics: Continuous Univariate Distributions, Volume 2. Houghton Mifflin Company, New York, 1970.

18. N. L. Johnson and S. Kotz. Distributions in Statistics: Continuous Multivariate Distributions. John Wiley & Sons, New York, 1972.

19. A. J. Kinderman, J. F. Monahan, and J. G. Ramage. Computer methods for sampling from Student's t distribution. Mathematics of Computation, 31(140):1009-1018, 1977.

20. C. Kleiber and S. Kotz. Statistical Size Distributions in Economics and Actuarial Sciences. John Wiley & Sons, New York, 2003.

21. S. Kotz and S. Nadarajah. Multivariate t Distributions and Their Applications. Cambridge University Press, Cambridge, 2004.

22. P. L'Ecuyer and R. Simard. Inverting the symmetrical beta distribution. ACM Transactions on Mathematical Software, 32(4):509-520, 2006.

23. G. Marsaglia. Improving the polar method for generating a pair of normal random variables. Technical Report Dl-82-0203, Boeing Scientific Research Laboratories, September 1962.

24. G. Marsaglia and T. A. Bray. A convenient method for generating normal variables. SIAM Review, 6(3):260-264, 1964.

25. G. Marsaglia and W. Tsang. A simple method for generating gamma variables. ACM Transactions on Mathematical Software, 26(3):363-372, 2000.

26. M. F. Neuts. Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Dover Publications, New York, 1981. Unabridged and corrected edition from 1994.

27. J. P. Nolan. Stable Distributions - Models for Heavy Tailed Data. Birkhäuser, Boston, 2009. In progress, Chapter 1 online at http://academic2.american.edu/~jpnolan.

28. R. Y. Rubinstein and D. P. Kroese. Simulation and the Monte Carlo Method. John Wiley & Sons, New York, second edition, 2007.

29. W. B. Smith and R. R. Hocking. Algorithm AS 53: Wishart variate generator. Journal of the Royal Statistical Society, Series C, 21(3):341-345, 1972.

30. M. P. Wand and M. C. Jones. Kernel Smoothing. Chapman & Hall, London, 1995.

31. R. Weron. On the Chambers-Mallows-Stuck method for simulating skewed stable random variables. Statistics and Probability Letters, 28(2):165-171, 1996.

32. V. M. Zolotarev. One-Dimensional Stable Distributions. American Mathematical Society, Providence, Rhode Island, 1986.