An Introduction to Probability and Stochastic Processes (Springer Texts in Statistics). M. A. Berger. © Springer-Verlag New York, Inc. 1993


Section I

Univariate Random Variables

Discrete Random Variables

These are real-valued functions X defined on a probability space, taking on a finite or countably infinite number of values $\{x_1, x_2, \ldots\}$. They can be described by a discrete density function

$p_X(x) = P(X = x).$

Such a density function has the following properties:

(i) $p_X(x) \ge 0$, $x \in \mathbb{R}$

(ii) $\{x : p_X(x) \ne 0\}$ is a finite or countably infinite set

(iii) $\sum_x p_X(x) = 1$.

Typically discrete random variables are integer-valued. Random variables describe measured outcomes from experiments in which randomness (or nondeterminism) contributes.

We say that X has finite expectation if

$\sum_x |x|\, p_X(x) < \infty.$

In this case we define its expectation EX to be

$EX = \sum_x x\, p_X(x). \qquad (1)$
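As a concrete illustration (not part of the original text), here is a minimal Python sketch that evaluates (1) for a small made-up density:

```python
# Expectation of a discrete random variable, computed directly from eq. (1).
# The density p_X below is a hypothetical example {value: probability}.
p_X = {0: 0.2, 1: 0.5, 2: 0.3}

assert abs(sum(p_X.values()) - 1.0) < 1e-12   # property (iii): probabilities sum to 1

EX = sum(x * p for x, p in p_X.items())       # EX = sum_x x * p_X(x)
print(EX)                                     # 1.1
```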

Suppose f is a real-valued function defined on $\mathbb{R}$. We would like to consider the random variable $Y = f(X)$. The possible values for Y are $y_i = f(x_i)$, and


$p_Y(y) = \sum_{x \in f^{-1}(y)} p_X(x). \qquad (2)$

Here we allow for the possibility that f may not be one to one. From this it follows that

$\sum_y |y|\, p_Y(y) = \sum_y |y| \sum_{x \in f^{-1}(y)} p_X(x) = \sum_x |f(x)|\, p_X(x).$

We conclude from this that Y has finite expectation if and only if $\sum_x |f(x)|\, p_X(x) < \infty$; if this holds, then by a similar calculation

$Ef(X) = \sum_x f(x)\, p_X(x). \qquad (3)$

Observe that (3) is consistent in the following sense. Suppose f and g are two functions for which $f(X) = g(X)$. This happens when $f(x) = g(x)$ for all x at which $p_X(x) > 0$. Then it follows from (3) that $Ef(X) = Eg(X)$. We shall have occasion to deal with complex-valued functions f defined on $\mathbb{R}$. In this case we say that f(X) has finite expectation if each of $f_1(X)$ and $f_2(X)$ has finite expectation, $f_1 = \operatorname{Re}(f)$ being the real part of f and $f_2 = \operatorname{Im}(f)$ being the imaginary part of f. Since

$|f_1|, |f_2| \le |f| \le |f_1| + |f_2|,$

this is equivalent to the condition $\sum_x |f(x)|\, p_X(x) < \infty$ ($|f|$ and $|f(x)|$ here denote the modulus of the complex entities). If this condition holds, then we define

$Ef(X) = Ef_1(X) + i\,Ef_2(X) = \sum_x f(x)\, p_X(x). \qquad (4)$

Properties of Expectation

(P1) If $f_j$ are complex-valued functions defined on $\mathbb{R}$ for which $f_j(X)$ have finite expectation, and if $a_j$ are (complex) constants, $1 \le j \le n$, then $\sum_{j=1}^n a_j f_j(X)$ has finite expectation and

$E\left[\sum_{j=1}^n a_j f_j(X)\right] = \sum_{j=1}^n a_j\, Ef_j(X). \qquad (5)$

In particular if X has finite expectation and if a and b are constants then aX + b has finite expectation and E(aX + b) = aEX + b. Also, Eb = b (thinking of b on the left as a constant random variable).

(P2) If X has finite expectation and $X \ge 0$ then $EX \ge 0$. Moreover EX = 0 in this case if and only if X = 0. In particular if f and g are real-valued functions defined on $\mathbb{R}$ for which f(X) and g(X) have finite expectation, and if $f(X) \le g(X)$¹, then $Ef(X) \le Eg(X)$, with equality if and only if $f(X) = g(X)$. Also, if X has finite expectation then $|EX| \le E|X|$.

¹ This means that $f(x) \le g(x)$ for all x at which $p_X(x) > 0$.


(P3) If $m_1 \le X \le m_2$ for some constants $m_1$ and $m_2$, then X has finite expectation and $m_1 \le EX \le m_2$. In particular if $|X| \le \delta$ for some constant $\delta > 0$ then $|EX| \le \delta$.

(P4) If X is nonnegative integer-valued, then X has finite expectation if and only if the series $\sum_{x=1}^\infty P(X \ge x)$ converges. If this series does converge then its sum is EX.

(P5) Jensen's Inequality. If X has finite expectation and if f is a convex real-valued function defined on $\mathbb{R}$ for which f(X) has finite expectation then

$f(EX) \le Ef(X). \qquad (6)$

PROOFS.

(P1) First observe that

$\sum_x \left|\sum_{j=1}^n a_j f_j(x)\right| p_X(x) \le \sum_x \sum_{j=1}^n |a_j f_j(x)|\, p_X(x) = \sum_{j=1}^n |a_j| \sum_x |f_j(x)|\, p_X(x) < \infty,$

so that $\sum_{j=1}^n a_j f_j(X)$ has finite expectation. Thus by (3)

$E\left[\sum_{j=1}^n a_j f_j(X)\right] = \sum_x \left[\sum_{j=1}^n a_j f_j(x)\right] p_X(x) = \sum_{j=1}^n a_j \sum_x f_j(x)\, p_X(x) = \sum_{j=1}^n a_j\, Ef_j(X). \qquad \square$

(P2) If $X \ge 0$ then $p_X(x) = 0$ for x < 0. Thus

$EX = \sum_x x\, p_X(x) = \sum_{x \ge 0} x\, p_X(x) \ge 0.$

Furthermore, equality holds here if and only if $p_X(x) = 0$ for x > 0, in which case $p_X(0) = 1$. Using (P1) and applying this result here to the random variable g(X) − f(X), and then specializing to the choices f(x) = ±x, g(x) = |x|, leads to the other conclusions in (P2). □

(P3) Let $m = \max(|m_1|, |m_2|)$. Since $p_X(x) = 0$ if $x \notin [m_1, m_2]$ it follows in particular that $p_X(x) = 0$ if $|x| > m$. Thus

$\sum_x |x|\, p_X(x) = \sum_{|x| \le m} |x|\, p_X(x) \le \sum_{|x| \le m} m\, p_X(x) = m.$

From this we see that X has finite expectation. The fact that $m_1 \le EX \le m_2$ follows from (P2) and (the very last part of) (P1). □

(P4)

$\sum_{x=1}^\infty x\, p_X(x) = \sum_{x=1}^\infty p_X(x) \sum_{y=1}^x 1 = \sum_{y=1}^\infty \sum_{x=y}^\infty p_X(x) = \sum_{y=1}^\infty P(X \ge y). \qquad \square$


(P5) We need to show that if $p_i > 0$, $\sum p_i = 1$, then

$f\left(\sum_i p_i x_i\right) \le \sum_i p_i f(x_i).$

If there are only a finite number of $p_i$'s, then this inequality follows directly from the convexity of f. Otherwise, observe that for any n

$f\left(\frac{\sum_{i=1}^n p_i x_i}{\sum_{i=1}^n p_i}\right) \le \frac{\sum_{i=1}^n p_i f(x_i)}{\sum_{i=1}^n p_i}.$

Since convex functions are necessarily continuous, we can take limits as $n \to \infty$ and arrive at the desired result. □

Let X be a random variable such that $X^2$ has a finite expectation. Since $|x| \le x^2 + 1$ it is easily seen that X itself also has a finite expectation. We define the variance of X to be

$\operatorname{Var} X = E(X - EX)^2. \qquad (7)$

This is always a nonnegative number, and we write $\operatorname{Var} X = \sigma_X^2$ and refer to $\sigma_X$ as the standard deviation of X. Since $EX = \mu$ is a constant it follows from (P1) that the variance is also given by the expression

$\operatorname{Var} X = EX^2 - \mu^2. \qquad (8)$

One consequence of this is that $EX^2 \ge (EX)^2$, for any random variable X such that $X^2$ has finite expectation. For any constant a

$E(X - a)^2 = \operatorname{Var} X + (\mu - a)^2, \qquad (9)$

so that $\operatorname{Var} X$ is the minimum value of $E(X - a)^2$, that minimum being realized at $a = \mu$. This allows us to interpret EX as the best constant approximation to X, in the least squares sense.
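A quick numerical check of this least-squares interpretation (a sketch with an arbitrary made-up density, not from the text):

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])
ps = np.array([0.2, 0.5, 0.3])                # hypothetical discrete density
mu = float(np.sum(xs * ps))                   # EX

def mse(a):
    """E(X - a)^2 under this density."""
    return float(np.sum((xs - a) ** 2 * ps))

grid = np.linspace(-1.0, 3.0, 401)
best = grid[np.argmin([mse(a) for a in grid])]
print(mu, best)                               # both 1.1: the minimum is attained at a = EX
```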

Let $r \ge 0$ be an integer. We say that X has a moment of order r if $X^r$ has finite expectation, and in that case we define the rth moment of X as $EX^r$. If X has a moment of order r, then it has a moment of order k for all $k \le r$, since $|x|^k \le |x|^r + 1$. If X has a moment of order r then $X - \mu$ has a moment of order r, by (P1), where $\mu = EX$; the quantity $E(X - \mu)^r$ is referred to as the rth central moment of X. Thus the first central moment of X is always zero (whenever X has a finite expectation), and the second central moment of X is its variance (whenever $X^2$ has a finite expectation).

The characteristic function of a random variable X is defined as

$\varphi_X(u) = E e^{iuX}, \qquad -\infty < u < \infty. \qquad (10)$

Since $|e^{iux}| = 1$ it follows that $e^{iuX}$ always has finite expectation, for any value of u, and thus $\varphi_X$ is well-defined. Observe that

$\varphi_X(u) = \sum_x e^{iux}\, p_X(x). \qquad (11)$

When X is integer-valued this is simply the Fourier series for $\varphi_X$. It is


special in that its coefficients are nonnegative numbers summing to one. In particular, since $\sum_x p_X^2(x) < \infty$, $\varphi_X \in L^2(-\pi, \pi)$ and we can recover the distribution of X from $\varphi_X$ via its Fourier coefficients

$p_X(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-iux}\varphi_X(u)\, du; \qquad x = 0, \pm 1, \pm 2, \ldots. \qquad (12)$

Slightly more generally, if X takes on the values $\{kd : k = 0, \pm 1, \pm 2, \ldots\}$ then the same argument shows that

$p_X(x) = \frac{d}{2\pi} \int_{-\pi/d}^{\pi/d} e^{-iux}\varphi_X(u)\, du; \qquad x = 0, \pm d, \pm 2d, \ldots. \qquad (13)$

In general, if the range of X is not assumed to lie inside some lattice, then

$p_X(x) = \lim_{N \to \infty} \frac{1}{2N} \int_{-N}^{N} e^{-iux}\varphi_X(u)\, du. \qquad (14)$

The proof of (14) depends on the following results from real analysis.

I. (Dominated Convergence Theorem) Let $f_n$ and f be real-valued functions defined on $\mathbb{R}$ (n = 1, 2, ...), and suppose that for each x

$\lim_{n \to \infty} f_n(x) = f(x).$

If there exists a real-valued function g for which g(X) has finite expectation, satisfying $|f_n| \le g$ (n = 1, 2, ...), then

$\lim_{n \to \infty} Ef_n(X) = Ef(X).$

II. (Fubini's Theorem) Let f be a complex-valued function of two variables, continuous in the second (everywhere). If there exists a real-valued function g for which g(X) has finite expectation, satisfying

$\int_{-\infty}^{\infty} |f(x, u)|\, du \le g(x)$

for each x, then

$\int_{-\infty}^{\infty} Ef(X, u)\, du = E \int_{-\infty}^{\infty} f(X, u)\, du.$

PROOF OF (14)

$\frac{1}{2N}\int_{-N}^{N} e^{-iux}\varphi_X(u)\, du = \frac{1}{2N}\int_{-N}^{N} E e^{iu(X-x)}\, du = \frac{1}{2N}\, E \int_{-N}^{N} e^{iu(X-x)}\, du \quad \text{(by (II) with } g \equiv 1\text{)} = E\,\frac{\sin N(X-x)}{N(X-x)}.$


Since $\lim_{N \to \infty} \frac{\sin N(y-x)}{N(y-x)} = I_{\{x\}}(y)$ and since $\left|\frac{\sin t}{t}\right| \le 1$ for all t, it follows from Result I that

$\lim_{N \to \infty} E\,\frac{\sin N(X-x)}{N(X-x)} = E I_{\{x\}}(X) = p_X(x). \qquad \square$
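As a numerical illustration of (11) and (12) (a sketch, not from the text, using a Binomial(3, 0.4) density as the assumed example), one can build $\varphi_X$ as a finite sum and recover $p_X$ by quadrature over $(-\pi, \pi)$:

```python
import numpy as np

# Hypothetical integer-valued example: Binomial(3, 0.4), written out explicitly.
support = np.array([0, 1, 2, 3])
probs = np.array([0.216, 0.432, 0.288, 0.064])

def phi(u):
    """phi_X(u) = sum_x e^{iux} p_X(x), as in eq. (11)."""
    return np.sum(probs * np.exp(1j * u * support))

# Recover p_X(x) via eq. (12), approximating the integral with the trapezoidal rule.
u = np.linspace(-np.pi, np.pi, 4001)
phi_vals = np.array([phi(v) for v in u])
for x in support:
    p_rec = np.trapz(np.exp(-1j * u * x) * phi_vals, u).real / (2 * np.pi)
    print(x, round(p_rec, 6))                 # matches probs up to quadrature error
```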

Properties of Characteristic Functions

(P2) $|\varphi_X| \le 1$.

(P3) $\varphi_X(-u) = \overline{\varphi_X(u)}$.

(P4) If X has a moment of order k, then $\varphi_X$ is k times differentiable at u = 0, and

$\varphi_X^{(k)}(0) = i^k\, EX^k. \qquad (15)$

(P5) $\varphi_X$ determines the distribution of X uniquely.

(P6) $\varphi_X$ is uniformly continuous on $\mathbb{R}$.

(P7) $\varphi_X$ is positive-semidefinite in the sense that

$\sum_{j=1}^n \sum_{k=1}^n \varphi_X(u_j - u_k)\, \xi_j \overline{\xi_k} \ge 0 \qquad (16)$

for any real numbers $u_1, u_2, \ldots, u_n$ and any complex numbers $\xi_1, \xi_2, \ldots, \xi_n$.

PROOFS. Properties (P1)-(P3) are immediate and property (P5) follows from (14). □

(P4) Use Taylor's Theorem with remainder on sin t and cos t to write

$e^{iuX} = \sum_{j=0}^{k} \frac{(iuX)^j}{j!} + \frac{(iuX)^k}{k!}\bigl[\cos(\theta_1(X)uX) + i\sin(\theta_2(X)uX) - 1\bigr],$

where $|\theta_1|, |\theta_2| \le 1$. For each x

$\lim_{u \to 0} x^k\bigl[\cos(\theta_1(x)ux) + i\sin(\theta_2(x)ux) - 1\bigr] = 0.$


Furthermore, for each u

$\bigl|x^k[\cos(\theta_1(x)ux) + i\sin(\theta_2(x)ux) - 1]\bigr| \le 3|x|^k.$

Result I from real analysis applies even if the index n is allowed to be real rather than integral. Using this version of it we conclude that

$\varphi_X(u) = \sum_{j=0}^{k-1} \frac{(iu)^j}{j!}\, EX^j + \frac{(iu)^k}{k!}\bigl[EX^k + o(1)\bigr]. \qquad \square$

(P6)

$|\varphi_X(u+h) - \varphi_X(u)| = |E e^{iuX}(e^{ihX} - 1)| \le E|e^{ihX} - 1|.$

Apply Result I as $h \to 0$ to conclude that

$\lim_{h \to 0} E|e^{ihX} - 1| = 0. \qquad \square$

If X has a moment of order k, then we define its kth cumulant to be

$\frac{1}{i^k}\,\frac{d^k}{du^k}\log \varphi_X(u)\Big|_{u=0}.$

The moment generating function $\psi_X(t)$ of a random variable X is defined by

$\psi_X(t) = E e^{tX}. \qquad (17)$

Its domain consists of all real numbers t such that $e^{tX}$ has finite expectation, and it is easily seen that this turns out to be an interval containing t = 0. In order for $\psi_X$ to exist for t in some neighborhood of zero, it is necessary that X have moments of all orders. These moments are then computable as

$EX^k = \frac{d^k}{dt^k}\psi_X(0). \qquad (18)$

If X is nonnegative and integer-valued then we define its probability-generating function $\Phi_X(t)$ to be the power series

$\Phi_X(t) = E t^X = \sum_{k} p_X(k)\, t^k. \qquad (19)$

Its domain is a symmetric interval about t = 0 with radius $r \ge 1$ equal to the radius of convergence for this power series. In order for r > 1 it is necessary that X have moments of all orders. These moments are then


recoverable through

$EX(X-1)\cdots(X-k+1) = \frac{d^k}{dt^k}\Phi_X(1).$

The distribution of X is always recoverable from $\Phi_X$ as

$p_X(k) = \frac{1}{k!}\,\frac{d^k}{dt^k}\Phi_X(0). \qquad (20)$
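For instance, taking the Poisson probability-generating function $\Phi_X(t) = e^{\lambda(t-1)}$ as a worked example, both recovery formulas can be checked symbolically (a sketch assuming sympy is available):

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
Phi = sp.exp(lam * (t - 1))                   # pgf of a Poisson(lambda) random variable

# Factorial moments: E X(X-1)...(X-k+1) = d^k/dt^k Phi_X at t = 1.
for k in range(1, 4):
    print(k, sp.simplify(sp.diff(Phi, t, k).subs(t, 1)))            # lambda**k

# Probabilities: p_X(k) = (1/k!) d^k/dt^k Phi_X at t = 0, eq. (20).
for k in range(0, 4):
    print(k, sp.simplify(sp.diff(Phi, t, k).subs(t, 0) / sp.factorial(k)))
    # exp(-lambda) * lambda**k / k!
```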

We conclude our discussion of discrete random variables with the following.

Chebyshev's Inequality. Suppose X has a second moment. Then for any t > 0

$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}, \qquad (21)$

where $\mu = EX$ and $\sigma^2 = \operatorname{Var}(X)$.

PROOF. Define

$f = t I_A, \quad \text{where } A = \{|x - \mu| \ge t\}.$

Since $|x - \mu| \ge f(x)$ we have

$E|X - \mu|^2 \ge Ef^2(X) = t^2\, P(|X - \mu| \ge t). \qquad \square$
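A Monte Carlo sanity check of (21) (a sketch with arbitrary Binomial parameters, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, t = 50, 0.3, 5.0                        # hypothetical example parameters
mu, sigma2 = n * p, n * p * (1 - p)

X = rng.binomial(n, p, size=200_000)
empirical = np.mean(np.abs(X - mu) >= t)      # estimate of P(|X - mu| >= t)
bound = sigma2 / t**2                         # Chebyshev bound from eq. (21)
print(empirical, bound)                       # the empirical tail probability is below the bound
```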

Chebyshev's inequality is one way of quantifying the fact that $\sigma^2$ is a measure of the "spread" of X about its mean. The smaller the value of $\sigma$, the more concentrated X is about its mean.

The following is a list of some basic discrete distributions, together with some qualitative and quantitative descriptions.

Basic Distributions (from Parzen [45])

Bernoulli (0 < p < 1)

This is a random variable that takes on the values 1 (success) and 0 (failure) with respective probabilities p and q = 1 − p. Trials that can result in either success or failure are called Bernoulli trials.

$p_X(x) = \begin{cases} q, & x = 0 \\ p, & x = 1 \end{cases}$

$\varphi_X(u) = p e^{iu} + q$

$\mu = p, \qquad \sigma^2 = pq, \qquad E(X - \mu)^3 = pq(q - p).$


Binomial (n = 1, 2, ...; 0 < p < 1)

This models the number of successes in n independent Bernoulli trials, in which the probability of success at each trial is p. It also arises in random sampling with replacement.

$p_X(x) = \binom{n}{x} p^x q^{n-x}; \qquad x = 0, 1, \ldots, n$

$\varphi_X(u) = (p e^{iu} + q)^n$

$\mu = np, \qquad \sigma^2 = npq, \qquad E(X - \mu)^3 = npq(q - p).$

Hypergeometric (r = 1, 2, ...; n, $r_1$ = 0, 1, ..., r)

This models the number of objects of type one present in a sample of size n drawn from a population of r objects, $r_1$ being of type one and $r_2 = r - r_1$ being of type two. Sampling is done without replacement (hence the requirement $n \le r$).

$p_X(x) = \frac{\binom{r_1}{x}\binom{r_2}{n-x}}{\binom{r}{n}}; \qquad \max(0, n - r_2) \le x \le \min(n, r_1)$

$\Phi_X(t) = \frac{\binom{r_2}{n}}{\binom{r}{n}}\, F(\alpha, \beta, \gamma, t)$

where $\alpha = -n$, $\beta = -r_1$, $\gamma = r_2 - n + 1$, and F is the hypergeometric function¹

$F(\alpha, \beta, \gamma, t) = 1 + \frac{\alpha\beta}{\gamma}\,\frac{t}{1!} + \frac{\alpha(\alpha+1)\beta(\beta+1)}{\gamma(\gamma+1)}\,\frac{t^2}{2!} + \cdots.$

$\mu = np, \qquad \sigma^2 = npq\,\frac{r - n}{r - 1},$

$E(X - \mu)^3 = npq(q - p)\,\frac{(r - n)(r - 2n)}{(r - 1)(r - 2)},$

¹ If $r_2 < n$ then $\gamma \le 0$, and by writing out the factors $\binom{r_2}{n} = \frac{\gamma(\gamma + 1)\cdots r_2}{n!}$ and canceling them with the factors of $\gamma$ in the denominators of the coefficients for F, one gets a series whose lowest power is $1 - \gamma$.


$E(X - \mu)^4 = \frac{npq(r - n)}{(r - 1)(r - 2)(r - 3)}\bigl\{r(r + 1) - 6n(r - n) + 3pq[r^2(n - 2) - rn^2 + 6n(r - n)]\bigr\}$

where $p = r_1/r$ and $q = 1 - p$. The function F is a polynomial solution to the differential equation

$t(1 - t)\,\frac{d^2F}{dt^2} + [\gamma - (\alpha + \beta + 1)t]\,\frac{dF}{dt} - \alpha\beta F = 0.$

Geometric (0 < p < 1)

This models the number of trials required to achieve the first success in a sequence of independent Bernoulli trials, in which the probability of success at each trial is p.

$p_X(x) = p q^{x-1}; \qquad x = 1, 2, \ldots$

$\psi_X(t) = \frac{p e^t}{1 - q e^t}$

$\mu = 1/p, \qquad \sigma^2 = q/p^2, \qquad E(X - \mu)^4 = \frac{q}{p^2}\left(1 + 9\,\frac{q}{p^2}\right).$

Some authors define the geometric distribution in terms of the number of failures encountered until the first success is obtained. This corresponds to replacing X with X − 1.

Negative Binomial (r > 0; 0 < p < 1)

This models the number of failures encountered in a sequence of independent Bernoulli trials (with probability p of success at each trial) before achieving the rth success.

$p_X(x) = \binom{r + x - 1}{x} p^r q^x; \qquad x = 0, 1, \ldots$

$\psi_X(t) = \left(\frac{p}{1 - q e^t}\right)^r, \qquad \varphi_X(u) = \left(\frac{p}{1 - q e^{iu}}\right)^r$

$\mu = rq/p, \qquad \sigma^2 = rq/p^2, \qquad E(X - \mu)^3 = \frac{rq}{p^2}\left(1 + \frac{2q}{p}\right).$


We can use the negative binomial distribution even if the parameter r is nonintegral. The coefficient $\binom{r + x - 1}{x}$ is evaluated as

$\frac{r(r + 1)\cdots(r + x - 1)}{x!}.$

Poisson (λ > 0)

This models the number of occurrences of events of a specified type in a period of time of length 1, when events of this type are occurring randomly at a mean rate λ per unit time. Many counting-type random phenomena are known from experience to be approximately Poisson distributed. Some examples of such phenomena are the number of atoms of a radioactive substance that disintegrate in a unit time interval, the number of calls that come into a telephone exchange in a unit time interval, the number of misprints on a page of a book, and the number of bacterial colonies that grow on a Petri dish that has been smeared with a bacterial suspension.

$p_X(x) = e^{-\lambda}\,\frac{\lambda^x}{x!}; \qquad x = 0, 1, \ldots$

$\psi_X(t) = \exp[\lambda(e^t - 1)], \qquad \varphi_X(u) = \exp[\lambda(e^{iu} - 1)]$

$\mu = \lambda, \qquad \sigma^2 = \lambda, \qquad E(X - \mu)^3 = \lambda, \qquad E(X - \mu)^4 = \lambda + 3\lambda^2.$
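These tabulated quantities are easy to spot-check numerically; for example (a sketch assuming scipy is available, with an arbitrary value of λ):

```python
from scipy import stats

lam = 3.5
mean, var, skew, kurt = stats.poisson.stats(lam, moments='mvsk')

# Convert skewness and excess kurtosis back to central moments:
m3 = skew * var**1.5               # E(X - mu)^3
m4 = (kurt + 3) * var**2           # E(X - mu)^4
print(mean, var, m3, m4)           # lambda, lambda, lambda, lambda + 3*lambda**2
```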

Absolutely Continuous Random Variables

These are random variables X with distribution satisfying

$P(X \in A) = \int_A f_X(x)\, dx, \qquad (22)$

for all Borel subsets $A \subseteq \mathbb{R}$, where $f_X$ is a nonnegative integrable function defined on $\mathbb{R}$,

$\int_{-\infty}^{\infty} f_X(x)\, dx = 1. \qquad (23)$

We refer to $f_X$ as the density of X. It is sometimes more convenient to work with the (cumulative) distribution function

$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(y)\, dy. \qquad (24)$

This function $F_X$ is a continuous nondecreasing function with $F_X(-\infty) = 0$ and $F_X(\infty) = 1$, where we have introduced the notation $F_X(-\infty) = \lim_{x \to -\infty} F_X(x)$ and $F_X(\infty) = \lim_{x \to \infty} F_X(x)$. The connection between $f_X$ and $F_X$ is simply

$f_X = \frac{dF_X}{dx}. \qquad (25)$

We say that X has finite expectation if $\int_{-\infty}^{\infty} |x|\, f_X(x)\, dx < \infty$. If this holds then we define the expectation EX of X to be

$EX = \int_{-\infty}^{\infty} x\, f_X(x)\, dx. \qquad (26)$

Suppose g is a one-to-one differentiable function and consider the random variable Y = g(X). The distribution function of Y is

$F_Y(y) = F_X(g^{-1}(y)) \quad \text{if } \frac{dg}{dx} > 0, \qquad F_Y(y) = 1 - F_X(g^{-1}(y)) \quad \text{if } \frac{dg}{dx} < 0.$

In any event, we see that Y is absolutely continuous and

$f_Y(y) = f_X(x) \Big/ \left|\frac{d}{dx} g(x)\right|, \qquad x = g^{-1}(y). \qquad (27)$

If g is not one to one, then one needs to sum over the various inverse branches. Thus, in general,

$f_Y(y) = \sum_{x\,:\,g(x) = y} f_X(x) \Big/ \left|\frac{d}{dx} g(x)\right|. \qquad (28)$
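For example, with X standard normal and $g(x) = x^2$, the two inverse branches $\pm\sqrt{y}$ in (28) give the chi-square density with one degree of freedom. The following Monte Carlo sketch (not from the text) compares an empirical density estimate of $Y = X^2$ with (28):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)
y = x**2                                      # Y = g(X) with g(x) = x^2, not one to one

def f_X(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def f_Y(y):
    # Sum over the inverse branches x = +-sqrt(y), with |dg/dx| = 2|x|  (eq. (28)).
    r = np.sqrt(y)
    return (f_X(r) + f_X(-r)) / (2 * r)

counts, edges = np.histogram(y, bins=60, range=(0.05, 4.0))
est = counts / (len(y) * np.diff(edges))      # empirical density estimate of f_Y
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(est - f_Y(centers))))     # small compared to f_Y: the estimate tracks eq. (28)
```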

By making the substitution y = g(x) we see that

$\int |y|\, f_Y(y)\, dy = \int_{-\infty}^{\infty} |g(x)|\, f_X(x)\, dx,$

and thus g(X) has finite expectation if and only if $\int_{-\infty}^{\infty} |g(x)|\, f_X(x)\, dx < \infty$. In this case, a similar calculation shows that

$Eg(X) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx. \qquad (29)$

Suppose that instead of being differentiable, g has only a countable range. Then Y = g(X) is discrete, with

$p_Y(y) = \int_{g^{-1}(y)} f_X(x)\, dx.$

In this case

$\sum_y |y|\, p_Y(y) = \int_{-\infty}^{\infty} |g(x)|\, f_X(x)\, dx,$


and we have the same conclusion as earlier. Namely, g(X) has finite expectation if and only if $\int_{-\infty}^{\infty} |g(x)|\, f_X(x)\, dx < \infty$, and in this case (29) is valid.

Up to this point we have only considered discrete and absolutely continuous random variables. Thus if X is absolutely continuous, then for now we can only discuss random variables Y = g(X) for functions g that are either differentiable or at most countably valued.

Many of the results stated earlier for discrete random variables remain true for absolutely continuous random variables; in fact, since many of the preceding proofs were formulated in a way not explicitly involving the discrete density function $p_X$, they carry over as well. In (P1) we have to limit ourselves to complex-valued functions $f_j$ whose real and imaginary parts are either differentiable or at most countably valued. In (P2) the equality EX = 0 will never hold, since an absolutely continuous random variable X cannot be equal to zero (with probability one). The functions f and g have to be either differentiable or countably valued. Property (P3) remains valid, and the proofs for properties (P1)-(P3) (modified as described earlier) remain valid from the discrete case if one simply replaces sums with integrals. Properties (P4) and (P5) need to be restated as follows.

(P4′) If X is (absolutely continuous and) nonnegative real-valued, then X has finite expectation if and only if the integral $\int_0^\infty P(X \ge x)\, dx$ converges, and moreover this obtains if and only if the series $\sum_{k=1}^\infty P(X \ge k)$ converges. If they do converge, then

$\sum_{k=1}^\infty P(X \ge k) \le EX = \int_0^\infty P(X \ge x)\, dx \le 1 + \sum_{k=1}^\infty P(X \ge k).$

(P5′) (Jensen's Inequality). If X (is absolutely continuous and) has finite expectation and if g is a differentiable convex function defined on $\mathbb{R}$ for which g(X) has finite expectation then

$g(EX) \le Eg(X).$

PROOFS OF (P4′) AND (P5′). (P4′) Define Y = [X]. Then Y is discrete and $Y \le X \le Y + 1$. Thus X has finite expectation if and only if Y does. By applying (P4) to Y we see that this is the case if and only if


$\sum_{k=1}^\infty P(X \ge k) = \sum_{k=1}^\infty P(Y \ge k) < \infty,$

in which case $EY = \sum_{k=1}^\infty P(X \ge k)$.

Observe now that if X has finite expectation then

$\lim_{x \to \infty} x[1 - F_X(x)] = 0, \qquad (30)$

since

$x[1 - F_X(x)] = x \int_x^\infty f_X(y)\, dy \le \int_x^\infty y\, f_X(y)\, dy.$

Similarly, if $\int_0^\infty P(X \ge x)\, dx < \infty$ then X must have finite expectation and (30) must hold, since

$\int_0^\infty P(X \ge x)\, dx \ge \sum_{k=1}^\infty P(X \ge k).$

When (30) does hold, it follows upon integrating by parts that

$\int_0^\infty x\, f_X(x)\, dx = \int_0^\infty [1 - F_X(x)]\, dx - x[1 - F_X(x)]\Big|_{x=0}^{\infty} = \int_0^\infty [1 - F_X(x)]\, dx. \qquad \square$

(P5′) For any x and y we have

$g(x) - g(y) \ge g'(y)(x - y),$

since g is convex and differentiable. Thus

$g(X) - g(EX) \ge g'(EX)(X - EX).$

Applying expectation to both sides of this inequality leads to the desired conclusion. □

The various statistics (standard deviation, variance, moments, central moments, cumulants) and the characteristic and moment generating functions associated with an absolutely continuous random variable X are defined exactly as we defined them for discrete random variables. Chebyshev's inequality still holds and is proved exactly as earlier. Similarly the properties of characteristic functions still hold. In fact, other than (P5), the proofs given earlier of these properties and Chebyshev's inequality never made explicit use of the discrete density $p_X$, and so they carry right over to the absolutely continuous setting. Property (P5), on


the other hand, made explicit use of inversion formula (14) for discrete random variables. So this property needs to be addressed now.

When X is absolutely continuous with density $f_X$, the characteristic function $\varphi_X$ turns out to be the Fourier transform (or integral) of $f_X$; namely,

$\varphi_X(u) = \int_{-\infty}^{\infty} e^{iux} f_X(x)\, dx.$

It is special in that the $L^1$-function $f_X$ is nonnegative with total integral one. The distribution of X can be recovered through the inversion formula

$F_X(x_2) - F_X(x_1) = \lim_{R \to \infty} \frac{1}{2\pi} \int_{-R}^{R} \frac{e^{-iux_1} - e^{-iux_2}}{iu}\, \varphi_X(u)\, du. \qquad (31)$

In particular, if $\varphi_X$ is integrable, then

$f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} \varphi_X(u)\, du. \qquad (32)$

PROOF. We now prove (31). By interchanging the order of integration (Result II from the earlier analysis)

$\frac{1}{2\pi} \int_{-R}^{R} \frac{e^{-iux_1} - e^{-iux_2}}{iu}\, \varphi_X(u)\, du = E g_R(X), \qquad (33)$

where

$g_R(y) = \frac{1}{\pi} \int_0^R \left[\frac{\sin u(y - x_1)}{u} - \frac{\sin u(y - x_2)}{u}\right] du.$

Using the fact that

$\lim_{R \to \infty} \int_0^R \frac{\sin u}{u}\, du = \pi/2,$

we see that for $x_1 < x_2$

$\lim_{R \to \infty} g_R(y) = I_{(x_1, x_2)}(y).$

Furthermore, since $\int_0^R \frac{\sin u}{u}\, du$ is bounded for all R, we see that $g_R$ is bounded. Thus by using Result I from the preceding analysis, we can let $R \to \infty$ in (33) and arrive at (31). □

The following is a list of some basic absolutely continuous distributions, together with some qualitative and quantitative descriptions.


Basic Distributions (from Parzen [45])

Uniform (a < b)

This models the location on a line of a dart tossed in such a way that it always hits between the endpoints of the interval a to b, and any two subintervals (of the interval a to b) of equal length have an equal chance of being hit. Similarly, if a well-balanced dial is spun around and comes to rest after a large number of revolutions, it is reasonable to assume that the angle of the dial after it stops moving is uniformly distributed on $(0, 2\pi)$. Often in numerical analysis it is assumed that the rounding error caused by dropping all digits more than n places beyond the decimal point is uniformly distributed on $(0, 10^{-n})$.

$f_X(x) = \frac{1}{b - a}, \qquad a < x < b$

$\psi_X(t) = \frac{e^{tb} - e^{ta}}{t(b - a)}, \qquad \varphi_X(u) = \frac{e^{iub} - e^{iua}}{iu(b - a)}$

$\mu = \frac{a + b}{2}, \qquad \sigma^2 = \frac{(b - a)^2}{12}, \qquad E(X - \mu)^3 = 0, \qquad E(X - \mu)^4 = \frac{(b - a)^4}{80}.$

Normal ($-\infty < m < \infty$, $\sigma > 0$)

Normally distributed random variables occur most often in practical applications. Maxwell's law in physics asserts that under appropriate conditions the components of the velocity of a molecule of gas will be normally distributed, with $\sigma^2$ determined from certain physical quantities. Many random variables of interest have distributions that are approximately normal. Thus measurement errors in physical experiments, variability of outputs from industrial production lines, and biological variability (e.g., height and weight) have been found empirically to have approximately normal distributions. It has also been found, both empirically and theoretically, that random fluctuations that result from a combination of many unrelated causes, each individually insignificant, tend to be approximately normally distributed. Theoretical results in this direction are known as "central limit theorems." The number of successes in n independent Bernoulli trials, n large (probability of success p at each trial), approximately obeys a normal probability law with m = np, $\sigma^2 = npq$.


$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2\right], \qquad -\infty < x < \infty$

$\psi_X(t) = \exp\left(tm + \tfrac{1}{2} t^2 \sigma^2\right), \qquad \varphi_X(u) = \exp\left(ium - \tfrac{1}{2} u^2 \sigma^2\right)$

$\mu = m, \qquad \operatorname{Var} X = \sigma^2, \qquad E(X - \mu)^3 = 0, \qquad E(X - \mu)^4 = 3\sigma^4.$

Exponential (λ > 0)

The exponential distribution models decay times for radioactive particles. It also models the waiting time required to observe the first occurrence of an event of a specified type when events of this type are occurring randomly at a mean rate λ per unit time. Examples of such waiting times are the time until a piece of equipment fails, the time it takes to complete a job, or the time it takes to get a new customer. It serves as the waiting time distribution for the Poisson counting random variable.

$f_X(x) = \lambda e^{-\lambda x}, \qquad x > 0$

$\varphi_X(u) = \left(1 - \frac{iu}{\lambda}\right)^{-1}$

$\mu = 1/\lambda, \qquad \sigma^2 = 1/\lambda^2.$

Gamma (r > 0, λ > 0)

This models the waiting time required to observe the rth occurrence of an event of a specified type when events of this type are occurring randomly at a mean rate λ per unit time. There are many applied situations when the density of a random variable can be approximated reasonably well by a gamma density with appropriate parameters.

$f_X(x) = \frac{\lambda}{\Gamma(r)} (\lambda x)^{r-1} e^{-\lambda x}, \qquad x > 0$

$\psi_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^r, \qquad \varphi_X(u) = \left(1 - \frac{iu}{\lambda}\right)^{-r}$

$\mu = r/\lambda, \qquad \sigma^2 = r/\lambda^2, \qquad E(X - \mu)^4 = \frac{6r + 3r^2}{\lambda^4}.$


Chi-Square (n > 0)

This models the sum $X_1^2 + \cdots + X_n^2$ of the squares of n independent random variables, each N(0, 1). It corresponds to Gamma ($r = n/2$, $\lambda = 1/2$).

$\psi_X(t) = \left(\frac{1}{1 - 2t}\right)^{n/2}, \qquad \varphi_X(u) = (1 - 2iu)^{-n/2}$

$\mu = n, \qquad \sigma^2 = 2n, \qquad E(X - \mu)^3 = 8n, \qquad E(X - \mu)^4 = 12n(n + 4).$

F-Distribution

This models the ratio $\dfrac{U/m}{V/n}$, where U and V are independent random variables, $\chi^2$ distributed with m and n degrees of freedom, respectively.

$f_X(x) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{m/2} x^{(m-2)/2} \left(1 + \frac{m}{n} x\right)^{-(m+n)/2}, \qquad x > 0$

$\mu = \frac{n}{n - 2} \quad (n > 2), \qquad \sigma^2 = \frac{2n^2(m + n - 2)}{m(n - 2)^2(n - 4)} \quad (n > 4)$

Only the moments of order up to $\left[\frac{n-1}{2}\right]$ exist. This distribution is also called the variance ratio distribution and is widely used in statistics for the analysis of variance (ANOVA). It is named after the statistician Sir Ronald Fisher. Related to the F-distribution is the z-distribution, corresponding to the random variable $Z = \frac{1}{2}\log X$, whose density is supported on $-\infty < z < \infty$.


t-Distribution (n > 0)

This models the ratio $\dfrac{X}{\sqrt{U/n}}$, where X and U are independent random variables with N(0, 1) and $\chi^2(n)$ distributions, respectively.

$f_X(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \cdot \frac{1}{\left(1 + \frac{x^2}{n}\right)^{(n+1)/2}}, \qquad -\infty < x < \infty$

$\mu = 0, \qquad \sigma^2 = \frac{n}{n - 2} \quad (n > 2), \qquad E(X - \mu)^3 = 0, \qquad E(X - \mu)^4 = \frac{3n^2}{(n - 2)(n - 4)} \quad (n > 4).$

Only the moments of order up to n - 1 exist. This distribution is also called Student's distribution. It is named after "Student" (W.S. Gosset).

Beta (p > 0, q > 0)

$f_X(x) = \frac{1}{B(p, q)}\, x^{p-1} (1 - x)^{q-1}, \qquad 0 \le x \le 1$

$\mu = \frac{p}{p + q}, \qquad \sigma^2 = \frac{pq}{(p + q + 1)(p + q)^2}.$

Cauchy

$f_X(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty < x < \infty$

$\varphi_X(u) = e^{-|u|}.$

None of the moments exist. This corresponds to the t-distribution, n = 1.

Rayleigh (σ > 0)

This models $\sqrt{X^2 + Y^2}$, where X and Y are independent random variables, each N(0, $\sigma^2$).

$f_X(x) = \frac{x}{\sigma^2} \exp\left[-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2\right], \qquad x > 0.$


Distribution Functions

A distribution function (abbreviated d.f.) is a real-valued function F defined on $\mathbb{R}$ that is increasing and right continuous with $F(-\infty) = 0$ and $F(\infty) = 1$. If X is any real-valued random variable then $F_X$ defined by

$F_X(x) = P(X \le x)$

is a d.f., and conversely it can be shown that any d.f. corresponds to a random variable in this way. We define the point mass at t, $\delta_t$, to be the d.f. $\delta_t = 1_{[t, \infty)}$. When X is a discrete random variable with discrete density $p_X$,

$F_X(x) = \sum_y p_X(y)\, \delta_y(x), \qquad (34)$

and when X is absolutely continuous with density $f_X$,

$F_X(x) = \int_{-\infty}^{x} f_X(y)\, dy. \qquad (35)$

Correspondingly we say that a d.f. of the form (34) is discrete, and one of the form (35) is absolutely continuous.

Since $\sum_x [F(x) - F(x-)] \le 1$ it follows that a d.f. F can have at most a countably infinite number of jumps. Define the discrete part of F to be

$F_d(x) = \sum_{y \le x} [F(y) - F(y-)]. \qquad (36)$

Observe that for $-\infty \le x_1 < x_2 \le \infty$

$F_d(x_2) - F_d(x_1) = \sum_{x_1 < y \le x_2} [F(y) - F(y-)] \le F(x_2) - F(x_1) \qquad (37)$

and

$F_d(x) - F_d(x-) = F(x) - F(x-); \qquad (38)$

this last fact follows by applying a dominated convergence argument (Result I from the preceding analysis) to evaluate the limit

$\lim_{\varepsilon \downarrow 0} \sum_y [F(y) - F(y-)]\, [\delta_y(x) - \delta_y(x - \varepsilon)],$

since the jumps $F(y) - F(y-)$ are summable. It follows from (37) that $F_d$ is increasing and inherits the right continuity of F. It also follows by setting $x_1 = -\infty$ and letting $x_2 \to -\infty$ that $F_d(-\infty) = 0$. However, $F_d$ may not be a d.f. since in general $F_d(\infty) \le 1$.

Consider next the function $F_c = F - F_d$. It is clear that $F_c(-\infty) = 0$ since F and $F_d$ have this property; inequality (37) shows that $F_c$ is increasing. Furthermore $F_c$ is right continuous, being the difference of right continuous functions. More significant is the fact that $F_c$ is also left continuous (making it continuous altogether), which follows directly from (38). We refer to $F_c$ as the continuous part of F. Again, $F_c$ may not be a d.f. simply because $F_c(\infty) \le 1$. On the other hand $F_d(\infty) + F_c(\infty) = 1$, so that F can be written as a convex combination of a discrete and a continuous d.f. This decomposition is unique since the difference of two discrete d.f.s can be continuous if and only if they are identical.

It is shown in real analysis that if F is a d.f. then the derivative F′ exists as an element of $L^1$. (It exists almost everywhere with respect to Lebesgue measure on $\mathbb{R}$.) Furthermore, for $-\infty \le x_1 < x_2 \le \infty$

$\int_{x_1}^{x_2} F'(y)\, dy \le F(x_2) - F(x_1). \qquad (39)$

We say that F is singular if F′ = 0 almost everywhere. Define the absolutely continuous part of F to be

$F_{ac}(x) = \int_{-\infty}^{x} F'(y)\, dy. \qquad (40)$

Then it follows from (39) that $F_{ac}$ satisfies the same inequality (37) as $F_d$; namely,

$F_{ac}(x_2) - F_{ac}(x_1) \le F(x_2) - F(x_1). \qquad (41)$

So, as we argued for $F_c$, the function $F_s = F - F_{ac}$ is increasing and right continuous with $F_s(-\infty) = 0$. Furthermore, almost everywhere $F_s' = 0$, so that $F_s$ is singular. We refer to $F_s$ as the singular part of F. The discrete parts of F and $F_s$ coincide since F and $F_s$ have the same jumps everywhere. The remainder $F_s - F_d$ must therefore be singular continuous. We thus conclude that:

Every d.f. can be written uniquely as a convex combination of a discrete, a singular continuous and an absolutely continuous d.f.

The uniqueness follows from the fact that the difference of two absolutely continuous d.f.s can be singular if and only if they are identical.

A classic example of a singular continuous d.f. is the Cantor function. It is defined as follows. If k is the natural number with binary expansion

$k = \sum_j s_j 2^j \quad (s_j = 0 \text{ or } 1),$

denote

$k^* = \sum_j s_j 3^j.$

Then the Cantor function is defined on [0,1] by

$F(x) = \frac{2k + 1}{2^m} \quad \text{for } x \in \left[\frac{6k^* + 1}{3^m}, \frac{6k^* + 2}{3^m}\right]$

$(m = 1, 2, \ldots \text{ and } k = 0, 1, \ldots, 2^{m-1} - 1).$


Its continuity follows from the fact that for x < y

$y - x \le 3^{-m} \;\Rightarrow\; F(y) - F(x) \le 2^{-m}.$

It arises in the following fashion. Define a sequence of numbers $\{x_n\}$ by $x_1 = 0$ and

$x_{n+1} = \begin{cases} x_n/3 & \text{with probability } 1/2 \\ x_n/3 + 2/3 & \text{with probability } 1/2; \end{cases}$

where these probabilities are figured independently at each n. Then with probability one, for all x

$\lim_{n \to \infty} \frac{\#\,\text{of points } x_1, x_2, \ldots, x_n \text{ that are} \le x}{n} = F(x).$

In fact what one shows is that this limit is a d.f. satisfying the difference equation

$2F(x) = F(3x) + F(3x - 2)$

with boundary conditions

$F(0) = 0, \qquad F(1) = 1.$

It can easily be seen that the unique d.f. satisfying these requirements is the Cantor function. (Show directly that F(x) = 1/2 for $x \in [1/3, 2/3]$, F(x) = 1/4 for $x \in [1/9, 2/9]$, F(x) = 3/4 for $x \in [7/9, 8/9]$, F(x) = 1/8 for $x \in [1/27, 2/27]$, etc.)
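A short simulation of this construction (a sketch assuming the recursion displayed above) shows the empirical distribution of the sequence approaching the Cantor function at a few test points:

```python
import numpy as np

rng = np.random.default_rng(2)

# x_{n+1} = x_n/3 or x_n/3 + 2/3, each with probability 1/2, chosen independently.
N = 200_000
x = np.zeros(N)
steps = rng.integers(0, 2, size=N)
for n in range(1, N):
    x[n] = x[n - 1] / 3 + (2 / 3) * steps[n]

# Empirical d.f. of the sequence versus the Cantor function at a few points.
for pt, F_pt in [(0.5, 0.5), (0.2, 0.25), (0.8, 0.75)]:
    print(pt, np.mean(x <= pt), F_pt)
```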

Given a d.f. F and a real-valued function g defined on $\mathbb{R}$, we construct the Riemann-Stieltjes integral as follows. Let $\Pi = (x_0, x_1, \ldots, x_n)$, where

$a = x_0 < x_1 < \cdots < x_n = b,$

be any partition of [a, b] and denote

$\rho(\Pi) = \max_i\, (x_i - x_{i-1}).$

Let $x_i' \in [x_{i-1}, x_i]$, $1 \le i \le n$. If

$\lim_{\rho(\Pi) \to 0} \sum_{i=1}^n g(x_i')\,[F(x_i) - F(x_{i-1})]$

exists, then we call it the Riemann-Stieltjes integral of g over [a, b], denoted $\int_a^b g(x)\, dF(x)$.

If this exists for all a < b, and if

$\lim_{\substack{a \to -\infty \\ b \to \infty}} \int_a^b g(x)\, dF(x)$

also exists, then we call it the Riemann-Stieltjes integral of g (over $\mathbb{R}$), denoted

$\int g(x)\, dF(x).$

When F is discrete and of the form (34), then this integral corresponds to

$\sum_x g(x)\, p_X(x);$

when F is absolutely continuous and of the form (35), then this integral corresponds to

$\int g(x)\, f_X(x)\, dx.$

When g is complex-valued we define the Riemann-Stieltjes integral through the real and imaginary parts of g.

Let X be a random variable with d.f. $F_X$. Then for any Borel subset $A \subseteq \mathbb{R}$

$P(X \in A) = \int I_A(x)\, dF(x) = \int_A dF(x).$

We say that X has finite expectation if $\int |x|\, dF(x) < \infty$, in which case

$EX = \int x\, dF(x).$

The properties of expectation stated for discrete and absolutely continuous random variables are valid in general as well. The characteristic function $\varphi_X$ is defined by

$\varphi_X(u) = \int e^{iux}\, dF(x).$

The properties stated earlier for characteristic functions also remain valid. The inversion formula (31) holds whenever $x_1$ and $x_2$ are points of continuity of $F_X$, and this uniquely determines $F_X$. (The proof involves a more delicate analysis of (33).) The inversion formula (14) also holds, where $p_X(x)$ is interpreted as the jump $F(x) - F(x-) = P(X = x)$.

Computer Generation of Random Variables

First, suppose that our computer can generate a random variable U that is uniformly distributed on (0, 1). We are given a d.f. F and desire to generate a random variable X having F as its d.f. Define

$F^{-1}(u) = \min\{x \in \mathbb{R} : F(x) \ge u\}, \qquad u \in (0, 1). \qquad (42)$


Since F is right continuous this minimum exists. Observe that according to this definition

$F^{-1}(u) \le x \quad \text{if and only if} \quad u \le F(x).$

Thus

$P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x),$

and we see that $X = F^{-1}(U)$ has the desired d.f. If F has the discrete form $\sum_i p_i\, \delta_{a_i}$ where $a_1 < a_2 < \cdots$, then (42) leads to

$F^{-1}(u) = a_i \quad \text{for} \quad \sum_{j < i} p_j < u \le \sum_{j \le i} p_j.$
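Both cases of (42) are easy to transcribe into code; the sketch below uses an exponential target and a made-up three-point discrete distribution as examples (neither is from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
U = rng.uniform(size=100_000)

# Continuous case: exponential(lambda), F(x) = 1 - e^{-lambda x},
# so F^{-1}(u) = -log(1 - u)/lambda.
lam = 2.0
X_exp = -np.log(1 - U) / lam
print(X_exp.mean())                           # approximately 1/lambda = 0.5

# Discrete case: F = sum_i p_i * delta_{a_i}; F^{-1}(u) = a_i for the first i
# whose cumulative probability is >= u, exactly as in the displayed formula.
a = np.array([-1.0, 0.0, 2.0])
p = np.array([0.2, 0.5, 0.3])
cum = np.cumsum(p)
cum[-1] = 1.0                                 # guard against floating-point round-off
idx = np.searchsorted(cum, U)                 # smallest i with sum_{j<=i} p_j >= u
X_disc = a[idx]
print(np.mean(X_disc == 2.0))                 # approximately 0.3
```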

One method of generating U is the multiplicative congruential method. The samples are generated according to the recursion

$x_i = c\, x_{i-1} \bmod (2^{31} - 1).$

Each $x_i$ is then scaled into (0, 1). If the multiplier c is a primitive root modulo $2^{31} - 1$ (which is prime), then the generator will have the maximum period of $2^{31} - 2$. Some popular values for c are 16,807 or 397,204,094 or 950,706,376. Typically one can set the seed (if so desired) to any number between 1 and $2^{31} - 2$.
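A direct transcription of this recursion (a sketch; 16,807 is one of the multipliers quoted above, and the seed is arbitrary):

```python
M = 2**31 - 1          # modulus, a Mersenne prime
C = 16807              # multiplier (a primitive root mod M)

def lehmer(seed, n):
    """Multiplicative congruential generator x_i = C*x_{i-1} mod (2^31 - 1),
    with each sample scaled into (0, 1).  The seed should lie in 1 .. 2^31 - 2."""
    x = seed
    samples = []
    for _ in range(n):
        x = (C * x) % M
        samples.append(x / M)
    return samples

print(lehmer(12345, 5))
```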

Another method is to use a sequence like

$x_i = c\, x_{i-1} + d \pmod{c - 1}$

where $c = 2^{17}$ and $d = 2^{13} - 1$. Each $x_i$ is then scaled into [0, 1). The period here will be $c - 1$, since it is prime.

Exercises

1. (Parzen [45]) It is estimated that the probability of detecting a moderate attack of tuberculosis using an x-ray photograph of the chest is 0.6. In a city with 60,000 inhabitants, a mass x-ray survey is planned so as to detect all the people with tuberculosis. Two x-ray photographs will be taken of each individual, and he or she will be judged a suspect if at least one of these photographs is found to be "positive." Suppose that in the city there are 2000 persons with moderate attacks of tuberculosis. Let X denote the number of them who, as a result of the survey, will be judged "suspects." Find the mean and variance of X.

2. (Parzen [45]) A man with n keys wants to open his door. He tries the keys independently and at random. Let N be the number of trials required to open the door. Find EN and Var(N) if

(i) unsuccessful keys are not eliminated from further selections. (ii) they are.

Assume that exactly one of the keys can open the door.

3. (Parzen [45]) Let U be a random variable uniformly distributed on the interval [0, 1]. Let g(u) be a nondecreasing function, defined for $0 \le u \le 1$.


Find g(u) if the random variable g(U) has the given distribution.

(a) g(U) is Cauchy. (b) g(U) is exponential, mean $1/\lambda$. (c) g(U) is N(m, $\sigma^2$).


4. (Parzen [45]) Find the mean and variance of $\cos \pi X$, where X has the given distribution.

(a) X is N(m, $\sigma^2$). (b) X is Poisson, mean λ. (c) X is uniformly distributed on [−1, 1].

5. (Gnedenko [23]) Find EX and Var(X) for each of the following random variables.

(a) (Pascal distribution)

$P(X = k) = \frac{a^k}{(1 + a)^{k+1}}; \qquad k = 0, 1, 2, \ldots$

where a > 0.

(b) (Polya distribution)

$P(X = k) = \left(\frac{a}{1 + a\beta}\right)^k \frac{(1 + \beta)(1 + 2\beta)\cdots(1 + (k - 1)\beta)}{k!}\, p_0, \qquad k = 0, 1, 2, \ldots$

where

$p_0 = P(X = 0) = (1 + a\beta)^{-1/\beta}.$

(c) (Laplace distribution)

$f_X(x) = \frac{1}{2\alpha}\, e^{-|x - a|/\alpha}.$

(d) (Lognormal distribution)

$f_X(x) = \frac{1}{\beta x \sqrt{2\pi}} \exp\left[-\frac{(\log x - \alpha)^2}{2\beta^2}\right], \qquad x > 0.$

6. (Gnedenko [23])

(a) The density function of the magnitude of the velocity of a molecule is given by the Maxwell distribution

$f_X(x) = \frac{4x^2}{\alpha^3 \sqrt{\pi}}\, e^{-x^2/\alpha^2}, \qquad x > 0,$

where α > 0. Find the average speed and the average kinetic energy of a molecule (the mass of a molecule is m), and the variances of the speed and kinetic energy.

(b) The probability density of the distance x from a reflecting wall at which a molecule in Brownian motion will be found at time $t_0 + t$, if it was at a distance $x_0$ from the wall at time $t_0$, is given by the expression

$f(x) = \frac{1}{2\sqrt{\pi D t}} \left\{\exp\left[-\frac{(x + x_0)^2}{4Dt}\right] + \exp\left[-\frac{(x - x_0)^2}{4Dt}\right]\right\}, \qquad x \ge 0.$

Find the expectation and variance of the magnitude of the displacement of the molecule during the time from $t_0$ to $t_0 + t$.


7. (Gnedenko [23])

(a) A random variable X is normally distributed. Find $E|X - m|$, where m = EX.

(b) Let X be the number of occurrences of an event A in n independent trials, in each of which P(A) = p. Find $EX^3$, $EX^4$, and $E|X - np|$.

8. (Gnedenko [23])

(a) Find the characteristic function corresponding to each of the probability density functions.

$f(x) = \frac{a}{2}\, e^{-a|x|}.$

$f(x) = \frac{a}{\pi(a^2 + x^2)}.$

$f(x) = \frac{a - |x|}{a^2}, \qquad |x| \le a.$

$f(x) = \frac{2 \sin^2 \frac{ax}{2}}{\pi a x^2}.$

(b) Find the probability distribution of each of the random variables whose characteristic function is equal to

$\varphi(u) = \cos u.$

$\varphi(u) = \cos^2 u.$

$\varphi(u) = \frac{a}{a + iu}.$

$\varphi(u) = \frac{\sin au}{au}.$