TRANSCRIPT
Chapter 4: CONTINUOUS
RANDOM VARIABLES
4.1 Introduction
Reminder: a rv is said to be continuous if its
cdf is a continuous function.
If the function FX(x) = Pr(X ≤ x) of x is
continuous, what is Pr(X = x)?
Pr(X = x) = Pr(X ≤ x)− Pr(X < x)
= 0, by continuity
A continuous random variable does not
possess a probability function.
Probability cannot be assigned to individual
values of x; instead, probability is assigned to
intervals. [Strictly, half-open intervals]
Consider the events {X ≤ a} and {a < X ≤ b}. These events are mutually exclusive, and
{X ≤ a} ∪ {a < X ≤ b} = {X ≤ b} .
So the addition law of probability (axiom A3)
gives:
Pr(X ≤ b) = Pr(X ≤ a) + Pr(a < X ≤ b) ,
or Pr(a < X ≤ b) = Pr(X ≤ b)− Pr(X ≤ a)
= FX(b)− FX(a) .
So, given the cdf for any continuous random
variable X, we can calculate the probability
that X lies in any interval (a, b] .
Note: The probability Pr(X = a) that a
continuous rv X takes exactly the value a is 0.
Because of this, we often do not distinguish
between open, half-open and closed intervals
for continuous rvs.
Example: In §3.4, we gave an example of a
continuous cdf:
FY (y) = { 0 ,        y < 0 ,
           1 − e−y ,  y ≥ 0 .
This is the cdf of what is termed the
exponential distribution with mean 1.
For the case of that distribution, we can find
Pr(Y ≤ 1) = FY (1) = 1 − e−1 = 0.6321
Pr(2 ≤ Y ≤ 3) = FY (3) − FY (2)
             = (1 − e−3) − (1 − e−2) = 0.0855
Pr(Y ≥ 2.5) = FY (∞) − FY (2.5)
            = 1 − (1 − e−2.5) = 0.0821
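These values are easy to verify numerically; as a quick sketch (not part of the original notes), in Python:

```python
import math

def F(y, lam=1.0):
    """cdf of the exponential distribution with rate lam (here mean 1/lam = 1)."""
    return 1.0 - math.exp(-lam * y) if y >= 0 else 0.0

print(round(F(1), 4))          # Pr(Y <= 1)      = 0.6321
print(round(F(3) - F(2), 4))   # Pr(2 <= Y <= 3) = 0.0855
print(round(1 - F(2.5), 4))    # Pr(Y >= 2.5)    = 0.0821
```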
4.2 Probability density function
If X is continuous, then Pr(X = x) = 0.
But what is the probability that ‘X is close to
some particular value x?’.
Consider Pr(x < X ≤ x + h), for small h.
Recall: dFX(x)/dx ≈ {FX(x + h) − FX(x)} / h .
So Pr(x < X ≤ x + h) = FX(x + h) − FX(x)
                     ≈ h · dFX(x)/dx .
DEFINITION: The derivative (w.r.t. x) of
the cdf of a continuous rv X is called the
probability density function of X.
The probability density function is the limit of
Pr(x < X ≤ x + h) / h
as h → 0 .
The probability density function
Alternative names: pdf,
density function,
density.
Notation for pdf: fX(x)
Recall: The cdf of X is denoted by FX(x)
Relationship: fX(x) = dFX(x)/dx
Care needed: Make sure f and F cannot be
confused!
Interpretation
• When multiplied by a small number h,
the pdf gives, approximately, the probability
that X lies in a small interval, length h, close
to x.
• If, for example, fX(4) = 2 fX(7), then
X occurs near 4 twice as often as near 7.
Properties of probability density functions
Because the pdf of a rv X is the derivative of
the cdf of X, it follows that
• fX(x) ≥ 0, for all x,
• ∫_{−∞}^{∞} fX(x) dx = 1,
• FX(x) = ∫_{−∞}^{x} fX(y) dy,
• Pr(a < X ≤ b) = ∫_{a}^{b} fX(x) dx.
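As an illustrative check of the integrate-to-one property, a midpoint Riemann sum of an example pdf (the exponential with rate λ = 1, an assumed choice for this sketch) should come out very close to 1:

```python
import math

lam = 1.0   # assumed rate for illustration
h = 1e-4    # step size
# midpoint Riemann sum of the pdf lam*exp(-lam*x) over (0, 20);
# the tail beyond x = 20 is negligible
total = sum(lam * math.exp(-lam * (i + 0.5) * h) * h for i in range(200000))
print(round(total, 4))  # very close to 1.0
```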
4.3 Mean and Variance
Reminder: for a discrete rv, the formulae for mean and variance are based on the probability function Pr(X = x). We need to adapt these formulae for use with continuous random variables.
DEFINITION: For a continuous rv X with pdf fX(x), the expectation of a function g(X) is defined as
E{g(X)} = ∫_{−∞}^{∞} g(x) fX(x) dx
Hence, for the mean:
E(X) = ∫_{−∞}^{∞} x fX(x) dx
Compare this with the equivalent definition for a discrete random variable:
E(X) = Σ_x x Pr(X = x) , or E(X) = Σ_x x pX(x) .
For the variance, recall the definition.
Var(X) = E[{X − E(X)}2]
Hence Var(X) = ∫_{−∞}^{∞} (x − µ)² fX(x) dx
As in the discrete case, the best way to
calculate a variance is by using the result:
Var(X) = E(X2)− {E(X)}2 .
In practice, we therefore usually calculate
E(X²) = ∫_{−∞}^{∞} x² fX(x) dx
as a stepping stone on the way to obtaining
Var(X).
4.4 The Uniform Distribution
Distribution of a rv which is equally likely to
take any value in its range, say a to b (b > a).
The pdf is constant:
[Figure: the pdf fX(x) is flat at height 1/(b − a) between a and b, and zero elsewhere.]
Because fX(x) is constant over [a, b] and
∫_{−∞}^{∞} fX(x) dx = ∫_{a}^{b} fX(x) dx = 1,
fX(x) = { 1/(b − a) , a < x < b ,
          0 elsewhere.
Uniform Distribution: cdf
For this distribution the cumulative
distribution function (cdf) is
FX(x) = ∫_{−∞}^{x} fX(y) dy
       = { 0 ,               x < a ,
           (x − a)/(b − a) , a ≤ x ≤ b ,
           1 ,               x > b .
[Figure: the cdf FX(x) rises linearly from 0 at x = a to 1 at x = b.]
Uniform Distribution: Mean and Variance
E(X) = µ = ∫_{a}^{b} x · 1/(b − a) dx = (a + b)/2 .
Var(X) = σ² = E(X²) − µ²
       = ∫_{a}^{b} x² · 1/(b − a) dx − (a + b)²/4
       = (b − a)²/12 .
For example, if a random variable is uniformly
distributed on the range (20,140), then
a = 20 and b = 140, so the mean is 80 . The
variance is 1200 , so the standard deviation is
34.64.
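The arithmetic above can be reproduced directly from the two formulae; a minimal sketch:

```python
# Uniform(a, b): mean (a + b)/2, variance (b - a)^2 / 12
a, b = 20, 140
mean = (a + b) / 2
var = (b - a) ** 2 / 12
sd = var ** 0.5
print(mean, var, round(sd, 2))  # 80.0 1200.0 34.64
```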
4.5 The exponential distribution
A continuous random variable X is said to have an exponential distribution if its range is (0, ∞) and its pdf is proportional to e−λx, for some positive λ.
That is,
fX(x) = { 0 ,       x < 0 ,
          k e−λx ,  x ≥ 0 ,
for some constant k. To evaluate k, we use the fact that all pdfs must integrate to 1.
Hence
∫_{−∞}^{∞} fX(x) dx = ∫_{0}^{∞} k e−λx dx
                    = (k/λ) [−e−λx]_{0}^{∞}
                    = k/λ .
Since this must equal 1, k = λ.
Properties of the exponential distribution
The distribution has pdf
fX(x) = { λ e−λx , x ≥ 0 ,
          0 ,      x < 0 .
and its cdf is given by
FX(x) = ∫_{0}^{x} λ e−λy dy = 1 − e−λx , x > 0 .
Mean and Variance
E(X) = ∫_{0}^{∞} x λ e−λx dx = 1/λ .
For the variance, we use integration by parts to obtain
E(X²) = ∫_{0}^{∞} x² λ e−λx dx = 2/λ² .
Hence Var(X) = E(X²) − {E(X)}²
             = 2/λ² − (1/λ)² = 1/λ² .
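A numerical check of E(X) = 1/λ and Var(X) = 1/λ², using midpoint Riemann sums with an illustrative λ = 2 (an assumed value for this sketch):

```python
import math

lam = 2.0   # assumed rate for illustration
h = 1e-4
m1 = m2 = 0.0
for i in range(200000):                 # integrate over (0, 20); tail is negligible
    x = (i + 0.5) * h
    w = lam * math.exp(-lam * x) * h    # pdf value times step width
    m1 += x * w                         # contribution to E(X)
    m2 += x * x * w                     # contribution to E(X^2)
var = m2 - m1 ** 2
print(round(m1, 4), round(var, 4))      # close to 1/lam = 0.5 and 1/lam^2 = 0.25
```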
Applications
The exponential distribution is often used to model the lengths of gaps between events occurring haphazardly (that is, quite at random, and with no memory) in time.
• births in a hospital
• passage of cars along a road
• arrival of ships at a terminal
There are close links with the Poisson distribution, which (see §3.8) is used to model the number of such events occurring in a fixed time interval.
Let X be the number of events occurring in an interval of length t: then X has the Poisson distribution with mean λt. Let T be the gap until the first event occurs. Then the events {X = 0} and {T > t} are identical. We note that
Pr(X = 0) = e−λt
Pr(T > t) = 1− FT (t) = 1− (1− e−λt) = e−λt.
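This identity can be confirmed numerically for any illustrative λ and t (λ = 1.5 and t = 2 are assumed values, not from the notes):

```python
import math

lam, t = 1.5, 2.0  # assumed rate and interval length
# Poisson(lam * t) pmf at 0: Pr(X = 0) = e^{-lam*t}
pr_no_events = math.exp(-lam * t)
# exponential survival function: Pr(T > t) = 1 - F_T(t)
pr_long_gap = 1 - (1 - math.exp(-lam * t))
print(abs(pr_no_events - pr_long_gap) < 1e-12)  # the two probabilities agree
```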
4.6 The Normal Distribution
DEFINITION: A random variable X with probability density function
fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} ,
for all x, is said to have the Normal distribution with parameters µ and σ².
We show later that E(X) = µ, Var(X) = σ2.
We write: X ∼ N(µ, σ2) .
Shape of the density function (pdf):
The pdf is symmetrical about x = µ. It has a single mode at x = µ.
It has points of inflection at x = µ± σ.
It is a ‘bell-shaped curve’ whose tails fall off rapidly.
Scaling of the pdf
The function fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} must
integrate to 1 over (−∞, ∞) if it is to be a
valid pdf. The proof that it does so is tricky,
and beyond the scope of this course. But it
can be shown that
∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx = σ√(2π)
as is required.
Cumulative distribution function
If X ∼ N(µ, σ2), the cdf of X is the integral:
FX(x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^{−(y−µ)²/(2σ²)} dy .
This cannot be evaluated analytically.
Numerical integration is necessary: extensive
tables are available.
The Standardised Normal Distribution
The Normal distribution with mean 0 and variance 1 is known as the standardised Normal distribution (SND). We usually denote a random variable with this distribution by Z. Hence
Z ∼ N(0,1).
Special notation φ(z) is used for the pdf of N(0,1). We write
φ(z) = (1/√(2π)) e^{−z²/2} , −∞ < z < ∞.
The cdf of Z is denoted by Φ(z). We write
Φ(z) = ∫_{−∞}^{z} φ(x) dx
     = ∫_{−∞}^{z} (1/√(2π)) e^{−x²/2} dx
Tables of Φ(z) are available in statistical textbooks and computer programs.
Brief extract from a table of the SND
z     Φ(z)
0.0   0.5000
0.5   0.6915
1.0   0.8413
1.5   0.9332
2.0   0.9772
Tables in textbooks and elsewhere contain values of Φ(z) for z = 0, 0.01, 0.02, and so on, up to z = 4.0 or further.
But the range of Z is (−∞, ∞), so we need values of Φ(z) for z < 0. To obtain these values we use the fact that the pdf of N(0,1) is symmetrical about z = 0. This means that
Φ(z) = 1 − Φ(−z).
This equation can be used to obtain Φ(z) for negative values of z. For example,
Φ(−1.5) = 1 − 0.9332 = 0.0668.
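Python's math.erf gives Φ directly, which lets us confirm both the table value and the symmetry relation:

```python
import math

def Phi(z):
    """Standard Normal cdf via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(Phi(1.5), 4))       # 0.9332, matching the table
print(round(1 - Phi(1.5), 4))   # 0.0668, i.e. Phi(-1.5) by symmetry
print(round(Phi(-1.5), 4))      # 0.0668 again, computed directly
```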
Examples
1. If Z ∼ N(0,1), find Pr(−0.5 < Z ≤ 1.5).
Pr(−0.5 < Z ≤ 1.5) = Φ(1.5) − Φ(−0.5)
= Φ(1.5) − {1 − Φ(0.5)}
= 0.9332 − {1 − 0.6915}
= 0.6247 .
2. Evaluate C = Pr(1 ≤ Z² ≤ 4).
The event {1 ≤ Z² ≤ 4} is the union of the
two events {−2 ≤ Z ≤ −1} and {1 ≤ Z ≤ 2} .
Since these two events are mutually
exclusive , the probability of their union is the
sum of the probabilities (axiom A3). Hence
C = Pr(−2 ≤ Z ≤ −1) + Pr(1 ≤ Z ≤ 2)
= {Φ(−1)−Φ(−2)}+ {Φ(2)−Φ(1)}
= 2(0.9772− 0.8413)
= 0.2718 .
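Checking Example 2 with the same erf-based Φ:

```python
import math

def Phi(z):
    # standard Normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# C = Pr(1 <= Z^2 <= 4) = {Phi(-1) - Phi(-2)} + {Phi(2) - Phi(1)}
C = (Phi(-1) - Phi(-2)) + (Phi(2) - Phi(1))
print(round(C, 4))  # 0.2718
```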
Properties of the normal distribution
1. Transformation to SND, N(0,1).
Consider a rv X ∼ N(µ, σ²). Its cdf is
FX(x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^{−(y−µ)²/(2σ²)} dy .
Substituting z = (y − µ)/σ , we obtain
FX(x) = ∫_{−∞}^{(x−µ)/σ} (1/√(2π)) e^{−z²/2} dz = Φ((x − µ)/σ) .
This important result FX(x) = Φ((x − µ)/σ)
means that the cdf of any normal distribution
can be obtained from tables of N(0,1).
Example: X ∼ N(10,16). Find Pr(8 < X ≤ 18).
Since σ = 4, the required probability is
Φ((18 − 10)/4) − Φ((8 − 10)/4)
= Φ(2.0) − Φ(−0.5)
= 0.9772 − (1 − 0.6915) = 0.6687.
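The same example as a short sketch, using FX(x) = Φ((x − µ)/σ):

```python
import math

def normal_cdf(x, mu, sigma):
    # F_X(x) = Phi((x - mu)/sigma), with Phi computed from erf
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

p = normal_cdf(18, 10, 4) - normal_cdf(8, 10, 4)
print(round(p, 4))  # 0.6687
```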
2. General Linear Transformations
THEOREM: If X ∼ N(µ, σ²) and Y = (X − a)/b,
where b > 0, then Y ∼ N((µ − a)/b , σ²/b²).
PROOF:
Pr(Y ≤ y) = Pr((X − a)/b ≤ y)
          = Pr(X ≤ a + by), since b > 0 ,
          = Φ((a + by − µ)/σ)
          = Φ( (y − (µ − a)/b) / (σ/b) ) .
Hence Y is normally distributed with
E(Y ) = (µ − a)/b ; Var(Y ) = σ²/b² ,
that is, Y ∼ N((µ − a)/b , σ²/b²).
Reminder: If X ∼ N(µ, σ²), and Y = (X − a)/b,
where b > 0, then Y ∼ N((µ − a)/b , σ²/b²).
An important special case occurs when
a = µ and b = σ , in which case
Z = (X − µ)/σ ∼ N(0,1).
The process of subtracting the mean µ and
then dividing by the standard deviation σ is
known as standardisation.
Summary: If X ∼ N(µ, σ²) , then
(i) (X − a)/b ∼ N((µ − a)/b , σ²/b²)
(ii) cX + d ∼ N(cµ + d , c²σ²)
(iii) Z = (X − µ)/σ ∼ N(0,1).
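Result (i) can be illustrated by simulation; the values µ = 10, σ = 4, a = 2, b = 2 below are assumed for the sketch (not from the notes). The sample mean and variance of Y should come out close to (µ − a)/b = 4 and σ²/b² = 4.

```python
import random
import statistics

random.seed(0)
mu, sigma, a, b = 10.0, 4.0, 2.0, 2.0       # assumed illustrative values
xs = [random.gauss(mu, sigma) for _ in range(200000)]
ys = [(x - a) / b for x in xs]              # Y = (X - a)/b
# sample mean and variance of Y; theory says both are 4.0 here
print(round(statistics.mean(ys), 1), round(statistics.pvariance(ys), 1))
```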
Alternative approach to example:
X ∼ N(10,16); require Pr(8 < X ≤ 18).
We previously used the fact that FX(x) = Φ((x − µ)/σ).
The required probability is FX(18) − FX(8),
which is
Φ((18 − 10)/4) − Φ((8 − 10)/4).
An alternative is to work as follows:
Pr(8 < X ≤ 18) = Pr((8 − µ)/σ < (X − µ)/σ ≤ (18 − µ)/σ) .
We then note that the term (X − µ)/σ is just Z ,
and work with the SND, substituting µ = 10
and σ = 4.
3. Mean of X, where X ∼ N(µ, σ2).
We will use the fact that the distribution of Z ∼ N(0,1) is symmetrical about z = 0, so that φ(z) = φ(−z) , and
zφ(z) = −(−z)φ(−z) .
E(X) = ∫_{−∞}^{∞} x fX(x) dx
      = ∫_{−∞}^{∞} x (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} dx .
Substituting z = (x − µ)/σ gives
E(X) = ∫_{−∞}^{∞} (µ + σz) (1/√(2π)) e^{−z²/2} dz
     = µ ∫_{−∞}^{∞} φ(z) dz + σ ∫_{−∞}^{∞} z φ(z) dz
     = µ × 1 + σ × 0 = µ .
[Since zφ(z) = −(−z)φ(−z), the parts of the second integral for negative and positive z cancel.]
4. Variance of X, where X ∼ N(µ, σ2)
Consider
E(X²) = ∫_{−∞}^{∞} x² fX(x) dx
Substituting z = (x − µ)/σ , we obtain
E(X²) = ∫_{−∞}^{∞} (µ + σz)² φ(z) dz
      = µ² · 1 + 2µσ · 0 + σ² ∫_{−∞}^{∞} z² φ(z) dz
      = µ² + σ² × 1 .
[Note: This last result can be obtained using integration by parts (see, for example, Meyer p. 168). The treatment in Clarke & Cooke p. 352 may also be useful.]
Hence Var(X) = E(X2)− {E(X)}2
= µ2 + σ2 − µ2
= σ2.
So, if X ∼ N(µ, σ2), E(X) = µ, Var(X) = σ2.
5. Importance of the Normal distribution
• It is mathematically very convenient, so it is tempting to use it.
• It occurs (at least approximately) frequently in practice.
• There is some theoretical justification via the Central Limit Theorem.
Basic result (not proved here): if Y1, . . . , Yn
are random variables, then under widely
occurring conditions the distribution of
X = Σ_{i=1}^{n} Yi ,
when suitably scaled, converges to a Normal
distribution as n →∞.
Condition: no subset of the Y s dominates the
rest.
4.7 Normal Approximations
Consider a binomial rv X, with index n and parameter p. We have already modelled such a random variable: if we have n Bernoulli trials with probability p of success, then the total number of successes X ∼ B(n, p).
Suppose that
Yi = { 1 if trial i results in success,
       0 if trial i results in failure,
then we can write
X = Σ_{i=1}^{n} Yi .
If n is large, the Y s satisfy the conditions for X to be approximately Normally distributed.
Note: X is discrete, but the Normal distribution is continuous.
We must also consider which Normal distribution to use as an approximation to the binomial.
Recall: X ∼ B(n, p):
E(X) = np,
Var(X) = npq (writing q = 1− p).
Consider the rv W ∼ N(np, npq).
The means and variances of X and W match,
but X is discrete and W is continuous; a
continuity correction is needed.
To derive this, we pretend that X (discrete)
really results from rounding an imaginary
continuous rv V to the nearest integer;
that is, Pr(X = x) = Pr(x − 1/2 ≤ V < x + 1/2) .
We now treat W as an approximation to V ,
not to X. So the approximation used is:
Pr(X = x) ≈ Φ((x + 1/2 − np)/√(npq)) − Φ((x − 1/2 − np)/√(npq)) .
Example: Toss 100 fair coins, and let X be
the number of heads. Calculate Pr(X = 50).
Since X ∼ B(100, 1/2), E(X) = 50 and
Var(X) = 25. So we use W ∼ N(50,25) as
the Normal approximation.
We therefore require
Φ((50 + 1/2 − 50)/√25) − Φ((50 − 1/2 − 50)/√25)
= Φ(0.1) − Φ(−0.1) = 0.0796.
The exact answer is
Pr(X = 50) = C(100, 50) (1/2)^50 (1 − 1/2)^50 = 0.07959 .
The corresponding calculations for Pr(X ≤ x) are very easy:
Pr(X ≤ x) corresponds to Pr(V ≤ x + 1/2).
So we use Pr(X ≤ x) ≈ Φ((x + 1/2 − np)/√(npq)) .
Example: Toss 100 fair coins, and let X be the number of heads. Find Pr(41 ≤ X ≤ 60).
Since X ∼ B(100, 1/2), E(X) = 50 and Var(X) = 25, we again use W ∼ N(50,25) as the Normal approximation.
Now {X ≤ 60} corresponds to {V ≤ 60.5} .
Also {X ≥ 41} corresponds to {V > 40.5} .
The approximation to Pr(41 ≤ X ≤ 60) is therefore
Φ((60 + 1/2 − 50)/5) − Φ((41 − 1/2 − 50)/5)
= Φ(2.1) − Φ(−1.9) = 0.9534.
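The interval version can be checked the same way; the exact sum over the binomial pmf lands close to the approximation:

```python
import math

def Phi(z):
    # standard Normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# continuity-corrected approximation to Pr(41 <= X <= 60)
approx = Phi((60.5 - 50) / 5) - Phi((40.5 - 50) / 5)
# exact binomial probability: sum of C(100, k)/2^100 for k = 41..60
exact = sum(math.comb(100, k) for k in range(41, 61)) * 0.5 ** 100
print(round(approx, 4))  # 0.9534
print(round(exact, 4))
```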
Notes on Normal approximations
1. Validity of approximation
From the Central Limit Theorem, we expect that the index n of the binomial distribution must be ‘large’.
Recall that the range of B(n, p) is {0, 1, 2, . . . , n}. The range of the Normal distribution is (−∞, ∞). We would expect the approximation to be poor if Pr(X = 0) and Pr(X = n) are not both very small.
Rule of thumb: Using the Normal distribution N(np, npq) as an approximation to B(n, p) is generally considered to be acceptable if
np > 5 and nq = n(1− p) > 5 .
Example: Consider the case n = 10, p = 1/2.
Exact: Pr(X = 3) = C(10, 3) (1/2)³ (1 − 1/2)⁷ = 0.1172.
Approximation: Pr(X = 3) ≈ Φ((3.5 − 5)/√2.5) − Φ((2.5 − 5)/√2.5) = 0.1145.
2. Normal approximation to the Poisson
There is a similar Normal approximation to the Poisson distribution. This is valid if µ is not too small.
Pr(X = x) ≈ Φ((x + 1/2 − µ)/√µ) − Φ((x − 1/2 − µ)/√µ) .
Example: Consider the case µ = 25.
Exact: Pr(X = 20) = e−25 25^20 / 20! = 0.0519.
Approximation:
Pr(X = 20) ≈ Φ((20.5 − 25)/√25) − Φ((19.5 − 25)/√25)
= 0.1841 − 0.1357 = 0.0484.
Exact: Pr(X ≤ 20) = Σ_{x=0}^{20} e−25 25^x / x! = 0.1855.
Approximation:
Pr(X ≤ 20) ≈ Φ((20.5 − 25)/√25) = 0.1841.
CHAPTER 4 SUMMARY
Continuous random variables
• A rv X is continuous if its cdf FX(x) is a continuous function.
• A continuous rv does not have a pf.
• Probability is assigned to intervals.
• pdf: fX(x) = dFX(x)/dx
• Interpretation of pdf and various properties.
• Mean and variance: formulae.
• Distributions: uniform and exponential.
CHAPTER 4 SUMMARY
continued
• The Normal distribution:
– notation N(µ, σ2).
– standardisation, tables, φ(z) and Φ(z).
– mean = µ, variance = σ2.
– linear transformations, Z = (X − µ)/σ.
– importance, central limit theorem.
– Normal approximation to the binomial
and Poisson.