TRANSCRIPT
Chapter 4: CONTINUOUS
RANDOM VARIABLES
4.1 Introduction
Reminder: a rv is said to be continuous if its
cdf is a continuous function.
If the function FX(x) = Pr(X ≤ x) of x is
continuous, what is Pr(X = x)?
Pr(X = x) = Pr(X ≤ x)− Pr(X < x)
= 0, by continuity
A continuous random variable does not
possess a probability function.
Probability cannot be assigned to individual
values of x; instead, probability is assigned to
intervals. [Strictly, half-open intervals]
Consider the events {X ≤ a} and {a < X ≤ b}. These events are mutually exclusive, and
{X ≤ a} ∪ {a < X ≤ b} = {X ≤ b} .
So the addition law of probability (axiom A3)
gives:
Pr(X ≤ b) = Pr(X ≤ a) + Pr(a < X ≤ b) ,
or Pr(a < X ≤ b) = Pr(X ≤ b)− Pr(X ≤ a)
= FX(b)− FX(a) .
So, given the cdf for any continuous random
variable X, we can calculate the probability
that X lies in any interval (a, b] .
Note: The probability Pr(X = a) that a
continuous rv X takes exactly the value a is 0.
Because of this, we often do not distinguish
between open, half-open and closed intervals
for continuous rvs.
Example: In §3.4, we gave an example of a
continuous cdf:
FY (y) = { 0 ,        y < 0 ,
           1 − e−y ,  y ≥ 0 .
This is the cdf of what is termed the
exponential distribution with mean 1.
For the case of that distribution, we can find
Pr(Y ≤ 1) = FY (1) = 1 − e−1 = 0.6321
Pr(2 ≤ Y ≤ 3) = FY (3) − FY (2)
             = (1 − e−3) − (1 − e−2) = 0.0855
Pr(Y ≥ 2.5) = FY (∞) − FY (2.5)
            = 1 − (1 − e−2.5) = 0.0821
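These values are easy to verify numerically; as a quick sketch (not part of the original notes), in Python:

```python
import math

def F(y, lam=1.0):
    """cdf of the exponential distribution with rate lam (here mean 1/lam = 1)."""
    return 1.0 - math.exp(-lam * y) if y >= 0 else 0.0

print(round(F(1), 4))          # Pr(Y <= 1)      = 0.6321
print(round(F(3) - F(2), 4))   # Pr(2 <= Y <= 3) = 0.0855
print(round(1 - F(2.5), 4))    # Pr(Y >= 2.5)    = 0.0821
```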
4.2 Probability density function
If X is continuous, then Pr(X = x) = 0.
But what is the probability that ‘X is close to
some particular value x?’.
Consider Pr(x < X ≤ x + h), for small h.
Recall: dFX(x)/dx ≈ {FX(x + h) − FX(x)} / h .
So Pr(x < X ≤ x + h) = FX(x + h) − FX(x)
                     ≈ h · dFX(x)/dx .
DEFINITION: The derivative (w.r.t. x) of
the cdf of a continuous rv X is called the
probability density function of X.
The probability density function is the limit of
Pr(x < X ≤ x + h) / h
as h → 0 .
The probability density function
Alternative names: pdf,
density function,
density.
Notation for pdf: fX(x)
Recall: The cdf of X is denoted by FX(x)
Relationship: fX(x) = dFX(x)/dx
Care needed: Make sure f and F cannot be
confused!
Interpretation
• When multiplied by a small number h,
the pdf gives, approximately, the probability
that X lies in a small interval, length h, close
to x.
• If, for example, fX(4) = 2 fX(7), then
X occurs near 4 twice as often as near 7.
Properties of probability density functions
Because the pdf of a rv X is the derivative of
the cdf of X, it follows that
• fX(x) ≥ 0, for all x,
• ∫_{−∞}^{∞} fX(x) dx = 1,
• FX(x) = ∫_{−∞}^{x} fX(y) dy,
• Pr(a < X ≤ b) = ∫_{a}^{b} fX(x) dx.
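As an illustrative check of the integrate-to-one property, a midpoint Riemann sum of an example pdf (the exponential with rate λ = 1, an assumed choice for this sketch) should come out very close to 1:

```python
import math

lam = 1.0   # assumed rate for illustration
h = 1e-4    # step size
# midpoint Riemann sum of the pdf lam*exp(-lam*x) over (0, 20);
# the tail beyond x = 20 is negligible
total = sum(lam * math.exp(-lam * (i + 0.5) * h) * h for i in range(200000))
print(round(total, 4))  # very close to 1.0
```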
4.3 Mean and Variance
Reminder: for a discrete rv, the formulae for mean and variance are based on the probability function Pr(X = x). We need to adapt these formulae for use with continuous random variables.
DEFINITION: For a continuous rv X with pdf fX(x), the expectation of a function g(X) is defined as
E{g(X)} = ∫_{−∞}^{∞} g(x) fX(x) dx
Hence, for the mean:
E(X) = ∫_{−∞}^{∞} x fX(x) dx
Compare this with the equivalent definition for a discrete random variable:
E(X) = Σ_x x Pr(X = x) , or E(X) = Σ_x x pX(x) .
For the variance, recall the definition.
Var(X) = E[{X − E(X)}2]
Hence Var(X) = ∫_{−∞}^{∞} (x − µ)² fX(x) dx
As in the discrete case, the best way to
calculate a variance is by using the result:
Var(X) = E(X2)− {E(X)}2 .
In practice, we therefore usually calculate
E(X²) = ∫_{−∞}^{∞} x² fX(x) dx
as a stepping stone on the way to obtaining
Var(X).
4.4 The Uniform Distribution
Distribution of a rv which is equally likely to
take any value in its range, say a to b (b > a).
The pdf is constant:
[Figure: the pdf fX(x) is flat at height 1/(b − a) between a and b, and zero elsewhere.]
Because fX(x) is constant over [a, b] and
∫_{−∞}^{∞} fX(x) dx = ∫_{a}^{b} fX(x) dx = 1,
fX(x) = { 1/(b − a) , a < x < b ,
          0 elsewhere.
Uniform Distribution: cdf
For this distribution the cumulative
distribution function (cdf) is
FX(x) = ∫_{−∞}^{x} fX(y) dy
       = { 0 ,               x < a ,
           (x − a)/(b − a) , a ≤ x ≤ b ,
           1 ,               x > b .
[Figure: the cdf FX(x) rises linearly from 0 at x = a to 1 at x = b.]
Uniform Distribution: Mean and Variance
E(X) = µ = ∫_{a}^{b} x · 1/(b − a) dx = (a + b)/2 .
Var(X) = σ² = E(X²) − µ²
       = ∫_{a}^{b} x² · 1/(b − a) dx − (a + b)²/4
       = (b − a)²/12 .
For example, if a random variable is uniformly
distributed on the range (20,140), then
a = 20 and b = 140, so the mean is 80 . The
variance is 1200 , so the standard deviation is
34.64.
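The arithmetic above can be reproduced directly from the two formulae; a minimal sketch:

```python
# Uniform(a, b): mean (a + b)/2, variance (b - a)^2 / 12
a, b = 20, 140
mean = (a + b) / 2
var = (b - a) ** 2 / 12
sd = var ** 0.5
print(mean, var, round(sd, 2))  # 80.0 1200.0 34.64
```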
4.5 The exponential distribution
A continuous random variable X is said to have an exponential distribution if its range is (0, ∞) and its pdf is proportional to e−λx, for some positive λ.
That is,
fX(x) = { 0 ,       x < 0 ,
          k e−λx ,  x ≥ 0 ,
for some constant k. To evaluate k, we use the fact that all pdfs must integrate to 1.
Hence
∫_{−∞}^{∞} fX(x) dx = ∫_{0}^{∞} k e−λx dx
                    = (k/λ) [−e−λx]_{0}^{∞}
                    = k/λ .
Since this must equal 1, k = λ.
Properties of the exponential distribution
The distribution has pdf
fX(x) = { λ e−λx , x ≥ 0 ,
          0 ,      x < 0 .
and its cdf is given by
FX(x) = ∫_{0}^{x} λ e−λy dy = 1 − e−λx , x > 0 .
Mean and Variance
E(X) = ∫_{0}^{∞} x λ e−λx dx = 1/λ .
For the variance, we use integration by parts to obtain
E(X²) = ∫_{0}^{∞} x² λ e−λx dx = 2/λ² .
Hence Var(X) = E(X²) − {E(X)}²
             = 2/λ² − (1/λ)² = 1/λ² .
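A numerical check of E(X) = 1/λ and Var(X) = 1/λ², using midpoint Riemann sums with an illustrative λ = 2 (an assumed value for this sketch):

```python
import math

lam = 2.0   # assumed rate for illustration
h = 1e-4
m1 = m2 = 0.0
for i in range(200000):                 # integrate over (0, 20); tail is negligible
    x = (i + 0.5) * h
    w = lam * math.exp(-lam * x) * h    # pdf value times step width
    m1 += x * w                         # contribution to E(X)
    m2 += x * x * w                     # contribution to E(X^2)
var = m2 - m1 ** 2
print(round(m1, 4), round(var, 4))      # close to 1/lam = 0.5 and 1/lam^2 = 0.25
```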
Applications
The exponential distribution is often used to model the lengths of gaps between events occurring haphazardly (that is, quite at random, and with no memory) in time.
• births in a hospital
• passage of cars along a road
• arrival of ships at a terminal
There are close links with the Poisson distribution, which (see §3.8) is used to model the number of such events occurring in a fixed time interval.
Let X be the number of events occurring in an interval of length t: then X has the Poisson distribution with mean λt. Let T be the gap until the first event occurs. Then the events {X = 0} and {T > t} are identical. We note that
Pr(X = 0) = e−λt
Pr(T > t) = 1− FT (t) = 1− (1− e−λt) = e−λt.
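This identity can be confirmed numerically for any illustrative λ and t (λ = 1.5 and t = 2 are assumed values, not from the notes):

```python
import math

lam, t = 1.5, 2.0  # assumed rate and interval length
# Poisson(lam * t) pmf at 0: Pr(X = 0) = e^{-lam*t}
pr_no_events = math.exp(-lam * t)
# exponential survival function: Pr(T > t) = 1 - F_T(t)
pr_long_gap = 1 - (1 - math.exp(-lam * t))
print(abs(pr_no_events - pr_long_gap) < 1e-12)  # the two probabilities agree
```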
4.6 The Normal Distribution
DEFINITION: A random variable X with probability density function
fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} ,
for all x, is said to have the Normal distribution with parameters µ and σ².
We show later that E(X) = µ, Var(X) = σ2.
We write: X ∼ N(µ, σ2) .
Shape of the density function (pdf):
The pdf is symmetrical about x = µ. It has a single mode at x = µ.
It has points of inflection at x = µ± σ.
It is a ‘bell-shaped curve’ whose tails fall off rapidly.
Scaling of the pdf
The function fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} must
integrate to 1 over (−∞, ∞) if it is to be a
valid pdf. The proof that it does so is tricky,
and beyond the scope of this course. But it
can be shown that
∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx = σ√(2π)
as is required.
Cumulative distribution function
If X ∼ N(µ, σ2), the cdf of X is the integral:
FX(x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^{−(y−µ)²/(2σ²)} dy .
This cannot be evaluated analytically.
Numerical integration is necessary: extensive
tables are available.
The Standardised Normal Distribution
The Normal distribution with mean 0 and variance 1 is known as the standardised Normal distribution (SND). We usually denote a random variable with this distribution by Z. Hence
Z ∼ N(0,1).
Special notation φ(z) is used for the pdf of N(0,1). We write
φ(z) = (1/√(2π)) e^{−z²/2} , −∞ < z < ∞.
The cdf of Z is denoted by Φ(z). We write
Φ(z) = ∫_{−∞}^{z} φ(x) dx
     = ∫_{−∞}^{z} (1/√(2π)) e^{−x²/2} dx
Tables of Φ(z) are available in statistical textbooks and computer programs.
Brief extract from a table of the SND
z     Φ(z)
0.0   0.5000
0.5   0.6915
1.0   0.8413
1.5   0.9332
2.0   0.9772
Tables in textbooks and elsewhere contain values of Φ(z) for z = 0, 0.01, 0.02, and so on, up to z = 4.0 or further.
But the range of Z is (−∞, ∞), so we need values of Φ(z) for z < 0. To obtain these values we use the fact that the pdf of N(0,1) is symmetrical about z = 0. This means that
Φ(z) = 1 − Φ(−z).
This equation can be used to obtain Φ(z) for negative values of z. For example,
Φ(−1.5) = 1 − 0.9332 = 0.0668.
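Python's math.erf gives Φ directly, which lets us confirm both the table value and the symmetry relation:

```python
import math

def Phi(z):
    """Standard Normal cdf via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(Phi(1.5), 4))       # 0.9332, matching the table
print(round(1 - Phi(1.5), 4))   # 0.0668, i.e. Phi(-1.5) by symmetry
print(round(Phi(-1.5), 4))      # 0.0668 again, computed directly
```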
Examples
1. If Z ∼ N(0,1), find Pr(−0.5 < Z ≤ 1.5).
Pr(−0.5 < Z ≤ 1.5) = Φ(1.5) − Φ(−0.5)
= Φ(1.5) − {1 − Φ(0.5)}
= 0.9332 − {1 − 0.6915}
= 0.6247 .
2. Evaluate C = Pr(1 ≤ Z² ≤ 4).
The event {1 ≤ Z² ≤ 4} is the union of the
two events {−2 ≤ Z ≤ −1} and {1 ≤ Z ≤ 2} .
Since these two events are mutually
exclusive , the probability of their union is the
sum of the probabilities (axiom A3). Hence
C = Pr(−2 ≤ Z ≤ −1) + Pr(1 ≤ Z ≤ 2)
= {Φ(−1)−Φ(−2)}+ {Φ(2)−Φ(1)}
= 2(0.9772− 0.8413)
= 0.2718 .
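Checking Example 2 with the same erf-based Φ:

```python
import math

def Phi(z):
    # standard Normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# C = Pr(1 <= Z^2 <= 4) = {Phi(-1) - Phi(-2)} + {Phi(2) - Phi(1)}
C = (Phi(-1) - Phi(-2)) + (Phi(2) - Phi(1))
print(round(C, 4))  # 0.2718
```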
Properties of the normal distribution
1. Transformation to SND, N(0,1).
Consider a rv X ∼ N(µ, σ²). Its cdf is
FX(x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^{−(y−µ)²/(2σ²)} dy .
Substituting z = (y − µ)/σ , we obtain
FX(x) = ∫_{−∞}^{(x−µ)/σ} (1/√(2π)) e^{−z²/2} dz = Φ((x − µ)/σ) .
This important result FX(x) = Φ((x − µ)/σ)
means that the cdf of any normal distribution
can be obtained from tables of N(0,1).
Example: X ∼ N(10,16). Find Pr(8 < X ≤ 18).
Since σ = 4, the required probability is
Φ((18 − 10)/4) − Φ((8 − 10)/4)
= Φ(2.0) − Φ(−0.5)
= 0.9772 − (1 − 0.6915) = 0.6687.
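The same example as a short sketch, using FX(x) = Φ((x − µ)/σ):

```python
import math

def normal_cdf(x, mu, sigma):
    # F_X(x) = Phi((x - mu)/sigma), with Phi computed from erf
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

p = normal_cdf(18, 10, 4) - normal_cdf(8, 10, 4)
print(round(p, 4))  # 0.6687
```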
2. General Linear Transformations
THEOREM: If X ∼ N(µ, σ²) and Y = (X − a)/b,
where b > 0, then Y ∼ N((µ − a)/b , σ²/b²).
PROOF:
Pr(Y ≤ y) = Pr((X − a)/b ≤ y)
          = Pr(X ≤ a + by), since b > 0 ,
          = Φ((a + by − µ)/σ)
          = Φ( (y − (µ − a)/b) / (σ/b) ) .
Hence Y is normally distributed with
E(Y ) = (µ − a)/b ; Var(Y ) = σ²/b² ,
that is, Y ∼ N((µ − a)/b , σ²/b²).
Reminder: If X ∼ N(µ, σ²), and Y = (X − a)/b,
where b > 0, then Y ∼ N((µ − a)/b , σ²/b²).
An important special case occurs when
a = µ and b = σ , in which case
Z = (X − µ)/σ ∼ N(0,1).
The process of subtracting the mean µ and
then dividing by the standard deviation σ is
known as standardisation.
Summary: If X ∼ N(µ, σ²) , then
(i) (X − a)/b ∼ N((µ − a)/b , σ²/b²)
(ii) cX + d ∼ N(cµ + d , c²σ²)
(iii) Z = (X − µ)/σ ∼ N(0,1).
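Result (i) can be illustrated by simulation; the values µ = 10, σ = 4, a = 2, b = 2 below are assumed for the sketch (not from the notes). The sample mean and variance of Y should come out close to (µ − a)/b = 4 and σ²/b² = 4.

```python
import random
import statistics

random.seed(0)
mu, sigma, a, b = 10.0, 4.0, 2.0, 2.0       # assumed illustrative values
xs = [random.gauss(mu, sigma) for _ in range(200000)]
ys = [(x - a) / b for x in xs]              # Y = (X - a)/b
# sample mean and variance of Y; theory says both are 4.0 here
print(round(statistics.mean(ys), 1), round(statistics.pvariance(ys), 1))
```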
Alternative approach to example:
X ∼ N(10,16); require Pr(8 < X ≤ 18).
We previously used the fact that FX(x) = Φ((x − µ)/σ).
The required probability is FX(18) − FX(8),
which is
Φ((18 − 10)/4) − Φ((8 − 10)/4).
An alternative is to work as follows:
Pr(8 < X ≤ 18) = Pr((8 − µ)/σ < (X − µ)/σ ≤ (18 − µ)/σ) .
We then note that the term (X − µ)/σ is just Z ,
and work with the SND, substituting µ = 10
and σ = 4.
3. Mean of X, where X ∼ N(µ, σ2).
We will use the fact that the distribution of Z ∼ N(0,1) is symmetrical about z = 0, so that φ(z) = φ(−z) , and
zφ(z) = −(−z)φ(−z) .
E(X) = ∫_{−∞}^{∞} x fX(x) dx
      = ∫_{−∞}^{∞} x (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} dx .
Substituting z = (x − µ)/σ gives
E(X) = ∫_{−∞}^{∞} (µ + σz) (1/√(2π)) e^{−z²/2} dz
     = µ ∫_{−∞}^{∞} φ(z) dz + σ ∫_{−∞}^{∞} z φ(z) dz
     = µ × 1 + σ × 0 = µ .
[Since zφ(z) = −(−z)φ(−z), the parts of the second integral for negative and positive z cancel.]
4. Variance of X, where X ∼ N(µ, σ2)
Consider
E(X²) = ∫_{−∞}^{∞} x² fX(x) dx
Substituting z = (x − µ)/σ , we obtain
E(X²) = ∫_{−∞}^{∞} (µ + σz)² φ(z) dz
      = µ² · 1 + 2µσ · 0 + σ² ∫_{−∞}^{∞} z² φ(z) dz
      = µ² + σ² × 1 .
[Note: This last result can be obtained using integration by parts (see, for example, Meyer p. 168). The treatment in Clarke & Cooke p. 352 may also be useful.]
Hence Var(X) = E(X2)− {E(X)}2
= µ2 + σ2 − µ2
= σ2.
So, if X ∼ N(µ, σ2), E(X) = µ, Var(X) = σ2.
5. Importance of the Normal distribution
• It is mathematically very convenient, so it is tempting to use it.
• It occurs (at least approximately) frequently in practice.
• There is some theoretical justification via the Central Limit Theorem.
Basic result (not proved here): if Y1, . . . , Yn
are random variables, then under widely
occurring conditions the distribution of
X = Σ_{i=1}^{n} Yi ,
when suitably scaled, converges to a Normal
distribution as n →∞.
Condition: no subset of the Y s dominates the
rest.
4.7 Normal Approximations
Consider a binomial rv X, with index n and parameter p. We have already modelled such a random variable: if we have n Bernoulli trials with probability p of success, then the total number of successes X ∼ B(n, p).
Suppose that
Yi = { 1 if trial i results in success,
       0 if trial i results in failure,
then we can write
X = Σ_{i=1}^{n} Yi .
If n is large, the Y s satisfy the conditions for X to be approximately Normally distributed.
Note: X is discrete, but the Normal distribution is continuous.
We must also consider which Normal distribution to use as an approximation to the binomial.
Recall: X ∼ B(n, p):
E(X) = np,
Var(X) = npq (writing q = 1− p).
Consider the rv W ∼ N(np, npq).
The means and variances of X and W match,
but X is discrete and W is continuous; a
continuity correction is needed.
To derive this, we pretend that X (discrete)
really results from rounding an imaginary
continuous rv V to the nearest integer;
that is, Pr(X = x) = Pr(x − 1/2 ≤ V < x + 1/2) .
We now treat W as an approximation to V ,
not to X. So the approximation used is:
Pr(X = x) ≈ Φ((x + 1/2 − np)/√(npq)) − Φ((x − 1/2 − np)/√(npq)) .
Example: Toss 100 fair coins, and let X be
the number of heads. Calculate Pr(X = 50).
Since X ∼ B(100, 1/2), E(X) = 50 and
Var(X) = 25. So we use W ∼ N(50,25) as
the Normal approximation.
We therefore require
Φ((50 + 1/2 − 50)/√25) − Φ((50 − 1/2 − 50)/√25)
= Φ(0.1) − Φ(−0.1) = 0.0796.
The exact answer is
Pr(X = 50) = C(100, 50) (1/2)^50 (1 − 1/2)^50 = 0.07959 .
The corresponding calculations for Pr(X ≤ x) are very easy:
Pr(X ≤ x) corresponds to Pr(V ≤ x + 1/2).
So we use Pr(X ≤ x) ≈ Φ((x + 1/2 − np)/√(npq)) .
Example: Toss 100 fair coins, and let X be the number of heads. Find Pr(41 ≤ X ≤ 60).
Since X ∼ B(100, 1/2), E(X) = 50 and Var(X) = 25, we again use W ∼ N(50,25) as the Normal approximation.
Now {X ≤ 60} corresponds to {V ≤ 60.5} .
Also {X ≥ 41} corresponds to {V > 40.5} .
The approximation to Pr(41 ≤ X ≤ 60) is therefore
Φ((60 + 1/2 − 50)/5) − Φ((41 − 1/2 − 50)/5)
= Φ(2.1) − Φ(−1.9) = 0.9534.
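The interval version can be checked the same way; the exact sum over the binomial pmf lands close to the approximation:

```python
import math

def Phi(z):
    # standard Normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# continuity-corrected approximation to Pr(41 <= X <= 60)
approx = Phi((60.5 - 50) / 5) - Phi((40.5 - 50) / 5)
# exact binomial probability: sum of C(100, k)/2^100 for k = 41..60
exact = sum(math.comb(100, k) for k in range(41, 61)) * 0.5 ** 100
print(round(approx, 4))  # 0.9534
print(round(exact, 4))
```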
Notes on Normal approximations
1. Validity of approximation
From the Central Limit Theorem, we expect that the index n of the binomial distribution must be ‘large’.
Recall that the range of B(n, p) is {0, 1, 2, . . . , n}. The range of the Normal distribution is (−∞, ∞). We would expect the approximation to be poor if Pr(X = 0) and Pr(X = n) are not both very small.
Rule of thumb: Using the Normal distribution N(np, npq) as an approximation to B(n, p) is generally considered to be acceptable if
np > 5 and nq = n(1− p) > 5 .
Example: Consider the case n = 10, p = 1/2.
Exact: Pr(X = 3) = C(10, 3) (1/2)³ (1 − 1/2)⁷ = 0.1172.
Approximation: Pr(X = 3) ≈ Φ((3.5 − 5)/√2.5) − Φ((2.5 − 5)/√2.5) = 0.1145.
2. Normal approximation to the Poisson
There is a similar Normal approximation to the Poisson distribution. This is valid if µ is not too small.
Pr(X = x) ≈ Φ((x + 1/2 − µ)/√µ) − Φ((x − 1/2 − µ)/√µ) .
Example: Consider the case µ = 25.
Exact: Pr(X = 20) = e−25 25^20 / 20! = 0.0519.
Approximation:
Pr(X = 20) ≈ Φ((20.5 − 25)/√25) − Φ((19.5 − 25)/√25)
= 0.1841 − 0.1357 = 0.0484.
Exact: Pr(X ≤ 20) = Σ_{x=0}^{20} e−25 25^x / x! = 0.1855.
Approximation:
Pr(X ≤ 20) ≈ Φ((20.5 − 25)/√25) = 0.1841.
CHAPTER 4 SUMMARY
Continuous random variables
• A rv X is continuous if its cdf FX(x) is a continuous function.
• A continuous rv does not have a pf.
• Probability is assigned to intervals.
• pdf: fX(x) = dFX(x)/dx
• Interpretation of pdf and various properties.
• Mean and variance: formulae.
• Distributions: uniform and exponential.
CHAPTER 4 SUMMARY
continued
• The Normal distribution:
– notation N(µ, σ2).
– standardisation, tables, φ(z) and Φ(z).
– mean = µ, variance = σ2.
– linear transformations, Z = (X − µ)/σ.
– importance, central limit theorem.
– Normal approximation to the binomial
and Poisson.