Epidemiology 9509 Probability Distributions
Epidemiology 9509
Principles of Biostatistics
Chapter 5
Probability Distributions
John Koval
Department of Epidemiology and Biostatistics
University of Western Ontario
why do we need probability distributions
◮ Your population study or clinical trial has gone from a population to a sample; that is, from population parameters to sample statistics.
◮ In order to do inference, we need to go from the sample statistics back to the population parameters.
◮ There is a mathematical concept called a probability distribution, which also goes from parameters to data.
◮ Later we will go beyond probability distributions to enable us to do inference.
◮ First we have to learn how samples are produced probabilistically.
what we are learning today
1. several discrete distributions
2. several continuous distributions
3. using a continuous distribution to approximate a discrete one
Probability distributions
A random variable, X, produces a value according to the rules of a probability distribution.
Here are some probability distributions:
1. Discrete
1.1 equiprobable
1.2 Bernoulli
1.3 Binomial
1.4 Poisson
2. Continuous
2.1 uniform
2.2 normal
Equiprobable Distribution
N outcomes; Pr(any outcome) = 1/N
usually the outcomes are 1, 2, ..., N
e.g., N = 2
[Figure 5.1: Equiprobable distribution (N=2). x-axis: Outcome (1, 2); y-axis: Probability (0.5 for each outcome).]
Equiprobable Distribution (N=6)
[Figure 5.2: Equiprobable distribution (N=6). x-axis: Outcome (1 to 6); y-axis: Probability (1/6 for each outcome).]
Bernoulli Distribution
Jacob Bernoulli (1654-1705) - Swiss
the probability of "success", π, need not be 0.5 (an unfair coin)
e.g., π = 0.4:
Pr(X = 1) = 0.4
Pr(X = 0) = 0.6
in general, $\Pr(X = x) = \pi^x (1-\pi)^{1-x}$
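A minimal sketch of the general Bernoulli formula, written in Python as an assumption (the slides themselves mention only R). The function name `bernoulli_pmf` is hypothetical and chosen for illustration.

```python
def bernoulli_pmf(x: int, pi: float) -> float:
    """Pr(X = x) = pi^x * (1 - pi)^(1 - x), defined for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli outcome must be 0 or 1")
    return pi ** x * (1 - pi) ** (1 - x)


# The unfair coin from the slide: pi = 0.4.
p_success = bernoulli_pmf(1, 0.4)  # the exponent on (1 - pi) is 0, leaving pi
p_failure = bernoulli_pmf(0, 0.4)  # the exponent on pi is 0, leaving 1 - pi
print(p_success, p_failure)
```

The exponents simply switch each factor on or off, which is why one formula covers both outcomes.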
Bernoulli Distribution (continued)
[Figure 5.3: Bernoulli distribution (π = 0.4). x-axis: Outcome (0, 1); y-axis: Probability.]
Binomial Distribution
Sample of size n; x successes
$\Pr(X = x) = {}_nC_x \, \pi^x (1-\pi)^{n-x}$
for example, with n = 10 and π = 0.4,
$\Pr(X = 2) = \frac{10(9)}{2(1)} (0.4)^2 (0.6)^8 = 45(0.16)(0.01679616) = 0.121$
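The binomial formula can be checked numerically. The sketch below is in Python (an assumption; the slides only mention R) and uses the standard-library `math.comb` for the binomial coefficient. The name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """Pr(X = x) for X ~ B(n, pi): nCx * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


# Worked example from the slide: n = 10, pi = 0.4, x = 2.
p2 = binomial_pmf(2, 10, 0.4)
print(f"Pr(X = 2) = {p2:.3f}")

# The whole distribution B(10, 0.4); the probabilities should sum to 1.
probabilities = [binomial_pmf(x, 10, 0.4) for x in range(11)]
for x, p in enumerate(probabilities):
    print(x, f"{p:.3f}")
```

Looping over x = 0, ..., n in this way reproduces a probability table for any n and π.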
Binomial Distribution B(10,0.4)
x     Pr(X=x)
0     0.006
1     0.040
2     0.121
3     0.215
4     0.251
5     0.201
6     0.111
7     0.042
8     0.011
9     0.002
10    0.000
B(10,0.4)
[Figure 5.4: Binomial distribution (n=10, π = 0.4). x-axis: outcome (0 to 10); y-axis: Probability (0.00 to 0.25).]
Moments of a Probability Distribution
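$\mu = E(X) = \sum_{i=1}^{n} x_i \Pr(X = x_i) = n\pi$
$\sigma^2 = \mathrm{Var}(X) = E\left[(X-\mu)^2\right] = \sum_{i=1}^{n} (x_i-\mu)^2 \Pr(X = x_i) = n\pi(1-\pi)$
(the closed forms nπ and nπ(1 − π) are those of the binomial)
For our example,
$\mu = n\pi = 10(0.4) = 4$
$\sigma^2 = n\pi(1-\pi) = 10(0.4)(0.6) = 2.4$
$\sigma = \sqrt{\sigma^2} = \sqrt{2.4} \approx 1.55$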
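To see that the summation definitions agree with the binomial closed forms, the moments of B(10, 0.4) can be computed both ways. This is a sketch in Python (an assumption, not the course's software); the sum is taken over every possible value 0, 1, ..., n.

```python
from math import comb, sqrt

n, pi = 10, 0.4

# Probability of each possible value x = 0, 1, ..., n.
pmf = [comb(n, x) * pi ** x * (1 - pi) ** (n - x) for x in range(n + 1)]

# Moments by direct summation over the distribution.
mean = sum(x * p for x, p in enumerate(pmf))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
sd = sqrt(variance)

# Closed forms for the binomial, for comparison.
print(mean, n * pi)
print(variance, n * pi * (1 - pi))
print(round(sd, 2))
```

Up to floating-point rounding, the two columns printed for the mean and the variance should match.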
Poisson Distribution
Simeon Poisson (1781-1840)
ordinal; frequencies
number of MIs (myocardial infarctions)
$\Pr(X = x) = \frac{e^{-\mu} \mu^x}{x!}$
where µ is the average
$\sigma^2 = \mathrm{Var}(X) = \mu$
For µ = 4 we have
Poisson Distribution P(4)
x     Pr(X=x)
0     0.018
1     0.073
2     0.147
3     0.195
4     0.195
5     0.156
6     0.104
7     0.060
8     0.030
9     0.013
10    0.005
11    0.002
12    0.001
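The Poisson probabilities for µ = 4 can be generated directly from the formula. A minimal Python sketch follows (Python is an assumption; `poisson_pmf` is a hypothetical helper name).

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Pr(X = x) = e^(-mu) * mu^x / x! for x = 0, 1, 2, ..."""
    return exp(-mu) * mu ** x / factorial(x)


# Probabilities for x = 0, ..., 12 with mu = 4, rounded to three decimals.
table = [(x, round(poisson_pmf(x, 4), 3)) for x in range(13)]
for x, p in table:
    print(x, p)
```

Unlike the binomial, the Poisson has no largest possible value; the listing stops at x = 12 only because the remaining probabilities are very small.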
P(4)
[Figure 5.5: Poisson distribution (µ = 4). x-axis: outcome (0 to 10); y-axis: Probability (0.00 to 0.25).]
Continuous distributions
as if N were a very large number, e.g. Pr(X = 20) = 0.00000000001
so we compute probabilities of intervals
e.g., Pr(20 ≤ X < 21)
e.g., Pr(X < 21)
with a Probability Density Function, f(x):
Pr(a < X < b) = area under the curve above the interval (a, b)
Uniform distribution
[Figure 5.6: Uniform distribution (0,1). Density of height 1.00 over the interval from 0 to 1.]
Uniform distribution (continued)
The probability of every interval of width w is the same
[Figure 5.7: Uniform distribution (0,1) probabilities. Density of height 1.00 over (0, 1); x-axis marks at 0, 0.2, 0.3, 0.5 and 1.]
Pr(0 < X ≤ 0.2) = 0.2 = Pr(0.3 < X ≤ 0.5)
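For the Uniform(0,1) distribution the area under the density is a rectangle of height 1, so an interval probability is just the width of the part of the interval inside (0, 1). A small Python sketch of this (Python and the helper names `uniform01_cdf` and `uniform01_interval` are assumptions for illustration):

```python
def uniform01_cdf(x: float) -> float:
    """Pr(X <= x) for X ~ Uniform(0, 1): 0 below 0, x inside, 1 above 1."""
    return min(max(x, 0.0), 1.0)


def uniform01_interval(a: float, b: float) -> float:
    """Pr(a < X <= b): area of a rectangle of height 1 above (a, b]."""
    return uniform01_cdf(b) - uniform01_cdf(a)


# Two intervals of the same width, w = 0.2, have the same probability.
print(uniform01_interval(0.0, 0.2))
print(uniform01_interval(0.3, 0.5))
```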
Moments of Continuous Probability Distributions
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
$\mathrm{Var}(X) = E\left((X - E(X))^2\right) = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx$
Moments of Uniform Distribution
Mean: $E(X) = \frac{1}{2} = 0.5$
Variance: $\mathrm{Var}(X) = \frac{1}{12} = 0.0833$
Standard deviation: $\mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{0.0833} = 0.289$
Note that "all" of the distribution is contained in the interval
$E(X) \pm 2\,\mathrm{sd}(X) = 0.5 \pm 2(0.289) = (-0.078, 1.078)$
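The integral definitions of the mean and variance can be checked numerically for Uniform(0,1). The sketch below uses a simple midpoint rule written by hand in Python; the integration method, the step count and the helper name `midpoint_integral` are illustrative assumptions, not part of the course material.

```python
from math import sqrt


def midpoint_integral(g, a: float, b: float, steps: int = 100_000) -> float:
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(g(a + (i + 0.5) * width) for i in range(steps)) * width


def f(x: float) -> float:
    """Density of Uniform(0, 1); it is zero outside [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# The density vanishes outside [0, 1], so integrating over [0, 1] is enough.
mean = midpoint_integral(lambda x: x * f(x), 0.0, 1.0)
variance = midpoint_integral(lambda x: (x - mean) ** 2 * f(x), 0.0, 1.0)
sd = sqrt(variance)

print(round(mean, 4), round(variance, 4), round(sd, 3))
```

The same helper could be pointed at any other density, provided the integration limits cover the region where the density is non-negligible.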
The Normal distribution (Gaussian)
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$
the mean, E(X), is µ; the variance, Var(X), is σ²
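As a sketch, the Normal density can be coded straight from the formula and compared with Python's standard-library `statistics.NormalDist`. Python is an assumption here, and `normal_pdf` is a hypothetical name. Note that `math.pi` below is the constant 3.14159..., not the binomial probability π used elsewhere in these slides.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x, following the formula term by term."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Height of the standard Normal curve at its mean, computed two ways.
by_formula = normal_pdf(0.0, 0.0, 1.0)
by_library = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)
print(by_formula, by_library)
```

`NormalDist` takes the standard deviation σ as its second argument, not the variance σ².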
[Figure 5.9: Normal distribution N(µ, σ²). Bell-shaped density with µ marked on the x-axis.]
Standardized Normal N(0,1)
[Figure 5.10: Normal distribution N(0, 1). x-axis mark at 0.0.]
Normal probabilities
Pr(X < b) = area under the curve above (−∞, b)
Pr(a < X < b) = area under the curve above (a, b)
example
Normal distribution: Age ∼ N(20,25), that is, mean 20 and variance 25 (σ = 5)
[Figure 5.11: Calculating Pr(X < 22) for N(20,25). x-axis marks at 20.0 and 22.0.]
standardization
we don't have tables for N(20,25); we have tables for N(0,1)
[Figure 5.12: Calculating Pr(X < 0.4) for N(0, 1). x-axis marks at 0.0 and 0.4.]
STANDARDIZE
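$Z_N = \frac{X-\mu}{\sigma}$
$\Pr(X < 22) = \Pr\left(Z_N < \frac{22-20}{5}\right)$
$= \Pr(Z_N < 0.4)$
$= 1 - z_{0.4}$, where $z_{0.4}$ is the tabulated upper-tail area $\Pr(Z_N > 0.4)$
$= 1 - 0.3446$ (Appendix A, Table A.1, page A.2, row 6, entitled "0.4", first column)
$= 0.6554$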
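The standardization step can be reproduced without tables. The Python sketch below (Python is an assumption; the slides refer to printed tables and R) computes the probability both through the standardized value and directly from N(20, 25).

```python
from statistics import NormalDist

mu, sigma = 20.0, 5.0  # N(20, 25): the variance is 25, so sigma = 5

# Step 1: standardize the cut-off.
z = (22.0 - mu) / sigma

# Step 2: look up the standard Normal, N(0, 1).
p_via_z = NormalDist().cdf(z)

# The same probability straight from N(20, 25), with no standardization.
p_direct = NormalDist(mu, sigma).cdf(22.0)

# Upper-tail area beyond z, the quantity the printed table lists.
upper_tail = 1.0 - p_via_z

print(z, round(p_via_z, 4), round(p_direct, 4), round(upper_tail, 4))
```

Both routes give the same answer; standardizing is only needed when a table for N(0,1) is the tool at hand.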
example two
[Figure 5.13: Calculating Pr(17 < X < 22). x-axis marks at 17.0, 20.0 and 22.0.]
example two (continued)
$\Pr(17 < X < 22) = \Pr\left(\frac{17-20}{5} < Z_N < \frac{22-20}{5}\right)$
$= \Pr(-0.6 < Z_N < 0.4)$
$= \Pr(Z_N < 0.4) - \Pr(Z_N < -0.6)$
$= (1 - z_{0.4}) - (1 - z_{-0.6})$
$= (1 - z_{0.4}) - (1 - (1 - z_{0.6}))$
$= (1 - z_{0.4}) - z_{0.6}$
$= 0.6554 - 0.2743 = 0.3811$
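An interval probability is a difference of two cumulative probabilities, which is easy to sketch in Python (an assumption; `normal_interval_prob` is a hypothetical helper). Because the code does not round the two table entries first, its answer can differ from a hand calculation in the fourth decimal place.

```python
from statistics import NormalDist


def normal_interval_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """Pr(a < X < b) for X ~ N(mu, sigma^2), as Pr(X < b) - Pr(X < a)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


# Example two: Pr(17 < X < 22) for N(20, 25), i.e. sigma = 5.
p = normal_interval_prob(17.0, 22.0, mu=20.0, sigma=5.0)

# The same probability on the standardized scale, Pr(-0.6 < Z < 0.4).
p_standardized = normal_interval_prob(-0.6, 0.4, mu=0.0, sigma=1.0)

print(f"{p:.3f}", f"{p_standardized:.3f}")
```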
Normal approximation to binomial
For large n, the binomial looks like the Normal, provided nπ ≥ 5 and n(1 − π) ≥ 5
let's try n = 20, π = 0.4
E(X) = µ = nπ = 20(0.4) = 8
Var(X) = σ² = nπ(1 − π) = 20(0.4)(0.6) = 4.8
σ = √4.8 ≈ 2.19
[Figure 5.14: Normal approximation to Bin(20,0.4). x-axis: outcome (2 to 16); y-axis: Probability (0.00 to 0.25); legend: Binomial, Normal Approximation.]
calculation
$\Pr(X_{bin} \le 7) \approx \Pr(X_{norm} < 7.5)$
[Figure 5.15: Important block in normal approximation. x-axis: outcome (2 to 16); y-axis: Probability (0.00 to 0.25); legend: Binomial, Normal Approximation.]
calculation (continued)
$\Pr(X_{norm} < 7.5) = \Pr\left(Z_N < \frac{7.5-8}{2.19}\right)$
$= \Pr(Z_N < -0.228)$
$= 1 - z_{-0.228}$
$= 1 - (1 - z_{0.228}) = z_{0.228}$
by linear interpolation in Table A.1, page A.2,
$z_{0.228} \approx \frac{4}{5} z_{0.23} + \frac{1}{5} z_{0.22} = \frac{4}{5}(0.4090) + \frac{1}{5}(0.4129) = 0.4098$
if we use the program R to calculate $z_{0.228}$, we get 0.4098 (my linear interpolation is good)
if we use R to calculate the binomial exactly, we get 0.4159 (the normal approximation is good)
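The comparison can also be sketched in Python instead of R (the choice of Python is an assumption). The code keeps the standardized value unrounded, so the approximate probability may differ slightly in the fourth decimal from a calculation that rounds z to −0.228.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi = 20, 0.4
mu = n * pi                       # 8
sigma = sqrt(n * pi * (1 - pi))   # square root of 4.8

# Normal approximation with the continuity correction: Pr(X_norm < 7.5).
approx = NormalDist(mu, sigma).cdf(7.5)

# Exact binomial probability: Pr(X_bin <= 7), summing x = 0, ..., 7.
exact = sum(comb(n, x) * pi ** x * (1 - pi) ** (n - x) for x in range(8))

print(f"normal approximation: {approx:.3f}")
print(f"exact binomial:       {exact:.4f}")
```

Using 7.5 rather than 7 is what makes the approximation work: the binomial bar at x = 7 spans 6.5 to 7.5 on the continuous scale, so the whole bar is included.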