
Page 1: Epidemiology 9509 - Principles of Biostatistics Chapter 5 Probability Distributionspublish.uwo.ca/.../chapter5/probability_distribution.pdf · 2012-11-15 · Principles of Biostatistics

Epidemiology 9509 Probability Distributions

Epidemiology 9509: Principles of Biostatistics

Chapter 5: Probability Distributions

John Koval

Department of Epidemiology and Biostatistics, University of Western Ontario


why do we need probability distributions?

◮ your population study or clinical trial has gone from a population to a sample, i.e. from the population parameters to sample statistics

◮ in order to do inference, we need to go back from the sample statistics to the population parameters

◮ there is a mathematical concept called the probability distribution, which also goes from parameters to data

◮ later we will go beyond probability distributions to enable us to do inference

◮ first we have to learn how samples are produced probabilistically


what we are learning today

1. several discrete distributions

2. several continuous distributions

3. using a continuous distribution to approximate a discrete one


Probability distributions

A random variable, X, produces a value according to the rules of a probability distribution.

Here are some probability distributions:

1. Discrete

1.1 equiprobable
1.2 Bernoulli
1.3 Binomial
1.4 Poisson

2. Continuous

2.1 uniform
2.2 normal


Equiprobable Distribution

N outcomes; Pr(any outcome) = 1/N

usually the outcomes are 1, 2, ..., N; e.g. N = 2

[Figure 5.1: Equiprobable distribution (N=2). Bar chart of Probability against Outcome; outcomes 1 and 2 each have probability 0.5.]


Equiprobable Distribution (N=6)

[Figure 5.2: Equiprobable distribution (N=6). Bar chart of Probability against Outcome; outcomes 1 to 6 each have probability 1/6.]
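To make "a random variable produces a value according to the rules of a probability distribution" concrete, here is a minimal Python sketch that simulates a fair six-sided die (N = 6). The function names and the seed are illustrative assumptions, not part of the course material; with many draws the observed relative frequencies should sit close to 1/6, as in Figure 5.2.

```python
import random
from collections import Counter


def equiprobable_pmf(n_outcomes):
    """Pr(X = k) = 1/N for each outcome k = 1, ..., N."""
    return {k: 1 / n_outcomes for k in range(1, n_outcomes + 1)}


def simulate_frequencies(n_outcomes, n_draws, seed=9509):
    """Draw n_draws values from the equiprobable distribution on 1..N and
    return the observed relative frequency of each outcome."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = Counter(rng.randint(1, n_outcomes) for _ in range(n_draws))
    return {k: counts[k] / n_draws for k in range(1, n_outcomes + 1)}


pmf = equiprobable_pmf(6)
freq = simulate_frequencies(6, 60_000)
for k in pmf:
    # outcome, theoretical probability, observed relative frequency
    print(k, round(pmf[k], 4), round(freq[k], 4))
```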


Bernoulli Distribution

Jacob Bernoulli (1654-1705) - Swiss

the probability of "success", π, need not be 0.5 (like an unfair coin)

e.g., π = 0.4: Pr(X = 1) = 0.4, Pr(X = 0) = 0.6

in general, $\Pr(X = x) = \pi^{x}(1-\pi)^{1-x}$ for $x = 0, 1$
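A direct Python transcription of the Bernoulli formula, as a small sketch (the name bernoulli_pmf is hypothetical). It reproduces the π = 0.4 example: Pr(X = 1) = 0.4 and Pr(X = 0) = 0.6.

```python
def bernoulli_pmf(x, pi):
    """Pr(X = x) = pi**x * (1 - pi)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable takes only the values 0 and 1")
    return pi**x * (1 - pi) ** (1 - x)


print(bernoulli_pmf(1, 0.4))  # success
print(bernoulli_pmf(0, 0.4))  # failure
```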


Bernoulli Distribution (continued)

[Figure 5.3: Bernoulli distribution (π = 0.4). Bar chart of Probability against Outcome: Pr(X = 0) = 0.6, Pr(X = 1) = 0.4.]


Binomial Distribution

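sample of size n; x successes

$\Pr(X = x) = {}_{n}C_{x}\,\pi^{x}(1-\pi)^{n-x}$, where ${}_{n}C_{x} = \dfrac{n!}{x!\,(n-x)!}$

for example, with n = 10 and π = 0.4,

$\Pr(X = 2) = \dfrac{10(9)}{2(1)}(0.4)^{2}(0.6)^{8} = 45(0.16)(0.01679616) = 0.121$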
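The worked example can be checked with Python's math.comb (available from Python 3.8). This is only a sketch of the arithmetic; binomial_pmf is an illustrative name.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """Pr(X = x) = nCx * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


# The slide's example: n = 10, pi = 0.4, x = 2
print(comb(10, 2))                          # 10(9) / 2(1)
print(0.6**8)                               # (0.6)^8, about 0.01679616
print(round(binomial_pmf(2, 10, 0.4), 3))   # Pr(X = 2)
```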


Binomial Distribution B(10,0.4)

x     Pr(X=x)
0     0.006
1     0.040
2     0.121
3     0.215
4     0.251
5     0.201
6     0.111
7     0.042
8     0.011
9     0.002
10    0.000
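The whole B(10, 0.4) table can be regenerated the same way. In this sketch (binomial_table is a hypothetical helper), each probability is printed to three decimals and the total is checked to be 1, as it must be for any probability distribution.

```python
from math import comb


def binomial_table(n, pi):
    """Return [(x, Pr(X = x))] for x = 0, ..., n."""
    return [(x, comb(n, x) * pi**x * (1 - pi) ** (n - x)) for x in range(n + 1)]


table = binomial_table(10, 0.4)
for x, p in table:
    print(f"{x:2d}  {p:.3f}")

# The probabilities of all possible outcomes must add up to 1.
print("total:", round(sum(p for _, p in table), 10))
```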


B(10,0.4)

[Figure 5.4: Binomial distribution (n=10, π = 0.4). Bar chart of Probability against x = 0, 1, ..., 10.]


Moments of a Probability Distribution

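$\mu = E(X) = \sum_{x=0}^{n} x\,\Pr(X = x) = n\pi$

$\sigma^{2} = \operatorname{Var}(X) = E(X-\mu)^{2} = \sum_{x=0}^{n} (x-\mu)^{2}\,\Pr(X = x) = n\pi(1-\pi)$

(the sums are the general definitions for a discrete distribution; nπ and nπ(1 − π) are their values for the binomial)

For our example,

µ = nπ = 10(0.4) = 4

σ² = nπ(1 − π) = 10(0.4)(0.6) = 2.4

σ = √σ² = √2.4 ≈ 1.55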
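The moment definitions can be checked numerically by doing the sums directly and comparing them with the binomial shortcuts nπ and nπ(1 − π). A minimal sketch for n = 10, π = 0.4 (binomial_pmf is an illustrative name):

```python
from math import comb, sqrt


def binomial_pmf(x, n, pi):
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


n, pi = 10, 0.4
# E(X) = sum over x of x * Pr(X = x)
mean = sum(x * binomial_pmf(x, n, pi) for x in range(n + 1))
# Var(X) = sum over x of (x - mu)^2 * Pr(X = x)
var = sum((x - mean) ** 2 * binomial_pmf(x, n, pi) for x in range(n + 1))

print(mean, n * pi)               # summed mean vs. n*pi
print(var, n * pi * (1 - pi))     # summed variance vs. n*pi*(1 - pi)
print(round(sqrt(var), 2))        # standard deviation
```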


Poisson Distribution

Simeon Poisson (1781-1840)

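ordinal data; frequencies (counts 0, 1, 2, ...)

e.g. the number of MIs (myocardial infarctions)

$\Pr(X = x) = e^{-\mu}\dfrac{\mu^{x}}{x!}$, where µ is the average (mean) count

$\sigma^{2} = \operatorname{Var}(X) = \mu$

For µ = 4 we have: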


Poisson Distribution P(4)

x     Pr(X=x)
0     0.018
1     0.073
2     0.147
3     0.195
4     0.195
5     0.156
6     0.104
7     0.060
8     0.030
9     0.013
10    0.005
11    0.002
12    0.001
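A sketch that rebuilds the P(4) table from the Poisson formula and checks numerically that the mean and the variance both equal µ. The infinite sum is truncated at x = 100, an assumption that is harmless here because the probabilities beyond that point are vanishingly small for µ = 4; poisson_pmf is an illustrative name.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """Pr(X = x) = exp(-mu) * mu**x / x!"""
    return exp(-mu) * mu**x / factorial(x)


mu = 4
table = [poisson_pmf(x, mu) for x in range(13)]
for x, p in enumerate(table):
    print(f"{x:2d}  {p:.3f}")

# Check E(X) = Var(X) = mu, truncating the infinite sums at x = 100.
xs = range(101)
mean = sum(x * poisson_pmf(x, mu) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)
print(round(mean, 6), round(var, 6))
```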


P(4)

[Figure 5.5: Poisson distribution (µ = 4). Bar chart of Probability against x = 0, 1, ..., 10.]


Continuous distributions

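as if the number of possible outcomes, N, were very large: any single value has a negligible probability, e.g. Pr(X = 20) = .00000000001 (in the limit, exactly 0)

so we compute probabilities of intervals, e.g. Pr(20 ≤ X < 21) or Pr(X < 21)

with the Probability Density Function, f(x): Pr(a < X < b) = area under the curve above the interval (a, b) = $\int_{a}^{b} f(x)\,dx$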
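Finding the "area under the curve" is an integral, and it can be approximated numerically. The sketch below uses the midpoint rule (a numerical method chosen here for simplicity, not one prescribed in the course) on the N(0, 1) density, with −10 standing in for −∞; the result should match the standard normal table value Pr(X < 0.4) = 0.6554. The function names are illustrative.

```python
import math


def std_normal_pdf(x):
    """Density of N(0, 1): exp(-x**2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def area_under_curve(f, a, b, steps=100_000):
    """Midpoint-rule approximation to the area under f above (a, b)."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


# Pr(X < 0.4) for X ~ N(0, 1); -10 stands in for minus infinity
area = area_under_curve(std_normal_pdf, -10, 0.4)
print(round(area, 4))
```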


Uniform distribution

[Figure 5.6: Uniform distribution (0,1). The density is flat at height 1.00 between 0 and 1.]


Uniform distribution (continued)

The probability of every interval of width w inside (0, 1) is the same, namely w.

[Figure 5.7: Uniform distribution (0,1) probabilities. Density of height 1.00 on (0, 1), with the intervals (0, 0.2] and (0.3, 0.5] marked.]

Pr(0 < X ≤ 0.2) = 0.2 = Pr(0.3 < X ≤ 0.5)
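For Uniform(0, 1) the area is just a rectangle of height 1, so Pr(a < X ≤ b) is the length of the part of (a, b] that lies inside (0, 1). A minimal sketch (uniform01_prob is a hypothetical name):

```python
def uniform01_prob(a, b):
    """Pr(a < X <= b) for X ~ Uniform(0, 1): the length of the part of (a, b]
    inside (0, 1), because the density is 1 there and 0 elsewhere."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)


print(uniform01_prob(0, 0.2))
print(uniform01_prob(0.3, 0.5))
```

Both intervals have width 0.2, so both calls give 0.2 (up to floating-point rounding).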


Moments of Continuous Probability Distributions

$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$

$\operatorname{Var}(X) = E\left[(X - E(X))^{2}\right] = \int_{-\infty}^{\infty} (x - E(X))^{2}\,f(x)\,dx$


Moments of Uniform Distribution

Mean: $E(X) = \frac{1}{2} = 0.5$

Variance: $\operatorname{Var}(X) = \frac{1}{12} = 0.0833$

Standard deviation: $\operatorname{sd}(X) = \sqrt{\operatorname{Var}(X)} = \sqrt{0.0833} = 0.289$

Note that "all" of the distribution is contained in the interval

$E(X) \pm 2\,\operatorname{sd}(X) = 0.5 \pm 2(0.289) = (-0.078,\ 1.078)$
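The integral definitions of the moments can be checked numerically for the Uniform(0, 1) density. This sketch again uses a midpoint-rule integrator, which is an assumption of this example rather than the course's method; the function names are illustrative.

```python
import math


def uniform01_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of g over (a, b)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


# The density is zero outside (0, 1), so integrating over (0, 1) is enough.
mean = integrate(lambda x: x * uniform01_pdf(x), 0, 1)
var = integrate(lambda x: (x - mean) ** 2 * uniform01_pdf(x), 0, 1)
sd = math.sqrt(var)
print(round(mean, 4), round(var, 4), round(sd, 3))
print(round(mean - 2 * sd, 3), round(mean + 2 * sd, 3))
```

With the unrounded sd the ±2 sd interval comes out as about (−0.077, 1.077); the slide's (−0.078, 1.078) uses sd rounded to 0.289. Either way it covers all of (0, 1).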


The Normal distribution (Gaussian)

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left[-\dfrac{1}{2}\left(\dfrac{x-\mu}{\sigma}\right)^{2}\right]$

the mean (E(X)) is µ and the variance (Var(X)) is σ²
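A direct transcription of the Normal density into Python, as a sketch (normal_pdf is an illustrative name). It checks two properties of the bell curve: the peak height at x = µ is 1/(σ√(2π)), and the curve is symmetric about µ. The values µ = 20, σ = 5 anticipate the Age example.

```python
import math


def normal_pdf(x, mu, sigma):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - mu)/sigma)**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 20, 5
# Peak height at the mean vs. 1/(sigma*sqrt(2*pi))
print(normal_pdf(mu, mu, sigma), 1 / (sigma * math.sqrt(2 * math.pi)))
# Symmetry: the same height 3 units below and 3 units above the mean
print(normal_pdf(mu - 3, mu, sigma), normal_pdf(mu + 3, mu, sigma))
```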

[Figure 5.9: Normal distribution N(µ, σ²). Bell-shaped density centred at µ.]


Standardized Normal N(0,1)

[Figure 5.10: Normal distribution N(0, 1), centred at 0.0.]


Normal probabilities

Pr(X < b) = area under the curve above (−∞, b)

Pr(a < X < b) = area under the curve above (a, b)


example

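Normal distribution: Age ∼ N(20, 25), i.e. µ = 20 and σ² = 25, so σ = 5

[Figure 5.11: Calculating Pr(X < 22) for N(20,25). Normal curve centred at 20.0; the area to the left of 22.0 is Pr(X < 22).]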


standardization

we don't have tables for N(20, 25), but we do have tables for N(0, 1)

[Figure 5.12: Calculating Pr(X < 0.4) for N(0, 1). Standard normal curve centred at 0.0; the area to the left of 0.4 is Pr(X < 0.4).]


STANDARDIZE

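$Z_N = \dfrac{X - \mu}{\sigma}$

$\Pr(X < 22) = \Pr\left(Z_N < \dfrac{22 - 20}{5}\right) = \Pr(Z_N < 0.4)$

where $z_{a}$ denotes the upper-tail area $\Pr(Z_N > a)$ given in the table, so

$= 1 - z_{0.4} = 1 - 0.3446$ (Appendix A, Table A.1, page A.2, row 6, entitled "0.4", first column)

$= 0.6554$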
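Tables are not the only route: Pr(Z_N < z) can be written with the error function, Φ(z) = ½[1 + erf(z/√2)], and Python's math.erf computes erf. A minimal sketch of the standardization step for Age ∼ N(20, 25); std_normal_cdf is an illustrative name.

```python
import math


def std_normal_cdf(z):
    """Pr(Z_N < z) for Z_N ~ N(0, 1), written with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 20, 5                      # Age ~ N(20, 25), so sigma = sqrt(25) = 5
z = (22 - mu) / sigma                  # standardize: Z_N = (X - mu) / sigma
upper_tail = 1 - std_normal_cdf(z)     # the tabulated z_0.4 = Pr(Z_N > 0.4)
print(z, round(upper_tail, 4), round(std_normal_cdf(z), 4))
```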


example two

[Figure 5.13: Calculating Pr(17 < X < 22). Normal curve centred at 20.0; the area between 17.0 and 22.0 is Pr(17 < X < 22).]


example two (continued)

$\Pr(17 < X < 22) = \Pr\left(\dfrac{17 - 20}{5} < Z_N < \dfrac{22 - 20}{5}\right)$

$= \Pr(-0.6 < Z_N < 0.4)$

$= \Pr(Z_N < 0.4) - \Pr(Z_N < -0.6)$

$= (1 - z_{0.4}) - (1 - z_{-0.6})$

$= (1 - z_{0.4}) - (1 - (1 - z_{0.6}))$

$= (1 - z_{0.4}) - z_{0.6}$

$= 0.6554 - 0.2743 = 0.3811$
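The same error-function sketch (std_normal_cdf is a hypothetical helper) gives the interval probability as a difference of two cumulative probabilities:

```python
import math


def std_normal_cdf(z):
    """Pr(Z_N < z) for Z_N ~ N(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 20, 5
a, b = (17 - mu) / sigma, (22 - mu) / sigma   # standardized limits -0.6 and 0.4
p = std_normal_cdf(b) - std_normal_cdf(a)     # Pr(a < Z_N < b)
print(a, b, round(p, 4))
```

It prints about 0.3812; the table-based 0.3811 differs only in the fourth decimal because the table entries are themselves rounded.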


Normal approximation to binomial

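For large n, the binomial looks like the Normal; the rule used here is nπ ≥ 5 and n(1 − π) ≥ 5

let's try n = 20, π = 0.4 (nπ = 8 and n(1 − π) = 12, so the rule is met):

E(X) = µ = nπ = 20(0.4) = 8

Var(X) = σ² = nπ(1 − π) = 20(0.4)(0.6) = 4.8, so σ = √4.8 ≈ 2.19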
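A small sketch that computes the approximating Normal's mean and standard deviation and applies the rule of thumb nπ ≥ 5, n(1 − π) ≥ 5 (normal_approx_params is an illustrative name):

```python
import math


def normal_approx_params(n, pi):
    """Return (rule_met, mean, sd) for approximating Bin(n, pi) by a Normal,
    using the rule n*pi >= 5 and n*(1 - pi) >= 5."""
    rule_met = n * pi >= 5 and n * (1 - pi) >= 5
    mean = n * pi
    sd = math.sqrt(n * pi * (1 - pi))
    return rule_met, mean, sd


print(normal_approx_params(20, 0.4))
```

For comparison, normal_approx_params(10, 0.4) reports False, since nπ = 4 < 5 for the earlier B(10, 0.4) example.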


[Figure 5.14: Normal approximation to Bin(20,0.4). Binomial probabilities and the Normal approximation plotted together for x = 2, ..., 16; vertical axis Probability.]


calculation

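$\Pr(X_{bin} \le 7) \approx \Pr(X_{norm} < 7.5)$ (the continuity correction: the binomial block for x = 7 extends up to 7.5)

[Figure 5.15: Important block in normal approximation. Binomial probabilities and the Normal approximation for Bin(20,0.4), x = 2, ..., 16; vertical axis Probability.]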


calculation (continued)

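$= \Pr\left(Z_N < \dfrac{7.5 - 8}{2.19}\right)$

$= \Pr(Z_N < -0.228) = 1 - z_{-0.228}$

$= 1 - (1 - z_{0.228}) = z_{0.228}$

by linear interpolation in Table A.1, page A.2,

$z_{0.228} \approx \tfrac{4}{5}z_{0.23} + \tfrac{1}{5}z_{0.22} = \tfrac{4}{5}(0.4090) + \tfrac{1}{5}(0.4129) = 0.4098$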
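Linear interpolation between two neighbouring table rows is a weighted average, with weight proportional to how far the target lies between them. This sketch (lerp_table is a hypothetical name) uses the two Table A.1 entries quoted above:

```python
def lerp_table(a, a_lo, val_lo, a_hi, val_hi):
    """Linearly interpolate between table entries (a_lo, val_lo) and (a_hi, val_hi)."""
    w = (a - a_lo) / (a_hi - a_lo)      # fraction of the way from a_lo to a_hi
    return (1 - w) * val_lo + w * val_hi


# Upper-tail areas from Table A.1: z_0.22 = 0.4129, z_0.23 = 0.4090
z_0228 = lerp_table(0.228, 0.22, 0.4129, 0.23, 0.4090)
print(round(z_0228, 4))
```

Here w = 0.8, which reproduces the weights 4/5 on z_0.23 and 1/5 on z_0.22.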

if we use the program R to calculate $z_{0.228}$, we get 0.4098 (my linear interpolation is good)

if we use R to calculate the binomial probability exactly, we get 0.4159 (the normal approximation is good)
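Both comparisons can be reproduced in Python, as a sketch (helper names are illustrative). In R the corresponding calls would be pbinom(7, 20, 0.4) for the exact probability and pnorm(7.5, 8, sqrt(4.8)) for the approximation.

```python
import math


def binomial_cdf(k, n, pi):
    """Exact Pr(X <= k) for X ~ Bin(n, pi)."""
    return sum(math.comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(k + 1))


def std_normal_cdf(z):
    """Pr(Z_N < z) for Z_N ~ N(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


n, pi = 20, 0.4
mu, sigma = n * pi, math.sqrt(n * pi * (1 - pi))
exact = binomial_cdf(7, n, pi)
approx = std_normal_cdf((7.5 - mu) / sigma)   # continuity correction: 7 -> 7.5
print(round(exact, 4), round(approx, 4))
```

Here σ is left unrounded (√4.8 ≈ 2.1909), so the approximation prints about 0.4097 rather than 0.4098, which used σ = 2.19; the exact value is about 0.4159.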
