the normal distribution - university of wisconsin–madison · pdf filethe normal...

64
The Normal Distribution

Upload: hadang

Post on 06-Mar-2018

234 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Normal Distribution

Page 2: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Outline

1 Introduction: Continuous Random Variables (3.6)

2 Introduction: Normal Distribution (4.1-4.2)

3 Areas Under a Normal Curve (4.3)Standard Normal Table: Computing ProbabilitiesStandard Normal Table: Finding QuantilesGeneral Normal DistributionUsing the ComputerA Case Study

4 Assessing Normality (4.4)

Page 3: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Outline

1 Introduction: Continuous Random Variables (3.6)

2 Introduction: Normal Distribution (4.1-4.2)

3 Areas Under a Normal Curve (4.3)Standard Normal Table: Computing ProbabilitiesStandard Normal Table: Finding QuantilesGeneral Normal DistributionUsing the ComputerA Case Study

4 Assessing Normality (4.4)

Page 4: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Recall: Discrete Random Variables

We can describe a discrete random variable with a table.For example:

k 0 1 5 10Pr {X = k} 0.1 0.5 0.1 0.3

But a continuous random variable can take any value in ainterval; we could never list all of these values in a table

Page 5: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Continuous Distributions

A continuous random variable has possible values over acontinuum.

The total probability of one is not in discrete chunks atspecific locations, but rather is ground up like a very finedust and sprinkled on the number line.

We cannot represent the distribution with a table ofpossible values and the probability of each.

Page 6: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Continuous Distributions (continued)

Instead, we represent the distribution with a probabilitydensity function which measures the thickness of theprobability dust.

Probability is measured over intervals as the area underthe curve.A legal probability density f :

1 is never negative (f (x) ≥ 0 for −∞ < x < ∞).2 has a total area under the curve of one (

∫∞−∞ f (x)dx = 1).

Page 7: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

For You

Read 3.6 in your book

Page 8: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Outline

1 Introduction: Continuous Random Variables (3.6)

2 Introduction: Normal Distribution (4.1-4.2)

3 Areas Under a Normal Curve (4.3)Standard Normal Table: Computing ProbabilitiesStandard Normal Table: Finding QuantilesGeneral Normal DistributionUsing the ComputerA Case Study

4 Assessing Normality (4.4)

Page 9: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Normal DistributionThe Normal Distribution is the most important distributionof continuous random variables.The normal density curve is the famous symmetric,bell-shaped curve.The central limit theorem is the reason that the normalcurve is so important. Essentially, many statistics that wecalculate from large random samples will haveapproximate normal distributions (or distributions derivedfrom normal distributions), even if the distributions of theunderlying variables are not normally distributed.This fact is the basis of most of the methods of statisticalinference we will study in the last half of the course.Chapter 4 introduces the normal distribution as aprobability distribution.Chapter 5 culminates in the central limit theorem, theprimary theoretical justification for most of the methods ofstatistical inference in the remainder of the textbook.

Page 10: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The famous bell curve

Used everywhere: very useful description for lots of biological(and other) random variables.

Body weight,

Crop yield,

Protein content in soybean,

Density of blood components

Y ∼ N (µ, σ)

values can, inprinciple, go toinfinity, both ways.

µ − 4σ µ − 2σ µ µ + 2σ µ + 4σ

σ σ

Page 11: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Normal Density

Normal curves have the following bell-shaped, symmetricdensity.

f (x) =1√2πσ

e− 1

2

(x−µ

σ

)2

, (−∞ < x < ∞)

Parameters: The parameters of a normal curve are themean µ and the standard deviation σ.

Notation: a random variable is normal with mean µ andstandard deviation σ,

Y ∼ N (µ, σ)

Standard Normal. We often consider µ = 0 and σ = 1.

Z ∼ N (0, 1)

Page 12: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Recall: Empirical Rule

The 68–95–99.7 Rule.

For every normal curve:1 ≈ 68% of the area is within one SD of the mean2 ≈ 95% of the area is within two SDs of the mean3 ≈ 99.7% of the area is within three SDs of the mean

Page 13: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Standard Normal Density

Standard Normal Density

Possible Values

Den

sity

−3 −2 −1 0 1 2 3

Area within 1 = 0.68

Area within 2 = 0.95

Area within 3 = 0.997

Page 14: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Outline

1 Introduction: Continuous Random Variables (3.6)

2 Introduction: Normal Distribution (4.1-4.2)

3 Areas Under a Normal Curve (4.3)Standard Normal Table: Computing ProbabilitiesStandard Normal Table: Finding QuantilesGeneral Normal DistributionUsing the ComputerA Case Study

4 Assessing Normality (4.4)

Page 15: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Recall: the Normal Density

f (x) =1√2πσ

e− 1

2

(x−µ

σ

)2

, (−∞ < x < ∞)

(can you integrate that?)

Page 16: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Compute areas

There is no formula to calculate general areas under thenormal curve.

(The integral of the density has no closed form solution.)

We will learn to use normal tables for this.

We will also learn how to use R.

However, you will have to use normal tables on the exams.

Page 17: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Standard Normal Table

We have tables for N (0, 1)

The standard normal table lists the area to the left of zunder the standard normal curve for each value from−3.49 to 3.49 by 0.01 increments.The normal table is located:

1 inside front cover of your textbook.2 Table 3 (p. 675-676)

Numbers in the margins represent z.

Numbers in the middle of the table are areas to the left of z.

Page 18: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Warning!

Learn the normal table today.

Make it part of your being.

Page 19: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Standard Normal Table: Computing Probabilities

Page 20: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 −2 0 1 2 3 4

Whole area = 1 (total probability rule)Pr {Z ≤ 0} =

.5

by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} =

0.8413− 0.5 = .3413

Pr {−1 ≤ Z ≤ 1} =

2 ∗ 0.3413 = .6826

Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 21: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 −2 0 1 2 3 4

Whole area = 1 (total probability rule)Pr {Z ≤ 0} =

.5

by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} =

0.8413− 0.5 = .3413

Pr {−1 ≤ Z ≤ 1} =

2 ∗ 0.3413 = .6826

Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 22: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 −2 0 1 2 3 4

Whole area = 1 (total probability rule)Pr {Z ≤ 0} =

.5

by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} =

0.8413− 0.5 = .3413

Pr {−1 ≤ Z ≤ 1} =

2 ∗ 0.3413 = .6826

Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 23: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 −2 0 1 2 3 4

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} =

0.8413− 0.5 = .3413

Pr {−1 ≤ Z ≤ 1} =

2 ∗ 0.3413 = .6826

Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 24: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 −2 0 2 4−5 1

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} =

0.8413− 0.5 = .3413

Pr {−1 ≤ Z ≤ 1} =

2 ∗ 0.3413 = .6826

Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 25: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 −2 0 2 40 1

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} =

0.8413− 0.5 = .3413

Pr {−1 ≤ Z ≤ 1} =

2 ∗ 0.3413 = .6826

Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 26: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 −2 0 2 40 1

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} = 0.8413− 0.5 = .3413Pr {−1 ≤ Z ≤ 1} =

2 ∗ 0.3413 = .6826

Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 27: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 −2 0 2 4−1 1

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} = 0.8413− 0.5 = .3413Pr {−1 ≤ Z ≤ 1} =

2 ∗ 0.3413 = .6826

Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 28: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 −2 0 2 4−1 1

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} = 0.8413− 0.5 = .3413Pr {−1 ≤ Z ≤ 1} = 2 ∗ 0.3413 = .6826Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 29: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 0 41.5 5

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} = 0.8413− 0.5 = .3413Pr {−1 ≤ Z ≤ 1} = 2 ∗ 0.3413 = .6826Pr {Z > 1.5} =

1− Pr {Z ≤ −1.5}= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 30: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 0 41.5 5

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} = 0.8413− 0.5 = .3413Pr {−1 ≤ Z ≤ 1} = 2 ∗ 0.3413 = .6826Pr {Z > 1.5} = 1− Pr {Z ≤ −1.5}

= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 31: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 0 4−1.5 1.5

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} = 0.8413− 0.5 = .3413Pr {−1 ≤ Z ≤ 1} = 2 ∗ 0.3413 = .6826Pr {Z > 1.5} = 1− Pr {Z ≤ −1.5}

= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} =

Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 32: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 4−0.50.3

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} = 0.8413− 0.5 = .3413Pr {−1 ≤ Z ≤ 1} = 2 ∗ 0.3413 = .6826Pr {Z > 1.5} = 1− Pr {Z ≤ −1.5}

= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} = Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}

= .6179− .3085 = .3094

Page 33: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The standard normal Z ∼ N (0, 1)

−4 4−0.50.3

Whole area = 1 (total probability rule)Pr {Z ≤ 0} = .5 by symmetryPr {Z = 0} = 0Pr {Z < 1} = .8413 (table, front cover)Pr {0 ≤ Z ≤ 1} = 0.8413− 0.5 = .3413Pr {−1 ≤ Z ≤ 1} = 2 ∗ 0.3413 = .6826Pr {Z > 1.5} = 1− Pr {Z ≤ −1.5}

= 1− 0.9332 = .0668

Draw a picture and make use of symmetry.

Pr {−0.5 ≤ Z ≤ 0.3} = Pr {Z ≤ 0.3} − Pr {Z ≤ −0.5}= .6179− .3085 = .3094

Page 34: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Standard Normal Table: Finding Quantiles

Page 35: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Basic Idea

This is the inverse problem.

Before. Question: here is the interval; Answer: Probabilityof the interval

Now. Question: Here is the probability. Answer: What isthe interval that matches with the probability.

Problem: many intervals have the same area.

We ask for a certain type of interval.

Page 36: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Reverse Question: Finding quantiles

Pr {Z ≤?} = .975

? = 1.96

the value ? is a quantile .Use Table 3 (or front cover) backward

Pr {Z ≤?} = 0.20

? = −0.84

Pr {Z ≥?} = 0.80

? = −0.84

−3 0 3?

−3 0 3?

Page 37: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Reverse Question: Finding quantiles

Pr {Z ≤?} = .975 ? = 1.96the value ? is a quantile .Use Table 3 (or front cover) backward

Pr {Z ≤?} = 0.20

? = −0.84

Pr {Z ≥?} = 0.80

? = −0.84

−3 0 3?

−3 0 3?

Page 38: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Reverse Question: Finding quantiles

Pr {Z ≤?} = .975 ? = 1.96the value ? is a quantile .Use Table 3 (or front cover) backward

Pr {Z ≤?} = 0.20 ? = −0.84Pr {Z ≥?} = 0.80

? = −0.84

−3 0 3?

−3 0 3?

Page 39: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Reverse Question: Finding quantiles

Pr {Z ≤?} = .975 ? = 1.96the value ? is a quantile .Use Table 3 (or front cover) backward

Pr {Z ≤?} = 0.20 ? = −0.84Pr {Z ≥?} = 0.80 ? = −0.84

−3 0 3?

−3 0 3?

Page 40: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

zα Notation

Instead of ? we introduce a fancy notation.

Let zα be the number such that

Pr {Z < zα} = 1− α

Draw Figure 4.19 (α to the right; 1− α to the left)

Example. z.025 = 1.96

Draw a picture!

(Note: your book uses a capital, Zα)

Page 41: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

General Normal Distribution

Page 42: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

General Normal?

We are good if Y ∼ N (0, 1)

What if Y ∼ N (10, 5) or N (11, 3), or ...

(Directions from “Home”)

Page 43: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Standardization

All normal curves have the same shape, and are simplyrescaled versions of the standard normal density.

Consequently, every area under a general normal curvecorresponds to an area under the standard normal curve.

The key standardization formula is

z =x − µ

σ

Solving for x yieldsx = µ + zσ

which says algebraically that x is z standard deviationsabove the mean.

Page 44: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Probability Example

If X ∼ N(100, 2), find Pr {X > 97.5}.Solution:

Pr {X > 97.5} = Pr(

X − 1002

>97.5− 100

2

)= Pr {Z > −1.25}= 1− Pr {Z < −1.25}= ?

Page 45: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Quantile Example

If X ∼ N(100, 2), find the cutoff values for the middle 70%of the distribution.

Solution: The cutoff points will be the 0.15 and 0.85quantiles.

From the table, 1.03 < z < 1.04 and z = 1.04 is closest.

Thus, the cutoff points are the mean plus or minus 1.04standard deviations.

100− 1.04(2) = 97.92, 100 + 1.04(2) = 102.08

Page 46: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

General form N (µ, σ)

All normal distributions have the same shape.

Transformation

Z =Y − µ

σ∼ N (0, 1)

Systolic blood pressure in healthy adults has a normaldistribution with mean 112 mmHg and standard deviation 10mmHg, i.e. Y ∼ N (112, 10).

One day, I have 92 mmHg.

Pr {Y ≤ 92} = Pr{

Y − 11210

≤ 92− 11210

}= Pr {Z ≤ −2} = 0.0227

Page 47: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

General form N (µ, σ)

µ = 112 mmHg, σ = 10 mmHg.

Pr {102 ≤ Y ≤ 122} = Pr{

102− 11210

≤ Y − 11210

≤ 122− 11210

}= Pr {−1 ≤ Z ≤ 1} = .6826

68.3% of healthy adults have systolic blood pressure between102 and 122 mmHg.A patient’s systolic blood pressure is 137 mmHg.

Pr {Y ≥ 137} = Pr{

Y − 11210

≥ 137− 11210

}= Pr {Z ≥ 2.5} = 1− .9938 = 0.0062

This patient’s blood pressure is very high...

Page 48: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

General form N (µ, σ)µ = 112 mmHg, σ = 10 mmHg.What is “High blood pressure”? For instance, it could be thevalue BP∗ such that

Pr {Y ≤ BP∗} = .95

We need z∗ such that Pr {Z ≤ z∗} = .95. Table in back cover(last line): z∗ = 1.645. Thus BP∗ lies 1.645 standard deviationsabove the mean:

BP∗ = 112 + 1.645 ∗ 10 = 112 + 16.45 = 128.45 mmHg

Formal approach:

.95 = Pr {Y ≤ BP∗} = Pr{

Y − 11210

≤ BP∗ − 11210

}= Pr {Z ≤ z∗}

with z∗ =BP∗ − 112

10i.e. BP∗ = 112 + 10 z∗

Page 49: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Using the Computer

Page 50: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Other Options

Remember you need the tables for the exam.

Statistical software (e.g. R)

Online Calculators. See course website, handouts.

Calculators for Various Statistical Problems

Page 51: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Doing calculations with R

> pnorm(1)[1] 0.8413447> pnorm(2)-pnorm(-2)[1] 0.9544997> pnorm(3)-pnorm(-3)[1] 0.9973002> 1- pnorm(137, mean=112, sd=10)[1] 0.006209665> qnorm(.95, mean=112, sd=10)[1] 128.4485> pbinom(1, size=6, prob=1/6)[1] 0.7367755> dbinom(0:6, size=6, prob=1/6)[1] 0.335 0.402 0.201 0.054 0.008 0.001 0.000

norm normal distributionbinom binomialp probability: Pr {Y ≤ . . .}q quantiled density, or probability mass

function: Pr {Y = . . .}

Page 52: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

A Case Study

Page 53: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Exam Hint

This how exam questions will often look.

I give you some question in the context of a scientific area;you figure out the probabilty calculations to use.

(word problems)

Page 54: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Case Study

ExampleBody temperature varies within individuals over time (it can behigher when one is ill with a fever, or during or after physicalexertion). However, if we measure the body temperature of asingle healthy person when at rest, these measurements varylittle from day to day, and we can associate with each person anindividual resting body temperture. There is, however, variationamong individuals of resting body temperture.

Page 55: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Question

ExampleIn the population, suppose that:

the mean resting body temperature is 98.25 degreesFahrenheit;

the standard deviation is 0.73 degrees Fahrenheit;

resting body temperatures are normally distributed.

Let X be the resting body temperature of a randomly chosenindividual. Find:

1 Pr {X < 98}, the proportion of individuals with temperatureless than 98.

2 Pr {98 < X < 100}, the proportion of individuals withtemperature between 98 and 100.

3 The 0.90 quantile of the distribution.4 The cutoff values for the middle 50% of the distribution.

Page 56: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Outline

1 Introduction: Continuous Random Variables (3.6)

2 Introduction: Normal Distribution (4.1-4.2)

3 Areas Under a Normal Curve (4.3)Standard Normal Table: Computing ProbabilitiesStandard Normal Table: Finding QuantilesGeneral Normal DistributionUsing the ComputerA Case Study

4 Assessing Normality (4.4)

Page 57: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Normal Probability Plots

A standard question begins, “Assuming that variable Y hasa normal distribution,. . . ”. But how do we know thedistribution is approximately normal?

Sometimes we can rely on the central limit theorem.

If given data, there are tests for normality, but there arereasons not to do these.

It is generally better to make a plot that sheds light on thequestion, “Is the data so far from normality as to bias amethod that assumes normality?”

There is no easy answer to the question, but a normalprobability plot is much more informative than the result ofa test.

Page 58: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

What is a Normal Probability Plot?

A normal probability plot is a plot of the sorted sample dataversus something close to the expected z-score for thecorresponding rank of a random normal sample of thesame size.

For example, the expected z-score of the minumum from arandom sample of size 10 from a normal population isabout −1.54.

If the plotted points are close to a straight line, there isevidence that the distribution is close to normal.

If the plotted points are far from a straight line, there isevidence of non-normality.

Page 59: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

The Basic Idea

A normal probability plot plots the ordered observations onthe y-axis (y(1), . . . , y(n)) against some standard “normalscores” x1, . . . , xn on the x-axis.

If the data were truly from a normal distribution it would lieon a “straight” line.

Why a straight line? Y = µ + σZ

Page 60: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Normal Example: All three plots are normal (n = 50)

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

−2 −1 0 1 2

−2

02

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Histogram of x

x

Fre

quen

cy

−3 −2 −1 0 1 2 3

05

1015

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

−2 −1 0 1 2

−2

02

Normal Q−Q Plot

Theoretical QuantilesS

ampl

e Q

uant

iles

Histogram of x

x

Fre

quen

cy

−3 −2 −1 0 1 2 3

05

15

●●●

●●

●●

●●

●●

●●

●●●

●●

● ●

−2 −1 0 1 2

−1

12

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Histogram of x

x

Fre

quen

cy

−2 −1 0 1 2

04

812

Page 61: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Normal Example: All three plots are normal (n = 500)

●●

●●

●●

●●

●●●

●●●

●●

●●●

●●●●

●●

●●●●

●●●

●●

●●

●●●●●

●●

●●

●●●

●●●●●●

●●

●●

●●

●●

●●●

●●●●●

●●●

●●

●●

●●

●●

● ●●

●●

●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●● ●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●●

●●

●●●

●●●

●●

●●●

●●

●●● ●

●●

●●

●●

●●

●●●

●●

●●

−3 −2 −1 0 1 2 3

−3

−1

13

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Histogram of x

x

Fre

quen

cy

−3 −1 0 1 2 3

040

80

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●

● ●

●●

●●●

● ●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●● ●

●●

●●●●●●

●●●

●●●

●●

●●●

●●

●●

●●●●●

●●●

●●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

●●

●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●

●●●●

●●●

●●●●

●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

−3 −2 −1 0 1 2 3

−3

−1

13

Normal Q−Q Plot

Theoretical QuantilesS

ampl

e Q

uant

iles

Histogram of x

x

Fre

quen

cy

−3 −2 −1 0 1 2 3

040

80

●●●

● ●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●●

●●●●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●●●

●●

●●

●●●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●●

●●

●●●●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●●

● ●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●●●

−3 −2 −1 0 1 2 3

−3

02

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Histogram of x

x

Fre

quen

cy

−3 −1 0 1 2 3

040

80

Page 62: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Non-normal Example: One plot is not normal(n = 100)

●●

● ●

●●●●

●●

●●

●●●

●● ●●●

●● ●

●●

●●●

● ●

● ●●

●●

●●

●●

●●

●●

●●

●●●

●●●●●

●●

●●

−2 −1 0 1 2

02

46

8

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Histogram of x

x

Fre

quen

cy

0 2 4 6 8

020

40

●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●● ●

●●

●●

●●

●●

●●

−2 −1 0 1 2

−2

02

Normal Q−Q Plot

Theoretical QuantilesS

ampl

e Q

uant

iles

Histogram of x

x

Fre

quen

cy

−2 −1 0 1 2 3

05

15

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●●●

●●

●●

−2 −1 0 1 2

−2

01

2

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Histogram of x

x

Fre

quen

cy

−2 −1 0 1 2

05

15

Page 63: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

For You

Look at Figures 4.26 - 4.32 (p.136 - 139)

See how different shaped histograms lead to differentnormal probability plots

Page 64: The Normal Distribution - University of Wisconsin–Madison · PDF fileThe Normal Distribution The Normal Distribution is the most important distribution of continuous random variables

Warning!

Learn the normal table today.

Make it part of your being.