the behavior of the sample mean


The Behavior of the Sample Mean

(or Why Confidence Intervals Always Seem to be Based On the Normal Distribution)

[Many of the figures in this note are screen shots from a simulation at the Rice Virtual Lab in Statistics. You might enjoy trying the simulation yourself after (or even while) reading this note. Java must be enabled in your browser for this simulation to run.]

There is arguably no more important lesson to be learned in statistics than how sample means behave. It explains why statistical methods work. The vast majority of what people do with statistics is compare populations, and most of the time populations are compared by comparing their means.

The way individual observations behave depends on the population from which they are drawn. If we draw a sample of individuals from a normally distributed population, the sample will follow a normal distribution. If we draw a sample of individuals from a population with a skewed distribution, the sample values will display the same skewness. Whatever the population looks like--normal, skewed, bimodal, whatever--a sample of individual values will display the same characteristics. This should be no surprise. Something would be very wrong if the sample of individual observations didn't share the characteristics of the parent population.

We are now going to see a truly wondrous result. Statisticians refer to it as The Central Limit Theorem. It says that if you draw a large enough sample, the way the sample mean varies around the population mean can be described by a normal distribution, NO MATTER WHAT THE POPULATION HISTOGRAM LOOKS LIKE!

I'll repeat and summarize because this result is so important. If you draw a large sample, the histogram of the individual observations will look like the population histogram from which the observations were drawn. However, the way the sample mean varies around the population mean can be described by the normal distribution. This makes it very easy to describe the way sample means behave. The way they vary about the population mean, for large samples, is unrelated to the shape of the population histogram.

Let's look at an example. In the picture to the left, the top panel shows a population skewed to the right; the middle panel shows a sample of 25 observations drawn from that population; and the bottom panel shows the sample mean.



The 25 observations show the kind of skewness to be expected from a sample of 25 from this population.

Let's do it again and keep collecting sample means.

And one more time. In each case, the individual observations are spread out in a manner reminiscent of the population histogram. The sample means, however, are tightly grouped. This is not unexpected. In each sample, we get observations from throughout the distribution. The larger values keep the mean from being very small while the smaller values keep the mean from being very large. There are so many observations, some large, some small, that the mean ends up being "average". If the sample contained only a few observations, the sample mean might jump around considerably from sample to sample, but with lots of observations the sample mean doesn't get a chance to change very much.

Since the computer is doing all the work, let's go hog wild and do it 10,000 times!

Here's how those means from the 10,000 samples of 25 observations each behave. They behave like things drawn from a normal distribution centered about the mean of the original population!
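The experiment can be repeated without the applet. The following is only a minimal sketch, not the Rice simulation itself: it assumes an exponential population with mean 1 and SD 1 as a stand-in for the right-skewed population in the figure, and the names `sample_mean`, `POP_MEAN`, and `POP_SD` are made up for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

N_SAMPLES = 10_000   # number of repeated samples, as in the text
SAMPLE_SIZE = 25     # observations per sample, as in the text

# Assumption: an exponential population (mean 1, SD 1) stands in for the
# right-skewed population shown in the figure.
POP_MEAN = 1.0
POP_SD = 1.0

def sample_mean(n):
    """Draw n observations from the skewed population and return their mean."""
    return statistics.fmean(random.expovariate(1.0 / POP_MEAN) for _ in range(n))

means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

center = statistics.fmean(means)               # should sit near POP_MEAN
spread = statistics.stdev(means)               # should sit near sigma / sqrt(n)
expected_spread = POP_SD / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n) = 0.2

# Fraction of sample means within 2 such SDs of the population mean.
within_2 = sum(abs(m - POP_MEAN) <= 2 * expected_spread for m in means) / N_SAMPLES

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f} (sigma/sqrt(n) = {expected_spread:.3f})")
print(f"within 2 SDs of mu:   {within_2:.1%}")
```

Plotting a histogram of `means` should show a roughly bell-shaped pile centered near the population mean, even though the individual observations are strongly skewed.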

At this point, the most common question is, "What's

with the 10,000 means?" and it's a good question.

Once this is sorted out, everything will fall into

place.

We do the experiment only once, that is, we get to see only one sample of 25 observations and one

sample mean.

The reason we draw the sample is to say something about the population mean.

In order to use the sample mean to say something about the population mean, we have to know something about how different the two means can be.

This simulation tells us. The sample mean varies around the population mean as though it came from a normal distribution whose standard deviation is estimated by the Standard Error of the Mean, SEM = s/√n. (More about the SEM below.)

All of the properties of the Normal Distribution apply:

68% of the time, the sample mean and population mean will be within 1 SEM of each other.

95% of the time, the sample mean and population mean will be within 2 SEMs of each other.

99% of the time, the sample mean and population mean will be within 2.58 SEMs of each other, and so on.
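These percentages are rounded properties of the normal curve, and they can be checked with Python's standard library. A small sketch (the helper names `coverage` and `multiplier` are illustrative, not from the note):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def coverage(k):
    """Probability that a normal value lies within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def multiplier(level):
    """Number of SDs needed to capture the central `level` of a normal distribution."""
    return std_normal.inv_cdf(0.5 + level / 2)

for k in (1, 2):
    print(f"within {k} SEM: {coverage(k):.4f}")
for level in (0.95, 0.99):
    print(f"{level:.0%} needs {multiplier(level):.3f} SEMs")
```

The "2 SEMs" used for 95% is a convenient rounding of the exact multiplier, which is slightly smaller than 2.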

We will make formal use of this result in the note on Confidence Intervals.

This result is so important that statisticians have given it a special name. It is called The Central Limit Theorem. It is a limit theorem because it describes the behavior of the sample mean in the limit as the sample size grows large. It is called the Central limit theorem not because there's any central limit, but because it's a limit theorem that is central to the practice of statistics!

The key to the Central Limit Theorem is large sample size. The closer the histogram of the individual data values is to normal, the smaller "large" can be.

If individual observations follow a normal distribution exactly, the behavior of sample means can be described by the normal distribution for any sample size, even 1.

If the departure from normality is mild, "large" could be as few as 10. For biological units measured on a continuous scale (food intake, weight) it's hard to come up with a measurement for which a sample of 100 observations is not sufficient.

One can always be perverse. If a variable is equal to 1 if "struck by lightning" and 0 otherwise, it might

take many millions of observations before the normal distribution can be used to describe the behavior

of the sample mean.

For variables like birth weight, caloric intake, cholesterol level, and crop yield measured on a continuous underlying scale, "large" is somewhere between 30 and 100. Having said this, it's only fair that I try to convince you that it's true.

The vast majority of the measurements we deal with are made on biological units on a continuous scale (cholesterol, birth weight, crop yield, vitamin intakes or levels, income). Most of the rest are indicators of some characteristic (0/1 for absence/presence of premature birth, disease). Very few individual measurements have population histograms that look less normal than one with three bars of equal height at 1, 2, and 9, that is, a population that is one-third 1s, one-third 2s, and one-third 9s. It's not symmetric. One-third of the population is markedly different from the other two-thirds. If the claim is true for this population, then perhaps it's true for population histograms closer to the normal distribution.

The distribution of the sample mean for various sample sizes is shown at the left. When the sample size is 1, the sample mean is just the individual observation. As the number of samples of a single observation increases, the histogram of sample means gets closer and closer to three bars of equal height at 1, 2, 9--the population histogram for individual values. The histogram of sample individual values always looks like the population histogram of individual values as you take more samples of individual values. It does NOT look more and more normal unless the population from which the data are drawn is normal.

When samples of size two are taken, the first observation

is equally likely to be 1, 2 or 9, as is the second

observation.

Obs 1   Obs 2   Mean
  1       1     1.0
  1       2     1.5
  1       9     5.0
  2       1     1.5
  2       2     2.0
  2       9     5.5
  9       1     5.0
  9       2     5.5
  9       9     9.0

The sample mean can take on the values 1, 1.5, 2, 5, 5.5, and 9.

There is only one way for the mean to be 1 (both observations are 1), but

there are two ways to get a mean of 1.5 (the first can be 1 and the second 2, or the first can be 2 and

the second 1).

There is one way to get a mean of 2,

two ways to get a mean of 5,

two ways to get a mean of 5.5, and

one way to get a mean of 9.

Therefore, when many samples of size 2 are taken and their means calculated, 1, 2, and 9 will each occur 1/9

of the time, while 1.5, 5, and 5.5 will each occur 2/9 of the time, as shown in the picture.
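The counting argument above can be reproduced by brute force. This is just an illustrative sketch that lists all nine equally likely pairs and tallies their means exactly with fractions (the variable names are made up for the example):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = (1, 2, 9)  # each value is equally likely

# All 9 equally likely (first, second) pairs and the mean of each pair.
pairs = list(product(population, repeat=2))
counts = Counter(Fraction(a + b, 2) for a, b in pairs)

# Convert the counts to exact probabilities, ordered by the value of the mean.
probs = {mean: Fraction(c, len(pairs)) for mean, c in sorted(counts.items())}

for mean, p in probs.items():
    print(f"mean {float(mean):>4}: probability {p}")
```

Changing `repeat=2` to a larger number gives the corresponding tally for larger samples, although the number of combinations grows quickly.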

And so it goes for all sample sizes. Leave that to the mathematicians. The pictures are correct. Trust me. However, you are welcome to try to construct them for yourself, if you wish.

When n=10, the histogram of the sample means is very bumpy, but is becoming symmetric. When n=25, the histogram looks like a stegosaurus, but the bumpiness is starting to smooth out. When n=50, the bumpiness is reduced and the normal distribution is a good description of the behavior of the sample mean. The behavior (distribution) of the mean of samples of 100 individual values is nearly indistinguishable from the normal distribution to the resolution of the display. If the mean of 100 observations from this population of 1s, 2s, and 9s can be described by a normal distribution, then perhaps the mean of our data can be described by a normal distribution, too.
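For anyone who would rather construct the pictures than take them on trust, here is one possible sketch. It builds the exact distribution of the sum of n draws from the 1, 2, 9 population by repeated convolution, then reports the largest gap between the exact cumulative distribution of the sample mean and the normal curve with the same mean and standard deviation. The function names and the choice of "largest gap" as a summary are assumptions made for this illustration, not part of the original note.

```python
from statistics import NormalDist

POPULATION = (1, 2, 9)  # each value has probability 1/3

def sum_distribution(n):
    """Exact distribution of the sum of n draws, built by repeated convolution."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for total, p in dist.items():
            for value in POPULATION:
                new[total + value] = new.get(total + value, 0.0) + p / len(POPULATION)
        dist = new
    return dist

def max_cdf_gap(n):
    """Largest gap between the exact CDF of the sample mean and its normal approximation."""
    mu = sum(POPULATION) / len(POPULATION)
    var = sum((v - mu) ** 2 for v in POPULATION) / len(POPULATION)
    normal = NormalDist(mu, (var / n) ** 0.5)  # mean mu, SD sigma / sqrt(n)
    gap, cumulative = 0.0, 0.0
    for total, p in sorted(sum_distribution(n).items()):
        cumulative += p  # exact P(sample mean <= total / n)
        gap = max(gap, abs(cumulative - normal.cdf(total / n)))
    return gap

for n in (2, 10, 25, 50, 100):
    print(f"n={n:>3}: largest CDF gap = {max_cdf_gap(n):.4f}")
```

The gap shrinks as n grows, which is the numerical counterpart of the histograms smoothing out.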

When the distribution of the individual observations is

symmetric, the convergence to normal is even faster. In

the diagrams to the left, one-third of the individual

observations are 1s, one-third are 2s, and one-third are

3s. The normal approximation is quite good, even for

samples as small as 10. In fact, even n=2 isn't too bad!

To summarize once again, the behavior of sample

means of large samples can be described by a

normal distribution even when individual

observations are not normally distributed.

This is about as far as we can go without introducing

some notation to maintain rigor. Otherwise, we'll sink into

a sea of confusion over samples and populations or

between the standard deviation and the (about-to-be-

defined) standard error of the mean.





Sample                          Population
  x̄      mean                     μ
  s      standard deviation       σ
  n      sample size

The sample has mean x̄ and standard deviation s. The sample comes from a population of individual values with mean μ and standard deviation σ.

The behavior of sample means of large samples can be described by a normal distribution, but which normal distribution? If you took a course in distribution theory, you could prove the following results: The mean of the normal distribution that describes the behavior of a sample mean is equal to μ, the mean of the distribution of the individual observations. For example, if individual daily caloric intakes have a population mean μ = 1800 kcal, then the mean of 50 of them, say, is described by a normal distribution with a mean also equal to 1800 kcal.

The standard deviation of the normal distribution that describes the behavior of the sample mean is equal to the standard deviation of the individual observations divided by the square root of the sample size, that is, σ/√n. Our estimate of this quantity, s/√n, is called the Standard Error of the Mean (SEM), that is,

SEM = s/√n.

I don't have a nonmathematical answer for the presence of the square root. Intuition says the mean should vary less from sample to sample as the sample sizes grow larger. This is reflected in the SEM, which decreases as the sample size increases, but it drops like the square root of the sample size, rather than the sample size itself.
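For readers who do want the mathematical answer, it is short. Assuming the n observations X_1, ..., X_n are independent draws from a population with standard deviation σ (an assumption that holds for random sampling), variances of independent quantities add, and a constant factor 1/n comes out of a variance squared:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\sigma^{2}}{n^{2}}
  = \frac{\sigma^{2}}{n},
\qquad\text{so}\qquad
\operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.
```

The variance of the mean drops like the sample size; the square root appears only when converting that variance back to a standard deviation.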

To recap...

1. There are probability distributions. They do two things.

They describe the population, that is, they say what proportion of the population can be found

between any specified limits.

They describe the behavior of individual members of the population, that is, they give the probability that an individual selected at random from the population will lie between any specified limits.

2. When single observations are being described, the "population" is obvious. It is the population of individuals from which the sample is drawn. When probability distributions are used to describe statistics such as sample means, there is a population, too. It is the (hypothetical) collection of values of the statistic should the experiment or sampling procedure be repeated over and over.

3. (Most important and often ignored!) The common statistical procedures we will be discussing are based on the probabilistic behavior of statistical measures. They are guaranteed to work as advertised, but only if the data arise from a probability based sampling scheme or from randomizing subjects to treatments. If the data do not result from random sampling or randomization, there is no way to judge the reliability of statistical procedures based on random sampling or randomization.

The Sample Mean As an Estimate of The Population Mean





These results say that for large sample sizes the behavior of sample means can be described by a normal distribution whose mean is equal to the population mean of the individual values, μ, and whose standard deviation is equal to σ/√n, which is estimated by the SEM. In a course in probability theory, we use this result to make statements about a yet-to-be-obtained sample mean when the population mean is known. In statistics, we use this result to make statements about an unknown population mean when the sample mean is known.

Preview: Let's suppose we are talking about 100 dietary intakes and the SEM is 40 kcal. The results of this note say the behavior of the sample mean can be described by a normal distribution whose SD is 40 kcal. We know that when things follow a normal distribution, they will be within 2 SDs of the population mean 95% of the time. In this case, 2 SDs is 80 kcal. Thus, the sample mean and population mean will be within 80 kcal of each other 95% of the time.

If we were told the population mean were 2000 kcal and were asked to predict the sample mean, we would say there's a 95% chance that our sample mean would be in the range (1920[=2000-80], 2080[=2000+80]) kcal.

It works the other way, too. If the population mean is unknown, but the sample mean is 1980 kcal, we

would say we were 95% confident that the population mean was in the range (1900[=1980-80],

2060[=1980+80]) kcal.
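The arithmetic in both directions is the same "center plus or minus 2 SEMs" calculation. A tiny sketch, using a hypothetical helper `two_sem_interval` and the numbers from the preview:

```python
def two_sem_interval(center, sem, multiplier=2):
    """Return (low, high) = center -/+ multiplier * sem."""
    half_width = multiplier * sem
    return center - half_width, center + half_width

SEM = 40  # kcal, from the preview above

# Probability direction: population mean known (2000), predict the sample mean.
prediction_range = two_sem_interval(2000, SEM)
# Statistics direction: sample mean known (1980), estimate the population mean.
confidence_range = two_sem_interval(1980, SEM)

print("sample mean expected in:    ", prediction_range)
print("population mean estimated in:", confidence_range)
```

Only the interpretation differs between the two calls: the first is a statement about a sample mean not yet observed, the second about an unknown population mean.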

Note: The use of the word confident in the previous sentence was not accidental. Confident and confidence are the technical words used to describe this type of estimation activity. Further discussion occurs in the notes on Confidence Intervals.

The decrease of SEM with sample size reflects the common sense idea that the more data you have, the better you can estimate something. Since the SEM goes down like the square root of the sample size, the bad news is that to cut the uncertainty in half, the sample size would have to be quadrupled. The good news is that if you can gather only half of the planned data, the uncertainty is only about 40% larger than what it would have been with all of the data, not twice as large.
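Both claims follow directly from SEM = s/√n. In the sketch below, the values s = 400 kcal and n = 100 are hypothetical, chosen only so that the planned SEM comes out to the 40 kcal used in the preview; the ratios do not depend on them.

```python
import math

def sem(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

s, n = 400.0, 100  # hypothetical SD and planned sample size (gives SEM = 40 kcal)

baseline = sem(s, n)        # planned study
quadrupled = sem(s, 4 * n)  # four times the data
halved = sem(s, n // 2)     # only half the data

print(f"n={n}: SEM = {baseline:.1f}")
print(f"n={4 * n}: SEM = {quadrupled:.1f} ({quadrupled / baseline:.2f}x)")
print(f"n={n // 2}: SEM = {halved:.1f} ({halved / baseline:.2f}x)")
```

Quadrupling the sample halves the SEM, while halving the sample multiplies it by the square root of 2, roughly 1.41.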

Potential source of confusion: How can the SEM be an SD? Probability distributions have means and standard deviations. This is true of the probability distribution that describes individual observations and the probability distribution that describes the behavior of sample means drawn from that population. Both of these distributions have the same mean, denoted μ here. If the standard deviation of the distribution that describes the individual observations is σ, then the standard deviation of the distribution that describes the sample mean is σ/√n, which is estimated by the SEM.

When you write your manuscripts, you'll talk about the SD of individual observations and the SEM as a

measure of uncertainty of the sample mean as an estimate of the population mean. You'll never see anyone

describing the SEM as estimating the SD of the sample mean. However, we have to be aware of this role for

the SEM if we are to be able to understand and discuss statistical methods clearly.


Copyright 2000-2004 Gerard E. Dallal

Last modified: Wed, 23 May 2012 02:52:24 GMT.


