the behavior of the sample mean


  • 7/30/2019 The Behavior of the Sample Mean


    The Behavior of the Sample Mean

    (or Why Confidence Intervals Always Seem to be Based On the Normal Distribution)

    [Many of the figures in this note are screen shots from a simulation at the Rice Virtual Lab in Statistics. You might enjoy trying the simulation yourself after (or even while) reading this note. Java must be enabled in your browser for this simulation to run.]

    There is arguably no more important lesson to be learned in statistics than how sample means behave. It explains why statistical methods work. The vast majority of the things people do with statistics is compare populations, and most of the time populations are compared by comparing their means.

    The way individual observations behave depends on the population from which they are drawn. If we draw a sample of individuals from a normally distributed population, the sample will follow a normal distribution. If we draw a sample of individuals from a population with a skewed distribution, the sample values will display the same skewness. Whatever the population looks like--normal, skewed, bimodal, whatever--a sample of individual values will display the same characteristics. This should be no surprise. Something would be very wrong if the sample of individual observations didn't share the characteristics of the parent population.

    We are now going to see a truly wondrous result. Statisticians refer to it as The Central Limit Theorem. It says that if you draw a large enough sample, the way the sample mean varies around the population mean can be described by a normal distribution, NO MATTER WHAT THE POPULATION HISTOGRAM LOOKS LIKE!

    I'll repeat and summarize because this result is so important. If you draw a large sample, the histogram of the individual observations will look like the population histogram from which the observations were drawn. However, the way the sample mean varies around the population mean can be described by the normal distribution. This makes it very easy to describe the way sample means behave. The way they vary about the population mean, for large samples, is unrelated to the shape of the population histogram.

    Let's look at an example. In the picture to the left,

    the top panel shows a population skewed to the right

    the middle panel shows a sample of 25 observations drawn from that population

    the bottom panel shows the sample mean.

    4/23/2013 The Behavior of the Sample Mean

    errydallal.com/LHSP/meandist.htm 1/6


    The 25 observations show the kind of skewness to be expected from a sample of 25 from this population.

    Let's do it again and keep collecting sample means.

    And one more time. In each case, the individual observations are spread out in a manner reminiscent of the population histogram. The sample means, however, are tightly grouped. This is not unexpected. In each sample, we get observations from throughout the distribution. The larger values keep the mean from being very small while the smaller values keep the mean from being very large. There are so many observations, some large, some small, that the mean ends up being "average". If the sample contained only a few observations, the sample mean might jump around considerably from sample to sample, but with lots of observations the sample mean doesn't get a chance to change very much.

    Since the computer is doing all the work, let's go hog wild and do it 10,000 times!

    Here's how the means from those 10,000 samples of 25 observations each behave. They behave like things drawn from a normal distribution centered about the mean of the original population!
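
    If you'd rather not rely on the simulation applet, the same experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the right-skewed population is taken to be an exponential distribution with mean 1 and standard deviation 1 (the population used in the pictures isn't specified), and the names and the seed are choices made for this sketch.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 25       # observations per sample, as in the text
NUM_SAMPLES = 10_000   # number of repeated samples, as in the text


def draw_observation():
    # Assumed right-skewed population: exponential with mean 1 and SD 1.
    return random.expovariate(1.0)


# Draw 10,000 samples of 25 observations and keep only each sample's mean.
sample_means = [
    statistics.fmean(draw_observation() for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of the sample means: {mean_of_means:.3f}")
print(f"SD of the sample means:   {sd_of_means:.3f}")
print(f"population SD / sqrt(n):  {1.0 / SAMPLE_SIZE ** 0.5:.3f}")
```

    With this assumed population, the sample means should cluster around the population mean of 1, and their spread should be close to the population SD divided by the square root of 25.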

    At this point, the most common question is, "What's

    with the 10,000 means?" and it's a good question.

    Once this is sorted out, everything will fall into

    place.

    We do the experiment only once, that is, we get to see only one sample of 25 observations and one

    sample mean.

    The reason we draw the sample is to say something about the population mean.

    In order to use the sample mean to say something about the population mean, we have to know something about how different the two means can be.

    This simulation tells us. The sample mean varies around the population mean as though

    it came from a normal distribution

    whose standard deviation is estimated by the Standard Error of the Mean, SEM = s/√n. (More about the SEM below.)

    All of the properties of the Normal Distribution apply:

    68% of the time, the sample mean and population mean will be within 1 SEM of each other.

    95% of the time, the sample mean and population mean will be within 2 SEMs of each other.

    99% of the time, the sample mean and population mean will be within 2.57 SEMs of each other, and so on.
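
    These percentages are rounded properties of the normal curve, and they can be checked with Python's standard library. The sketch below is just a check, not part of the original note; the function names `coverage` and `multiplier` are made up for the illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def multiplier(level):
    """Number of SDs on either side of the mean that captures `level` of the area."""
    return standard_normal.inv_cdf(0.5 + level / 2.0)


for k in (1.0, 2.0):
    print(f"within {k:.0f} SEM(s): {coverage(k):.4f}")
for level in (0.68, 0.95, 0.99):
    print(f"{level:.0%} coverage needs {multiplier(level):.3f} SEMs")
```

    The exact multipliers differ slightly from the round numbers quoted above (for example, 95% corresponds to about 1.96 SEMs rather than exactly 2), which is why "2 SEMs" is best read as a convenient approximation.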

    We will make formal use of this result in the note on Confidence Intervals.

    This result is so important that statisticians have given it a special name. It is called The Central Limit Theorem. It is a limit theorem because it describes the behavior of the sample mean in the limit as the sample size grows large. It is called the Central limit theorem not because there's any central limit, but because it's a limit theorem that is central to the practice of statistics!

    The key to the Central Limit Theorem is large sample size. The closer the histogram of the individual data values is to normal, the smaller "large" can be.

    If individual observations follow a normal distribution exactly, the behavior of sample means can be described by the normal distribution for any sample size, even 1.

    If the departure from normality is mild, "large" could be as few as 10. For biological units measured on a continuous scale (food intake, weight) it's hard to come up with a measurement for which a sample of 100 observations is not sufficient.

    One can always be perverse. If a variable is equal to 1 if "struck by lightning" and 0 otherwise, it might

    take many millions of observations before the normal distribution can be used to describe the behavior

    of the sample mean.

    For variables like birth weight, caloric intake, cholesterol level, and crop yield measured on a continuous underlying scale, "large" is somewhere between 30 and 100.

    Having said this, it's only fair that I try to convince you that it's true.

    The vast majority of the measurements we deal with are made on biological units on a continuous scale (cholesterol, birth weight, crop yield, vitamin intakes or levels, income). Most of the rest are indicators of some characteristic (0/1 for absence/presence of premature birth, disease). Very few individual measurements have population histograms that look less normal than one with three bars of equal height at 1, 2, and 9, that is, a population that is one-third 1s, one-third 2s, and one-third 9s. It's not symmetric. One-third of the population is markedly different from the other two-thirds. If the claim is true for this population, then perhaps it's true for population histograms closer to the normal distribution.

    The distribution of the sample mean for various sample sizes is shown at the left. When the sample size is 1, the sample mean is just the individual observation. As the number of samples of a single observation increases, the histogram of sample means gets closer and closer to three bars of equal height at 1, 2, 9--the population histogram for individual values. The histogram of sample individual values always looks like the population histogram of individual values as you take more samples of individual values. It does NOT look more and more normal unless the population from which the data are drawn is normal.

    When samples of size two are taken, the first observation

    is equally likely to be 1, 2 or 9, as is the second

    observation.

    Obs 1   Obs 2   Mean
      1       1     1.0
      1       2     1.5
      1       9     5.0
      2       1     1.5
      2       2     2.0
      2       9     5.5
      9       1     5.0
      9       2     5.5
      9       9     9.0

    The sample mean can take on the values 1, 1.5, 2, 5, 5.5, and 9.

    There is only one way for the mean to be 1 (both observations are 1), but

    there are two ways to get a mean of 1.5 (the first can be 1 and the second 2, or the first can be 2 and

    the second 1).

    There is one way to get a mean of 2,

    two ways to get a mean of 5,

    two ways to get a mean of 5.5, and

    one way to get a mean of 9.

    Therefore, when many samples of size 2 are taken and their means calculated, 1, 2, and 9 will each occur 1/9

    of the time, while 1.5, 5, and 5.5 will each occur 2/9 of the time, as shown in the picture.
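
    The counting argument for samples of size 2 can be reproduced by brute force. The sketch below simply enumerates the nine equally likely ordered pairs; the variable names are made up for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = (1, 2, 9)  # equally likely values, as in the text

# Every ordered pair (first observation, second observation) is equally likely.
pairs = list(product(population, repeat=2))

# Count how many pairs produce each possible sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in pairs)

# Convert counts of ways into probabilities.
distribution = {
    mean: Fraction(ways, len(pairs)) for mean, ways in sorted(counts.items())
}

for mean, prob in distribution.items():
    print(f"mean {float(mean):>3}: probability {prob}")
```

    Changing `repeat=2` to a larger number extends the same enumeration to bigger samples, although the number of cases grows as 3 to the power of the sample size, so this approach is only practical for small samples.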

    And so it goes for all sample sizes. Leave that to the mathematicians. The pictures are correct. Trust me. However, you are welcome to try to construct them for yourself, if you wish.

    When n=10, the histogram of the sample means is very bumpy, but is becoming symmetric. When n=25, the histogram looks like a stegosaurus, but the bumpiness is starting to smooth out. When n=50, the bumpiness is reduced and the normal distribution is a good description of the behavior of the sample mean. The behavior (distribution) of the mean of samples of 100 individual values is nearly indistinguishable from the normal distribution to the resolution of the display. If the mean of 100 observations from this population of 1s, 2s, and 9s can be described by a normal distribution, then perhaps the mean of our data can be described by a normal distribution, too.
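
    For readers who would like to construct the distributions for themselves, here is one possible sketch. It builds the exact distribution of the sum of n draws from the 1, 2, 9 population by repeated convolution, then reports the exact probability that the sample mean falls within 2 standard deviations of the population mean; for a normal distribution that probability is about 0.9545. The choice of this particular summary, and all the names in the code, are assumptions of the sketch rather than anything taken from the pictures.

```python
from collections import defaultdict
from math import sqrt

VALUES = (1, 2, 9)          # each value has probability 1/3, as in the text
MU = sum(VALUES) / 3        # population mean (equals 4)
SIGMA = sqrt(sum((v - MU) ** 2 for v in VALUES) / 3)  # population SD


def sum_distribution(n):
    """Exact distribution of the sum of n independent draws (sum -> probability)."""
    dist = {0: 1.0}
    for _ in range(n):
        new_dist = defaultdict(float)
        for total, prob in dist.items():
            for v in VALUES:
                new_dist[total + v] += prob / 3
        dist = dict(new_dist)
    return dist


def prob_within(n, k=2.0):
    """Exact P(|sample mean - MU| <= k * SIGMA / sqrt(n))."""
    half_width = k * SIGMA / sqrt(n)
    return sum(
        p for total, p in sum_distribution(n).items()
        if abs(total / n - MU) <= half_width
    )


for n in (1, 2, 10, 25, 50, 100):
    print(f"n={n:>3}: P(within 2 SDs of the mean) = {prob_within(n):.4f}")
```

    Plotting `sum_distribution(n)` with the sums divided by n gives the histogram of the sample mean for any sample size you care to try.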

    When the distribution of the individual observations is

    symmetric, the convergence to normal is even faster. In

    the diagrams to the left, one-third of the individual

    observations are 1s, one-third are 2s, and one-third are

    3s. The normal approximation is quite good, even for

    samples as small as 10. In fact, even n=2 isn't too bad!

    To summarize once again, the behavior of sample

    means of large samples can be described by a

    normal distribution even when individual

    observations are not normally distributed.

    This is about as far as we can go without introducing

    some notation to maintain rigor. Otherwise, we'll sink into

    a sea of confusion over samples and populations or

    between the standard deviation and the (about-to-be-

    defined) standard error of the mean.


                          Sample   Population
    mean                    x̄          μ
    standard deviation      s          σ
    sample size             n

    The sample has mean x̄ and standard deviation s. The sample comes from a population of individual values with mean μ and standard deviation σ.

    The behavior of sample means of large samples can be described by a normal distribution, but which normal distribution? If you took a course in distribution theory, you could prove the following results: The mean of the normal distribution that describes the behavior of a sample mean is equal to μ, the mean of the distribution of the individual observations. For example, if individual daily caloric intakes have a population mean μ = 1800 kcal, then the mean of 50 of them, say, is described by a normal distribution with a mean also equal to 1800 kcal.

    The standard deviation of the normal distribution that describes the behavior of the sample mean is equal to the standard deviation of the individual observations divided by the square root of the sample size, that is, σ/√n. Our estimate of this quantity, s/√n, is called the Standard Error of the Mean (SEM), that is,

    SEM = s/√n.
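
    As a small illustration, the SEM can be computed directly from a sample. The caloric intakes below are made-up numbers used only to show the arithmetic, and `standard_error_of_mean` is a name chosen for this sketch.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """SEM = s / sqrt(n), with s the sample standard deviation (n - 1 divisor)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    return statistics.stdev(sample) / math.sqrt(n)


# Hypothetical daily caloric intakes (kcal); invented for illustration only.
intakes = [1650, 1720, 1810, 1900, 1780, 1840, 1690, 2010, 1750]

print(f"mean = {statistics.fmean(intakes):.1f} kcal")
print(f"s    = {statistics.stdev(intakes):.1f} kcal")
print(f"SEM  = {standard_error_of_mean(intakes):.1f} kcal")
```

    Note that `statistics.stdev` uses the n - 1 divisor, which is the usual definition of the sample standard deviation s.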

    I don't have a nonmathematical answer for the presence of the square root. Intuition says the mean should vary less from sample to sample as the sample sizes grow larger. This is reflected in the SEM, which decreases as the sample size increases, but it drops in proportion to one over the square root of the sample size, rather than one over the sample size itself.
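
    The mathematical answer, at least, is short. As a sketch, assume the n observations X_1, ..., X_n are drawn independently from a population with variance σ² (the independence assumption and the X_i notation are introduced here for the derivation). Variances of independent quantities add, and a constant factor comes out of a variance squared, so:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\sigma^{2}}{n^{2}}
  = \frac{\sigma^{2}}{n},
\qquad\text{so}\qquad
\operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.
```

    It is the variance that falls like the sample size; the square root appears only because the standard deviation is the square root of the variance.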

    To recap...

    1. There are probability distributions. They do two things.

    They describe the population, that is, they say what proportion of the population can be found

    between any specified limits.

    They describe the behavior of individual members of the population, that is, they give the probability that an individual selected at random from the population will lie between any specified limits.

    2. When single observations are being described, the "population" is obvious. It is the population of individuals from which the sample is drawn. When probability distributions are used to describe statistics such as sample means, there is a population, too. It is the (hypothetical) collection of values of the statistic should the experiment or sampling procedure be repeated over and over.

    3. (Most important and often ignored!) The common statistical procedures we will be discussing are based on the probabilistic behavior of statistical measures. They are guaranteed to work as advertised, but only if the data arise from a probability-based sampling scheme or from randomizing subjects to treatments. If the data do not result from random sampling or randomization, there is no way to judge the reliability of statistical procedures based on random sampling or randomization.

    The Sample Mean As an Estimate of The Population Mean


    These results say that for large sample sizes the behavior of sample means can be described by a normal distribution whose mean is equal to the population mean of the individual values, μ, and whose standard deviation is equal to σ/√n, which is estimated by the SEM. In a course in probability theory, we use this result to make statements about a yet-to-be-obtained sample mean when the population mean is known. In statistics, we use this result to make statements about an unknown population mean when the sample mean is known.

    Preview: Let's suppose we are talking about 100 dietary intakes and the SEM is 40 kcal. The results of this note say the behavior of the sample mean can be described by a normal distribution whose SD is 40 kcal. We know that when things follow a normal distribution, they will be within 2 SDs of the population mean 95% of the time. In this case, 2 SDs is 80 kcal. Thus, the sample mean and population mean will be within 80 kcal of each other 95% of the time.

    If we were told the population mean were 2000 kcal and were asked to predict the sample mean, we would say there's a 95% chance that our sample mean would be in the range (1920[=2000-80], 2080[=2000+80]) kcal.

    It works the other way, too. If the population mean is unknown, but the sample mean is 1980 kcal, we

    would say we were 95% confident that the population mean was in the range (1900[=1980-80],

    2060[=1980+80]) kcal.
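
    The arithmetic in both directions is the same "center plus or minus 2 SEMs" calculation, which the following sketch makes explicit. The function name `two_sem_interval` is made up for this illustration; the numbers are the ones used above.

```python
def two_sem_interval(center, sem, multiplier=2.0):
    """Return (low, high) = center -/+ multiplier * SEM."""
    half_width = multiplier * sem
    return (center - half_width, center + half_width)


SEM_KCAL = 40  # from the text: 100 dietary intakes with SEM = 40 kcal

# Population mean known (2000 kcal): range where the sample mean should fall.
print(two_sem_interval(2000, SEM_KCAL))   # (1920.0, 2080.0)

# Sample mean known (1980 kcal): range for the unknown population mean.
print(two_sem_interval(1980, SEM_KCAL))   # (1900.0, 2060.0)
```

    The only thing that changes between the two uses is which mean is treated as known and placed at the center of the interval.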

    Note: The use of the word "confident" in the previous sentence was not accidental. "Confident" and "confidence" are the technical words used to describe this type of estimation activity. Further discussion occurs in the notes on Confidence Intervals.

    The decrease of SEM with sample size reflects the common sense idea that the more data you have, the better you can estimate something. Since the SEM goes down like the square root of the sample size, the bad news is that to cut the uncertainty in half, the sample size would have to be quadrupled. The good news is that if you can gather only half of the planned data, the uncertainty is only about 40% larger than what it would have been with all of the data, not twice as large.
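
    Both figures follow from the square root. As a quick check, and assuming the sample standard deviation s stays about the same regardless of how much data is collected, the SEM relative to the planned study is one over the square root of the fraction of the planned data actually gathered. The function name is made up for this sketch.

```python
from math import sqrt


def relative_uncertainty(fraction_of_planned_data):
    """SEM relative to the planned study when only a fraction of the data is collected.

    Assumes the sample standard deviation s is the same in both cases.
    """
    return 1.0 / sqrt(fraction_of_planned_data)


print(f"4x the data:   SEM is {relative_uncertainty(4.0):.2f} times the planned SEM")
print(f"half the data: SEM is {relative_uncertainty(0.5):.2f} times the planned SEM")
```

    Half the data gives a factor of the square root of 2, about 1.41, which is the "40% larger" quoted above.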

    Potential source of confusion: How can the SEM be an SD? Probability distributions have means and standard deviations. This is true of the probability distribution that describes individual observations and the probability distribution that describes the behavior of sample means drawn from that population. Both of these distributions have the same mean, denoted μ here. If the standard deviation of the distribution that describes the individual observations is σ, then the standard deviation of the distribution that describes the sample mean is σ/√n, which is estimated by the SEM.

    When you write your manuscripts, you'll talk about the SD of individual observations and the SEM as a

    measure of uncertainty of the sample mean as an estimate of the population mean. You'll never see anyone

    describing the SEM as estimating the SD of the sample mean. However, we have to be aware of this role for

    the SEM if we are to be able to understand and discuss statistical methods clearly.


    Copyright 2000-2004 Gerard E. Dallal

    Last modified: Wed, 23 May 2012 02:52:24 GMT.
