8/10/2019 Lecture7 Handouts
Lecture 7: Sampling and Sampling Distributions
Matt Golder & Sona Golder
Pennsylvania State University
Introduction
Over the last several lectures, we have attempted to obtain a working knowledge of probability theory.
We now turn our attention to the application of that theory.
Specifically, we will employ probability theory as a means to make inferences about the world.
Introduction
Figure: Research In A Single Figure
Recall that statistical inference involves drawing inferences about the population from a sample (subgroup) taken from the population.
Introduction
To the extent that a sample differs from its population, that difference can be due to two factors.
1 Bias: Some aspect of the sampling mechanism (or the research design) causes the sample to be systematically unrepresentative of the population in one or more relevant ways.
2 Sampling Error: Any difference between the sample and the population that is nonsystematic, but is instead due to the randomness in the sample selection design.
In general, bias is a far bigger and more complicated threat to valid inference than is sampling error.
Sampling
Sampling involves drawing a subgroup of units from some larger population.
As we've seen, the units that make up the population are known as the units of analysis.
The population in question is often referred to as the sampling frame.
The sample size (the number of units in the sample) is usually denoted by N.
The population size (if it is known) is usually denoted by Ñ.
And the things that are being sampled are called the primary sampling units (PSUs).
Simple Random Sampling
Simple random sampling begins with a population of size Ñ, and randomly draws N units from the population in any fashion that ensures that the probability of any one unit being drawn for the sample is 1/Ñ.
In a simple random sample, the PSUs are the same as the units of analysis themselves.
In general, a simple random sample is the best sort of sample to have.
It leads to the simplest means of inference, and will (in the long run) be the most representative type of sample vis-à-vis the population.
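As a quick sketch of the idea (in Python, with a hypothetical sampling frame of numbered units; the slides' own examples use Stata and R), simple random sampling just draws N units without replacement so that each remaining unit is equally likely on every draw:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; each unit is equally likely per draw."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# hypothetical sampling frame of 1,000 numbered units
population = list(range(1000))
sample = simple_random_sample(population, 10, seed=42)
```

Here the PSUs are the units of analysis themselves, as the slide notes.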
Simple Random Sampling
A simple random sample is generally the hardest to obtain.
A simple random sample requires a lot on the part of the researcher.
She must:
1 Know every unit in her population (its existence, location, etc.), in order to assign it for sampling.
2 Be able to include all selected units in the sample that is drawn.
Simple Random Sampling
Know every unit in her population (its existence, location, etc.), in order to assign it for sampling.
This can be difficult if the population is (i) large, (ii) ever-changing, (iii) amorphous, and/or (iv) hidden in some way.
These problems can result in a particular form of sampling bias, such as the inability to ensure that the probability of each unit's being sampled is 1/Ñ.
Simple Random Sampling
Be able to include all selected units in the sample that is drawn.
This raises the issue of nonresponse bias: what if a survey respondent is simply unreachable?
So, simple random sampling can be difficult. And, it turns out that we can often do better if we are interested in a particular subpopulation within the population.
Stratified Sampling
A stratified sample is one in which the population is divided a priori into two or more groups, and individuals are sampled randomly from within those groups.
The groups are known as strata, and they are often selected on the basis of the study's subject matter.
The primary sampling units remain the same; they are just divided up into strata.
Stratified sampling is often done when there is interest in some particular characteristics of the population.
Stratified Sampling in Stata
To see how stratified sampling is done in Stata, we can take a look at Zorn's data on the 7,161 fully-decided cases handed down by the Warren and Burger courts.
us is a string variable for the U.S. citation of the case,
id is just an ID variable,
amrev is the number of amicus curiae ("friend of the court") briefs filed supporting reversal, and
amaff is the number of such briefs supporting affirmance.
sumam is just amrev + amaff,
fedpet is 1 if the federal government was the petitioner, 0 otherwise,
constit is 1 if the decision was made on constitutional grounds, 0 otherwise, and
sgam is 1 if the Solicitor General (the U.S. government's attorney) filed an amicus curiae brief in the case, and 0 otherwise.
Stratified Sampling in Stata
Only 25% of the decisions were made on constitutional grounds.
. sum constit

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     constit |      7161    .2535959    .4350993          0          1
Notes
Notes
Notes
http://find/http://find/http://find/ -
8/10/2019 Lecture7 Handouts
5/35
Stratified Sampling in Stata
Suppose we sample 10 observations each from constit = 0 and constit = 1.

. sample 10, count by(constit)
(7141 observations deleted)

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          us |         0
          id |        20     3923.55     2287.04         94       7032
       amrev |        20          .3    .6569467          0          2
       amaff |        20          .2     .615587          0          2
       sumam |        20          .5    1.147079          0          4
-------------+--------------------------------------------------------
      fedpet |        20          .3    .4701623          0          1
     constit |        20          .5    .5129892          0          1
        sgam |        20         .05    .2236068          0          1
Stratified Sampling in Stata
Suppose we sample 5% of the observations, stratified by constit.

. sample 5, by(constit)
(6803 observations deleted)

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          us |         0
          id |       358    3613.561    1928.962         55       7161
       amrev |       358     .396648    1.213459          0         12
       amaff |       358    .2793296    .8951582          0          9
       sumam |       358    .6759777    1.695614          0         13
-------------+--------------------------------------------------------
      fedpet |       358    .1675978    .3740315          0          1
     constit |       358    .2541899    .4360143          0          1
        sgam |       358    .0614525    .2404946          0          1
Stratified Sampling in R
Suppose we sample 10 observations each from constit = 0 and constit = 1.

> library(sampling)
> sample <- strata(data, stratanames = "constit", size = c(10, 10), method = "srswor")
> sample.data <- getdata(data, sample)
> summary(sample.data)

(Here data stands in for the data frame holding the cases.)
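The same stratified draw can be sketched in pure Python (the case records below are hypothetical stand-ins for Zorn's data, with about 25% of cases decided on constitutional grounds, as in the slides):

```python
import random

def stratified_sample(units, stratum_of, k, seed=None):
    """Draw k units at random from within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    draw = []
    for members in strata.values():
        draw.extend(rng.sample(members, k))
    return draw

# hypothetical cases: (case_id, constit), with roughly 25% constit == 1
cases = [(i, 1 if i % 4 == 0 else 0) for i in range(7161)]
draw = stratified_sample(cases, stratum_of=lambda c: c[1], k=10, seed=7)
```

This mirrors Stata's `sample 10, count by(constit)`: ten cases from each stratum, regardless of the strata's population shares.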
Stratified Sampling
Suppose we divide the population into two subgroups as we just did with our Stata and R examples; call them A and B.
If the proportion of group A in the sample is equal to its proportion in the population, then we say we are taking a proportional stratified sample.
We obtained a proportional stratified sample when we sampled 5% of the observations stratified by constit.
If the proportion of some group A in the sample is different from its proportion in the population, then we say we have oversampled or undersampled that group.
Given that only 25% of the population decisions were based on constitutional grounds, we oversampled on constit = 1 when we sampled 10 observations each from constit = 0 and constit = 1.
Stratified Sampling
Oversamples are often done when one or more groups of interest, such as Native Americans, are a relatively small fraction of the total population.
With over- and under-sampling, the probability of selection for any given unit is no longer 1/Ñ, and so we must adjust our sample to reflect the differential probabilities, by giving different weights to observations from different strata.
Stratified sampling requires that we know more about our population than is the case with simple random sampling.
Cluster Sampling
Cluster sampling is slightly different from stratified sampling.
In a clustered sample, the units of analysis are grouped into clusters.
However, the clusters themselves are then sampled randomly, and all units in each selected cluster are used in the sample.
Thus, cluster sampling changes the identity of the PSU, from the unit of analysis to the cluster.
This means that the probability of an individual unit of analysis being sampled is no longer equal to all others; moreover, it may not even be known directly at all.
Cluster samples are used extensively: most major surveys are done via cluster sampling (often by telephone area code and exchange).
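The contrast with stratified sampling can be sketched as follows: here the cluster, not the individual unit, is what gets drawn at random (Python, with hypothetical telephone-exchange clusters):

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters (the PSUs); keep every unit inside each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

# hypothetical clusters: 40 telephone exchanges with 5 respondents each
clusters = {f"exch{i}": [f"resp{i}_{j}" for j in range(5)] for i in range(40)}
sample = cluster_sample(clusters, n_clusters=3, seed=3)
```

Every respondent in a chosen exchange enters the sample; respondents in unchosen exchanges have no chance at all, which is why individual selection probabilities differ from simple random sampling.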
Multi-Stage Sampling
Multi-stage sampling is a generalization of cluster sampling.
1 Select a type of cluster, and identify subclusters of units within the cluster, etc., until we get to the lowest-level cluster in which units are located.
2 Select randomly or in a stratified way some number of top-level clusters.
3 Within each selected cluster, select again, randomly or stratifying, some number of subclusters.
4 Within subclusters, select sub-subclusters, etc.
5 At the lowest subcluster level, select some number of units from each sub-cluster.
Multi-Stage Sampling
The first-stage clusters are the primary sampling units, the second-stage are the secondary sampling units, and so forth.
Example: When sampling survey respondents, first select by blocks, then by houses within blocks, then by residents within each (selected) house.
This is a three-stage design.
The blocks are clusters, the houses are subclusters, and the individuals are the end units being sampled.
Multi-Stage Sampling
Multistage sampling can be useful in that we can obtain a probability sample without having to know the identity of each potential unit in the sampling frame.
In our example, we don't have to know exactly who lives in each house, as long as we have a rule that says something like "select one person from among those in each house with equal probability" at the last stage.
Most large national surveys are conducted using multistage sampling, either by addresses/locations or by telephone exchanges.
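The block/house/resident example above can be sketched in Python (the frame below is hypothetical; note that residents of unselected houses never need to be enumerated in practice):

```python
import random

def three_stage_sample(blocks, n_blocks, n_houses, seed=None):
    """Stage 1: sample blocks; stage 2: houses within selected blocks;
    stage 3: one resident per selected house, chosen with equal probability."""
    rng = random.Random(seed)
    sample = []
    for b in rng.sample(sorted(blocks), n_blocks):
        for h in rng.sample(sorted(blocks[b]), n_houses):
            sample.append(rng.choice(blocks[b][h]))
    return sample

# hypothetical frame: 10 blocks x 6 houses x 1-4 residents
blocks = {b: {h: [f"r{b}.{h}.{i}" for i in range(1 + (b + h) % 4)]
              for h in range(6)}
          for b in range(10)}
sample = three_stage_sample(blocks, n_blocks=2, n_houses=3, seed=1)
```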
Probability Samples
So far we have discussed probability samples: we can calculate the probability of a given unit in the population being selected for the sample.
These probabilities can be difficult to calculate with multiple stages and stratification, etc., but it can be done.
As a result, we can use something like weighting to adjust the results so that the sample has the statistical properties of a simple random sample.
Probability Samples: Weighting
Figure: Sample Weighting (Brooker & Schaefer, N = 1,000)

Educational Level      % in Pop.   Sample Number   Sample %   Over/Under Rep.   Weight (Fraction)   Weight (Decimal)   Weighted Number in Sample
College Graduate          20%           300           30%          30/20              20/30               .667                   200
Some College              28%           330           33%          33/28              28/33               .848                   280
High School Graduate      37%           270           27%          27/37              37/27              1.37                    370
Less than HS Grad         15%           100           10%          10/15              15/10              1.50                    150
Total                    100%         1,000          100%                                                                      1,000
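The arithmetic in the figure is simply: weight = (percent in population) / (percent in sample), and weighted count = weight × observed count. A short Python check of the figure's numbers:

```python
# level: (share of population, number observed in the sample of 1,000)
rows = {
    "College Graduate":  (0.20, 300),
    "Some College":      (0.28, 330),
    "HS Graduate":       (0.37, 270),
    "Less than HS Grad": (0.15, 100),
}
total = sum(n for _, n in rows.values())                     # 1,000 respondents
weight = {k: p / (n / total) for k, (p, n) in rows.items()}  # pop% / sample%
weighted_n = {k: weight[k] * rows[k][1] for k in rows}       # rescaled counts
```

After weighting, each education level's weighted count matches its population share of the 1,000 respondents.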
Nonprobability Samples
Nonprobability samples are samples where the probability that every unit is in the sample cannot be known, either because the sample is not probabilistic at all, or because it depends on probabilities that the researcher does not know or control.
The researcher cannot correct for any unrepresentativeness in the sampling mechanism after the fact.
Nonprobability samples will be questionably representative at best, and disastrously wrong at worst.
Nonprobability Samples
Convenience Sampling: The researcher samples whatever units come most readily to hand. In this case, probability is not used in the sampling at all.
Purposive Sampling: The researcher selects units on the basis of whether she believes they ought to be in the sample. Again, there is no use of probability at all in the sampling method.
Snowball Sampling: The researcher selects a unit, and then other units with some relationship to that first unit are sampled, and so forth. In such a design, even if the initial sample is random/probabilistic, the probabilities of selection for subsequent units cannot be known by the researcher (even if, for example, they are known to the units themselves). With snowball sampling, the respondents are often similar to one another since they result from the first respondent's social network.
Nonprobability Samples
Modern Straw Polls: Respondents call 1-800 numbers or use the internet to voice an opinion. The samples suffer from many things, including self-selection problems.
Quota Sampling: The researcher selects units of various types up to some quota (for example, s/he might question 100 men and 100 women) and then stops, no longer selecting units of that type.
If quotas are combined with (say) convenience sampling, then they generate a nonprobability sample (Gallup 1948, Dewey-Truman fiasco).
If quota sampling is combined with a probability sampling method, it can be a potentially viable (if messy) way of sampling.
Sampling Error
Sampling error, sometimes called the margin of error (MOE), is just the (random) difference between the population parameter and its sample estimate.
Although we don't know the population parameter, we can put bounds on the sampling error.

Standard error = √(q(1 − q)/N)

where N is the sample size and q is the calculated quantity (proportion) of interest.
As we'll see shortly, we typically calculate a relative sampling error for a particular level of confidence.
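The formula can be checked directly. The sketch below also applies the usual z multiplier for a 95% confidence level (1.96), which the slides turn to shortly; the sample size and proportion used are illustrative:

```python
import math

def se_proportion(q, n):
    """Standard error of a sample proportion: sqrt(q(1 - q)/N)."""
    return math.sqrt(q * (1 - q) / n)

def margin_of_error(q, n, z=1.96):
    """Margin of error at a given confidence level (z = 1.96 for 95%)."""
    return z * se_proportion(q, n)

se = se_proportion(0.5, 1000)     # worst case is q = 0.5
moe = margin_of_error(0.5, 1000)  # the familiar "plus or minus 3 points"
```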
Sampling Error: Sample Size
Figure: Sampling Error vs. Sample Size for Simple Random Sample
(The figure plots the relative margin of error against sample size N from 0 to 500, with separate curves for the 90, 95, and 99 percent confidence levels.)
The margin of error is always decreasing, at a decreasing rate, in the sample size.
Sampling Error: Sample Design
Calculating the margin of error can be difficult with complicated sample designs.
1 Simple random sampling is relatively easy, and the sample error will generally be small.
2 Stratified samples can make the margin of error even smaller if the stratification is done in a way that balances the sample across the strata.
3 Cluster samples and their variants will have higher margins of error than those for which the units of analysis are the PSUs.
Sampling Error: Population Size
Holding the sample size constant, increasing the population size increases the margin of error, but with a high degree of diminishing returns.
When the population is small and approaches the sample size, then the sample does a better job of being representative.
We use finite population corrections to capture this when N ≥ 5% of the population size Ñ. Obviously, when N = Ñ, the margin of error is zero.
Randomization
Beyond random sampling, the practice of randomization is a powerful tool for learning about the world.
The main value of randomization is to prevent confounding effects from biasing our results.
Confounding usually occurs when an extraneous (third) variable has an influence on both the cause and the effect that we are studying.
In the social sciences, we typically see two types of randomization:
1 Random treatment assignment.
2 Orthogonalization.
Observational studies typically focus on orthogonalization.
Random Treatment Assignment
Randomizing a treatment is the single best way to assess the causal relationship between some factor and some outcome.
It involves (i) randomly assigning subjects to treatment and control groups, (ii) administering the treatment or placebo, and (iii) measuring the outcomes.
Random treatment assignment is often difficult or impossible in much of the social sciences.
Orthogonalization
Randomization can also be useful even if no treatment per se is administered.
Example: The U.S. Courts of Appeals decide cases in three-judge panels: (i) three judges from a circuit are randomly assigned to a panel, and (ii) cases are randomly assigned to that panel.
Although there is no treatment here, randomization allows us to address a number of interesting questions. For instance, randomization gives us leverage on whether gender makes a difference, since we can be sure that a given case is no more likely to be decided by a panel of male judges than a panel of female ones.
In effect, randomization makes it possible that all those potentially confounding factors in our study are no longer related to the variables we care about.
Sampling Distributions
A population is considered known if we know the probability distribution f(x) of the associated random variable X.
There will be certain quantities that appear in f(x), such as μ and σ in the case of the normal distribution, or p in the case of the binomial distribution. These quantities are known as population parameters.
Oftentimes, we may not know everything about the population parameters.
For example, we may not know one or both of the values of μ or σ in a normal distribution, and we may want to draw statistical inferences about these population parameters from a sample.
Sample Statistics
Table: A Population of 100 Students' Heights

Height x   Frequency f   Relative Frequency f(x)   x f(x)   (x − μ)² f(x)
   60           1                 0.01               0.60        0.81
   63           6                 0.06               3.78        2.16
   66          24                 0.24              15.84        2.16
   69          38                 0.38              26.22        0
   72          24                 0.24              17.28        2.16
   75           6                 0.06               4.50        2.16
   78           1                 0.01               0.78        0.81
 Total        100                 1.00           μ = 69.00   σ² = 10.26  (σ = 3.20)

The goal is to take a random sample from the population and then use the sample to obtain values (sample statistics) to estimate and test hypotheses about the population parameters.
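The table's parameter calculations can be verified directly from the frequency column:

```python
# frequency table of the 100 student heights from the table above
freq = {60: 1, 63: 6, 66: 24, 69: 38, 72: 24, 75: 6, 78: 1}
n = sum(freq.values())                                     # 100 students
mu = sum(x * f for x, f in freq.items()) / n               # population mean
var = sum((x - mu) ** 2 * f for x, f in freq.items()) / n  # population variance
sigma = var ** 0.5                                         # population std. dev.
```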
Sample Statistics
Suppose we want to draw a random sample of size N = 100 from our population.
We would start by choosing an individual at random from the population. This individual can have only one value, say x1, of the various possible heights. We call x1 the value of a random variable X1, where the subscript 1 corresponds to the first individual chosen.
Similarly, we can choose the second individual for the sample, who can have any one of the values x2 of the possible heights, and x2 can be taken as the value of a random variable X2.
We can continue this process up to X100 because we want a sample size of 100.
Sample Statistics
In the general case, a sample of size N would be described by the values x1, x2, ..., xN of the random variables X1, X2, ..., XN.
In the case of sampling with replacement, X1, X2, ..., XN would be independent, identically distributed random variables having probability distribution f(x).
This means that when we randomly draw each observation from a population probability distribution f(x), we get individual observations that each have the population probability distribution f(x), i.e., each observation has the mean μ and standard deviation σ of the population.
The relative frequency column can be viewed as the probability distribution not only of the population but also of a single observation taken at random.
The probability of getting someone with a height of 63 in our sample (0.06) is the same as the probability of 63 occurring in the population.
Sample Statistics
Definition: A (very) simple random sample is a sample whose N observations X1, X2, ..., XN are independent random variables. The distribution of each X is the population distribution f(x); that is,

f(x1) = f(x2) = ... = f(xN) = population distribution, f(x)

Then each observation has the mean μ and standard deviation σ of the population.
So, our sample is made up of N random variables X1, X2, ..., XN, each with probability distribution f(x). Their joint distribution, or likelihood, is

Pr(X1 = x1, X2 = x2, ..., XN = xN) = f(x1) f(x2) ... f(xN)

Recall that this is the probability of getting all the values/observations in our sample.
Sample Statistics
Any quantity obtained from a sample for the purpose of estimating a population parameter is called a sample statistic, or simply a statistic.
Mathematically, a sample statistic, such as the mean, for a sample of size N can be defined as a function of the random variables X1, X2, ..., XN, i.e., g(X1, X2, ..., XN).
The function g(X1, X2, ..., XN) is itself a random variable, whose values can be represented by g(x1, x2, ..., xN). The word "statistic" is often used for the random variable itself or for its values.
In general, corresponding to each population parameter, there will be a statistic to be computed from the sample.
Sampling Distributions
Anything that is a function of random variables is, itself, a random variable.
A sample statistic that is computed from X1, X2, ..., XN is a function of these random variables and is therefore itself a random variable.
As a result, the sample statistic will have its own probability distribution.
The probability distribution of a sample statistic is often called the sampling distribution of the statistic.
Sampling Distributions
What does all this mean?
Suppose we draw a sample of 100 from our population of 100 students and calculate a sample mean X̄.
This is only one sample that we could have drawn from the population and hence only one sample mean.
Suppose we draw a second sample of 100 from the population and calculate the mean of this sample. It is unlikely that the new mean will be exactly the same as the first mean.
This is what we mean by saying that sample statistics (in this case the mean) are themselves random variables; they vary from one sample to another.
As a result, the estimated statistics will have their own sampling distribution.
Sampling Distributions
In effect, the frequentist perspective on statistical inference starts with the idea that our sample is just one of many possible samples that we could have drawn from the population.
A sampling distribution of a statistic can be thought of as the theoretical distribution of some sample statistic (a mean, a variance, etc.) that we would observe through repeated sampling.
Thus, although we will typically only ever have one sample, we can take account of the fact that other samples would have led to different statistics by looking at the sampling distribution.
Sampling Distributions
If we knew the population, we could easily learn about the sampling distribution.
But we often don't have the population, or even if we did it would be impractical to sample from it enough to learn about the sampling distribution.
There are usually two ways of learning about a sampling distribution.
1 Theoretical derivation: Try to theoretically derive the sampling distribution.
2 Experimental derivation: Use computers to generate a known population (we control the parameters) and then simulate the sampling process on this population. This is known as a Monte Carlo analysis.
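The second approach can be sketched in a few lines: treat the 100 student heights as a known population, simulate many samples of size N = 4 with replacement, and watch the sample means pile up around μ = 69 with a spread near σ/√N = 1.6 (the simulation details here are illustrative, not from the slides):

```python
import random

random.seed(0)
# known population: the 100 student heights from the earlier table
population = ([60] * 1 + [63] * 6 + [66] * 24 + [69] * 38 +
              [72] * 24 + [75] * 6 + [78] * 1)

def sample_mean(n):
    draw = random.choices(population, k=n)   # sampling with replacement
    return sum(draw) / n

means = [sample_mean(4) for _ in range(20000)]   # 20,000 repeated samples
grand_mean = sum(means) / len(means)             # should sit near mu = 69
spread = (sum((m - grand_mean) ** 2 for m in means) / len(means)) ** 0.5
```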
Moments of the Sample Mean
Suppose we are interested in estimating a population mean μ. If we take a random sample of observations from this population and calculate the sample mean X̄, how good will X̄ be as an estimator of its target μ?
Let X1, X2, ..., XN denote independent and identically distributed random variables for a random sample of size N. Then the sample mean is a random variable itself:

X̄ = (X1 + X2 + ... + XN) / N

If x1, x2, ..., xN denote the values obtained in a particular sample of size N, then we can also denote the sample mean as:

x̄ = (x1 + x2 + ... + xN) / N
Moments of the Sample Mean: Mean
Each random draw Xi is itself a random variable.
As a result, X̄ is a linear combination of random variables and is therefore itself a random variable with a mean and standard deviation.

μ_X̄ = E(X̄) = [E(X1) + E(X2) + ... + E(XN)] / N

Recall that each observation X in a random sample has the population distribution f(x) with mean μ.
Thus, E(X1) = E(X2) = ... = E(XN) = μ, and therefore

μ_X̄ = E(X̄) = (μ + μ + ... + μ) / N = Nμ / N = μ
Moments of the Sample Mean: Mean
What this shows is that the expected value of the sample mean is equal to the population mean, i.e., on average, X̄ = μ.
Put differently, the sample mean X̄ is an unbiased estimator of the population mean μ.
Moments of the Sample Mean: Variance
Since each independently drawn Xi is itself a random variable, the variance of X̄ will be a linear combination of the variance of each Xi.

σ²_X̄ = Var(X̄) = E[(X̄ − μ)²]
      = Var((1/N) Σ_{i=1}^{N} Xi)
      = (1/N²) Σ_{i=1}^{N} Var(Xi)
      = (1/N²) [Var(X1) + Var(X2) + ... + Var(XN)]
      = Nσ² / N² = σ² / N

(The third equality uses the independence of the Xi.)
Moments of the Sample Mean: Variance
The standard deviation of X̄ is:

σ_X̄ = Standard Deviation of X̄ = σ / √N

The standard deviation of a statistic is also referred to as the standard error of the statistic.
The standard error gives one a rough idea of how much variation there will be in our estimate X̄ as a function of our sample size.
It should be clear that the precision of our estimate X̄ grows with sample size.
Because X̄ is an unbiased estimator of the population mean μ, this means that the sampling distribution of X̄ becomes more and more concentrated around the true population value as the sample size increases, i.e., X̄ is a consistent estimator of μ.
Moments of the Sample Mean: Variance
Example: Suppose we were to take many samples of size N = 4 from our population of 100 student heights and calculate the sample mean in each case. How would the sample means fluctuate?
If the sample size were 4, the sample means would fluctuate around the target of μ = 69 with a standard error of 1.6, i.e.,

E(X̄) = μ_X̄ = μ = 69
σ_X̄ = SE(X̄) = σ / √N = 3.2 / √4 = 3.2 / 2 = 1.6
Moments of the Sample Mean: Variance
Example: Suppose the sample size quadrupled to N = 16. How would the sample means fluctuate?
If the sample size were 16, the sample means would fluctuate around the target of μ = 69 with a standard error of 0.8, i.e.,

E(X̄) = μ_X̄ = μ = 69
σ_X̄ = SE(X̄) = σ / √N = 3.2 / √16 = 3.2 / 4 = 0.8
One must quadruple the sample size to halve the sampling error.
Moments of the Sample Mean: Variance
If the population is of size Ñ, if sampling is without replacement, and if the sample size is N ≤ Ñ, then the variance of the sample mean is given by:

σ²_X̄ = (σ² / N) × (Ñ − N) / (Ñ − 1)

and the standard error is

σ_X̄ = (σ / √N) × √[(Ñ − N) / (Ñ − 1)]

You'll notice that as Ñ → ∞, the variance reduces back to σ²_X̄ = σ²/N and the standard error reduces back to σ_X̄ = σ/√N.
Obviously, all of this assumes that the population size is known.
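A small helper makes the correction concrete (using Ñ for the population size, as above; the specific numbers below are illustrative):

```python
import math

def se_mean(sigma, n, pop_size=None):
    """Standard error of the sample mean; applies the finite population
    correction sqrt((Ntilde - N)/(Ntilde - 1)) when Ntilde is known."""
    se = sigma / math.sqrt(n)
    if pop_size is not None:
        se *= math.sqrt((pop_size - n) / (pop_size - 1))
    return se

plain = se_mean(3.2, 50)                # ignores the population size
fpc = se_mean(3.2, 50, pop_size=100)    # sampling half the population
zero = se_mean(3.2, 100, pop_size=100)  # N = Ntilde: no sampling error left
```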
Shape and the Central Limit Theorem
At this point, we have defined the first and second moments of the distribution for the sample mean.
Theorem: If X is a random variable with mean μ and variance σ², then, whatever the distribution of X, the sampling distribution of X̄ has the same mean μ, and variance equal to σ²/N.
But what does the sampling distribution look like? What is its shape?
This brings us back to the Central Limit Theorem .
Shape and the Central Limit Theorem
Central Limit Theorem: If X has any distribution with mean μ and variance σ², then the distribution of

(X̄ − μ_X̄) / σ_X̄ = (X̄ − μ) / (σ/√N) = √N (X̄ − μ) / σ

approaches the standard normal distribution as the sample size increases.
In other words, the standardized variable associated with X̄ is asymptotically normal.
This means that the distribution of X̄ in large samples is approximately normal with mean μ and variance σ²/N, i.e.,

X̄ = (1/N) Σ_{i=1}^{N} Xi ~ N(μ, σ²/N)
Shape and the Central Limit Theorem
This result is incredibly powerful.
It means that no matter what the distribution of the population or random variable X, we can approximate the distribution of the sample mean X̄ with the normal distribution so long as the sample size is large enough.
When X is distributed normally, then X̄ is always normally distributed irrespective of sample size.
For non-normal variables, the CLT provides an approximation for finite samples. In general, the approximation is good when the sample size is large enough (N > 30 is usually good) but can also be quite good even for small samples (N = 10 or 20) when the variable has a symmetric distribution.
Shape and the Central Limit Theorem
To illustrate the Central Limit Theorem, see the applet at http://www.mathcs.org/java/programs/CLT/clt.html
To illustrate the sampling distribution of the mean, the central limit theorem, and the sampling distribution of the variance, see the "Distributions Related to Normal" applet on the Brooks/Cole student companion site.
Shape and the Central Limit Theorem
The central limit theorem allows us to use the familiar standard normal tablesto determine how closely a sample mean X will estimate a population mean .
Example: With a sample of N = 10, we have

Expected Height = E(X̄) = μ_X̄ = μ = 69

and

Standard Error = σ_X̄ = σ/√N = 3.2/√10 = 1.02

What is the probability that the sample mean X̄ will be within 2 inches of the population mean when N = 10? In other words, what is Pr(67 < X̄ < 71)?
Shape and the Central Limit Theorem
z1 = (67 - 69)/1.02 = -1.96

z2 = (71 - 69)/1.02 = 1.96

This implies that Pr(67 < X̄ < 71) = Pr(-1.96 < z < 1.96).

Pr(z > 1.96) = 0.025

And by symmetry,

Pr(z < -1.96) = 0.025
Shape and the Central Limit Theorem
Thus, the probability that the sample mean X̄ will be within 2 inches of the population mean (i.e., between 67 and 71) is:

Pr(67 < X̄ < 71) = 1 - Pr(z > 1.96) - Pr(z < -1.96) = 0.95

In other words, there is a 95% chance that the sample mean will be within 2 inches of the population mean.
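The arithmetic above can be checked numerically. Here is a minimal Python sketch (Python rather than the Stata/R used later in these slides, purely for illustration) that computes the standard error and the interval probability from the standard normal CDF:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 69.0, 3.2, 10
se = sigma / sqrt(n)                 # standard error of the sample mean
z1 = (67 - mu) / se
z2 = (71 - mu) / se
prob = norm_cdf(z2) - norm_cdf(z1)   # Pr(67 < Xbar < 71)
print(round(se, 2), round(prob, 2))
```

Using the unrounded standard error (about 1.01; the slides round to 1.02) gives an interval probability of about 0.95, matching the result above.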
8/10/2019 Lecture7 Handouts
21/35
Shape and the Central Limit Theorem
Example: Suppose a large class in statistics has marks normally distributed around a mean of 72 with a standard deviation of 9. Find the probability that an individual student drawn at random will have a mark over 80.
z = (x - μ)/σ = (80 - 72)/9 = 0.89

Pr(x > 80) = Pr(z > 0.89) = 0.187, or about 19%
Shape and the Central Limit Theorem
Now find the probability that a random sample of 10 students will have an average mark over 80.
According to the CLT, X̄ has an approximately normal distribution with an expected value of 72 and a standard error of σ/√N = 9/√10 = 2.85.

z = (x̄ - μ)/SE = (80 - 72)/2.85 = 2.81

Pr(x̄ > 80) = Pr(z > 2.81) = 0.002

Although there is a reasonable chance (19%) that one student will score above 80, there is very little chance (0.2%) that a sample average of 10 students will perform this well.
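Both probabilities can be verified with the same normal-CDF approach (a Python sketch for illustration only; the slides' own code is in Stata and R):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 72.0, 9.0
p_one = 1 - norm_cdf((80 - mu) / sigma)               # one student: ~19%
p_avg = 1 - norm_cdf((80 - mu) / (sigma / sqrt(10)))  # mean of 10: ~0.2%
print(round(p_one, 3), round(p_avg, 3))
```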
What happens if the population of students is not normally distributed?
Shape and the Central Limit Theorem
Example: Achievement test scores of all high school seniors in a state have mean 60 and variance 64. A random sample of N = 100 students from one large high school had a mean of 58. Is there evidence to suggest that this high school is inferior? Calculate the probability that the sample mean is at most 58 when N = 100.
Let x̄ denote the mean of a random sample of N = 100 scores from a population with μ = 60 and σ² = 64.

We want to approximate Pr(x̄ ≤ 58).
Shape and the Central Limit Theorem
z = (x̄ - μ)/SE = (58 - 60)/(8/√100) = (58 - 60)/0.8 = -2.5

Pr(x̄ ≤ 58) = Pr(z ≤ -2.5) = 0.0062
Because this probability is so small, it is unlikely that the sample from the school of interest can be regarded as a random sample from a population with μ = 60 and σ² = 64.
In other words, the evidence suggests that this high school is inferior.
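The same normal-CDF check works here (an illustrative Python sketch, not part of the slides' Stata/R code):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, var, n = 60.0, 64.0, 100
se = sqrt(var / n)               # 8/10 = 0.8
p = norm_cdf((58 - mu) / se)     # Pr(xbar <= 58), ~0.0062
print(round(p, 4))
```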
Moments of Sample Proportion
Sometimes we will be interested in using a proportion P from the sample to make an inference about the proportion π in the population.
Like X̄, P also fluctuates from sample to sample in a pattern (sampling distribution) that is easily summarized.
In random samples of size N, the sample proportion P fluctuates around the population proportion π with variance

σ²_P = π(1 - π)/N

and standard error

σ_P = √(π(1 - π)/N).
Thus, as N increases, the sampling distribution of P concentrates more and more around its target π.
For large values of N (N ≥ 30), the sampling distribution is very nearly a normal distribution.
Moments of Sample Proportion
Example: Suppose we had a population of voters: 60% Republicans and 40% Democrats.

The sampling distribution for the proportion P of Republicans when the sample size is N = 100 would be normally distributed around the target population proportion π = 0.60 with a standard error σ_P = √(π(1 - π)/N) = √(0.6 × 0.4/100) = 0.05.

How likely is it that a poll of 100 would contain a minority of Republicans?
Moments of Sample Proportion
A minority means that the proportion P of Republicans is less than 50%.
z = (p - π)/SE = (0.5 - 0.6)/0.05 = -2

Pr(P ≤ 0.5) = Pr(z ≤ -2) = 0.023, or about 2%
As you can see, the likelihood that a sample of 100 voters would contain a minority of Republicans is quite low.
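The calculation can be reproduced directly from the formulas (an illustrative Python sketch; the slides round the standard error to 0.05, while the unrounded value is about 0.049):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

pi, n = 0.6, 100
se = sqrt(pi * (1 - pi) / n)     # ~0.049 (0.05 after rounding)
p = norm_cdf((0.5 - pi) / se)    # Pr(P <= 0.5), ~2%
print(round(se, 3), round(p, 3))
```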
Moments of Sample Proportion
Example: Of your first 15 grandchildren, what is the chance there will be more than 10 boys?

More than 10 boys is the same as saying that the proportion of boys is more than 10/15.
Thus, the proportion P of boys in a sample of 15 will fluctuate around the population proportion π = 0.5 with a standard error

σ_P = √(π(1 - π)/N) = √(0.5 × 0.5/15) = 0.129.
Moments of Sample Proportion
z = (p - π)/SE = (10/15 - 0.5)/0.129 = 1.29

Pr(P ≥ 10/15) = Pr(z ≥ 1.29) = 0.099, or about 10%
Thus, the chance of more than 10 boys is about 10%.
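A Python check of this example (illustrative only). Because N = 15 is small, it is also instructive to compare the slides' normal approximation with the exact binomial probability, which is somewhat lower:

```python
from math import comb, erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, pi = 15, 0.5
se = sqrt(pi * (1 - pi) / n)                  # ~0.129
p_norm = 1 - norm_cdf((10 / n - pi) / se)     # normal approximation, ~10%
p_exact = sum(comb(n, k) for k in range(11, n + 1)) / 2 ** n  # exact binomial
print(round(p_norm, 3), round(p_exact, 3))
```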
Moments of Sample Proportion
If the population is of size N, if sampling is without replacement, and if the sample size is n ≤ N (here n denotes the sample size and N the population size), then the variance of the sample proportion is given by

σ²_P = [π(1 - π)/n] × [(N - n)/(N - 1)]

and the standard error is

σ_P = √( [π(1 - π)/n] × [(N - n)/(N - 1)] ).

As N → ∞, the variance reduces back to σ²_P = π(1 - π)/n and the standard error reduces back to σ_P = √(π(1 - π)/n).

Obviously, all of this assumes that the population size is known.
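These formulas are easy to wrap in a small helper (a hypothetical Python function written for illustration; `se_proportion` is an invented name, not from the slides):

```python
from math import sqrt

def se_proportion(pi, n, N=None):
    """Standard error of a sample proportion.

    Applies the finite population correction (N - n)/(N - 1)
    when the population size N is supplied."""
    var = pi * (1 - pi) / n
    if N is not None:
        var *= (N - n) / (N - 1)
    return sqrt(var)

print(round(se_proportion(0.6, 100), 4))            # no correction
print(round(se_proportion(0.6, 100, N=100000), 4))  # huge N: almost identical
print(round(se_proportion(0.6, 100, N=200), 4))     # small N: noticeably smaller
```

The third call shows why the correction matters only when the sample is a sizable fraction of the population.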
Proportions and Sample Means
What you should have noticed is that the sample proportion is just a disguised sample mean.
All proportions essentially come from a sample in which observations can be thought of as taking on a value of 1 or 0.
In other words, proportions are sample means for dichotomous variables.
Proportions and Sample Means
Table: Population Mean and Variance for a 0-1 Variable

x    f(x)       x·f(x)    x²·f(x)
0    (1 - π)    0         0
1    π          π         π

μ = E(x) = π

E(x²) = π

σ² = E(x²) - μ² = π - π² = π(1 - π)

σ = √(π(1 - π))

Similarly, the variance and standard error of P are equal to the variance and standard error of X̄:

σ²_X̄ = σ²/N = π(1 - π)/N = σ²_P
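This equivalence is easy to see numerically: for a 0-1 variable, the sample mean is the sample proportion, and the (population-style) variance is p(1 - p). A Python sketch with made-up data:

```python
data = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]      # hypothetical 0/1 sample
n = len(data)
p = sum(data) / n                          # proportion of 1s = sample mean
var = sum((x - p) ** 2 for x in data) / n  # divides by N, not N-1
print(p, var)                              # var equals p * (1 - p)
```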
Sampling Distribution of the Variance
In addition to the mean, we might think to use the sample variance as an estimate of the population variance.
We defined the sample variance as:

s² = [1/(N - 1)] Σ (Xi - X̄)²

where the sum runs over i = 1, …, N.

It turns out that s² is an unbiased estimator of σ². Why?
Sampling Distribution of the Variance
It can be shown that:

Σ (Xi - X̄)² = Σ Xi² - (1/N)(Σ Xi)² = Σ Xi² - N·X̄²

where all sums run over i = 1, …, N. Hence

E[Σ (Xi - X̄)²] = E[Σ Xi²] - N·E(X̄²) = Σ E(Xi²) - N·E(X̄²)
Sampling Distribution of the Variance
Notice that E(Xi²) is the same for i = 1, 2, …, N. This, together with the fact that the variance of a random variable is given by Var(X) = E(X²) - [E(X)]², means that

1. E(Xi²) = Var(Xi) + [E(Xi)]² = σ² + μ²,

2. E(X̄²) = Var(X̄) + [E(X̄)]² = σ²/N + μ², and that

3. E[Σ (Xi - X̄)²] = Σ (σ² + μ²) - N(σ²/N + μ²)
                  = N(σ² + μ²) - σ² - N·μ²
                  = N·σ² - σ²
                  = (N - 1)σ²
Sampling Distribution of the Variance
Thus,

E(s²) = [1/(N - 1)] E[Σ (Xi - X̄)²] = [1/(N - 1)](N - 1)σ² = σ²

In other words, s² is an unbiased estimator for σ².

As with the mean X̄, s² also has a sampling distribution.
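Unbiasedness can also be checked by simulation: across many samples, the average of s² should sit close to σ². A minimal Python sketch (the parameters are arbitrary choices for illustration):

```python
import random

random.seed(1)
mu, sigma, n, reps = 0.0, 2.0, 5, 20000
s2_values = []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # divide by N - 1
    s2_values.append(s2)
avg_s2 = sum(s2_values) / reps
print(round(avg_s2, 2))   # close to sigma**2 = 4
```

Dividing by N instead of N - 1 would make the average come out systematically below 4.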
Sampling Distribution of the Variance
It's useful to rewrite

s² = [1/(N - 1)] Σ (Xi - X̄)²

as

(N - 1)s²/σ² = (1/σ²) Σ (Xi - X̄)²

by multiplying both sides by N - 1 (to get the sum of squares) and dividing both sides by σ² (to normalize the resulting statistic to the scale of X).

It can be shown that this quantity is distributed χ² with N - 1 degrees of freedom. Why?
Sampling Distribution of the Variance
For the intuition, suppose we have N = 2 normal variables X1 and X2.

When N = 2, then X̄ = (1/2)(X1 + X2), and we have:

s² = (X1 - X2)²/2.

This can be rewritten as

(N - 1)s²/σ² = (X1 - X2)²/(2σ²) = [(X1 - X2)/(√2·σ)]²
Sampling Distribution of the Variance
Notice that

(N - 1)s²/σ² = [(X1 - X2)/(√2·σ)]²

is nothing more than the square of a standard normal variable, because

E(X1 - X2) = 0 and Var(X1 - X2) = 2σ².

We already know that such a variable is a chi-square variable with one (that is, N - 1) degree of freedom.

More generally, it is the case that the sampling distribution of the (rescaled) variance is distributed as chi-square with N - 1 degrees of freedom.
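This, too, can be checked by simulation: the rescaled variance should match the mean (N - 1) and variance 2(N - 1) of a chi-square variable with N - 1 degrees of freedom. A Python sketch with N = 6, so 5 degrees of freedom (the parameters are arbitrary):

```python
import random

random.seed(2)
sigma, n, reps = 1.0, 6, 20000
q = []
for _ in range(reps):
    x = [random.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    q.append((n - 1) * s2 / sigma ** 2)   # rescaled variance
mean_q = sum(q) / reps
var_q = sum((v - mean_q) ** 2 for v in q) / reps
print(round(mean_q, 1), round(var_q, 1))  # near 5 and 10 (chi-square, 5 df)
```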
Sampling Distribution of the Variance
The take-away point from all of this is that the statistics you calculate are, themselves, random variables, with their own sampling distributions and characteristics.
Central Limit Simulation in Stata
Here is a program to draw 1 sample of size 10 from a standard uniform distribution and return the sample mean:
. program define onesample, rclass
. drop _all
. quietly set obs 10
. generate x = runiform()
. summarize x
. return scalar meanforonesample = r(mean)
. end
Central Limit Simulation in Stata
Let's look at one sample.

. set seed 10101
. onesample

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |        10    .4515166    .2999441   .0524637   .9794558

. return list

scalars:
    r(meanforonesample) = .4515166394412518
Central Limit Simulation in Stata
The simulate command runs a specified command a specified number of times.
So, we could draw 1,000 samples of N = 10 from a standard uniform.
. simulate xbar = r(meanforonesample), seed(10101) reps(1000) nodots: onesample
. sum xbar

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        xbar |      1000    .4987486    .0888575   .2452242   .7403472
Central Limit Simulation in Stata
Figure: Central Limit Theorem

[Histogram of the 1,000 sample means: density on the vertical axis; r(meanforonesample), from about .2 to .7, on the horizontal axis.]
. hist xbar
As the sample size and number of simulations increase, the results become closer to a normal distribution.
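The same exercise is easy to reproduce outside Stata; here is an equivalent Python sketch (illustrative only):

```python
import random

random.seed(10101)
reps, n = 1000, 10
# 1,000 sample means, each from 10 draws of a standard uniform
xbars = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]
m = sum(xbars) / reps
sd = (sum((x - m) ** 2 for x in xbars) / (reps - 1)) ** 0.5
# Uniform(0,1) has mean 0.5 and sd sqrt(1/12) ~ 0.289,
# so the mean of 10 draws should have sd ~ 0.289/sqrt(10) ~ 0.091
print(round(m, 3), round(sd, 3))
```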
Central Limit Simulation in Stata
Let's take a look at Zorn's data on the 7,161 fully-decided cases handed down by the Warren and Burger courts.

For our purposes, we can think of these as the population of cases in these years.

We'll focus on one variable: fedpet: 1 if the federal government is the petitioner, 0 otherwise.
Central Limit Simulation in Stata
Figure: fedpet Histogram

[Density on the vertical axis; Federal govt. petitioner (0 to 1) on the horizontal axis.]
This clearly has a bimodal distribution.
Central Limit Simulation in Stata
. program define meansamp, rclass
. version 11
. use WarrenBurger.dta, clear
. sample 10, count
. tempvar z
. gen `z' = fedpet
. summarize `z'
. return scalar mean = r(mean)
. end

. simulate mean = r(mean), seed(10101) reps(1000) nodots: meansamp
. histogram mean
Central Limit Simulation in Stata
Figure: 1,000 Means for fedpet, Sample Size = 10

[Histogram: density on the vertical axis; r(mean), from 0 to .8, on the horizontal axis.]
The distribution is not exactly normal, and there's a lot of variation in the sample mean.
Central Limit Simulation in Stata
Now let's increase the sample size to N = 100.

. program define meansamp100, rclass
. version 11
. use WarrenBurger.dta, clear
. sample 100, count
. tempvar z
. gen `z' = fedpet
. summarize `z'
. return scalar mean = r(mean)
. end

. simulate mean = r(mean), seed(10101) reps(1000) nodots: meansamp100
. histogram mean
Central Limit Simulation in Stata
Figure: 1,000 Means for fedpet, Sample Size = 100

[Histogram: density on the vertical axis; r(mean), from .05 to .3, on the horizontal axis.]
We see that there is now a very small range for X̄, and the distribution is almost perfectly normal.
Variance Simulation in Stata
Suppose that we now look at the sampling distribution of the variance, with N = 20.

. program define varsamp20, rclass
. version 11
. use WarrenBurger.dta, clear
. sample 20, count
. tempvar z
. gen `z' = fedpet
. summarize `z'
. return scalar rescaled_variance = (r(N)-1)*r(Var)/0.1437428
. end

. simulate rescaled_variance = r(rescaled_variance), seed(10101) reps(1000) nodots: varsamp20
. version 3.1: set seed 10101
. rndchi 1000 19
. rename xc chi_squared19
. twoway kdensity rescaled_variance || kdensity chi_squared19
Variance Simulation in Stata
Figure: Density Plot: 1,000 Rescaled Variances of fedpet, Sample Size = 20 (solid line is the χ²(19) density)

[Kernel densities of rescaled_variance and chi_squared19; horizontal axis from 0 to 50.]
Variance Simulation in Stata
Suppose that we now look at the sampling distribution of the variance, with N = 500.

. program define varsamp500, rclass
. version 11
. use WarrenBurger.dta, clear
. sample 500, count
. tempvar z
. gen `z' = fedpet
. summarize `z'
. return scalar rescaled_variance = (r(N)-1)*r(Var)/0.1437428
. end

. simulate rescaled_variance = r(rescaled_variance), seed(10101) reps(1000) nodots: varsamp500
. version 3.1: set seed 10101
. rndchi 1000 499
. rename xc chi_squared499
. twoway kdensity rescaled_variance || kdensity chi_squared499
Variance Simulation in Stata
Figure: Density Plot: 1,000 Rescaled Variances of fedpet, Sample Size = 500 (solid line is the χ²(499) density)

[Kernel densities of rescaled_variance and chi_squared499; horizontal axis from 350 to 600.]
Central Limit Simulation in R
> # court holds the Warren/Burger data
> attach(court)
> summary(court)
> # Draw 1000 random samples of fedpet, each with N=10, and calculate the mean for each
> a <- numeric(1000)
> for (i in 1:1000){
+   a[i] <- mean(sample(fedpet, 10, replace=F))
+ }
> describe(a)
a
    n missing unique   Mean
 1000       0      8 0.1786

            0  0.1  0.2  0.3  0.4  0.5  0.6  0.7
Frequency 144  298  308  164   58   22    4    2
%          14   30   31   16    6    2    0    0

> histogram(a, nint=7, xlab="Means of FedPet")
Central Limit Simulation in R
Figure: Histogram: 1,000 Means of fedpet, Sample Size = 10

[Percent of total (0 to 30) on the vertical axis; Means of FedPet (0.0 to 0.6) on the horizontal axis.]
Central Limit Simulation in R
> # Same thing, but with N=100
> for (i in 1:1000){
+   a[i] <- mean(sample(fedpet, 100, replace=F))
+ }
> Da <- density(a)
> plot(Da, main="", xlab="Mean of FedPet", lwd=2)
> abline(v=0.174, lwd=2)
Central Limit Simulation in R
Figure: Histogram: 1,000 Means of fedpet, Sample Size = 100

[Density (0 to 10) on the vertical axis; Mean of FedPet (0.05 to 0.30) on the horizontal axis.]
Variance Simulation in R
Suppose that we now look at the sampling distribution of the variance, N = 20 .
> s <- numeric(1000)
> for (i in 1:1000){
+   s[i] <- (20 - 1) * var(sample(fedpet, 20, replace=F)) / 0.1437428
+ }
> describe(s)
s
    n missing unique  Mean   .05 ...
 1000       0     11 19.17 6.611 ...

> Ds <- density(s)
> plot(seq(0,40,length=40), dchisq(seq(0,40,length=40), 19), t="l",
+   lwd=2, xlab="Rescaled S^2", ylab="Density")
> lines(Ds, lwd=2, lty=2, col="red")
> abline(v=19, lwd=2)
Variance Simulation in R
Figure: Density Plot: 1,000 Rescaled Variances of fedpet, Sample Size = 20 (solid line is the χ²(19) density)

[Density (0.00 to 0.06) on the vertical axis; Rescaled S^2 (0 to 40) on the horizontal axis.]
Variance Simulation in R
Suppose that we now look at the sampling distribution of the variance, with N = 500.

> s <- numeric(1000)
> for (i in 1:1000){
+   s[i] <- (500 - 1) * var(sample(fedpet, 500, replace=F)) / 0.1437428
+ }
> describe(s)
s
    n missing unique  Mean   .05 ...
 1000       0     47 499.1 438.7 ...

> Ds <- density(s)
> plot(seq(300,600,length=300), dchisq(seq(300,600,length=300), 499),
+   t="l", lwd=2, xlab="Rescaled S^2", ylab="Density")
> lines(Ds, lwd=2, lty=2, col="red")
> abline(v=499, lwd=2)
Variance Simulation in R
Figure: Density Plot: 1,000 Rescaled Variances of fedpet, Sample Size = 500 (solid line is the χ²(499) density)

[Density (0.000 to 0.012) on the vertical axis; Rescaled S^2 (300 to 600) on the horizontal axis.]