8/10/2019 Lecture7 Handouts
Lecture 7: Sampling and Sampling Distributions
Matt Golder & Sona Golder
Pennsylvania State University
Introduction
Over the last several lectures, we have attempted to obtain a working knowledge of probability theory.
We now turn our attention to the application of that theory.
Specifically, we will employ probability theory as a means to make inferences about the world.
Introduction
Figure: Research In A Single Figure
Recall that statistical inference involves drawing inferences about the population from a sample (subgroup) taken from the population.
Introduction
To the extent that a sample differs from its population, that difference can be due to two factors.
1 Bias: Some aspect of the sampling mechanism (or the research design) causes the sample to be systematically unrepresentative of the population in one or more relevant ways.
2 Sampling Error: Any difference between the sample and the population that is nonsystematic, but is instead due to the randomness in the sample selection design.
In general, bias is a far bigger and more complicated threat to valid inference than is sampling error.
Sampling
Sampling involves drawing a subgroup of units from some larger population.
As we've seen, the units that make up the population are known as the units of analysis.
The population in question is often referred to as the sampling frame.
The sample size (the number of units in the sample) is usually denoted by N.
The population size (if it is known) is usually denoted by Ñ.
And the things that are being sampled are called the primary sampling units (PSUs).
Simple Random Sampling
Simple random sampling begins with a population of size Ñ, and randomly draws N units from the population in any fashion that ensures that the probability of any one unit being drawn for the sample is 1/Ñ.
In a simple random sample, the PSUs are the same as the units of analysis themselves.
In general, a simple random sample is the best sort of sample to have.
It leads to the simplest means of inference, and will (in the long run) be the most representative type of sample vis-à-vis the population.
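As a quick sketch of the idea (in Python, with a hypothetical sampling frame of numbered units; the slides' own examples use Stata and R), simple random sampling just draws N units without replacement so that each remaining unit is equally likely on every draw:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; each unit is equally likely per draw."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# hypothetical sampling frame of 1,000 numbered units
population = list(range(1000))
sample = simple_random_sample(population, 10, seed=42)
```

Here the PSUs are the units of analysis themselves, as the slide notes.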
Simple Random Sampling
A simple random sample is generally the hardest to obtain.
A simple random sample requires a lot on the part of the researcher.
She must:
1 Know every unit in her population (its existence, location, etc.), in order to assign it for sampling.
2 Be able to include all selected units in the sample that is drawn.
Simple Random Sampling
Know every unit in her population (its existence, location, etc.), in order to assign it for sampling.
This can be difficult if the population is (i) large, (ii) ever-changing, (iii) amorphous, and/or (iv) hidden in some way.
These problems can result in a particular form of sampling bias, such as the inability to ensure that the probability of each unit's being sampled is 1/Ñ.
Simple Random Sampling
Be able to include all selected units in the sample that is drawn.
This raises the issue of nonresponse bias: what if a survey respondent is simply unreachable?
So, simple random sampling can be difficult. And, it turns out that we can often do better if we are interested in a particular subpopulation within the population.
Stratified Sampling
A stratified sample is one in which the population is divided a priori into two or more groups, and individuals are sampled randomly from within those groups.
The groups are known as strata, and they are often selected on the basis of the study's subject matter.
The primary sampling units remain the same; they are just divided up into strata.
Stratified sampling is often done when there is interest in some particular characteristics of the population.
Stratified Sampling in Stata
To see how stratified sampling is done in Stata, we can take a look at Zorn's data on the 7,161 fully-decided cases handed down by the Warren and Burger courts.
us is a string variable for the U.S. citation of the case,
id is just an ID variable,
amrev is the number of amicus curiae ("friend of the court") briefs filed supporting reversal, and
amaff is the number of such briefs supporting affirmance.
sumam is just amrev + amaff,
fedpet is 1 if the federal government was the petitioner, 0 otherwise,
constit is 1 if the decision was made on constitutional grounds, 0 otherwise, and
sgam is 1 if the Solicitor General (the U.S. government's attorney) filed an amicus curiae brief in the case, and 0 otherwise.
Stratified Sampling in Stata
Only 25% of the decisions were made on constitutional grounds.
. sum constit

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     constit |      7161    .2535959    .4350993          0          1
Notes
Notes
Notes
http://find/http://find/http://find/ -
8/10/2019 Lecture7 Handouts
5/35
Stratified Sampling in Stata
Suppose we sample 10 observations each from constit = 0 and constit = 1.

. sample 10, count by(constit)
(7141 observations deleted)

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          us |         0
          id |        20     3923.55     2287.04         94       7032
       amrev |        20          .3    .6569467          0          2
       amaff |        20          .2     .615587          0          2
       sumam |        20          .5    1.147079          0          4
-------------+--------------------------------------------------------
      fedpet |        20          .3    .4701623          0          1
     constit |        20          .5    .5129892          0          1
        sgam |        20         .05    .2236068          0          1
Stratified Sampling in Stata
Suppose we sample 5% of the observations, stratified by constit.

. sample 5, by(constit)
(6803 observations deleted)

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          us |         0
          id |       358    3613.561    1928.962         55       7161
       amrev |       358     .396648    1.213459          0         12
       amaff |       358    .2793296    .8951582          0          9
       sumam |       358    .6759777    1.695614          0         13
-------------+--------------------------------------------------------
      fedpet |       358    .1675978    .3740315          0          1
     constit |       358    .2541899    .4360143          0          1
        sgam |       358    .0614525    .2404946          0          1
Stratified Sampling in R
Suppose we sample 10 observations each from constit = 0 and constit = 1.

> library(sampling)
> sample <- strata(data, stratanames = "constit", size = c(10, 10), method = "srswor")
> sample.data <- getdata(data, sample)
> summary(sample.data)

(Here data stands in for the data frame holding the cases.)
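The same stratified draw can be sketched in pure Python (the case records below are hypothetical stand-ins for Zorn's data, with about 25% of cases decided on constitutional grounds, as in the slides):

```python
import random

def stratified_sample(units, stratum_of, k, seed=None):
    """Draw k units at random from within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    draw = []
    for members in strata.values():
        draw.extend(rng.sample(members, k))
    return draw

# hypothetical cases: (case_id, constit), with roughly 25% constit == 1
cases = [(i, 1 if i % 4 == 0 else 0) for i in range(7161)]
draw = stratified_sample(cases, stratum_of=lambda c: c[1], k=10, seed=7)
```

This mirrors Stata's `sample 10, count by(constit)`: ten cases from each stratum, regardless of the strata's population shares.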
Stratified Sampling
Suppose we divide the population into two subgroups as we just did with our Stata and R examples; call them A and B.
If the proportion of group A in the sample is equal to its proportion in the population, then we say we are taking a proportional stratified sample.
We obtained a proportional stratified sample when we sampled 5% of the observations stratified by constit.
If the proportion of some group A in the sample is different from its proportion in the population, then we say we have oversampled or undersampled that group.
Given that only 25% of the population decisions were based on constitutional grounds, we oversampled on constit = 1 when we sampled 10 observations each from constit = 0 and constit = 1.
Stratified Sampling
Oversamples are often done when one or more groups of interest, such as Native Americans, are a relatively small fraction of the total population.
With over- and under-sampling, the probability of selection for any given unit is no longer 1/Ñ, and so we must adjust our sample to reflect the differential probabilities, by giving different weights to observations from different strata.
Stratified sampling requires that we know more about our population than is the case with simple random sampling.
Cluster Sampling
Cluster sampling is slightly different from stratified sampling.
In a clustered sample, the units of analysis are grouped into clusters.
However, the clusters themselves are then sampled randomly, and all units in each selected cluster are used in the sample.
Thus, cluster sampling changes the identity of the PSU, from the unit of analysis to the cluster.
This means that the probability of an individual unit of analysis being sampled is no longer equal to all others; moreover, it may not even be known directly at all.
Cluster samples are used extensively: most major surveys are done via cluster sampling (often by telephone area code and exchange).
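The contrast with stratified sampling can be sketched as follows: here the cluster, not the individual unit, is what gets drawn at random (Python, with hypothetical telephone-exchange clusters):

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters (the PSUs); keep every unit inside each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

# hypothetical clusters: 40 telephone exchanges with 5 respondents each
clusters = {f"exch{i}": [f"resp{i}_{j}" for j in range(5)] for i in range(40)}
sample = cluster_sample(clusters, n_clusters=3, seed=3)
```

Every respondent in a chosen exchange enters the sample; respondents in unchosen exchanges have no chance at all, which is why individual selection probabilities differ from simple random sampling.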
Multi-Stage Sampling
Multi-stage sampling is a generalization of cluster sampling.
1 Select a type of cluster, and identify subclusters of units within the cluster, etc., until we get to the lowest-level cluster in which units are located.
2 Select randomly or in a stratified way some number of top-level clusters.
3 Within each selected cluster, select again, randomly or stratifying, some number of subclusters.
4 Within subclusters, select sub-subclusters, etc.
5 At the lowest subcluster level, select some number of units from each sub-cluster.
Multi-Stage Sampling
The first-stage clusters are the primary sampling units, the second-stage are the secondary sampling units, and so forth.
Example: When sampling survey respondents, first select by blocks, then by houses within blocks, then by residents within each (selected) house.
This is a three-stage design.
The blocks are clusters, the houses are subclusters, and the individuals are the end units being sampled.
Multi-Stage Sampling
Multistage sampling can be useful in that we can obtain a probability sample without having to know the identity of each potential unit in the sampling frame.
In our example, we don't have to know exactly who lives in each house, as long as we have a rule that says something like "select one person from among those in each house with equal probability" at the last stage.
Most large national surveys are conducted using multistage sampling, either by addresses/locations or by telephone exchanges.
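The block/house/resident example above can be sketched in Python (the frame below is hypothetical; note that residents of unselected houses never need to be enumerated in practice):

```python
import random

def three_stage_sample(blocks, n_blocks, n_houses, seed=None):
    """Stage 1: sample blocks; stage 2: houses within selected blocks;
    stage 3: one resident per selected house, chosen with equal probability."""
    rng = random.Random(seed)
    sample = []
    for b in rng.sample(sorted(blocks), n_blocks):
        for h in rng.sample(sorted(blocks[b]), n_houses):
            sample.append(rng.choice(blocks[b][h]))
    return sample

# hypothetical frame: 10 blocks x 6 houses x 1-4 residents
blocks = {b: {h: [f"r{b}.{h}.{i}" for i in range(1 + (b + h) % 4)]
              for h in range(6)}
          for b in range(10)}
sample = three_stage_sample(blocks, n_blocks=2, n_houses=3, seed=1)
```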
Probability Samples
So far we have discussed probability samples: we can calculate the probability of a given unit in the population being selected for the sample.
These probabilities can be difficult to calculate with multiple stages and stratification, etc., but it can be done.
As a result, we can use something like weighting to adjust the results so that the sample has the statistical properties of a simple random sample.
Probability Samples: Weighting
Figure: Sample Weighting (Brooker & Schaefer, N = 1,000)

Educational Level      % in Pop.   Sample Number   Sample %   Over/Under Rep.   Weight (Fraction)   Weight (Decimal)   Weighted Number in Sample
College Graduate          20%           300           30%          30/20              20/30               .667                   200
Some College              28%           330           33%          33/28              28/33               .848                   280
High School Graduate      37%           270           27%          27/37              37/27              1.37                    370
Less than HS Grad         15%           100           10%          10/15              15/10              1.50                    150
Total                    100%         1,000          100%                                                                      1,000
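The arithmetic in the figure is simply: weight = (percent in population) / (percent in sample), and weighted count = weight × observed count. A short Python check of the figure's numbers:

```python
# level: (share of population, number observed in the sample of 1,000)
rows = {
    "College Graduate":  (0.20, 300),
    "Some College":      (0.28, 330),
    "HS Graduate":       (0.37, 270),
    "Less than HS Grad": (0.15, 100),
}
total = sum(n for _, n in rows.values())                     # 1,000 respondents
weight = {k: p / (n / total) for k, (p, n) in rows.items()}  # pop% / sample%
weighted_n = {k: weight[k] * rows[k][1] for k in rows}       # rescaled counts
```

After weighting, each education level's weighted count matches its population share of the 1,000 respondents.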
Nonprobability Samples
Nonprobability samples are samples where the probability that every unit is in the sample cannot be known, either because the sample is not probabilistic at all, or because it depends on probabilities that the researcher does not know or control.
The researcher cannot correct for any unrepresentativeness in the sampling mechanism after the fact.
Nonprobability samples will be questionably representative at best, and disastrously wrong at worst.
Nonprobability Samples
Convenience Sampling: The researcher samples whatever units come most readily to hand. In this case, probability is not used in the sampling at all.
Purposive Sampling: The researcher selects units on the basis of whether she believes they ought to be in the sample. Again, there is no use of probability at all in the sampling method.
Snowball Sampling: The researcher selects a unit, and then other units with some relationship to that first unit are sampled, and so forth. In such a design, even if the initial sample is random/probabilistic, the probabilities of selection for subsequent units cannot be known by the researcher (even if, for example, they are known to the units themselves). With snowball sampling, the respondents are often similar to one another since they result from the first respondent's social network.
Nonprobability Samples
Modern Straw Polls: Respondents call 1-800 numbers or use the internet to voice an opinion. The samples suffer from many things, including self-selection problems.
Quota Sampling: The researcher selects units of various types up to some quota (for example, s/he might question 100 men and 100 women) and then stops, no longer selecting units of that type.
If quotas are combined with (say) convenience sampling, then they generate a nonprobability sample (Gallup 1948, Dewey-Truman fiasco).
If quota sampling is combined with a probability sampling method, it can be a potentially viable (if messy) way of sampling.
Sampling Error
Sampling error, sometimes called the margin of error (MOE), is just the (random) difference between the population parameter and its sample estimate.
Although we don't know the population parameter, we can put bounds on the sampling error.

Standard error = √(q(1 − q)/N)

where N is the sample size and q is the calculated quantity (proportion) of interest.
As we'll see shortly, we typically calculate a relative sampling error for a particular level of confidence.
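The formula can be checked directly. The sketch below also applies the usual z multiplier for a 95% confidence level (1.96), which the slides turn to shortly; the sample size and proportion used are illustrative:

```python
import math

def se_proportion(q, n):
    """Standard error of a sample proportion: sqrt(q(1 - q)/N)."""
    return math.sqrt(q * (1 - q) / n)

def margin_of_error(q, n, z=1.96):
    """Margin of error at a given confidence level (z = 1.96 for 95%)."""
    return z * se_proportion(q, n)

se = se_proportion(0.5, 1000)     # worst case is q = 0.5
moe = margin_of_error(0.5, 1000)  # the familiar "plus or minus 3 points"
```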
Sampling Error: Sample Size
Figure: Sampling Error vs. Sample Size for Simple Random Sample
(The figure plots the relative margin of error against sample size N from 0 to 500, with separate curves for the 90, 95, and 99 percent confidence levels.)
The margin of error is always decreasing, at a decreasing rate, in the sample size.
Sampling Error: Sample Design
Calculating the margin of error can be difficult with complicated sample designs.
1 Simple random sampling is relatively easy, and the sample error will generally be small.
2 Stratified samples can make the margin of error even smaller if the stratification is done in a way that balances the sample across the strata.
3 Cluster samples and their variants will have higher margins of error than those for which the units of analysis are the PSUs.
Sampling Error: Population Size
Holding the sample size constant, increasing the population size increases the margin of error, but with a high degree of diminishing returns.
When the population is small and approaches the sample size, then the sample does a better job of being representative.
We use finite population corrections to capture this when N ≥ 5% of the population size Ñ. Obviously, when N = Ñ, the margin of error is zero.
Randomization
Beyond random sampling, the practice of randomization is a powerful tool for learning about the world.
The main value of randomization is to prevent confounding effects from biasing our results.
Confounding usually occurs when an extraneous (third) variable has an influence on both the cause and the effect that we are studying.
In the social sciences, we typically see two types of randomization:
1 Random treatment assignment.
2 Orthogonalization.
Observational studies typically focus on orthogonalization.
Random Treatment Assignment
Randomizing a treatment is the single best way to assess the causal relationship between some factor and some outcome.
It involves (i) randomly assigning subjects to treatment and control groups, (ii) administering the treatment or placebo, and (iii) measuring the outcomes.
Random treatment assignment is often difficult or impossible in much of the social sciences.
Orthogonalization
Randomization can also be useful even if no treatment per se is administered.
Example: The U.S. Courts of Appeals decide cases in three-judge panels: (i) three judges from a circuit are randomly assigned to a panel, and (ii) cases are randomly assigned to that panel.
Although there is no treatment here, randomization allows us to address a number of interesting questions. For instance, randomization gives us leverage on whether gender makes a difference, since we can be sure that a given case is no more likely to be decided by a panel of male judges than a panel of female ones.
In effect, randomization makes it possible that all those potentially confounding factors in our study are no longer related to the variables we care about.
Sampling Distributions
A population is considered known if we know the probability distribution f(x) of the associated random variable X.
There will be certain quantities that appear in f(x), such as μ and σ in the case of the normal distribution, or p in the case of the binomial distribution. These quantities are known as population parameters.
Oftentimes, we may not know everything about the population parameters.
For example, we may not know one or both of the values of μ or σ in a normal distribution, and we may want to draw statistical inferences about these population parameters from a sample.
Sample Statistics
Table: A Population of 100 Students' Heights

Height x   Frequency f   Relative Frequency f(x)   x f(x)   (x − μ)² f(x)
   60           1                 0.01               0.60        0.81
   63           6                 0.06               3.78        2.16
   66          24                 0.24              15.84        2.16
   69          38                 0.38              26.22        0
   72          24                 0.24              17.28        2.16
   75           6                 0.06               4.50        2.16
   78           1                 0.01               0.78        0.81
 Total        100                 1.00           μ = 69.00   σ² = 10.26  (σ = 3.20)

The goal is to take a random sample from the population and then use the sample to obtain values (sample statistics) to estimate and test hypotheses about the population parameters.
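The table's parameter calculations can be verified directly from the frequency column:

```python
# frequency table of the 100 student heights from the table above
freq = {60: 1, 63: 6, 66: 24, 69: 38, 72: 24, 75: 6, 78: 1}
n = sum(freq.values())                                     # 100 students
mu = sum(x * f for x, f in freq.items()) / n               # population mean
var = sum((x - mu) ** 2 * f for x, f in freq.items()) / n  # population variance
sigma = var ** 0.5                                         # population std. dev.
```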
Sample Statistics
Suppose we want to draw a random sample of size N = 100 from our population.
We would start by choosing an individual at random from the population. This individual can have only one value, say x1, of the various possible heights. We call x1 the value of a random variable X1, where the subscript 1 corresponds to the first individual chosen.
Similarly, we can choose the second individual for the sample, who can have any one of the values x2 of the possible heights, and x2 can be taken as the value of a random variable X2.
We can continue this process up to X100 because we want a sample size of 100.
Sample Statistics
In the general case, a sample of size N would be described by the values x1, x2, ..., xN of the random variables X1, X2, ..., XN.
In the case of sampling with replacement, X1, X2, ..., XN would be independent, identically distributed random variables having probability distribution f(x).
This means that when we randomly draw each observation from a population probability distribution f(x), we get individual observations that each have the population probability distribution f(x), i.e., each observation has the mean μ and standard deviation σ of the population.
The relative frequency column can be viewed as the probability distribution not only of the population but also of a single observation taken at random.
The probability of getting someone with a height of 63 in our sample (0.06) is the same as the probability of 63 occurring in the population.
Sample Statistics
Definition: A (very) simple random sample is a sample whose N observations X1, X2, ..., XN are independent random variables. The distribution of each X is the population distribution f(x); that is,

f(x1) = f(x2) = ... = f(xN) = population distribution, f(x)

Then each observation has the mean μ and standard deviation σ of the population.
So, our sample is made up of N random variables X1, X2, ..., XN, each with probability distribution f(x). Their joint distribution, or likelihood, is

Pr(X1 = x1, X2 = x2, ..., XN = xN) = f(x1) f(x2) ... f(xN)

Recall that this is the probability of getting all the values/observations in our sample.
Sample Statistics
Any quantity obtained from a sample for the purpose of estimating a population parameter is called a sample statistic, or simply a statistic.
Mathematically, a sample statistic, such as the mean, for a sample of size N can be defined as a function of the random variables X1, X2, ..., XN, i.e., g(X1, X2, ..., XN).
The function g(X1, X2, ..., XN) is itself a random variable, whose values can be represented by g(x1, x2, ..., xN). The word "statistic" is often used for the random variable itself or for its values.
In general, corresponding to each population parameter, there will be a statistic to be computed from the sample.
Sampling Distributions
Anything that is a function of random variables is, itself, a random variable.
A sample statistic that is computed from X1, X2, ..., XN is a function of these random variables and is therefore itself a random variable.
As a result, the sample statistic will have its own probability distribution.
The probability distribution of a sample statistic is often called the sampling distribution of the statistic.
Sampling Distributions
What does all this mean?
Suppose we draw a sample of 100 from our population of 100 students and calculate a sample mean X̄.
This is only one sample that we could have drawn from the population and hence only one sample mean.
Suppose we draw a second sample of 100 from the population and calculate the mean of this sample. It is unlikely that the new mean will be exactly the same as the first mean.
This is what we mean by saying that sample statistics (in this case the mean) are themselves random variables; they vary from one sample to another.
As a result, the estimated statistics will have their own sampling distribution.
Sampling Distributions
In effect, the frequentist perspective on statistical inference starts with the idea that our sample is just one of many possible samples that we could have drawn from the population.
A sampling distribution of a statistic can be thought of as the theoretical distribution of some sample statistic (a mean, a variance, etc.) that we would observe through repeated sampling.
Thus, although we will typically only ever have one sample, we can take account of the fact that other samples would have led to different statistics by looking at the sampling distribution.
Sampling Distributions
If we knew the population, we could easily learn about the sampling distribution.
But we often don't have the population, or even if we did it would be impractical to sample from it enough to learn about the sampling distribution.
There are usually two ways of learning about a sampling distribution.
1 Theoretical derivation: Try to theoretically derive the sampling distribution.
2 Experimental derivation: Use computers to generate a known population (we control the parameters) and then simulate the sampling process on this population. This is known as a Monte Carlo analysis.
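The second approach can be sketched in a few lines: treat the 100 student heights as a known population, simulate many samples of size N = 4 with replacement, and watch the sample means pile up around μ = 69 with a spread near σ/√N = 1.6 (the simulation details here are illustrative, not from the slides):

```python
import random

random.seed(0)
# known population: the 100 student heights from the earlier table
population = ([60] * 1 + [63] * 6 + [66] * 24 + [69] * 38 +
              [72] * 24 + [75] * 6 + [78] * 1)

def sample_mean(n):
    draw = random.choices(population, k=n)   # sampling with replacement
    return sum(draw) / n

means = [sample_mean(4) for _ in range(20000)]   # 20,000 repeated samples
grand_mean = sum(means) / len(means)             # should sit near mu = 69
spread = (sum((m - grand_mean) ** 2 for m in means) / len(means)) ** 0.5
```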
Moments of the Sample Mean
Suppose we are interested in estimating a population mean μ. If we take a random sample of observations from this population and calculate the sample mean X̄, how good will X̄ be as an estimator of its target μ?
Let X1, X2, ..., XN denote independent and identically distributed random variables for a random sample of size N. Then the sample mean is a random variable itself:

X̄ = (X1 + X2 + ... + XN) / N

If x1, x2, ..., xN denote the values obtained in a particular sample of size N, then we can also denote the sample mean as:

x̄ = (x1 + x2 + ... + xN) / N
Moments of the Sample Mean: Mean
Each random draw Xi is itself a random variable.
As a result, X̄ is a linear combination of random variables and is therefore itself a random variable with a mean and standard deviation.

μ_X̄ = E(X̄) = [E(X1) + E(X2) + ... + E(XN)] / N

Recall that each observation X in a random sample has the population distribution f(x) with mean μ.
Thus, E(X1) = E(X2) = ... = E(XN) = μ, and therefore

μ_X̄ = E(X̄) = (μ + μ + ... + μ) / N = Nμ / N = μ
Moments of the Sample Mean: Mean
What this shows is that the expected value of the sample mean is equal to the population mean, i.e., on average, X̄ = μ.
Put differently, the sample mean X̄ is an unbiased estimator of the population mean μ.
Moments of the Sample Mean: Variance
Since each independently drawn Xi is itself a random variable, the variance of X̄ will be a linear combination of the variance of each Xi.

σ²_X̄ = Var(X̄) = E[(X̄ − μ)²]
      = Var((1/N) Σ_{i=1}^{N} Xi)
      = (1/N²) Σ_{i=1}^{N} Var(Xi)
      = (1/N²) [Var(X1) + Var(X2) + ... + Var(XN)]
      = Nσ² / N² = σ² / N

(The third equality uses the independence of the Xi.)
Moments of the Sample Mean: Variance
The standard deviation of X̄ is:

σ_X̄ = Standard Deviation of X̄ = σ / √N

The standard deviation of a statistic is also referred to as the standard error of the statistic.
The standard error gives one a rough idea of how much variation there will be in our estimate X̄ as a function of our sample size.
It should be clear that the precision of our estimate X̄ grows with sample size.
Because X̄ is an unbiased estimator of the population mean μ, this means that the sampling distribution of X̄ becomes more and more concentrated around the true population value as the sample size increases, i.e., X̄ is a consistent estimator of μ.
Moments of the Sample Mean: Variance
Example: Suppose we were to take many samples of size N = 4 from our population of 100 student heights and calculate the sample mean in each case. How would the sample means fluctuate?
If the sample size were 4, the sample means would fluctuate around the target of μ = 69 with a standard error of 1.6, i.e.,

E(X̄) = μ_X̄ = μ = 69
σ_X̄ = SE(X̄) = σ / √N = 3.2 / √4 = 3.2 / 2 = 1.6
Moments of the Sample Mean: Variance
Example: Suppose the sample size quadrupled to N = 16. How would the sample means fluctuate?
If the sample size were 16, the sample means would fluctuate around the target of μ = 69 with a standard error of 0.8, i.e.,

E(X̄) = μ_X̄ = μ = 69
σ_X̄ = SE(X̄) = σ / √N = 3.2 / √16 = 3.2 / 4 = 0.8
One must quadruple the sample size to halve the sampling error.
Moments of the Sample Mean: Variance
If the population is of size Ñ, if sampling is without replacement, and if the sample size is N ≤ Ñ, then the variance of the sample mean is given by:

σ²_X̄ = (σ² / N) × (Ñ − N) / (Ñ − 1)

and the standard error is

σ_X̄ = (σ / √N) × √[(Ñ − N) / (Ñ − 1)]

You'll notice that as Ñ → ∞, the variance reduces back to σ²_X̄ = σ²/N and the standard error reduces back to σ_X̄ = σ/√N.
Obviously, all of this assumes that the population size is known.
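A small helper makes the correction concrete (using Ñ for the population size, as above; the specific numbers below are illustrative):

```python
import math

def se_mean(sigma, n, pop_size=None):
    """Standard error of the sample mean; applies the finite population
    correction sqrt((Ntilde - N)/(Ntilde - 1)) when Ntilde is known."""
    se = sigma / math.sqrt(n)
    if pop_size is not None:
        se *= math.sqrt((pop_size - n) / (pop_size - 1))
    return se

plain = se_mean(3.2, 50)                # ignores the population size
fpc = se_mean(3.2, 50, pop_size=100)    # sampling half the population
zero = se_mean(3.2, 100, pop_size=100)  # N = Ntilde: no sampling error left
```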
Shape and the Central Limit Theorem
At this point, we have defined the first and second moments of the distribution for the sample mean.
Theorem: If X is a random variable with mean μ and variance σ², then, whatever the distribution of X, the sampling distribution of X̄ has the same mean μ, and variance equal to σ²/N.
But what does the sampling distribution look like? What is its shape?
This brings us back to the Central Limit Theorem .
Shape and the Central Limit Theorem
Central Limit Theorem: If X has any distribution with mean μ and variance σ², then the distribution of

(X̄ − μ_X̄) / σ_X̄ = (X̄ − μ) / (σ/√N) = √N (X̄ − μ) / σ

approaches the standard normal distribution as the sample size increases.
In other words, the standardized variable associated with X̄ is asymptotically normal.
This means that the distribution of X̄ in large samples is approximately normal with mean μ and variance σ²/N, i.e.,

X̄ = (1/N) Σ_{i=1}^{N} Xi ~ N(μ, σ²/N)
Shape and the Central Limit Theorem
This result is incredibly powerful.
It means that no matter what the distribution of the population or random variable X, we can approximate the distribution of the sample mean X̄ with the normal distribution so long as the sample size is large enough.
When X is distributed normally, then X̄ is always normally distributed irrespective of sample size.
For non-normal variables, the CLT provides an approximation for finite samples. In general, the approximation is good when the sample size is large enough (N > 30 is usually good) but can also be quite good even for small samples (N = 10 or 20) when the variable has a symmetric distribution.
Shape and the Central Limit Theorem
To illustrate the Central Limit Theorem, see the applet at http://www.mathcs.org/java/programs/CLT/clt.html
To illustrate the sampling distribution of the mean, the central limit theorem, and the sampling distribution of the variance, see the "Distributions Related to Normal" applet on the Brooks/Cole student companion site.
Shape and the Central Limit Theorem
The central limit theorem allows us to use the familiar standard normal tablesto determine how closely a sample mean X will estimate a population mean .
Example: With a sample of N = 10, we have

Expected Height = E(X̄) = μ_X̄ = μ = 69

and

Standard Error = σ_X̄ = σ/√N = 3.2/√10 = 1.02

What is the probability that the sample mean X̄ will be within 2 inches of the population mean when N = 10? In other words, what is Pr(67 < X̄ < 71)?
Shape and the Central Limit Theorem
z1 = (67 - 69)/1.02 = -1.96

z2 = (71 - 69)/1.02 = 1.96

This implies that Pr(67 < X̄ < 71) = Pr(-1.96 < z < 1.96).

Pr(z > 1.96) = 0.025

And by symmetry,

Pr(z < -1.96) = 0.025
Shape and the Central Limit Theorem
Thus, the probability that the sample mean X̄ will be within 2 inches of the population mean (i.e., between 67 and 71) is:

Pr(67 < X̄ < 71) = 1 - Pr(z > 1.96) - Pr(z < -1.96) = 0.95

In other words, there is a 95% chance that the sample mean will be within 2 inches of the population mean.
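The arithmetic above can be checked numerically. Here is a minimal Python sketch (Python rather than the Stata/R used later in these slides, purely for illustration) that computes the standard error and the interval probability from the standard normal CDF:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 69.0, 3.2, 10
se = sigma / sqrt(n)                 # standard error of the sample mean
z1 = (67 - mu) / se
z2 = (71 - mu) / se
prob = norm_cdf(z2) - norm_cdf(z1)   # Pr(67 < Xbar < 71)
print(round(se, 2), round(prob, 2))
```

Using the unrounded standard error (about 1.01; the slides round to 1.02) gives an interval probability of about 0.95, matching the result above.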
8/10/2019 Lecture7 Handouts
21/35
Shape and the Central Limit Theorem
Example: Suppose a large class in statistics has marks normally distributed around a mean of 72 with a standard deviation of 9. Find the probability that an individual student drawn at random will have a mark over 80.
z = (x - μ)/σ = (80 - 72)/9 = 0.89

Pr(x > 80) = Pr(z > 0.89) = 0.187, or about 19%
Shape and the Central Limit Theorem
Now find the probability that a random sample of 10 students will have an average mark over 80.
According to the CLT, X̄ has an approximately normal distribution with an expected value of 72 and a standard error of σ/√N = 9/√10 = 2.85.

z = (x̄ - μ)/SE = (80 - 72)/2.85 = 2.81

Pr(x̄ > 80) = Pr(z > 2.81) = 0.002

Although there is a reasonable chance (19%) that one student will score above 80, there is very little chance (0.2%) that a sample average of 10 students will perform this well.
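Both probabilities can be verified with the same normal-CDF approach (a Python sketch for illustration only; the slides' own code is in Stata and R):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 72.0, 9.0
p_one = 1 - norm_cdf((80 - mu) / sigma)               # one student: ~19%
p_avg = 1 - norm_cdf((80 - mu) / (sigma / sqrt(10)))  # mean of 10: ~0.2%
print(round(p_one, 3), round(p_avg, 3))
```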
What happens if the population of students is not normally distributed?
Shape and the Central Limit Theorem
Example: Achievement test scores of all high school seniors in a state have mean 60 and variance 64. A random sample of N = 100 students from one large high school had a mean of 58. Is there evidence to suggest that this high school is inferior? Calculate the probability that the sample mean is at most 58 when N = 100.
Let x̄ denote the mean of a random sample of N = 100 scores from a population with μ = 60 and σ² = 64.

We want to approximate Pr(x̄ ≤ 58).
Shape and the Central Limit Theorem
z = (x̄ - μ)/SE = (58 - 60)/(8/√100) = (58 - 60)/0.8 = -2.5

Pr(x̄ ≤ 58) = Pr(z ≤ -2.5) = 0.0062
Because this probability is so small, it is unlikely that the sample from the school of interest can be regarded as a random sample from a population with μ = 60 and σ² = 64.
In other words, the evidence suggests that this high school is inferior.
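The same normal-CDF check works here (an illustrative Python sketch, not part of the slides' Stata/R code):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, var, n = 60.0, 64.0, 100
se = sqrt(var / n)               # 8/10 = 0.8
p = norm_cdf((58 - mu) / se)     # Pr(xbar <= 58), ~0.0062
print(round(p, 4))
```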
Moments of Sample Proportion
Sometimes we will be interested in using a proportion P from the sample to make an inference about the proportion π in the population.
Like X̄, P also fluctuates from sample to sample in a pattern (sampling distribution) that is easily summarized.
In random samples of size N, the sample proportion P fluctuates around the population proportion π with variance

σ²_P = π(1 - π)/N

and standard error

σ_P = √(π(1 - π)/N).
Thus, as N increases, the sampling distribution of P concentrates more and more around its target π.
For large values of N (N ≥ 30), the sampling distribution is very nearly a normal distribution.
Moments of Sample Proportion
Example: Suppose we had a population of voters: 60% Republicans and 40% Democrats.

The sampling distribution for the proportion P of Republicans when the sample size is N = 100 would be normally distributed around the target population proportion π = 0.60 with a standard error σ_P = √(π(1 - π)/N) = √(0.6 × 0.4/100) = 0.05.

How likely is it that a poll of 100 would contain a minority of Republicans?
Moments of Sample Proportion
A minority means that the proportion P of Republicans is less than 50%.
z = (p - π)/SE = (0.5 - 0.6)/0.05 = -2

Pr(P ≤ 0.5) = Pr(z ≤ -2) = 0.023, or about 2%
As you can see, the likelihood that a sample of 100 voters would contain a minority of Republicans is quite low.
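The calculation can be reproduced directly from the formulas (an illustrative Python sketch; the slides round the standard error to 0.05, while the unrounded value is about 0.049):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

pi, n = 0.6, 100
se = sqrt(pi * (1 - pi) / n)     # ~0.049 (0.05 after rounding)
p = norm_cdf((0.5 - pi) / se)    # Pr(P <= 0.5), ~2%
print(round(se, 3), round(p, 3))
```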
Moments of Sample Proportion
Example: Of your first 15 grandchildren, what is the chance there will be more than 10 boys?

More than 10 boys is the same as saying that the proportion of boys is more than 10/15.
Thus, the proportion P of boys in a sample of 15 will fluctuate around the population proportion π = 0.5 with a standard error

σ_P = √(π(1 - π)/N) = √(0.5 × 0.5/15) = 0.129.
Moments of Sample Proportion
z = (p - π)/SE = (10/15 - 0.5)/0.129 = 1.29

Pr(P ≥ 10/15) = Pr(z ≥ 1.29) = 0.099, or about 10%
Thus, the chance of more than 10 boys is about 10%.
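A Python check of this example (illustrative only). Because N = 15 is small, it is also instructive to compare the slides' normal approximation with the exact binomial probability, which is somewhat lower:

```python
from math import comb, erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, pi = 15, 0.5
se = sqrt(pi * (1 - pi) / n)                  # ~0.129
p_norm = 1 - norm_cdf((10 / n - pi) / se)     # normal approximation, ~10%
p_exact = sum(comb(n, k) for k in range(11, n + 1)) / 2 ** n  # exact binomial
print(round(p_norm, 3), round(p_exact, 3))
```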
Moments of Sample Proportion
If the population is of size N, if sampling is without replacement, and if the sample size is n ≤ N (here n denotes the sample size and N the population size), then the variance of the sample proportion is given by

σ²_P = [π(1 - π)/n] × [(N - n)/(N - 1)]

and the standard error is

σ_P = √( [π(1 - π)/n] × [(N - n)/(N - 1)] ).

As N → ∞, the variance reduces back to σ²_P = π(1 - π)/n and the standard error reduces back to σ_P = √(π(1 - π)/n).

Obviously, all of this assumes that the population size is known.
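These formulas are easy to wrap in a small helper (a hypothetical Python function written for illustration; `se_proportion` is an invented name, not from the slides):

```python
from math import sqrt

def se_proportion(pi, n, N=None):
    """Standard error of a sample proportion.

    Applies the finite population correction (N - n)/(N - 1)
    when the population size N is supplied."""
    var = pi * (1 - pi) / n
    if N is not None:
        var *= (N - n) / (N - 1)
    return sqrt(var)

print(round(se_proportion(0.6, 100), 4))            # no correction
print(round(se_proportion(0.6, 100, N=100000), 4))  # huge N: almost identical
print(round(se_proportion(0.6, 100, N=200), 4))     # small N: noticeably smaller
```

The third call shows why the correction matters only when the sample is a sizable fraction of the population.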
Proportions and Sample Means
What you should have noticed is that the sample proportion is just a disguised sample mean.
All proportions essentially come from a sample in which observations can be thought of as taking on a value of 1 or 0.
In other words, proportions are sample means for dichotomous variables.
Proportions and Sample Means
Table: Population Mean and Variance for a 0-1 Variable

x    f(x)       x·f(x)    x²·f(x)
0    (1 - π)    0         0
1    π          π         π

μ = E(x) = π

E(x²) = π

σ² = E(x²) - μ² = π - π² = π(1 - π)

σ = √(π(1 - π))

Similarly, the variance and standard error of P are equal to the variance and standard error of X̄:

σ²_X̄ = σ²/N = π(1 - π)/N = σ²_P
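This equivalence is easy to see numerically: for a 0-1 variable, the sample mean is the sample proportion, and the (population-style) variance is p(1 - p). A Python sketch with made-up data:

```python
data = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]      # hypothetical 0/1 sample
n = len(data)
p = sum(data) / n                          # proportion of 1s = sample mean
var = sum((x - p) ** 2 for x in data) / n  # divides by N, not N-1
print(p, var)                              # var equals p * (1 - p)
```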
Sampling Distribution of the Variance
In addition to the mean, we might think to use the sample variance as an estimate of the population variance.
We defined the sample variance as:

s² = [1/(N - 1)] Σ (Xi - X̄)²

where the sum runs over i = 1, …, N.

It turns out that s² is an unbiased estimator of σ². Why?
Sampling Distribution of the Variance
It can be shown that:

Σ (Xi - X̄)² = Σ Xi² - (1/N)(Σ Xi)² = Σ Xi² - N·X̄²

where all sums run over i = 1, …, N. Hence

E[Σ (Xi - X̄)²] = E[Σ Xi²] - N·E(X̄²) = Σ E(Xi²) - N·E(X̄²)
Sampling Distribution of the Variance
Notice that E(Xi²) is the same for i = 1, 2, …, N. This, together with the fact that the variance of a random variable is given by Var(X) = E(X²) - [E(X)]², means that

1. E(Xi²) = Var(Xi) + [E(Xi)]² = σ² + μ²,

2. E(X̄²) = Var(X̄) + [E(X̄)]² = σ²/N + μ², and that

3. E[Σ (Xi - X̄)²] = Σ (σ² + μ²) - N(σ²/N + μ²)
                  = N(σ² + μ²) - σ² - N·μ²
                  = N·σ² - σ²
                  = (N - 1)σ²
Sampling Distribution of the Variance
Thus,

E(s²) = [1/(N - 1)] E[Σ (Xi - X̄)²] = [1/(N - 1)](N - 1)σ² = σ²

In other words, s² is an unbiased estimator for σ².

As with the mean X̄, s² also has a sampling distribution.
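Unbiasedness can also be checked by simulation: across many samples, the average of s² should sit close to σ². A minimal Python sketch (the parameters are arbitrary choices for illustration):

```python
import random

random.seed(1)
mu, sigma, n, reps = 0.0, 2.0, 5, 20000
s2_values = []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # divide by N - 1
    s2_values.append(s2)
avg_s2 = sum(s2_values) / reps
print(round(avg_s2, 2))   # close to sigma**2 = 4
```

Dividing by N instead of N - 1 would make the average come out systematically below 4.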
Sampling Distribution of the Variance
It's useful to rewrite

s² = [1/(N - 1)] Σ (Xi - X̄)²

as

(N - 1)s²/σ² = (1/σ²) Σ (Xi - X̄)²

by multiplying both sides by N - 1 (to get the sum of squares) and dividing both sides by σ² (to normalize the resulting statistic to the scale of X).

It can be shown that this quantity is distributed χ² with N - 1 degrees of freedom. Why?
Sampling Distribution of the Variance
For the intuition, suppose we have N = 2 normal variables X1 and X2.

When N = 2, then X̄ = (1/2)(X1 + X2), and we have:

s² = (X1 - X2)²/2.

This can be rewritten as

(N - 1)s²/σ² = (X1 - X2)²/(2σ²) = [(X1 - X2)/(√2·σ)]²
Sampling Distribution of the Variance
Notice that

(N - 1)s²/σ² = [(X1 - X2)/(√2·σ)]²

is nothing more than the square of a standard normal variable, because

E(X1 - X2) = 0 and Var(X1 - X2) = 2σ².

We already know that such a variable is a chi-square variable with one (that is, N - 1) degree of freedom.

More generally, it is the case that the sampling distribution of the (rescaled) variance is distributed as chi-square with N - 1 degrees of freedom.
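This, too, can be checked by simulation: the rescaled variance should match the mean (N - 1) and variance 2(N - 1) of a chi-square variable with N - 1 degrees of freedom. A Python sketch with N = 6, so 5 degrees of freedom (the parameters are arbitrary):

```python
import random

random.seed(2)
sigma, n, reps = 1.0, 6, 20000
q = []
for _ in range(reps):
    x = [random.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    q.append((n - 1) * s2 / sigma ** 2)   # rescaled variance
mean_q = sum(q) / reps
var_q = sum((v - mean_q) ** 2 for v in q) / reps
print(round(mean_q, 1), round(var_q, 1))  # near 5 and 10 (chi-square, 5 df)
```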
Sampling Distribution of the Variance
The take-away point from all of this is that the statistics you calculate are, themselves, random variables, with their own sampling distributions and characteristics.
Central Limit Simulation in Stata
Here is a program to draw 1 sample of size 10 from a standard uniform distribution and return the sample mean:
. program define onesample, rclass
. drop _all
. quietly set obs 10
. generate x = runiform()
. summarize x
. return scalar meanforonesample = r(mean)
. end
Central Limit Simulation in Stata
Let's look at one sample.

. set seed 10101
. onesample

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |        10    .4515166    .2999441   .0524637   .9794558

. return list

scalars:
    r(meanforonesample) = .4515166394412518
Central Limit Simulation in Stata
The simulate command runs a specified command a specified number of times.
So, we could draw 1,000 samples of N = 10 from a standard uniform.
. simulate xbar = r(meanforonesample), seed(10101) reps(1000) nodots: onesample
. sum xbar

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        xbar |      1000    .4987486    .0888575   .2452242   .7403472
Central Limit Simulation in Stata
Figure: Central Limit Theorem

[Histogram of the 1,000 sample means: density on the vertical axis; r(meanforonesample), from about .2 to .7, on the horizontal axis.]
. hist xbar
As the sample size and number of simulations increase, the results become closer to a normal distribution.
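The same exercise is easy to reproduce outside Stata; here is an equivalent Python sketch (illustrative only):

```python
import random

random.seed(10101)
reps, n = 1000, 10
# 1,000 sample means, each from 10 draws of a standard uniform
xbars = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]
m = sum(xbars) / reps
sd = (sum((x - m) ** 2 for x in xbars) / (reps - 1)) ** 0.5
# Uniform(0,1) has mean 0.5 and sd sqrt(1/12) ~ 0.289,
# so the mean of 10 draws should have sd ~ 0.289/sqrt(10) ~ 0.091
print(round(m, 3), round(sd, 3))
```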
Central Limit Simulation in Stata
Let's take a look at Zorn's data on the 7,161 fully-decided cases handed down by the Warren and Burger courts.

For our purposes, we can think of these as the population of cases in these years.

We'll focus on one variable: fedpet: 1 if the federal government is the petitioner, 0 otherwise.
Central Limit Simulation in Stata
Figure: fedpet Histogram

[Density on the vertical axis; Federal govt. petitioner (0 to 1) on the horizontal axis.]
This clearly has a bimodal distribution.
Central Limit Simulation in Stata
. program define meansamp, rclass
. version 11
. use WarrenBurger.dta, clear
. sample 10, count
. tempvar z
. gen `z' = fedpet
. summarize `z'
. return scalar mean = r(mean)
. end

. simulate mean = r(mean), seed(10101) reps(1000) nodots: meansamp
. histogram mean
Central Limit Simulation in Stata
Figure: 1,000 Means for fedpet, Sample Size = 10

[Histogram: density on the vertical axis; r(mean), from 0 to .8, on the horizontal axis.]
The distribution is not exactly normal, and there's a lot of variation in the sample mean.
Central Limit Simulation in Stata
Now let's increase the sample size to N = 100.

. program define meansamp100, rclass
. version 11
. use WarrenBurger.dta, clear
. sample 100, count
. tempvar z
. gen `z' = fedpet
. summarize `z'
. return scalar mean = r(mean)
. end

. simulate mean = r(mean), seed(10101) reps(1000) nodots: meansamp100
. histogram mean
Central Limit Simulation in Stata
Figure: 1,000 Means for fedpet, Sample Size = 100

[Histogram: density on the vertical axis; r(mean), from .05 to .3, on the horizontal axis.]
We see that there is now a very small range for X̄, and the distribution is almost perfectly normal.
Variance Simulation in Stata
Suppose that we now look at the sampling distribution of the variance, with N = 20.

. program define varsamp20, rclass
. version 11
. use WarrenBurger.dta, clear
. sample 20, count
. tempvar z
. gen `z' = fedpet
. summarize `z'
. return scalar rescaled_variance = (r(N)-1)*r(Var)/0.1437428
. end

. simulate rescaled_variance = r(rescaled_variance), seed(10101) reps(1000) nodots: varsamp20
. version 3.1: set seed 10101
. rndchi 1000 19
. rename xc chi_squared19
. twoway kdensity rescaled_variance || kdensity chi_squared19
Variance Simulation in Stata
Figure: Density Plot: 1,000 Rescaled Variances of fedpet, Sample Size = 20 (solid line is the χ²(19) density)

[Kernel densities of rescaled_variance and chi_squared19; horizontal axis from 0 to 50.]
Variance Simulation in Stata
Suppose that we now look at the sampling distribution of the variance, with N = 500.

. program define varsamp500, rclass
. version 11
. use WarrenBurger.dta, clear
. sample 500, count
. tempvar z
. gen `z' = fedpet
. summarize `z'
. return scalar rescaled_variance = (r(N)-1)*r(Var)/0.1437428
. end

. simulate rescaled_variance = r(rescaled_variance), seed(10101) reps(1000) nodots: varsamp500
. version 3.1: set seed 10101
. rndchi 1000 499
. rename xc chi_squared499
. twoway kdensity rescaled_variance || kdensity chi_squared499
Variance Simulation in Stata
Figure: Density Plot: 1,000 Rescaled Variances of fedpet, Sample Size = 500 (solid line is the χ²(499) density)

[Kernel densities of rescaled_variance and chi_squared499; horizontal axis from 350 to 600.]
Central Limit Simulation in R
> # court holds the Warren/Burger data
> attach(court)
> summary(court)
> # Draw 1000 random samples of fedpet, each with N=10, and calculate the mean for each
> a <- numeric(1000)
> for (i in 1:1000){
+   a[i] <- mean(sample(fedpet, 10, replace=F))
+ }
> describe(a)
a
    n missing unique   Mean
 1000       0      8 0.1786

            0  0.1  0.2  0.3  0.4  0.5  0.6  0.7
Frequency 144  298  308  164   58   22    4    2
%          14   30   31   16    6    2    0    0

> histogram(a, nint=7, xlab="Means of FedPet")
Central Limit Simulation in R
Figure: Histogram: 1,000 Means of fedpet, Sample Size = 10

[Percent of total (0 to 30) on the vertical axis; Means of FedPet (0.0 to 0.6) on the horizontal axis.]
Central Limit Simulation in R
> # Same thing, but with N=100
> for (i in 1:1000){
+   a[i] <- mean(sample(fedpet, 100, replace=F))
+ }
> Da <- density(a)
> plot(Da, main="", xlab="Mean of FedPet", lwd=2)
> abline(v=0.174, lwd=2)
Central Limit Simulation in R
Figure: Histogram: 1,000 Means of fedpet, Sample Size = 100

[Density (0 to 10) on the vertical axis; Mean of FedPet (0.05 to 0.30) on the horizontal axis.]
Variance Simulation in R
Suppose that we now look at the sampling distribution of the variance, N = 20 .
> s <- numeric(1000)
> for (i in 1:1000){
+   s[i] <- (20 - 1) * var(sample(fedpet, 20, replace=F)) / 0.1437428
+ }
> describe(s)
s
    n missing unique  Mean   .05 ...
 1000       0     11 19.17 6.611 ...

> Ds <- density(s)
> plot(seq(0,40,length=40), dchisq(seq(0,40,length=40), 19), t="l",
+   lwd=2, xlab="Rescaled S^2", ylab="Density")
> lines(Ds, lwd=2, lty=2, col="red")
> abline(v=19, lwd=2)
Variance Simulation in R
Figure: Density Plot: 1,000 Rescaled Variances of fedpet, Sample Size = 20 (solid line is the χ²(19) density)

[Density (0.00 to 0.06) on the vertical axis; Rescaled S^2 (0 to 40) on the horizontal axis.]
Variance Simulation in R
Suppose that we now look at the sampling distribution of the variance, with N = 500.

> s <- numeric(1000)
> for (i in 1:1000){
+   s[i] <- (500 - 1) * var(sample(fedpet, 500, replace=F)) / 0.1437428
+ }
> describe(s)
s
    n missing unique  Mean   .05 ...
 1000       0     47 499.1 438.7 ...

> Ds <- density(s)
> plot(seq(300,600,length=300), dchisq(seq(300,600,length=300), 499),
+   t="l", lwd=2, xlab="Rescaled S^2", ylab="Density")
> lines(Ds, lwd=2, lty=2, col="red")
> abline(v=499, lwd=2)
Variance Simulation in R
Figure: Density Plot: 1,000 Rescaled Variances of fedpet, Sample Size = 500 (solid line is the χ²(499) density)

[Density (0.000 to 0.012) on the vertical axis; Rescaled S^2 (300 to 600) on the horizontal axis.]