
Sampling Distributions, the CLT, and Estimation

Carolyn J. Anderson

EdPsych 580

Fall 2005


Sampling and Estimation

• Sampling Distributions
• Normal distribution & Central Limit Theorem
• Estimators and estimates
• Statistical Inference (interval estimation)

Recall The Big Picture

[Diagram] Population (N) −→ Select a Subset −→ Sample (n); Sample (n) −→ Make Inferences −→ Population (N)

Population

A population, or “sample space”, consists of elementary events.

• All potential units that could be observed.
• If finite, then the number of units is countable.
  If infinite, then the number of potential observations is infinite.
  If “virtually infinite”, then a very, very, very large number.
• Real or hypothetical.
  • All college students in U.S.
  • All possible mean SAT scores from samples drawn from all college students in U.S.

Random Variables

• A Random Variable is a number assigned to any particular member of the population. This set of numbers has a distribution.
• Population Distribution is the (frequency) distribution of these random variables. It has some form with mean $\mu$ and variance $\sigma^2$.
• Population distributions are almost always treated as (theoretical) probability distributions.
• Random sample with replacement −→ long-run relative frequency of a value is the same as the probability of that value.

Parameters

Parameters of populations (“true values”) are values that summarize (define) the distribution.

• Mean
• Variance
• others

Sample

• A Sample is a subset of n units from the population.
• Quantities or values computed using a sample of observations of random variables are Statistics.
• Examples:
  • Mean: $\bar{X} = (1/n)\sum_{i=1}^{n} X_i$
  • Variance: $s_n^2 = (1/n)\sum_{i=1}^{n}(X_i - \bar{X})^2$
  • 2nd observation on X: $X_2$
  • Range: $(X_{\max} - X_{\min})$
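The example statistics listed for a sample can be computed directly from data. The following is a minimal sketch in Python; the five data values are hypothetical and chosen only for illustration, and the variable names are not from the lecture.

```python
# Hypothetical sample values, chosen only for illustration.
x = [4.0, 7.0, 1.0, 9.0, 4.0]
n = len(x)

mean = sum(x) / n                                # X-bar = (1/n) * sum of X_i
var_n = sum((xi - mean) ** 2 for xi in x) / n    # s_n^2, divides by n (not n - 1)
second_obs = x[1]                                # X_2, the 2nd observation
value_range = max(x) - min(x)                    # X_max - X_min

print(mean, var_n, second_obs, value_range)
```

Each of these quantities changes from sample to sample, which is what makes it a statistic rather than a parameter.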

Sampling Distributions

** Key Concept **

A “conceptual experiment”:

• Imagine randomly sampling n individuals from a population and computing some statistic based on the sample.
• Repeat this (independently) many times.
• Result: many values of the sample statistic −→ the sampling distribution of the sample statistic.
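The conceptual experiment can be imitated on a computer. The sketch below is only an illustration under stated assumptions: the population (10,000 draws from an exponential distribution), the sample size n = 25, the number of repetitions, and the helper name `sample_mean` are all hypothetical choices, not part of the lecture.

```python
import random
import statistics

random.seed(580)  # arbitrary seed so the run is repeatable

# Hypothetical finite population: 10,000 scores from a skewed (exponential) distribution.
population = [random.expovariate(1.0) for _ in range(10_000)]

def sample_mean(pop, n):
    """Draw n units at random (with replacement) and return the sample mean."""
    return statistics.fmean(random.choices(pop, k=n))

# Repeat the experiment many times; the collection of results approximates
# the sampling distribution of the sample mean for samples of size n = 25.
n, reps = 25, 5_000
means = [sample_mean(population, n) for _ in range(reps)]

print("population mean:          ", round(statistics.fmean(population), 3))
print("mean of the sample means: ", round(statistics.fmean(means), 3))
print("std dev of sample means:  ", round(statistics.pstdev(means), 3))
```

A histogram of `means` would be an empirical picture of the sampling distribution; with a finite number of repetitions it is only an approximation to the theoretical one.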

Sampling Distributions (continued)

From Hayes:

A Sampling Distribution is a theoretical probability distribution that shows the functional relation between possible values of a given statistic based on a sample of n cases and the probability (density) associated with each value, for all possible samples of size n drawn from a particular population.

Sampling Distributions (continued)

• In general, the sampling distribution will not be the same as the population distribution.
• We describe sampling distributions the same way that we describe population (or sample) distributions, i.e., by mean, variance, standard deviation, shape, etc.

Characteristics of Sampling Dist.

If the population distribution has mean $\mu$ and variance $\sigma^2$, then the sampling distribution of the sample mean (for samples of size n) has

• Mean of the sampling distribution equal to the population mean, $\mu$.
• Variance of the sampling distribution equal to the population variance divided by the sample size, $\sigma^2/n$.
• Standard Deviation of the sampling distribution, the “standard error of estimate”, equal to $\sigma/\sqrt{n}$.

Characteristics of Sampling Dist.

The statements on the previous slide regarding the mean, variance and standard deviation of the sampling distribution are true for the sample mean regardless of the shape of the parent/population distribution. (They need not hold for other statistics, such as the median or the range.)

Eg: Sampling Dist of the Mean

• Population: Y is a random variable with mean $\mu$ and variance $\sigma^2$.
• Sample: Random (independent) sample from the population: $Y_1, Y_2, \ldots, Y_n$.
• The sample mean $\bar{Y} = (1/n)\sum_{i=1}^{n} Y_i$.
• Expected value of $\bar{Y}$ ($E(\bar{Y})$, the mean of the sampling distribution of $\bar{Y}$) . . .

Expected value of $\bar{Y}$

The mean of the sampling distribution of $\bar{Y}$ . . .

$$
\begin{aligned}
E[\bar{Y}] &= E\left[\frac{1}{n}(Y_1 + Y_2 + \ldots + Y_n)\right] \\
&= \frac{1}{n} E[Y_1 + Y_2 + \ldots + Y_n] \\
&= \frac{1}{n}\left(E[Y_1] + E[Y_2] + \ldots + E[Y_n]\right) \\
&= \frac{1}{n}(\mu + \mu + \ldots + \mu) \\
&= \frac{1}{n}\sum_{i=1}^{n}\mu = \mu
\end{aligned}
$$

Variance of $\bar{Y}$

• Recall that $\sigma^2 = E[(Y - \mu)^2] = E(Y^2) - \mu^2$.
• $\mathrm{var}(\bar{Y}) = E[(\bar{Y} - \mu)^2] = E[\bar{Y}^2] - \mu^2$.
• Square the sample mean,

$$
\bar{Y}^2 = \frac{(Y_1 + Y_2 + \ldots + Y_n)^2}{n^2}
= \frac{Y_1^2 + \ldots + Y_n^2 + 2Y_1Y_2 + 2Y_1Y_3 + \ldots + 2Y_{n-1}Y_n}{n^2}
$$

• If two random variables, e.g. $Y_1$ and $Y_2$, are independent, then

$$E(Y_1Y_2) = E(Y_1)E(Y_2) = \mu\mu = \mu^2$$

Variance of $\bar{Y}$ (continued)

$$
\begin{aligned}
E[\bar{Y}^2] &= \frac{E[Y_1^2 + \ldots + Y_n^2 + 2Y_1Y_2 + 2Y_1Y_3 + \ldots + 2Y_{n-1}Y_n]}{n^2} \\
&= \frac{\sum_{i=1}^{n} E[Y_i^2] + 2\sum_{i>j} E[Y_iY_j]}{n^2} \\
&= \frac{\sum_{i=1}^{n}(\sigma^2 + \mu^2) + 2\sum_{i>j}\mu^2}{n^2} \\
&= \frac{n(\sigma^2 + \mu^2) + 2\frac{(n-1)n}{2}\mu^2}{n^2} \\
&= \frac{\sigma^2 + n\mu^2}{n} \\
&= \frac{\sigma^2}{n} + \mu^2
\end{aligned}
$$

Variance of $\bar{Y}$ (continued)

$$
\begin{aligned}
\mathrm{var}(\bar{Y}) &= E[(\bar{Y} - \mu)^2] \\
&= E[\bar{Y}^2] - \mu^2 \\
&= \left(\frac{\sigma^2}{n} + \mu^2\right) - \mu^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$

We made no assumptions regarding the nature of the population distribution, except that the mean equals $\mu$ and variance equals $\sigma^2$!

Variance of $\bar{Y}$ (continued)

$$\mathrm{var}(\bar{Y}) = \sigma^2_{\bar{Y}} = \sigma^2/n$$

As n increases, $\mathrm{var}(\bar{Y})$ decreases (i.e., the precision of $\bar{Y}$ as an estimate of $\mu$ increases).
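The result $\mathrm{var}(\bar{Y}) = \sigma^2/n$ can be checked numerically. The sketch below assumes a uniform parent on [0, 1] (so $\sigma^2 = 1/12$); the parent distribution, sample sizes, and number of repetitions are illustrative assumptions, not values from the lecture.

```python
import random
import statistics

random.seed(1)  # arbitrary seed for repeatability

# Assumed parent: uniform on [0, 1], so sigma^2 = 1/12.
sigma2 = 1 / 12
reps = 20_000

results = {}
for n in (1, 4, 16, 64):
    # One sample mean per repetition, each based on n independent draws.
    means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]
    results[n] = statistics.pvariance(means)
    print(f"n={n:3d}  simulated var={results[n]:.5f}  sigma^2/n={sigma2 / n:.5f}")
```

The simulated variances should track $\sigma^2/n$ closely but not exactly, because only a finite number of sample means is drawn.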

Normal Distribution and the C.L.T.

• The normal distribution is a particular probability distribution for continuous variables.
• The “Bell Curve”
• Why is it so important?
  • It’s a good approximation of the (population) distribution of many measured variables.
  • Many statistical procedures are based on the assumption of a normal distribution (e.g., sampling distributions of statistics).
  • It has lots of nice mathematical properties.

The Normal Distribution

Formal definition: The family of normal distributions is a set of symmetric, bell-shaped curves each characterized by its $\mu$ and $\sigma^2$. The formula for the normal p.d.f. is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

where
• $e = 2.71828\ldots$ (base of natural log).
• $\pi = 3.14159\ldots$ (circumference/diameter).
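As a quick illustration, the density formula can be written as a small function and cross-checked against Python's standard library. The function name `normal_pdf` and the evaluation point are hypothetical choices for this sketch.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Normal density f(x) with mean mu and variance sigma2."""
    return math.exp(-0.5 * (x - mu) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)

# Peak height for mu = 0, sigma^2 = 4, and a cross-check against the standard library
# (NormalDist takes the standard deviation, so sigma = 2).
print(normal_pdf(0.0, mu=0.0, sigma2=4.0))
print(NormalDist(mu=0.0, sigma=2.0).pdf(0.0))
```

Note that the function is parameterized by the variance, matching the $N(\mu, \sigma^2)$ notation, while `NormalDist` expects the standard deviation.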

[Figure] Normal: $\mu = 0$ and $\sigma^2 = 4$

[Figure] Normal: $\sigma^2 = 1$, $\mu = 0, 5, 10$

[Figure] Normal: $\mu = 0$, $\sigma^2 = 1, 4, 16$

[Figure] Normal: A bunch of different ones

[Figure] The Standard Normal Distribution

The Standard Normal Distribution

• You can transform any normally distributed variable into a standard normal one:

$$z\text{-score} = \frac{Y - \mu}{\sigma}$$

• A z-score equals how many standard deviations a value of Y is from its mean,

$$z\sigma = Y - \mu$$

• Use z-scores to find probabilities of continuous variables from tabled values or computer programs for the standard normal distribution.
• $z \sim N(0, 1)$ (special case of $x \sim N(\mu, \sigma^2)$).
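A short sketch of the z-score transformation, using Python's `statistics.NormalDist` in place of a printed table. The numbers (a score of 65 from a normal population with mean 50 and standard deviation 10) are hypothetical.

```python
from statistics import NormalDist

# Hypothetical values: a score of 65 from a normal population with mu = 50, sigma = 10.
y, mu, sigma = 65.0, 50.0, 10.0

z = (y - mu) / sigma                     # number of standard deviations from the mean
p_below = NormalDist().cdf(z)            # P(Z <= z) under N(0, 1)
p_direct = NormalDist(mu, sigma).cdf(y)  # same probability without standardizing

print(z, round(p_below, 4), round(p_direct, 4))
```

The two probabilities agree, which is the point of standardizing: one table (or one function) for N(0, 1) serves every normal distribution.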

The Standard Normal Distribution

Finding areas/probabilities for the standard normal distribution:

• Course web-site: downloadable program, pvalue.exe
• UCLA web-site
• SAS function “probnorm” (default is $N(0, 1)$, but can ask for others).

The Central Limit Theorem

Version 1 (sums): Consider a random sample from a population distribution having mean $\mu$ and variance $\sigma^2$. If n is sufficiently large, then the sampling distribution of $\sum_{i=1}^{n} Y_i$ is approximately normal with mean $n\mu$ and variance $n\sigma^2$.

Version 2 (means): Consider a random sample from a population distribution having mean $\mu$ and variance $\sigma^2$. If n is sufficiently large, then the sampling distribution of $\bar{Y}$ is approximately normal with mean $\mu$ and variance $\sigma^2_{\bar{Y}} = \sigma^2/n$.
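The means and variances in the two versions of the theorem can be checked by simulation. This sketch assumes a fair six-sided die as the parent ($\mu = 3.5$, $\sigma^2 = 35/12$) with n = 30 rolls per sample; the sample size, seed, and repetition count are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(28)  # arbitrary seed for repeatability

# Assumed parent: one roll of a fair die (mu = 3.5, sigma^2 = 35/12).
mu, sigma2 = 3.5, 35 / 12
n, reps = 30, 20_000

sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(reps)]
means = [s / n for s in sums]

print("sum:  mean", statistics.fmean(sums), "expected", n * mu)
print("sum:  var ", statistics.pvariance(sums), "expected", n * sigma2)
print("mean: var ", statistics.pvariance(means), "expected", sigma2 / n)
```

The sum has variance $n\sigma^2$ while the mean has variance $\sigma^2/n$; dividing the sum by n divides its variance by $n^2$.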

Example: Normal (0,1) “Parent”

Parent $N(0, 1)$ $\Longrightarrow$ sampling distribution of $\bar{Y}$ is $N(0, 1/n)$, i.e., its standard deviation is $1/\sqrt{n}$.

[Figure] Uniform Parent ($\mu = .5$): Pink is “kernel” density and red is normal. Need more than n = 10 for this one. . .

[Figure] Skewed Parent ($\mu = 1$): Pink is “kernel” density and red is normal. Need more than n = 10 for this one. . .

[Figure] Skewed Parent ($\mu = 1$) (continued)

[Figure] Dice Rolling (“Multinomial”)

[Figure] Dice Rolling (continued): Look pretty normal?

[Figure] Example: Dice Rolling (continued)

Example: Dice Rolling (continued)

Population: $\mu = 3.5$, $\sigma^2 = 2.92$, $\sigma = 1.71$

The MEANS Procedure

| Variable | N | Mean | Std Dev | Std Dev should be |
|----------|----|------|---------|--------------------------|
| spot1 | 1 | 3.5 | 1.71 | $1.71/\sqrt{1} = 1.71$ |
| mean2 | 2 | 3.5 | 1.21 | $1.71/\sqrt{2} = 1.21$ |
| mean5 | 5 | 3.5 | 0.76 | $1.71/\sqrt{5} = .76$ |
| mean20 | 20 | 3.5 | 0.38 | $1.71/\sqrt{20} = .38$ |
| mean50 | 50 | 3.5 | 0.24 | $1.71/\sqrt{50} = .24$ |
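The population values for a fair die and the “should be” column follow from the definitions of $\mu$, $\sigma^2$, and $\sigma/\sqrt{n}$. A minimal sketch (variable names are illustrative):

```python
import math

faces = range(1, 7)                              # a fair six-sided die
mu = sum(faces) / 6                              # population mean
sigma2 = sum((f - mu) ** 2 for f in faces) / 6   # population variance
sigma = math.sqrt(sigma2)

print(f"mu={mu}, sigma^2={sigma2:.2f}, sigma={sigma:.2f}")
for n in (1, 2, 5, 20, 50):
    # Standard error of the mean of n rolls.
    print(f"n={n:2d}  sigma/sqrt(n) = {sigma / math.sqrt(n):.2f}")
```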

[Figure] Example: Dice Rolling (continued)

[Figures] Another Discrete Distribution (Bernoulli): $P(Y = 0) = P(Y = 1) = .5$, $\mu = .5$, $\sigma^2 = .25$

Another Discrete Distribution (Bernoulli)

$P(Y = 0) = P(Y = 1) = .5$, $\mu = .5$, $\sigma^2 = .25$

| Variable | n | Mean | Std Dev | Should be |
|----------|-------|------|---------|--------------------------|
| x1 | 1 | .50 | .50 | $\sqrt{.25/1} = .5$ |
| mean2 | 2 | .50 | .35 | $\sqrt{.25/2} = .35$ |
| mean5 | 5 | .50 | .22 | $\sqrt{.25/5} = .22$ |
| mean50 | 50 | .50 | .07 | $\sqrt{.25/50} = .07$ |
| mean100 | 100 | .50 | .05 | $\sqrt{.25/100} = .05$ |
| mean500 | 500 | .50 | .02 | $\sqrt{.25/500} = .02$ |
| mean5000 | 5,000 | .50 | .01 | $\sqrt{.25/5000} = .01$ |

[Figure] Another Discrete Distribution (Bernoulli): $P(Y = 0) = .99$, $P(Y = 1) = .01$, $\mu = .01$, $\sigma^2 = .0099$

[Figure] Another Discrete Distribution (Bernoulli), same parameters: How about n = 500 (left) and n = 5,000 (right)?

Another Discrete Distribution (Bernoulli)

$\mu = .01$ and $\sigma^2 = .0099$

| Variable | n | Mean | Std Dev |
|----------|------|------|----------|
| x1 | 1 | .01 | .1028864 |
| mean2 | 2 | .01 | .0713202 |
| mean5 | 5 | .01 | .0444879 |
| mean50 | 50 | .01 | .0140665 |
| mean100 | 100 | .01 | .0099457 |
| mean500 | 500 | .01 | .0044401 |
| mean5000 | 5000 | .01 | .0014053 |

Std Dev is not exactly equal to $\sqrt{\sigma^2/n} = \sqrt{.0099/n}$ because we would need more than 100,000 sample means.
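To see how far the reported standard deviations are from $\sqrt{.0099/n}$, the theoretical values can be computed and compared. In this sketch the `reported` values are copied from the table for the $P(Y = 1) = .01$ case; everything else (names, output format) is illustrative.

```python
import math

p = 0.01
sigma2 = p * (1 - p)   # Bernoulli variance, 0.0099

# Std Dev values copied from the table for this distribution (simulated sample means).
reported = {1: 0.1028864, 2: 0.0713202, 5: 0.0444879, 50: 0.0140665,
            100: 0.0099457, 500: 0.0044401, 5000: 0.0014053}

for n, sd in reported.items():
    theory = math.sqrt(sigma2 / n)
    print(f"n={n:5d}  reported={sd:.7f}  sqrt(sigma^2/n)={theory:.7f}  ratio={sd / theory:.3f}")
```

The ratios are close to 1 but not equal to it, which is consistent with the remark that the simulation used a finite number of sample means.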

Implication of C.L.T or NOT?

• As n increases, $\sigma^2_{\bar{Y}}$ decreases; the sampling error in estimating $\mu$ decreases when sample size increases. NOT (this follows from $\sigma^2_{\bar{Y}} = \sigma^2/n$, which holds without the C.L.T.)
• Sampling distributions of (most) statistics are approximately normal regardless of the shape of the parent (population) distribution. YES
• Sampling distributions of statistics take on more normal shapes as n increases. Usually with n as small as 25 to 30, the sampling distribution is well approximated by the normal. YES

If the population distribution is “well behaved”, then the normal distribution is good for almost all sample sizes.

C.L.T: Summary of Implications

• Since the sampling distribution of $\bar{Y}$ is approximately $N(\mu, \sigma^2/n)$, we can use the tabled probabilities of the standard normal distribution to compute interval estimates of $\mu$ and do statistical tests (i.e., make statistical inferences about the degree of uncertainty). . . . more later
• n = 25 or 30 does not imply that we have sufficient precision. We may require much larger n’s to detect small effects.

n = 30 means that often the sampling distribution of $\bar{Y}$ is approximately normal.

C.L.T: Summary of Implications

• The sampling distribution of $\bar{Y}$ always has mean $\mu$ and variance $\sigma^2/n$. The shape of the sampling distribution of $\bar{Y}$ is normal for small n only if the population distribution of Y is normal.
• n = 30 does not ensure that the sampling distribution of a statistic will be even approximately normal.
  There are cases where it requires much larger samples. These cases usually are ones where the statistic equals the sum of values that are discrete (e.g., Y = 0, 1) and the probability of (say) Y = 1 is very, very small.
• C.L.T. can be proven mathematically.

Estimators and Estimates

• An estimator is a formula for computing an estimate.
• An estimator is a random variable whose value depends on your sample.
• An estimate is a particular value of an estimator.
• Examples:

Estimators and Estimates (continued)

• Examples of estimators and estimates:
  • The sample mean and variance are estimators,

$$\bar{Y} = (1/n)\sum_{i=1}^{n} Y_i \qquad s_n^2 = (1/n)\sum_{i=1}^{n}(Y_i - \bar{Y})^2$$

  • Given data from a sample, the estimates are, e.g., for HSB reading scores:

$$\bar{Y} = 55.89 \qquad s_n^2 = 80.00$$

  • The above estimates are point estimates.

Properties of Estimators

• Bias
• Consistency
• Relative efficiency
• Sufficiency
• Maximum likelihood

Properties of Estimators: Bias

• An estimator is unbiased if its expected value equals the population value.
• An estimator is biased if its expected value does not equal the population value.
• The sample mean $\bar{Y} = (1/n)\sum_{i=1}^{n} Y_i$ is an unbiased estimator of $\mu$:

$$E(\bar{Y}) = \mu$$

• If the parent population is normal, then the median and mode are unbiased estimators of $\mu$:

$$E(\text{median}) = E(\text{mode}) = \mu$$

Properties of Estimators: Bias (continued)

• The sample variance $s_n^2 = (1/n)\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is a biased estimator of $\sigma^2$:

$$E(s_n^2) = \sigma^2 - \frac{1}{n}\sigma^2$$

  It’s a little too small.
• The unbiased estimator of $\sigma^2$ is $s^2 = (1/(n-1))\sum_{i=1}^{n}(Y_i - \bar{Y})^2$:

$$E(s^2) = \sigma^2$$
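The bias of $s_n^2$ shows up clearly in a small simulation. This sketch assumes a standard normal parent ($\sigma^2 = 1$) and a deliberately small sample size, n = 5, so that the expected value $\sigma^2 - \sigma^2/n = 0.8$ is easy to distinguish from 1; these settings are illustrative assumptions.

```python
import random
import statistics

random.seed(51)  # arbitrary seed for repeatability

# Assumed parent: standard normal (sigma^2 = 1); small samples make the bias visible.
n, reps = 5, 40_000
avg_sn2 = avg_s2 = 0.0
for _ in range(reps):
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    avg_sn2 += statistics.pvariance(y) / reps   # s_n^2: divides by n
    avg_s2 += statistics.variance(y) / reps     # s^2: divides by n - 1

print("average s_n^2:", round(avg_sn2, 3), " theory:", 1 - 1 / n)
print("average s^2:  ", round(avg_s2, 3), " theory:", 1.0)
```

Averaging over many samples approximates the expected value of each estimator; the average of $s_n^2$ falls short of $\sigma^2$ while the average of $s^2$ does not.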

Consistency & Efficiency

• Consistency: As the sample size n increases, the sample statistic “converges in probability” to the population value.
  • The sample mean $\bar{Y}$ is a consistent estimator of $\mu$.
  • The 2nd observation in a sample is not a consistent estimator of $\mu$.
• Relative Efficiency: An estimator is more efficient if the variance of its sampling distribution is less than the variance of another estimator's sampling distribution, e.g., for normal Y, $\bar{Y}$ is more efficient than the median.
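Relative efficiency can be illustrated by comparing the sampling variances of the mean and the median for a normal parent. The sketch below assumes a standard normal parent and n = 25; both choices, and the repetition count, are arbitrary.

```python
import random
import statistics

random.seed(52)  # arbitrary seed for repeatability

# Assumed parent: standard normal; compare the spread of two estimators of mu.
n, reps = 25, 20_000
means, medians = [], []
for _ in range(reps):
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(y))
    medians.append(statistics.median(y))

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
print("var of sample mean:  ", round(var_mean, 4))
print("var of sample median:", round(var_median, 4))
print("ratio (median/mean): ", round(var_median / var_mean, 2))
```

For a normal parent and large n, the ratio of the two variances approaches $\pi/2 \approx 1.57$, so the median needs roughly 57% more observations to match the precision of the mean.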

Sufficient

• The statistic contains all the information in the data about the population parameter.
  e.g., $\bar{Y}$ is sufficient for $\mu$, and $\sum_{i=1}^{n} Y_i$ is sufficient for $\mu$.
• Sufficient statistics don’t always exist.
• In some population distributions, you may need more than 1 parameter to completely specify the distribution.
  e.g., Bernoulli needs the mean (or probability).
  Normal distribution needs $\bar{Y}$ and $s^2$.

Maximum Likelihood

An estimator that maximizes the likelihood (probability) of obtaining the sample you got.

• $\bar{y}$ is the M.L.E. of $\mu$ (it’s also consistent, efficient, and unbiased).
• $s_n^2$ is the M.L.E. of $\sigma^2$ but is biased.
• $s^2$ is not the M.L.E. but is unbiased.

Interval Estimates & Statistical Inference

• So far, we’ve just considered “point estimates” (a “best guess”).
• We might want a range of possible values.
• A range of values that has a high probability of containing the true population value.
• Confidence Interval Estimate

Confidence Interval for µ

• We know
  • $E(\bar{Y}) = \mu$
  • $\sigma^2_{\bar{Y}} = \sigma^2/n$
• We assume that the sampling distribution of $\bar{Y}$ is normal (i.e., that n is “large enough”); that is

$$\bar{Y} \approx N(\mu, \sigma^2/n)$$

[Figure] Sampling Distribution of $\bar{Y}$: $E(\bar{Y}) = \mu$ and $\sigma^2_{\bar{Y}} = \sigma^2/n$

Confidence Interval for µ

• Our best point estimate is $\bar{Y}$, so an interval estimate should be centered around $\bar{Y}$.
• We add and subtract an amount c such that

$$\mathrm{Prob}\left[(\bar{Y} - c) \le \mu \le (\bar{Y} + c)\right] = 1 - \alpha$$

• To find the value of c, transform $\bar{Y}$ to z-scores; that is,

$$z = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}}$$

where $\sigma_{\bar{Y}} = \sqrt{\sigma^2/n}$.

Transform to z-Scores

[Figure] $z = (\bar{Y} - \mu)/\sigma_{\bar{Y}}$

Confidence Interval for $\mu$

• Before we look at data,

$$\mathrm{Prob}\left(-z_{\alpha/2} \le \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} \le z_{\alpha/2}\right) = 1 - \alpha$$

$$\mathrm{Prob}\left(\bar{Y} - z_{\alpha/2}\,\sigma_{\bar{Y}} \le \mu \le \bar{Y} + z_{\alpha/2}\,\sigma_{\bar{Y}}\right) = 1 - \alpha$$

• Once you get data, an interval estimate of $\mu$ is

$$\bar{y} \pm z_{\alpha/2}\,\sigma_{\bar{Y}}$$

• The probability that $\mu$ is in this interval is NOT $1 - \alpha$ (once computed, the interval either contains $\mu$ or it does not).

Correct Interpretation of CI

• Consider repeating the process of
  1. Draw/take a sample of size n.
  2. Compute the $(1 - \alpha) \times 100\%$ confidence interval.
• $(1 - \alpha) \times 100$ percent of the time, the interval would contain $\mu$.
• Note: later we’ll consider the more realistic situation where we estimate $\sigma$.
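The repeated-sampling interpretation can be acted out in a simulation. This sketch assumes a normal population with $\mu = 50$ and known $\sigma = 10$, samples of size n = 30, and $\alpha = .05$; all of these values are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(61)  # arbitrary seed for repeatability

# Assumed population: normal with mu = 50 and known sigma = 10.
mu, sigma, n, alpha = 50.0, 10.0, 30, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96
half_width = z * sigma / math.sqrt(n)

reps, hits = 10_000, 0
for _ in range(reps):
    # Step 1: draw a sample of size n; step 2: compute the interval and check it.
    ybar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    hits += (ybar - half_width) <= mu <= (ybar + half_width)

coverage = hits / reps
print("proportion of intervals containing mu:", coverage)
```

The proportion should land near $1 - \alpha = .95$. Each individual interval either contains $\mu$ or misses it; the 95% describes the procedure over many repetitions.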

HSB Reading Scores for Academic

• Sample statistics for students attending academic/prep school and the variable “RDG” (reading achievement in T-scores):

$$n = 308, \quad \bar{Y} = 55.89, \quad s^2 = 87.15, \quad s = 9.34$$

• Standard error of the mean $= 9.34/\sqrt{308} = .53$
• The sampling distribution of $\bar{Y}$ should be very well approximated by the normal distribution because n is large and the distribution of RDG scores is “nice”.

HSB Reading Scores for Academic

68% CI: 55.89 ± 1.00(.53) −→ (55.36, 56.42)

90% CI: 55.89 ± 1.645(.53) −→ (55.02, 56.77)

95% CI: 55.89 ± 1.96(.53) −→ (54.85, 56.93)

99% CI: 55.89 ± 2.58(.53) −→ (54.52, 57.26)

Higher confidence levels (smaller $\alpha$) −→ wider intervals.
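The interval arithmetic for the HSB reading scores can be reproduced with a few lines of Python. This is a sketch, not the lecture's own computation: it takes n, the sample mean, and s from the slide, rounds the standard error to .53 as the slide does, and uses s in place of $\sigma$. The z values come from `NormalDist.inv_cdf`, so they carry more decimals than the rounded 1.645, 1.96, and 2.58.

```python
import math
from statistics import NormalDist

n, ybar, s = 308, 55.89, 9.34       # values from the slide
se = round(s / math.sqrt(n), 2)     # .53, rounded as on the slide

intervals = {}
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2} for this level
    intervals[level] = (ybar - z * se, ybar + z * se)
    print(f"{level:.0%} CI: {ybar} ± {z:.3f}({se}) -> "
          f"({intervals[level][0]:.2f}, {intervals[level][1]:.2f})")
```

Small differences in the last digit relative to hand calculation can arise from how the standard error and z values are rounded.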