stratified sampling - naval postgraduate schoolfaculty.nps.edu/rdfricke/oa4109/lecture 7-1 --...

23
Stratified Sampling Professor Ron Fricker Naval Postgraduate School Monterey, California 2/1/13 1 Reading Assignment: Scheaffer, Mendenhall, Ott, & Gerow Chapter 5

Upload: hakhuong

Post on 31-Aug-2018

224 views

Category:

Documents


0 download

TRANSCRIPT

Stratified Sampling!

Professor Ron Fricker!Naval Postgraduate School!

Monterey, California!

2/1/13 1

Reading Assignment:!Scheaffer, Mendenhall, Ott, & Gerow!

Chapter 5!

Goals for this Lecture!

•  Define stratified sampling!•  Discuss reasons for stratified sampling!•  Derive estimators for the population!

–  Mean!–  Total!–  Percentage!

•  Sample size calculation and allocation!–  Sampling proportional to strata size!

•  Discuss design effects!

2/1/13 2

Stratified Sampling!

•  Stratified sampling divides the sampling frame up into strata from which separate probability samples are drawn!

•  Examples!–  GWI survey, needed to obtain information from

members of each military service!–  For external validity, WMD survey had to sample

large urban areas!–  In a particular country, survey in support of IW

requires information on both Muslim and Christian communities!

2/1/13 3

Reasons for Stratified Sampling!

•  More precision!–  Assuming strata are relatively homogeneous, can

reduce the variance in the sample statistic(s)!–  So, get better information for same sample size!

•  Estimates (of a particular precision) needed for subgroups!–  May be necessary to meet survey’s objectives!–  May have to oversample to get sufficient

observations from small strata !•  Cost!

–  May be able to reduce cost per observation if population stratified into relevant groupings!

2/1/13 4

5

An Example!

2/1/13 5

Source: Survey Methodology, 1st ed., Groves, et al, 2004.

Strata!

•  When selecting a stratified random sample, must clearly specify the strata!–  Non-overlapping categories into which each

sampling unit must be classified!–  Sampling units can only be in one strata!–  Strata based on information about whole

population!•  Can have more than one type of classification!

–  E.g., officer/enlisted and male/female!–  Then each sampling unit (person in this case)

must be classified into one of four stratum!

2/1/13 6

Some Notation!

•  Additional notation for stratified sampling!•  L is the number of strata!•  Ni is the number of sampling units in stratum i •  ni is the sample size in stratum i •  N is the total number of sampling units in the

population: N = N1 + N2 + … + NL

•  In this lecture, we use SRS within each strata!•  That does not necessarily mean each strata has

the same selection probabilities!•  And often more complex sampling done within the

strata!

2/1/13 7

Mean Estimation!

•  Within a strata, probability a unit is sampled is ni/Ni

•  Via the SRS lecture, we know that an unbiased estimate of the mean of strata i is!

•  So the estimated total for strata i is!•  And the population mean estimate is just the

sum of all the estimated totals divided by the population size!

2/1/13 8 1 1

ˆ 1 1ˆL L

st i i ii i

y N yN N Nτ τ

= =

= = =∑ ∑

1Ni

1π i

yijj=1

ni

∑ = 1Ni

1ni / Ni

yijj=1

ni

∑ = 1Ni

Ni

ni

yijj=1

ni

∑ = 1ni

yijj=1

ni

∑ = yi

i i iN yτ =

Mean Estimation Summary!

•  Estimator for the mean:!•  Variance of :!

!

•  Bound on the error of estimation:!2/1/13 9

1

1 L

st i ii

y N yN =

= ∑sty

Var yst( ) = Var 1N

Ni yii=1

L

∑⎛⎝⎜

⎞⎠⎟

= 1N 2 Ni

2

i=1

L

∑ Var yi( )

= 1N 2 Ni

2

i=1

L

∑ 1−ni

Ni

⎝⎜⎞

⎠⎟si

2

ni

⎝⎜⎞

⎠⎟

2 Var yst( )

Example: TV Viewing Time (1)!

•  Survey to estimate average TV viewing time!

•  Population consists of two urban areas and rural residents!

•  Reasons to stratify:!–  Similarity of viewing

habits by towns and rural regions!

–  Cost: perhaps cheaper to survey each region separately (?)!

•  Sample allocated proportional to strata size!

2/1/13 10

Source: Elementary Survey Sampling, 7th ed., Scheaffer, Mendenhall, Ott, and Gerow, 2012.

Example: TV Viewing Time (2)!

2/1/13 11 Source: Elementary Survey Sampling, 7th ed., Scheaffer, Mendenhall, Ott, and Gerow, 2012.

Example: TV Viewing Time (3)!

2/1/13 12

Source: Elementary Survey Sampling, 7th ed., Scheaffer, Mendenhall, Ott, and Gerow, 2012.

Example: TV Viewing Time (4)!

•  Correct analysis:!

•  What if you just treated this as a SRS?!–  Incorrect analysis with fpc:!

–  Incorrect analysis without fpc:!

2/1/13 13

ˆ27.7, 1.4, 95% CI = (24.9, 30.5)stst yy σ= =

ˆ27.7, 11.3, 95% CI = (5.1, 50.3)SRS yy σ′ = =

ˆ27.7, 10.6, 95% CI = (6.5, 48.9)SRS yy σ= =

Design Effect!

•  The design effect (aka deff) compares how a complex sampling design, in this case stratified sampling, compares to SRS!

•  Design effect can be greater or less than 1!–  But with reasonably homogeneous strata, almost

always get decrease in variance!–  For this example, !

2/1/13 14

d 2 =

Var Yst( )Var YSRS( ) =

1N 2 Ni

2 1− ni / Ni( )si2 / ni

i=1

L

∑1− n / N( )s2 / n

d 2 =

Var Yst( )Var YSRS( ) =

1.97112.36

= 0.02

Total Estimation Summary!

•  Estimator for the total:!•  Variance of :!

!

•  Bound on the error of estimation:!2/1/13 15

1

ˆL

st st i ii

Ny N yτ=

= =∑ˆstτ

Var τ st( ) = Var Ni yii=1

L

∑⎛⎝⎜⎞⎠⎟

= Ni2

i=1

L

∑ Var yi( )

= Ni2

i=1

L

∑ 1−ni

Ni

⎝⎜⎞

⎠⎟si

2

ni

⎝⎜⎞

⎠⎟

2 Var τ st( )

Proportion Estimation Summary!

•  Proportion estimator:!•  Variance of :!

!•  Bound on the error of estimation:!2/1/13 16

1

1ˆ ˆL

st i ii

p N pN =

= ∑sty

Var pst( ) = Var 1N

Ni pii=1

L

∑⎛⎝⎜

⎞⎠⎟

= 1N 2 Ni

2

i=1

L

∑ Var pi( )

= 1N 2 Ni

2

i=1

L

∑ 1−ni

Ni

⎝⎜⎞

⎠⎟pi(1− pi )

ni

⎝⎜⎞

⎠⎟

2 Var pst( )

Sample Size Determination & Allocation!

•  Calculations more complicated as need to !–  Choose overall sample size n, and !–  Allocate sample to strata, n1 + n2 + … + nL = n !

•  “Best” allocation depends on!–  Purpose of the stratification!–  Number of sampling units in each stratum!–  Variability of sampling units within each stratum!–  Cost of surveying each sampling unit from each

stratum!

2/1/13 17

Sample Size Calculation for Estimating the Mean!

•  Begin as with SRS, setting!•  Now, define ai as the proportion of the sample

size to allocate to strata i, ni= n x ai !•  Then,!

2/1/13 18

B = 2 Var yst( )

n =Ni

2σ i2 ai

i=1

L

N 2B2 4+ Ni2σ i

2

i=1

L

Proportionate Allocation to Strata!

•  Sample size within each strata is proportional to strata size in population!

•  If N is population size and n is total sample size, then where!–  Ni is the population size of stratum i –  ni is the sample size for stratum i !

•  Then!

2/1/13 19

/ /i in n N N=

n =Ni

2σ i2 ai

i=1

L

N 2B2 4+ Ni2σ i

2

i=1

L

∑=

Ni2σ i

2( ) Ni N( )i=1

L

N 2B2 4+ Ni2σ i

2

i=1

L

∑=

Niσ i2

i=1

L

NB2 4+ 1N

Niσ i2

i=1

L

20

Example!

•  Note (simple average)!•  However!

2/1/13 20

(0.4 6) (0.5 5) (0.1 8) 5.7sty = × + × + × =(6 5 8) / 3 6.3y = + + =

Source: Survey Methodology, 1st ed., Groves, et al, 2004.

Ni Ni/N ni ni/Ni iy2is

Other Allocation Schemes!

•  Rather than allocating sample proportional to strata size, can!–  Allocate according to variability of the strata!

•  Idea is to allocate more of the sample to strata that are more variable!

•  Done right, can provide most precise population estimates!

–  Allocate according to the cost of collection per strata!

•  Idea is to allocate more of the sample to strata that cost less!

2/1/13 21

Poststratification!

•  Poststratification used when strata variable(s) not observed until after survey completed!–  That is, stratify on data collected in the survey!

•  Useful when sample demographics turn out not to match the population demographics!–  As with stratification, still need to know the

distribution of the strata variable in the population!–  But don’t need to know the values for each

element in the sampling frame!•  See text for more detail !

2/1/13 22

What We Have Covered!

•  Defined stratified sampling!•  Discussed reasons for stratified sampling!•  Derived estimators for the population!

–  Mean!–  Total!–  Percentage!

•  Sample size calculation and allocation!–  Sampling proportional to strata size!

•  Discussed design effects!

2/1/13 23