THE WORLD BANK

Sample Design Issues in EGRA

Session 1.4

Outline

1. General considerations/background

2. Specifics based on experiences thus far

3. How use affects size and design

General considerations & background

Almost everything we say will depend on the use, e.g.:
− Broad policy awareness
− Project impact evaluation
− Project monitoring/tracking

All require different approaches and sample sizes

Why sample?

− Much more efficient than full count
− Achieve reasonable accuracy at fraction of cost
− “Full count” is an illusion in any case?

Example: for one grade, how many students?

Assumptions: mean 30, SD 30, DEFT 2.5; field labor covers 20 children/day at $100/day, if paid labor is used

                                      Cluster sample    Census
Students assessed                     3,500             600,000
Accuracy (95% conf. interval, wpm)    27.5 to 32.5      29.9 to 30.1
Cost (field labor only!)              $17,500           $3,000,000

Considering that the grade-to-grade difference is 14, a confidence interval 5 wide would be great, at about ½ of 1% of the cost of a census
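
The figures in this example can be checked with a few lines of arithmetic. The sketch below is an illustration, not the authors' own calculation: the function names are made up for this example, and treating the census as 600,000 observations with no clustering penalty (DEFT = 1) is an assumption that happens to reproduce the 29.9 to 30.1 range.

```python
import math

def ci_half_width(sd, n, deft=1.0, z=1.96):
    """Half-width of a confidence interval for a mean, inflated by DEFT."""
    return z * deft * sd / math.sqrt(n)

def field_cost(n, children_per_day=20, cost_per_day=100):
    """Field labor cost in dollars at the assumed daily rate."""
    return n / children_per_day * cost_per_day

MEAN, SD = 30, 30

# Cluster sample: DEFT of 2.5, as stated in the assumptions.
cluster_n = 3_500
cluster_hw = ci_half_width(SD, cluster_n, deft=2.5)

# Census: DEFT = 1 is an assumption made for this sketch.
census_n = 600_000
census_hw = ci_half_width(SD, census_n, deft=1.0)

for label, n, hw in [("cluster sample", cluster_n, cluster_hw),
                     ("census", census_n, census_hw)]:
    print(f"{label}: n={n:,}  CI=({MEAN - hw:.1f}, {MEAN + hw:.1f})  "
          f"cost=${field_cost(n):,.0f}")
```

The cost ratio, $17,500 against $3,000,000, is a little under 0.6%, which is where the "½ of 1%" comes from.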

Look at size and types of sample

Size – two common myths

Common in many countries: “a 2% sample”
− This is a largely meaningless, yet very widespread, notion
− A sample of 2 out of a population of 100 is useless
− A sample of 20,000 out of a population of 10,000,000 is way bigger than needed for any reasonable purpose
− Percentages are not a good guide to sample size

Population: mostly irrelevant
− E.g.: population of 10,000 might need sample of 370
− But population of 10,000,000 needs only 384
− A sip of soup will tell us how salty the soup is no matter how big the pot is, as long as it is well stirred
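
The 370 and 384 figures are not derived on the slide. They match the common textbook case of estimating a proportion to within ±5 percentage points at 95% confidence, assuming the worst-case proportion of 0.5 and applying a finite population correction. That reading is an assumption, and the function name below is hypothetical. A minimal sketch:

```python
from statistics import NormalDist

def sample_size_for_proportion(population, margin=0.05, confidence=0.95, p=0.5):
    """Simple-random-sample size for a proportion, with finite population
    correction. The defaults are assumptions, not stated on the slide."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z**2 * p * (1 - p) / margin**2        # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)       # finite population correction
    return round(n)

for population in (10_000, 10_000_000):
    print(f"{population:>12,}: {sample_size_for_proportion(population)}")
```

A thousandfold increase in population adds only about 14 to the required sample, which is the point of the soup analogy.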

Sample Size – the 3 determinants that really matter

How confident we want to be: 90%, 95% or 99%?
− The more confidence we want, the bigger the sample we need

How big a margin of error we are willing to tolerate?
− (How wide a confidence interval)
− The less tolerant we are, the bigger the sample we need
− Margin of error in terms of, say, words per minute: 7? 14?

How variable is the thing we are trying to measure?
− The more variable, the bigger the sample we need
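
These three determinants combine in the usual textbook formula for a simple random sample, given here as an illustration rather than taken from the slides. The symbols are assumptions introduced for this note: z is the critical value for the chosen confidence level (about 1.645, 1.96 and 2.576 for 90%, 95% and 99%), σ is the standard deviation of the measure, and E is the margin of error, i.e. half the width of the confidence interval.

```latex
n \approx \left( \frac{z \, \sigma}{E} \right)^{2}
```

The sample grows with the square of z and σ and shrinks with the square of E, so halving the tolerated margin of error quadruples the sample.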

Specifics based on experience thus far

Sample Size: Considerations

We propose to use oral reading fluency in connected text, measured in correct words per minute (cwpm), as the key “marker” for driving sample size calculations

From our research in 7 countries thus far we can tell that:
− Average cwpm difference between grades: 14
− Average standard deviation: 29

And we figure 95% confidence is good enough (EPI studies in health sector use 90%)

Likely sample sizes needed

Assumes an ICC of 0.45 and 12 children per school (per grade)

Sample size (children) by confidence level and error tolerance (width of confidence interval, in cwpm)

Confidence    3.5       7         14
90%           4,563     1,141     285
95%           6,542     1,635     409
99%           11,571    2,893     723
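
A table of this shape can be approximated from the stated inputs. The sketch below assumes a standard deviation of 29 cwpm, the Kish design effect 1 + (m − 1) × ICC, and normal critical values. The slide does not say which critical values it used, and with normal quantiles the results come out a few percent below the figures above, so treat this as an illustration of the method rather than a reproduction of the authors' calculation. All function names are hypothetical.

```python
import math
from statistics import NormalDist

SD = 29            # average standard deviation in cwpm
ICC = 0.45         # intraclass correlation assumed for the table
PER_SCHOOL = 12    # children per school (per grade)

def design_effect(icc, cluster_size):
    """Kish approximation of the design effect for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

def clustered_sample_size(confidence, ci_width, sd=SD,
                          icc=ICC, cluster_size=PER_SCHOOL):
    """Children needed so the CI for mean cwpm has the given full width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # normal quantile (assumption)
    half_width = ci_width / 2
    srs_n = (z * sd / half_width) ** 2               # simple random sample size
    return math.ceil(srs_n * design_effect(icc, cluster_size))

for confidence in (0.90, 0.95, 0.99):
    row = [clustered_sample_size(confidence, w) for w in (3.5, 7, 14)]
    print(f"{confidence:.0%}", row)
```

The pattern across columns is the one in the table: each doubling of the tolerated interval width cuts the required sample to roughly a quarter.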

An important design issue

All these sample sizes assume a clustered approach

A simple random sample where you pick children completely at random requires smaller samples

The proposed (“clustered”) approach means selecting schools at random and then children at random

Two advantages (at least):
− More economical, because of transport costs
− Does not require a universal list of children, which we do not have

An important design issue (cont’d)

But children within schools vary less than children in general, so there is a penalty

Generally, this means we need sample sizes about 6 times bigger than in a simple random sample

Pick schools, then pick, say, 12 students per school per grade

But we still save money, because we need to visit fewer schools
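
The "about 6 times" penalty is consistent with the Kish design effect, 1 + (m − 1) × ICC, evaluated at 12 children per school and the ICC of 0.45 used elsewhere in these slides. The sketch below shows how the penalty and the number of schools to visit move with the number of children taken per school. The baseline of 275 children for a simple random sample and the function name are hypothetical, chosen only for illustration.

```python
import math

ICC = 0.45            # intraclass correlation used elsewhere in these slides
SRS_CHILDREN = 275    # hypothetical simple-random-sample requirement

def clustered_plan(children_per_school, icc=ICC, srs_children=SRS_CHILDREN):
    """Return (design effect, children needed, schools to visit)."""
    deff = 1 + (children_per_school - 1) * icc
    children = math.ceil(srs_children * deff)
    schools = math.ceil(children / children_per_school)
    return deff, children, schools

for m in (6, 12, 20):
    deff, children, schools = clustered_plan(m)
    print(f"{m:>2} per school: design effect {deff:.2f}, "
          f"{children} children, {schools} schools")
```

Taking more children per school reduces the number of schools to visit, but the total number of children assessed rises because each extra child in the same school adds less new information.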

How does use affect sampling?

Too complex for this session, but some key points/examples:

− Country-wide “awareness-raising” study requires one nation-wide (clustered) sample, maybe even with lower confidence levels
− Tracking progress over time probably requires more accuracy, thus bigger samples, and probably requires re-sampling to prevent gaming the indicators (aside from other issues related to instrument design)
− Teacher use for monitoring and loose parental accountability might require a “census” of all students in the classroom, not a sample
− Project use for monitoring school performance (or teacher performance) would require more than 12 students per school (requires 20) and an adequate/inadequate classification

Lot quality assurance sampling (LQAS)

Can borrow it from industry, health sector

Used if we don’t care about the average (e.g., average cwpm) but only whether given schools are “compliant” with a minimum cwpm or not

Allows one to judge a school based on a sample as small as 20 students

Note that otherwise one cannot use samples as small as 20 students to judge a specific school in terms of an average

Again, a technical subject – we don’t have time, but do remember that one can monitor with samples as small as 20, if all we care about is “compliance” vs “non-compliance”
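
To give a feel for why 20 students can be enough for a compliant/non-compliant call, here is a minimal sketch of an LQAS-style decision rule. Everything specific in it is hypothetical and not from the slides: the rule "classify the school as compliant if at least 13 of 20 sampled students reach the minimum cwpm", and the two example schools in which 80% and 50% of students truly reach it. It also assumes students are sampled at random, so the count follows a binomial distribution.

```python
from math import comb

def prob_classified_compliant(n, min_passing, true_share):
    """Probability that at least `min_passing` of n sampled students meet
    the benchmark when a share `true_share` of all students meets it."""
    return sum(comb(n, k) * true_share**k * (1 - true_share)**(n - k)
               for k in range(min_passing, n + 1))

N_STUDENTS = 20       # sample size per school, as on the slide
MIN_PASSING = 13      # hypothetical decision rule
GOOD_SCHOOL = 0.80    # hypothetical: 80% of students reach the minimum cwpm
WEAK_SCHOOL = 0.50    # hypothetical: only 50% do

p_good = prob_classified_compliant(N_STUDENTS, MIN_PASSING, GOOD_SCHOOL)
p_weak = prob_classified_compliant(N_STUDENTS, MIN_PASSING, WEAK_SCHOOL)

print(f"good school classified compliant: {p_good:.2f}")
print(f"weak school classified compliant: {p_weak:.2f}")
```

Under these assumptions the two kinds of school are separated with fairly low error rates, even though 20 students would give a very wide confidence interval for the school's average cwpm.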
