THE WORLD BANK
Sample Design Issues in EGRA
Session 1.4
1
Outline
1. General considerations/background
2. Specifics based on experiences thus far
3. How use affects size and design
2
General considerations & background
Almost everything we say will depend on the use, e.g.:
Broad policy awareness
Project impact evaluation
Project monitoring/tracking
All require different approaches and sample sizes
3
General considerations & background
Why sample?
Much more efficient than full count
Achieve reasonable accuracy at fraction of cost
“Full count” is an illusion in any case?
Example: for one grade, how many students?
Assumptions: mean 30, SD 30, DEFT 2.5; field labor covers 20 children/day at $100/day, if paid labor is used

                                          Cluster sample    Census
Students                                  3,500             600,000
Accuracy: true value lies in range of     27.5 to 32.5      29.9 to 30.1
(95% conf interval on words per minute)
Cost (field labor only!)                  $17,500           $3,000,000

Considering that the grade-to-grade difference is 14, a confidence interval 5 wide would be great, at about ½ of 1% of the cost of a census
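The cluster-sample figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes the usual normal-approximation interval, mean ± 1.96 × DEFT × SD / √n, which the slide does not spell out.

```python
import math

def field_cost(n_children, children_per_day=20, cost_per_day=100):
    """Field labor cost in dollars, using the slide's 20 children/day at $100/day."""
    return n_children / children_per_day * cost_per_day

def ci_half_width(n_children, sd=30.0, deft=2.5, z=1.96):
    """Half-width of a 95% interval for the mean.

    Assumption: normal approximation, with the simple-random-sample
    standard error inflated by DEFT to account for clustering.
    """
    return z * deft * sd / math.sqrt(n_children)

sample_cost = field_cost(3_500)
census_cost = field_cost(600_000)
half_width = ci_half_width(3_500)

print(f"cluster sample cost: ${sample_cost:,.0f}")
print(f"census cost:         ${census_cost:,.0f}")
print(f"cost ratio:          {sample_cost / census_cost:.2%}")
print(f"95% interval:        30 +/- {half_width:.1f} words per minute")
```

Under these assumptions the sample of 3,500 pins the mean down to roughly 2.5 words per minute either way, for well under 1% of the census field cost.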
4
General considerations & background
Look at size and types of sample
Size – two common myths
Common in many countries: “a 2% sample”
− This is a largely meaningless, yet very widespread, notion
− A sample of 2 out of a population of 100 is useless
− A sample of 20,000 out of a population of 10,000,000 is way bigger than needed for any reasonable purpose
− Percentages are not a good guide to sample size
Population: mostly irrelevant
− E.g.: population of 10,000 might need sample of 370
− But population of 10,000,000 needs only 384
− A sip of soup will tell us how salty the soup is no matter how big the pot is, as long as well stirred
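The 370 and 384 above match the textbook sample-size formula for a proportion with a finite population correction. The sketch below is a guess at how such figures arise, not a statement of how the slide's numbers were produced: the 95% confidence level, the ±5 percentage point margin, and the proportion of 0.5 are all assumptions, and the function name is hypothetical.

```python
def sample_size_proportion(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion under simple random sampling.

    Assumptions (not stated on the slide): 95% confidence (z = 1.96),
    a margin of +/- 5 percentage points, and the worst case p = 0.5.
    """
    # Size needed for an effectively infinite population.
    n0 = z**2 * p * (1 - p) / margin**2
    # Finite population correction: shrinks n0 only when the population is small.
    return n0 / (1 + (n0 - 1) / population)

for population in (10_000, 10_000_000):
    n = sample_size_proportion(population)
    print(f"population {population:>10,}: sample of about {round(n)}")
```

Multiplying the population by a thousand moves the required sample by only a handful of children, which is the point of the soup analogy.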
5
General considerations & background
Sample Size – the 3 determinants that really matter
How confident we want to be: 90%, 95% or 99%?
− The more confidence we want, the bigger the sample we need
How big a margin of error we are willing to tolerate?
− (How wide a confidence interval)
− The less tolerant we are, the bigger the sample we need
− Margin of error in terms of, say, words per minute: 7? 14?
How variable is the thing we are trying to measure?
− The more variable, the bigger the sample we need
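How the three determinants interact can be sketched with the standard simple-random-sample formula, n = (2 × z × SD / width)², where width is the full width of the confidence interval. This is a simplified illustration that ignores clustering; the function name and the SD of 30 are assumptions chosen for the example.

```python
# Standard normal critical values for two-sided confidence levels.
Z = {90: 1.645, 95: 1.960, 99: 2.576}

def srs_sample_size(confidence, sd, ci_width):
    """Children needed under simple random sampling (no clustering penalty).

    ci_width is the full width of the confidence interval, so the
    half-width (margin on either side of the mean) is ci_width / 2.
    """
    half_width = ci_width / 2
    return (Z[confidence] * sd / half_width) ** 2

# 1. More confidence -> bigger sample.
for level in (90, 95, 99):
    print(f"{level}% confidence, SD 30, width 7: {srs_sample_size(level, 30, 7):.0f}")

# 2. Less tolerance for error -> bigger sample (halving the width quadruples n).
for width in (14, 7, 3.5):
    print(f"95% confidence, SD 30, width {width}: {srs_sample_size(95, 30, width):.0f}")

# 3. More variability -> bigger sample (doubling the SD quadruples n).
for sd in (15, 30, 60):
    print(f"95% confidence, SD {sd}, width 7: {srs_sample_size(95, sd, 7):.0f}")
```

Because the sample size grows with the square of both the critical value and the SD, and with the inverse square of the interval width, tightening the interval is usually the most expensive choice.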
6
Outline
1. General considerations/background
2. Specifics based on experiences thus far
3. How use affects size and design
7
Specifics based on experience thus far
Sample Size: Considerations
We propose to use Oral Reading Fluency in connected text, measured in correct words per minute (cwpm), as the key “marker” for driving sample size calculations
From our research in 7 countries thus far we can tell that:
Average cwpm difference between grades: 14
Average standard deviation: 29
And we figure 95% confidence is good enough (EPI studies in health sector use 90%)
8
Likely sample sizes needed
Assumes an ICC of 0.45 and 12 children per school (per grade)
Confidence    Error tolerance (width of confidence interval, in cwpm)
                 3.5         7        14
90%            4,563     1,141       285
95%            6,542     1,635       409
99%           11,571     2,893       723
9
An important design issue
All these sample sizes assume a clustered approach
A simple random sample where you pick children completely at random requires smaller samples
The proposed (“clustered”) approach means selecting schools at random and then children at random
Two advantages (at least):
More economical, because of transport costs
We have no universal lists of children
10
An important design issue (cont’d)
But children within schools vary less than children in general, so there is a penalty
Generally, this means we need sample sizes about 6 times bigger than in a simple random sample
Pick schools, then pick, say, 12 students per school per grade
But we still save money, because we need to visit fewer schools
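The "about 6 times" penalty is consistent with the standard design-effect approximation for equal-sized clusters, DEFF = 1 + (m − 1) × ICC, applied to the ICC of 0.45 and 12 children per school assumed earlier. The slides do not name this formula, so treat the sketch below as one plausible reading; the function name is hypothetical, and the 1,635 is the table entry for 95% confidence and an interval width of 7.

```python
import math

def design_effect(icc, cluster_size):
    """Approximate design effect for clusters of equal size (an assumed formula)."""
    return 1 + (cluster_size - 1) * icc

children_per_school = 12
deff = design_effect(icc=0.45, cluster_size=children_per_school)

clustered_n = 1_635                      # from the table: 95% confidence, width 7
srs_equivalent = clustered_n / deff      # children a simple random sample would need
schools_to_visit = math.ceil(clustered_n / children_per_school)

print(f"design effect:             {deff:.2f}")
print(f"clustered sample:          {clustered_n} children in {schools_to_visit} schools")
print(f"simple random equivalent:  about {round(srs_equivalent)} children")
```

A simple random sample would need fewer children, but they could be scattered across nearly as many schools as there are children, whereas the clustered design concentrates the fieldwork in far fewer school visits.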
11
Outline
1. General considerations/background
2. Specifics based on experiences thus far
3. How use affects size and design
12
How does use affect sampling?
Too complex for this session, but some key points/examples:
Country-wide “awareness-raising” study requires one nation-wide (clustered) sample
− Maybe even with lower confidence levels
Tracking progress over time probably requires more accuracy, thus bigger samples
− And probably requires re-sampling to prevent gaming the indicators (aside from other issues related to instrument design)
Teacher use for monitoring and loose parental accountability might require “census” of all students in classroom, not sample
Project use for monitoring school performance (or teacher performance) would require more than 12 students per school (requires 20) and adequate/inadequate classification
13
Lot quality assurance sampling (LQAS)
Can borrow it from industry, health sector
Used if we don’t care about the average (e.g., average cwpm) but only whether given schools are “compliant” with a minimum cwpm or not
Allows one to judge a school based on a sample as small as 20 students
Note that otherwise one cannot use samples as small as 20 students to judge a specific school in terms of an average
Again, a technical subject – we don’t have time, but do remember that one can monitor with samples as small as 20, if all we care about is “compliance” vs “non-compliance”
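To make the compliance idea concrete, the sketch below evaluates one possible LQAS decision rule with the binomial distribution. Every number in it except the sample of 20 is a hypothetical chosen for illustration: a school counts as "good" if 80% of its students reach the minimum cwpm, "bad" if only 50% do, and the school is accepted when at least 14 of the 20 sampled students pass. None of these thresholds come from the slides.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def lqas_risks(n, d, p_good, p_bad):
    """Misclassification risks of the rule 'accept if at least d of n students pass'.

    Returns (risk of rejecting a school performing at p_good,
             risk of accepting a school performing at p_bad).
    """
    reject_good = binom_cdf(d - 1, n, p_good)      # fewer than d passes despite p_good
    accept_bad = 1 - binom_cdf(d - 1, n, p_bad)    # d or more passes despite p_bad
    return reject_good, accept_bad

# Hypothetical thresholds and decision rule; only n = 20 comes from the slide.
reject_good, accept_bad = lqas_risks(n=20, d=14, p_good=0.80, p_bad=0.50)
print(f"risk of failing a good school: {reject_good:.1%}")
print(f"risk of passing a bad school:  {accept_bad:.1%}")
```

With these illustrative settings both risks stay below 10%, which is why a yes/no classification can work with 20 students even though an average estimated from 20 students would be far too imprecise.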
14