Designing an impact evaluation: Randomization, statistical power, and some more fun…

Page 1: Designing an impact evaluation: Randomization, statistical power, and some more fun…

Page 2

Designing a (simple) RCT in a couple of steps

• You want to evaluate the impact of something (a program, a technology, a piece of information, etc.) on an outcome.

Example: Evaluate the impact of free school meals on pupils' schooling outcomes.

• You decide to do it through a randomized controlled trial. – Why?

• The questions that follow:
  – Type of randomization: what is most appropriate?
  – Unit of randomization: what do we need to think about?
  – Sample size

> These are the things we will talk about now.

Page 3

I. Where to start

• You have a HYPOTHESIS

Example: Free meals => increased school attendance => increased amount of schooling => improved test scores. Or could it go the other way?

• To test your hypothesis, you want to estimate the impact of a variable T on an outcome Y for an individual i.

In a simple regression framework:

Y_i = α + β T_i + ε_i

• How could you do this?
  – Compare schools with free meals to schools with no free meals?
  – Compare test scores before the free meal program was implemented to test scores after?
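In code, estimating β in this framework reduces to a difference in means when T is a binary, randomly assigned treatment. A minimal sketch with simulated, purely hypothetical numbers:

import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: T is a randomly assigned binary treatment (free meals),
# Y a test score generated as Y_i = alpha + beta*T_i + eps_i with beta = 5.
T = rng.integers(0, 2, size=n)
Y = 50 + 5.0 * T + rng.normal(0, 10, size=n)

# With a single binary regressor, the OLS estimate of beta is exactly
# the difference in mean outcomes between treated and control units.
beta_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(f"beta_hat = {beta_hat:.2f}")  # should land near the true value of 5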

Page 4

• You decided to use a randomized design. Why?

– Randomization removes the selection bias.
  > Trick question: Does the sample need to be randomly sampled from the entire population?

– Randomization solves the causal inference issue by providing a counterfactual = comparison group.

While we can't observe Y_i^T and Y_i^C at the same time, we can measure the average treatment effect by computing the difference in mean outcomes between two a priori comparable groups.

We measure: ATE = E[Y^T] - E[Y^C]
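A small simulation makes the counterfactual logic concrete. The potential outcomes below are invented for illustration; the point is that random assignment lets the control group stand in for the unobserved Y^C of the treated:

import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Invented potential outcomes: Y_C without treatment, Y_T with (true ATE = 5).
Y_C = rng.normal(50, 10, size=n)
Y_T = Y_C + 5

# Random assignment: each unit reveals only one of its two potential outcomes.
T = rng.integers(0, 2, size=n)
Y_obs = np.where(T == 1, Y_T, Y_C)

# The treated/control mean difference recovers E[Y^T] - E[Y^C].
ate_hat = Y_obs[T == 1].mean() - Y_obs[T == 0].mean()
print(f"ATE estimate: {ate_hat:.2f}")  # close to the true ATE of 5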

II. Randomization basics

Page 5

• What to think of when deciding on your design?
  – Types of randomization / unit of randomization:
    • Block design
    • Phase-in
    • Encouragement design
    • Stratification?

The decision should come from (1) your hypothesis, (2) your partner’s implementation plans, (3) the type of intervention!
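As an illustration of one of these choices, here is a minimal sketch of stratified random assignment; the school counts and the "region" stratifier are hypothetical:

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical frame: 20 schools split across two regions used as strata.
region = np.repeat(["north", "south"], 10)
assignment = np.empty(region.size, dtype=object)

# Randomize within each stratum so both arms are balanced on region.
for r in np.unique(region):
    idx = rng.permutation(np.flatnonzero(region == r))
    half = idx.size // 2
    assignment[idx[:half]] = "treatment"
    assignment[idx[half:]] = "control"

for r in np.unique(region):
    n_treated = np.sum(assignment[region == r] == "treatment")
    print(f"{r}: {n_treated} of {np.sum(region == r)} schools treated")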

Example: What would you do?

• Next step: How many units? = SAMPLE SIZE. Intuition --> Why do we need many observations?

II. Randomization basics

Page 6

Remember, we're interested in Mean(T) - Mean(C).
We measure scores in 1 treatment school and 1 control school.
> Can I say anything?

Page 7

Now 50 schools:

Page 8

Now 500 schools:
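The plots for these slides are not reproduced in the transcript, but a quick simulation gives the same intuition: with more schools per arm, the difference in means Mean(T) - Mean(C) is far less noisy. All numbers below are illustrative:

import numpy as np

rng = np.random.default_rng(3)
sigma = 10  # assumed spread of school-level mean scores

# Replicate the experiment many times under a true effect of zero and watch
# how the noise in Mean(T) - Mean(C) shrinks as schools are added per arm.
for n_schools in (1, 50, 500):
    diffs = [
        rng.normal(0, sigma, n_schools).mean() - rng.normal(0, sigma, n_schools).mean()
        for _ in range(2000)
    ]
    print(f"{n_schools:3d} schools per arm: sd of the difference = {np.std(diffs):.2f}")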

Page 9

• But how to pick the optimal size? -> It all depends on the minimum effect size you'd want to be able to detect.
Note: Standardized effect sizes.

• POWER CALCULATIONS link the minimum effect size to the design.
• They depend on several factors:
  – The effect size you want
  – Your randomization choices
  – The baseline characteristics of your sample
  – The statistical power you want
  – The significance you want for your estimates

We'll look into these factors one by one, starting from the end…

III. Sample size

Page 10

• When trying to test a hypothesis, one actually tests the null hypothesis H0 against the alternative hypothesis Ha, and tries to reject the null.

H0: Effect size = 0
Ha: Effect size ≠ 0

• Two types of error are to be feared:

III. Power calculations (1) Hypothesis testing

TRUTH \ YOUR CONCLUSION   Effective (reject H0)           No effect (can't reject H0)
Effective                 POWER (κ)                       TYPE II ERROR
No effect                 TYPE I ERROR (SIGNIFICANCE α)   correct conclusion
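A Monte Carlo sketch can put numbers on the cells of this table: simulate many experiments with and without a true effect and count how often H0 is rejected. The sample size and effect size below are illustrative:

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(4)
n, sims, alpha = 100, 2000, 0.05  # illustrative sample size per arm

def rejection_rate(effect):
    # Share of simulated experiments in which H0 is rejected at level alpha.
    rejections = 0
    for _ in range(sims):
        control = rng.normal(0, 1, n)
        treated = rng.normal(effect, 1, n)
        if ttest_ind(treated, control).pvalue < alpha:
            rejections += 1
    return rejections / sims

print(f"Type I error rate (no true effect): {rejection_rate(0.0):.3f}")  # ~ alpha
print(f"Power (true effect of 0.4 sd):      {rejection_rate(0.4):.3f}")  # ~ 0.8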

Page 11

• SIGNIFICANCE = Probability that you'd conclude that T has an effect when in fact it doesn't (denoted α). It tells you how confident you can be in your answer.
  – Classical values: 1, 5, 10%
  – Hypothesis testing basically comes down to testing the equality of means between T and C using a t-test. For the effect to be significant, the t-statistic obtained,

t = β̂ / se(β̂),

must be greater than or equal to the critical value for the significance level wanted, e.g. t_α = 1.96 for α = 5%.

III. Power calculations (1) Significance
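A quick numerical check of this criterion, computing β̂, se(β̂) and the t-statistic by hand on simulated data (all numbers hypothetical):

import numpy as np

rng = np.random.default_rng(5)

# Simulated experiment with a true effect of 3 on a noisy outcome.
T = rng.integers(0, 2, size=500)
Y = 50 + 3.0 * T + rng.normal(0, 10, size=500)

y_t, y_c = Y[T == 1], Y[T == 0]
beta_hat = y_t.mean() - y_c.mean()
se = np.sqrt(y_t.var(ddof=1) / y_t.size + y_c.var(ddof=1) / y_c.size)
t_stat = beta_hat / se

print(f"t = {t_stat:.2f}; significant at 5%: {abs(t_stat) >= 1.96}")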

Page 12

• POWER = Probability that, if a true effect exists, you will detect it with a given sample size (denoted κ).
  – Classical values: 80, 90%

• To achieve a power κ, it must be that the true effect is large enough relative to the standard error of β̂, i.e. β ≥ (t_{1-κ} + t_α) · se(β̂). Or graphically…

• In short: To have a high chance of detecting an effect, one needs enough power, which depends on the standard error of the estimate of β.

III. Power calculations (2) Power

Page 13

• Intuition: the higher the standard error, the less precise the estimate, the trickier it is to identify an effect, and the higher the need for power!
  – Demonstration: How does the spread of a variable affect the precision of a mean comparison test?

• We saw that power depends on the SE of the estimate of β. But what does this standard error depend on? (A short sketch follows the list.)
  – The standard deviation of the error (how heterogeneous the sample is)
  – The proportion of the population treated (randomization choices)
  – The sample size
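A small sketch of these three dependencies, assuming the standard two-group variance formula se(β̂) = √(σ² / (P(1-P)N)):

import numpy as np

def se_beta(sigma, N, P):
    # se(beta_hat) = sqrt( sigma^2 / (P * (1 - P) * N) )
    return np.sqrt(sigma**2 / (P * (1 - P) * N))

print(se_beta(sigma=10, N=500, P=0.5))   # baseline design
print(se_beta(sigma=20, N=500, P=0.5))   # doubling sigma doubles the se
print(se_beta(sigma=10, N=2000, P=0.5))  # quadrupling N halves the se
print(se_beta(sigma=10, N=500, P=0.2))   # an unbalanced split inflates the se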

III. Power calculations (3) Standard error of the estimate

Page 14
Page 15
Page 16
Page 17

• We now have all the ingredients of the equation. The minimum detectable effect (MDE) is:

MDE = (t_{1-κ} + t_α) · √( σ² / (P(1-P)·N) )

where σ² is the variance of the error (heterogeneity), P the proportion treated, and N the sample size.

• As you can see:
  – The higher the heterogeneity of the sample, the higher the MDE,
  – The lower N, the higher the MDE,
  – The higher the power you require, the higher the MDE.

• Power calculations, in practice, correspond to playing with all these ingredients to find the optimal design to satisfy your MDE (see the sketch below).
  – Optimal sample size?
  – Optimal proportion treated?
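A sketch of such a calculation, assuming the MDE formula above with normal critical values; the design parameters are illustrative:

from scipy.stats import norm

def mde(sigma, N, P, alpha=0.05, power=0.80):
    # MDE = (t_power + t_alpha) * sqrt( sigma^2 / (P * (1 - P) * N) ),
    # with two-sided critical value t_alpha (1.96 at the 5% level).
    z = norm.ppf(power) + norm.ppf(1 - alpha / 2)
    return z * (sigma**2 / (P * (1 - P) * N)) ** 0.5

print(mde(sigma=10, N=500, P=0.5))              # baseline design
print(mde(sigma=10, N=2000, P=0.5))             # larger N -> smaller MDE
print(mde(sigma=10, N=500, P=0.5, power=0.90))  # more power -> larger MDE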

III. Power calculations (4) Calculations

Page 18

• Several treatments?
  – What happens when there is more than one treatment?
  – It all depends on what you want to compare!

• Stratification?
  – Reduces the standard deviation.

• Clustered (block) design?
  – When using clusters, the outcomes of observations within a cluster can be correlated. What does this mean?
  – The intra-cluster correlation ρ, the portion of the total variance explained by between-cluster variance, implies an increase in overall variance.
  – Impact on MDE?
  – In short: the higher ρ, the higher the MDE, and the increase can be large (see the sketch below).
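A sketch using the standard design-effect approximation, where clustering inflates the variance by 1 + (m - 1)ρ for m observations per cluster; the cluster size and ρ values are illustrative:

import numpy as np

def mde_inflation(m, rho):
    # Design effect: clustering multiplies the variance by 1 + (m - 1) * rho,
    # so the MDE grows by the square root of that factor.
    return np.sqrt(1 + (m - 1) * rho)

for rho in (0.0, 0.05, 0.20):
    print(f"rho = {rho:.2f}: MDE multiplied by {mde_inflation(m=40, rho=rho):.2f}")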

III. Power calculations (5) More complicated frameworks

Page 19

• When thinking of designing an experiment:
  1. What is your hypothesis?
  2. How many treatment groups?
  3. What unit of randomization?
  4. What is the minimum effect size of interest?
  5. What is the optimal sample size considering power/budget?

=> Power calculations!

Summary

Page 20