TRANSCRIPT
Randomized controlled trial: a historical perspective
Luc Behaghel (PSE and Crest) and Philippe Zamora (Crest)
J-PAL Advanced course, Paris 2012
Randomized controlled trials: a long history in social sciences
experimental psychology (late 19th century)
education (early 20th century)
experimental sociology (E. Greenwood, F.S. Chapin – early 20th century)
▸ rural health education, social effects of public housing, recreation programs for “delinquent” boys...
⇒ The statistical framework used nowadays only came later (R.A. Fisher, Design of Experiments, 1935)
What about randomized clinical trials?
Large-scale randomized clinical trials are sometimes viewed as a model for social sciences
A norm since the 1962 Drug Amendments:
proof of “efficacy” required prior to marketing (“safety” required since 1938)
based on “adequate and well-controlled studies”
▸ rapidly interpreted as implying a control group and random assignment to control or treatment, i.e. an RCT
Yet the norm emerged only gradually (Marks, The Progress of Experiment, 1997)
and went (and still goes) through substantial debates
The Progress of Experiment
No influence of statistics on (clinical) medicine before 1950
⇒ how can we explain its influence today?
1. Successful alliance, in the 1950s, of statisticians and “therapeutic reformers” (academic physicians) anxious to discipline physicians (in their prescriptions) and pharmaceutical companies (in their claims)
Coincides with the diffusion of statistical concepts and methods in many areas (genetics, psychology, economics, physics)
2. Main arguments: objectivity and “common sense” against the investigator’s subjective bias: “The random method removes all responsibility from the observer.” (Bradford Hill, 1953)
3. Caveat: “an incomplete revolution, one in which most physicians were acquainted neither with the intellectual power that lay behind the procedures advocated by statisticians nor with the limitations of statistical methods.” (Marks, 1997, p. 138)
Dilemmas of authority
Latent resistance of practicing physicians and clinicians
▸ RCTs were not the first attempt by therapeutic reformers:
American Medical Association Council on Pharmacy and Chemistry (1906)
▸ a system of consultants to gather and assess evidence on new drugs
“Collective investigations” (late 1920s)
▸ organized collaboration of university clinics for the standardized evaluation of therapies on hundreds of patients (but no randomization)
⇒ The shift to RCTs can be viewed as an attempt to transfer authority “from institutions to methods”.
The University Group Diabetes Program Study
Large RCT in the late 1960s, strongly supported by the NIH, to settle a lasting debate on diabetes therapy and to develop the RCT methodology
Surprising results (the tested drug increases mortality)
▸ treatment discontinued, FDA informed
A 15-year-long controversy:
1. Statistical inference issues ▸ the study was validated by the Biometric Society
2. Relevance of the tested treatment (external validity) ▸ a standardized protocol goes against the practice of individual diagnosis and customized treatment
3. Who has authority? ▸ “When reasonable people disagree, where do the boundaries of unreasonable behavior begin?” (Marks, 1997, p. 233)
1 The early years
2 The golden age of evaluation
3 The credibility revolution
4 The creative years
5 The times of maturity
1. The early years: R.A. Fisher and the Econometric Society
Formal theory of RCTs due to R.A. Fisher (Statistical Methods for Research Workers, 1925).
Influences economists, in particular members of the Econometric Society (e.g. Hotelling, a former volunteer at the Rothamsted farm)
Yet Fisher was reluctant to apply statistics to the social sciences, due to their “non-experimental” nature
Early developments of econometrics steer away from experimentation:
Extreme view: theory cannot be proven false by statistical evidence (Keynes!)
A more constructive compromise (Haavelmo, 1944): statistics used to confirm theory in a probabilistic way
Heckman (2010): the fundamental insight of the first econometricians of the Cowles Commission is that no causal inference is possible without a theoretical model
▸ in this view, the influence of statisticians (e.g. Rubin) on econometrics in the 1990s amounts to forgetting this lesson
2. From the late 1960s to the early 1980s: a golden age of evaluation
From the mid-1960s, a sharp increase in randomized experiments
According to Boruch (1978), 245 randomized field experiments had been conducted in the U.S. to evaluate social policies up to 1978
Some of them were ambitious and very costly
They covered different kinds of policies (subsidized work, income maintenance, job search counseling)
2. From the late 1960s to the early 1980s: a golden age of evaluation
One of the first RCTs is the famous “Perry Preschool Program” (1961), whose results still feed current papers: the follow-up survey has tracked the 123 participants (treated and controls) up to 2000.
This large effort was prompted by the rule devoting 1% of every social program budget to evaluation
Two famous examples: the Perry Preschool Program and the RAND Health Insurance Experiment
Pioneer study: the Perry Preschool Program
123 children born between 1958 and 1962 in Michigan
Half of them (drawn at random) entered the Perry preschool program at age 3 or 4.
Education by skilled professionals in nurseries and kindergarten
The program also included help to parents, to improve their involvement.
Program duration: circa 30 weeks
Follow-up surveys (at ages 14, 15, 19, 27 and 40)
Pioneer study: the Perry Preschool Program
| Outcome | N | Treatment group | Control group | p-value |
| Cognitive skills test score at age 15 | 95 | 122.2 | 94.5 | < .001 |
| Access to university | 121 | 38% | 21% | .029 |
| Jailed or arrested at least once | 121 | 31% | 51% | .022 |
| On welfare | 120 | 18% | 32% | .044 |
| Employed at age 19 | 121 | 50% | 32% | .032 |
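As a check on how such p-values can be computed, here is a minimal two-proportion z-test in Python for the “access to university” row. The 58/63 split of the N = 121 sample is an assumption for illustration (the study enrolled 58 program and 65 no-program children, with some attrition):

```python
import numpy as np
from scipy import stats

# Assumed group sizes for the N = 121 sample (illustrative split)
n_t, n_c = 58, 63
p_t, p_c = 0.38, 0.21                 # access to university, from the table

# Pooled two-proportion z-test
p_pool = (p_t * n_t + p_c * n_c) / (n_t + n_c)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
z = (p_t - p_c) / se
p_value = 2 * stats.norm.sf(abs(z))   # two-sided
print(f"z = {z:.2f}, p = {p_value:.3f}")   # same order as the reported .029
```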
Pioneer study 2: The Rand Health Insurance Experiment
5,809 people were randomly assigned in 1974 to insurance plans with 0%, 25%, 50% and 95% cost sharing.
They were followed until 1982.
Main result: paying a share of health costs makes people give up some “superfluous” care, with little harm to their health
But some heterogeneity: this result does not seem to hold for poor people.
▸ strong influence on the development of cost sharing
▸ debate on the external validity of the results (see later on)
RCTs in the US today
Still used in the US to evaluate large and ambitious programs, and routinely used in education and public health
Moving To Opportunity (1994)
Job Corps (training program for youth) and replication studies
But at a lower pace. Two reasons:
Impacts are often disappointingly small: in the welfare state, the marginal effects of new policies are weak
Evaluation takes time: too long for the political agenda
3. The credibility revolution: overview
Starting in the 1990s, applied (micro)econometrics undergoes a “credibility revolution” (Angrist and Pischke, Journal of Economic Perspectives, 2010)
Selection bias taken (even) more seriously
▸ influential within-study comparisons (LaLonde, 1986)
Standard selection-correction procedures (like the heckit) questioned for their lack of robustness
Search for:
more credible sources of variation ▸ “design-based” studies
fewer parametric assumptions ▸ a treatment effect model allowing for heterogeneous effects, flexible estimation, local interpretation of estimates
The treatment effect model
(a.k.a. the “Rubin causal model”)
Counterfactual outcomes y(1), y(0)
⇒ (individual) treatment effect y(1) − y(0)
Observe only one realization of the counterfactual outcomes:
y = (1 − T) y(0) + T y(1)
Parameters of interest: average causal effects (ATE: E(y(1) − y(0)); ATT: E(y(1) − y(0) | T = 1)), but also other features of the distribution of treatment effects (e.g. the fraction losing)
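A minimal simulation of the model, with made-up numbers: it shows the observation rule and the two average effects, and that under random assignment the naive difference in means recovers them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Counterfactual outcomes y(0) and y(1); the individual effect is heterogeneous
y0 = rng.normal(0.0, 1.0, n)
effect = rng.normal(1.0, 0.5, n)       # y(1) - y(0) ~ N(1, 0.5^2), invented
y1 = y0 + effect

# Random assignment: T independent of (y(0), y(1))
T = rng.integers(0, 2, n)

# Only one counterfactual is ever observed
y = (1 - T) * y0 + T * y1

ate = effect.mean()                    # E[y(1) - y(0)], knowable only in simulation
att = effect[T == 1].mean()            # E[y(1) - y(0) | T = 1]
diff = y[T == 1].mean() - y[T == 0].mean()
frac_losing = (effect < 0).mean()      # another feature of the effect distribution
print(ate, att, diff, frac_losing)     # diff ≈ ATE ≈ ATT ≈ 1 under randomization
```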
Selection bias
E(y | T = 1) − E(y | T = 0)
  = E(y(1) | T = 1) − E(y(0) | T = 0)
  = E(y(1) − y(0) | T = 1) + [E(y(0) | T = 1) − E(y(0) | T = 0)]
E(y(0) | T = 1) − E(y(0) | T = 0): selection bias.
▸ Treated individuals do better not because of the treatment, but because they would have done better anyway without the treatment.
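A numeric check of the decomposition (a sketch with invented numbers; here treatment is taken up by units with high y(0), so the naive comparison overstates the effect):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + 1.0                          # constant treatment effect of 1, for clarity

# Self-selection: units with high y(0) are more likely to take the treatment
T = (y0 + rng.normal(0.0, 1.0, n) > 0).astype(int)
y = (1 - T) * y0 + T * y1

naive = y[T == 1].mean() - y[T == 0].mean()
att = (y1 - y0)[T == 1].mean()                    # = 1 by construction
bias = y0[T == 1].mean() - y0[T == 0].mean()      # selection bias term
print(naive, att + bias)               # identical: the decomposition is an identity
```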
Approaches to selection bias
Model selection (structural approach)
▸ Roy model: self-selection into treatment is informative about potential outcomes
Assume conditional independence (CIA):
y(1), y(0) ⊥ T | X
▸ selection after controlling for X is “as good as random”
▸ matching, regression (a minimal matching sketch follows below)
possibly controlling for lagged outcomes
▸ diff-in-diff, diff-in-diff matching
“Design-based” approach
Randomized experiments
Quasi-experiments
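A minimal sketch of matching under the CIA, assuming (purely for illustration) selection on a single observed covariate X; all numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# One confounder X drives both selection and outcomes, so CIA holds given X
X = rng.normal(0.0, 1.0, n)
T = (X + rng.normal(0.0, 1.0, n) > 0).astype(int)
y0 = X + rng.normal(0.0, 1.0, n)
y1 = y0 + 1.0                                      # true effect: 1
y = (1 - T) * y0 + T * y1

# 1-nearest-neighbor matching on X: for each treated unit, take the
# closest control and average the treated-minus-matched differences
Xc, yc = X[T == 0], y[T == 0]
order = np.argsort(Xc)
Xc, yc = Xc[order], yc[order]
idx = np.clip(np.searchsorted(Xc, X[T == 1]), 1, len(Xc) - 1)
nearer_left = X[T == 1] - Xc[idx - 1] < Xc[idx] - X[T == 1]
match = np.where(nearer_left, idx - 1, idx)
att_hat = (y[T == 1] - yc[match]).mean()

naive = y[T == 1].mean() - y[T == 0].mean()
print(f"naive = {naive:.2f}, matching ATT = {att_hat:.2f}")   # matching ≈ 1
```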
Can non-experimental evaluations match experimental results? LaLonde vs. Dehejia-Wahba
LaLonde (1986) had the idea of applying ex-post evaluation methods to the data of a randomized experiment (a training program within the larger National Supported Work program) [“within-study comparison”]
He uses a two-step Heckman method with different sets of exclusion variables
The non-experimental results are very dependent on the choice of exclusion variables, and far from the experimental benchmark
[Table: LaLonde (1986), non-experimental vs. experimental estimates]
Matching LaLonde
Dehejia and Wahba (2002) use propensity score matching methods (a propensity-score sketch follows below)
varying the set of matching variables and the comparison samples
providing guidelines to assess the quality of matching variables
▸ Results closer to the benchmark
▸ At the time, matching methods appeared to be a big advance.
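A minimal sketch of the propensity-score version: estimate P(T = 1 | X) with a logistic regression, then apply the same nearest-neighbor logic as above on the estimated score. The data are invented; this is not the Dehejia-Wahba specification:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 20_000

# Two confounders; selection and outcomes depend on both
X = rng.normal(0.0, 1.0, (n, 2))
T = (X.sum(axis=1) + rng.normal(0.0, 1.0, n) > 0).astype(int)
y = X.sum(axis=1) + 1.0 * T + rng.normal(0.0, 1.0, n)   # true effect: 1

# Step 1: estimate the propensity score P(T = 1 | X)
ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the closest score
ps_c, y_c = ps[T == 0], y[T == 0]
order = np.argsort(ps_c)
ps_c, y_c = ps_c[order], y_c[order]
idx = np.clip(np.searchsorted(ps_c, ps[T == 1]), 1, len(ps_c) - 1)
nearer_left = ps[T == 1] - ps_c[idx - 1] < ps_c[idx] - ps[T == 1]
match = np.where(nearer_left, idx - 1, idx)
att_hat = (y[T == 1] - y_c[match]).mean()
print(f"propensity-score matching ATT = {att_hat:.2f}")  # ≈ 1
```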
[Table: Dehejia and Wahba (2002) estimates]
Matching epilogue
The debate is now less lively
Smith and Todd (2005) identify three criteria that are not sufficient for matching estimators:
same data sources for control and treated groups
same local areas for treated and control groups
a rich set of matching variables
In fact these are not general results, but matching has lost its past glory...
... and most researchers prefer quasi-experiments.
Within-study comparisons: Arceneaux et al.
“Comparing Experimental and Matching Methods Using a Large-Scale Voter Mobilization Experiment”
▸ shows the difficulty of correcting selection biases with ex-post methods that account only for observable selection variables
The program: a randomized phone call reminding voters of the importance of voting (2002 midterm elections)
Outcome: electoral turnout (data from electoral registers)
Result: no effect
[Figures: Arceneaux et al., experimental vs. non-experimental estimates]
Trying to replicate experimental results
First non-experimental method: OLS on the whole sample
▸ whatever covariates are used, the (biased) estimate is positive and significant
Second non-experimental method: matching
▸ still a positive (biased) estimate, though a smaller one
Large influence on the U.S. political science research community (which had considered matching methods the new Grail)
4. The creative years: RCTs in development economics
Banerjee and Duflo (2010): “The experimental approach to development economics”
Since the mid-1990s, a rapid surge in experiments in developing countries
Starting with simple trials of inputs in the education/health production function (e.g. textbooks or flipcharts in education?)
Moving to increasingly “smart” designs
The new wave: key characteristics
1. Test conventional wisdom
2. Micro approach, field involvement of researchers
3. Smaller experiments
⇒ in a less saturated policy environment, interventions have large impacts that can be detected in small samples
4. Importing insights from theory / the lab (e.g. List, Levitt, Karlan)
▸ Useful for all this: a variety of randomization approaches
How to randomize? Individual vs. collective
Depends on the type of intervention and the type of question
Some interventions only make sense at the collective level (e.g. class-level interventions)
Ethical or political issues (avoid treating peers differently)
May want to combine the two, typically when spillovers are the issue (e.g. deworming in Miguel and Kremer, 2004)
Randomizing entire groups may come closer to what would happen at scale-up (equilibrium / crowding-out effects)
But randomizing entire groups reduces statistical precision – see next lecture
Four standard designs (a sketch of the assignment mechanics follows below):
Lottery
Phase-in
Encouragement
Rotation
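A minimal sketch of how such assignments can be drawn, in Python with made-up sample sizes; it also illustrates the individual vs. collective distinction from the previous slide (rotation, not coded here, alternates groups in and out of treatment across periods):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_200

# Lottery: randomize individuals directly, exactly half treated
lottery = rng.permutation(np.repeat([0, 1], n // 2))

# Collective randomization: assign whole clusters (villages, classes)
n_clusters = 40
cluster_of = rng.integers(0, n_clusters, n)                  # unit -> cluster id
cluster_assign = rng.permutation(np.repeat([0, 1], n_clusters // 2))
collective = cluster_assign[cluster_of]

# Phase-in: everyone is eventually treated; randomize the entry wave
wave = rng.permutation(np.tile([1, 2, 3], n // 3))           # n divisible by 3 here
year1_treatment = (wave == 1).astype(int)                    # waves 2-3: year-1 controls

# Encouragement: randomize an invitation/incentive, not the treatment itself;
# take-up is then voluntary (analyzed with the Wald estimator, sketched later)
encouraged = rng.integers(0, 2, n)
```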
5. The times of maturity (and debates!)
Are randomized experiments the “gold standard”?
Hot debate in development economics (“randomistas” – Banerjee, Duflo, Kremer, ... – and their critics – Deaton, Ravallion, Rodrik)
Discussion among econometricians (Angrist and Imbens vs. Heckman)
French debate too
Internal validity (1): Hawthorne and John Henry effects
Occur when an experimental group (control or treatment) reacts to being part of an experiment (and to being monitored)
A real concern (and one mostly specific to experiments)
The risk varies across treatments and contexts (e.g. probably higher when randomizing individuals)
Possible solutions
“Blind” experiments
Two-stage randomization (proposing vs. not proposing the experiment; assigning to treatment or not): creates an additional control group for a “placebo test” (see the sketch below)
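A minimal sketch of the two-stage design, with invented group sizes; the group labels are hypothetical names chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3_000

# Stage 1: randomize who is enrolled in (and monitored by) the experiment
n_out = n // 3
in_experiment = rng.permutation(np.repeat([0, 1], [n_out, n - n_out]))

# Stage 2: among enrolled units, randomize treatment vs. monitored control
assign = np.where(in_experiment == 0, "pure_control", "monitored_control").astype(object)
enrolled = np.flatnonzero(in_experiment)
treated = rng.choice(enrolled, size=len(enrolled) // 2, replace=False)
assign[treated] = "treatment"

# Placebo test: any outcome gap between monitored_control and pure_control
# reflects Hawthorne / John Henry effects rather than the treatment itself
```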
Internal validity (2): Spillover effects
Occur when the control group is affected by the existence of a treatment group nearby
⇒ the difference in outcomes is no longer the impact on the treatment group
A. Finkelstein’s famous recent paper argues that the effects computed from the RAND HIE were hugely biased by such effects (HIE estimate: +37% vs. Finkelstein’s estimate of +400%, using an ex-post evaluation of the introduction of Medicare)
The introduction of Medicare had impacts on both control and treated groups (induced technological progress and diffusion of behavior changes beyond the treatment group)
Possible solutions:
Ensure control and treatment units are sufficiently far apart
Use variations in distance between control and treatment units to identify (and net out) the spillover effects (Miguel and Kremer, Econometrica, 2004); a sketch follows below
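A minimal sketch of the distance-based idea, with invented geography and effect sizes: exposure to nearby treated units is random given locations, so regressing the outcome on own treatment and exposure separates the direct effect from the spillover:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000

pos = rng.uniform(0, 10, (n, 2))                      # unit locations on a 10x10 map
T = rng.integers(0, 2, n)                             # random assignment

# Exposure: share of treated units among neighbors within distance 1
d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
near = (d2 < 1.0) & (d2 > 0)                          # exclude self
exposure = (near * T).sum(axis=1) / np.maximum(near.sum(axis=1), 1)

# Outcome: own effect 1.0 plus a spillover of 0.5 x exposure (made-up numbers)
y = 1.0 * T + 0.5 * exposure + rng.normal(0, 1, n)

naive = y[T == 1].mean() - y[T == 0].mean()           # controls are exposed too
X = np.column_stack([np.ones(n), T, exposure])        # net out the spillover
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"naive = {naive:.2f}; own effect = {beta[1]:.2f}, spillover = {beta[2]:.2f}")
```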
External validity (1): environmental dependence
1. Heterogeneous effects: the same program may have different effects across contexts and target populations
2. “Implementer effect”: the “same” program may have different effects (and different contents) depending on the implementer
Responses to environmental dependence
1. Heterogeneous effect:
Individual studies:
1. check for heterogeneous effects by sub-group
2. replicate
Meta-analysis: cumulate knowledge from similar experiments (e.g. Kremer and Holla (2004): high price elasticity of demand for health and education, especially around a price of zero); a pooling sketch follows below
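A minimal fixed-effect meta-analysis sketch (inverse-variance weighting); the site-level estimates and standard errors are invented for illustration:

```python
import numpy as np
from scipy import stats

# Invented estimates and standard errors from K replications of a program
est = np.array([0.30, 0.12, 0.45, 0.20, -0.05])
se = np.array([0.10, 0.08, 0.20, 0.15, 0.12])

# Fixed-effect meta-analysis: inverse-variance weighted average
w = 1 / se**2
pooled = (w * est).sum() / w.sum()
pooled_se = np.sqrt(1 / w.sum())

# Cochran's Q: a simple test of effect homogeneity across sites
Q = (w * (est - pooled) ** 2).sum()
p_homog = stats.chi2.sf(Q, df=len(est) - 1)
print(f"pooled = {pooled:.3f} (se {pooled_se:.3f}), homogeneity p = {p_homog:.2f}")
```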
Responses to environmental dependence
2. Implementer effect:
Individual studies: need to emphasize the place of the program in the overall action plan of the implementing organization, and the place of the organization itself
Allocate the evaluation effort systematically: role of the funding agencies
⇒ a strength of the experimental approach: it can – in principle – be systematically applied across a variety of environments
External validity (2): compliance issues
Not all members of the treatment group end up benefiting from the program
⇒ only the impact on beneficiaries is identified; it may differ from the impact on the whole population (see the Wald-estimator sketch below)
1. May reflect the actual policy
2. Related to analyzing the heterogeneity of program impacts – but with sub-populations defined in terms of willingness to enter the program: requires revealing that willingness
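A minimal sketch of the standard rescaling: the intention-to-treat (ITT) effect divided by the difference in take-up rates gives the Wald / IV estimator of the effect on compliers (the LATE). Take-up rates and the true effect are made up:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

Z = rng.integers(0, 2, n)                 # randomized offer (assignment)
u = rng.random(n)
# Take-up: 60% of the offered group participates; 5% of controls get treated anyway
D = np.where(Z == 1, u < 0.60, u < 0.05).astype(int)

y = 1.0 * D + rng.normal(0, 1, n)         # true effect of the program: 1 (invented)

itt = y[Z == 1].mean() - y[Z == 0].mean()             # diluted by non-compliance
takeup = D[Z == 1].mean() - D[Z == 0].mean()          # compliance differential
late = itt / takeup                                   # Wald / IV estimator ≈ 1
print(f"ITT = {itt:.2f}, take-up diff = {takeup:.2f}, LATE = {late:.2f}")
```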
External validity (3): equilibrium effects
What if everybody were to benefit from the program?
Hard to experiment: no control group! But:
1. Experimental designs: vary the size of the treatment group across local markets (see the saturation sketch below)
2. Use partial-equilibrium “assumption-free” estimates as a building block in a broader, “assumption-dependent” model
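A minimal sketch of such a randomized-saturation design, with made-up market counts and saturation levels:

```python
import numpy as np

rng = np.random.default_rng(8)
n_markets, per_market = 60, 200

# Each local market draws a treatment saturation; individuals are then
# randomized within the market at that rate
saturation = rng.choice([0.0, 0.25, 0.50, 0.75], size=n_markets)
treat = np.concatenate([
    rng.permutation((np.arange(per_market) < s * per_market).astype(int))
    for s in saturation
])
market = np.repeat(np.arange(n_markets), per_market)
# Comparing treated (and untreated) outcomes across saturation levels traces
# out how effects change as the program approaches full coverage
```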
To sum up:
External validity issues need to be addressed for increased policy relevance
This suggests combining experiments with one another, and combining experiments with theory
⇒ can be embedded in a broader process: “creative experimentation”
A few references
Introduction to RCTs
Duflo, Glennerster and Kremer (2008), “Using Randomization in Development Economics Research: A Toolkit”, in Handbook of Development Economics, vol. 4, ch. 61.
The debate: where it stands
Journal of Economic Perspectives, 24 (2010): “The Credibility Revolution in Empirical Economics” (by Angrist and Pischke) and comments by macroeconomists.
Heckman (2010), “Building Bridges Between Structural and Program Evaluation Approaches to Evaluating Policy”, Journal of Economic Literature, 48(2).
Banerjee and Duflo (2010), “The Experimental Approach to Development Economics”, Annual Review of Economics, 1, 151-178.
A few references (2)
Benchmarking different approaches
LaLonde (1986), “Evaluating the Econometric Evaluations of Training Programs with Experimental Data”, American Economic Review, 76(4).
Heckman, Ichimura, Smith and Todd (1998), “Characterizing Selection Bias Using Experimental Data”, Econometrica, 66(5), 1017-1098.
Dehejia and Wahba (2002), “Propensity Score Matching Methods for Nonexperimental Causal Studies”, Review of Economics and Statistics, 84(1).