POLS 7170X Master’s Seminar, Program/Policy Evaluation
Class 6-7, Brooklyn College-CUNY, Shang E. Ha
Quasi-Experimental Impact Assessment
A randomized field experiment is the strongest research design for assessing program impact
When a randomized design is not feasible, there are alternative research designs that an evaluator can use
Even when well crafted and implemented, these alternative designs may still yield biased estimates of program effects; such biases systematically exaggerate or diminish program effects, and the direction the bias may take cannot usually be known in advance
Bias in Estimation of Program Effects
A program effect: the difference between the outcome observed for targets exposed to the program and the outcome that would have occurred for those same targets, all other things being equal, had they not been exposed to the program
Bias comes into the picture when either the measurement of the outcome with program exposure or the estimate of what the outcome would have been without program exposure is higher or lower than the corresponding “true” value
Bias: An Example
A reading program for young children that emphasizes vocabulary development
We have an appropriate vocabulary test for measuring outcome
We use this test to measure the children’s vocabulary before and after the program
We conduct a simple pre-post test [exhibit 9-A]
But the vocabulary of young children tends to increase over time anyway, so the pre-post gain overstates the program effect!
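The logic of this example can be sketched in a few lines. The scores and the amount of natural vocabulary growth below are hypothetical values chosen for illustration, not figures from exhibit 9-A, and the function name is an assumption of this sketch.

```python
def pre_post_bias(pre_mean, post_mean, natural_growth):
    """Compare a naive pre-post estimate with the effect net of natural growth.

    natural_growth is the change that would have occurred without the
    program. It is normally unobserved; it is assumed known here only to
    make the source of bias visible.
    """
    naive_estimate = post_mean - pre_mean          # what a simple pre-post design reports
    counterfactual = pre_mean + natural_growth     # outcome without program exposure
    program_effect = post_mean - counterfactual    # effect under the stated assumption
    bias = naive_estimate - program_effect         # equals natural_growth
    return naive_estimate, program_effect, bias


# Hypothetical vocabulary test means (not from the exhibit).
naive, effect, bias = pre_post_bias(pre_mean=40, post_mean=55, natural_growth=10)
print(naive, effect, bias)
```

With these made-up numbers the naive gain is 15 points, of which only 5 are attributable to the program; the remaining 10 are bias from ordinary vocabulary growth.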
Selection Bias
A group comparison design for which the groups have not been formed through randomization is known as a nonequivalent comparison design
When the treatment group and the control group are not equivalent, the difference in outcome that would have arisen between them even without the program produces a form of bias in the estimate of program effects (selection bias)
Selection Bias
Example: a program in which individuals volunteer to participate, with those who do not volunteer used as the control group
Because we are unlikely to know all the relevant differences between volunteers and nonvolunteers, we have limited ability to determine the nature and extent of the bias
Attrition
Even in the case of well-executed randomized field experiments, bias can occur
Attrition: targets drop out of the intervention or control group and cannot be reached, or targets refuse to cooperate in outcome measurement
Cf. failure to treat
Other Sources of Bias
Secular trends: relatively long-term trends in a community, region, or country
Example: in a period when a community’s birth rate is declining, a program to reduce fertility may appear effective because of bias stemming from that downward trend
Interfering events: short-term events
Example: a natural disaster may make it appear that a program to increase community cooperation has been effective, when in reality it is the crisis situation that has brought community members together
Maturation: natural maturational and developmental processes can produce considerable change independently of the program
Example: a program to improve preventive health practices among adults may seem ineffective because health generally declines with age
Matching
The intervention group is typically specified first and the evaluator then constructs a control group by selecting targets unexposed to the intervention that match those in the intervention group on selected characteristics
To the extent that the matching falls short of equating the groups on characteristics that will influence the outcome, selection bias will be introduced into the resulting program effect estimate
Matching Procedures
Individual matching: drawing a “partner” for each target who receives the intervention from the pool of potential targets unexposed to the program
Relevant matching variables: age, gender, father’s occupation, hours of work, etc.
Aggregate matching: individuals are not matched case by case, but the overall distributions in the intervention and control groups on each matching variable are made comparable
Problems of Individual Matching
Individual matching is usually preferable to aggregate matching
But individual matching is more time-consuming and difficult to execute for a large number of matching variables
Matching by individuals can sometimes result in a drastic loss of cases
If matching persons cannot be found for some individuals in the intervention group, those unmatched individuals have to be discarded as data sources
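A minimal sketch of individual matching shows where the loss of cases comes from. It assumes exact matching on categorical variables, one partner per person, and no reuse of partners; the records, field names, and function name are all hypothetical, and real evaluations often use more flexible matching rules.

```python
def match_individuals(intervention_group, pool, matching_variables):
    """Exact one-to-one matching without replacement.

    Returns (pairs, unmatched): pairs of (participant, partner) records and
    the participants for whom no partner could be found.
    """
    available = list(pool)  # copy so the caller's pool is not modified
    pairs, unmatched = [], []
    for person in intervention_group:
        profile = tuple(person[v] for v in matching_variables)
        # Find the first unused candidate with an identical profile.
        index = next(
            (i for i, candidate in enumerate(available)
             if tuple(candidate[v] for v in matching_variables) == profile),
            None,
        )
        if index is None:
            unmatched.append(person)      # this case is lost to the analysis
        else:
            pairs.append((person, available.pop(index)))
    return pairs, unmatched


# Hypothetical records for illustration only.
participants = [
    {"id": 1, "age_group": "20s", "gender": "F"},
    {"id": 2, "age_group": "30s", "gender": "M"},
    {"id": 3, "age_group": "50s", "gender": "F"},
]
unexposed_pool = [
    {"id": 101, "age_group": "30s", "gender": "M"},
    {"id": 102, "age_group": "20s", "gender": "F"},
    {"id": 103, "age_group": "20s", "gender": "M"},
]

pairs, unmatched = match_individuals(participants, unexposed_pool, ["age_group", "gender"])
print([(a["id"], b["id"]) for a, b in pairs])
print([p["id"] for p in unmatched])
```

Each additional matching variable makes an identical profile harder to find, which is why the number of discarded cases tends to grow with the number of variables.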
Statistical Controls
The functional equivalent of matching
Any program effect estimate based on a simple comparison of the outcomes for the intervention and control groups must be presumed to include selection bias
If the relevant differences between the groups can be measured, statistical techniques can be used to attempt to control for the differences between groups that would otherwise lead to biased program effect estimates
Statistical Controls: Illustration
Panel A: Outcome Comparison
Group               Ave. Wage
Participants        $7.75
Non-Participants    $8.20
Panel B: Outcome Comparison after Adjusting for Educational Attainment
Group               Education       Ave. Wage
Participants        Less than HS    $7.60
Participants        High School     $8.10
Non-Participants    Less than HS    $7.75
Non-Participants    High School     $8.50
Panel C: Outcome Comparison after Adjusting for Education and Employment
Group               Education/Employment    Ave. Wage
Participants        Less than HS/UnEm       $7.60
Participants        HS/UnEm                 $8.10
Non-Participants    Less than HS/UnEm       $7.50
Non-Participants    Less than HS/Emp        $7.83
Non-Participants    HS/UnEm                 $8.00
Non-Participants    HS/Emp                  $8.60
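The three panels can be read as successive like-with-like comparisons. The sketch below recomputes the participant minus non-participant wage gap in each panel from the figures in the illustration. It assumes that "UnEm" and "Emp" abbreviate unemployed and employed, and that in Panel C the participants should be compared only with unemployed non-participants; the variable and function names are hypothetical.

```python
# Average hourly wages in cents, as (participants, non_participants),
# copied from Panels A to C of the illustration.
PANEL_A = {"all": (775, 820)}
PANEL_B = {"less than HS": (760, 775), "high school": (810, 850)}
# Assumption: participants are compared with UNEMPLOYED non-participants only.
PANEL_C_UNEMPLOYED = {"less than HS": (760, 750), "high school": (810, 800)}


def wage_gaps(panel):
    """Participant minus non-participant average wage, in cents, per stratum."""
    return {stratum: p - n for stratum, (p, n) in panel.items()}


for name, panel in [("A", PANEL_A), ("B", PANEL_B), ("C", PANEL_C_UNEMPLOYED)]:
    print(name, wage_gaps(panel))
```

Under these assumptions the unadjusted gap of -45 cents shrinks once education is held constant and turns into +10 cents in both education groups once employment is held constant as well, which is the point of the illustration: the apparent program effect depends on which group differences have been controlled.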
How to Read Regression Results?
The adjustments shown on the previous slide were accomplished in a very simple way to illustrate the logic of statistical controls
In actual application, the evaluator would generally use multivariate statistical methods to control for a number of group differences simultaneously
How to Read Regression Results?
Regression Results Predicting Improvement in Test Scores (scale, 0-100)
Variable                Coefficient    Standard Error
Program Intervention    12.34*         3.45
Constant                56.23*         20.23
Number of students = 5,000
* Statistically significant at p < .05
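One way to read such a table is to divide each coefficient by its standard error and to build an approximate 95% confidence interval around it. The sketch below does this for the two rows above. It assumes a large-sample normal approximation with a critical value of 1.96, which is reasonable with 5,000 students but is an assumption of this sketch rather than something stated on the slide; the function name is hypothetical.

```python
def summarize_coefficient(coefficient, standard_error, critical_value=1.96):
    """Return (ratio, confidence_interval, significant) for one regression row.

    ratio is the coefficient divided by its standard error; the interval is
    coefficient +/- critical_value * standard_error (normal approximation).
    """
    ratio = coefficient / standard_error
    margin = critical_value * standard_error
    interval = (coefficient - margin, coefficient + margin)
    significant = abs(ratio) > critical_value
    return ratio, interval, significant


for label, coef, se in [("Program Intervention", 12.34, 3.45), ("Constant", 56.23, 20.23)]:
    ratio, (low, high), significant = summarize_coefficient(coef, se)
    print(f"{label}: ratio={ratio:.2f}, 95% CI=({low:.2f}, {high:.2f}), significant={significant}")
```

For the program row the ratio is about 3.6 and the interval runs from roughly 5.6 to 19.1 points on the 0-100 scale, so the interval excludes zero, consistent with the asterisk in the table.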
Simple Pre-Post Studies
Outcomes are measured on the same targets before program participation and again after sufficiently long participation for effects to be expected
E.g., the effects of Medicare (before vs. after becoming eligible)
In general, simple pre-post designs provide biased estimates of program effects that have little value for purposes of impact assessment
Quasi-Experimental Design: Cautions
The advantages of quasi-experimental research designs for program impact assessment rest entirely on their practicality and convenience in situations where randomized field experiments are not feasible
Under favorable circumstances and carefully done, quasi-experiments can yield estimates of program effects that are comparable to those derived from randomized designs, but they can also produce wildly erroneous results
The Magnitude of a Program Effect
The most direct way to characterize the magnitude of the program effect is simply the numerical difference between the means of the two outcome values (treatment/intervention group vs. control group)
Problem with the simple numerical difference between the means of the two outcomes: it is very specific to the particular measurement instrument
E.g., the effects of a program on knowledge about drug abuse: treatment group .17 and control group .15, a .02 increase
What’s the scale of the outcome variable “knowledge about drug abuse”?
Standardized Mean Difference
The standardized mean difference expresses the mean outcome difference between the intervention group and a control group in standard deviation units
The standard deviation is a statistical index of the variation across individuals or other units on a given measure that provides information about the range or spread of the scores
Describing the size of a program effect in standard deviation units indicates how large it is relative to the range of scores found between the lowest and highest ones recorded in the study
Example: standardized mean difference effect sizes for a preschool program
A test of reading readiness: .50 (the mean score for the intervention group is half a standard deviation higher than that for the control group)
A test of advancing vocabulary: .35
Standardized Mean Difference
By convention, standardized mean difference effect size is given a positive value when the outcome is more favorable for the intervention group and a negative value if the control group is favored
[Exhibit 10-A] for formula
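The exhibit itself is not reproduced in this transcript. As a minimal sketch, the code below uses one common definition of the standardized mean difference: the difference in group means divided by the pooled standard deviation. That choice of pooling, the function name, and the scores are assumptions for illustration and may differ in detail from the exhibit's formula.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(intervention, control):
    """(mean_intervention - mean_control) / pooled standard deviation.

    The pooled variance weights each group's sample variance by its degrees
    of freedom. A positive value favors the intervention group, assuming
    higher scores are better.
    """
    n_i, n_c = len(intervention), len(control)
    pooled_variance = (
        (n_i - 1) * variance(intervention) + (n_c - 1) * variance(control)
    ) / (n_i + n_c - 2)
    return (mean(intervention) - mean(control)) / sqrt(pooled_variance)


# Hypothetical test scores, not data from any study.
intervention_scores = [12, 14, 16, 18, 20]
control_scores = [10, 12, 14, 16, 18]
print(round(standardized_mean_difference(intervention_scores, control_scores), 2))
```

Because the result is in standard deviation units, it can be compared across outcomes measured on different scales, which a raw difference such as .02 cannot.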
Odds Ratio
The odds ratio tends to be preferred when outcome variables are binary (e.g., pregnant or not, graduated or not, etc.)
An odds ratio indicates how much smaller or larger the odds of an outcome event are for the intervention group compared to the control group
An odds ratio of 1.0 indicates even odds; that is, participants in the intervention group were no more and no less likely than controls to experience the change
Odds ratios greater/smaller than 1.0 indicate that intervention group members were more/less likely to experience a change
An odds ratio of 2.0 means that the odds of experiencing the outcome were twice as high for members of the intervention group as for members of the control group
Odds Ratio
Group                 Positive Outcome    Negative Outcome
Intervention Group    p                   1-p
Control Group         q                   1-q
Odds Ratio = [p/(1-p)]/[q/(1-q)]
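The formula can be checked with a short sketch. The proportions are hypothetical, and exact fractions are used only to avoid rounding noise; the function names are assumptions of this example.

```python
from fractions import Fraction


def odds(probability):
    """Convert a probability of the positive outcome into odds."""
    return probability / (1 - probability)


def odds_ratio(p, q):
    """[p/(1-p)] / [q/(1-q)] for intervention proportion p and control proportion q."""
    return odds(p) / odds(q)


# Hypothetical proportions with a positive outcome.
p = Fraction(1, 2)  # intervention group
q = Fraction(1, 3)  # control group

print(odds_ratio(p, q))  # ratio of the odds
print(p / q)             # ratio of the probabilities, for comparison
```

Here the odds ratio is 2 while the ratio of the two probabilities is only 3/2, which is why an odds ratio is best described in terms of odds rather than likelihood.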
Statistical Significance
We would like to know whether the observed program effect is real or occurred by chance (statistical noise)
If the estimate of program effect is large relative to the expected level of statistical noise, we will be relatively confident that we have detected a real effect and not a chance pattern of noise
If the program effect estimate is small relative to statistical noise, we will have little confidence that we have observed a real program effect
Conventionally, statistical significance is set at the .05 alpha level (that is, if there were no real program effect, the chance of noise alone producing a pseudo-effect as large as the observed program effect would be 5% or less; this is the sense in which a significant result is reported at the 95% confidence level)
Type I and Type II Errors
Results of Significance Test on Sample Data    Population: Intervention and Control Means Differ    Population: Intervention and Control Means Do Not Differ
Significant Difference                         Correct conclusion (Prob. = 1-β)                     Type I Error (Prob. = α)
Not a Significant Difference                   Type II Error (Prob. = β)                            Correct conclusion (Prob. = 1-α)
Type I and Type II Errors
Type I error: finding statistical significance when there is no program effect
Easy to control: the conventional alpha level .05 means that the probability of a Type I error is being held to 5% or less
Type II error: not obtaining statistical significance when there is a program effect
Difficult to control: the evaluation design has to have adequate statistical power (the probability that an estimate of the program effect will be statistically significant when it represents a real effect)
Statistical Power
Statistical power is a function of:
The effect size to be detected
The sample size
The type of statistical significance test used
The alpha level set to control Type I error (usually fixed at .05)
[Example]
When program effects are not statistically significant, this result is generally taken as an indication that the program failed to produce effects
This interpretation of statistically nonsignificant results is technically incorrect if the lack of statistical significance was the result of an underpowered study and not the program’s failure to produce meaningful effects
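How the listed factors combine can be sketched with a rough power calculation. The code assumes a two-sided, two-group comparison of means with equal group sizes and uses a normal approximation; the effect size is a standardized mean difference. These are simplifying assumptions of this sketch, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size: true standardized mean difference to be detected.
    n_per_group: number of cases in each of the two groups.
    """
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    shift = effect_size * sqrt(n_per_group / 2)         # expected test statistic
    # Probability that the statistic falls beyond either critical value.
    return standard_normal.cdf(shift - critical) + standard_normal.cdf(-shift - critical)


for d, n in [(0.5, 64), (0.2, 64), (0.2, 400)]:
    print(f"effect size {d}, n per group {n}: power ~ {approximate_power(d, n):.2f}")
```

Under this approximation a study with 64 cases per group has roughly an 80% chance of detecting an effect of half a standard deviation but only about a 20% chance of detecting an effect of .20, so a nonsignificant result from such a study says little about whether a small effect exists.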
Practical Significance
Statistical significance ≠ practical significance
A small statistical effect may represent a program effect of considerable practical significance
A large statistical effect for a program may be of little practical significance
Moderator Variables
A moderator variable characterizes subgroups in an impact assessment for which the program effects may differ
E.g., men vs. women, white vs. black, young vs. old
One important role of moderator analysis is to avoid premature conclusions about program effectiveness based only on the overall mean program effects
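A small sketch shows how an overall mean effect can hide subgroup differences. The records are hypothetical and the effect is taken as a simple difference in group means within each subgroup; an actual moderator analysis would also test whether the subgroup effects differ by more than chance. The names used are assumptions of this example.

```python
from statistics import mean

# Hypothetical records: (subgroup, received_program, outcome score).
records = [
    ("women", True, 13), ("women", True, 15),
    ("women", False, 9), ("women", False, 11),
    ("men", True, 9), ("men", True, 11),
    ("men", False, 9), ("men", False, 11),
]


def mean_difference(rows):
    """Mean outcome of program recipients minus mean outcome of controls."""
    treated = [score for _, exposed, score in rows if exposed]
    control = [score for _, exposed, score in rows if not exposed]
    return mean(treated) - mean(control)


overall_effect = mean_difference(records)
effects_by_subgroup = {
    group: mean_difference([r for r in records if r[0] == group])
    for group in ("women", "men")
}
print(overall_effect, effects_by_subgroup)
```

In this made-up example the overall effect of 2 points is an average of a 4-point effect for women and no effect for men, so a conclusion based on the overall mean alone would misdescribe both subgroups.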
Mediator Variables
A mediator variable is an intervening variable that comes between program exposure and some key outcome and represents a step on the causal pathway by which the program is expected to bring about change in the outcome
Exploration of mediator relationships helps the evaluator and the program stakeholders better understand what processes occur among the target population as a result of exposure to the program
Meta-Analysis
A statistical analysis of the statistical effects from multiple studies of a topic
Reports of all available impact assessment studies of a particular intervention or type of program are first collected
The program effects on selected outcomes are encoded
Other descriptive information about the evaluation methods, program participants, and nature of the intervention is also recorded
Publication bias