pols 7170x master’s seminar program/policy evaluation class 6-7 brooklyn college-cuny shang e. ha

POLS 7170X Master’s Seminar: Program/Policy Evaluation

Class 6-7, Brooklyn College-CUNY, Shang E. Ha

Quasi-Experimental Impact Assessment

A randomized field experiment is the strongest research design for assessing program impact

When a randomized design is not feasible, there are alternative research designs that an evaluator can use

Even when well crafted and implemented, these alternative designs may still yield biased estimates of program effects; such biases systematically exaggerate or diminish program effects, and the direction the bias may take cannot usually be known in advance

Bias in Estimation of Program Effects

A program effect: the difference between the outcome observed for targets exposed to the program and the outcome that would have occurred for those same targets, all other things being equal, had they not been exposed to the program

Bias comes into the picture when either the measurement of the outcome with program exposure or the estimate of what the outcome would have been without program exposure is higher or lower than the corresponding “true” value
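The definitions above can be written compactly. The symbols are our own shorthand, not the slides': $\bar{Y}_1$ is the mean outcome of the exposed targets, $\bar{Y}_0$ is the mean outcome those same targets would have had without exposure, and the hatted versions are the values the evaluator actually measures or estimates.

```latex
\begin{aligned}
\text{Program effect}   &= \bar{Y}_1 - \bar{Y}_0 \\
\text{Estimated effect} &= \hat{Y}_1 - \hat{Y}_0 \\
\text{Bias} &= \text{Estimated effect} - \text{Program effect}
             = (\hat{Y}_1 - \bar{Y}_1) - (\hat{Y}_0 - \bar{Y}_0)
\end{aligned}
```

A positive bias makes the program look more effective than it is; a negative bias makes it look less effective.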

Bias: An Example

A reading program for young children that emphasizes vocabulary development

We have an appropriate vocabulary test for measuring the outcome

We use this test to measure the children’s vocabulary before and after the program

We conduct a simple pre-post test [Exhibit 9-A]

The vocabulary of young children tends to increase over time, program or no program, so the pre-post difference mixes natural growth with any program effect!
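A sketch with made-up numbers shows why the simple pre-post difference overstates the program effect here. All values are hypothetical assumptions, not data from Exhibit 9-A: the post-test mean includes growth the children would have shown anyway.

```python
# All numbers are hypothetical; they are not taken from Exhibit 9-A.
PRE_MEAN = 40.0        # mean vocabulary score before the program
NATURAL_GROWTH = 6.0   # growth the children would show over the period with no program
TRUE_EFFECT = 2.0      # true program effect (never directly observable in practice)

post_mean = PRE_MEAN + NATURAL_GROWTH + TRUE_EFFECT

# A simple pre-post design credits the program with the whole change.
pre_post_estimate = post_mean - PRE_MEAN

# Counterfactual: the same children's post-test mean had they not been in the program.
counterfactual_post = PRE_MEAN + NATURAL_GROWTH
program_effect = post_mean - counterfactual_post

# The part of the pre-post estimate that is really maturation.
maturation_bias = pre_post_estimate - program_effect

print(f"Pre-post estimate:    {pre_post_estimate:.1f}")
print(f"Program effect:       {program_effect:.1f}")
print(f"Bias from maturation: {maturation_bias:.1f}")
```

The evaluator never observes `counterfactual_post` directly; the point of a comparison group is to estimate it.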

Selection Bias

A group comparison design for which the groups have not been formed through randomization is known as a nonequivalent comparison design

When the equivalence between the treatment group and the control group does not hold, the difference in outcome between the groups produces a form of bias in the estimate of program effects (selection bias)

Selection Bias

E.g., a program in which a group of individuals volunteer to participate and those who do not volunteer are used as the control group

Because we are unlikely to know what all the relevant differences are between volunteers and nonvolunteers, we have limited ability to determine the nature and extent of the bias
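The volunteer example can be sketched with an invented population in which a single unmeasured trait, motivation, both raises the outcome and makes people more likely to volunteer. The outcome rule, the volunteering cutoff, and the effect size are all hypothetical assumptions.

```python
from statistics import mean

# Hypothetical population (invented numbers). Each person has a motivation score
# from 0 to 10 that raises the outcome by itself AND drives the decision to volunteer.
TRUE_EFFECT = 3.0

def outcome(motivation: float, treated: bool) -> float:
    """Outcome = baseline 20 + 2 points per unit of motivation + program effect if treated."""
    return 20.0 + 2.0 * motivation + (TRUE_EFFECT if treated else 0.0)

motivations = [float(m) for m in range(11)]  # 0.0, 1.0, ..., 10.0

# Assumption: people with motivation of 6 or more volunteer; the rest serve as controls.
volunteers = [m for m in motivations if m >= 6]
nonvolunteers = [m for m in motivations if m < 6]

treated_mean = mean(outcome(m, True) for m in volunteers)
control_mean = mean(outcome(m, False) for m in nonvolunteers)

naive_estimate = treated_mean - control_mean   # 39.0 - 25.0 = 14.0
selection_bias = naive_estimate - TRUE_EFFECT  # 11.0 of the 14.0 is not program effect
print(naive_estimate, selection_bias)
```

In real data motivation would not be measured, so the 11-point bias could not be subtracted out; that is the sense in which its size and direction cannot be determined.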

Attrition

Even in the case of well-executed randomized field experiments, bias can occur

Attrition:

Targets drop out of the intervention or control group and cannot be reached

Targets refuse to cooperate in outcome measurement

Cf. failure to treat
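A minimal sketch of how attrition can bias even a randomized experiment, assuming (hypothetically) a program with no real effect and dropouts concentrated among the intervention-group members doing worst. The scores are invented.

```python
from statistics import mean

# Hypothetical randomized experiment (invented numbers) in which the program has
# NO true effect: both groups share the same outcome distribution.
intervention = [4, 5, 6, 7, 8, 9]
control = [4, 5, 6, 7, 8, 9]

full_sample_estimate = mean(intervention) - mean(control)  # 0.0

# Assumption: the two intervention-group members with the worst outcomes drop out
# and cannot be measured; no one drops out of the control group.
measured_intervention = [y for y in intervention if y > 5]
attrition_estimate = mean(measured_intervention) - mean(control)  # 7.5 - 6.5 = 1.0

print(full_sample_estimate, attrition_estimate)
```

Because the dropouts differ systematically from those who stay, randomization no longer guarantees that the measured groups are equivalent.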

Other Sources of Bias

Secular trends: relatively long-term trends in a community, region, or country

E.g., in a period when a community’s birth rate is declining, a program to reduce fertility may appear effective because of bias stemming from that downward trend

Interfering events: short-term events

E.g., a natural disaster may make it appear that a program to increase community cooperation has been effective, when in reality it is the crisis situation that has brought community members together

Maturation: natural maturational and developmental processes can produce considerable change independently of the program

E.g., a program to improve preventive health practices among adults may seem ineffective because health generally declines with age

Matching

The intervention group is typically specified first and the evaluator then constructs a control group by selecting targets unexposed to the intervention that match those in the intervention group on selected characteristics

To the extent that the matching falls short of equating the groups on characteristics that will influence the outcome, selection bias will be introduced into the resulting program effect estimate

Matching Procedures

Individual matching: drawing a “partner” for each target who receives the intervention from the pool of potential targets unexposed to the program

Relevant matching variables: age, gender, father’s occupation, hours of work, etc.

Aggregate matching: individuals are not matched case by case, but the overall distributions in the intervention and control groups on each matching variable are made comparable

Problems of Individual Matching

Individual matching is usually preferable to aggregate matching

But individual matching is more time-consuming and difficult to execute when there are many matching variables

Matching by individuals can sometimes result in a drastic loss of cases

If matching persons cannot be found for some individuals in the intervention group, those unmatched individuals have to be discarded as data sources
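Individual matching and the resulting loss of cases can be sketched as a greedy one-to-one match. The records, the matching rule (same gender, age within 3 years), and the function name `individual_match` are hypothetical choices for illustration, not a procedure prescribed by the slides.

```python
# Hypothetical records (invented): (id, gender, age).
intervention = [("T1", "F", 25), ("T2", "M", 31), ("T3", "F", 47), ("T4", "M", 60)]
comparison_pool = [("C1", "F", 26), ("C2", "M", 29), ("C3", "F", 44),
                   ("C4", "M", 35), ("C5", "F", 52)]

MAX_AGE_GAP = 3  # assumed rule: a partner must have the same gender and an age within 3 years

def individual_match(treated, pool, max_gap):
    """Greedy one-to-one matching: each treated case takes the closest unused eligible partner."""
    available = list(pool)
    pairs, unmatched = [], []
    for case_id, gender, age in treated:
        eligible = [c for c in available if c[1] == gender and abs(c[2] - age) <= max_gap]
        if not eligible:
            unmatched.append(case_id)  # no partner found: case is discarded from the analysis
            continue
        partner = min(eligible, key=lambda c: abs(c[2] - age))
        available.remove(partner)      # each comparison case can be used only once
        pairs.append((case_id, partner[0]))
    return pairs, unmatched

pairs, unmatched = individual_match(intervention, comparison_pool, MAX_AGE_GAP)
print(pairs)      # [('T1', 'C1'), ('T2', 'C2'), ('T3', 'C3')]
print(unmatched)  # ['T4']
```

Adding more matching variables or tightening the tolerance makes partners harder to find, so more intervention cases end up like `T4`. Greedy matching also depends on the order of the treated cases; optimal matching algorithms avoid that at extra cost.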

Statistical Controls

Statistical controls are the functional equivalent of matching

Any program effect estimate based on a simple comparison of the outcomes for the intervention and control groups must be presumed to include selection bias

If the relevant differences between the groups can be measured, statistical techniques can be used to attempt to statistically control for the differences between groups that would otherwise lead to biased program estimates

Statistical Controls: Illustration

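Panel A: Outcome Comparison

| | Participants | Non-Participants |
|---|---|---|
| Ave. Wage | $7.75 | $8.20 |

Panel B: Outcome Comparison after Adjusting for Educational Attainment

| Education | Participants: Ave. Wage | Non-Participants: Ave. Wage |
|---|---|---|
| Less than HS | $7.60 | $7.75 |
| High School | $8.10 | $8.50 |

Panel C: Outcome Comparison after Adjusting for Education and Employment

| Education/Employment | Participants: Ave. Wage | Non-Participants: Ave. Wage |
|---|---|---|
| Less than HS/Unemployed | $7.60 | $7.50 |
| Less than HS/Employed | n/a | $7.83 |
| High School/Unemployed | $8.10 | $8.00 |
| High School/Employed | n/a | $8.60 |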
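The logic of the three panels can be sketched by subtracting within each stratum. The wage figures are copied from the panels; we assume, as the column order indicates, that participants are listed first and were all unemployed, so only the unemployed strata can be compared in Panel C. The 45-cent deficit in Panel A turns into a 10-cent advantage once education and employment are held constant.

```python
# Mean wages (dollars) copied from Panels A-C.
panel_a = {"participants": 7.75, "nonparticipants": 8.20}

panel_b = {  # adjusted for education only
    "Less than HS": {"participants": 7.60, "nonparticipants": 7.75},
    "High School": {"participants": 8.10, "nonparticipants": 8.50},
}

panel_c = {  # adjusted for education and employment; participants were all unemployed
    "Less than HS/Unemployed": {"participants": 7.60, "nonparticipants": 7.50},
    "High School/Unemployed": {"participants": 8.10, "nonparticipants": 8.00},
}

def differences(panel):
    """Participant minus non-participant mean wage within each stratum, rounded to cents."""
    return {stratum: round(v["participants"] - v["nonparticipants"], 2)
            for stratum, v in panel.items()}

unadjusted = round(panel_a["participants"] - panel_a["nonparticipants"], 2)
by_education = differences(panel_b)
by_education_and_employment = differences(panel_c)

print(unadjusted)                   # -0.45
print(by_education)                 # {'Less than HS': -0.15, 'High School': -0.4}
print(by_education_and_employment)  # {'Less than HS/Unemployed': 0.1, 'High School/Unemployed': 0.1}
```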

How to Read Regression Results?

The adjustments shown in the previous slide were made in a very simple way to illustrate the logic of statistical controls.

In actual application, the evaluator would generally use multivariate statistical methods to control for a number of group differences simultaneously
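To show what a multivariate method does, the sketch below simulates data loosely patterned on the wage illustration and fits an ordinary least squares regression with a program indicator and an education indicator. All parameters are assumptions: a true program effect of 10 cents, a 60-cent high-school premium, and enrollment concentrated among non-graduates. The hand-written solver keeps the example dependency-free; in practice an evaluator would use a statistics package.

```python
import random

def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Simulated data (all parameters are assumptions chosen for illustration).
rng = random.Random(42)
TRUE_EFFECT = 0.10
X, wages = [], []
for _ in range(5000):
    high_school = 1 if rng.random() < 0.5 else 0
    # Assumption: people without a diploma are far more likely to enroll in the program.
    treated = 1 if rng.random() < (0.2 if high_school else 0.7) else 0
    wage = 7.50 + 0.60 * high_school + TRUE_EFFECT * treated + rng.gauss(0, 0.20)
    X.append([1.0, treated, high_school])  # intercept, program, education
    wages.append(wage)

treated_wages = [w for row, w in zip(X, wages) if row[1] == 1]
control_wages = [w for row, w in zip(X, wages) if row[1] == 0]
naive_difference = (sum(treated_wages) / len(treated_wages)
                    - sum(control_wages) / len(control_wages))

intercept, program_coef, education_coef = ols(X, wages)
print(f"Naive difference:           {naive_difference:+.3f}")
print(f"Program effect, controlled: {program_coef:+.3f}")
```

The naive comparison comes out negative because participants are mostly non-graduates; the regression coefficient on the program indicator recovers a value close to the assumed 0.10.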

How to Read Regression Results?

Regression Results Predicting Improvement in Test Scores (scale, 0-100)

| | Coefficient | Standard Error |
|---|---|---|
| Program Intervention | 12.34* | 3.45 |
| Constant | 56.23* | 20.23 |

Number of students = 5,000

* Statistically significant at p < .05
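One way to read the "Program Intervention" row: participation is associated with 12.34 more points of improvement, and the standard error tells how precisely that is estimated. The sketch below does the usual arithmetic with a normal approximation, which we assume is reasonable with 5,000 cases.

```python
from statistics import NormalDist

coef, se = 12.34, 3.45  # "Program Intervention" row of the regression table

t_stat = coef / se                    # about 3.58 standard errors away from zero
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided .05 test
ci_low, ci_high = coef - z_crit * se, coef + z_crit * se  # about 5.58 to 19.10
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t_stat)))     # well below .05

print(f"t = {t_stat:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], p = {p_two_sided:.4f}")
```

Because the interval excludes zero, the coefficient is flagged with an asterisk; its width (roughly 5.6 to 19.1 points) shows how much uncertainty remains about the size of the effect.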

Simple Pre-Post Studies

Outcomes are measured on the same targets before program participation and again after participation has lasted long enough for effects to be expected

E.g., the effects of Medicare (outcomes before vs. after becoming eligible)

In general, simple pre-post designs provide biased estimates of program effects that have little value for purposes of impact assessment

Quasi-Experimental Design: Cautions

The advantages of quasi-experimental research designs for program impact assessment rest entirely on their practicality and convenience in situations where randomized field experiments are not feasible

Under favorable circumstances and carefully done, quasi-experiments can yield estimates of program effects that are comparable to those derived from randomized designs, but they can also produce wildly erroneous results

The Magnitude of a Program Effect

The most direct way to characterize the magnitude of the program effect is simply the numerical difference between the means of the two outcome values (treatment/intervention group vs. control group)

Problem with the simple numerical difference between the means of the two outcomes: it is very specific to the particular measurement instrument

E.g., the effects of a program on knowledge about drug abuse: treatment group .17 and control group .15, a .02 increase

What is the scale of the outcome variable “knowledge about drug abuse”?
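Whether a .02 difference is large depends on how spread out the scores are. The means below come from the slide's example, but the standard deviations are hypothetical, since the slides do not report them; the sketch simply divides the raw difference by each assumed SD.

```python
# Means from the slide's example; the standard deviations are hypothetical,
# because the slides do not report how spread out the knowledge scores are.
treatment_mean, control_mean = 0.17, 0.15
raw_difference = treatment_mean - control_mean  # about .02 on whatever scale was used

standardized = {sd: raw_difference / sd for sd in (0.01, 0.04, 0.20)}
for sd, smd in standardized.items():
    print(f"If SD = {sd:.2f}, the .02 difference equals {smd:.2f} standard deviations")
```

The same raw difference ranges from a trivial one-tenth of a standard deviation to a very large two standard deviations, which is why effects are usually reported in standardized units.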

Standardized Mean Difference

The standardized mean difference expresses the mean outcome difference between the intervention group and a control group in standard deviation units

The standard deviation is a statistical index of the variation across individuals or other units on a given measure that provides information about the range or spread of the scores

Describing the size of a program effect in standard deviation units indicates how large it is relative to the range of scores found between the lowest and highest ones recorded in the study

E.g., standardized mean difference effect sizes for a preschool program:

A test of reading readiness: .50 (the mean score for the intervention group is half a standard deviation higher than that for the control group)

A test of advancing vocabulary: .35

Standardized Mean Difference

By convention, standardized mean difference effect size is given a positive value when the outcome is more favorable for the intervention group and a negative value if the control group is favored

See [Exhibit 10-A] for the formula
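One standard way of writing the standardized mean difference uses a pooled standard deviation. The notation is ours, so compare it with Exhibit 10-A before relying on it: $\bar{X}_i$, $s_i$, and $n_i$ are the intervention group's mean, standard deviation, and size, and $\bar{X}_c$, $s_c$, and $n_c$ are the control group's.

```latex
ES = \frac{\bar{X}_i - \bar{X}_c}{s_p},
\qquad
s_p = \sqrt{\frac{(n_i - 1)\,s_i^2 + (n_c - 1)\,s_c^2}{n_i + n_c - 2}}
```

With higher scores taken as more favorable, this ES is positive when the intervention group does better, matching the sign convention above; if lower scores are favorable, the sign is reversed.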

Odds Ratio

The odds ratio tends to be preferred when outcome variables are binary (e.g., pregnant or not, graduated or not)

An odds ratio indicates how much smaller or larger the odds of an outcome event are for the intervention group compared to the control group

An odds ratio of 1.0 indicates equal odds in the two groups; that is, participants in the intervention group were no more and no less likely than controls to experience the change

Odds ratios greater/smaller than 1.0 indicate that intervention group members were more/less likely to experience the change

An odds ratio of 2.0 means that the odds of experiencing the outcome were twice as high for members of the intervention group as for members of the control group

Odds Ratio

| | Positive Outcome | Negative Outcome |
|---|---|---|
| Intervention Group | p | 1-p |
| Control Group | q | 1-q |

Odds Ratio = [p/(1-p)]/[q/(1-q)]
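The formula can be computed directly. The proportions below are hypothetical; they were chosen so that the odds ratio is exactly 2.0, which makes it easy to see that doubled odds is not the same as being twice as likely.

```python
def odds(p: float) -> float:
    """Odds of an event that occurs with probability p."""
    return p / (1 - p)

def odds_ratio(p: float, q: float) -> float:
    """p: share with the positive outcome in the intervention group; q: same share for controls."""
    return odds(p) / odds(q)

# Hypothetical proportions chosen for illustration.
p, q = 0.40, 0.25
or_value = odds_ratio(p, q)  # (0.40 / 0.60) / (0.25 / 0.75) = 2.0
risk_ratio = p / q           # 0.40 / 0.25 = 1.6

print(f"Odds ratio: {or_value:.2f}, risk ratio: {risk_ratio:.2f}")
```

Here the intervention group has twice the odds but is only 1.6 times as likely to experience the outcome; the two ratios are close only when the outcome is rare.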

Statistical Significance

We would like to know whether the observed program effect is real or whether it occurred by chance (statistical noise)

If the estimate of program effect is large relative to the expected level of statistical noise, we will be relatively confident that we have detected a real effect and not a chance pattern of noise

If the program effect estimate is small relative to statistical noise, we will have little confidence that we have observed a real program effect

Conventionally, statistical significance is set at the .05 alpha level (that is, if there were no real program effect, the chance that noise alone would produce a pseudo-effect as large as the observed program effect is 5% or less)
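One way (not the only one, and not necessarily the test this course uses) to estimate "the chance of a pseudo-effect produced by noise being as large as the observed program effect" is an exact permutation test. If the program did nothing, the group labels are arbitrary, so we look at every possible relabeling of the pooled scores. The scores are invented for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical outcome scores for a small study (invented numbers).
treatment = [12, 14, 15, 13, 16, 15]
control = [10, 11, 9, 12, 10, 11]

observed = mean(treatment) - mean(control)
pooled = treatment + control
n_t = len(treatment)

# Under "no program effect", every way of assigning 6 of the 12 scores to the
# treatment group is equally likely; count how often noise alone does as well.
extreme = total = 0
for idx in combinations(range(len(pooled)), n_t):
    chosen = set(idx)
    t = [pooled[i] for i in chosen]
    c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    total += 1
    if abs(mean(t) - mean(c)) >= abs(observed) - 1e-12:  # tolerance for float rounding
        extreme += 1

p_value = extreme / total  # two-sided
print(f"Observed difference {observed:.2f}; p = {extreme}/{total} = {p_value:.4f}")
```

Only 4 of the 924 relabelings produce a difference as large as the observed one, so p is about .004, below the .05 alpha level.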

Type I and Type II Errors

| Results of Significance Test on Sample Data | Population Circumstances: Intervention and Control Means Differ | Population Circumstances: Intervention and Control Means Do Not Differ |
|---|---|---|
| Significant Difference | Correct conclusion (Prob. = 1-β) | Type I Error (Prob. = α) |
| Not a Significant Difference | Type II Error (Prob. = β) | Correct conclusion (Prob. = 1-α) |

Type I and Type II Errors

Type I error: finding statistical significance when there is no program effect

Easy to control: the conventional alpha level of .05 means that the probability of a Type I error is held to 5% or less

Type II error: not obtaining statistical significance when there is a program effect

Difficult to control: the evaluation design has to have adequate statistical power (the probability that an estimate of the program effect will be statistically significant when it represents a real effect)
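A small simulation illustrates why the Type I error rate is "easy to control". Under assumed settings (30 cases per group, a known standard deviation of 1, 2,000 replications, a fixed seed), the program has no effect at all, yet about 5% of the simulated experiments still come out significant at alpha = .05.

```python
import random
from statistics import NormalDist

rng = random.Random(2024)
ALPHA = 0.05
N_PER_GROUP = 30
REPLICATIONS = 2000
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96

false_positives = 0
for _ in range(REPLICATIONS):
    # No program effect: both groups are drawn from the same N(0, 1) population.
    treated = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
    control = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
    # z-test with known SD = 1, so the SE of the mean difference is sqrt(2 / n).
    z = (sum(treated) / N_PER_GROUP - sum(control) / N_PER_GROUP) / (2 / N_PER_GROUP) ** 0.5
    if abs(z) > z_crit:
        false_positives += 1  # "significant" even though the true effect is zero

type_i_rate = false_positives / REPLICATIONS
print(f"Share of null experiments declared significant: {type_i_rate:.3f}")
```

The rate is set by the choice of alpha alone; sample size does not change it. Type II error, by contrast, depends on the factors discussed under statistical power.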

Statistical Power

Statistical power is a function of: the effect size to be detected; the sample size; the type of statistical significance test used; and the alpha level set to control Type I error (usually fixed at .05)

[Example]

When program effects are not statistically significant, this result is generally taken as an indication that the program failed to produce effects

This interpretation of statistically nonsignificant results is technically incorrect if the lack of statistical significance was the result of an underpowered study rather than the program’s failure to produce meaningful effects
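A normal-approximation sketch of how the factors listed above combine: effect size `d` in standardized mean difference units, sample size per group, and a two-sided alpha. The function names are ours. An exact t-test needs slightly larger samples, about 64 per group rather than 63 for d = .5 at 80% power.

```python
from math import ceil, sqrt
from statistics import NormalDist

def power_two_group(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test for a standardized effect d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected z statistic when the effect is real
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def n_for_power(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate sample size per group needed to reach the target power."""
    z = NormalDist()
    return ceil(2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2)

for d in (0.2, 0.5):
    print(f"d = {d}: power with 64 per group = {power_two_group(d, 64):.2f}, "
          f"n per group for 80% power = {n_for_power(d)}")
```

With 64 cases per group, a study has roughly 80% power for d = .5 but only about 20% for d = .2. In the latter case, a nonsignificant result says little about whether the program failed.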

Practical Significance

Statistical significance ≠ practical significance

A small statistical effect may represent a program effect of considerable practical significance

A large statistical effect for a program may be of little practical significance

Moderator Variables

A moderator variable characterizes subgroups in an impact assessment for which the program effects may differ (e.g., men vs. women, white vs. black, young vs. old)

One important role of moderator analysis is to avoid premature conclusions about program effectiveness based only on the overall mean program effects
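A sketch with invented subgroup results shows how an overall mean effect can hide subgroup differences. The numbers are hypothetical. The overall effect is taken as the size-weighted average of subgroup effects, which assumes each subgroup's treatment and control groups are balanced.

```python
# Hypothetical subgroup results (invented): mean outcomes within each subgroup.
subgroups = {
    "women": {"n": 200, "treatment_mean": 58.0, "control_mean": 50.0},
    "men": {"n": 200, "treatment_mean": 50.0, "control_mean": 50.0},
}

# Program effect within each level of the moderator variable.
effects = {name: g["treatment_mean"] - g["control_mean"] for name, g in subgroups.items()}

# Overall effect as the n-weighted average of the subgroup effects.
total_n = sum(g["n"] for g in subgroups.values())
overall_effect = sum(effects[name] * g["n"] for name, g in subgroups.items()) / total_n

print(effects)         # {'women': 8.0, 'men': 0.0}
print(overall_effect)  # 4.0
```

An overall effect of 4 points would suggest a modest benefit for everyone, when in this example the program works well for women and not at all for men.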

Mediator Variables

A mediator variable is an intervening variable that comes between program exposure and some key outcome and represents a step on the causal pathway by which the program is expected to bring about change in the outcome

Exploration of mediator relationships helps the evaluator and the program stakeholders better understand what processes occur among the target population as a result of exposure to the program

Meta-Analysis

A statistical analysis of the effect estimates from multiple studies of a topic

Reports of all available impact assessment studies of a particular intervention or type of program are first collected

The program effects on selected outcomes are encoded

Other descriptive information about the evaluation methods, program participants, and nature of the intervention is also recorded

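Publication bias: studies with statistically significant results are more likely to be published, so a meta-analysis restricted to published reports may overstate program effects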
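Once effect sizes are encoded, a common (but not the only) way to combine them is a fixed-effect, inverse-variance weighted mean, in which more precise studies count more. The three studies, their effect sizes, and their standard errors below are hypothetical. A random-effects model is the usual alternative when true effects are thought to vary across studies.

```python
from math import sqrt

# Hypothetical encoded effect sizes (standardized mean differences) and standard errors.
studies = [
    {"study": "A", "es": 0.30, "se": 0.10},
    {"study": "B", "es": 0.50, "se": 0.20},
    {"study": "C", "es": 0.20, "se": 0.10},
]

# Inverse-variance weights: a study with half the standard error gets four times the weight.
weights = [1 / s["se"] ** 2 for s in studies]
pooled_es = sum(w * s["es"] for w, s in zip(weights, studies)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))

print(f"Pooled effect size: {pooled_es:.3f} (SE {pooled_se:.3f})")
```

If unpublished null results are missing from the collected reports, this pooled estimate inherits the publication bias no matter how it is weighted.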
