Presented by Jim Rugh to NONIE Conference in Paris, 28 March 2011
IOCE proposes more holistic perspectives
*What’s involved in “rigorous impact evaluation”?
Join me in a review of the basics of:
1. Evaluation Design
2. Logic models
3. Counterfactuals
4. Context (simple-complicated-complex)
5. Evaluation Implementation
*1. Evaluation Design
Figure: Illustrating the need for a quasi-experimental longitudinal time-series evaluation design (scale of a major impact indicator for project participants and a comparison group, at baseline, end-of-project evaluation and post-project evaluation)
An introduction to various evaluation designs
OK, let’s stop the action to identify each of the major types of evaluation (research) design, one at a time, beginning with the most rigorous design.
First of all: the key to the traditional symbols:
X = Intervention (treatment), i.e. what the project does in a community
O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation)
P (top row): Project participants
C (bottom row): Comparison (control) group
Design #1: Longitudinal Quasi-experimental
P1 X P2 X P3 P4
C1 C2 C3 C4
(Observations at baseline, midterm, end-of-project evaluation and post-project evaluation, for project participants and a comparison group.)
Design #2: Quasi-experimental (pre+post, with comparison)
P1 X P2
C1 C2
(Observations at baseline and end-of-project evaluation, for project participants and a comparison group.)
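Design #2 yields four observation points. One common way to turn them into an impact estimate (the slide itself does not prescribe a formula) is the difference-in-differences: the change among participants minus the change in the comparison group, which assumes both groups would otherwise have followed parallel trends. A minimal sketch; the function name and the indicator values are hypothetical.

```python
def difference_in_differences(p1: float, p2: float, c1: float, c2: float) -> float:
    """Impact estimate for a pre+post design with a comparison group.

    p1, p2: mean indicator for project participants at baseline and end of project
    c1, c2: mean indicator for the comparison group at the same two points
    """
    change_participants = p2 - p1  # change observed among participants
    change_comparison = c2 - c1    # change that happened anyway (proxy for the counterfactual)
    return change_participants - change_comparison


# Hypothetical indicator values (e.g. % of children fully immunized), not real data.
estimate = difference_in_differences(p1=40.0, p2=55.0, c1=42.0, c2=48.0)
print(estimate)  # 9.0
```

If the comparison group was not really similar to begin with, the parallel-trends assumption fails and the estimate absorbs that difference.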
Design #2+: Typical Randomized Control Trial
P1 X P2
C1 C2
(Observations at baseline and end-of-project evaluation, for project participants and a control group.)
Research subjects are randomly assigned either to the project group or to the control group.
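What distinguishes Design #2+ from Design #2 is only how the two groups are formed. A minimal sketch of random allocation, assuming a simple 50/50 split of a list of subjects (real trials often stratify, or randomize clusters such as villages); the function and subject names are hypothetical.

```python
import random


def randomly_assign(subjects: list[str], seed: int = 2011) -> tuple[list[str], list[str]]:
    """Split research subjects at random into a project group and a control group."""
    shuffled = list(subjects)               # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the allocation can be reproduced and audited
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (project group, control group)


villages = [f"village_{i}" for i in range(1, 11)]  # hypothetical subjects
project, control = randomly_assign(villages)
print(len(project), len(control))  # 5 5
```

Fixing the seed is a design choice: it lets others verify that the allocation really was random rather than hand-picked.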
Design #3: Truncated QED (quasi-experimental design)
X P1 X P2
C1 C2
(Observations at midterm and end-of-project evaluation, for project participants and a comparison group.)
Design #4: Pre+post of project; post-only comparison
P1 X P2
C
(Observations at baseline and end-of-project evaluation for project participants; end-of-project evaluation only for the comparison group.)
Design #5: Post-test only of project and comparison
X P
C
(Observations at end-of-project evaluation only, for project participants and a comparison group.)
Design #6: Pre+post of project; no comparison
P1 X P2
(Observations at baseline and end-of-project evaluation, for project participants only.)
Design #7: Post-test only of project participants
X P
(Observation at end-of-project evaluation, for project participants only.)
Need to fill in missing data through other means:
• What change occurred during the life of the project?
• What would have happened without the project (counterfactual)?
• How sustainable is that change likely to be?
Design | T1 (baseline) | X (intervention) | T2 (midterm) | X (intervention, cont.) | T3 (endline) | T4 (ex-post)
1 | P1, C1 | X | P2, C2 | X | P3, C3 | P4, C4
2 | P1, C1 | X | | X | P2, C2 |
3 | | X | P1, C1 | X | P2, C2 |
4 | P1 | X | | X | P2, C2 |
5 | | X | | X | P1, C1 |
6 | P1 | X | | X | P2 |
7 | | X | | X | P1 |
Note: These 7 evaluation designs are described in the RealWorld Evaluation book.
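The table of seven designs can also be read as a data structure: which observation points a design has determines which simple comparisons are possible at all, and Design #7 leaves nothing to compare, which is why missing data must be filled in through other means. A minimal sketch; the comparison rules below are a simplification assumed for illustration, not taken from the RealWorld Evaluation book, and all names are hypothetical.

```python
# Observation points for each design, transcribed from the table of 7 designs:
# "T1" baseline, "T2" midterm, "T3" endline, "T4" ex-post;
# "P" = project participants, "C" = comparison (or control) group.
DESIGNS: dict[int, dict[str, set[str]]] = {
    1: {"P": {"T1", "T2", "T3", "T4"}, "C": {"T1", "T2", "T3", "T4"}},
    2: {"P": {"T1", "T3"}, "C": {"T1", "T3"}},
    3: {"P": {"T2", "T3"}, "C": {"T2", "T3"}},
    4: {"P": {"T1", "T3"}, "C": {"T3"}},
    5: {"P": {"T3"}, "C": {"T3"}},
    6: {"P": {"T1", "T3"}, "C": set()},
    7: {"P": {"T3"}, "C": set()},
}


def available_comparisons(design: int) -> list[str]:
    """List the simple comparisons a design's observation points allow (simplified rules)."""
    p, c = DESIGNS[design]["P"], DESIGNS[design]["C"]
    comparisons = []
    # Change over time in both groups: C must share the participants' earliest point and the endline.
    if len(p) >= 2 and min(p) in c and "T3" in c:
        comparisons.append("difference-in-differences")
    # Participants versus comparison group at the endline.
    if "T3" in p and "T3" in c:
        comparisons.append("with-and-without at endline")
    # Change among participants only: cannot separate the project's effect from other trends.
    if len(p) >= 2:
        comparisons.append("before-and-after (participants only)")
    # An observation after the project ended speaks to sustainability.
    if "T4" in p:
        comparisons.append("ex-post check on sustainability")
    return comparisons


for number in DESIGNS:
    print(number, available_comparisons(number) or ["none: fill the gaps through other means"])
```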
What kinds of evaluation designs are actually used in the real world (of international development)? Findings from meta-evaluations of 336 evaluation reports of an INGO.
Post-test only 59%
Before-and-after 25%
With-and-without 15%
Other counterfactual 1%
Even proponents of RCTs have acknowledged that RCTs are only appropriate for perhaps 5% of development interventions. An empirical study by Forss and Bandstein, examining evaluations by bilateral and multilateral organisations in the OECD/DAC DEReC database, found that only 5% used even a counterfactual design.
While we recognize that experimental and quasi-experimental designs have a place in the toolkit for impact evaluations, we think that more attention needs to be paid to the roughly 95% of situations where these designs would not be possible or appropriate.
*2. Logic Models
One form of Program Theory (Logic) Model
Design → Inputs → Implementation Process → Outputs → Outcomes → Impacts → Sustainability
Contextual influences:
• Economic context in which the project operates
• Political context in which the project operates
• Institutional and operational context
• Socio-economic and cultural characteristics of the affected populations
Note: The orange boxes are included in conventional Program Theory Models. The addition of the blue boxes provides the recommended, more complete analysis.
PROBLEM (with its consequences)
PRIMARY CAUSE 1 | PRIMARY CAUSE 2 | PRIMARY CAUSE 3
Secondary causes under primary cause 2: Secondary cause 2.1 | Secondary cause 2.2 | Secondary cause 2.3
Tertiary causes under secondary cause 2.2: Tertiary cause 2.2.1 | Tertiary cause 2.2.2 | Tertiary cause 2.2.3
DESIRED IMPACT (with its consequences)
OUTCOME 1 | OUTCOME 2 | OUTCOME 3
Outputs under outcome 2: OUTPUT 2.1 | OUTPUT 2.2 | OUTPUT 2.3
Interventions under output 2.2: Intervention 2.2.1 | Intervention 2.2.2 | Intervention 2.2.3
Example: child malnutrition
Problem: Children are malnourished
Consequence: High infant mortality rate
Causes and contributing factors: Diarrheal disease; Insufficient food; Poor quality of food; Unsanitary practices; Need for improved health policies; Contaminated water; Flies and rodents; Do not use facilities correctly; People do not wash hands before eating
Example: women’s empowerment
Desired impact: Women empowered
Consequence: Reduction in poverty
Outcomes, outputs and interventions: Young women educated; Women in leadership roles; Economic opportunities for women; Female enrollment rates increase; Curriculum improved; Improved educational policies; Parents persuaded to send girls to school; Schools built; School system hires and pays teachers
Program Goal (at impact level): Young women educated
• Advocacy Project Goal: Improved educational policies enacted
• Construction Project Goal: More classrooms built
• Teacher Education Project Goal: Improve quality of curriculum
Legend: OUR project; PARTNER will do this; ASSUMPTION (that others will do this)
To have synergy and achieve impact, all of these need to address the same target population.
We need to recognize which evaluative process is most appropriate for measurement at various levels:
• Impact
• Outcomes
• Output
• Activities
• Inputs
Evaluative processes: PERFORMANCE MONITORING, PROJECT EVALUATION, PROGRAM EVALUATION
The “Rosetta Stone of Logical Frameworks” (terms used for each level, from highest to lowest)
Generic levels: Ultimate Impact | End Outcomes | Intermediate Outcomes | Outputs | Interventions
Needs-based: Higher Consequence | Specific Problem | Cause | Solution | Process | Inputs
American Red Cross: Program Goal | Project Impact | Outcomes | Outputs | Activities | Inputs
AusAID: Scheme Goal | Major Development Objectives | Outputs | Activities | Inputs
CARE logframe: Program Goal | Project Final Goal | Intermediate Objectives | Outputs | Activities | Inputs
CARE terminology: Program Impact | Project Impact | Effects | Outputs | Activities | Inputs
CIDA + GTZ: Overall goal | Project purpose | Results/Outputs | Activities | Inputs
CRS Proframe: Goal | Strategic Objective | Intermediate Results | Outputs | Activities | Inputs
DANIDA + DfID: Goal | Purpose | Outputs | Activities
EIDHR: Overall Objectives | Specific Objective | Expected Results | Activities
European Union: Overall Objective | Project Purpose | Results | Activities
FAO + UNDP + NORAD: Development Objective | Immediate Objectives | Outputs | Activities | Inputs
PC/LogFrame: Goal | Purpose | Outputs | Activities
Peace Corps: Purpose | Goals | Results | Objectives | Activities | Volunteers
SAVE – Results Framework: Goal | Strategic Objective | Intermediate Results | Outputs | Activities | Inputs
UNHCR: Sector Objective | Goal | Project Objective | Outputs | Activities | Input/Resources
USAID LogFrame: Final Goal | Strategic Objective | Intermediate Results | Activities | Inputs
USAID Results Framework: Goal | Strategic Objective | Intermediate Results | (Outputs) | (Activities) | (Inputs)
World Bank: Long-term Objectives | Short-term Objectives | Outputs | Inputs
World Vision International: Program Goal | Project Goal | Outcomes | Outputs | Activities | (Inputs)
*3. Alternative Counterfactuals
Attribution and counterfactuals
How do we know if the observed changes in the project participants’ or communities’ income, health, attitudes, school attendance, etc. are due to the implementation of the project (credit, water supply, transport vouchers, school construction, etc.) or to other unrelated factors (changes in the economy, demographic movements, other development programs, etc.)?
The Counterfactual
What change would have occurred in the relevant condition of the target population if there had been no intervention by this project?
Control group and comparison group
Control group = randomized allocation of subjects to project and non-treatment group
Comparison group = separate procedure for sampling project and non-treatment groups that are as similar as possible in all aspects except the treatment (intervention)
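A comparison group has to be constructed rather than randomized. One simple way to make it “as similar as possible”, assumed here purely for illustration, is greedy nearest-neighbour matching on a single baseline characteristic, without replacement; real matching normally uses several characteristics (for example via propensity scores). The function, village names and values are hypothetical.

```python
def nearest_neighbour_matches(
    participants: dict[str, float], pool: dict[str, float]
) -> dict[str, str]:
    """Match each project unit to the unused non-project unit with the closest baseline value."""
    if len(pool) < len(participants):
        raise ValueError("need at least as many candidate units as project units")
    available = dict(pool)  # copy: matched candidates are removed (matching without replacement)
    matches: dict[str, str] = {}
    for unit, value in sorted(participants.items()):
        best = min(available, key=lambda cand: abs(available[cand] - value))
        matches[unit] = best
        del available[best]
    return matches


# Hypothetical baseline household income (arbitrary units) for project and non-project villages.
project_villages = {"A": 2.0, "B": 5.0}
candidate_villages = {"X": 1.8, "Y": 4.0, "Z": 5.1}
print(nearest_neighbour_matches(project_villages, candidate_villages))  # {'A': 'X', 'B': 'Z'}
```

Greedy matching depends on the order in which project units are processed; it is only as good as the characteristics chosen, since unobserved differences remain unmatched.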
Some recent developments in impact evaluation in international development
J-PAL is best understood as a network of affiliated researchers … united by their use of the randomized trial methodology…
So, are Randomized Control Trials (RCTs) the Gold Standard, and should they be used in most if not all program impact evaluations? Yes or no?
If so, under what circumstances should they be used?
Why or why not?
If not, under what circumstances would they not be appropriate?
Adapted from Patricia Rogers, RMIT University
Evidence-based policy for simple interventions (or simple aspects): when RCTs may be appropriate
Question needed for evidence-based policy: What works?
What interventions look like: Discrete, standardized intervention
How interventions work: Pretty much the same everywhere
Process needed for evidence uptake: Knowledge transfer
When might rigorous evaluations of higher-level “impact” indicators require much more than a simple RCT?
• Complicated, complex programs where there are multiple interventions by multiple actors
• Projects working in evolving contexts (e.g. countries in transition, conflicts, natural disasters)
• Projects with multiple layered logic models, or unclear cause-effect relationships between outputs and higher-level “vision statements” (as is often the case in the real world of international development projects)
There are other methods for assessing the counterfactual:
• Reliable secondary data that depicts relevant trends in the population
• Longitudinal monitoring data (if it includes the non-reached population)
• Qualitative methods to obtain perspectives of key informants, participants, neighbors, etc.
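As an illustration of the first option, reliable secondary data on the pre-project trend can be extrapolated to estimate what would have happened without the project. A minimal sketch, assuming a straight-line trend fitted by ordinary least squares; that the trend would simply have continued is a strong assumption, which is one reason to triangulate with the other methods listed. Function names and data are hypothetical.

```python
def linear_trend(years: list[float], values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trend_counterfactual(pre_years: list[float], pre_values: list[float], year: float) -> float:
    """Value expected in `year` if the pre-project trend had simply continued."""
    slope, intercept = linear_trend(pre_years, pre_values)
    return intercept + slope * year


# Hypothetical secondary data: % of girls attending school in the district before the project.
pre_years = [2004, 2005, 2006, 2007]
pre_values = [50.0, 52.0, 54.0, 56.0]
expected_2010 = trend_counterfactual(pre_years, pre_values, 2010)
observed_2010 = 68.0  # hypothetical endline value in project areas
print(expected_2010, observed_2010 - expected_2010)  # 62.0 6.0
```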
There are situations in which a statistical counterfactual is not appropriate, even when budget and time are not constraints. A conventional statistical counterfactual (with random selection into treatment and control groups) is often not possible or appropriate:
• When conducting the evaluation of complex interventions
• When the project involves a number of interventions which may be used in different combinations in different locations
• When each project location is affected by a different set of contextual factors
• When it is not possible to use standard implementation procedures for all project locations
• When many outcomes involve complex behavioral changes
• When many outcomes are multidimensional or difficult to measure through standardized quantitative indicators
Some of the alternative approaches for constructing a counterfactual
A: Theory-based approaches
1. Program theory / logic models
2. Realistic evaluation
3. Process tracing
4. Venn diagrams and many other PRA methods
5. Historical methods
6. Forensic detective work
7. Compilation of a list of plausible alternative causes
8. …
(for more details see www.RealWorldEvaluation.org)
Some of the alternative approaches for constructing a counterfactual
B: Quantitatively oriented approaches
1. Pipeline design
2. Natural variations
3. Creative uses of secondary data
4. Creative creation of comparison groups
5. Comparison with other programs
6. Comparing different types of interventions
7. Cohort analysis
8. …
(for more details see www.RealWorldEvaluation.org)
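The first quantitative option, the pipeline design, uses units already selected for the project but not yet reached (because of phased roll-out) as the comparison group. A minimal sketch, assuming the order of roll-out is unrelated to the outcome (it would be biased if, say, the easiest communities were served first); function name, communities and values are hypothetical.

```python
from statistics import mean


def pipeline_estimate(outcomes: dict[str, float], phase: dict[str, int], current_phase: int) -> float:
    """Mean outcome of units already reached minus mean outcome of units still in the pipeline."""
    reached = [outcomes[u] for u in outcomes if phase[u] <= current_phase]
    waiting = [outcomes[u] for u in outcomes if phase[u] > current_phase]
    if not reached or not waiting:
        raise ValueError("need both reached and not-yet-reached units")
    return mean(reached) - mean(waiting)


# Hypothetical communities: A and B reached in phase 1; C and D scheduled for phase 2.
outcomes = {"A": 60.0, "B": 64.0, "C": 55.0, "D": 53.0}
phase = {"A": 1, "B": 1, "C": 2, "D": 2}
print(pipeline_estimate(outcomes, phase, current_phase=1))  # 8.0
```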
Some of the alternative approaches for constructing a counterfactual
C: Qualitatively oriented approaches
1. Concept mapping
2. Creative use of secondary data
3. Many PRA techniques
4. Process tracing
5. Compiling a book of possible causes
6. Comparisons between different projects
7. Comparisons among project locations with different combinations and levels of treatment
(for more details see www.RealWorldEvaluation.org)
*4. Context
Different lenses needed for different situations in the RealWorld
Simple | Complicated | Complex
Following a recipe | Sending a rocket to the moon | Raising a child
Recipes are tested to assure easy replication | Sending one rocket to the moon increases assurance that the next will also be a success | Raising one child provides experience but is no guarantee of success with the next
The best recipes give good results every time | There is a high degree of certainty of outcome | Uncertainty of outcome remains
Sources: Westley et al. (2006) and Stacey (2007), cited in Patton (2008); also presented by Patricia Rogers at the Cairo impact conference in 2009.
What’s a conscientious evaluator to do when facing such a complex world?
Figure: the objectives tree (DESIRED IMPACT; OUTCOMES 1 to 3; OUTPUTS 2.1 to 2.3; Interventions 2.2.1 to 2.2.3; each with its consequences), contrasting what “A Simple RCT” covers with “A more comprehensive design”.
Expanding the results chain for a multi-donor, multi-component program
Inputs: Donor; Government; Other donors
Outputs: Credit for small farmers; Rural roads; Schools; Health services
Intermediate outcomes: Increased production; Access to off-farm employment; Increased school enrolment; Increased use of health services
Impacts: Increased rural H/H income; Improved education performance; Improved health; Increased political participation
Attribution gets very difficult! Consider the plausible contributions each makes.
*5. Evaluation Implementation
Definition of impact evaluation
OECD-DAC (2002: 24) defines impact as “the positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended. These effects can be economic, sociocultural, institutional, environmental, technological or of other types”.
Is it limited to direct attribution? Or does it point to the need for counterfactuals or Randomized Control Trials (RCTs)?
1. Direct cause-effect relationship between one output (or a very limited number of outputs) and an outcome that can be measured by the end of the research project? Pretty clear attribution.
… OR …
2. Changes in higher-level indicators of sustainable improvement in the quality of life of people, e.g. the MDGs (Millennium Development Goals)? More significant. But assessing plausible contribution is more feasible than assessing unique direct attribution.
So what should be included in a “rigorous impact evaluation”?
Rigorous impact evaluation should include (but is not limited to):
1) thorough consultation with and involvement by a variety of stakeholders,
2) articulating a comprehensive logic model that includes relevant external influences,
3) getting agreement on desirable ‘impact level’ goals and indicators,
4) adapting evaluation design as well as data collection and analysis methodologies to respond to the questions being asked,
5) adequately monitoring and documenting the process throughout the life of the program being evaluated,
6) using an appropriate combination of methods to triangulate evidence being collected,
7) being sufficiently flexible to account for evolving contexts,
8) using a variety of ways to determine the counterfactual,
9) estimating the potential sustainability of whatever changes have been observed,
10) communicating the findings to different audiences in useful ways,
11) etc. …
The point is that the list of what’s required for ‘rigorous’ impact evaluation goes way beyond initial randomization into treatment and ‘control’ groups.
To attempt to conduct an impact evaluation of a program using only one pre-determined tool is to suffer from myopia, which is unfortunate. On the other hand, to prescribe to donors and senior managers of major agencies that there is a single preferred design and method for conducting all impact evaluations can have, and has had, unfortunate consequences for all of those who are involved in the design, implementation and evaluation of international development programs.
We must be careful that in using the “Gold Standard” we do not violate the “Golden Rule”:
“Judge not that you not be judged!”
In other words: “Evaluate others as you would have them evaluate you.”
Caution: Too often what is called Impact Evaluation is based on a “we will examine and judge you” paradigm. When we want our own programs evaluated we prefer a more holistic approach.
To use the language of the OECD/DAC, let’s be sure our evaluations are consistent with these criteria:
RELEVANCE: The extent to which the aid activity is suited to the priorities and policies of the target group, recipient and donor.
EFFECTIVENESS: The extent to which an aid activity attains its objectives.
EFFICIENCY: Efficiency measures the outputs – qualitative and quantitative – in relation to the inputs.
IMPACT: The positive and negative changes produced by a development intervention, directly or indirectly, intended or unintended.
SUSTAINABILITY is concerned with measuring whether the benefits of an activity are likely to continue after donor funding has been withdrawn. Projects need to be environmentally as well as financially sustainable.
The bottom line is defined by this question: Are our programs making plausible contributions towards positive impact on the quality of life of our intended beneficiaries? Let’s not forget them!
Thank you!