Impact Evaluation: Balancing Rigor with Reality Donna Smith-Moncrieffe, Public Safety Canada Carleton University Lecture March 18, 2014


DESCRIPTION

This Carleton University lecture presents evaluation research designs that can be used with community-based organizations, especially when a comparison group cannot be identified (e.g., implicit designs and regression discontinuity).

TRANSCRIPT

Page 1

Impact Evaluation: Balancing Rigor with Reality

Donna Smith-Moncrieffe, Public Safety Canada

Carleton University Lecture, March 18, 2014

Page 2

Summary of the Presentation

- Background information: the National Crime Prevention Strategy and its evaluation strategy
- Number of impact and process evaluations per year
- Tools in the toolbox
- Types of project evaluation designs used in the federal government
- Statistical tests and examples: ANOVA, regression, the realist method
- Theory of Change and fidelity
- Types of synthesis methods
- Challenges with balancing rigor and reality


Page 4

Typical # of Evaluations Conducted Each Year

Page 5

Tools in the Toolbox

Page 6

Types of Designs and Challenges

Type of evaluation design and challenges at NCPC, PS:

RCT
- In government programs, the evaluator is not in a position to randomly assign youth to a treatment and a comparison group as the design requires
- Ethical issues
- Contribution agreements are focused on paying for youth in treatment groups, not youth in control groups

Quasi-experiment (delayed comparison group)
- The comparison group is formed with youth on a waiting list, who complete the T1 and T2 pre- and post-tests
- The challenge is that comparisons at the six-month and 12-month post-tests (T3 and T4) are not possible, as the delayed comparison group youth are usually returned to the treatment group

Quasi-experiment (with comparison group)
- The comparison group is formed with youth from a different community who receive minimal services
- All threats to internal validity (e.g., maturation, regression to the mean, history) remain unless covariates are assessed to determine how they affected the outcome (a covariate-adjustment sketch follows this table)
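To illustrate the covariate adjustment just mentioned, here is a minimal ANCOVA-style sketch in Python with statsmodels. It is not taken from the lecture; the data file and column names (post_score, pre_score, group, age, risk_score) are hypothetical placeholders.

# Sketch: adjusting a quasi-experimental group comparison for covariates.
# All file and column names here are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("youth_outcomes.csv")  # hypothetical data file

# ANCOVA-style OLS: the coefficient on 'group' estimates the treatment
# effect after adjusting for the baseline score and measured covariates
# (the internal-validity threats noted above).
model = smf.ols("post_score ~ group + pre_score + age + risk_score", data=df).fit()
print(model.summary())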

Page 7

Types of Designs and Challenges

Type of evaluation design and challenges at NCPC, PS:

Implicit design, strengthened version (using levels of dosage to create a comparison)
- Identifying the appropriate level of dosage is arbitrary
- A validated dosage cut-off is necessary to make the appropriate comparison
- Not all projects measure dosage in a standardized manner

Regression discontinuity
- Requires the use of a standardized risk assessment tool that is not readily available or feasible for all projects
- Requires the identification of an assignment variable (e.g., risk level, dosage) that will relate strongly to the outcome variable (a minimal sketch follows this table)
- NGOs often want to work with all youth regardless of their level of risk, and program managers may still want youth below the cut-off point to receive the program
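As a rough illustration of the regression discontinuity logic above, here is a minimal sharp-RD sketch in Python. The cut-off value, file name, and column names are hypothetical, and a real analysis would also examine bandwidth and functional form.

# Sketch: sharp regression discontinuity on a risk-score assignment variable.
# The cut-off, file name, and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("risk_assessments.csv")  # hypothetical data file
CUTOFF = 20  # hypothetical validated cut-off on the risk assessment tool

df["treated"] = (df["risk_score"] >= CUTOFF).astype(int)
df["centered"] = df["risk_score"] - CUTOFF  # centre the assignment variable

# Separate slopes on each side of the cut-off; the coefficient on 'treated'
# estimates the jump in the outcome at the threshold.
model = smf.ols("outcome ~ treated + centered + treated:centered", data=df).fit()
print(model.params["treated"], model.pvalues["treated"])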

Page 8

Regression Discontinuity Example 1

Page 9

Regression Discontinuity Example 2

Page 10

Impact Evaluation: Using Multivariate Statistics

Table 12.59: T1 and T2 Scores on the Overall Education Attitudes Scale

                 PIT CLIENTS               COMPARISON GROUP
Time Period    N      Mean     SD        N      Mean     SD
T1             191    53.88    7.23      99     49.51    10.55
T2             191    55.22    5.88      99     49.95    9.96

Sig. testing: The results of a two-way repeated measures ANOVA show that the T1-T2 x group interaction effect is not statistically significant (F = 1.133, p = .288); a sketch of this test follows below. Source: T1: B26, T2: A32. Base: youth respondents with completed T1 and T2 interviews.
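A two-way repeated measures ANOVA with one between-subjects factor (group) and one within-subjects factor (time) can be run as a mixed ANOVA. This is a minimal sketch using the pingouin package, not the analysis code behind Table 12.59; the file and column names are hypothetical, and the data must be in long format.

# Sketch: group x time interaction test for a pre/post design with a
# comparison group. One row per youth per time point; names are hypothetical.
import pandas as pd
import pingouin as pg

long_df = pd.read_csv("education_attitudes_long.csv")  # hypothetical file

aov = pg.mixed_anova(
    data=long_df,
    dv="score",       # scale score at each time point
    within="time",    # T1 vs T2
    between="group",  # PIT client vs comparison
    subject="id",     # youth identifier
)
print(aov)  # the 'Interaction' row is the group x time effect reported above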

Page 11

Impact Evaluation: Using Multivariate Statistics

Table 12.61: OLS Regressions Predicting T1-T2 Changes in Education Attitudes

                       Model A: ALL YOUTH      Model B: PIT CLIENTS    Model C: PIT CLIENTS
Predictors             B       SE     P        B       SE     P        B       SE     P
Client (1=PIT Client)  1.532   0.86   .075     --      --     --       --      --     --
Total Dosage           --      --     --       .019    0.01   .135     --      --     --
Educ-Related Dosage    --      --     --       --      --     --       .165    0.07   .027
Gender (1=Male)        -.843   0.84   .317     -1.087  0.89   .222     -1.457  0.91   .110
Age (in years)         -.198   0.15   .184     .085    0.17   .615     .150    0.17   .380
Risk Score             .428    0.12   .001     .387    0.19   .009     .419    0.15   .005
Constant               -.042   2.75   .988     -4.106  3.13   .192     -5.485  3.22   .090
N                      280                     181                     181
Model Fit: F (p)       3.600 (p = .007)        2.757 (p = .029)        3.466 (p = .009)
R Square               .050                    .059                    .073
Adjusted R Square      .036                    .037                    .052
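In the spirit of Model C above, a change-score regression can be sketched as follows. This is an illustration only, not the report's code; the file and column names are hypothetical.

# Sketch: OLS predicting T1-T2 change in education attitudes from
# education-related dosage and covariates. All names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pit_clients.csv")  # hypothetical data file
df["change"] = df["t2_score"] - df["t1_score"]

model = smf.ols("change ~ educ_dosage + gender + age + risk_score", data=df).fit()
print(model.summary())  # reports B, SE, p, F, and R-squared as in Table 12.61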

Page 12

Realist Approach Example

Elements to consider in the Realist Approach:

Site 1 (Cree, Quebec)
- Fidelity rating: Low
- Contextual differences: literacy in English; parental engagement is low; logistical challenges; cultural differences
- Effect sizes for emotional regulation (aggression): statistical assumptions not met to report effect sizes
- Cost-benefit analysis results: N/A, statistical assumptions not met

Site 2 (Edmonton)
- Fidelity rating: High
- Contextual differences: project implemented as per the model
- Effect sizes for emotional regulation (aggression): moderate to high (0.40-0.61)
- Cost-benefit analysis results: 1:4 in total competencies

Site 3 (Toronto)
- Fidelity rating: High
- Contextual differences: project implemented as per the model
- Effect sizes for emotional regulation (aggression): moderate to high (0.54-1.17)
- Cost-benefit analysis results: N/A, statistical assumptions not met
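The effect sizes above are standardized mean differences; as a reference point, a pooled-SD Cohen's d can be computed as in this sketch. The scores generated below are simulated for illustration, not project data.

# Sketch: Cohen's d with a pooled standard deviation for a treatment vs
# comparison contrast on an aggression scale. Data below are simulated.
import numpy as np

def cohens_d(treated, comparison):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treated), len(comparison)
    pooled_var = ((n1 - 1) * treated.var(ddof=1) +
                  (n2 - 1) * comparison.var(ddof=1)) / (n1 + n2 - 2)
    return (treated.mean() - comparison.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)  # simulated scores, not project data
print(cohens_d(rng.normal(55, 6, 191), rng.normal(50, 10, 99)))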

Page 13

Synthesis Methods

Types of Synthesis Methods

Systematic Review
- Key points:
  - Summary of rigorous studies (RCTs)
  - Uses standardized measures (i.e., effect sizes) that can be pooled across studies (see the pooling sketch after this list)
  - Examples include the Cochrane Collaboration, the Campbell Collaboration, and the What Works Clearinghouse
- Challenges:
  - Narrow focus on program effects
  - Does not address contextual and program factors (the how and why questions)
  - Information about the intervention is absent

Multiple Case Study
- Key points:
  - Prospective theory-based approach to synthesis
  - Best suited to answer how and why questions, but can also answer what (program effects) questions
  - Open to the use of mixed methods (experimental, quasi-experimental, and qualitative methodologies)
  - Synthesis occurs at the program model level (LST, MST, etc.); each project = a case
- Challenges:
  - Development of a theoretical framework is a critical first step
  - Requires extensive coordination between projects to generalize at the program model level

Realist Synthesis
- Key points:
  - Retrospective theory-based approach to synthesis
  - Strong focus on what works for whom, when, and under which conditions
- Challenges:
  - Prospective planning is required but challenging
  - May not use standardized measures to synthesize
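To make the "standardized measures" point concrete, here is a minimal inverse-variance (fixed-effect) pooling sketch of the kind a systematic review relies on. The effect sizes and variances are made up for illustration.

# Sketch: fixed-effect (inverse-variance) pooling of per-study effect sizes.
# The d values and variances below are illustrative, not real study results.
import numpy as np

effects = np.array([0.40, 0.61, 0.54])    # hypothetical per-study Cohen's d
variances = np.array([0.02, 0.03, 0.05])  # hypothetical sampling variances

weights = 1.0 / variances                 # weight each study by its precision
pooled = np.sum(weights * effects) / np.sum(weights)
se = np.sqrt(1.0 / np.sum(weights))
print(f"pooled d = {pooled:.2f}, "
      f"95% CI = ({pooled - 1.96 * se:.2f}, {pooled + 1.96 * se:.2f})")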


Page 14

Challenges: Balancing Rigor and Reality

Page 15

Challenges: Balancing Rigor and Reality Cont'd

Page 16

Thank You

Contact Information:

Donna Smith-Moncrieffe, BSc., CrimDip, MSc.
A/Regional Manager, Public Safety Canada
[email protected]