Measuring Impact

Course Overview

1. What is Evaluation?

2. Measuring Impact

3. Why Randomize?

4. How to Randomize?

5. Sampling and Sample Size

6. Threats and Analysis

7. Cost-Effectiveness Analysis

8. Project from Start to Finish

CASE STUDY: Women as Policymakers

What was the main purpose of the 73rd Amendment of India’s constitution?

A. To reserve leadership positions for women (and caste minorities)

B. To formalize local institutions of leadership

C. To give women the right to vote in local elections


Theory of Change from Reservations to Final Outcomes

Women are empowered

Public goods show women’s preferences

Pradhan’s preferences matter

Democracy imperfect

More female Pradhans

Reservations for women

Women have different preferences

Data Used Covers All Aspects of Theory of Change

Purpose of Measurement | Sources of Measurement | Indicators
Reservations Policy and GP Priorities | Administrative Data | Reservation; Budgets; Balance sheets
Preferences of Women and Men | Transcript from village meeting | Who speaks and when (gender); Issues raised
Public Investments | Village Leader Interview | Political experience; Investments undertaken
Public Investments | Village PRA | Village infrastructure + investments; Perception of public good quality; Participation of men and women
 | Household (HH) Survey | Declared HH preference; HH perceptions of quality of public goods and services

Results: Reservations Increase Female Leadership

Of Pradhans: | West Bengal, Reserved | West Bengal, Unreserved | Rajasthan, Reserved | Rajasthan, Unreserved
Total Number | 54 | 107 | 40 | 60
Proportion female | 100% | 6.5% | 100% | 1.7%

Results: Women Raise Different Issues at GP Meetings

Issue raised by: | West Bengal, Women | West Bengal, Men | Rajasthan, Women | Rajasthan, Men
Drinking water | 31% | 17% | 54% | 43%
Road improvement | 31% | 25% | 13% | 23%

Results: Public Investments Show Women’s Preferences

Issue | Investment | West Bengal: Issue (W) | Issue (M) | Reserved Investment | Rajasthan: Issue (W) | Issue (M) | Reserved Investment
Drinking Water | # facilities | 31% | 17% | 9.09 | 54% | 43% | 2.62
Road Improvement | Road condition (0-1) | 31% | 25% | 0.18 | 13% | 23% | -0.08
Irrigation | # facilities | 4% | 20% | -0.38 | 2% | 4% | -0.02
Education | Informal education center | 6% | 12% | -0.16 | 5% | 13% |

Why do You Need Data? Follow Theory of Change

Our theory and hypotheses help us define the set of outcomes

Need to find indicators that map the outcomes well

Characteristics: Who are the people the program works with, and what is their environment?

• Sub-groups, covariates, predictors of compliance

Channels: How does the program work, or fail to work?

Outcomes: What is the purpose of the program? Did it achieve that purpose?

Lecture Overview

Theory of Change: What Do You Want to Measure?

Theory of Measurement: What Makes a Good Measure?

Practice of Measurement: Measuring the Immeasurable

Collecting Data

WHAT MAKES A GOOD MEASURE

Theory of Measurement

The Main Challenge in Measurement: Getting Accuracy and Precision

[Figure: axes labeled “More accurate” and “More precise”]

Terms “Biased” and “Unbiased” Used to Describe Accuracy

“Biased”: On average, we get the wrong answer

“Unbiased”: On average, we get the right answer

[Figure: “Biased” to “Unbiased” along the “More accurate” axis]

Terms “Noisy” and “Precise” Used to Describe Precision

“Noisy”: Random error causes answer to bounce around

“Precise”: Measures of the same thing cluster together

[Figure: “Noisy” to “Precise” along the “More precise” axis]

A Noisy and a Precise Measure Can Both Be Biased

[Figure: a “Noisy” measure and a “Precise” measure on the “More accurate” and “More precise” axes]



Choices in Real Measurement Often Harder

[Figure: a “Noisy” but “Unbiased” measure compared with a “Precise” but “Biased” measure on the “More accurate” and “More precise” axes]
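The trade-off between a noisy but unbiased measure and a precise but biased one can be illustrated with a small simulation. This is only a sketch: the true value, the size of the bias, and the noise levels are made-up numbers, and `simulate_measure` is a hypothetical helper, not part of the course material.

```python
import random
import statistics

def simulate_measure(true_value, bias, noise_sd, n, seed):
    """Draw n readings of true_value with a fixed offset (bias) and random error (noise)."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]

TRUE_VALUE = 100.0  # hypothetical true quantity being measured

# "Noisy" but "unbiased": no systematic offset, large random error (assumed sd = 20).
noisy_unbiased = simulate_measure(TRUE_VALUE, bias=0.0, noise_sd=20.0, n=10_000, seed=1)
# "Precise" but "biased": fixed offset of +10, small random error (assumed sd = 2).
precise_biased = simulate_measure(TRUE_VALUE, bias=10.0, noise_sd=2.0, n=10_000, seed=2)

results = {}
for label, readings in [("noisy/unbiased", noisy_unbiased),
                        ("precise/biased", precise_biased)]:
    mean_error = statistics.fmean(readings) - TRUE_VALUE  # accuracy: average miss
    spread = statistics.stdev(readings)                    # precision: scatter of readings
    results[label] = (mean_error, spread)
    print(f"{label}: mean error = {mean_error:+.2f}, spread = {spread:.2f}")
```

With many readings, the noisy measure averages out close to the truth while individual readings scatter widely; the precise measure clusters tightly around the wrong value, and collecting more readings does not remove the offset.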




Is this introducing noise or bias?

A. Noise / Random Error

B. Bias / Systematic Error

A surveyor doesn’t follow the exact wording of the question:

Surveyor: “You feel insecure in this neighborhood, right?”

Respondent: “Well, sometimes yes and sometimes, well, no…”

Surveyor: “So, it’s more of a yes, right?”

Respondent: “Mmm… well”

Surveyor: “OK, thanks. Next question.” (codes 1 = yes)

Accuracy

In theory:

• How well does the indicator map to the outcome? (e.g. IQ test → intelligence)

In practice: Are you getting unbiased answers?

• Respondent bias

• Recall bias

• Anchoring bias

• Social desirability bias (response bias)

• Framing effect / Neutrality

Precision and Error

In theory: The measure is precise, but not necessarily accurate

In practice:

• Length, fatigue

• “How much did you spend on broccoli yesterday?” (as a measure of annual broccoli spending)

• Ambiguous wording (definitions, relationships, recall period, units of question)

  E.g. definition of terms: ‘household’, ‘income’

• Recall period / units of question

• Type of answer: open/closed

• Choice of options for closed questions

  Likert (e.g. strongly disagree, disagree, neither agree nor disagree, . . .)

  Rankings

  Quantitative scales: numbers or bins

Random Error

• Mistakes in estimation or recall

• Question wording

• Surveyor training/quality

• Data entry

• Length, fatigue of survey

• How do you generalize from certain questions?

Which is worse?

A. Bias (Low Accuracy)

B. Noise (Low Precision)

C. Equally bad

D. Depends

E. Don’t know/can’t say


A biased measure will bias our impact estimates

A. True

B. False

C. Depends

D. Don’t know
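One way to explore this question is to simulate an evaluation in which the outcome is measured with an additive bias. The sketch below assumes a simple difference in means between treatment and control, with made-up numbers throughout; `estimated_impact` is a hypothetical helper. It compares a bias that is the same in both groups with a bias that affects only the treatment group.

```python
import random
import statistics

def estimated_impact(true_effect, bias_control, bias_treatment, n=5_000, seed=0):
    """Difference in mean *measured* outcomes between treatment and control."""
    rng = random.Random(seed)
    # True outcomes are drawn from an assumed normal distribution (mean 50, sd 10).
    control = [rng.gauss(50, 10) + bias_control for _ in range(n)]
    treatment = [rng.gauss(50, 10) + true_effect + bias_treatment for _ in range(n)]
    return statistics.fmean(treatment) - statistics.fmean(control)

TRUE_EFFECT = 5.0  # hypothetical true program effect

# Same measurement bias in both groups: the offset cancels in the difference.
same_bias = estimated_impact(TRUE_EFFECT, bias_control=8.0, bias_treatment=8.0)
# Bias only in the treatment group (e.g. treated respondents over-report).
differential_bias = estimated_impact(TRUE_EFFECT, bias_control=0.0, bias_treatment=8.0)

print(f"same bias in both groups: {same_bias:.2f}")
print(f"bias only in treatment:   {differential_bias:.2f}")
```

Under these assumptions, a bias shared by both groups leaves the estimated impact close to the true effect, while a bias that differs between groups is added directly to the estimate.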


MEASURING THE UNMEASURABLE

Practice of Measurement

What is Hard to Measure?

(1) Things people do not know very well

(2) Things people do not want to talk about

(3) Abstract concepts

(4) Things that are not (always) directly observable

(5) Things that are best directly observed

How much juice did you consume last month?

A. <2 liters

B. 2-5 liters

C. 6-10 liters

D. >11 liters

1. Things people do not know very well

What: Anything people have to estimate, particularly across time; prone to recall error and poor estimation

• Examples: distance to health center, profit, consumption, income, plot size

Strategies:

• Frame the question in a way people are likely to know and remember.

• Consistency checks – How much did you spend in the last week on x? How much did you spend in the last 4 weeks on x?

• Multiple measurements of same indicator – How many minutes does it take to walk to the health center? How many kilometers away is the health center?
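The consistency check on one-week versus four-week spending could be automated along these lines. This is a minimal sketch: the field names, the sample responses, and the `max_ratio` tolerance are all assumptions made for illustration, not a standard rule.

```python
def flag_inconsistent(responses, max_ratio=8.0):
    """Return respondent ids whose 1-week and 4-week spending answers disagree.

    max_ratio is an assumed tolerance for how much larger the 4-week figure
    may plausibly be than the 1-week figure.
    """
    flagged = []
    for r in responses:
        week, four_weeks = r["last_week"], r["last_four_weeks"]
        if four_weeks < week:
            # The last 4 weeks include the last week, so this cannot be right.
            flagged.append(r["id"])
        elif week > 0 and four_weeks / week > max_ratio:
            # Far more than four typical weeks: worth a follow-up question.
            flagged.append(r["id"])
    return flagged

# Hypothetical survey responses.
responses = [
    {"id": "hh-01", "last_week": 50.0, "last_four_weeks": 210.0},
    {"id": "hh-02", "last_week": 80.0, "last_four_weeks": 60.0},
    {"id": "hh-03", "last_week": 10.0, "last_four_weeks": 400.0},
]

flagged = flag_inconsistent(responses)
print(flagged)
```

A flag here is a prompt for the surveyor to probe further, not proof that either answer is wrong; spending can genuinely vary from week to week.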

How many glasses of juice did you consume yesterday?

A. 0

B. 1-3

C. 4-6

D. >6


How frequently do you yell at your spouse?

A. Daily

B. Several times per week

C. Once per week

D. Once per month

E. Never


2. Things people don’t want to talk about

What: Anything socially “risky” or something painful

• Examples: sexual activity, alcohol and drug use, domestic violence, conduct during wartime, mental health

Strategies:

• Don’t start with the hard stuff!

• Consider asking question in third person

• Always ensure comfort and privacy of respondent

Asking Sensitive Questions: List Randomization

Randomly assign respondents to see the list of statements with or without a sensitive option

Respondents report only a count, not which statements are true

List without the sensitive statement:

How many of these statements are true for you?

• This morning I took a shower.

• My nearest bank branch office is within walking distance.

• I have tea every day.

Average number of true statements: 2.1

List with the sensitive statement:

How many of these statements are true for you?

• This morning I took a shower.

• My nearest bank branch office is within walking distance.

• I have tea every day.

• I use my loan for non-business expenses.

Average number of true statements: 2.8

2.8 – 2.1 = 0.7 more TRUE statements on average, so an estimated 70% used their loan for non-business purposes.

List Randomization Shows Big Differences in Some Real Cases


Some questions are sensitive and people are hesitant to answer truthfully

List randomization is a way to get the answer on average without knowing confidential information on any one person
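The list randomization estimate is a difference in group means, which can be sketched as follows. The individual counts below are invented so that the group averages match the 2.1 and 2.8 in the example above; `list_experiment_estimate` is a hypothetical helper name.

```python
import statistics

def list_experiment_estimate(counts_without, counts_with):
    """Estimate the share of people for whom the sensitive statement is true.

    counts_without: counts from respondents shown only the non-sensitive statements.
    counts_with: counts from respondents shown the same list plus the sensitive one.
    Relies on random assignment, so that both groups agree with the
    non-sensitive statements at the same rate on average.
    """
    return statistics.fmean(counts_with) - statistics.fmean(counts_without)

# Hypothetical individual counts, constructed to average 2.1 and 2.8.
counts_without = [2, 2, 3, 2, 1, 3, 2, 2, 2, 2]
counts_with = [3, 3, 2, 3, 4, 3, 2, 3, 3, 2]

estimate = list_experiment_estimate(counts_without, counts_with)
print(f"Estimated share with the sensitive behavior: {estimate:.0%}")
```

No individual count reveals whether the sensitive statement is true for that person; only the comparison of averages across the two randomly assigned groups does.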








“I feel more empowered now than last year”

A. Strongly disagree

B. Disagree

C. Neither agree nor disagree

D. Agree

E. Strongly agree


3. Abstract concepts

What: Potentially the most challenging and interesting type of difficult-to-measure indicators

• Examples: empowerment, bargaining power, social cohesion, risk aversion

Strategies:

• Three key steps when measuring “abstract concepts”

  • Define what you mean by your abstract concept

  • Choose the outcome that you want to serve as the measurement of your concept

  • Design a good question to measure that outcome

• Often a choice between a self-reported measure and a behavioral measure – both can add value!

“I am involved in the decision to send my child to private vs public school”

A. Strongly disagree

B. Disagree

C. Neither agree nor disagree

D. Agree

E. Strongly agree


How “Socially Connected” do You Feel to the Other People in this Room?

[Figure: answer options A to E, each showing “You” and “Everyone else in this room”]

How likely are you to take a taxi with a driver you don’t know after dark?

A. Very unlikely

B. Unlikely

C. Likely

D. Very likely


Perceptions and Attitudes

Ask directly

• “How effective is your leader?” (ineffective, somewhat effective, effective, very…)

Indirect approaches often have better accuracy

• Listen to a Vignette (Male v. Female)

• Revealed preference – voting behavior

• Implicit Association tests

Implicit Association Test

People simplify the world for efficiency

• Use rules of thumb to draw connections

• May not even be aware themselves

For some important outcomes, it may be worth trying to measure these indirectly

• Implicit association is one technique

Implicit Association Test: Match on Left or Right?

[Figure: example test screens with the words “Parliament” and “Home”]

Implicit Association Test


Actually based on response time, not accuracy

• Are respondents faster to select “Parliament” when associated with a man than a woman?
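The response-time comparison can be sketched as a difference in average latencies. This is a deliberately simplified stand-in, not the scoring procedure used in published implicit association tests; the latencies are invented and `mean_latency_gap` is a hypothetical helper.

```python
import statistics

def mean_latency_gap(latencies_ms_a, latencies_ms_b):
    """Mean response time in condition a minus condition b, in milliseconds."""
    return statistics.fmean(latencies_ms_a) - statistics.fmean(latencies_ms_b)

# Hypothetical latencies (ms) for one respondent selecting "Parliament".
paired_with_woman = [812, 790, 845, 901, 780]
paired_with_man = [640, 702, 655, 690, 663]

gap = mean_latency_gap(paired_with_woman, paired_with_man)
print(f"Slower by {gap:.1f} ms when paired with a woman")
```

A positive gap means the respondent was slower when “Parliament” was paired with a woman, which is the kind of pattern the test looks for; a real analysis would also need to handle errors, outliers, and trial order.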








What proportion of race X people are denied jobs due to racial discrimination?

A. 0%

B. 1-20%

C. 21-40%

D. 41-60%

E. >60%


4. Things that aren’t Directly Observable

What: You may want to measure outcomes that you can’t ask directly about or directly observe

• Examples: corruption, fraud, discrimination

Strategies:

• Audit studies, e.g. Rajasthan police experiment to register cases, Delhi Driver’s Licenses, Delhi doctors

• Multiple sources of data, e.g. inputs of funds vs. outputs received by recipients, pollution reports by different parties

• Don’t worry – there have already been lots of clever people before you – so do literature reviews!

5. Things that are Best Directly Observed

What: Behavioral preferences, anything that is more believable when done than said

Strategies:

• Develop detailed protocols

• Ensure data collection of behavioral measures done under the same circumstances for all individuals

• Example: How often do you wash your hands?

• Strategy: collect the data directly while disrupting participants’ lives as little as possible

• E.g., put movement sensors in soap, measure the weight of the soap

The Problem

With the following questions…

Question: After the last time you used the toilet, did you wash your hands? (The problem with this indicator is….)

A. Accuracy

B. Precision

C. Both

D. Neither


Outcome: Gender Bias

Question: How effective are women leaders? (ineffective, somewhat effective, effective, very…)

A. Accuracy

B. Precision

C. Both

D. Neither


Examples

Study: Double-Fortified Salt (Duflo et al. forthcoming)

• Where: Bihar, India

• Intervention: selling low-cost iron-fortified salt to at-risk-for-anemia families.

• Indicators:

  Physical fitness (directly observable; step test)

  Physical development (directly observable; anthro measures)

  Cognitive development (indirectly observable; puzzles, tests)

  Health history (indirectly observable; surveying on immunizations received, etc.)


COLLECTING DATA

When to Collect Data

[Baseline]: Before you start

During the intervention

Endline: After you’re done

[Scale-up, intervention]

Methods of Data Collection: Not Just Surveys

Surveys: household/individual

Administrative data

Logs/diaries

Qualitative, e.g. focus groups, RRA

Games and choice problems

Observation

Health/Education tests and measures

Where can we get Data?

Administrative data

• Collected by a government or similar body as part of operations

The good:

• May already be collected and thus free

• Can be extremely accurate (e.g. electricity bills)

The bad:

• May not exist or not answer the question you want

• May itself change behavior (e.g. taxes)

Other secondary data

• Collected for research or other purposes, not admin

The good:

• May already be collected and thus free

• Can inform the larger context of a project

The bad:

• May not exist or not answer the question you want

• Dubious quality

Primary data

• Collected by researchers for the study

The good:

• Address the exact question of interest

• Cover channels and assumptions

The bad:

• Very costly and time consuming

• Answers may be biased

Primary Data Collection

Surveys

• Questions

• Exams, tests, games, vignettes, etc.

Direct Observation

• Personal or technological (e.g. smoke sensor, vibration sensor)

Diaries/Logs

Common Survey Modules Can be Adapted for a Particular Project

Demographics

Economic

• Income, consumption, expenditure, time use

• Yields, production, etc.

Beliefs

• Expectations or assumptions

• Bargaining power, patience, risk

Anthropometric

Cognitive, Learning

Primary Data Collection Considerations

Quality

• Reliability and validity of the data

Costs / Logistics

• Surveyor recruitment and training

• Field work and transport, interview time

• Electronic vs. paper

• Data entry, reconciliation, cleaning, etc.

Ethics

• Human subjects, data security

Reliability of Data Collection

The process of collecting “good” data requires a lot of effort and thought

Need to make sure that the data collected is precise and accurate, to avoid false conclusions under any research design

The survey process:

Design of questionnaire → Survey printed on paper/electronic → Filled in by enumerator interviewing the respondent → Data entry → Electronic dataset

Where can this go wrong?

Things to Take Away

Theory of change guides measurement

• Want to measure each step

Data collection is all about trade-offs

• Quality and cost

• Bias (accuracy) and noise (precision)

Creative techniques can sometimes help

• Think about what outcomes are most important

Thank you!
