measuring impact
DESCRIPTION
Measuring ImpactTRANSCRIPT
Measuring Impact
Course Overview
1. What is Evaluation?
2. Measuring Impact
3. Why Randomize?
4. How to Randomize?
5. Sampling and Sample Size
6. Threats and Analysis
7. Cost-Effectiveness Analysis
8. Project from Start to Finish
CASE STUDYWomen as Policymakers
What was the main purpose of the 73rd Amendment of India’s constitution?
A. To reserve leadership positions for women (and caste minorities)
B. To formalize local institutions of leadership
C. To give women the right to vote in local elections
To rese
rve le
adership po...
To form
alize
loca
l insti
tu...
To give w
omen the rig
ht ..
0% 0%0%
Theory of Change from Reservations to Final Outcomes
Women are empowered
Public goods show
women’s preferences
Pradhan’s preferences
matterDemocracy imperfect
More female Pradhans
Reservations for women
Women have different
preferences
Data Used Covers All Aspects of Theory of Change
Purpose of Measurement
Sources of Measurement
Indicators
Reservations Policy and GP Priorities
Administrative Data
• Reservation• Budgets• Balance sheets
Preferences of Women and Men
Transcript from village meeting
• Who speaks and when (gender)• Issues raised
Public Investments
Village Leader Interview
• Political experience• Investments undertaken
Public Investments
Village PRA • Village infrastructure + investments
• Perception of public good quality• Participation of men and women
Household (HH) Survey
• Declared HH preference• HH perceptions of quality of
public goods and services
Results: Reservations Increase Female Leadership
West Bengal Rajasthan
Of Pradhans:Reserved Unreserve
dReserve
dUnreserve
d
Total Number 54 107 40 60
Proportion female 100% 6.5% 100% 1.7%
Results: Women Raise Different Issues at GP Meetings
West Bengal Rajasthan
Of Pradhans: Women Men Women Men
Drinking water 31% 17% 54% 43%
Road improvement 31% 25% 13% 23%
Results: Public Investments Show Women’s Preferences
West Bengal Rajasthan
Issue Investment
Issue ReservedInvestmen
t
Issue ReservedInvestme
ntW M W M
Drinking Water
# facilities 31%
17%
9.09 54%
43%
2.62
Road Improvement
Road Condition (0-1)
31%
25%
0.18 13%
23%
-0.08
Irrigation # facilities 4% 20%
-0.38 2% 4% -0.02
Education Informal education center
6% 12%
-0.16 5% 13%
Why do You Need Data? Follow Theory of Change
Our theory and hypothesis helps us define the set of
outcomes
Need to find indicators that map the outcomes well
Characteristics: Who are the people the program works
with, and what is their environment?
• Sub-groups, covariates, predictors of compliance
Channels: How does the program work, or fail to work?
Outcomes: What is the purpose of the program? Did it
achieve that purpose?
Lecture Overview
Theory of Change: What Do You Want to Measure?
Theory of Measurement: What Makes a Good Measure?
Practice of Measurement: Measuring the Immeasurable
Collecting Data
WHAT MAKES A GOOD MEASURETheory of Measurement
The Main Challenge in Measurement: Getting Accuracy and Precision
More accurate M
ore
p
recis
e
Terms “Biased” and “Unbiased” Used to Describe Accuracy
More accurate
“Biased” “Unbiased”On average, we get the wrong answer
On average, we get the right answer
Terms “Noisy” and “Precise” Used to Describe Precision
M
ore
p
recis
e“Noisy”
“Precise”
Random error causes answer to bounce around
Measures of the same thing cluster together
A Noisy and a Precise Measure Can Both Be Biased
More accurate M
ore
p
recis
e“Noisy”
“Precise”
Random error causes answer to bounce around
Measures of the same thing cluster together
Choices in Real Measurement Often Harder
More accurate M
ore
p
recis
e“Noisy” but
“Unbiased”
“Precise” but “Biased”
Random error causes answer to bounce around
Measures of the same thing cluster together
The Main Challenge in Measurement: Getting Accuracy and Precision
More accurate M
ore
p
recis
e
Is this introducing noise or bias?
A. Noise / Random Error
B. Bias
A surveyor doesn´t follow the exact phrase of the question:Surveyor- “you feel unsecure in this neighborhood, right?” Respondent- “well, sometimes yes and sometimes, well, no…” Surveyor- “so, is more of a yes, right?”
- “mmm…well”Respondent - “ok, thanks. Next question.” code 1=yes
Random Error
Systematic E
rror
0%0%
Accuracy
In theory:
• How well does the indicator map to the outcome? (e.g. IQ test intelligence)
In practice: Are you getting unbiased answers?
• Respondent bias
• Recall bias
• Anchoring bias
• Social desirability bias (response bias)
• Framing effect / Neutrality
Precision and Error
In theory: The measure is precise, but not necessarily accurate In practice:
• Length, fatigue
• “How much did you spend on broccoli yesterday?” (as a measure of annual broccoli spending)
• Ambiguous wording (definitions, relationships, recall period, units of question)
Eg. Definition of terms – ‘household’, ‘income’
• Recall period /units of question
• Type of answer -Open/Closed
• Choice of options for closed questions
Likert (i.e. Strongly disagree, disagree, neither agree nor disagree, . . .)
Rankings
Quantitative scales. Numbers or bins.
Random Error
Mistakes in estimation or recall Question wording Surveyor training/quality Data entry Length, fatigue of survey How do you generalize from certain questions?
Which is worse?
A. Bias (Low Accuracy)
B. Noise (Low Precision)
C. Equally bad
D. Depends
E. Don’t know/can’t say
Bias (Lo
w Accura
cy)
Noise (L
ow Precision)
Equally bad
Depends
Don’t kn
ow/can’t
say
0% 0% 0%0%0%
A biased measure will bias our impact estimates
A. True
B. False
C. Depends
D. Don’t know
True
False
Depends
Don’t kn
ow
0% 0%0%0%
MEASURING THE UNMEASURABLE
Practice of Measurement
What is Hard to Measure?
(1) Things people do not know very well
(2) Things people do not want to talk about
(3) Abstract concepts
(4) Things that are not (always) directly observable
(5) Things that are best directly observed
How much juice did you consume last month?
A. <2 liters
B. 2-5 liters
C. 6-10 liters
D. >11 liters
<2 liters
2-5 liters
6-10 liters
>11 liters
0% 0%0%0%
1. Things people do not know very well
What: Anything to estimate, particularly across time. Prone to recall error and poor estimation
• Examples: distance to health center, profit, consumption, income, plot size
Strategies:
• Frame the question in a way people are likely to know and remember.
• Consistency checks – How much did you spend in the last week on x? How much did you spend in the last 4 weeks on x?
• Multiple measurements of same indicator – How many minutes does it take to walk to the health center? How many kilometers away is the health center?
How many glasses of juice did you consume yesterday
A. 0
B. 1-3
C. 4-6
D. >6
0
01-Mar
4-6 >6
0% 0%0%0%
What is Hard to Measure?
(1) Things people do not know very well
(2) Things people do not want to talk about
(3) Abstract concepts
(4) Things that are not (always) directly observable
(5) Things that are best directly observed
How frequently do you yell at your spouse
A. Daily
B. Several times per week
C. Once per week
D. Once per month
E. Never
Daily
Severa
l times p
er week
Once per w
eek
Once per m
onthNeve
r
0% 0% 0%0%0%
2. Things people don’t want to talk about
What: Anything socially “risky” or something painful
• Examples: sexual activity, alcohol and drug use, domestic violence, conduct during wartime, mental health
Strategies:
• Don’t start with the hard stuff!
• Consider asking question in third person
• Always ensure comfort and privacy of respondent
Asking Sensitive Questions: List Randomization
Randomly ask part of the sample the question with / without a sensitive option
Response only a count, not the specific options
How many of these statements are true for you? This morning I took a
shower. My nearest bank branch
office is within walking distance.
I have tea every day. I use my loan for non-
business expenses.
How many of these
statements are true for
you?
This morning I took a
shower.
My nearest bank branch
office is within walking
distance.
I have tea every day.
Asking Sensitive Questions: List Randomization
Average number of true statements:
2.1
Average number of true statements:
2.8
2.8 – 2.1 = 0.7 full TRUE difference (on average)70% used their loan for non-business purposes.
List Randomization Shows Big Differences in Some Real Cases
Asking Sensitive Questions: List Randomization
Some questions are sensitive and people are hesitant to answer truthfully
List randomization is a way to get the answer on average without knowing confidential information on any one person
What is Hard to Measure?
(1) Things people do not know very well
(2) Things people do not want to talk about
(3) Abstract concepts
(4) Things that are not (always) directly observable
(5) Things that are best directly observed
37
“I feel more empowered now than last year”
A. Strongly disagree
B. Disagree
C. Neither agree nor disagree
D. Agree
E. Strongly agree
Strongly
disagre
e
Disagre
e
Neither a
gree nor disa
gree
Agree
Strongly
agree
0% 0% 0%0%0%
3. Abstract concepts
What: Potentially the most challenging and interesting type of difficult-to-measure indicators Examples: empowerment, bargaining power, social cohesion,
risk aversion
Strategies:• Three key steps when measuring “abstract concepts”
• Define what you mean by your abstract concept
• Choose the outcome that you want to serve as the measurement of your concept
• Design a good question to measure that outcome
Often choice between choosing a self-reported measure and a behavioral measure – both can add value!
“I am involved in the decision to send my child to private vs public school”
A. Strongly disagree
B. Disagree
C. Neither agree nor disagree
D. Agree
E. Strongly agree
Strongly
disagre
e
Disagre
e
Neither a
gree nor disa
gree
Agree
Strongly
agree
0% 0% 0%0%0%
How “Socially Connected" do You Feel to the Other People in this Room?
ABCDE A B C D E
20% 20% 20%20%20%
You
Everyone else in
this room
How likely are you to take a taxi with a driver you don’t know after dark?
A. Very unlikely
B. Unlikely
C. Likely
D. Very likely
Very unlikely
Unlik
ely Li
kely
Very lik
ely
0% 0%0%0%
How likely are you to take a taxi with a driver you don’t know after dark?
A. Very unlikely
B. Unlikely
C. Likely
D. Very likely
Very unlikely
Unlik
ely Li
kely
Very lik
ely
0% 0%0%0%
Perceptions and Attitudes
Ask directly
• “How effective is your leader?” (ineffective, somewhat
effective, effective, very…)
Indirect approaches often have better accuracy
• Listen to a Vignette (Male v. Female)
• Revealed preference – voting behavior
• Implicit Association tests
Implicit Association Test
People simplify the world for efficiency
• Use thumb rules to draw connections
• May not even be aware themselves
For some important outcomes, may be worth trying to measure these indirectly
• Implicit association one technique
Implicit Association Test: Match on Left or Right?
Implicit Association Test: Match on Left or Right?
Parliament
Implicit Association Test: Match on Left or Right?
Home
Implicit Association Test: Match on Left or Right?
Parliament
Implicit Association Test
People simplify the world for efficiency
• Use thumb rules to draw connections
• May not even be aware themselves
Actually based on response time, not accuracy
• Are respondents faster to select “Parliament” when associated with a man than a woman?
What is Hard to Measure?
(1) Things people do not know very well
(2) Things people do not want to talk about
(3) Abstract concepts
(4) Things that are not (always) directly observable
(5) Things that are best directly observed
51
What proportion race X people are denied jobs due to racial discrimination?
A. 0%
B. 1-20%
C. 21-40%
D. 41-60%
E. >60%
0%1-20%
21-40%
41-60%>60%
0% 0% 0%0%0%
4. Things that aren’t Directly Observable
What: You may want to measure outcomes that you can’t ask directly about or directly observe
• Examples: corruption, fraud, discrimination
Strategies:
• Audit studies, e.g. Rajasthan police experiment to register cases, Delhi Driver’s Licenses, Delhi doctors
• Multiple sources of data, e.g. inputs of funds vs. outputs received by recipients, pollution reports by different parties
• Don’t worry – there have already been lots of clever people before you – so do literature reviews!
5. Things that are Best Directly Observed
What: Behavioral preferences, anything that is more believable when done than said
Strategies:
• Develop detailed protocols
• Ensure data collection of behavioral measures done under the same circumstances for all individuals
• Example: how often do you wash your hands? Strategy: collect the data directly while disrupting
participants’ lives as little as possible E.g., put movement sensors in soap, measure the
weight of the soap
The Problem
With the following questions…
Question: After the last time you used the toilet, did you wash your hands? (The problem with this indicator is….)
A. Accuracy
B. Precision
C. Both
D. Neither
Accuracy
Precision
Both
Neither
0% 0%0%0%
Outcome: Gender BiasQuestion: How effective are women leaders? (ineffective, somewhat effective, effective, very…)
A. Accuracy
B. Precision
C. Both
D. Neither
Accuracy
Precision
Both
Neither
0% 0%0%0%
Examples
Study: Double-Fortified Salt (Duflo et al. forthcoming)
• Where: Bihar, India
• Intervention: selling low-cost iron-fortified salt to at-risk-for-anemia families.
• Indicators:Physical fitness (directly observable; step test)Physical development (directly observable; anthro
measures)Cognitive development (indirectly observable;
puzzles, tests)Health history (indirectly observable; surveying on
immunizations received, etc.)
58
COLLECTING DATA
When to Collect Data
[ Baseline ] : Before you start
During the intervention
End line : After you’re done
[ Scale-up, intervention ]
Methods of Data Collection: Not Just Surveys
Surveys-
household/individual
Administrative data
Logs/diaries
Qualitative – eg. focus
groups, RRA
Games and choice
problems
Observation
Health/Education tests and
measures
Where can we get Data?
The good . . . . and the bad
Administrative data• Collected by a
government or similar body as part of operations
• May already be collected and thus free• Can be extremely accurate (e.g. electricity bills)
• May not exist or not answer the question you want• May itself change behavior (e.g. taxes)
Other secondary data• Collected for
research or other purposes not admin
• May already be collected and thus free• Can inform the larger context of a project
• May not exist or not answer the question you want• Dubious quality
Primary data• Collected by
researchers for study
• Address the exact question of interest• Cover channels and assumptions
• Very costly and time consuming• May be biased answers
Primary Data Collection
Surveys
• Questions
• Exams, tests, games, vignettes, etc.
Direct Observation
• Personal or technological (e.g. smoke sensor, vibration sensor)
Diaries/Logs
Common Survey Modules Can be Adapted for a Particular Project
Demographics Economic
• Income, consumption, expenditure, time use
• Yields, production, etc. Beliefs
• Expectations or assumptions
• Bargaining power, patience, risk Anthropometric Cognitive, Learning
Primary Data Collection Considerations
Quality
• Reliability and validity of the data
Costs / Logistics
• Surveyor recruitment and training
• Field work and transport, interview time
• Electronic vs. paper
• Data entry, reconciliation, cleaning, etc.
Ethics Human subjects, data security
Reliability of Data Collection
The process of collecting “good” data requires a lot of
efforts and thought
Need to make sure that the data collected is precise and
accurate.
avoid false conclusions, for any research design
The survey process:
Design of questionnaire Survey printed on
paper/electronic filled in by enumerator interviewing the
respondent data entry electronic dataset
Where can this go wrong?
Things to Take-away :
Theory of change guides measurement
• Want to measure each step
Data collection all about trade-offs
• Quality and cost
• Bias (accuracy) and noise (precision)
Creative techniques can sometimes help
• Think about what outcomes are most important
Thank you!