Instrumental Variable Regression
TRANSCRIPT
Wei Yu, PhD
February 27, 2019
I gratefully acknowledge Dr. Christine Chee for her preparation of this course
Estimating Causal Effects
A common aim of health services research is the estimation of a causal effect
- What is the effect of [treatment] on [outcome]?
Ideally, estimate the effect using a randomized controlled trial
- Conducting a randomized controlled trial is often not possible
An alternative is to perform regression analysis using observational data
- Treatment must be exogenous
- If treatment is not exogenous, estimated effects will be biased
Instrumental Variable (IV) Regression
When treatment is not exogenous, another method is necessary
One possibility: instrumental variables (IV) regression
Poll: Familiarity with IV Regression
New to IV regression
Somewhat familiar with IV regression
Advanced knowledge of IV regression
Outline
Provide an introduction to instrumental variables (IV) regression
- Basic linear regression model
- Necessary conditions for a valid instrument
- Why and how instrumental variables regression works
- Examples
- Limitations
Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + e_i$
$Y$: outcome variable of interest
$X$: explanatory variable of interest
$e$: error term
- $e$ contains all other factors besides $X$ that determine the value of $Y$
$\beta_1$: the change in $Y$ associated with a unit change in $X$
In order for $\hat{\beta}_1$ to be an unbiased estimate of the causal effect of $X$ on $Y$, $X$ must be exogenous
Exogeneity
Assumption: $E(e_i \mid X_i) = 0$
- Conditional mean of $e_i$ given $X_i$ is zero
- Additional information in $X_i$ does not help us better predict $e_i$
- $X$ is "exogenous"
- Implies that $X_i$ and $e_i$ cannot be correlated
$X_i$ and $e_i$ are correlated when there is:
- Omitted variable bias
- Sample selection
- Simultaneous causality
If $X_i$ and $e_i$ are correlated then $X$ is endogenous
- $\hat{\beta}_1$ is biased
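To make the bias concrete, here is a minimal simulation sketch. All numbers in it are assumptions chosen for illustration: a true effect of 2.0, standard normal noise, and a hypothetical unobserved factor `w` that enters both the regressor and the error term. It compares the OLS slope when X is exogenous with the OLS slope when X is endogenous.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)

rng = random.Random(0)
n = 20_000
beta1 = 2.0  # true causal effect (assumed for this simulation)

# Exogenous case: X is drawn independently of the error term e.
x_exo = [rng.gauss(0, 1) for _ in range(n)]
e_exo = [rng.gauss(0, 1) for _ in range(n)]
y_exo = [1.0 + beta1 * x + e for x, e in zip(x_exo, e_exo)]

# Endogenous case: an unobserved factor w drives both X and the error term.
w = [rng.gauss(0, 1) for _ in range(n)]
x_endo = [wi + rng.gauss(0, 1) for wi in w]
e_endo = [wi + rng.gauss(0, 1) for wi in w]
y_endo = [1.0 + beta1 * x + e for x, e in zip(x_endo, e_endo)]

b_exo = ols_slope(x_exo, y_exo)
b_endo = ols_slope(x_endo, y_endo)
print(f"OLS slope, exogenous X:  {b_exo:.2f}")
print(f"OLS slope, endogenous X: {b_endo:.2f}")
```

Under these assumptions the first slope should land near the true value of 2.0, while the second should drift toward 2.0 + cov(X, e)/var(X) = 2.5, no matter how large the sample is.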
Intuition
Idea behind instrumental variables regression:
- Variation in $X$ has two components
  - One component is correlated with $e$: causes endogeneity
  - Other component is uncorrelated with $e$: "exogenous" variation
- Use only exogenous variation in $X$ to estimate $\beta_1$
Instrumental Variables
Instrumental variables (instruments) can be used to isolate the exogenous variation in $X$ that is uncorrelated with $e$
Two conditions for a valid instrument
- Instrument relevance
- Instrument exogeneity
Regression Model
$Y_i = \beta_0 + \beta_1 X_i + e_i$
Problem: $X$ is endogenous
- $X$ and $e$ are correlated
$e$ contains all other factors besides $X$ that determine the value of $Y$
Potential instrument: $Z$
Instrument Relevance
Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
- $Z_i$ is correlated with $X_i$
- Variation in $Z$ explains variation in $X$
- $Z$ affects $X$
$Z$ is "relevant"
Instrument Exogeneity
Instrument exogeneity: $\mathrm{corr}(Z_i, e_i) = 0$
- $Z_i$ is uncorrelated with $e_i$
- $Z$ is uncorrelated with all other factors, besides $X$, that determine $Y$
- $Z$ does not affect $Y$, except through $X$
$Z$ is "exogenous"
Valid Instrument
$Y_i = \beta_0 + \beta_1 X_i + e_i$
[Diagram linking Z, X, and Y: Z is correlated with X and uncorrelated with e]
$Z$ only captures the variation in $X$ that is uncorrelated with $e$
Intuition
$\text{outcome}_i = \beta_0 + \beta_1\,\text{treatment}_i + e_i$
Say treatment is assigned through a coin flip:
- Heads: patient gets treatment
- Tails: patient does not get treatment
Is the coin flip a valid instrument for treatment?
- Does it affect whether or not a patient receives treatment? Yes, so it is relevant.
- Does it directly affect the outcome? No, so it is exogenous.
Variation in an instrument mimics a randomization of patients to different likelihoods of receiving treatment
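The coin-flip intuition can be sketched in a short simulation. This is an illustrative setup, not taken from the slides. The coin only shifts the likelihood of treatment, and sicker patients (a hypothetical unobserved severity `h`) are also more likely to be treated. The true effect of -0.5 and all probabilities are assumed values. The sketch compares a naive treated-versus-untreated difference with the Wald estimator, which is the IV estimator when the instrument is binary.

```python
import random

rng = random.Random(1)
n = 50_000
effect = -0.5  # hypothetical true treatment effect (assumption)

z, d, y = [], [], []
for _ in range(n):
    zi = 1 if rng.random() < 0.5 else 0       # coin flip: 1 = heads
    h = rng.gauss(0, 1)                       # unobserved severity of illness
    # Heads raises the chance of treatment; so does being sicker (h > 0).
    p_treat = 0.2 + 0.5 * zi + (0.2 if h > 0 else 0.0)
    di = 1 if rng.random() < p_treat else 0
    yi = 1.0 + effect * di + 0.8 * h + rng.gauss(0, 1)
    z.append(zi)
    d.append(di)
    y.append(yi)

def mean_where(values, flags, flag):
    """Mean of `values` over observations whose flag equals `flag`."""
    picked = [v for v, f in zip(values, flags) if f == flag]
    return sum(picked) / len(picked)

# Naive comparison of treated vs. untreated patients (ignores selection).
naive = mean_where(y, d, 1) - mean_where(y, d, 0)

# Wald estimator: effect of the coin on the outcome, scaled by the
# effect of the coin on the probability of receiving treatment.
reduced_form = mean_where(y, z, 1) - mean_where(y, z, 0)
first_stage = mean_where(d, z, 1) - mean_where(d, z, 0)
wald = reduced_form / first_stage

print(f"naive difference: {naive:.2f}")
print(f"Wald (IV) estimate: {wald:.2f}")
```

In this setup the naive difference should understate the benefit of treatment because treated patients are sicker on average, while the Wald estimate should sit close to the assumed effect.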
Instrumental Variables Model
$Y_i = \beta_0 + \beta_1 X_i + e_i$
Endogeneity: $\mathrm{corr}(X_i, e_i) \neq 0$
Valid instrument, $Z$:
- Relevant: $\mathrm{corr}(Z_i, X_i) \neq 0$
- Exogenous: $\mathrm{corr}(Z_i, e_i) = 0$
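The two conditions are enough to identify $\beta_1$. The short derivation below is a standard textbook argument, added here as a worked step rather than taken from the slides. It takes the covariance of both sides of the model with $Z_i$.

```latex
\begin{align*}
\mathrm{cov}(Z_i, Y_i)
  &= \mathrm{cov}(Z_i,\ \beta_0 + \beta_1 X_i + e_i) \\
  &= \beta_1\,\mathrm{cov}(Z_i, X_i) + \mathrm{cov}(Z_i, e_i) \\
  &= \beta_1\,\mathrm{cov}(Z_i, X_i)
     && \text{(exogeneity: } \mathrm{cov}(Z_i, e_i) = 0\text{)} \\
\Rightarrow\quad
\beta_1 &= \frac{\mathrm{cov}(Z_i, Y_i)}{\mathrm{cov}(Z_i, X_i)}
     && \text{(relevance: } \mathrm{cov}(Z_i, X_i) \neq 0\text{)}
\end{align*}
```

Exogeneity removes the error term from the numerator, and relevance guarantees that the denominator is not zero.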
Two Stage Least Squares (1)
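First stage:
- Regress $X$ on $Z$:
$X_i = \pi_0 + \pi_1 Z_i + \gamma_i$
- Predict $X$: $\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$
[Annotation on the first-stage equation: $\pi_0 + \pi_1 Z_i$ is the component of $X_i$ that is uncorrelated with $e$; $\gamma_i$ is the component that is correlated with $e$]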
Two Stage Least Squares (2)
Second stage:
- Regress $Y$ on $\hat{X}$:
$Y_i = \beta_0^{TSLS} + \beta_1^{TSLS} \hat{X}_i + \text{error}_i$
- Estimate $\hat{\beta}_1^{TSLS}$
  - $\hat{X}$ is uncorrelated with $e$ from the original regression model $Y_i = \beta_0 + \beta_1 X_i + e_i$
  - $\hat{\beta}_1^{TSLS}$ is a consistent estimate of $\beta_1$ (unbiased in large samples)
  - Note: standard errors in the second stage TSLS regression need to be adjusted
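The two stages can be sketched by hand for the case of one endogenous regressor and one instrument. Everything in the data-generating step is an assumption for illustration: a true slope of 2.0, a first-stage coefficient of 0.8, and a hypothetical unobserved confounder `w`. The helper names `cov` and `ols` are also just for this sketch.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def ols(x, y):
    """Return (intercept, slope) from regressing y on x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

rng = random.Random(2)
n = 20_000
beta0, beta1 = 1.0, 2.0  # true coefficients (assumed for this simulation)

z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
w = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [0.8 * zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [beta0 + beta1 * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# First stage: regress X on Z and form the predicted values of X.
pi0_hat, pi1_hat = ols(z, x)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]

# Second stage: regress Y on the predicted values of X.
b0_tsls, b1_tsls = ols(x_hat, y)

# Plain OLS of Y on X, for comparison.
_, b1_ols = ols(x, y)

print(f"OLS slope:  {b1_ols:.2f}")
print(f"TSLS slope: {b1_tsls:.2f}")
```

With a single instrument the second-stage slope is algebraically the same as cov(Z, Y) / cov(Z, X). The sketch reports point estimates only; standard errors taken from a hand-run second stage are not valid, which is why the adjustment noted above is needed.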
General IV Model
$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \cdots + \beta_{k+r} W_{ri} + e_i$
$k$ endogenous regressors: $X_{1i}, \ldots, X_{ki}$
$r$ exogenous regressors or control variables: $W_{1i}, \ldots, W_{ri}$
$m$ instrumental variables: $Z_{1i}, \ldots, Z_{mi}$
There must be at least as many instruments as there are endogenous variables: $m \geq k$
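For the general model, the two stages can be written compactly in matrix form. The notation is introduced here for illustration and is not from the slides: $\mathbf{X}$ is the $n \times (1+k+r)$ matrix holding a constant, the endogenous regressors, and the controls, and $\mathbf{Z}$ is the $n \times (1+m+r)$ matrix holding a constant, the instruments, and the same controls.

```latex
\begin{align*}
P_{\mathbf{Z}} &= \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'
  && \text{(projection onto the columns of } \mathbf{Z}\text{)} \\
\hat{\mathbf{X}} &= P_{\mathbf{Z}}\mathbf{X}
  && \text{(first stage: fitted values)} \\
\hat{\boldsymbol{\beta}}^{TSLS}
  &= (\hat{\mathbf{X}}'\hat{\mathbf{X}})^{-1}\hat{\mathbf{X}}'\mathbf{Y}
   = (\mathbf{X}'P_{\mathbf{Z}}\mathbf{X})^{-1}\mathbf{X}'P_{\mathbf{Z}}\mathbf{Y}
  && \text{(second stage)}
\end{align*}
```

The matrix $\mathbf{X}'P_{\mathbf{Z}}\mathbf{X}$ can only be inverted if $\mathbf{Z}$ has at least as many columns as $\mathbf{X}$, which is the requirement $m \geq k$.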
Intensive Treatment for AMI
Does more intensive treatment of acute myocardial infarction (AMI) in the elderly reduce mortality?
- McClellan, McNeil, Newhouse (1994)
We want to estimate the effect of intensive treatment of AMI (cardiac catheterization, angioplasty, CABG) on mortality
Regression Model
Model:
$\text{mortality}_i = \beta_0 + \beta_1\,\text{treatment}_i + e_i$
Problem:
- Whether or not a patient receives more intensive treatment is correlated with many unobserved factors that may also affect mortality
  - E.g., health status, patient or physician preferences
Endogeneity
Evidence of selection bias
- Estimates that do not account for selection are biased
Endogeneity (2)
Instrument
Idea:
- Patients who live closer to hospitals that have the capacity to perform more intensive treatments are more likely to receive those treatments (relevance)
- The distance a patient lives from a given hospital should be independent of the patient's health status (exogeneity)
Instrument (for intensive treatment): differential distance to catheterization and revascularization hospitals
Instrument (2)
Results
IV estimates of the effect of catheterization on mortality are much smaller than estimates that do not take into account selection
Results (2)
Catheterization within 90 days of AMI reduces mortality by 5 percentage points at 1 to 4 years
Caveats:
- The validity of the results hinges on the validity of the instrument
- IV estimates the local average treatment effect (LATE): this is an estimate of the marginal effect of catheterization (for patients who would not have received the treatment had they lived relatively far from a catheterization or revascularization hospital)
- This estimate is an upper bound of the effect of catheterization
  - If catheterization or revascularization hospitals offer better care other than more intensive procedures (e.g., more beds, specialists, ICU), then mortality should be lower at those hospitals
Distance as an Instrument?
What is the effect of primary care (PC) on health outcomes?
- Endogeneity: people usually see a doctor when they are sick
- Suppose
  - Patients who live closer to PC clinics are probably more likely to see a PC provider
  - Patients who need to see a doctor move to live closer to clinics
- Can we use distance to the nearest PC clinic as an instrument for PC use?
Polls: Distance as an Instrument?
1. Is distance relevant?
A. Yes
B. No
2. Is distance exogenous?
A. Yes
B. No
Distance as an Instrument? (2)
What is the effect of emergency department (ED) services for car accident injuries on mortality?
- Endogeneity: only seriously injured passengers are taken to the ED
- Suppose:
  - All people who need medical care are taken to the ED, regardless of distance
  - Distance to the nearest ED is uncorrelated with accident severity
- Can we use distance to the nearest ED as an instrument for treatment in an ED?
Polls: Distance as an Instrument?
1. Is distance relevant?
A. Yes
B. No
2. Is distance exogenous?
A. Yes
B. No
Other IV Examples
Zulman, Pal Chee, et al. (2015): effect of VA intensive management primary care on VA health care costs; instrument: random assignment to treatment vs. usual care groups
Bhattacharya, et al. (2011): effect of insurance coverage on body weight; instruments: distribution of firm size and Medicaid coverage for each state and year
Doyle (2013): effect of foster care on long- and short-term outcomes; instrument: random assignment to investigators
Weak Instruments
Instruments that explain little variation in X are weak
IV regression with weak instruments provides unreliable estimates
Rule of thumb to check for weak instruments when there is only one endogenous regressor:
- From the first stage regression of TSLS, compute the F-statistic testing the hypothesis that the coefficients on the instruments are all equal to zero
$X_i = \pi_0 + \pi_1 Z_{1i} + \cdots + \pi_m Z_{mi} + \gamma_i$
$H_0: \pi_1 = \cdots = \pi_m = 0$
$H_1: \pi_1 \neq 0 \text{ or } \ldots \text{ or } \pi_m \neq 0$
- F-statistic > 10 indicates instruments are not weak
- Note: this is a rule of thumb; we still need a convincing argument that the instrument is relevant (strong)
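The rule of thumb can be sketched for the simplest case of a single instrument and no control variables, where the first-stage F-statistic equals the squared t-statistic on the instrument and can be computed from the first-stage R-squared. The function name `first_stage_f`, the sample size, and the first-stage coefficients 0.8 and 0.02 are assumptions for illustration. The statistic computed here is the classical one that assumes homoskedastic errors.

```python
import random

def first_stage_f(z, x):
    """F-statistic for H0: pi_1 = 0 in the regression of x on a single z."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    sxx = sum((b - mx) ** 2 for b in x)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    r2 = szx ** 2 / (szz * sxx)          # R-squared of the first stage
    return r2 / (1 - r2) * (n - 2)       # one restriction, n - 2 residual df

rng = random.Random(3)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

# Same instrument and noise; only the strength of the first stage differs.
x_strong = [0.8 * zi + ui for zi, ui in zip(z, noise)]
x_weak = [0.02 * zi + ui for zi, ui in zip(z, noise)]

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument:   {f_weak:.1f}")
```

With several instruments or with control variables in the first stage, the same idea applies, but the F-statistic has to be computed as a joint test on the instrument coefficients only.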
Endogenous Instruments
Instruments that are correlated with the error term (other factors that affect the outcome variable) are endogenous
IV regression with endogenous instruments provides unreliable estimates
- The point of IV regression is to isolate and utilize exogenous variation in $X$ to estimate $\beta_1$
When there are more instruments than there are endogenous regressors, it is possible to test "overidentifying restrictions"
- Overidentifying restrictions test (J-statistic)
Need a convincing argument that the instruments are exogenous
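A small simulation sketch shows what an endogenous instrument does to the estimate. The setup is hypothetical and loosely mirrors the primary care example, where patients in worse health move closer to clinics. The "bad" instrument is built to be correlated with an unobserved factor `w` that also sits in the error term, and all coefficients are assumed values.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def iv_slope(z, x, y):
    """Single-instrument IV estimate of the slope: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

rng = random.Random(4)
n = 20_000
beta1 = 2.0  # true causal effect (assumed for this simulation)

def simulate(z, w):
    """Generate X and Y given an instrument z and an unobserved factor w."""
    x = [0.8 * zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
    y = [1.0 + beta1 * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    return x, y

w = [rng.gauss(0, 1) for _ in range(n)]       # e.g., unobserved health status

# Valid instrument: unrelated to w.
z_good = [rng.gauss(0, 1) for _ in range(n)]
x_good, y_good = simulate(z_good, w)

# Endogenous instrument: partly driven by w, so it is correlated with the error.
z_bad = [rng.gauss(0, 1) + 0.5 * wi for wi in w]
x_bad, y_bad = simulate(z_bad, w)

iv_good = iv_slope(z_good, x_good, y_good)
iv_bad = iv_slope(z_bad, x_bad, y_bad)
print(f"IV estimate, exogenous instrument:  {iv_good:.2f}")
print(f"IV estimate, endogenous instrument: {iv_bad:.2f}")
```

In large samples the IV estimate is off by cov(Z, e) / cov(Z, X), which is about 1/3 under these assumptions. The error does not shrink as the sample grows, and it becomes larger when the instrument is also weak.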
Summary
IV regression is a powerful tool to estimate causal effects
Conditions for a valid instrument:
- Relevance: the instrument must affect treatment
- Exogeneity: the instrument must be uncorrelated with all other factors that may affect outcomes
Good instruments are difficult to find
Using an invalid (weak or endogenous) instrument will give meaningless results
Some tests are available to check instrument validity, but what is absolutely necessary is a good "story" for why an instrument is relevant and exogenous
Resources and References
Stock, James H. and Mark W. Watson. 2011. Introduction to Econometrics (Third Edition). Boston, MA: Addison-Wesley. (Chapter 12: Instrumental Variables Regression)
Bhattacharya, Jay, M. Kate Bundorf, Noemi Pace, and Neeraj Sood. 2011. "Does Health Insurance Make You Fat?" Chap. 2 in Economic Aspects of Obesity, University of Chicago Press.
Doyle, Joseph. 2013. "Causal Effects of Foster Care: An Instrumental-Variables Approach." Children and Youth Services Review 35(7): 1143-1151.
McClellan, Mark, Barbara J. McNeil, and Joseph P. Newhouse. 1994. "Does More Intensive Treatment of Acute Myocardial Infarction in the Elderly Reduce Mortality?" Journal of the American Medical Association 272(11): 859-866.
Zulman, Donna M., Christine Pal Chee, Stephen C. Ezeji-Okoye, Jonathan G. Shaw, James S. Kahn, and Steven M. Asch. 2015. "Evaluating Innovative Care Models for High Utilizing Patients." VA HSR&D Pilot Project.