Lecture 2: Causal Inference Using Observational Data

Sheetal Sekhri, University of Virginia

BREAD IGC Summer School, India 2012

July 21, 2012

Using Observational Data

• Many policies and programs are evaluated after their implementation

• Policy design can sometimes provide plausibly exogenous variation

• Observational data can be combined with institutional details for evaluation

• Applications may also lend themselves to natural experiments

• Observational data can also be used in such contexts

Benefits of Using Observational Data

• Shorter time horizon relative to field experiments

• Not as expensive

• Externally valid inference (depending on the data and design)

• Less fraught with behavioral concerns

Limitations of Using Observational Data

• Selection concerns

• Data quality

• Data limitations - not all desired data for an application may exist

• Data restrictions

Working with Observational Data- Methods

• Difference-in-Difference (DID)

• Regression Discontinuity Design (RDD)

Difference-in-Difference

• One of the most widely used methods in empirical analysis

• Emulates an experiment with treatment and comparison groups

• Uses panel data and is a two-way fixed effects model

DID- Basic Idea

• With panel data on the treated group, can compare pre and post intervention or policy change

• But any discerned effect can arise due to secular changes

• Panel data on comparison group can provide the counterfactual

• What would happen to treated group over time in absence of treatment

DID- Implementation

• Isolate the design using tabular or graphic representation

• Formalize using regression analysis

Tabular Representation

             Before   After   Difference
Treatment    YT1      YT2     ∆YT = YT2 − YT1
Control      YC1      YC2     ∆YC = YC2 − YC1
Difference                    ∆YT − ∆YC
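The table above can be computed directly from the four cell means. The following is a minimal sketch; the function name and the numbers are hypothetical and chosen only for illustration.

```python
def did_from_means(yt1, yt2, yc1, yc2):
    """Return (change in treatment, change in control, DID) from four cell means."""
    delta_t = yt2 - yt1   # ∆YT = YT2 − YT1
    delta_c = yc2 - yc1   # ∆YC = YC2 − YC1
    return delta_t, delta_c, delta_t - delta_c


# Hypothetical cell means, for illustration only.
delta_t, delta_c, did = did_from_means(yt1=10.0, yt2=16.0, yc1=8.0, yc2=11.0)
print(delta_t, delta_c, did)
```

With these made-up means the treatment group changes by 6, the control group by 3, and the difference-in-difference is 3.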

DID- Identifying Assumption

• Control group shows the time path of the treatment group without the intervention

• Time trends in the absence of treatment should be the same

• Levels can be different

• If time trends differ, the effect is over- or understated

• Identifying assumption- No differential pre-trends
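A short derivation may make the role of this assumption explicit. The symbols are assumptions introduced here, not notation from the slides: \tau_T and \tau_C denote the changes the treatment and control groups would have experienced without the intervention, and \delta denotes the treatment effect.

```latex
\begin{align*}
\Delta Y_T &= \tau_T + \delta, \qquad \Delta Y_C = \tau_C, \\
\Delta Y_T - \Delta Y_C &= \delta + (\tau_T - \tau_C).
\end{align*}
```

When \tau_T = \tau_C the difference-in-difference equals \delta; otherwise it is over- or understated by \tau_T - \tau_C.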

Difference in Difference

[Figure: Outcome for the control group and the treatment group in the pre-treatment and post-treatment periods. Successive panels mark (1) the treatment effect from comparing the treatment and control groups in the post period, (2) the treatment effect from comparing just the treatment group in the pre and post periods, and (3) the difference-in-difference: the treatment effect from comparing the treatment group in the pre and post periods after eliminating the pre-existing difference between the treatment and control groups.]

[Figure: Difference in Difference: Parallel Trend Assumption.]

[Figure: Difference in Difference: Parallel Trend Assumption Violated.]

DID- Regression Analysis

• Suppose our policy change affects villages such that a set of villages are treated

• We have data over time for all villages

• The panel has only 2 time periods

• Post is an indicator that switches to 1 after the intervention

• T is an indicator that takes value 1 for the villages to be treated

• Outcome variable Y varies by village and time

• Y_it = α0 + α1 Post + α2 T + α3 Post ∗ T + ε_it

             Before     After                Difference
Treatment    α0 + α2    α0 + α1 + α2 + α3    α1 + α3
Control      α0         α0 + α1              α1
Difference                                   α3
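To see that the regression reproduces the table, the two-period model can be fitted by ordinary least squares on a small made-up data set. This is a sketch using only the standard library; the helper names and the village outcomes are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical village-level observations: (T, Post, Y).
data = [
    (0, 0, 7.0), (0, 0, 9.0),    # control, before
    (0, 1, 10.0), (0, 1, 12.0),  # control, after
    (1, 0, 9.0), (1, 0, 11.0),   # treatment, before
    (1, 1, 15.0), (1, 1, 17.0),  # treatment, after
]
# Regressors in the order of alpha0..alpha3: constant, Post, T, Post*T.
X = [[1.0, post, t, post * t] for t, post, _ in data]
Y = [y for _, _, y in data]
alpha = ols(X, Y)
print([round(a, 6) for a in alpha])
```

Because the model is saturated, the coefficients match the cell means: α0 is the control mean before, α1 the control change, α2 the pre-period gap between groups, and α3 the difference-in-difference. The sketch gives point estimates only and does not compute standard errors.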

DID- Robustness and Extension

• With many years of data before the intervention, it is possible to check for pre-trends to make the estimation more credible

• Common support required: check for significant overlap in the distributions of T and C

• Placebo test: no effect should be discerned if treatment is falsely assigned to a randomly chosen year prior to the actual date

• Balance across treatment and control: a selection model to show that the determinants of treatment are not time varying
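The placebo idea can be sketched with group means by year. The years, means, and intervention date below are hypothetical, and the sketch compares point estimates only; a real application would also need standard errors.

```python
def did(means, group_t, group_c, before, after):
    """Difference-in-difference of group means between two periods."""
    change_t = means[group_t][after] - means[group_t][before]
    change_c = means[group_c][after] - means[group_c][before]
    return change_t - change_c


# Hypothetical group means by year; the intervention is assumed to occur in 2010.
means = {
    "T": {2007: 10.0, 2008: 11.0, 2009: 12.0, 2010: 16.0},
    "C": {2007: 8.0, 2008: 9.0, 2009: 10.0, 2010: 11.0},
}

# Placebo: pretend the treatment happened in each pre-intervention year.
placebos = {year: did(means, "T", "C", year - 1, year) for year in (2008, 2009)}
actual = did(means, "T", "C", 2009, 2010)
print(placebos, actual)
```

In this made-up example the placebo estimates are zero because the two groups move in parallel before 2010, while the estimate at the actual date is not.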

DID- Extension

• Y_it = β0 + T_t + V_i + β2 T_t ∗ V_i + ε_it

• T_t: full set of year fixed effects

• V_i: full set of village fixed effects

• Allows for covariance between T_t and T_t ∗ V_i, and between V_i and T_t ∗ V_i

• Systematic differences between villages allowed

• Allows for the intervention to occur in years with different levels of the outcome variable
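One way to estimate a specification of this kind in a balanced panel is to remove village and year means from both the outcome and the treatment indicator and then regress one on the other. The sketch below assumes the interaction term is a single indicator D_it equal to 1 for treated villages in post-intervention years; that reading, the function name, and the data are assumptions made for illustration.

```python
def twfe_estimate(y, d):
    """Coefficient on d after removing unit and period fixed effects.

    y and d are lists of rows: one row per village, one column per year.
    Valid as written only for a balanced panel.
    """
    def double_demean(x):
        n, t = len(x), len(x[0])
        row_mean = [sum(row) / t for row in x]
        col_mean = [sum(x[i][j] for i in range(n)) / n for j in range(t)]
        grand = sum(row_mean) / n
        return [[x[i][j] - row_mean[i] - col_mean[j] + grand for j in range(t)]
                for i in range(n)]

    y_dm, d_dm = double_demean(y), double_demean(d)
    num = sum(a * b for ry, rd in zip(y_dm, d_dm) for a, b in zip(ry, rd))
    den = sum(b * b for rd in d_dm for b in rd)
    return num / den


# Hypothetical panel: 4 villages, 4 years, no noise.
village_effect = [5.0, 1.0, 3.0, 8.0]
year_effect = [0.0, 0.5, 1.5, 1.0]
first_treated_year = [1, 2, None, None]   # None = never treated
true_effect = 2.0

d = [[1.0 if s is not None and t >= s else 0.0 for t in range(4)]
     for s in first_treated_year]
y = [[village_effect[i] + year_effect[t] + true_effect * d[i][t] for t in range(4)]
     for i in range(4)]

beta = twfe_estimate(y, d)
print(round(beta, 6))
```

The village and year effects differ widely across units and periods, yet the estimate recovers the assumed effect because both sets of fixed effects are removed before the comparison.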

Regression Discontinuity Design- RDD

• Resource allocation based on a cutoff: scores, date of birth, rationing cutoffs

• Can use an RD design in such settings

• Powerful way of addressing selection

• Observable characteristics in T and C can be different

• Common support not needed

RDD- Basic Idea

• The control and treated observations are very similar around the cutoff

• Scoring barely above the cutoff is a matter of chance

• Unobservable characteristics like ability are very similar, but one group gets treatment and the other does not

• Selection process completely known and can be modeled

• Regression function between assignment and outcome variable determined

[Figure: Regression Discontinuity: No Effect. Horizontal axis: assignment variable (10 to 100). Labels: control group, treatment group.]

[Figure: Regression Discontinuity: Significant Effect. Labels: regression, counterfactual, treatment effect.]

RDD- Implementation

• Probability of treatment should be discontinuous at the cutoff: the treated sample lies on one side

• Those offered treatment should take it up, and the control group should not be able to get treated

• Sharp and fuzzy designs require different approaches

• The probability of treatment changes from 0 to 1 at the cutoff in a sharp design

• If the probability does not change very sharply or overrides are high, use assignment as an IV for treatment
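For the fuzzy case, using assignment as an instrument amounts to dividing the jump in the outcome at the cutoff by the jump in the probability of treatment. The sketch below uses simple means within a window around the cutoff, which ignores any slope in the assignment variable inside the window; the function name, window, and records are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def wald_rd(records, cutoff, bandwidth):
    """Fuzzy RD estimate from (score, treated, outcome) records near the cutoff.

    Returns (jump in outcome, jump in treatment probability, their ratio).
    """
    below = [r for r in records if cutoff - bandwidth <= r[0] < cutoff]
    above = [r for r in records if cutoff <= r[0] <= cutoff + bandwidth]
    jump_y = mean([r[2] for r in above]) - mean([r[2] for r in below])
    jump_d = mean([r[1] for r in above]) - mean([r[1] for r in below])
    return jump_y, jump_d, jump_y / jump_d


# Hypothetical records: (score, treated, outcome). One unit below the cutoff
# is treated through an override and one unit above it is not treated.
records = [
    (48, 0, 10.0), (49, 0, 10.0), (49, 1, 14.0), (48, 0, 10.0),
    (50, 1, 14.0), (51, 1, 14.0), (51, 1, 14.0), (52, 0, 10.0),
]
jump_y, jump_d, effect = wald_rd(records, cutoff=50, bandwidth=2)
print(jump_y, jump_d, effect)
```

In a sharp design the jump in treatment probability is 1, so the ratio reduces to the jump in the outcome.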

[Figure: Regression Discontinuity: Sharp Design.]

[Figure: Regression Discontinuity: Fuzzy Design.]

RDD- Implementation

• Parametric, semi-parametric, or non-parametric methods can be used for estimation

• Mis-specified functional form can be a problem

• Discontinuity in the regression function at the cutoff is the treatment effect

• Functional forms can generate spurious or biased effects

• An example is a non-linear relationship estimated with a linear regression function

[Figure: Threats to RD: Nonlinear Functional Form. Labels: treatment group, control group, treatment effect.]

RDD- Functional Form

• Visual inspection of the data: normalize the AV (assignment variable) by subtracting the cutoff from each observation's score

• Over-fit the model, allowing for interaction terms as well

• This will reduce power and needs a lot of data around the cutoff

• Sensitivity analysis to different functional forms

• Non-parametric approaches: local linear regressions

• Sensitivity to bandwidth and kernel choice

• In semi-parametric approaches, a smooth function is estimated with splines and covariates can be controlled for
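A local linear regression for a sharp design can be sketched as two weighted least-squares lines, one on each side of the cutoff, with the estimate taken as the gap between their intercepts at the cutoff. The triangular kernel, the function names, and the simulated data are assumptions made for illustration; the sketch needs at least two distinct scores on each side within the bandwidth.

```python
def weighted_intercept(xs, ys, ws):
    """Intercept of the weighted least-squares line y = a + b x."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    return ybar - (sxy / sxx) * xbar


def local_linear_rd(scores, outcomes, cutoff, bandwidth):
    """Sharp RD estimate: gap between the two fitted lines at the cutoff."""
    sides = {"below": ([], [], []), "above": ([], [], [])}
    for s, y in zip(scores, outcomes):
        x = s - cutoff                    # normalized assignment variable
        w = 1.0 - abs(x) / bandwidth      # triangular kernel weight
        if w <= 0:
            continue                      # outside the bandwidth
        side = "above" if x >= 0 else "below"
        for values, v in zip(sides[side], (x, y, w)):
            values.append(v)
    return weighted_intercept(*sides["above"]) - weighted_intercept(*sides["below"])


# Hypothetical data: a mildly curved relationship with a jump of 5 at score 50.
scores = list(range(30, 71))
outcomes = [2.0 + 0.1 * (s - 50) + 0.002 * (s - 50) ** 2 + (5.0 if s >= 50 else 0.0)
            for s in scores]

# Sensitivity of the estimate to the bandwidth choice.
estimates = {h: local_linear_rd(scores, outcomes, cutoff=50, bandwidth=h)
             for h in (5, 10, 20)}
print({h: round(e, 3) for h, e in estimates.items()})
```

Reporting the estimate across several bandwidths, as in the last lines, is one simple form of the sensitivity analysis mentioned above.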

RDD- Threats to the Validity

• The cutoffs should be unknown to the population

• Cutoffs should not be manipulated

• To test for manipulation, can perform McCrary's test

• Other potential outcomes should be continuous to avoid alternative confounding interpretations

• Test for continuity of several available control variables
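A crude check for bunching at the cutoff compares the number of observations just below and just above it. The sketch below is a simplified stand-in and not McCrary's test, which estimates the density on each side with local linear smoothing; the function names, window width, and scores are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n fair coin flips."""
    probs = [comb(n, i) * 0.5 ** n for i in range(n + 1)]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= probs[k] * (1 + 1e-12)))


def bunching_check(scores, cutoff, width):
    """Counts just below and just above the cutoff, and a binomial p-value."""
    below = sum(1 for s in scores if cutoff - width <= s < cutoff)
    above = sum(1 for s in scores if cutoff <= s < cutoff + width)
    return below, above, binomial_two_sided_p(above, below + above)


# Hypothetical scores: a smooth distribution, and one with units piled up
# just above a cutoff of 50.
smooth = [48] * 10 + [49] * 10 + [50] * 10 + [51] * 10
bunched = [48] * 10 + [49] * 5 + [50] * 25 + [51] * 10

smooth_result = bunching_check(smooth, cutoff=50, width=2)
bunched_result = bunching_check(bunched, cutoff=50, width=2)
print(smooth_result, bunched_result)
```

A small p-value suggests that the counts on the two sides differ by more than chance would explain, which is the pattern manipulation of the assignment variable would produce.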

Threats to RD: Manipulation of the Assignment Variable

McCrary Test (2008)

• Statistical test for a discontinuity in the density of the assignment variable at the cutoff point

[Figure: Assignment variable satisfying the McCrary test. Horizontal axis: assignment variable (46 to 54).]

[Figure: Assignment variable violating the McCrary test. Horizontal axis: assignment variable (46 to 54).]

RDD- LATE Estimator

• The limitation of RDD: the effect is isolated at the cutoff

• Cutoff may not be policy relevant or results may not be externally valid

• RD frontier can arise if cutoff varies by years or sites

• Can pool different cutoffs to get a more general estimate for the range over which the cutoff varies

• More generalizable but masks heterogeneity
