ESTIMATING THE DOSE-RESPONSE FUNCTION THROUGH THE GLM APPROACH

Barbara Guardabascio, Marco Ventura

Italian National Institute of Statistics

7th June 2013, Potsdam

Outline of the talk

Motivations;

literature references;

our contribution to the topic;

the econometrics of the dose-response;

how to implement the dose-response;

our programs;

applications.

Motivations

Main question:

how effective are public policy programs with continuous treatment exposure?

Fundamental problem:

treated individuals are self-selected: treatment is not randomly assigned.

(possible) solution:

estimating a dose-response function

[Figure: two panels plotted against the treatment level (0 to 10), each with lower and upper confidence bounds at the 95% level. Dose Response Function: E[year6(t)]. Treatment Effect Function: E[year6(t+1)] - E[year6(t)]. Dose response function = linear prediction.]

What is a dose-response function?

It is the relationship between the level of treatment and an outcome variable, e.g. birth weight, employment, bank debt, etc.

How can we estimate a dose-response function?

It can be estimated by using the Generalized Propensity Score (GPS)

Literature references

1. Propensity Score for binary treatments:

Rosenbaum and Rubin (1983, 1984)

2. for categorical treatment variables:

Imbens (2000), Lechner (2001)

3. Generalized Propensity Score for continuous treatments:

Hirano and Imbens (2004), Imai and Van Dyk (2004)

Our contribution

Ad hoc programs have been provided to STATA users (Bia and Mattei, 2008), but …

… these programs (gpscore.ado and doseresponse.ado) only allow for a Normal distribution of the treatment variable.

We provide new programs (gpscore2.ado and doseresponse2.ado) that also accommodate distributions other than the Normal.

The econometrics of the dose-response

$\{Y_i(t)\}_{t \in \mathcal{T}}$: set of potential outcomes for $t \in \mathcal{T}$,

where $\mathcal{T}$ is the set of potential treatments over $[t_0, t_1]$.


Suppose we have:

N individuals, i = 1, …, N;

Xi: vector of pre-treatment covariates;

Ti: level of treatment delivered;

Yi(Ti): outcome corresponding to the treatment Ti.


We want the average dose-response function

$\mu(t) = E[Y_i(t)]$

Hirano and Imbens define the GPS as the conditional density of the actual treatment given the covariates:

$r(t, x) = f_{T \mid X}(t \mid x), \qquad R = r(T, X)$


Balancing property:

$X \perp 1\{T = t\} \mid r(t, x)$

Within strata with the same r(t,x), the probability that T = t does not depend on X.


If weak unconfoundedness holds we have

$Y(t) \perp T \mid X \quad \forall t \in \mathcal{T}$

This means that the GPS can be used to eliminate any bias associated with differences in the covariates and …

The dose-response function can be computed as:

$\beta(t, r) = E[Y(t) \mid r(t, X) = r] = E[Y \mid T = t, R = r]$

$\mu(t) = E[\beta(t, r(t, X))]$
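For readers who want the intermediate steps, the two-stage computation of the dose-response function can be sketched as below. This is a short derivation in the spirit of Hirano and Imbens (2004), not a reproduction of the slide; it assumes weak unconfoundedness, writes β(t, r) for the conditional expectation of the outcome and μ(t) for the dose-response function, and uses R = r(T, X).

```latex
\begin{align*}
\beta(t, r) &= E[\,Y(t) \mid r(t, X) = r\,] \\
            &= E[\,Y(t) \mid T = t,\ r(T, X) = r\,]
               && \text{(unconfoundedness given the GPS)} \\
            &= E[\,Y \mid T = t,\ R = r\,]
               && \text{(since } Y = Y(T) \text{ and } R = r(T, X)\text{)} \\[4pt]
\mu(t)      &= E[\,Y(t)\,]
             = E\big[\, E[\,Y(t) \mid r(t, X)\,] \,\big]
             = E\big[\, \beta(t, r(t, X)) \,\big]
               && \text{(iterated expectations)}
\end{align*}
```

The first block is estimable from observed data (Y, T, R); the second averages it over the GPS evaluated at the treatment level of interest t, not at the treatment actually received.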

How to implement the GPS

The dose-response can be implemented in 3 steps:

FIRST STEP:

1. Regress Ti on Xi and take the conditional distribution of the treatment given the covariates, Ti | Xi

$f(T_i) \mid X_i \sim D(\beta' X_i, \sigma^2)$

where:

f(.) is a suitable transformation of T (link);

D is a distribution of the exponential family;

β are the parameters to be estimated;

σ is the conditional SE of T|X.


GPS:

$\hat{R}_i = D(T_i; \hat{\beta}' X_i, \hat{\sigma}^2)$

1a. Test the balancing property
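As a rough illustration of the first step, the sketch below fits a Poisson GLM with log link to a count treatment and evaluates the GPS as the fitted Poisson probability of the treatment actually received. It is a minimal standard-library sketch, not the code of gpscore2.ado: the function names (`fit_poisson_glm`, `poisson_gps`), the single covariate and the toy data are all assumptions made for the example.

```python
import math

def fit_poisson_glm(x, t, max_iter=50, tol=1e-10):
    """Fit log E[T | x] = b0 + b1 * x by Newton-Raphson (Fisher scoring)."""
    b0, b1 = math.log(sum(t) / len(t)), 0.0   # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector of the Poisson log-likelihood
        g0 = sum(ti - mi for ti, mi in zip(t, mu))
        g1 = sum((ti - mi) * xi for ti, mi, xi in zip(t, mu, x))
        # Information matrix (2 x 2), inverted by hand
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def poisson_gps(t_value, x_value, b0, b1):
    """GPS r(t, x): Poisson probability of treatment level t given covariate x."""
    mu = math.exp(b0 + b1 * x_value)
    return math.exp(t_value * math.log(mu) - mu - math.lgamma(t_value + 1))

# Toy data (hypothetical): one covariate X_i and a count treatment T_i
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
t = [1, 2, 2, 4, 5, 9]

b0, b1 = fit_poisson_glm(x, t)
gps = [poisson_gps(ti, xi, b0, b1) for ti, xi in zip(t, x)]   # R_i = r(T_i, X_i)
```

The same pattern would apply to the other families: fit the conditional mean through the chosen link, then evaluate the assumed density (or probability) at the observed treatment.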


SECOND STEP:

Model the conditional expectation E[Yi | Ti, Ri] as a function of Ti and Ri:

$\beta(t, r) = E[Y_i \mid T_i, R_i] = \alpha_0 + \alpha_1 T_i + \alpha_2 T_i^2 + \alpha_3 R_i + \alpha_4 R_i^2 + \alpha_5 T_i R_i$
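The second step amounts to an ordinary regression of the outcome on a quadratic polynomial in the treatment and the GPS. The sketch below estimates the six coefficients by OLS through the normal equations, using only the standard library. The helper names (`solve`, `design_row`, `fit_outcome_model`) and the toy data are assumptions for illustration, and the outcomes are generated exactly from known coefficients so that the fit can be checked.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def design_row(t, r):
    """Regressors of the second step: 1, T, T^2, R, R^2, T*R."""
    return [1.0, t, t * t, r, r * r, t * r]

def fit_outcome_model(y, t, r):
    """OLS estimate of the six polynomial coefficients via the normal equations."""
    rows = [design_row(ti, ri) for ti, ri in zip(t, r)]
    k = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)

# Toy data (hypothetical): a 3 x 3 grid of treatment levels and GPS values,
# with outcomes generated exactly from known coefficients
true_alpha = [1.0, 2.0, -0.5, 3.0, 4.0, -1.0]
t = [ti for ti in (0.0, 1.0, 2.0) for _ in range(3)]
r = [ri for _ in range(3) for ri in (0.1, 0.3, 0.5)]
y = [sum(a * z for a, z in zip(true_alpha, design_row(ti, ri))) for ti, ri in zip(t, r)]

alpha_hat = fit_outcome_model(y, t, r)   # recovers true_alpha up to rounding
```

In practice the regressors would be the observed treatments and the estimated GPS from the first step; the estimated coefficients have no causal interpretation on their own and only serve as input to the third step.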


THIRD STEP:

Estimate the dose-response function by averaging the estimated conditional expectation over the GPS at each level of the treatment we are interested in:

$\hat{\mu}(t) = \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}(t, \hat{r}(t, X_i))$
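The averaging in the third step can be sketched as follows. The example assumes a first-step Poisson fit with log link and a quadratic second-step model; the coefficient values, the covariate list and the function names (`gps`, `beta_hat`, `dose_response`, `treatment_effect`) are hypothetical and chosen only to make the code runnable.

```python
import math

def gps(t, x, b0=0.1, b1=0.4):
    """Hypothetical first-step fit: Poisson GPS r(t, x) with log link."""
    mu = math.exp(b0 + b1 * x)
    return math.exp(t * math.log(mu) - mu - math.lgamma(t + 1))

def beta_hat(t, r, alpha):
    """Second-step prediction a0 + a1 t + a2 t^2 + a3 r + a4 r^2 + a5 t r."""
    a0, a1, a2, a3, a4, a5 = alpha
    return a0 + a1 * t + a2 * t * t + a3 * r + a4 * r * r + a5 * t * r

def dose_response(t, xs, alpha):
    """Average of beta_hat(t, r_hat(t, X_i)) over all units i."""
    return sum(beta_hat(t, gps(t, xi), alpha) for xi in xs) / len(xs)

def treatment_effect(t, xs, alpha, delta=1):
    """Difference between the dose-response at t + delta and at t."""
    return dose_response(t + delta, xs, alpha) - dose_response(t, xs, alpha)

# Hypothetical covariates and second-step coefficients
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
alpha = [10.0, 2.0, -0.1, 5.0, -3.0, 0.5]

curve = {t: dose_response(t, xs, alpha) for t in range(0, 10)}
effects = {t: treatment_effect(t, xs, alpha) for t in range(0, 9)}
```

Note that the GPS is re-evaluated at the treatment level of interest t for every unit, rather than at the treatment each unit actually received; the treatment effect function is then simply the difference of the curve at two treatment levels.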


Where is the novelty?

In the FIRST STEP:

instead of ML estimation we use a GLM, i.e. a distribution from the exponential family combined with a link function.


Our programs

Link \ Distr.   Normal   Inv. Normal   Binomial   Poisson   Neg. Binomial   Gamma
Identity          X          X            X          X            X           X
Log               X          X            X          X            X           X
Logit                                     X
Probit                                    X
Cloglog                                   X
Power             X          X            X          X            X           X
Opower                                    X
Nbin                                                              X
Loglog                                    X
Logc                                      X

We have written two programs:

doseresponse2.ado:

estimates the dose-response function and graphs the result.

It carries out steps 1, 2 and 3 of the previous slides by running two other programs.

gpscore2.ado:

evaluates the gpscore under 6 different distributional assumptions

step 1 of the previous slides

doseresponse_model.ado:

Carries out step 2 of the previous slides

doseresponse2 varlist , outcome(varname) t(varname) family(string) link(string) gpscore(newvarname) predict(newvarname) sigma(newvarname) cutpoints(varname) nq_gps(#) index(string) dose_response(newvarlist)

Options

t_transf(transformation) normal_test(test) normal_level(#) test_varlist(varlist) test(type) flag(#) cmd(regression_cmd) reg_type_t(string) reg_type_gps(string) interaction(#) t_points(vector) npoints(#) delta(#) bootstrap(string) filename(filename) boot_reps(#) analysis(string) analysis_level(#) graph(filename) flag_b(#) opt_nb(string) opt_b(varname) detail

gpscore2 varlist , t(varname) family(string) link(string) gpscore(newvarname) predict(newvarname) sigma(newvarname) cutpoints(varname) index(string) nq_gps(#)

Options

t_transf(transformation) normal_test(test) normal_level(#) test_varlist(varlist) test(type) flag_b(#) opt_nb(string) opt_b(varname) detail

Application

Data set by Imbens, Rubin and Sacerdote (2001): the winners of a lottery in Massachusetts.

Ti (treatment): amount of the prize;

Yi (outcome): earnings 6 years after winning;

Xi (covariates): age, gender, education, number of tickets bought, working status, earnings before winning (up to 6 years).

Application: flogit

Fractional data: flogit model.

Treatment: prize/max(prize)

Outcome: earnings 6 years after winning

family(binomial) link(logit)

[Figure: dose-response function E[year6(t)] and treatment effect function E[year6(t+.1)] - E[year6(t)], plotted against the treatment level (0 to .8) with confidence bounds.]

Application: count data

Count data: Poisson model.

Treatment: years of college + high school


family(poisson) link(log)

[Figure: dose-response function E[year6(t)] and treatment effect function E[year6(t+1)] - E[year6(t)] for the Poisson model, with confidence bounds.]

Application: gamma distribution

Gamma distribution:

Treatment: age


family(gamma) link(log)

[Figure: dose-response function E[year6(t)] and treatment effect function E[year6(t+1)] - E[year6(t)], plotted against the treatment level (0 to 80) with confidence bounds.]
