
Econometrics II

Seppo Pynnonen

Department of Mathematics and Statistics, University of Vaasa, Finland

Spring 2018


Part III

Limited Dependent Variable Models

As of Jan 30, 2017


1 Background

2 Binary Dependent Variable

Linear, Logit, and Probit Regressions

The Linear Probability Model

The Logit and Probit Model

3 Tobit Model

Interpreting Tobit Estimates

Predicting with Tobit Regression

Checking Specification of Tobit Models



Limited dependent variables are variables whose range of values is substantially restricted.

A binary variable, which takes only two values (0/1), is an example. Another example is a variable that takes a small number of integer values.

Other kinds of limited variables are those whose values are truncated for some reason, for example the number of passenger tickets sold for a flight or a sports event.

Note, however, that not all truncated cases need special treatment. An example is wage, which must be positive.

Typical truncated-value variables are those that have a large concentration of observations at the limiting value.









Linear, Logit, and Probit Regressions







Up until now, in the regression

y = x′β + u, (1)

where x′β = β0 + β1x1 + · · · + βkxk, y has had a quantitative meaning (e.g., wage).

What if y indicates a qualitative event (e.g., a firm has gone bankrupt), such that y = 1 indicates the occurrence of the event ("success") and y = 0 its non-occurrence ("failure"), and we want to explain it by some explanatory variables?




Consider the meaning of the regression

y = x′β + u

when y is a binary variable. Because E[u|x] = 0,

E[y|x] = x′β. (2)

Because y is a random variable that can take only the values 0 or 1, we can define probabilities for y as P(y = 1|x) and P(y = 0|x) = 1 − P(y = 1|x), such that

E[y|x] = 0 · P(y = 0|x) + 1 · P(y = 1|x) = P(y = 1|x).




Thus, E[y|x] = P(y = 1|x) is the success probability, and the regression in equation (2) models

P(y = 1|x) = β0 + β1x1 + · · · + βkxk, (3)

the probability of success. This is called the linear probability model (LPM).

The slope coefficients indicate the marginal effect of the corresponding x-variable on the success probability, i.e., the change in the probability as xj changes with the other variables held fixed, or

∆P(y = 1|x) = βj∆xj. (4)




In the OLS-estimated model

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k \qquad (5)$$

$\hat{y}$ is the estimated or predicted probability of success.

To specify the binary variable correctly, it may be useful to name the variable according to the "success" category (e.g., in a bankruptcy study, bankrupt = 1 for bankrupt firms and bankrupt = 0 for non-bankrupt firms; thus "success" is just a generic term).




Example 1 (Married women participation in labor force (year 1975))

Linear probability model (See R-snippet for the R-commands):

lm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, data = wkng)

Residuals:
     Min       1Q   Median       3Q      Max
-0.93432 -0.37526  0.08833  0.34404  0.99417

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.5855192  0.1541780   3.798 0.000158 ***
nwifeinc    -0.0034052  0.0014485  -2.351 0.018991 *
educ         0.0379953  0.0073760   5.151 3.32e-07 ***
exper        0.0394924  0.0056727   6.962 7.38e-12 ***
I(exper^2)  -0.0005963  0.0001848  -3.227 0.001306 **
age         -0.0160908  0.0024847  -6.476 1.71e-10 ***
kidslt6     -0.2618105  0.0335058  -7.814 1.89e-14 ***
kidsge6      0.0130122  0.0131960   0.986 0.324415
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4271 on 745 degrees of freedom
Multiple R-squared:  0.2642,  Adjusted R-squared:  0.2573
F-statistic: 38.22 on 7 and 745 DF,  p-value: < 2.2e-16




All variables except kidsge6 are statistically significant, with signs as might be expected.

The coefficients indicate the marginal effects of the variables on the probability that inlf = 1. Thus, e.g., an additional year of educ increases the probability by about 0.038 (other variables held fixed).

[Figure, two panels: "Marginal effect of experience on married women labor force participation" (x-axis: Experience (years), y-axis: Probability) and "Marginal effect of education on married women labor force participation" (x-axis: Education (years), y-axis: Probability).]




Some issues associated with the LPM:

The left-hand side, a probability, is restricted to [0, 1], while the right-hand side can take any value in (−∞, ∞), which may result in probability predictions less than zero or larger than one.

Heteroskedasticity of u: denoting p(x) = P(y = 1|x) = x′β,

var[u|x] = (1 − p(x))p(x), (6)

which is not constant but depends on x, and hence violates Assumption 2.
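Both issues can be seen in a small simulation. The sketch below uses made-up data, not the course data set wkng; the variable names (x, y, lpm) and all numeric values in the data-generating step are illustrative assumptions.

```r
# Simulated illustration only: the data-generating values are assumptions.
set.seed(1)
n <- 500
x <- rnorm(n, mean = 0, sd = 2)
p_true <- pnorm(0.3 + 0.8 * x)           # true success probability
y <- rbinom(n, size = 1, prob = p_true)  # binary outcome (0/1)

lpm <- lm(y ~ x)                         # linear probability model by OLS
p_hat <- fitted(lpm)                     # fitted "probabilities"

# Issue 1: fitted values can fall outside the unit interval
n_outside <- sum(p_hat < 0 | p_hat > 1)

# Issue 2: the implied error variance p(x)(1 - p(x)) changes with x, eq. (6)
implied_var <- p_hat * (1 - p_hat)

print(range(p_hat))
print(n_outside)
```

Because the implied variance depends on x, the usual OLS standard errors are not strictly valid for the LPM; heteroskedasticity-robust standard errors are a common remedy.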











The first of the above problems can be solved technically by mapping the linear function on the right-hand side of equation (3) into the range (0, 1) with a non-linear function G. In generalized linear model terminology, the inverse of such a function, G⁻¹, is called the link function.

That is, instead we write equation (3) as

P(y = 1|x) = G (x′β). (7)

Although any function G : R → [0, 1] applies in principle, the so-called logit and probit transformations are the most popular in practice (the former is based on the logistic distribution and the latter on the normal distribution).

Economists often favor the probit transformation, in which G is the distribution function of the standard normal distribution, i.e.,

$$G(z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}v^2}\, dv. \qquad (8)$$




In the logit transformation

$$G(z) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}} = \int_{-\infty}^{z} \frac{e^{-v}}{(1 + e^{-v})^2}\, dv. \qquad (9)$$

Both are S-shaped.

[Figure, two panels: "Probit transformation" and "Logit transformation", each plotting G(z) against z.]
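The two transformations are available in base R as pnorm() and plogis(). The following sketch evaluates them on a small grid and checks numerically that they agree with the closed form and the integral representations in (8) and (9); the grid values are arbitrary choices.

```r
# Base R (stats package) provides both distribution functions.
z <- c(-3, -1, 0, 1, 2, 3)

probit_G <- pnorm(z)    # Phi(z), equation (8)
logit_G  <- plogis(z)   # e^z / (1 + e^z), equation (9)

# Closed form 1 / (1 + e^(-z)) from equation (9)
logit_closed <- 1 / (1 + exp(-z))

# Integral representations: integrate the density from -Inf up to z.
# dlogis(v) equals e^(-v) / (1 + e^(-v))^2 but is numerically stable.
probit_int <- sapply(z, function(zz) integrate(dnorm,  lower = -Inf, upper = zz)$value)
logit_int  <- sapply(z, function(zz) integrate(dlogis, lower = -Inf, upper = zz)$value)

print(round(cbind(z, probit_G, logit_G), 4))
```

Both functions equal 0.5 at z = 0 and increase monotonically in z; the logistic distribution has heavier tails than the standard normal.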




The price, however, is that the interpretation of the marginal effects is no longer as straightforward as with the LPM.

However, a negative sign indicates a decreasing effect on the probability and a positive sign an increasing effect.

More precisely, using equation (7), the marginal change with respect to xj (keeping the others unchanged) is

$$\Delta P(y = 1|x) \approx g(x'\beta)\,\beta_j\,\Delta x_j, \qquad (10)$$

where g is the derivative of G, i.e.,

$$g(x'\beta) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(x'\beta)^2}{2}\right) \text{ for probit}, \qquad g(x'\beta) = \frac{\exp(-x'\beta)}{\left(1 + \exp(-x'\beta)\right)^2} \text{ for logit}.$$




Typically the marginal effects are evaluated for unit changes in xj (i.e., ∆xj = 1) at the sample means of the x-variables with the estimated β-coefficients [partial effect at the average (PEA)].

Another commonly used approach is to scale $\hat{\beta}_j$ by the sample average of g over the observations [average partial effect (APE)],

$$\frac{1}{n} \sum_{i=1}^{n} g(x_i'\hat{\beta}). \qquad (11)$$
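The two ways of evaluating the scale factor g can be compared in a short sketch. It fits a probit model to simulated data (the variables x1, x2 and the coefficient values are assumptions for illustration, not the wkng data) and computes the scale factor once at the sample means and once as the sample average in (11).

```r
# Simulated data: all numeric values below are illustrative assumptions.
set.seed(2)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = pnorm(-0.2 + 0.7 * x1 - 0.5 * x2))

fit <- glm(y ~ x1 + x2, family = binomial(link = "probit"))
b   <- coef(fit)            # estimated beta (intercept first)
X   <- model.matrix(fit)    # n x 3 design matrix including the constant
xb  <- drop(X %*% b)        # estimated index x_i' beta for each observation

# For probit, g is the standard normal density dnorm()
pea_scale <- dnorm(sum(colMeans(X) * b))  # g evaluated at the sample means
ape_scale <- mean(dnorm(xb))              # sample average of g, equation (11)

pea <- pea_scale * b[-1]    # marginal effects of x1, x2 at the average
ape <- ape_scale * b[-1]    # average marginal effects of x1, x2

print(round(rbind(pea, ape), 4))
```

For a logit fit the same steps apply with dlogis() in place of dnorm().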




There are various pseudo R-squared measures for binary response models.

One is the McFadden measure.

Another is the squared correlation between the $\hat{y}_i$s (predicted probabilities) and the observed $y_i$s (which have 0/1 values).

Using R, the former can be computed as

1 − (residual deviance)/(null deviance),

where the residual deviance is −2 times the log-likelihood of the fitted model, and the null deviance is −2 times the log-likelihood of the model that includes only the intercept.
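As a quick arithmetic check, the measure can be reproduced from the deviances reported for the probit and logit fits in Example 2. The object names below are arbitrary.

```r
# Deviances as reported in the probit and logit outputs of Example 2
null_dev  <- c(probit = 1029.7, logit = 1029.75)
resid_dev <- c(probit = 802.6,  logit = 803.53)

# McFadden-type pseudo R-squared: 1 - residual deviance / null deviance
pseudo_r2 <- 1 - resid_dev / null_dev
print(round(pseudo_r2, 3))

# For a fitted glm object `fit`, the same quantity would be
# 1 - fit$deviance / fit$null.deviance
```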




Example 2 (Married women’s labor force . . . )

Probit: (family = binomial(link = "probit") in glm)

Call:
glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = binomial(link = "probit"), data = wkng)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2156  -0.9151   0.4315   0.8653   2.4553

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.2700736  0.5080782   0.532  0.59503
nwifeinc    -0.0120236  0.0049392  -2.434  0.01492 *
educ         0.1309040  0.0253987   5.154 2.55e-07 ***
exper        0.1233472  0.0187587   6.575 4.85e-11 ***
I(exper^2)  -0.0018871  0.0005999  -3.145  0.00166 **
age         -0.0528524  0.0084624  -6.246 4.22e-10 ***
kidslt6     -0.8683247  0.1183773  -7.335 2.21e-13 ***
kidsge6      0.0360056  0.0440303   0.818  0.41350
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 1029.7 on 752 degrees of freedom
Residual deviance: 802.6 on 745 degrees of freedom
AIC: 818.6

Pseudo R-square: 1 - 802.6 / 1029.7 = 0.221




Logit: (family = binomial(link = "logit") in glm)

glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = binomial(link = "logit"), data = wkng)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1770  -0.9063   0.4473   0.8561   2.4032

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.425452   0.860365   0.495  0.62095
nwifeinc    -0.021345   0.008421  -2.535  0.01126 *
educ         0.221170   0.043439   5.091 3.55e-07 ***
exper        0.205870   0.032057   6.422 1.34e-10 ***
I(exper^2)  -0.003154   0.001016  -3.104  0.00191 **
age         -0.088024   0.014573  -6.040 1.54e-09 ***
kidslt6     -1.443354   0.203583  -7.090 1.34e-12 ***
kidsge6      0.060112   0.074789   0.804  0.42154
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 1029.75 on 752 degrees of freedom
Residual deviance: 803.53 on 745 degrees of freedom
AIC: 819.53

Pseudo R-squared: 1 - 803.53 / 1029.75 = 0.220

Qualitatively the results are similar to those of the LPM. (R exercise: create graphs of the marginal effects similar to those of the linear case.)



3 Tobit Model






A limited dependent variable is called a corner solution response variable if it is zero (say) for a nontrivial fraction of the population but is roughly continuously distributed over positive values.

An example is the amount of alcohol an individual consumes in a given month.

Nothing in principle prevents using a linear model for such a y.

The problem is that the fitted values may be negative.



In cases where it is important to have a model that implies nonnegative predicted values for y, the Tobit model is convenient.

The Tobit model (typically) expresses the observed response, y, in terms of an underlying latent variable, y∗,

y∗ = x′β + u (12)

with

y = max(0, y∗) (13)

and u|x ∼ N(0, σ²).



Accordingly, y∗|x ∼ N(x′β, σ²), and y = y∗ for y∗ ≥ 0 but y = 0 for y∗ < 0.

Given a sample of observations on y and x, the parameters can be estimated by the method of maximum likelihood.

The log-likelihood function for observation i is

$$\ell_i(\beta, \sigma) = 1(y_i = 0) \times \log\!\left(1 - \Phi(x_i'\beta/\sigma)\right) + 1(y_i > 0) \times \log\!\left(\frac{1}{\sigma}\,\phi\!\left((y_i - x_i'\beta)/\sigma\right)\right), \qquad (14)$$

where 1(A) is an indicator function with value 1 if the condition A is true and zero otherwise, Φ(·) is the distribution function and φ(·) the density function of the N(0, 1) distribution.

The maximization of the log-likelihood, $\ell(\beta, \sigma) = \sum_i \ell_i(\beta, \sigma)$, to obtain the ML estimates of β and σ is done by numerical methods.
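To make the numerical maximization concrete, here is a minimal sketch that codes the log-likelihood (14) by hand and maximizes it with optim() from base R. The data are simulated, and the function name negll, the true parameter values, and the choice to parametrize σ as exp(log σ) are assumptions made for illustration; this is not the routine used in the examples of these notes.

```r
# Simulated corner-solution data (all numeric values are assumptions)
set.seed(3)
n <- 2000
x <- rnorm(n)
ystar <- 1 + 2 * x + rnorm(n, sd = 1.5)   # latent variable, eq. (12)
y <- pmax(0, ystar)                        # observed response, eq. (13)

# Negative log-likelihood from eq. (14); sigma = exp(last element of theta)
negll <- function(theta, y, X) {
  k     <- ncol(X)
  beta  <- theta[seq_len(k)]
  sigma <- exp(theta[k + 1])               # keeps sigma positive
  xb    <- drop(X %*% beta)
  ll <- ifelse(y == 0,
               pnorm(-xb / sigma, log.p = TRUE),             # log(1 - Phi(xb/sigma))
               dnorm(y, mean = xb, sd = sigma, log = TRUE))  # log((1/sigma) phi(.))
  -sum(ll)
}

X     <- cbind(1, x)                            # constant and regressor
start <- c(coef(lm(y ~ x)), log(sd(y)))         # OLS values as starting point
opt   <- optim(start, negll, y = y, X = X, method = "BFGS")

beta_hat  <- unname(opt$par[1:2])
sigma_hat <- unname(exp(opt$par[3]))
print(round(c(beta_hat, sigma = sigma_hat), 3))
```

Optimizing over log σ rather than σ avoids a positivity constraint in the search.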



Example 3 (Married women annual working hours)

[Figure: histogram "Married women working hours"; x-axis: Hours, y-axis: Frequency.]



OLS results

lm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, data = wkng)

Residuals:
    Min      1Q  Median      3Q     Max
-1511.3  -537.8  -146.9   538.1  3555.6

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1330.4824   270.7846   4.913 1.10e-06 ***
nwifeinc      -3.4466     2.5440  -1.355   0.1759
educ          28.7611    12.9546   2.220   0.0267 *
exper         65.6725     9.9630   6.592 8.23e-11 ***
I(exper^2)    -0.7005     0.3246  -2.158   0.0312 *
age          -30.5116     4.3639  -6.992 6.04e-12 ***
kidslt6     -442.0899    58.8466  -7.513 1.66e-13 ***
kidsge6      -32.7792    23.1762  -1.414   0.1577
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 750.2 on 745 degrees of freedom
Multiple R-squared:  0.2656,  Adjusted R-squared:  0.2587
F-statistic: 38.5 on 7 and 745 DF,  p-value: < 2.2e-16



Tobit regression

vglm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) +
    age + kidslt6 + kidsge6, family = tobit(Lower = 0), data = wkng)

Pearson residuals:
            Min      1Q  Median     3Q    Max
mu       -8.429 -0.8331 -0.1352 0.8136  3.494
loge(sd) -0.994 -0.5814 -0.2366 0.2150 11.893

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept):1  965.28507  443.93450   2.174 0.029676 *
(Intercept):2    7.02289    0.03589 195.682  < 2e-16 ***
nwifeinc        -8.81433    4.48480  -1.965 0.049371 *
educ            80.64715   21.56529   3.740 0.000184 ***
exper          131.56501   17.01343   7.733 1.05e-14 ***
I(exper^2)      -1.86417    0.52992  -3.518 0.000435 ***
age            -54.40524    7.34462  -7.408 1.29e-13 ***
kidslt6       -894.02622  111.46120  -8.021 1.05e-15 ***
kidsge6        -16.21577   38.48134  -0.421 0.673468
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of linear predictors: 2
Names of linear predictors: mu, loge(sd)
Log-likelihood: -3819.095 on 1497 degrees of freedom
Number of iterations: 6



(Intercept):2 is the estimate of log σ (the second linear predictor, loge(sd)), so that the estimated residual standard deviation is σ̂ = exp(7.02289) ≈ 1122.

OLS generally results in biased estimates because of the censored y-values. Tobit regression accounts for the biasing effect.

However, we should make some adjustments to the Tobit coefficients before interpreting the magnitudes, as discussed below.



Interpreting Tobit Estimates







Because y∗ ∼ N(x′β, σ²) and y = y∗ for y∗ > 0 and y = 0 for y∗ ≤ 0, we have P(y > 0|x) = 1 − Φ(−x′β/σ) = Φ(x′β/σ), such that E[y|x] = P(y > 0|x) E[y|y > 0, x] in (17) becomes

$$E[y|x] = \Phi(x'\beta/\sigma)\, E[y|y > 0, x]. \qquad (18)$$

To obtain E[y|y > 0, x] we can use the general result for z ∼ N(0, 1): for any c,

$$E[z|z > c] = \frac{\phi(c)}{1 - \Phi(c)},$$

from which we obtain, by noting that y = x′β + u and E[y|y > 0, x] = x′β + E[u|u > −x′β],

$$E[y|y > 0, x] = x'\beta + \sigma\lambda(x'\beta/\sigma), \qquad (19)$$

where λ(c) = φ(c)/Φ(c) is the inverse Mills ratio [note: φ(−c) = φ(c) and 1 − Φ(−c) = Φ(c)].
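The step from the general truncated-normal result to (19) can be written out as follows. This is a sketch of the standard argument; writing λ(c) = φ(c)/Φ(c) for the ratio of the density to the distribution function is a notational assumption made here.

```latex
\begin{align*}
E[u \mid u > -x'\beta]
  &= \sigma\, E\!\left[z \,\middle|\, z > -x'\beta/\sigma\right],
     \qquad z = u/\sigma \sim N(0,1), \\
  &= \sigma\, \frac{\phi(-x'\beta/\sigma)}{1 - \Phi(-x'\beta/\sigma)}
   = \sigma\, \frac{\phi(x'\beta/\sigma)}{\Phi(x'\beta/\sigma)}
   = \sigma\, \lambda(x'\beta/\sigma), \\
E[y \mid y > 0, x]
  &= x'\beta + E[u \mid u > -x'\beta]
   = x'\beta + \sigma\, \lambda(x'\beta/\sigma).
\end{align*}
```

The second line uses the symmetry of the standard normal distribution, φ(−c) = φ(c) and 1 − Φ(−c) = Φ(c).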




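Thus the marginal contribution of xj to the (conditional) expectation is

$$\frac{\partial}{\partial x_j} E[y|y > 0, x] = \beta_j + \beta_j \lambda'(x'\beta/\sigma), \qquad (20)$$

where λ′(·) is the derivative of λ(c) = φ(c)/Φ(c).

Because for the standard normal distribution φ′(z) = dφ(z)/dz = −zφ(z) and Φ′(z) = dΦ(z)/dz = φ(z), we finally get

$$\frac{\partial}{\partial x_j} E[y|y > 0, x] = \beta_j \left\{ 1 - \lambda(x'\beta/\sigma) \left[ x'\beta/\sigma + \lambda(x'\beta/\sigma) \right] \right\}. \qquad (21)$$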
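The intermediate step between (20) and (21) is the derivative of the ratio λ(c) = φ(c)/Φ(c) (notation assumed here for the ratio of the standard normal density to its distribution function). A sketch using the quotient rule and the chain rule:

```latex
\begin{align*}
\lambda'(c)
  &= \frac{\phi'(c)\,\Phi(c) - \phi(c)^2}{\Phi(c)^2}
   = \frac{-c\,\phi(c)\,\Phi(c) - \phi(c)^2}{\Phi(c)^2}
   = -\lambda(c)\left[c + \lambda(c)\right], \\
\frac{\partial}{\partial x_j}\, \sigma\,\lambda(x'\beta/\sigma)
  &= \sigma\, \lambda'(x'\beta/\sigma)\, \frac{\beta_j}{\sigma}
   = -\beta_j\, \lambda(x'\beta/\sigma)
     \left[x'\beta/\sigma + \lambda(x'\beta/\sigma)\right].
\end{align*}
```

Adding βj, the derivative of x′β with respect to xj, gives the expression in (21).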




Equation (21) shows that βj does not exactly reflect the marginal effect of xj on E[y|y > 0, x].

It is adjusted by the factor

$$1 - \lambda(x'\beta/\sigma)\left[x'\beta/\sigma + \lambda(x'\beta/\sigma)\right], \qquad \lambda(c) = \phi(c)/\Phi(c).$$

The marginal effect of xj on E[y|x]:

Combining equations (18) and (19), we have

$$E[y|x] = \Phi(x'\beta/\sigma)\,x'\beta + \sigma\phi(x'\beta/\sigma), \qquad (22)$$

where we have used the result Φ(z)λ(z) = φ(z).




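From equation (22) we can compute the marginal effect of xj by utilizing φ′(z) = −zφ(z), so that

$$\frac{\partial}{\partial x_j} E[y|x] = \beta_j \Phi(x'\beta/\sigma) + \beta_j \phi(x'\beta/\sigma)\, x'\beta/\sigma - \beta_j \phi(x'\beta/\sigma)\, x'\beta/\sigma = \beta_j \Phi(x'\beta/\sigma). \qquad (23)$$

Again β is adjusted to some extent (causing a difference from OLS).

After estimating β and σ, the adjustment factor Φ(x′β/σ) is often evaluated as the sample average $n^{-1} \sum_i \Phi(x_i'\hat{\beta}/\hat{\sigma})$.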
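The result that the marginal effect on E[y|x] equals βjΦ(x′β/σ) can be checked numerically with a central finite difference. In this sketch the coefficient and σ are taken from the Tobit output of Example 3 (educ and exp((Intercept):2)), while the index value xb = 500 and the function name tobit_mean are hypothetical choices for illustration.

```r
# Unconditional mean of the Tobit model as a function of the index x'beta
tobit_mean <- function(xb, sigma) {
  pnorm(xb / sigma) * xb + sigma * dnorm(xb / sigma)
}

beta_j <- 80.64715        # Tobit coefficient of educ (Example 3)
sigma  <- exp(7.02289)    # sigma implied by (Intercept):2 = log(sigma)
xb     <- 500             # hypothetical value of the index x'beta

# A change of h in x_j moves the index by beta_j * h
h  <- 1e-4
fd <- (tobit_mean(xb + beta_j * h, sigma) -
       tobit_mean(xb - beta_j * h, sigma)) / (2 * h)

analytic <- beta_j * pnorm(xb / sigma)   # beta_j * Phi(x'beta / sigma)

print(c(finite_difference = fd, analytic = analytic))
```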



Predicting with Tobit Regression







Predictions of E[y|x] in equation (22) can be obtained by replacing the parameters by their estimates,

$$\hat{y} = \Phi(x'\hat{\beta}/\hat{\sigma})\, x'\hat{\beta} + \hat{\sigma}\,\phi(x'\hat{\beta}/\hat{\sigma}), \qquad (24)$$

where Φ is the standard normal cumulative distribution function and φ the standard normal density function (the derivative of Φ).
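A minimal sketch of the prediction formula as an R function follows. The function name predict_tobit and the grid of index values are hypothetical; σ̂ is taken as exp((Intercept):2) from the Tobit output of Example 3.

```r
# Tobit prediction of E[y|x] for a given estimated index x'beta and sigma
predict_tobit <- function(xb, sigma) {
  pnorm(xb / sigma) * xb + sigma * dnorm(xb / sigma)
}

sigma_hat <- exp(7.02289)                # from (Intercept):2 = log(sigma)
xb_grid   <- seq(-2000, 3000, by = 1000) # hypothetical index values
y_hat     <- predict_tobit(xb_grid, sigma_hat)

print(round(cbind(index = xb_grid, prediction = y_hat), 1))
```

Unlike the linear index itself, the predictions are positive even where the index is negative, and they approach the index as it becomes large relative to σ̂.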

Exercise: Using R, plot the predicted values for working hours as a function of education (educ) when the other explanatory variables are set to their means (for a solution, see the R snippet for Example 3 on the course home page).




Remark 1

In OLS the R-square is the squared correlation of the observed values with the predicted values.

Using this practice, one can compute an R-square for a Tobit model as well.

For the OLS solution, R² = 0.266 (adjusted R² = 0.259).

Saving the R vglm results into an object (above wkh.tbt), the predicted values can be extracted with the fitted() function.

In an R S4 object the sub-objects are called slots. The observed dependent values are in slot @y, i.e., in our case wkh.tbt@y.

Thus, for the Tobit model the command cor(wkh.tbt@y, fitted(wkh.tbt))^2 produces R² = 0.261, which is close to that of OLS.
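That the squared correlation between observed and fitted values reproduces the usual OLS R-square can be verified on any data set. The sketch below uses simulated data with arbitrary values; the names ols, r2_cor and r2_lm are illustrative.

```r
# Simulated data (values are assumptions for illustration)
set.seed(4)
x <- rnorm(200)
y <- 1 + 0.5 * x + rnorm(200)

ols <- lm(y ~ x)

r2_cor <- cor(y, fitted(ols))^2     # squared correlation, observed vs. fitted
r2_lm  <- summary(ols)$r.squared    # R-square reported by summary()

print(c(r2_cor = r2_cor, r2_lm = r2_lm))
```

The equality holds for OLS with an intercept; for the Tobit model the squared correlation is simply used as an analogous descriptive measure.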



Checking Specification of Tobit Models







If we introduce a dummy variable w = 0 when y = 0 and w = 1 if y > 0, then E[w|x] = P[w = 1|x] = Φ(x′β/σ) is the probit model.

Accordingly, if the Tobit model holds, we can expect the (scaled) Tobit slope estimate β̂j/σ̂ of xj to be fairly close to the probit estimate γ̂j.

Comparing the closeness of the slope coefficients can be used as an informal specification check of the appropriateness of the Tobit model.

=================================
               Tobit/sigma  Probit
---------------------------------
(Intercept):1       0.8603  0.2701
nwifeinc           -0.0079 -0.0120
educ                0.0719  0.1309
exper               0.1173  0.1233
I(exper^2)         -0.0017 -0.0019
age                -0.0485 -0.0529
kidslt6            -0.7968 -0.8683
kidsge6            -0.0145  0.0360  (insignificant in both models)
=================================

The (scaled) slope coefficients of the Tobit model are fairly close to those of the probit model, suggesting appropriateness of the Tobit model.
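The Tobit/sigma column can be reproduced from the estimates reported in Examples 2 and 3. The sketch below copies the slope coefficients by hand from those outputs and takes σ̂ as exp((Intercept):2); the object names are arbitrary.

```r
# Tobit slope estimates from Example 3
tobit <- c(nwifeinc = -8.81433, educ = 80.64715, exper = 131.56501,
           "I(exper^2)" = -1.86417, age = -54.40524,
           kidslt6 = -894.02622, kidsge6 = -16.21577)

# Probit slope estimates from Example 2
probit <- c(nwifeinc = -0.0120236, educ = 0.1309040, exper = 0.1233472,
            "I(exper^2)" = -0.0018871, age = -0.0528524,
            kidslt6 = -0.8683247, kidsge6 = 0.0360056)

sigma_hat <- exp(7.02289)      # (Intercept):2 is log(sigma)
scaled    <- tobit / sigma_hat # Tobit/sigma

print(round(cbind(tobit_over_sigma = scaled, probit = probit), 4))
```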

