Logit, Probit and Tobit: Models for Categorical and Limited Dependent Variables

By Rajulton Fernando

Presented at PLCS/RDC Statistics and Data Series at Western, March 23, 2011

Introduction
• In social science research, categorical data are often collected through surveys.
– Categorical: nominal and ordinal variables.
– They take only a few values that do NOT have a metric.
• A) Binary Case
• Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event and coded as 1 and 0 respectively. Some examples:
– The labor force status of a person.
– Voting behavior of a person (in favor of a new policy).
– Whether a person got married or divorced.
– Whether a person was involved in criminal behaviour, etc.

Introduction
• With such variables, we can build models that describe the response probabilities, say P(y_i = 1), of the dependent variable y_i.
– For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector x_i′ of explanatory variables, the probability that y_i takes value 1 is modeled as

$$P(y_i = 1 \mid x_i) = F(x_i'\beta) = F(z_i)$$

where β is a (K+1)-dimensional column vector of parameters.
• The transformation function F is crucial. It maps the linear combination into [0,1] and in general satisfies F(−∞) = 0, F(+∞) = 1, and ∂F(z)/∂z > 0 [that is, it is a cumulative distribution function].

The Logit and Probit Models
• When the transformation function F is the logistic function, the response probabilities are given by

$$P(y_i = 1 \mid x_i) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$

• And, when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by

$$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) = \int_{-\infty}^{x_i'\beta} \phi(s)\,ds = \int_{-\infty}^{x_i'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}s^2}\,ds$$

• The Logit and Probit models are almost identical (see the Figure next slide) and the choice of the model is arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).

Source: J.S. Long, 1997

The Logit and Probit Models
• However, the parameters of the two models are scaled differently. The parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding probit model.
• The probit and logit models are estimated by maximum likelihood (ML), assuming independence across observations. The ML estimator of β is consistent and asymptotically normally distributed. However, the estimation rests on the strong assumption that the latent error term is normally distributed (logistically distributed in the logit model) and homoscedastic. If homoscedasticity is violated, there is no easy solution.
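The scaling difference can be checked by comparing the two response functions directly. The sketch below uses only the Python standard library; the function names and the factor 1.7 (one value inside the 1.6 to 1.8 range quoted on the slide) are illustrative assumptions, not part of the original presentation.

```python
import math
from statistics import NormalDist

def logit_prob(z):
    """Logistic response function: P(y=1) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_prob(z):
    """Probit response function: standard normal cdf evaluated at z."""
    return NormalDist().cdf(z)

SCALE = 1.7  # assumed illustrative factor within the 1.6-1.8 range

# Compare probit at z with logit at SCALE * z on a grid from -4 to 4.
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(probit_prob(z) - logit_prob(SCALE * z)) for z in grid)
print(f"largest gap between the two curves: {max_gap:.4f}")
```

With the index rescaled this way, the two curves differ by roughly one percentage point at most, which is why the fitted probabilities from the two models are usually so close even though the coefficients look different.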

The Logit and Probit Models
• Note: The response function (logistic or probit) is an S-shaped function, which implies that a fixed change in X has a smaller impact on the probability when the probability is near zero than when it is near the middle. Thus, it is a non-linear response function.
• How to interpret the coefficients: In both models,
  If b > 0, p increases as X increases
  If b < 0, p decreases as X increases
– As mentioned above, b cannot be interpreted as a simple slope as in ordinary regression, because the rate at which the curve ascends or descends changes according to the value of X.
– In other words, it is not a constant change as in ordinary regression. The greatest rate of change is at p = 0.5.
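For the logit model, the slope of the response curve is dp/dX = b · p · (1 − p), which makes the point about p = 0.5 easy to see numerically. This is a minimal sketch; the intercept and coefficient values are hypothetical and chosen only for illustration.

```python
import math

def logistic(z):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_slope(a, b, x):
    """dp/dX for p = logistic(a + b*x), which equals b * p * (1 - p)."""
    p = logistic(a + b * x)
    return b * p * (1.0 - p)

a, b = 0.0, 0.25  # hypothetical intercept and coefficient
for x in (-12, -4, 0, 4, 12):
    p = logistic(a + b * x)
    print(f"X={x:>4}  p={p:.3f}  dp/dX={logit_slope(a, b, x):.4f}")
```

The slope is largest where p = 0.5 (here at X = 0, where it equals b/4) and shrinks toward zero in both tails.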

The Logit and Probit Models
– In the logit model, we can interpret b as an effect on the odds. That is, every unit increase in X results in a multiplicative effect of e^b on the odds.
Example: If b = 0.25, then e^0.25 = 1.28. Thus, when X changes by one unit, the odds p/(1 − p) increase by a factor of 1.28, that is, by 28%.
– In the probit model, use the Z-score terminology. For every unit increase in X, the Z-score (or the probit of “success”) increases by b units. [Or, we can also say that a unit increase in X changes Z by b standard deviation units.]
– If you like, you can convert the z-score to probabilities using the normal table.
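Both interpretations can be worked through in a few lines. The sketch below reuses b = 0.25 from the example; the starting probabilities and the starting z-score are hypothetical, and `NormalDist` from the Python standard library stands in for the normal table.

```python
import math
from statistics import NormalDist

b = 0.25
odds_ratio = math.exp(b)  # multiplicative effect of a one-unit change on the odds

def prob_from_odds(odds):
    """Convert odds p/(1-p) back to a probability."""
    return odds / (1.0 + odds)

# The odds always scale by the same factor, but the change in p depends
# on where we start (hypothetical starting probabilities).
for p0 in (0.10, 0.50, 0.90):
    p1 = prob_from_odds(odds_ratio * p0 / (1.0 - p0))
    print(f"logit:  p0={p0:.2f} -> p1={p1:.4f}")

# Probit: a one-unit change in X moves the z-score by b; convert to probabilities.
z0 = 0.0  # hypothetical starting z-score
p_before = NormalDist().cdf(z0)
p_after = NormalDist().cdf(z0 + b)
print(f"probit: z {z0:.2f} -> {z0 + b:.2f}, p {p_before:.4f} -> {p_after:.4f}")
```

The loop makes the distinction explicit: the 28% applies to the odds, while the corresponding change in the probability itself varies with the starting point.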

Models for Polytomous Data
• B) Polytomous Case
– Here we need to distinguish between purely nominal variables and really ordinal variables.
– When the variable is purely nominal, we can extend the dichotomous logit model, using one of the categories as reference and modeling the other responses j = 1, 2, ..., m−1 compared to the reference.
• Example: In the case of 3 categories, using the 3rd category as the reference, logit p1 = ln(p1/p3) and logit p2 = ln(p2/p3), which will give two sets of parameter estimates.

$$P(y = 1) = \frac{\exp(x\beta_1)}{1 + \exp(x\beta_1) + \exp(x\beta_2)}$$

$$P(y = 2) = \frac{\exp(x\beta_2)}{1 + \exp(x\beta_1) + \exp(x\beta_2)}$$

$$P(y = 3) = \frac{1}{1 + \exp(x\beta_1) + \exp(x\beta_2)}$$
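The three-category formulas translate directly into code. In this sketch the function name and the two index values (standing in for xβ1 and xβ2) are hypothetical; category 3 is the reference, as in the example.

```python
import math

def multinomial_probs(xb1, xb2):
    """Probabilities for 3 categories with category 3 as the reference.

    xb1 and xb2 are the linear indices x*beta_1 and x*beta_2.
    """
    denom = 1.0 + math.exp(xb1) + math.exp(xb2)
    return (math.exp(xb1) / denom, math.exp(xb2) / denom, 1.0 / denom)

# Hypothetical index values for one observation.
p1, p2, p3 = multinomial_probs(0.4, -0.3)
print(f"p1={p1:.4f}  p2={p2:.4f}  p3={p3:.4f}  sum={p1 + p2 + p3:.4f}")
print(f"ln(p1/p3)={math.log(p1 / p3):.4f}  ln(p2/p3)={math.log(p2 / p3):.4f}")
```

The log ratios against the reference category recover the two indices, which is exactly the pair of logits ln(p1/p3) and ln(p2/p3) described above.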

Polytomous Case
– When the variable is really ordinal, we use cumulative logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below.
– Example: Suppose Y has 4 categories; then,
• logit(p1) = ln{p1 / (1 − p1)} = a1 + bX
• logit(p1 + p2) = ln{(p1 + p2) / (1 − p1 − p2)} = a2 + bX
• logit(p1 + p2 + p3) = ln{(p1 + p2 + p3) / (1 − p1 − p2 − p3)} = a3 + bX
– Since these are cumulative logits, the probabilities are attached to being in category j and lower.
– Since the right side changes only in the intercepts, and not in the slope coefficient, this model is known as the proportional odds model. Thus, in ordered logistic regression, we need to test the assumption of proportionality as well.
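To see how the cumulative logits turn into category probabilities, the sketch below inverts each cumulative logit and takes successive differences. The intercepts a1 < a2 < a3 and the slope b are hypothetical values, and the right-hand side is written as a + bX as in the example.

```python
import math

def logistic(z):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(cutpoints, b, x):
    """Category probabilities from cumulative logits logit(P(Y <= j)) = a_j + b*x."""
    cum = [logistic(a + b * x) for a in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def cumulative_logit(probs, j):
    """Log-odds of being in category j or lower (j counted from 1)."""
    c = sum(probs[:j])
    return math.log(c / (1.0 - c))

cutpoints = [-1.0, 0.5, 2.0]  # hypothetical a1 < a2 < a3
b = 0.8                       # hypothetical common slope

at0 = ordinal_probs(cutpoints, b, 0.0)
at1 = ordinal_probs(cutpoints, b, 1.0)
print("X=0:", [round(p, 4) for p in at0])
print("X=1:", [round(p, 4) for p in at1])
for j in (1, 2, 3):
    shift = cumulative_logit(at1, j) - cumulative_logit(at0, j)
    print(f"shift in cumulative logit {j}: {shift:.4f}")
```

Every cumulative logit shifts by the same amount b when X increases by one unit, which is the proportionality assumption in numerical form. With the a + bX form, a positive b moves probability toward the lower categories.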

Ordinal Logistic
– a1, a2, a3, … are the “intercepts” that satisfy the property a1 < a2 < a3 …, interpreted as “thresholds” of the latent variable.
– Interpretation of parameter estimates depends on the software used! Check the software manual.
• If the RHS = a + bX, a positive coefficient is associated more with lower order categories and a negative coefficient is associated more with higher order categories.
• If the RHS = a – bX, a negative coefficient is associated more with lower order categories, and a positive coefficient is associated more with higher order categories.

Model for Limited Dependent Variable
• C) Tobit Model
• This model is for a metric dependent variable that is “limited” in the sense that we observe it only if it is above or below some cut-off level. For example,
– Wages may be limited from below by the minimum wage
– The donation amount given to charity
– “Top coding” income at, say, $300,000
– Time use and leisure activity of individuals
– Extramarital affairs
• It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term “censoring” with the one used in dynamic modeling.]

The Tobit Model
• The model is called Tobit because it was first proposed by Tobin (1958), and involves aspects of Probit analysis, a term coined by Goldberger for Tobin’s Probit.
• Reasoning behind:
– If we include the censored observations as y = 0, the censored observations on the left will pull down the end of the line, resulting in underestimates of the intercept and overestimates of the slope.
– If we exclude the censored observations and just use the observations for which y > 0 (that is, truncating the sample), it will overestimate the intercept and underestimate the slope.
– The degree of bias in both will increase as the number of observations that take on the value of zero increases. (see Figure next slide)

Source: J.S. Long

The Tobit Model
• The Tobit model uses all of the information, including info on censoring, and provides consistent estimates.
• It is also a nonlinear model and similar to the probit model. It is estimated using maximum likelihood estimation techniques. The likelihood function for the tobit model takes the form:
• This is an unusual function: it consists of two terms, the first for non-censored observations (it is the pdf), and the second for censored observations (it is the cdf).
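For reference, a standard textbook form of the tobit log-likelihood with left-censoring at zero is sketched below. The notation is an assumption rather than the slide's own: σ is the standard deviation of the latent error, and φ and Φ are the standard normal pdf and cdf.

$$\ln L = \sum_{y_i > 0} \ln\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right] + \sum_{y_i = 0} \ln \Phi\!\left(-\frac{x_i'\beta}{\sigma}\right)$$

The first sum is the pdf term for the uncensored observations; the second is the cdf term, the probability that the latent variable falls at or below the censoring point.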

The Tobit Model
• The estimated tobit coefficients are the marginal effects of a change in xj on y*, the unobservable latent variable, and can be interpreted in the same way as in a linear regression model.
• But such an interpretation may not be useful since we are interested in the effect of X on the observable y (or change in the censored outcome).
– It can be shown that the change in y is found by multiplying the coefficient by Pr(a < y* < b), that is, the probability of being uncensored. Since this probability is a fraction, the marginal effect is actually attenuated.
– In the above, a and b denote the lower and upper censoring points. For example, in left censoring, the limits will be: a = 0, b = +∞.
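For the left-censored case with a = 0 and b = +∞, the result can be written out as follows. This is the standard textbook expression under the assumption of a normally distributed, homoscedastic latent error with standard deviation σ; the symbols σ and Φ are not defined on the slide itself.

$$E(y \mid x) = \Phi\!\left(\frac{x'\beta}{\sigma}\right) x'\beta + \sigma\,\phi\!\left(\frac{x'\beta}{\sigma}\right), \qquad \frac{\partial E(y \mid x)}{\partial x_j} = \beta_j\,\Phi\!\left(\frac{x'\beta}{\sigma}\right) = \beta_j \Pr(0 < y^* < \infty)$$

Because Φ(x′β/σ) lies between 0 and 1, the effect on the observed y is always smaller in magnitude than the coefficient β_j.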

Illustrations for logit, probit and tobit models, using womenwk.dta from Baum available at http://www.stata-press.com/data/imeus/womenwk.dta

Descriptive Statistics

                        N   Minimum   Maximum      Mean   Std. Deviation
age                  2000        20        59     36.21            8.287
education            2000        10        20     13.08            3.046
married              2000         0         1       .67             .470
children             2000         0         5      1.64            1.399
wagefull             2000     -1.68     45.81   21.3118          7.01204
wage                 1343      5.88     45.81   23.6922          6.30537
lw                   1343      1.77      3.82    3.1267           .28651
work                 2000         0         1       .67             .470
lwf                  2000       .00      3.82    2.0996          1.48752
Valid N (listwise)   1343

Binary Logistic Regression

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      2055.829a           .212                   .295

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test

Step   Chi-square   df   Sig.
1      6.491         8   .592

Variables in the Equation

                           B    S.E.      Wald   df   Sig.   Exp(B)
Step 1a   age           .058    .007    64.359    1   .000    1.060
          education     .098    .019    27.747    1   .000    1.103
          married       .742    .126    34.401    1   .000    2.100
          children      .764    .052   220.110    1   .000    2.148
          Constant    -4.159    .332   156.909    1   .000     .016

a. Variable(s) entered on step 1: age, education, married, children.
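The Exp(B) column and a fitted probability can be reproduced from the B column. The sketch below takes the coefficients from the table above and, as an illustrative choice not made in the original, evaluates the model at the sample means from the Descriptive Statistics table.

```python
import math

# Coefficients B from the "Variables in the Equation" table.
coef = {"age": 0.058, "education": 0.098, "married": 0.742,
        "children": 0.764, "Constant": -4.159}

# Exp(B): multiplicative effect on the odds of working.
odds_ratios = {name: math.exp(b) for name, b in coef.items()}
for name, value in odds_ratios.items():
    print(f"{name:<10} Exp(B) = {value:.3f}")

# Example profile: sample means from the Descriptive Statistics table
# (an assumed evaluation point, used only for illustration).
means = {"age": 36.21, "education": 13.08, "married": 0.67, "children": 1.64}
z = coef["Constant"] + sum(coef[k] * v for k, v in means.items())
p_work = 1.0 / (1.0 + math.exp(-z))
print(f"predicted P(work = 1) at the means: {p_work:.3f}")
```

Small differences from the printed Exp(B) values in the third decimal (for example, for children) arise because the table reports B rounded to three decimals.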

Binary Probit Regression (in SPSS, use the ordinal regression menu and select probit link function. Ignore the test of parallel lines, etc.)

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only            1645.024
Final                     1166.702      478.322    4   .000

Link function: Probit.

Parameter Estimates

                                                                     95% Confidence Interval
                         Estimate   Std. Error      Wald   df   Sig.   Lower Bound   Upper Bound
Threshold  [work = 0]       2.037         .209    94.664    1   .000         1.626         2.447
Location   age               .035         .004    67.301    1   .000          .026          .043
           education         .058         .011    28.061    1   .000          .037          .080
           children          .447         .029   243.907    1   .000          .391          .503
           [married=0]      -.431         .074    33.618    1   .000         -.577         -.285
           [married=1]         0a            .         .    0      .             .             .

Link function: Probit.

a. This parameter is set to zero because it is redundant.
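The two sets of estimates can be compared against the 1.6 to 1.8 rule of thumb stated earlier. The sketch below copies the coefficients from the two SPSS tables. It assumes the usual SPSS ordinal-regression parameterization, in which P(work = 0) = Φ(threshold − location), and it re-expresses the [married=0] estimate of −.431 as an effect of +.431 for married = 1; both steps are interpretive assumptions, and evaluating at the sample means is an illustrative choice.

```python
from statistics import NormalDist

# Logit coefficients (binary logistic table) and probit coefficients (ordinal table).
logit_b = {"age": 0.058, "education": 0.098, "children": 0.764, "married": 0.742}
probit_b = {"age": 0.035, "education": 0.058, "children": 0.447, "married": 0.431}

ratios = {k: logit_b[k] / probit_b[k] for k in logit_b}
for name, r in ratios.items():
    print(f"{name:<10} logit/probit = {r:.2f}")

# Probit probability of working at the sample means (assumed parameterization).
threshold = 2.037
means = {"age": 36.21, "education": 13.08, "married": 0.67, "children": 1.64}
location = (probit_b["age"] * means["age"]
            + probit_b["education"] * means["education"]
            + probit_b["children"] * means["children"]
            - probit_b["married"] * (1.0 - means["married"]))  # [married=0] = -.431
p_work = 1.0 - NormalDist().cdf(threshold - location)
print(f"probit P(work = 1) at the means: {p_work:.3f}")
```

All four ratios fall inside the 1.6 to 1.8 range, and the fitted probability at the means is close to the one implied by the logistic regression coefficients.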

Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands.

First, fit a simple OLS regression of the variable lwf (just to check):

. regress lwf age married children education

      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  4,  1995) =  134.21
       Model |  937.873188     4  234.468297           Prob > F      =  0.0000
    Residual |  3485.34135  1995  1.74703827           R-squared     =  0.2120
-------------+------------------------------           Adj R-squared =  0.2105
       Total |  4423.21454  1999  2.21271363           Root MSE      =  1.3218

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0363624    .003862     9.42   0.000     .0287885    .0439362
     married |   .3188214   .0690834     4.62   0.000     .1833381    .4543046
    children |   .3305009   .0213143    15.51   0.000     .2887004    .3723015
   education |   .0843345   .0102295     8.24   0.000     .0642729    .1043961
       _cons |  -1.077738   .1703218    -6.33   0.000    -1.411765   -.7437105
------------------------------------------------------------------------------

. tobit lwf age married children education, ll(0)

Tobit regression                                  Number of obs   =       2000
                                                  LR chi2(4)      =     461.85
                                                  Prob > chi2     =     0.0000
Log likelihood = -3349.9685                       Pseudo R2       =     0.0645

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .052157   .0057457     9.08   0.000     .0408888    .0634252
     married |   .4841801   .1035188     4.68   0.000     .2811639    .6871964
    children |   .4860021   .0317054    15.33   0.000     .4238229    .5481812
   education |   .1149492   .0150913     7.62   0.000     .0853529    .1445454
       _cons |  -2.807696   .2632565   -10.67   0.000    -3.323982   -2.291409
-------------+----------------------------------------------------------------
      /sigma |   1.872811    .040014                      1.794337    1.951285
------------------------------------------------------------------------------
  Obs. summary:        657  left-censored observations at lwf<=0
                      1343     uncensored observations
                         0 right-censored observations

. mfx compute, predict(pr(0,.))

Marginal effects after tobit
      y  = Pr(lwf>0) (predict, pr(0,.))
         =  .81920975
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0073278      .00083    8.84   0.000   .005703  .008952    36.208
 married*|   .0706994      .01576    4.48   0.000   .039803  .101596     .6705
children |   .0682813      .00479   14.26   0.000   .058899  .077663    1.6445
educat~n |   .0161499      .00216    7.48   0.000   .011918  .020382    13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(e(0,.))

Marginal effects after tobit
      y  = E(lwf|lwf>0) (predict, e(0,.))
         =  2.3102021
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0314922      .00347    9.08   0.000   .024695   .03829    36.208
 married*|   .2861047      .05982    4.78   0.000   .168855  .403354     .6705
children |   .2934463      .01908   15.38   0.000   .256041  .330852    1.6445
educat~n |   .0694059      .00912    7.61   0.000   .051531  .087281    13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
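The marginal effects for age can be traced back to the tobit coefficient, /sigma, and the reported Pr(lwf>0). The sketch below uses the standard textbook decomposition for left-censoring at zero under normal errors; it is an illustrative reconstruction, not a description of how Stata computes `mfx` internally, and the variable names are hypothetical.

```python
from statistics import NormalDist

nd = NormalDist()

beta_age = 0.052157          # tobit coefficient on age
sigma = 1.872811             # /sigma from the tobit output
pr_uncensored = 0.81920975   # Pr(lwf>0) reported by mfx at the means

z = nd.inv_cdf(pr_uncensored)   # implied index x'b / sigma at the evaluation point
lam = nd.pdf(z) / nd.cdf(z)     # inverse Mills ratio

effect_latent = beta_age                            # effect on the latent y*
effect_prob = nd.pdf(z) * beta_age / sigma          # effect on Pr(y > 0)
effect_cond = beta_age * (1.0 - lam * (z + lam))    # effect on E(y | y > 0)
effect_uncond = beta_age * pr_uncensored            # effect on E(y): coefficient x Pr(uncensored)

print(f"latent y*:      {effect_latent:.6f}")
print(f"Pr(y > 0):      {effect_prob:.6f}")
print(f"E(y | y > 0):   {effect_cond:.6f}")
print(f"E(y):           {effect_uncond:.6f}")
```

The second and third quantities agree with the dy/dx values for age in the two `mfx` tables (.0073278 and .0314922) up to rounding. The last line is the attenuated effect on the observed outcome described earlier: the coefficient multiplied by the probability of being uncensored.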
