Understanding and Interpreting Results from Logistic, Multinomial, and Ordered Logistic Regression Models: Using Post-Estimation Commands in Stata Raymond Sin-Kwok Wong University of California-Santa Barbara


Post on 09-Feb-2022



Model Estimation and Interpretation

• For OLS models, both model estimation and interpretation are relatively easy, since the effects are linear.

• For non-linear models, model estimation is simple but the interpretation of results can be tricky, especially for beginners who are not familiar with the non-linear relationship between dependent and independent variables.


What is my talk about?

– Not about the rationale for statistical modeling or the mathematical and statistical derivation of specific non-linear models

– But about a set of post-estimation tools that aid understanding and interpretation, and about presenting complex relationships among variables using graphical displays


Alternative Output Methods

• (a) Display odds ratios rather than logit coefficients

logit y x1 x2 x3 x4 x5

logit, or
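A minimal sketch of the two commands above, assuming Stata's bundled auto dataset as a stand-in for y and x1 ... x5 (the variables foreign, mpg, and price are the ones used in the mfx examples at the end of this talk). The odds ratios are the exponentiated logit coefficients:

```stata
* Assumption: the auto data shipped with Stata stands in for y, x1, ...
sysuse auto, clear

* Fit the model; the table reports logit (log-odds) coefficients
logit foreign mpg price

* Replay the same results as odds ratios, i.e. exp(coefficient)
logit, or

* Equivalent: -logistic- reports odds ratios by default
logistic foreign mpg price
```

Replaying with logit, or does not refit the model; it only changes how the stored results are displayed.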


Alternative Output Methods

• (b) Use LISTCOEF

listcoef [varlist] [, pvalue(#) [factor|percent|std] constant help]

Factor: factor changes in the odds or expected counts

Percent: % change in the odds or expected counts

Std: Standardized coefficients

==============================================================
                                            Option
                                    std      factor   percent
--------------------------------------------------------------
Type 1: regress, probit, cloglog,   Default  No       No
        oprobit, tobit, cnreg,
        intreg
Type 2: logit, logistic, ologit     Yes      Default  Yes
Type 3: clogit, mlogit, poisson,    No       Default  Yes
        nbreg, zip, zinb
==============================================================
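For logit-type models, the factor and percent columns are transformations of the coefficient. As a worked illustration, take the drug coefficient (-1.150009) from the logit output shown later in this talk; the results are rounded to three significant digits:

```latex
\text{factor change} = e^{\beta_k}, \qquad
\text{percent change} = 100\,\bigl(e^{\beta_k} - 1\bigr)

e^{-1.150009} \approx 0.317, \qquad
100\,(0.317 - 1) \approx -68.3\%
```

Read this as: a one-unit increase in drug multiplies the odds of died by about 0.317, a decrease of roughly 68%, holding the other variables constant.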


Alternative Output Methods

• Different standardized coefficients

– x-standardized (bStdX)

• For a standard deviation increase in xk, y is expected to change by βk × Sx units, holding everything else constant

– y-standardized (bStdY)

• For a unit increase in xk, y is expected to change by βk / Sy standard deviations, holding everything else constant

– Fully standardized (bStdXY)

• For a standard deviation increase in xk, y is expected to change by βk × Sx / Sy standard deviations, holding everything else constant
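Written out, with S_x the standard deviation of x_k and S_y the standard deviation of y, the three standardized coefficients are as follows. The superscript labels are notation chosen here for illustration, not taken from the slides:

```latex
\beta_k^{S_x} = S_x\,\beta_k, \qquad
\beta_k^{S_y} = \frac{\beta_k}{S_y}, \qquad
\beta_k^{S} = \frac{S_x\,\beta_k}{S_y}
```

Only the fully standardized coefficient is free of the measurement units of both x_k and y, which is why it is the one used to compare effects across variables.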


Post-Estimation Tests

regress y x1 x2 … xk

estimates store mod1

What is the use? For post-estimation analysis

Two kinds of tests are common:

(a) Wald test, and

(b) LR tests


Wald Test

test varlist [, accumulate]

For example,

(a) regress y x1 x2 x3 x4 x5

test x1 x2 x3 x4 x5

This tests H0: β1 = β2 = β3 = β4 = β5 = 0

(b) test x1=2*x2

test x3=x4, accumulate

This tests H0: β1 = 2β2 and β3 = β4
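The two patterns above can be tried on real data. This is a sketch only, assuming Stata's bundled auto dataset; the particular regressors are arbitrary stand-ins for x1 ... x5:

```stata
* Assumption: auto data; mpg, weight, length, displacement are placeholders
sysuse auto, clear
regress mpg weight length displacement

* (a) Joint Wald test that all three slopes are zero
test weight length displacement

* (b) Linear restriction; note the explicit * for multiplication
test weight = 2*length

* Add a second restriction and test both jointly
test displacement = length, accumulate
```

The accumulate option makes the last command test its restriction together with the one from the preceding test command.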


LR (Likelihood-Ratio) Tests

lrtest [, saving(name) using(name) model(name) df(#)]

For example,

(a) logit chd age age2 sex      estimate saturated model
(b) lrtest, saving(0)           save results
(c) logit chd age sex           estimate simpler model
(d) lrtest                      obtain test
(e) lrtest, saving(1)           save results as 1
(f) logit chd sex               estimate simplest model
(g) lrtest                      compare to saturated model
(h) lrtest, using(1)            compare to model 1
(i) lrtest, model(1)            repeat earlier test
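In current Stata releases, lrtest instead compares results saved with estimates store, the command mentioned earlier in this talk. A sketch of the same sequence in that style follows. It assumes Stata's example cancer dataset (webuse cancer), which has variables with the names used in the example output in this talk; the stored-result names are made up for illustration:

```stata
* Assumption: Stata's example cancer data (died, studytime, age, drug)
webuse cancer, clear

* Full model
logit died studytime age drug
estimates store full

* Nested model without drug
logit died studytime age
estimates store nodrug

* Simplest model
logit died age
estimates store ageonly

* LR tests: each nested model against the full model, then against each other
lrtest full nodrug
lrtest full ageonly
lrtest nodrug ageonly
```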


. logit died studytime age drug

Logit estimates                                   Number of obs   =         48
                                                  LR chi2(3)      =      13.67
                                                  Prob > chi2     =     0.0034
Log likelihood = -24.364293                       Pseudo R2       =     0.2191

------------------------------------------------------------------------------
        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   studytime |  -.0236468   .0457671    -0.52   0.605    -.1133487    .0660551
         age |   .0793438   .0699391     1.13   0.257    -.0577343    .2164219
        drug |  -1.150009   .5549529    -2.07   0.038    -2.237697   -.0623212
       _cons |  -1.113136   3.945369    -0.28   0.778    -8.845918    6.619645
------------------------------------------------------------------------------

. lrtest, saving(0)


. logit died studytime age

Iteration 0:   log likelihood = -31.199418
Iteration 1:   log likelihood =  -26.82757
Iteration 2:   log likelihood = -26.734502
Iteration 3:   log likelihood = -26.734061
Iteration 4:   log likelihood = -26.734061

Logit estimates                                   Number of obs   =         48
                                                  LR chi2(2)      =       8.93
                                                  Prob > chi2     =     0.0115
Log likelihood = -26.734061                       Pseudo R2       =     0.1431

------------------------------------------------------------------------------
        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   studytime |  -.0843475   .0353784    -2.38   0.017     -.153688    -.015007
         age |   .0518897   .0646409     0.80   0.422    -.0748042    .1785836
       _cons |    -.87332   3.729449    -0.23   0.815    -8.182906    6.436266
------------------------------------------------------------------------------

. lrtest
Logit:  likelihood-ratio test                         chi2(1)     =       4.74
                                                      Prob > chi2 =     0.0295

. lrtest, saving(1)


. logit died age

Iteration 0:   log likelihood = -31.199418
Iteration 1:   log likelihood = -29.955649
Iteration 2:   log likelihood = -29.945382
Iteration 3:   log likelihood = -29.945379

Logit estimates                                   Number of obs   =         48
                                                  LR chi2(1)      =       2.51
                                                  Prob > chi2     =     0.1133
Log likelihood = -29.945379                       Pseudo R2       =     0.0402

------------------------------------------------------------------------------
        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0893535   .0585925     1.52   0.127    -.0254857    .2041928
       _cons |  -4.353928   3.238757    -1.34   0.179    -10.70177    1.993919
------------------------------------------------------------------------------

. lrtest
Logit:  likelihood-ratio test                         chi2(2)     =      11.16
                                                      Prob > chi2 =     0.0038

. lrtest, using(1)
Logit:  likelihood-ratio test                         chi2(1)     =       6.42
                                                      Prob > chi2 =     0.0113
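The three reported statistics can be checked by hand from the log likelihoods printed in the outputs above, since the LR statistic is twice the difference between the log likelihood of the larger model and that of the nested model. The labels for the comparisons are added here for readability:

```latex
LR = 2\,\bigl(\ln L_{\text{larger}} - \ln L_{\text{nested}}\bigr)

\text{drop drug:}\quad
2\,[-24.364293 - (-26.734061)] = 4.739536 \approx 4.74, \quad df = 1

\text{drop studytime and drug:}\quad
2\,[-24.364293 - (-29.945379)] = 11.162172 \approx 11.16, \quad df = 2

\text{drop studytime from model 1:}\quad
2\,[-26.734061 - (-29.945379)] = 6.422636 \approx 6.42, \quad df = 1
```

The degrees of freedom equal the number of coefficients dropped in each comparison.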


Fit Statistics

• fitstat calculates a large number of fit statistics for many kinds of regression models. It works after the following: clogit, cnreg, cloglog, intreg, logistic, logit, mlogit, nbreg, ocratio, ologit, oprobit, poisson, probit, regress, zinb, and zip. With the saving() and using() options, it can also be used to compare fit measures for two different models.

fitstat [, saving(name) using(name) bic force save dif]

Examples:

(a) logit y x1 x2 … x10

fitstat


Fit statistics

(b) To compute, save, and compare with other models

logit y x1 x2 x3 x4 x5 age

quietly fitstat, saving(mod1)

generate age2=age*age

logit y x1 x2 x3 x4 x5 age age2

fitstat, using(mod1)
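fitstat is a user-written command, so it must be installed before use. As a rough built-in counterpart covering only the information criteria, the same two-model comparison can be sketched with estimates stats. This is a swapped-in technique, not fitstat itself, and the auto dataset and variable names are assumptions for illustration:

```stata
* Assumption: auto data; foreign, mpg, price stand in for y, x1, ..., age
sysuse auto, clear

* First model
logit foreign mpg price
estimates store mod1

* Second model adds a squared term, as age2 does in the example above
generate mpg2 = mpg*mpg
logit foreign mpg price mpg2
estimates store mod2

* Log likelihoods, AIC, and BIC for both stored models, side by side
estimates stats mod1 mod2
```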


Post-Estimation Approach to Interpret Non-Linear Regression Models

• For non-linear regression models, individual coefficients do not have a simple linear interpretation. For example, the beta coefficient in a logistic regression model can be interpreted directly only as a change in the logit (the log-odds). If we want to interpret the model in terms of predicted probability, the effect of a change in a variable depends on the values of all variables in the model. To put it differently, it depends on where we evaluate the effect.
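For the binary logit model this dependence can be written down directly. The coefficient value used below is hypothetical, chosen only to make the arithmetic easy to follow:

```latex
P = \Pr(y=1 \mid \mathbf{x})
  = \frac{\exp(\mathbf{x}\boldsymbol{\beta})}{1+\exp(\mathbf{x}\boldsymbol{\beta})},
\qquad
\frac{\partial P}{\partial x_k} = \beta_k\,P\,(1-P)

\text{With } \beta_k = 0.5:\quad
P = 0.5 \;\Rightarrow\; 0.5 \times 0.5 \times 0.5 = 0.125,
\qquad
P = 0.9 \;\Rightarrow\; 0.5 \times 0.9 \times 0.1 = 0.045
```

The same coefficient therefore implies a marginal effect almost three times larger at P = 0.5 than at P = 0.9, which is why the evaluation point must always be stated.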


Post-Estimation Approach to Interpret Non-Linear Regression Models

• (1) Use predict command

regress y x1 x2 x3        generate predicted y

predict newvar

logit y x1 x2 x3          generate predicted P(Y=1)

predict newvar

ologit y x1 x2 x3         generate predicted P(Y=k)

predict newvar1 newvar2 ...

mlogit y x1 x2 x3         generate predicted P(Y=k)

predict newvar1 newvar2 ...

poisson y x1 x2 x3        generate predicted count

predict newvar
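A concrete version of the pattern above, as a sketch assuming Stata's bundled auto dataset. The new variable names phat and p1 to p5 are illustrative; predict always needs at least one new variable name:

```stata
* Assumption: auto data; phat and p1-p5 are made-up names
sysuse auto, clear

* Binary logit: predicted P(foreign = 1)
logit foreign mpg price
predict phat

* Ordered logit: rep78 takes five values, so supply five new variables,
* one predicted probability per category
ologit rep78 mpg price
predict p1 p2 p3 p4 p5

* Inspect the predictions
summarize phat p1-p5
```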


Post-Estimation Approach to Interpret Non-Linear Regression Models

• (2) Use prchange command to compute discrete and marginal changes in the predicted outcomes

prchange [varlist] [if exp] [in range] [, x(variables_and_values) rest(stat) outcome(#) fromto brief nobase nolabel help all uncentered delta(#)]

Examples:

(a) prchange age, x(x1=20 x2=10) rest(mean) help

(b) prchange, help

(c) prchange x1 x2, fromto

This calculates the change in the predicted outcome as x goes from min to max, from 0 to 1, from -.5 to +.5 around its value, and from -.5 sd to +.5 sd around its value, together with the marginal effect
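The two kinds of change reported can be stated compactly for a binary outcome. The subscripts "start" and "end" are notation introduced here, not part of the prchange output:

```latex
\text{discrete change:}\quad
\frac{\Delta \Pr(y=1 \mid \mathbf{x})}{\Delta x_k}
  = \Pr(y=1 \mid \mathbf{x},\, x_k = x_{\text{end}})
  - \Pr(y=1 \mid \mathbf{x},\, x_k = x_{\text{start}})

\text{marginal change:}\quad
\frac{\partial \Pr(y=1 \mid \mathbf{x})}{\partial x_k}
```

Each discrete change listed above corresponds to one choice of start and end values for x_k, with the remaining variables held at the values set by x() and rest().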


Post-Estimation Approach to Interpret Non-Linear Regression Models

• (3) Use prvalue command to calculate the change in predicted probability for a discrete change of any magnitude in an independent variable.

prvalue [if exp] [in range] [, x(variables_and_values) rest(stat) level(#) save dif brief all maxcnt(#) nobase nolabel ystar]

Examples:

(a) prvalue, rest(median)

(b) prvalue, x(age=30) save brief

prvalue, x(age=40) dif brief

(c) prchange age, x(age=30) uncentered delta(10) rest(mean) brief

This will generate the change in probability (P(Y=1)) from age 30 to age 40.
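For readers without the user-written prvalue installed, a similar comparison can be sketched with the built-in margins command available in later Stata releases. This is a swapped-in technique, not prvalue itself. The example assumes Stata's cancer example dataset, and the two ages are illustrative values:

```stata
* Assumption: Stata's example cancer data (died, studytime, age, drug)
webuse cancer, clear
logit died studytime age drug

* Predicted P(died = 1) at two ages, other variables held at their means;
* the difference between the two rows is the discrete change
margins, at(age=(50 60)) atmeans
```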


Post-Estimation Approach to Interpret Non-Linear Regression Models

• (4) Use prgen command to compute predicted values as one variable changes over a range of values, which is useful for constructing plots. The syntax is:

prgen varname, generate(newvar) [from(#) to(#) ncases(#) x(variables_and_values) rest(stat) maxcnt(#) brief nobase all]

Examples

To compute predicted values from an ordered probit where warm has four categories SD, D, A and SA:

. oprobit warm yr89 male white age ed prst

. prgen age, f(20) t(80) gen(mn)

. prgen age, x(male=0) rest(grmean) f(20) t(80) gen(fem)

. prgen age, x(male=1) rest(grmean) f(20) t(80) gen(mal)

To plot the predicted probabilities for average males:

. graph malp1 malp2 malp3 malp4 malX
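The same idea, predictions over a range of one variable followed by a plot, can be sketched with the built-in margins and marginsplot commands of later Stata releases. This is an alternative to prgen, not a reproduction of it. The auto dataset and the range of mpg values are assumptions for illustration:

```stata
* Assumption: auto data; the mpg grid 15(5)40 is an arbitrary choice
sysuse auto, clear
logit foreign mpg price

* Predicted P(foreign = 1) as mpg runs from 15 to 40 in steps of 5,
* with price held at its mean
margins, at(mpg=(15(5)40)) atmeans

* Plot the predictions with their confidence intervals
marginsplot
```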


Post-Estimation Approach to Interpret Non-Linear Regression Models

Models and Predictions (* is the prefix)

All models:

*X: value of X

logit & probit:

Predicted probability of each outcome: *p0, *p1

ologit, oprobit:

Predicted probabilities: *p#1,*p#2,... where #1,#2,... are values of the outcome variable.

Cumulative probabilities: *s#1,*s#2,... where #1,#2,... are values of the outcome variable. *s#k is the probability of all categories up to or equal to #k.

mlogit:

Predicted probabilities: *p#1,*p#2,... where #1,#2,... are values of the outcome variable.


Post-Estimation Approach to Interpret Non-Linear Regression Models

• (5) Use prtab command to construct a table of predicted values for all combinations of up to three variables. The syntax is:

prtab rowvar [colvar [supercolvar]] [if exp] [in range] [, by(superrowvar) x(variables_and_values) rest(stat) outcome(string) brief nobase nolabel novarlbl all]

Examples:

(a) probit faculty female fellow phd mcit3 mnas

prtab female fellow mnas

(b) ologit jobclass female fellow pub1 phd

prtab female fellow, x(phd=min)

(c) logit died female race age educ

prtab female race educ


Post-Estimation Approach to Interpret Non-Linear Regression Models

• (6) Use the mfx compute command, which numerically calculates the marginal effects or the elasticities and their standard errors after estimation. Exactly what mfx can calculate is determined by the previous estimation command and the predict() option. The points at which the marginal effects or elasticities are evaluated are determined by the at() option. By default, mfx calculates the marginal effects or elasticities at the means of the independent variables, using the default prediction option associated with the preceding estimation command.


Post-Estimation Approach to Interpret Non-Linear Regression Models

Examples

(a) logit foreign mpg price

mfx compute

mfx, at(mpg = 20, price = 6000)

mfx compute, predict(xb)

mfx replay, level(90)

(b) mlogit rep78 mpg displ, nolog

mfx compute, predict(outcome(1))

(c) regress mpg length weight

mfx compute, eyex
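In later Stata releases the margins command covers the same ground as mfx. The following sketches rough margins counterparts to examples (a) and (c) above; it assumes the bundled auto dataset, which contains the variables those examples use, and it is not a line-by-line reproduction of mfx output:

```stata
* Assumption: auto data, which contains foreign, mpg, price, length, weight
sysuse auto, clear
logit foreign mpg price

* Marginal effects on P(foreign = 1), evaluated at the means
margins, dydx(*) atmeans

* Same, evaluated at chosen values of the regressors
margins, dydx(*) at(mpg=20 price=6000)

* Marginal effects on the linear prediction (the logit) instead
margins, dydx(*) predict(xb) atmeans

* Elasticities after a linear regression
regress mpg length weight
margins, eyex(*) atmeans
```

Without atmeans, margins averages the effects over the observations rather than evaluating them at the means, so the option is needed to mirror the mfx default.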