Other Regression
Stuff you might come across
Outline
Generalized Linear Models: logistic regression and related
Survival analysis
Generalized Additive Models
Time series
Quantile regression
Polynomial regression
Multilevel modeling
Multiple outcome (multivariate) regression
Summary
Why Logistic Regression?
Oftentimes the theoretical question involves an aspect of human behavior that can in some way be categorized. It is one thing to use a t-test (or multivariate counterpart) to say groups are different; however, the research goal may be to predict group membership.
Clinical/medical context: schizophrenic or not, clinical depression or not, cancer or not
Social/cognitive context: vote yes or no, preference A over B, graduate or not
While typically used for dichotomous outcomes (binomial logistic regression), it can be extended to a polytomous one (multinomial logistic regression).
The question of whether to use a categorical variable as a predictor or as the dependent variable is often arbitrary outside of theory, though in some cases the implied causal flow or other considerations would suggest one or the other. In the examples above, we could easily turn them into t-test type analyses comparing, e.g., those who voted one way vs. the other. However, it would not make theoretical (causal) sense to suggest that personality traits predict gender, though the computer program won't stop you.
[Diagram: predictors X1-X4 pointing to a categorical Y]
Basic Model (Same as MR)
Logistic Regression
Why not just use regular regression? With normal regression and a continuous outcome, we predicted the expected value of Y (i.e. its mean) given a specific value of X: E(Yi | X = xi).
With a dichotomous outcome, the conditional average in the population is just the proportion of those who score a '1'. Yet even though our range is [0, 1], predicted values from an OLS approach would fall outside that range.
Furthermore, if we tried to fit a regular regression, its assumptions would be untenable.
We could constrain our outcomes not to go beyond the range, but this would still be unsatisfactory.
Logistic Regression
A cumulative probability distribution satisfies the [0, 1] range problem, so if we can map our linear combination of predictors to something like this and think in terms of predicted probabilities, we can have our linear cake and eat it too.
This is essentially what is going on in logistic regression. You may also come across a 'probit' model instead of a logit model, i.e. one that uses the cumulative normal distribution as its basis. They are largely indistinguishable in practice, though the logistic is computationally easier and its 'log odds' interpretation clearer.
Logistic Regression
We have all the same concerns here, though logistic regression has fewer restrictive assumptions.
The only "real" limitation with logistic regression is that the outcome must be nominal.
Ratio of cases to variables: using discrete variables requires enough responses in every given category to allow for reasonable estimation of parameters/predictive power. Due to the maximum likelihood approach, some suggest even 50 cases per predictor as a rule of thumb.
Also, the greater the discrepancy among groups, the less likely the analysis will improve notably over guessing. If you had a 9:1 ratio, one could just guess the majority category and be correct the vast majority of the time.
Linearity in the logit: the predictors should have a linear relationship with the logit form of the DV. There is no assumption about the predictors being linearly related to each other.
Absence of collinearity among predictors
No outliers
Independence of errors
Assumes categories are mutually exclusive
Significance test: a likelihood ratio χ² test based on the log-likelihoods (LL) of the model (M) with predictors + intercept vs. the intercept-only (I) model. If the likelihood ratio χ² test is significant, the model with the additional predictors is preferred.
Goodness-of-fit statistics help you determine whether the model adequately describes the data.
Hosmer-Lemeshow: here statistical significance is not desired, so it is really more a badness-of-fit measure, and it is problematic since one can't accept the null due to non-significance. Best used descriptively, perhaps.
Pseudo R-squared statistics: in this dichotomous situation we will have trouble devising an R².
AIC, BIC
Deviance: analogous to the residual sum of squares from OLS; it is given in comparison to a perfectly fitting (saturated) model.
Model fit
In interpreting coefficients we're now thinking about a particular case's tendency toward some outcome.
The problem with probabilities is that they are non-linear: going from .10 to .20 doubles the probability, but going from .80 to .90 only increases the probability somewhat.
With logistic regression we start to think about the odds. Odds are just an alternative way of expressing the likelihood (probability) of an event.
Probability is the expected number of the event divided by the total number of possible outcomes.
Odds are the expected number of the event divided by the expected number of non-event occurrences; they express the likelihood of occurrence relative to the likelihood of non-occurrence.
Coefficients
Let's begin with probability. Let's say that the probability of success is .8, thus p = .8
Then the probability of failure is q = 1 - p = .2
The odds of success are defined as odds(success) = p/q = .8/.2 = 4; that is, the odds of success are 4 to 1.
We can also define the odds of failure: odds(failure) = q/p = .2/.8 = .25; that is, the odds of failure are 1 to 4.
Odds
Next, let's compute the odds ratio by OR = odds(success)/odds(failure) = 4/.25 = 16
The interpretation of this odds ratio would be that the odds of success are 16 times greater than for failure.
Now if we had formed the odds ratio the other way around with odds of failure in the numerator, we would have gotten
OR = odds(failure)/odds(success) = .25/4 = .0625
Here the interpretation is that the odds of failure are one-sixteenth the odds of success.
Odds Ratio
Logit: the natural log (base e) of the odds, often called the log odds.
The logit scale is linear with respect to the predictors. Logits are continuous and are centered on zero (kind of like z-scores):
p = 0.50, odds = 1, logit = 0
p = 0.70, odds = 2.33, logit = 0.85
p = 0.30, odds = 0.43, logit = -0.85
Logit
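As a quick sketch in R (the source of the example output shown later in this section), the conversions above can be done directly; the values are just those from the bullets:

p     <- c(0.50, 0.70, 0.30)
odds  <- p / (1 - p)      # 1.00, 2.33, 0.43
logit <- log(odds)        # 0, 0.85, -0.85; the same as qlogis(p)
plogis(logit)             # inverse logit recovers the probabilities .50, .70, .30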
So conceptually, putting things in our standard regression form:
Log odds = b0 + b1X
Now a one unit change in X leads to a b1 change in the log odds.
In terms of odds: odds(Y = 1) = e^(b0 + b1X)
In terms of probability: Pr(Y = 1) = e^(b0 + b1X) / (1 + e^(b0 + b1X))
Thus the logit, odds, and probability are different ways of expressing the same thing.
Logit
The raw coefficients for our predictor variables in the output are the amount of change in the log odds given a one unit increase in that predictor.
The coefficients are determined through an iterative process that finds the coefficients that best match the data at hand (maximum likelihood). It starts with a set of coefficients (e.g. ordinary least squares estimates) and then proceeds to alter them until there is almost no change in fit.
Coefficients
If desired, we also receive a different type of coefficient expressed in odds rather than log odds, which is more interpretable.
Anything above 1 suggests an increase in the odds of the event; anything less than 1, a decrease in the odds.
For example, if it is 1.14, moving 1 unit on the predictor variable increases the odds of the event by a factor of 1.14. Essentially it is the odds ratio for one value of X vs. the next value of X.
More intuitively, it refers to the percentage increase (or decrease) in the odds of becoming a member of such-and-such group with a one unit increase in the predictor variable:
% change in odds = (e^b - 1) * 100
Coefficients
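For instance, a hypothetical log-odds coefficient of b = 0.131 (a value chosen only to match the 1.14 example above) converts as follows:

b <- 0.131           # hypothetical log-odds coefficient
exp(b)               # odds ratio, about 1.14
(exp(b) - 1) * 100   # roughly a 14% increase in the odds per one-unit increase in the predictor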
Example: predicting art museum visitation by education, age, socioeconomic index, amount of TV watching, and political orientation (conservative-liberal), using the Gss93 dataset.
Key things to look for: model fit (pseudo-R²), coefficients, classification accuracy.
Performing a logistic regression is no different than multiple regression: once the appropriate function/menu is selected, one selects variables in the same fashion and may do sequential, stepwise, etc.
Example
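The output below appears to come from R's glm(). A minimal sketch of how such a model might be fit, assuming a data frame named gss93 (the variable names match the output that follows):

# intercept-only and full models for the art museum outcome
null_mod <- glm(artmuseum ~ 1, data = gss93, family = binomial)
full_mod <- glm(artmuseum ~ age + educ + polviews + tvhours + sei,
                data = gss93, family = binomial)

summary(full_mod)                          # coefficients on the log-odds scale
anova(null_mod, full_mod, test = "Chisq")  # likelihood ratio test of the predictors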
The data are split with 58.3% not visiting a museum in the past year; here 0 represents no visit, 1 a visit.
Education, TV hours, and SEI appear to be the stronger predictors.
The statistical test of the model suggests this is a significant improvement in prediction over just guessing 'no'. Nagelkerke's R² (not shown in the output) is .227.
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.723757   0.462248  -8.056  7.9e-16 ***
age         -0.005345   0.003858  -1.385   0.1659
educ         0.269442   0.029845   9.028  < 2e-16 ***
polviews    -0.071377   0.045361  -1.574   0.1156
tvhours     -0.067493   0.033881  -1.992   0.0464 *
sei          0.010229   0.004011   2.550   0.0108 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null deviance: 1834.2 on 1349 degrees of freedom
Residual deviance: 1584.8 on 1344 degrees of freedom
AIC: 1596.8

Analysis of Deviance Table

Model 1: artmuseum ~ 1
Model 2: artmuseum ~ age + educ + polviews + tvhours + sei
  Resid. Df Resid. Dev Df Deviance  P(>|Chi|)
1      1349    1834.16
2      1344    1584.78  5   249.38  7.466e-52
More to interpret
In terms of odds, we can see that one more year of education increases the odds of visiting a museum by ~30.9%.
By contrast, watching one more hour of TV per day would decrease the odds by ~6.5%.
Odds ratios, exp(coef):
(Intercept)    age   educ  polviews  tvhours    sei
      0.024  0.995  1.309     0.931    0.935  1.010

95% confidence intervals for the odds ratios:
             2.5 %  97.5 %
(Intercept)  0.010   0.059
age          0.987   1.002
educ         1.236   1.389
polviews     0.852   1.018
tvhours      0.873   0.998
sei          1.002   1.018
More to interpret
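Continuing the glm() sketch above, these quantities come from exponentiating the log-odds coefficients and their confidence limits:

exp(coef(full_mod))      # coefficients as odds ratios
exp(confint(full_mod))   # profile-likelihood 95% CIs, also on the odds ratio scale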
The plot (not shown here) displays the conditional probability of the outcome given particular values of education.
The classification table shows 928/1350 correct classifications, or roughly 69% accuracy. Actual No/Yes is on the rows, predicted No/Yes on the columns.
            Predicted No  Predicted Yes
Actual No            630            157
Actual Yes           265            298
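A sketch of how such a table might be produced from the fitted model, using .5 as the (assumed) cutoff for predicting a visit:

pred   <- ifelse(fitted(full_mod) > .5, "Yes", "No")                   # predicted visit or not
actual <- ifelse(model.frame(full_mod)$artmuseum == 1, "Yes", "No")    # observed outcome (0 = no, 1 = yes)
table(actual, pred)     # rows = actual, columns = predicted
mean(actual == pred)    # overall classification accuracy, about .69 here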
Related Methods
Multinomial regression: more than two categories
Ordinal regression: ordered categorical outcome
Loglinear analysis: all categorical variables
Poisson regression: useful when predicting an outcome variable representing counts (see the sketch below)
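A small sketch of a Poisson regression in R on simulated data (the variable names are made up for illustration):

set.seed(123)
dat <- data.frame(educ = round(rnorm(200, 13, 2)), age = round(runif(200, 18, 80)))
dat$nvisits <- rpois(200, lambda = exp(-2 + 0.15 * dat$educ))   # a count outcome

# the log of the expected count is modeled as linear in the predictors
count_mod <- glm(nvisits ~ educ + age, data = dat, family = poisson)
summary(count_mod)
exp(coef(count_mod))   # multiplicative effects on the expected count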
Related Methods
Discriminant function analysis is used for the same purposes as logistic regression; while more restrictive in its assumptions, it may be the better alternative if those assumptions are met.
Perhaps the larger issue is that neither typically performs as well as newer methods (e.g. classification trees, Bayesian belief networks) and sometimes older ones (e.g. cluster analytic techniques).
Classification trees, for example, are highly interpretable, more flexible, and usually more accurate.
Survival Analysis
A collection of techniques that deal with the time it takes for something to happen.
Other names: failure analysis, proportional hazards, event history analysis, or reliability analysis.
Examples: relapse, death, an employee leaving, onset of schizophrenic symptoms, obtaining a Ph.D.
Kinds of questions involve the proportions 'surviving' at various times, group differences, and the usual regression concerns with covariates (variable importance etc.).
Different types of survival analysis: life tables focus on the proportions of survivors at various times; other approaches predict survival time from various covariates, including coded categorical variables.
In a sense one can think of it as regression with time to failure as the DV, but in many cases not all cases have failed by the time data collection ends, and so different techniques are employed to deal with this 'censoring'.
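A minimal sketch with R's survival package, using its built-in lung cancer data (time = survival time, status = event/censoring indicator):

library(survival)

# Kaplan-Meier curves: proportion surviving over time, by sex
km <- survfit(Surv(time, status) ~ sex, data = lung)
plot(km)

# Cox proportional hazards regression with covariates
cox_mod <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(cox_mod)   # exp(coef) gives hazard ratios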
Generalized Additive Models (GAMs)
Additive regression models are nonparametric procedures that allow for a smoothed fit (e.g. a locally weighted scatterplot smoother, lowess) in which the outcome is seen as a sum of functions of the predictors.
While these provide a better fit, they become more problematic with more predictors, and interpretation is difficult beyond even two predictors.
As such, the procedure is geared more toward prediction than explanation.
Y = b0 + f1(X1) + f2(X2) + ... + fp(Xp)
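A short sketch with R's mgcv package on simulated data; each s() term is a smooth function of a predictor estimated from the data:

library(mgcv)

set.seed(1)
n  <- 200
x1 <- runif(n); x2 <- runif(n)
y  <- sin(2 * pi * x1) + x2^2 + rnorm(n, sd = 0.2)

gam_mod <- gam(y ~ s(x1) + s(x2))
summary(gam_mod)
plot(gam_mod, pages = 1)   # the estimated smooth functions f1 and f2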
Time series
Another 'time-oriented' analysis. In some sense you are already familiar with it in terms of a repeated measures model.
With this analysis we have many time points of data (e.g. 50 or more), and unlike our usual regression situations, we clearly do not have independence of observations.
The goals are to spot trends and seasonal cycles, note the effect of an intervention or other covariates, forecast, compare trends of different variables, look for changes in variance (heteroscedasticity), etc.
As an example, an autoregressive model would use previous time points of data to predict the variable in question at the next time point:
Y_t = b1*Y_(t-1) + b2*Y_(t-2) + b3*Y_(t-3) + ... + bp*Y_(t-p) + e_t
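A sketch of fitting an autoregressive model in R, here to a simulated AR(1) series:

set.seed(1)
y <- arima.sim(model = list(ar = 0.6), n = 200)   # simulated series with one autoregressive term

ar1 <- arima(y, order = c(1, 0, 0))   # fit an AR(1) model
ar1
predict(ar1, n.ahead = 10)            # forecast the next 10 time points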
Polynomial Regression
A form of nonlinear regression that is still accommodated by the general linear model; such models are still linear in the parameters.
Predictors take on powers, and the line will have up to p - 1 'bends' for a polynomial of order p. In the equation here there would be only 1 bend to the line.
While perhaps not as easily interpreted as a typical regression in some situations, given your exposure to 'interactions' in regression you can think of the X-Y relationship changing over the values of X.
Y = b0 + b1*X + b2*X^2 + e
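In R a quadratic term can be added directly to lm(), e.g. on simulated data:

set.seed(1)
x <- runif(100, -2, 2)
y <- 1 + 0.5 * x - 0.8 * x^2 + rnorm(100, sd = 0.3)

quad_mod <- lm(y ~ x + I(x^2))    # raw polynomial terms
# quad_mod <- lm(y ~ poly(x, 2))  # equivalent fit using orthogonal polynomials
summary(quad_mod)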
Quantile regression
OLS regression is an attempt to estimate the mean of the DV at the values of the predictors.
What if you estimated the median, i.e. the 50th percentile (.50 quantile), instead?
What about any other quantile? Quantile regression requires no new conceptual understanding outside of this; you are in a sense simply looking at the regression model at different quantiles of the DV.
Graphic: a model with fatalism and simplicity predicting depression (Ginzberg data). Plotted are the coefficients and their CIs as they change when the model is predicting the 10th, 25th, 50th, 75th, and 90th percentiles.
Fatalism has a stronger coefficient as one predicts higher levels of depression.
Red marks the coefficient and its CI from the regular OLS fit.
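A sketch with the quantreg package; Ginzberg is the depression data set distributed with the car package, and the variable names below are assumed to match it:

library(quantreg)
library(car)   # provides the Ginzberg data

# quantile regression at the 10th, 25th, 50th, 75th, and 90th percentiles
qr_mod <- rq(depression ~ fatalism + simplicity, data = Ginzberg,
             tau = c(.10, .25, .50, .75, .90))
summary(qr_mod)
plot(summary(qr_mod))   # coefficients and CIs across quantiles, as in the plot described above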
Multilevel modeling
Many names: random effects modeling, mixed effects modeling, hierarchical linear modeling.
Multilevel modeling can be conducted in a variety of situations: when dealing with classes, paired data, or repeated measures; essentially any time you think ANOVA, you might think of this instead.
Classic example: a model at the student level informed by class, school, district, etc.
A way to think about it is as contextual modeling across classes/categories (or individuals in a repeated measures setting), allowing the slope(s) and/or intercept to vary across classes (individuals).
One may see how the model changes, yet obtain a weighted average across the models.
In this way it has a connection to the Bayesian model averaging introduced before, and you might also recall the descriptive fashion in which we previously looked at interactions as a form of contextual modeling.
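A sketch with the lme4 package, using its built-in sleepstudy data (a repeated measures example: reaction time over days of sleep deprivation, measured within subjects):

library(lme4)

# intercept and Days slope are allowed to vary across subjects
mlm <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
summary(mlm)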
PCR, Canonical Correlation & PLS
Principal components regression performs a PC analysis on the predictors and uses the resulting components (or just the best ones) to predict the outcome. Components are linear combinations of the predictors that account for all the variance in the predictors. PCR produces independent components, and thus is a solution to collinearity.
Canonical correlation correlates two sets of variables, and to do so requires getting linear combinations of both an X and a Y set. It maximizes the correlation of a linear combination of the X set with a linear combination of the Y set. More than one pair of linear combinations of the sets can (and will) be correlated. Mostly a descriptive technique.
Partial least squares: PCR creates combinations of predictors without regard to the DV, and canonical correlation does not seek to maximize prediction from the Xs to the Ys. PLS is like a combination of the two, seeking to maximize prediction between the Xs and the Ys.
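A brief sketch of all three on simulated data, using base R's cancor() and the pls package (variable names are made up):

set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
dat$y1 <- dat$x1 + 0.5 * dat$x2 + rnorm(100)
dat$y2 <- dat$x2 - 0.5 * dat$x3 + rnorm(100)

library(pls)
pcr_mod <- pcr(y1 ~ x1 + x2 + x3, ncomp = 2, data = dat)    # principal components regression
pls_mod <- plsr(y1 ~ x1 + x2 + x3, ncomp = 2, data = dat)   # partial least squares

# canonical correlation between an X set and a Y set
cc <- cancor(as.matrix(dat[, c("x1", "x2", "x3")]), as.matrix(dat[, c("y1", "y2")]))
cc$cor   # the canonical correlations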
Path Analysis
Path analysis allows regression to encompass multiple outcomes and indirect effects.
Structural equation modeling performs a similar function with latent variables and accounts for measurement error in the observed variables.
[Path diagram with the variables Ability, Family Background, Motivation, Academic Coursework, and Achievement]
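A sketch of how a path model among these variables might be specified with the lavaan package; the particular paths below are illustrative assumptions, not necessarily those in the diagram, and dat is a hypothetical data frame of the observed variables:

library(lavaan)

# each line is a regression; indirect effects flow through the intermediate variables
model <- '
  Motivation ~ Ability + FamilyBackground
  AcademicCoursework ~ Ability + Motivation
  Achievement ~ Ability + Motivation + AcademicCoursework + FamilyBackground
'
fit <- sem(model, data = dat)
summary(fit, standardized = TRUE)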
And finally…
There are about a thousand other techniques out there. Which is appropriate for your situation will depend on a variety of factors, but typically several will be available to help answer your research questions.
As computational issues are minimal and all but the most cutting-edge techniques are readily available, one should attempt to find the one that will best answer your theoretical questions.