
Page 1: Logistic Regression

Logistic Regression

Database Marketing

Instructor: N. Kumar

Page 2: Logistic Regression

Logistic Regression vs TGDA

Two-Group Discriminant Analysis (TGDA) implicitly assumes that the Xs are multivariate normally (MVN) distributed.

This assumption is violated if the Xs are categorical variables.

Logistic regression does not impose any restriction on the distribution of the Xs.

Logistic regression is therefore the recommended approach if at least some of the Xs are categorical variables.

Page 3: Logistic Regression

Data

Favored Stock        Less Favored Stock
Success   Size       Success   Size
   1       1            0       1
   1       1            0       0
   1       1            0       0
   1       1            0       0
   1       1            0       0
   1       1            0       0
   1       1            0       0
   1       1            0       0
   1       1            0       0
   1       1            0       0
   1       0            0       0
   1       0            0       0

Page 4: Logistic Regression

Contingency Table

Type of Stock     Large   Small   Total
Preferred           10      2      12
Not Preferred        1     11      12
Total               11     13      24

Page 5: Logistic Regression

Basic Concepts

Probability of being a preferred stock = 12/24 = 0.5

Probability that a company’s stock is preferred given that the company is large = 10/11 = 0.909

Probability that a company’s stock is preferred given that the company is small = 2/13 = 0.154

Page 6: Logistic Regression

Concepts … contd.

Odds of a preferred stock = 12/12 = 1

Odds of a preferred stock given that the company is large = 10/1 = 10

Odds of a preferred stock given that the company is small = 2/11 = 0.182

Page 7: Logistic Regression

Odds and Probability

Odds(Event) = Prob(Event)/(1-Prob(Event))

Prob(Event) = Odds(Event)/(1+Odds(Event))
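A minimal Python sketch of these two conversions; the function names are illustrative, not from the slides:

```python
def odds_from_prob(p: float) -> float:
    """Odds(Event) = Prob(Event) / (1 - Prob(Event))."""
    return p / (1 - p)

def prob_from_odds(odds: float) -> float:
    """Prob(Event) = Odds(Event) / (1 + Odds(Event))."""
    return odds / (1 + odds)

# Examples from the preferred-stock data:
print(odds_from_prob(10 / 11))   # ≈ 10.0  (odds of preferred given large)
print(prob_from_odds(0.182))     # ≈ 0.154 (probability of preferred given small)
```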

Page 8: Logistic Regression

Logistic Regression

Take the natural log of the odds:
ln(odds(Preferred|Large)) = ln(10) = 2.303
ln(odds(Preferred|Small)) = ln(0.182) = -1.704

Combining these relationships (with Size = 1 for a large company and Size = 0 for a small one):
ln(odds(Preferred|Size)) = -1.704 + 4.007*Size

The log of the odds is a linear function of size. The coefficient of size can be interpreted like a coefficient in regression analysis.
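A short Python check of this arithmetic, built only from the conditional odds computed on the previous slides:

```python
import math

log_odds_large = math.log(10)      # ln(odds(Preferred | Large)) ≈ 2.303
log_odds_small = math.log(2 / 11)  # ln(odds(Preferred | Small)) ≈ -1.705

intercept = log_odds_small                # log odds when Size = 0 (small company)
slope = log_odds_large - log_odds_small   # change in log odds when Size goes from 0 to 1

print(intercept, slope)  # ≈ -1.70 and 4.01, matching ln(odds) = -1.704 + 4.007*Size
```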

Page 9: Logistic Regression

Interpretation

Positive sign: ln(odds) is increasing in the size of the company, i.e. a large company is more likely to have a preferred stock than a small company.

The magnitude of the coefficient gives a measure of how much more likely.

Page 10: Logistic Regression

General Model

ln(odds) = β0 + β1X1 + β2X2 + … + βkXk   (1)

Recall: Odds = p/(1-p)

ln(p/(1-p)) = β0 + β1X1 + β2X2 + … + βkXk   (2)

Solving (2) for p:

p = e^(β0 + β1X1 + … + βkXk) / (1 + e^(β0 + β1X1 + … + βkXk))

Equivalently, p = 1 / (1 + e^-(β0 + β1X1 + … + βkXk))
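A minimal Python sketch of the expression for p (the function name `logistic_probability` is illustrative); the single-predictor example reuses the coefficients from Page 8:

```python
import math

def logistic_probability(x, betas):
    """p = e^(b0 + b1*x1 + ... + bk*xk) / (1 + e^(b0 + b1*x1 + ... + bk*xk))."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return math.exp(z) / (1 + math.exp(z))

# Single-predictor example:
print(logistic_probability([1], [-1.704, 4.007]))  # P(preferred | large) ≈ 0.909
print(logistic_probability([0], [-1.704, 4.007]))  # P(preferred | small) ≈ 0.154
```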

Page 11: Logistic Regression

Logistic Function

[Figure: the logistic function, plotting p (from 0 to 1) against X (from -50 to 50); the curve is S-shaped and bounded between 0 and 1.]

Page 12: Logistic Regression

Estimation

In linear regression, coefficients are estimated by minimizing the sum of squared errors.

Since p is non-linear in the parameters, we need a non-linear estimation technique:
Maximum-Likelihood Approach
Non-Linear Least Squares

Page 13: Logistic Regression

Maximum Likelihood Approach

Conditional on the parameters β, write out the probability of observing the data.

Write this probability out for each observation.

Multiply the probabilities of the observations together to get the joint probability of observing the data conditional on β.

Find the β that maximizes this conditional probability of realizing the data.
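A minimal sketch of this procedure in Python, using the single-predictor data from Page 3; scipy's general-purpose optimizer stands in here for the specialized routines a statistics package would use:

```python
import numpy as np
from scipy.optimize import minimize

# Data from Page 3: Success = 1 for preferred stocks, Size = 1 for large companies.
size    = np.array([1]*10 + [0]*2 + [1]*1 + [0]*11)
success = np.array([1]*12 + [0]*12)

def negative_log_likelihood(beta):
    """Minus the log of the joint probability of the data, conditional on beta."""
    z = beta[0] + beta[1] * size
    p = 1 / (1 + np.exp(-z))                 # P(success = 1 | size, beta) for each observation
    return -np.sum(success * np.log(p) + (1 - success) * np.log(1 - p))

result = minimize(negative_log_likelihood, x0=np.zeros(2))  # maximizing L = minimizing -ln L
print(result.x)  # ≈ [-1.70, 4.01], matching the estimates quoted on the later slides
```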

Page 14: Logistic Regression

Logistic Regression

Logistic Regression with one categorical explanatory variable reduces to an analysis of the contingency table

Page 15: Logistic Regression

Interpretation of Results

Look at the -2 Log L statistic:
Intercept only: 33.271
Intercept and covariates: 17.864
Difference: 15.407 with 1 DF (p = 0.0001)

This means the size variable explains a substantial part of the variation in the response.
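The difference in -2 Log L is a likelihood-ratio chi-square statistic, so the reported p-value can be checked with a short scipy sketch:

```python
from scipy.stats import chi2

neg2logl_intercept_only  = 33.271
neg2logl_with_covariates = 17.864

lr_statistic = neg2logl_intercept_only - neg2logl_with_covariates  # 15.407
p_value = chi2.sf(lr_statistic, df=1)                              # P(chi-square with 1 DF > 15.407)
print(lr_statistic, p_value)                                       # 15.407, ≈ 0.0001
```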

Page 16: Logistic Regression

Do the Variables Have a Significant Impact?

This is like testing whether the coefficients in a regression model are different from zero.

Look at the output from the Analysis of Maximum Likelihood Estimates.

Loosely, the Pr > Chi-Square column gives the probability of realizing the estimate in the Parameter Estimate column if the true coefficient were zero; if this value is < 0.05, the estimate is considered significant.

Page 17: Logistic Regression

Other things to Look for

Akaike’s Information Criterion (AIC) and Schwarz’s Criterion (SC) are like adjusted R-squared: there is a penalty for having additional covariates.

The larger the improvement in these statistics when moving from the intercept-only column to the intercept-and-covariates column, the better the model fit.

Page 18: Logistic Regression

Interpretation of the Parameter Estimates

ln(p/(1-p)) = -1.705 + 4.007*Size

p/(1-p) = e^(-1.705) * e^(4.007*Size)

For a unit increase in Size, the odds of being a favored stock are multiplied by e^4.007 = 54.982.
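A quick Python check of this odds-ratio interpretation, using the fitted coefficients from the slide:

```python
import math

intercept, slope = -1.705, 4.007

odds_small = math.exp(intercept)          # odds when Size = 0 (small company) ≈ 0.182
odds_large = math.exp(intercept + slope)  # odds when Size = 1 (large company) ≈ 10
odds_ratio = math.exp(slope)              # multiplicative change in odds per unit of Size

print(odds_small, odds_large, odds_ratio)  # ≈ 0.182, 9.99, 54.98
```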

Page 19: Logistic Regression

Predicted Probabilities and Observed Responses

The response variable (success) classifies an observation as an event or a no-event.

A concordant pair is an event/no-event pair in which the event has a higher predicted probability (PHAT) than the no-event.

The higher the concordant pair percentage, the better.
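A minimal Python sketch of counting concordant pairs, assuming you already have the observed responses and predicted probabilities (PHAT) in two lists; tied pairs are ignored here for simplicity:

```python
def percent_concordant(success, phat):
    """Share of (event, no-event) pairs in which the event has the higher PHAT."""
    events     = [p for y, p in zip(success, phat) if y == 1]
    non_events = [p for y, p in zip(success, phat) if y == 0]
    pairs = [(pe, pn) for pe in events for pn in non_events]
    concordant = sum(1 for pe, pn in pairs if pe > pn)
    return 100 * concordant / len(pairs)

# Example: percent_concordant([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]) -> 75.0
```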

Page 20: Logistic Regression

Classification

For a set of new observations where you have information on size alone, you can use the model to predict the probability that success = 1, i.e. that the stock is favored.

If PHAT > 0.5, classify the observation as success = 1; else success = 0.
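A sketch of this classification rule in Python, reusing the single-predictor coefficients from Page 18 (the 0.5 cutoff is the one given on the slide):

```python
import math

def predict_phat(size, intercept=-1.705, slope=4.007):
    """Predicted probability that success = 1 (the stock is favored) given size."""
    z = intercept + slope * size
    return 1 / (1 + math.exp(-z))

def classify(size, cutoff=0.5):
    """1 (favored) if PHAT exceeds the cutoff, else 0."""
    return 1 if predict_phat(size) > cutoff else 0

print(classify(1), classify(0))  # 1 for a large company, 0 for a small one
```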

Page 21: Logistic Regression

Logistic Regression with multiple independent variables

The independent variables are a mixture of continuous and categorical variables.

Page 22: Logistic Regression

Data

Favored Stock               Less Favored Stock
Success   Size    fp        Success   Size    fp
   1       1     0.58          0       1     2.28
   1       1     2.8           0       0     1.06
   1       1     2.77          0       0     1.08
   1       1     3.5           0       0     0.07
   1       1     2.67          0       0     0.16
   1       1     2.97          0       0     0.7
   1       1     2.18          0       0     0.75
   1       1     3.24          0       0     1.61
   1       1     1.49          0       0     0.34
   1       1     2.19          0       0     1.15
   1       0     2.7           0       0     0.44
   1       0     2.57          0       0     0.86

Page 23: Logistic Regression

General Model

ln(odds) = β0 + β1*Size + β2*FP

ln(p/(1-p)) = β0 + β1*Size + β2*FP

Solving for p:

p = e^(β0 + β1*Size + β2*FP) / (1 + e^(β0 + β1*Size + β2*FP))

Equivalently, p = 1 / (1 + e^-(β0 + β1*Size + β2*FP))
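A sketch of fitting this two-predictor model in Python with statsmodels, using the 24 observations from the Data slide; the slides quote output from a statistics package, so the code below is only an illustrative alternative:

```python
import numpy as np
import statsmodels.api as sm

# Data from Page 22: the 12 favored stocks (success = 1) followed by the 12 less favored (success = 0).
success = np.array([1]*12 + [0]*12)
size    = np.array([1]*10 + [0]*2 + [1]*1 + [0]*11)
fp      = np.array([0.58, 2.8, 2.77, 3.5, 2.67, 2.97, 2.18, 3.24, 1.49, 2.19, 2.7, 2.57,
                    2.28, 1.06, 1.08, 0.07, 0.16, 0.7, 0.75, 1.61, 0.34, 1.15, 0.44, 0.86])

X = sm.add_constant(np.column_stack([size, fp]))  # columns: intercept, Size, FP
model = sm.Logit(success, X).fit()                # maximum-likelihood estimation
print(model.params)                               # estimates of beta0, beta1 (Size), beta2 (FP)
print(model.summary())
```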

Page 24: Logistic Regression

Estimation & Interpretation of the Results

Identical to the case with one categorical variable

Page 25: Logistic Regression

Summary

Logistic regression or discriminant analysis?

The techniques differ in their underlying assumptions about the distribution of the explanatory (independent) variables.

Use logistic regression if you have a mix of categorical and continuous variables.