Chapter 15: Multiple Regression

© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied, duplicated, or posted to a publicly accessible website, in whole or in part.


Page 1: Chapter 15  Multiple Regression


Chapter 15: Multiple Regression

- Multiple Regression Model
- Least Squares Method
- Multiple Coefficient of Determination
- Model Assumptions
- Testing for Significance
- Using the Estimated Regression Equation for Estimation and Prediction
- Categorical Independent Variables
- Residual Analysis
- Logistic Regression

Page 2: Chapter 15  Multiple Regression


Multiple Regression

In this chapter we continue our study of regression analysis by considering situations involving two or more independent variables. This subject area, called multiple regression analysis, enables us to consider more factors and thus obtain better estimates than are possible with simple linear regression.

Page 3: Chapter 15  Multiple Regression


Multiple Regression Model

The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . , xp and an error term is:

y = β0 + β1x1 + β2x2 + . . . + βpxp + ε

where β0, β1, β2, . . . , βp are the parameters, and ε is a random variable called the error term.

Page 4: Chapter 15  Multiple Regression


Multiple Regression Equation

The equation that describes how the mean value of y is related to x1, x2, . . . , xp is:

E(y) = β0 + β1x1 + β2x2 + . . . + βpxp

Page 5: Chapter 15  Multiple Regression


Estimated Multiple Regression Equation

A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimators of the parameters β0, β1, β2, . . . , βp.

ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
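As a small illustration (with made-up coefficients, not the textbook's), evaluating an estimated multiple regression equation is just the intercept plus a dot product of the slope estimates with x1, . . . , xp:

```python
# Evaluate y-hat = b0 + b1*x1 + ... + bp*xp for hypothetical coefficients.
def predict(b, x):
    """b = [b0, b1, ..., bp]; x = [x1, ..., xp]."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))

b = [2.0, 0.5, 1.5]          # hypothetical b0, b1, b2
print(predict(b, [4, 3]))    # 2.0 + 0.5*4 + 1.5*3 = 8.5
```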

Page 6: Chapter 15  Multiple Regression


Estimation Process

Multiple Regression Model: y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
Multiple Regression Equation: E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
Unknown parameters are β0, β1, β2, . . . , βp.

Sample data: values of x1, x2, . . . , xp and y for each observation.

Estimated Multiple Regression Equation: ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
Sample statistics b0, b1, b2, . . . , bp provide estimates of the parameters β0, β1, β2, . . . , βp.

Page 7: Chapter 15  Multiple Regression


Least Squares Method

Least Squares Criterion: choose b0, b1, . . . , bp to minimize Σ(yi − ŷi)²

Computation of Coefficient Values: The formulas for the regression coefficients b0, b1, b2, . . . , bp involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.

Page 8: Chapter 15  Multiple Regression



The emphasis will be on how to interpret the computer output rather than on how to make the multiple regression computations.

Page 9: Chapter 15  Multiple Regression


Multiple Regression Model

Example: Programmer Salary Survey

A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's programmer aptitude test. The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for the sample of 20 programmers are shown on the next slide.

Page 10: Chapter 15  Multiple Regression


Multiple Regression Model

Exper. (Yrs.)   Test Score   Salary ($000s)
 4               78          24.0
 7              100          43.0
 1               86          23.7
 5               82          34.3
 8               86          35.8
10               84          38.0
 0               75          22.2
 1               80          23.1
 6               83          30.0
 6               91          33.0
 9               88          38.0
 2               73          26.6
10               75          36.2
 5               81          31.6
 6               74          29.0
 8               87          34.0
 4               79          30.1
 6               94          33.9
 3               70          28.2
 3               89          30.0

Page 11: Chapter 15  Multiple Regression


Multiple Regression Model

Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model:

y = β0 + β1x1 + β2x2 + ε

where:
y = annual salary ($000)
x1 = years of experience
x2 = score on programmer aptitude test

Page 12: Chapter 15  Multiple Regression


Solving for the Estimates of β0, β1, β2

Input Data (for each of the 20 programmers):

x1   x2    y
4    78   24
7   100   43
.    .     .
.    .     .
3    89   30

→ Computer Package for Solving Multiple Regression Problems → Least Squares Output: b0, b1, b2, R², etc.

Page 13: Chapter 15  Multiple Regression


Solving for the Estimates of β0, β1, β2

Regression Equation Output

Predictor     Coef      SE Coef   T        p
Constant      3.17394   6.15607   0.5156   0.61279
Experience    1.4039    0.19857   7.0702   1.9E-06
Test Score    0.25089   0.07735   3.2433   0.00478

Page 14: Chapter 15  Multiple Regression


Estimated Regression Equation

SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)

Note: Predicted salary will be in thousands of dollars.
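The slides leave the least squares computation to a software package. As a sketch, the same fit can be reproduced with NumPy on the 20 observations re-keyed from the data slide; numpy.linalg.lstsq stands in for the package (NumPy is an assumption here, not part of the slides):

```python
import numpy as np

# Programmer salary data re-keyed from the slide (20 observations).
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = [24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
          38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0]

# Design matrix with a leading column of 1s for the intercept b0.
X = np.column_stack([np.ones(20), exper, score])
y = np.array(salary)

# Least squares solution: the solver minimizes sum((y_i - yhat_i)^2).
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # approximately [3.174, 1.404, 0.251], matching the output slide
```

The solution agrees with SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE) up to rounding.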

Page 15: Chapter 15  Multiple Regression


Interpreting the Coefficients

In multiple regression analysis, we interpret each regression coefficient as follows: bi represents an estimate of the change in y corresponding to a 1-unit increase in xi when all other independent variables are held constant.

Page 16: Chapter 15  Multiple Regression


Interpreting the Coefficients

b1 = 1.404

Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).

Page 17: Chapter 15  Multiple Regression


Interpreting the Coefficients

b2 = 0.251

Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).

Page 18: Chapter 15  Multiple Regression


Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Page 19: Chapter 15  Multiple Regression


Multiple Coefficient of Determination

ANOVA Output

Analysis of Variance

SOURCE           DF   SS         MS        F       P
Regression        2   500.3285   250.164   42.76   0.000
Residual Error   17    99.45697    5.850
Total            19   599.7855

(SSR = 500.3285; SST = 599.7855)

Page 20: Chapter 15  Multiple Regression


Multiple Coefficient of Determination

R² = SSR/SST

R² = 500.3285/599.7855 = .83418
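The R² computation above is a single division of the ANOVA sums of squares:

```python
# R^2 from the ANOVA sums of squares on the slide.
ssr = 500.3285   # sum of squares due to regression
sst = 599.7855   # total sum of squares
r2 = ssr / sst
print(round(r2, 5))  # 0.83418
```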

Page 21: Chapter 15  Multiple Regression


Adjusted Multiple Coefficientof Determination

Adding independent variables, even ones that are not statistically significant, causes the prediction errors to become smaller, thus reducing the sum of squares due to error, SSE.

Because SSR = SST – SSE, when SSE becomes smaller, SSR becomes larger, causing R2 = SSR/SST to increase.

The adjusted multiple coefficient of determination compensates for the number of independent variables in the model.

Page 22: Chapter 15  Multiple Regression


Adjusted Multiple Coefficient of Determination

Ra² = 1 − (1 − R²)(n − 1)/(n − p − 1)

Ra² = 1 − (1 − .834179)(20 − 1)/(20 − 2 − 1) = .814671
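The adjustment is a one-line formula; with n = 20 observations and p = 2 independent variables:

```python
# Adjusted R^2 compensates for the number of independent variables p.
n, p = 20, 2
r2 = 0.834179
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2_adj, 6))  # 0.814671
```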

Page 23: Chapter 15  Multiple Regression


Assumptions About the Error Term ε

The error ε is a random variable with mean of zero.

The variance of ε, denoted by σ², is the same for all values of the independent variables.

The values of ε are independent.

The error ε is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by β0 + β1x1 + β2x2 + . . . + βpxp.

Page 24: Chapter 15  Multiple Regression


Testing for Significance

In simple linear regression, the F and t tests provide the same conclusion. In multiple regression, the F and t tests have different purposes.

Page 25: Chapter 15  Multiple Regression


Testing for Significance: F Test

The F test is referred to as the test for overall significance.

The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.

Page 26: Chapter 15  Multiple Regression


Testing for Significance: t Test

If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant. A separate t test is conducted for each of the independent variables in the model. We refer to each of these t tests as a test for individual significance.

Page 27: Chapter 15  Multiple Regression


Testing for Significance: F Test

Hypotheses:
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.

Test Statistic: F = MSR/MSE

Rejection Rule: Reject H0 if p-value < α or if F > Fα, where Fα is based on an F distribution with p d.f. in the numerator and n − p − 1 d.f. in the denominator.

Page 28: Chapter 15  Multiple Regression


F Test for Overall Significance

Hypotheses H0: 1 = 2 = 0

Ha: One or both of the parameters

is not equal to zero.

Rejection Rule For = .05 and d.f. = 2, 17; F.05 = 3.59

Reject H0 if p-value < .05 or F > 3.59

Page 29: Chapter 15  Multiple Regression


F Test for Overall Significance

ANOVA Output

Analysis of Variance

SOURCE           DF   SS         MS        F       P
Regression        2   500.3285   250.164   42.76   0.000
Residual Error   17    99.45697    5.850
Total            19   599.7855

The p-value (0.000) is used to test for overall significance.

Page 30: Chapter 15  Multiple Regression


F Test for Overall Significance

Test Statistic: F = MSR/MSE = 250.164/5.850 = 42.76

Conclusion: p-value < .05, so we can reject H0. (Also, F = 42.76 > 3.59.)
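As a sketch, the critical value and p-value behind this rejection rule can be checked directly; SciPy is used here as an illustrative tool (it is not part of the slides):

```python
from scipy import stats

msr, mse = 250.164, 5.850
F = msr / mse                       # 42.76 from the ANOVA table

# Critical value F.05 with 2 numerator and 17 denominator d.f. (p and n - p - 1)
f_crit = stats.f.ppf(0.95, 2, 17)   # about 3.59
p_value = stats.f.sf(F, 2, 17)      # upper-tail probability
print(round(F, 2), round(f_crit, 2), p_value < 0.05)
```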

Page 31: Chapter 15  Multiple Regression


Testing for Significance: t Test

Hypotheses:
H0: βi = 0
Ha: βi ≠ 0

Test Statistic: t = bi/sbi

Rejection Rule: Reject H0 if p-value < α, or if t < −tα/2 or t > tα/2, where tα/2 is based on a t distribution with n − p − 1 degrees of freedom.

Page 32: Chapter 15  Multiple Regression


t Test for Significance of Individual Parameters

Hypotheses:
H0: βi = 0
Ha: βi ≠ 0

Rejection Rule: For α = .05 and d.f. = 17, t.025 = 2.11. Reject H0 if p-value < .05, or if t < −2.11 or t > 2.11.
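The critical value and both test statistics can be sketched in a few lines; SciPy supplies the t distribution here as an illustrative stand-in for the tables (it is not part of the slides):

```python
from scipy import stats

# Critical value t.025 with n - p - 1 = 17 degrees of freedom
t_crit = stats.t.ppf(0.975, 17)    # about 2.11

# Test statistics t = b_i / s_bi from the regression output slide
t_exper = 1.4039 / 0.19857         # about 7.07
t_score = 0.25089 / 0.07735        # about 3.24
print(round(t_crit, 2), round(t_exper, 2), round(t_score, 2))
```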

Page 33: Chapter 15  Multiple Regression


t Test for Significance of Individual Parameters

Regression Equation Output

Predictor     Coef      SE Coef   T        p
Constant      3.17394   6.15607   0.5156   0.61279
Experience    1.4039    0.19857   7.0702   1.9E-06
Test Score    0.25089   0.07735   3.2433   0.00478

The t statistic and p-value are used to test for the individual significance of "Experience".

Page 34: Chapter 15  Multiple Regression


t Test for Significance of Individual Parameters

Test Statistics:

t = b1/sb1 = 1.4039/.19857 = 7.07

t = b2/sb2 = .25089/.07735 = 3.24

Conclusions: Reject both H0: β1 = 0 and H0: β2 = 0. Both independent variables are significant.

Page 35: Chapter 15  Multiple Regression


Testing for Significance: Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated (say, |r | > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.

Page 36: Chapter 15  Multiple Regression


Testing for Significance: Multicollinearity

Every attempt should be made to avoid including independent variables that are highly correlated.

If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.

Page 37: Chapter 15  Multiple Regression


Using the Estimated Regression Equation for Estimation and Prediction

The procedures for estimating the mean value of y and predicting an individual value of y in multiple regression are similar to those in simple regression. We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the corresponding value of ŷ as the point estimate.

Page 38: Chapter 15  Multiple Regression


Using the Estimated Regression Equation for Estimation and Prediction

The formulas required to develop interval estimates for the mean value of y and for an individual value of y are beyond the scope of the textbook. Software packages for multiple regression will often provide these interval estimates.

Page 39: Chapter 15  Multiple Regression


Categorical Independent Variables

In many situations we must work with categorical independent variables such as gender (male, female), method of payment (cash, check, credit card), etc. For example, x2 might represent gender, where x2 = 0 indicates male and x2 = 1 indicates female. In this case, x2 is called a dummy or indicator variable.

Page 40: Chapter 15  Multiple Regression


Categorical Independent Variables

Example: Programmer Salary Survey

As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether the individual has a graduate degree in computer science or information systems. The years of experience, the score on the programmer aptitude test, whether the individual has a relevant graduate degree, and the annual salary ($000) for each of the sampled 20 programmers are shown on the next slide.

Page 41: Chapter 15  Multiple Regression


Categorical Independent Variables

Exper. (Yrs.)   Test Score   Degr.   Salary ($000s)
 4               78          No      24.0
 7              100          Yes     43.0
 1               86          No      23.7
 5               82          Yes     34.3
 8               86          Yes     35.8
10               84          Yes     38.0
 0               75          No      22.2
 1               80          No      23.1
 6               83          No      30.0
 6               91          Yes     33.0
 9               88          Yes     38.0
 2               73          No      26.6
10               75          Yes     36.2
 5               81          No      31.6
 6               74          No      29.0
 8               87          Yes     34.0
 4               79          No      30.1
 6               94          Yes     33.9
 3               70          No      28.2
 3               89          No      30.0

Page 42: Chapter 15  Multiple Regression


Estimated Regression Equation

ŷ = b0 + b1x1 + b2x2 + b3x3

where:
y = annual salary ($1000s)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if individual does not have a graduate degree; 1 if individual does have a graduate degree

x3 is a dummy variable.

Page 43: Chapter 15  Multiple Regression


Categorical Independent Variables

ANOVA Output

Analysis of Variance

SOURCE           DF   SS         MS        F       P
Regression        3   507.8960   169.299   29.48   0.000
Residual Error   16    91.8895     5.743
Total            19   599.7855

R² = 507.896/599.7855 = .8468 (previously, R Square = .8342)

Ra² = 1 − (1 − .8468)(20 − 1)/(20 − 3 − 1) = .8181 (previously, Adjusted R Square = .815)

Page 44: Chapter 15  Multiple Regression


Categorical Independent Variables

Regression Equation Output

Predictor      Coef    SE Coef   T       p
Constant       7.945   7.382     1.076   0.298
Experience     1.148   0.298     3.856   0.001
Test Score     0.197   0.090     2.191   0.044
Grad. Degr.    2.280   1.987     1.148   0.268

The coefficient for Grad. Degr. is not significant (p = .268 > .05).

Page 45: Chapter 15  Multiple Regression


More Complex Categorical Variables

If a categorical variable has k levels, k - 1 dummy variables are required, with each dummy variable being coded as 0 or 1.

For example, a variable with levels A, B, and C could be represented by x1 and x2 values of (0, 0) for A, (1, 0) for B, and (0,1) for C.

Care must be taken in defining and interpreting the dummy variables.
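A sketch of the k − 1 rule using pandas (pandas and its get_dummies helper are illustrative tools, not part of the slides): with drop_first=True one level becomes the baseline, so k = 3 levels produce 2 dummy columns.

```python
import pandas as pd

# k = 3 levels -> k - 1 = 2 dummy variables; the first level is the baseline.
degree = pd.Series(["Bachelor's", "Master's", "Ph.D.", "Master's"])
dummies = pd.get_dummies(degree, drop_first=True)  # drops "Bachelor's"
print(dummies.columns.tolist())  # ["Master's", 'Ph.D.']
```

A row of all zeros then encodes the baseline level (Bachelor's), matching the (0, 0) coding on the next slide.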

Page 46: Chapter 15  Multiple Regression


More Complex Categorical Variables

For example, a variable indicating level of education could be represented by x1 and x2 values as follows:

Highest Degree   x1   x2
Bachelor's        0    0
Master's          1    0
Ph.D.             0    1

Page 47: Chapter 15  Multiple Regression


Residual Analysis

For simple linear regression, the residual plot against ŷ and the residual plot against x provide the same information. In multiple regression analysis it is preferable to use the residual plot against ŷ to determine if the model assumptions are satisfied.

Page 48: Chapter 15  Multiple Regression


Standardized Residual Plot Against ŷ

Standardized residuals are frequently used in residual plots for purposes of:
• Identifying outliers (typically, standardized residuals < −2 or > +2)
• Providing insight about the assumption that the error term ε has a normal distribution

The computation of the standardized residuals in multiple regression analysis is too complex to be done by hand. Excel's Regression tool can be used.

Page 49: Chapter 15  Multiple Regression


Standardized Residual Plot Against ŷ

Residual Output

Observation   Predicted Y   Residuals   Standard Residuals
1             27.89626      −3.89626    −1.771707
2             37.95204       5.047957    2.295406
3             26.02901      −2.32901    −1.059048
4             32.11201       2.187986    0.994921
5             36.34251      −0.54251    −0.246689

Page 50: Chapter 15  Multiple Regression


Standardized Residual Plot Against ŷ

[Standardized residual plot: standard residuals (roughly −3 to +3) plotted against predicted salary (0 to 50). One point, with standardized residual above +2, is labeled as an outlier.]

Page 51: Chapter 15  Multiple Regression


Logistic Regression

Logistic regression can be used to model situations in which the dependent variable, y, may only assume two discrete values, such as 0 and 1.

In many ways logistic regression is like ordinary regression. It requires a dependent variable, y, and one or more independent variables.

The ordinary multiple regression model is not applicable.

Page 52: Chapter 15  Multiple Regression


Logistic Regression

Logistic Regression Equation

The relationship between E(y) and x1, x2, . . . , xp is better described by the following nonlinear equation:

E(y) = e^(β0 + β1x1 + β2x2 + . . . + βpxp) / [1 + e^(β0 + β1x1 + β2x2 + . . . + βpxp)]
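The equation above is the logistic (sigmoid) function applied to a linear predictor; a minimal sketch with hypothetical coefficients (not the textbook's):

```python
import math

# E(y) for logistic regression: exp(z) / (1 + exp(z)), z the linear predictor.
def logistic(b, x):
    """b = [b0, b1, ..., bp]; x = [x1, ..., xp]; returns a value in (0, 1)."""
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return math.exp(z) / (1 + math.exp(z))

print(logistic([0.0], []))  # z = 0 gives probability 0.5
```

Because the output always lies between 0 and 1, it can be interpreted as a probability, which the next slide makes precise.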

Page 53: Chapter 15  Multiple Regression


Logistic Regression

Interpretation of E(y) as a Probability in Logistic Regression

If the two values of y are coded as 0 or 1, the value of E(y) provides the probability that y = 1 given a particular set of values for x1, x2, . . . , xp:

ŷ = estimate of P(y = 1 | x1, x2, . . . , xp)

Page 54: Chapter 15  Multiple Regression


Logistic Regression

Estimated Logistic Regression Equation

A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimators of the parameters β0, β1, β2, . . . , βp.

ŷ = e^(b0 + b1x1 + b2x2 + . . . + bpxp) / [1 + e^(b0 + b1x1 + b2x2 + . . . + bpxp)]

Page 55: Chapter 15  Multiple Regression


Logistic Regression

Example: Simmons Stores

Simmons' catalogs are expensive and Simmons would like to send them to only those customers who have the highest probability of making a $200 purchase using the discount coupon included in the catalog. Simmons' management thinks that annual spending at Simmons Stores and whether a customer has a Simmons credit card are two variables that might be helpful in predicting whether a customer who receives the catalog will use the coupon to make a $200 purchase.

Page 56: Chapter 15  Multiple Regression


Logistic Regression

Example: Simmons Stores

Simmons conducted a study by sending out 100 catalogs, 50 to customers who have a Simmons credit card and 50 to customers who do not have the card. At the end of the test period, Simmons noted for each of the 100 customers:

1) the amount the customer spent last year at Simmons,
2) whether the customer had a Simmons credit card, and
3) whether the customer made a $200 purchase.

A portion of the test data is shown on the next slide.

Page 57: Chapter 15  Multiple Regression


Logistic Regression

Simmons Test Data (partial)

Customer   Annual Spending ($1000) x1   Simmons Credit Card x2   $200 Purchase y
 1         2.291                        1                        0
 2         3.215                        1                        0
 3         2.135                        1                        0
 4         3.924                        0                        0
 5         2.528                        1                        0
 6         2.473                        0                        1
 7         2.384                        0                        0
 8         7.076                        0                        0
 9         1.182                        1                        1
10         3.345                        0                        0

Page 58: Chapter 15  Multiple Regression


Logistic Regression

Simmons Logistic Regression Table (using Minitab)

Predictor   Coef      SE Coef   Z       p       Odds Ratio   95% CI Lower   95% CI Upper
Constant    -2.1464   0.5772    -3.72   0.000
Spending     0.3416   0.1287     2.66   0.008   1.41         1.09           1.81
Card         1.0987   0.4447     2.47   0.013   3.00         1.25           7.17

Log-Likelihood = −60.487
Test that all slopes are zero: G = 13.628, DF = 2, P-Value = 0.001

Page 59: Chapter 15  Multiple Regression


Logistic Regression

Simmons Estimated Logistic Regression Equation

ŷ = e^(−2.1464 + 0.3416x1 + 1.0987x2) / [1 + e^(−2.1464 + 0.3416x1 + 1.0987x2)]

Page 60: Chapter 15  Multiple Regression


Logistic Regression

Using the Estimated Logistic Regression Equation

• For customers that spend $2000 annually and do not have a Simmons credit card:

ŷ = e^(−2.1464 + 0.3416(2) + 1.0987(0)) / [1 + e^(−2.1464 + 0.3416(2) + 1.0987(0))] = 0.1880

• For customers that spend $2000 annually and do have a Simmons credit card:

ŷ = e^(−2.1464 + 0.3416(2) + 1.0987(1)) / [1 + e^(−2.1464 + 0.3416(2) + 1.0987(1))] = 0.4099
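The two probabilities can be reproduced by plugging the Simmons coefficients into the logistic function:

```python
import math

def p_hat(x1, x2):
    # Simmons estimated logistic regression equation from the slide.
    z = -2.1464 + 0.3416 * x1 + 1.0987 * x2
    return math.exp(z) / (1 + math.exp(z))

print(round(p_hat(2, 0), 4))  # about 0.1880 (no credit card)
print(round(p_hat(2, 1), 4))  # about 0.4099 (credit card)
```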

Page 61: Chapter 15  Multiple Regression


Logistic Regression

Testing for Significance

Hypotheses:
H0: β1 = β2 = 0
Ha: One or both of the parameters is not equal to zero.

Test Statistic: z = bi/sbi

Rejection Rule: Reject H0 if p-value < α.

Page 62: Chapter 15  Multiple Regression


Logistic Regression

Testing for Significance

Conclusions:

For independent variable x1: z = 2.66 and the p-value = .008. Hence, we reject H0: β1 = 0. In other words, x1 is statistically significant.

For independent variable x2: z = 2.47 and the p-value = .013. Hence, we reject H0: β2 = 0. In other words, x2 is also statistically significant.

Page 63: Chapter 15  Multiple Regression


Logistic Regression

Odds in Favor of an Event Occurring

odds = P(y = 1 | x1, x2, . . . , xp) / P(y = 0 | x1, x2, . . . , xp)
     = P(y = 1 | x1, x2, . . . , xp) / [1 − P(y = 1 | x1, x2, . . . , xp)]

Odds Ratio

Odds Ratio = odds1/odds0

Page 64: Chapter 15  Multiple Regression


Logistic Regression

Estimated Probabilities

Annual Spending    $1000    $2000    $3000    $4000    $5000    $6000    $7000
Credit Card: Yes   0.3305   0.4099   0.4943   0.5791   0.6594   0.7315   0.7931
Credit Card: No    0.1413   0.1880   0.2457   0.3144   0.3922   0.4759   0.5610

(The $2000 column values, 0.4099 and 0.1880, were computed earlier.)

Page 65: Chapter 15  Multiple Regression


Logistic Regression

Comparing Odds

Suppose we want to compare the odds of making a $200 purchase for customers who spend $2000 annually and have a Simmons credit card to the odds of making a $200 purchase for customers who spend $2000 annually and do not have a Simmons credit card.

Estimate of odds1 = .4099/(1 − .4099) = .6946

Estimate of odds0 = .1880/(1 − .1880) = .2315

Estimate of odds ratio = .6946/.2315 = 3.00
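The odds-ratio computation above can be checked directly; note it agrees with e^b2 (the Odds Ratio column Minitab reports for Card), since in logistic regression the odds ratio for a one-unit change in xi equals e^bi:

```python
import math

p_card = 0.4099      # P(purchase | $2000 spending, card), from the slide
p_nocard = 0.1880    # P(purchase | $2000 spending, no card)

odds1 = p_card / (1 - p_card)        # about .6946
odds0 = p_nocard / (1 - p_nocard)    # about .2315
print(round(odds1 / odds0, 2))       # 3.0

# The same odds ratio from the Card coefficient b2 = 1.0987:
print(round(math.exp(1.0987), 2))    # 3.0
```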