Chapter 15 Multiple Regression. Regression: Multiple Regression Model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$

Chapter 15 Multiple Regression

Uploaded by kerrie-bryan on 18-Dec-2015

Regression

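Multiple Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$

Multiple Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

Estimated Multiple Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$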
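The estimates $b_0, \dots, b_p$ are the least-squares values. As a minimal sketch, assuming the normal equations $(X^\top X)b = X^\top y$ are solved directly by Gauss-Jordan elimination (statistical packages typically use more numerically stable methods such as a QR decomposition), the following pure-Python code fits the estimated equation. The function names `fit_ols` and `predict` and the small data set are hypothetical, chosen only for illustration.

```python
def fit_ols(X, y):
    """Return least-squares coefficients [b0, b1, ..., bp] for rows X and responses y."""
    # Prepend a 1 to each row so that b0 acts as the intercept.
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Build the augmented system [X'X | X'y] of the normal equations.
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(b, x):
    """Evaluate y-hat = b0 + b1*x1 + ... + bp*xp for one observation x."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2 (no error term).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3, 7, 7, 11, 12]
b = fit_ols(X, y)
print([round(v, 6) for v in b])  # [2.0, 3.0, -1.0]
```

Because the hypothetical data contain no error term, the fit recovers the generating coefficients exactly (up to floating-point rounding).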

Car Data

MPG   Weight (lbs)   Year   Cylinders
18    3504           70     8
15    3693           70     8
18    3436           70     8
16    3433           70     8
17    3449           70     8
15    4341           70     8
14    4354           70     8
14    4312           70     8
14    4425           70     8
15    3850           70     8
...   ...            ...    ...

(continuing on for 397 observations)

Multiple Regression, Example

MPG regressed on Weight:

             Coefficients   Standard Error   t Stat
Intercept    46.3           0.800            57.8
Weight       -0.00765       0.000259         -29.4

R Square 0.687

MPG regressed on Weight and Year:

             Coefficients   Standard Error   t Stat
Intercept    -14.7          3.96             -3.71
Weight       -0.00665       0.000214         -31.0
Year         0.763          0.0490           15.5

R Square 0.807

Multiple Regression, Example

             Coefficients   Standard Error   t Stat
Intercept    -14.4          4.03             -3.58
Weight       -0.00652       0.000460         -14.1
Year         0.760          0.0498           15.2
Cylinders    -0.0741        0.232            -0.319

R Square 0.807

Predicted MPG for a car weighing 4,000 lbs, built in 1980 (coded Year = 80), with 6 cylinders:
$\hat{y} = -14.4 - 0.00652(4000) + 0.760(80) - 0.0741(6) = -14.4 - 26.08 + 60.8 - 0.4446 \approx 19.88$
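The prediction is just the estimated equation evaluated at one car's values. A small sketch, using the coefficients from the Weight/Year/Cylinders table; the function name `predicted_mpg` is hypothetical, and Year follows the data set's two-digit coding (80 means 1980).

```python
# Coefficients from the Weight/Year/Cylinders model above.
B0, B_WEIGHT, B_YEAR, B_CYLINDERS = -14.4, -0.00652, 0.760, -0.0741


def predicted_mpg(weight, year, cylinders):
    """Plug one car's values into the estimated regression equation.

    year uses the data set's two-digit coding (80 means 1980).
    """
    return B0 + B_WEIGHT * weight + B_YEAR * year + B_CYLINDERS * cylinders


print(round(predicted_mpg(4000, 80, 6), 2))  # 19.88
```

Entering the year as 1980 instead of 80 would add about $0.760 \times 1900$ to the prediction, so the coding must match the data the model was estimated on.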

Sums of Squares

$SSE = \sum_i (y_i - \hat{y}_i)^2$

$SSR = \sum_i (\hat{y}_i - \bar{y})^2$

$SST = \sum_i (y_i - \bar{y})^2$

$SST = SSR + SSE$
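A sketch of the three sums of squares, assuming the fitted values come from a least-squares fit with an intercept (the decomposition $SST = SSR + SSE$ relies on that). For brevity the hypothetical data use a single predictor; the identity holds the same way with $p$ predictors. The name `sums_of_squares` is illustrative only.

```python
def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed values y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sst, ssr, sse


# Hypothetical data; the fitted values come from the least-squares line
# y-hat = 2.2 + 0.6x, so the decomposition holds (up to rounding).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]
sst, ssr, sse = sums_of_squares(y, y_hat)
print(round(sst, 6), round(ssr, 6), round(sse, 6))  # 6.0 3.6 2.4
```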

Multiple Coefficient of Determination: the share of the variation in $y$ explained by the estimated model.

$R^2 = SSR/SST$

Multiple Correlation Coefficient: $R = \sqrt{R^2} = r_{y\hat{y}}$

The correlation coefficient of the actual and predicted values
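A sketch that computes $R^2$ from the car example's ANOVA sums of squares ($SSR = 19382$, $SST = 24021$) and checks, on a small hypothetical least-squares fit, that the correlation between actual and predicted values equals $\sqrt{R^2}$. The helper names `r_squared` and `pearson` are illustrative, not from the source.

```python
import math


def r_squared(ssr, sst):
    """Multiple coefficient of determination R^2 = SSR/SST."""
    return ssr / sst


def pearson(u, v):
    """Sample correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Car example: SSR and SST from the ANOVA example in this chapter.
r2 = r_squared(19382, 24021)
print(round(r2, 3), round(math.sqrt(r2), 3))  # 0.807 0.898

# Hypothetical least-squares fit (y-hat = 2.2 + 0.6x) with SSR = 3.6, SST = 6.0:
# the correlation of actual and predicted values equals sqrt(R^2).
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
print(round(pearson(y, y_hat), 4), round(math.sqrt(3.6 / 6.0), 4))  # 0.7746 0.7746
```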

Adjusted Multiple Coefficient of Determination

$R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$

Regression Statistics
Multiple R          0.898
R Square            0.807
Adjusted R Square   0.805
Standard Error      3.44
Observations        397
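A sketch reproducing two entries of the Regression Statistics block for the car model ($n = 397$, $p = 3$), assuming "Standard Error" is the standard error of the estimate, $\sqrt{MSE} = \sqrt{SSE/(n - p - 1)}$, and taking $SSR$, $SST$, and $SSE$ from the ANOVA example. Function names are hypothetical.

```python
import math


def adjusted_r_squared(r2, n, p):
    """R_a^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1) for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def standard_error(sse, n, p):
    """Standard error of the estimate: sqrt(MSE) = sqrt(SSE / (n - p - 1))."""
    return math.sqrt(sse / (n - p - 1))


# Car example: n = 397 cars, p = 3 predictors, sums of squares from the ANOVA example.
n, p = 397, 3
r2 = 19382 / 24021
print(round(adjusted_r_squared(r2, n, p), 3))  # 0.805
print(round(standard_error(4638, n, p), 2))    # 3.44
```

The adjustment penalizes each added predictor through the $n - p - 1$ denominator, so $R_a^2$ can fall when a variable that explains little is added.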

F Test for Overall Significance

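$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
$H_a$: One or more of the parameters is not equal to zero

Reject $H_0$ if $F > F_\alpha$, or reject $H_0$ if p-value $< \alpha$, where $F_\alpha$ comes from an F distribution with $p$ numerator and $n - p - 1$ denominator degrees of freedom

$F = MSR/MSE$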

ANOVA Table for Multiple Regression Model

Source       Sum of Squares   Degrees of Freedom   Mean Squares            F
Regression   SSR              p                    MSR = SSR/p             F = MSR/MSE
Error        SSE              n - p - 1            MSE = SSE/(n - p - 1)
Total        SST              n - 1

ANOVA Example

ANOVA

             df    SS      MS     F     Significance F
Regression   3     19382   6460   547   6.42E-140
Residual     393   4638    11.8
Total        396   24021
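A sketch that fills in the mean squares and the F statistic of the ANOVA table from $SSR$, $SSE$, $n$, and $p$, using the car example's values. The function `anova` is hypothetical. The p-value ("Significance F") needs the upper tail of the F distribution with $p$ and $n - p - 1$ degrees of freedom, which this standard-library sketch does not compute; a package such as SciPy offers it through `scipy.stats.f.sf`.

```python
def anova(ssr, sse, n, p):
    """Return degrees of freedom, mean squares, and F for a multiple regression ANOVA."""
    df_reg, df_err = p, n - p - 1
    msr = ssr / df_reg   # mean square due to regression
    mse = sse / df_err   # mean square error
    return {"df": (df_reg, df_err, n - 1), "MSR": msr, "MSE": mse, "F": msr / mse}


table = anova(ssr=19382, sse=4638, n=397, p=3)
print(table["df"])                                # (3, 393, 396)
print(round(table["MSE"], 1), round(table["F"]))  # 11.8 547
```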

t Test for Coefficients

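For an individual parameter $\beta_i$:

$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$

Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, or if p-value $< \alpha$

$t = b_i / s_{b_i}$

with a t distribution of $n - p - 1$ degrees of freedom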

t Test Example

             Coefficients   Standard Error   t Stat    P-value
Intercept    -14.48         4.038            -3.587    0.0003769
Weight       -0.006525      0.0004603        -14.18    3.892E-37
Year         0.7608         0.04985          15.26     1.258E-41
Cylinders    -0.07420       0.2322           -0.3196   0.7494
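A sketch that recomputes each t statistic as $b_i / s_{b_i}$ from the table and applies the p-value rule at an assumed $\alpha = 0.05$. Because the table's coefficients are rounded, the recomputed t values can differ from the reported ones in the last digit. With $n - p - 1 = 393$ degrees of freedom, $t_{0.025}$ is about 1.97, so comparing $|t|$ with that critical value leads to the same decisions.

```python
# (coefficient, standard error, reported p-value) from the t Test Example table.
rows = {
    "Intercept": (-14.48, 4.038, 0.0003769),
    "Weight": (-0.006525, 0.0004603, 3.892e-37),
    "Year": (0.7608, 0.04985, 1.258e-41),
    "Cylinders": (-0.07420, 0.2322, 0.7494),
}
ALPHA = 0.05  # assumed significance level


def t_stat(coef, se):
    """t = b_i / s_{b_i}."""
    return coef / se


for name, (coef, se, p_value) in rows.items():
    t = t_stat(coef, se)
    decision = "reject H0" if p_value < ALPHA else "do not reject H0"
    print(f"{name:10s} t = {t:8.3f}  {decision}")
```

Only Cylinders fails to reach significance once Weight and Year are in the model.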

Multicollinearity: when two or more independent variables are highly correlated.

When multicollinearity is severe, the estimated values of the coefficients will be unreliable.

Multicollinearity
Two guidelines for identifying multicollinearity:
• The absolute value of the correlation coefficient for two independent variables exceeds 0.7.
• The correlation coefficient between an independent variable and some other independent variable is greater than the correlation between that variable and the dependent variable.

Multicollinearity

Table of correlation coefficients:

            MPG      Weight   Year     Cylinders
MPG         1
Weight      -0.829   1
Year        0.578    -0.300   1
Cylinders   -0.773   0.895    -0.344   1
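A sketch that applies both guidelines to the correlation table, assuming the comparisons use absolute values and that the second guideline is triggered when a pair's correlation exceeds either variable's correlation with MPG. The names `CORR`, `corr`, and `flag_pairs` are hypothetical.

```python
# Lower-triangular correlations from the table above.
CORR = {
    ("Weight", "MPG"): -0.829,
    ("Year", "MPG"): 0.578,
    ("Year", "Weight"): -0.300,
    ("Cylinders", "MPG"): -0.773,
    ("Cylinders", "Weight"): 0.895,
    ("Cylinders", "Year"): -0.344,
}


def corr(a, b):
    """Look up a correlation regardless of argument order."""
    return CORR[(a, b)] if (a, b) in CORR else CORR[(b, a)]


def flag_pairs(predictors, response, threshold=0.7):
    """Return (x1, x2, rule1, rule2) for each predictor pair that trips a guideline."""
    flagged = []
    for i, a in enumerate(predictors):
        for b in predictors[i + 1:]:
            r_ab = abs(corr(a, b))
            rule1 = r_ab > threshold
            # Rule 2: the pair's correlation beats either variable's correlation with y.
            rule2 = r_ab > abs(corr(a, response)) or r_ab > abs(corr(b, response))
            if rule1 or rule2:
                flagged.append((a, b, rule1, rule2))
    return flagged


print(flag_pairs(["Weight", "Year", "Cylinders"], "MPG"))
# [('Weight', 'Cylinders', True, True)]
```

Weight and Cylinders trip both guidelines (0.895 exceeds 0.7 and also exceeds 0.829 and 0.773); Year does not trip either one with any other predictor.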

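Multicollinearity

MPG regressed on Weight, Year, and Cylinders:

             Coefficients   Standard Error   t Stat
Intercept    -14.4          4.03             -3.58
Weight       -0.00652       0.000460         -14.1
Year         0.760          0.0498           15.2
Cylinders    -0.0741        0.232            -0.319

R Square 0.807

MPG regressed on Year and Cylinders (Weight dropped):

             Coefficients   Standard Error   t Stat
Intercept    -16.9          4.95             -3.42
Year         0.747          0.0612           12.21
Cylinders    -2.99          0.133            -22.46

R Square 0.708

Because Cylinders and Weight are highly correlated (0.895), the Cylinders coefficient is small and insignificant when Weight is in the model, but large and highly significant once Weight is dropped.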

Qualitative Variables and Regression

Quantitative variable – A variable that can be measured numerically (interval or ratio scale of measurement)

Qualitative variable – A variable where labels or names are used to identify some attribute (nominal or ordinal scale of measurement)

Qualitative Variables and Regression

The effect of a qualitative variable can be estimated using a dummy variable.

A dummy variable equals 0 or 1; it creates different y-intercepts for groups with different attributes.

Qualitative Variables and Regression

Assume we estimate a regression model for the number of sick days an employee takes per year. A dummy variable is included that equals 1 if the individual smokes and 0 if they do not. Age is also included in the model.

Qualitative Variables and Regression

Estimated model: Sick days taken = -1 + 3(Smoker) + 0.1(Age)

Example of how data would be coded:

Sick Days   Smoker   Age
3           0        45
6           1        50
0           0        20
5           0        65
10          1        60

Dummy Variables
Sick days taken = -1 + 3(Smoker) + 0.1(Age)

What is the y-intercept for nonsmokers? -1
What is the y-intercept for smokers? 2
What is the predicted number of sick days for a 40-year-old smoker? 6
What is the average difference in the number of sick days taken by smokers and nonsmokers? 3
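The answers above can be checked by evaluating the estimated model directly. A sketch, where the function name `sick_days` is hypothetical and `smoker` is the 0/1 dummy; the y-intercepts are the predictions at Age = 0.

```python
def sick_days(smoker, age):
    """Estimated model: -1 + 3(Smoker) + 0.1(Age); smoker is a 0/1 dummy."""
    return -1 + 3 * smoker + 0.1 * age


print(sick_days(0, 0), sick_days(1, 0))      # -1.0 2.0  (y-intercepts)
print(sick_days(1, 40))                      # 6.0
print(sick_days(1, 40) - sick_days(0, 40))   # 3.0
```

The smoker/nonsmoker gap equals the dummy's coefficient, 3, at every age, because the dummy shifts only the intercept.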

Dummy Variables

If an attribute has three or more possible values, you must include k - 1 dummy variables in the model, where k is the number of possible values.

Dummy Variables
Suppose we have three job classifications: manager, operator, and secretary.

Operator dummy equals 1 if the person is an operator, 0 otherwise.

Secretary dummy equals 1 if the person is a secretary, 0 otherwise.

Manager is the omitted group (the choice of omitted group will not alter the predicted values).
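A sketch of the k - 1 coding for the three job classifications. Only two dummies are created because, alongside the intercept, a full set of k dummies would sum to 1 in every row and duplicate the intercept column (perfect multicollinearity), so the coefficients could not be uniquely estimated. The names `JOBS`, `OMITTED`, and `job_dummies` are hypothetical.

```python
JOBS = ("manager", "operator", "secretary")
OMITTED = "manager"  # the omitted (base) group gets no dummy of its own


def job_dummies(job):
    """Return the k - 1 = 2 dummy values {Operator, Secretary} for one person."""
    if job not in JOBS:
        raise ValueError(f"unknown job classification: {job!r}")
    return {j.capitalize(): int(job == j) for j in JOBS if j != OMITTED}


for job in JOBS:
    print(job, job_dummies(job))
# manager {'Operator': 0, 'Secretary': 0}
# operator {'Operator': 1, 'Secretary': 0}
# secretary {'Operator': 0, 'Secretary': 1}
```

A manager is identified by both dummies being 0, which is why the omitted group needs no column of its own.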

Dummy Variables
Sick days taken = -1 + 1(Operator) + 1.5(Secretary) + 0.1(Age)

What are the y-intercepts for each job classification? Managers = -1, Operators = 0, Secretaries = 0.5
What is the predicted number of sick days for a 40-year-old secretary? 4.5
What is the average difference in the number of sick days taken by operators and secretaries? 0.5

Dummy Variables
In some cases there will be multiple sets of dummy variables, such as:
Sick days taken = -1 + 3(Smoker) + 1(Operator) + 1.5(Secretary) + 0.1(Age)

Note that there are now 6 different intercepts:
Nonsmoker, Manager: -1 (omitted group)
Smoker, Manager: 2
Nonsmoker, Operator: 0
Smoker, Operator: 3
Nonsmoker, Secretary: 0.5
Smoker, Secretary: 3.5
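A sketch that derives the six intercepts by evaluating the combined model at Age = 0 for every smoker/job combination. The coefficient dictionary `COEF` and the function `sick_days` are hypothetical names; the coefficients are the ones in the equation above.

```python
from itertools import product

COEF = {"const": -1, "Smoker": 3, "Operator": 1, "Secretary": 1.5, "Age": 0.1}


def sick_days(smoker, job, age):
    """Combined model with a smoker dummy and two job dummies (manager omitted)."""
    return (COEF["const"]
            + COEF["Smoker"] * smoker
            + COEF["Operator"] * (job == "operator")
            + COEF["Secretary"] * (job == "secretary")
            + COEF["Age"] * age)


# Intercepts are predictions at Age = 0 for each combination of groups.
intercepts = {
    (("Nonsmoker", "Smoker")[s], job): sick_days(s, job, 0)
    for s, job in product((0, 1), ("manager", "operator", "secretary"))
}
for group, value in intercepts.items():
    print(group, value)
```

Each intercept is the constant plus whichever dummy coefficients switch on for that group, so two dummy sets with 2 and 3 categories give 2 × 3 = 6 intercepts.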

Dummy Variables
Note that when dummy variables are used, we are assuming that the coefficients of the other variables are the same for all groups.

In this example, each additional year of age raises predicted sick days by 0.1 for every group.

If there is reason to believe the effect of an independent variable differs by group, you may want to estimate separate equations for each group.

Nonlinear Relationships

Nonlinear relationships can be modeled by including a variable that is a nonlinear function of an independent variable.

For example, it is usually assumed that health care expenditures increase at an increasing rate as people age.

Nonlinear Relationships

In that case, you might try including age squared in the model:
Health expend = 500 + 5(Age) + 0.5(AgeSQ)

Age   Health Expend
10    600
20    800
30    1100
40    1500
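A sketch that reproduces the table from the quadratic model and shows why the squared term produces an increasing rate: the slope with respect to Age is $5 + 2(0.5)\text{Age} = 5 + \text{Age}$, which grows with age. The function names are hypothetical.

```python
def health_expend(age):
    """Quadratic model: 500 + 5(Age) + 0.5(AgeSQ), where AgeSQ = Age ** 2."""
    return 500 + 5 * age + 0.5 * age ** 2


def marginal_effect(age):
    """Slope of the quadratic at a given age: d/dAge = 5 + 2(0.5)Age = 5 + Age."""
    return 5 + 2 * 0.5 * age


for age in (10, 20, 30, 40):
    print(age, health_expend(age), marginal_effect(age))
# 10 600.0 15.0
# 20 800.0 25.0
# 30 1100.0 35.0
# 40 1500.0 45.0
```

In practice AgeSQ is computed as a new column in the data and entered as an ordinary regressor; the model stays linear in its coefficients.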

Nonlinear Relationships

If the dependent variable increases at a decreasing rate as the independent variable rises, you might want to include the square root of the independent variable.

If you are unsure of the nature of the relationship, you can use dummy variables for different ranges of values of the independent variable.
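A sketch of constructing both kinds of regressors. The cut points 30 and 50 are hypothetical (in practice they would be chosen from the data), and values below the first cut form the omitted range, following the same k - 1 logic as other dummy sets. Function names are illustrative only.

```python
import math


def sqrt_term(x):
    """Square-root regressor for a relationship that rises at a decreasing rate."""
    return math.sqrt(x)


def range_dummies(x, cuts=(30, 50)):
    """Dummies for the ranges [30, 50) and [50, inf); x < 30 is the omitted range.

    The cut points are hypothetical and would be chosen from the data.
    """
    bounds = list(cuts) + [math.inf]
    return [int(lo <= x < hi) for lo, hi in zip(bounds, bounds[1:])]


# Equal steps in x give shrinking steps in sqrt(x).
print([round(sqrt_term(x), 3) for x in (0, 25, 50, 75, 100)])  # [0.0, 5.0, 7.071, 8.66, 10.0]
print(range_dummies(25), range_dummies(40), range_dummies(60))  # [0, 0] [1, 0] [0, 1]
```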

Non-continuous Relationships

If the relationship between the dependent variable and an independent variable is non-continuous, a slope dummy variable can be used to estimate two sets of coefficients for the independent variable.

For example, if natural gas usage is not affected by temperature when the temperature rises above 60 degrees, we could have:
Gas usage = $b_0 + b_1(GT60) + b_2(Temp) + b_3(GT60)(Temp)$
where GT60 equals 1 when the temperature is above 60 degrees and 0 otherwise.

Non-continuous Relationships

Note that at temperatures above 60 degrees, the net effect of a 1-degree increase in temperature on gas usage is -0.056 (-0.866 + 0.810).

               Coefficients   Standard Error   t Stat   P-value
Intercept      53.002         2.415            21.95    7.48E-18
GT60           -46.623        16.682           -2.79    0.0098
Temp           -0.866         0.0595           -14.56   1.02E-13
(GT60)(Temp)   0.810          0.255            3.18     0.0039
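A sketch of the slope-dummy model using the estimated coefficients, assuming GT60 is coded 1 when the temperature is strictly above 60 degrees and 0 otherwise. Above 60 degrees the intercept becomes $b_0 + b_1$ and the slope becomes $b_2 + b_3$. The constant and function names are hypothetical.

```python
# Coefficients from the gas-usage table above.
B0, B_GT60, B_TEMP, B_INTERACT = 53.002, -46.623, -0.866, 0.810


def gas_usage(temp):
    """Slope-dummy model; GT60 is assumed to be 1 when temp > 60, else 0."""
    gt60 = 1 if temp > 60 else 0
    return B0 + B_GT60 * gt60 + B_TEMP * temp + B_INTERACT * gt60 * temp


slope_below = B_TEMP                 # effect of +1 degree at or below 60
slope_above = B_TEMP + B_INTERACT    # effect of +1 degree above 60
intercept_above = B0 + B_GT60
print(round(slope_below, 3), round(slope_above, 3), round(intercept_above, 3))
# -0.866 -0.056 6.379
print(round(gas_usage(50), 3), round(gas_usage(70), 3))  # 9.702 2.459
```

Because both the intercept and the slope shift at 60 degrees, the two fitted lines need not meet at the break point.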