ch14 - multiple regression and correlation


Page 1: Ch14 - Multiple Regression and Correlation


Multiple Regression Analysis

Dr. Rick Jerz


Multiple Regression Analysis

• For two independent variables, the general form of the multiple regression equation is:

$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$

• X1 and X2 are the independent variables

• a is the Y-intercept

• b1 is the net change in Y for each unit change in X1, holding X2 constant. It is called a regression coefficient


Regression Plane for a 2-Independent Variable Linear Regression Equation


Multiple Regression Analysis

• The general multiple regression with k independent variables is given by:

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_k X_k$$

• The least squares criterion is used to develop this equation. Because determining b1, b2, etc., is very tedious, a software package such as Excel or MINITAB is recommended.
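As a rough illustration of the least squares criterion, the sketch below fits a two-variable regression plane with NumPy instead of Excel or MINITAB. The data are synthetic and generated from a made-up equation (they are not the Salsberry sample), so the output only demonstrates the mechanics.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 10 + 2*X1 - 3*X2.
# These values are illustrative only; they do not come from the slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 10.0 + 2.0 * x1 - 3.0 * x2

# Design matrix: a column of ones for the intercept a, then X1 and X2.
A = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares picks a, b1, b2 to minimize the sum of squared residuals.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b1, b2 = coef
print(f"Y-hat = {a:.3f} + {b1:.3f} X1 + {b2:.3f} X2")
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients; with real data the residuals would not all be zero.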


An Example

Salsberry Realty sells homes along the East Coast of the United States. One of the questions most frequently asked by prospective buyers is: If we purchase this home, how much can we expect to pay to heat it during the winter? The research department at Salsberry has been asked to develop some guidelines regarding heating costs for single-family homes.

Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace.

To investigate, Salsberry’s research department selected a random sample of 20 recently sold homes. For each home, it determined the cost to heat the home last January, as well as the values of the three independent variables.


Example Data


Multiple Linear Regression in Excel


Interpreting the Regression Coefficients

1. The regression coefficient for mean outside temperature is -4.583. The coefficient is negative and shows an inverse relationship between heating cost and temperature. As the outside temperature increases, the cost to heat the home decreases. If we increase temperature by 1 degree and hold the other two independent variables constant, we can estimate a decrease of $4.583 in monthly heating cost.

2. The attic insulation variable also shows an inverse relationship: the more insulation in the attic, the less the cost to heat the home. So the negative sign for this coefficient is logical. For each additional inch of insulation, we expect the cost to heat the home to decline $14.83 per month, holding the outside temperature and the age of the furnace constant.

3. The age of the furnace variable shows a direct relationship. With an older furnace, the cost to heat the home increases. Specifically, for each additional year older the furnace is, we expect the cost to increase $6.10 per month, holding the other two variables constant.

$$\hat{Y} = 427.194 - 4.583 X_1 - 14.831 X_2 + 6.101 X_3$$

Here X1 is the mean outside temperature, X2 is the inches of attic insulation, and X3 is the age of the furnace in years.


Applying the Model for Estimation

• What is the estimated heating cost for a home if the mean outside temperature is 30 degrees, there are 5 inches of insulation in the attic, and the furnace is 10 years old?

$$\hat{Y} = 427.194 - 4.583(30) - 14.831(5) + 6.101(10) = 276.56$$
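The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function and parameter names are made up for illustration, while the coefficients are the ones from the fitted equation in the slides.

```python
def estimate_heating_cost(mean_temp, insulation_inches, furnace_age_years):
    """Estimated monthly heating cost in dollars from the fitted equation."""
    return (427.194
            - 4.583 * mean_temp
            - 14.831 * insulation_inches
            + 6.101 * furnace_age_years)

# The home from the slide: 30 degrees, 5 inches of insulation, 10-year-old furnace.
cost = estimate_heating_cost(30, 5, 10)
print(round(cost, 2))
```

The same function can be reused for other homes, but only for inputs that fall inside the range of the sampled data.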


Measures of Effectiveness

• Coefficient of multiple determination, R²

• For a larger number of independent variables, this measure needs to be adjusted for the number of variables in the model (adjusted R²)
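The slide's formulas did not survive extraction. The sketch below uses the standard textbook definitions, R² = 1 - SSE / SS total and an adjusted version that divides each sum of squares by its degrees of freedom, with n observations and k independent variables. The function names and the sums of squares are hypothetical, chosen only to show the calculation.

```python
def r_squared(sse, ss_total):
    """Share of the variation in Y explained by the regression."""
    return 1.0 - sse / ss_total

def adjusted_r_squared(sse, ss_total, n, k):
    """R^2 with SSE and SS total each divided by their degrees of freedom."""
    return 1.0 - (sse / (n - (k + 1))) / (ss_total / (n - 1))

# Hypothetical sums of squares, for illustration only (not from the slides).
r2 = r_squared(sse=20.0, ss_total=100.0)
adj_r2 = adjusted_r_squared(sse=20.0, ss_total=100.0, n=20, k=3)
print(r2, adj_r2)
```

The adjusted value is never larger than R², and the gap widens as more independent variables are added relative to the sample size.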


Global Test: All Regression Coefficients are Zero

• The hypotheses are:

• H0: β1 = β2 = β3 = 0

• H1: Not all βi are 0
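The global test is usually carried out with an F statistic, F = (SSR / k) / (SSE / (n - (k + 1))), which is compared with a critical F value for k and n - (k + 1) degrees of freedom. The sketch below computes only the statistic; the function name and the sums of squares are hypothetical and serve purely as an illustration.

```python
def global_f_statistic(ssr, sse, n, k):
    """F statistic for H0: all k regression coefficients are zero."""
    mean_square_regression = ssr / k
    mean_square_error = sse / (n - (k + 1))
    return mean_square_regression / mean_square_error

# Hypothetical sums of squares for n = 20 homes and k = 3 variables.
f_stat = global_f_statistic(ssr=80.0, sse=20.0, n=20, k=3)
print(round(f_stat, 2))
```

H0 is rejected when the computed statistic exceeds the critical value from an F table (or, equivalently, when the reported p-value is below the chosen significance level).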


Testing Each Independent Variable for Inclusion

• The test for individual variables determines which independent variables have regression coefficients that differ significantly from zero

• The variables whose regression coefficients do not differ significantly from zero are usually dropped from the analysis

• The test can also be carried out using p-values
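The individual test compares each coefficient with its standard error, t = b_i / s_{b_i}, using n - (k + 1) degrees of freedom. The NumPy sketch below shows one way to compute these t statistics. The helper name and the data are assumptions: the data are randomly generated so that only the first variable truly affects Y.

```python
import numpy as np

def coefficient_t_stats(X, y):
    """Return (coefficients, t statistics) for the intercept and each column of X."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    # Residual variance uses n - (k + 1) degrees of freedom.
    s2 = (resid @ resid) / (n - (k + 1))
    std_err = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return coef, coef / std_err

# Synthetic data (not from the slides): column 1 has no real effect on y.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 2))
y = 5.0 + 3.0 * X[:, 0] + rng.normal(0, 1, size=20)
coef, t_stats = coefficient_t_stats(X, y)
print(np.round(t_stats, 2))
```

A coefficient whose t statistic is small in absolute value (relative to the critical t value) is a candidate for removal from the model.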


Stepwise Regression

• A step-by-step method to determine a regression equation that begins with a single independent variable and adds or deletes independent variables one by one. Only independent variables whose regression coefficients differ significantly from zero are included in the regression equation
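A simplified sketch of the idea follows. It performs forward selection only (variables are added but never deleted) and, as a swapped-in stopping rule, keeps adding variables while adjusted R² improves by at least a hypothetical threshold `min_gain`. Statistical packages typically use significance tests instead, so treat this as an illustration rather than the procedure used in the slides. The data are synthetic.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of a least squares fit of y on the columns of X."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)
    ss_total = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (sse / (n - (k + 1))) / (ss_total / (n - 1))

def forward_select(X, y, min_gain=0.01):
    """Add one variable at a time while adjusted R^2 rises by at least min_gain."""
    remaining = list(range(X.shape[1]))
    chosen, best = [], 0.0
    while remaining:
        score, j = max((adjusted_r2(X[:, chosen + [c]], y), c) for c in remaining)
        if score - best < min_gain:
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen

# Synthetic data: y depends on columns 0 and 2; column 1 is pure noise.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(30, 3))
y = 2.0 + 3.0 * X[:, 0] - 4.0 * X[:, 2] + rng.normal(0, 0.1, size=30)
selected = forward_select(X, y)
print(selected)
```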


“Dummy” Variables

• These are “binary” variables, such as “yes”/“no”, that typically represent qualitative (categorical) characteristics

• We can use these in our regression models by using 0s and 1s

• Example: In our real estate problem, we might want to include whether or not the home has an outdoor pool

• Yes = 1, No = 0
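Coding a yes/no answer as a dummy variable takes one line. In this minimal sketch the function name and the sample answers are made up for illustration.

```python
def pool_dummy(answers):
    """Map 'yes'/'no' answers to 1/0. Anything other than 'yes' is coded 0."""
    return [1 if answer.strip().lower() == "yes" else 0 for answer in answers]

# Hypothetical survey answers for four homes.
has_pool = pool_dummy(["Yes", "No", "No", "Yes"])
print(has_pool)
```

The resulting column of 0s and 1s can be placed alongside the other independent variables; its coefficient then estimates the difference in Y between homes with and without a pool, holding the other variables constant.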


Some Assumptions and Tests

1. There is a linear relationship

• Use a scatter diagram to plot the dependent variable against each independent variable

2. Homoscedasticity: the variation of the residuals is the same (constant)

• Use a scatter diagram to plot the predicted values (on the horizontal axis) vs. the residuals (on the vertical axis). There should be no trend or correlation

3. Residuals are normally distributed

• Use a frequency diagram to plot the residuals. Residuals should be normally distributed

• Or use a chi-square test to compare the actual distribution with a perfect normal distribution
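The diagrams in items 2 and 3 both start from the predicted values and the residuals. The sketch below computes them with NumPy and bins the residuals for a frequency diagram. The helper name, the synthetic data, and the choice of five bins are assumptions made for illustration; plotting itself is left to whatever charting tool is at hand.

```python
import numpy as np

def fitted_and_residuals(X, y):
    """Least squares fit with an intercept; returns predicted values and residuals."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    return fitted, y - fitted

# Synthetic data, not the Salsberry sample.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(20, 2))
y = 4.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 1, size=20)

fitted, resid = fitted_and_residuals(X, y)

# Item 2: plot fitted (horizontal axis) against resid (vertical axis).
# Item 3: counts per bin give the frequency diagram of the residuals.
counts, edges = np.histogram(resid, bins=5)
print(counts)
```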


There is a Linear Relationship


Homoscedasticity


Residuals are Normally Distributed


Some Assumptions and Tests (continued)

4. Independent variables should not be correlated with each other (multicollinearity)

• VIF should be less than 10 (in model)

• Calculate the correlation between each pair of independent variables (should be < .7)

• Or, plot each independent variable against each of the others

5. Successive residuals should be independent (autocorrelation)

• Use a scatter diagram (like #1). There should not be any pattern of negative or positive trend, meaning slope = 0

• Or use the Durbin-Watson test

Be careful when predicting outside the range of the data!

Multicollinearity

• Independent variables should not be highly correlated with each other. If two of them are, one should be eliminated.

• VIF is used to identify correlated independent variables
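The variance inflation factor for variable j is commonly defined as VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing variable j on the other independent variables. The NumPy sketch below follows that definition. The function name and the synthetic data are assumptions; the second column is deliberately built as a near copy of the first.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_total = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - (resid @ resid) / ss_total
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic data: x1 is almost identical to x0, x2 is unrelated.
rng = np.random.default_rng(2)
x0 = rng.normal(size=50)
x1 = x0 + 0.01 * rng.normal(size=50)
x2 = rng.normal(size=50)
vifs = vif(np.column_stack([x0, x1, x2]))
print(np.round(vifs, 1))
```

With these data the first two factors come out far above the threshold of 10 while the third stays close to 1, which is the pattern that signals one of the two correlated variables should be dropped.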


Autocorrelation
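The Durbin-Watson statistic is d = Σ(e_t - e_{t-1})² / Σ e_t², computed from the residuals in time order. Values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. A minimal sketch, with two made-up residual series chosen to show the extremes:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals: alternating signs vs. a constant run.
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
print(d_alternating, d_constant)
```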
