BA 201 Lecture 14 Multiple Regression Model

Upload: augustus-atkins

Post on 27-Dec-2015


Page 1: BA 201 Lecture 14 Multiple Regression Model. Topics Developing the Multiple Linear Regression Inferences on Population Regression Coefficients Pitfalls

BA 201

Lecture 14: Multiple Regression Model

Page 2

Topics

- Developing the Multiple Linear Regression Model
- Inferences on Population Regression Coefficients
- Pitfalls in Multiple Regression and Ethical Issues

Page 3

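The Multiple Regression Model

The relationship between 1 dependent and 2 or more independent variables is a linear function.

Population model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

where $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.

Sample model:

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + e_i$$

where $Y_i$ is the dependent (response) variable for the sample, $X_{1i}, \ldots, X_{ki}$ are the independent (explanatory) variables for the sample, and $e_i$ is the residual.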

Page 4

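Simple Linear Regression Model Revisited

[Figure: observed values of Y plotted against X with the population and sample regression lines; at $X_i$ the observed value $Y_i$ deviates from the population line by the random error $\varepsilon_i$ and from the sample line by the residual $e_i$.]

$$\mu_{Y|X} = \beta_0 + \beta_1 X_i, \qquad Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

$$\hat{Y}_i = b_0 + b_1 X_i, \qquad Y_i = b_0 + b_1 X_i + e_i$$

where $b_0$ is the sample intercept and $b_1$ is the sample slope.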

Page 5

Population Multiple Regression Model

Bivariate model (2 independent variables: X1 and X2)

[Figure: the population response plane over the (X1, X2) plane with intercept $\beta_0$; an observed $Y_i$ at the point $(X_{1i}, X_{2i})$ lies off the plane by the random error $\varepsilon_i$.]

$$\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

Page 6

Sample Multiple Regression Model

Bivariate model

[Figure: the sample regression plane over the (X1, X2) plane with intercept $b_0$; an observed $Y_i$ at the point $(X_{1i}, X_{2i})$ lies off the plane by the residual $e_i$.]

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$$

Page 7

Multiple Linear Regression Equation

Too complicated by hand! Ouch!
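Software handles this step because, in matrix form, the least-squares coefficients solve a system of k + 1 linear equations at once. Write $\mathbf{y}$ for the column of the n observed $Y_i$ values and $\mathbf{X}$ for the n × (k + 1) matrix whose rows are $(1, X_{1i}, \ldots, X_{ki})$. This matrix notation is introduced here for illustration; the lecture itself does not use it. The coefficients $\mathbf{b} = (b_0, b_1, \ldots, b_k)^\top$ that minimize $\sum_{i=1}^{n} e_i^2$ satisfy the normal equations:

$$\mathbf{X}^\top\mathbf{X}\,\mathbf{b} = \mathbf{X}^\top\mathbf{y} \quad\Longrightarrow\quad \mathbf{b} = \left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{X}^\top\mathbf{y}$$

The inverse exists as long as no column of $\mathbf{X}$ is an exact linear combination of the other columns.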

Page 8

Multiple Regression Model: Example

Oil (gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10

Develop a model for estimating the heating oil used by a single-family home in January, based on the average temperature and the amount of insulation in inches.
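Excel and PHStat do the fitting in this lecture. As a cross-check, here is a minimal sketch in plain Python that fits Oil = b0 + b1 Temp + b2 Insulation to the 15 homes above by solving the least-squares normal equations. It uses only the standard library, and the helper names `solve` and `fit_ols` are hypothetical, written for this illustration. This is not necessarily the algorithm Excel uses internally, but it should reproduce the Excel coefficients up to rounding.

```python
# Fit Oil = b0 + b1*Temp + b2*Insulation by ordinary least squares,
# using only the Python standard library (illustrative sketch).

# Heating oil data: (oil gallons, temp in degrees F, insulation inches)
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with
    partial pivoting. `a` is a list of rows, `b` a list of values."""
    n = len(b)
    # Augmented matrix [a | b]; copies the rows so the inputs are untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot up, for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows):
    """Return [b0, b1, ..., bk] for rows of the form (y, x1, ..., xk)
    by solving the normal equations (X'X) b = X'y."""
    y = [float(row[0]) for row in rows]
    # Design matrix: a leading 1 for the intercept, then the X values.
    x = [[1.0] + [float(v) for v in row[1:]] for row in rows]
    p = len(x[0])
    xtx = [[sum(xi[r] * xi[c] for xi in x) for c in range(p)] for r in range(p)]
    xty = [sum(xi[r] * yi for xi, yi in zip(x, y)) for r in range(p)]
    return solve(xtx, xty)


b0, b1, b2 = fit_ols(DATA)
print(f"Oil-hat = {b0:.3f} + ({b1:.3f}) Temp + ({b2:.3f}) Insulation")
```

Solving the normal equations directly is fine for a small, well-conditioned problem like this one; numerical libraries generally prefer QR-based methods for larger or nearly collinear data.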

Page 9

Multiple Regression in PHStat

PHStat | Regression | Multiple Regression …

Excel spreadsheet for the heating oil example (embedded Microsoft Excel worksheet).

Page 10

Sample Multiple Regression Equation: Example

General form:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

Excel Output

                Coefficients
Intercept       562.1510092
X Variable 1    -5.436580588
X Variable 2    -20.01232067

(X Variable 1 is temperature in °F; X Variable 2 is insulation in inches.)

$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$

For each one-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.

For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.

Page 11

Interpretation of Estimated Coefficients

- Slope ($b_i$): the average value of Y is estimated to change by $b_i$ for each 1-unit increase in $X_i$, holding all other variables constant (ceteris paribus).
  Example: if $b_1 = -2$, fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$).
- Y-intercept ($b_0$): the estimated average value of Y when all $X_i = 0$.
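A one-line check shows why a slope is read holding the other variables constant. Compare two fitted values that differ only by one unit in $X_1$:

$$\hat{Y}(X_1 + 1, X_2) - \hat{Y}(X_1, X_2) = \left[b_0 + b_1 (X_1 + 1) + b_2 X_2\right] - \left[b_0 + b_1 X_1 + b_2 X_2\right] = b_1$$

The $b_0$ and $b_2 X_2$ terms cancel whatever the fixed value of $X_2$ is. In the heating oil example, the change is therefore $b_1 = -5.437$ gallons at every insulation level.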

Page 12

Simple and Multiple Regression Compared

Coefficients in a simple regression pick up the impact of that variable plus the impacts of other variables that are correlated with it and with the dependent variable but are excluded from the model.

Coefficients in a multiple regression net out the impacts of the other variables in the equation, so they are called net regression coefficients. They still pick up the effects of variables that are excluded from the model but are correlated with the included variables and the dependent variable.

Page 13

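Simple and Multiple Regression Compared: Example

Two simple regressions:

$$\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \varepsilon$$

$$\text{Oil} = \beta_0 + \beta_2\,\text{Insulation} + \varepsilon$$

Multiple regression:

$$\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \beta_2\,\text{Insulation} + \varepsilon$$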

Page 14

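Simple and Multiple Regression Compared: Excel Output

Multiple regression: $\text{Oil} = b_0 + b_1\,\text{Temp} + b_2\,\text{Insulation} + e$

                Coefficients
Intercept       562.1510092
Temp            -5.436580588
Insulation      -20.01232067

Simple regression: $\text{Oil} = b_0 + b_1\,\text{Temp} + e$

                Coefficients
Intercept       436.4382299
Temp            -5.462207697

Simple regression: $\text{Oil} = b_0 + b_2\,\text{Insulation} + e$

                Coefficients
Intercept       345.3783784
Insulation      -20.35027027

Temp slope: -5.4366 (multiple) vs. -5.4622 (simple)
Insulation slope: -20.0123 (multiple) vs. -20.3503 (simple)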
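The simple-regression coefficients can be reproduced from the data with the closed-form least-squares formulas $b_1 = S_{XY}/S_{XX}$ and $b_0 = \bar{Y} - b_1 \bar{X}$. The plain-Python sketch below (`simple_ols` is a hypothetical helper written for this illustration) computes them and sets the slopes beside the net slopes copied from the multiple-regression output.

```python
# Simple-regression fits of Oil on each X separately, compared with the
# net (multiple-regression) slopes from the Excel output.

# Heating oil data: (oil gallons, temp in degrees F, insulation inches)
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def simple_ols(xs, ys):
    """Least-squares (intercept, slope) for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


oil = [row[0] for row in DATA]
temp = [row[1] for row in DATA]
insulation = [row[2] for row in DATA]

int_temp, slope_temp = simple_ols(temp, oil)
int_ins, slope_ins = simple_ols(insulation, oil)

# Net slopes copied from the multiple-regression Excel output.
MULTIPLE = {"Temp": -5.436580588, "Insulation": -20.01232067}

print(f"Temp:       simple {slope_temp:.4f}  vs. multiple {MULTIPLE['Temp']:.4f}")
print(f"Insulation: simple {slope_ins:.4f}  vs. multiple {MULTIPLE['Insulation']:.4f}")
```

In these data the slopes change only slightly when both variables are included.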

Page 15

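Simple and Multiple Regression Compared: Excel Output

Multiple regression (Oil on Temp and Insulation):

Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

Simple regression (Oil on Temp):

Regression Statistics
Multiple R          0.86974117
R Square            0.756449704
Adjusted R Square   0.737715065
Standard Error      66.51246564
Observations        15

Simple regression (Oil on Insulation):

Regression Statistics
Multiple R          0.465082527
R Square            0.216301757
Adjusted R Square   0.156017277
Standard Error      119.3117327
Observations        15

$$r^2_{Y1} + r^2_{Y2} = 0.75645 + 0.21630 = 0.97275 > r^2_{Y.12} = 0.96561$$

The two simple r² values add up to more than the multiple r², so the variation in Oil explained by Temp and by Insulation overlaps rather than simply adding up.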

Page 16

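Venn Diagrams and Explanatory Power of a Simple Regression

[Figure: Venn diagram with overlapping circles for Oil and Temp.]

- Overlap (SSR): variation in Oil explained by Temp, or equivalently the variation in Temp used in explaining variation in Oil
- Rest of the Oil circle (SSE): variation in Oil attributed to the error term
- Rest of the Temp circle: variation in Temp not used in explaining variation in Oil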

Page 17

Venn Diagrams and Explanatory Power of a Simple Regression (continued)

[Figure: Venn diagram with overlapping circles for Oil and Temp.]

$$r^2 = \frac{SSR}{SSR + SSE}$$

Page 18

Venn Diagrams and Explanatory Power of a Multiple Regression

[Figure: Venn diagram with overlapping circles for Oil, Temp, and Insulation.]

- The overlapping variation in both Temp and Insulation is used in explaining the variation in Oil, but NOT in the estimation of $\beta_1$ or $\beta_2$.
- The variation in Oil NOT explained by either Temp or Insulation is SSE.

Page 19

Coefficient of Multiple Determination

- Proportion of total variation in Y explained by all X variables taken together
- Never decreases when a new X variable is added to the model, which is a disadvantage when comparing models

$$r^2_{Y.12\cdots k} = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST}$$
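The sums of squares from the ANOVA output for the heating oil example are SSR = 228014.6 and SST = 236135.2. Substituting them into the definition gives:

$$r^2_{Y.12} = \frac{SSR}{SST} = \frac{228014.6}{236135.2} \approx 0.9656$$

This matches R Square = 0.965610371 in the Excel output.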

Page 20

Venn Diagrams and Explanatory Power of Regression

[Figure: Venn diagram with overlapping circles for Oil, Temp, and Insulation.]

$$r^2_{Y.12} = \frac{SSR}{SSR + SSE}$$

Page 21

Adjusted Coefficient of Multiple Determination

- Proportion of variation in Y explained by all X variables, adjusted for the number of X variables used and the sample size
- Penalizes excessive use of independent variables
- Smaller than $r^2_{Y.12\cdots k}$
- Useful in comparing among models
- Could decrease if an insignificant new X variable is added to the model

$$r^2_{adj} = 1 - \left[\left(1 - r^2_{Y.12\cdots k}\right)\frac{n - 1}{n - k - 1}\right]$$

Page 22

Coefficient of Multiple Determination

Excel Output

Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

R Square: $r^2_{Y.12} = SSR/SST$

Adjusted r²:
- reflects the number of explanatory variables and the sample size
- is smaller than r²

Page 23

Interpretation of Coefficient of Multiple Determination

$$r^2_{Y.12} = \frac{SSR}{SST} = .9656$$

96.56% of the total variation in heating oil can be explained by the variation in temperature and the variation in the amount of insulation.

$$r^2_{adj} = .9599$$

95.99% of the total fluctuation in heating oil can be explained by the variation in temperature and the variation in the amount of insulation, after adjusting for the number of explanatory variables and the sample size.

Page 24

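Example: Adjusted r² Can Decrease

Model with k = 2: $\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \beta_2\,\text{Insulation} + \varepsilon$

Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

Model with k = 3: $\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \beta_2\,\text{Insulation} + \beta_3\,\text{Color} + \varepsilon$

Regression Statistics
Multiple R          0.983482856
R Square            0.967238528
Adjusted R Square   0.958303581
Standard Error      25.72417272
Observations        15

Adding Color raises r² slightly (0.96561 to 0.96724), but adjusted r² decreases (0.95988 to 0.95830) when k increases from 2 to 3.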
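Both adjusted r² values above follow directly from the adjusted coefficient of multiple determination formula. A minimal sketch (the function name `adjusted_r2` is hypothetical) recomputes them from R Square, n = 15, and k:

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of multiple determination:
    1 - (1 - r2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 15  # observations in the heating oil example
adj_k2 = adjusted_r2(0.965610371, n, k=2)  # Temp, Insulation
adj_k3 = adjusted_r2(0.967238528, n, k=3)  # Temp, Insulation, Color

print(f"k = 2: adjusted r2 = {adj_k2:.6f}")
print(f"k = 3: adjusted r2 = {adj_k3:.6f}")
print("adjusted r2 decreased:", adj_k3 < adj_k2)
```

The factor (n - 1)/(n - k - 1) grows with k. The small gain in r² from adding Color is not enough to offset it.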

Page 25

Using the Model to Make Predictions

Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.

$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$$

The predicted heating oil used is 278.97 gallons.
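The same arithmetic can be written as a small function. The name `predict_oil` is hypothetical, and the coefficients are the rounded values from the Excel output:

```python
# Point prediction from the fitted heating oil equation
# (rounded coefficients from the Excel output).
B0, B_TEMP, B_INSULATION = 562.151, -5.437, -20.012


def predict_oil(temp_f, insulation_in):
    """Estimated January heating oil use (gallons) for one home."""
    return B0 + B_TEMP * temp_f + B_INSULATION * insulation_in


print(round(predict_oil(30, 6), 3))
```

Both inputs lie within the observed data (temperatures from 8 to 73°F, insulation from 3 to 10 inches). The function returns only a point estimate; interval estimates come from the PHStat option for confidence and prediction intervals.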

Page 26

Predictions in PHStat

PHStat | Regression | Multiple Regression …

Check the “Confidence and Prediction Interval Estimate” box.

Excel spreadsheet for the heating oil example (embedded Microsoft Excel worksheet).

Page 27

Another Example

An Excel spreadsheet containing the results of regressing midterm scores on quiz scores and attendance scores (embedded Microsoft Excel worksheet).

Page 28

Residual Plots

- Residuals vs. $\hat{Y}$: may need to transform the Y variable
- Residuals vs. $X_1$: may need to transform the $X_1$ variable
- Residuals vs. $X_2$: may need to transform the $X_2$ variable
- Residuals vs. time: may have autocorrelation

Page 29

Residual Plots: Example

[Figure: Insulation residual plot, residuals vs. insulation (0 to 12 inches).] No discernible pattern.

[Figure: Temperature residual plot, residuals (-60 to 60) vs. temperature (0 to 80°F).] Maybe some non-linear relationship.
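Both residual plots are built from $e_i = Y_i - \hat{Y}_i$. A minimal sketch in plain Python computes these residuals for the 15 homes from the full-precision Excel coefficients and pairs them with each X variable. This gives the data behind the two plots; the plotting itself is left out. As a check, the residuals should sum to about zero, and their squares should sum to SSE (about 8120.6 in the ANOVA output for this example).

```python
# Residuals e_i = Y_i - Yhat_i for the heating oil model, using the
# full-precision coefficients from the Excel output.

# Heating oil data: (oil gallons, temp in degrees F, insulation inches)
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]
B0, B1, B2 = 562.1510092, -5.436580588, -20.01232067

residuals = []
for oil, temp, insulation in DATA:
    fitted = B0 + B1 * temp + B2 * insulation  # Yhat_i
    residuals.append(oil - fitted)             # e_i

# (X value, residual) pairs: the points of each residual plot.
temp_points = sorted((temp, e) for (_, temp, _), e in zip(DATA, residuals))
insulation_points = sorted((ins, e) for (_, _, ins), e in zip(DATA, residuals))

sse = sum(e ** 2 for e in residuals)
print(f"sum of residuals = {sum(residuals):.4f}")
print(f"SSE = {sse:.1f}")
for temp, e in temp_points:
    print(f"Temp {temp:>2}F: residual {e:8.2f}")
```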

Page 30

Testing for Overall Significance

- Shows if there is a linear relationship between all of the X variables together and Y
- Shows if Y depends linearly on all of the X variables together as a group
- Use the F test statistic
- Hypotheses:
  H0: β1 = β2 = … = βk = 0 (no linear relationship)
  H1: at least one βi ≠ 0 (at least one independent variable affects Y)
- The null hypothesis is a very strong statement
- Almost always reject the null hypothesis

Page 31

Testing for Overall Significance (continued)

Test statistic:

$$F = \frac{SSR(\text{all})/k}{MSE(\text{all})} = \frac{MSR}{MSE}$$

where F has k numerator and (n - k - 1) denominator degrees of freedom.
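Plugging the heating oil ANOVA numbers into this formula, with n = 15 and k = 2, reproduces the F statistic that Excel reports. The critical value 3.89 is hard-coded from the example solution. If SciPy is available, `scipy.stats.f.ppf(0.95, 2, 12)` should return the same value (about 3.885).

```python
# Overall F test for the heating oil model, using the ANOVA sums of
# squares from the Excel output.
n, k = 15, 2                 # observations, explanatory variables
ssr, sse = 228014.6, 8120.603

msr = ssr / k                # MSR = SSR / k
mse = sse / (n - k - 1)      # MSE = SSE / (n - k - 1)
f_stat = msr / mse           # F with k and n - k - 1 degrees of freedom

F_CRIT = 3.89                # F(0.05; 2, 12), from the example solution
print(f"MSR = {msr:.1f}, MSE = {mse:.4f}, F = {f_stat:.2f}")
print("Reject H0 at alpha = 0.05:", f_stat > F_CRIT)
```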

Page 32

Test for Overall Significance: Excel Output, Heating Oil Example

ANOVA
             df    SS         MS         F          Significance F
Regression   2     228014.6   114007.3   168.4712   1.65411E-09
Residual     12    8120.603   676.7169
Total        14    236135.2

- Regression df: k = 2, the number of explanatory variables
- Total df: n - 1
- Significance F: the p-value
- F: the test statistic, F = MSR/MSE

Page 33

Test for Overall Significance: Example Solution

H0: β1 = β2 = … = βk = 0
H1: at least one βi ≠ 0
α = .05, df = 2 and 12

Critical value: F = 3.89 (reject H0 if F > 3.89; rejection region of area α = 0.05 in the upper tail)

Test statistic: F = 168.47 (Excel output)

Decision: reject H0 at α = 0.05.

Conclusion: there is evidence that at least one independent variable affects Y.

Page 34

Test for Significance: Individual Variables

- Shows if there is a linear relationship between the variable Xi and Y while holding the effects of the other X's fixed
- Shows if Y depends linearly on a single Xi individually while holding the effects of the other X's fixed
- Use the t test statistic
- Hypotheses:
  H0: βi = 0 (no linear relationship)
  H1: βi ≠ 0 (linear relationship between Xi and Y)

Page 35

t Test Statistic Excel Output: Example

                Coefficients    Standard Error   t Stat
Intercept       562.1510092     21.09310433      26.65093769
X Variable 1    -5.436580588    0.336216167      -16.16989642
X Variable 2    -20.01232067    2.342505227      -8.543127434

The t Stat for X Variable 1 is the t test statistic for X1 (temperature); the t Stat for X Variable 2 is the t test statistic for X2 (insulation), where

$$t = \frac{b_i}{S_{b_i}}$$
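The t Stat column is each coefficient divided by its standard error. This sketch recomputes that column from the Coefficients and Standard Error columns. It then applies the two-tailed decision rule at α = 0.05 with df = 12, using the critical value 2.1788 from the t test example solution.

```python
# t = b_i / S_{b_i} for each coefficient in the Excel output, with a
# two-tailed decision at alpha = 0.05 and df = n - k - 1 = 12.
COEFS = {
    # name: (coefficient b_i, standard error S_{b_i})
    "Intercept": (562.1510092, 21.09310433),
    "Temp (X1)": (-5.436580588, 0.336216167),
    "Insulation (X2)": (-20.01232067, 2.342505227),
}
T_CRIT = 2.1788  # t(0.025, 12), from the example solution

t_stats = {}
for name, (b, se) in COEFS.items():
    t_stats[name] = b / se
    decision = "reject H0" if abs(t_stats[name]) > T_CRIT else "do not reject H0"
    print(f"{name:<16} t = {t_stats[name]:9.4f}  -> {decision}")
```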

Page 36

t Test: Example Solution

Does temperature have a significant effect on monthly consumption of heating oil? Test at α = 0.05.

H0: β1 = 0
H1: β1 ≠ 0
df = 12

Critical values: ±2.1788 (area .025 in each tail; reject H0 if t < -2.1788 or t > 2.1788)

Test statistic: $t = b_1 / S_{b_1} = -16.1699$

Decision: reject H0 at α = 0.05.

Conclusion: there is evidence of a significant effect of temperature on oil consumption.

Page 37

Confidence Interval Estimate for the Slope

Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).

$$b_1 \pm t_{n-k-1}\,S_{b_1}$$

where $t_{n-k-1}$ is the upper 0.025 critical value of the t distribution with n - k - 1 = 12 degrees of freedom.

Excel Output

                Coefficients   Lower 95%      Upper 95%
Intercept       562.151009     516.1930837    608.108935
X Variable 1    -5.4365806     -6.169132673   -4.7040285
X Variable 2    -20.012321     -25.11620102   -14.90844

$$-6.169 \le \beta_1 \le -4.704$$

With 95% confidence, the average consumption of oil decreases by between 4.70 and 6.17 gallons for each 1°F increase in temperature, holding insulation constant.
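The Lower 95% and Upper 95% columns can be recovered from $b_i \pm t\,S_{b_i}$. A minimal sketch (`slope_ci` is a hypothetical helper) uses t = 2.1788 for df = 12:

```python
# 95% confidence intervals b_i +/- t * S_{b_i}, with t = t(0.025, 12) = 2.1788
# (df = n - k - 1 = 15 - 2 - 1 = 12).
T_CRIT = 2.1788


def slope_ci(b, se, t_crit=T_CRIT):
    """Return (lower, upper) for the interval b +/- t_crit * se."""
    half_width = t_crit * se
    return b - half_width, b + half_width


temp_ci = slope_ci(-5.436580588, 0.336216167)
insulation_ci = slope_ci(-20.01232067, 2.342505227)
print(f"Temp:       ({temp_ci[0]:.3f}, {temp_ci[1]:.3f})")
print(f"Insulation: ({insulation_ci[0]:.3f}, {insulation_ci[1]:.3f})")
```

The results agree with Excel to about four decimal places. The small differences come from rounding t to 2.1788. If SciPy is available, `scipy.stats.t.ppf(0.975, 12)` gives the unrounded value.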

Page 38

Additional Pitfalls and Ethical Issues

- Failing to understand that the estimated regression coefficients are interpreted holding all other independent variables constant
- Failing to evaluate residual plots for each independent variable

Page 39

Summary

Developed the Multiple Regression Model

Addressed Testing the Significance of the Multiple Regression Model

Discussed Inferences on Population Regression Coefficients

Addressed Pitfalls in Multiple Regression and Ethical Issues