BA 201
Lecture 14: Multiple Regression Model
Topics
Developing the Multiple Linear Regression Model
Inferences on Population Regression Coefficients
Pitfalls in Multiple Regression and Ethical Issues
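The Multiple Regression Model
The relationship between 1 dependent and 2 or more independent variables is a linear function.
Population model:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
where $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.
Sample model:
$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + e_i$
where $Y_i$ is the dependent (response) variable, $X_{1i}, \ldots, X_{ki}$ are the independent (explanatory) variables, and $e_i$ is the residual.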
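Simple Linear Regression Model Revisited
[Figure: scatter plot of Y against X. The population line is $\mu_{Y|X} = \beta_0 + \beta_1 X_i$, and an observed value is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. The sample line is $\hat{Y}_i = b_0 + b_1 X_i$, with intercept $b_0$ and slope $b_1$, so that $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ is the residual.]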
Population Multiple Regression Model
Bivariate model (2 independent variables: $X_1$ and $X_2$)
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
$\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
[Figure: response plane over the $X_1$ and $X_2$ axes with intercept $\beta_0$; the observed $Y_i$ at $(X_{1i}, X_{2i})$ lies a distance $\varepsilon_i$ from the plane.]
Sample Multiple Regression Model
Bivariate model
$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
[Figure: sample regression plane over the $X_1$ and $X_2$ axes with intercept $b_0$; the observed $Y_i$ at $(X_{1i}, X_{2i})$ lies a distance $e_i$ from the plane.]
Multiple Linear Regression Equation
Too complicated to compute by hand! Ouch!
Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature and amount of insulation in inches.

Oil (Gal)    Temp (°F)    Insulation (in)
275.30       40           3
363.80       27           3
164.30       40           10
40.80        73           6
94.30        64           6
230.90       34           6
366.70       9            6
300.60       8            10
237.80       23           10
121.40       63           3
31.40        65           10
203.50       41           6
441.10       21           3
323.00       38           3
52.50        58           10
Multiple Regression in PHStat
PHStat | Regression | Multiple Regression …
EXCEL spreadsheet for the heating oil example.
Sample Multiple Regression Equation: Example
Excel Output
                Coefficients
Intercept       562.1510092
X Variable 1    -5.436580588
X Variable 2    -20.01232067

$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$
For each one-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
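As a rough check on the Excel output, the coefficients can be reproduced by solving the least-squares normal equations directly. This is only a plain-Python sketch, not the method PHStat uses internally; it assumes the 15 observations from the heating oil example, and the helper names `solve` and `fit_ols` are hypothetical.

```python
# Heating oil data from the example:
# (oil in gallons, average temperature in °F, insulation in inches)
data = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows):
    """Return [b0, b1, ..., bk] for rows of the form (y, x1, ..., xk)."""
    X = [[1.0] + [float(v) for v in row[1:]] for row in rows]  # leading 1 = intercept
    y = [float(row[0]) for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


b0, b1, b2 = fit_ols(data)
print(f"Oil-hat = {b0:.3f} + ({b1:.3f}) Temp + ({b2:.3f}) Insulation")
```

Solving the normal equations is fine for a small, well-behaved data set like this one; a statistics package would normally be used in practice.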
Interpretation of Estimated Coefficients
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
Slope ($b_i$): The average value of Y is estimated to change by $b_i$ for each 1-unit increase in $X_i$, holding all other variables constant (ceteris paribus)
Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$)
Y-Intercept ($b_0$): The estimated average value of Y when all $X_i = 0$
Simple and Multiple Regression Compared
Coefficients in a simple regression pick up the impact of that variable plus the impacts of other variables that are correlated with it and with the dependent variable but are excluded from the model.
Coefficients in a multiple regression net out the impacts of the other variables in the equation. Hence they are called the net regression coefficients. They still pick up the effects of other variables that are excluded from the model but are correlated with the included variables and the dependent variable.
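Simple and Multiple Regression Compared: Example
Two simple regressions:
$Oil = \beta_0 + \beta_1 Temp + \varepsilon$
$Oil = \beta_0 + \beta_2 Insulation + \varepsilon$
Multiple Regression:
$Oil = \beta_0 + \beta_1 Temp + \beta_2 Insulation + \varepsilon$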
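Simple and Multiple Regression Compared: Excel Output
Multiple regression, $Oil = b_0 + b_1 Temp + b_2 Insulation + e$:
              Coefficients
Intercept     562.1510092
Temp          -5.436580588
Insulation    -20.01232067

Simple regression, $Oil = b_0 + b_1 Temp + e$:
              Coefficients
Intercept     436.4382299
Temp          -5.462207697

Simple regression, $Oil = b_0 + b_2 Insulation + e$:
              Coefficients
Intercept     345.3783784
Insulation    -20.35027027

Temp slope: -5.4366 (multiple) vs. -5.4622 (simple)
Insulation slope: -20.0123 (multiple) vs. -20.3503 (simple)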
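The two simple-regression fits can be reproduced with the textbook one-predictor formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X). The sketch below assumes the 15 heating oil observations from the example; the function name `simple_fit` is hypothetical.

```python
# Heating oil data from the example:
# (oil in gallons, average temperature in °F, insulation in inches)
data = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def simple_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on a single x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


oil = [row[0] for row in data]
temp = [row[1] for row in data]
insulation = [row[2] for row in data]

temp_b0, temp_b1 = simple_fit(temp, oil)
ins_b0, ins_b2 = simple_fit(insulation, oil)
print(f"Oil on Temp:       intercept {temp_b0:.4f}, slope {temp_b1:.4f}")
print(f"Oil on Insulation: intercept {ins_b0:.4f}, slope {ins_b2:.4f}")
```

Each simple slope differs slightly from its multiple-regression counterpart because Temp and Insulation are weakly correlated in this sample.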
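Simple and Multiple Regression Compared: Excel Output
Multiple regression, $Oil = \beta_0 + \beta_1 Temp + \beta_2 Insulation + \varepsilon$:
Regression Statistics
Multiple R           0.982654757
R Square             0.965610371
Adjusted R Square    0.959878766
Standard Error       26.01378323
Observations         15

Simple regression, $Oil = \beta_0 + \beta_1 Temp + \varepsilon$:
Regression Statistics
Multiple R           0.86974117
R Square             0.756449704
Adjusted R Square    0.737715065
Standard Error       66.51246564
Observations         15

Simple regression, $Oil = \beta_0 + \beta_1 Insulation + \varepsilon$:
Regression Statistics
Multiple R           0.465082527
R Square             0.216301757
Adjusted R Square    0.156017277
Standard Error       119.3117327
Observations         15

The r² values of the two simple regressions do not add up to the r² of the multiple regression:
0.75645 + 0.21630 = 0.97275 ≠ 0.96561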
Venn Diagrams and Explanatory Power of a Simple Regression
[Figure: Venn diagram of two overlapping circles, Oil and Temp]
Overlap of Oil and Temp (SSR): Variations in Oil explained by Temp or variations in Temp used in explaining variation in Oil
Rest of Oil (SSE): Variations in Oil explained by the error term
Rest of Temp: Variations in Temp not used in explaining variation in Oil
Venn Diagrams and Explanatory Power of a Simple Regression (continued)
[Figure: Venn diagram of two overlapping circles, Oil and Temp]
$r^2 = \frac{SSR}{SSR + SSE}$
Venn Diagrams and Explanatory Power of a Multiple Regression
[Figure: Venn diagram of three overlapping circles: Oil, Temp, and Insulation]
Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ or $\beta_2$
Variation NOT explained by Temp or Insulation (SSE)
Coefficient of Multiple Determination
Proportion of Total Variation in Y Explained by All X Variables Taken Together
$r^2_{Y,12\cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$
Never Decreases When a New X Variable is Added to the Model
Disadvantage When Comparing Models
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of three overlapping circles: Oil, Temp, and Insulation]
$r^2_{Y,12} = \frac{SSR}{SSR + SSE}$
Adjusted Coefficient of Multiple Determination
Proportion of Variation in Y Explained by All X Variables, Adjusted for the Number of X Variables Used and the Sample Size
$r^2_{adj} = 1 - \left[ \left(1 - r^2_{Y,12\cdots k}\right) \frac{n - 1}{n - k - 1} \right]$
Penalizes Excessive Use of Independent Variables
Smaller than $r^2_{Y,12\cdots k}$
Useful in Comparing among Models
Could Decrease If an Insignificant New X Variable Is Added to the Model
Coefficient of Multiple Determination
Excel Output
Regression Statistics
Multiple R           0.982654757
R Square             0.965610371
Adjusted R Square    0.959878766
Standard Error       26.01378323
Observations         15

R Square is $r^2_{Y,12} = \frac{SSR}{SST}$
Adjusted R Square is the adjusted $r^2$: it reflects the number of explanatory variables and the sample size, and it is smaller than $r^2$
Interpretation of Coefficient of Multiple Determination
$r^2_{Y,12} = \frac{SSR}{SST} = .9656$
96.56% of the total variation in heating oil use can be explained by the variation in temperature and in the amount of insulation
$r^2_{adj} = .9599$
95.99% of the total variation in heating oil use can be explained by the variation in temperature and in the amount of insulation, after adjusting for the number of explanatory variables and the sample size
Example: Adjusted $r^2$ Can Decrease
$Oil = \beta_0 + \beta_1 Temp + \beta_2 Insulation + \varepsilon$:
Regression Statistics
Multiple R           0.982654757
R Square             0.965610371
Adjusted R Square    0.959878766
Standard Error       26.01378323
Observations         15

$Oil = \beta_0 + \beta_1 Temp + \beta_2 Insulation + \beta_3 Color + \varepsilon$:
Regression Statistics
Multiple R           0.983482856
R Square             0.967238528
Adjusted R Square    0.958303581
Standard Error       25.72417272
Observations         15

Adjusted $r^2$ decreases when k increases from 2 to 3
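The decrease can be checked by applying the adjustment formula $1 - (1 - r^2)(n - 1)/(n - k - 1)$ to the two R Square values in the Excel output. This is a minimal sketch; the function name `adjusted_r2` is an illustrative choice, not something from PHStat or Excel.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r-squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R Square values taken from the two Excel outputs (n = 15 observations)
adj_two = adjusted_r2(0.965610371, n=15, k=2)    # Temp and Insulation
adj_three = adjusted_r2(0.967238528, n=15, k=3)  # Temp, Insulation, and Color

print(f"k = 2: adjusted r2 = {adj_two:.6f}")
print(f"k = 3: adjusted r2 = {adj_three:.6f}")
print("Adjusted r2 decreased:", adj_three < adj_two)
```

R Square itself rises slightly when Color is added, but the penalty factor grows from 14/12 to 14/11, which more than offsets the gain.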
Using the Model to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.
$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$
The predicted heating oil used is 278.97 gallons
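The same arithmetic can be written as a small function, which makes it easy to try other temperatures and insulation levels. This is a sketch using the rounded coefficients from the sample regression equation; the function name `predict_oil` and its parameter names are hypothetical.

```python
def predict_oil(temp_f, insulation_in, b0=562.151, b1=-5.437, b2=-20.012):
    """Predicted January heating oil use (gallons) from the fitted equation."""
    return b0 + b1 * temp_f + b2 * insulation_in


predicted = predict_oil(temp_f=30, insulation_in=6)
print(f"Predicted heating oil: {predicted:.2f} gallons")  # prints 278.97
```

Predictions are only trustworthy for temperatures and insulation levels within the range of the sample data.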
Predictions in PHStat
PHStat | Regression | Multiple Regression …
Check the “Confidence and Prediction Interval Estimate” box
EXCEL spreadsheet for the heating oil example.
Another Example
The Excel spreadsheet contains the multiple regression results from regressing midterm scores on quiz scores and attendance scores.
Residual Plots
Residuals vs. $\hat{Y}$: May need to transform the Y variable
Residuals vs. $X_1$: May need to transform the $X_1$ variable
Residuals vs. $X_2$: May need to transform the $X_2$ variable
Residuals vs. Time: May have autocorrelation
Residual Plots: Example
[Figure: Insulation Residual Plot (residuals vs. insulation, 0 to 12 inches)] No Discernible Pattern
[Figure: Temperature Residual Plot (residuals, -60 to 60, vs. temperature, 0 to 80 °F)] Maybe some non-linear relationship
Testing for Overall Significance
Shows if There is a Linear Relationship between all of the X Variables Together and Y
Shows if Y Depends Linearly on all of the X Variables Together as a Group
Use F Test Statistic
Hypotheses:
H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No linear relationship)
H1: At least one $\beta_i \neq 0$ (At least one independent variable affects Y)
The Null Hypothesis is a Very Strong Statement
Almost Always Reject the Null Hypothesis
Testing for Overall Significance (continued)
Test Statistic:
$F = \frac{MSR}{MSE} = \frac{SSR(\text{all}) / k}{MSE(\text{all})}$
where F has k numerator and (n - k - 1) denominator degrees of freedom
Test for Overall Significance
Excel Output: Heating Oil Example
ANOVA
              df    SS          MS          F           Significance F
Regression    2     228014.6    114007.3    168.4712    1.65411E-09
Residual      12    8120.603    676.7169
Total         14    236135.2

Regression df = k = 2, the number of explanatory variables
Total df = n - 1 = 14
Significance F is the p value
Test Statistic: $F = \frac{MSR}{MSE}$
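The F statistic in the ANOVA table follows directly from the sums of squares and degrees of freedom. The sketch below recomputes it from the SS column of the Excel output; the variable names are illustrative, and the critical value 3.89 is the one quoted for df = 2 and 12 at the .05 level in the example solution rather than something computed here.

```python
# Sums of squares from the ANOVA table of the heating oil example
ssr = 228014.6   # regression sum of squares
sse = 8120.603   # residual (error) sum of squares
n, k = 15, 2     # observations and explanatory variables

msr = ssr / k                # mean square regression, df = k
mse = sse / (n - k - 1)      # mean square error, df = n - k - 1
f_stat = msr / mse

f_critical = 3.89            # quoted critical value for alpha = .05, df = 2 and 12
reject_h0 = f_stat > f_critical

print(f"MSR = {msr:.1f}, MSE = {mse:.4f}, F = {f_stat:.2f}")
print("Reject H0:", reject_h0)
```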
Test for Overall Significance
Example Solution
H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
H1: At least one $\beta_i \neq 0$
$\alpha = .05$
df = 2 and 12
Critical Value(s): F = 3.89 (upper-tail area $\alpha = 0.05$)
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject at $\alpha = 0.05$
Conclusion: There is evidence that at least one independent variable affects Y
Test for Significance: Individual Variables
Shows if There is a Linear Relationship Between the Variable $X_i$ and Y while Holding the Effects of other X’s Fixed
Shows if Y Depends Linearly on a Single $X_i$ Individually while Holding the Effects of other X’s Fixed
Use t Test Statistic
Hypotheses:
H0: $\beta_i = 0$ (No linear relationship)
H1: $\beta_i \neq 0$ (Linear relationship between $X_i$ and Y)
t Test Statistic
Excel Output: Example
                Coefficients     Standard Error    t Stat
Intercept       562.1510092      21.09310433       26.65093769
X Variable 1    -5.436580588     0.336216167       -16.16989642
X Variable 2    -20.01232067     2.342505227       -8.543127434

t Test Statistic for $X_1$ (Temperature): -16.1699
t Test Statistic for $X_2$ (Insulation): -8.5431
$t = \frac{b_i}{S_{b_i}}$
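Each t statistic is just the estimated coefficient divided by its standard error. The sketch below recomputes the two slope t statistics from the Coefficients and Standard Error columns of the Excel output; the dictionary layout and names are illustrative only.

```python
# (coefficient, standard error) pairs from the Excel output
estimates = {
    "Temperature": (-5.436580588, 0.336216167),
    "Insulation": (-20.01232067, 2.342505227),
}

# t = b_i / S_b_i for each slope
t_stats = {name: b / se for name, (b, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"t statistic for {name}: {t:.4f}")
```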
t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
df = 12
Critical Value(s): $t = \pm 2.1788$ (area .025 in each tail; reject H0 beyond either value)
Test Statistic: t = -16.1699
Decision: Reject H0 at $\alpha = 0.05$
Conclusion: There is evidence of a significant effect of temperature on oil consumption.
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$b_1 \pm t_{n-p} S_{b_1}$
where p = k + 1 = 3 is the number of estimated coefficients, so df = n - p = 12
                Coefficients    Lower 95%        Upper 95%
Intercept       562.151009      516.1930837      608.108935
X Variable 1    -5.4365806      -6.169132673     -4.7040285
X Variable 2    -20.012321      -25.11620102     -14.90844

$-6.169 \le \beta_1 \le -4.704$
The estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each increase of 1°F, holding insulation constant.
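The interval limits can be recomputed from the slope, its standard error, and the critical t value. This sketch assumes the two-tailed critical value 2.1788 for 12 degrees of freedom quoted in the t test example; the function name `slope_ci` is hypothetical.

```python
def slope_ci(b, se, t_crit):
    """Confidence interval b ± t_crit * se for a regression slope."""
    margin = t_crit * se
    return b - margin, b + margin


# Temperature slope and standard error from the Excel output;
# 2.1788 is the critical t for 95% confidence with 12 degrees of freedom
lower, upper = slope_ci(b=-5.4365806, se=0.336216167, t_crit=2.1788)
print(f"95% CI for the temperature slope: {lower:.3f} to {upper:.3f}")
```

Because the whole interval lies below zero, it agrees with the t test's rejection of H0 at the .05 level.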
Additional Pitfalls and Ethical Issues
Failing to Understand that the Estimated Regression Coefficients are Interpreted Holding All Other Independent Variables Constant
Failing to Evaluate Residual Plots for Each Independent Variable
Summary
Developed the Multiple Regression Model
Addressed Testing the Significance of the Multiple Regression Model
Discussed Inferences on Population Regression Coefficients
Addressed Pitfalls in Multiple Regression and Ethical Issues