Download  Multiple Linear Regression Ver1.1

7/27/2019 Multiple Linear Regression Ver1.1
1/61
Section 2: Multiple Linear Regression
Carlos M. CarvalhoThe University of Texas McCombs School of Business
mccombs.utexas.edu/faculty/carlos.carvalho/teaching
1

7/27/2019 Multiple Linear Regression Ver1.1
2/61
The Multiple Regression Model
Many problems involve more than one independent variable orfactor which affects the dependent or response variable.
More than size to predict house price!
Demand for a product given prices of competing brands,advertising,house hold attributes, etc.
In SLR, the conditional mean of Y depends on X. The MultipleLinear Regression (MLR) model extends this idea to include morethan one independent variable.
2

7/27/2019 Multiple Linear Regression Ver1.1
3/61
The MLR ModelSame as always, but with more covariates.
Y = 0 + 1 X 1 + 2 X 2 + + p X p +
Recall the key assumptions of our linear regression model:
(i) The conditional mean of Y is linear in the X j variables.
(ii) The error term (deviations from line)are normally distributedindependent from each otheridentically distributed (i.e., they have constant variance)
Y X 1 . . . X p N ( 0 + 1 X 1 . . . + p X p , 2 )
3

7/27/2019 Multiple Linear Regression Ver1.1
4/61
The MLR Model
Our interpretation of regression coefficients can be extended fromthe simple single covariate regression case:
j = E [Y X 1 , . . . , X p ] X j
Holding all other variables constant, j is the
average change in Y per unit change in X j .
4

7/27/2019 Multiple Linear Regression Ver1.1
5/61
The MLR ModelIf p = 2, we can plot the regression surface in 3D.Consider sales of a product as predicted by price of this product
(P1) and the price of a competing product (P2).
Sales = 0 + 1 P 1 + 2 P 2 +
5

7/27/2019 Multiple Linear Regression Ver1.1
6/61
Least Squares
Y = 0 + 1 X 1 . . . + p X p + , N (0, 2 )
How do we estimate the MLR model parameters?
The principle of Least Squares is exactly the same as before:
Dene the tted values
Find the best tting plane by minimizing the sum of squaredresiduals.
6

7/27/2019 Multiple Linear Regression Ver1.1
7/61
Least SquaresThe data...
p1 p2 Sales
5.1356702 5.2041860 144.48788
3.4954600 8.0597324 637.24524
7.2753406 11.6759787 620.78693
4.6628156 8.3644209 549.00714
3.5845370 2.1502922 20.42542
5.1679168 10.1530371 713.00665
3.3840914 4.9465690 346.706794.2930636 7.7605691 595.77625
4.3690944 7.4288974 457.64694
7.2266002 10.7113247 591.45483
... ... ... 7

7/27/2019 Multiple Linear Regression Ver1.1
8/61
Least Squares
Model: Sales i = 0 + 1 P 1i + 2 P 2i + i , N (0,
2
)Regression Statistics
Multiple R 0.99R Square 0.99
Adjusted R Square 0.99Standard Error 28.42Observations 100.00
ANOVAdf SS MS F Significance F
Regression 2.00 6004047.24 3002023.62 3717.29 0.00Residual 97.00 78335.60 807.58Total 99.00 6082382.84
Coefficients Standard Error t Stat Pvalue Lower 95% Upper 95%Intercept 115.72 8.55 13.54 0.00 98.75 132.68p1 97.66 2.67 36.60 0.00 102.95 92.36p2 108.80 1.41 77.20 0.00 106.00 111.60
b 0 = 0 = 115 .72, b 1 = 1 = 97.66, b 2 = 2 = 108 .80,s = = 28 .42
8

7/27/2019 Multiple Linear Regression Ver1.1
9/61
Plugin Prediction in MLR
Suppose that by using advanced corporate espionage tactics, I
discover that my competitor will charge $10 the next quarter.After some marketing analysis I decided to charge $8. How muchwill I sell?Our model is
Sales = 0 + 1 P 1 + 2 P 2 +
with N (0, 2 )
Our estimates are b 0 = 115, b 1 = 97, b 2 = 109 and s = 28which leads to
Sales = 115 + 97P 1 + 109P 2 +
with N (0, 28
2
) 9

7/27/2019 Multiple Linear Regression Ver1.1
10/61
Plugin Prediction in MLR
By pluggingin the numbers,
Sales = 115 + 978 + 10910 += 437 +
Sales P 1 = 8 , P 2 = 10N (437, 282 )
and the 95% Prediction Interval is (437 228)381 < Sales < 493
10

7/27/2019 Multiple Linear Regression Ver1.1
11/61

7/27/2019 Multiple Linear Regression Ver1.1
12/61

7/27/2019 Multiple Linear Regression Ver1.1
13/61

7/27/2019 Multiple Linear Regression Ver1.1
14/61
Residual Standard Error
The calculation for s 2 is exactly the same:
s 2 =ni =1 e
2i
n
p
1
=ni =1 (Y i Y i )2n
p
1
Y i = b 0 + b 1 X 1 i + + b p X pi The residual standard error is the estimate for the standard
deviation of ,i.e,
= s = s 2 .
14

7/27/2019 Multiple Linear Regression Ver1.1
15/61
Residuals in MLR
As in the SLR model, the residuals in multiple regression are
purged of any linear relationship to the independent variables.Once again, they are on average zero.
Because the tted values are an exact linear combination of the
X s they are not correlated to the residuals.
We decompose Y into the part predicted by X and the part due toidiosyncratic error.
Y = Y + e e = 0; corr(X j , e ) = 0; corr( Y , e ) = 0
15

7/27/2019 Multiple Linear Regression Ver1.1
16/61

7/27/2019 Multiple Linear Regression Ver1.1
17/61

7/27/2019 Multiple Linear Regression Ver1.1
18/61
Fitted Values in MLRWith just P 1...
2 4 6 8
0
2 0 0
4 0 0
6 0 0
8 0 0
1 0 0 0
p1
y =
S a
l e s
300 400 500 600 700
0
2 0 0
4 0 0
6 0 0
8 0 0
1 0 0 0
y.hat (SLR: p1)
y =
S a
l e s
Left plot: Sales vs P 1
Right plot: Sales vs. y (only P 1 as a regressor) 18

7/27/2019 Multiple Linear Regression Ver1.1
19/61

7/27/2019 Multiple Linear Regression Ver1.1
20/61

7/27/2019 Multiple Linear Regression Ver1.1
21/61

7/27/2019 Multiple Linear Regression Ver1.1
22/61

7/27/2019 Multiple Linear Regression Ver1.1
23/61
Intervals for Individual Coefficients
As in SLR, the sampling distribution tells us how close we canexpect b j to be from j
The LS estimators are unbiased: E [b j ] = j for j = 0 , . . . , d .
We denote the sampling distribution of each estimator as
b j
N ( j , s 2
b j )
23

7/27/2019 Multiple Linear Regression Ver1.1
24/61

7/27/2019 Multiple Linear Regression Ver1.1
25/61

7/27/2019 Multiple Linear Regression Ver1.1
26/61
In Excel... Do we know all of these numbers?
Regression StatisticsMultiple R 0.99R Square 0.99
Adjusted R Square 0.99Standard Error 28.42Observations 100.00
ANOVAdf SS MS F Significance F
Regression 2.00 6004047.24 3002023.62 3717.29 0.00Residual 97.00 78335.60 807.58
Total 99.00 6082382.84
Coefficients Standard Error t Stat Pvalue Lower 95% Upper 95%Intercept 115.72 8.55 13.54 0.00 98.75 132.68p1 97.66 2.67 36.60 0.00 102.95 92.36p2 108.80 1.41 77.20 0.00 106.00 111.60
95% C.I. for 1
b 1 2 s b 1
[97.66 2 2.67;97.66 + 2 2.67] = [102.95;92.36]
26

7/27/2019 Multiple Linear Regression Ver1.1
27/61

7/27/2019 Multiple Linear Regression Ver1.1
28/61
S i P f D

7/27/2019 Multiple Linear Regression Ver1.1
29/61
Supervisor Performance Data
$ $ $
29
S i P f D t

7/27/2019 Multiple Linear Regression Ver1.1
30/61
Supervisor Performance Data
$ $ $
$ $ $ $ $ $ $ $ $ $$ $ $ $ $ $ $
Is there any relationship here? Are all the coefficients signicant?What about all of them together?
30
Wh t l k t R 2

7/27/2019 Multiple Linear Regression Ver1.1
31/61
Why not look at R 2
R 2 in MLR is still a measure of goodness of t.
However it ALWAYS grows as we increase the number of
explanatory variables.Even if there is no relationship between the X s and Y ,R 2 > 0!!
To see this lets look at some Garbage Data
31
Garbage Data

7/27/2019 Multiple Linear Regression Ver1.1
32/61
Garbage Data
I made up 6 garbage variables that have nothing to do with Y ...
$ $ $
$ $ $ $ $ $ $ $ $ $ $ $$ $ $ $
$ $ $ $ $ $ $ $ $ $
32

7/27/2019 Multiple Linear Regression Ver1.1
33/61
The F test

7/27/2019 Multiple Linear Regression Ver1.1
34/61
The F test
We are testing:
H 0 : 1 = 2 = . . . p = 0
H 1 : at least one j = 0.
This is the Ftest of overall signicance. Under the null hypothesisf is distributed:
f
F p , n p 1
Generally, f > 4 is very signicant (reject the null).
34
The F test

7/27/2019 Multiple Linear Regression Ver1.1
35/61
The F testWhat kind of distribution is this?
0 2 4 6 8 10
0 . 0
0 . 2
0 . 4
0 . 6
F dist. with 6 and 23 df
d e n s i
t y
It is a right skewed, positive valued family of distributions indexed
by two parameters (the two df values). 35

7/27/2019 Multiple Linear Regression Ver1.1
36/61
Ftest

7/27/2019 Multiple Linear Regression Ver1.1
37/61
F test
The p value for the F test is
pvalue = Pr (F p , n p 1 > f )
We usually reject the null when the pvalue is less than 5%.
Big f REJECT!Small pvalue REJECT!
37

7/27/2019 Multiple Linear Regression Ver1.1
38/61
The Ftest

7/27/2019 Multiple Linear Regression Ver1.1
39/61
The F test
Note that f is also equal to (you can check the math!)
f =SSR / p
SSE / (n p 1)In Excel, the values underMS are SSR / p and SSE / (n
p
1)
$ $ $
$ $
$ $ $ $ $ $ $
f =191.33136.91
= 1 .39
39
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
40/61
Understanding Multiple Regression
There are two, very important things we need to understandabout the MLR model:
1. How dependencies between the X s affect our interpretation of a multiple regression;
2. How dependencies between the X s inate standard errors (akamulticolinearity)
We will look at a few examples to illustrate the ideas...
40
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
41/61
U de sta d g u t p e eg ess o
The Sales Data:
Sales : units sold in excess of a baseline
P1 : our price in $ (in excess of a baseline price)
P2 : competitors price (again, over a baseline)
41
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
42/61
g p g
If we regress Sales on our own price, we obtain a somewhat
surprising conclusion... the higher the price the more we sell!!
9876543210
1000
500
0
p1
S a
l e s
It looks like we should just raise our prices, right?NO, not if
you have taken this statistics class! 42
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
43/61
g p g
The regression equation for Sales on own price (P1) is:
Sales = 211 + 63 .7P 1
If now we add the competitors price to the regression we get
Sales = 116 97.7P 1 + 109P 2
Does this look better? How did it happen?
Remember: 97.7 is the affect on sales of a change inP 1with P 2 held xed!!
43

7/27/2019 Multiple Linear Regression Ver1.1
44/61
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
45/61
g p g
Lets look at a subset of points where P 1 varies and P 2 is
held approximately constant...
9876543210
1000
500
0
p1
S a
l e s
151050
9
8
7
65
4
3
2
1
0
p2
p 1
For a xed level of P 2, variation in P 1 is negatively correlatedwith Sales!!
45
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
46/61
Below, different colors indicate different ranges forP 2...
0 5 10 15
2
4
6
8
sales$p2
s a l e s
$ p 1
2 4 6 8
0
2 0 0
4 0 0
6 0 0
8 0
0
1 0 0 0
sales$p1
s a l e s
$ S a l e s
p1
p2
Sales
p1
for each fixed level of p2there is a negative relationshipbetween sales and p1
larger p1 are associated withlarger p2
46
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
47/61
Summary:1. A larger P 1 is associated with larger P 2 and the overall effect
leads to bigger sales2. With P 2 held xed, a larger P 1 leads to lower sales3. MLR does the trick and unveils the correct economic
relationship between Sales and prices!
47
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
48/61
Beer Data (from an MBA class)
nbeer number of beers before getting drunk
height and weight
75706560
20
10
0
height
n b e e r
Is number of beers related to height? 48

7/27/2019 Multiple Linear Regression Ver1.1
49/61
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
50/61
nbeers = 0 + 1 weight + 2 height + Regression Statistics
Multiple R 0.69R Square 0.48
Adjusted R Square 0.46Standard Error 2.78Observations 50.00
ANOVAdf SS MS F Significance F
Regression 2.00 337.24 168.62 21.75 0.00Residual 47.00 364.38 7.75Total 49.00 701.63
Coefficients Standard Error t Stat Pvalue Lower 95% Upper 95%Intercept 11.19 10.77 1.04 0.30 32.85 10.48weight 0.09 0.02 3.58 0.00 0.04 0.13height 0.08 0.20 0.40 0.69 0.32 0.47
What about now?? Height is not necessarily a factor...
50
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
51/61
200150100
75
70
65
60
weight
h e
i g h t nbeer weight
weight 0.692height 0.582 0.806
The correlations:
The two xs are
highly correlated !!
If we regress beers only on height we see an effect. Bigger
heights go with more beers.However, when height goes up weight tends to go up as well...in the rst regression, height was a proxy for the realcause of drinking ability. Bigger people can drink more and weight is a
more accurate measure of bigness. 51
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
52/61
200150100
75
70
65
60
weight
h e
i g h t nbeer weight
weight 0.692height 0.582 0.806
The correlations:
The two xs arehighly correlated !!
In the multiple regression, when we consider only the variationin height that is not associated with variation in weight, wesee no relationship between height and beers.
52
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
53/61
nbeers = 0 + 1 weight +
Regression Statistics Multiple R 0.69R Square 0.48
Adjusted R 0.47Standard 2.76Observatio 50
ANOVAdf SS MS F Significance F
Regressio 1 336.0317807 336.0318 44.11878 2.60227E08Residual 48 365.5932193 7.616525Total 49 701.625
Coefficient Standard Error t Stat Pvalue Lower 95% Upper 95% Intercept 7.021 2.213 3.172 0.003 11.471 2.571weight 0.093 0.014 6.642 0.000 0.065 0.121
Why is this a better model than the one with weight and height??53
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
54/61
In general, when we see a relationship betweeny and x (or x s),that relationship may be driven by variables lurking in thebackground which are related to your current x s.
This makes it hard to reliably nd causal relationships. Anycorrelation (association) you nd could be caused by othervariables in the background... correlation is NOT causation
Any time a report says two variables are related and theres asuggestion of a causal relationship, ask yourself whether or notother variables might be the real reason for the effect. Multipleregression allows us to control for all important variables byincluding them into the regression. Once we control for weight,
height and beers are NOT related!! 54
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
55/61
With the above examples we saw how the relationship
amongst the X s can affect our interpretation of a multipleregression... we will now look at how these dependencies willinate the standard errors for the regression coefficients, andhence our uncertainty about them.
Remember that in simple linear regression our uncertaintyabout b 1 is measured by
s 2b 1
=s 2
(n 1)s 2x =
s 2
ni =1 (X i X )2The more variation in X (the larger s 2x ) the more we knowabout 1 ... ie, (b 1 1 ) is smaller.
55
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
56/61
In Multiple Regression we seek to relate the variation inY tothe variation in an X holding the other X s xed. So, we needto see how much each X varies on its own.in MLR, the standard errors are dened by the followingformula:
s 2b j = s 2
variation in X j not associated with other X s
How do we measure the bottom part of the equation? Weregress X j on all the other X s and compute the residual sumof squares (call it SSE j ) so that
s 2b j =s 2
SSE j 56
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
57/61
In the number of beers example... s = 2 .78 and the regressionon height on weight gives...
SUMMARY OUTPUT
Regression StatisticsMultiple R 0.806R Square 0.649Adjusted R Squa 0.642Standard Error 2.051Observations 50.000
ANOVAdf SS MS F Significance F
Regression 1.000 373.148 373.148 88.734 0.000Residual 48.000 201.852 4.205
Total 49.000 575.000
Coefficients Standard Error t Stat Pvalue Lower 95% Upper 95%Intercept 53.751 1.645 32.684 0.000 50.444 57.058weight 0.098 0.010 9.420 0.000 0.077 0.119
SSE 2 = 201 .85
sb 2 =2.782
201.85 = 0 .20 Is this right? 57
Understanding Multiple Regression

7/27/2019 Multiple Linear Regression Ver1.1
58/61
What happens if we are regressing Y on X s that are highlycorrelated. SSE j goes down and the standard error s b j goesup!
What is the effect on the condence intervals (b j 2 s b j )?They get wider!This situation is called Multicolinearity
If a variable X does nothing on its own we cant estimateits effect on Y .
58
Back to Baseball Lets try to add AVG on top of OBPRegression Statistics

7/27/2019 Multiple Linear Regression Ver1.1
59/61
Regression StatisticsMultiple R 0.948136R Square 0.898961Adjusted R Square 0.891477Standard Error 0.160502Observations 30
ANOVA
df SS MS F Significance F Regression 2 6.188355 3.094177 120.1119098 3.63577E 14
Residual 27 0.695541 0.025761Total 29 6.883896
Coefficients andard Err t Stat P value Lower 95% Upper 95%Intercept 7.933633 0.844353 9.396107 5.30996E 10 9.666102081 6.201163AVG 7.810397 4.014609 1.945494 0.062195793 0.426899658 16.04769OBP 31.77892 3.802577 8.357205 5.74232E 09 23.9766719 39.58116
R / G = 0 + 1 AVG + 2 OBP +
Is AVG any good? 59
Back to Baseball  Now lets add SLGRegression Statistics

7/27/2019 Multiple Linear Regression Ver1.1
60/61
Regression StatisticsMultiple R 0.955698R Square 0.913359Adjusted R Square 0.906941Standard Error 0.148627Observations 30
ANOVA
df SS MS F Significance F Regression 2 6.28747 3.143735 142.31576 4.56302E 15Residual 27 0.596426 0.02209Total 29 6.883896
Coefficients andard Err t Stat P value Lower 95% Upper 95%Intercept 7.014316 0.81991 8.554984 3.60968E 09 8.69663241 5.332OBP 27.59287 4.003208 6.892689 2.09112E 07 19.37896463 35.80677
SLG 6.031124 2.021542 2.983428 0.005983713 1.883262806 10.17899
R / G = 0 + 1 OBP + 2 SLG +
What about now? Is SLG any good 60
Back to Baseball

7/27/2019 Multiple Linear Regression Ver1.1
61/61
CorrelationsAVG 1OBP 0.77 1SLG 0.75 0.83 1
When AVG is added to the model with OBP, no additionalinformation is conveyed. AVG does nothing on its own tohelp predict Runs per Game...
SLG however, measures something that OBP doesnt (power!)and by doing something on its own it is relevant to helppredict Runs per Game. (Okay, but not much...)
61