Transcript
Page 1: Multiple Linear Regression Ver1.1


Section 2: Multiple Linear Regression

Carlos M. Carvalho, The University of Texas McCombs School of Business

mccombs.utexas.edu/faculty/carlos.carvalho/teaching


Page 2: Multiple Linear Regression Ver1.1


The Multiple Regression Model

Many problems involve more than one independent variable or factor that affects the dependent or response variable.

More than just size to predict house price!

Demand for a product given prices of competing brands, advertising, household attributes, etc.

In SLR, the conditional mean of Y depends on X. The Multiple Linear Regression (MLR) model extends this idea to include more than one independent variable.


Page 3: Multiple Linear Regression Ver1.1


The MLR Model

Same as always, but with more covariates:

Y = β0 + β1 X1 + β2 X2 + ··· + βp Xp + ε

Recall the key assumptions of our linear regression model:

(i) The conditional mean of Y is linear in the Xj variables.

(ii) The error terms (deviations from the line) are normally distributed, independent from each other, and identically distributed (i.e., they have constant variance).

Y | X1, ..., Xp ∼ N(β0 + β1 X1 + ··· + βp Xp, σ²)
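To make these assumptions concrete, here is a minimal simulation sketch in Python. The coefficient and noise values are hypothetical (chosen to echo the sales example later in the deck), not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
beta = np.array([115.0, -97.0, 109.0])   # hypothetical (beta0, beta1, beta2)
sigma = 28.0                             # hypothetical error std deviation

X = rng.uniform(2, 12, size=(n, 2))      # two covariates, arbitrary ranges
X1 = np.column_stack([np.ones(n), X])    # prepend the intercept column
eps = rng.normal(0.0, sigma, size=n)     # iid N(0, sigma^2) errors
Y = X1 @ beta + eps                      # so Y | X ~ N(X1 @ beta, sigma^2)
```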


Page 4: Multiple Linear Regression Ver1.1


The MLR Model

Our interpretation of regression coefficients can be extended from the simple single-covariate regression case:

βj = ∂E[Y | X1, ..., Xp] / ∂Xj

Holding all other variables constant, βj is the average change in Y per unit change in Xj.


Page 5: Multiple Linear Regression Ver1.1


The MLR Model

If p = 2, we can plot the regression surface in 3D. Consider sales of a product as predicted by the price of this product (P1) and the price of a competing product (P2).

Sales = β0 + β1 P1 + β2 P2 + ε


Page 6: Multiple Linear Regression Ver1.1


Least Squares

Y = β0 + β1 X1 + ··· + βp Xp + ε,  ε ∼ N(0, σ²)

How do we estimate the MLR model parameters?

The principle of Least Squares is exactly the same as before:

Define the fitted values Ŷi = b0 + b1 X1i + ··· + bp Xpi.

Find the best fitting plane by minimizing the sum of squared residuals.
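As a sketch of what "minimizing the sum of squared residuals" means computationally, here is the least squares fit via numpy's np.linalg.lstsq. The data are simulated stand-ins (the course dataset is not included in this transcript), with coefficients seeded to mimic the sales example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
p1 = rng.uniform(2, 9, n)                      # stand-in for our price
p2 = rng.uniform(2, 12, n)                     # stand-in for competitor's price
sales = 115 - 97 * p1 + 109 * p2 + rng.normal(0, 28, n)

X = np.column_stack([np.ones(n), p1, p2])      # design matrix with intercept
b, *_ = np.linalg.lstsq(X, sales, rcond=None)  # least squares: min sum of e_i^2
fitted = X @ b                                 # fitted values Y-hat
resid = sales - fitted                         # residuals e
print("b:", b.round(2), " SSE:", round((resid**2).sum(), 1))
```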


Page 7: Multiple Linear Regression Ver1.1


Least Squares

The data...

p1          p2           Sales
5.1356702   5.2041860    144.48788
3.4954600   8.0597324    637.24524
7.2753406   11.6759787   620.78693
4.6628156   8.3644209    549.00714
3.5845370   2.1502922    20.42542
5.1679168   10.1530371   713.00665
3.3840914   4.9465690    346.70679
4.2930636   7.7605691    595.77625
4.3690944   7.4288974    457.64694
7.2266002   10.7113247   591.45483
...         ...          ...

Page 8: Multiple Linear Regression Ver1.1


Least Squares

Model: Salesᵢ = β0 + β1 P1ᵢ + β2 P2ᵢ + εᵢ,  ε ∼ N(0, σ²)

Regression Statistics
Multiple R           0.99
R Square             0.99
Adjusted R Square    0.99
Standard Error       28.42
Observations         100

ANOVA
             df    SS           MS           F         Significance F
Regression   2     6004047.24   3002023.62   3717.29   0.00
Residual     97    78335.60     807.58
Total        99    6082382.84

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   115.72         8.55             13.54    0.00      98.75       132.68
p1          -97.66         2.67             -36.60   0.00      -102.95     -92.36
p2          108.80         1.41             77.20    0.00      106.00      111.60

b0 = β̂0 = 115.72,  b1 = β̂1 = −97.66,  b2 = β̂2 = 108.80,  s = σ̂ = 28.42


Page 9: Multiple Linear Regression Ver1.1


Plug-in Prediction in MLR

Suppose that by using advanced corporate espionage tactics, I

discover that my competitor will charge $10 the next quarter.After some marketing analysis I decided to charge $8. How muchwill I sell?Our model is

Sales = β 0 + β 1 P 1 + β 2 P 2 +

with ∼N (0, σ 2 )Our estimates are b 0 = 115, b 1 = −97, b 2 = 109 and s = 28

which leads to

Sales = 115 + −97∗P 1 + 109∗P 2 +

with ∼N (0, 28

2

) 9

Page 10: Multiple Linear Regression Ver1.1


Plug-in Prediction in MLR

By plugging in the numbers,

Sales = 115 − 97 × 8 + 109 × 10 + ε = 429 + ε

Sales | P1 = 8, P2 = 10 ∼ N(429, 28²)

and the 95% Prediction Interval is 429 ± 2 × 28:

373 < Sales < 485
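A quick sketch of the plug-in arithmetic, using the rounded estimates from the output above:

```python
b0, b1, b2, s = 115.0, -97.0, 109.0, 28.0   # rounded estimates from the fit
p1_new, p2_new = 8.0, 10.0                  # our price, competitor's price

y_hat = b0 + b1 * p1_new + b2 * p2_new      # plug-in point prediction
lo, hi = y_hat - 2 * s, y_hat + 2 * s       # approximate 95% prediction interval
print(y_hat, (lo, hi))                      # 429.0 (373.0, 485.0)
```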



Page 14: Multiple Linear Regression Ver1.1


Residual Standard Error

The calculation of s² is exactly the same:

s² = Σᵢ eᵢ² / (n − p − 1) = Σᵢ (Yᵢ − Ŷᵢ)² / (n − p − 1)

Ŷᵢ = b0 + b1 X1i + ··· + bp Xpi

The residual "standard error" is the estimate of the standard deviation of ε, i.e.,

σ̂ = s = √s²
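A sketch of this computation, given residuals from a fitted MLR with p covariates (the toy residuals below are made up purely for illustration):

```python
import numpy as np

def residual_std_error(resid, p):
    """s = sqrt(SSE / (n - p - 1)), the estimate of sigma."""
    n = resid.size
    return np.sqrt((resid**2).sum() / (n - p - 1))

e = np.array([1.0, -2.0, 0.5, 0.5, -1.0, 1.0])  # toy residuals, n = 6
print(residual_std_error(e, p=2))               # divides SSE by 6 - 2 - 1 = 3
```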


Page 15: Multiple Linear Regression Ver1.1


Residuals in MLR

As in the SLR model, the residuals in multiple regression are purged of any linear relationship to the independent variables. Once again, they are on average zero.

Because the fitted values are an exact linear combination of the X's, they are not correlated with the residuals.

We decompose Y into the part predicted by X and the part due to idiosyncratic error.

Y = Ŷ + e,  with ē = 0;  corr(Xj, e) = 0;  corr(Ŷ, e) = 0
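These properties are easy to verify numerically; a sketch on simulated data (all values made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
e = y - y_hat
print(e.mean())                           # ~ 0: residuals average zero
print(np.corrcoef(X[:, 1], e)[0, 1])      # ~ 0: uncorrelated with each X_j
print(np.corrcoef(y_hat, e)[0, 1])        # ~ 0: uncorrelated with fitted values
```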



Page 18: Multiple Linear Regression Ver1.1


Fitted Values in MLR

With just P1...

[Figure. Left plot: Sales vs. P1. Right plot: Sales vs. ŷ from the SLR using only P1 as a regressor.]


Page 23: Multiple Linear Regression Ver1.1


Intervals for Individual Coefficients

As in SLR, the sampling distribution tells us how close we can expect bj to be to βj.

The LS estimators are unbiased: E[bj] = βj for j = 0, ..., p.

We denote the sampling distribution of each estimator as

bj ∼ N(βj, s²_bj)



Page 26: Multiple Linear Regression Ver1.1


In Excel... Do we know all of these numbers?

Regression Statistics
Multiple R           0.99
R Square             0.99
Adjusted R Square    0.99
Standard Error       28.42
Observations         100

ANOVA
             df    SS           MS           F         Significance F
Regression   2     6004047.24   3002023.62   3717.29   0.00
Residual     97    78335.60     807.58
Total        99    6082382.84

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   115.72         8.55             13.54    0.00      98.75       132.68
p1          -97.66         2.67             -36.60   0.00      -102.95     -92.36
p2          108.80         1.41             77.20    0.00      106.00      111.60

95% C.I. for β1 ≈ b1 ± 2 × s_b1

[−97.66 − 2 × 2.67; −97.66 + 2 × 2.67] = [−102.95; −92.36]
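These numbers can also be reproduced outside Excel. A sketch using statsmodels on simulated stand-in data (the actual course dataset isn't in this transcript; the simulation is seeded to mimic it):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
p1 = rng.uniform(2, 9, n)
p2 = rng.uniform(2, 12, n)
sales = 115 - 97 * p1 + 109 * p2 + rng.normal(0, 28, n)

X = sm.add_constant(np.column_stack([p1, p2]))  # intercept, p1, p2
fit = sm.OLS(sales, X).fit()
print(fit.params.round(2))      # coefficients b_j
print(fit.bse.round(2))         # standard errors s_bj
print(fit.conf_int().round(2))  # 95% C.I.s, close to b_j ± 2 * s_bj
```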



Page 29: Multiple Linear Regression Ver1.1


Supervisor Performance Data

[Excel regression output for the supervisor performance data; the numbers did not survive this transcript.]


Page 30: Multiple Linear Regression Ver1.1


Supervisor Performance Data

[Excel regression output for the full model; the numbers did not survive this transcript.]

Is there any relationship here? Are all the coefficients significant? What about all of them together?


Page 31: Multiple Linear Regression Ver1.1


Why not look at R²?

R² in MLR is still a measure of goodness of fit.

However, it ALWAYS grows as we increase the number of explanatory variables. Even if there is no relationship between the X's and Y, R² > 0!!
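Here is a quick simulation sketch of that fact (all data below are pure noise, made up for illustration): regress noise on an increasing number of pure-noise X's and watch R² climb.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
y = rng.normal(size=n)                     # a response unrelated to any X

def r_squared(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    e = y - X1 @ b
    return 1 - (e**2).sum() / ((y - y.mean()) ** 2).sum()

noise_X = rng.normal(size=(n, 6))          # six "garbage" regressors
for k in range(1, 7):
    print(k, round(r_squared(noise_X[:, :k], y), 3))  # R^2 never decreases
```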

To see this in action, let's look at some "Garbage" Data...



Page 32: Multiple Linear Regression Ver1.1


Garbage Data

I made up 6 "garbage" variables that have nothing to do with Y...

[Excel regression output regressing Y on the six garbage variables; the numbers did not survive this transcript.]


Page 34: Multiple Linear Regression Ver1.1


The F-test

We are testing:

H0: β1 = β2 = ... = βp = 0

H1: at least one βj ≠ 0.

This is the F-test of overall significance. Under the null hypothesis, f is distributed

f ∼ F_{p, n−p−1}

Generally, f > 4 is very significant (reject the null).



Page 35: Multiple Linear Regression Ver1.1


The F-test

What kind of distribution is this?

[Figure: density of the F distribution with 6 and 23 df.]

It is a right-skewed, positive-valued family of distributions indexed by two parameters (the two df values).


Page 37: Multiple Linear Regression Ver1.1


The F-test

The p-value for the F-test is

p-value = Pr(F_{p, n−p−1} > f)

We usually reject the null when the p-value is less than 5%.

Big f → REJECT!

Small p-value → REJECT!



Page 39: Multiple Linear Regression Ver1.1


The F-test

Note that f is also equal to (you can check the math!)

f = (SSR / p) / (SSE / (n − p − 1))

In Excel, the values under MS are SSR/p and SSE/(n − p − 1).

[Excel ANOVA output for the garbage-data regression; only the two MS values survive in this transcript.]

f = 191.33 / 136.91 = 1.39
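A sketch of the corresponding p-value computation. The garbage-data fit has p = 6 regressors and (inferring from the F distribution with 6 and 23 df shown earlier) n = 30 observations:

```python
from scipy import stats

f_stat = 191.33 / 136.91                     # = 1.39
p, n = 6, 30                                 # regressors and observations
p_value = stats.f.sf(f_stat, p, n - p - 1)   # Pr(F_{6,23} > 1.39)
print(round(f_stat, 2), round(p_value, 2))   # small f, large p: do NOT reject
```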



Page 40: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

There are two very important things we need to understand about the MLR model:

1. How dependencies between the X's affect our interpretation of a multiple regression;

2. How dependencies between the X's inflate standard errors (aka multicollinearity).

We will look at a few examples to illustrate the ideas...



Page 41: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

The Sales Data:

Sales: units sold in excess of a baseline

P1: our price in $ (in excess of a baseline price)

P2: competitor's price (again, over a baseline)


Page 42: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

If we regress Sales on our own price, we obtain a somewhat surprising conclusion... the higher the price, the more we sell!!

[Figure: scatterplot of Sales vs. p1 with a positively sloped fitted line.]

It looks like we should just raise our prices, right? NO, not if you have taken this statistics class!


Page 43: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

The regression equation for Sales on own price (P1) is:

Sales = 211 + 63.7 P1

If we now add the competitor's price to the regression, we get

Sales = 116 − 97.7 P1 + 109 P2

Does this look better? How did it happen?

Remember: −97.7 is the effect on sales of a change in P1 with P2 held fixed!!


Page 45: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

Let's look at a subset of points where P1 varies and P2 is held approximately constant...

[Figure. Left: Sales vs. p1 with the subset highlighted. Right: p1 vs. p2, showing the subset lying in a narrow band of p2.]

For a fixed level of P2, variation in P1 is negatively correlated with Sales!!


Page 46: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

Below, different colors indicate different ranges for P2...

[Figure. Left: sales$p1 vs. sales$p2. Right: sales$Sales vs. sales$p1, with points colored by ranges of p2.]

For each fixed level of p2 there is a negative relationship between sales and p1.

Larger p1 are associated with larger p2.


Page 47: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

Summary:

1. A larger P1 is associated with a larger P2, and the overall effect leads to bigger sales.

2. With P2 held fixed, a larger P1 leads to lower sales.

3. MLR does the trick and unveils the "correct" economic relationship between Sales and prices!


Page 48: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

Beer Data (from an MBA class)

nbeer – number of beers before getting drunk

height and weight

[Figure: scatterplot of nbeer vs. height.]

Is number of beers related to height?


Page 50: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

nbeers = β0 + β1 weight + β2 height + ε

Regression Statistics
Multiple R           0.69
R Square             0.48
Adjusted R Square    0.46
Standard Error       2.78
Observations         50

ANOVA
             df    SS       MS       F       Significance F
Regression   2     337.24   168.62   21.75   0.00
Residual     47    364.38   7.75
Total        49    701.63

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   -11.19         10.77            -1.04    0.30      -32.85      10.48
weight      0.09           0.02             3.58     0.00      0.04        0.13
height      0.08           0.20             0.40     0.69      -0.32       0.47

What about now?? Height is not necessarily a factor...


Page 51: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

[Figure: scatterplot of height vs. weight.]

The correlations:

          nbeer   weight
weight    0.692
height    0.582   0.806

The two x's are highly correlated!!

If we regress "beers" only on height, we see an effect: bigger heights go with more beers. However, when height goes up, weight tends to go up as well... In the first regression, height was a proxy for the real cause of drinking ability. Bigger people can drink more, and weight is a more accurate measure of "bigness".


Page 52: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

[Figure: scatterplot of height vs. weight.]

The correlations:

          nbeer   weight
weight    0.692
height    0.582   0.806

The two x's are highly correlated!!

In the multiple regression, when we consider only the variation in height that is not associated with variation in weight, we see no relationship between height and beers.


Page 53: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

nbeers = β0 + β1 weight + ε

Regression Statistics
Multiple R           0.69
R Square             0.48
Adjusted R Square    0.47
Standard Error       2.76
Observations         50

ANOVA
             df    SS       MS       F       Significance F
Regression   1     336.03   336.03   44.12   2.60E-08
Residual     48    365.59   7.62
Total        49    701.63

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   -7.021         2.213            -3.172   0.003     -11.471     -2.571
weight      0.093          0.014            6.642    0.000     0.065       0.121

Why is this a better model than the one with weight and height??


Page 54: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

In general, when we see a relationship between y and x (or x's), that relationship may be driven by variables "lurking" in the background which are related to your current x's.

This makes it hard to reliably find "causal" relationships. Any correlation (association) you find could be caused by other variables in the background... correlation is NOT causation.

Any time a report says two variables are related and there's a suggestion of a "causal" relationship, ask yourself whether or not other variables might be the real reason for the effect. Multiple regression allows us to control for all important variables by including them in the regression. "Once we control for weight, height and beers are NOT related"!!


Page 55: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

With the above examples we saw how the relationships amongst the X's can affect our interpretation of a multiple regression... We will now look at how these dependencies inflate the standard errors of the regression coefficients, and hence our uncertainty about them.

Remember that in simple linear regression our uncertainty about b1 is measured by

s²_b1 = s² / ((n − 1) s²_x) = s² / Σᵢ (Xᵢ − X̄)²

The more variation in X (the larger s²_x), the more "we know" about β1... i.e., (b1 − β1) is smaller.
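A small sketch of this SLR formula as code (names are illustrative):

```python
import numpy as np

def slr_slope_se(s, x):
    """Standard error of b1 in SLR: sqrt(s^2 / sum((x_i - xbar)^2))."""
    return np.sqrt(s**2 / ((x - x.mean()) ** 2).sum())

x = np.linspace(0, 10, 50)
print(slr_slope_se(2.0, x))        # more spread in x -> smaller s_b1
print(slr_slope_se(2.0, x / 10))   # shrink the spread by 10x, s_b1 grows 10x
```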



Page 56: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

In multiple regression we seek to relate the variation in Y to the variation in an X holding the other X's fixed. So, we need to see how much each X varies on its own.

In MLR, the standard errors are defined by the following formula:

s²_bj = s² / (variation in Xj not associated with the other X's)

How do we measure the bottom part of the equation? We regress Xj on all the other X's and compute the residual sum of squares (call it SSE_j), so that

s²_bj = s² / SSE_j


Page 57: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

In the "number of beers" example... s = 2.78, and the regression of height on weight gives...

Regression Statistics
Multiple R           0.806
R Square             0.649
Adjusted R Square    0.642
Standard Error       2.051
Observations         50

ANOVA
             df    SS        MS        F        Significance F
Regression   1     373.148   373.148   88.734   0.000
Residual     48    201.852   4.205
Total        49    575.000

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   53.751         1.645            32.684   0.000     50.444      57.058
weight      0.098          0.010            9.420    0.000     0.077       0.119

SSE_2 = 201.85

s_b2 = √(2.78² / 201.85) = 0.20

Is this right?


Page 58: Multiple Linear Regression Ver1.1


Understanding Multiple Regression

What happens if we are regressing Y on X's that are highly correlated? SSE_j goes down, and the standard error s_bj goes up!

What is the effect on the confidence intervals (bj ± 2 × s_bj)? They get wider!

This situation is called multicollinearity.

If a variable X does nothing "on its own", we can't estimate its effect on Y.
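One standard diagnostic for this situation — not shown on the slides, so treat it as a supplementary sketch — is the variance inflation factor (VIF), computed here on simulated height/weight-style data:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 50
weight = rng.normal(150, 25, n)
height = 0.1 * weight + rng.normal(50, 1, n)   # strongly tied to weight

X = np.column_stack([np.ones(n), weight, height])
for j in (1, 2):                               # skip the intercept column
    print(variance_inflation_factor(X, j))    # large VIF => inflated s_bj
```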



Page 59: Multiple Linear Regression Ver1.1


Back to Baseball – Let's try to add AVG on top of OBP

Regression Statistics
Multiple R           0.948
R Square             0.899
Adjusted R Square    0.891
Standard Error       0.161
Observations         30

ANOVA
             df    SS         MS         F          Significance F
Regression   2     6.188355   3.094177   120.11     3.64E-14
Residual     27    0.695541   0.025761
Total        29    6.883896

            Coefficients   Standard Error   t Stat   P-value    Lower 95%   Upper 95%
Intercept   -7.933633      0.844353         -9.396   5.31E-10   -9.666102   -6.201163
AVG         7.810397       4.014609         1.945    0.062      -0.426900   16.047694
OBP         31.778920      3.802577         8.357    5.74E-09   23.976672   39.581160

R/G = β0 + β1 AVG + β2 OBP + ε

Is AVG any good?


Page 60: Multiple Linear Regression Ver1.1


Back to Baseball – Now let's add SLG

Regression Statistics
Multiple R           0.956
R Square             0.913
Adjusted R Square    0.907
Standard Error       0.149
Observations         30

ANOVA
             df    SS         MS         F         Significance F
Regression   2     6.287470   3.143735   142.32    4.56E-15
Residual     27    0.596426   0.022090
Total        29    6.883896

            Coefficients   Standard Error   t Stat   P-value    Lower 95%   Upper 95%
Intercept   -7.014316      0.819910         -8.555   3.61E-09   -8.696632   -5.332000
OBP         27.592870      4.003208         6.893    2.09E-07   19.378965   35.806770
SLG         6.031124       2.021542         2.983    0.006      1.883263    10.178990

R/G = β0 + β1 OBP + β2 SLG + ε

What about now? Is SLG any good?


Page 61: Multiple Linear Regression Ver1.1


Back to Baseball

Correlations:

       AVG    OBP    SLG
AVG    1
OBP    0.77   1
SLG    0.75   0.83   1

When AVG is added to the model with OBP, no additional information is conveyed. AVG does nothing "on its own" to help predict Runs per Game...

SLG, however, measures something that OBP doesn't (power!), and by doing something "on its own" it is relevant to help predict Runs per Game. (Okay, but not much...)

