Multiple Linear Regression (MLR) Handouts

Yibi Huang

• Data and Models
• Least Squares Estimate, Fitted Values, Residuals
• Sum of Squares
• Do Regression in R
• Interpretation of Regression Coefficients
• t-Tests on Individual Regression Coefficients
• F-Tests on Multiple Regression Coefficients / Goodness-of-Fit

MLR - 1

Data for Multiple Linear Regression

Multiple linear regression is a generalized form of simple linear regression, in which the data contain multiple explanatory variables.

         SLR             MLR
         x     y         x1    x2   ...   xp    y
case 1:  x1    y1        x11   x12  ...   x1p   y1
case 2:  x2    y2        x21   x22  ...   x2p   y2
  ...    ...   ...       ...   ...        ...   ...
case n:  xn    yn        xn1   xn2  ...   xnp   yn

• For SLR, we observe pairs of variables; for MLR, we observe rows of variables. Each row (or pair) is called a case, a record, or a data point.

• $y_i$ is the response (or dependent variable) of the $i$th observation.

• There are $p$ explanatory variables (or covariates, predictors, independent variables), and $x_{ik}$ is the value of the explanatory variable $x_k$ of the $i$th case.

MLR - 2

Multiple Linear Regression Models

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad \text{where the } \varepsilon_i\text{'s are i.i.d. } N(0, \sigma^2)$$

In the model above,

• the $\varepsilon_i$'s (errors, or noise) are i.i.d. $N(0, \sigma^2)$;

• the parameters include:
    $\beta_0$ = the intercept;
    $\beta_k$ = the regression coefficient (slope) for the $k$th explanatory variable, $k = 1, \ldots, p$;
    $\sigma^2 = \mathrm{Var}(\varepsilon_i)$, the variance of the errors;

• Observed (known): $y_i, x_{i1}, x_{i2}, \ldots, x_{ip}$. Unknown: $\beta_0, \beta_1, \ldots, \beta_p$, $\sigma^2$, and the $\varepsilon_i$'s.

• Random variables: the $\varepsilon_i$'s and the $y_i$'s. Constants (nonrandom): the $\beta_k$'s, $\sigma^2$, and the $x_{ik}$'s.
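To make the model concrete, here is a minimal simulation sketch in R; the sample size, covariate distributions, coefficients, and $\sigma = 2$ are all made-up illustrative choices, not from the handout:

> set.seed(1)                          # arbitrary seed, for reproducibility
> n = 50
> x1 = runif(n, 0, 10); x2 = rnorm(n)  # two hypothetical covariates
> eps = rnorm(n, mean = 0, sd = 2)     # i.i.d. N(0, sigma^2) errors
> y = 1 + 0.5*x1 - 2*x2 + eps          # beta0 = 1, beta1 = 0.5, beta2 = -2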

MLR - 3

Questions

• What are the mean, the variance, and the distribution of $y_i$?

• We assume the $\varepsilon_i$'s are independent. Are the $y_i$'s independent?

MLR - 4

Fitting the Model — Least Squares Method

• Recall that for SLR, the least squares estimate $(\hat\beta_0, \hat\beta_1)$ for $(\beta_0, \beta_1)$ gives the intercept and slope of the straight line with the minimum sum of squared vertical distances to the data points, $\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$.

[Figure: scatterplot of y against x, comparing the fitted regression line with an arbitrary straight line]

• MLR is just like SLR. The least squares estimate $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ for $(\beta_0, \ldots, \beta_p)$ gives the intercept and slopes of the (hyper)plane with the minimum sum of squared vertical distances to the data points,

$$\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip})^2$$

MLR - 5

Solving the Least Squares Problem (1)

From now on, we use the "hat" symbol to differentiate the estimated coefficient $\hat\beta_j$ from the actual unknown coefficient $\beta_j$.

To find the $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ that minimize

$$L(\beta_0, \beta_1, \ldots, \beta_p) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2,$$

one can take the derivatives of $L$ with respect to the $\beta_j$'s,

$$\frac{\partial L}{\partial \beta_0} = -2 \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip}),$$

$$\frac{\partial L}{\partial \beta_k} = -2 \sum_{i=1}^n x_{ik} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip}), \qquad k = 1, 2, \ldots, p,$$

and then equate them to 0. This results in a system of $(p + 1)$ equations in $(p + 1)$ unknowns.

MLR - 6

Solving the Least Squares Problem (2)

The least squares estimate $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ is the solution to the following system of equations, called the normal equations:

$$\begin{aligned}
n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^n x_{i1} + \cdots + \hat\beta_p \sum_{i=1}^n x_{ip} &= \sum_{i=1}^n y_i \\
\hat\beta_0 \sum_{i=1}^n x_{i1} + \hat\beta_1 \sum_{i=1}^n x_{i1}^2 + \cdots + \hat\beta_p \sum_{i=1}^n x_{i1} x_{ip} &= \sum_{i=1}^n x_{i1} y_i \\
&\ \vdots \\
\hat\beta_0 \sum_{i=1}^n x_{ik} + \hat\beta_1 \sum_{i=1}^n x_{ik} x_{i1} + \cdots + \hat\beta_p \sum_{i=1}^n x_{ik} x_{ip} &= \sum_{i=1}^n x_{ik} y_i \\
&\ \vdots \\
\hat\beta_0 \sum_{i=1}^n x_{ip} + \hat\beta_1 \sum_{i=1}^n x_{ip} x_{i1} + \cdots + \hat\beta_p \sum_{i=1}^n x_{ip}^2 &= \sum_{i=1}^n x_{ip} y_i
\end{aligned}$$

• Don't worry about solving the equations by hand. R and many other software packages can do the computation for us.

• In general, $\hat\beta_j \neq \beta_j$, but they will be close under some conditions.
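In matrix form, the normal equations read $(X^\top X)\hat\beta = X^\top y$, where $X$ is the $n \times (p+1)$ matrix whose first column is all 1's. A minimal R sketch that solves them directly on made-up data and checks the answer against lm():

> set.seed(222); n = 30
> x1 = rnorm(n); x2 = rnorm(n)             # hypothetical covariates
> y = 1 + 2*x1 - x2 + rnorm(n)             # hypothetical response
> X = cbind(1, x1, x2)                     # design matrix with a leading column of 1's
> solve(t(X) %*% X, t(X) %*% y)            # solve the normal equations (X'X) b = X'y
> coef(lm(y ~ x1 + x2))                    # lm() returns the same estimates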

MLR - 7

Fitted Values

The fitted value or predicted value is

$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}$$

• Again, the "hat" symbol is used to differentiate the fitted value $\hat y_i$ from the actual observed value $y_i$.

MLR - 8

Residuals

• One cannot directly compute the errors

  $$\varepsilon_i = y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip},$$

  since the coefficients $\beta_0, \beta_1, \ldots, \beta_p$ are unknown.

• The errors $\varepsilon_i$ can be estimated by the residuals $e_i$, defined as

  $$\begin{aligned}
  \text{residual } e_i &= \text{observed } y_i - \text{predicted } y_i = y_i - \hat y_i \\
  &= y_i - (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}) \\
  &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}
  \end{aligned}$$

• $e_i \neq \varepsilon_i$ in general, since $\hat\beta_j \neq \beta_j$

• Graphical explanation

MLR - 9

Properties of Residuals

Recall that the least squares estimate $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ satisfies the equations

$$\sum_{i=1}^n \underbrace{(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip})}_{=\, y_i - \hat y_i \,=\, e_i \,=\, \text{residual}} = 0 \quad\text{and}\quad \sum_{i=1}^n x_{ik} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}) = 0, \quad k = 1, 2, \ldots, p.$$

Thus the residuals $e_i$ have the properties

$$\underbrace{\sum\nolimits_{i=1}^n e_i = 0}_{\text{Residuals add up to 0.}} \qquad\text{and}\qquad \underbrace{\sum\nolimits_{i=1}^n x_{ik} e_i = 0, \quad k = 1, 2, \ldots, p.}_{\text{Residuals are orthogonal to covariates.}}$$

The two properties also imply that the residuals are uncorrelated with each of the $p$ covariates (proof: HW1 bonus).

MLR - 10

Sum of Squares

Observe that

$$y_i - \bar y = (\hat y_i - \bar y) + (y_i - \hat y_i).$$

Squaring both sides, we get

$$(y_i - \bar y)^2 = (\hat y_i - \bar y)^2 + (y_i - \hat y_i)^2 + 2(\hat y_i - \bar y)(y_i - \hat y_i).$$

Summing over all the cases $i = 1, 2, \ldots, n$, we get

$$\underbrace{\sum_{i=1}^n (y_i - \bar y)^2}_{SST} = \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)^2}_{SSR} + \underbrace{\sum_{i=1}^n (\underbrace{y_i - \hat y_i}_{=e_i})^2}_{SSE} + 2 \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)(y_i - \hat y_i)}_{\text{We'll shortly show this is 0.}}$$

MLR - 11

Why $\sum_{i=1}^n (\hat y_i - \bar y)(y_i - \hat y_i) = 0$?

$$\begin{aligned}
\sum_{i=1}^n (\hat y_i - \bar y)(\underbrace{y_i - \hat y_i}_{=e_i})
&= \sum_{i=1}^n \hat y_i e_i - \sum_{i=1}^n \bar y e_i \\
&= \sum_{i=1}^n (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}) e_i - \sum_{i=1}^n \bar y e_i \\
&= \hat\beta_0 \underbrace{\sum_{i=1}^n e_i}_{=0} + \hat\beta_1 \underbrace{\sum_{i=1}^n x_{i1} e_i}_{=0} + \cdots + \hat\beta_p \underbrace{\sum_{i=1}^n x_{ip} e_i}_{=0} - \bar y \underbrace{\sum_{i=1}^n e_i}_{=0} = 0,
\end{aligned}$$

in which we used the properties of the residuals that $\sum_{i=1}^n e_i = 0$ and $\sum_{i=1}^n x_{ik} e_i = 0$ for all $k = 1, \ldots, p$.

MLR - 12

Interpretation of Sum of Squares

$$\underbrace{\sum_{i=1}^n (y_i - \bar y)^2}_{SST} = \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)^2}_{SSR} + \underbrace{\sum_{i=1}^n (\overbrace{y_i - \hat y_i}^{=e_i})^2}_{SSE}$$

• SST = total sum of squares
    – the total variability of $y$
    – depends on the response $y$ only, not on the form of the model

• SSR = regression sum of squares
    – the variability of $y$ explained by $x_1, \ldots, x_p$

• SSE = error (residual) sum of squares
    – $= \min_{\beta_0, \beta_1, \ldots, \beta_p} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$
    – the variability of $y$ not explained by the $x$'s
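The decomposition is easy to verify numerically from any fitted model. A minimal sketch using R's built-in cars data as an arbitrary stand-in (any lm() fit, including an MLR fit, works the same way):

> fit = lm(dist ~ speed, data = cars)
> y = cars$dist; yhat = fitted(fit)
> SST = sum((y - mean(y))^2)
> SSR = sum((yhat - mean(y))^2)
> SSE = sum(resid(fit)^2)
> all.equal(SST, SSR + SSE)                # TRUE, up to floating-point error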

MLR - 13

Nested Models

We say Model 1 is nested in Model 2 if Model 1 is a special case of Model 2 (and hence Model 2 is an extension of Model 1). E.g., for the 4 models below,

Model A: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
Model B: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
Model C: $Y = \beta_0 + \beta_1 X_1 + \beta_3 X_3 + \varepsilon$
Model D: $Y = \beta_0 + \beta_1 (X_1 + X_2) + \varepsilon$

• B is nested in A . . . . . . since A reduces to B when $\beta_3 = 0$
• C is also nested in A . . . since A reduces to C when $\beta_2 = 0$
• D is nested in B . . . . . . since B reduces to D when $\beta_1 = \beta_2$
• B and C are NOT nested either way
• D is NOT nested in C

MLR - 14

Nested Relationship is Transitive

If Model 1 is nested in Model 2, and Model 2 is nested in Model 3, then Model 1 is also nested in Model 3.

For example, for the models on the previous slide,

    D is nested in B, and B is nested in A

implies D is also nested in A, which is clearly true because Model A reduces to Model D when $\beta_1 = \beta_2$ and $\beta_3 = 0$.

When two models are nested (Model 1 is nested in Model 2),

• the smaller model (Model 1) is called the reduced model, and

• the more general model (Model 2) is called the full model.

MLR - 15

SST of Nested Models

Question: Compare the SST for Model A and the SST for Model B. Which one is larger? Or are they equal?

What about the SST for Model C? For Model D?

MLR - 16

SSE of Nested Models

When a reduced model is nested in a full model, then

(i) $SSE_{reduced} \geq SSE_{full}$, and (ii) $SSR_{reduced} \leq SSR_{full}$.

Proof. We will prove (i) for
• the full model $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$ and
• the reduced model $y_i = \beta_0 + \beta_1 x_{i1} + \beta_3 x_{i3} + \varepsilon_i$.
The proofs for other nested models are similar.

$$SSE_{full} = \min_{\beta_0, \beta_1, \beta_2, \beta_3} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3})^2 \leq \min_{\beta_0, \beta_1, \beta_3} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \beta_3 x_{i3})^2 = SSE_{reduced}$$

The inequality holds because the reduced model is the $\beta_2 = 0$ special case of the full model, so the first minimization is over a larger set of candidate fits.

Part (ii) follows directly from (i), the identity $SST = SSR + SSE$, and the fact that all MLR models of the same data set have a common SST.

MLR - 17

Degrees of Freedom

It can be shown that if the MLR model $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$, with the $\varepsilon_i$'s i.i.d. $\sim N(0, \sigma^2)$, is true, then

$$\frac{SSE}{\sigma^2} \sim \chi^2_{n-p-1}.$$

If we further assume that $\beta_1 = \beta_2 = \cdots = \beta_p = 0$, then

$$\frac{SST}{\sigma^2} \sim \chi^2_{n-1}, \qquad \frac{SSR}{\sigma^2} \sim \chi^2_{p},$$

and SSR is independent of SSE.

Note that the degrees of freedom of the 3 chi-square distributions,

$$df_T = n - 1, \qquad df_R = p, \qquad df_E = n - p - 1,$$

break down similarly,

$$df_T = df_R + df_E,$$

just like $SST = SSR + SSE$.

MLR - 18

Why Is the Degrees of Freedom for Errors $df_E = n - p - 1$?

The $n$ residuals $e_1, \ldots, e_n$ cannot all vary freely.

There are $p + 1$ constraints:

$$\sum_{i=1}^n e_i = 0 \quad\text{and}\quad \sum_{i=1}^n x_{ik} e_i = 0 \ \text{ for } k = 1, \ldots, p.$$

So only $n - (p + 1)$ of them can vary freely.

The $p + 1$ constraints come from the $p + 1$ coefficients $\beta_0, \ldots, \beta_p$ in the model; each contributes one constraint $\frac{\partial L}{\partial \beta_k} = 0$.

MLR - 19

Mean Square Error (MSE) — Estimate of $\sigma^2$

A mean square is a sum of squares divided by its degrees of freedom:

$$MST = \frac{SST}{df_T} = \frac{SST}{n - 1} = \text{sample variance of } Y,$$

$$MSR = \frac{SSR}{df_R} = \frac{SSR}{p},$$

$$MSE = \frac{SSE}{df_E} = \frac{SSE}{n - p - 1} = \hat\sigma^2$$

• From the fact that $SSE/\sigma^2 \sim \chi^2_{n-p-1}$ and that the mean of a $\chi^2_k$ distribution is $k$, we know that MSE is an unbiased estimator for $\sigma^2$.

• Though SSE always decreases as we add terms to the model, adding unimportant terms may increase MSE.
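As a check on the definitions, MSE and its square root (the "residual standard error" reported by lm()) can be computed by hand. A minimal sketch; the fit to R's built-in cars data is an arbitrary stand-in for any lm() object:

> fit = lm(dist ~ speed, data = cars)          # any lm() fit would do
> MSE = sum(resid(fit)^2) / df.residual(fit)   # SSE / (n - p - 1)
> sqrt(MSE)                                    # the residual standard error
> sigma(fit)                                   # same value, extracted directly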

MLR - 20

Multiple R-Squared

Multiple $R^2$, also called the coefficient of determination, is defined as

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = \text{proportion of variability in } y \text{ explained by } x_1, \ldots, x_p,$$

which measures the strength of the linear relationship between $y$ and the $p$ covariates.

• $0 \leq R^2 \leq 1$

• For SLR, $R^2 = r^2_{xy}$ is the square of the correlation coefficient between $X$ and $Y$ (proof: HW1). So multiple $R^2$ is a generalization of the correlation coefficient.

• When two models are nested, $R^2(\text{reduced model}) \leq R^2(\text{full model})$.

• Is a large $R^2$ always preferable?

MLR - 21

Adjusted R-Squared

Because $R^2$ always increases as we add terms to the model, some people prefer to use an adjusted $R^2$ defined as

$$R^2_{adj} = 1 - \frac{SSE/df_E}{SST/df_T} = 1 - \frac{SSE/(n - p - 1)}{SST/(n - 1)} = 1 - \frac{n - 1}{n - p - 1}(1 - R^2).$$

• $-\dfrac{p}{n - p - 1} \leq R^2_{adj} \leq R^2 \leq 1$

• Unlike $R^2$, $R^2_{adj}$ can be negative.

• $R^2_{adj}$ does not always increase as more variables are added to the model. In fact, if unnecessary terms are added, $R^2_{adj}$ may decrease.
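Both quantities are simple functions of SSE and SST, which makes them easy to verify by hand. A minimal sketch, again using the built-in cars data as a stand-in for any fit:

> fit = lm(dist ~ speed, data = cars)
> y = cars$dist
> SSE = sum(resid(fit)^2); SST = sum((y - mean(y))^2)
> 1 - SSE/SST                                            # multiple R-squared
> 1 - (SSE/df.residual(fit)) / (SST/(length(y) - 1))     # adjusted R-squared
> c(summary(fit)$r.squared, summary(fit)$adj.r.squared)  # same values from summary()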

MLR - 22

Example: Housing Price

Price BDR  FLR FP RMS ST LOT BTH CON GAR LOC
   53   2  967  0   5  0  39 1.5   1 0.0   0
   55   2  815  1   5  0  33 1.0   1 2.0   0
   56   3  900  0   5  1  35 1.5   1 1.0   0
   58   3 1007  0   6  1  24 1.5   0 2.0   0
   64   3 1100  1   7  0  50 1.5   1 1.5   0
   44   4  897  0   7  0  25 2.0   0 1.0   0
   49   5 1400  0   8  0  30 1.0   0 1.0   0
   70   3 2261  0   6  0  29 1.0   0 2.0   0
   72   4 1290  0   8  1  33 1.5   1 1.5   0
   82   4 2104  0   9  0  40 2.5   1 1.0   0
   85   8 2240  1  12  1  50 3.0   0 2.0   0
   45   2  641  0   5  0  25 1.0   0 0.0   1
   47   3  862  0   6  0  25 1.0   1 0.0   1
   49   4 1043  0   7  0  30 1.5   0 0.0   1
   56   4 1325  0   8  0  50 1.5   0 0.0   1
   60   2  782  0   5  1  25 1.0   0 0.0   1
   62   3 1126  0   7  1  30 2.0   1 0.0   1
   64   4 1226  0   8  0  37 2.0   0 2.0   1
  ...
   50   2  691  0   6  0  30 1.0   0 2.0   0
   65   3 1023  0   7  1  30 2.0   1 1.0   0

Variable definitions:

Price = Selling price in $1000
BDR   = Number of bedrooms
FLR   = Floor space in sq. ft.
FP    = Number of fireplaces
RMS   = Number of rooms
ST    = Storm windows (1 if present, 0 if absent)
LOT   = Front footage of lot in feet
BTH   = Number of bathrooms
CON   = Construction (1 if frame, 0 if brick)
GAR   = Garage size (0 = no garage, 1 = one-car garage, etc.)
LOC   = Location (1 if property is in zone A, 0 otherwise)

MLR - 23


How to Do Regression Using R?

> housing = read.table("housing.dat",h=TRUE) # to load the data

> lm(Price ~ FLR+RMS+BDR+GAR+LOT+ST+CON+LOC, data=housing)

Call:

lm(formula = Price ~ FLR + RMS + BDR + GAR + LOT + ST + CON +

LOC, data = housing)

Coefficients:

(Intercept) FLR RMS BDR GAR

12.47980 0.01704 2.38264 -4.54355 5.07873

LOT ST CON LOC

0.38241 9.82757 4.86507 6.95701

The lm() command above asks R to fit the model

$$\text{Price} = \beta_0 + \beta_1 \text{FLR} + \beta_2 \text{RMS} + \beta_3 \text{BDR} + \beta_4 \text{GAR} + \beta_5 \text{LOT} + \beta_6 \text{ST} + \beta_7 \text{CON} + \beta_8 \text{LOC} + \varepsilon$$

and R gives us the regression equation

$$\widehat{\text{Price}} = 12.480 + 0.017\,\text{FLR} + 2.383\,\text{RMS} - 4.544\,\text{BDR} + 5.079\,\text{GAR} + 0.382\,\text{LOT} + 9.828\,\text{ST} + 4.865\,\text{CON} + 6.957\,\text{LOC}$$

MLR - 24


More R Commands

> lm1 = lm(Price ~ FLR+RMS+BDR+GAR+LOT+ST+CON+LOC, data=housing)

> summary(lm1) # Regression output with more details

# including multiple R-squared,

# and the estimate of sigma

> lm1$coef # show the estimated beta’s

> lm1$fitted # show the fitted values

> lm1$res # show the residuals

Guess what we will get in R if we type the following commands?

> sum(lm1$res)

> mean(lm1$res)

> cor(lm1$res, housing$FLR)

> cor(lm1$res, housing$RMS)

> cor(lm1$res, housing$BDR)

MLR - 25


> lm1 = lm(Price ~ FLR+RMS+BDR+GAR+LOT+ST+CON+LOC, data=housing)

> summary(lm1)

Call:

lm(formula = Price ~ FLR + RMS + BDR + GAR + LOT + ST + CON +

LOC, data = housing)

Residuals:

Min 1Q Median 3Q Max

-6.020 -2.129 -0.213 2.147 6.492

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 12.479799 4.443985 2.808 0.012094 *

FLR 0.017038 0.002751 6.195 9.8e-06 ***

RMS 2.382638 1.418290 1.680 0.111251

BDR -4.543550 1.781145 -2.551 0.020671 *

GAR 5.078729 1.209692 4.198 0.000604 ***

LOT 0.382411 0.106832 3.580 0.002309 **

ST 9.827572 1.929232 5.094 9.0e-05 ***

CON 4.865071 1.890718 2.573 0.019746 *

LOC 6.957007 2.044084 3.403 0.003382 **

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.021 on 17 degrees of freedom

Multiple R-squared: 0.9305, Adjusted R-squared: 0.8979

F-statistic: 28.47 on 8 and 17 DF, p-value: 2.25e-08

MLR - 26


Residual standard error: 4.021 on 17 degrees of freedom

Multiple R-squared: 0.9305, Adjusted R-squared: 0.8979

F-statistic: 28.47 on 8 and 17 DF, p-value: 2.25e-08

• Residual standard error $= \sqrt{MSE} = \hat\sigma$

• "on 17 degrees of freedom" because $df_E = n - p - 1 = 26 - 8 - 1 = 17$

• Multiple R-squared $= 1 - \dfrac{SSE}{SST}$

• Adjusted R-squared $= R^2_{adj} = 1 - \dfrac{SSE/(n - p - 1)}{SST/(n - 1)}$

• The F-statistic 28.47 is for testing

  $$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad\text{vs.}\quad H_a: \text{not all } \beta_j = 0,\ j = 1, \ldots, p.$$

  We will soon explain the meaning of this F-statistic.
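For programmatic access, the same quantities can be pulled out of the summary object; the component names below are the standard ones for a summary.lm object:

> s = summary(lm1)
> s$sigma              # residual standard error (4.021 here)
> s$r.squared          # multiple R-squared (0.9305)
> s$adj.r.squared      # adjusted R-squared (0.8979)
> s$fstatistic         # F = 28.47, with 8 and 17 df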

MLR - 27

$$\widehat{\text{Price}} = 12.480 + 0.017\,\text{FLR} + 2.383\,\text{RMS} - 4.544\,\text{BDR} + 5.079\,\text{GAR} + 0.382\,\text{LOT} + 9.828\,\text{ST} + 4.865\,\text{CON} + 6.957\,\text{LOC}$$

The regression equation tells us that, holding the other covariates fixed,

• an extra square foot in floor area increases the price by . . . $17,
• an additional room by . . . . . . . . . . . . . . . . . . . . . $2383,
• an additional bedroom by . . . . . . . . . . . . . . . . . . −$4544,
• an additional space in the garage by . . . . . . . . . . . . $5079,
• an extra foot in front footage by . . . . . . . . . . . . . . $382,
• using storm windows by . . . . . . . . . . . . . . . . . . . $9828,
• constructing in frame rather than in brick by . . . . . . . $4865 (recall CON = 1 for frame),
• locating in zone A rather than in another area by . . . . . $6957.

Question: Why does an additional bedroom make a house less valuable?

MLR - 28

Interpretation of Regression Coefficients

• $\beta_0$: the intercept, the mean value of the response $y$ when all covariates $x_j$ are zero.
    – May not make practical sense. E.g., in the housing price model, $\beta_0$ doesn't make practical sense since there is no housing unit with 0 floor space.

• $\beta_j$: the regression coefficient for $x_j$, the mean change in the response $y$ when $x_j$ is increased by one unit while holding all other covariates constant.
    – The interpretation of $\beta_j$ depends on the presence of other covariates in the model. E.g., the meanings of the two $\beta_1$'s in the following two models are different:

      Model 1: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$
      Model 2: $y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$

MLR - 29


What’s Wrong?

# Model 1

> lmBDRonly = lm(Price ~ BDR, data=housing)

> lmBDRonly$coef

(Intercept) BDR

43.487365 3.920578

The regression coefficient for BDR is 3.92 in Model 1 above, but −4.54 in Model 2 below.

# Model 2

> lm1 = lm(Price ~ FLR+RMS+BDR+GAR+LOT+ST+CON+LOC, data=housing)

> lm1$coef

(Intercept) FLR RMS BDR GAR

12.47979941 0.01703833 2.38263831 -4.54355024 5.07872939

LOT ST CON LOC

0.38241085 9.82757237 4.86507085 6.95700689

Considering BDR alone, house prices increase with BDR. However, when the other covariates (FLR, RMS, etc.) are taken into account, house prices decrease with BDR. Does this make sense?

MLR - 30

One Sample t-Test (Review)

Given a sample of size $n$, $Y_1, Y_2, \ldots, Y_n$, from some (normal) population with unknown mean $\mu$ and unknown variance $\sigma^2$, we want to test

$$H_0: \mu = \mu_0 \quad\text{vs.}\quad H_a: \mu \neq \mu_0.$$

The t-statistic is

$$t = \frac{\bar Y - \mu_0}{SE(\bar Y)} = \frac{\bar Y - \mu_0}{s/\sqrt{n}}, \qquad \text{where } s = \sqrt{\frac{\sum_{i=1}^n (Y_i - \bar Y)^2}{n - 1}}.$$

If the population is normal, the t-statistic has a t-distribution with $n - 1$ degrees of freedom.
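This review maps directly onto a few lines of R. A minimal sketch with a made-up sample (the numbers are arbitrary):

> Y = c(5.1, 4.9, 6.2, 5.6, 5.0)                  # hypothetical sample
> mu0 = 5                                         # hypothetical null mean
> t = (mean(Y) - mu0) / (sd(Y) / sqrt(length(Y)))
> 2 * pt(-abs(t), df = length(Y) - 1)             # two-sided P-value
> t.test(Y, mu = mu0)                             # reports the same t and P-value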

MLR - 31

t-Tests on Individual Regression Coefficients

For an MLR model $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i$, to test the hypotheses

$$H_0: \beta_j = c \quad\text{vs.}\quad H_a: \beta_j \neq c,$$

the t-statistic is

$$t = \frac{\hat\beta_j - c}{SE(\hat\beta_j)},$$

in which $SE(\hat\beta_j)$ is the standard error of $\hat\beta_j$.

• The general formula for $SE(\hat\beta_j)$ is a bit complicated but unimportant in STAT 222 and hence is omitted.

• R can compute $SE(\hat\beta_j)$ for us.

• Formulas for $SE(\hat\beta_j)$ in a few special cases will be given later.

This t-statistic also has a t-distribution, with $n - p - 1$ degrees of freedom.
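summary() reports this t-test only for $c = 0$; for a general null value $c$, the statistic is easy to assemble by hand from the coefficient table. A minimal sketch using lm1 from the earlier slides (the choice of the RMS coefficient and of $c = 1$ is purely illustrative):

> est = summary(lm1)$coef                    # matrix of estimates and standard errors
> c0 = 1                                     # hypothetical null value
> t = (est["RMS", "Estimate"] - c0) / est["RMS", "Std. Error"]
> 2 * pt(-abs(t), df = df.residual(lm1))     # two-sided P-value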

MLR - 32


> lm1 = lm(Price ~ FLR+RMS+BDR+GAR+LOT+ST+CON+LOC, data=housing)

> summary(lm1)

Call:

lm(formula = Price ~ FLR + RMS + BDR + GAR + LOT + ST + CON +

LOC, data = housing)

Residuals:

Min 1Q Median 3Q Max

-6.020 -2.129 -0.213 2.147 6.492

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 12.479799 4.443985 2.808 0.012094 *

FLR 0.017038 0.002751 6.195 9.8e-06 ***

RMS 2.382638 1.418290 1.680 0.111251

BDR -4.543550 1.781145 -2.551 0.020671 *

GAR 5.078729 1.209692 4.198 0.000604 ***

LOT 0.382411 0.106832 3.580 0.002309 **

ST 9.827572 1.929232 5.094 9.0e-05 ***

CON 4.865071 1.890718 2.573 0.019746 *

LOC 6.957007 2.044084 3.403 0.003382 **

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.021 on 17 degrees of freedom

Multiple R-squared: 0.9305, Adjusted R-squared: 0.8979

F-statistic: 28.47 on 8 and 17 DF, p-value: 2.25e-08

MLR - 33


Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 12.479799 4.443985 2.808 0.012094 *

FLR 0.017038 0.002751 6.195 9.8e-06 ***

RMS 2.382638 1.418290 1.680 0.111251

......(some rows are omitted)

LOC 6.957007 2.044084 3.403 0.003382 **

• The first column gives the variable names.

• The column Estimate gives the LS estimates $\hat\beta_j$ for the $\beta_j$'s.

• The column Std. Error gives $SE(\hat\beta_j)$, the standard error of $\hat\beta_j$.

• The column t value gives the t-value $= \dfrac{\hat\beta_j}{SE(\hat\beta_j)}$.

• The column Pr(>|t|) gives the P-value for testing $H_0: \beta_j = 0$ vs. $H_a: \beta_j \neq 0$.

E.g., for RMS, we see $\hat\beta_{RMS} \approx 2.38$ and $SE(\hat\beta_{RMS}) \approx 1.42$, so the t-value for RMS is $\dfrac{\hat\beta_{RMS}}{SE(\hat\beta_{RMS})} \approx \dfrac{2.38}{1.42} \approx 1.68$. The P-value 0.111 is the 2-sided P-value for testing $H_0: \beta_{RMS} = 0$.

MLR - 34

Types of Tests for MLR

The t-test can only test a single coefficient. One may also want to test multiple coefficients. E.g., for the following MLR model with 6 covariates,

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_6 X_6 + \varepsilon,$$

one may want to test whether

1. all the regression coefficients are equal to zero:
   $H_0: \beta_1 = \beta_2 = \cdots = \beta_6 = 0$ vs. $H_a$: at least one of $\beta_1, \ldots, \beta_6$ is not 0

2. some subset of the regression coefficients are equal to zero:
   e.g., $H_0: \beta_2 = \beta_3 = \beta_5 = 0$ vs. $H_a$: at least one of $\beta_2, \beta_3, \beta_5 \neq 0$

3. some of the regression coefficients equal each other:
   e.g., $H_0: \beta_2 = \beta_3 = \beta_5$ vs. $H_a$: $\beta_2, \beta_3, \beta_5$ are not all equal

4. some $\beta$'s satisfy certain specified linear constraints:
   e.g., $H_0: \beta_2 = \beta_3 + \beta_5$ vs. $H_a: \beta_2 \neq \beta_3 + \beta_5$

MLR - 35

Tests on Multiple Coefficients are Model Comparisons

Each of the four tests on the previous slide can be viewed as a comparison of 2 models, a full model and a reduced model.

• Testing $\beta_1 = \beta_2 = \cdots = \beta_6 = 0$:

  Full:    $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_6 X_6 + \varepsilon$
  Reduced: $Y = \beta_0 + \varepsilon$  (all covariates are redundant)

• Testing $\beta_2 = \beta_3 = \beta_5 = 0$:

  Full:    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \varepsilon$
  Reduced: $Y = \beta_0 + \beta_1 X_1 + \beta_4 X_4 + \beta_6 X_6 + \varepsilon$  ($X_2, X_3, X_5$ are redundant)

• Testing $\beta_2 = \beta_3 = \beta_5$:

  Full:    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \varepsilon$
  Reduced: $Y = \beta_0 + \beta_1 X_1 + \beta_2 (X_2 + X_3 + X_5) + \beta_4 X_4 + \beta_6 X_6 + \varepsilon$
           ($Y$ depends on $X_2, X_3, X_5$ only through their sum $X_2 + X_3 + X_5$)

MLR - 36

Tests on Multiple Coefficients are Model Comparisons (2)

• Testing $\beta_2 = \beta_3 + \beta_5$:

  Full:    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \varepsilon$
  Reduced: $Y = \beta_0 + \beta_1 X_1 + (\beta_3 + \beta_5) X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \varepsilon$
           $\;= \beta_0 + \beta_1 X_1 + \beta_3 (X_2 + X_3) + \beta_4 X_4 + \beta_5 (X_2 + X_5) + \beta_6 X_6 + \varepsilon$

Observe that the reduced model is always nested in the full model.

MLR - 37

General Framework for Testing Nested Models

$H_0$: the reduced model is true  vs.  $H_a$: the full model is true

• As the reduced model is nested in the full model, it is always true that $SSE_{reduced} \geq SSE_{full}$.

• There is a trade-off between simplicity and precision:
    – The full model fits the data better (with a smaller SSE) but is more complicated.
    – The reduced model doesn't fit as well but is simpler.

• How to choose between the full and the reduced models?
    – If $SSE_{reduced} \approx SSE_{full}$, one can sacrifice a bit of precision in exchange for simplicity.
    – If $SSE_{reduced} \gg SSE_{full}$, simplicity costs a great reduction in precision, and the full model is preferred.

MLR - 38

The F-Statistic

$$F = \frac{(SSE_{reduced} - SSE_{full}) / (df_{reduced} - df_{full})}{SSE_{full} / df_{full}}$$

• $SSE_{reduced} - SSE_{full}$ is the reduction in SSE from replacing the reduced model with the full model.

• $df_{full}$ / $df_{reduced}$ is the $df_E$ for the full / reduced model.

• The denominator is the MSE of the full model.

• $F \geq 0$, since $SSE_{reduced} \geq SSE_{full} \geq 0$.

• The smaller the F-statistic, the more we favor the reduced model.

• Under $H_0$, the F-statistic has an F-distribution with $df_{reduced} - df_{full}$ and $df_{full}$ degrees of freedom.
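The formula can be evaluated by hand directly from two nested fits. A minimal sketch for the housing-price models (the reduced model drops RMS and CON, anticipating the comparison on Slide 43; the result should match anova(red, full)):

> full = lm(Price ~ FLR+RMS+BDR+GAR+LOT+ST+CON+LOC, data=housing)
> red  = lm(Price ~ FLR+BDR+GAR+LOT+ST+LOC, data=housing)
> SSEf = sum(resid(full)^2); SSEr = sum(resid(red)^2)
> dff = df.residual(full); dfr = df.residual(red)
> Fstat = ((SSEr - SSEf) / (dfr - dff)) / (SSEf / dff)
> Fstat
> pf(Fstat, dfr - dff, dff, lower.tail = FALSE)   # P-value under H0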

MLR - 39

Testing All Coefficients Equal Zero

Testing the hypotheses

$$H_0: \beta_1 = \cdots = \beta_p = 0 \quad\text{vs.}\quad H_a: \text{not all } \beta_1, \ldots, \beta_p = 0$$

is a test to evaluate the overall significance of a model.

Full:    $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$
Reduced: $Y = \beta_0 + \varepsilon$  (all covariates are unnecessary)

• The LS estimate for $\beta_0$ in the reduced model is $\hat\beta_0 = \bar Y$, so

  $$SSE_{reduced} = \sum_{i=1}^n (Y_i - \hat\beta_0)^2 = \sum_i (Y_i - \bar Y)^2 = SST_{full}$$

• $df_{reduced} = df_{E,reduced} = n - 1$, because the reduced model has only one coefficient, $\beta_0$.

• $df_{full} = df_{E,full} = n - p - 1$.

MLR - 40

Testing All Coefficients Equal Zero

So

$$F = \frac{(SSE_{reduced} - SSE_{full}) / (df_{reduced} - df_{full})}{SSE_{full}/df_{full}} = \frac{(SST_{full} - SSE_{full}) / [n - 1 - (n - p - 1)]}{SSE_{full}/(n - p - 1)} = \frac{SSR_{full}/p}{SSE_{full}/(n - p - 1)}.$$

Moreover, $F \sim F_{p,\,n-p-1}$ under $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$.

In R, the F-statistic and its p-value are displayed in the last line of the output of the summary() command.

> lm1 = lm(Price ~ FLR+RMS+BDR+GAR+LOT+ST+CON+LOC, data=housing)

> summary(lm1)

... (output omitted)

Residual standard error: 4.021 on 17 degrees of freedom

Multiple R-squared: 0.9305, Adjusted R-squared: 0.8979

F-statistic: 28.47 on 8 and 17 DF, p-value: 2.25e-08
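The same F can be assembled from the sums of squares, as a check on the formula above ($p = 8$ and $n - p - 1 = 17$ for this fit; the results should reproduce the 28.47 and 2.25e-08 shown):

> SSE = sum(resid(lm1)^2)
> SST = sum((housing$Price - mean(housing$Price))^2)
> SSR = SST - SSE
> Fstat = (SSR/8) / (SSE/17)
> Fstat
> pf(Fstat, 8, 17, lower.tail = FALSE)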

MLR - 41

ANOVA and the F-Test

The test of all coefficients equal zero is often summarized in an ANOVA table:

Source       df                 Sum of Squares   Mean Squares      F
Regression   df_R = p           SSR              MSR = SSR/df_R    F = MSR/MSE
Error        df_E = n - p - 1   SSE              MSE = SSE/df_E
Total        df_T = n - 1       SST

ANOVA is the shorthand for analysis of variance. It decomposes the total variation in the response (SST) into separate pieces that correspond to different sources of variation, like SST = SSR + SSE in the regression setting. Throughout STAT 222, we will introduce several other ANOVA tables.

MLR - 42


Testing Some Coefficients Equal to Zero

E.g., for the housing price data, we may want to test whether we can eliminate RMS and CON from the model, i.e., $H_0: \beta_{RMS} = \beta_{CON} = 0$.

> lm1 = lm(Price ~ FLR+RMS+BDR+GAR+LOT+ST+CON+LOC, data=housing)

> lm3 = lm(Price ~ FLR+BDR+GAR+LOT+ST+LOC, data=housing)

> anova(lm3,lm1)

Analysis of Variance Table

Model 1: Price ~ FLR + BDR + GAR + LOT + ST + LOC

Model 2: Price ~ FLR + RMS + BDR + GAR + LOT + ST + CON + LOC

Res.Df RSS Df Sum of Sq F Pr(>F)

1 19 472.03

2 17 274.84 2 197.19 6.0985 0.01008 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note that SSE is called RSS (residual sum of squares) in R.

MLR - 43


Testing Equality of Coefficients (1)

Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$

Example 1: To test $H_0: \beta_1 = \beta_4$, the reduced model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_1 X_4 + \varepsilon = \beta_0 + \beta_1 (X_1 + X_4) + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$

1. Make a new variable $W = X_1 + X_4$
2. Fit the reduced model by regressing $Y$ on $W$, $X_2$, and $X_3$
3. Find $SSE_{reduced}$; note $df_{reduced} - df_{full} = 1$
4. This can be done in R as follows:

> lmfull = lm(Y ~ X1 + X2 + X3 + X4)

> lmreduced = lm(Y ~ I(X1 + X4) + X2 + X3)

> anova(lmreduced,lmfull)

The line lmreduced = lm(Y ~ I(X1 + X4) + X2 + X3) is equivalent to

> W = X1 + X4
> lmreduced = lm(Y ~ W + X2 + X3)

The I() wrapper is needed because, inside a model formula, + means "add a term to the model"; I() tells R to interpret X1 + X4 as an arithmetic sum instead.

MLR - 44


Testing Equality of Coefficients (2)

Example 2: To test $H_0: \beta_1 = \beta_2 = \beta_3$, the reduced model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_1 X_2 + \beta_1 X_3 + \beta_4 X_4 + \varepsilon = \beta_0 + \beta_1 (X_1 + X_2 + X_3) + \beta_4 X_4 + \varepsilon$$

1. Make a new variable $W = X_1 + X_2 + X_3$
2. Fit the reduced model by regressing $Y$ on $W$ and $X_4$
3. Find $SSE_{reduced}$; note $df_{reduced} - df_{full} = 2$
4. This can be done in R as follows:

> lmfull = lm(Y ~ X1 + X2 + X3 + X4)

> lmreduced = lm(Y ~ I(X1 + X2 + X3) + X4)

> anova(lmreduced, lmfull)

MLR - 45


Testing Equality of Coefficients (3)

Example 3: To test $H_0: \beta_1 = \beta_2$ and $\beta_3 = \beta_4$, the reduced model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_1 X_2 + \beta_3 X_3 + \beta_3 X_4 + \varepsilon = \beta_0 + \beta_1 (X_1 + X_2) + \beta_3 (X_3 + X_4) + \varepsilon$$

1. Make new variables $W_1 = X_1 + X_2$ and $W_2 = X_3 + X_4$
2. Fit the reduced model by regressing $Y$ on $W_1$ and $W_2$
3. Find $SSE_{reduced}$; note $df_{reduced} - df_{full} = 2$
4. This can be done in R as follows:

> lmfull = lm(Y ~ X1 + X2 + X3 + X4)

> lmreduced = lm(Y ~ I(X1 + X2) + I(X3 + X4))

> anova(lmreduced, lmfull)

MLR - 46

Testing Coefficients under Constraints (1)

Again, say the full model is

Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$

Example 4: If $H_0: \beta_2 = \beta_3 + \beta_4$, then the reduced model is

$$Y = \beta_0 + \beta_1 X_1 + (\beta_3 + \beta_4) X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon = \beta_0 + \beta_1 X_1 + \beta_3 (X_2 + X_3) + \beta_4 (X_2 + X_4) + \varepsilon$$

1. Make new variables $W_1 = X_2 + X_3$ and $W_2 = X_2 + X_4$
2. Fit the reduced model by regressing $Y$ on $X_1$, $W_1$, and $W_2$
3. Find $SSE_{reduced}$; note $df_{reduced} - df_{full} = 1$
4. This can be done in R as follows:

4. Can be done in R as follows

> lmfull = lm(Y ~ X1 + X2 + X3 + X4)

> lmreduced = lm(Y ~ X1 + I(X2 + X3) + I(X2 + X4))

> anova(lmreduced, lmfull)

MLR - 47


Testing Coefficients under Constraints (2)

Example 5: If we think $\beta_2 = 2\beta_1$, then the reduced model is

$$Y = \beta_0 + \beta_1 X_1 + 2\beta_1 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon = \beta_0 + \beta_1 (X_1 + 2 X_2) + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$$

1. Make a new variable $W = X_1 + 2 X_2$
2. Fit the reduced model by regressing $Y$ on $W$, $X_3$, and $X_4$
3. Find $SSE_{reduced}$; note $df_{reduced} - df_{full} = 1$
4. This can be done in R as follows:

4. Can be done in R as follows

> lmfull = lm(Y ~ X1 + X2 + X3 + X4)

> lmreduced = lm(Y ~ I(X1 + 2*X2) + X3 + X4)

> anova(lmreduced, lmfull)

MLR - 48

Relationship Between t-Tests and F-Tests (Optional)

The F-test can also test a single coefficient, and the result is equivalent to the t-test. E.g., if one wants to test the single coefficient $\beta_3 = 0$ in the model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon,$$

one way is to do a t-test using the command summary(lm(Y ~ X1 + X2 + X3 + X4)) and read off the t-statistic and P-value for X3. Alternatively, the test can be viewed as a model comparison between

Full model:    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$
Reduced model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_4 X_4 + \varepsilon$

> anova(lm(Y ~ X1 + X2 + X3 + X4), lm(Y ~ X1 + X2 + X4))

One can show that the F-statistic $=$ (t-statistic)$^2$ and that the P-values are the same; thus the two tests are equivalent.

The proof involves complicated matrix algebra and is hence omitted.

MLR - 49


Consider again the model

$$\text{Price} = \beta_0 + \beta_1 \text{FLR} + \beta_2 \text{RMS} + \beta_3 \text{BDR} + \beta_4 \text{GAR} + \beta_5 \text{LOT} + \beta_6 \text{ST} + \beta_7 \text{CON} + \beta_8 \text{LOC} + \varepsilon$$

for the housing price data, and suppose we want to test $\beta_{BDR} = 0$.

From the output on Slide 33, we see the t-statistic for testing $\beta_{BDR} = 0$ is $-2.551$, with P-value 0.020671.

If using an F-test,

> lmfull = lm(Price ~ FLR+RMS+BDR+GAR+LOT+ST+CON+LOC, data=housing)

> lmreduced = lm(Price ~ FLR+RMS+GAR+LOT+ST+CON+LOC, data=housing)

> anova(lmreduced,lmfull)

Analysis of Variance Table

Res.Df RSS Df Sum of Sq F Pr(>F)

1 18 380.04

2 17 274.84 1 105.2 6.5072 0.02067 *

we see $t^2 = (-2.551)^2 = 6.507601 \approx 6.5072 = F$ (the subtle difference is due to rounding), and the P-value is also 0.02067.
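The squaring relationship can also be checked directly from the fitted objects above:

> t = summary(lmfull)$coef["BDR", "t value"]
> t^2                                      # 6.5076, matching the F statistic up to rounding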

MLR - 50