01_03 Linear Regression (transcript)

Least squares estimation of regression lines
Regression via least squares

Brian Caffo, Jeff Leek and Roger Peng
Johns Hopkins Bloomberg School of Public Health
General least squares for linear equations

∙ Consider again the parent and child height data from Galton.
Fitting the best line

∙ Let $Y_i$ be the $i^{th}$ child's height and $X_i$ be the $i^{th}$ (average over the pair of) parents' heights.
∙ Consider finding the best line: Child's Height $= \beta_0 + \beta_1 \times$ Parent's Height.
∙ Use least squares:
$$\sum_{i=1}^n \{Y_i - (\beta_0 + \beta_1 X_i)\}^2$$
∙ How do we do it?
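Before deriving the answer analytically, the criterion can also be minimized numerically; a minimal sketch with simulated heights (hypothetical data, not Galton's, which appear later in the lecture):

```r
# Minimize the least squares criterion numerically (simulated data,
# not the Galton data used later in the lecture)
set.seed(1)
x <- rnorm(100, mean = 68, sd = 2)        # parents' heights (inches)
y <- 24 + 0.65 * x + rnorm(100, sd = 2)   # children's heights

rss <- function(beta) sum((y - beta[1] - beta[2] * x)^2)
fit <- optim(c(0, 0), rss, method = "BFGS")
fit$par                                   # close to coef(lm(y ~ x))
```

The closed-form solution derived in the following slides makes this numerical search unnecessary.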
Let's solve this problem generally

∙ Let $\mu_i = \beta_0 + \beta_1 X_i$ and our estimates be $\hat\mu_i = \hat\beta_0 + \hat\beta_1 X_i$.
∙ We want to minimize
$$\dagger \quad \sum_{i=1}^n (Y_i - \mu_i)^2 = \sum_{i=1}^n (Y_i - \hat\mu_i)^2 + 2 \sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) + \sum_{i=1}^n (\hat\mu_i - \mu_i)^2$$
∙ Suppose that
$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = 0$$
∙ then
$$\dagger = \sum_{i=1}^n (Y_i - \hat\mu_i)^2 + \sum_{i=1}^n (\hat\mu_i - \mu_i)^2 \geq \sum_{i=1}^n (Y_i - \hat\mu_i)^2$$
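The add-and-subtract identity marked $\dagger$ holds for any candidate line, and the cross term vanishes at the least squares fit; a quick numerical check with simulated data, using `lm()`'s fitted values for $\hat\mu_i$:

```r
# Verify the decomposition and the vanishing cross term (simulated data)
set.seed(2)
x <- rnorm(50); y <- 1 + 2 * x + rnorm(50)
muhat <- fitted(lm(y ~ x))   # least squares fitted values
mu    <- 0.5 + 1.5 * x       # any other candidate line

lhs   <- sum((y - mu)^2)
rhs   <- sum((y - muhat)^2) +
         2 * sum((y - muhat) * (muhat - mu)) +
         sum((muhat - mu)^2)
cross <- sum((y - muhat) * (muhat - mu))
c(lhs - rhs, cross)          # both essentially zero
```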
Mean only regression

So we know that if
$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = 0$$
where $\mu_i = \beta_0 + \beta_1 X_i$ and $\hat\mu_i = \hat\beta_0 + \hat\beta_1 X_i$, then the line $Y = \hat\beta_0 + \hat\beta_1 X$ is the least squares line.

∙ Consider forcing $\beta_1 = 0$ and thus $\hat\beta_1 = 0$; that is, only considering horizontal lines.
∙ The solution works out to be
$$\hat\beta_0 = \bar Y.$$
Let's show it

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = \sum_{i=1}^n (Y_i - \hat\beta_0)(\hat\beta_0 - \beta_0) = (\hat\beta_0 - \beta_0) \sum_{i=1}^n (Y_i - \hat\beta_0)$$

Thus, this will equal 0 if
$$\sum_{i=1}^n (Y_i - \hat\beta_0) = n \bar Y - n \hat\beta_0 = 0$$

Thus
$$\hat\beta_0 = \bar Y.$$
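A numerical sanity check of this result (a sketch with simulated outcomes; `optimize()` searches over all horizontal lines):

```r
# The horizontal line minimizing the RSS sits at the sample mean
set.seed(3)
y <- rnorm(40, mean = 10)
rss0 <- function(b0) sum((y - b0)^2)   # RSS of the horizontal line at b0
best <- optimize(rss0, interval = range(y))$minimum
c(best, mean(y))                       # the minimizer is the sample mean
```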
Regression through the origin

Recall that if
$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = 0$$
where $\mu_i = \beta_0 + \beta_1 X_i$ and $\hat\mu_i = \hat\beta_0 + \hat\beta_1 X_i$, then the line $Y = \hat\beta_0 + \hat\beta_1 X$ is the least squares line.

∙ Consider forcing $\beta_0 = 0$ and thus $\hat\beta_0 = 0$; that is, only considering lines through the origin.
∙ The solution works out to be
$$\hat\beta_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}.$$
Let's show it

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = \sum_{i=1}^n (Y_i - \hat\beta_1 X_i)(\hat\beta_1 X_i - \beta_1 X_i) = (\hat\beta_1 - \beta_1) \sum_{i=1}^n (Y_i X_i - \hat\beta_1 X_i^2)$$

Thus, this will equal 0 if
$$\sum_{i=1}^n (Y_i X_i - \hat\beta_1 X_i^2) = \sum_{i=1}^n Y_i X_i - \hat\beta_1 \sum_{i=1}^n X_i^2 = 0$$

Thus
$$\hat\beta_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}.$$
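This closed form matches what R reports when the intercept is removed from the model formula (a sketch with simulated data; `y ~ x - 1` fits a line through the origin):

```r
# Closed-form slope through the origin versus lm() without an intercept
set.seed(4)
x <- rnorm(60, mean = 5); y <- 2 * x + rnorm(60)
beta1 <- sum(y * x) / sum(x^2)   # closed-form slope through the origin
c(beta1, coef(lm(y ~ x - 1)))    # lm() with the intercept removed
```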
Recapping what we know

∙ If we define $\mu_i = \beta_0$ then $\hat\beta_0 = \bar Y$.
  If we only look at horizontal lines, the least squares estimate of the intercept of that line is the average of the outcomes.
∙ If we define $\mu_i = X_i \beta_1$ then $\hat\beta_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}$.
  If we only look at lines through the origin, the estimated slope is the cross product of the Xs and Ys divided by the cross product of the Xs with themselves.
∙ What about when $\mu_i = \beta_0 + \beta_1 X_i$? That is, we don't want to restrict ourselves to horizontal lines or lines through the origin.
Let's figure it out

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i)(\hat\beta_0 + \hat\beta_1 X_i - \beta_0 - \beta_1 X_i)$$
$$= (\hat\beta_0 - \beta_0) \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) + (\hat\beta_1 - \beta_1) \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) X_i$$

Note that
$$0 = \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = n \bar Y - n \hat\beta_0 - n \hat\beta_1 \bar X \quad \mbox{implies that} \quad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$

Then
$$\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) X_i = \sum_{i=1}^n (Y_i - \bar Y + \hat\beta_1 \bar X - \hat\beta_1 X_i) X_i$$
Continued

$$= \sum_{i=1}^n \{(Y_i - \bar Y) - \hat\beta_1 (X_i - \bar X)\} X_i$$

And thus
$$\sum_{i=1}^n (Y_i - \bar Y) X_i - \hat\beta_1 \sum_{i=1}^n (X_i - \bar X) X_i = 0.$$

So we arrive at
$$\hat\beta_1 = \frac{\sum_{i=1}^n (Y_i - \bar Y) X_i}{\sum_{i=1}^n (X_i - \bar X) X_i} = \frac{\sum_{i=1}^n (Y_i - \bar Y)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)} = Cor(Y, X) \frac{Sd(Y)}{Sd(X)},$$
where the middle equality follows because $\sum_{i=1}^n (Y_i - \bar Y) \bar X = 0$ and $\sum_{i=1}^n (X_i - \bar X) \bar X = 0$.

And recall
$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X.$$
Consequences

∙ The least squares model fit to the line $Y = \beta_0 + \beta_1 X$ through the data pairs $(X_i, Y_i)$ with $Y_i$ as the outcome obtains the line $Y = \hat\beta_0 + \hat\beta_1 X$ where
$$\hat\beta_1 = Cor(Y, X) \frac{Sd(Y)}{Sd(X)}, \quad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X.$$
∙ $\hat\beta_1$ has the units of $Y / X$; $\hat\beta_0$ has the units of $Y$.
∙ The line passes through the point $(\bar X, \bar Y)$.
∙ The slope of the regression line with $X$ as the outcome and $Y$ as the predictor is $Cor(Y, X) Sd(X) / Sd(Y)$.
∙ The slope is the same one you would get if you centered the data, $(X_i - \bar X, Y_i - \bar Y)$, and did regression through the origin.
∙ If you normalized the data, $\left\{\frac{X_i - \bar X}{Sd(X)}, \frac{Y_i - \bar Y}{Sd(Y)}\right\}$, the slope is $Cor(Y, X)$.
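Most of these consequences are verified on Galton's data in the slides that follow; the claim that the fitted line passes through $(\bar X, \bar Y)$ is not, so here is a quick sketch with simulated data:

```r
# The least squares line passes through the point of means
set.seed(6)
x <- rnorm(50); y <- 1 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)
# the fitted line evaluated at mean(x) recovers mean(y)
c(predict(fit, newdata = data.frame(x = mean(x))), mean(y))
```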
Revisiting Galton's data

Double check our calculations using R.

```r
y <- galton$child
x <- galton$parent
beta1 <- cor(y, x) * sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)
rbind(c(beta0, beta1), coef(lm(y ~ x)))
```

```
     (Intercept)      x
[1,]       23.94 0.6463
[2,]       23.94 0.6463
```
Revisiting Galton's data

Reversing the outcome/predictor relationship.

```r
beta1 <- cor(y, x) * sd(x) / sd(y)
beta0 <- mean(x) - beta1 * mean(y)
rbind(c(beta0, beta1), coef(lm(x ~ y)))
```

```
     (Intercept)      y
[1,]       46.14 0.3256
[2,]       46.14 0.3256
```
Revisiting Galton's data

Regression through the origin yields an equivalent slope if you center the data first.

```r
yc <- y - mean(y)
xc <- x - mean(x)
beta1 <- sum(yc * xc) / sum(xc^2)
c(beta1, coef(lm(y ~ x))[2])
```

```
            x 
0.6463 0.6463
```
Revisiting Galton's data

Normalizing variables results in the slope being the correlation.

```r
yn <- (y - mean(y)) / sd(y)
xn <- (x - mean(x)) / sd(x)
c(cor(y, x), cor(yn, xn), coef(lm(yn ~ xn))[2])
```

```
                  xn 
0.4588 0.4588 0.4588
```
Plotting the fit

∙ The size of each point is the frequency of that X, Y combination.
∙ For the red line, the child is the outcome.
∙ For the blue line, the parent is the outcome (accounting for the fact that the response is plotted on the horizontal axis).
∙ The black line assumes $Cor(Y, X) = 1$ (slope is $Sd(Y)/Sd(X)$).
∙ The big black dot is $(\bar X, \bar Y)$.
The code to add the lines

```r
abline(mean(y) - mean(x) * cor(y, x) * sd(y) / sd(x),
       sd(y) / sd(x) * cor(y, x),
       lwd = 3, col = "red")
abline(mean(y) - mean(x) * sd(y) / sd(x) / cor(y, x),
       sd(y) / sd(x) / cor(y, x),
       lwd = 3, col = "blue")
abline(mean(y) - mean(x) * sd(y) / sd(x),
       sd(y) / sd(x),
       lwd = 2)
points(mean(x), mean(y), cex = 2, pch = 19)
```