01_03 Linear Regression (transcript)

Least squares estimation of regression lines
Regression via least squares

Brian Caffo, Jeff Leek and Roger Peng
Johns Hopkins Bloomberg School of Public Health
General least squares for linear equations

∙ Consider again the parent and child height data from Galton.
Fitting the best line

∙ Let $Y_i$ be the $i^{th}$ child's height and $X_i$ be the $i^{th}$ (average over the pair of) parents' heights.
∙ Consider finding the best line: Child's Height $= \beta_0 + \beta_1 \times$ Parent's Height.
∙ Use least squares:
$$\sum_{i=1}^n \{Y_i - (\beta_0 + \beta_1 X_i)\}^2$$
∙ How do we do it?
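Before deriving the answer analytically, the criterion can also be minimized numerically; a minimal sketch with simulated heights (hypothetical data, not Galton's, which appear later in the lecture):

```r
# Minimize the least squares criterion numerically (simulated data,
# not the Galton data used later in the lecture)
set.seed(1)
x <- rnorm(100, mean = 68, sd = 2)        # parents' heights (inches)
y <- 24 + 0.65 * x + rnorm(100, sd = 2)   # children's heights

rss <- function(beta) sum((y - beta[1] - beta[2] * x)^2)
fit <- optim(c(0, 0), rss, method = "BFGS")
fit$par                                   # close to coef(lm(y ~ x))
```

The closed-form solution derived in the following slides makes this numerical search unnecessary.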
Let's solve this problem generally

∙ Let $\mu_i = \beta_0 + \beta_1 X_i$ and our estimates be $\hat\mu_i = \hat\beta_0 + \hat\beta_1 X_i$.
∙ We want to minimize
$$\dagger \quad \sum_{i=1}^n (Y_i - \mu_i)^2 = \sum_{i=1}^n (Y_i - \hat\mu_i)^2 + 2 \sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) + \sum_{i=1}^n (\hat\mu_i - \mu_i)^2$$
∙ Suppose that
$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = 0$$
∙ then
$$\dagger = \sum_{i=1}^n (Y_i - \hat\mu_i)^2 + \sum_{i=1}^n (\hat\mu_i - \mu_i)^2 \geq \sum_{i=1}^n (Y_i - \hat\mu_i)^2$$
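The add-and-subtract identity marked $\dagger$ holds for any candidate line, and the cross term vanishes at the least squares fit; a quick numerical check with simulated data, using `lm()`'s fitted values for $\hat\mu_i$:

```r
# Verify the decomposition and the vanishing cross term (simulated data)
set.seed(2)
x <- rnorm(50); y <- 1 + 2 * x + rnorm(50)
muhat <- fitted(lm(y ~ x))   # least squares fitted values
mu    <- 0.5 + 1.5 * x       # any other candidate line

lhs   <- sum((y - mu)^2)
rhs   <- sum((y - muhat)^2) +
         2 * sum((y - muhat) * (muhat - mu)) +
         sum((muhat - mu)^2)
cross <- sum((y - muhat) * (muhat - mu))
c(lhs - rhs, cross)          # both essentially zero
```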
Mean only regression

So we know that if
$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = 0$$
where $\mu_i = \beta_0 + \beta_1 X_i$ and $\hat\mu_i = \hat\beta_0 + \hat\beta_1 X_i$, then the line $Y = \hat\beta_0 + \hat\beta_1 X$ is the least squares line.

∙ Consider forcing $\beta_1 = 0$ and thus $\hat\beta_1 = 0$; that is, only considering horizontal lines.
∙ The solution works out to be
$$\hat\beta_0 = \bar Y.$$
Let's show it

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = \sum_{i=1}^n (Y_i - \hat\beta_0)(\hat\beta_0 - \beta_0) = (\hat\beta_0 - \beta_0) \sum_{i=1}^n (Y_i - \hat\beta_0)$$

Thus, this will equal 0 if
$$\sum_{i=1}^n (Y_i - \hat\beta_0) = n \bar Y - n \hat\beta_0 = 0$$

Thus
$$\hat\beta_0 = \bar Y.$$
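A numerical sanity check of this result (a sketch with simulated outcomes; `optimize()` searches over all horizontal lines):

```r
# The horizontal line minimizing the RSS sits at the sample mean
set.seed(3)
y <- rnorm(40, mean = 10)
rss0 <- function(b0) sum((y - b0)^2)   # RSS of the horizontal line at b0
best <- optimize(rss0, interval = range(y))$minimum
c(best, mean(y))                       # the minimizer is the sample mean
```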
Regression through the origin

Recall that if
$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = 0$$
where $\mu_i = \beta_0 + \beta_1 X_i$ and $\hat\mu_i = \hat\beta_0 + \hat\beta_1 X_i$, then the line $Y = \hat\beta_0 + \hat\beta_1 X$ is the least squares line.

∙ Consider forcing $\beta_0 = 0$ and thus $\hat\beta_0 = 0$; that is, only considering lines through the origin.
∙ The solution works out to be
$$\hat\beta_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}.$$
Let's show it

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = \sum_{i=1}^n (Y_i - \hat\beta_1 X_i)(\hat\beta_1 X_i - \beta_1 X_i) = (\hat\beta_1 - \beta_1) \sum_{i=1}^n (Y_i X_i - \hat\beta_1 X_i^2)$$

Thus, this will equal 0 if
$$\sum_{i=1}^n (Y_i X_i - \hat\beta_1 X_i^2) = \sum_{i=1}^n Y_i X_i - \hat\beta_1 \sum_{i=1}^n X_i^2 = 0$$

Thus
$$\hat\beta_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}.$$
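This closed form matches what R reports when the intercept is removed from the model formula (a sketch with simulated data; `y ~ x - 1` fits a line through the origin):

```r
# Closed-form slope through the origin versus lm() without an intercept
set.seed(4)
x <- rnorm(60, mean = 5); y <- 2 * x + rnorm(60)
beta1 <- sum(y * x) / sum(x^2)   # closed-form slope through the origin
c(beta1, coef(lm(y ~ x - 1)))    # lm() with the intercept removed
```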
Recapping what we know

∙ If we define $\mu_i = \beta_0$ then $\hat\beta_0 = \bar Y$.
  If we only look at horizontal lines, the least squares estimate of the intercept of that line is the average of the outcomes.
∙ If we define $\mu_i = X_i \beta_1$ then $\hat\beta_1 = \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}$.
  If we only look at lines through the origin, the estimated slope is the cross product of the Xs and Ys divided by the cross product of the Xs with themselves.
∙ What about when $\mu_i = \beta_0 + \beta_1 X_i$? That is, we don't want to restrict ourselves to horizontal lines or lines through the origin.
Let's figure it out

$$\sum_{i=1}^n (Y_i - \hat\mu_i)(\hat\mu_i - \mu_i) = \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i)(\hat\beta_0 + \hat\beta_1 X_i - \beta_0 - \beta_1 X_i)$$
$$= (\hat\beta_0 - \beta_0) \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) + (\hat\beta_1 - \beta_1) \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) X_i$$

Note that
$$0 = \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = n \bar Y - n \hat\beta_0 - n \hat\beta_1 \bar X \quad \mbox{implies that} \quad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$

Then
$$\sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i) X_i = \sum_{i=1}^n (Y_i - \bar Y + \hat\beta_1 \bar X - \hat\beta_1 X_i) X_i$$
Continued

$$= \sum_{i=1}^n \{(Y_i - \bar Y) - \hat\beta_1 (X_i - \bar X)\} X_i$$

And thus
$$\sum_{i=1}^n (Y_i - \bar Y) X_i - \hat\beta_1 \sum_{i=1}^n (X_i - \bar X) X_i = 0.$$

So we arrive at
$$\hat\beta_1 = \frac{\sum_{i=1}^n (Y_i - \bar Y) X_i}{\sum_{i=1}^n (X_i - \bar X) X_i} = \frac{\sum_{i=1}^n (Y_i - \bar Y)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)} = Cor(Y, X) \frac{Sd(Y)}{Sd(X)},$$
where the middle equality follows because $\sum_{i=1}^n (Y_i - \bar Y) \bar X = 0$ and $\sum_{i=1}^n (X_i - \bar X) \bar X = 0$.

And recall
$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X.$$
Consequences

∙ The least squares model fit to the line $Y = \beta_0 + \beta_1 X$ through the data pairs $(X_i, Y_i)$ with $Y_i$ as the outcome obtains the line $Y = \hat\beta_0 + \hat\beta_1 X$ where
$$\hat\beta_1 = Cor(Y, X) \frac{Sd(Y)}{Sd(X)}, \quad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X.$$
∙ $\hat\beta_1$ has the units of $Y / X$; $\hat\beta_0$ has the units of $Y$.
∙ The line passes through the point $(\bar X, \bar Y)$.
∙ The slope of the regression line with $X$ as the outcome and $Y$ as the predictor is $Cor(Y, X) Sd(X) / Sd(Y)$.
∙ The slope is the same one you would get if you centered the data, $(X_i - \bar X, Y_i - \bar Y)$, and did regression through the origin.
∙ If you normalized the data, $\left\{\frac{X_i - \bar X}{Sd(X)}, \frac{Y_i - \bar Y}{Sd(Y)}\right\}$, the slope is $Cor(Y, X)$.
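Most of these consequences are verified on Galton's data in the slides that follow; the claim that the fitted line passes through $(\bar X, \bar Y)$ is not, so here is a quick sketch with simulated data:

```r
# The least squares line passes through the point of means
set.seed(6)
x <- rnorm(50); y <- 1 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)
# the fitted line evaluated at mean(x) recovers mean(y)
c(predict(fit, newdata = data.frame(x = mean(x))), mean(y))
```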
Revisiting Galton's data

Double check our calculations using R.

```r
y <- galton$child
x <- galton$parent
beta1 <- cor(y, x) * sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)
rbind(c(beta0, beta1), coef(lm(y ~ x)))
```

```
     (Intercept)      x
[1,]       23.94 0.6463
[2,]       23.94 0.6463
```
Revisiting Galton's data

Reversing the outcome/predictor relationship.

```r
beta1 <- cor(y, x) * sd(x) / sd(y)
beta0 <- mean(x) - beta1 * mean(y)
rbind(c(beta0, beta1), coef(lm(x ~ y)))
```

```
     (Intercept)      y
[1,]       46.14 0.3256
[2,]       46.14 0.3256
```
Revisiting Galton's data

Regression through the origin yields an equivalent slope if you center the data first.

```r
yc <- y - mean(y)
xc <- x - mean(x)
beta1 <- sum(yc * xc) / sum(xc^2)
c(beta1, coef(lm(y ~ x))[2])
```

```
            x 
0.6463 0.6463
```
Revisiting Galton's data

Normalizing variables results in the slope being the correlation.

```r
yn <- (y - mean(y)) / sd(y)
xn <- (x - mean(x)) / sd(x)
c(cor(y, x), cor(yn, xn), coef(lm(yn ~ xn))[2])
```

```
                  xn 
0.4588 0.4588 0.4588
```
Plotting the fit

∙ The size of each point is the frequency of that X, Y combination.
∙ For the red line, the child is the outcome.
∙ For the blue line, the parent is the outcome (accounting for the fact that the response is plotted on the horizontal axis).
∙ The black line assumes $Cor(Y, X) = 1$ (slope is $Sd(Y)/Sd(X)$).
∙ The big black dot is $(\bar X, \bar Y)$.
The code to add the lines

```r
abline(mean(y) - mean(x) * cor(y, x) * sd(y) / sd(x),
       sd(y) / sd(x) * cor(y, x),
       lwd = 3, col = "red")
abline(mean(y) - mean(x) * sd(y) / sd(x) / cor(y, x),
       sd(y) / sd(x) / cor(y, x),
       lwd = 3, col = "blue")
abline(mean(y) - mean(x) * sd(y) / sd(x),
       sd(y) / sd(x),
       lwd = 2)
points(mean(x), mean(y), cex = 2, pch = 19)
```