Medical Statistics (full English class)
DESCRIPTION
Medical Statistics (full English class). Ji-Qian Fang, School of Public Health, Sun Yat-Sen University. Chapter 12 Linear Correlation and Linear Regression. 12.3 Linear regression.
Medical Statistics (full English class)
Ji-Qian Fang
School of Public Health
Sun Yat-Sen University
Chapter 12 Linear Correlation and Linear Regression
Vocabulary for Chapter 12-2
univariate 单变量
multivariate 多变量
phenomenon 现象
least square 最小二乘
intercept 截距
slope 斜率
regression coefficient 回归系数
population regression coefficient 总体回归系数
sample regression coefficient 样本回归系数
residual 剩余、残差
standard deviation of regression coefficient 回归系数的标准差
standard deviation of residual 剩余标准差
sum of squared residuals 残差平方和
coefficient of determination 决定系数、确定系数
subject matter 专业
12.3 Linear regression
Initial meaning of “regression”: Galton noted that if a father is tall, his son will be relatively tall; if a father is short, his son will be relatively short. But if a father is very tall, his son will usually not be taller than his father; if a father is very short, his son will usually not be shorter than his father.
Otherwise, ……?! Galton called this phenomenon “regression to the mean”.
What is regression in statistics?
To find out the track of the means
[Figure: scatter plot of father's height (cm) and son's height (cm); both axes run from 100 to 220]
[Figure: scatter plot of vital capacity (ml, axis from 1700 to 2900) against chest circumference (cm, axis from 55 to 85)]
Given the value of chest circumference (X), the vital capacity (Y) varies around a center ($\mu_{y|x}$).
All the centers lie on a line -- the regression line. The relationship between the center $\mu_{y|x}$ and X is the regression equation:
$$\mu_{y|x} = \alpha + \beta X$$
Linear regression
1. Linear regression equation
$$\mu_{y|x} = \alpha + \beta X$$
Try to estimate $\alpha$ and $\beta$, getting
$$\hat{Y} = a + bX$$
where
a -- estimate of $\alpha$, intercept
b -- estimate of $\beta$, slope
$\hat{Y}$ -- estimate of $\mu_{y|x}$
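Least squares method
To find suitable a and b such that
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \text{minimum}$$
By calculus,
$$b = \frac{l_{XY}}{l_{XX}}, \qquad a = \bar{Y} - b\bar{X}$$
where
$$l_{XX} = \sum X^2 - \frac{1}{n}\left(\sum X\right)^2, \qquad l_{XY} = \sum XY - \frac{1}{n}\left(\sum X\right)\left(\sum Y\right)$$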
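The least squares formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `fit_line` and the four data points are hypothetical, chosen so that the points lie exactly on Y = 1 + 2X and the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y-hat = a + b*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    # l_XX = sum(X^2) - (sum X)^2 / n
    l_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    # l_XY = sum(XY) - (sum X)(sum Y) / n
    l_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    if l_xx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b = l_xy / l_xx                  # slope
    a = sum_y / n - b * sum_x / n    # intercept: Y-bar - b * X-bar
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Because the points fall on a straight line, the fitted intercept and slope should reproduce 1 and 2 up to floating-point rounding.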
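$$l_{XX} = \sum X^2 - \frac{1}{n}\left(\sum X\right)^2 = 71858 - \frac{1}{15}(1036)^2 = 304.9333$$
$$l_{XY} = \sum XY - \frac{1}{n}\left(\sum X\right)\left(\sum Y\right) = 2441450 - \frac{1}{15}(1036)(35150) = 13756.667$$
Slope b
$$b = \frac{l_{XY}}{l_{XX}} = \frac{13756.667}{304.9333} = 45.1137$$
Intercept a
$$\bar{X} = 69.0667, \quad \bar{Y} = 2343.3333$$
$$a = \bar{Y} - b\bar{X} = 2343.3333 - 45.1137 \times 69.0667 = -772.5211$$
Regression Equation
$$\hat{Y} = -772.5211 + 45.1137X$$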
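The arithmetic of this worked example can be re-run from the summary sums shown on the slide (n = 15, ΣX = 1036, ΣY = 35150, ΣX² = 71858, ΣXY = 2441450). The sketch below is only a check of those numbers; the variable names are illustrative.

```python
# Summary sums taken from the slide
n = 15
sum_x, sum_y = 1036, 35150
sum_x2, sum_xy = 71858, 2441450

l_xx = sum_x2 - sum_x ** 2 / n        # corrected sum of squares of X
l_xy = sum_xy - sum_x * sum_y / n     # corrected sum of products of X and Y

b = l_xy / l_xx                       # slope
a = sum_y / n - b * (sum_x / n)       # intercept: Y-bar - b * X-bar

print(f"l_XX = {l_xx:.4f}, l_XY = {l_xy:.3f}")
print(f"b = {b:.4f}, a = {a:.4f}")
```

The intercept computed this way may differ from the slide's value in the later decimals, because the slide appears to round b to 45.1137 before computing a.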
2. t test for regression coefficient
b is the sample regression coefficient; it changes from sample to sample.
There is a population regression coefficient, denoted by $\beta$.
Question: is $\beta = 0$ or not?
$H_0$: $\beta = 0$, $H_1$: $\beta \neq 0$
$\alpha = 0.05$
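Statistic
$$t = \frac{b - 0}{s_b}, \qquad \nu = n - 2$$
Standard deviation of regression coefficient
$$s_b = \frac{s_{y \cdot x}}{\sqrt{\sum (X - \bar{X})^2}}$$
Standard deviation of residual
$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$
Sum of squared residuals
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = l_{YY} - b\,l_{XY}$$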
Back to Example 12-3
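$$l_{YY} = 1199333.33, \quad l_{XY} = 13756.667, \quad b = 45.1137$$
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = l_{YY} - b\,l_{XY} = 1199333.33 - 45.1137 \times 13756.667 = 578719.291$$
$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{578719.291}{15 - 2}} = 210.99$$
$$s_b = \frac{s_{y \cdot x}}{\sqrt{\sum (X - \bar{X})^2}} = \frac{210.99}{\sqrt{304.9333}} = 12.0826$$
$$t = \frac{b - 0}{s_b} = \frac{45.1137 - 0}{12.0826} = 3.7338, \qquad \nu = 15 - 2 = 13$$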
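The t test for Example 12-3 can be retraced step by step in Python. This is a minimal sketch that starts from the quantities quoted on the slides (n = 15, l_XX = 304.9333, l_XY = 13756.667, l_YY = 1199333.33); the variable names are illustrative, and small differences from the slide values are expected because of rounding.

```python
import math

# Quantities quoted on the slides for Example 12-3
n = 15
l_xx, l_xy, l_yy = 304.9333, 13756.667, 1199333.33

b = l_xy / l_xx                          # sample regression coefficient
ss_residual = l_yy - b * l_xy            # sum of squared residuals
s_yx = math.sqrt(ss_residual / (n - 2))  # standard deviation of residual
s_b = s_yx / math.sqrt(l_xx)             # standard deviation of b
t = (b - 0) / s_b                        # test statistic under H0: beta = 0
df = n - 2                               # degrees of freedom

print(f"SS residual = {ss_residual:.3f}")
print(f"s_y.x = {s_yx:.2f}, s_b = {s_b:.4f}")
print(f"t = {t:.4f}, df = {df}")
```

The resulting t is then compared with the critical value of the t distribution with `df` degrees of freedom at the chosen significance level.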
3. Application of regression
1) To describe how the value of Y depends on X.
2) To estimate or predict the value of Y from a known value of X -- based on the regression of Y on X.
3) To control the value of X through a known value of Y:
 -- If X is not a random variable, based on the regression of Y on X.
 -- If X is also a random variable, based on the regression of X on Y.
12.4 The relationship between Regression and Correlation
1. Distinction and connection
Distinction:
Correlation: Both X and Y are random
Regression: Y is random
 X is not random -- Type I regression
 X is also random -- Type II regression
Connection: When both X and Y are random
1) The correlation coefficient and the regression coefficient have the same sign
2) The t tests are equivalent
$t_r = t_b$
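The two connections listed above can be checked numerically. The sketch below uses a small hypothetical data set. The formulas for r and for its t statistic, $t_r = r\sqrt{(n-2)/(1-r^2)}$, are not shown in this section and are assumed here in their standard form; $t_b$ follows the definitions given for the regression coefficient.

```python
import math


def sums_of_products(xs, ys):
    """Return (l_XX, l_XY, l_YY), the corrected sums of squares and products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    l_xx = sum((x - mean_x) ** 2 for x in xs)
    l_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    l_yy = sum((y - mean_y) ** 2 for y in ys)
    return l_xx, l_xy, l_yy


# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 4, 8, 7]
n = len(xs)
l_xx, l_xy, l_yy = sums_of_products(xs, ys)

# Correlation coefficient and its t statistic (standard formulas, assumed)
r = l_xy / math.sqrt(l_xx * l_yy)
t_r = r * math.sqrt((n - 2) / (1 - r ** 2))

# Regression coefficient and its t statistic, as defined in this chapter
b = l_xy / l_xx
s_yx = math.sqrt((l_yy - b * l_xy) / (n - 2))
s_b = s_yx / math.sqrt(l_xx)
t_b = b / s_b

print(f"r = {r:.4f}, b = {b:.4f}")
print(f"t_r = {t_r:.4f}, t_b = {t_b:.4f}")
```

Both statistics share the numerator $l_{XY}$ and differ only in how the same residual variation is written, which is why they agree up to rounding.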
3) Coefficient of determination
Without regression, given the value of $X_i$ we can only predict $\hat{Y}_i = \bar{Y}$; the sum of squared residuals is
$$SS_{\text{Total}} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
After regression, given the value of $X_i$ we can predict $\hat{Y}_i = a + bX_i$; the sum of squared residuals is
$$SS_{\text{Residual}} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Contribution of regression
$$SS_{\text{Regression}} = SS_{\text{Total}} - SS_{\text{Residual}}$$
$$\text{Coefficient of determination} = \frac{SS_{\text{Regression}}}{SS_{\text{Total}}}$$
It can be proved that
$$\text{Coefficient of determination} = r^2$$
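The identity between the coefficient of determination and r² can be illustrated numerically. The sketch below uses a small hypothetical data set and computes the residual sum of squares directly from the fitted values, so the comparison with r² is a genuine check and not a restatement. The formula $r = l_{XY}/\sqrt{l_{XX}\,l_{YY}}$ is assumed in its standard form, since it is not shown in this section.

```python
import math

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 4, 8, 7]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

l_xx = sum((x - mean_x) ** 2 for x in xs)
l_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
l_yy = sum((y - mean_y) ** 2 for y in ys)

# Least squares line Y-hat = a + b*X
b = l_xy / l_xx
a = mean_y - b * mean_x

# Sums of squares before and after regression
ss_total = sum((y - mean_y) ** 2 for y in ys)
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_regression = ss_total - ss_residual
determination = ss_regression / ss_total

# Correlation coefficient (standard formula, assumed)
r = l_xy / math.sqrt(l_xx * l_yy)

print(f"SS total = {ss_total:.2f}, SS residual = {ss_residual:.2f}")
print(f"coefficient of determination = {determination:.4f}, r^2 = {r ** 2:.4f}")
```

For this data set roughly 74% of the total variation in Y is attributed to the regression on X.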
2. Caution for regression and correlation
1) Don't put just any two variables together for correlation and regression -- they must have some relation in subject matter;
2) Correlation does not necessarily mean causality -- sometimes the relation may be indirect, or there may be no real relation at all;
3) A big value of r does not necessarily mean a big regression coefficient b;
4) Rejecting $H_0$: ρ = 0 does not necessarily mean the correlation is strong -- it only shows that ρ ≠ 0;
5) A scatter diagram is useful before working with linear correlation and linear regression;
6) The regression equation should not be applied beyond the range of the data set.