Math 4030 – 13a
Correlation & Regression
Correlation (Sec. 11.6): Two random variables, X and Y, both
continuous numerical;
Correlation exists when the value of one variable goes "consistently" up or down as the other variable changes.
Correlation coefficient: r ∈ [-1, 1]
\[
r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)
\]
Calculation:
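\[
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2, \\
S_{yy} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} y_i \right)^2, \\
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)
\end{aligned}
\]
\[
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
\]
or, using a table of column sums:

x        y        x^2        y^2        xy
x_1      y_1      x_1^2      y_1^2      x_1 y_1
…        …        …          …          …
x_n      y_n      x_n^2      y_n^2      x_n y_n
Σx_i     Σy_i     Σx_i^2     Σy_i^2     Σx_i y_i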
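As a quick check that the definition of r and the sums-based shortcut agree, here is a minimal Python sketch. The data values and the function names `r_from_definition` and `r_from_sums` are made up for illustration and are not from the lecture.

```python
import math
from statistics import mean, stdev

def r_from_definition(xs, ys):
    """r as the sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

def r_from_sums(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy), computed from the five column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, chosen only so the sums are easy to verify by hand:
# S_xx = 10, S_yy = 6, S_xy = 6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_def = r_from_definition(xs, ys)
r_sum = r_from_sums(xs, ys)
print(round(r_def, 4), round(r_sum, 4))
```

Both routes should give the same value up to floating-point rounding; the sums-based form is the one that matches the worksheet table.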
Meaning of r values:
[Figure: four example plots, each labeled with its correlation coefficient r and slope b]
r = 0.5, b = 0.8
r = -0.9, b = -1.4
r = -0.95, b = -0.08
r = 0.01, b = 0.9
r vs. b:
\[
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \qquad
b = \frac{S_{xy}}{S_{xx}}, \qquad
r = b \sqrt{\frac{S_{xx}}{S_{yy}}}
\]
• r and b have the same sign;
• b is the slope of the linear relationship;
• r is the strength of the linear relationship;
• r ∈ [-1, 1], b ∈ (-∞, +∞).
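The relationship between r and b can be illustrated with a short Python sketch. The data and the helper names `sums_of_squares` and `slope_and_r` are hypothetical. Rescaling y by 100 changes the slope b but leaves r unchanged, which is one way to see that b is unbounded while r stays in [-1, 1].

```python
import math

def sums_of_squares(xs, ys):
    """Return (S_xx, S_yy, S_xy) from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy

def slope_and_r(xs, ys):
    """Return the slope b = S_xy / S_xx and r = S_xy / sqrt(S_xx * S_yy)."""
    s_xx, s_yy, s_xy = sums_of_squares(xs, ys)
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, r = slope_and_r(xs, ys)
s_xx, s_yy, _ = sums_of_squares(xs, ys)
r_via_b = b * math.sqrt(s_xx / s_yy)  # r = b * sqrt(S_xx / S_yy)

# Change the units of y: the slope scales by 100, r does not change.
ys_scaled = [100 * y for y in ys]
b_scaled, r_scaled = slope_and_r(xs, ys_scaled)

print(b, b_scaled, round(r, 4), round(r_scaled, 4))
```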
Correlation Coefficient and the Efficiency of the
(Linear) Regression Model
Decomposition of Variability
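\[
S_{yy} = \left( S_{yy} - \frac{S_{xy}^2}{S_{xx}} \right) + \frac{S_{xy}^2}{S_{xx}}
\]
\[
\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar{y})^2 &= \sum_{i=1}^{n} (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
\end{aligned}
\]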
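Coefficient of determination:
\[
S_{yy} = \left( S_{yy} - \frac{S_{xy}^2}{S_{xx}} \right) + \frac{S_{xy}^2}{S_{xx}}
\]
Proportion of total variability explained by the linear regression:
\[
\frac{S_{xy}^2 / S_{xx}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2
\]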
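The decomposition can be verified numerically. The sketch below fits the least-squares line (slope b = S_xy / S_xx, intercept ȳ - b·x̄) to hypothetical data and checks that the total variability splits into a residual part and a regression part, with the regression share equal to r². The function name `decompose` and the data are illustrative assumptions.

```python
def decompose(xs, ys):
    """Return (S_yy, residual sum of squares, regression sum of squares, r^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = s_xy / s_xx          # least-squares slope
    a = y_bar - b * x_bar    # least-squares intercept
    fitted = [a + b * x for x in xs]

    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # sum of (y_i - yhat_i)^2
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # sum of (yhat_i - ybar)^2
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return s_yy, sse, ssr, r_squared

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

total, sse, ssr, r_squared = decompose(xs, ys)
print(round(total, 6), round(sse, 6), round(ssr, 6), round(r_squared, 6))
```

For this data the total is 6, split into 2.4 (unexplained) and 3.6 (explained), so the explained proportion is 0.6.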
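Regression Statistics
Multiple R           0.95148137    (Correlation Coefficient)
R Square             0.9053168     (Coefficient of Determination)
Adjusted R Square    0.8934814
Standard Error       0.15905212
Observations         10

\[
R_a^2 = 1 - \frac{n-1}{n-k-1} \left( 1 - R^2 \right)
\]
where n is the number of observations and k is the number of predictor variables.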
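The regression output above can be cross-checked with a few lines of Python. This sketch assumes k = 1 (a single predictor), which the output does not state explicitly but which reproduces the reported Adjusted R Square. The function name `adjusted_r_squared` is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Values taken from the regression output above.
multiple_r = 0.95148137
r_squared = 0.9053168
n = 10
k = 1  # assumption: one predictor variable (simple linear regression)

r_squared_from_r = multiple_r ** 2               # should reproduce R Square
adj = adjusted_r_squared(r_squared, n, k)        # should reproduce Adjusted R Square

print(round(r_squared_from_r, 7), round(adj, 7))
```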
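Testing the population correlation coefficient ρ (normal population):
\[
\rho = \frac{E\left[ (X - \mu_X)(Y - \mu_Y) \right]}{\sigma_X \sigma_Y}
\]
Distribution of sample statistic r?
Fisher Z transformation:
\[
\text{Fisher-}Z = \frac{1}{2} \ln \frac{1+r}{1-r}
\]
r ∈ (-1, 1) → Fisher-Z ∈ (-∞, ∞)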
If joint distribution of (X,Y) is approximately bivariate normal, then
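\[
\text{Fisher-}Z \sim N\left( \frac{1}{2} \ln \frac{1+\rho}{1-\rho},\ \frac{1}{n-3} \right)
\]
approximately, where the second parameter is the variance.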
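Test statistic for H0: ρ = ρ0
\[
Z = \frac{\text{Fisher-}Z - \frac{1}{2} \ln \frac{1+\rho_0}{1-\rho_0}}{1 / \sqrt{n-3}}
  = \frac{\sqrt{n-3}}{2} \ln \frac{(1+r)(1-\rho_0)}{(1-r)(1+\rho_0)}
\]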
Test statistic for H0: ρ = 0
\[
Z = \frac{\sqrt{n-3}}{2} \ln \frac{1+r}{1-r}
\]
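A minimal Python sketch of the Fisher Z test follows. The sample values r = 0.6 and n = 28 are hypothetical, and the two-sided P-value is an assumption, since the alternative hypothesis is not stated here. The names `fisher_z` and `correlation_z_test` are illustrative.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher Z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def correlation_z_test(r, n, rho0=0.0):
    """Return (Z, two-sided P-value) for H0: rho = rho0.

    Assumes (X, Y) is approximately bivariate normal, so that Fisher-Z is
    approximately normal with variance 1 / (n - 3).
    """
    z = math.sqrt(n - 3) * (fisher_z(r) - fisher_z(rho0))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided: an assumption
    return z, p_value

# Hypothetical sample: r = 0.6 from n = 28 pairs, testing H0: rho = 0.
z, p_value = correlation_z_test(0.6, 28)
print(round(z, 4), p_value)

# Same sample, testing H0: rho = 0.5 instead.
z_half, p_half = correlation_z_test(0.6, 28, rho0=0.5)
print(round(z_half, 4), round(p_half, 4))
```

With ρ0 = 0 the statistic reduces to the simpler form, since the Fisher Z transformation of 0 is 0.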
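Confidence interval for ρ:
Confidence interval for Fisher-Z score:
\[
\left( \text{Fisher-}Z \text{ score from sample data} \right) \pm \frac{z_{\alpha/2}}{\sqrt{n-3}}
\]
since
\[
\text{Fisher-}Z \sim N\left( \frac{1}{2} \ln \frac{1+\rho}{1-\rho},\ \frac{1}{n-3} \right)
\]
Solve for the two boundary values of ρ using the relationship
\[
z_f = \frac{1}{2} \ln \frac{1+\rho}{1-\rho}
\quad \Longleftrightarrow \quad
\rho = \frac{e^{z_f} - e^{-z_f}}{e^{z_f} + e^{-z_f}}
\]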
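The two steps (interval for the Fisher-Z score, then back-transformation to ρ) can be sketched in Python. The inverse relationship is the hyperbolic tangent, so `math.atanh` and `math.tanh` carry out the transformation and its inverse. The sample values r = 0.6, n = 28 and the function name `correlation_ci` are hypothetical.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, confidence=0.95):
    """Confidence interval for rho via the Fisher Z transformation.

    Assumes (X, Y) is approximately bivariate normal.
    """
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z_crit / math.sqrt(n - 3)

    z_r = math.atanh(r)  # Fisher-Z score: 0.5 * ln((1 + r) / (1 - r))
    z_low, z_high = z_r - half_width, z_r + half_width

    # Back-transform each boundary: rho = (e^z - e^-z) / (e^z + e^-z) = tanh(z)
    return math.tanh(z_low), math.tanh(z_high)

# Hypothetical sample: r = 0.6 from n = 28 pairs.
low, high = correlation_ci(0.6, 28)
print(round(low, 3), round(high, 3))
```

The resulting interval is not symmetric about r, because the back-transformation is nonlinear.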
Strength vs. significance of the correlation:
• The significance, given by the P-value, depends on the statistical evidence. When the P-value is small, the correlation exists, regardless of its strength.
• The strength, given by the r value, is meaningful only if it is supported by statistical significance.
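The distinction can be seen with two hypothetical samples, using the Fisher Z test of H0: ρ = 0 with a two-sided P-value (the two-sided alternative and all numbers here are assumptions for illustration). A weak correlation from a large sample can be highly significant, while a strong-looking correlation from a very small sample may not be.

```python
import math
from statistics import NormalDist

def p_value_rho_zero(r, n):
    """Two-sided P-value for H0: rho = 0 using the Fisher Z statistic."""
    z = math.sqrt(n - 3) * math.atanh(r)  # atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical case 1: weak correlation, large sample.
p_weak_large = p_value_rho_zero(0.1, 2000)

# Hypothetical case 2: strong-looking correlation, tiny sample.
p_strong_small = p_value_rho_zero(0.7, 6)

print(p_weak_large, p_strong_small)
```

In the first case the evidence that a (weak) correlation exists is strong; in the second, r = 0.7 is not supported by the data at the usual 0.05 level.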