Math 4030 – 13a Correlation & Regression

Uploaded by clifford-may on 20-Jan-2016


Page 1: Math 4030 – 13a Correlation & Regression. Correlation (Sec. 11.6):  Two random variables, X and Y, both continuous numerical;  Correlation exists when

Math 4030 – 13a

Correlation & Regression

Page 2:

Correlation (Sec. 11.6): two random variables, X and Y, both continuous numerical.

Correlation exists when the value of one variable consistently goes up or down as the other variable changes.

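Correlation coefficient: $r \in [-1, 1]$,

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

where $s_x$ and $s_y$ are the sample standard deviations of x and y.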
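The defining formula translates directly into code. This is a minimal sketch, assuming $s_x$ and $s_y$ use the divisor $n-1$; the function name `correlation` and the data values are hypothetical, for illustration only.

```python
import math


def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations, divisor n - 1
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum of products of the standardized values (z-scores)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(correlation(xs, ys), 4))
```

Points lying exactly on an increasing line give r = 1 and on a decreasing line r = -1; constant data (zero standard deviation) raises `ZeroDivisionError`, since r is undefined there.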

Page 3:

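Calculation:

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$$

where

$$S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2,$$

$$S_{yy} = \sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n}y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}y_i\right)^2,$$

$$S_{xy} = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \sum_{i=1}^{n}x_iy_i - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)\left(\sum_{i=1}^{n}y_i\right).$$

Or compute the sums from a table:

x      y      x²      y²      xy
x1     y1     x1²     y1²     x1y1
…      …      …       …       …
xn     yn     xn²     yn²     xnyn
Σxi    Σyi    Σxi²    Σyi²    Σxiyi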
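The table method needs only the five column totals. Below is a sketch of that route, with hypothetical helper names `column_totals` and `r_from_totals` and made-up data; it plugs the totals into the shortcut forms of $S_{xx}$, $S_{yy}$ and $S_{xy}$.

```python
import math


def column_totals(xs, ys):
    """Totals of the x, y, x^2, y^2 and xy columns of the hand-calculation table."""
    return (
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
        sum(x * y for x, y in zip(xs, ys)),
    )


def r_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    # Shortcut forms of S_xx, S_yy and S_xy built from the column totals
    S_xx = sum_x2 - sum_x ** 2 / n
    S_yy = sum_y2 - sum_y ** 2 / n
    S_xy = sum_xy - sum_x * sum_y / n
    return S_xy / math.sqrt(S_xx * S_yy)


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(r_from_totals(len(xs), *column_totals(xs, ys)), 4))
```

The shortcut forms are convenient by hand, but in floating point they subtract two large, nearly equal numbers when the data have a large mean and small spread, so the deviation forms are the safer choice in software.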

Page 4:

Meaning of r values (four example scatter plots, each labeled with its sample correlation r and fitted slope b):

r = 0.5, b = 0.8
r = -0.9, b = -1.4
r = -0.95, b = -0.08
r = 0.01, b = 0.9

Page 5:

r vs. b:

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}, \qquad b = \frac{S_{xy}}{S_{xx}}, \qquad r = b\sqrt{\frac{S_{xx}}{S_{yy}}}$$

• r and b have the same sign;
• b is the slope of the linear relationship;
• r is the strength of the linear relationship;
• r ∈ [-1, 1], b ∈ (-∞, +∞).
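A small numerical check of these relations, using hypothetical data chosen to mimic the r = -0.95, b = -0.08 panel. The points follow a tight line (|r| near 1) whose slope is nearly flat (b near 0), which shows that the size of b says nothing about the strength of the relationship. The helper name `s_quantities` is illustrative.

```python
import math


def s_quantities(xs, ys):
    """Return S_xx, S_yy, S_xy computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    S_xx = sum((x - x_bar) ** 2 for x in xs)
    S_yy = sum((y - y_bar) ** 2 for y in ys)
    S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return S_xx, S_yy, S_xy


# Hypothetical data: a tight straight-line pattern with a very shallow slope
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.02, 9.95, 9.90, 9.81, 9.78]

S_xx, S_yy, S_xy = s_quantities(xs, ys)
b = S_xy / S_xx                    # slope of the least-squares line
r = S_xy / math.sqrt(S_xx * S_yy)  # strength of the linear relationship

# The identity r = b * sqrt(S_xx / S_yy) holds up to rounding
assert math.isclose(r, b * math.sqrt(S_xx / S_yy))
print(f"b = {b:.3f}, r = {r:.3f}")
```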

Page 6:

Correlation Coefficient and the Efficiency of the (Linear) Regression Model

Page 7:

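Decomposition of Variability

$$S_{yy} = \frac{S_{xy}^2}{S_{xx}} + \left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$

$$\sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$

Total variability = variability explained by the regression line + residual variability, where $\hat{y}_i$ are the fitted values on the least-squares line.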
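The decomposition can be verified numerically by fitting the least-squares line and computing the three sums of squares. This is a sketch with hypothetical data and helper names (`fit_line`, `decompose`); the slope is $b = S_{xy}/S_{xx}$ and the intercept makes the line pass through $(\bar{x}, \bar{y})$.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b = S_xy / S_xx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    S_xx = sum((x - x_bar) ** 2 for x in xs)
    S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = S_xy / S_xx
    return y_bar - b * x_bar, b


def decompose(xs, ys):
    """Return (total, explained, residual) sums of squares."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    y_hat = [a + b * x for x in xs]                              # fitted values
    total = sum((y - y_bar) ** 2 for y in ys)                    # S_yy
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)           # S_xy^2 / S_xx
    residual = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))    # S_yy - S_xy^2 / S_xx
    return total, explained, residual


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.8, 4.1, 5.7, 8.4, 9.6, 12.3]
total, explained, residual = decompose(xs, ys)
assert math.isclose(total, explained + residual)
print(total, explained, residual)
```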

Page 8:

Coefficient of determination:

$$S_{yy} = \frac{S_{xy}^2}{S_{xx}} + \left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$

Proportion of total variability explained by the linear regression:

$$\frac{S_{xy}^2/S_{xx}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx}S_{yy}} = r^2$$
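As a sketch (hypothetical data and helper name `r_squared`), the explained proportion can be computed from the S quantities and compared with the square of the correlation coefficient.

```python
import math


def r_squared(xs, ys):
    """Proportion of S_yy explained by the least-squares line: (S_xy^2 / S_xx) / S_yy."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    S_xx = sum((x - x_bar) ** 2 for x in xs)
    S_yy = sum((y - y_bar) ** 2 for y in ys)
    S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    explained = S_xy ** 2 / S_xx
    return explained / S_yy


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.8, 4.1, 5.7, 8.4, 9.6, 12.3]
print(round(r_squared(xs, ys), 4))
```

Equivalently, $r^2 = 1 - \text{(residual variability)}/S_{yy}$, which is how the test below cross-checks it.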

Page 9:

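Regression Statistics

Multiple R          0.95148137    <- correlation coefficient (for one predictor this is |r|)
R Square            0.9053168     <- coefficient of determination, r²
Adjusted R Square   0.8934814
Standard Error      0.15905212
Observations        10

Adjusted R Square, with n observations and k predictor variables:

$$R_a^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-k-1}$$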
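The numbers in this output can be reproduced from each other. The sketch below assumes a simple linear regression (k = 1 predictor) with the n = 10 observations shown; `adjusted_r_squared` is a hypothetical helper name. Squaring Multiple R should give R Square, and the adjusted formula should give Adjusted R Square, both up to the rounding of the printed values.

```python
def adjusted_r_squared(r_squared, n, k):
    """R_a^2 = 1 - (1 - R^2) (n - 1) / (n - k - 1), with k predictor variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Values copied from the regression output above
multiple_r = 0.95148137
r_squared = 0.9053168
n, k = 10, 1  # assumption: one predictor (simple linear regression)

print(round(multiple_r ** 2, 7))
print(round(adjusted_r_squared(r_squared, n, k), 7))
```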

Page 10:

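Testing about the normal population correlation coefficient ρ:

$$\rho = \frac{E\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X\sigma_Y}$$

Distribution of the sample statistic r?

Fisher Z transformation:

$$\text{Fisher-}Z = \frac{1}{2}\ln\frac{1+r}{1-r}$$

which maps r ∈ (-1, 1) to Fisher-Z ∈ (-∞, ∞).

If the joint distribution of (X, Y) is approximately bivariate normal, then, approximately,

$$\text{Fisher-}Z \sim N\left(\frac{1}{2}\ln\frac{1+\rho}{1-\rho},\ \frac{1}{n-3}\right)$$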
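A sketch of the transformation and its approximate standard deviation $1/\sqrt{n-3}$; the helper names `fisher_z` and `fisher_z_se` are hypothetical. Fisher-Z is the inverse hyperbolic tangent of r, so Python's `math.atanh` returns the same value, and `math.tanh` undoes it.

```python
import math


def fisher_z(r):
    """Fisher Z = 0.5 * ln((1 + r) / (1 - r)), defined for -1 < r < 1."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1 + r) / (1 - r))


def fisher_z_se(n):
    """Approximate standard deviation of Fisher Z, 1 / sqrt(n - 3), under bivariate normality."""
    if n <= 3:
        raise ValueError("need n > 3")
    return 1 / math.sqrt(n - 3)


# Fisher Z agrees with math.atanh and stretches r near +/-1 far out along the real line
for r in (-0.9, 0.0, 0.5, 0.95, 0.999):
    print(f"r = {r:6.3f}  Fisher-Z = {fisher_z(r):8.4f}  atanh = {math.atanh(r):8.4f}")
```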

Page 11:

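Test statistic for H0: ρ = ρ0, where $\text{Fisher-}Z_0 = \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}$:

$$Z = \frac{\text{Fisher-}Z - \text{Fisher-}Z_0}{1/\sqrt{n-3}} = \frac{\sqrt{n-3}}{2}\ln\frac{(1+r)(1-\rho_0)}{(1-r)(1+\rho_0)}$$

Test statistic for H0: ρ = 0:

$$Z = \frac{\sqrt{n-3}}{2}\ln\frac{1+r}{1-r}$$

Under H0, Z is approximately standard normal.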
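A minimal sketch of this test, assuming bivariate normal data and a two-sided alternative; `fisher_z_test` and the sample values are hypothetical. The two-sided P-value $2(1-\Phi(|Z|))$ is written with the complementary error function, since $1-\Phi(z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$.

```python
import math


def fisher_z_test(r, n, rho0=0.0):
    """Z statistic for H0: rho = rho0 and its two-sided normal-approximation P-value."""
    if n <= 3:
        raise ValueError("need n > 3")
    # atanh(t) = 0.5 * ln((1 + t) / (1 - t)), i.e. the Fisher Z transformation
    z = math.sqrt(n - 3) * (math.atanh(r) - math.atanh(rho0))
    # Two-sided P-value 2 * (1 - Phi(|z|)) via the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical sample: r = 0.6 from n = 20 pairs
print(fisher_z_test(0.6, 20))        # H0: rho = 0
print(fisher_z_test(0.6, 20, 0.3))   # H0: rho = 0.3
```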

Page 12:

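Confidence interval for ρ:

Since

$$\text{Fisher-}Z \sim N\left(\frac{1}{2}\ln\frac{1+\rho}{1-\rho},\ \frac{1}{n-3}\right),$$

the confidence interval for the Fisher-Z score is

$$(\text{Fisher-}Z \text{ score from sample data}) \pm \frac{z_{\alpha/2}}{\sqrt{n-3}}$$

Solve the two boundary values for ρ using the relationship

$$z_f = \frac{1}{2}\ln\frac{1+\rho}{1-\rho} \iff \rho = \frac{e^{z_f}-e^{-z_f}}{e^{z_f}+e^{-z_f}}$$

applied to each boundary $z_f$ of the Fisher-Z interval.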
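The two steps (interval on the Fisher-Z scale, then back-transform each boundary) can be sketched as follows, assuming bivariate normal data; `rho_confidence_interval` and the sample values are hypothetical. The back-transformation $(e^{z}-e^{-z})/(e^{z}+e^{-z})$ is exactly `math.tanh`, and $z_{\alpha/2}$ comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def rho_confidence_interval(r, n, confidence=0.95):
    """Approximate CI for rho via the Fisher Z transformation (bivariate normal data)."""
    if n <= 3:
        raise ValueError("need n > 3")
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    z_f = math.atanh(r)                            # Fisher-Z score of the sample
    half_width = z_crit / math.sqrt(n - 3)
    z_lower, z_upper = z_f - half_width, z_f + half_width
    # Back-transform each boundary: rho = (e^z - e^-z) / (e^z + e^-z) = tanh(z)
    return math.tanh(z_lower), math.tanh(z_upper)


# Hypothetical sample: r = 0.6 from n = 20 pairs
low, high = rho_confidence_interval(0.6, 20)
print(f"95% CI for rho: ({low:.3f}, {high:.3f})")
```

The interval is symmetric on the Fisher-Z scale but not around r itself, because the back-transformation is nonlinear; it also always stays inside (-1, 1).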

Page 13:

Strength vs. significance of the correlation:

• The significance, given by the P-value, depends on the statistical evidence. When the P-value is small, the correlation exists (regardless of its strength).

• The strength, given by the r value, is meaningful only if it is supported by statistical significance.
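To see that strength and significance are separate questions, the sketch below applies the Fisher-Z test for H0: ρ = 0 to two hypothetical cases: a weak correlation from a large sample and a moderate correlation from only a few points. The helper name `p_value_rho_zero` and the (r, n) pairs are illustrative, and bivariate normality is assumed.

```python
import math


def p_value_rho_zero(r, n):
    """Two-sided P-value for H0: rho = 0 using the Fisher Z statistic."""
    z = math.sqrt(n - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical cases, for illustration only
weak_large_sample = p_value_rho_zero(0.1, 1000)   # weak r, many observations
strong_small_sample = p_value_rho_zero(0.5, 8)    # moderate r, few observations
print(f"r = 0.1, n = 1000: P = {weak_large_sample:.4f}")
print(f"r = 0.5, n = 8:    P = {strong_small_sample:.4f}")
```

With these inputs the weak correlation is significant at the 5% level while the larger r is not, which is the point of the two bullets.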