Chapter 10: Linear regression and correlation. Relationship between variables

Post on 19-Dec-2015


Page 1: Chapter 10. Linear regression and correlation (relationship between variables)

Page 2: Relationship between variables

Age and blood pressure

Nutrient level and growth of cells

Height and weight

To determine the strength of the relationship between two variables and to test whether it is statistically significant

Page 3: Two-sample t test vs regression

group 1   group 2
22        18
21        15
27        13
34        20
30        32

Observe   group
22        1
21        1
27        1
34        1
30        1
18        2
15        2
13        2
20        2
32        2

Page 4: Difference, variation and association analysis

Relationship                  Variable Y    Variable X
Two sample t test             quantity      Group (0, 1), category
One way ANOVA                 quantity      Group (A, B, C), category
Regression and correlation    quantity      quantity

Page 5: Sir Francis Galton (16 February 1822 – 17 January 1911)

Polymath: meteorology (the anticyclone and the first popular weather maps); psychology (synaesthesia); biology (the nature and mechanism of heredity); eugenics; criminology (fingerprints); statistics (regression and correlation).


Page 8: Related but different

Regression analysis:

one of the variables (e.g. blood pressure) is dependent on (caused by) the other, which is fixed and measured without error (e.g. age).

Correlation analysis:

both variables are experimental and measured with error (e.g. height and weight).

Page 9: Regression analysis

The experimental data (repeated experiments)

Recording number   X: temperature (°C)   Y: heart rate (beats/minute)
1                  2                     5
2                  4                     11
3                  6                     11
4                  8                     14
5                  10                    22
6                  12                    23
7                  14                    32
8                  16                    29
9                  18                    32

Page 10: Correlation analysis

The experimental data (more individuals measured)

Animal   X: length (cm)   Y: width (cm)
1        10.7             5.8
2        11.0             6.0
3        9.5              5.0
4        11.1             6.0
5        10.3             5.3
6        10.7             5.8
7        9.9              5.2
8        10.6             5.7
9        10.0             5.3
10       12.0             6.3

Page 11: Regression analysis

Equation for a straight line:

$Y = a + bX$

If you know a and b, you can predict Y from X: the goal of regression.

Page 12: Regression vs correlation

Page 13: Concepts

Simple linear regression

Simple linear correlation

Correlation analysis based on ranks

Page 14: Example

Consider the growth rate of a yeast colony and the nutrient level.

If you increase the nutrient level, the growth rate increases.

Growth rate is dependent on nutrient level, but nutrient level is NOT dependent on growth rate.

Page 15: Variables in regression

Growth rate is called the dependent variable and is given the symbol Y.

Nutrient level (the causal factor) is called the independent variable and is given the symbol X.

[Figure: growth rate (Y) plotted against nutrient level (X)]

Page 16: Simple linear model assumptions

a) X's are fixed and measured without error

b) $E(Y|X) = \mu_{Y|X} = \alpha + \beta X$

c) $Y_i = \alpha + \beta X_i + \varepsilon_i$, or $Y_i = \mu_{Y|X_i} + \varepsilon_i$, and $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$

d) Homoscedastic

$\alpha, \beta$: constant real numbers, $\beta \neq 0$

i.i.d.: independent and identically distributed

Page 18: General steps for simple linear regression analysis

① Graphing the data

② Fitting the best straight line

③ Testing whether the linear relationship is statistically significant or not

Page 19: ① Graphing the data, ② Fitting the best straight line

[Figure: four scatterplot panels: no relationship; relationship but not straight-lined; negative linear relationship; positive linear relationship]

Which line? A criterion is needed.

Page 20: The best fit

Example: area of a yeast colony on successive days.

[Figure: area (y) against time in days (x) with a fitted straight line. The slope is $b = H/L$ (rise H over run L) and a is the intercept (at x = 0).]

Page 21: Problem

[Figure: area (y) against time in days (x)]

How to estimate a and b?

Page 22: Method

[Figure: data points $(X_i, Y_i)$ with the horizontal line $\hat{Y} = \bar{Y}$]

Fitting $\hat{Y} = \bar{Y}$ to the data, with $E(Y) = \mu_Y$.

Total sum of squares for Y: $SS_{Total} = \sum (Y_i - \bar{Y})^2$

Page 23: Method

[Figure: area (y) against time in days (x), data points $(X_i, Y_i)$ with the line $\hat{Y} = a + bX$]

Fitting $\hat{Y} = a + bX$ to the data.

Residual error sum of squares: $SS_E = \sum_i (Y_i - \hat{Y}_i)^2$

a and b should minimize the residual error.

Page 24: Method

$$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$

Page 25

$$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$

$$\sum (Y_i - \bar{Y})^2 = \sum [(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)]^2 = \sum (\hat{Y}_i - \bar{Y})^2 + 2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) + \sum (Y_i - \hat{Y}_i)^2$$

The cross-product term $2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$ equals 0, so

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

Total sum of squares ($SS_{Total}$) = sum of squares due to regression ($SS_R$) + residual or error sum of squares ($SS_E$)

Maximize $SS_R$; minimize $SS_E$.

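Page 26: Least squares regression equation

• Minimize $SS_{Error}$ by partial derivatives

$$\hat{Y}_i = a' + b(X_i - \bar{X}); \qquad Y_i = a' + b(X_i - \bar{X}) + \varepsilon_i$$

$$SS_{Error} = \sum (Y_i - \hat{Y}_i)^2 = \sum \{Y_i - [a' + b(X_i - \bar{X})]\}^2$$

$$\frac{\partial SS_{Error}}{\partial a'} = 0 \Rightarrow -2\sum \{Y_i - [a' + b(X_i - \bar{X})]\} = 0$$

$$\sum Y_i - \sum a' - b\sum (X_i - \bar{X}) = 0, \qquad \text{where } \sum (X_i - \bar{X}) = 0$$

$$\sum Y_i - na' = 0$$

$$a' = \frac{\sum Y_i}{n} = \bar{Y}$$

Page 27: Least squares regression equation

$$\frac{\partial SS_{Error}}{\partial b} = 0 \Rightarrow 2\sum \{Y_i - [a' + b(X_i - \bar{X})]\}[-(X_i - \bar{X})] = 0$$

$$\sum Y_i(X_i - \bar{X}) - a'\sum (X_i - \bar{X}) - b\sum (X_i - \bar{X})^2 = 0, \qquad \text{where } a'\sum (X_i - \bar{X}) = 0$$

$$\sum Y_i(X_i - \bar{X}) = b\sum (X_i - \bar{X})^2$$

$$b = \frac{\sum Y_i(X_i - \bar{X})}{\sum (X_i - \bar{X})^2} = \frac{\sum X_iY_i - (\sum X_i)(\sum Y_i)/n}{\sum X_i^2 - (\sum X_i)^2/n} = \frac{SS_{XY}}{SS_X}$$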

Page 28: Result

Least squares regression line

$$b = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{SS_{XY}}{SS_X}$$

$$a = \bar{Y} - b\bar{X}, \qquad \hat{Y} = \hat{\mu}_{Y|X} = a + bX$$

$$a' = \bar{Y}, \qquad \hat{Y} = \hat{\mu}_{Y|X} = a' + b(X - \bar{X})$$
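The least squares result can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function name `least_squares_fit` is hypothetical, and the data are the temperature and heart-rate recordings from the regression-analysis table on page 9.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the fitted line Y = a + b*X using b = SS_XY / SS_X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Corrected sum of squares of X and corrected cross products
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = ss_xy / ss_x                 # slope
    a = sum_y / n - b * sum_x / n    # intercept: a = mean(Y) - b * mean(X)
    return a, b


# Temperature (X, degrees Celsius) and heart rate (Y, beats/minute) from page 9
temperature = [2, 4, 6, 8, 10, 12, 14, 16, 18]
heart_rate = [5, 11, 11, 14, 22, 23, 32, 29, 32]

a, b = least_squares_fit(temperature, heart_rate)
print(f"Y = {a:.3f} + {b:.3f} X")
```

The centered intercept $a'$ is simply the mean of Y, so it needs no separate computation.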

Page 29: ③ Simple linear regression analysis

A global test for regression (ANOVA)

A test for the regression coefficient (Student t test)

Page 30: Hypothesis

$H_0$: The variation in Y is not explained by a linear model, i.e., $\beta = 0$

$H_a$: A significant portion of the variation in Y is explained by a linear model, i.e., $\beta \neq 0$

Page 31: Partitioning the sum of squares

$$SS_R = \sum (\hat{Y}_i - \bar{Y})^2 = \sum (\bar{Y} + b(X_i - \bar{X}) - \bar{Y})^2 = \sum b^2(X_i - \bar{X})^2 = b^2\sum (X_i - \bar{X})^2 = b^2 SS_X$$

With $b = \frac{SS_{XY}}{SS_X}$:

$$SS_R = \left(\frac{SS_{XY}}{SS_X}\right)^2 SS_X = \frac{(SS_{XY})^2}{SS_X} = b\,SS_{XY}$$

$$SS_{Total} = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n}$$

$$SS_E = \sum (Y_i - \hat{Y}_i)^2 = SS_{Total} - SS_R$$

Page 32: The ANOVA table for a regression analysis

Source of variation   SS           DF      MS       E(MS)                          F                c.v.
Regression            SS_R         1       MS_R     $\sigma_Y^2 + \beta^2 SS_X$    $MS_R / MS_E$    See Table C.7
Error                 SS_E         n-2     MS_E     $\sigma_Y^2$
Total                 SS_Total     n-1

$$E(F) \approx \frac{E(MS_R)}{E(MS_E)} = \frac{\sigma_Y^2 + \beta^2 SS_X}{\sigma_Y^2}$$

If $H_0$ is true, $\beta = 0$: $E(F) \approx \frac{\sigma_Y^2}{\sigma_Y^2} = 1$

If $H_a$ is true, $\beta \neq 0$: $E(F) \approx \frac{\sigma_Y^2 + \beta^2 SS_X}{\sigma_Y^2} > 1$

Test statistic: $F = \frac{MS_R}{MS_E} \sim F(1, n-2)$

Page 33: Coefficient of determination

A measure of the amount of the variability in Y that is explained by its dependence on X.

Coeff. of D. $= \frac{SS_R}{SS_{Total}}$

Page 34: Simple linear regression analysis

A global test for regression (ANOVA)

A test for the regression coefficient (Student t test)

Page 35: Hypothesis

$H_0$: The variation in Y is not explained by a linear model, i.e., $\beta = 0$

$H_a$: A significant portion of the variation in Y is explained by a linear model, i.e., $\beta \neq 0$

Page 36: t test statistic

$$t = \frac{b - \beta}{s_b} \sim t(n-2)$$

Variance of b: $\sigma_b^2 = \frac{1}{SS_X}\sigma_Y^2$

Its estimate: $s_b^2 = \frac{1}{SS_X}\left(\frac{SS_E}{n-2}\right) = \frac{MS_E}{SS_X}$

Standard error of b: $s_b = \sqrt{\frac{MS_E}{SS_X}}$

Page 37: $F(1, n-2) = t^2(n-2)$

$$s_b = \sqrt{\frac{MS_E}{SS_X}}$$

ANOVA: $F = \frac{MS_R}{MS_E} = \frac{b^2 SS_X}{MS_E}$, with df 1 and n-2

Student t: $t = \frac{b}{s_b} = \frac{b\sqrt{SS_X}}{\sqrt{MS_E}}$

Therefore $F = t^2$.

Page 38: Confidence interval for $\beta$

$t = \frac{b - \beta}{s_b}$, $df = n - 2$, follows Student's t distribution.

Confidence interval:

$$P\left(-t_{1-\alpha/2} \le \frac{b - \beta}{s_b} \le t_{1-\alpha/2}\right) = 1 - \alpha$$

$$C\left(b - t_{1-\alpha/2}\,s_b \le \beta \le b + t_{1-\alpha/2}\,s_b\right) = 1 - \alpha, \quad \text{with } df = n - 2$$

$$L_1 = b - t_{1-\alpha/2}\,s_b, \qquad L_2 = b + t_{1-\alpha/2}\,s_b$$

Page 39: Confidence interval for $\mu_{Y|X}$

Since

$$\hat{Y}_i = \hat{\mu}_{Y|X_i} = \bar{Y} + b(X_i - \bar{X})$$

and

$$s_{\bar{Y}}^2 = \frac{MS_E}{n}, \qquad s_b^2 = \frac{MS_E}{SS_X}, \qquad MS_E = \frac{SS_E}{n-2}$$

the squared standard error of $\hat{\mu}_{Y|X}$ (sampling error) is:

$$s_{\hat{Y}_i}^2 = \frac{MS_E}{n} + \frac{MS_E}{SS_X}(X_i - \bar{X})^2 = MS_E\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$$

Page 40: Confidence interval for $\mu_{Y|X}$

$t = \frac{\hat{\mu}_{Y|X} - \mu_{Y|X}}{s_{\hat{Y}}}$, $df = n - 2$, follows Student's t distribution.

Confidence interval:

$$C\left(\hat{\mu}_{Y|X} - t_{1-\alpha/2}\,s_{\hat{Y}} \le \mu_{Y|X} \le \hat{\mu}_{Y|X} + t_{1-\alpha/2}\,s_{\hat{Y}}\right) = 1 - \alpha, \quad \text{with } df = n - 2$$

The lower and upper limits are $L_1$ and $L_2$.

Page 41: Confidence interval for $Y_i$

Since

$$\hat{Y}_i = \bar{Y} + b(X_i - \bar{X})$$

and

$$s_{\bar{Y}}^2 = \frac{MS_E}{n}, \qquad s_b^2 = \frac{MS_E}{SS_X}, \qquad MS_E = \frac{SS_E}{n-2}$$

the squared standard error of an individual $Y_i$ (sampling error) is:

$$s_{Y_i}^2 = MS_E + \frac{MS_E}{n} + \frac{MS_E}{SS_X}(X_i - \bar{X})^2 = MS_E\left[1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$$

Page 42: Understand the regression analysis via example

Page 43: Example 1: Yield of tomato varieties

Variety 1   Variety 2
22          18
21          15
27          13
34          20
30          32
28          27
21          11
29          20
22          16
14          17
Totals: 248 189

Summarized data:

                       Variety 1   Variety 2
Mean                   24.8        18.9
St. Dev.               5.8271      6.3675
Variance               33.96       40.54
Nos. of observations   10          10

Page 44: A. Student's t test

$F = \frac{s_1^2}{s_2^2} = \frac{33.96}{40.54} = 0.84$, $p = 0.796$. Accept $H_0$: there is no difference between the two variances.

$$s_p^2 = \frac{40.54 + 33.96}{2} = 37.3$$

$H_0: \mu_1 = \mu_2$

$$t = \frac{(24.8 - 18.9) - 0}{\sqrt{37.3\left(\frac{1}{10} + \frac{1}{10}\right)}} = \frac{5.9}{2.731} = 2.16, \quad \text{with } df = n_1 + n_2 - 2 = 18$$

$p = 0.045$. Reject $H_0$: there is a difference between the two means.

Page 45: B. ANOVA

$H_0: \mu_1 = \mu_2$

Item      df   SS       MS       F(1,18)   P
Between   1    174.05   174.05   4.67*     <0.05
Within    18   670.5    37.25
Total     19   844.55

Conclusion: Reject $H_0$

Page 46: Compare ANOVA with t test

t was 2.16 for 18 df, 0.05 > P > 0.01

F was 4.67 for 1 and 18 df, 0.05 > P > 0.01

In fact, $F = t^2$ (i.e. $4.67 = 2.16^2$)

Why?

Because with t we are dealing with differences, while with F we are dealing with variances (differences squared).

Page 47: C. Regression

The two-column data

Variety 1   Variety 2
22          18
21          15
27          13
34          20
30          32
28          27
21          11
29          20
22          16
14          17

are recoded as one observation per row, with the variety (1 or 2) as X:

Observe   Variety
22        1
21        1
27        1
34        1
30        1
...       ...
18        2
15        2
13        2
20        2
32        2
...       ...

Page 48: Calculations

$n = 20, \quad \sum X = 30, \quad \sum Y = 437$

$\sum X^2 = 10 \times 1^2 + 10 \times 2^2 = 50, \quad \sum Y^2 = 10393, \quad \sum XY = 626$

$$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 50 - \frac{30^2}{20} = 5$$

$$SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 10393 - \frac{437^2}{20} = 844.55$$

$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 626 - \frac{30 \times 437}{20} = -29.5$$

Page 49: Estimation

$SS_X = 5, \quad SS_Y = 844.55, \quad SS_{XY} = -29.5$

Regression coefficient: $b = \frac{SS_{XY}}{SS_X} = \frac{-29.5}{5} = -5.9$

Intercept at $\bar{X}$: $a' = \bar{Y} = 21.85$

Intercept: $a = \bar{Y} - b\bar{X} = 21.85 + 5.9 \times 1.5 = 30.7$

Regression equation: $\hat{Y} = 30.7 - 5.9X$, or equivalently $\hat{Y} = 21.85 - 5.9(X - 1.5)$

Page 50: Testing the significance (ANOVA)

$H_0$: no linear relation between y and x, $\beta = 0$

$H_a$: the variation in y is linearly explained by the variation in x, i.e., $\beta \neq 0$

With X coded 1 and 2 and $E(Y|X) = \mu_{Y|X} = \alpha + \beta X$:

$$E(Y|X=1) = \mu_1, \qquad E(Y|X=2) = \mu_2, \qquad \beta = \mu_2 - \mu_1$$

so $H_0: \beta = 0$ is the same hypothesis as $H_0: \mu_2 - \mu_1 = 0$.

Page 51: ANOVA SS

$SS_X = 5, \quad SS_Y = 844.55, \quad SS_{XY} = -29.5$

$$SS_{Total} = SS_Y = 844.55$$

$$SS_R = \frac{(SS_{XY})^2}{SS_X} = \frac{(-29.5)^2}{5} = 174.05$$

$$SS_E = SS_{Total} - SS_R = 670.5$$

Page 52: Regression ANOVA

Item         df   SS       MS       F       P
Regression   1    174.05   174.05   4.67*   <0.05
Error        18   670.5    37.25
Total        19   844.55

Conclusion: Reject $H_0$

Page 53: Coefficient of determination

A measure of the amount of the variability in y that is explained by its dependence on x.

Coeff. of D. $= \frac{SS_R}{SS_{Total}} = \frac{174.05}{844.55} = 20.6\%$

Page 54: Test for regression coefficient

• $H_0$: $\beta = 0$

$$s_b = \sqrt{\frac{MS_E}{SS_X}} = \sqrt{\frac{37.25}{5}} = 2.729$$

$$t = \frac{b}{s_b} = \frac{-5.9}{2.729} = -2.16, \qquad |t| = 2.16 \sim t(18)$$
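The equivalence of the three analyses in Example 1 can be checked numerically. The sketch below is an illustration, not the slides' own computation: it uses only the tomato yields from page 43, and the helper name `corrected_ss` is hypothetical. It computes the pooled two-sample t, then the regression slope, F ratio and slope t after coding the varieties as X = 1 and X = 2.

```python
import math

variety1 = [22, 21, 27, 34, 30, 28, 21, 29, 22, 14]
variety2 = [18, 15, 13, 20, 32, 27, 11, 20, 16, 17]


def corrected_ss(values):
    """Corrected sum of squares: sum(v^2) - (sum v)^2 / n."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n


# A. Pooled two-sample t test
n1, n2 = len(variety1), len(variety2)
mean1, mean2 = sum(variety1) / n1, sum(variety2) / n2
pooled_var = (corrected_ss(variety1) + corrected_ss(variety2)) / (n1 + n2 - 2)
t_two_sample = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# C. Regression of yield (Y) on variety coded as X = 1 or 2
xs = [1] * n1 + [2] * n2
ys = variety1 + variety2
n = len(xs)
ss_x = corrected_ss(xs)
ss_y = corrected_ss(ys)
ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b = ss_xy / ss_x                    # slope = difference between the two means
ss_r = ss_xy ** 2 / ss_x            # regression sum of squares
ms_e = (ss_y - ss_r) / (n - 2)      # error mean square
f_ratio = ss_r / ms_e               # ANOVA F with 1 and n-2 df
t_slope = b / math.sqrt(ms_e / ss_x)

print(f"two-sample t = {t_two_sample:.3f}")
print(f"slope b = {b:.2f}, F = {f_ratio:.2f}, slope t = {t_slope:.3f}")
```

The slope t has the opposite sign to the two-sample t only because variety 2 (the lower-yielding one) is coded with the larger X; its square equals F in both cases.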

Page 55: Example 2: Yeast data

Yeast colony grown on agar. Area measured (mm²) on 9 successive days and area transformed to logs.

Time (days)   Area (log mm²)
1             3.6
2             3.8
3             4.2
4             4.5
5             5.0
6             5.2
7             5.5
8             5.6
9             6.1
Σx = 45       Σy = 43.5

Page 56: Exponential

Page 57: Scatterplot of yeast data

[Figure: scatterplot of area (log mm²) against days; apparent positive linear relationship]

Page 58: Nonlinear -> linear

Power law: $Y = aX^b$, so $\log Y = \log a + b\log X$

Exponential: $Y = ab^X$, so $\log Y = \log a + X\log b$

Page 59: Calculations

Time (days)   Area (log mm²)
1             3.6
2             3.8
3             4.2
4             4.5
5             5.0
6             5.2
7             5.5
8             5.6
9             6.1
ΣX = 45       ΣY = 43.5

$n = 9, \quad \sum X = 45, \quad \sum Y = 43.5$

$\sum X^2 = 1^2 + 2^2 + \dots + 9^2 = 285$

$\sum Y^2 = 3.6^2 + 3.8^2 + \dots + 6.1^2 = 216.15$

$\sum XY = 1 \times 3.6 + 2 \times 3.8 + \dots + 9 \times 6.1 = 236.2$

$$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 285 - \frac{45^2}{9} = 60$$

$$SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 216.15 - \frac{43.5^2}{9} = 5.9$$

$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 236.2 - \frac{45 \times 43.5}{9} = 18.7$$

Page 60: Estimation

$SS_X = 60, \quad SS_Y = 5.9, \quad SS_{XY} = 18.7$

Regression coefficient: $b = \frac{SS_{XY}}{SS_X} = \frac{18.7}{60} = 0.3117$

Intercept at $\bar{X}$: $a' = \bar{Y} = 4.83$

Intercept: $a = \bar{Y} - b\bar{X} = 4.83 - 0.3117 \times 5 = 3.27$

Regression equation: $\hat{Y} = 3.27 + 0.3117X$, or equivalently $\hat{Y} = 4.83 + 0.3117(X - 5.0)$

Page 61: To fit the line

Regression equation: $\hat{Y} = a + bX = 3.27 + 0.3117X$

Use two extreme values of x: 0 and 9

When x = 0, y = 3.27

When x = 9, y = 3.27 + 0.3117 × 9 = 6.08

Page 62: Fitting the best line

[Figure: area against days with the fitted line y = 3.27 + 0.3117x drawn through (0, 3.27), (5, 4.83) and (9, 6.08)]

Page 63: Testing the significance (ANOVA)

$H_0$: no linear relation between y and x, $\beta = 0$

$H_a$: the variation in y is linearly explained by the variation in x, i.e., $\beta \neq 0$

Item         df    SS   MS   F   c.v.
Regression   1
Error        n-2
Total        n-1

Page 64: ANOVA SS

$SS_X = 60, \quad SS_Y = 5.9, \quad SS_{XY} = 18.7$

$$SS_{Total} = SS_Y = 5.9$$

$$SS_R = \frac{(SS_{XY})^2}{SS_X} = \frac{18.7^2}{60} = 5.8282$$

$$SS_E = SS_{Total} - SS_R = 5.9 - 5.8282 = 0.0718$$

Page 65: Regression ANOVA

Item         df   SS       MS       F         P
Regression   1    5.8282   5.8282   565.8**   <0.01
Error        7    0.0718   0.0103
Total        8    5.9000

Conclusion: Reject $H_0$ (of no relationship) and conclude that a significant portion of the variability in colony area is explained by regression on time.

Page 66: Coefficient of determination

A measure of the amount of the variability in y that is explained by its dependence on x.

Coeff. of D. $= \frac{SS_R}{SS_{Total}} = \frac{5.8282}{5.9} = 98.8\%$

Page 67: Inference from yeast data

$H_0$: Log area has no linear relationship with time

$H_a$: Log area has a linear relationship with time

Inference: Reject $H_0$ and accept $H_a$, i.e. log area changes linearly with time. Moreover, the regression explains 98.8% of the variation.

Page 68: Confidence interval for $\beta$

Standard error of b:

$$s_b^2 = \frac{MS_E}{SS_X} = \frac{0.0103}{60} = 0.0001717, \qquad s_b = \sqrt{0.0001717} = 0.0131$$

95% confidence interval for $\beta$:

$L_1 = 0.3117 - 2.365 \times 0.0131 = 0.2807$

$L_2 = 0.3117 + 2.365 \times 0.0131 = 0.3427$

$$t = \frac{b}{s_b} = \frac{0.3117}{0.0131} = 23.79 = \sqrt{F} = \sqrt{565.8}$$

Page 69: Confidence interval for $\mu_{Y|X}$

At $X_i = 5$: $\hat{Y}_i = \hat{\mu}_{Y|5} = 4.83 + 0.3117 \times (5 - 5) = 4.83$

Standard error:

$$s_{\hat{Y}_i} = \sqrt{MS_E\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]} = \sqrt{0.0103\left[\frac{1}{9} + \frac{(5 - 5)^2}{60}\right]} = 0.034$$

95% confidence interval limits:

$L_1 = 4.83 - 2.365 \times 0.034 = 4.75$

$L_2 = 4.83 + 2.365 \times 0.034 = 4.91$
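The whole yeast analysis (fit, ANOVA quantities, and the two confidence intervals) can be reproduced with a short script. This is a sketch under stated assumptions: the variable names are illustrative, and the critical value 2.365 for 7 df is taken from the slides rather than computed. Because the script does not round intermediate values, F and the interval limits can differ from the slide values in the last digit or two.

```python
import math

days = [1, 2, 3, 4, 5, 6, 7, 8, 9]
log_area = [3.6, 3.8, 4.2, 4.5, 5.0, 5.2, 5.5, 5.6, 6.1]
T_CRIT = 2.365  # t(1 - 0.05/2) for df = 7, value quoted on the slides

n = len(days)
mean_x = sum(days) / n
mean_y = sum(log_area) / n
ss_x = sum((x - mean_x) ** 2 for x in days)
ss_y = sum((y - mean_y) ** 2 for y in log_area)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, log_area))

b = ss_xy / ss_x                 # regression coefficient
a = mean_y - b * mean_x          # intercept
ss_r = ss_xy ** 2 / ss_x         # regression sum of squares
ss_e = ss_y - ss_r               # error sum of squares
ms_e = ss_e / (n - 2)            # error mean square
f_ratio = ss_r / ms_e
r_squared = ss_r / ss_y          # coefficient of determination

# 95% confidence interval for the slope
s_b = math.sqrt(ms_e / ss_x)
slope_ci = (b - T_CRIT * s_b, b + T_CRIT * s_b)

# 95% confidence interval for the mean of Y at X = 5
x0 = 5
s_mean = math.sqrt(ms_e * (1 / n + (x0 - mean_x) ** 2 / ss_x))
y_hat = a + b * x0
mean_ci = (y_hat - T_CRIT * s_mean, y_hat + T_CRIT * s_mean)

print(f"Y = {a:.3f} + {b:.4f} X, R^2 = {r_squared:.3f}, F = {f_ratio:.1f}")
print(f"95% CI for beta: [{slope_ci[0]:.4f}, {slope_ci[1]:.4f}]")
print(f"95% CI for mean Y at X = 5: [{mean_ci[0]:.2f}, {mean_ci[1]:.2f}]")
```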

Page 70: Summary of regression

Regression analysis is used when one variable (x) is fixed and likely to cause variation in the other (y)

Graph the data to check whether a linear relationship is apparent

Calculate the regression equation using the least squares method

Test the significance of this equation with ANOVA

If significant, plot the equation on the graphed data

Calculate the required confidence intervals

Page 71: Example 3: The effect of carbon dioxide on respiration rate

No.   Partial pressure CO2 (torr)   Respiration rate (breaths/minute)
1     30                            8.1
2     32                            8.0
3     34                            9.9
4     36                            11.2
5     38                            11.0
6     40                            13.2
7     42                            14.6
8     44                            16.6
9     46                            16.7
10    48                            18.3
11    50                            18.2

① Construct a scatterplot of these data

② Compute the linear regression equation

③ Test the significance of this equation via ANOVA

④ Calculate the 95% CI for β

⑤ Find the predicted respiration rate for 48 torr and the 95% CI

⑥ Find the predicted respiration rate for 38 torr and the 95% CI

⑦ Why do these two CIs have different lengths?

Page 72: Scatterplot of CO2 pressure and respiration rate

[Figure: respiration rate against CO2 pressure; positive linear relationship]

Page 73: Compute the linear regression equation

$n = 11$

$\sum X = 440, \quad \bar{X} = 40, \quad \sum X^2 = 18040$

$\sum Y = 145.8, \quad \bar{Y} = 13.25, \quad \sum Y^2 = 2082.04$

$\sum XY = 6085$

$$b = \frac{SS_{XY}}{SS_X} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{6085.0 - \frac{440 \times 145.8}{11}}{18040 - \frac{440^2}{11}} = \frac{253}{440} = 0.575$$

$$a = \bar{Y} - b\bar{X} = 13.25 - 0.575 \times 40 = -9.745$$

$$\hat{Y} = -9.745 + 0.575X$$

Page 74: Fitted line plot

[Figure: fitted line plot, res rate = -9.745 + 0.5750 CO2; S = 0.671009, R-Sq = 97.3%, R-Sq(adj) = 97.0%]

Page 75: Test the significance

$$SS_{Total} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 2082.04 - \frac{145.8^2}{11} = 149.53$$

$$SS_R = \frac{\left(\sum XY - \frac{(\sum X)(\sum Y)}{n}\right)^2}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{253^2}{440} = 145.47$$

$$SS_E = SS_{Total} - SS_R = 149.53 - 145.47 = 4.05$$

Item         df   SS       MS       F          P
Regression   1    145.47   145.47   323.10**   <0.01
Remainder    9    4.05     0.45
Total        10   149.53

Conclusion: Reject $H_0$ and accept $H_a$: respiration rate changes linearly with the partial pressure of CO2.

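Page 76: Confidence intervals

With $df = n - 2 = 9$, $t_{1-\alpha/2} = 2.262$ (table values: 2.365, 2.306, 2.262, 2.228 for df 7, 8, 9, 10).

The 95% CI for β: [0.503, 0.647]

The 95% CI for $\mu_{Y|48}$: [17.12, 18.59], length about 1.48

The 95% CI for $\mu_{Y|38}$: [11.63, 12.58], length about 0.96

$$L_1 = b - t_{1-\alpha/2}\,s_b, \qquad L_2 = b + t_{1-\alpha/2}\,s_b$$

$$L_1 = \hat{\mu}_{Y|X} - t_{1-\alpha/2}\,s_{\hat{Y}}, \qquad L_2 = \hat{\mu}_{Y|X} + t_{1-\alpha/2}\,s_{\hat{Y}}$$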

Page 77: Why the two CIs differ in length

The two CIs differ in length because 48 torr is further than 38 torr from the mean partial pressure of CO2 (40 torr).

The CI lengthens as the partial pressure moves away from its mean value: the length is determined by the standard error of $\hat{Y}$, and this standard error grows with the squared difference between the individual partial pressure and the mean partial pressure.

$$s_{\hat{Y}_i}^2 = MS_E\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$$

$$L_1 = \hat{\mu}_{Y|X} - t_{1-\alpha/2}\,s_{\hat{Y}}, \qquad L_2 = \hat{\mu}_{Y|X} + t_{1-\alpha/2}\,s_{\hat{Y}}$$

Page 78: Confidence interval for $\mu_{Y|X}$ and for $Y_i$

Since $\hat{Y}_i = \hat{\mu}_{Y|X_i} = \bar{Y} + b(X_i - \bar{X})$, and $s_{\bar{Y}}^2 = \frac{MS_E}{n}$, $s_b^2 = \frac{MS_E}{SS_X}$, $MS_E = \frac{SS_E}{n-2}$, the squared standard error of $\hat{\mu}_{Y|X}$ (sampling error) is:

$$s_{\hat{Y}_i}^2 = \frac{MS_E}{n} + \frac{MS_E}{SS_X}(X_i - \bar{X})^2 = MS_E\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$$

and the squared standard error of an individual $Y_i$ is:

$$s_{Y_i}^2 = MS_E + \frac{MS_E}{n} + \frac{MS_E}{SS_X}(X_i - \bar{X})^2 = MS_E\left[1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$$

Page 79: Fitted line with confidence bands

[Figure: respiration rate against CO2 with the fitted line y = 0.575x - 9.7455 (R² = 0.9729). Legend: data; CI for $\hat{Y}$ (lower and upper); CI for $Y_i$ (lower and upper); Linear (data).]
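The dependence of interval length on the distance from the mean can be made concrete with the CO2 data. The following sketch is illustrative only: the function name `interval_half_widths` is hypothetical, and the critical value 2.262 for 9 df is taken from the table values quoted on the slides. It returns the fitted value together with the half-widths of the 95% interval for the mean response and for an individual observation.

```python
import math

co2 = [30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50]
resp_rate = [8.1, 8.0, 9.9, 11.2, 11.0, 13.2, 14.6, 16.6, 16.7, 18.3, 18.2]
T_CRIT = 2.262  # t(1 - 0.05/2) for df = 9, value quoted on the slides

n = len(co2)
mean_x = sum(co2) / n
mean_y = sum(resp_rate) / n
ss_x = sum((x - mean_x) ** 2 for x in co2)
ss_y = sum((y - mean_y) ** 2 for y in resp_rate)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(co2, resp_rate))

b = ss_xy / ss_x
a = mean_y - b * mean_x
ms_e = (ss_y - ss_xy ** 2 / ss_x) / (n - 2)


def interval_half_widths(x0):
    """Return (fitted value, half-width for mean of Y, half-width for individual Y)."""
    y_hat = a + b * x0
    leverage = 1 / n + (x0 - mean_x) ** 2 / ss_x
    half_mean = T_CRIT * math.sqrt(ms_e * leverage)         # CI for mu_{Y|X}
    half_indiv = T_CRIT * math.sqrt(ms_e * (1 + leverage))  # CI for Y_i
    return y_hat, half_mean, half_indiv


for x0 in (38, 48):
    y_hat, half_mean, half_indiv = interval_half_widths(x0)
    print(f"X = {x0}: fitted {y_hat:.2f}, "
          f"mean CI +/- {half_mean:.2f}, individual CI +/- {half_indiv:.2f}")
```

Both half-widths are larger at 48 torr than at 38 torr, and the interval for an individual observation is always wider than the interval for the mean because of the extra $MS_E$ term.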

Page 80: Concepts

Simple linear regression

Simple linear correlation

Correlation analysis based on ranks

Page 81: Correlation analysis

To measure the intensity of association observed between any pair of variables and to test its statistical significance

To test whether two variables covary or are interdependent

Page 83: Correlation analysis

Two variables, X and Y, are of the same status, and it is not clear which is cause and which is effect.

There may be some common underlying cause of both.

Examples:

height and weight

height of siblings

Page 84: Types of relationship

[Figure: three scatterplots of y against x]

Positive correlation: large X's are associated with large Y's

Negative correlation: large X's are associated with small Y's

No correlation: Y and X have no linear correlation

Page 85: No linear correlation ≠ independent

Example: $Y = X^2$ (dependent, but no linear correlation)

[Figure: diagram relating "dependent", "linear correlation" and "independent"]

Page 86: Index of association

$$\sum \left(\frac{X_i - \bar{X}}{S_X}\right)\left(\frac{Y_i - \bar{Y}}{S_Y}\right)$$

where $\frac{X_i - \bar{X}}{S_X}$ and $\frac{Y_i - \bar{Y}}{S_Y}$ are the standardized normal deviates for X and for Y.

Positive correlation: large X's associated with large Y's, $(X_i - \bar{X})(Y_i - \bar{Y}) > 0$

Negative correlation: large X's associated with small Y's, $(X_i - \bar{X})(Y_i - \bar{Y}) < 0$

Page 87: Pearson product-moment correlation coefficient

Abbr.: Pearson correlation coefficient

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\,S_X S_Y} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{SS_{XY}}{\sqrt{SS_X\,SS_Y}}$$

$SS_X$: corrected sum of squares of X

$SS_Y$: corrected sum of squares of Y

$SS_{XY}$: corrected cross products

n: sample size

Page 88: Pearson correlation coefficient

A widely used index of association

Association of two quantitative variables

An estimate of the population correlation coefficient $\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$

$COV(X, Y) = E[(X - E(X))(Y - E(Y))]$

Page 89: Linear correlation model

Population correlation coefficient:

$$\rho = \frac{1}{N}\sum \left(\frac{X - \mu_x}{\sigma_x}\right)\left(\frac{Y - \mu_y}{\sigma_y}\right) = \frac{\sum (X - \mu_x)(Y - \mu_y)}{\sqrt{\sum (X - \mu_x)^2 \sum (Y - \mu_y)^2}}$$

where the two factors are the normal standard deviates of X and of Y.

Sample correlation coefficient:

$$r = \frac{1}{n-1}\sum \left(\frac{X - \bar{X}}{s_X}\right)\left(\frac{Y - \bar{Y}}{s_Y}\right) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{SS_{xy}}{\sqrt{SS_x\,SS_y}}$$

Page 90: Linear correlation model

Y's at each X are normally distributed

X's at each Y are normally distributed

The conditional population means are

$$\mu_{Y|X} = \mu_Y + \beta_{Y|X}(X - \mu_X), \qquad \mu_{X|Y} = \mu_X + \beta_{X|Y}(Y - \mu_Y)$$

which follow the regression equations:

$$\hat{Y} = \bar{Y} + b_{Y|X}(X - \bar{X}), \qquad \hat{X} = \bar{X} + b_{X|Y}(Y - \bar{Y})$$

Page 91: Regression vs correlation

$$b_{Y|X} = \frac{SS_{XY}}{SS_X}; \qquad b_{X|Y} = \frac{SS_{XY}}{SS_Y}; \qquad b_{Y|X}\,b_{X|Y} = \frac{SS_{XY}}{SS_X}\cdot\frac{SS_{XY}}{SS_Y} = \frac{(SS_{XY})^2}{SS_X\,SS_Y} = r^2$$

$$SS_R = b^2 SS_X = \left(\frac{SS_{XY}}{SS_X}\right)^2 SS_X = \frac{(SS_{XY})^2}{SS_X}$$

$$\text{Coeff. of D.} = \frac{SS_R}{SS_{Total}} = \frac{(SS_{XY})^2}{SS_X\,SS_Y} = r^2$$

$SS_R$: explainable variability

$SS_{Total} = SS_Y$: total variability

Page 92: Characteristics of the correlation coefficient

$$r^2 = \text{Coeff. of D.} = \frac{SS_R}{SS_{Total}}, \qquad r = \pm\sqrt{\frac{SS_R}{SS_{Total}}}$$

$$SS_{Total} = SS_R + SS_E \Rightarrow 0 \le SS_R \le SS_{Total}$$

$$0 \le \frac{SS_R}{SS_{Total}} \le 1, \quad \text{i.e. } 0 \le r^2 \le 1$$

$$-1 \le r \le 1$$

Page 93: Limits of the correlation coefficient

r = +1: a complete positive correlation between x and y

r = -1: a complete negative correlation between x and y

r = 0: x and y are not linearly correlated

Correlation has upper and lower limits of +1 and -1 respectively

Page 94: Test of hypothesis: t test

Hypothesis, to test whether r differs from zero:

$$H_0: \rho = 0, \qquad H_a: \rho \neq 0$$

The standard error of r is:

$$s_r = \sqrt{\frac{1 - r^2}{n - 2}}$$

The test statistic follows a t distribution with df = n - 2:

$$t = \frac{r - 0}{s_r} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$$

Critical value of r for given n and alpha:

$$r_{crit} = \sqrt{\frac{t_{1-\alpha/2}^2}{n - 2 + t_{1-\alpha/2}^2}}$$

Page 95: Regression = correlation

$$r^2 = \text{Coeff. of D.} = \frac{SS_R}{SS_{Total}} = 1 - \frac{SS_E}{SS_{Total}}, \qquad 1 - r^2 = \frac{SS_E}{SS_{Total}}$$

$$t_r = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{SS_{XY}}{\sqrt{SS_X\,SS_Y}} \Big/ \sqrt{\frac{SS_E}{SS_{Total}\,(n-2)}} = \frac{SS_{XY}}{\sqrt{SS_X\,MS_E}}$$

$$t_b = \frac{b}{s_b} = \frac{SS_{XY}}{SS_X} \Big/ \sqrt{\frac{MS_E}{SS_X}} = \frac{SS_{XY}}{\sqrt{SS_X\,MS_E}}, \quad df = n - 2$$

The two test statistics are identical.

Page 96: Understand the correlation analysis via example

Page 97: ① Preliminary calculations

The length and width of eight overlapping plates composing the shell of 10 Chiton olivaceus

Animal   X: length (cm)   Y: width (cm)
1        10.7             5.8
2        11               6
3        9.5              5
4        11.1             6
5        10.3             5.3
6        10.7             5.8
7        9.9              5.2
8        10.6             5.7
9        10               5.3
10       12               6.3

$n = 10$

$\sum X = 105.8, \quad \bar{X} = 10.58, \quad \sum X^2 = 1123.9$

$\sum Y = 56.4, \quad \bar{Y} = 5.64, \quad \sum Y^2 = 319.68$

$\sum XY = 599.31$

Page 98: ② Calculate the correlation coefficient

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{599.31 - \frac{105.8 \times 56.4}{10}}{\sqrt{\left[1123.9 - \frac{105.8^2}{10}\right]\left[319.68 - \frac{56.4^2}{10}\right]}} = 0.969$$

③ Test of significance

Hypotheses: $H_0: \rho = 0$, $H_a: \rho \neq 0$

Test statistic:

$$t = \frac{r - 0}{s_r} = \frac{0.969 - 0}{\sqrt{\frac{1 - 0.969^2}{10 - 2}}} = \frac{0.969}{0.087} = 11.14$$

Since $11.14 > t_{0.05(8)} = 2.306$, reject $H_0$.

④ Confidence interval ???
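The correlation coefficient and its t test for the chiton data can be reproduced directly from the raw measurements. This is a small illustrative sketch with hypothetical variable names. It uses the unrounded r, so the t value can differ slightly in the second decimal place from the hand calculation, which rounds r to 0.969 first.

```python
import math

length_cm = [10.7, 11.0, 9.5, 11.1, 10.3, 10.7, 9.9, 10.6, 10.0, 12.0]
width_cm = [5.8, 6.0, 5.0, 6.0, 5.3, 5.8, 5.2, 5.7, 5.3, 6.3]

n = len(length_cm)
mean_x = sum(length_cm) / n
mean_y = sum(width_cm) / n
ss_x = sum((x - mean_x) ** 2 for x in length_cm)
ss_y = sum((y - mean_y) ** 2 for y in width_cm)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(length_cm, width_cm))

r = ss_xy / math.sqrt(ss_x * ss_y)      # Pearson correlation coefficient
s_r = math.sqrt((1 - r ** 2) / (n - 2)) # standard error of r under H0: rho = 0
t_stat = r / s_r                        # compare with t(n - 2)

print(f"r = {r:.3f}, s_r = {s_r:.3f}, t = {t_stat:.2f} with df = {n - 2}")
```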

Page 99: Need different tests of $\rho$

The t test can ONLY be used to test $H_0$ that $\rho = 0$, because only in this case (independence) is the distribution of r approximately normal.

In all other cases, the distribution is asymmetrical, so a different test has to be used, as follows.

Page 100: Distribution of r in samples

[Figure: sampling distributions of r on the range -1 to +1, for $\rho = 0$ (symmetric) and $\rho = 0.8$ (skewed)]

Fisher's Z transformation must be employed: the inverse hyperbolic tangent of r

Page 101: Fisher's Z transformation

Transform r to Z using the inverse hyperbolic tangent:

$$r = \tanh z = \frac{e^z - e^{-z}}{e^z + e^{-z}}$$

$$z = \tanh^{-1} r = \frac{1}{2}\ln\frac{1 + r}{1 - r} = \frac{1}{2}[\ln(1 + r) - \ln(1 - r)]$$

Z follows a normal distribution approximately:

$$Z \sim N\left(\zeta_\rho + \frac{\rho}{2(n-1)},\ \frac{1}{n-3}\right), \qquad \zeta_\rho = \tanh^{-1}\rho$$

Standard error of Z:

$$\sigma_Z = \sqrt{\frac{1}{n-3}}$$

Page 102: Testing when $\rho_0 \neq 0$

$Z = \tanh^{-1} r$ follows approximately a normal distribution.

Hypothesis:

$$H_0: \rho = \rho_0, \qquad H_a: \rho \neq \rho_0$$

Test statistic:

$$u = \frac{\tanh^{-1} r - \tanh^{-1}\rho_0}{\sigma_Z} = \frac{\tanh^{-1} r - \tanh^{-1}\rho_0}{\sqrt{\frac{1}{n-3}}} \sim N(0, 1)$$

Page 103: Confidence interval for the correlation coefficient

95% CI for $\zeta_\rho$:

$$L_1 = Z - 1.960\,\sigma_Z = Z - 1.960\sqrt{\frac{1}{n-3}}$$

$$L_2 = Z + 1.960\,\sigma_Z = Z + 1.960\sqrt{\frac{1}{n-3}}$$

95% CI for $\rho$:

$$r_1 = \tanh L_1, \qquad r_2 = \tanh L_2$$

Page 104: Example when expected $\rho \neq 0$

For genetic reasons, the correlation of height among sibs (brothers or sisters) is expected to be 0.5, i.e., $H_0$ is that $\rho = 0.5$

Family   X: brother (cm)   Y: sister (cm)
1        71                69
2        68                64
3        66                65
4        67                64
5        70                65
6        71                62
7        70                62
8        73                64
9        72                66
…        …                 …
50       66                62

$$r = \frac{SS_{XY}}{\sqrt{SS_X\,SS_Y}} = \frac{39}{\sqrt{(74)(66)}} = 0.558, \qquad r^2 = 0.31142$$

Page 105: Test of significance when $\rho \neq 0$

For $\rho_0 = 0.5$: $\zeta_0 = \frac{1}{2}\ln\frac{1 + 0.5}{1 - 0.5} = 0.5493$

For $r = 0.558$: $Z = \frac{1}{2}\ln\frac{1 + 0.558}{1 - 0.558} = 0.63$

Standard error: $\sigma_Z = \sqrt{\frac{1}{50 - 3}} = 0.146$

$$u = \frac{Z - \zeta_0}{\sigma_Z} = \frac{0.63 - 0.55}{0.146} = 0.55 < 1.960$$

Therefore we cannot reject $H_0$.

Page 106: Example when expected $\rho \neq 0$

$L_1 = 0.63 - 1.960 \times 0.146 = 0.63 - 0.2859 = 0.3441$

$L_2 = 0.63 + 1.960 \times 0.146 = 0.63 + 0.2859 = 0.9159$

$0.3441 \le \zeta_\rho \le 0.9159$

$0.3311 \le \rho \le 0.7240$
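The Fisher Z test and the back-transformed confidence interval for the sibling example can be sketched with the standard library alone, since `math.atanh` and `math.tanh` implement the transformation and its inverse. The variable names are illustrative. The script keeps full precision, so the test statistic agrees with the slide value 0.55 only to about two decimals.

```python
import math

r = 0.558       # sample correlation from the sibling data
rho_0 = 0.5     # correlation expected under H0
n = 50          # number of families
Z_CRIT = 1.960  # two-sided 5% critical value of the standard normal

z_r = math.atanh(r)           # Fisher transform of the sample r
zeta_0 = math.atanh(rho_0)    # Fisher transform of the hypothesized rho
sigma_z = 1 / math.sqrt(n - 3)

u = (z_r - zeta_0) / sigma_z  # approximately N(0, 1) under H0
reject_h0 = abs(u) > Z_CRIT

# 95% CI on the Z scale, then back-transformed to the rho scale with tanh
z_low, z_high = z_r - Z_CRIT * sigma_z, z_r + Z_CRIT * sigma_z
rho_low, rho_high = math.tanh(z_low), math.tanh(z_high)

print(f"u = {u:.2f}, reject H0: {reject_h0}")
print(f"95% CI for rho: [{rho_low:.4f}, {rho_high:.4f}]")
```

The interval contains 0.5, which matches the decision not to reject $H_0$.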

Page 107: Concepts

Simple linear regression

Simple linear correlation

Correlation analysis based on ranks

Page 108: Bivariate random sample (not normally distributed)

Examples:

first grade students' ages and their performance on a standardized test

student heights and GPAs

To test the relationship between the two variables

To validate the dependence of two random variables

Page 109: The characteristics of an index of association

Values only between -1 and +1, inclusive

The stronger the positive correlation is, the closer the value is to +1

The stronger the negative correlation is, the closer the value is to -1

For uncorrelated pairs of X and Y, the value should be close to 0

Page 110: 1 Chapter 10 Linear regression and correlation Relationship between variables

Kendall correlation coefficient τ

Directly compare the n observations with each other

Concordant (C): $(X_i - X_j)(Y_i - Y_j) > 0$, positive correlation

Discordant (D): $(X_i - X_j)(Y_i - Y_j) < 0$, negative correlation

Tie (E): $(X_i - X_j)(Y_i - Y_j) = 0$

$\tau = \frac{C - D}{n(n-1)/2} = \frac{2(C - D)}{n(n-1)}, \quad -1 \le \tau \le 1$

Numerator: difference between the # of concordant and discordant pairs

Denominator: the total # of comparisons = C + D + E

Page 111: 1 Chapter 10 Linear regression and correlation Relationship between variables

① Rank X and Y

② Compare each observation with the other observations below it

Xi   Yi   Rank Xi   Rank Yi   Concordant pairs below   Discordant pairs below   Tied pairs below
1    53   1         12        0                        11                       0
2    47   2         11        0                        10                       0
3    32   3         5.5       4                        4                        1
4    42   4         10        0                        8                        0
5    35   5         7         2                        5                        0
6    32   6         5.5       2                        4                        0
7    37   7         8         1                        4                        0
8    38   8         9         0                        4                        0
9    27   9         2.5       1                        1                        1
10   24   10        1         2                        0                        0
11   29   11        4         0                        1                        0
12   27   12        2.5       0                        0                        0

③ Summarize C, D and E

C = 4 + … = 12

D = 11 + … = 52

E = 1 + 1 = 2

④ Calculate Kendall correlation coefficient:

$\tau = \frac{2(C - D)}{n(n-1)} = \frac{2 \times (12 - 52)}{12 \times (12 - 1)} = -0.606$
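The pairwise counting in this example can be reproduced with a short Python sketch. The helper names `kendall_counts` and `kendall_tau` are illustrative assumptions. The data are the month and students-evaluated columns of the example.

```python
from itertools import combinations

def kendall_counts(x, y):
    """Count concordant (C), discordant (D) and tied (E) pairs."""
    c = d = e = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            c += 1      # both variables move in the same direction
        elif s < 0:
            d += 1      # the variables move in opposite directions
        else:
            e += 1      # tie in X or in Y
    return c, d, e

def kendall_tau(x, y):
    """tau = 2(C - D) / (n(n - 1)), the form used in this example."""
    c, d, _ = kendall_counts(x, y)
    n = len(x)
    return 2 * (c - d) / (n * (n - 1))

months = list(range(1, 13))
students = [53, 47, 32, 42, 35, 32, 37, 38, 27, 24, 29, 27]

print(kendall_counts(months, students))
print(round(kendall_tau(months, students), 3))
```

Comparing every pair once is equivalent to comparing each observation with those below it in the table, so the totals match C = 12, D = 52 and E = 2.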

Page 112: 1 Chapter 10 Linear regression and correlation Relationship between variables

Birth month   Month after cut-off   Students evaluated
Dec           1                     53
Jan           2                     47
Feb           3                     32
Mar           4                     42
Apr           5                     35
May           6                     32
Jun           7                     37
Jul           8                     38
Aug           9                     27
Sep           10                    24
Oct           11                    29
Nov           12                    27

The relative age effects on academic and social performance in Geneva.

The older students tend to be overrepresented and younger ones underrepresented.

X: the month

Y: the # of students in grades K through 4 evaluated for the district’s Gifted and Talented Student Program.

Page 113: 1 Chapter 10 Linear regression and correlation Relationship between variables

⑥ Test significance: two-tailed test

Hypotheses: $H_0: \tau = 0$, $H_a: \tau \ne 0$

Test statistic: $\min(C + \frac{E}{2},\ D + \frac{E}{2}) = 12 + 1 = 13$

Table C.12: for n = 12, p = 2 × 0.0027 = 0.0054 < 0.01

Reject H0

Conclusion: there is a moderately strong negative correlation between month after cut-off date and referrals for gifted evaluation.

Page 114: 1 Chapter 10 Linear regression and correlation Relationship between variables

Spearman’s rank coefficient rs

One of the most common correlation coefficients

Idea: rank the X and Y observations separately and compute the Pearson correlation coefficient on the ranks rather than on the original data.

Advantage: simpler to compute

$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$, where $d_i = r_{x_i} - r_{y_i}$

Page 115: 1 Chapter 10 Linear regression and correlation Relationship between variables

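$\sum_{i=1}^{n} r_{y_i} = \sum_{i=1}^{n} r_{x_i} = n(n+1)/2$

$\sum_{i=1}^{n} r_{y_i}^2 = \sum_{i=1}^{n} r_{x_i}^2 = n(n+1)(2n+1)/6$

$SS_{r_x} = SS_{r_y} = \sum_{i=1}^{n} r_{y_i}^2 - \left(\sum_{i=1}^{n} r_{y_i}\right)^2 / n = n(n^2-1)/12$

$r_p = \frac{\sum_{i=1}^{n} r_{x_i} r_{y_i} - \sum_{i=1}^{n} r_{x_i} \sum_{i=1}^{n} r_{y_i} / n}{\sqrt{SS_{r_x}\,SS_{r_y}}} = \frac{\sum_{i=1}^{n} r_{x_i} r_{y_i} - n(n+1)^2/4}{n(n^2-1)/12}$

$= \frac{-6\sum_{i=1}^{n} (r_{x_i}^2 - 2 r_{x_i} r_{y_i} + r_{y_i}^2) - 3n(n+1)^2 + 2n(n+1)(2n+1)}{n(n^2-1)}$

$= \frac{-6\sum_{i=1}^{n} (r_{x_i} - r_{y_i})^2 + n(n^2-1)}{n(n^2-1)}$

$= 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} = r_s$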
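The identity derived above (Pearson's coefficient computed on untied ranks equals the shortcut formula in d) can be checked numerically. The sketch below is an illustration only: the helper names and the random permutation are assumptions, not part of the slides.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    ss_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    ss_u = sum((a - mu) ** 2 for a in u)
    ss_v = sum((b - mv) ** 2 for b in v)
    return ss_uv / math.sqrt(ss_u * ss_v)

def spearman_shortcut(rx, ry):
    """r_s = 1 - 6 * sum(d_i^2) / (n(n^2 - 1)); assumes ranks without ties."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

random.seed(0)            # arbitrary seed, only for reproducibility
n = 12
rx = list(range(1, n + 1))
ry = rx[:]
random.shuffle(ry)        # an arbitrary permutation of the ranks 1..n

r_pearson = pearson(rx, ry)
r_shortcut = spearman_shortcut(rx, ry)
print(math.isclose(r_pearson, r_shortcut, abs_tol=1e-12))
```

The two values agree for any permutation of 1..n. With tied ranks the sums of squares change and the shortcut is no longer exact.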

Page 116: 1 Chapter 10 Linear regression and correlation Relationship between variables

X: the month

Y: the difference between actual and expected numbers of professional players

Month   Actual players   Expected players   Difference
1       37               28.27              8.73
2       33               27.38              5.62
3       40               26.26              13.74
4       25               27.60              -2.60
5       29               29.16              -0.16
6       33               30.05              2.95
7       28               31.38              -3.38
8       25               31.83              -6.83
9       25               31.16              -6.16
10      23               30.71              -7.71
11      30               30.93              -0.93
12      27               30.27              -3.27

The distribution of German professional players' birthdays and that of the general population of Germany.

Page 117: 1 Chapter 10 Linear regression and correlation Relationship between variables

① Rank X and Y

② Compute the difference $d_i = r_x - r_y$

X: Month   Y: Difference   r_x   r_y   d_i
1          8.73            1     11    -10
2          5.62            2     10    -8
3          13.74           3     12    -9
4          -2.60           4     6     -2
5          -0.16           5     8     -3
6          2.95            6     9     -3
7          -3.38           7     4     3
8          -6.83           8     2     6
9          -6.16           9     3     6
10         -7.71           10    1     9
11         -0.93           11    7     4
12         -3.27           12    5     7

③ Calculate

$\sum_{i=1}^{n} d_i^2 = (-10)^2 + \dots + 7^2 = 494$

$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} = 1 - \frac{6 \times 494}{12(12^2-1)} = -0.727$

A negative correlation

The distribution of German professional players' birthdays and that of the general population of Germany.
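The three steps of this example (rank, take differences, apply the formula) can be sketched in Python as follows. The helpers `ranks` and `spearman_rs` are illustrative names, and the ranking routine assumes there are no ties, which holds for these data.

```python
def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rs(x, y):
    """Return (sum of d_i^2, r_s) using r_s = 1 - 6*sum(d^2)/(n(n^2-1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return d2, 1 - 6 * d2 / (n * (n ** 2 - 1))

months = list(range(1, 13))
difference = [8.73, 5.62, 13.74, -2.60, -0.16, 2.95,
              -3.38, -6.83, -6.16, -7.71, -0.93, -3.27]

d2, rs = spearman_rs(months, difference)
print(d2, round(rs, 3))
```

For these data the sum of squared rank differences is 494 and r_s is about -0.727.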

Page 118: 1 Chapter 10 Linear regression and correlation Relationship between variables

④ Test significance

Hypotheses: $H_0: \rho_s = 0$, $H_a: \rho_s \ne 0$

Table C.13: critical value = 0.587 for n = 12 and α = 0.05

Since |rs| = 0.727 > 0.587, reject H0

Inference: there is an excess in the number of players born early in the competition year and a lack of those born late among professional soccer players in Germany.

Page 119: 1 Chapter 10 Linear regression and correlation Relationship between variables

Compare the two coefficients

Kendall’s correlation coefficient τ

easy to test for a significant difference from zero

Spearman’s rank correlation coefficient rs

more common, related to Pearson’s correlation coefficient

The two coefficients do not produce radically different correlation values.

Page 120: 1 Chapter 10 Linear regression and correlation Relationship between variables

Comparing different regressions

A common situation is either to compare your regression with a published one, or to compare the regressions in two or more experiments you have done

Two linear regression equations:

$\hat{Y}_1 = a_1 + b_1 X_1$

$\hat{Y}_2 = a_2 + b_2 X_2$

Page 121: 1 Chapter 10 Linear regression and correlation Relationship between variables

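Compare intercept: Using t test

Hypothesis: $H_0: \alpha_1 = \alpha_2$ or $\alpha_1 - \alpha_2 = 0$

Standard error for a1-a2:

$s_{a_1-a_2} = \sqrt{s^2_{Y/X}\left[\frac{1}{n_1} + \frac{1}{n_2} + \frac{\bar{x}_1^2}{\sum X_1^2 - (\sum X_1)^2/n_1} + \frac{\bar{x}_2^2}{\sum X_2^2 - (\sum X_2)^2/n_2}\right]}$

$s^2_{Y/X} = \frac{\frac{\sum (Y_1 - \hat{Y}_1)^2}{n_1-2}(n_1-2) + \frac{\sum (Y_2 - \hat{Y}_2)^2}{n_2-2}(n_2-2)}{(n_1-2) + (n_2-2)}$

$t = \frac{(a_1 - a_2) - (\alpha_1 - \alpha_2)}{s_{a_1-a_2}}$

follows a t distribution with $df = (n_1-2) + (n_2-2)$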

Page 122: 1 Chapter 10 Linear regression and correlation Relationship between variables

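Compare slope: Using t test

Hypothesis: $H_0: \beta_1 = \beta_2$ or $\beta_1 - \beta_2 = 0$

Standard error for b1-b2:

$s_{b_1-b_2} = \sqrt{s^2_{Y/X}\left[\frac{1}{\sum X_1^2 - (\sum X_1)^2/n_1} + \frac{1}{\sum X_2^2 - (\sum X_2)^2/n_2}\right]}$

$s^2_{Y/X} = \frac{\frac{\sum (Y_1 - \hat{Y}_1)^2}{n_1-2}(n_1-2) + \frac{\sum (Y_2 - \hat{Y}_2)^2}{n_2-2}(n_2-2)}{(n_1-2) + (n_2-2)}$

$t = \frac{(b_1 - b_2) - (\beta_1 - \beta_2)}{s_{b_1-b_2}}$

follows a t distribution with $df = (n_1-2) + (n_2-2)$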
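A minimal Python sketch of the slope comparison is given below. It assumes the usual pooled residual variance, (SSE1 + SSE2) / ((n1 - 2) + (n2 - 2)). The function names and both small data sets are hypothetical and serve only to make the example runnable.

```python
import math

def fit(x, y):
    """Least-squares line; returns (a, b, SS_X, SS_E, n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = ss_xy / ss_x
    a = my - b * mx
    ss_e = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, ss_x, ss_e, n

def compare_slopes(x1, y1, x2, y2):
    """t statistic and df for H0: beta1 = beta2 (illustrative helper)."""
    _, b1, ssx1, sse1, n1 = fit(x1, y1)
    _, b2, ssx2, sse2, n2 = fit(x2, y2)
    df = (n1 - 2) + (n2 - 2)
    s2_pooled = (sse1 + sse2) / df                       # pooled s^2_{Y/X}
    se = math.sqrt(s2_pooled / ssx1 + s2_pooled / ssx2)  # s_{b1-b2}
    return (b1 - b2) / se, df

# Hypothetical data for two experiments (not from the slides)
x1, y1 = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
x2, y2 = [1, 2, 3, 4, 5], [1.0, 2.2, 2.9, 4.1, 5.0]

t, df = compare_slopes(x1, y1, x2, y2)
print(round(t, 2), df)
```

The resulting t would be compared with the critical value of the t distribution for the returned degrees of freedom, taken from a table or a statistics library.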

Page 123: 1 Chapter 10 Linear regression and correlation Relationship between variables

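Combining data from two experiments

If the data from two experiments are not significantly different, they can be combined.

Combined estimate of b is:

$b = \frac{SS_{X_1Y_1} + SS_{X_2Y_2}}{SS_{X_1} + SS_{X_2}} = \frac{\left(\sum X_1Y_1 - \frac{\sum X_1 \sum Y_1}{n_1}\right) + \left(\sum X_2Y_2 - \frac{\sum X_2 \sum Y_2}{n_2}\right)}{\left(\sum X_1^2 - \frac{(\sum X_1)^2}{n_1}\right) + \left(\sum X_2^2 - \frac{(\sum X_2)^2}{n_2}\right)}$

Standard error is:

$s_b = \sqrt{\frac{[SS_{E_1}/(n_1-2)] + [SS_{E_2}/(n_2-2)]}{\left(\sum X_1^2 - \frac{(\sum X_1)^2}{n_1}\right) + \left(\sum X_2^2 - \frac{(\sum X_2)^2}{n_2}\right)}}$

Should combine the original data.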
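The combined slope estimate can be sketched in Python by pooling the within-experiment sums of squares and cross-products. The helper names and the two data sets are hypothetical. The data are constructed with the same slope (2) but different intercepts, so the pooled estimate is easy to check.

```python
import math

def sums(x, y):
    """Return (SS_X, SS_XY) using the computational formulas."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_x, ss_xy

def combined_slope(x1, y1, x2, y2):
    """b = (SS_X1Y1 + SS_X2Y2) / (SS_X1 + SS_X2) (illustrative helper)."""
    ssx1, ssxy1 = sums(x1, y1)
    ssx2, ssxy2 = sums(x2, y2)
    return (ssxy1 + ssxy2) / (ssx1 + ssx2)

# Hypothetical experiments: y = 2x + 1 and y = 2x + 3
x1, y1 = [1, 2, 3, 4], [3, 5, 7, 9]
x2, y2 = [2, 3, 4, 5, 6], [7, 9, 11, 13, 15]

b = combined_slope(x1, y1, x2, y2)
print(b)
```

Because each sum of squares is taken about its own experiment's mean, the difference in intercepts does not distort the pooled slope.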

Page 124: 1 Chapter 10 Linear regression and correlation Relationship between variables

Two experiments

[Figure: two panels, each plotting y against x with one line for Expt. 1 and one for Expt. 2. First panel: same slopes, different means. Second panel: different slopes, means may or may not differ.]

Page 125: 1 Chapter 10 Linear regression and correlation Relationship between variables

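Compare two different correlations

Fisher’s Z transformation

Hypothesis: $H_0: \rho_1 = \rho_2$ or $z_{\rho_1} = z_{\rho_2}$

Variance for z1-z2:

$\sigma^2_{z_1-z_2} = \frac{1}{n_1-3} + \frac{1}{n_2-3}$

$u = \frac{(z_1 - z_2) - (z_{\rho_1} - z_{\rho_2})}{\sigma_{z_1-z_2}} = \frac{z_1 - z_2}{\sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}} \sim N(0,1)$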
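The comparison of two correlation coefficients can be sketched in Python as below. The function name is an assumption. The second sample (r = 0.30, n = 40) is hypothetical and is included only so the example runs; the first reuses r = 0.558 with n = 50 from the sib-height example.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """u statistic for H0: rho1 = rho2 via Fisher's Z (illustrative helper)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher's Z of each r
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # sigma_{z1 - z2}
    return (z1 - z2) / se

# r = 0.558, n = 50 from the earlier example; second sample is hypothetical
u = compare_correlations(0.558, 50, 0.30, 40)
print(round(u, 2), abs(u) > 1.960)
```

Under H0 the statistic is approximately standard normal, so |u| is compared with 1.960 for a two-tailed test at α = 0.05.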