analysis of variance approach to regression analysis

22
Analysis of variance approach to regression analysis … an alternative approach to testing for a linear association

Upload: melina

Post on 01-Feb-2016

44 views

Category:

Documents


0 download

DESCRIPTION

Analysis of variance approach to regression analysis. … an alternative approach to testing for a linear association. Basic idea … well, kind of …. Break down the variation in Y (“ total sum of squares ”) into two components: - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Analysis of variance approach to regression analysis

Analysis of variance approach to regression analysis

… an alternative approach to testing for a linear association

Page 2: Analysis of variance approach to regression analysis

10 9 8 7 6 5 4 3 2 1 0

60

50

40

x

y1S = 7.81137 R-Sq = 6.5 % R-Sq(adj) = 3.2 %

y1 = 54.4758 - 0.764016 x

Regression Plot

y-hat

y-bar

y-i

6.18272

1

n

ii yy 5.1708ˆ

2

1

n

iii yy 1.119ˆ

2

1

n

ii yy

Page 3: Analysis of variance approach to regression analysis

8.84872

1

n

ii yy 5.1708ˆ

2

1

n

iii yy 3.6679ˆ

2

1

n

ii yy

10 9 8 7 6 5 4 3 2 1 0

80

70

60

50

40

30

20

10

x

y2S = 7.81137 R-Sq = 79.9 % R-Sq(adj) = 79.2 %

y2 = 75.5458 - 5.76402 x

Regression Plot

y-bar

y-hat

y-i

Page 4: Analysis of variance approach to regression analysis

Basic idea … well, kind of …

• Break down the variation in Y (“total sum of squares”) into two components:– a component that is due to the change in X

(“regression sum of squares”)– a component that is just due to random error

(“error sum of squares”)

• If the regression sum of squares is much greater than the error sum of squares, conclude that there is a linear association.

Page 5: Analysis of variance approach to regression analysis

Row Year Men200m 1 1900 22.20 2 1904 21.60 3 1908 22.60 4 1912 21.70 5 1920 22.00 6 1924 21.60 7 1928 21.80 8 1932 21.20 9 1936 20.70 10 1948 21.10 11 1952 20.70 12 1956 20.60 13 1960 20.50 14 1964 20.30 15 1968 19.83 16 1972 20.00 17 1976 20.23 18 1980 20.19 19 1984 19.80 20 1988 19.75 21 1992 20.01 22 1996 19.32

Winning times (in seconds) in Men’s 200 meter Olympic sprints, 1900-1996.

Are men getting faster?

Page 6: Analysis of variance approach to regression analysis

200019501900

22.5

21.5

20.5

19.5

Year

Men

200m

S = 0.298134 R-Sq = 89.9 % R-Sq(adj) = 89.4 %

Men200m = 76.1534 - 0.0283833 Year

Regression Plot

Page 7: Analysis of variance approach to regression analysis

Analysis of Variance Table

Analysis of Variance

Source DF SS MS F PRegression 1 15.796 15.796 177.7 0.000Residual Error 20 1.778 0.089Total 21 17.574

The regression sum of squares, 15.796, accounts for most of the total sum of squares, 17.574.

There appears to be a significant linear association between year and winning times … let’s formalize it.

Page 8: Analysis of variance approach to regression analysis

iiii YYYYYY ˆˆ

Page 9: Analysis of variance approach to regression analysis

The cool thing is that the decomposition holds for the sum of the squared deviations, too. That is ….

2

1

2

1

2

1

ˆˆ

n

iii

n

ii

n

ii YYYYYY

Total sum of squares (SSTO)

Regression sum of squares (SSR)

Error sum of squares (SSE)

SSESSRSSTO

Page 10: Analysis of variance approach to regression analysis

Breakdown of degrees of freedom

211 nn

Degrees of freedom associated with SSTO

Degrees of freedom associated with SSR

Degrees of freedom associated with SSE

Page 11: Analysis of variance approach to regression analysis

Definition of Mean Squares

SSRSSR

MSR 1

The regression mean square (MSR) is defined as:

2n

SSEMSE

where, as you already know, the error mean square (MSE) is defined as:

Page 12: Analysis of variance approach to regression analysis

The Analysis of Variance (ANOVA) Table

Page 13: Analysis of variance approach to regression analysis

Expected Mean Squares

n

ii XXMSRE

1

221

2)(

2)( MSEE

If there is no linear association (β1= 0), we’d expect the ratio MSR/MSE to be 1. If there is linear association (β1≠0), we’d expect the ratio MSR/MSE to be greater than 1.

So, use the ratio MSR/MSE to draw conclusion about whether or not β1= 0.

Page 14: Analysis of variance approach to regression analysis

The F-test

0:

0:

1

10

AH

HHypotheses: Test statistic:

MSE

MSRF *

P-value = What is the probability that we’d get an F* statistic as large as we did, if the null hypothesis is true? (One-tailed test!)

Determine the P-value by comparing F* to an F distribution with 1 numerator degrees of freedom and n-2 denominator degrees of freedom.

Reject the null hypothesis if P-value is “small” as defined by being smaller than the level of significance.

Page 15: Analysis of variance approach to regression analysis

Analysis of Variance Table

Analysis of VarianceSource DF SS MS F PRegression 1 15.8 15.8 177.7 0.000Residual Error 20 1.8 0.09Total 21 17.6

DFE = n-2 = 22-2 = 20

DFTO = n-1 = 22-1 = 21

MSR = SSR/1 = 15.8

MSE = SSE/(n-2) = 1.8/20 = 0.09

F* = MSR/MSE = 15.796/0.089 = 177.7

P = Probability that an F(1,20) random variable is greater than 177.7 = 0.000…

Page 16: Analysis of variance approach to regression analysis

Equivalence of F-test to T-test

7.177)33.13( 2

Predictor Coef SE Coef T PConstant 76.153 4.152 18.34 0.000Year -0.0284 0.00213 -13.33 0.000

Analysis of VarianceSource DF SS MS F PRegression 1 15.796 15.796 177.7 0.000Residual Error 20 1.778 0.089Total 21 17.574

*)2,1(

2*)2( nn Ft

Page 17: Analysis of variance approach to regression analysis

Equivalence of F-test to t-test

• For a given significance level, the F-test of β1=0 versus β1≠0 is algebraically equivalent to the two-tailed t-test.

• Will get same P-values.

• If one test leads to rejecting H0, then so will the other. And, if one test leads to not rejecting H0, then so will the other.

Page 18: Analysis of variance approach to regression analysis

F test versus T test?

• F-test is only appropriate for testing that the slope differs from 0 (β1≠0). Use the t-test if you want to test that the slope is positive (β1>0) or negative (β1<0) .

• F-test will be more useful to us later when we want to test that more than one slope parameter is 0.

Page 19: Analysis of variance approach to regression analysis

Getting ANOVA table in Minitab

• Default output for either command– Stat >> Regression >> Regression …– Stat >> Regression >> Fitted line plot …

Page 20: Analysis of variance approach to regression analysis

Example: Oxygen consumption related to treadmill duration?

1000900800700600500

65

60

55

50

45

40

35

30

25

20

Treadmill duration (seconds)

Oxy

gen

cons

umpt

ion

(ml/k

g/m

in)

Page 21: Analysis of variance approach to regression analysis

1000 900 800 700 600 500

65

60

55

50

45

40

35

30

25

20

Treadmill duration (seconds)

Oxy

gen

cons

umpt

ion

(ml/k

g/m

in)

S = 4.12789 R-Sq = 79.6 % R-Sq(adj) = 79.1 %

vo2 = -1.10367 + 0.0643694 duration

Regression Plot

Page 22: Analysis of variance approach to regression analysis

The regression equation isvo2 = - 1.10 + 0.0644 duration

Predictor Coef SE Coef T PConstant -1.104 3.315 -0.33 0.741duration 0.064369 0.005030 12.80 0.000

S = 4.128 R-Sq = 79.6% R-Sq(adj) = 79.1%

Analysis of VarianceSource DF SS MS F PRegression 1 2790.6 2790.6 163.77 0.000Residual Error 42 715.7 17.0Total 43 3506.2