correlation and regression analysis – an application

Page 1: Correlation and Regression Analysis –  An Application

Correlation and Regression Analysis – An Application

Dr. Jerrell T. Stracener, SAE Fellow

Leadership in Engineering

EMIS 7370/5370 STAT 5340: PROBABILITY AND STATISTICS FOR SCIENTISTS AND ENGINEERS

Systems Engineering Program
Department of Engineering Management, Information and Systems

Page 2: Correlation and Regression Analysis –  An Application

Montgomery, Peck, and Vining (2001) present data on the performance of the 28 National Football League teams in 1976. It is suspected that the number of games won (y) is related to the number of yards gained rushing by opponents (x). The data are shown in the following table:

Page 3: Correlation and Regression Analysis –  An Application

Team               Games Won (y)   Yards Rushing by Opponent (x)
Washington         10              2205
Minnesota          11              2096
New England        11              1847
Oakland            13              1903
Pittsburgh         10              1457
Baltimore          11              1848
Los Angeles        10              1564
Dallas             11              1821
Atlanta            4               2577
Buffalo            2               2476
Chicago            7               1984
Cincinnati         10              1917
Cleveland          9               1761
Denver             9               1709
Detroit            6               1901
Green Bay          5               2288
Houston            5               2072
Kansas City        5               2861
Miami              6               2411
New Orleans        4               2289
New York Giants    3               2203
New York Jets      3               2592
Philadelphia       4               2053
St. Louis          10              1979
San Diego          6               2048
San Francisco      8               1786
Seattle            2               2876
Tampa Bay          0               2560
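As a minimal sketch of how the table might be entered for analysis, the Python snippet below stores the 28 (x, y) pairs in table order and computes the sums that the correlation and regression formulas rely on. The variable names are illustrative and do not come from the slides.

```python
# Opponents' rushing yards (x) and games won (y) for the 28 teams, in table order.
x = [2205, 2096, 1847, 1903, 1457, 1848, 1564, 1821, 2577, 2476,
     1984, 1917, 1761, 1709, 1901, 2288, 2072, 2861, 2411, 2289,
     2203, 2592, 2053, 1979, 2048, 1786, 2876, 2560]
y = [10, 11, 11, 13, 10, 11, 10, 11, 4, 2,
     7, 10, 9, 9, 6, 5, 5, 5, 6, 4,
     3, 3, 4, 10, 6, 8, 2, 0]

n = len(x)
sum_x = sum(x)                                  # sum of x_i
sum_y = sum(y)                                  # sum of y_i
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of x_i * y_i
sum_x2 = sum(xi * xi for xi in x)               # sum of x_i squared
sum_y2 = sum(yi * yi for yi in y)               # sum of y_i squared

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
```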

Page 4: Correlation and Regression Analysis –  An Application

Stracener_EMIS 7370/STAT 5340_Fall 08_11.18.08

Correlation Analysis

• Statistical analysis used to obtain a quantitative measure of the strength of the relationship between a dependent variable and one or more independent variables

Page 5: Correlation and Regression Analysis –  An Application

Scatter Plot

Page 6: Correlation and Regression Analysis –  An Application

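Sample correlation coefficient

$$\hat{\rho} = r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\left[\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)\left(n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right)\right]^{1/2}}$$

Notes: $-1 \le r \le 1$

$R = r^2 \times 100\%$ = coefficient of determination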

Page 7: Correlation and Regression Analysis –  An Application

$$r = \frac{28(386{,}127) - (59{,}084)(195)}{\left[\left(28(128{,}284{,}292) - 59{,}084^2\right)\left(28(1{,}685) - 195^2\right)\right]^{1/2}} = -0.738$$

$R = r^2 \times 100\% = 54.47\%$ (that is, $r^2 = 0.5447$)
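The calculation of r can be checked with a few lines of Python. This is only a sketch that plugs the sums from the worked calculation into the sample correlation formula; the variable names are illustrative.

```python
import math

# Summary sums for the 28 teams (x = opponents' rushing yards, y = games won).
n = 28
sum_x, sum_y = 59084, 195
sum_xy, sum_x2, sum_y2 = 386127, 128284292, 1685

# Numerator and denominator of the sample correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))

r = numerator / denominator
r_squared = r**2   # coefficient of determination as a fraction

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

The negative sign of r reflects that teams whose opponents gained more rushing yards tended to win fewer games.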

Page 8: Correlation and Regression Analysis –  An Application

Correlation

To test for no linear association between x and y, calculate

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

where r is the sample correlation coefficient and n is the sample size.

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.738\sqrt{28-2}}{\sqrt{1-(-0.738)^2}} = -5.5766$$

Page 9: Correlation and Regression Analysis –  An Application

Correlation

Conclude no linear association if

$$-t_{\alpha/2,\,n-2} < t < t_{\alpha/2,\,n-2}$$

and then treat $y_1, y_2, \ldots, y_n$ as a random sample.

Page 10: Correlation and Regression Analysis –  An Application

Correlation

Taking α = 0.05, the t-table gives

$$t_{\alpha/2,\,n-2} = t_{0.025,\,26} = 2.0555$$

Since t = -5.5766 < -2.0555, we conclude that there is a linear association between x and y and proceed with regression analysis.
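The test can be sketched in Python as follows. The critical value 2.0555 is taken from the t-table as on the slide rather than computed, and the variable names are illustrative.

```python
import math

n = 28
r = -0.738        # sample correlation coefficient from the slides
t_crit = 2.0555   # t_{0.025, 26}, read from a t-table

# Test statistic for H0: no linear association between x and y.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Reject H0 when t falls outside (-t_crit, t_crit).
reject_no_association = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, reject H0: {reject_no_association}")
```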

Page 11: Correlation and Regression Analysis –  An Application

Linear Regression Model

Simple linear regression model

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where Y is the response (or dependent) variable, $\beta_0$ and $\beta_1$ are the unknown parameters, $\varepsilon \sim N(0, \sigma)$, and the data are $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

Page 12: Correlation and Regression Analysis –  An Application

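Least squares estimates of $\beta_0$ and $\beta_1$

$$b_1 = \hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

$$b_0 = \hat{\beta}_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - b_1\sum_{i=1}^{n} x_i\right)$$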

Page 13: Correlation and Regression Analysis –  An Application

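Estimate of $\beta_1$

$$b_1 = \hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{28(386{,}127) - (59{,}084)(195)}{28(128{,}284{,}292) - 59{,}084^2} = -0.00703$$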

Page 14: Correlation and Regression Analysis –  An Application

Estimate of $\beta_0$

Using the unrounded slope $b_1 = -0.0070251$:

$$b_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - b_1\sum_{i=1}^{n} x_i\right) = \frac{1}{28}\left[195 - (-0.0070251)(59{,}084)\right] = 21.7883$$

Page 15: Correlation and Regression Analysis –  An Application

Least squares regression equation

Point estimate of the linear model

$$Y = \beta_0 + \beta_1 x + \varepsilon$$

is

$$\hat{Y} = 21.78825 - 0.00703x$$
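A short Python sketch of the least squares calculation, starting from the same summary sums, is given below. The helper name `predict` is an illustrative choice and not part of the slides.

```python
# Summary sums for the 28 teams (x = opponents' rushing yards, y = games won).
n = 28
sum_x, sum_y = 59084, 195
sum_xy, sum_x2 = 386127, 128284292

# Least squares slope and intercept.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
b0 = (sum_y - b1 * sum_x) / n


def predict(x_value):
    """Fitted number of games won for a given opponents' rushing yardage."""
    return b0 + b1 * x_value


print(f"b0 = {b0:.5f}, b1 = {b1:.7f}")
print(f"fitted value at x = 2000: {predict(2000):.3f}")
```

Carrying the slope at full precision matters here: because x is in the thousands, rounding $b_1$ to three significant figures shifts fitted values in the second decimal place.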

Page 16: Correlation and Regression Analysis –  An Application

Regression Fitted Line Plot

Page 17: Correlation and Regression Analysis –  An Application

Stracener_EMIS 7370/STAT 5340_Fall 08_11.17.08

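Point estimate of $\sigma^2$

$$\hat{\sigma}^2 = S^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{Y}_i\right)^2$$

$$= \frac{1}{n-2}\left[\frac{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n} - b_1\,\frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}\right]$$

$$= 5.726$$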
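The estimate of $\sigma^2$ can be reproduced from the summary sums alone. The sketch below uses the shortcut SSE = S_yy - b1 * S_xy, which is algebraically the same as summing the squared residuals; the names `s_xx`, `s_xy`, and `s_yy` are illustrative.

```python
import math

# Summary sums for the 28 teams (x = opponents' rushing yards, y = games won).
n = 28
sum_x, sum_y = 59084, 195
sum_xy, sum_x2, sum_y2 = 386127, 128284292, 1685

# Corrected sums of squares and cross products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

b1 = s_xy / s_xx            # least squares slope
sse = s_yy - b1 * s_xy      # residual (error) sum of squares
s2 = sse / (n - 2)          # point estimate of sigma^2
s = math.sqrt(s2)           # residual standard error S

print(f"SSE = {sse:.3f}, S^2 = {s2:.4f}, S = {s:.4f}")
```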

Page 18: Correlation and Regression Analysis –  An Application

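Interval Estimates for y intercept ($\beta_0$)

$(1 - \alpha)100\%$ confidence interval for $\beta_0$ is

$$\left(\beta_{0L},\ \beta_{0U}\right)$$

where

$$\beta_{0L} = b_0 - t_{\alpha/2,\,n-2}\,S_{b_0}$$

and

$$\beta_{0U} = b_0 + t_{\alpha/2,\,n-2}\,S_{b_0}$$

where

$$S_{b_0} = S\left[\frac{\sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$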

Page 19: Correlation and Regression Analysis –  An Application

Interval Estimates for y intercept ($\beta_0$)

Take α = 0.05. For the 95% confidence interval for $\beta_0$,

$$S_{b_0} = S\left[\frac{\sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2} = 2.3929\left[\frac{128{,}284{,}292}{28(128{,}284{,}292) - 59{,}084^2}\right]^{1/2} = 2.696$$

Page 20: Correlation and Regression Analysis –  An Application

Interval Estimates for y intercept ($\beta_0$)

Substituting $S_{b_0}$ into the equations gives the lower and upper bounds for $\beta_0$:

$$\beta_{0L} = b_0 - t_{\alpha/2,\,n-2}\,S_{b_0} = 21.7883 - 2.056 \times 2.696 = 16.246$$

$$\beta_{0U} = b_0 + t_{\alpha/2,\,n-2}\,S_{b_0} = 21.7883 + 2.056 \times 2.696 = 27.33$$
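The intercept interval can be sketched in Python as below. The inputs (b0 = 21.7883, S = 2.3929, t = 2.056) are the rounded values used on the slides, so the result may differ from software output in the third decimal place; the variable names are illustrative.

```python
import math

n = 28
sum_x, sum_x2 = 59084, 128284292
b0 = 21.7883     # intercept estimate from the slides
s = 2.3929       # residual standard error S
t_crit = 2.056   # t_{0.025, 26} from a t-table

# Standard error of the intercept estimate.
s_b0 = s * math.sqrt(sum_x2 / (n * sum_x2 - sum_x**2))

lower = b0 - t_crit * s_b0
upper = b0 + t_crit * s_b0

print(f"S_b0 = {s_b0:.3f}, 95% CI for beta0: ({lower:.3f}, {upper:.3f})")
```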

Page 21: Correlation and Regression Analysis –  An Application

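Interval Estimates for slope ($\beta_1$)

$(1 - \alpha)100\%$ confidence interval for $\beta_1$ is

$$\left(\beta_{1L},\ \beta_{1U}\right)$$

where

$$\beta_{1L} = b_1 - t_{\alpha/2,\,n-2}\,S_{b_1}$$

and

$$\beta_{1U} = b_1 + t_{\alpha/2,\,n-2}\,S_{b_1}$$

where

$$S_{b_1} = \frac{S}{\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]^{1/2}}$$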

Page 22: Correlation and Regression Analysis –  An Application

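Interval Estimates for slope ($\beta_1$)

$$S_{b_1} = \frac{S}{\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]^{1/2}} = \frac{2.3929}{\left[128{,}284{,}292 - \frac{59{,}084^2}{28}\right]^{1/2}} = 0.00126$$

$$\beta_{1L} = b_1 - t_{\alpha/2,\,n-2}\,S_{b_1} = -0.00703 - 2.056 \times 0.00126 = -0.00961$$

$$\beta_{1U} = b_1 + t_{\alpha/2,\,n-2}\,S_{b_1} = -0.00703 + 2.056 \times 0.00126 = -0.00444$$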
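A matching sketch for the slope interval follows, again using the rounded slide values (b1 = -0.00703, S = 2.3929, t = 2.056) as inputs; the variable names are illustrative.

```python
import math

n = 28
sum_x, sum_x2 = 59084, 128284292
b1 = -0.00703    # slope estimate from the slides
s = 2.3929       # residual standard error S
t_crit = 2.056   # t_{0.025, 26} from a t-table

# Standard error of the slope estimate.
s_b1 = s / math.sqrt(sum_x2 - sum_x**2 / n)

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1

# The interval excludes zero, consistent with the correlation test.
slope_differs_from_zero = not (lower <= 0.0 <= upper)

print(f"S_b1 = {s_b1:.5f}, 95% CI for beta1: ({lower:.5f}, {upper:.5f})")
```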

Page 23: Correlation and Regression Analysis –  An Application

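Confidence interval for conditional mean of Y, given x=2205

Given x = 2205, we can calculate the confidence interval for the conditional mean of Y. Here $\bar{x} = 59{,}084/28 = 2110.14$, so $(x - \bar{x})^2 = (2205 - 2110.14)^2 = 8997.88$, and $\hat{Y}(2205) = 6.298$.

$$\hat{\mu}_L(x) = \hat{Y}(x) - t_{\alpha/2,\,n-2}\,S\left[\frac{1}{n} + \frac{n(x - \bar{x})^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$

$$\hat{\mu}_L(2205) = 6.298 - 2.056 \times 2.3929\left[\frac{1}{28} + \frac{28(8997.88)}{28(128{,}284{,}292) - 59{,}084^2}\right]^{1/2} = 5.336$$

Page 24: Correlation and Regression Analysis –  An Application

Confidence interval for conditional mean of Y, given x=2205

and

$$\hat{\mu}_U(x) = \hat{Y}(x) + t_{\alpha/2,\,n-2}\,S\left[\frac{1}{n} + \frac{n(x - \bar{x})^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$

$$\hat{\mu}_U(2205) = 6.298 + 2.056 \times 2.3929\left[\frac{1}{28} + \frac{28(8997.88)}{28(128{,}284{,}292) - 59{,}084^2}\right]^{1/2} = 7.260$$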
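The interval formula for the conditional mean can be wrapped in a small function, sketched below. It evaluates $(x_0 - \bar{x})^2$ for the specific $x_0$ supplied, not the sum of squared deviations over all teams. The function name and the rounded inputs (S = 2.3929, t = 2.056) are illustrative assumptions.

```python
import math

n = 28
sum_x, sum_x2 = 59084, 128284292
b0, b1 = 21.78825, -0.0070251   # least squares estimates
s = 2.3929                      # residual standard error S
t_crit = 2.056                  # t_{0.025, 26} from a t-table


def mean_confidence_interval(x0):
    """95% confidence interval for the mean of Y at x = x0."""
    x_bar = sum_x / n
    y_hat = b0 + b1 * x0
    spread = 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x**2)
    half_width = t_crit * s * math.sqrt(spread)
    return y_hat - half_width, y_hat + half_width


lower, upper = mean_confidence_interval(2205)
print(f"95% CI for mean of Y at x = 2205: ({lower:.3f}, {upper:.3f})")
```

The interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from the mean of the observed x values.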

Page 25: Correlation and Regression Analysis –  An Application

Page 26: Correlation and Regression Analysis –  An Application

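Prediction interval for a single future value of Y, given x

$$\hat{Y}_L(x) = \hat{Y}(x) - t_{\alpha/2,\,n-2}\,S\left[1 + \frac{1}{n} + \frac{n(x - \bar{x})^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$

and

$$\hat{Y}_U(x) = \hat{Y}(x) + t_{\alpha/2,\,n-2}\,S\left[1 + \frac{1}{n} + \frac{n(x - \bar{x})^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$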

Page 27: Correlation and Regression Analysis –  An Application

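Prediction interval for a single future value of Y, given x=2000

Given x = 2000,

$$\hat{Y}(2000) = 21.7883 - 0.0070251 \times 2000 = 7.738$$

With $\bar{x} = 59{,}084/28 = 2110.14$, $(x - \bar{x})^2 = (2000 - 2110.14)^2 = 12{,}131.45$, so

$$\hat{Y}_L(x) = \hat{Y}(x) - t_{\alpha/2,\,n-2}\,S\left[1 + \frac{1}{n} + \frac{n(x - \bar{x})^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$

$$\hat{Y}_L(2000) = 7.738 - 2.056 \times 2.3929\left[1 + \frac{1}{28} + \frac{28(12{,}131.45)}{28(128{,}284{,}292) - 59{,}084^2}\right]^{1/2} = 2.723$$

Page 28: Correlation and Regression Analysis –  An Application

Prediction interval for a single future value of Y, given x=2000

and

$$\hat{Y}_U(x) = \hat{Y}(x) + t_{\alpha/2,\,n-2}\,S\left[1 + \frac{1}{n} + \frac{n(x - \bar{x})^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$

$$\hat{Y}_U(2000) = 7.738 + 2.056 \times 2.3929\left[1 + \frac{1}{28} + \frac{28(12{,}131.45)}{28(128{,}284{,}292) - 59{,}084^2}\right]^{1/2} = 12.753$$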
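The prediction interval differs from the interval for the conditional mean only by the extra 1 under the square root, which accounts for the variability of a single new observation. The sketch below evaluates it for a given $x_0$, with $(x_0 - \bar{x})^2$ computed for that $x_0$; the function name and the rounded inputs (S = 2.3929, t = 2.056) are illustrative assumptions.

```python
import math

n = 28
sum_x, sum_x2 = 59084, 128284292
b0, b1 = 21.78825, -0.0070251   # least squares estimates
s = 2.3929                      # residual standard error S
t_crit = 2.056                  # t_{0.025, 26} from a t-table


def prediction_interval(x0):
    """95% prediction interval for a single future Y at x = x0."""
    x_bar = sum_x / n
    y_hat = b0 + b1 * x0
    spread = 1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x**2)
    half_width = t_crit * s * math.sqrt(spread)
    return y_hat - half_width, y_hat + half_width


lower, upper = prediction_interval(2000)
print(f"95% prediction interval at x = 2000: ({lower:.3f}, {upper:.3f})")
```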

Page 29: Correlation and Regression Analysis –  An Application

Page 30: Correlation and Regression Analysis –  An Application

X Y XY X^2 Y^2 Y^ (Y-Y^)^2 (x-xbar)^2

2205 10 22050 4862025 100 6.297905 13.70551 8997.878

2096 11 23056 4393216 121 7.063641 15.49492 200.0204

1847 11 20317 3411409 121 8.812891 4.783447 69244.16

1903 13 24739 3621409 169 8.419485 20.98112 42908.16

1457 10 14570 2122849 100 11.55268 2.410815 426595.6

1848 11 20328 3415104 121 8.805866 4.814226 68718.88

1564 10 15640 2446096 100 10.80099 0.641591 298272

1821 11 20031 3316041 121 8.995543 4.017847 83603.59

2577 4 10308 6640929 16 3.684567 0.099498 217955.6

2476 2 4952 6130576 4 4.394103 5.731727 133851.4

1984 7 13888 3936256 49 7.850452 0.723268 15912.02

1917 10 19170 3674889 100 8.321134 2.818592 37304.16

1761 9 15849 3101121 81 9.417049 0.17393 121900.7

1709 9 15381 2920681 81 9.782355 0.612079 160915.6

1901 6 11406 3613801 36 8.433535 5.922094 43740.73

2288 5 11440 5234944 25 5.714821 0.51097 31633.16

2072 5 10360 4293184 25 7.232243 4.982909 1454.878

2861 5 14305 8185321 25 1.689439 10.95981 563786.4

2411 6 14466 5812921 36 4.850734 1.320812 90515.02

2289 4 9156 5239521 16 5.707796 2.916568 31989.88

2203 3 6609 4853209 9 6.311955 10.96905 8622.449

2592 3 7776 6718464 9 3.579191 0.335462 232186.3

2053 4 8212 4214809 16 7.36572 11.32807 3265.306

1979 10 19790 3916441 100 7.885577 4.470783 17198.45

2048 6 12288 4194304 36 7.400846 1.962368 3861.735

1786 8 14288 3189796 64 9.241422 1.541128 105068.6

2876 2 5752 8271376 4 1.584062 0.173004 586537.2

2560 0 0 6553600 0 3.803994 14.47037 202371.4

SUM 59084 195 386127 128284292 1685 195 148.872 3608611

x-bar 2110.1429

-709824 34.54949

101041120 9155 961785.6 -0.738027304 <-r Sb0 14.0723

2.696233

b1 -0.007025 5.725845085 <-S^2 b0l 16.2448

b0 21.788251 2.392873813 <--S b0u 27.33171

Sb1 0.00126 0.00126

Sb1l -0.00961 -0.00961

Y(2205)-> 6.2979048 Sb1u -0.00444 -0.00444

mu-l 5.3363

mu-u 7.2596

Y(2000)-> 7.7380503 y-l 2.7231

y-u 12.7530

Excel Calculation

Page 31: Correlation and Regression Analysis –  An Application

Excel Regression Analysis Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.738027
R Square            0.544684
Adjusted R Square   0.527172
Standard Error      2.392874
Observations        28

ANOVA
             df   SS         MS         F          Significance F
Regression   1    178.0923   178.0923   31.10324   7.381E-06
Residual     26   148.872    5.725845
Total        27   326.9643

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%    Lower 95.0%   Upper 95.0%
Intercept      21.78825       2.696233         8.080996   1.46E-08   16.246064   27.3304377   16.2460641    27.33044
X Variable 1   -0.00703       0.00126          -5.57703   7.38E-06   -0.009614   -0.0044359   -0.0096143    -0.00444

RESIDUAL OUTPUT

Observation   Predicted Y   Residuals
1             6.297905      3.702095
2             7.063641      3.936359
3             8.812891      2.187109
4             8.419485      4.580515
5             11.55268      -1.55268
6             8.805866      2.194134
7             10.80099      -0.80099
8             8.995543      2.004457
9             3.684567      0.315433
10            4.394103      -2.3941
11            7.850452      -0.85045
12            8.321134      1.678866
13            9.417049      -0.41705
14            9.782355      -0.78235
15            8.433535      -2.43354
16            5.714821      -0.71482
17            7.232243      -2.23224
18            1.689439      3.310561
19            4.850734      1.149266
20            5.707796      -1.7078
21            6.311955      -3.31195
22            3.579191      -0.57919
23            7.36572       -3.36572
24            7.885577      2.114423
25            7.400846      -1.40085
26            9.241422      -1.24142
27            1.584062      0.415938
28            3.803994      -3.80399
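The entries of the ANOVA table are linked by simple arithmetic, which the sketch below reproduces from the total and residual sums of squares reported in the output. The variable names are illustrative; the p-value (Significance F) is not recomputed here.

```python
n = 28
ss_total = 326.9643      # Total SS from the ANOVA table
ss_residual = 148.872    # Residual SS from the ANOVA table

df_regression = 1
df_residual = n - 2

ss_regression = ss_total - ss_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual     # equals S^2

f_stat = ms_regression / ms_residual        # F statistic
r_squared = ss_regression / ss_total        # R Square

print(f"SSR = {ss_regression:.4f}, MSE = {ms_residual:.6f}")
print(f"F = {f_stat:.3f}, R^2 = {r_squared:.6f}")
```

In simple linear regression the F statistic equals the square of the slope's t statistic, so F = 31.10 agrees with t = -5.577 in the coefficients table.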