Linear Regression

Upload: vivien-montgomery

Post on 16-Dec-2015


PSYC 6130, PROF. J. ELDER 2

Correlation vs Regression: What’s the Difference?

• Correlation measures how strongly related 2 variables are.

• Regression provides a means for predicting the value of one variable based on the value of a related variable.

• The underlying mathematics are the same.

• Here we are dealing only with linear correlation and linear regression.

Optimal Prediction using z Scores

• Consider 2 variables X and Y that may be related in some way.

– e.g.,

• X = midterm score, Y = final exam score

• X = reaction time, Y = error rate

• Suppose you know X for a particular case (e.g., individual, trial). What is your best guess at Y?

• The answer turns out to be pretty simple:

$$\hat{z}_Y = r\, z_X$$
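As a minimal sketch of this rule, the Python snippet below scales the z score of X by r to get the predicted z score of Y. The function name and the input values are illustrative assumptions, not part of the lecture.

```python
def predict_z(z_x: float, r: float) -> float:
    """Best linear guess at the z score of Y, given the z score of X."""
    return r * z_x

# Hypothetical case: an individual 2 standard deviations above the mean on X.
for r in (0.0, 0.5, 1.0):
    print(f"r = {r:.1f}: predicted z_Y = {predict_z(2.0, r):.1f}")
```

Whenever |r| < 1, the predicted z score of Y lies closer to the mean than the z score of X does.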

Example: 6130A 2005-06 Assignment Marks

                   Assignment 1 (X)   Assignment 2 (Y)
                   86.7%              81.8%
                   81.5%              82.4%
                   85.0%              84.3%
                   85.5%              86.8%
                   90.2%              83.6%
                   95.4%              87.4%
                   91.9%              93.1%
                   93.1%              93.1%
                   94.8%              91.8%
                   93.6%              93.7%
                   94.8%              93.1%
                   94.2%              94.3%
                   94.8%              95.6%
Mean               90.9%              89.3%
Sample Std. Dev.   4.66%              5.04%
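The summary statistics in the table above can be checked with a short Python sketch. The helper names are my own, and because the marks are rounded to one decimal place, the computed r may differ slightly from the value quoted in the slides.

```python
import math

# Assignment marks (percent) transcribed from the table above.
x = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(xs, ys):
    """Pearson r: sum of products of deviations over (N - 1) * s_X * s_Y."""
    mx, my = mean(xs), mean(ys)
    cross = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cross / ((len(xs) - 1) * sample_sd(xs) * sample_sd(ys))

print(f"mean X = {mean(x):.1f}, mean Y = {mean(y):.1f}")
print(f"s_X = {sample_sd(x):.2f}, s_Y = {sample_sd(y):.2f}")
print(f"r = {pearson_r(x, y):.3f}")
```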

Graphical Representation

[Figure: scatter plot of Assignment 2 z-score against Assignment 1 z-score for PSYC 6130A 2005-06, both axes running from -3 to 3, with the regression line $\hat{z}_Y = 0.7998\, z_X$.]

The Raw-Score Regression Formula

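In terms of population parameters:

$$\hat{Y} = a_{YX} + b_{YX} X \quad\text{or}\quad \hat{Y} = \mu_Y + r\,\frac{\sigma_Y}{\sigma_X}\,(X - \mu_X)$$

where

$$b_{YX} = r\,\frac{\sigma_Y}{\sigma_X}, \qquad a_{YX} = \mu_Y - b_{YX}\,\mu_X$$

In terms of sample statistics:

$$\hat{Y} = a_{YX} + b_{YX} X \quad\text{or}\quad \hat{Y} = \bar{Y} + r\,\frac{s_Y}{s_X}\,(X - \bar{X})$$

where

$$b_{YX} = r\,\frac{s_Y}{s_X}, \qquad a_{YX} = \bar{Y} - b_{YX}\,\bar{X}$$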
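The sample-statistics form of the raw-score formula can be sketched in Python as follows. The function names are assumptions of mine, and the inputs are the rounded summary statistics from the lecture's assignment-marks example, so the resulting slope and intercept should be close to, but need not exactly match, values computed from unrounded data.

```python
def regression_coefficients(mean_x, mean_y, s_x, s_y, r):
    """Intercept a_YX and slope b_YX for predicting Y from X."""
    b_yx = r * s_y / s_x
    a_yx = mean_y - b_yx * mean_x
    return a_yx, b_yx

def predict(x, a_yx, b_yx):
    """Predicted Y for a raw score X."""
    return a_yx + b_yx * x

# Rounded summary statistics from the assignment-marks example (percent).
a, b = regression_coefficients(mean_x=90.9, mean_y=89.3, s_x=4.66, s_y=5.04, r=0.7998)
print(f"b_YX = {b:.3f}, a_YX = {a:.1f}")
print(f"predicted Y at X = 95: {predict(95.0, a, b):.1f}")
```

By construction the line passes through the point of means: predicting at X equal to the mean of X returns the mean of Y.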

Example: 6130A 2005-06 Assignment Marks

Summary statistics for the 13 pairs of marks listed in the earlier example (X = Assignment 1, Y = Assignment 2):

                   X        Y
Mean               90.9%    89.3%
Sample Std. Dev.   4.66%    5.04%

Graphical Representation

[Figure: scatter plot of Assignment 2 grade (75% to 100%) against Assignment 1 grade (80% to 100%) for PSYC 6130 Section A 2005-2006, with the regression line y = 0.867x + 10.5%, i.e., $b_{YX} = 0.867$ and $a_{YX} = 10.5\%$.]

Residuals

• The deviations of the actual Y values from the Y values predicted by the regression line are called residuals:

$$\text{residual} = Y - \hat{Y}$$

• The regression line minimizes the sum of squared residuals (and hence is called a least-squares fit; equivalently, it minimizes the mean squared error).

[Figure: scatter plot of Assignment 2 grade against Assignment 1 grade for PSYC 6130 Section A 2005-2006, with the regression line and one residual, $Y - \hat{Y}$, marked.]
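To illustrate the minimizing property, the sketch below fits the line to the lecture's assignment marks, computes the residuals, and checks that nudging the intercept or slope away from the fitted values increases the sum of squared residuals. The variable names and the perturbation sizes are arbitrary choices of mine.

```python
# Assignment marks (percent): X = Assignment 1, Y = Assignment 2.
x = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept (algebraically equal to b = r * s_Y / s_X).
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

def sum_squared_residuals(a, b):
    """Sum of (Y - Y_hat)^2 for the candidate line Y_hat = a + b * X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
best = sum_squared_residuals(intercept, slope)
print(f"sum of residuals = {sum(residuals):.6f}")
print(f"sum of squared residuals at the fitted line = {best:.2f}")

# Any other line does worse; four arbitrary perturbations as a spot check.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    worse = sum_squared_residuals(intercept + da, slope + db) > best
    print(f"intercept {da:+.2f}, slope {db:+.2f}: larger SS? {worse}")
```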

Variance of the Estimate

• Total prediction error is expressed as the variance of the estimate (or mean-squared error):

In terms of population parameters:

$$\sigma_{\mathrm{est}\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N}$$

Note that $\sigma_{\mathrm{est}\,Y}^2 \le \sigma_Y^2$. Equality applies only when $r = 0$.

In terms of sample statistics:

$$s_{\mathrm{est}\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N - 2}$$

$\sigma_{\mathrm{est}\,Y} = \sqrt{\sigma_{\mathrm{est}\,Y}^2}$ is called the standard error of the estimate.
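The two versions of the variance of the estimate differ only in the denominator. The sketch below computes both for the lecture's assignment marks; treating the 13 marks as a whole population in the N-denominator version is purely for illustration, and the variable names are my own.

```python
import math

# Assignment marks (percent): X = Assignment 1, Y = Assignment 2.
x = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x
r = sp_xy / math.sqrt(ss_x * ss_y)

ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

var_est_population = ss_residual / n        # sigma^2_est: divide by N
var_est_sample = ss_residual / (n - 2)      # s^2_est: divide by N - 2
s_est = math.sqrt(var_est_sample)           # sample standard error of the estimate
var_y_population = ss_y / n                 # sigma^2_Y

print(f"sigma^2_est = {var_est_population:.2f} <= sigma^2_Y = {var_y_population:.2f}")
print(f"s^2_est = {var_est_sample:.2f}, s_est = {s_est:.2f}")
```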

Explained and Unexplained Variance

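$$\text{Explained variance: } \sigma_{\mathrm{exp}}^2 = \frac{1}{N} \sum (\hat{Y} - \mu_Y)^2$$

$$\text{Unexplained variance: } \sigma_{\mathrm{est}\,Y}^2 = \frac{1}{N} \sum (Y - \hat{Y})^2$$

[Figure: scatter plot of Assignment 2 grade against Assignment 1 grade for PSYC 6130 Section A 2005-2006, with the regression line; for one data point the deviation of Y from its mean is split into an explained part and an unexplained part.]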

Summary of Variances

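Population:

$$\text{Total variance: } \sigma_Y^2 = \frac{\sum (Y - \mu_Y)^2}{N}$$

$$\text{Explained variance: } \sigma_{\mathrm{exp}}^2 = \frac{\sum (\hat{Y} - \mu_Y)^2}{N}$$

$$\text{Unexplained variance: } \sigma_{\mathrm{est}\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N}$$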

Summary of Variances

• It can be shown that, for the population:

$$\sigma_Y^2 = \sigma_{\mathrm{exp}}^2 + \sigma_{\mathrm{est}\,Y}^2$$

• i.e., the total variance is equal to the sum of the explained and unexplained variances.
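A sketch of why this decomposition holds, assuming $\hat{Y} = a_{YX} + b_{YX} X$ is the least-squares prediction with $b_{YX} = r\,\sigma_Y/\sigma_X$: split each deviation from the mean into an unexplained and an explained part, then square and sum. The covariance symbol $\sigma_{XY}$ is notation I introduce here; it is not used elsewhere in the lecture.

```latex
\begin{align*}
\sum (Y - \mu_Y)^2
  &= \sum \bigl[(Y - \hat{Y}) + (\hat{Y} - \mu_Y)\bigr]^2 \\
  &= \sum (Y - \hat{Y})^2 + \sum (\hat{Y} - \mu_Y)^2
     + 2 \sum (Y - \hat{Y})(\hat{Y} - \mu_Y). \\
\intertext{Since $\hat{Y} - \mu_Y = b_{YX}(X - \mu_X)$, the cross term is proportional to}
\sum (Y - \hat{Y})(X - \mu_X)
  &= \sum \bigl[(Y - \mu_Y) - b_{YX}(X - \mu_X)\bigr](X - \mu_X) \\
  &= N\bigl(\sigma_{XY} - b_{YX}\,\sigma_X^2\bigr) = 0,
\end{align*}
```

because $b_{YX} = r\,\sigma_Y/\sigma_X = \sigma_{XY}/\sigma_X^2$. Dividing the remaining terms by $N$ gives $\sigma_Y^2 = \sigma_{\mathrm{est}\,Y}^2 + \sigma_{\mathrm{exp}}^2$.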

Summary of Variances

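Sample:

$$\text{Total variance: } s_Y^2 = \frac{\sum (Y - \bar{Y})^2}{N - 1}$$

$$\text{Unexplained variance: } s_{\mathrm{est}\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N - 2}$$

$$\text{Explained variance: } s_{\mathrm{exp}}^2 = s_Y^2 - s_{\mathrm{est}\,Y}^2$$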

Coefficient of Determination

• The fraction of the total variance explained by the regression line is called the coefficient of determination.

• It can be shown that this is just the square of the Pearson coefficient r:

• Population:

$$\text{Coefficient of determination} = r^2 = \frac{\sum (\hat{Y} - \mu_Y)^2}{\sum (Y - \mu_Y)^2} = 1 - \frac{\sigma_{\mathrm{est}\,Y}^2}{\sigma_Y^2}$$

• Sample:

$$\text{Coefficient of determination} = r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{N - 2}{N - 1}\,\frac{s_{\mathrm{est}\,Y}^2}{s_Y^2}$$
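A numerical check of this relationship on the lecture's assignment marks, as a minimal sketch with variable names of my own choosing: the explained fraction of the total sum of squares should equal the square of Pearson's r computed directly from the data.

```python
import math

# Assignment marks (percent): X = Assignment 1, Y = Assignment 2.
x = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x
y_hat = [intercept + slope * xi for xi in x]

ss_total = sum((yi - mean_y) ** 2 for yi in y)                    # total SS
ss_explained = sum((yh - mean_y) ** 2 for yh in y_hat)            # explained SS
ss_unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained SS

r = sp_xy / math.sqrt(ss_x * ss_total)

print(f"r^2                      = {r ** 2:.4f}")
print(f"explained / total        = {ss_explained / ss_total:.4f}")
print(f"1 - unexplained / total  = {1 - ss_unexplained / ss_total:.4f}")
```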

Coefficient of Nondetermination

• The fraction of the total variance that remains unexplained by the regression line is called the coefficient of nondetermination.

• It can be shown that this is just $1 - r^2$:

• Population:

$$\text{Coefficient of nondetermination} = 1 - r^2 = \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \mu_Y)^2} = \frac{\sigma_{\mathrm{est}\,Y}^2}{\sigma_Y^2}$$

• Sample:

$$\text{Coefficient of nondetermination} = 1 - r^2 = \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{N - 2}{N - 1}\,\frac{s_{\mathrm{est}\,Y}^2}{s_Y^2}$$

Summary of Coefficients

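Population:

$$\text{Coefficient of determination: } r^2 = \frac{\sum (\hat{Y} - \mu_Y)^2}{\sum (Y - \mu_Y)^2} = 1 - \frac{\sigma_{\mathrm{est}\,Y}^2}{\sigma_Y^2}$$

$$\text{Coefficient of nondetermination: } 1 - r^2 = \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \mu_Y)^2} = \frac{\sigma_{\mathrm{est}\,Y}^2}{\sigma_Y^2}$$

Sample:

$$\text{Coefficient of determination: } r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{N - 2}{N - 1}\,\frac{s_{\mathrm{est}\,Y}^2}{s_Y^2}$$

$$\text{Coefficient of nondetermination: } 1 - r^2 = \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{N - 2}{N - 1}\,\frac{s_{\mathrm{est}\,Y}^2}{s_Y^2}$$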

Components of Variance: SPSS Output

ANOVA (b)

Model 1       Sum of Squares   df      Mean Square   F          Sig.
Regression    861347.2         1       861347.186    7465.139   .000 (a)
Residual      1325861          11491   115.383
Total         2187209          11492

a. Predictors: (Constant), How tall are you without your shoes on (in cm.)
b. Dependent Variable: How much do you weigh (in kilograms)

Explained SS (Regression): $\sum (\hat{Y} - \bar{Y})^2$

Unexplained SS (Residual): $\sum (Y - \hat{Y})^2$

Total SS: $\sum (Y - \bar{Y})^2$

$$\text{Unexplained variance (Residual Mean Square): } s_{\mathrm{est}\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N - 2}$$
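The quantities defined in the earlier slides can be recovered from the SPSS table with a few lines of arithmetic. This is a sketch using only the numbers printed in the table; the variable names are mine.

```python
import math

# Sums of squares and degrees of freedom copied from the SPSS ANOVA table.
ss_regression, df_regression = 861347.2, 1
ss_residual, df_residual = 1325861.0, 11491
ss_total = 2187209.0

r_squared = ss_regression / ss_total            # coefficient of determination
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual         # unexplained variance, s^2_est
f_ratio = ms_regression / ms_residual
s_est = math.sqrt(ms_residual)                  # standard error of the estimate (kg)
n = df_residual + 2                             # residual df = N - 2

print(f"N = {n}")
print(f"r^2 = {r_squared:.4f}, |r| = {math.sqrt(r_squared):.4f}")
print(f"Mean Square (Residual) = {ms_residual:.3f}")
print(f"F = {f_ratio:.1f}")
print(f"s_est = {s_est:.2f} kg")
```

In this data set height accounts for roughly 39% of the variance in weight; the large F arises mainly from the large sample size.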

Estimating the Variance of the Estimate

• Uncertainty in predictions can be estimated using the assumption of homoscedasticity.

– (Etymology: hom- + Greek skedastikos able to disperse, from skedannynai to disperse)

– Thought question: does this also explain the origin of the verb skedaddle?

– In other words, homogeneity of variance in Y over the range of X.

Confidence Intervals for Predictions

$$Y = \hat{Y} \pm t_{\mathrm{crit}}\; s_{\mathrm{est}\,Y} \sqrt{1 + \frac{1}{N} + \frac{(X - \bar{X})^2}{(N - 1)\, s_X^2}}$$
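A minimal sketch of this interval in Python, applied to the rounded summary statistics of the lecture's assignment-marks example. Several things here are assumptions on my part: the function name, the choice of a 95% interval, the use of N - 2 = 11 degrees of freedom with the tabled two-tailed critical value 2.201, and computing s_est from r and s_Y via the coefficient of nondetermination rather than from the raw residuals.

```python
import math

def prediction_interval(x_new, y_hat, t_crit, s_est, n, mean_x, s_x):
    """Y_hat +/- t_crit * s_est * sqrt(1 + 1/N + (X - mean_X)^2 / ((N - 1) * s_X^2))."""
    half_width = t_crit * s_est * math.sqrt(
        1.0 + 1.0 / n + (x_new - mean_x) ** 2 / ((n - 1) * s_x ** 2)
    )
    return y_hat - half_width, y_hat + half_width

# Rounded summary statistics from the assignment-marks example (percent).
n, mean_x, mean_y, s_x, s_y, r = 13, 90.9, 89.3, 4.66, 5.04, 0.7998
b = r * s_y / s_x
a = mean_y - b * mean_x

# s_est from 1 - r^2 = ((N - 2) / (N - 1)) * s_est^2 / s_Y^2.
s_est = s_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))
t_crit = 2.201  # assumed: two-tailed .05 critical t for 11 df, from a t table

at_mean = prediction_interval(mean_x, a + b * mean_x, t_crit, s_est, n, mean_x, s_x)
far = prediction_interval(80.0, a + b * 80.0, t_crit, s_est, n, mean_x, s_x)
print(f"X = {mean_x}: {at_mean[0]:.1f}% to {at_mean[1]:.1f}%")
print(f"X = 80.0: {far[0]:.1f}% to {far[1]:.1f}%")
```

The interval is narrowest at the mean of X and widens as X moves away from it, because of the $(X - \bar{X})^2$ term.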

Example: 6130A 2005-06 Assignment Marks

Summary statistics for the 13 pairs of marks listed in the earlier example (X = Assignment 1, Y = Assignment 2):

                   X        Y
Mean               90.9%    89.3%
Sample Std. Dev.   4.66%    5.04%

$$r = 0.7998$$

Underlying Assumptions

• Independent random sampling

• Linearity

• Normal Distribution

• Homoscedasticity

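Regressing X on Y

• Simply reverse the formulae, e.g., in terms of sample statistics:

$$\hat{X} = a_{XY} + b_{XY} Y \quad\text{or}\quad \hat{X} = \bar{X} + r\,\frac{s_X}{s_Y}\,(Y - \bar{Y})$$

where

$$b_{XY} = r\,\frac{s_X}{s_Y}, \qquad a_{XY} = \bar{X} - b_{XY}\,\bar{Y}$$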
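One consequence worth seeing numerically: the line for predicting X from Y is not the algebraic inverse of the line for predicting Y from X unless |r| = 1. The sketch below uses the rounded summary statistics from the lecture's assignment-marks example; the variable names are mine.

```python
# Rounded summary statistics from the assignment-marks example (percent).
mean_x, mean_y, s_x, s_y, r = 90.9, 89.3, 4.66, 5.04, 0.7998

# Regressing Y on X.
b_yx = r * s_y / s_x
a_yx = mean_y - b_yx * mean_x

# Regressing X on Y: the same formulae with the roles of X and Y swapped.
b_xy = r * s_x / s_y
a_xy = mean_x - b_xy * mean_y

print(f"Y on X: Y_hat = {a_yx:.1f} + {b_yx:.3f} X")
print(f"X on Y: X_hat = {a_xy:.1f} + {b_xy:.3f} Y")
print(f"1 / b_YX = {1 / b_yx:.3f} (differs from b_XY = {b_xy:.3f})")
print(f"b_YX * b_XY = {b_yx * b_xy:.4f}, r^2 = {r ** 2:.4f}")
```

The product of the two slopes equals $r^2$, so the two lines coincide only when the correlation is perfect.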

When to Use Linear Regression

• Prediction

• Statistical Control

– Adjust for effects of confounding variable.

– Also known as partialing out the effect of the confounding variable.

• Experimental Psychology: modeling effect of continuous independent variable on continuous dependent variable.

– e.g., reaction time vs set size in visual search.
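The statistical-control use listed above can be sketched with regression residuals: regress each variable of interest on the confounding variable, keep the residuals, and correlate them. This is one common way to partial out a variable and is not necessarily the procedure used in the lecture's example. The data below are hypothetical, invented purely for illustration, and the function names are my own.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den

def residualize(target, control):
    """Residuals of `target` after a simple linear regression on `control`."""
    mt, mc = mean(target), mean(control)
    slope = (sum((c - mc) * (t - mt) for c, t in zip(control, target))
             / sum((c - mc) ** 2 for c in control))
    return [t - (mt + slope * (c - mc)) for c, t in zip(control, target)]

# Hypothetical toy data (not from the lecture).
confound = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0]

raw_r = pearson_r(x, y)
partial_r = pearson_r(residualize(x, confound), residualize(y, confound))
print(f"raw correlation between x and y:        {raw_r:.3f}")
print(f"correlation after partialing out confound: {partial_r:.3f}")
```

The residual-based result agrees with the standard partial-correlation formula, $(r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$, where z denotes the confounding variable.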

Statistical Control Example: Mental Health

Women report more bad mental health days than men, t(8176)=-7.1, p<.001, 2-tailed.

Statistical Control Example: Physical Health

Correlation

Pearson’s r = 0.31

After Partialing Out Physical Health

Result of Partialing Out Physical Health

Controlling for physical health, women report more bad mental health days than men, t(8176)=-5.7, p<.001, 2-tailed.