Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont.


Page 1: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences
Psychology 340

Spring 2005

Prediction cont.

Page 2: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Outline (for week)

• Simple bi-variate regression, least-squares fit line
– The general linear model

– Residual plots

– Using SPSS

• Multiple regression
– Comparing models (Δr2)

– Using SPSS

Page 3: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

From last time

• Review of last time

Y = intercept + slope(X) + error

[Scatterplot of Y against X (values 1 through 6 on each axis) with the fitted regression line]

residuals = (Y − Ŷ)

Page 4: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

From last time

[Scatterplot of Y against X (values 1 through 6 on each axis) with the fitted regression line and residuals shown]

residuals = (Y − Ŷ)

• The sum of the residuals should always equal 0.
– The least-squares regression line splits the data in half.

• Additionally, the residuals should be randomly distributed.
– There should be no pattern to the residuals.
– If there is a pattern, it may suggest that there is more than a simple linear relationship between the two variables.
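A minimal numeric sketch of the point above, using NumPy on made-up data (the lecture does this in SPSS): fit the least-squares line and check that the residuals sum to essentially zero.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])   # hypothetical Y scores

# np.polyfit with deg=1 returns the least-squares slope and intercept
slope, intercept = np.polyfit(x, y, deg=1)

y_hat = intercept + slope * x          # predicted scores (Y-hat)
residuals = y - y_hat                  # Y - Y-hat

print(f"Y-hat = {intercept:.2f} + {slope:.2f} * X")
print("sum of residuals:", round(residuals.sum(), 10))   # ~0 up to rounding error
```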

Page 5: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Seeing patterns in the error

• Residual plots
– Useful tools to examine the relationship even further.
– These are basically scatterplots of the residuals (often transformed into z-scores) against the explanatory (X) variable (or sometimes against the response variable).
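As a rough illustration of what such a plot contains, here is a sketch in Python/Matplotlib on simulated data (variable names and values are hypothetical; the course itself builds these plots in SPSS): standardized residuals plotted against X.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(1, 6, 40)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=x.size)   # roughly linear data

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)
z_resid = (residuals - residuals.mean()) / residuals.std()   # residuals as z-scores

plt.scatter(x, z_resid)
plt.axhline(0, linestyle="--")
plt.xlabel("X (explanatory variable)")
plt.ylabel("standardized residual")
plt.title("Residual plot: look for patterns around the zero line")
plt.show()
```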

Page 6: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Seeing patterns in the error

• The scatter plot shows a nice linear relationship.

• The residual plot shows that the residuals fall randomly above and below the line. Critically, there doesn't seem to be a discernible pattern to the residuals.

[Scatter plot and residual plot]

Page 7: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Seeing patterns in the error

[Scatter plot and residual plot]

• The scatter plot also shows a nice linear relationship.

• The residual plot shows that the residuals get larger as X increases.

• This suggests that the variability around the line is not constant across values of X.

• This is referred to as a violation of homogeneity of variance.
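A small simulation (not from the slides, simulated data only) of what this violation looks like: the error spread grows with X, so the residual plot fans out even though the relationship is linear.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(1, 6, 80)
# error standard deviation increases with X, violating homogeneity of variance
y = 1.0 + 0.8 * x + rng.normal(scale=0.2 * x, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("X")
plt.ylabel("residual")
plt.title("Residuals getting larger as X increases (fan shape)")
plt.show()
```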

Page 8: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Seeing patterns in the error

• The scatter plot shows what may be a linear relationship.

• The residual plot suggests that a non-linear relationship may be more appropriate (see how a curved pattern appears in the residual plot).

[Scatter plot and residual plot]

Page 9: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Regression in SPSS

• Using SPSS
– Variables (explanatory and response) are entered into columns.
– Each row is a unit of analysis (e.g., a person).

Page 10: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Regression in SPSS

• Analyze: Regression, Linear

Page 11: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Regression in SPSS

• Enter:
– Predicted (criterion) variable into the Dependent Variable field
– Predictor variable into the Independent Variable field

Page 12: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Regression in SPSS

• The variables in the model

• r

• r2

• We’ll get back to these numbers in a few weeks

• Unstandardized coefficients
– Slope (labeled with the independent variable's name)
– Intercept (labeled Constant)
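For reference, a rough Python analogue (hypothetical data, NumPy only) of the numbers this SPSS output reports for simple regression: r, r2, the unstandardized slope, and the intercept.

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])        # hypothetical predictor scores
y = np.array([55.0, 62.0, 70.0, 74.0, 81.0, 90.0])   # hypothetical criterion scores

slope, intercept = np.polyfit(x, y, deg=1)   # unstandardized coefficients
r = np.corrcoef(x, y)[0, 1]                  # Pearson correlation

print(f"r         = {r:.3f}")
print(f"r-squared = {r**2:.3f}")
print(f"slope (B for the predictor) = {slope:.3f}")
print(f"intercept (Constant)        = {intercept:.3f}")
```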

Page 13: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Regression in SPSS

• Standardized coefficient (labeled with the independent variable's name)

• Recall that in bi-variate regression, r equals the standardized coefficient.
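A quick check of that fact on made-up data (not from the slides): z-score both variables and the least-squares slope of z_Y on z_X equals Pearson's r.

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([55.0, 62.0, 70.0, 74.0, 81.0, 90.0])

zx = (x - x.mean()) / x.std()   # z-scores of the predictor
zy = (y - y.mean()) / y.std()   # z-scores of the criterion

beta_standardized, _ = np.polyfit(zx, zy, deg=1)   # slope of the standardized regression
r = np.corrcoef(x, y)[0, 1]

print(round(beta_standardized, 6), round(r, 6))    # the two values match
```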

Page 14: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

• Typically researchers are interested in predicting with more than one explanatory variable

• In multiple regression, an additional predictor variable (or set of variables) is used to predict the residuals left over from the first predictor.

Page 15: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

• Bi-variate regression prediction models

Y = intercept + slope(X) + error

Page 16: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

• Bi-variate regression prediction models

Y = intercept + slope(X) + error
μY = β0 + β1X + ε

• Multiple regression prediction models

μY = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

(the β terms are the "fit"; ε is the "residual")

Page 17: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

• Multiple regression prediction models

μY = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

– β1X1: first explanatory variable
– β2X2: second explanatory variable
– β3X3: third explanatory variable
– β4X4: fourth explanatory variable
– ε: whatever variability is left over

Page 18: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

μY = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

• Predict test performance based on:
– Study time
– Test time
– What you eat for breakfast
– Hours of sleep

Page 19: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

• Predict test performance based on: study time, test time, what you eat for breakfast, hours of sleep

• Typically your analysis consists of testing multiple regression models to see which "fits" best (comparing the r2s of the models)

• For example:

μY = β0 + β1X1 + β2X2 + ε
versus
μY = β0 + β1X1 + β2X2 + β4X4 + ε
versus
μY = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
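A sketch of this model-comparison idea using NumPy least squares on simulated data (the variable names study_time, test_time, and sleep are hypothetical stand-ins for the example): fit nested models and compare their R2 values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
study_time = rng.normal(size=n)
test_time = rng.normal(size=n)
sleep = 0.5 * study_time + rng.normal(size=n)   # overlaps with study time
score = 0.6 * study_time + 0.1 * test_time + 0.4 * sleep + rng.normal(size=n)

def r_squared(y, *predictors):
    """R-squared from a least-squares fit of y on an intercept plus the predictors."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return 1 - resid.var() / y.var()

r2_1 = r_squared(score, study_time)
r2_2 = r_squared(score, study_time, test_time)
r2_3 = r_squared(score, study_time, test_time, sleep)

print(f"Model 1 R^2 = {r2_1:.3f}")
print(f"Model 2 R^2 = {r2_2:.3f}  (change = {r2_2 - r2_1:.3f})")
print(f"Model 3 R^2 = {r2_3:.3f}  (change = {r2_3 - r2_2:.3f})")
```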

Page 20: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

Model #1: Some co-variance between the two variables

μY = β0 + β1X1 + ε

– Response variable: total variability in test performance
– Total study time: r = .6
– R2 for the model = .36; 64% of the variance unexplained

• If we know the total study time, we can predict 36% of the variance in test performance

Page 21: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

Model #2: Add test time to the model

μY = β0 + β1X1 + β2X2 + ε

– Response variable: total variability in test performance
– Total study time: r = .6
– Test time: r = .1
– R2 for the model = .49; 51% of the variance unexplained

• There is little co-variance between test performance and test time
• We can explain more of the variance in test performance

Page 22: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

Model #3: No co-variance between test performance and breakfast food

μY = β0 + β1X1 + β2X2 + β3X3 + ε

– Response variable: total variability in test performance
– Total study time: r = .6
– Test time: r = .1
– Breakfast: r = .0
– R2 for the model = .49; 51% of the variance unexplained

• Not related, so we can NOT explain more of the variance in test performance

Page 23: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

Model #4: Some co-variance between test performance and hours of sleep

μY = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

– Response variable: total variability in test performance
– Total study time: r = .6
– Test time: r = .1
– Breakfast: r = .0
– Hrs of sleep: r = .45
– R2 for the model = .60; 40% of the variance unexplained

• We can explain more of the variance
• But notice what happens with the overlap (covariation between the explanatory variables): you can't just add the r's or r2's
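A small simulation (made-up data, not the course example) of why the r2's don't simply add: when two predictors overlap, the R2 of the full model is smaller than the sum of their separate r2 values.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
study_time = rng.normal(size=n)
sleep = 0.7 * study_time + 0.7 * rng.normal(size=n)   # overlaps with study time
score = 0.6 * study_time + 0.4 * sleep + rng.normal(size=n)

def r2_single(x, y):
    """Squared Pearson correlation for one predictor."""
    return np.corrcoef(x, y)[0, 1] ** 2

def r2_multiple(y, *predictors):
    """R-squared of a least-squares fit with an intercept and several predictors."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ coefs).var() / y.var()

sum_of_r2 = r2_single(study_time, score) + r2_single(sleep, score)
full_r2 = r2_multiple(score, study_time, sleep)

print(f"sum of separate r^2 values:     {sum_of_r2:.3f}")
print(f"R^2 of the two-predictor model: {full_r2:.3f}  (smaller, because of the overlap)")
```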

Page 24: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression in SPSS

• Setup as before: variables (explanatory and response) are entered into columns

• A couple of different ways to use SPSS to compare different models

Page 25: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Regression in SPSS

• Analyze: Regression, Linear

Page 26: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression in SPSS

• Method 1: enter all of the explanatory variables together
– Enter:
• All of the predictor variables into the Independent Variable field
• Predicted (criterion) variable into the Dependent Variable field

Page 27: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences


Multiple Regression in SPSS

• The variables in the model

• r for the entire model

• r2 for the entire model

• Unstandardized coefficients

• Coefficient for var1 (var name)

• Coefficient for var2 (var name)

Page 28: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences


Multiple Regression in SPSS

• The variables in the model

• r for the entire model

• r2 for the entire model

• Standardized coefficients

• Coefficient for var1 (var name)

• Coefficient for var2 (var name)

Page 29: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression

– Which to use, standardized or unstandardized?

– Unstandardized coefficients are easier to use if you want to predict a raw score based on raw scores (no z-scores needed).

– Standardized coefficients are nice for directly comparing which variable is most "important" in the equation.
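A sketch of that distinction on hypothetical data (NumPy only): standardized coefficients are what you get if every variable is converted to z-scores first, which puts predictors measured in very different units on a common scale.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(loc=10, scale=2, size=n)      # e.g., hours of study (small scale)
x2 = rng.normal(loc=500, scale=100, size=n)   # e.g., an SAT score (large scale)
y = 2.0 * x1 + 0.05 * x2 + rng.normal(size=n)

def fit(y, *predictors):
    """Least-squares coefficients: [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

z = lambda v: (v - v.mean()) / v.std()   # convert a variable to z-scores

b = fit(y, x1, x2)                 # unstandardized: use with raw scores
beta = fit(z(y), z(x1), z(x2))     # standardized: compare relative importance

print("unstandardized b:  ", np.round(b[1:], 3))
print("standardized beta: ", np.round(beta[1:], 3))
```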

Page 30: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression in SPSS

• Method 2: enter the first model, then add another variable for the second model, etc.
– Enter:
• Predicted (criterion) variable into the Dependent Variable field
• First predictor variable into the Independent Variable field
• Click the Next button

Page 31: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression in SPSS

• Method 2 cont: – Enter:

• Second Predictor variable into the Independent Variable field

• Click Statistics

Page 32: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Multiple Regression in SPSS

– Click the ‘R squared change’ box

Page 33: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences


Multiple Regression in SPSS

• Shows the results of two models
• The variables in the first model (math SAT)
• The variables in the second model (math and verbal SAT)

Page 34: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences


Multiple Regression in SPSS

• Shows the results of two models: the first model (math SAT) and the second model (math and verbal SAT)

• Model 1
– r2 for the first model
– Coefficients for var1 (var name)

Page 35: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences


Multiple Regression in SPSS

• Shows the results of two models: the first model (math SAT) and the second model (math and verbal SAT)

• Model 2
– r2 for the second model
– Coefficients for var1 (var name)
– Coefficients for var2 (var name)

Page 36: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences


Multiple Regression in SPSS

• Shows the results of two models: the first model (math SAT) and the second model (math and verbal SAT)

• Change statistics: is the change in r2 from Model 1 to Model 2 statistically significant?
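For intuition, here is the "R squared change" F test computed by hand on simulated data (the math/verbal SAT and GPA variables below are hypothetical; SciPy is used only for the F distribution). This mirrors the test SPSS reports, under the usual formula for comparing nested models.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 120
math_sat = rng.normal(size=n)
verbal_sat = 0.4 * math_sat + rng.normal(size=n)
gpa = 0.5 * math_sat + 0.3 * verbal_sat + rng.normal(size=n)

def r_squared(y, *predictors):
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ coefs).var() / y.var()

r2_1 = r_squared(gpa, math_sat)                 # Model 1: math SAT only
r2_2 = r_squared(gpa, math_sat, verbal_sat)     # Model 2: math and verbal SAT
k1, k2 = 1, 2                                   # number of predictors in each model

# F test for the change in R^2, with df = (k2 - k1) and (n - k2 - 1)
F = ((r2_2 - r2_1) / (k2 - k1)) / ((1 - r2_2) / (n - k2 - 1))
p = stats.f.sf(F, k2 - k1, n - k2 - 1)

print(f"R^2 change = {r2_2 - r2_1:.3f}, F = {F:.2f}, p = {p:.4f}")
```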

Page 37: Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont

Statistics for the Social Sciences

Cautions in Multiple Regression

• We can use as many predictors as we wish, but we should be careful not to use more predictors than is warranted.
– Simpler models are more likely to generalize to other samples.
– If you use as many predictors as you have participants in your study, you can predict 100% of the variance. Although this may seem like a good thing, it is unlikely that your results would generalize to any other sample, and thus they are not valid.
– You probably should have at least 10 participants per predictor variable (and probably should aim for about 30).
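A tiny, purely simulated demonstration of that warning: with as many predictors as there are degrees of freedom, R2 reaches 1.0 even when the predictors are pure noise, so the "perfect" fit would not generalize to a new sample.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10
y = rng.normal(size=n)                          # outcome: random noise
noise_predictors = rng.normal(size=(n, n - 1))  # n - 1 predictors unrelated to y

X = np.column_stack([np.ones(n), noise_predictors])   # intercept + noise predictors
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1 - (y - X @ coefs).var() / y.var()

print(f"R^2 with {n - 1} noise predictors and n = {n}: {r2:.3f}")   # ~1.0
```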