Transcript
Page 1: Lecture slides stats1.13.l07.air

Statistics One

Lecture 7 Introduction to Regression

1

Page 2: Lecture slides stats1.13.l07.air

Three segments

•  Overview •  Calculation of regression coefficients •  Assumptions

2

Page 3: Lecture slides stats1.13.l07.air

Lecture 7 ~ Segment 1

Regression: Overview

3

Page 4: Lecture slides stats1.13.l07.air

Regression: Overview

•  Important concepts & topics – Simple regression vs. multiple regression – Regression equation – Regression model

4

Page 5: Lecture slides stats1.13.l07.air

Regression: Overview

•  Regression: a statistical analysis used to predict scores on an outcome variable, based on scores on one or multiple predictor variables – Simple regression: one predictor variable – Multiple regression: multiple predictors

5

Page 6: Lecture slides stats1.13.l07.air

Regression: Overview

•  Example: IMPACT (see Lab 2) – An online assessment tool to investigate the

effects of sports-related concussion •  http://www.impacttest.com

6

Page 7: Lecture slides stats1.13.l07.air

IMPACT example

•  IMPACT provides data on 6 variables –  Verbal memory –  Visual memory –  Visual motor speed –  Reaction time –  Impulse control –  Symptom score

7

Page 8: Lecture slides stats1.13.l07.air

IMPACT: Correlations pre-injury

8

Page 9: Lecture slides stats1.13.l07.air

IMPACT: Correlations pre-injury

9

Page 10: Lecture slides stats1.13.l07.air

IMPACT: Correlations post-injury

10

Page 11: Lecture slides stats1.13.l07.air

IMPACT: Correlations post-injury

11

Page 12: Lecture slides stats1.13.l07.air

IMPACT example

12

•  For this example, assume: – Symptom Score is the outcome variable – Simple regression example: •  Predict Symptom Score from just one variable

– Multiple regression example: •  Predict Symptom Score from two variables

Page 13: Lecture slides stats1.13.l07.air

Regression equation

•  Y = m + bX + e

– Y is a linear function of X – m = intercept – b = slope – e = error (residual)

13

Page 14: Lecture slides stats1.13.l07.air

Regression equation

•  Y = B0 + B1X1 + e

– Y is a linear function of X1 – B0 = intercept = regression constant – B1 = slope = regression coefficient – e = error (residual)

14

Page 15: Lecture slides stats1.13.l07.air

Model R and R2

•  R = multiple correlation coefficient – R = rÝY – The correlation between the predicted scores

and the observed scores •  R2

– The percentage of variance in Y explained by the model

15

Page 16: Lecture slides stats1.13.l07.air

IMPACT example

•  Y = B0 + B1X1 + e – Let Y = Symptom Score – Let X1 = Impulse Control

–  Solve for B0 and B1

– In R, function lm

16

Page 17: Lecture slides stats1.13.l07.air

IMPACT example

17

Ŷ = 20.48 + 1.43(X) r = .40 R2 = 16%

Page 18: Lecture slides stats1.13.l07.air

IMPACT example

18

Page 19: Lecture slides stats1.13.l07.air

Regression model

•  The regression model is used to model or predict future behavior – The model is just the regression equation – Later in the course we will discuss more

complex models that consist of a set of regression equations

19

Page 20: Lecture slides stats1.13.l07.air

Regression: It gets better

•  The goal is to produce better models so we can generate more accurate predictions – Add more predictor variables, and/or… – Develop better predictor variables

20

Page 21: Lecture slides stats1.13.l07.air

IMPACT example

•  Y = B0 + B1X1 + B2X2 + e – Let Y = Symptom Score – Let X1 = Impulse Control – Let X2 = Verbal Memory

–  Solve for B0 and B1 and B2

– In R, function lm

21

Page 22: Lecture slides stats1.13.l07.air

IMPACT example

22

Ŷ = 4.13 + 1.48(X1) + 0.22(X2) R2 = 22%

Page 23: Lecture slides stats1.13.l07.air

IMPACT example

23

Page 24: Lecture slides stats1.13.l07.air

Model R and R2

•  R = multiple correlation coefficient – R = rÝY – The correlation between the predicted scores

and the observed scores •  R2

– The percentage of variance in Y explained by the model

24

Page 25: Lecture slides stats1.13.l07.air

IMPACT example

25

R2 = 22% rÝY = .47

Page 26: Lecture slides stats1.13.l07.air

Segment summary

•  Important concepts & topics – Simple regression vs. multiple regression – Regression equation – Regression model

26

Page 27: Lecture slides stats1.13.l07.air

END SEGMENT

27

Page 28: Lecture slides stats1.13.l07.air

Lecture 7 ~ Segment 2

Calculation of regression coefficients

28

Page 29: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  Regression equation: – Y = B0 + B1X1 + e – Ŷ = B0 + B1X1

•  (Y – Ŷ) = e (residual)

29

Page 30: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  The values of the coefficients (e.g., B1) are estimated such that the regression model yields optimal predictions – Minimize the residuals!

30

Page 31: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  Ordinary Least Squares estimation – Minimize the sum of the squared (SS) residuals

– SS.RESIDUAL = Σ(Y – Ŷ)2

31

Page 32: Lecture slides stats1.13.l07.air

IMPACT example

32

Page 33: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  Sum of Squared deviation scores (SS) in variable Y –  SS.Y

33

SS.Y à

Page 34: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  Sum of Squared deviation scores (SS) in variable X –  SS.X

34

SS.X à

Page 35: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  Sum of Cross Products –  SP.XY

35

SS.Y à

SS.X à

SP.XY

Page 36: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  Sum of Cross Products = SS of the Model –  SP.XY = SS.MODEL

36

SS.Y à

SS.X à

SS.MODEL

Page 37: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  SS.RESIDUAL = (SS.Y – SS.MODEL)

37

SS.Y à

SS.X à

SS.MODEL

SS.RESIDUAL

Page 38: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  Formula for the unstandardized coefficient – B1 = r x (SDy/ SDx)

38

Page 39: Lecture slides stats1.13.l07.air

Estimation of coefficients

•  Formula for the standardized coefficient –  If X and Y are standardized then

•  SDy = SDx = 1 •  B = r x (SDy/ SDx) • β = r

39

Page 40: Lecture slides stats1.13.l07.air

Segment summary

•  Important concepts – Regression equation and model – Ordinary least squares estimation – Unstandardized regression coefficients – Standardized regression coefficients

40

Page 41: Lecture slides stats1.13.l07.air

END SEGMENT

41

Page 42: Lecture slides stats1.13.l07.air

Lecture 7 ~ Segment 3

Assumptions

42

Page 43: Lecture slides stats1.13.l07.air

Assumptions

•  Assumptions of linear regression – Normal distribution for Y – Linear relationship between X and Y – Homoscedasticity

43

Page 44: Lecture slides stats1.13.l07.air

Assumptions

•  Assumptions of linear regression – Reliability of X and Y – Validity of X and Y – Random and representative sampling

44

Page 45: Lecture slides stats1.13.l07.air

Assumptions

•  Assumptions of linear regression – Normal distribution for Y – Linear relationship between X and Y – Homoscedasticity

45

Page 46: Lecture slides stats1.13.l07.air

Anscombe’s quartet

46

Page 47: Lecture slides stats1.13.l07.air

Anscombe’s quartet

•  Regression equation for all 4 examples:

– Ŷ = 3.00 + 0.50(X1)

47

Page 48: Lecture slides stats1.13.l07.air

Anscombe’s quartet

•  To test assumptions, save residuals

–  Y = B0 + B1X1 + e

–  e = (Y – Ŷ)

48

Page 49: Lecture slides stats1.13.l07.air

Anscombe’s quartet

•  Then examine a scatterplot with – X on the X-axis – Residuals on the Y-axis

49

Page 50: Lecture slides stats1.13.l07.air

Anscombe’s quartet

50

Page 51: Lecture slides stats1.13.l07.air

Segment summary

•  Assumptions when interpreting r – Normal distributions for Y – Linear relationship between X and Y – Homoscedasticity – Examine residuals to evaluate assumptions

51

Page 52: Lecture slides stats1.13.l07.air

END SEGMENT

52

Page 53: Lecture slides stats1.13.l07.air

END LECTURE 7

53


Top Related