lecture 21: correlation and regression, part 2


Upload: jason-edington

Post on 21-Jan-2018


Page 1: Lecture 21: Correlation and regression, part 2

CORRELATION AND REGRESSION, PART 2

LECTURE #21


CORRELATION AND REGRESSION, PART 2

• To give you some practice with the concepts of correlation and regression as developed in the last lecture, we’ll consider fictional students’ heights, forearm lengths, and head circumferences

• We’ll then place them in a table which you can copy into the lists on your calculator, putting Height in List 1, Forearm in List 2, and Head in List 3

• A good way to check that you’ve put in the correct numbers is to find the mean of each set and make sure it matches these means:

– Height: 167.1 cm

– Forearm: 43.5 cm

– Head: 56.1 cm

Person   Height (cm)   Forearm (cm)   Head (cm)
1        174           43             56
2        166           44             57
3        150           39             56
4        176           48             60
5        160           42             56
6        167           45             56
7        181           47             56
8        170           42             57
9        172           45             55
10       155           40             52
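The same data-entry check can be sketched in Python if no calculator is handy. The variable names height, forearm, and head are only illustrative labels for List 1, List 2, and List 3; the numbers are the ones in the table.

```python
from statistics import mean

# Data copied from the table above (all measurements in cm).
height  = [174, 166, 150, 176, 160, 167, 181, 170, 172, 155]  # List 1
forearm = [43, 44, 39, 48, 42, 45, 47, 42, 45, 40]            # List 2
head    = [56, 57, 56, 60, 56, 56, 56, 57, 55, 52]            # List 3

# Print each mean to one decimal place for comparison with the slide.
for name, values in [("Height", height), ("Forearm", forearm), ("Head", head)]:
    print(f"{name}: {mean(values):.1f} cm")
```

If any printed mean differs from the ones listed above, one of the entries was probably mistyped.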


FINDING THE REGRESSION EQUATION AND THE SAMPLE CORRELATION COEFFICIENT

• First, using height as the predictor variable and forearm length as the response variable, let’s find the equation of the regression line and the value of the sample correlation coefficient

– We’ll round all numbers to the nearest thousandth

• Note that Height is in List 1 and Forearm in List 2

• Entering LinReg (ax+b) L1, L2 (naming the predictor variable list first and the response variable list second) might yield a screen that shows only the values of a and b

• I say “might” because if this is what you got, then although it gives the values of a and b in the regression equation, it doesn’t tell you the sample correlation coefficient, and you’ll have to change the settings on your calculator

• Push 2nd 0 (CATALOG), cursor down to DiagnosticOn, and then press ENTER twice. That should do it, and now the same command, LinReg (ax+b) L1, L2, should also display the value of r

Finding y and r Prediction Residuals Multiple Regression


FINDING THE REGRESSION EQUATION AND THE SAMPLE CORRELATION COEFFICIENT

• So the regression equation, with a and b rounded to the nearest thousandth, becomes

$y = 0.258x + 0.382$

• and the sample correlation coefficient

r = 0.868
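For anyone who wants to see what the calculator is doing, here is a minimal Python sketch of the standard least-squares formulas. The function name linreg is hypothetical and simply mirrors the calculator command; it is an illustration, not the calculator's actual implementation.

```python
from math import sqrt

height  = [174, 166, 150, 176, 160, 167, 181, 170, 172, 155]  # predictor (x)
forearm = [43, 44, 39, 48, 42, 45, 47, 42, 45, 40]            # response (y)

def linreg(x, y):
    """Return (a, b, r) for the least-squares line y = a*x + b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx              # slope
    b = y_bar - a * x_bar      # intercept
    r = sxy / sqrt(sxx * syy)  # sample correlation coefficient
    return a, b, r

a, b, r = linreg(height, forearm)
print(f"y = {a:.3f}x + {b:.3f}, r = {r:.3f}")
```

As on the calculator, the predictor list goes first and the response list second.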

• Let’s try another one, this time using head size (List 3) as the predictor variable and height as the response variable

• You try it…

• Using LinReg (ax+b) L3, L1, we get

• So the regression equation is

$y = 2.318x + 37.057$

• and the sample correlation coefficient is

r = 0.472

• …to the nearest thousandth


CRITICAL VALUE TABLE

• Here is a critical value table for r values

(table image not included in this transcript)


PREDICTION

• As explained in the last lecture, the question of whether we’re justified in using the regression equation to predict a value of the response variable for a certain value of the predictor variable…

– …can be settled by looking at the critical value for r at a given significance level and for a given number of degrees of freedom

• The critical value for the 5% significance level with 8 degrees of freedom (n − 2, with n = 10 students here) is 0.632

• So for the purpose of predicting forearm length from height, where r was 0.868, which exceeds 0.632, we can use the equation. When we do, we give the y another name, $\hat{y}$ (pronounced “y-hat”), meaning ‘the predicted value of y’


PREDICTION

• Let’s use the equation to predict the forearm length of a person of height 179 cm

$\hat{y} = 0.258x + 0.382 = 0.258(179) + 0.382 = 46.564$

– The forearm length we would predict for a person whose height is 179 cm is 46.6 cm, to the nearest tenth of a centimeter

• What if we were asked to predict the height of a person whose head size is 53 cm?

• Since $r = 0.472$ in this case, less than the critical value of 0.632, we’ll have to resort to predicting that the person, despite his or her head size, would be about average in height. The mean height of the sample, which we’ll now call $\bar{y}$ because height is the y-variable, is 167.1 cm

– So that’s our prediction
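The decision rule used in these two predictions can be sketched as a small Python function. The name predict and its parameters are hypothetical; the coefficients, r values, means, and the critical value 0.632 are the ones quoted in this lecture.

```python
CRITICAL_R = 0.632  # 5% significance level, 8 degrees of freedom (from the lecture)

def predict(x, a, b, r, y_bar, critical_r=CRITICAL_R):
    """Use the regression line only if |r| exceeds the critical value;
    otherwise fall back to the mean of the response variable."""
    if abs(r) > critical_r:
        return a * x + b
    return y_bar

# Forearm length from height: r = 0.868 is significant, so use the equation.
forearm_179 = predict(179, a=0.258, b=0.382, r=0.868, y_bar=43.5)

# Height from head size: r = 0.472 is not significant, so predict the mean height.
height_53 = predict(53, a=2.318, b=37.057, r=0.472, y_bar=167.1)

print(f"Predicted forearm length: {forearm_179:.1f} cm")
print(f"Predicted height: {height_53:.1f} cm")
```

The comparison uses the absolute value of r so that the same rule would also work for a negative correlation.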


RESIDUALS

• Once you’ve established that two variables have a significant correlation, you can go on to see how accurate your prediction of the value of the response variable is in individual cases

• Last lecture we defined the residual as the difference between the actual value of the response variable and the predicted value based on the regression line

• Then I illustrated residuals graphically as the directed lengths of vertical lines on the graph, like this

• Now we’ll use a formula for the residual for the nth individual

• It’s $y_n - \hat{y}_n$ where

– $y_n$ is the actual value of the response variable for the nth individual

– $\hat{y}_n$ is the predicted value of the response variable for the nth individual


RESIDUALS

• In the regression analysis in which height is the predictor variable and forearm length is the response variable, what is the residual for Person #2 to the nearest tenth?

• We can ask this question, because we showed that height and forearm length are significantly correlated

• The x-value (height) of Person #2 is $x_2 = 166$ cm

• The y-value (forearm length) is $y_2 = 44$ cm

• Using the regression equation $\hat{y} = 0.258x + 0.382$

• we get

$\hat{y}_2 = 0.258(166) + 0.382 = 43.21$ cm

• The residual for Person #2 is thus

$y_2 - \hat{y}_2 = 44 - 43.21 = 0.79$ cm

• Or 0.8 cm to the nearest tenth

• This means that Person #2’s forearms are a little under a centimeter longer than we’d expect given his or her height


RESIDUALS

• What’s the residual for Person #8?

• This person’s height is $x_8 = 170$ cm, with forearm length $y_8 = 42$ cm

• We get the predicted value

$\hat{y}_8 = 0.258(170) + 0.382 \approx 44.24$ cm

• This yields

$y_8 - \hat{y}_8 = 42 - 44.24 = -2.24$ cm

• Thus Person #8’s residual to the nearest tenth is -2.2 cm

• Person #8’s forearms are a little more than 2 centimeters shorter than we’d expect given his or her height
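The residuals worked out by hand for Person #2 and Person #8 can be checked, along with everyone else's, using a short Python sketch. The function name predicted_forearm is hypothetical; it uses the rounded regression equation from this lecture.

```python
height  = [174, 166, 150, 176, 160, 167, 181, 170, 172, 155]  # x, in cm
forearm = [43, 44, 39, 48, 42, 45, 47, 42, 45, 40]            # y, in cm

def predicted_forearm(h):
    """Rounded regression equation from the lecture: y-hat = 0.258x + 0.382."""
    return 0.258 * h + 0.382

# Residual = actual value minus predicted value, one per person.
residuals = [y - predicted_forearm(x) for x, y in zip(height, forearm)]

for person, res in enumerate(residuals, start=1):
    print(f"Person #{person}: {res:+.2f} cm")
```

A positive residual means the forearm is longer than the line predicts for that height, and a negative one means it is shorter.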


MULTIPLE REGRESSION

• Finding and stating the linear regression equation and the sample correlation coefficient, making predictions using the equation (or not, if a significant correlation cannot be claimed to hold), and finding residuals: these are the tasks we’ve covered

• Linear regression is just one way of attempting to make sense of the relationship between two variables

• We saw some other patterns at the end of the last lecture


MULTIPLE REGRESSION

• But in all these cases, linear and otherwise, we’re trying to express the value of the response variable in terms of one other variable, the predictor variable

• We call comparing two variables in this way a simple relationship, and the prediction equation is an example of simple regression

• It’s ‘simple’ because there’s only one predictor variable

• Sometimes, though, it’s a lot better to use more than one predictor variable


MULTIPLE REGRESSION

• Say you’re trying to predict shoe size from height (as you’ll be doing in Assignment #9)

• Height turns out to be a pretty good predictor, but, as you know, some people have large feet for their height and some have small feet

• Perhaps the tendency to have large or small feet runs in families

• Wouldn’t it be helpful to know the shoe sizes a person’s mother and father wear?


MULTIPLE REGRESSION

• You could label the person’s height $x_1$, the mother’s shoe size $x_2$, and the father’s shoe size $x_3$, and then you could (or at least a computer could) construct a prediction equation which would look like this:

$\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3$

• The relation between your height and your parents’ shoe sizes on the one hand, and your own shoe size on the other is an example of a multiple relationship, and the prediction equation involves multiple regression

• The job of the researcher is to find the best predictor variables to use, ones which do not overlap and depend upon each other, and which produce the closest fit to the data
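To give a feel for what "a computer could" do here, the following is a rough Python sketch of a multiple regression fit. The lecture gives no shoe-size data, so as an assumption for illustration only it reuses the ten students from this lecture and predicts height from two predictors, forearm length and head size. The function names fit_multiple and sse are hypothetical, and solving the normal equations by elimination is just one simple approach, not necessarily what statistical software does.

```python
height  = [174, 166, 150, 176, 160, 167, 181, 170, 172, 155]  # response y
forearm = [43, 44, 39, 48, 42, 45, 47, 42, 45, 40]            # predictor x1
head    = [56, 57, 56, 60, 56, 56, 56, 57, 55, 52]            # predictor x2

def fit_multiple(predictors, y):
    """Least-squares fit of y-hat = a + b1*x1 + ... + bk*xk.

    predictors is a list of k lists; returns (a, [b1, ..., bk])."""
    n, k = len(y), len(predictors)
    means = [sum(col) / n for col in predictors]
    y_bar = sum(y) / n
    # Center every variable so the intercept can be recovered from the means.
    xc = [[v - mu for v in col] for col, mu in zip(predictors, means)]
    yc = [v - y_bar for v in y]
    # Augmented normal equations for the slopes.
    aug = [[sum(p * q for p, q in zip(xc[i], xc[j])) for j in range(k)]
           + [sum(p * q for p, q in zip(xc[i], yc))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(aug[i][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for i in range(k):
            if i != c:
                f = aug[i][c] / aug[c][c]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[c])]
    slopes = [aug[i][k] / aug[i][i] for i in range(k)]
    a = y_bar - sum(b * mu for b, mu in zip(slopes, means))
    return a, slopes

def sse(predictors, y):
    """Sum of squared residuals for the fitted equation."""
    a, slopes = fit_multiple(predictors, y)
    total = 0.0
    for i, yi in enumerate(y):
        y_hat = a + sum(b * col[i] for b, col in zip(slopes, predictors))
        total += (yi - y_hat) ** 2
    return total

a, (b1, b2) = fit_multiple([forearm, head], height)
print(f"y-hat = {a:.3f} + {b1:.3f}*x1 + {b2:.3f}*x2")
print(f"SSE with forearm only:     {sse([forearm], height):.2f}")
print(f"SSE with forearm and head: {sse([forearm, head], height):.2f}")
```

Adding a predictor can never make the sum of squared residuals larger, which is why a closer fit alone does not show that the extra variable is a good choice; the predictors should also not overlap or depend on each other.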


ACTIVITY #21: CORRELATION AND REGRESSION

1) Using forearm length as the independent variable and height as the dependent variable, state the equation of the regression line and the value of the sample correlation coefficient, to the nearest thousandth

2) Using height as the predictor variable and head size as the response variable, state the equation of the regression line and the value of the sample correlation coefficient, to the nearest thousandth

In 3) and 4):

If r > 0.600, use the regression equation to make the prediction

If r < 0.600, predict the average for the response variable

3) Predict to the nearest tenth the height of a person whose forearm length is 42 cm.

4) Predict to the nearest tenth the head size of a person whose height is 172 cm.

5) In the regression analysis using forearm length as the independent variable and height as the dependent variable, find the residual to the nearest tenth for Person #4.

6) In the regression analysis using forearm length as the independent variable and height as the dependent variable, find the residual to the nearest tenth for Person #9.

Person   Height (cm)   Forearm (cm)   Head (cm)
1        174           43             56
2        166           44             57
3        150           39             56
4        176           48             60
5        160           42             56
6        167           45             56
7        181           47             56
8        170           42             57
9        172           45             55
10       155           40             52