Inference for Regression Course: AP Statistics Chapter: 27 Book: Stats: Modeling the World Authors: BVD (2 nd edition)

Upload: gwen-beasley

Post on 28-Dec-2015

TRANSCRIPT

Page 1:

Inference for Regression

Course: AP Statistics

Chapter: 27

Book: Stats: Modeling the World

Authors: BVD (2nd edition)

Page 2:

Inference for:

Categorical Variables: Use Chi-Squared Procedures

Quantitative Variables: Use Linear Regression

Page 3:

Regression reminders

Regression Line: $\hat{y} = b_0 + b_1 x$

$\hat{y}$ = Predicted value of y

$b_0$ = y-intercept

$b_1$ = slope

Page 4:

Regression reminders

Regression Line: $\hat{y} = b_0 + b_1 x$

x is the explanatory variable

y is the response variable

Page 5:

Regression reminders

Regression Line: $\hat{y} = b_0 + b_1 x$

residual: $e = y - \hat{y}$

$\hat{y}$ = Predicted value of y

$y$ = Actual value of y
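A minimal Python sketch of these reminders, assuming a small made-up data set (the numbers and the helper names `fit_line` and `residuals` are illustrative, not from the slides). It finds $b_0$ and $b_1$ by least squares, then computes each residual as actual minus predicted.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # y-intercept
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Residual e = y - y-hat for every data point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("residuals:", [round(e, 2) for e in res])
```

For a least-squares line the residuals always add up to zero (up to rounding), which is a quick sanity check on the fit.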

Page 6:

So…what’s new??

Regression Line: $\hat{y} = b_0 + b_1 x$

Now, in Chapter 27, the regression line we find represents a SAMPLE of some given data. It’s the best-fit line for that sample, so the slope and y-intercept we have found are the statistics for that line.

BIG QUESTION: What are the slope and y-intercept of the POPULATION regression line?

Page 7:

Chapter 27

Regression Line: $\hat{y} = b_0 + b_1 x$

We are going to use our SAMPLE statistics to estimate the corresponding POPULATION values (or, at least, we’ll get as close as we can).

What are these called??

Page 8:

Population Parameters

Regression Line: $\hat{y} = b_0 + b_1 x$

$b_1$ = Sample slope

$\beta_1$ = Population slope

$b_0$ = Sample y-intercept

$\beta_0$ = Population y-intercept

Page 9:

Population Regression Line

Sample Regression Line: $\hat{y} = b_0 + b_1 x$

Population Regression Line: $\hat{y} = \beta_0 + \beta_1 x$

Page 10:

Population Regression Line

Population Regression Line: $\hat{y} = \beta_0 + \beta_1 x$

So.. We have two parameters we need to estimate.

Q) What do we do? A) First find the slope and then find the y-intercept.

Q) What model will we use? A) Student’s t-curve, with degrees of freedom = df = n - 2 (because we estimate two parameters, the slope and the y-intercept, from the data)

Page 12:

Confidence Interval (for slope)

$95\%\ CI = b_1 \pm ME(b_1)$

$95\%\ CI = b_1 \pm t^*_{df} \times SE(b_1)$

How do we find the Standard Error??

We don’t! We’ll let our calculator (or a computer printout) give it to us.

Page 13:

Confidence Interval (for slope)

Really? We don’t care about the Standard Error for the slope?

Well….actually, we care a little. It depends on three things:

1) The spread of the residuals, $s_e$ (more about this later!)

2) The spread of the x-values, $s_x$

3) The sample size (n)
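To see how the three ingredients combine, the usual textbook formula is $SE(b_1) = \dfrac{s_e}{s_x \sqrt{n-1}}$, with $s_e = \sqrt{\sum e^2 / (n-2)}$. Here is a rough Python sketch under that assumption; the data and the helper name `slope_standard_error` are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def slope_standard_error(xs, ys):
    """SE(b1) = s_e / (s_x * sqrt(n - 1)) for a least-squares line."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s_e = sqrt(sum(e ** 2 for e in resid) / (n - 2))  # spread of the residuals
    s_x = stdev(xs)                                   # spread of the x-values
    return s_e / (s_x * sqrt(n - 1))

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

se = slope_standard_error(xs, ys)
print("SE(b1) =", round(se, 4))
```

Less scatter about the line (smaller $s_e$), more spread in x (larger $s_x$), or more data (larger n) all make the slope estimate more precise.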

Page 14:

Confidence Interval (for slope)

Let’s find the Standard Error. Ready to try??

Here is a sample computer printout.

Page 15:

Confidence Interval (for slope)

Help! That’s too confusing. What do I need?

Page 16:

Confidence Interval (for slope)

The Constant you see is the value of $b_0$

Page 17:

Confidence Interval (for slope)

Age is the name of x, and its coefficient is the slope $b_1$

Page 19:

Confidence Interval (for slope)

Income is the name of y (the response variable)

Page 20:

Confidence Interval (for slope)

The degrees of freedom are given: df = 25

Page 21:

Confidence Interval (for slope)

And so is the Standard Error for the slope!

$SE(b_1) = 337.7$

Page 22:

Confidence Interval (for slope)

The equation of the regression line would be:

$\hat{y} = 27300.4 + 244.203x$

Page 23:

Confidence Interval (for slope)

Wait a second….that’s chapter 8. We’re in Ch. 27. We want to find the Confidence Interval for Slope!

Page 24:

Confidence Interval (for slope)

$95\%\ CI = b_1 \pm t^*_{df} \times SE(b_1)$

$= 244.203 \pm t^*_{25} \times 337.7$

$= 244.203 \pm 2.060 \times 337.7$

$= (-451.5, 939.7)$
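The interval arithmetic can be sketched in a few lines of Python. This uses only the numbers quoted on the slide (slope 244.203, standard error 337.7, critical value 2.060 for df = 25); the function name `slope_confidence_interval` is a hypothetical helper. The last digit of an endpoint can differ slightly from the slide depending on how $t^*$ is rounded.

```python
def slope_confidence_interval(b1, se_b1, t_star):
    """Return (low, high) for b1 +/- t_star * SE(b1)."""
    margin = t_star * se_b1      # margin of error, ME(b1)
    return b1 - margin, b1 + margin

b1 = 244.203      # sample slope from the printout
se_b1 = 337.7     # standard error of the slope from the printout
t_star = 2.060    # t critical value for df = 25, as quoted on the slide

low, high = slope_confidence_interval(b1, se_b1, t_star)
print("95% CI for the slope:", (round(low, 1), round(high, 1)))
print("interval contains 0:", low < 0 < high)
```

Because the interval contains zero, a slope of zero (no linear relationship) is a plausible value.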

Page 25:

Confidence Interval (for slope)

$95\%\ CI = b_1 \pm t^*_{df} \times SE(b_1) = (-451.5, 939.7)$

I am 95% confident that the true slope of the regression line is between -451.5 and 939.7. But….that’s not in context! We need to state this in context…..

Page 26:

Confidence Interval (for slope)

$95\%\ CI = b_1 \pm t^*_{df} \times SE(b_1) = (-451.5, 939.7)$

I am 95% confident that the true average change in income for each 1-year increase in age is between a loss of $451.50 and a gain of $939.70.

Page 27:

Hypothesis Testing for Slope

That’s great, but what about Hypothesis Testing?

What would the Null Hypothesis be?

Page 28:

Hypothesis Testing for Slope

Remember, Null means Nothing. Or…no change.

Therefore, for each increase in x there must be no change in y.

That means the slope must be zero. Or….

Page 29:

Hypothesis Testing for Slope

Hypotheses

$H_0: \beta_1 = 0$

$H_a: \beta_1 \neq 0$ (2-tailed)

Page 30:

Hypothesis Testing for Slope

What about the t-score? The P-Value?

Easy! Everything is based on the Student’s t-curve, so the mechanics are the same….

Page 31:

Hypothesis Testing for Slope

$t_{df} = \dfrac{b_1 - \beta_1}{SE(b_1)} = \dfrac{b_1 - 0}{SE(b_1)} = \dfrac{b_1}{SE(b_1)}$

For our line, it would be:

$t_{25} = \dfrac{b_1}{SE(b_1)} = \dfrac{244.203}{337.7} = 0.723$

Page 32:

Hypothesis Testing for Slope

Hey…is that value in the computer printout??

$t_{25} = \dfrac{b_1}{SE(b_1)} = \dfrac{244.203}{337.7} = 0.723$

Page 33:

Hypothesis Testing for Slope

The P-Value is also the same:

$P\text{-value} = P(|t_{25}| > 0.723)$

Page 34:

Hypothesis Testing for Slope

With a P-value of 0.4763, which is large compared to an alpha level of 0.05, I fail to reject the null hypothesis. There is not enough evidence to suggest that the true slope is different from zero.
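For anyone curious where the printout’s t-score and P-value come from, here is a rough Python sketch using only the standard library. It assumes the slide’s numbers (slope 244.203, standard error 337.7, df = 25) and approximates the two-tailed area under the Student’s t-curve with Simpson’s rule; the helper names `t_pdf` and `two_sided_p_value` are hypothetical, and a calculator or statistics package would normally do this step.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using Simpson's rule on [0, |t_stat|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central = total * h / 3      # area between 0 and |t_stat|
    return 1 - 2 * central       # both tails together

b1, se_b1, df = 244.203, 337.7, 25   # values quoted on the slides
t_stat = (b1 - 0) / se_b1            # null hypothesis: beta_1 = 0
p_value = two_sided_p_value(t_stat, df)
alpha = 0.05

print("t =", round(t_stat, 3), " P-value =", round(p_value, 3))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The decision rule is the same as in every other test: a P-value above alpha means we fail to reject the null hypothesis.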

Page 36:

Conditions & Assumptions

What about the conditions and assumptions???

We skipped them…. And ……

THAT’S BAD!

Page 37:

Conditions & Assumptions

There are 4 of them to satisfy.

1) Linearity Assumption

The scatterplot of the data should be “roughly linear”. We show this two ways and we have done both before!

a) Graph the scatterplot and look at it. Does it look straight?

b) Graph the residuals against the x-variable. They should be randomly scattered.

If this condition fails, then straighten the data (see Ch. 9).

Page 38:

Conditions & Assumptions

Here is a scatterplot comparing waist size to body fat percentage.

Page 39:

Conditions & Assumptions

Here is a scatterplot of the residuals plotted against the x-value (waist size).

Page 40:

Conditions & Assumptions

2) Independence Assumption

The next three are a little tricky. That’s only because we need to understand what is happening with inference on regression lines. Here’s the situation:

When you have a sample of data and you find the sample regression line for that data, you are fitting the line that best fits (or passes through) the y-values that you have plotted at each x-value.

Here is an example:

Page 41:

Conditions & Assumptions

Here is the scatterplot again (comparing waist size to body fat percentage).

Page 42:

Conditions & Assumptions

At each x-value there are multiple y-values that spread out around the line.

Page 43:

Conditions & Assumptions

For the true regression line, the distribution of y-values at each x-value should be nearly normal.

In fact, notice that the true regression line passes through the mean of each set of y-values…..

Page 44:

Conditions & Assumptions

That means the true regression line can be thought of as:

$\mu_y = \beta_0 + \beta_1 x$

and the residuals would be:

$y - \mu_y$

Notice that this is the same as

$e = y - \hat{y}$

Page 45:

Conditions & Assumptions

So..the residuals are really what we care about here, not the y-values….

If the residuals for a regression model make sense then so will the y-values.

Page 46:

Conditions & Assumptions

2) Independence Assumption

Okay, back to #2. We now know the residuals (errors) are what we care about here. For #2 we want these to be independent for a given sample.

If the sample was collected randomly, we are fine. Just state that the data can be assumed to be independent because the sample was random. You have no reason to believe that any y-value (or residual) has any impact on another one. Easy!

Page 47:

Conditions & Assumptions

2) Independence Assumption

Wait…didn’t you say this was hard? Well, it can be. If you are graphing a time plot (x represents time) the y-values might not be independent. Now you need to check the residuals. So…we graph them against the x-values (you already did this!) and see what we get. It should be a random scatter. Any pattern will show there is some sort of relationship which indicates a lack of independence.

Moving on….

Page 48:

Conditions & Assumptions

3) Equal Variance Assumption

Okay…this one is a little tricky. But, that’s only because you don’t know WHY we are checking for it. Let’s stop and figure that out first.

The best thing to do is to go once more to that image of normal models along the line….

Page 49:

Conditions & Assumptions

3) Equal Variance Assumption

What we want is for the spread of each set of y-values to be roughly the same. Remember, we care about residuals, so what this means is that we want the Standard Deviation of the residuals to be uniform. That means the spread of the residuals should be the same throughout.

Page 50:

Conditions & Assumptions

3) Equal Variance Assumption

Huh? Well, it means the points should not fan out or clump together. The spread about the line should be the same (constant) throughout. This is called the “DOES THE PLOT THICKEN?” Condition.

How do we check for this? Residuals again. If the plot does fan out, it will show up in the residual plot against y. Here it is:

Page 51:

Conditions & Assumptions

3) Equal Variance Assumption

Randomly scattered residuals!

[Residual plot against y-values (body fat percentage)]
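Eyeballing the residual plot is the method used here. As a rough numeric companion (not a formal test, and not something from the slides), the Python sketch below compares the standard deviation of the residuals in the lower half of the plot with that in the upper half. Both residual lists and the helper name `spread_ratio` are invented for illustration.

```python
from statistics import stdev

def spread_ratio(xs, resid):
    """SD of residuals in the upper half of x divided by SD in the lower half."""
    pairs = sorted(zip(xs, resid))           # order the residuals by x
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]       # residuals at the smaller x-values
    high = [e for _, e in pairs[-half:]]     # residuals at the larger x-values
    return stdev(high) / stdev(low)

# Hypothetical residuals, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
even_resid = [0.5, -0.4, 0.3, -0.5, 0.4, -0.3, 0.5, -0.4]   # constant spread
fan_resid = [0.1, -0.1, 0.2, -0.3, 0.8, -1.0, 1.5, -1.8]    # plot "thickens"

ratio_even = spread_ratio(xs, even_resid)
ratio_fan = spread_ratio(xs, fan_resid)
print("even spread ratio:", round(ratio_even, 2))
print("fanning spread ratio:", round(ratio_fan, 2))
```

A ratio near 1 is what constant spread looks like; a ratio far from 1 suggests the plot thickens and the Equal Variance Assumption is in doubt.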

Page 52:

Conditions & Assumptions

4) Normal Population Assumption

We’ve already looked at the residuals and seen that we want them to be nearly normal at each x-value. This is important so we can use the Student t-curve in the mechanics section.

How do we check this? Group all the residuals together (they are sitting in the Resid List, waiting for you!) and graph them to see if they are nearly normal.

Graph them? To check for nearly normal? HOW????

Page 54:

Conditions & Assumptions

4) Normal Population Assumption

HOW? You know how! We’ve done this before!!

Graph them as a histogram and check that it is unimodal and symmetric.

What about the normal probability plot? Should we do that as well?

Not this time! Phew!!
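On the calculator this is just a histogram of the Resid list. The same check can be sketched in Python with a text histogram; the residual values and the helper name `histogram_counts` below are made up for illustration.

```python
def histogram_counts(values, bins=5):
    """Count how many values fall into each of `bins` equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:                      # all values identical: one class only
        return [len(values)]
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # put the maximum in the last class
        counts[i] += 1
    return counts

# Hypothetical residuals, invented for illustration only.
resid = [-2.1, -1.2, -0.9, -0.4, -0.3, -0.1, 0.0, 0.2, 0.3, 0.5, 0.8, 1.1, 2.0]

counts = histogram_counts(resid)
for c in counts:
    print("*" * c)                      # one row of stars per class
```

One peak in the middle with bars tapering off about equally on both sides is the unimodal, symmetric shape we are hoping for.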

Page 55:

Practice Problem!

Let’s try one that is done on the calculator instead of the computer printout.

The best part is that inference for the regression slope is never done entirely by hand. All of the calculations are either given by the calculator or the computer, or are really easy.

And…one more thing…..we rarely do inference for the y-intercept. It usually isn’t a value we care about!