AP STATISTICS Section 3.2 Least Squares Regression

Page 1: AP STATISTICS Section 3.2 Least Squares Regression

AP STATISTICS Section 3.2 Least Squares Regression

Page 2: AP STATISTICS Section 3.2 Least Squares Regression

Objective: To be able to derive the least squares regression line.

A regression line is a straight line that describes how a response variable changes as the explanatory variable changes.

The regression line as the model:

The idealized model: y = a + bx

• y is the response variable.
• x is the explanatory variable.
• a is the y-intercept: “a is the predicted value of y when x = 0.”
• b is the slope: “For every unit change in x there is a predicted change of b in y.”

Page 3: AP STATISTICS Section 3.2 Least Squares Regression

Extrapolation: Using your model to make predictions outside the range of the x-values. RISKY!!

The Least Squares Regression Line: (LSRL) It is the line that minimizes the sum of the squares of the vertical distances between the points and the model.

Model: ŷ = a + bx

• ŷ is the predicted response.
• x is the explanatory variable.
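
For reference, here is a minimal computational sketch (mine, not from the slides) of deriving the LSRL from paired data, using the standard formulas b = r(s_y / s_x) for the slope and a = ȳ − b·x̄ for the y-intercept. The x and y arrays are hypothetical placeholder values.

```python
import numpy as np

# Hypothetical placeholder data: paired explanatory (x) and response (y) values.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.0, 6.0, 7.0])

# Standard LSRL formulas: slope b = r * (s_y / s_x), intercept a = y-bar - b * x-bar.
r = np.corrcoef(x, y)[0, 1]                  # correlation coefficient
b = r * (y.std(ddof=1) / x.std(ddof=1))      # slope
a = y.mean() - b * x.mean()                  # y-intercept

print(f"LSRL: y-hat = {a:.3f} + {b:.3f}x")
```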

Page 4: AP STATISTICS Section 3.2 Least Squares Regression

Ex. Derive the LSRL for the number of TVs versus the number of rooms in someone’s house.

Page 5: AP STATISTICS Section 3.2 Least Squares Regression

1. Define all variables in the model.

2. Interpret the slope and y-intercept in the context of the problem.

3. Does the y-intercept have meaning in the context of this problem? Why?

4. What is the predicted number of TVs when there are 20 rooms in a house? How do you feel about this prediction?

Page 6: AP STATISTICS Section 3.2 Least Squares Regression

Residual: (error) the difference between the observed y-value and the predicted y-value.

residual = y − ŷ = observed − predicted

• A residual value is positive if the point lies above the line.
• A residual value is negative if the point lies below the line.

A residual plot is a scatterplot of the residuals versus x.

• Used to determine how well the line fits the data.
• Magnifies the errors.
• Graphed around the line y = 0.
• The sum of the residuals is 0.
• Some residual plots plot the residuals versus the predicted values (ŷ); the appearance will be the same as residuals vs. x.
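
Continuing the hypothetical data above, a rough sketch (mine, not part of the slides) of computing the residuals and drawing a residual plot:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical placeholder data, as in the earlier sketch.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.0, 6.0, 7.0])

# Fit the LSRL: np.polyfit with degree 1 returns (slope, intercept).
b, a = np.polyfit(x, y, 1)

# residual = observed - predicted = y - y_hat
residuals = y - (a + b * x)
print("sum of residuals:", residuals.sum())  # ~0 for the LSRL

# Residual plot: residuals versus x, scattered around the line y = 0.
plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("x (explanatory variable)")
plt.ylabel("residual (observed - predicted)")
plt.title("Residual plot")
plt.show()
```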

Page 7: AP STATISTICS Section 3.2 Least Squares Regression

• If most of the residuals are positive, then the model is underestimating (its predictions tend to be too low).

• If most of the residuals are negative, then the model is overestimating (its predictions tend to be too high).

• INTERPRETATION: If the residuals are randomly scattered about y = 0 with no pattern, then the linear model is a good fit for the data.

• We do NOT want to see patterns in our residual plots.

BEWARE:

1. A curved pattern indicates that the data is nonlinear.

2. Watch for funnel/megaphone patterns (the spread of the residuals changes as x changes).

3. Don’t look too hard for a pattern.

4. Be careful interpreting data sets where n is small.

Page 8: AP STATISTICS Section 3.2 Least Squares Regression

Ex. Use LSRL techniques to develop a model for shoe size versus height. Then calculate the residuals and create a residual plot.

Data:

1. Create a scatterplot of the data.

2. Describe the scatterplot.

3. Using your calculator, derive the LSRL and find correlation.

4. How does the value for correlation support your description in #2?

5. Define all variables in the model.

Page 9: AP STATISTICS Section 3.2 Least Squares Regression

6. Interpret the y-intercept in the context of the problem. Does it have meaning in this setting?

7. Interpret the slope in the context of the problem.

8. Add the LSRL to the scatterplot on your calculator.

9. What is the predicted shoe size for a height of 80 inches? How do you feel about this prediction?

10. Calculate the residuals for all observations. Show how the first residual was calculated.

Page 10: AP STATISTICS Section 3.2 Least Squares Regression

11. Create a residual plot.

12. Interpret the residual plot.

Page 11: AP STATISTICS Section 3.2 Least Squares Regression

Standard deviation of the residuals: s = √( Σ(y − ŷ)² / (n − 2) )

The larger this quantity is, the more variability there is in the points around the line.

Coefficient of Determination: (r²) “The percentage of the variation in y that is explained by the linear relationship of y on x.”

r² = 1 − SSE/SST, where

Total Sum of Squares: SST = Σ(y − ȳ)²

Sum of the Squared Errors: SSE = Σ(y − ŷ)²

Page 12: AP STATISTICS Section 3.2 Least Squares Regression

Ex. Find r² by hand for (0, 0), (5, 9), and (10, 6).
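
A quick check of this computation (a sketch, not the slides’ worked solution), using the SST and SSE definitions from the previous page:

```python
import numpy as np

# The three points from the example.
x = np.array([0.0, 5.0, 10.0])
y = np.array([0.0, 9.0, 6.0])

# Fit the LSRL and compute the predicted responses.
b, a = np.polyfit(x, y, 1)            # slope, intercept
y_hat = a + b * x

sst = ((y - y.mean()) ** 2).sum()     # Total Sum of Squares
sse = ((y - y_hat) ** 2).sum()        # Sum of the Squared Errors
r_sq = 1 - sse / sst

print(f"LSRL: y-hat = {a:.1f} + {b:.1f}x")            # y-hat = 2.0 + 0.6x
print(f"SST = {sst}, SSE = {sse}, r^2 = {r_sq:.4f}")  # r^2 ≈ 0.4286
```

By hand, the same arithmetic gives ŷ = 2 + 0.6x, SST = 42, SSE = 24, and r² = 1 − 24/42 ≈ 0.43.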

Page 13: AP STATISTICS Section 3.2 Least Squares Regression

Q: If we only know r, can we be 100% certain what the value of r² is?

Q: If we only know r², can we be 100% certain what the value of r is?

Q: If we only know r² and the y-intercept, can we be 100% certain what r is?

Q: If we only know r² and the slope, can we be 100% certain what r is?

Q: Can r² be greater than r?

Q: 0 < r² < 1? True or False

Q: r < r² < 1? True or False

Q: 0 < r² < r? True or False