chapter 8: linear regression

Chapter 8: Linear Regression A.P. Statistics

Upload: spencer

Post on 16-Feb-2016



TRANSCRIPT

Page 1: Chapter 8:  Linear Regression

Chapter 8: Linear Regression

A.P. Statistics

Page 2: Chapter 8:  Linear Regression

Linear Model

• Making a scatterplot allows you to describe the relationship between the two quantitative variables.

• However, it is often more useful to use that linear relationship to predict or estimate values of one variable from the other, based on the real data.

• We use the Linear Model to make those predictions and estimations.

Page 3: Chapter 8:  Linear Regression

Linear Model

Normal Model: Allows us to make predictions and estimations about the population and future events.

It is a model of real data, as long as that data has a unimodal, nearly symmetric distribution.

Linear Model: Allows us to make predictions and estimations about the population and future events.

It is a model of real data, as long as that data shows a linear relationship between two quantitative variables.

Page 4: Chapter 8:  Linear Regression

Linear Model and the Least Squares Regression Line

• To make this model, we need to find a line of best fit.

• This line of best fit is the “predictor line” and will be the way we predict or estimate our response variable, given our explanatory variable.

• This line is chosen by how well it minimizes the residuals (specifically, the sum of the squared residuals).

Page 5: Chapter 8:  Linear Regression

Residuals and the Least Squares Regression Line

• The residual is the difference between the observed value and the predicted value: residual = observed - predicted ($e = y - \hat{y}$).

• It tells us how far off the model’s prediction is at that point.

• Negative residual: predicted value is too big (overestimation)

• Positive residual: predicted value is too small (underestimation)
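The sign rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical data point (observed 30 g of fat, predicted 26.2 g); the helper names `residual` and `describe` are made up for this example.

```python
def residual(observed: float, predicted: float) -> float:
    """Residual = observed value minus predicted value."""
    return observed - predicted


def describe(res: float) -> str:
    """Translate the sign of a residual into what the model did at that point."""
    if res > 0:
        return "underestimate"  # positive residual: prediction was too small
    if res < 0:
        return "overestimate"   # negative residual: prediction was too big
    return "exact"


# Hypothetical point: observed 30 g of fat, model predicted 26.2 g.
res = residual(30, 26.2)
print(round(res, 2), describe(res))  # 3.8 underestimate
```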

Page 6: Chapter 8:  Linear Regression

Residuals

Page 7: Chapter 8:  Linear Regression

Least Squares Regression Line

• The LSRL is the line that makes the sum of the squared residuals as small as possible.

• Why not just find a line where the sum of the residuals is the smallest?
  – Positive and negative residuals cancel: the residuals from the LSRL (and from any line through the point of averages) always sum to zero, so the plain sum cannot single out a best line.
  – By squaring residuals, we get all positive values, which can be added.
  – Squaring emphasizes the large residuals, which have a big impact on the correlation and the regression line.
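To see why the plain sum of residuals is useless as a criterion, the sketch below fits three different lines, all forced through the point of averages, to a small hypothetical data set (made up for illustration). Every line's residuals sum to zero, but only the least-squares slope (0.6 for this data) gives the smallest sum of squared residuals.

```python
def residuals(xs, ys, b0, b1):
    """Residuals (observed - predicted) for the line y-hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical toy data, made up only for this illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar = sum(xs) / len(xs)  # 3.0
y_bar = sum(ys) / len(ys)  # 4.0

results = {}
# Each line is forced through (x_bar, y_bar) by setting b0 = y_bar - b1 * x_bar.
# b1 = 0.6 is the least-squares slope for this data (Sxy / Sxx = 6 / 10).
for b1 in (0.0, 0.6, 1.0):
    b0 = y_bar - b1 * x_bar
    res = residuals(xs, ys, b0, b1)
    sse = sum(e ** 2 for e in res)
    results[b1] = (sum(res), sse)
    print(f"slope {b1}: residuals sum to ~0? {abs(sum(res)) < 1e-9}; SSE = {sse:.2f}")
```

All three lines report a residual sum of about zero; the SSE values (6.00, 2.40, 4.00) are what actually distinguish them.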

Page 8: Chapter 8:  Linear Regression

Scatterplot of Math and Verbal SAT scores

[Figure: scatterplot of Verbal_SAT (y-axis) vs. Math_SAT (x-axis)]

Page 9: Chapter 8:  Linear Regression

Scatterplot of Math and Verbal SAT scores with incorrect LSRL

Line shown: Verbal_SAT = 1.232(Math_SAT) - 144; sum of squares (of residuals) = 2350

[Figure: scatterplot of Verbal_SAT vs. Math_SAT with this line]

Page 10: Chapter 8:  Linear Regression

Scatterplot of Math and Verbal SAT scores with correct LSRL

LSRL: Verbal_SAT = 1.11(Math_SAT) - 75.4; sum of squares (of residuals) = 2076; r² = 0.91

[Figure: scatterplot of Verbal_SAT vs. Math_SAT with the LSRL]

Page 11: Chapter 8:  Linear Regression

[Figure: the same scatterplot with the LSRL Verbal_SAT = 1.11(Math_SAT) - 75.4; sum of squares = 2076; r² = 0.91]

Model of Collection 1 Simple Regression

Response attribute (numeric): Verbal_SAT
Predictor attribute (numeric): Math_SAT
Sample count: 6

Equation of least-squares regression line: Verbal_SAT = 1.11024 Math_SAT - 75.424
Correlation coefficient, r = 0.954082
r-squared = 0.91027, indicating that 91.027% of the variation in Verbal_SAT is accounted for by Math_SAT.

The best estimate for the slope is 1.11024 +/- 0.4839 at a 95 % confidence level. (The standard error of the slope is 0.174288.)

When Math_SAT = 0 , the predicted value for a future observation of Verbal_SAT is -75.4244 +/- 288.073.

Page 12: Chapter 8:  Linear Regression

Least-Squares Regression Line

We Can Find the LSRL For Three Different Situations

• Using z-Scores of Real Data (Standardizing Data)

• Using Summary Statistics of Data (mean and standard deviation)

• Using Real Data

Page 13: Chapter 8:  Linear Regression
Page 14: Chapter 8:  Linear Regression

LSRL: Using z-Scores of Real Data

• LSRL passes through $(\bar{z}_x, \bar{z}_y) = (0, 0)$

• LSRL equation is: $\hat{z}_y = r\,z_x$

“Moving one standard deviation from the mean in x, we can expect to move about r standard deviations from the mean in y.”
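A quick numerical check of the z-score form, as a sketch on hypothetical toy data: after standardizing both variables (using the sample standard deviation), the least-squares slope of $z_y$ on $z_x$ comes out equal to $r$, and the z-score means are 0, so the line goes through the origin. The helper names are illustrative, not from the slides.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize: z = (value - mean) / s, using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ls_slope(xs, ys):
    """Least-squares slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

zx, zy = z_scores(xs), z_scores(ys)
r = sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)  # correlation from z-scores

print(round(ls_slope(zx, zy), 6), round(r, 6))  # slope of z_y on z_x equals r
```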

Page 15: Chapter 8:  Linear Regression

LSRL: Using z-Scores of Real Data (Interpretation)

LSRL of scatterplot: $\hat{z}_{fat} = 0.83\,z_{protein}$

For every standard deviation above (below) the mean a sandwich is in protein, we’ll predict that its fat content is 0.83 standard deviations above (below) the mean.
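The z-score prediction can be turned back into grams. This sketch assumes the protein and fat summary statistics used for this example in the chapter (protein: mean 17.2 g, SD 14.0 g; fat: mean 23.5 g, SD 16.4 g; r = 0.83); the function name and the 31.2 g sandwich are hypothetical.

```python
# Summary statistics for the sandwich example (protein = x, fat = y).
PROTEIN_MEAN, PROTEIN_SD = 17.2, 14.0   # grams
FAT_MEAN, FAT_SD = 23.5, 16.4           # grams
R = 0.83


def predict_fat_from_protein(protein_g: float) -> float:
    """Predict fat (g) via z-scores: z_fat_hat = r * z_protein."""
    z_protein = (protein_g - PROTEIN_MEAN) / PROTEIN_SD   # standardize x
    z_fat_hat = R * z_protein                              # move r SDs in y
    return FAT_MEAN + z_fat_hat * FAT_SD                   # un-standardize back to grams


# A hypothetical sandwich one SD above the mean in protein: 17.2 + 14.0 = 31.2 g.
print(round(predict_fat_from_protein(31.2), 3))  # 37.112
```

A sandwich at the mean protein level is predicted to have exactly the mean fat content (23.5 g), which is the "line passes through the means" property in z-score form.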

Page 16: Chapter 8:  Linear Regression

LSRL: Using Summary Statistics of Data

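Protein ($x$): $\bar{x} = 17.2$ g, $s_x = 14.0$ g

Fat ($y$): $\bar{y} = 23.5$ g, $s_y = 16.4$ g

$r = 0.83$

LSRL form: $\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope.

$b_1 = r\,\dfrac{s_y}{s_x}$

$b_0 = \bar{y} - b_1\bar{x}$

LSRL Equation: $b_1 = 0.83 \cdot \dfrac{16.4}{14.0} \approx 0.97$ and $b_0 = 23.5 - 0.97(17.2) \approx 6.8$, so $\widehat{fat} = 6.8 + 0.97\,\text{protein}$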
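The two formulas for $b_1$ and $b_0$ translate directly into code. A minimal sketch; the function name `lsrl_from_summary` is illustrative, and the inputs are the protein/fat summary statistics of this example.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (b0, b1) for y-hat = b0 + b1*x from summary statistics."""
    b1 = r * s_y / s_x        # slope: b1 = r * (s_y / s_x)
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Protein (x) and fat (y) summary statistics.
b0, b1 = lsrl_from_summary(x_bar=17.2, s_x=14.0, y_bar=23.5, s_y=16.4, r=0.83)
print(f"fat-hat = {b0:.1f} + {b1:.2f} * protein")  # fat-hat = 6.8 + 0.97 * protein
```

Carrying the unrounded slope (about 0.9723) into the intercept gives $b_0 \approx 6.78$, which still rounds to 6.8.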

Page 17: Chapter 8:  Linear Regression

LSRL: Using Summary Statistics of Data (Interpretation)

$\widehat{fat} = 6.8 + 0.97\,\text{protein}$

Slope: Each additional gram of protein is associated with about 0.97 more grams of fat, on average, according to the model.

y-intercept: An item that has zero grams of protein is predicted to have 6.8 grams of fat.

ALWAYS CHECK TO SEE IF Y-INTERCEPT MAKES SENSE IN THE CONTEXT OF THE PROBLEM AND DATA
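The slope and intercept interpretations can be read off a prediction function. This is a small sketch using the rounded coefficients $\widehat{fat} = 6.8 + 0.97\,\text{protein}$; the function name is hypothetical.

```python
B0, B1 = 6.8, 0.97  # rounded coefficients of fat-hat = 6.8 + 0.97 * protein


def predicted_fat(protein_g: float) -> float:
    """Predicted fat (g) for a given protein content (g)."""
    return B0 + B1 * protein_g


# Slope: one more gram of protein raises the *predicted* fat by B1 grams.
print(round(predicted_fat(21) - predicted_fat(20), 2))  # 0.97
# Intercept: the prediction at protein = 0 is just B0 (check whether that makes sense).
print(predicted_fat(0))  # 6.8
```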

Page 18: Chapter 8:  Linear Regression

LSRL: Using Real Data

Use technology to get the LSRL, making sure you check your conditions first.
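As one example of "technology", here is a sketch that computes the LSRL from raw data with Python's standard library, using $r$ as the average product of z-scores and then $b_1 = r\,s_y/s_x$, $b_0 = \bar{y} - b_1\bar{x}$. The data and the function name `lsrl` are hypothetical.

```python
from statistics import mean, stdev


def lsrl(xs, ys):
    """Least-squares line from raw data; returns (b0, b1, r)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation: sum of products of z-scores, divided by n - 1.
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1, r


# Hypothetical toy data, for illustration only.
b0, b1, r = lsrl([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y-hat = {b0:.2f} + {b1:.2f}x, r = {r:.3f}")  # y-hat = 2.20 + 0.60x, r = 0.775
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept directly. Neither approach checks the conditions for you; that still requires looking at the scatterplot and residual plot.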

Page 19: Chapter 8:  Linear Regression

Properties of the LSRL

The fact that the Sum of Squared Errors (SSE, the sum of the squared residuals, also called the least squares sum) is as small as possible means that for this line:

• The sum and the mean of the residuals are 0.

• The variation in the residuals is as small as possible.

• The line contains the point of averages $(\bar{x}, \bar{y})$.
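These properties can be checked numerically. The sketch below fits a least-squares line to hypothetical toy data, then confirms that the residuals sum to zero, that the line passes through $(\bar{x}, \bar{y})$, and that nudging the intercept or slope in any direction increases SSE.

```python
def fit(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical toy data, for illustration only.
xs = [2, 3, 5, 7, 8]
ys = [1, 4, 4, 8, 9]
b0, b1 = fit(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)


def sse(c0, c1):
    """Sum of squared residuals for the line y-hat = c0 + c1 * x."""
    return sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))


print(abs(sum(res)) < 1e-9)                   # sum (and mean) of residuals is 0
print(abs((b0 + b1 * x_bar) - y_bar) < 1e-9)  # the line passes through (x_bar, y_bar)
# Nudging the line in any direction makes SSE larger.
print(all(sse(b0, b1) < sse(b0 + d0, b1 + d1)
          for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]))
```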

Page 20: Chapter 8:  Linear Regression

Assumptions and Conditions for using LSRL

Quantitative Variable Condition

Straight Enough Condition: if not, re-express the data (Chapter 10)

Outlier Condition: fit the model with and without any outliers and compare
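Why fit "with and without" an outlier? This sketch uses hypothetical toy data plus one hypothetical outlier at (10, 0). A single point is enough to flip the sign of the least-squares slope.

```python
def ls_slope(xs, ys):
    """Least-squares slope b1 = Sxy / Sxx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy data plus one hypothetical outlier at (10, 0).
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
slope_without = ls_slope(xs, ys)
slope_with = ls_slope(xs + [10], ys + [0])
print(f"without outlier: {slope_without:.2f}, with outlier: {slope_with:.2f}")
# without outlier: 0.60, with outlier: -0.34
```

When the two fits tell different stories like this, report both and discuss the outlier rather than silently dropping it.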

Page 21: Chapter 8:  Linear Regression

Residuals and LSRL

• Residuals should be used to see if a linear model is appropriate

• Residuals are the part of the data that has not been modeled in our linear model

Page 22: Chapter 8:  Linear Regression

Residuals and LSRL: What to Look for in a Residual Plot to Satisfy the Straight Enough Condition

The residual plot should show no patterns and no interesting features (such as direction or shape). It should stretch horizontally with about the same scatter throughout, with no bends or outliers.

The distribution of residuals should be symmetric if the original data is straight enough.

Looking at a scatterplot of the residuals vs. the x-value is a good way to check the Straight Enough Condition, which determines if a linear model is appropriate.
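Here is what a failed Straight Enough check looks like in numbers. This sketch fits a line to hypothetical curved data ($y = x^2$) and prints residuals against x as a text "residual plot". The residuals run positive, then negative, then positive again: a U-shaped bend.

```python
# Hypothetical curved data: y = x^2 (clearly NOT straight enough).
xs = list(range(7))          # 0, 1, ..., 6
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Text "residual plot": residual vs. x.
for x, e in zip(xs, res):
    print(f"x={x}  residual={e:+.1f}")
# Residuals run +, 0, -, -, -, 0, + : a bend, so a linear model is not appropriate.
```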

Page 23: Chapter 8:  Linear Regression

Residuals, again

Page 24: Chapter 8:  Linear Regression

[Figure: scatterplot of Exam_2 (y-axis) vs. Exam_1 (x-axis)]

Page 25: Chapter 8:  Linear Regression

LSRL: Exam_2 = 1.692(Exam_1) - 75.5; r² = 0.65

[Figure: scatterplot of Exam_2 vs. Exam_1 with this LSRL, and a residual plot of the residuals vs. Exam_1]

Page 26: Chapter 8:  Linear Regression

LSRL: Exam_2 = 0.788(Exam_1) + 21.3; r² = 0.84

[Figure: scatterplot of Exam_2 vs. Exam_1 with this LSRL, and a residual plot of the residuals vs. Exam_1]

Page 27: Chapter 8:  Linear Regression

A Complete Linear Regression Analysis: Part I

Draw a scatterplot of the data. Comment on what you see. (Satisfy the Quantitative Data Condition.)

• Form, strength, direction

• Unusual points, deviations

• Comment on general variable direction

Page 28: Chapter 8:  Linear Regression

A Complete Linear Regression Analysis: Part II

Compute r. Comment on what r means in context and whether it is appropriate to use. (Does the relationship seem linear? Straight Enough Condition)
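Computing r is only half of Part II; judging whether it is appropriate matters just as much. This sketch computes Pearson's $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ for hypothetical curved data ($y = x^2$). The result is high even though the form is a curve, which is why a large r alone does not satisfy the Straight Enough Condition.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical curved data: r is high even though the relationship is not linear.
xs = list(range(7))
ys = [x ** 2 for x in xs]
print(round(correlation(xs, ys), 3))  # 0.961
```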

Page 29: Chapter 8:  Linear Regression

A Complete Linear Regression Analysis: Part III

Find the LSRL. Check all three conditions:

• Quantitative Data Condition

• Straight Enough Condition

• Outlier Condition

Page 30: Chapter 8:  Linear Regression

A Complete Linear Regression Analysis: Part IV

Draw a residual plot and interpret it: is the linear model appropriate?

Page 31: Chapter 8:  Linear Regression

A Complete Linear Regression Analysis: Part V

• Interpret the slope in context.

• Interpret the y-intercept in context.

Page 32: Chapter 8:  Linear Regression

A Complete Linear Regression Analysis: Part VI

Compute R-squared. Interpret the value and use it as a measure of the accuracy of the model: “How well does the model predict?”

Page 33: Chapter 8:  Linear Regression

What Is R-Squared?

This value measures how well the linear model predicts your y-values from your x-values.

It is usually written as a percent.

It is, literally, your r-value squared.
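As a check, squaring the correlation reported in the SAT regression output (r = 0.954082) reproduces the r-squared value that output printed (0.91027, about 91%).

```python
# Correlation reported in the SAT regression output earlier in this chapter.
r = 0.954082
r_squared = r ** 2
print(f"R-squared = {r_squared:.5f} ({r_squared:.1%})")  # R-squared = 0.91027 (91.0%)
```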

Page 34: Chapter 8:  Linear Regression

R-Squared Interpretation

If a regression analysis has an R-squared value of 97%, the model does an excellent job of predicting the y-values in your data.

How do we interpret that? “97% of the variation in y can be accounted for by the linear relationship with x.”

Page 35: Chapter 8:  Linear Regression

R-Squared Interpretation

• There are other ways to write that interpretation.

• Also, R-squared can be thought of as the fraction of prediction error (measured by the sum of squared errors) that is eliminated by using the LSRL instead of guessing $\bar{y}$ for every y-value.
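The "error eliminated" view can be written as $R^2 = (SST - SSE)/SST$, where SST is the squared error from always guessing $\bar{y}$ and SSE is the squared error left over after using the LSRL. A sketch on hypothetical toy data; for this data the result also equals $r^2$.

```python
# Hypothetical toy data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for y in ys)                      # error if we just guess y-bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # error left after using the LSRL
r_squared = (sst - sse) / sst                                # fraction of that error eliminated
print(f"SST={sst:.1f}, SSE={sse:.1f}, R-squared={r_squared:.2f}")  # SST=6.0, SSE=2.4, R-squared=0.60
```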