Chapter 5: Regression

Upload: amberlynn-gilbert

Post on 24-Dec-2015


Page 1: Chapter 5 Regression. Chapter outline The least-squares regression line Facts about least-squares regression Residuals Influential observations Cautions

Chapter 5: Regression

Page 2

Chapter outline

• The least-squares regression line
• Facts about least-squares regression
• Residuals
• Influential observations
• Cautions about correlation and regression
• Association does not imply causation

Page 3

Correlation and Regression

Regression effects are depicted by the slope of the line.

Correlation can be seen as the spread of points around the regression line. The more widely the points scatter around the line, the less well x predicts y and, consequently, the weaker the correlation.
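The distinction between slope and spread can be checked numerically. The following is a minimal sketch with made-up data (the numbers and the helper name `slope_and_correlation` are illustrative assumptions, not taken from the slides): two data sets scatter around the same line, so the fitted slope is the same, but the wider scatter gives a weaker correlation.

```python
import math

def mean(values):
    return sum(values) / len(values)

def slope_and_correlation(xs, ys):
    """Return the least-squares slope b and the correlation r of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Made-up data: both sets scatter around the same line y = 2x.
xs = [1, 2, 3, 4, 5, 6]
# Deviations chosen to sum to zero and to be uncorrelated with x,
# so scaling them changes the spread but not the fitted slope.
pattern = [1, -2, 1, 1, -2, 1]
ys_tight = [2 * x + 0.1 * d for x, d in zip(xs, pattern)]
ys_loose = [2 * x + 2.0 * d for x, d in zip(xs, pattern)]

slope_tight, r_tight = slope_and_correlation(xs, ys_tight)
slope_loose, r_loose = slope_and_correlation(xs, ys_loose)

print(f"tight spread: slope = {slope_tight:.2f}, r = {r_tight:.3f}")
print(f"loose spread: slope = {slope_loose:.2f}, r = {r_loose:.3f}")
```

In this sketch the slope is 2 for both data sets, while r falls from just under 1 to about 0.77 as the spread grows.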

Page 4

Perfect Positive Correlation

[Figure: scatterplot with both axes running from 0 to 25. Correlation r = 1]

Page 5

No Correlation

[Figure: scatterplot of potato chip consumption (y-axis, 0 to 12) against feeling thermometer for Clinton (x-axis, 0 to 120)]

Page 6

Imperfect Correlation and Relationships

We rarely see perfect correlation.

Although correlation is rarely perfect, we can still draw a line to summarize the trend in the data points. This is the regression line.

Page 7

Regression Line

Regression Line: A straight line that describes how a response variable y changes as an explanatory variable x changes.

It can sometimes be used to predict the value of y for a given value of x.

Page 8

Making Predictions

Page 9

Where do we Draw the Line?

[Figure: scatterplot titled "Age and Income"; x-axis: age (0 to 80); y-axis: income in $10,000 (0 to 45)]

Page 10

Minimize the sum of the distances between the points and the line

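[Figure: scatterplot (x-axis 0 to 7, y-axis 0 to 12) with a fitted line; the distances of the points from the line are labeled -3.5, +2, +2, -.25, -.25]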

Square the Distances

Page 11
Page 12

The best-fitting line minimizes the sum of the squared vertical distances of every point in the scatterplot from the regression line. That is, it minimizes

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

This line, the best-fitting line, is the one that produces the smallest sum of squared deviations compared with any other line you could plot through the points.
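The least-squares criterion can be illustrated with a short sketch. The data below are made up, and the helper names `least_squares` and `sum_squared_errors` are assumptions for this example. The code fits the least-squares line and then compares its sum of squared deviations with that of a few other candidate lines.

```python
def mean(values):
    return sum(values) / len(values)

def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.5, 5.0, 7.5, 8.0, 11.0]

a, b = least_squares(xs, ys)
best = sum_squared_errors(xs, ys, a, b)

# A few other lines: shifted up or down, tilted, and an arbitrary guess.
others = [(a + 0.5, b), (a - 0.5, b), (a, b + 0.2), (a, b - 0.2), (0.0, 2.0)]
other_sums = [sum_squared_errors(xs, ys, a2, b2) for a2, b2 in others]

print(f"least-squares line: y-hat = {a:.3f} + {b:.3f}x, sum of squares = {best:.3f}")
for (a2, b2), s in zip(others, other_sums):
    print(f"other line: y-hat = {a2:.3f} + {b2:.3f}x, sum of squares = {s:.3f}")
```

Every alternative line gives a larger sum of squares than the fitted line, which is what "best-fitting" means here.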

Page 13
Page 14

• The slope b is the amount by which the predicted y changes when x increases by 1.

• The intercept a is the predicted value of y when x = 0.
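The equation these two bullets refer to does not survive in the transcript. Assuming the usual notation, where $\bar{x}$ and $\bar{y}$ are the means, $s_x$ and $s_y$ the standard deviations, and $r$ the correlation, the standard least-squares formulas are:

$$\hat{y} = a + bx, \qquad b = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}$$

The formula for $b$ shows that a change of one standard deviation in x corresponds to a change of $r$ standard deviations in the predicted y.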

Page 15

Finding the equation of the regression line

Exercise 5.16 (Page 125)

Page 16

Facts about the least-squares regression line

Fact 1: It is a mathematical model for the data.

Fact 2: The distinction between explanatory and response variables is essential in regression.

Fact 3: There is a close connection between correlation and the slope of the least-squares line.

Fact 4: The least-squares regression line always passes through the point $(\bar{x}, \bar{y})$, where $\bar{x}$ is the mean of the x values and $\bar{y}$ is the mean of the y values.

Fact 5: The correlation r describes the strength of a straight-line relationship. In the regression setting, this description takes a specific form: the square of the correlation, $r^2$, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
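Facts 3, 4 and 5 can be verified numerically. The sketch below uses made-up data and assumes the standard relations: slope = r · s_y / s_x for Fact 3, and r² = 1 − (residual sum of squares) / (total sum of squares) for Fact 5. The variable names are illustrative.

```python
import math

def mean(values):
    return sum(values) / len(values)

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.5, 5.0, 7.5, 8.0, 11.0]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx                    # least-squares slope
a = y_bar - b * x_bar            # least-squares intercept
r = sxy / math.sqrt(sxx * syy)   # correlation

# Fact 3: the slope equals r times the ratio of standard deviations.
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
slope_from_r = r * s_y / s_x

# Fact 4: the line passes through (x_bar, y_bar).
y_hat_at_x_bar = a + b * x_bar

# Fact 5: r squared is the fraction of variation in y explained by the line.
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - residual_ss / syy

print(f"slope b = {b:.4f}, r * s_y / s_x = {slope_from_r:.4f}")
print(f"prediction at x_bar = {y_hat_at_x_bar:.4f}, y_bar = {y_bar:.4f}")
print(f"r^2 = {r ** 2:.4f}, explained fraction = {explained_fraction:.4f}")
```

Each printed pair agrees up to floating-point rounding.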

Page 17
Page 18

Residual plots

A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess the fit of a regression line.

A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,

$$\text{residual} = \text{observed } y - \text{predicted } y = y - \hat{y}$$
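A brief sketch of how residuals might be computed, using made-up data and a hypothetical helper `least_squares`. The pairs (x, residual) are the points one would draw in a residual plot.

```python
def mean(values):
    return sum(values) / len(values)

def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.5, 5.0, 7.5, 8.0, 11.0]

a, b = least_squares(xs, ys)

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Points of the residual plot: residual against the explanatory variable.
residual_plot_points = list(zip(xs, residuals))

for x, e in residual_plot_points:
    print(f"x = {x}: residual = {e:+.3f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

For a least-squares line the residuals always sum to zero (up to rounding), so a residual plot is centered on the horizontal line at zero; a curved or fan-shaped pattern around that line suggests a straight line is a poor fit.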

Page 19
Page 20
Page 21
Page 22

Outliers and Influential Observations

An outlier is an observation that lies outside the overall pattern of the other observations.

An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation.

Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line. Influential observations can also be described as outliers.
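The effect of an influential observation can be seen by refitting the line with and without the point. This is a minimal sketch with made-up data (the points and the helper `fit` are assumptions for illustration): one added point is an outlier in the x direction, the other is an outlier only in y and sits in the middle of the x range.

```python
def mean(values):
    return sum(values) / len(values)

def fit(points):
    """Return (intercept a, slope b) of the least-squares line through (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up base data lying exactly on the line y = 2x.
base = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]

a_base, b_base = fit(base)
a_x, b_x = fit(base + [(15, 5)])   # outlier in the x direction
a_y, b_y = fit(base + [(3, 16)])   # outlier in y only, at the center of the x values

print(f"base data:           y-hat = {a_base:.3f} + {b_base:.3f}x")
print(f"with x-outlier:      y-hat = {a_x:.3f} + {b_x:.3f}x")
print(f"with y-only outlier: y-hat = {a_y:.3f} + {b_y:.3f}x")
```

In this example the point at x = 15 pulls the slope from 2 down to less than 0.1, so it is influential. The point at (3, 16) is an outlier but leaves the slope at 2 and only raises the line.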

Page 23

Outliers and Influential Observations

[Figure: scatterplot of heart attacks (y-axis, 0 to 350) against wine consumption (x-axis, 0 to 16), with one point labeled "Influential observation" and another labeled "Outlier"]

Page 24
Page 25

Beware extrapolation

Extrapolation is the use of a regression line for prediction far outside the range of values of the explanatory variable x that you used to obtain the line. Such predictions are often not accurate.

Example

Suppose Angela was 1.20 m tall on January 1st 1975 and 1.40 m tall on January 1st 1976. By extrapolation, estimate her height on January 1st 1977.

By extrapolation, it could be estimated that by January 1st 1977 she would have grown another 0.20 m, to 1.60 m tall. This, however, assumes that she continued to grow at the same rate. That assumption must eventually become false; otherwise, by January 1st 1980 she would be a giantess.
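The arithmetic of the Angela example can be written out as a short sketch. The function name `extrapolate` is an assumption for illustration; the two data points are those given above, and the year 1990 is added only to show how far the straight line drifts.

```python
def extrapolate(p1, p2, x):
    """Predict y at x from the straight line through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)  # growth per year
    return y1 + slope * (x - x1)

first = (1975, 1.20)   # (year, height in metres)
second = (1976, 1.40)

heights = {year: extrapolate(first, second, year) for year in (1977, 1980, 1990)}

for year, height in heights.items():
    print(f"January 1st {year}: predicted height {height:.2f} m")
```

The prediction for 1977 is 1.60 m, as in the example, but the same line gives 2.20 m for 1980 and 4.20 m for 1990, which shows why predictions far outside the observed range cannot be trusted.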

Page 26

Lurking variable

A lurking variable is a variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied.

Example: Studies of the relationship between treatment of heart disease and patients' gender show that women are, in general, treated less aggressively than men with similar symptoms. Women are less likely to undergo bypass operations.

Question: Might this be discrimination? Answer: Not necessarily. Be aware of the lurking variable: although half of heart disease victims are women, they are on average much older than male victims, so age rather than gender may account for part of the difference in treatment.

Page 27

Association does not imply causation

Example: Sales of rum and the number of Methodist ministers are positively correlated, but a large number of ministers does not encourage rum drinking.

Is there a lurking variable that influences both rum sales and the number of Methodist ministers?

In the previous example, both the sales of rum and the number of Methodist ministers were correlated with the number of people in the U.S. As the population increases, demand rises for both Methodist ministers and rum.
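A small sketch can show how a lurking variable produces a strong correlation between two variables that do not influence each other. All numbers below are invented for illustration and are not real figures for rum sales or ministers. Both series are built to grow with population; the code compares the raw correlation with the correlation that remains once the straight-line effect of population is removed from each series (that is, the correlation between the two sets of residuals).

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def residuals(xs, ys):
    """Residuals from the least-squares regression of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Invented data: population is the lurking variable driving both series.
population = [1, 2, 3, 4, 5, 6, 7, 8]
minister_noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
rum_noise = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
ministers = [3 * p + d for p, d in zip(population, minister_noise)]
rum_sales = [10 * p + d for p, d in zip(population, rum_noise)]

r_raw = correlation(ministers, rum_sales)
r_adjusted = correlation(residuals(population, ministers),
                         residuals(population, rum_sales))

print(f"correlation of ministers and rum sales: {r_raw:.3f}")
print(f"correlation after removing population:  {r_adjusted:.3f}")
```

With these invented numbers the raw correlation is above 0.99, while the correlation after accounting for population is close to zero (about -0.11): the association comes from the shared dependence on population, not from any link between ministers and rum.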