5-1 bivar. unit 5 correlation and regression: examining and modeling relationships between variables...


Post on 20-Dec-2015


TRANSCRIPT

Page 1: 5-1 bivar. Unit 5 Correlation and Regression: Examining and Modeling Relationships Between Variables Chapters 8 - 12 Outline: Two variables Scatter Diagrams

5-1bivar.

Unit 5 Correlation and Regression:

Examining and Modeling Relationships Between Variables

Chapters 8 - 12

Outline: Two variables

Scatter Diagrams to display bivariate data

Correlation: Concept, Interpretation, Computation, Cautions

Regression Model: Using a LINE to describe the relation between two variables & for prediction
• Finding "the" line
• Interpreting its coefficients

Residuals, Prediction Errors

Extensions of Simple Linear Regression

A.05

5-2bivar.

Four Scatter Diagrams

[Figure: four scatter diagrams. Axis pairs: # applicants and size of help wanted ad; cost per min. ($) and CUME rating; % delinquent and age of credit account (years); last year's sales ($1000) and entertain. expenses (x $100).]

5-3bivar.

Association

If there is STRONG ASSOCIATION between 2 variables, then knowing one helps a lot in predicting the other.

If there is WEAK ASSOCIATION between 2 variables, then information about one variable does not help much in predicting the other.

Usually, the INDEPENDENT variable is thought to influence the DEPENDENT variable.

[Figure labels: independent variable, dependent variable.]

5-4bivar.

Summarizing the Relationship Between Two Variables

1. Plot the points in a scatter diagram.

2. Find average for X and average for Y. Plot the point of averages.

3. Find SD(X), which measures horizontal spread of points, and SD(Y), which measures vertical spread of points.

4. Find the correlation coefficient (r), which measures the degree of clustering / spread of points about a line (the SD line).

[Figure: two scatter diagrams, each with axes X and Y.]

5-5bivar.

Wood Products Shipments and Employment, by state, 1989, excl. California

[Figure: scatter diagram. Axes: Employment x 100 and Shipments ($ million).]

5-6bivar.

Wood Products Data

Shipments ($ million)   Employment
469.8   7,900
246.4   4,400
205.4   2,800
186.5   3,600
175.8   3,800
142.9   2,100
139.7   2,400
120.6   1,900
118.0   1,500
104.3   1,500
89.9   1,600
73.5   1,500
72.6   1,400
71.4   1,200
53.9   800
52.4   1,400
50.1   1,200
48.1   1,400
47.0   1,100
36.7   800
27.4   500
27.3   400
22.9   300
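The summary steps (averages, SDs, correlation) can be sketched on the 23 rows listed above. This is a minimal illustration, assuming NumPy is available; the variable names are made up for the example, and the SDs use the n-1 divisor as an assumption about how the later summary output was computed.

```python
import numpy as np

# Wood products data as listed on the slide.
shipments = np.array([469.8, 246.4, 205.4, 186.5, 175.8, 142.9, 139.7, 120.6,
                      118.0, 104.3, 89.9, 73.5, 72.6, 71.4, 53.9, 52.4, 50.1,
                      48.1, 47.0, 36.7, 27.4, 27.3, 22.9])   # $ million
employment = np.array([7900, 4400, 2800, 3600, 3800, 2100, 2400, 1900, 1500,
                       1500, 1600, 1500, 1400, 1200, 800, 1400, 1200, 1400,
                       1100, 800, 500, 400, 300], dtype=float)
employ_x100 = employment / 100.0   # same units as the "Employment x 100" axis

# Point of averages.
avg_ship = shipments.mean()
avg_emp = employ_x100.mean()

# Spread in each direction (n-1 divisor: an assumption, see lead-in).
sd_ship = shipments.std(ddof=1)
sd_emp = employ_x100.std(ddof=1)

# Correlation coefficient.
r = np.corrcoef(employ_x100, shipments)[0, 1]

print(f"point of averages: ({avg_emp:.3f}, {avg_ship:.3f})")
print(f"SD(employ) = {sd_emp:.3f}, SD(shipment) = {sd_ship:.3f}, r = {r:.3f}")
```

The printed values can be compared with the "Wood Products" summary output later in the deck.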

5-7bivar.

Wood Products Shipments and Employment, by state, 1989, excl. California

[Figure: scatter diagram. Axes: Employment x 100 and Shipments ($ million).]


5-8bivar.


5-9bivar.

Linear Association

The correlation coefficient measures the LINEAR relationship between TWO variables.

It is a measure of LINEAR association or clustering around a line.

[Figure: six scatter diagrams, labeled: r = 1; r = -1; r near +1; r near -1; r positive, near 0; r negative, near 0.]


5-10bivar.

Interpretation of r

The closer the correlation coefficient is to 1 (or -1), the more tightly clustered the points are around a line (the SD line).

The SD line passes through all points which are an equal # of SD's away from the average for both variables.

[Figure: two scatter diagrams, labeled "positive association" and "negative association".]


5-11bivar.

Twelve Plots, with r

Look in your textbook, pages 127 and 129.


5-12bivar.

5-13bivar.

Computing the Correlation Coefficient, r

$$r = \frac{\frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{SD(X)\,SD(Y)} = \frac{\frac{1}{n}\sum (X_i Y_i) - \bar{X}\,\bar{Y}}{SD(X)\,SD(Y)} = \frac{\text{Covariance}(X,Y)}{SD(X)\,SD(Y)}$$

Convert each variable to standard units. The average of the products gives the correlation coefficient.

r = average of (z-score for X) (z-score for Y)
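The z-score recipe can be sketched in a few lines. This is an illustration only: the five data pairs are made up for the example, the function name is hypothetical, and the SDs divide by n so that the plain average of the products equals r.

```python
import numpy as np

def correlation_via_z_scores(x, y):
    """r as the average of (z-score for X) * (z-score for Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()   # np.std divides by n by default
    zy = (y - y.mean()) / y.std()
    return float(np.mean(zx * zy))

# Illustrative data, not taken from the slides.
x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]

r = correlation_via_z_scores(x, y)
r_check = float(np.corrcoef(x, y)[0, 1])   # library value for comparison
print(round(r, 3), round(r_check, 3))
```

The two printed numbers should agree, since the covariance form and the z-score form are the same formula rearranged.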

5-14bivar.

Example: Computation of r

Table columns: X, Y, $X - \bar{X}$, $(X - \bar{X})^2$, $Y - \bar{Y}$, $(Y - \bar{Y})^2$, z-score for X, z-score for Y, product

5-15bivar.

Some Cases When the Correlation Coefficient, r, Does Not Give A Good Indication of Clustering

[Figure: two scatter diagrams, one with axes X and Y and one with horizontal axis INDEP, labeled r = .155 and r = .536.]

5-16bivar.

[Figure: scatter diagram with axes BODY WEIGHT IN KG and BRAIN WEIGHT IN KG; r = .933 (36 data values).]

5-17bivar.

“No Elephants”

[Figure: scatter diagram with axes body weight in kg and brain weight in grams; r = .596.]

(r = .887, excluding dinosaurs, elephants, humans)

5-18bivar.

All brain data, log transformed

[Figure: scatter diagram with axes log (body weight) and log (brain weight); r = .856 (all data).]

5-19bivar.

[Figure: scatter diagram with axes COUPON and PRICE.]

r = .883 (all data)
r = .984 (without flower bonds)

(Siegel)

5-20bivar.

Interpretation of Empirical Association

1. Descriptive
Example: Height versus Weight

2. Causal
Example: Total Cost vs. Volume of Production

3. Nonsense
Example: Polio Incidence vs. Soft Drink Sales

5-21bivar.

Prediction Using Correlation

1. What is the best prediction of the dependent variable? What if the value of the independent variable is available?

2. What is the likely size of the prediction error?

Fundamental Principle of Prediction

1. Use the mean of the relevant group.

2. SD of the group gives the "likely size of error."


5-22bivar.

5-23bivar.

Diamond State Telephone Company

Demand for LINES versus Proposed MONTHLY charge per line ($)

[Figure: scatter diagram with axes MONTHLY and LINES.]

5-24bivar.

Look At The Vertical Strip Corresponding to the Given X Value

[Figure: scatter diagram with axes X and Y, showing a vertical strip.]

5-25bivar.

Graph of Averages

[Figure: scatter diagram with axes MONTHLY and LINES, with points of the graph of averages marked x.]

estimated LINES = 237.495 - 3.867 MONTHLY


5-26bivar.


5-27bivar.

Linearly Related Variables

The REGRESSION LINE is to a scatter diagram as the AVERAGE is to a list of numbers.

The regression line estimates the average values for the dependent variable, Y, corresponding to each value, x, of the independent variable.


5-28bivar.

Linearly Related Variables

If we have 2 variables, linearly related to one another, then knowing the value of one variable (for a particular individual) can help to estimate / predict the value of the other variable.

• If we know nothing re. the value of the independent variable (X), then we estimate the value of the dependent variable to be the OVERALL AVERAGE of the dependent variable (Y).

• If we know that the independent variable (X) has a particular value for a given individual, then we can take a "more educated guess" at the value of the dependent variable (Y).

5-29bivar.

Regression and SD Lines

The REGRESSION LINE for modeling the relation between X (independent variable) and Y (dependent variable) passes through the POINT OF AVERAGES and has slope

$$r\,\frac{SD_Y}{SD_X}$$

That is, associated with each increase of one SD in X, there is an increase of r SD’s in Y, on the average.

The SD LINE for modeling the relation between X (independent variable) and Y (dependent variable) passes through the POINT OF AVERAGES and has slope

$$\frac{SD_Y}{SD_X}$$

(taken with a negative sign when r is negative).

5-30bivar.

Estimating the Intercept and Slope of the Regression Line

The REGRESSION LINE for modeling the relation between X (independent variable) and Y (dependent variable) is also known as the REGRESSION LINE for predicting Y from X, and has the form

$$\hat{Y} = a + b\,x = \text{intercept} + \text{slope} \cdot x.$$

Here,

b = slope = r SD(Y) / SD(X)

a = intercept = avg(Y) - b avg(X) = avg(Y) - r [SD(Y) / SD(X)] avg(X)
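As a rough check of these two formulas, the sketch below computes a and b from five summary numbers and compares them with a least-squares fit. The data and the function name `regression_line_from_summary` are illustrative assumptions, not part of the course material.

```python
import numpy as np

def regression_line_from_summary(avg_x, avg_y, sd_x, sd_y, r):
    """Return (intercept a, slope b) for predicting Y from X."""
    b = r * sd_y / sd_x
    a = avg_y - b * avg_x
    return a, b

# Illustrative data, not taken from the slides.
x = np.array([1.0, 3.0, 4.0, 5.0, 7.0])
y = np.array([5.0, 9.0, 7.0, 1.0, 13.0])

r = np.corrcoef(x, y)[0, 1]
a, b = regression_line_from_summary(x.mean(), y.mean(), x.std(), y.std(), r)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
fit_slope, fit_intercept = np.polyfit(x, y, 1)
print(a, b, fit_intercept, fit_slope)
```

Either SD convention (dividing by n or by n-1) gives the same slope here, as long as the same one is used for both X and Y, because only the ratio SD(Y)/SD(X) enters.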

5-31bivar.

Prediction from a Regression Model

Predicted value of Y corresponding to a given value of X is

$$\hat{Y} = a + bX$$

$$= \left(\bar{Y} - r\,\frac{SD_Y}{SD_X}\,\bar{X}\right) + \left(r\,\frac{SD_Y}{SD_X}\right)X$$

$$= \bar{Y} - (\bar{X} - X)\left(r\,\frac{SD_Y}{SD_X}\right)$$

$$= \bar{Y} - (\bar{X} - X)\,(\text{slope})$$


5-32bivar.

5-33bivar.

TOTAL OBSERVATIONS: 21

                 LINES     MONTHLY
N OF CASES          21          21
MINIMUM        105.000      10.320
MAXIMUM        201.000      34.000
MEAN           154.048      21.581
VARIANCE      1122.648      69.623
STANDARD DEV    33.506       8.344

PEARSON CORRELATION MATRIX

             LINES     MONTHLY
LINES        1.000
MONTHLY     -0.963       1.000

NUMBER OF OBSERVATIONS: 21

5-34bivar.

Diamond State Questions

In the Diamond State Telephone Company example,

avg (LINES) = 154.048    SD (LINES) = 33.506
avg (MONTHLY) = 21.581    SD (MONTHLY) = 8.344

r = -0.963

What are the coordinates for the point of averages?

What is the slope of the regression line?

Suppose the MONTHLY charge was set at $25.00. What would you estimate to be the demand for # LINES from the 62 new businesses?

Suppose the MONTHLY charge was set at $15.00. What would you estimate to be the demand for # LINES from the 62 new businesses?
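One way these questions could be worked is sketched below, using only the five summary values quoted on this slide. The helper name `predict_lines` is hypothetical, and the results are rounded estimates from the regression line, not an official answer key.

```python
# Summary statistics quoted in the Diamond State example.
avg_lines, sd_lines = 154.048, 33.506
avg_monthly, sd_monthly = 21.581, 8.344
r = -0.963

# Point of averages: (avg of X, avg of Y).
point_of_averages = (avg_monthly, avg_lines)

# Slope and intercept of the regression line for predicting LINES from MONTHLY.
slope = r * sd_lines / sd_monthly
intercept = avg_lines - slope * avg_monthly

def predict_lines(monthly_charge):
    """Regression estimate of LINES at a given MONTHLY charge."""
    return intercept + slope * monthly_charge

print("point of averages:", point_of_averages)
print("slope:", round(slope, 3))
for charge in (25.00, 15.00):
    print(f"MONTHLY = ${charge:.2f}: estimated LINES = {predict_lines(charge):.1f}")
```

Because the slope is negative, the estimate at $15.00 sits above the average of LINES and the estimate at $25.00 sits below it.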

5-35bivar.

Another Diamond State Question

Suppose the MONTHLY charge was set at $50.00. What would you estimate to be the demand for # LINES from the 62 new businesses?
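Before plugging $50.00 into the line, it may help to compare it with the observed MONTHLY values, which the summary output reports as running from 10.320 to 34.000. The sketch below is illustrative: it uses the fitted equation quoted earlier (estimated LINES = 237.495 - 3.867 MONTHLY), and the function name is an assumption.

```python
# Fitted line and observed range of MONTHLY, as reported in the deck.
INTERCEPT, SLOPE = 237.495, -3.867
MONTHLY_MIN, MONTHLY_MAX = 10.320, 34.000

def predict_with_range_check(monthly_charge):
    """Return (estimate, inside_observed_range) for a proposed charge."""
    estimate = INTERCEPT + SLOPE * monthly_charge
    inside = MONTHLY_MIN <= monthly_charge <= MONTHLY_MAX
    return estimate, inside

estimate, inside = predict_with_range_check(50.00)
print(f"estimated LINES = {estimate:.1f}, inside observed range: {inside}")
```

A charge of $50.00 lies well outside the data, so the line is being extrapolated and the estimate deserves much less trust than the ones at $15.00 or $25.00.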

5-36bivar.

Regression Computer Output

Regression

DEP VAR: LINES    N: 21    MULTIPLE R: 0.963    SQUARED MULTIPLE R: 0.927
ADJ SQRD MULTIPLE R: 0.923    STANDARD ERROR OF ESTIMATE: 9.273

VARIABLE      COEFF   STD ERROR   STD COEF   TOLERANCE         T   P(2 TAIL)
CONSTANT    237.495       5.732      0.000           .    41.432       0.000
MONTHLY      -3.867       0.249     -0.963       1.000   -15.560       0.000

ANALYSIS OF VARIANCE

SOURCE       SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO       P
REGRESSION        20819.092    1     20819.092   242.103   0.000
RESIDUAL           1633.860   19        85.993

5-37bivar.

Interpreting the Regression Coefficients


5-38bivar.

Other Examples

1. X = Educational expenditure Y = Test scores

2. X = Height of a person Y = Weight of the person

3. X = # Service years of an automobile Y = Operating cost per year

4. X = Total weight of mail bags Y = # Mail orders

5. X = Price of product Y = Unit sales

6. X = Volume Y = Total cost of production

7. X = Calories in a candy bar Y = Grams of fat in the candy bar

8. X = Baseball slugging percentage Y = Player salary

9. X = Weight of a diamond Y = Price of the diamond

10.

11.

12.

5-39bivar.

Wood Products

TOTAL OBSERVATIONS: 23

               SHIPMENT     EMPLOY
N OF CASES           23         23
MINIMUM          22.900      3.000
MAXIMUM         469.800     79.000
MEAN            112.287     19.783
VARIANCE       9931.683    281.087
STANDARD DEV     99.658     16.766

Pearson Correlation Matrix

            SHIPMENT    EMPLOY
SHIPMENT        1.00
EMPLOY         0.979      1.00

Number of Observations: 23


5-40bivar.

5-41bivar.

[Figure: scatter diagram with fitted line; y = SHIPMENT, x = EMPLOY.]

5-42bivar.

[Figure: scatter diagram with fitted line; y = EMPLOY, x = SHIPMENT.]

5-43bivar.

Computer Output - 1

DEP VAR: SHIPMENT    N: 23    MULT R: 0.979    SQRD MULT R: 0.958

ADJ SQRD MULTIPLE R: 0.956    STD ERROR OF ESTIMATE: 21.018

VARIABLE     COEFF   STD ERROR   STD COEF   TOLER        T   P(2 TAIL)
CONSTANT    -2.781       6.868      0.000       .   -0.405       0.690
EMPLOY       5.817       0.267      0.979   1.000   21.763       0.000

ANALYSIS OF VARIANCE

SOURCE       SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO       P
REGRESSION       209220.316    1     209220.31   473.619   0.000
RESIDUAL           9276.710   21       441.748

5-44bivar.

Computer Output - 2

DEP VAR: EMPLOY    N: 23    MULT R: 0.979    SQRD MULT R: 0.958

ADJ SQRD MULT R: 0.956    STD ERROR OF ESTIMATE: 3.536

VARIABLE     COEFF   STD ERROR   STD COEF   TOLER        T   P(2 TAIL)
CONSTANT     1.298       1.125      0.000       .    1.154       0.262
SHIPMENT     0.165       0.008      0.979   1.000   21.763       0.000

ANALYSIS OF VARIANCE

SOURCE       SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO       P
REGRESSION         5921.363    1      5921.363   473.619   0.000
RESIDUAL            262.550   21        12.502
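The two outputs describe two different lines: one predicts SHIPMENT from EMPLOY, the other predicts EMPLOY from SHIPMENT. Since the first slope is r SD(Y)/SD(X) and the second is r SD(X)/SD(Y), their product should be about r squared. The short sketch below checks this with the printed coefficients; the variable names are illustrative, and small differences come from rounding in the output.

```python
# Coefficients as printed in Computer Output 1 and 2.
slope_ship_on_employ = 5.817   # SHIPMENT predicted from EMPLOY
slope_employ_on_ship = 0.165   # EMPLOY predicted from SHIPMENT
r = 0.979                      # MULT R in both outputs

product_of_slopes = slope_ship_on_employ * slope_employ_on_ship
r_squared = r ** 2

# Inverting the first line would give slope 1/5.817, which is NOT 0.165:
slope_if_inverted = 1.0 / slope_ship_on_employ

print(round(product_of_slopes, 3), round(r_squared, 3))
print(round(slope_if_inverted, 3), slope_employ_on_ship)
```

The second print shows that the regression of EMPLOY on SHIPMENT is not simply the first line solved for EMPLOY; the two lines coincide only when r is +1 or -1.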


5-45bivar.

Insurance Availability in Chicago


5-46bivar.

Chicago Plots

5-47bivar.

Chicago Insurance, cont.

For cases with income less than or equal to $15,000,

avg (Voluntary) = 6.376    SD (Voluntary) = 3.959
avg (Income) = $10,332.756    SD (Income) = $2,109.819
r = 0.896

Derive the equation for the regression line.

According to this linear model, what is the estimated value for "Voluntary" in a ZIP code area with Income $12,000?... with Income $9,500?
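A possible way to work this exercise from the five summary numbers is sketched below. The function name `estimate_voluntary` is hypothetical, and the printed values are rounded regression estimates under the linear model, offered as an illustration rather than a definitive solution.

```python
# Summary statistics quoted for ZIP code areas with income <= $15,000.
avg_voluntary, sd_voluntary = 6.376, 3.959
avg_income, sd_income = 10332.756, 2109.819
r = 0.896

# Regression line for predicting Voluntary from Income.
slope = r * sd_voluntary / sd_income          # change in Voluntary per $1 of Income
intercept = avg_voluntary - slope * avg_income

def estimate_voluntary(income):
    """Regression estimate of Voluntary at a given Income."""
    return intercept + slope * income

print(f"Voluntary = {intercept:.3f} + {slope:.6f} * Income")
for income in (12000, 9500):
    print(f"Income ${income}: estimated Voluntary = {estimate_voluntary(income):.2f}")
```

Both incomes fall inside the "income less than or equal to $15,000" group, so the line is being used within the range it was fitted on.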


5-48bivar.

blank


5-49bivar.

Regression Effect

In virtually all test-retest situations, the bottom group on the first test will, on average, show some improvement on the 2nd test, and the top group will, on average, fall back.

This is called the REGRESSION EFFECT.

The REGRESSION FALLACY is thinking that the regression effect must be due to something important, not just the spread of points around the line.
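The regression effect can be reproduced with a small simulation in which nothing at all changes between the two tests. The setup below is an assumption made for illustration: each score is a fixed ability plus independent chance error, with made-up means and SDs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Each person has a fixed ability; each test adds independent chance error.
ability = rng.normal(100, 10, n)
test1 = ability + rng.normal(0, 10, n)
test2 = ability + rng.normal(0, 10, n)

# Bottom and top 10% groups, chosen by their FIRST test score.
bottom = test1 <= np.percentile(test1, 10)
top = test1 >= np.percentile(test1, 90)

bottom_change = (test2[bottom] - test1[bottom]).mean()
top_change = (test2[top] - test1[top]).mean()

print(f"bottom group average change: {bottom_change:+.1f}")
print(f"top group average change:    {top_change:+.1f}")
```

The bottom group improves and the top group falls back on average, even though no one's ability changed. Attributing those shifts to some real cause would be the regression fallacy.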


5-50bivar.

blank


5-51bivar.

Residuals

Regression methods allow us to estimate the average value of the dependent variable for each value of the independent variable.

Individuals will differ somewhat from the regression estimates.

How much?

5-52bivar.

blank

Country         Economic   Birth Rate
Algeria                2           48
Argentina             19           21
Denmark               34           14
Germany               40           11
Guatemala              8           41
India                 12           37
Ireland               20           22
Jamaica               20           31
Japan                 37           19
Philippines           19           42
United States         30           15
Russia                46           18

[Figure label: Algeria]

5-53bivar.

Residuals

Prediction error = actual - predicted
= vertical distance from the point to the regression line

5-54bivar.

Residuals for Economically Active Women and Crude Birth Rates

Country         Economic   Birth Rate   Regr. Estim.   Residual
Algeria                2           48           44.1        3.9
Argentina             19           21           30.5       -9.5
Denmark               34           14           18.5       -4.5
Germany               40           11           13.7       -2.7
Guatemala              8           41           39.3        1.7
India                 12           37           36.1        0.9
Ireland               20           22           29.7       -7.7
Jamaica               20           31           29.7        1.3
Japan                 37           19           16.1        2.9
Philippines           19           42           30.5       11.5
United States         30           15           21.7       -6.7
Russia                46           18            8.9        9.1
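The "Regr. Estim." column is consistent with the line 45.7 - 0.8 x (Economic). That equation is not stated in the slides; it is inferred here from the tabulated estimates and should be read as an assumption. With it, the residual column can be reproduced as actual minus predicted:

```python
# (country, percent economically active women, crude birth rate) from the table.
data = [
    ("Algeria", 2, 48), ("Argentina", 19, 21), ("Denmark", 34, 14),
    ("Germany", 40, 11), ("Guatemala", 8, 41), ("India", 12, 37),
    ("Ireland", 20, 22), ("Jamaica", 20, 31), ("Japan", 37, 19),
    ("Philippines", 19, 42), ("United States", 30, 15), ("Russia", 46, 18),
]

# Line inferred from the "Regr. Estim." column (an assumption, see lead-in).
INTERCEPT, SLOPE = 45.7, -0.8

residuals = {}
for country, economic, birth_rate in data:
    predicted = INTERCEPT + SLOPE * economic
    residuals[country] = birth_rate - predicted   # actual - predicted
    print(f"{country:14s} predicted {predicted:5.1f}  residual {residuals[country]:+5.1f}")

total = sum(residuals.values())
print(f"sum of residuals: {total:+.1f}")
```

Positive residuals are points above the line (Philippines, Russia); negative residuals are points below it (Argentina, Ireland).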

5-55bivar.

Residual Plots

A residual plot should NOT look systematic (no trend or pattern): just a cloud of points around the horizontal axis.

Problem plots also can tell us something about the data.

5-56bivar.

Residual Plot for Economically Active Women and Crude Birth Rates

[Figure: Residuals plotted against Percent Economically Active Women.]

5-57bivar.

Chicago Insurance Case Residual Plot (versus Income)

5-58bivar.

The Least Squares Property of the Regression Line

Of all lines, the regression line is the one which has the smallest sum of squared residuals (and also the smallest rms error).

Thus, it is The Least Squares Line.
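A quick numerical illustration of the least squares property: fit the regression line, then nudge its intercept and slope and confirm that every nudged line has a larger sum of squared residuals. The data and the size of the nudges are arbitrary choices made for this sketch.

```python
import numpy as np

# Illustrative data, not taken from the slides.
x = np.array([1.0, 3.0, 4.0, 5.0, 7.0])
y = np.array([5.0, 9.0, 7.0, 1.0, 13.0])

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(x, y, 1)

def sum_sq_residuals(a, b):
    """Sum of squared residuals for the line y = a + b x."""
    return float(np.sum((y - (a + b * x)) ** 2))

best = sum_sq_residuals(intercept, slope)

# Every other line tried here should do worse.
others = [
    sum_sq_residuals(intercept + da, slope + db)
    for da in (-1.0, 0.0, 1.0)
    for db in (-0.5, 0.0, 0.5)
    if (da, db) != (0.0, 0.0)
]
print(best, min(others))
```

Since the rms error is the square root of the sum of squared residuals divided by n, the line with the smallest sum also has the smallest rms error.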

5-59bivar.

Look at the Scatter Diagram Before Fitting a Regression Model!

For each of the following data sets, the regression equation is

$$\hat{Y} = 3.0 + 0.5\,X \quad \text{and} \quad r = 0.82$$

Sorry, I didn’t scan in these plots yet.
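The summary values on this slide match the well-known Anscombe quartet. Assuming that is the intended data (the slide does not say so), the sketch below fits the first two of its four data sets and shows that they share the same line and correlation despite looking very different when plotted.

```python
import numpy as np

# First two Anscombe data sets (assumed to be the ones meant on this slide).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

def fit_summary(x, y):
    """Return (intercept, slope, r) for the regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return float(intercept), float(slope), float(r)

summary1 = fit_summary(x, y1)
summary2 = fit_summary(x, y2)
print("set 1: intercept %.2f, slope %.2f, r %.2f" % summary1)
print("set 2: intercept %.2f, slope %.2f, r %.2f" % summary2)
```

In the first set the points scatter around a line; in the second they follow a smooth curve, which the identical summary numbers give no hint of. Only the scatter diagram reveals the difference.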


5-60bivar.

blank


5-61bivar.

How Big Are The Residuals ?

R.M.S. Error of the Regression Line:

The rms error of the regression line says how far typical points are above or below the regression line.

Standard Deviation of Y:

The SD of Y says how far typical points are above or below a horizontal line through the average of y.

In other words, the SD of y is the rms error for predicting y by its average, just ignoring the x-values.


5-62bivar.

How Big Are The Residuals ?

The overall size of the residuals is measured by computing their standard deviation.

The average of the residuals is zero.

Computing the rms error of the regression line:

The rms error of the regression line estimating Y from X can be figured as

$$\sqrt{1 - r^2}\; SD(Y)$$

Note that here Y is the dependent variable!

The rms error is to the regression line as the SD is to the average.

5-63bivar.

How Big Are the Residuals?

Recall the First-Order Linear Model:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

$$\varepsilon = (\text{actual Y-value}) - (\text{predicted Y-value})$$

ε = prediction error = residual

The mean of the residuals is zero. The SD of the residuals is also known as the "root mean squared error of the regression line" (rms error).

5-64bivar.

The overall size of the residuals is measured by computing their standard deviation.

The rms error is to the regression line as the SD is to the average.

Computing the rms error:

The rms error of the regression line estimating Y from X can be figured as

$$\sqrt{1 - r^2}\; SD(Y)$$

Notes:

1. rms error = $\sqrt{1 - r^2}\; SD(Y) \le SD(Y)$

2. Here Y is the dependent variable!

3. Here we are dividing by n, rather than n-2.
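The shortcut can be checked against a direct computation: fit the line, take the residuals, and compare their root mean square with the square root of (1 - r squared) times SD(Y). The data below are illustrative, and both quantities divide by n, as in note 3.

```python
import numpy as np

# Illustrative data, not taken from the slides.
x = np.array([1.0, 3.0, 4.0, 5.0, 7.0])
y = np.array([5.0, 9.0, 7.0, 1.0, 13.0])

# Regression line for predicting y from x, and its residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Direct computation: root of the mean of the squared residuals (divide by n).
rms_direct = float(np.sqrt(np.mean(residuals ** 2)))

# Shortcut from the slide: sqrt(1 - r^2) * SD(Y), with SD dividing by n.
r = np.corrcoef(x, y)[0, 1]
rms_shortcut = float(np.sqrt(1 - r ** 2) * y.std())

print(rms_direct, rms_shortcut, y.std())
```

The first two numbers agree, and both are no larger than SD(Y), in line with note 1. Regression software that divides by n-2 (the "standard error of estimate") will report a somewhat larger value.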


5-65bivar.

Looking At Vertical Strips

5-66bivar.

Looking At Vertical Strips

For an oval cloud of points, the points in a vertical strip are off the regression line (up and down) by amounts similar in size to the rms error of the regression line.

If the diagram is heteroscedastic, the rms error should not be used for individual strips.

5-67bivar.

Using the Normal Curve Inside A Vertical Strip

For an oval cloud of points, the SD within a vertical strip is about equal to the rms error of the regression line.


5-68bivar.

blank

5-69bivar.

Uses for r:

(1) Describes the clustering of the scatter diagram around the SD line, relative to the SD's

(2) Says how the average value of y depends on x: a rise of r SD(Y) for each run of 1 SD(X)

$$\frac{r\,SD(Y)}{1\,SD(X)}$$

(3) Gives the accuracy of the regression estimates (the SD of the prediction errors) via the rms error for the regression line

$$\sqrt{1 - r^2}\; SD(Y)$$

5-70bivar.

Coefficient of Determination

How much of the variation of Y has been explained by X?

(How much better are we at predicting Y when we do know the value of X?)

Compare

$$Var(Y - \bar{Y}) \quad \text{versus} \quad Var(Y - \hat{Y})$$

$$Var(Y) \quad \text{versus} \quad (1 - r^2)\,Var(Y)$$

Thus, the proportion of the variation of Y which is NOT explained by X is

$$\frac{Var(Y - \hat{Y})}{Var(Y)} = \frac{(1 - r^2)\,Var(Y)}{Var(Y)} = 1 - r^2$$

And the proportion of the variation of Y which IS explained by X is

$$r^2$$
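The same split can be read off the ANALYSIS OF VARIANCE table in the Diamond State regression output, where the regression and residual sums of squares add up to the total variation in LINES. The sketch below uses the printed values; the variable names are illustrative.

```python
# Sums of squares printed in the Diamond State regression output.
ss_regression = 20819.092
ss_residual = 1633.860
ss_total = ss_regression + ss_residual

# Proportion of the variation in LINES explained / not explained by MONTHLY.
explained = ss_regression / ss_total
unexplained = ss_residual / ss_total

r = -0.963   # correlation between LINES and MONTHLY
print(round(explained, 3), round(r ** 2, 3), round(unexplained, 3))
```

The explained proportion matches r squared (and the SQUARED MULTIPLE R line of the output) up to rounding. The sign of r plays no role here.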