
Upload: dwain-daniels

Post on 16-Dec-2015


Page 1: Evaluating Theoretical Models

Evaluating Theoretical Models

• R-squared represents the proportion of the variance in Y that is accounted for by the model.

• When the model doesn’t do any better than guessing the mean (i.e., if we assume X does not cause Y), R² will equal zero. When the model is perfect (i.e., it accounts for the data perfectly), R² will equal 1.00.

$R^2 = 1 - \frac{VAR_{error}}{VAR_Y}$
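As a rough illustration of the definition above, the following Python sketch computes R² as one minus the ratio of the average squared error to the variance of Y. The function names `variance` and `r_squared` and the four scores are made up for this illustration; they do not come from the slides.

```python
def variance(values):
    """Population variance: the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def r_squared(y, y_hat):
    """R^2 = 1 - VAR_error / VAR_Y, with VAR_error as the average squared error."""
    errors = [obs - pred for obs, pred in zip(y, y_hat)]
    var_error = sum(e ** 2 for e in errors) / len(errors)
    return 1 - var_error / variance(y)


# Hypothetical scores, used only to show the two end-points.
y = [2.0, 4.0, 6.0, 8.0]
mean_y = sum(y) / len(y)

print(r_squared(y, y))                   # perfect predictions
print(r_squared(y, [mean_y] * len(y)))   # guessing the mean for everyone
```

The first call uses predictions that match the data exactly, and the second predicts the mean for every case, which are the two situations described in the bullets.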

Page 2

Why is R² useful?

• R² is useful because it is a standard metric for interpreting model fit.

– It doesn’t matter how large the variance of Y is, because everything is evaluated relative to the variance of Y.

– Fixed end-points: 1 is a perfect fit, and 0 means the model does no better than guessing the mean.

Page 3

Why is R² useful?

• Finally, and importantly, we can begin to compare the relative fit of alternative models.

• Why is this useful?

• When we began our discussion of modeling, we noted that there are ways to estimate parameter values, assuming the basic model is correct.

• Now, we can begin to address the question of whether the basic model is correct (or, more specifically, how good it is) by studying the model’s R² and comparing it to the R² of competing models.

Page 4

Example

• Data

Person    x      y
1        -2   -11.6
2        -1    -4.4
3         0     1.0
4         1     0.4
5         2    -3.6

Page 5

Model with no x

• The most basic model we can study is one in which Y-hat = M_Y, the mean of Y.

• Recall that the predicted values yield a horizontal line centered at the mean of Y (about -4 in this example).

Page 6

Model with no x

• The variance of Y is 18 (rounded)

• The dotted lines here represent the error in prediction

• If we square these errors, we find the average squared error to be approximately 18

• Thus, R² for this model is 1 – (18/18), or 0.
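The arithmetic for the no-X model can be sketched in Python as follows. This is only an illustration using the y values as transcribed on the data slide; the variable names are invented here. Computed from those transcribed values, the mean comes out near -3.64 and the variance near 20.4, so the intermediate numbers differ somewhat from the rounded figures quoted on the slide, but R² is 0 either way because the average squared error and the variance of Y are the same quantity for this model.

```python
# Example data as transcribed from the data slide.
y = [-11.6, -4.4, 1.0, 0.4, -3.6]

mean_y = sum(y) / len(y)
y_hat = [mean_y] * len(y)          # Y-hat = M_Y for every person

# Errors of prediction and their average squared size.
errors = [obs - pred for obs, pred in zip(y, y_hat)]
avg_sq_error = sum(e ** 2 for e in errors) / len(y)

# Variance of Y (average squared deviation from the mean).
var_y = sum((obs - mean_y) ** 2 for obs in y) / len(y)

r2 = 1 - avg_sq_error / var_y
print(mean_y, var_y, avg_sq_error, r2)
```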

Page 7

Model with linear term

• Next, let’s see what happens if we study a linear model of form Y-hat = a + bX

• The average squared error in this example is 10.07

• R² is .44 (1 – (10/18)). The linear model accounts for 44% of the variance in Y.
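A minimal sketch of the linear fit, assuming ordinary least squares is the estimation method (the slides do not say how a and b were obtained) and using the data as transcribed on the data slide. With these inputs the R² comes out close to, but not exactly, the .44 quoted above, which presumably reflects rounding in the slide figures.

```python
x = [-2, -1, 0, 1, 2]
y = [-11.6, -4.4, 1.0, 0.4, -3.6]
n = len(y)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for Y-hat = a + bX.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sxy / sxx
a = mean_y - b * mean_x

# Average squared error of the fitted line, relative to the variance of Y.
y_hat = [a + b * xi for xi in x]
avg_sq_error = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / n
var_y = sum((yi - mean_y) ** 2 for yi in y) / n
r2 = 1 - avg_sq_error / var_y

print(round(a, 2), round(b, 2), round(r2, 2))
```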

Page 8

Model with a quadratic term

• Next, let’s see what happens if we study a model of form Y-hat = a + bX²

• The average squared error in this example is approximately 8.

• R² is .55 (1 – (8/18)). The quadratic model accounts for 55% of the variance in Y (11 percentage points more than the linear model).

Page 9

Model with linear and quadratic terms

• Next, let’s see what happens if we study a linear + quad model of form Y-hat = a + bX + cX²

• The average squared error in this case is about .10.

• R² is .99 (1 – (.10/18)). The linear + quadratic model accounts for 99% of the variance in Y (44 percentage points more than the quadratic model alone).

$\hat{Y} = .5 + 2X - 2X^2$

Page 10

Summary of model comparisons

• Summary of the fit statistics for the various models

Model            R²
No X            .00
Linear          .44
Quadratic       .55
Linear + Quad   .99
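The comparison in this table can be reproduced approximately with the sketch below. It assumes each model is fitted by ordinary least squares, solved here through the normal equations with a small hand-written Gaussian elimination; the helper names `solve` and `fit_r2` are invented for this example. Because it uses the data as transcribed on the data slide, the values land near the table entries rather than matching them digit for digit. The ratio of sums of squares is used in place of the ratio of variances, which is the same thing since the 1/N factors cancel.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_r2(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns; returns R^2."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    y_hat = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_error / ss_total


x = [-2, -1, 0, 1, 2]
y = [-11.6, -4.4, 1.0, 0.4, -3.6]
x_sq = [xi ** 2 for xi in x]

models = {
    "No X": [],
    "Linear": [x],
    "Quadratic": [x_sq],
    "Linear + Quad": [x, x_sq],
}
results = {name: fit_r2(cols, y) for name, cols in models.items()}
for name, r2 in results.items():
    print(f"{name:<14} {r2:.3f}")
```

Keeping the predictors as a list of columns means a further competing model can be added to `models` without touching the fitting code.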

Page 11

Summary

• So, it looks like the model that combines the linear and the quadratic terms is the best of the four models that we studied. It accounts for the data almost perfectly (99% of the variance in Y was explained by the model).

• Note: Even if the model does a decent job at explaining the variation in Y, it isn’t proper to conclude that it is correct.

• It might be the best model of those that were articulated, even if it is not literally correct.

Page 12

Residual term

• The part of Y that is unexplained by the model is called residual or error variance, and is often represented as an explicit variable in the model.

• This variable is often called the residual or error term, and is typically denoted by the Greek letter epsilon (ε) or the Roman letter E. The variance of the residual scores, expressed relative to the variance of Y, is the proportion of variance in Y that is unexplained by the model. If the model is good, the residual variance will be very small.

$Y = a + bX + E$

Page 13

Residual Term

• DATA = MODEL + RESIDUAL

$Y = a + bX + E$

$Y = \hat{Y} + E$
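The decomposition DATA = MODEL + RESIDUAL can be checked numerically. The sketch below is an illustration only: it assumes a least-squares linear fit to the data from the data slide, and the names `model`, `residual`, and `rebuilt` are invented here. It also shows a property that holds for least-squares fits specifically: the variance of Y splits into the variance of the predicted scores plus the variance of the residuals.

```python
x = [-2, -1, 0, 1, 2]
y = [-11.6, -4.4, 1.0, 0.4, -3.6]
n = len(y)


def variance(values):
    """Population variance: the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Least-squares estimates for Y-hat = a + bX.
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sxy / sxx
a = mean_y - b * mean_x

model = [a + b * xi for xi in x]                     # Y-hat
residual = [yi - mi for yi, mi in zip(y, model)]     # E = Y - Y-hat

# DATA = MODEL + RESIDUAL holds for every person.
rebuilt = [mi + ei for mi, ei in zip(model, residual)]

# For a least-squares fit the variances add up as well.
print(variance(y), variance(model) + variance(residual))
```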

Page 14

Y = a + bX + cX² + E, with a = .5, b = 2, c = -2

Person     Y      a     bX       cX²       E
A       -11.6    .5    2(-2)   -2(-2)²   -0.1
B        -4.4    .5    2(-1)   -2(-1)²   -0.9
C         1.0    .5    2(0)    -2(0)²     0.5
D         0.4    .5    2(1)    -2(1)²    -0.1
E        -3.6    .5    2(2)    -2(2)²    -0.1

E is approximate
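The residual column can be recomputed from the other columns. The sketch below assumes the rounded coefficients a = .5, b = 2, and c = -2 (the values consistent with the residuals listed) and takes the y values from the data slide; the variable names are invented for the illustration.

```python
x = [-2, -1, 0, 1, 2]
y = [-11.6, -4.4, 1.0, 0.4, -3.6]
a, b, c = 0.5, 2.0, -2.0     # rounded coefficients, assumed from the slide

# Predicted score for each person, then E = Y - Y-hat rounded to one decimal.
y_hat = [a + b * xi + c * xi ** 2 for xi in x]
residuals = [round(yi - pi, 1) for yi, pi in zip(y, y_hat)]

for person, yi, pi, ei in zip("ABCDE", y, y_hat, residuals):
    print(person, yi, pi, ei)
```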

Page 15

• In the next class we will discuss three reasons why the error variance is greater than zero:

– errors of measurement

– sampling error

– incorrect model