Variable selection and model building Part II

Upload: randolf-joseph

Post on 12-Jan-2016

Page 1: Variable selection and model building Part II. Statement of situation A common situation is that there is a large set of candidate predictor variables

Variable selection and model building

Part II

Statement of situation

• A common situation is that there is a large set of candidate predictor variables.

• (Note: The examples herein are not really that large.)

• Goal is to choose a small subset from the larger set so that the resulting regression model is simple and useful:

– provides a good summary of the trend in the response

– and/or provides good predictions of response

– and/or provides good estimates of slope coefficients

Two basic methods of selecting predictors

• Stepwise regression: Enter and remove predictors, in a stepwise manner, until there is no justifiable reason to enter or remove more.

• Best subsets regression: Select the subset of predictors that does the best at meeting some well-defined objective criterion.

Two cautions!

• The list of candidate predictor variables must include all the variables that actually predict the response.

• There is no single criterion that will always be the best measure of the “best” regression equation.

Best subsets regression

… or all possible subsets regression

Best subsets regression

• Consider all of the possible regression models from all of the possible combinations of the candidate predictors.

• Identify, for further evaluation, models with a subset of predictors that do the “best” at meeting some well-defined criteria.

• Further evaluate the models identified in the last step. Fine-tune the final model.
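The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criterion used here is the error sum of squares within each subset size, the data are made up for the example (they are not the cement data), and the helper names `solve`, `sse`, and `best_subsets` are hypothetical.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def sse(columns, y):
    """Error sum of squares from regressing y on an intercept plus `columns`."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # least squares estimates via the normal equations
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def best_subsets(data, y, keep=2):
    """For each subset size, return the `keep` subsets with the smallest SSE."""
    names = sorted(data)
    best = {}
    for size in range(1, len(names) + 1):
        fits = sorted((sse([data[n] for n in subset], y), subset)
                      for subset in combinations(names, size))
        best[size] = fits[:keep]
    return best

# Made-up illustration data: y is generated from a and b only; c plays no role.
data = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 1, 4, 3, 6, 5, 8, 7],
    "c": [3, 1, 4, 1, 5, 9, 2, 6],
}
noise = [0.1, -0.2, 0.0, 0.2, -0.1, 0.1, -0.1, 0.0]
y = [2 * a + 3 * b + e for a, b, e in zip(data["a"], data["b"], noise)]

result = best_subsets(data, y)
for size, fits in result.items():
    print(size, [(subset, round(value, 3)) for value, subset in fits])
```

The models listed for each size are only candidates; they still need the further evaluation and fine-tuning described in the last step.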

Example: Cement data

• Response y: heat evolved in calories during hardening of cement on a per gram basis

• Predictor x1: % of tricalcium aluminate

• Predictor x2: % of tricalcium silicate

• Predictor x3: % of tetracalcium alumino ferrite

• Predictor x4: % of dicalcium silicate

Example: Cement data

Why best subsets regression?

# of predictors (p-1)    # of regression models
1                        2: ( ) (x1)
2                        4: ( ) (x1) (x2) (x1, x2)
3                        8: ( ) (x1) (x2) (x3) (x1, x2) (x1, x3) (x2, x3) (x1, x2, x3)
4                        16: 1 none, 4 one, 6 two, 4 three, 1 four

Why best subsets regression?

• If there are p-1 possible predictors, then there are $2^{p-1}$ possible regression models containing the predictors.

• For example, 10 predictors yield $2^{10} = 1024$ possible regression models.

• A best subsets algorithm determines the best subsets of each size, so that candidates for a final model can be identified by the researcher.
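To make the count concrete, here is a small Python sketch that enumerates every subset of a list of candidate predictors. The function name `all_subsets` is hypothetical and chosen for this illustration.

```python
from itertools import combinations

def all_subsets(predictors):
    """Return every subset of the candidate predictors, smallest subsets first."""
    subsets = []
    for size in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets

candidates = ["x1", "x2", "x3", "x4"]
models = all_subsets(candidates)

# Number of models of each size: none, one, two, three, four predictors.
counts_by_size = [sum(1 for m in models if len(m) == k) for k in range(5)]
print(len(models), counts_by_size)
```

With four candidates this gives 16 models split 1, 4, 6, 4, 1 by size, matching the table for the cement example.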

Common ways of judging “best”

• Different criteria quantify different aspects of the regression model, so they can lead to different choices for the best set of predictors:

– R-squared

– Adjusted R-squared

– MSE (or S = square root of MSE)

– Mallows' Cp

– (PRESS statistic)

Increase in R-squared

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$

• $R^2$ can only increase as more variables are added.

• Use R-squared values to find the point where adding more predictors is not worthwhile, because it yields a very small increase in R-squared.

• Most often used in combination with other criteria.

Best Subsets Regression: y versus x1, x2, x3, x4

Response is y

Vars  R-Sq  R-Sq(adj)    C-p       S  x1  x2  x3  x4
   1  67.5       64.5  138.7  8.9639               X
   1  66.6       63.6  142.5  9.0771       X
   2  97.9       97.4    2.7  2.4063   X   X
   2  97.2       96.7    5.5  2.7343   X           X
   3  98.2       97.6    3.0  2.3087   X   X       X
   3  98.2       97.6    3.0  2.3121   X   X   X
   4  98.2       97.4    5.0  2.4460   X   X   X   X

Cement example

Largest adjusted R-squared

$$R_a^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO} = 1 - \left(\frac{n-1}{SSTO}\right) MSE$$

• Makes you pay a penalty for adding more predictors.

• According to this criterion, the best regression model is the one with the largest adjusted R-squared.

Smallest MSE

$$R_a^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO} = 1 - \left(\frac{n-1}{SSTO}\right) MSE$$

• According to this criterion, the best regression model is the one with the smallest MSE.

• Adjusted R-squared increases only if MSE decreases, so the adjusted R-squared and MSE criteria yield the same models.

$$MSE = \frac{SSE}{n-p} = \frac{\sum (y_i - \hat{y}_i)^2}{n-p}$$
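The relationship between these criteria can be checked numerically. The sketch below uses the sums of squares reported in the ANOVA tables of the cement example (n = 13, SSTO = 2715.76); the function names are hypothetical.

```python
def r_squared(sse, ssto):
    """R-squared = 1 - SSE/SSTO."""
    return 1.0 - sse / ssto

def mse(sse, n, p):
    """Mean squared error with p parameters (intercept included)."""
    return sse / (n - p)

def adjusted_r_squared(sse, ssto, n, p):
    """Adjusted R-squared written in terms of MSE: 1 - (n - 1)/SSTO * MSE."""
    return 1.0 - (n - 1) / ssto * mse(sse, n, p)

n, ssto = 13, 2715.76

# Model with x1 and x2 (p = 3 parameters), SSE = 57.9
r2_two = r_squared(57.9, ssto)
adj_two = adjusted_r_squared(57.9, ssto, n, 3)

# Full model with x1, x2, x3, x4 (p = 5 parameters), SSE = 47.86
r2_full = r_squared(47.86, ssto)
adj_full = adjusted_r_squared(47.86, ssto, n, 5)

print(round(100 * r2_two, 1), round(100 * adj_two, 1))
print(round(100 * r2_full, 1), round(100 * adj_full, 1))
```

Because n - 1 and SSTO are the same for every model fitted to the same data, adjusted R-squared is a decreasing function of MSE alone, which is why the two criteria rank models identically.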

Best Subsets Regression: y versus x1, x2, x3, x4

Response is y

Vars  R-Sq  R-Sq(adj)    C-p       S  x1  x2  x3  x4
   1  67.5       64.5  138.7  8.9639               X
   1  66.6       63.6  142.5  9.0771       X
   2  97.9       97.4    2.7  2.4063   X   X
   2  97.2       96.7    5.5  2.7343   X           X
   3  98.2       97.6    3.0  2.3087   X   X       X
   3  98.2       97.6    3.0  2.3121   X   X   X
   4  98.2       97.4    5.0  2.4460   X   X   X   X

Cement example

Mallows' Cp statistic

Mallows' Cp statistic

• Cp estimates the size of the bias introduced in the estimates of the responses by having an underspecified model (a model with important predictors missing).

Biased prediction

• If there is no bias, the expected value of the observed responses, $E(y_i)$, and the expected value of the predicted responses, $E(\hat{y}_i)$, both equal μY|x.

• Fitting the data with an underspecified model introduces bias, $B_i = E(\hat{y}_i) - E(y_i)$, into the predicted response at the ith data point.

Biased prediction

[Figure: no bias versus bias, showing $E(\hat{y}_i)$, $E(y_i)$, and the bias $B_i = E(\hat{y}_i) - E(y_i)$.]

Bias from an underspecified model

[Figure: scatter plot of Weight versus Height with the two fitted models below.]

Weight = -1.22 + 0.283 Height + 0.111 Water, MSE = 0.017

Weight = -4.14 + 0.389 Height, MSE = 0.653

Variation in predicted responses

• Because of bias, variance in the predicted responses for data point i is due to two things:

– random sampling variation, $\sigma^2_{\hat{y}_i}$

– variance associated with the bias, $B_i^2$

Total variation in predicted responses

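Sum the two variance components over all n data points to obtain a measure of the total variation in the predicted responses:

$$\Gamma_p = \frac{1}{\sigma^2}\left\{\sum_{i=1}^{n}\sigma^2_{\hat{y}_i} + \sum_{i=1}^{n}\left[E(\hat{y}_i) - E(y_i)\right]^2\right\}$$

If there is no bias, Γp achieves its smallest value, p:

$$\Gamma_p = \frac{1}{\sigma^2}\left\{\sum_{i=1}^{n}\sigma^2_{\hat{y}_i} + 0\right\} = p$$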

A good measure of an underspecified model

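So, Γp seems to be a good measure of an underspecified model:

$$\Gamma_p = \frac{1}{\sigma^2}\left\{\sum_{i=1}^{n}\sigma^2_{\hat{y}_i} + \sum_{i=1}^{n}\left[E(\hat{y}_i) - E(y_i)\right]^2\right\}$$

The best model is simply the one with the smallest value of Γp. We even know that the theoretical minimum of Γp is p.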

Cp as an estimate of Γp

If we know the population variance $\sigma^2$, we can estimate Γp:

$$C_p = p + \frac{(MSE_p - \sigma^2)(n-p)}{\sigma^2}$$

where $MSE_p$ is the mean squared error from fitting the model containing the subset of p-1 predictors (p parameters).

Mallows' Cp statistic

But we don’t know $\sigma^2$. So, estimate it using $MSE_{all}$, the mean squared error obtained from fitting the model containing all of the predictors.

$$C_p = p + \frac{(MSE_p - MSE_{all})(n-p)}{MSE_{all}}$$

• Estimating $\sigma^2$ using $MSE_{all}$:

– assumes that there are no biases in the full model with all of the predictors, an assumption that may or may not be valid, but can’t be tested without additional information.

– guarantees that Cp = p for the full model.
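The formula translates directly into code. The sketch below is a minimal illustration with a hypothetical function name, `mallows_cp`; the inputs are the rounded mean squared errors reported for the cement example (n = 13, MSE for the full model = 5.98).

```python
def mallows_cp(mse_p, mse_all, n, p):
    """Cp = p + (MSE_p - MSE_all)(n - p) / MSE_all, where p counts the intercept."""
    return p + (mse_p - mse_all) * (n - p) / mse_all

n, mse_all = 13, 5.98

cp_x1_x2 = mallows_cp(5.8, mse_all, n, 3)    # model with x1 and x2
cp_x1_x4 = mallows_cp(7.5, mse_all, n, 3)    # model with x1 and x4
cp_full = mallows_cp(mse_all, mse_all, n, 5)  # full model: the second term vanishes

print(round(cp_x1_x2, 1), round(cp_x1_x4, 1), cp_full)
```

The full-model case shows why Cp = p there by construction: the numerator $MSE_p - MSE_{all}$ is exactly zero.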

Summary facts about Mallows' Cp

• Subset models with small Cp values have a small total (standardized) variance of prediction.

• When the Cp value is …

– near p, the bias is small (next to none),

– much greater than p, the bias is substantial,

– below p, it is due to sampling error; interpret as no bias.

• For the largest model with all possible predictors, Cp = p (always).

Using the Cp criterion

• Identify subsets of predictors for which the Cp value is near p (if possible).

– The full model always yields Cp = p, so don’t select the full model based on Cp.

– If all models, except the full model, yield a large Cp not near p, it suggests some important predictor(s) are missing from the analysis.

– When more than one model has a Cp value near p, in general, choose the simpler model or the model that meets your research needs.
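One way to apply this screening mechanically is sketched below. It is an illustration only: "near p" has no fixed definition, so the `tolerance` parameter (0.5 here) and the function name `cp_candidates` are assumptions made for this example. Values of Cp below p are treated as no bias, and the full model is skipped because its Cp equals p by construction.

```python
def cp_candidates(rows, n_candidates, tolerance=0.5):
    """Keep (number of predictors, Cp) rows whose Cp is at most p + tolerance."""
    keep = []
    for n_vars, cp in rows:
        p = n_vars + 1  # parameters = predictors + intercept
        if n_vars == n_candidates:
            continue  # full model: Cp = p always, so it carries no information
        if cp <= p + tolerance:
            keep.append((n_vars, cp))
    return keep

# (Vars, C-p) pairs copied from the cement best subsets output.
cement_rows = [(1, 138.7), (1, 142.5), (2, 2.7), (2, 5.5),
               (3, 3.0), (3, 3.0), (4, 5.0)]

shortlist = cp_candidates(cement_rows, n_candidates=4)
print(shortlist)
```

The shortlist is a starting point for judgment, not a final answer: among the surviving models, the simpler one or the one that best meets the research needs is still chosen by the analyst.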

Best Subsets Regression: y versus x1, x2, x3, x4

Response is y

Vars  R-Sq  R-Sq(adj)    C-p       S  x1  x2  x3  x4
   1  67.5       64.5  138.7  8.9639               X
   1  66.6       63.6  142.5  9.0771       X
   2  97.9       97.4    2.7  2.4063   X   X
   2  97.2       96.7    5.5  2.7343   X           X
   3  98.2       97.6    3.0  2.3087   X   X       X
   3  98.2       97.6    3.0  2.3121   X   X   X
   4  98.2       97.4    5.0  2.4460   X   X   X   X

Cement example

The regression equation is
y = 62.4 + 1.55 x1 + 0.510 x2 + 0.102 x3 - 0.144 x4

Source          DF       SS      MS       F      P
Regression       4  2667.90  666.97  111.48  0.000
Residual Error   8    47.86    5.98
Total           12  2715.76

The regression equation is
y = 52.6 + 1.47 x1 + 0.662 x2

Source          DF      SS      MS       F      P
Regression       2  2657.9  1328.9  229.50  0.000
Residual Error  10    57.9     5.8
Total           12  2715.8

$$C_p = p + \frac{(MSE_p - MSE_{all})(n-p)}{MSE_{all}} = 3 + \frac{(5.8 - 5.98)(13 - 3)}{5.98} = 2.7$$

The regression equation is
y = 62.4 + 1.55 x1 + 0.510 x2 + 0.102 x3 - 0.144 x4

Source          DF       SS      MS       F      P
Regression       4  2667.90  666.97  111.48  0.000
Residual Error   8    47.86    5.98
Total           12  2715.76

The regression equation is
y = 103 + 1.44 x1 - 0.614 x4

Source          DF      SS      MS       F      P
Regression       2  2641.0  1320.5  176.63  0.000
Residual Error  10    74.8     7.5
Total           12  2715.8

$$C_p = p + \frac{(MSE_p - MSE_{all})(n-p)}{MSE_{all}} = 3 + \frac{(7.5 - 5.98)(13 - 3)}{5.98} = 5.5$$

Best Subsets Regression: y versus x1, x2, x3, x4

Response is y

Vars  R-Sq  R-Sq(adj)    C-p       S  x1  x2  x3  x4
   1  67.5       64.5  138.7  8.9639               X
   1  66.6       63.6  142.5  9.0771       X
   2  97.9       97.4    2.7  2.4063   X   X
   2  97.2       96.7    5.5  2.7343   X           X
   3  98.2       97.6    3.0  2.3087   X   X       X
   3  98.2       97.6    3.0  2.3121   X   X   X
   4  98.2       97.4    5.0  2.4460   X   X   X   X

Cement example

The regression equation is
y = 71.6 + 1.45 x1 + 0.416 x2 - 0.237 x4

Predictor     Coef  SE Coef      T      P   VIF
Constant     71.65    14.14   5.07  0.001
x1          1.4519   0.1170  12.41  0.000   1.1
x2          0.4161   0.1856   2.24  0.052  18.8
x4         -0.2365   0.1733  -1.37  0.205  18.9

S = 2.309   R-Sq = 98.2%   R-Sq(adj) = 97.6%

Analysis of Variance

Source          DF       SS      MS       F      P
Regression       3  2667.79  889.26  166.83  0.000
Residual Error   9    47.97    5.33
Total           12  2715.76

The regression equation is
y = 48.2 + 1.70 x1 + 0.657 x2 + 0.250 x3

Predictor     Coef  SE Coef      T      P   VIF
Constant    48.194    3.913  12.32  0.000
x1          1.6959   0.2046   8.29  0.000   3.3
x2         0.65691  0.04423  14.85  0.000   1.1
x3          0.2500   0.1847   1.35  0.209   3.1

S = 2.312   R-Sq = 98.2%   R-Sq(adj) = 97.6%

Analysis of Variance

Source          DF       SS      MS       F      P
Regression       3  2667.65  889.22  166.34  0.000
Residual Error   9    48.11    5.35
Total           12  2715.76

The regression equation is
y = 52.6 + 1.47 x1 + 0.662 x2

Predictor     Coef  SE Coef      T      P   VIF
Constant    52.577    2.286  23.00  0.000
x1          1.4683   0.1213  12.10  0.000   1.1
x2         0.66225  0.04585  14.44  0.000   1.1

S = 2.406   R-Sq = 97.9%   R-Sq(adj) = 97.4%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       2  2657.9  1328.9  229.50  0.000
Residual Error  10    57.9     5.8
Total           12  2715.8

Stepwise Regression: y versus x1, x2, x3, x4

Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is y on 4 predictors, with N = 13

Step             1       2       3       4
Constant    117.57  103.10   71.65   52.58

x4          -0.738  -0.614  -0.237
T-Value      -4.77  -12.62   -1.37
P-Value      0.001   0.000   0.205

x1                    1.44    1.45    1.47
T-Value              10.40   12.41   12.10
P-Value              0.000   0.000   0.000

x2                           0.416   0.662
T-Value                       2.24   14.44
P-Value                      0.052   0.000

S             8.96    2.73    2.31    2.41
R-Sq         67.45   97.25   98.23   97.87
R-Sq(adj)    64.50   96.70   97.64   97.44
C-p          138.7     5.5     3.0     2.7

Residual analysis

[Figure: Residuals Versus the Fitted Values (response is y); Standardized Residual plotted against Fitted Value.]

Residual analysis

[Figure: Normal Probability Plot of SRES1; Probability plotted against SRES1.]

Average: 0.0024300   StDev: 1.02369   N: 13

W-test for Normality   R: 0.9566   P-Value (approx): > 0.1000

Example: Modeling PIQ

[Figure: matrix plot of PIQ, MRI, Height, and Weight.]

Best Subsets Regression: PIQ versus MRI, Height, Weight

Response is PIQ

Vars  R-Sq  R-Sq(adj)   C-p       S  MRI  Height  Weight
   1  14.3       11.9   7.3  21.212    X
   1   0.9        0.0  13.8  22.810            X
   2  29.5       25.5   2.0  19.510    X       X
   2  19.3       14.6   6.9  20.878    X               X
   3  29.5       23.3   4.0  19.794    X       X       X

Stepwise Regression: PIQ versus MRI, Height, Weight

Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is PIQ on 3 predictors, with N = 38

Step              1        2
Constant      4.652  111.276

MRI            1.18     2.06
T-Value        2.45     3.77
P-Value       0.019    0.001

Height                 -2.73
T-Value                -2.75
P-Value                0.009

S              21.2     19.5
R-Sq          14.27    29.49
R-Sq(adj)     11.89    25.46
C-p             7.3      2.0

Example: Modeling BP

[Figure: matrix plot of BP, Age, Weight, BSA, Duration, Pulse, and Stress.]

Best Subsets Regression: BP versus Age, Weight, ...

Response is BP

Candidate predictors: Age, Weight, BSA, Duration, Pulse, Stress

Vars  R-Sq  R-Sq(adj)    C-p        S  Predictors
   1  90.3       89.7  312.8   1.7405  Weight
   1  75.0       73.6  829.1   2.7903  (not recoverable)
   2  99.1       99.0   15.1  0.53269  Age, Weight
   2  92.0       91.0  256.6   1.6246  (not recoverable)
   3  99.5       99.4    6.4  0.43705  Age, Weight, BSA
   3  99.2       99.1   14.1  0.52012  (not recoverable)
   4  99.5       99.4    6.4  0.42591  Age, Weight, BSA, Duration
   4  99.5       99.4    7.1  0.43500  (not recoverable)
   5  99.6       99.4    7.0  0.42142  (not recoverable)
   5  99.5       99.4    7.7  0.43078  (not recoverable)
   6  99.6       99.4    7.0  0.40723  Age, Weight, BSA, Duration, Pulse, Stress

Stepwise Regression: BP versus Age, Weight, BSA, Duration, Pulse, Stress

Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is BP on 6 predictors, with N = 20

Step              1        2        3
Constant      2.205  -16.579  -13.667

Weight        1.201    1.033    0.906
T-Value       12.92    33.15    18.49
P-Value       0.000    0.000    0.000

Age                    0.708    0.702
T-Value                13.23    15.96
P-Value                0.000    0.000

BSA                               4.6
T-Value                          3.04
P-Value                         0.008

S              1.74    0.533    0.437
R-Sq          90.26    99.14    99.45
R-Sq(adj)     89.72    99.04    99.35
C-p           312.8     15.1      6.4

The regression equation is
BP = - 12.9 + 0.683 Age + 0.897 Weight + 4.86 BSA + 0.0665 Dur

Predictor      Coef  SE Coef      T      P  VIF
Constant    -12.852    2.648  -4.85  0.000
Age         0.68335  0.04490  15.22  0.000  1.3
Weight      0.89701  0.04818  18.62  0.000  4.5
BSA           4.860    1.492   3.26  0.005  4.3
Dur         0.06653  0.04895   1.36  0.194  1.2

S = 0.4259   R-Sq = 99.5%   R-Sq(adj) = 99.4%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       4  557.28  139.32  768.01  0.000
Residual Error  15    2.72    0.18
Total           19  560.00

The regression equation is
BP = - 13.7 + 0.702 Age + 0.906 Weight + 4.63 BSA

Predictor      Coef  SE Coef      T      P  VIF
Constant    -13.667    2.647  -5.16  0.000
Age         0.70162  0.04396  15.96  0.000  1.2
Weight      0.90582  0.04899  18.49  0.000  4.4
BSA           4.627    1.521   3.04  0.008  4.3

S = 0.4370   R-Sq = 99.5%   R-Sq(adj) = 99.4%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       3  556.94  185.65  971.93  0.000
Residual Error  16    3.06    0.19
Total           19  560.00

The regression equation is
BP = - 16.6 + 0.708 Age + 1.03 Weight

Predictor      Coef  SE Coef      T      P  VIF
Constant    -16.579    3.007  -5.51  0.000
Age         0.70825  0.05351  13.23  0.000  1.2
Weight      1.03296  0.03116  33.15  0.000  1.2

S = 0.5327   R-Sq = 99.1%   R-Sq(adj) = 99.0%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       2  555.18  277.59  978.25  0.000
Residual Error  17    4.82    0.28
Total           19  560.00

Best subsets regression

• Stat >> Regression >> Best subsets …

• Specify response and all possible predictors.

• If desired, specify predictors that must be included in every model.

– (Researcher’s knowledge!)

• Select OK. Results appear in session window.

Model building strategy

The first step

• Decide on the type of model needed

– Predictive: model used to predict the response variable from a chosen set of predictors.

– Theoretical: model based on theoretical relationship between response and predictors.

– Control: model used to control a response variable by manipulating predictor variables.

The first step (cont’d)

• Decide on the type of model needed

– Inferential: model used to explore strength of relationships between response and predictors.

– Data summary: model used merely as a way to summarize a large set of data by a single equation.

The second step

• Decide on which predictor variables and which response variable to collect data.

• Collect the data.

The third step

• Explore the data

– Check for outliers, gross data errors, missing values on a univariate basis.

– Study bivariate relationships to reveal other outliers, to suggest possible transformations, to identify possible multicollinearities.

The fourth step

• Randomly divide the data into a training set and a test set:

– The training set, with at least 15-20 error d.f., is used to fit the model.

– The test set is used for cross-validation of the fitted model.
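A random split of this kind can be sketched with the Python standard library. The function names `train_test_split` and `error_df`, the 30% test fraction, and the fixed seed are all assumptions made for this illustration; the slides do not prescribe a split ratio.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (training set, test set)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def error_df(n_train, p):
    """Error degrees of freedom when fitting p parameters to n_train observations."""
    return n_train - p

observations = list(range(40))  # stand-in for 40 data rows
train, test = train_test_split(observations)

# With 4 parameters (intercept plus 3 predictors) the training set should
# leave at least 15-20 error degrees of freedom.
print(len(train), len(test), error_df(len(train), 4))
```

If the error degrees of freedom fall below the 15-20 guideline, a smaller test fraction or fewer candidate predictors would be needed.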

The fifth step

• Using the training set, fit several candidate models:

– Use best subsets regression.

– Use stepwise regression (only gives one model unless you specify different alpha-to-remove and alpha-to-enter values).

The sixth step

• Select and evaluate a few “good” models:

– Select based on adjusted $R^2$, Mallows' Cp, and the number and nature of predictors.

– Evaluate selected models for violation of model assumptions.

– If none of the models provide a satisfactory fit, try something else, such as more data, different predictors, a different class of model …

The final step

• Select the final model:

– Compare competing models by cross-validating them against the test data.

– The model with a larger cross-validation $R^2$ is a better predictive model.

– Consider residual plots, outliers, parsimony, relevance, and ease of measurement of predictors.
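The comparison on the test data can be sketched as follows. The slides do not define cross-validation $R^2$ precisely, so this example assumes one common definition: 1 minus the sum of squared prediction errors on the test set divided by the total sum of squares of the test responses. The function name and the numbers are made up for illustration.

```python
def cross_validation_r2(y_test, y_pred):
    """1 - (sum of squared prediction errors) / (total sum of squares), on test data."""
    mean_y = sum(y_test) / len(y_test)
    sse = sum((y - yhat) ** 2 for y, yhat in zip(y_test, y_pred))
    ssto = sum((y - mean_y) ** 2 for y in y_test)
    return 1.0 - sse / ssto

# Hypothetical test responses and predictions from two competing models
# that were fitted on the training set only.
y_test = [10.0, 12.0, 14.0, 16.0]
pred_model_a = [10.5, 11.5, 14.5, 15.5]
pred_model_b = [11.0, 11.0, 15.0, 15.0]

r2_a = cross_validation_r2(y_test, pred_model_a)
r2_b = cross_validation_r2(y_test, pred_model_b)
print(r2_a, r2_b)
```

Here model A would be preferred as the better predictive model, subject to the other considerations in the list above.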