stepwise regression sas. download the data atdata.htm atdata.htm

Stepwise Regression SAS

Upload: felix-hutchinson

Post on 03-Jan-2016



Stepwise Regression

SAS


Download the Data

• http://core.ecu.edu/psyc/wuenschk/StatData/StatData.htm

GPA  GRE_Q  GRE_V  MAT  AR
3.2  625    540    65   2.7
4.1  575    680    75   4.5
3.0  520    480    65   2.5
2.6  545    520    55   3.1
3.7  520    490    75   3.6
4.0  655    535    65   4.3
4.3  630    720    75   4.6
2.7  500    500    75   3.0
and so on


Download the SAS Code

• http://core.ecu.edu/psyc/wuenschk/SAS/SAS-Programs.htm

data grades;
  infile 'C:\Users\Vati\Documents\StatData\MultReg.dat';
  input GPA GRE_Q GRE_V MAT AR;
run;
PROC REG data=grades;
  a: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2
     selection=forward slentry = .05 details;
run;


Forward Selection, Step 1
Statistics for Entry, DF = 1,28
Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      1.000000    0.3735           16.69     0.0003
GRE_V      1.000000    0.3381           14.30     0.0008
MAT        1.000000    0.3651           16.10     0.0004
AR         1.000000    0.3853           17.55     0.0003

All predictors have p < the slentry value of .05. AR has the lowest p (the largest F), so AR enters first.
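With nothing in the model yet, the F to enter is simply the F test of each predictor's own R2. The DATA step below is a sketch that recomputes this Step 1 table from the Model R-Square column; it assumes n = 30 (DF = 1,28 implies 28 = n - 2), and the dataset and flag names (step1, Eligible) are my own, not part of the original program.

```sas
/* Sketch: Step 1 F to enter from one-predictor R-squares.               */
/* Assumes n = 30; step1 and Eligible are hypothetical names.            */
data step1;
  length Variable $ 8;
  input Variable $ R2;
  n = 30;
  df_error = n - 2;                  /* one predictor plus the intercept */
  F = R2 / ((1 - R2) / df_error);    /* F(1, 28) for an empty model      */
  p = 1 - probf(F, 1, df_error);     /* upper-tail p value               */
  Eligible = (p < .05);              /* 1 if p beats slentry = .05       */
  datalines;
GRE_Q 0.3735
GRE_V 0.3381
MAT 0.3651
AR 0.3853
;
run;

/* The candidate with the largest F (smallest p) enters first */
proc sort data=step1;
  by descending F;
run;

proc print data=step1 noobs;
  var Variable R2 F p Eligible;
  format F 6.2 p 6.4;
run;
```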


Step 2
Statistics for Entry, DF = 1,27
Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.742099    0.5033           6.41      0.0174
GRE_V      0.835714    0.5155           7.26      0.0120
MAT        0.724599    0.4923           5.69      0.0243

All predictors have p < the slentry value of .05. GRE_V has the lowest p, so GRE_V enters second.
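The F to enter at Step 2 can be written in terms of the Model R-Square column. A worked version for GRE_V, assuming n = 30 and using my own notation (R^2_old is the model with AR alone, R^2_new adds the candidate, k is the number of predictors after it enters):

```latex
F = \frac{R^2_{\text{new}} - R^2_{\text{old}}}{\left(1 - R^2_{\text{new}}\right)/(n - k - 1)},
\qquad
F_{\text{GRE\_V}} = \frac{0.5155 - 0.3853}{(1 - 0.5155)/27} \approx 7.26
```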


Step 3
Statistics for Entry, DF = 1,26
Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.659821    0.5716           3.41      0.0764
MAT        0.670304    0.5719           3.42      0.0756

No remaining predictor has p < .05, so forward selection terminates.
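The same arithmetic, applied to a candidate joining a model that already holds predictors, reproduces Steps 2 and 3. This sketch assumes n = 30, takes R2_old as the R-square of the current model (.3853 for AR alone, .5155 for AR plus GRE_V) and R2_new from the Model R-Square column; the dataset and variable names are hypothetical.

```sas
/* Sketch: partial F to enter for Steps 2 and 3 of forward selection.   */
/* Assumes n = 30; entry, R2_old, R2_new, Enters are hypothetical names. */
data entry;
  length Variable $ 8;
  input Step Variable $ R2_old R2_new;
  n = 30;
  k = Step;                               /* predictors once the candidate is in */
  df_error = n - k - 1;                   /* 27 at Step 2, 26 at Step 3          */
  F = (R2_new - R2_old) / ((1 - R2_new) / df_error);
  p = 1 - probf(F, 1, df_error);
  Enters = (p < .05);                     /* slentry = .05                       */
  datalines;
2 GRE_Q 0.3853 0.5033
2 GRE_V 0.3853 0.5155
2 MAT 0.3853 0.4923
3 GRE_Q 0.5155 0.5716
3 MAT 0.5155 0.5719
;
run;

proc print data=entry noobs;
  var Step Variable R2_new F p Enters;
  format F 6.2 p 6.4;
run;
```

At Step 3 neither candidate has p < .05, which is why forward selection stops with AR and GRE_V.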


The Final Model
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Standardized Estimate   Squared Semi-partial Corr Type II
Intercept   1    0.49718              0.57652          0.86      0.3961     0                       .
GRE_V       1    0.00285              0.00106          2.69      0.0120     0.39470                 0.13020
AR          1    0.32963              0.10483          3.14      0.0040     0.46074                 0.17740

R2 = .516, F(2, 27) = 14.36, p < .001
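Both kinds of summary number on this slide can be recovered from R-squares already reported: the overall F from the model R2, and each Squared Semi-partial Corr Type II as the drop in R2 when that predictor is removed (the one-predictor R2 values come from Step 1). A sketch, assuming n = 30; the dataset and variable names are illustrative.

```sas
/* Sketch: overall F and squared semipartials for the GRE_V + AR model.  */
/* Assumes n = 30; overall, sr2_GRE_V, sr2_AR are hypothetical names.    */
data overall;
  n = 30;
  k = 2;                                      /* predictors in the model   */
  R2 = 0.5155;                                /* Model R-Square, Step 2    */
  F = (R2 / k) / ((1 - R2) / (n - k - 1));    /* F(2, 27)                  */
  p = 1 - probf(F, k, n - k - 1);
  /* Squared semipartial = loss in R-square if that predictor is dropped */
  sr2_GRE_V = R2 - 0.3853;                    /* leaves AR alone           */
  sr2_AR    = R2 - 0.3381;                    /* leaves GRE_V alone        */
run;

proc print data=overall noobs;
  var R2 F p sr2_GRE_V sr2_AR;
  format F 6.2 p 8.6;
run;
```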


Backward Selection

b: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2
   selection=backward slstay = .05 details;
run;

• We start out with a simultaneous multiple regression that includes all predictors.
• Then we trim that model.


Step 1
Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   -1.73811             0.95074          0.50153      3.34      0.0795
GRE_Q        0.00400             0.00183          0.71582      4.77      0.0385
GRE_V        0.00152             0.00105          0.31588      2.11      0.1593
MAT          0.02090             0.00955          0.71861      4.79      0.0382
AR           0.14423             0.11300          0.24448      1.63      0.2135

GRE_V and AR have p values that exceed the slstay value of .05. AR has the larger p, so it is dropped from the model.
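Each F Value in this table is the squared t for that coefficient, and dividing a Type II SS by its F recovers the full model's MSE (about .150). The check below is a sketch that uses the rounded numbers from the slide, so expect agreement to roughly two decimals; the dataset and variable names are hypothetical.

```sas
/* Sketch: F to remove = (estimate / standard error)**2 = Type II SS / MSE */
/* step1_back, F_check, MSE_implied are hypothetical names.               */
data step1_back;
  length Variable $ 9;
  input Variable $ Estimate StdErr TypeIISS F_sas;
  F_check = (Estimate / StdErr)**2;     /* squared t                      */
  MSE_implied = TypeIISS / F_sas;       /* should be the full-model MSE   */
  datalines;
Intercept -1.73811 0.95074 0.50153 3.34
GRE_Q 0.00400 0.00183 0.71582 4.77
GRE_V 0.00152 0.00105 0.31588 2.11
MAT 0.02090 0.00955 0.71861 4.79
AR 0.14423 0.11300 0.24448 1.63
;
run;

proc print data=step1_back noobs;
  var Variable F_sas F_check MSE_implied;
  format F_check 6.2 MSE_implied 8.5;
run;
```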


Step 2
Statistics for Removal, DF = 1,26
Variable   Partial R-Square   Model R-Square   F Value   Pr > F
GRE_Q      0.1236             0.4935           8.39      0.0076
GRE_V      0.0340             0.5830           2.31      0.1405
MAT        0.1318             0.4852           8.95      0.0060

Only GRE_V has p > .05, so it is dropped from the model.


Step 3
Statistics for Removal, DF = 1,27
Variable   Partial R-Square   Model R-Square   F Value   Pr > F
GRE_Q      0.2179             0.3651           14.11     0.0008
MAT        0.2095             0.3735           13.56     0.0010

No predictor has p > .05, so backward elimination halts.
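To see why elimination stops here, the F to remove for each remaining predictor can be rebuilt from R-squares: .5830 is the Model R-Square for GRE_Q plus MAT (the GRE_V row of the Step 2 table), and the R-square without each predictor is the Model R-Square column of the Step 3 table. A sketch assuming n = 30; the names are mine.

```sas
/* Sketch: backward Step 3, F to remove from the GRE_Q + MAT model.      */
/* Assumes n = 30; step3_back, R2_without, Removed are hypothetical.     */
data step3_back;
  length Variable $ 8;
  input Variable $ R2_without;
  n = 30;
  R2_model = 0.5830;                  /* GRE_Q + MAT                      */
  df_error = n - 2 - 1;               /* two predictors: 27 error df      */
  F = (R2_model - R2_without) / ((1 - R2_model) / df_error);
  p = 1 - probf(F, 1, df_error);
  Removed = (p > .05);                /* slstay = .05                     */
  datalines;
GRE_Q 0.3651
MAT 0.3735
;
run;

proc print data=step3_back noobs;
  var Variable F p Removed;
  format F 6.2 p 6.4;
run;
```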


The Final Model
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Standardized Estimate   Squared Semi-partial Corr Type II
Intercept   1    -2.12938             0.92704          -2.30     0.0296     0                       .
GRE_Q       1     0.00598             0.00159           3.76     0.0008     0.48438                 0.21791
MAT         1     0.03081             0.00836           3.68     0.0010     0.47494                 0.20950

R2 = .583, F(2, 27) = 18.87, p < .001


What the F Test?

• Forward selection led to a model with AR and GRE_V.

• Backward selection led to a model with MAT and GRE_Q.

• I am getting suspicious about the utility of procedures like this.


Fully Stepwise Selection

c: MODEL GPA = GRE_Q GRE_V MAT AR / STB SCORR2
   selection=stepwise slentry=.08 slstay = .08 details;
run;

• Like forward selection, but, once added to the model, a predictor is considered for elimination in subsequent steps.


Step 3

• Steps 1 and 2 are identical to those of forward selection, but with slentry set to .08, MAT (p = .0756) now enters the model at Step 3. GRE_Q (p = .0764) also beats .08, but MAT has the smaller p.

Statistics for Entry, DF = 1,26
Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.659821    0.5716           3.41      0.0764
MAT        0.670304    0.5719           3.42      0.0756
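Only the threshold changed between the forward run and this one, so the same F values can be pushed through both slentry settings. The sketch below takes the rounded F statistics shown with DF = 1,26 and flags entry at .05 and at .08; the dataset and flag names are illustrative.

```sas
/* Sketch: the same Step 3 candidates judged at slentry = .05 and .08.   */
/* step3_stepwise, In_at_05, In_at_08 are hypothetical names.            */
data step3_stepwise;
  length Variable $ 8;
  input Variable $ F;
  p = 1 - probf(F, 1, 26);        /* DF = 1,26 from the slide            */
  In_at_05 = (p < .05);           /* enters under slentry = .05?         */
  In_at_08 = (p < .08);           /* enters under slentry = .08?         */
  datalines;
GRE_Q 3.41
MAT 3.42
;
run;

proc print data=step3_stepwise noobs;
  format p 6.4;
run;
```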


Step 4

• GRE_Q enters (p = .0385). Now every predictor is in the model.

Statistics for Entry, DF = 1,25
Variable   Tolerance   Model R-Square   F Value   Pr > F
GRE_Q      0.653236    0.6405           4.77      0.0385


Step 5

• Once GRE_Q is in the model, AR and GRE_V become eligible for removal (both have p > the slstay value of .08).

Statistics for Removal, DF = 1,25
Variable   Partial R-Square   Model R-Square   F Value   Pr > F
GRE_Q      0.0686             0.5719           4.77      0.0385
GRE_V      0.0303             0.6102           2.11      0.1593
MAT        0.0689             0.5716           4.79      0.0382
AR         0.0234             0.6170           1.63      0.2135
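The removal check at this step is the same calculation run the other way: a predictor is a candidate for removal when its p exceeds slstay = .08, and the one with the largest p goes first. A sketch using the rounded F values with DF = 1,25; the dataset and flag names are hypothetical.

```sas
/* Sketch: stepwise Step 5 removal check at slstay = .08.                */
/* step5_stepwise and Removable are hypothetical names.                  */
data step5_stepwise;
  length Variable $ 8;
  input Variable $ F;
  p = 1 - probf(F, 1, 25);        /* DF = 1,25 from the slide            */
  Removable = (p > .08);          /* fails to stay at slstay = .08       */
  datalines;
GRE_Q 4.77
GRE_V 2.11
MAT 4.79
AR 1.63
;
run;

/* Largest p first: that predictor is the one removed at this step */
proc sort data=step5_stepwise;
  by descending p;
run;

proc print data=step5_stepwise noobs;
  format p 6.4;
run;
```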


Step 6

• AR (the larger p, .2135) is removed; GRE_V (p = .1405) is still eligible for removal.

Statistics for Removal, DF = 1,26
Variable   Partial R-Square   Model R-Square   F Value   Pr > F
GRE_Q      0.1236             0.4935           8.39      0.0076
GRE_V      0.0340             0.5830           2.31      0.1405
MAT        0.1318             0.4852           8.95      0.0060


Step 7

• At this point, no variables in the model are eligible for removal

• And no variables not in the model are eligible for entry.

• The final model includes MAT and GRE_Q.
• This is the same as the final model from backward selection.


R-Square Selection

• d: MODEL GPA = GRE_Q GRE_V MAT AR / selection=rsquare cp mse; run;

• Test all one predictor models, all two predictor models, and so on.

• The goal is to get the highest R2 with fewer than all of the predictors.


One Predictor Models

Number in Model   R-Square   C(p)      MSE       Variables in Model
1                 0.3853     16.7442   0.22908   AR
1                 0.3735     17.5642   0.23348   GRE_Q
1                 0.3651     18.1490   0.23661   MAT
1                 0.3381     20.0268   0.24667   GRE_V


One Predictor Models

• AR yields the highest R2.
• C(p) = 16.74, MSE = .229
• Mallows says the best model will be one with a small C(p) whose value is near p (the number of parameters in the model).
• p here is 2: one predictor and the intercept.
• Howell suggests that one keep adding predictors until MSE starts increasing.
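Mallows' C(p) can be computed from R-squares alone, because SSE_p / MSE_full = (n - k - 1)(1 - R2_p) / (1 - R2_full), where k is the number of predictors in the full model. The sketch below assumes n = 30, k = 4, and the full-model R2 of .6405 shown when GRE_Q entered in the stepwise run; since the inputs are rounded, expect the C(p) values to match the table to about two decimals. Dataset and variable names are my own.

```sas
/* Sketch: Mallows' C(p) = SSE_p / MSE_full - (n - 2p), via R-squares.   */
/* Assumes n = 30, k_full = 4, R2_full = .6405; cp_one is hypothetical.  */
data cp_one;
  length Vars $ 8;
  input Vars $ R2;
  n = 30;                 /* cases                              */
  k_full = 4;             /* predictors in the full model       */
  R2_full = 0.6405;       /* R-square with all four predictors  */
  p = 2;                  /* one predictor plus the intercept   */
  Cp = (n - k_full - 1) * (1 - R2) / (1 - R2_full) - (n - 2*p);
  datalines;
AR 0.3853
GRE_Q 0.3735
MAT 0.3651
GRE_V 0.3381
;
run;

proc print data=cp_one noobs;
  var Vars R2 Cp;
  format Cp 8.4;
run;
```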


Two Predictor Models

Number in Model   R-Square   C(p)      MSE       Variables in Model
2                 0.5830      4.9963   0.16116   GRE_Q MAT
2                 0.5155      9.6908   0.18725   GRE_V AR
2                 0.5033     10.5388   0.19196   GRE_Q AR
2                 0.4935     11.2215   0.19575   GRE_V MAT
2                 0.4923     11.3019   0.19620   MAT AR
2                 0.4852     11.7943   0.19894   GRE_Q GRE_V


Two Predictor Models

• Compared to the best one-predictor model, the model with MAT and GRE_Q has
– Considerably higher R2
– Considerably lower C(p)
– A value of C(p), 5, close to the value of p, 3
– Considerably lower MSE


Three Predictor Models
Number in Model   R-Square   C(p)     MSE       Variables in Model
3                 0.6170     4.6292   0.15369   GRE_Q GRE_V MAT
3                 0.6102     5.1050   0.15644   GRE_Q MAT AR
3                 0.5719     7.7702   0.17182   GRE_V MAT AR
3                 0.5716     7.7888   0.17193   GRE_Q GRE_V AR


Three Predictor Models

• Adding GRE_V to the best two-predictor model (GRE_Q and MAT)
– Slightly increases R2 (from .58 to .62)
– Reduces C(p) - p from 2 to .6
– Reduces MSE from .16 to .15
• None of these stats impresses me much; I am inclined to take the GRE_Q, MAT model as the best.


Closer Look at MAT, GRE_Q, GRE_V

• e: MODEL GPA = GRE_Q GRE_V MAT / STB SCORR2; run;

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Standardized Estimate   Squared Semi-partial Corr Type II
Intercept   1    -2.14877             0.90541          -2.37     0.0253     0                       .
GRE_Q       1     0.00493             0.00170           2.90     0.0076     0.39922                 0.12357
GRE_V       1     0.00161             0.00106           1.52     0.1405     0.22317                 0.03404
MAT         1     0.02612             0.00873           2.99     0.0060     0.40267                 0.13180


Keep GRE_V or Not ?

• It does not have a significant partial effect in the model, so why keep it?

• Because it is free info. You get GRE_V and GRE_Q for the same price as GRE_Q alone.

• Equi donati dentes non inspiciuntur ("The teeth of a gift horse are not inspected," that is, don't look a gift horse in the mouth).
– As (gift) horses age, their gums recede, making them look long in the tooth.


Add AR ?

• R2 increases from .617 to .640
• C(p) = p (always true in the full model)
• MSE drops from .154 to .150
• Getting AR data is expensive
• Stop gathering the AR data, unless it has some other value.
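Why C(p) = p always holds in the full model: under Mallows' definition (notation mine, with p_full the number of parameters in the full model), the full model's SSE divided by its own MSE is just its error degrees of freedom. With four predictors plus the intercept, p_full = 5 here.

```latex
\begin{aligned}
C_p &= \frac{\mathrm{SSE}_p}{\mathrm{MSE}_{\text{full}}} - (n - 2p),
\qquad
\mathrm{MSE}_{\text{full}} = \frac{\mathrm{SSE}_{\text{full}}}{n - p_{\text{full}}} \\
C_{p_{\text{full}}} &= \frac{\mathrm{SSE}_{\text{full}}}{\mathrm{SSE}_{\text{full}}/(n - p_{\text{full}})} - (n - 2p_{\text{full}})
= (n - p_{\text{full}}) - (n - 2p_{\text{full}}) = p_{\text{full}}
\end{aligned}
```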


Conclusions

• Read http://core.ecu.edu/psyc/wuenschk/StatHelp/Stepwise-Voodoo.htm

• Treat all claims based on stepwise algorithms as if they were made by Saddam Hussein on a bad day with a headache having a friendly chat with George Bush.