minitab which is better

Upload: roberto-carlos-chuquilin-goicochea

Post on 02-Jun-2018


  • 8/10/2019 MINITAB Which is Better


    Which Is Better, Stepwise Regression or Best Subsets Regression?


    Stepwise regression and best subsets regression are both automatic tools that help you identify useful predictors during the exploratory stages of model building for linear regression. These two procedures use different methods and present you with different output.

    An obvious question arises: does one procedure pick the true model more often than the other? I'll tackle that question in this post.

    http://blog.minitab.com/blog/adventures-in-statistics/which-is-better%2C-stepwise-regression-or-best-subsets-regression

    First, a quick refresher about the two procedures and their different results:

    Stepwise regression presents you with a single model constructed using the p-values of the predictor variables.

    http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients
    http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/

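    Best subsets regression assesses all possible models and displays a subset of them along with their adjusted R-squared and Mallows' Cp values.

    http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables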
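To make the two best subsets scores concrete, here is a minimal pure-Python sketch. It fits every non-empty subset of candidate predictors by ordinary least squares and reports R-squared, adjusted R-squared, and Mallows' Cp. The sketch uses the textbook formulas adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1) and Cp = RSS_p / MSE_full - n + 2(p + 1). Here p counts the predictors in the subset, and MSE_full comes from the model that contains all candidates. The function names (`rss_of`, `best_subsets`, `pick`) and the simulated data are hypothetical. This is not Minitab's implementation.

```python
import random
from itertools import combinations

def rss_of(cols, y, idx):
    """Residual sum of squares of an OLS fit of y on the predictors in idx plus an intercept."""
    n = len(y)
    X = [[1.0] + [cols[j][i] for j in idx] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * v for row, v in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(p):
            if r != c:
                f = a[r][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    beta = [row[p] for row in a]
    return sum((v - sum(b * x for b, x in zip(beta, row))) ** 2 for row, v in zip(X, y))

def best_subsets(cols, y):
    """Score every non-empty subset as (indices, R-squared, adjusted R-squared, Mallows' Cp)."""
    n, k = len(y), len(cols)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    mse_full = rss_of(cols, y, range(k)) / (n - k - 1)  # MSE of the model with all candidates
    rows = []
    for size in range(1, k + 1):
        for idx in combinations(range(k), size):
            rss = rss_of(cols, y, idx)
            r2 = 1 - rss / sst
            adj_r2 = 1 - (1 - r2) * (n - 1) / (n - size - 1)
            cp = rss / mse_full - n + 2 * (size + 1)  # size + 1 parameters incl. intercept
            rows.append((idx, r2, adj_r2, cp))
    return rows

def pick(rows):
    """The two best subsets rules from the study: lowest Cp and highest adjusted R-squared."""
    lowest_cp = min(rows, key=lambda r: r[3])[0]
    highest_adj = max(rows, key=lambda r: r[2])[0]
    return lowest_cp, highest_adj

# Hypothetical data: x0 is authentic, x1 and x2 are noise.
rng = random.Random(3)
n = 60
cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
y = [1.5 * cols[0][i] + rng.gauss(0, 1) for i in range(n)]
rows = best_subsets(cols, y)
for idx, r2, adj, cp in sorted(rows, key=lambda r: r[3]):
    print(idx, round(r2, 3), round(adj, 3), round(cp, 2))
print(pick(rows))
```

A useful sanity check: under this Cp formula, the full model always scores Cp = k + 1, the number of parameters including the intercept.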

    The key benefit of the stepwise procedure is the simplicity of the single model. Best subsets does not pick a final model for you, but it does present you with multiple models and information to help you choose the final model. For more details, read this post where I compare stepwise regression to best subsets regression and present examples using both analyses.

    http://blog.minitab.com/blog/adventures-in-statistics/regression-smackdown-stepwise-versus-best-subsets

    Determining the Better Model Selection Method

    A study by Olejnik, Mills, and Keselman* compares how often stepwise regression, best subsets regression using the lowest Mallows' Cp, and best subsets regression using the highest adjusted R-squared select the true model.

    The authors assessed 32 conditions that differed by the number of candidate variables, number of authentic variables, sample size, and level of multicollinearity. For each condition, the authors created 1,000 computer-generated datasets and analyzed them with both stepwise and best subsets regression to determine how often each procedure selected the correct model.

    http://blog.minitab.com/blog/adventures-in-statistics/what-are-the-effects-of-multicollinearity-and-when-can-i-ignore-them
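The study design can be mimicked on a small scale. The sketch below is a hypothetical pure-Python reconstruction, not the authors' code, and it makes several assumptions:

- Predictors. Of k candidates, the first m are authentic and share one coefficient, `beta` (an assumed value). The rest are noise, and every pair of predictors has the same correlation, `rho`.
- Selection. A common forward-backward stepwise variant runs with alpha-to-enter and alpha-to-remove both set to 0.15. That setting is an assumption; the study's settings and Minitab's exact rules may differ.
- P-values. These use a normal approximation to the t distribution, which is reasonable at n of about 100 or more.

The sketch then counts how often the selected set is exactly the authentic set.

```python
import random
from math import sqrt
from statistics import NormalDist

def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [row[n:] for row in m]

def pvalues(cols, y, idx):
    """Two-sided p-values of the predictors in idx from an OLS fit with an intercept.

    Normal approximation to the t distribution (assumption; fine for n >= ~100)."""
    n = len(y)
    X = [[1.0] + [cols[j][i] for j in idx] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * v for row, v in zip(X, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    rss = sum((v - sum(b * x for b, x in zip(beta, row))) ** 2 for row, v in zip(X, y))
    s2 = rss / (n - p)  # residual variance estimate
    z = NormalDist()
    return [2 * z.cdf(-abs(beta[a] / sqrt(s2 * inv[a][a]))) for a in range(1, p)]

def stepwise(cols, y, alpha_enter=0.15, alpha_remove=0.15, max_steps=50):
    """Forward-backward stepwise selection driven by p-values; returns selected indices."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward: try each excluded predictor and keep the one with the smallest p-value.
        trials = [(pvalues(cols, y, selected + [j])[-1], j)
                  for j in range(len(cols)) if j not in selected]
        if trials:
            p_best, j_best = min(trials)
            if p_best < alpha_enter:
                selected.append(j_best)
                changed = True
        # Backward: drop the included predictor with the largest p-value if it is too large.
        if selected:
            ps = pvalues(cols, y, selected)
            worst = max(range(len(selected)), key=lambda i: ps[i])
            if ps[worst] > alpha_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return sorted(selected)

def simulate(k, m, n, rho, reps, beta=0.5, seed=0):
    """Share of simulated datasets where stepwise picks exactly the m authentic predictors."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Every pair of predictors has correlation rho (shared factor plus own noise).
        shared = [rng.gauss(0, 1) for _ in range(n)]
        cols = [[sqrt(rho) * s + sqrt(1 - rho) * rng.gauss(0, 1) for s in shared]
                for _ in range(k)]
        # Predictors 0..m-1 are authentic; the remaining k - m are noise.
        y = [sum(beta * cols[j][i] for j in range(m)) + rng.gauss(0, 1) for i in range(n)]
        if set(stepwise(cols, y)) == set(range(m)):
            hits += 1
    return hits / reps

for rho in (0.0, 0.6):
    print(rho, simulate(k=4, m=3, n=100, rho=rho, reps=200))
```

The printed rates depend heavily on the assumed `beta` and alpha levels, so they will not reproduce the study's figures. The sketch only shows the mechanics of counting "correct" selections. Pure Python keeps it dependency-free but slow; eight candidates take noticeably longer.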

    And the winner is... stepwise regression!! Congratulations! Well, sort of, as we'll see. Best subsets regression using the lowest Mallows' Cp is a very close second: the overall difference between Mallows' Cp and stepwise selection is less than 3%. The adjusted R-squared performed much more poorly than either stepwise or Mallows' Cp.

    However, before we pop open the champagne to celebrate stepwise regression's victory, there's a huge caveat to reveal.

    Stepwise selection usually did not identify the correct model. Gasp!

    Digging into the Results

    Let's look at the results more closely to see how well stepwise selection performs and what affects its performance. I'll only cover stepwise selection, but the results for Mallows' Cp are essentially tied and follow the same patterns. I'll give my thoughts on the matter at the end.

    In the results below, stepwise regression identifies the correct model if it selects all of the authentic predictors and excludes all of the noise predictors.
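This correctness rule is a strict set comparison. Missing one authentic predictor or admitting one noise predictor both count as failures. The minimal sketch below also reports which kind of mistake occurred; the function name `check_model` and the predictor names are hypothetical.

```python
def check_model(selected, authentic):
    """Compare a selected predictor set with the authentic set.

    Returns (is_correct, missed, extra). The model is correct only when no
    authentic predictor is missed and no noise predictor is included."""
    selected, authentic = set(selected), set(authentic)
    missed = authentic - selected  # authentic predictors left out
    extra = selected - authentic   # noise predictors let in
    return (not missed and not extra), missed, extra

# Hypothetical example: x1-x3 are authentic, x4 is noise.
print(check_model({"x1", "x2", "x3"}, {"x1", "x2", "x3"}))
print(check_model({"x1", "x2", "x3", "x4"}, {"x1", "x2", "x3"}))
print(check_model({"x1", "x3"}, {"x1", "x2", "x3"}))
```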

    Best-case scenario

    In the study, stepwise regression performs best when there are four candidate variables, three of which are authentic; there is zero correlation between the predictors; and there is an extra-large sample size of 500 observations. In this case, the stepwise procedure selects the correct model 84% of the time. Unfortunately, this is not a realistic scenario, and the accuracy diminishes from here.

    Number of candidate predictors and number of authentic predictors

    The study looks at scenarios with either 4 or 8 candidate predictors. It is harder to choose the correct model when there are more candidates simply because there are more possible models to choose from. The same pattern holds true for the number of authentic predictors.
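To see how quickly the search space grows, count the candidate models. Assume the intercept is always in the model and the intercept-only model is not counted. With k candidate predictors there are then 2^k - 1 possible models, and each additional candidate roughly doubles that count, yet only one model is true. The helper name `count_models` is hypothetical.

```python
def count_models(k):
    """Non-empty subsets of k candidate predictors (the intercept is always included)."""
    return 2 ** k - 1

for k in (4, 8):
    print(f"{k} candidates: {count_models(k)} possible models, 1 true model")
```

This prints 15 possible models for four candidates and 255 for eight.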

    The table below shows the results for models with no multicollinearity and a good sample size (100-120 observations). Notice that the percent correct decreases as both the number of candidates and the number of authentic predictors increase.

    Candidate predictors   Authentic predictors   % Correct model
    4                      1                      62.7
    4                      2                      54.3
    4                      3                      34.4
    8                      2                      31.3
    8                      4                      12.7
    8                      6                      1.1

    Multicollinearity

    The study varies multicollinearity to determine how correlated predictors affect the ability of stepwise regression to choose the correct model. When predictors are correlated, it's harder to determine the individual effect each one has on the response variable. The study set the correlation between predictors to 0, 0.2, and 0.6.
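The text does not say how the correlation was structured. The sketch below assumes every pair of predictors shares the same correlation rho, and it builds such predictors by mixing one shared factor with independent noise. Under that equicorrelation assumption, the variance inflation factor (VIF) has a closed form, (1 + (k - 2)rho) / ((1 - rho)(1 + (k - 1)rho)). The VIF shows how much correlation inflates each coefficient's variance and so blurs its individual effect. All function names here are hypothetical.

```python
import random
from math import sqrt

def equicorrelated_predictors(n, k, rho, rng):
    """k standard-normal columns with pairwise correlation rho (0 <= rho < 1)."""
    shared = [rng.gauss(0, 1) for _ in range(n)]  # common factor induces the correlation
    return [[sqrt(rho) * s + sqrt(1 - rho) * rng.gauss(0, 1) for s in shared]
            for _ in range(k)]

def sample_corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def vif_equicorrelated(k, rho):
    """VIF of each predictor when all k predictors share pairwise correlation rho."""
    return (1 + (k - 2) * rho) / ((1 - rho) * (1 + (k - 1) * rho))

rng = random.Random(42)
cols = equicorrelated_predictors(20000, 4, 0.6, rng)
print(round(sample_corr(cols[0], cols[1]), 2))  # close to 0.6
for k in (4, 8):
    for rho in (0.0, 0.2, 0.6):
        print(k, rho, round(vif_equicorrelated(k, rho), 2))
```

Under this assumption, the VIF at rho = 0.6 works out to about 1.96 with four candidates and 2.21 with eight.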

    The table below shows the results for models with a good sample size (100-120 observations). As correlation increases, the percent correct decreases.

    Candidate predictors   Authentic predictors   Correlation   % Correct model
    4                      2                      0.0           54.3
    4                      2                      0.2           43.1
    4                      2                      0.6           15.7
    8                      4                      0.0           12.7
    8                      4                      0.2           1.0
    8                      4                      0.6           0.4

    Sample size

    The study uses two sample sizes to see how sample size influences the ability to select the correct model. The smaller sample size was calculated to achieve 0.80 power, which amounts to 100-120 observations. These sample sizes are consistent with good practice and can be considered a good sample size.

    The very large sample size is 500 observations, roughly four to five times the size needed to achieve the benchmark power of 0.80.
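The power target can be illustrated with a normal approximation for a two-sided test of a single regression coefficient. In that approximation, Cohen's effect size f-squared enters through the noncentrality sqrt(n * f2). This is a sketch, not the authors' calculation. The text does not report their effect sizes, so f2 = 0.08 below is a hypothetical value, chosen so that roughly 100 observations give 0.80 power. The function names are also hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def approx_power(n, f2, alpha=0.05):
    """Normal-approximation power of a two-sided test of one coefficient.

    f2 is Cohen's f-squared for that coefficient; small-sample t corrections are ignored."""
    delta = sqrt(n * f2)              # approximate noncentrality of the test statistic
    crit = Z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    return Z.cdf(delta - crit) + Z.cdf(-delta - crit)

def n_for_power(f2, target=0.80, alpha=0.05):
    """Smallest n whose approximate power reaches the target."""
    n = 2
    while approx_power(n, f2, alpha) < target:
        n += 1
    return n

f2 = 0.08  # hypothetical effect size, for illustration only
n_good = n_for_power(f2)
print(n_good, round(approx_power(n_good, f2), 3))
print(500, round(approx_power(500, f2), 3))
```

With this hypothetical effect size, going from about 100 to 500 observations pushes the power for each authentic predictor close to 1.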

    The table below shows that a very large sample size improves the ability of stepwise regression to choose the correct model. When choosing your sample size, you may want to consider a larger sample than the power and sample size calculations suggest in order to improve the variable selection process.

    Candidate predictors   Authentic predictors   Correlation   % Correct, good sample size   % Correct, very large sample
    4                      2                      0.0           54.3                          72.1
    4                      2                      0.2           43.1                          72.9
    4                      2                      0.6           15.7                          69.2
    8                      4                      0.0           12.7                          53.9
    8                      4                      0.2           1.0                           39.5
    8                      4                      0.6           0.4                           1.8

    Closing Thoughts

    Stepwise regression generally can't pick the true model. This is true even with the small number of candidate predictors that this study looks at. In the real world, researchers often have many more candidates, which lowers the chances even further.

    Reality is complex, and we should not expect an automated algorithm to figure it out for us. After all, the stepwise algorithm follows simple rules and knows nothing about the underlying process or subject area. However, stepwise regression can get you into the right ballpark. At a glance, you'll have a rough idea of what is going on in your data.

    It's up to you to get from the rough idea to the correct model. To do this, you'll need to use your expertise, theory, and common sense rather than relying solely on simplistic model selection rules.

    For tips about how to do this, read my post Four Tips on How to Perform a Regression Analysis that Avoids Common Problems.

    If you're learning about regression, read my regression tutorial!

    *Stephen Olejnik, Jamie Mills, and Harvey Keselman, "Using Wherry's Adjusted R2 and Mallows' Cp for Model Selection from All Possible Regressions," The Journal of Experimental Education, 2000, 68(4), 365-380.

    http://blog.minitab.com/blog/adventures-in-statistics/four-tips-on-how-to-perform-a-regression-analysis-that-avoids-common-problems
    http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples