
SW388R7 Data Analysis & Computers II

Slide 1

Stepwise Multiple Regression

Differences between stepwise and other methods of multiple regression

Sample problem

Steps in stepwise multiple regression

Homework Problems

Slide 2

Types of multiple regression

Different types of multiple regression are distinguished by the method for entering the independent variables into the analysis.

In standard (or simultaneous) multiple regression, all of the independent variables are entered into the analysis at the same time.

In hierarchical (or sequential) multiple regression, the independent variables are entered in an order prescribed by the analyst.

In stepwise (or statistical) multiple regression, the independent variables are entered according to their statistical contribution in explaining the variance in the dependent variable.

No matter which method of entry is chosen, a multiple regression that includes the same independent variables and the same dependent variable will produce the same multiple regression equation.

Slide 3

Stepwise multiple regression

Stepwise regression is designed to find the most parsimonious set of predictors that are most effective in predicting the dependent variable.

Variables are added to the regression equation one at a time, using as the statistical criterion the variable that produces the largest increase in the R² of the included variables.

The process of adding more variables stops when all of the available variables have been included or when it is not possible to make a statistically significant improvement in R² using any of the variables not yet included.

Since variables will not be added to the regression equation unless they make a statistically significant addition to the analysis, all of the independent variables selected for inclusion will have a statistically significant relationship to the dependent variable.
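The entry logic described above can be sketched in plain Python (not SPSS syntax). This is an illustrative sketch only: the toy data, variable names, and the F-to-enter cutoff of 4.0 (a rough stand-in for the p < .05 entry criterion) are all assumptions, not SPSS internals.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(xcols, y):
    """R^2 of an OLS fit with intercept, via the normal equations."""
    design = [[1.0] + [col[i] for col in xcols] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    yhat = [sum(b * v for b, v in zip(beta, row)) for row in design]
    ybar = sum(y) / len(y)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    return 1 - ss_res / ss_tot

def stepwise(predictors, y, f_to_enter=4.0):
    """Add the candidate giving the largest R^2 gain while its partial F
    exceeds the entry threshold; stop when no candidate qualifies."""
    included, r2 = [], 0.0
    while True:
        best = None
        for name in predictors:
            if name in included:
                continue
            r2_new = r_squared([predictors[n] for n in included + [name]], y)
            df2 = len(y) - (len(included) + 1) - 1
            f = (r2_new - r2) / ((1 - r2_new) / df2)
            if f > f_to_enter and (best is None or r2_new > best[1]):
                best = (name, r2_new)
        if best is None:
            return included, r2
        included.append(best[0])
        r2 = best[1]

# Hypothetical data: y is essentially 2 * x1, so x1 should enter first.
predictors = {"x1": [1, 2, 3, 4, 5, 6, 7, 8],
              "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]}
y = [2.1, 3.9, 6.2, 8.0, 10.1, 11.9, 14.2, 16.0]
included, final_r2 = stepwise(predictors, y)
```

The order in which names appear in `included` mirrors the "order of entry" that the slides later use as a measure of relative importance.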

Slide 4

Differences in statistical outputs

Each time SPSS includes or removes a variable from the analysis, SPSS considers it a new step or model, i.e. there will be one model and result for each variable included in the analysis.

SPSS provides a table of variables included in the analysis and a table of variables excluded from the analysis. It is possible that none of the variables will be included. It is possible that all of the variables will be included.

The order of entry of the variables can be used as a measure of relative importance.

Once a variable is included, its interpretation in stepwise regression is the same as it would be using other methods for including regression variables.

Slide 5

Differences in solving stepwise regression problems

The level of significance for the analysis is included in the specifications for the statistical analysis. While we will use 0.05 as the level of significance for our problems, a different level of significance can be entered in the SPSS Options dialog box.

The preferred sample size requirement is larger for stepwise regression: 50 times the number of independent variables.

Stepwise procedures are notorious for over-fitting the sample to the detriment of generalizability. Validation analysis is absolutely necessary. If generalizability is compromised, it is permissible to interpret the variables included in the 75% training analysis (though we will not do this in our problems).

While multicollinearity can be examined for all variables, it is really only a problem for the variables not included in the analysis. If a variable is included in the stepwise analysis, it does not have a problematic collinear relationship with the other included variables.

Slide 6

A stepwise regression problem

When the problem asks us to identify the best set of predictors, we will do stepwise multiple regression.

Multiple regression is feasible if the dependent variable is metric and the independent variables (both predictors and controls) are metric or dichotomous, and the available data is sufficient to satisfy the sample size requirements.

Slide 7

Level of measurement - answer

True with caution is the correct answer.

Stepwise multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.

Slide 8

Sample size - question

The second question asks about the sample size requirements for multiple regression.

To answer this question, we will run the initial or baseline multiple regression to obtain some basic data about the problem and solution.

Slide 9

The baseline regression - 1

After we check for violations of assumptions and outliers, we will make a decision whether we should interpret the model that includes the transformed variables and omits outliers (the revised model), or whether we will interpret the model that uses the untransformed variables and includes all cases including the outliers (the baseline model).

In order to make this decision, we run the baseline regression before we examine assumptions and outliers, and record the R² for the baseline model. If using transformations and omitting outliers substantially improves the analysis (an increase of 2% or more in R²), we interpret the revised model. If the increase is smaller, we interpret the baseline model.
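The decision rule above is simple enough to state as a one-line Python check; the function name and the example R² values (other than the baseline's 0.257, which appears later in these slides) are illustrative assumptions.

```python
def model_to_interpret(baseline_r2, revised_r2, threshold=0.02):
    """Interpret the revised model only if transformations and outlier
    removal raise R^2 by at least the threshold (2 percentage points)."""
    return "revised" if revised_r2 - baseline_r2 >= threshold else "baseline"

# With a baseline R^2 of 0.257, a revised R^2 of 0.262 is not enough
# of a gain; 0.284 would be.
small_gain = model_to_interpret(0.257, 0.262)
large_gain = model_to_interpret(0.257, 0.284)
```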

To run the baseline model, select Regression | Linear… from the Analyze menu.

Slide 10

The baseline regression - 2

First, move the dependent variable rincom98 to the Dependent text box.

Second, move the independent variables hrs1, wrkslf, and prestg80 to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we select Stepwise to request the best subset of variables.

Slide 11

The baseline regression - 3

Click on the Statistics… button to specify the statistics options that we want.

Slide 12

The baseline regression - 4

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us the contribution of each additional variable that the stepwise procedure adds to the analysis.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the Collinearity diagnostics checkbox to get tolerance values for testing multicollinearity.

Fifth, click on the Continue button to close the dialog box.

Slide 13

The baseline regression - 5

Next, we need to specify the statistical criteria to use for including variables in the analysis.

Click on the Options button.

Slide 14

The baseline regression - 6

First, the default level of significance for entering variables into the regression equation is .05. Since that is the alpha level for our problem, we do not need to make any change.

The criterion for removing a variable from the analysis is usually set at twice the level for including variables.

Second, click on the Continue button to close the dialog box.

Slide 15

The baseline regression - 7

Click on the OK button to request the regression output.

Slide 16

R² for the baseline model

In stepwise regression, the model number corresponds to the number of variables included in the stepwise analysis. Two variables are included in this problem.

The R² of 0.257 is the benchmark that we will use to evaluate the utility of transformations and the elimination of outliers.

Prior to any transformations of variables to satisfy the assumptions of multiple regression or the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 25.7%.

In stepwise regression, the relationship will always be significant if any variables are included because the variables can only be included if they contributed to a statistically significant relationship.

Slide 17

Sample size – evidence and answer

Descriptive Statistics

            Mean   Std. Deviation     N
RINCOM98   13.94            5.287   145
HRS1       41.22           12.776   145
WRKSLF      1.88             .331   145
PRESTG80   45.96           14.174   145

Stepwise multiple regression requires that the minimum ratio of valid cases to independent variables be at least 5 to 1. The ratio of valid cases (145) to number of independent variables (3) was 48.3 to 1, which was equal to or greater than the minimum ratio. The requirement for a minimum ratio of cases to independent variables was satisfied.

However, the ratio of 48.3 to 1 did not satisfy the preferred ratio of 50 cases per independent variable. A caution should be added to the interpretation of the analysis and validation analysis should be conducted.

True with caution is the correct answer.
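The two case-to-variable ratio checks above can be expressed as a small Python helper; the function name is an illustrative assumption.

```python
def case_ratio(n_cases, n_ivs, minimum=5, preferred=50):
    """Return the cases-per-IV ratio and whether it meets the minimum
    (5:1) and preferred (50:1) requirements for stepwise regression."""
    ratio = n_cases / n_ivs
    return ratio, ratio >= minimum, ratio >= preferred

# 145 valid cases and 3 independent variables, as in this problem.
ratio, meets_minimum, meets_preferred = case_ratio(145, 3)
```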

Slide 18

Assumption of normality for the dependent variable - question

Having satisfied the level of measurement and sample size requirements, we turn our attention to conformity with three of the assumptions of multiple regression: normality, linearity, and homoscedasticity.

First, we will evaluate the assumption of normality for the dependent variable.

Slide 19

Run the script to test normality

First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.

Second, click on the Assumption of Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.

Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.

Fourth, click on the OK button to produce the output.

Slide 20

Normality of the dependent variable: respondent’s income

Descriptives: RESPONDENTS INCOME

                                   Statistic   Std. Error
Mean                                   13.35         .419
95% CI for Mean    Lower Bound         12.52
                   Upper Bound         14.18
5% Trimmed Mean                        13.54
Median                                 15.00
Variance                              29.535
Std. Deviation                         5.435
Minimum                                    1
Maximum                                   23
Range                                     22
Interquartile Range                     8.00
Skewness                               -.686         .187
Kurtosis                               -.253         .373

The dependent variable "income" [rincom98] satisfied the criteria for a normal distribution. The skewness of the distribution (-0.686) was between -1.0 and +1.0, and the kurtosis of the distribution (-0.253) was between -1.0 and +1.0. True is the correct answer.
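The skewness/kurtosis screen used on these slides can be sketched in Python. The formulas below are the standard small-sample-adjusted (Fisher) skewness and excess kurtosis; that they match SPSS's Descriptives output exactly is an assumption to verify against your own output.

```python
import math

def skew_kurtosis(xs):
    """Small-sample-adjusted skewness and excess kurtosis
    (Fisher g-statistics with bias corrections)."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    m3 = sum((x - m) ** 3 for x in xs)
    m4 = sum((x - m) ** 4 for x in xs)
    skew = n / ((n - 1) * (n - 2)) * m3 / s ** 3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 / s ** 4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt

def roughly_normal(xs):
    """The rule used on these slides: both statistics within -1.0 to +1.0."""
    sk, ku = skew_kurtosis(xs)
    return -1.0 <= sk <= 1.0 and -1.0 <= ku <= 1.0
```

For a perfectly symmetric sample like [1, 2, 3, 4, 5] the skewness is exactly 0, while the adjusted kurtosis works out to -1.2, so this tiny sample would actually fail the kurtosis screen.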

Slide 21

Normality of the independent variable: hrs1

Next, we will evaluate the assumption of normality for the independent variable, number of hours worked in the past week.

Slide 22

Normality of the independent variable: number of hours worked in the past week

Descriptives: NUMBER OF HOURS WORKED LAST WEEK

                                   Statistic   Std. Error
Mean                                   40.99         .958
95% CI for Mean    Lower Bound         39.10
                   Upper Bound         42.88
5% Trimmed Mean                        41.21
Median                                 40.00
Variance                             161.491
Std. Deviation                        12.708
Minimum                                    4
Maximum                                   80
Range                                     76
Interquartile Range                    10.00
Skewness                               -.324         .183
Kurtosis                                .935         .364

The independent variable "number of hours worked in the past week" [hrs1] satisfied the criteria for a normal distribution. The skewness of the distribution (-0.324) was between -1.0 and +1.0 and the kurtosis of the distribution (0.935) was between -1.0 and +1.0. True is the correct answer.

Slide 23

Normality of the independent variable: prestg80

Finally, we will evaluate the assumption of normality for the independent variable "occupational prestige score" [prestg80].

Slide 24

Normality of the second independent variable: occupational prestige score

Descriptives: RS OCCUPATIONAL PRESTIGE SCORE (1980)

                                   Statistic   Std. Error
Mean                                   44.17         .873
95% CI for Mean    Lower Bound         42.45
                   Upper Bound         45.89
5% Trimmed Mean                        43.82
Median                                 43.00
Variance                             194.196
Std. Deviation                        13.935
Minimum                                   17
Maximum                                   86
Range                                     69
Interquartile Range                    18.00
Skewness                                .401         .153
Kurtosis                               -.630         .304

The independent variable "occupational prestige score" [prestg80] satisfied the criteria for a normal distribution. The skewness of the distribution (0.401) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.630) was between -1.0 and +1.0. True is the correct answer.

Slide 25

Assumption of linearity for respondent’s income and number of hours worked last week - question

All of the metric variables included in the analysis satisfied the assumption of normality.

Next we will test the relationships for linearity.

Slide 26

Run the script to test linearity

First, click on the Assumption of Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

Second, click on the OK button to produce the output.

When the linearity option is selected, a default set of transformations to test is marked.

Slide 27

Correlations (Pearson r; ** = significant at the 0.01 level, 2-tailed; N = 149 for pairs involving income, 176 otherwise)

                                             INCOME     HRS1   LG10(81-HRS1)  SQRT(81-HRS1)  -1/(81-HRS1)
RESPONDENTS INCOME                              1       .337**      -.231**        -.303**       -.059
NUMBER OF HOURS WORKED LAST WEEK              .337**      1         -.871**        -.981**       -.361**
Logarithm of Reflected Values of HRS1        -.231**   -.871**        1             .946**        .743**
Square Root of Reflected Values of HRS1      -.303**   -.981**      .946**           1            .502**
Inverse of Reflected Values of HRS1          -.059     -.361**      .743**          .502**         1

Linearity test: respondent’s income and number of hours worked last week

The correlation between "number of hours worked in the past week" and "income" was statistically significant (r=.337, p<0.001). A linear relationship exists between these variables. True is the correct answer.
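The Pearson correlation underlying this linearity test is straightforward to compute by hand; a minimal Python sketch follows. The exact p-value requires a t table with n-2 degrees of freedom, so the sketch returns the t statistic rather than inventing a p-value.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0; compare against a t table
    with n - 2 degrees of freedom to obtain the p-value."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With r = .337 and N = 149 (the income/hrs1 pair above), the t statistic is about 4.3, comfortably beyond the roughly 2.6 cutoff for p < .01.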

Slide 28

Assumption of linearity for respondent’s income and occupational prestige score - question

All of the metric variables included in the analysis satisfied the assumption of normality.

Next we will test the relationships for linearity.

Slide 29

Correlations (Pearson r; ** = significant at the 0.01 level, 2-tailed; N = 168 for pairs involving income, 255 otherwise)

                                          INCOME   PRESTG80   LG10(PRESTG80)  SQRT(PRESTG80)  -1/(PRESTG80)
RESPONDENTS INCOME                          1        .440**       .436**          .440**          .414**
RS OCCUPATIONAL PRESTIGE SCORE (1980)     .440**       1          .985**          .996**          .936**
Logarithm of PRESTG80                     .436**     .985**         1             .996**          .982**
Square Root of PRESTG80                   .440**     .996**       .996**            1             .962**
Inverse of PRESTG80                       .414**     .936**       .982**          .962**            1

Linearity test: respondent’s income and occupational prestige score

The correlation between "occupational prestige score" and "income" was statistically significant (r=.440, p<0.001). A linear relationship exists between these variables. True is the correct answer.

Slide 30

Assumption of homogeneity of variance - question

Self-employment is the only dichotomous independent variable in the analysis. We will test it for homogeneity of variance, using income as the dependent variable.

Slide 31

Run the script to test homogeneity of variance

First, click on the Assumption of Homogeneity option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity of variance.

Second, click on the OK button to produce the output.

When the homogeneity of variance option is selected, a default set of transformations to test is marked.

Slide 32

Assumption of homogeneity of variance

Based on the Levene Test, the variance in "income" [rincom98] is homogeneous for the categories of "self-employment" [wrkslf]. The probability associated with the Levene Statistic (p=0.076) is greater than the level of significance (0.01), so we fail to reject the null hypothesis that the variance is equal across groups, and conclude that the homoscedasticity assumption is satisfied.

The homogeneity of variance assumption was satisfied. True is the correct answer.
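Levene's test, reported by SPSS above, is at heart a one-way ANOVA on absolute deviations from group means. The sketch below computes the W statistic only; the p-value would come from an F distribution with (k-1, N-k) degrees of freedom, and the group data shown are hypothetical.

```python
def levene_w(groups):
    """Levene's test statistic: a one-way ANOVA F computed on the
    absolute deviations of each score from its own group mean."""
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    all_z = [v for g in z for v in g]
    grand = sum(all_z) / len(all_z)
    k, n = len(groups), len(all_z)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in z)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in z for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Two groups with identical spread (e.g. [1, 2, 3] and [4, 5, 6]) give W = 0, i.e. no evidence against homogeneity; groups with very different spread give a large W.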

Slide 33

Detection of outliers - question

In multiple regression, an outlier in the solution can be defined as a case that has a large residual because the equation did a poor job of predicting its value.

We will run the baseline regression again and have SPSS compute the standardized residual for each case. Cases with a standardized residual larger than +/- 3.0 will be treated as outliers.
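The screening rule just stated can be sketched in a few lines of Python; the residual values and the standard error (taken here as 4.588, the baseline model's standard error of the estimate reported later) are illustrative.

```python
def flag_outliers(residuals, std_error, cutoff=3.0):
    """Indices of cases whose standardized residual (residual divided
    by the standard error of the estimate) exceeds +/- cutoff."""
    return [i for i, e in enumerate(residuals)
            if abs(e / std_error) > cutoff]

# Hypothetical residuals: only the third case exceeds +/- 3.0.
flagged = flag_outliers([1.0, -2.0, 16.0], 4.588)
```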

Slide 34

Re-running the baseline regression - 1

Having decided to use the baseline model for the interpretation of this analysis, the SPSS regression output was re-created.

To run the baseline model, select Regression | Linear… from the Analyze menu.

Slide 35

Re-running the baseline regression - 2

First, move the dependent variable rincom98 to the Dependent text box.

Second, move the independent variables hrs1, wrkslf, and prestg80 to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we select Stepwise to request the best subset of variables.

Slide 36

Re-running the baseline regression - 3

Click on the Statistics… button to specify the statistics options that we want.

Slide 37

Re-running the baseline regression - 4

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us the contribution of each variable that the stepwise procedure adds to the analysis.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the checkbox for Casewise diagnostics, which will be used to identify outliers.

Fifth, mark the Collinearity diagnostics checkbox to get tolerance values for testing multicollinearity.

Sixth, click on the Continue button to close the dialog box.

Slide 38

Re-running the baseline regression - 5

Click on the Save button to save the standardized residuals to the data editor.

Slide 39

Re-running the baseline regression - 6

Mark the checkbox for Standardized Residuals so that SPSS saves a new variable in the data editor. We will use this variable to omit outliers in the revised regression model.

Click on the Continue button to close the dialog box.

Slide 40

Re-running the baseline regression - 7

Click on the OK button to request the regression output.

Slide 41

Outliers in the analysis

If any case has a standardized residual larger than +/- 3.0, SPSS creates a table titled Casewise Diagnostics, listing the cases and the values that result in their being flagged as outliers.

If there are no outliers, SPSS does not print the Casewise Diagnostics table. There was no such table for this problem.

We can verify that all standardized residuals were less than +/- 3.0 by looking at the minimum and maximum standardized residuals in the table of Residual Statistics. Both the minimum and maximum fell in the acceptable range.

Since there were no outliers, the correct answer is true.

Slide 42

Selecting the model to interpret - question

Since there were no transformations used and there were no outliers, we can use the baseline regression for our interpretation.

The correct answer is false.

Slide 43

Assumption of independence of errors - question

We can now check the assumption of independence of errors for the analysis we will interpret.

Slide 44

Model Summary (dependent variable: RINCOM98)

Model    R     R Square  Adj. R Square  Std. Error  R Sq Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .424a     .180         .174         4.804        .180       31.350    1  143      .000
2      .507b     .257         .247         4.588        .077       14.788    1  142      .000           1.866

a. Predictors: (Constant), PRESTG80
b. Predictors: (Constant), PRESTG80, HRS1

Assumption of independence of errors: evidence and answer

Multiple regression assumes that the errors are independent and there is no serial correlation. Errors are the residuals or differences between the actual score for a case and the score estimated by the regression equation. No serial correlation implies that the size of the residual for one case has no impact on the size of the residual for the next case.

The Durbin-Watson statistic is used to test for the presence of serial correlation among the residuals. The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are not correlated if the Durbin-Watson statistic is approximately 2, and an acceptable range is 1.50 - 2.50.

The Durbin-Watson statistic for this problem is 1.866 which falls within the acceptable range from 1.50 to 2.50. The analysis satisfies the assumption of independence of errors. True is the correct answer.

If the Durbin-Watson statistic was not in the acceptable range, we would add a caution to the findings for a violation of regression assumptions.
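The Durbin-Watson statistic described above is simple to compute from the residuals themselves; a minimal sketch (the residual values are hypothetical):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: ranges 0 to 4; approximately 2 means
    uncorrelated residuals, 0 strong positive serial correlation,
    4 strong negative serial correlation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)

# Perfectly alternating residuals show negative serial correlation,
# pushing the statistic above 2 toward 4.
dw = durbin_watson([1.0, -1.0, 1.0, -1.0])
```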

Slide 45

Multicollinearity - question

The final condition that can have an impact on our interpretation is multicollinearity.

Slide 46

Multicollinearity – evidence and answer

Multicollinearity occurs when one independent variable is so strongly correlated with the other independent variables that its unique contribution cannot be assessed. Since multicollinearity will result in a variable not being included in the analysis, our examination of tolerances focuses on the table of excluded variables.

The tolerance values for all of the independent variables are larger than 0.10: "number of hours worked in the past week" [hrs1] (.954), "self-employment" [wrkslf] (.979) and "occupational prestige score" [prestg80] (.954).

Multicollinearity is not a problem in this regression analysis. True is the correct answer.
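In general, a predictor's tolerance is 1 minus the R² obtained by regressing that predictor on all the other predictors. With only two predictors this reduces to 1 - r², which the sketch below illustrates; this is a simplified special case, not SPSS's general computation.

```python
def tolerance_two_predictors(r12):
    """Tolerance of each predictor when there are only two:
    the share of its variance NOT shared with the other (1 - r^2).
    Values below 0.10 signal problematic multicollinearity."""
    return 1 - r12 ** 2
```

Two uncorrelated predictors each have tolerance 1.0; two predictors correlated at .95 each have tolerance below the 0.10 danger threshold.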

Slide 47

Overall relationship between dependent variable and independent variables - question

The first finding we want to confirm concerns the overall relationship between the dependent variable and one or more of the independent variables.


Slide 48

ANOVA(c)

                     Sum of Squares   df   Mean Square       F     Sig.
Model 1  Regression         723.647    1       723.647  31.350  .000(a)
         Residual          3300.795  143        23.082
         Total             4024.441  144
Model 2  Regression        1034.982    2       517.491  24.581  .000(b)
         Residual          2989.460  142        21.053
         Total             4024.441  144

a. Predictors: (Constant), PRESTG80
b. Predictors: (Constant), PRESTG80, HRS1
c. Dependent Variable: RINCOM98

Overall relationship between dependent variable

and independent variables – evidence and answer 1

Based on the results in the ANOVA table (F(2, 142) = 24.581, p<0.001), there was an overall relationship between the dependent variable "income" [rincom98] and one or more of the independent variables. Since the probability of the F statistic (p<0.001) was less than or equal to the level of significance (0.05), the null hypothesis that the Multiple R for all independent variables was equal to 0 was rejected. The purpose of the analysis, to identify a relationship between some of the independent variables and the dependent variable, was supported.
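The F values in the ANOVA table can be reproduced directly from the sums of squares; a quick check for model 2, using the values reported in the table:

```python
# F = Mean Square (regression) / Mean Square (residual), where each
# mean square is the sum of squares divided by its degrees of freedom.
ss_reg, df_reg = 1034.982, 2     # model 2 regression row
ss_res, df_res = 2989.460, 142   # model 2 residual row

f = (ss_reg / df_reg) / (ss_res / df_res)
print(round(f, 3))  # -> 24.581, matching the reported F(2, 142)
```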

Stepwise multiple regression was performed to identify the best predictors of the dependent variable "income" [rincom98] among the independent variables "number of hours worked in the past week" [hrs1], "self-employment" [wrkslf], and "occupational prestige score" [prestg80].


Slide 49

Overall relationship between dependent variable

and independent variables – evidence and answer 2

The Multiple R for the relationship between the independent variables included in the analysis and the dependent variable was 0.507, which would be characterized as moderate using the rule of thumb that a correlation less than or equal to 0.20 is characterized as very weak; greater than 0.20 and less than or equal to 0.40 is weak; greater than 0.40 and less than or equal to 0.60 is moderate; greater than 0.60 and less than or equal to 0.80 is strong; and greater than 0.80 is very strong.

The relationship between the independent variables and the dependent variable was correctly characterized as moderate.

True with caution is the correct answer. Caution in interpreting the relationship should be exercised because of the inclusion of ordinal variables and a cases-to-variables ratio of less than 50:1.
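The rule of thumb above can be written as a small helper (the function name is ours):

```python
# Map the absolute size of a correlation to the strength labels used above.
def strength(r):
    r = abs(r)
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"

print(strength(0.507))  # the Multiple R in this problem -> "moderate"
```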


Slide 50

Best subset of predictors - question

The next finding concerns the list of independent variables that are statistically significant.


Slide 51

Coefficients(a)

                         B   Std. Error   Beta       t   Sig.  Tolerance    VIF
Model 1  (Constant)  6.669        1.358          4.911   .000
         PRESTG80     .158         .028   .424   5.599   .000      1.000  1.000
Model 2  (Constant)  2.862        1.632          1.754   .082
         PRESTG80     .135         .028   .363   4.898   .000       .954  1.049
         HRS1         .118         .031   .285   3.846   .000       .954  1.049

(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.)

a. Dependent Variable: RINCOM98

Best subset of predictors – evidence and answer

The best predictors of scores for the dependent variable "income" [rincom98] were "occupational prestige score" [prestg80] and "number of hours worked in the past week" [hrs1].

The variable "number of hours worked in the past week" [hrs1] was not included in the list of predictors in the question, so false is the correct answer.


Slide 52

Relationship of the first independent variable and the dependent variable - question

In the stepwise regression problems, we will focus on the entry order of the independent variables and the interpretation of individual relationships of independent variables on the dependent variable.


Slide 53

Relationship of the first independent variable and the dependent variable – evidence and

answer 1

In the table of variables entered and removed, "number of hours worked in the past week" [hrs1] was added to the regression equation in model 2. The increase in R Square as a result of including this variable was .077, which was statistically significant, F(1, 142) = 14.788, p<0.001.
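The R² change and its F test can be reconstructed from the ANOVA table's sums of squares; a sketch (the 1 in the denominator of the change term is the degrees of freedom for the single variable added at step 2):

```python
# R^2 for each model = SS_regression / SS_total; the F test for the
# change uses F = (R2_change / df_change) / ((1 - R2_full) / df_residual).
ss_total = 4024.441               # from the ANOVA table
r2_model1 = 723.647 / ss_total    # step 1: PRESTG80 only
r2_model2 = 1034.982 / ss_total   # step 2: PRESTG80 + HRS1

r2_change = r2_model2 - r2_model1
f_change = (r2_change / 1) / ((1 - r2_model2) / 142)

print(round(r2_change, 3))  # -> 0.077, as reported
print(round(f_change, 2))   # close to the reported F(1, 142) = 14.788
```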


Slide 54

Coefficients(a)

                         B   Std. Error   Beta       t   Sig.  Tolerance    VIF
Model 1  (Constant)  6.669        1.358          4.911   .000
         PRESTG80     .158         .028   .424   5.599   .000      1.000  1.000
Model 2  (Constant)  2.862        1.632          1.754   .082
         PRESTG80     .135         .028   .363   4.898   .000       .954  1.049
         HRS1         .118         .031   .285   3.846   .000       .954  1.049

(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.)

a. Dependent Variable: RINCOM98

Relationship of the first independent variable and the dependent variable – evidence and

answer 2

The b coefficient for the relationship between the dependent variable "income" [rincom98] and the independent variable "number of hours worked in the past week" [hrs1] was .118, which implies a direct relationship because the sign of the coefficient is positive. Higher numeric values for the independent variable "number of hours worked in the past week" [hrs1] are associated with higher numeric values for the dependent variable "income" [rincom98]. The statement in the problem that "survey respondents who worked longer hours in the past week had higher incomes" is correct.

True with caution is the correct answer. Caution in interpreting the relationship should be exercised because of a cases-to-variables ratio of less than 50:1 and an ordinal variable treated as metric.


Slide 55

Relationship of the second independent variable and the dependent variable - question


Slide 56

Relationship of the second independent variable and the dependent variable – evidence

and answer

The independent variable "self-employment" [wrkslf] was not included in the regression equation. It did not increase the percentage of variance explained in the dependent variable by an amount large enough to be statistically significant.

False is the correct answer.


Slide 57

Relationship of the third independent variable and the dependent variable - question


Slide 58

Relationship of the third independent variable and the dependent variable – evidence and

answer 1

In the table of variables entered and removed, "occupational prestige score" [prestg80] was added to the regression equation in model 1.

The increase in R Square as a result of including this variable was .180 which was statistically significant, F(1, 143) = 31.350, p<0.001.


Slide 59

Coefficients(a)

                         B   Std. Error   Beta       t   Sig.  Tolerance    VIF
Model 1  (Constant)  6.669        1.358          4.911   .000
         PRESTG80     .158         .028   .424   5.599   .000      1.000  1.000
Model 2  (Constant)  2.862        1.632          1.754   .082
         PRESTG80     .135         .028   .363   4.898   .000       .954  1.049
         HRS1         .118         .031   .285   3.846   .000       .954  1.049

(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.)

a. Dependent Variable: RINCOM98

Relationship of the third independent variable and the dependent variable – evidence and

answer 2

The b coefficient for the relationship between the dependent variable "income" [rincom98] and the independent variable "occupational prestige score" [prestg80] was .135, which implies a direct relationship because the sign of the coefficient is positive. Higher numeric values for the independent variable "occupational prestige score" [prestg80] are associated with higher numeric values for the dependent variable "income" [rincom98].

The statement in the problem that "survey respondents who had more prestigious occupations had lower incomes" is incorrect. The direction of the relationship is stated incorrectly.

False is the correct answer.


Slide 60

Validation analysis - question

The problem states the random number seed to use in the validation analysis.


Slide 61

Validation analysis: set the random number seed

To set the random number seed, select the Random Number Seed… command from the Transform menu.

Validate the results of your regression analysis by conducting a 75/25% cross-validation, using 200070 as the random number seed.


Slide 62

Set the random number seed

First, click on the Set seed to option button to activate the text box.

Second, type in the random seed stated in the problem.

Third, click on the OK button to complete the dialog box.

Note that SPSS does not provide you with any feedback about the change.


Slide 63

Validation analysis: compute the split variable

To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.


Slide 64

The formula for the split variable

First, type the name for the new variable, split, into the Target Variable text box.

Second, the formula for the value of split is shown in the text box.

The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.75.

If the random number is less than or equal to 0.75, the value of the formula will be 1, the SPSS numeric equivalent of true. If the random number is larger than 0.75, the formula will return a 0, the SPSS numeric equivalent of false.

Third, click on the OK button to complete the dialog box.
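The seed-and-split logic can be sketched outside SPSS as well. Note this is only a rough analogue: Python's random number generator differs from SPSS's, so the specific cases selected will not match the SPSS output.

```python
import random

# Mirror the SPSS computation: seed the generator, draw a uniform(0,1)
# number per case, and assign split = 1 (training sample) when the draw
# is <= 0.75, else split = 0 (validation sample).
random.seed(200070)   # the random number seed given in the problem
n_cases = 145         # number of cases in this analysis

split = [1 if random.random() <= 0.75 else 0 for _ in range(n_cases)]
print(sum(split), "training cases;", n_cases - sum(split), "validation cases")
```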


Slide 65

The split variable in the data editor

In the data editor, the split variable shows a random pattern of zeros and ones.

To select the cases for the training sample, we select the cases where split = 1.


Slide 66

Repeat the regression for the validation

To repeat the multiple regression analysis for the validation sample, select Regression | Linear from the Analyze tool button.


Slide 67

Using "split" as the selection variable

First, scroll down the list of variables and highlight the variable split.

Second, click on the right arrow button to move the split variable to the Selection Variable text box.


Slide 68

Setting the value of split to select cases

When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split.

Click on the Rule… button to enter a value for split.


Slide 69

Completing the value selection

First, type the value for the training sample, 1, into the Value text box.

Second, click on the Continue button to complete the value entry.


Slide 70

Requesting output for the validation analysis

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

Click on the OK button to request the output.


Slide 71

Validation – Overall Relationship

The validation analysis requires that the regression model for the 75% training sample replicate the pattern of statistical significance found for the full data set.

In the analysis of the 75% training sample, the relationship between the set of independent variables and the dependent variable was statistically significant, F(2, 105) = 20.195, p<0.001, as was the overall relationship in the analysis of the full data set, F(2, 142) = 24.581, p<0.001.


Slide 72

Validation - Relationship of Individual Independent Variables to Dependent Variable

In stepwise multiple regression, the pattern of individual relationships between the dependent variable and the independent variables will be the same if the same variables are selected as predictors for the analysis using the full data set and the analysis using the 75% training sample. In this analysis, the same two variables entered into the regression model: "occupational prestige score" [prestg80]; and "number of hours worked in the past week" [hrs1].


Slide 73

Validation - Comparison of Training Sample and Validation Sample

The total proportion of variance explained in the model using the training sample was 27.8% (.527²), compared to 70.7% (.841²) for the validation sample. The value of R² for the validation sample was actually larger than the value of R² for the training sample, implying a better fit than obtained for the training sample. This supports a conclusion that the regression model would be effective in predicting scores for cases other than those included in the sample.

The validation analysis supported the generalizability of the findings of the analysis to the population represented by the sample in the data set.

The answer to the question is true.
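The comparison above can be checked with a couple of lines (the R values are those reported for the training and validation samples):

```python
# Shrinkage = R^2 (training) - R^2 (validation); a value under 2%
# (or a negative value, as here) supports generalizability.
r_training, r_validation = 0.527, 0.841

r2_training = r_training ** 2        # about .278 (27.8%)
r2_validation = r_validation ** 2    # about .707 (70.7%)
shrinkage = r2_training - r2_validation

print(round(r2_training, 3), round(r2_validation, 3), round(shrinkage, 3))
```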


Slide 74

Steps in complete stepwise regression analysis

The following flow charts depict the process for solving the complete regression problem and determining the answer to each of the questions encountered in the complete analysis.

Text in italics (e.g., True, False, True with caution, Incorrect application of a statistic) represents the answers to each specific question.

Many of the steps in stepwise regression analysis are identical to the steps in standard regression analysis. Steps that are different are identified with a magenta background, with the specifics of the difference underlined.


Slide 75

Complete stepwise multiple regression analysis:

level of measurement

Question: do the variables included in the analysis satisfy the level of measurement requirements? (Examine all independent variables – controls as well as predictors.)

Is the dependent variable metric and the independent variables metric or dichotomous?
- No: Incorrect application of a statistic
- Yes: Are ordinal variables included in the relationship?
  - No: True
  - Yes: True with caution


Slide 76

Complete stepwise multiple regression analysis:

sample size

Question: do the number of variables and cases satisfy the sample size requirements? (Include both controls and predictors in the count of independent variables.)

Is the ratio of cases to independent variables at least 5 to 1?
- No: Inappropriate application of a statistic
- Yes: Is the ratio of cases to independent variables at the preferred sample size of at least 50 to 1?
  - Yes: True
  - No: True with caution

If the sample size requirements are satisfied, compute the baseline regression in SPSS.


Slide 77

Complete stepwise multiple regression analysis: assumption of normality

Question: does each metric variable satisfy the assumption of normality? (Test the dependent variable and the independent variables.)

Does the variable satisfy the criteria for a normal distribution?
- Yes: True
- No: Does a log, square root, or inverse transformation satisfy normality?
  - Yes: False; use the transformation in the revised model, with no caution needed. If more than one transformation satisfies normality, use the one with the smallest skew.
  - No: False; use the untransformed variable in the analysis and add a caution to the interpretation for the violation of normality.


Slide 78

Complete stepwise multiple regression analysis: assumption of linearity

Question: does the relationship between the dependent variable and a metric independent variable satisfy the assumption of linearity? (If the independent variable was transformed to satisfy normality, skip the check for linearity. If the dependent variable was transformed for normality, use the transformed dependent variable in the test for linearity.)

Is the probability of the Pearson correlation (r) <= the level of significance?
- Yes: True
- No: Is the probability of the correlation (r) for the relationship with any transformation of the IV <= the level of significance?
  - Yes: Use the transformation in the revised model. If more than one transformation satisfies linearity, use the one with the largest r.
  - No: Weak relationship; no caution needed.


Slide 79

Complete stepwise multiple regression analysis:

assumption of homogeneity of variance

Question: is the variance in the dependent variable uniform across the categories of a dichotomous independent variable? (If the dependent variable was transformed for normality, substitute the transformed dependent variable in the test for the assumption of homogeneity of variance.)

Is the probability of the Levene statistic <= the level of significance?
- No: True
- Yes: False; do not test transformations of the dependent variable, and add a caution to the interpretation for the violation of homoscedasticity.


Slide 80

Complete stepwise multiple regression analysis: detecting outliers

Question: after incorporating any transformations, were no outliers detected in the regression analysis? (If any variables were transformed for normality or linearity, substitute the transformed variables in the regression for the detection of outliers.)

Is the standardized residual for any case greater than +/-3.00?
- No: True
- Yes: False; remove the outliers and run the revised regression again.
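The outlier screen described above amounts to a one-line filter; a sketch with hypothetical standardized residuals:

```python
# Flag any case whose standardized residual falls outside +/-3.00.
def outlier_indices(std_residuals, cutoff=3.0):
    return [i for i, z in enumerate(std_residuals) if abs(z) > cutoff]

# Hypothetical residuals: only the value 3.6 exceeds the cutoff.
print(outlier_indices([0.4, -1.2, 3.6, 2.9, -0.7]))  # -> [2]
```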


Slide 81

Complete stepwise multiple regression analysis: picking regression model for interpretation

Question: is the interpretation based on the model that includes transformations of variables and removes outliers?

Is R² for the revised regression greater than R² for the baseline regression by 2% or more?
- Yes: True; pick the revised regression with transformations and omitting outliers for interpretation.
- No: False; pick the baseline regression with untransformed variables and all cases for interpretation.


Slide 82

Complete stepwise multiple regression analysis: assumption of independence of errors

Question: serial correlation of errors is not a problem in this regression analysis?

Are the residuals independent (Durbin-Watson statistic between 1.5 and 2.5)?
- Yes: True
- No: False; add a caution for the violation of the assumption of independence of errors.


Slide 83

Complete stepwise multiple regression analysis: multicollinearity

Question: multicollinearity is not a problem in this regression analysis?

Is the tolerance for all IVs greater than 0.10, indicating no multicollinearity?
- Yes: True
- No: False; halt the analysis if it is not okay to simply exclude the variable from the analysis.


Slide 84

Complete stepwise multiple regression analysis:

overall relationship

Question: finding about the overall relationship between the dependent variable and the independent variables.

Is the probability of the F test of the regression for the last model <= the level of significance?
- No: False
- Yes: Is the strength of the relationship for the included variables interpreted correctly?
  - No: False
  - Yes: Is there a small sample, ordinal variables, or a violation of an assumption in the relationship?
    - No: True
    - Yes: True with caution


Slide 85

Complete stepwise multiple regression analysis:

subset of best predictors

Question: finding about the list of the best subset of predictors.

Do the listed variables match the variables in the table of variables entered/removed?
- No: False
- Yes: Is there a small sample, ordinal variables, or a violation of an assumption in the relationship?
  - No: True
  - Yes: True with caution


Slide 86

Complete stepwise multiple regression analysis:

individual relationships - 1

Question: finding about the individual relationship between an independent variable and the dependent variable.

Is the order of entry into the regression equation stated correctly?
- No: False
- Yes: Is the significance of the R² change for the variable <= the level of significance?
  - No: False
  - Yes: continue with the checks on the next slide.


Slide 87

Complete stepwise multiple regression analysis:

individual relationships - 2

Is the direction of the relationship between the included variables and the DV interpreted correctly?
- No: False
- Yes: Is there a small sample, ordinal variables, or a violation of an assumption in the relationship?
  - No: True
  - Yes: True with caution


Slide 88

Complete stepwise multiple regression analysis: validation analysis - 1

Set the random seed and randomly split the sample into a 75% training sample and a 25% validation sample.

Question: the validation analysis supports the generalizability of the findings?

Is the probability of the ANOVA test for the training sample <= the level of significance?
- No: False
- Yes: continue with the checks on the next slide.


Slide 89

Complete stepwise multiple regression analysis: validation analysis - 2

Were the same variables entered into the regression equation in the training sample?
- No: False
- Yes: Is the shrinkage in R² (R² for the training sample minus R² for the validation sample) less than 2%?
  - No: False
  - Yes: True


Slide 90

Homework Problems
Multiple Regression – Stepwise Problems - 1

The stepwise regression homework problems parallel the complete standard regression problems and the complete hierarchical problems. The only assumption made in the problems is that there is no problem with missing data.

The complete stepwise multiple regression will include:
• Testing assumptions of normality and linearity
• Testing for outliers
• Determining whether to use transformations or exclude outliers
• Testing for independence of errors
• Checking for multicollinearity
• Validating the generalizability of the analysis


Slide 91

Homework Problems
Multiple Regression – Stepwise Problems - 2

The statement of the stepwise regression problem identifies the dependent variable and the independent variables from which we will extract a parsimonious subset.


Slide 92

Homework Problems
Multiple Regression – Stepwise Problems - 3

The findings, which must all be correct for a problem to be true, include:
• an ordered listing of the included independent variables
• an interpretive statement about each of the independent variables
• a statement about the strength of the overall relationship


Slide 93

Homework Problems
Multiple Regression – Stepwise Problems - 4

The first prerequisite for a problem is the satisfaction of the level of measurement and minimum sample size requirements.

Failing to satisfy either of these requirements results in an inappropriate application of a statistic.


Slide 94

Homework Problems
Multiple Regression – Stepwise Problems - 5

The assumption of normality requires that each metric variable be tested. If the variable is not normal, transformations should be examined to see if we can improve the distribution of the variable. If transformations are unsuccessful, a caution is added to any true findings.


Slide 95

Homework Problems
Multiple Regression – Stepwise Problems - 6

The assumption of linearity is examined for any metric independent variables that were not transformed for the assumption of normality.


Slide 96

Homework Problems
Multiple Regression – Stepwise Problems - 7

After incorporating any transformations, we look for outliers using standard residuals as the criterion.


Slide 97

Homework Problems
Multiple Regression – Stepwise Problems - 8

We compare the results of the regression without transformations and exclusion of outliers to the model with transformations and excluding outliers to determine whether we will base our interpretation on the baseline or the revised analysis.


Slide 98

Homework Problems
Multiple Regression – Stepwise Problems - 9

We test for the assumption of independence of errors and the presence of multicollinearity.

If we violate the assumption of independence, we attach a caution to our findings.

If there is a multicollinearity problem, we halt the analysis, since we may be reporting erroneous findings.
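The two diagnostics can be sketched directly: the Durbin-Watson statistic for independence of errors, and (for the two-predictor case) the variance inflation factor for multicollinearity. The data below are hypothetical:

```python
import math

def durbin_watson(resid):
    """Durbin-Watson statistic: near 2 suggests independent errors,
    near 0 positive autocorrelation, near 4 negative autocorrelation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    return num / sum(r * r for r in resid)

def vif_two_predictors(x1, x2):
    """With two predictors, VIF = 1 / (1 - r²) and tolerance = 1 - r²;
    VIF above roughly 10 (tolerance below 0.10) is a common red flag."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1 = sum((a - m1) ** 2 for a in x1)
    s2 = sum((b - m2) ** 2 for b in x2)
    r = cov / math.sqrt(s1 * s2)
    return 1.0 / (1.0 - r ** 2)

clustered = [1.0, 1.2, 0.8, -1.1, -0.9, -1.0]  # runs of same-signed residuals
print(round(durbin_watson(clustered), 2))       # well below 2

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]           # nearly 2 * x1: near-collinear
print(round(vif_two_predictors(x1, x2)))        # far above 10
```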

Slide 99
Homework Problems: Multiple Regression – Stepwise Problems - 9

In stepwise regression, we interpret the R² for the overall relationship at the step or model when the last statistically significant variable was entered.

Slide 100
Homework Problems: Multiple Regression – Stepwise Problems - 10

The primary purpose of stepwise regression is to identify the best subset of predictors and the order in which variables were included in the regression equation. The order tells us the relative importance of the predictors, i.e. best predictor, second best, …
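Actual stepwise regression (as in SPSS) enters and removes variables based on F-to-enter/F-to-remove probabilities; the sketch below substitutes a simplified greedy rule (add whichever predictor most improves R², stop when no candidate adds a minimum gain) to show how an entry order falls out. The data are simulated:

```python
import numpy as np

def r2(X, y):
    """R² of an OLS fit with intercept, via least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection on R² gain; a simplified stand-in for the
    p-value-based entry/removal tests real stepwise regression uses."""
    selected, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        new_r2, j = max((r2(X[:, selected + [j]], y), j) for j in remaining)
        if new_r2 - best_r2 < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = new_r2
    return selected, best_r2

rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)                 # strong predictor
x1 = rng.normal(size=n)                 # weaker predictor
x2 = rng.normal(size=n)                 # pure noise
y = 3 * x0 + x1 + rng.normal(size=n)
X = np.column_stack([x2, x0, x1])       # scramble the column order

order, final_r2 = forward_stepwise(X, y)
print(order)  # the strongest predictor (column 1) enters first
```

The entry order recovered here is exactly the "best predictor, second best, …" reading the slide describes.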

Slide 101
Homework Problems: Multiple Regression – Stepwise Problems - 11

The relationships stated in the problem between individual predictors and the dependent variable must be statistically significant, and the statement must be worded correctly for the direction of each relationship. The interpretation of individual predictors is the same for standard, hierarchical, and stepwise regression.

Slide 102
Homework Problems: Multiple Regression – Stepwise Problems - 12

We use a 75-25% validation strategy to support the generalizability of our findings. The validation must support:
• the significance of the overall relationship,
• the inclusion of the same variables in the validation model that were included in the full model, though not necessarily in the same order, and
• a shrinkage in R² for the validation sample of no more than 2% relative to the training sample.
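The split-and-shrinkage check can be sketched as follows, using simulated data; the random 75-25 split mirrors the strategy above, while the specific numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

# Random 75-25 split into training and validation cases.
idx = rng.permutation(n)
train, valid = idx[: int(0.75 * n)], idx[int(0.75 * n):]

# Fit on the training cases only, then score both subsets.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)

def r2(rows):
    resid = y[rows] - A[rows] @ beta
    tot = y[rows] - y[rows].mean()
    return 1 - resid @ resid / (tot @ tot)

train_r2, valid_r2 = r2(train), r2(valid)
shrinkage = train_r2 - valid_r2
print(round(shrinkage, 3))  # flag generalizability if this exceeds 0.02
```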

Slide 103
Homework Problems: Multiple Regression – Stepwise Problems - 13

Cautions are added as limitations to the analysis, if needed.