CETLA S. Abu-Bader Regression Analyses Soleman Abu-Bader, Ph.D.


Post on 11-Oct-2020




Outline

Purpose of Regression Analyses

Regression Equations

Coefficients / Test Statistics

Assumptions of Regression Analyses

Selecting Appropriate Factors for Regression

Methods of Data Entry

Examples

Working with SPSS


Purpose of Regression Analyses

• They are used to examine the relationship between multiple IVs (predictors, factors) and one DV (criterion, outcome).

• They estimate a model of multiple factors that best predicts the criterion.

• In multiple regression, the outcome is a continuous variable measured at the interval (or ratio) level.

• In binary logistic regression, the outcome is a dichotomous variable, nominal level of measurement, coded as “0” & “1”.


Examples of Research Questions

• What set of the following factors (IVs) best predicts levels of depression (outcome) among individuals: gender, age, SES, education, physical health, family support, and self-esteem? (Multiple Regression).

• What is the probability that an individual will be considered clinically depressed (versus not depressed) given his/her gender, age, SES, education, physical health, family support, and self-esteem? (Logistic Regression).

• What set of the following factors correctly predicts whether a first-year doctoral student will pass a qualifying exam: gender, GPA, GRE scores, number of hours spent in the library, and income? (Logistic Regression).


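Regression Equations

Multiple Regression:

Unstandardized Equation:

$$\hat{Y} = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_iX_i$$

Standardized Equation:

$$Z_{\hat{Y}} = \beta_1 Z_{X_1} + \beta_2 Z_{X_2} + \beta_3 Z_{X_3} + \dots + \beta_i Z_{X_i}$$

Logistic Regression:

$$\ln(\text{Odds}) = \ln\left(\frac{p}{1-p}\right) = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_iX_i$$

where p is the probability that the outcome is coded "1".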
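To show how the two equations turn factor scores into predictions, the Python sketch below evaluates the unstandardized multiple regression equation and converts the logistic equation's log-odds into a probability using p = odds / (1 + odds). The function names and all coefficients in the usage lines are hypothetical illustrations, not estimates from the data in this deck.

```python
import math


def predict_y(a, b, x):
    """Unstandardized equation: Y-hat = a + b1*X1 + b2*X2 + ... + bi*Xi."""
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per factor")
    return a + sum(b_j * x_j for b_j, x_j in zip(b, x))


def predict_probability(a, b, x):
    """Logistic equation: ln(odds) = a + b1*X1 + ... ; returns P(outcome = 1)."""
    log_odds = predict_y(a, b, x)  # same linear form, on the log-odds scale
    # p = odds / (1 + odds), written two ways to avoid overflow in math.exp
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1 + odds)


# Hypothetical coefficients: a = 2.0, b1 = 0.5, b2 = -1.0
print(predict_y(2.0, [0.5, -1.0], [4, 1]))  # 3.0
# Hypothetical logistic model whose log-odds are 0: odds of 1, so p = 0.5
print(predict_probability(0.0, [1.0], [0.0]))  # 0.5
```

The logistic model is linear only on the log-odds scale; the probability itself always stays between 0 and 1.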


Coefficients of Multiple Regression

1. Multiple Correlation Coefficient (R): The correlation between the observed DV (Y) and the value predicted from all factors combined (Ŷ).

R ranges from 0 to 1 and therefore has no direction; for the size and direction of each factor's relationship, look at the size and sign of its beta.

2. Multiple R²: The % of variance in the DV explained by all the factors in the regression equation.

3. Adjusted R²: R² corrected for the number of factors entered relative to the sample size.

The more factors entered relative to the sample size, the further adjusted R² shrinks below R².

Adjusted R² is usually more appropriate than R², but it is simplest to report both.
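The shrinkage described above follows from the standard adjustment formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of valid cases and k the number of factors. The sketch below is a minimal illustration; the function name and the R² and sample-size values in the usage lines are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    r2: R^2 of the model; n: number of valid cases; k: number of factors.
    """
    if n - k - 1 <= 0:
        raise ValueError("need more cases than factors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same R^2 and number of factors, two different sample sizes (hypothetical values)
print(round(adjusted_r2(0.30, 200, 3), 3))  # 0.289
print(round(adjusted_r2(0.30, 30, 3), 3))   # 0.219
```

With three factors, the penalty is small for 200 cases but substantial for 30.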


Coefficients of Multiple Regression

4. R² Change: The additional % of variance in the DV explained when another factor is added.

5. F Change and Significance of Change: The F ratio (ANOVA) that tests the R² change, and whether that change is statistically significant.

6. Regression Constant (a): The Y-axis intercept; the predicted value of Y when all Xs equal zero.

7. Regression Coefficients (β & b): The standardized (β) and unstandardized (b) regression coefficients between the criterion and each factor, holding the other factors constant; each is the slope of the regression line, computed from z scores (β) or raw scores (b), respectively.
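Items 5 and 7 rest on two short formulas: the F test for an R² change, F = ((R²full − R²reduced)/m) / ((1 − R²full)/(n − k − 1)) with m factors added and k factors in the larger model, and the conversion β = b · SD(X)/SD(Y). The sketch below encodes both; the function names and the step values in the usage line are hypothetical, and turning F into a p value (which requires the F distribution) is left to the statistics package.

```python
def f_change(r2_full, r2_reduced, n, k_full, k_added=1):
    """F test for the R^2 change when k_added factors join the equation.

    F = ((R2_full - R2_reduced) / k_added) / ((1 - R2_full) / (n - k_full - 1)),
    with degrees of freedom (k_added, n - k_full - 1).
    """
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / df2)
    return f, (k_added, df2)


def beta_from_b(b, sd_x, sd_y):
    """Standardized coefficient from an unstandardized one: beta = b * SD(X) / SD(Y)."""
    return b * sd_x / sd_y


# Hypothetical step: R^2 rises from .40 to .50 when a 2nd factor is added, n = 23
f, df = f_change(0.50, 0.40, n=23, k_full=2)
print(round(f, 2), df)  # 4.0 (1, 20)
```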


Coefficients/Statistics of Logistic Regression

1. Omnibus Tests of Model Coefficients: Tests the overall regression model using a chi-square test.

It is analogous to the ANOVA (F) test of the overall model in multiple regression analysis.

2. Likelihood-Ratio Test: Examines the goodness-of-fit between the observed data and the predicted model, that is, how well the model fits the data.

It is based on "–2LL" (–2 times the log-likelihood of the model); the test statistic is the drop in –2LL relative to a comparison model. The smaller the –2LL, the better the model fits the data (predicts the outcome).

3. Cox and Snell R² and Nagelkerke R²: Approximate the % of variance in the outcome accounted for by the factors.

The two measures are known as pseudo R². Nagelkerke R² rescales Cox and Snell R² so that its maximum is 1.
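The quantities above can all be computed from two –2LL values: one for the constant-only (null) model and one for the fitted model. The sketch below uses the standard definitions, Cox & Snell R² = 1 − exp((−2LLmodel − (−2LLnull))/n) and Nagelkerke R² = Cox & Snell R² divided by its maximum, 1 − exp(−(−2LLnull)/n). The function names are hypothetical. From the logistic results table later in this deck, –2LL = 214.74 and the overall χ² = 37.48 imply a null –2LL of 252.22. The sample size is not reported there, so pseudo R² cannot be reproduced from those numbers.

```python
import math


def minus_2ll(y, p):
    """-2 log-likelihood for 0/1 outcomes y and predicted probabilities p (0 < p < 1)."""
    ll = sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))
    return -2 * ll


def model_chi_square(m2ll_null, m2ll_model):
    """Likelihood-ratio (omnibus) chi-square: the drop in -2LL from the constant-only model."""
    return m2ll_null - m2ll_model


def pseudo_r2(m2ll_null, m2ll_model, n):
    """Cox & Snell R^2 and Nagelkerke R^2 from the null and model -2LL values."""
    cox_snell = 1 - math.exp((m2ll_model - m2ll_null) / n)
    cox_snell_max = 1 - math.exp(-m2ll_null / n)  # upper bound of Cox & Snell R^2
    return cox_snell, cox_snell / cox_snell_max


# Null -2LL implied by the logistic results table: 214.74 + 37.48 = 252.22
print(round(model_chi_square(252.22, 214.74), 2))  # 37.48
```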


Coefficients/Statistics of Logistic Regression

4. Hosmer and Lemeshow test: A chi-square goodness-of-fit test.

It examines the overall goodness-of-fit of the predicted model by comparing observed and expected frequencies.

A nonsignificant chi-square value (p > .05) is therefore desired.

5. Wald Test: Analogous to the t-test in multiple regression analysis.

It tests the significance of each regression coefficient (b) in the model.

Unlike the t-test, it squares the z score b / SE (the coefficient divided by its standard error) and evaluates (b / SE)² against the chi-square distribution.
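Because a chi-square with 1 df is the square of a standard normal z, the Wald p value for a single coefficient can be computed with the complementary error function: P(χ² > W) = erfc(√(W/2)). The sketch below is a minimal illustration with hypothetical function names. It checks the Wald values reported in the logistic results table later in this deck.

```python
import math


def wald_statistic(b, se):
    """Wald chi-square for a single coefficient: (b / SE)^2, i.e. a squared z score (df = 1)."""
    return (b / se) ** 2


def wald_p_value(wald):
    """Upper-tail p for a chi-square with df = 1: P(chi2 > W) = erfc(sqrt(W / 2))."""
    return math.erfc(math.sqrt(wald / 2))


# Wald values reported for Physical Health and Gender in the logistic results table
print(round(wald_p_value(4.20), 3))  # 0.04
print(wald_p_value(14.38) < 0.001)   # True
```

The first result matches the table's p = .040 for Physical Health, and the second matches its .000 (p < .001) for Gender.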


Assumptions

| Assumption | Multiple Regression | Logistic Regression |
|---|---|---|
| Criterion (DV) | Continuous | Dichotomous |
| Factors (IVs) | Continuous/Dummy | Continuous/Dummy |
| Distribution (DV) | Normal | N/R |
| Residuals | Normal | N/R |
| Linearity | Yes | N/R |
| Homoscedasticity | Yes | N/R |
| Multicollinearity | Yes | Yes |
| Sample Size | N ≥ 50 + 8m | n > 5 / cell, + 20% Rule |

N/R = not required; Yes = must be evaluated; m = number of factors.
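The multiple regression sample-size rule in the table amounts to simple arithmetic. The sketch below assumes m is the number of factors entered; the function names are hypothetical.

```python
def min_sample_size_mr(m):
    """Minimum N for multiple regression under the N >= 50 + 8m rule (m = number of factors)."""
    return 50 + 8 * m


def sample_size_ok(n_valid, m):
    """True when the number of valid cases meets the 50 + 8m rule."""
    return n_valid >= min_sample_size_mr(m)


# If all seven candidate factors of Example I (depression) were entered
print(min_sample_size_mr(7))  # 106
```

Only factors that survive bivariate screening are entered, so m is often smaller than the number of candidate factors.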


Evaluation of Assumptions

Normality: Examine the skewness and inspect the histogram and normal Q-Q plot of the DV.

Residuals: Inspect the histogram and normal Q-Q plot of the residuals. If the residuals are normal, the points will fall along a straight diagonal line on the normal probability plot.

Linearity: Create a scatterplot of each factor against the DV and inspect each one for a linear pattern. Minor deviations from linearity do not greatly affect the results of regression.

Homoscedasticity: Plot the residuals against the predicted values. If the residuals are homoscedastic (equal variance across predicted values), the points will be spread evenly around a horizontal (reference) line.
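For the skewness check, the sketch below computes the sample-adjusted skewness G1 and its standard error. To our knowledge, SPSS reports this version, but treat that as an assumption. The slide does not give a cutoff for "too skewed", so the sketch reports only the sign and size. The function names are hypothetical.

```python
import math


def skewness(values):
    """Sample-adjusted skewness G1 = n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))  # sample SD; must be > 0
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def skewness_se(n):
    """Standard error of G1, often used to judge how far skewness is from 0."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Hypothetical scores with one extreme high value: a positive value signals a long right tail
print(skewness([1, 1, 2, 2, 3, 9]))
```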


Evaluation of Assumptions

Multicollinearity: Check the Variance Inflation Factor (VIF) and Tolerance values. A VIF greater than 10 usually indicates a multicollinearity problem, as does a tolerance smaller than .10 (tolerance = 1 / VIF, so the two cutoffs agree).
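Each factor's VIF is 1 / (1 − R²j), where R²j comes from regressing that factor on all the other factors. Equivalently, VIF is the corresponding diagonal element of the inverse of the factors' correlation matrix, and tolerance is its reciprocal. The standard-library sketch below takes that route. The function names and the two nearly redundant factors in the usage lines are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations among factor columns (lists of equal length)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for a handful of factors)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_and_tolerance(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix; tolerance = 1 / VIF."""
    inverse = invert(correlation_matrix(columns))
    return [(inverse[j][j], 1 / inverse[j][j]) for j in range(len(columns))]


# Two hypothetical, nearly redundant factors: both VIFs land far above 10
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.05, 3.95, 5.1]
for vif, tol in vif_and_tolerance([x1, x2]):
    print(f"VIF = {vif:.1f}, tolerance = {tol:.3f}")
```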

Sample Size:

Multiple Regression: Compare the number of valid cases with the minimum required sample size (N ≥ 50 + 8m).

Logistic Regression: Check the cell counts in the Hosmer and Lemeshow test contingency table.


Selecting Appropriate Factors for Regression

Enter into the regression analysis only those factors that have a significant bivariate relationship with the criterion.

If the two do not have a significant relationship to begin with, it is unlikely that one will predict the other.

Determine the appropriate bivariate test statistic to examine the relationship (Pearson’s correlation, t-test, ANOVA, etc.) and then run the test.

Factors that are significantly related to the criterion are then entered into the regression analysis; factors that are not should be left out.
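The screening step can be sketched for continuous factors, which use Pearson's correlation. One substitution must be flagged: SPSS tests r with an exact t test, while this standard-library sketch uses Fisher's z large-sample approximation for the p value, so borderline p values may differ slightly. Categorical factors would need a t-test or ANOVA instead, which this sketch does not cover. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation between one factor and the criterion."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-tailed p for H0: rho = 0 using Fisher's z (a large-sample approximation)."""
    if abs(r) >= 1:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def screen_factors(factors, criterion, alpha=0.05):
    """Names of the continuous factors significantly correlated with the criterion."""
    return [
        name for name, values in factors.items()
        if approx_p_value(pearson_r(values, criterion), len(criterion)) < alpha
    ]


# Hypothetical data: "a" tracks the criterion closely, "b" does not
criterion = list(range(1, 11))
factors = {"a": [2, 1, 4, 3, 6, 5, 8, 7, 10, 9], "b": [1, -1] * 5}
print(screen_factors(factors, criterion))  # ['a']
```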


Methods of Data Entry in Multiple Regression

1. Forward: The factor with the strongest relationship with the criterion is entered first, followed by the factor that adds the most to the explained variance (largest partial correlation), and so on. Once a factor is entered, it remains in the equation. This continues until no remaining factor contributes significantly to the explained variance.

2. Backward: All factors are entered at once. Next, the factor that contributes least (smallest partial correlation) is removed, followed by the next weakest, and so on. This stops when removing another factor would significantly reduce the explained variance in the criterion.

3. Stepwise: It combines both methods: 1) the factor with the largest contribution is entered first, and the factor with the next largest contribution is entered next; 2) the contribution of the factors already in the equation is reassessed; 3) factors that no longer contribute significantly to the explained variance are removed.
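The forward method can be sketched in standard-library Python. At each step, the sketch tries every remaining factor, keeps the one that raises R² the most (equivalently, the largest partial correlation), and stops when that factor's F change falls below an F-to-enter threshold. The threshold of 3.84 is an assumption made for this sketch; SPSS by default uses a probability-of-F criterion instead. R² is computed in correlation form, with β = Rxx⁻¹ rxy and R² = Σ β·rxy. The sketch implements only forward entry, not stepwise removal, and all names and data are hypothetical.

```python
import math


def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (m[i][k] - sum(m[i][j] * w[j] for j in range(i + 1, k))) / m[i][i]
    return w


def r_squared(factors, names, y):
    """R^2 of y on the named factors, via beta = Rxx^-1 * rxy and R^2 = sum(beta * rxy)."""
    if not names:
        return 0.0
    rxx = [[corr(factors[p], factors[q]) for q in names] for p in names]
    rxy = [corr(factors[p], y) for p in names]
    betas = solve(rxx, rxy)  # standardized coefficients of this candidate model
    return sum(beta * r for beta, r in zip(betas, rxy))


def forward_select(factors, y, f_to_enter=3.84):
    """Forward entry: add the factor that raises R^2 most while its F change >= f_to_enter."""
    n = len(y)
    chosen, r2 = [], 0.0
    while len(chosen) < len(factors):
        candidates = [name for name in factors if name not in chosen]
        best = max(candidates, key=lambda c: r_squared(factors, chosen + [c], y))
        new_r2 = r_squared(factors, chosen + [best], y)
        df2 = n - len(chosen) - 2  # n - k - 1 once the new factor is in (k = len(chosen) + 1)
        f = math.inf if new_r2 >= 1 else (new_r2 - r2) / ((1 - new_r2) / df2)
        if f < f_to_enter:
            break  # no remaining factor adds significantly to the explained variance
        chosen.append(best)
        r2 = new_r2
    return chosen


# Hypothetical data: y follows x1 closely; x2 is a weaker, partly related factor
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
}
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 7.9]
print(forward_select(data, y)[0])  # x1
```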

Methods of Data Entry in Logistic Regression

1. Forward-LR: A stepwise method, parallel to stepwise entry in multiple regression. Factors are entered one at a time, starting with the one that most improves the model, and factors already in the model are tested for removal with the likelihood-ratio statistic. (In SPSS, entry is judged by the score statistic and removal by the likelihood-ratio statistic.)

2. Forward-Wald: Another stepwise method. Factors enter as in Forward-LR, but factors already in the model are tested for removal with the Wald statistic.

3. Backward-LR: All factors are entered at once. Next, factors are removed one at a time, starting with the one whose removal changes the likelihood ratio least, based on the maximum partial likelihood estimates.

4. Backward-Wald: All factors are entered at once. Next, the factor with the least significant Wald value is removed, one at a time.
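The likelihood-ratio statistic behind the LR methods is the drop in –2LL between two nested models, evaluated against a chi-square distribution (df = 1 when one factor enters). The sketch below reads the –2LL column of the logistic results table later in this deck as the value after each step. It recovers the null –2LL as 214.74 + 37.48 from that table's overall χ². The function name is hypothetical, and the critical value 3.841 is the standard df = 1, α = .05 cutoff.

```python
def lr_step_chi_squares(m2ll_null, m2ll_steps):
    """Chi-square for each step: previous -2LL minus the -2LL after the step (df = 1 per factor)."""
    changes = []
    previous = m2ll_null
    for m2ll in m2ll_steps:
        changes.append(round(previous - m2ll, 2))
        previous = m2ll
    return changes


CHI2_CRITICAL_DF1 = 3.841  # chi-square critical value for df = 1 at alpha = .05

# -2LL after each step, read from the logistic results table (Gender, Race,
# Physical Health, Self-Perception); the constant-only -2LL is 214.74 + 37.48.
steps = lr_step_chi_squares(214.74 + 37.48, [239.45, 228.48, 219.79, 214.74])
print(steps)  # [12.77, 10.97, 8.69, 5.05]
print(all(chi2 > CHI2_CRITICAL_DF1 for chi2 in steps))  # True
```

The four step chi-squares sum to the overall model χ² of 37.48, and each exceeds 3.841, which is consistent with all four factors remaining in the final model.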


EXAMPLES

I. What set of the following factors best predicts levels of depression among older immigrants: gender, home, emotional balance, physical health, mobility, cognitive status, & life satisfaction?

II. What set of the following factors correctly predicts whether a university student will be considered overweight: gender, age, race, physical health, & self-perception?


Regression Analyses

Evaluation of Assumptions (#1, 2, 3, & 4 for MR; #5 for both MR & LR; & #6 for LR):

1. Normality: Skewness, Histograms, & Q-Q Plots

2. Linearity: Scatterplots

3. Normality of Residuals: Histogram & Q-Q Plot

4. Homoscedasticity: Plot of Residuals against Predicted Values

5. Multicollinearity: VIF & Tolerance Values

6. Sample Size: Hosmer and Lemeshow Contingency Table

Run Regression Analyses

Reading SPSS Output

Presenting Results in Tables


Working with SPSS


Multiple Regression SPSS Screens


Logistic Regression SPSS Screens


Presentation of Results in Tables

The Results of Multiple Regression (a): Predictors of Depression

| Factor | R | R² (b) | β | t | p | F | p |
|---|---|---|---|---|---|---|---|
| Emotional Balance | .46 | .21 | -.39 | -5.02 | < .001 | 38.37 | < .001 |
| Mobility | .50 | .25 | -.20 | -2.68 | .008 | 23.01 | < .001 |
| Home | .52 | .27 | .18 | 2.27 | .025 | 17.51 | < .001 |

(a) Dependent variable: square root of depression. (b) Adjusted R² = .26.

The Results of Logistic Regression (a, b): Predictors of Overweight

| Factor | B | Wald | df | p | –2LL | Pseudo R² | Odds Ratio |
|---|---|---|---|---|---|---|---|
| Gender | -1.30 | 14.38 | 1 | < .001 | 239.45 | .07–.09 | .27 |
| Race | 1.12 | 10.57 | 1 | .001 | 228.48 | .12–.16 | 3.05 |
| Physical Health | -.43 | 4.20 | 1 | .040 | 219.79 | .16–.22 | .65 |
| Self-Perception | -.06 | 4.66 | 1 | .030 | 214.74 | .19–.25 | .95 |

(a) Overall model: χ²(df = 4) = 37.48, p < .001.
(b) Goodness-of-fit: –2LL = 214.74; χ²(df = 8) = 4.59, p = .800.
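The Odds Ratio column is e raised to the B coefficient. A value below 1 means the odds of the outcome fall as the factor increases, and a value above 1 means they rise. The sketch below recomputes that column from the B values in the table. The function name is hypothetical.

```python
import math


def odds_ratio(b):
    """Odds ratio = e**B: the factor by which the odds change per one-unit increase in X."""
    return math.exp(b)


# B coefficients from the logistic regression table above
coefficients = {"Gender": -1.30, "Race": 1.12, "Physical Health": -0.43, "Self-Perception": -0.06}
for factor, b in coefficients.items():
    print(f"{factor}: B = {b:+.2f}, odds ratio = {odds_ratio(b):.2f}")
```

Gender (.27) and Physical Health (.65) match the table exactly. Race and Self-Perception come out as 3.06 and 0.94 rather than 3.05 and .95, presumably because the table's B values are rounded to two decimals.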