Post on 28-Dec-2015

  • Hierarchical Binary Logistic Regression

    Social Work Statistics

  • Hierarchical Binary Logistic Regression

    In hierarchical binary logistic regression, we test the hypothesis or research question that some predictor independent variables improve our ability to predict membership in the modeled category of the dependent variable, after taking into account the relationship between some control independent variables and the dependent variable.

    In multiple regression, we evaluated this question by looking at R² change, the increase in R² associated with adding the predictors to the regression analysis.

    The analog to R² change in logistic regression is the Block Chi-square, which is the increase in Model Chi-square associated with the inclusion of the predictors.

    In standard binary logistic regression, we interpreted the SPSS output that compared Block 0, a model with no independent variables, to Block 1, the model that included the independent variables.

    In hierarchical binary logistic regression, the control variables are added in Block 1 in SPSS, the predictor variables are added in Block 2, and the interpretation of the overall relationship is based on the change from Block 1 to Block 2.

  • Output for Hierarchical Binary Logistic Regression after control variables are added

    In this example, the control variables do not have a statistically significant relationship to the dependent variable, but they can still serve their purpose as controls.

    After the controls are added, the measure of error, -2 Log Likelihood, is 195.412.

    This output is for the sample problem worked below.

  • Output for Hierarchical Binary Logistic Regression after predictor variables are added

    After the predictors are added, the measure of error, -2 Log Likelihood, is 168.542. The hierarchical relationship is based on the reduction in error associated with the inclusion of the predictor variables.

    Model Chi-square is the cumulative reduction in -2 log likelihood for the controls and the predictors.

    The difference between the -2 log likelihood at Block 1 (195.412) and the -2 log likelihood at Block 2 (168.542) is the Block Chi-square (26.870), which is significant at p < .001.
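
    The Block Chi-square arithmetic can be checked outside SPSS. The sketch below is an illustration only, not SPSS's own procedure: it subtracts the two -2 log likelihood values quoted on the slides and evaluates the upper-tail chi-square probability with a closed-form series for integer degrees of freedom. The function name chi2_sf is hypothetical, and the 5 degrees of freedom are taken from the block chi-square reported later in the transcript.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x)
    chi = math.sqrt(x)
    total, term = 0.0, chi
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total


# Values quoted on the slides
neg2ll_block1 = 195.412  # -2 log likelihood after the controls
neg2ll_block2 = 168.542  # -2 log likelihood after the predictors
df_block2 = 5            # degrees of freedom reported for Block 2

block_chi_square = neg2ll_block1 - neg2ll_block2
p_value = chi2_sf(block_chi_square, df_block2)

print(f"Block chi-square = {block_chi_square:.3f}")
print(f"p < .001: {p_value < 0.001}")
```

    If SciPy is available, scipy.stats.chi2.sf gives the same tail probability; the hand-written series only keeps the sketch free of dependencies.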

  • The Problem in Blackboard

    The problem statement tells us:

    - the variables included in the analysis
    - whether each variable should be treated as metric or non-metric
    - the type of dummy coding and reference category for non-metric variables
    - the alpha for both the statistical relationships and for diagnostic tests

  • The Statement about Level of Measurement

    The first statement in the problem asks about level of measurement. Hierarchical binary logistic regression requires that the dependent variable be dichotomous, the metric independent variables be interval level, and the non-metric independent variables be dummy-coded if they are not dichotomous. SPSS Binary Logistic Regression calls non-metric variables categorical.

    SPSS Binary Logistic Regression will dummy-code categorical variables for us, provided it is useful to use either the first or last category as the reference category.

  • Marking the Statement about Level of Measurement

    Mark the check box as a correct statement because:

    - The dependent variable "should marijuana be made legal" [grass] is dichotomous, satisfying the requirement for the dependent variable.
    - The independent variable "age" [age] is interval level, satisfying the requirement for independent variables.
    - The independent variable "sex" [sex] is dichotomous, satisfying the requirement for independent variables.
    - The independent variable "strength of religious affiliation" [reliten] is ordinal level, which the problem instructs us to dummy-code as a non-metric variable.
    - The independent variable "general happiness" [happy] is ordinal level, which the problem instructs us to dummy-code as a non-metric variable.

  • The Statement about Outliers

    While we do not need to be concerned about normality, linearity, and homogeneity of variance, we do need to determine whether outliers are substantially reducing the classification accuracy of the model.

    To test for outliers, we run the binary logistic regression in SPSS and check for outliers. Next, we exclude the outliers and run the logistic regression a second time. We then compare the accuracy rates of the models with and without the outliers. If the accuracy rate of the model without outliers is higher than that of the model with outliers by 2% or more, we interpret the model excluding outliers.

  • Running the hierarchical binary logistic regression

    Select the Regression | Binary Logistic command from the Analyze menu.

  • Selecting the dependent variable

    First, highlight the dependent variable grass in the list of variables.

    Second, click on the right arrow button to move the dependent variable to the Dependent text box.

  • Selecting the control independent variables

    First, move the control independent variables stated in the problem (age and sex) to the Covariates list box.

    Second, click on the Next button to start a new block and add the predictor independent variables.

  • Selecting the predictor independent variables

    First, move the predictor independent variables stated in the problem (reliten and happy) to the Covariates list box.

    Second, click on the Categorical button to specify which variables should be dummy coded.

    Note that the block is now labeled 2 of 2.

  • Declare the categorical variables - 1

    Move the variables sex, reliten, and happy to the Categorical Covariates list box.

    SPSS assigns its default method for dummy-coding, Indicator coding, to each variable, placing the name of the coding scheme in parentheses after each variable name.

  • Declare the categorical variables - 2

    We accept the default of using the Indicator method for dummy-coding each variable.

    We will also accept the default of using the last category as the reference category for each variable.

    Click on the Continue button to close the dialog box.

  • Specifying the method for including variables

    Since the problem calls for a hierarchical binary logistic regression, we accept the default Enter method for including variables in both blocks.

  • Adding the values for outliers to the data set - 1

    Click on the Save button to request the statistics that we want to save.

    SPSS will calculate the values for standardized residuals and save them to the data set so that we can check for outliers and remove the outliers easily if we need to run a model excluding outliers.

  • Adding the values for outliers to the data set - 2

    First, mark the checkbox for Standardized residuals in the Residuals panel.

    Second, click on the Continue button to complete the specifications.

  • Requesting the output

    While optional statistical output is available, we do not need to request any optional statistics.

    Click on the OK button to request the output.

  • Detecting the presence of outliers - 1

    SPSS created a new variable, ZRE_1, which contains the standardized residual. If SPSS finds that the data set already contains a ZRE_1 variable, it will create ZRE_2.

    I find it easier to delete the ZRE_1 variable after each analysis rather than have multiple ZRE_ variables in the data set, requiring that I remember which one goes with which analysis.

  • Detecting the presence of outliers - 2

    To detect outliers, we will sort the ZRE_1 column twice:

    - first, in ascending order to identify outliers with a standardized residual of -2.58 or less.
    - second, in descending order to identify outliers with a standardized residual of +2.58 or greater.

    Click the right mouse button on the column header and select Sort Ascending from the pop-up menu.

  • Detecting the presence of outliers - 3

    After scrolling down past the cases with missing data (. in the ZRE_1 column), we see that we have one outlier that has a standardized residual of -2.58 or less.

  • Detecting the presence of outliers - 4

    To check for outliers with large positive standardized residuals, click the right mouse button on the column header and select Sort Descending from the pop-up menu.

  • Detecting the presence of outliers - 5

    After scrolling up to the top of the data set, we see that there are no outliers that have standardized residuals of +2.58 or more.

    Since we found an outlier with a large negative residual, we will run the model excluding it and compare accuracy rates to determine which model we will interpret.

    Had there been no outliers, we would move on to the issue of sample size.

  • Running the model excluding outliers - 1

    We will use a Select Cases command to exclude the outliers from the analysis.

  • Running the model excluding outliers - 2

    First, in the Select Cases dialog box, mark the option button If condition is satisfied.

    Second, click on the If button to specify the condition.

  • Running the model excluding outliers - 3

    To eliminate the outliers, we request that the cases that are not outliers be selected into the analysis.

    The formula specifies that we should include cases if the absolute value of the standardized residual (ZRE_1) is less than 2.58. The abs() or absolute value function tells SPSS to ignore the sign of the value.

    After typing in the formula, click on the Continue button to close the dialog box.
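
    The selection rule can be sketched in a few lines of Python for readers who want to see the logic outside the SPSS dialog. This is an illustration under assumptions: the case records and the helper name select_non_outliers are hypothetical, and a missing residual is represented as None and dropped, which mirrors the later slide showing that cases with missing values are excluded.

```python
OUTLIER_CUTOFF = 2.58  # cutoff for standardized residuals used on the slides


def select_non_outliers(cases):
    """Keep cases whose standardized residual is present and |ZRE_1| < 2.58."""
    return [
        case
        for case in cases
        if case["zre_1"] is not None and abs(case["zre_1"]) < OUTLIER_CUTOFF
    ]


# Hypothetical cases for illustration only
cases = [
    {"case_id": "A", "zre_1": 0.42},
    {"case_id": "B", "zre_1": -2.78},  # outlier: |-2.78| >= 2.58
    {"case_id": "C", "zre_1": None},   # missing residual, not selected
    {"case_id": "D", "zre_1": 1.95},
]

kept = select_non_outliers(cases)
print([case["case_id"] for case in kept])
```

    Taking the absolute value means one condition covers both tails, so large negative and large positive residuals are excluded by the same test.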

  • Running the model excluding outliers - 4

    SPSS displays the condition we entered on the Select Cases dialog box.

    Click on the OK button to close the dialog box.

  • Running the model excluding outliers - 5

    SPSS indicates which cases are excluded by drawing a slash across the case number.

    Scrolling down in the data, we see that the outliers and cases with missing values are excluded.

  • Running the model excluding outliers - 6

    To run the logistic regression excluding outliers, select Logistic Regression from the Dialog Recall menu.

  • Running the model excluding outliers - 7

    Click on the Save button to open the dialog box.

    The only change we will make is to clear the check box for saving standardized residuals.

  • Running the model excluding outliers - 8

    First, clear the check box for Standardized residuals.

    Second, click on the Continue button to close the dialog box.

  • Running the model excluding outliers - 9

    Finally, click on the OK button to request the output.

  • Accuracy rate of the baseline model including all cases

    Navigate to the Classification Table for the logistic regression with all cases. To distinguish the two models, I often refer to the first one as the baseline model.

    The accuracy rate for the model with all cases is 71.3%.

  • Accuracy rate of the revised model excluding outliers

    Navigate to the Classification Table for the logistic regression excluding outliers. To distinguish the two models, I often refer to the second one as the revised model.

    The accuracy rate for the model excluding outliers is 71.1%.

  • Marking the statement for excluding outliers

    In the initial logistic regression model, 1 case had a standardized residual of +2.58 or greater or -2.58 or lower:

    - Case 20001058 had a standardized residual of -2.78

    The classification accuracy of the model that excluded outliers (71.14%) was not greater by 2% or more than the classification accuracy for the model that included all cases (71.33%). The model including all cases should be interpreted.

    The check box is not marked because removing outliers did not increase the accuracy of the model. All of the remaining statements will be evaluated based on the output for the model that includes all cases.
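
    The comparison of the two accuracy rates reduces to a single rule, sketched below. The function name model_to_interpret is hypothetical, and reading "2% or more" as 2 percentage points is an assumption, chosen because the summary flowchart states the rule as "baseline model + 2%".

```python
def model_to_interpret(baseline_accuracy: float, revised_accuracy: float,
                       min_gain: float = 2.0) -> str:
    """Choose the model to interpret; accuracy rates are given in percent.

    Assumption: the revised model (outliers excluded) is preferred only when
    its accuracy exceeds the baseline by at least `min_gain` percentage points.
    """
    if revised_accuracy - baseline_accuracy >= min_gain:
        return "revised"
    return "baseline"


# Accuracy rates reported on the slides
choice = model_to_interpret(baseline_accuracy=71.33, revised_accuracy=71.14)
print(choice)
```

    With the reported rates, the revised model is slightly less accurate than the baseline, so the rule keeps the model that includes all cases.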

  • The statement about multicollinearity and other numerical problems

    Multicollinearity in the logistic regression solution is detected by examining the standard errors for the b coefficients. A standard error larger than 2.0 indicates numerical problems, such as multicollinearity among the independent variables, cells with a zero count for a dummy-coded independent variable because all of the subjects have the same value for the variable, and 'complete separation', whereby the two groups in the dependent variable can be perfectly separated by scores on one of the independent variables. Analyses that indicate numerical problems should not be interpreted.

  • Checking for multicollinearity

    The standard errors for the variables included in the analysis were:

    - "age" [age]: .01
    - survey respondents who said that overall they were not too happy: .92
    - survey respondents who said that overall they were pretty happy: .47
    - survey respondents who said they had no religious affiliation: .53
    - survey respondents who said they had a somewhat strong religious affiliation: .70
    - survey respondents who said they had a not very strong religious affiliation: .47
    - survey respondents who were male: .39

  • Marking the statement about multicollinearity and other numerical problems

    Since none of the independent variables in this analysis had a standard error larger than 2.0, we mark the check box to indicate there was no evidence of multicollinearity.
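
    The standard error screen is a simple threshold check. The sketch below applies the 2.0 cutoff to the standard errors reported in the transcript; the dictionary keys are shorthand labels invented for this illustration, not SPSS variable names.

```python
SE_THRESHOLD = 2.0  # standard errors above this suggest numerical problems

# Standard errors reported on the slides; the labels are illustrative shorthand
standard_errors = {
    "age": 0.01,
    "happy: not too happy": 0.92,
    "happy: pretty happy": 0.47,
    "reliten: no affiliation": 0.53,
    "reliten: somewhat strong": 0.70,
    "reliten: not very strong": 0.47,
    "sex: male": 0.39,
}

# Collect any term whose standard error exceeds the threshold
flagged = {name: se for name, se in standard_errors.items() if se > SE_THRESHOLD}

if flagged:
    print("Possible numerical problems:", flagged)
else:
    print("No standard error exceeds 2.0")
```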

  • The statement about sample size

    Hosmer and Lemeshow, who wrote the widely used text on logistic regression, suggest that the sample size should be at least 10 cases for every independent variable.

  • The output for sample size

    We find the number of cases included in the analysis in the Case Processing Summary.

    The 150 cases available for the analysis satisfied the minimum sample size of 70 (10 cases per independent variable) recommended by Hosmer and Lemeshow for logistic regression.
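
    The figure of 70 follows from counting each dummy-coded term as one independent variable. The sketch below spells out that count; the breakdown into terms is an assumption inferred from the seven coefficients listed in the multicollinearity check, and the function name is hypothetical.

```python
CASES_PER_VARIABLE = 10  # rule of thumb attributed to Hosmer and Lemeshow


def required_sample_size(n_independent_variables: int) -> int:
    """Minimum number of cases under the 10-cases-per-variable rule."""
    return n_independent_variables * CASES_PER_VARIABLE


# Assumed term count: age, sex (male), three reliten dummies, two happy dummies
n_terms = 1 + 1 + 3 + 2
required = required_sample_size(n_terms)
available = 150  # cases reported in the Case Processing Summary

print(f"required = {required}, available = {available}")
print("Sample size adequate:", available >= required)
```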

  • Marking the statement for sample size

    Since we satisfy the sample size requirement, we mark the check box.

  • The hierarchical relationship between the dependent and independent variables

    In a hierarchical logistic regression, the presence of a relationship between the dependent variable and the combination of independent variables entered after the control variables have been taken into account is based on the statistical significance of the block chi-square for the second block of variables, in which the predictor independent variables are included.

  • The output for the hierarchical relationship

    In this analysis, the probability of the block chi-square was less than or equal to the alpha of 0.05 (χ²(5, N = 150) = 26.87, p < .001). The null hypothesis that there is no difference between the model with only the control variables and the model with the predictor independent variables was rejected.

    The existence of the hierarchical relationship between the predictor independent variables and the dependent variable was supported.

  • Marking the statement for hierarchical relationship

    Since the hierarchical relationship was statistically significant, we mark the check box.

  • The statement about the relationship between age and legalization of marijuana

    Having satisfied the criteria for the hierarchical relationship, we examine the findings for individual relationships with the dependent variable. If the overall relationship were not significant, we would not interpret the individual relationships.

    The first statement concerns the relationship between age and legalization of marijuana.

  • Output for the relationship between age and legalization of marijuana

    The probability of the Wald statistic for the control independent variable "age" [age] (χ²(1, N = 150) = 1.83, p = .176) was greater than the level of significance of .05. The null hypothesis that the b coefficient for "age" [age] was equal to zero was not rejected. "Age" [age] does not have a statistically significant impact on the odds that survey respondents supported the legalization of marijuana. The analysis does not support the relationship that 'For each unit increase in "age", survey respondents were 1.7% less likely to support the legalization of marijuana'.
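
    The reported probability can be reproduced from the Wald statistic itself. The sketch below assumes the Wald statistic is referred to a chi-square distribution with 1 degree of freedom, as the reported (1, N = 150) suggests; for that case the upper-tail probability has a closed form in terms of the complementary error function. The function name wald_p_value is hypothetical.

```python
import math

ALPHA = 0.05  # level of significance stated in the problem


def wald_p_value(wald: float) -> float:
    """Upper-tail chi-square probability with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))


wald_age = 1.83  # Wald statistic reported for age
p_age = wald_p_value(wald_age)

print(f"p = {p_age:.3f}")
print("Reject null hypothesis:", p_age <= ALPHA)
```

    Because the Wald values on the slides are rounded to two decimals, probabilities recomputed this way can differ from the SPSS output in the third decimal place.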

  • Marking the statement for relationship between age and legalization of marijuana

    Since the relationship was not statistically significant, we do not mark the check box for the statement.

  • Statement for relationship between general happiness and legalization of marijuana

    The next statement concerns the relationship between the dummy-coded variable for general happiness (not too happy) and legalization of marijuana.

  • Output for relationship between general happiness and legalization of marijuana

    The probability of the Wald statistic for the predictor independent variable survey respondents who said that overall they were not too happy (χ²(1, N = 150) = 13.96, p < .001) was less than or equal to the level of significance of .05. The null hypothesis that the b coefficient for survey respondents who said that overall they were not too happy was equal to zero was rejected. The value of Exp(B) for this variable was 31.642, which implies that the odds were multiplied by approximately 31.6. The statement that 'Survey respondents who said that overall they were not too happy were approximately 31.6 times more likely to support the legalization of marijuana compared to those who said that overall they were very happy' is correct.

  • Marking the relationship between general happiness and legalization of marijuana

    Since the relationship was statistically significant, and the statement that survey respondents who said that overall they were not too happy were approximately 31.6 times more likely to support the legalization of marijuana compared to those who said that overall they were very happy is correct, the statement is marked.

  • Statement for relationship between general happiness and legalization of marijuana

    The next statement concerns the relationship between the dummy-coded variable for general happiness (pretty happy) and legalization of marijuana.

  • Output for relationship between general happiness and legalization of marijuana

    The probability of the Wald statistic for the predictor independent variable survey respondents who said that overall they were pretty happy (χ²(1, N = 150) = 3.42, p = .064) was greater than the level of significance of .05. The null hypothesis that the b coefficient for survey respondents who said that overall they were pretty happy was equal to zero was not rejected. Being pretty happy rather than very happy does not have a statistically significant impact on the odds that survey respondents supported the legalization of marijuana. The analysis does not support the relationship that 'Survey respondents who said that overall they were pretty happy were approximately two and a quarter times more likely to support the legalization of marijuana compared to those who said that overall they were very happy'.

  • Marking the relationship between general happiness and legalization of marijuana

    Since the relationship was not statistically significant, we do not mark the check box for the statement.

  • Statement for relationship between religious affiliation and legalization of marijuana

    The next statement concerns the relationship between the dummy-coded variable for no religious affiliation and legalization of marijuana.

  • Output for relationship between religious affiliation and legalization of marijuana

    The probability of the Wald statistic for the predictor independent variable survey respondents who said they had no religious affiliation (χ²(1, N = 150) = 4.39, p = .036) was less than or equal to the level of significance of .05. The null hypothesis that the b coefficient for survey respondents who said they had no religious affiliation was equal to zero was rejected. The value of Exp(B) for this variable was 3.035, which implies that the odds were multiplied by approximately three. The statement that 'Survey respondents who said they had no religious affiliation were approximately three times more likely to support the legalization of marijuana compared to those who said they had a strong religious affiliation' is correct.

  • Marking the relationship between religious affiliation and legalization of marijuana

    Since the relationship was statistically significant, and the statement that survey respondents who said they had no religious affiliation were approximately three times more likely to support the legalization of marijuana compared to those who said they had a strong religious affiliation is correct, the statement is marked.
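
    The interpretations on these slides all derive from Exp(B), the factor by which the odds are multiplied. The sketch below shows the conversions under stated assumptions: the function names are hypothetical, and the value 0.983 for age is back-calculated from the "1.7% less likely" wording rather than printed in the transcript. Strictly, Exp(B) describes a change in odds, not in probability, so "times more likely" on the slides is shorthand for "times the odds".

```python
import math


def percent_change_in_odds(exp_b: float) -> float:
    """Percent change in the odds implied by an odds ratio Exp(B)."""
    return (exp_b - 1.0) * 100.0


def coefficient_from_odds_ratio(exp_b: float) -> float:
    """Recover the b coefficient, since Exp(B) = e ** b."""
    return math.log(exp_b)


exp_b_no_affiliation = 3.035  # Exp(B) reported on the slides
exp_b_age = 0.983             # assumption: back-calculated from "1.7% less likely"

print(f"{percent_change_in_odds(exp_b_no_affiliation):+.1f}% change in odds")
print(f"{percent_change_in_odds(exp_b_age):+.1f}% change in odds")
print(f"b = {coefficient_from_odds_ratio(exp_b_no_affiliation):.3f}")
```

    An odds ratio above 1 gives a positive percent change and a positive b; an odds ratio below 1 gives a negative percent change and a negative b.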

  • Statement for relationship between religious affiliation and legalization of marijuana

    The next statement concerns the relationship between the dummy-coded variable for a somewhat strong religious affiliation and legalization of marijuana.

  • Output for the relationship between religious affiliation and legalization of marijuana

    The probability of the Wald statistic for the predictor independent variable survey respondents who said they had a somewhat strong religious affiliation (χ²(1, N = 150) = .67, p = .414) was greater than the level of significance of .05. The null hypothesis that the b coefficient for survey respondents who said they had a somewhat strong religious affiliation was equal to zero was not rejected. Having a somewhat strong rather than a strong religious affiliation does not have a statistically significant impact on the odds that survey respondents support the legalization of marijuana. The analysis does not support the relationship that 'Survey respondents who said they had a somewhat strong religious affiliation were 43.7% less likely to support the legalization of marijuana compared to those who said they had a strong religious affiliation'.

  • Marking the relationship between religious affiliation and legalization of marijuana

    Since the relationship was not statistically significant, we do not mark the check box for the statement.

  • Statement for relationship between religious affiliation and legalization of marijuana

    The next statement concerns the relationship between the dummy-coded variable for a not very strong religious affiliation and legalization of marijuana.

  • Output for the relationship between religious affiliation and legalization of marijuana

    The probability of the Wald statistic for the predictor independent variable survey respondents who said they had a not very strong religious affiliation (χ²(1, N = 150) = .24, p = .626) was greater than the level of significance of .05. The null hypothesis that the b coefficient for survey respondents who said they had a not very strong religious affiliation was equal to zero was not rejected. Having a not very strong rather than a strong religious affiliation does not have a statistically significant impact on the odds that survey respondents support the legalization of marijuana. The analysis does not support the relationship that 'Survey respondents who said they had a not very strong religious affiliation were 25.8% more likely to support the legalization of marijuana compared to those who said they had a strong religious affiliation'.

  • Marking the relationship between religious affiliation and legalization of marijuana

    Since the relationship was not statistically significant, the check box is not marked.

  • The statement for the relationship between sex and legalization of marijuana

    The next statement concerns the relationship between sex and legalization of marijuana.

  • Output for the relationship between sex and legalization of marijuana

    The probability of the Wald statistic for the control independent variable survey respondents who were male (χ²(1, N = 150) = .13, p = .719) was greater than the level of significance of .05. The null hypothesis that the b coefficient for survey respondents who were male was equal to zero was not rejected. Being male rather than female does not have a statistically significant impact on the odds that survey respondents support the legalization of marijuana. The analysis does not support the relationship that 'Survey respondents who were male were 13.1% less likely to support the legalization of marijuana compared to those who were female'.

  • Marking the statement for the relationship between sex and legalization of marijuana

    Since the relationship was not statistically significant, the check box is not marked.

  • Statement about the usefulness of the model based on classification accuracy

    The final statement concerns the usefulness of the logistic regression model. The independent variables could be characterized as useful predictors distinguishing survey respondents who supported the legalization of marijuana from survey respondents who did not if the classification accuracy rate was substantially higher than the accuracy attainable by chance alone. Operationally, the classification accuracy rate should be 25% or more higher than the proportional by chance accuracy rate.

  • Computing proportional by-chance accuracy rate

    At Block 0, with no independent variables in the model, all of the cases are predicted to be members of the modal group, 1=Legal in this example.

    The proportion in the largest group is 63.3% or .633. The proportion in the other group is 1.0 - .633 = .367.

    The proportional by chance accuracy rate was computed by calculating the proportion of cases for each group based on the number of cases in each group in the classification table at Step 0, and then squaring and summing the proportion of cases in each group (.633² + .367² = .536).

  • Output for the usefulness of the model based on classification accuracy

    To be characterized as a useful model, the accuracy rate should be 25% higher than the by chance accuracy rate.

    The by chance accuracy criterion is computed by multiplying the by chance accuracy rate of .536 by 1.25: 1.25 x .536 = .669 (66.9%). The classification accuracy rate computed by SPSS was 71.3%, which was greater than or equal to the proportional by chance accuracy criterion of 66.9%.

    The criterion for classification accuracy is satisfied.
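
    The by chance accuracy criterion can be recomputed from the group proportions. The sketch below is an illustration with a hypothetical function name; it uses the rounded proportions quoted on the slides, which give a rate of about .535, so the result can differ from the slide's .536 in the third decimal place (the slide values were presumably computed from unrounded proportions).

```python
def proportional_by_chance_accuracy(proportions):
    """Sum of squared group proportions."""
    return sum(p * p for p in proportions)


group_proportions = [0.633, 0.367]  # rounded proportions quoted on the slides
model_accuracy = 0.713              # classification accuracy reported by SPSS

by_chance = proportional_by_chance_accuracy(group_proportions)
criterion = 1.25 * by_chance        # accuracy should be 25% above chance
is_useful = model_accuracy >= criterion

print(f"by chance accuracy rate = {by_chance:.4f}")
print(f"criterion = {criterion:.4f}")
print("Criterion satisfied:", is_useful)
```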

  • Marking the statement for usefulness of the model

    Since the criterion for classification accuracy was satisfied, the check box is marked.

  • Hierarchical Binary Logistic Regression: Level of Measurement

    - Level of measurement ok?
      - No: Do not mark check box for level of measurement. Mark: Inappropriate application of the statistic. Stop.
      - Yes: Mark check box for level of measurement.
    - Ordinal level variable treated as metric?
      - Yes: Consider limitation in discussion of findings.

  • Standard Binary Logistic Regression: Exclude Outliers

    - Run baseline binary logistic regression, including all cases, requesting standardized residuals.
    - Run revised binary logistic regression, excluding outliers (absolute value of standardized residual >= 2.58).
    - Accuracy rate for revised model >= accuracy rate for baseline model + 2%?
      - Yes: Interpret revised model. Mark check box for excluding outliers.
      - No: Interpret baseline model. Do not mark check box for excluding outliers.

  • Hierarchical Binary Logistic Regression: Multicollinearity and Sample Size

    - Multicollinearity/Numerical Problems (S. E. > 2.0)?
      - Yes: Do not mark check box for no multicollinearity. Stop.
      - No: Mark check box for no multicollinearity.
    - Adequate Sample Size (Number of IVs x 10)?
      - Yes: Mark check box for sample size.
      - No: Do not mark check box for sample size. Consider limitation in discussion of findings.

  • Hierarchical Binary Logistic Regression: Hierarchical Relationship

    - Probability of Block Chi-square for Block 2 less than or equal to alpha?
      - Yes: Mark check box for hierarchical relationship.
      - No: Do not mark check box for hierarchical relationship. Stop.

    The biggest distinction between hierarchical and standard models is our focus on the contribution of the predictors in addition to the controls.

  • Hierarchical Binary Logistic Regression: Individual Relationships

    - Individual relationship (Wald Sig. less than or equal to alpha)?
      - No: Do not mark check box for individual relationship.
      - Yes: Correct interpretation of direction and strength of relationship?
        - Yes: Mark check box for individual relationship.
        - No: Do not mark check box for individual relationship.
    - Additional individual relationships to interpret?
      - Yes: Return to the first question for the next relationship.

  • Hierarchical Binary Logistic Regression: Classification Accuracy

    - Classification accuracy > 1.25 x by chance accuracy rate?
      - Yes: Mark check box for classification accuracy.
      - No: Do not mark check box for classification accuracy.