Tutorial Contingency Table

Upload: cart11

Post on 14-Apr-2015

Page 1: Tutorial Contingency Table

A crosstabulation displays the number of cases in each category defined by two or more grouping variables.

For example, we can display the number of sales employees in each division in each office location.

Crosstabulations are useful for summarizing categorical variables -- variables with a limited number of distinct categories.
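As a rough illustration of the idea, the sketch below builds a two-way crosstabulation in plain Python. The employee records, division names, and office names are invented for the example and are not taken from the tutorial's data.

```python
from collections import Counter

# Hypothetical records: one (division, office) pair per sales employee.
employees = [
    ("Consumer", "North"), ("Consumer", "North"), ("Consumer", "South"),
    ("Business", "North"), ("Business", "South"), ("Business", "South"),
    ("Business", "South"),
]

# Count how many employees fall into each (division, office) combination.
counts = Counter(employees)
divisions = sorted({division for division, _ in employees})
offices = sorted({office for _, office in employees})

# table[i][j] = number of employees in divisions[i] and offices[j].
table = [[counts[(d, o)] for o in offices] for d in divisions]

print("division", *offices)
for division, row in zip(divisions, table):
    print(division, *row)
```

Each cell of `table` is a count of cases, which is the quantity all of the later statistics are computed from.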

Page 2: Tutorial Contingency Table

The chi-square measures test the hypothesis that the row and column variables in a crosstabulation are independent.

A low significance value (typically below 0.05) indicates that there may be some relationship between the two variables.

While the chi-square measures may indicate that there is a relationship between two variables, they do not indicate the strength or direction of the relationship.
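One way to see what the chi-square test computes is the minimal sketch below of the Pearson chi-square statistic. The 2x2 counts are hypothetical, and the p-value shortcut through `math.erfc` is valid only for one degree of freedom; for larger tables a statistics package would normally supply the p-value.

```python
import math

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

table = [[30, 20], [20, 30]]  # hypothetical counts
stat, df = pearson_chi_square(table)

# With df == 1 the upper-tail chi-square probability is erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2)) if df == 1 else None
print(stat, df, p_value)
```

For these counts the statistic is 4.0 with one degree of freedom, and the p-value falls just under 0.05. The statistic grows with sample size, which is one reason it says nothing about how strong the relationship is.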

Page 3: Tutorial Contingency Table

The nominal directional measures indicate both the strength and significance of the relationship between the row and column variables of a crosstabulation.

The value of each statistic can range from 0 to 1 and indicates the proportional reduction in error in predicting the value of one variable based on the value of the other variable.
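The proportional-reduction-in-error idea can be sketched with Goodman and Kruskal's tau, one of the nominal directional measures. The function below treats the column variable as the dependent variable; the function name and the counts are illustrative assumptions, not part of the tutorial.

```python
def goodman_kruskal_tau(table):
    """Goodman and Kruskal's tau with the column variable as dependent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected correct guesses when the row category is known.
    within_rows = sum(
        cell ** 2 / row_totals[i]
        for i, row in enumerate(table)
        for cell in row
    )
    # Expected correct guesses when only the column totals are known.
    overall = sum(total ** 2 for total in col_totals) / n

    # Share of prediction errors removed by knowing the row category.
    return (within_rows - overall) / (n - overall)

tau = goodman_kruskal_tau([[30, 20], [20, 30]])  # hypothetical counts
print(tau)
```

A value of 0 means that knowing the row category does not help predict the column category, and a value of 1 means it predicts it perfectly. For the hypothetical table above the result is 0.04, a 4% reduction in error.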

Page 4: Tutorial Contingency Table

For example, a test statistic value of 0.021 indicates that you have only reduced the error rate by 2.1% over what you could expect by random chance.

In this example, the low significance values for both tau and the uncertainty coefficient indicate that there is a relationship between the two variables...

Page 5: Tutorial Contingency Table

But the low values for both test statistics indicate that the relationship between the two variables is a fairly weak one.

The nominal directional measures are appropriate when both variables are nominal, categorical variables.

Page 6: Tutorial Contingency Table

Somers' d is an ordinal directional measure that indicates the significance, strength and direction of the relationship between the row and column variables of a crosstabulation.

A low significance value (typically less than 0.05) indicates that there is a relationship between the two variables.

Page 7: Tutorial Contingency Table

The value of the statistic can range from -1 to 1.

Negative values indicate a negative relationship, and positive values indicate a positive relationship.
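The sign and size of Somers' d come from counting concordant and discordant pairs of cases. The sketch below assumes the rows hold the ordered independent variable and the columns hold the ordered dependent variable; the function name and the counts are hypothetical.

```python
def somers_d(table):
    """Somers' d with columns as the dependent variable (rows independent)."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = tied_on_column = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a later row, so that
            # only pairs that differ on the row variable are counted.
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    pairs = table[i][j] * table[k][l]
                    if l > j:
                        concordant += pairs      # both variables increase
                    elif l < j:
                        discordant += pairs      # one up, one down
                    else:
                        tied_on_column += pairs  # tied on the dependent
    return (concordant - discordant) / (concordant + discordant + tied_on_column)

d_positive = somers_d([[20, 10], [10, 20]])
d_negative = somers_d([[10, 20], [20, 10]])
print(d_positive, d_negative)
```

For a 2x2 table this reduces to the difference in column proportions between the two rows, here 20/30 minus 10/30, and reversing the pattern of counts flips the sign.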

In this example, the low significance values for Somers' d indicate that there is a relationship between the two variables...

Page 8: Tutorial Contingency Table

But the low values for the test statistic indicate that the relationship between the two variables is a fairly weak one.

Somers' d is appropriate when both variables are ordinal, categorical variables.

Page 9: Tutorial Contingency Table

The nominal symmetric measures indicate both the strength and significance of the relationship between the row and column variables of a crosstabulation.

The value of each statistic can range from 0 to 1.

Page 10: Tutorial Contingency Table

Phi is only appropriate for 2x2 tables.
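The three nominal symmetric measures can all be written as rescalings of the Pearson chi-square statistic. The following is a minimal sketch under that assumption, using hypothetical counts; phi is computed here as an unsigned value, whereas some packages report a signed phi for 2x2 tables.

```python
import math

def nominal_symmetric_measures(table):
    """Return (phi, Cramer's V, contingency coefficient) for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square statistic.
    chi2 = sum(
        (cell - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, cell in enumerate(row)
    )

    smaller_dim = min(len(table), len(table[0]))
    phi = math.sqrt(chi2 / n)
    cramers_v = math.sqrt(chi2 / (n * (smaller_dim - 1)))
    contingency = math.sqrt(chi2 / (chi2 + n))
    return phi, cramers_v, contingency

phi, cramers_v, contingency = nominal_symmetric_measures([[30, 20], [20, 30]])
print(phi, cramers_v, contingency)
```

For a 2x2 table phi and Cramer's V coincide, which is why V is usually preferred for larger tables. Note also that, by this formula, the contingency coefficient always stays below 1.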

In this example, the low significance values for both Cramer's V and the contingency coefficient indicate that there is a relationship between the two variables...

Page 11: Tutorial Contingency Table

But the low values for the test statistics indicate that the relationship between the two variables is a fairly weak one.

The nominal symmetric measures are appropriate when both variables are nominal, categorical variables.

Page 12: Tutorial Contingency Table

The ordinal symmetric measures indicate the significance, strength and direction of the relationship between the row and column variables of a crosstabulation.

A low significance value (typically less than 0.05) indicates that there is a relationship between the two variables.

Page 13: Tutorial Contingency Table

The values of the test statistics can range from -1 to 1.

Negative values indicate a negative relationship, and positive values indicate a positive relationship.
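Two common ordinal symmetric measures, gamma and Kendall's tau-b, can be sketched from the same pair counts. The tutorial does not name the specific statistics on this slide, so the choice of these two is an assumption, as are the function names and the counts.

```python
import math

def pair_counts(table):
    """Count concordant, discordant, and singly tied pairs of cases."""
    cells = [
        (i, j, table[i][j])
        for i in range(len(table))
        for j in range(len(table[0]))
    ]
    concordant = discordant = tied_rows = tied_cols = 0
    for a in range(len(cells)):
        i, j, count_a = cells[a]
        for b in range(a + 1, len(cells)):
            k, l, count_b = cells[b]
            pairs = count_a * count_b
            if i == k:
                tied_rows += pairs    # same row, different column
            elif j == l:
                tied_cols += pairs    # same column, different row
            elif (k - i) * (l - j) > 0:
                concordant += pairs
            else:
                discordant += pairs
    return concordant, discordant, tied_rows, tied_cols

def gamma_and_tau_b(table):
    c, d, tied_rows, tied_cols = pair_counts(table)
    gamma = (c - d) / (c + d)
    tau_b = (c - d) / math.sqrt((c + d + tied_rows) * (c + d + tied_cols))
    return gamma, tau_b

gamma, tau_b = gamma_and_tau_b([[20, 10], [10, 20]])  # hypothetical counts
print(gamma, tau_b)
```

Gamma ignores tied pairs entirely, so it is never smaller in absolute value than tau-b computed on the same table.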

In this example, the low significance values indicate that there is a relationship between the two variables...

Page 14: Tutorial Contingency Table

But the low values for the test statistics indicate that the relationship between the two variables is a fairly weak one.

The ordinal symmetric measures are appropriate when both variables are ordinal, categorical variables.

Page 15: Tutorial Contingency Table

The relative risk estimate is a measure of association between the presence or absence of a factor and the occurrence of an event.

For example, you could examine the relationship between smoking and lung cancer.

In this hypothetical example, the relative risk of lung cancer is more than twice as high among smokers as among non-smokers.

Page 16: Tutorial Contingency Table

And the 95% confidence interval for the relative risk does not include 1, indicating that there is a significant difference in the occurrence of lung cancer between smokers and non-smokers.
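The relative risk and its confidence interval can be sketched as follows. The counts are invented for illustration and are not the tutorial's data, and the interval uses the usual large-sample approximation on the log scale, which is an assumption about how the interval is computed.

```python
import math

def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total, z=1.96):
    """Return (relative risk, lower, upper) with an approximate 95% interval."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Large-sample standard error of log(relative risk).
    se_log = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts: 30 of 100 smokers and 12 of 100 non-smokers affected.
rr, lower, upper = relative_risk(30, 100, 12, 100)
print(rr, lower, upper)
```

With these made-up counts the relative risk is 2.5 and the whole interval lies above 1, which matches the pattern the slide describes.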

The Breslow-Day and Tarone's statistics test the homogeneity of the odds ratio across categories of the layer variable.

A low significance value (typically below 0.05) indicates that the odds ratio varies across categories of the layer variable.

The Cochran's and Mantel-Haenszel statistics are designed to test for independence between a binary factor variable and a binary response variable. The statistics are adjusted for covariate patterns defined by one or more control variables.

Page 17: Tutorial Contingency Table

A low significance value (typically below 0.05) indicates that there may be some relationship between the two variables.

While the measures may indicate that there is a relationship between two variables, they do not indicate the strength or direction of the relationship.
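A stratified test of this kind can be sketched as below. This is a minimal version of a Mantel-Haenszel statistic with a continuity correction, assuming one 2x2 table per category of the control variable; the strata are hypothetical, and a statistics package may differ in details such as the variance formula or the correction.

```python
import math

def mantel_haenszel_test(strata):
    """strata: list of 2x2 tables [[a, b], [c, d]], one per control category.

    Returns (statistic, p-value) for conditional independence, df = 1.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        # Observed minus expected count in the top-left cell.
        diff_sum += a - row1 * col1 / n
        # Hypergeometric variance of the top-left cell.
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))

    corrected = max(abs(diff_sum) - 0.5, 0.0)  # continuity correction
    stat = corrected ** 2 / var_sum
    p_value = math.erfc(math.sqrt(stat / 2))   # chi-square tail for df = 1
    return stat, p_value

strata = [
    [[20, 10], [10, 20]],  # hypothetical stratum 1
    [[15, 5], [10, 10]],   # hypothetical stratum 2
]
stat, p_value = mantel_haenszel_test(strata)
print(stat, p_value)
```

Pooling the observed-minus-expected differences across strata is what adjusts the test for the control variable, rather than collapsing all strata into a single table.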

This is essentially a t-test for the value of the common odds ratio.

Page 18: Tutorial Contingency Table

The estimate and the natural log of the estimate of the common odds ratio are approximately normally distributed for sufficiently large data sets.

Page 19: Tutorial Contingency Table

A low significance value (typically below 0.05) indicates that the hypothesized value of the common odds ratio is probably incorrect.
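The test of a hypothesized common odds ratio can be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact procedure behind the tutorial's output: it uses the Mantel-Haenszel estimate, the Robins-Breslow-Greenland variance for its natural log, and a normal approximation; the strata and function name are hypothetical.

```python
import math

def common_odds_ratio_test(strata, hypothesized=1.0):
    """strata: list of 2x2 tables [[a, b], [c, d]].

    Returns (Mantel-Haenszel estimate, z statistic, two-sided p-value).
    """
    r_sum = s_sum = 0.0
    pr_sum = ps_qr_sum = qs_sum = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n
        r, s = a * d / n, b * c / n
        r_sum += r
        s_sum += s
        pr_sum += p * r
        ps_qr_sum += p * s + q * r
        qs_sum += q * s

    estimate = r_sum / s_sum
    # Robins-Breslow-Greenland variance of log(estimate).
    variance = (
        pr_sum / (2 * r_sum ** 2)
        + ps_qr_sum / (2 * r_sum * s_sum)
        + qs_sum / (2 * s_sum ** 2)
    )
    z = (math.log(estimate) - math.log(hypothesized)) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, z, p_value

strata = [
    [[20, 10], [10, 20]],  # hypothetical stratum 1
    [[15, 5], [10, 10]],   # hypothetical stratum 2
]
estimate, z, p_value = common_odds_ratio_test(strata, hypothesized=1.0)
print(estimate, z, p_value)
```

Working on the log scale is a common choice because the log of the estimate tends to be closer to normal than the estimate itself in moderate samples.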
