
Post on 06-Jan-2016


Applied Statistics Using SAS and SPSS

Topic: Chi-square tests

By Prof Kelly Fan, Cal. State Univ., East Bay

Page 2: Applied Statistics Using SAS and SPSS

Outline

ALL variables must be categorical.

Goal one: verify a distribution of Y

• One-sample Chi-square test (SPSS lesson 40; SAS handout)

Goal two: test the independence between two categorical variables

• Chi-square test for two-way contingency table (SPSS lesson 41; SAS section 3.G)
• McNemar’s test for paired data (SPSS lesson 44; SAS section 3.L)
• Measure the dependence (Phi and Kappa coefficients) (SPSS lesson 41, 44; SAS section 3.G, 3.M)

Goal three: test the independence between two categorical variables after controlling the third factor

• Mantel-Haenszel Chi-square test (SPSS in class; SAS section 3.Q)

Page 3: Applied Statistics Using SAS and SPSS


Example: Postpartum Depression Study

Are women equally likely to show an increase, no change, or a decrease in depression as a function of childbirth?

Are the proportions associated with a decrease, no change, and an increase in depression from before to after childbirth the same?

Page 4: Applied Statistics Using SAS and SPSS

Raw data vs. Grouped data

Raw data:

ID   Name   Depression level after birth in comparison with before birth
1    ***    Same
2    ***    Less depressed
3    ***    More depressed

Grouped data are shown in the next slide.

Page 5: Applied Statistics Using SAS and SPSS

Example: Postpartum Depression Study

Depression after birth in comparison with before birth   Observed frequencies   Hypothesized proportions   Expected frequencies
Less depressed (-1)                                      14                     1/3                        20
Neither less nor more depressed (0)                      33                     1/3                        20
More depressed (1)                                       13                     1/3                        20

From a random sample of 60 women

Page 6: Applied Statistics Using SAS and SPSS


One-sample Chi-Square Test

Must be a random sample

The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories
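The sample-size condition can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function names `expected_counts` and `meets_size_rule` are hypothetical, and the data are the 60 women and the hypothesized proportions of 1/3 each from the postpartum depression example.

```python
def expected_counts(n, proportions):
    """Expected frequency of each category under the hypothesized proportions."""
    return [n * p for p in proportions]


def meets_size_rule(expected, minimum=5, share=0.8):
    """True if at least `share` of the categories have an expected count >= `minimum`."""
    large_enough = sum(1 for e in expected if e >= minimum)
    return large_enough / len(expected) >= share


# Postpartum depression example: n = 60, hypothesized proportions 1/3 each.
expected = expected_counts(60, [1 / 3, 1 / 3, 1 / 3])
print(expected, meets_size_rule(expected))
```

Here every expected count is about 20, so the condition holds for all three categories.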

Page 7: Applied Statistics Using SAS and SPSS

One-sample Chi-Square Test

Test statistic:

$$\chi^2 = \sum_i \frac{(o_i - e_i)^2}{e_i}$$

o_i = the observed frequency of the i-th category

e_i = the expected frequency of the i-th category
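As a worked check of the test statistic, here is a minimal Python sketch (an illustration, not the SAS or SPSS procedure) applied to the postpartum depression frequencies. It assumes the usual degrees of freedom, number of categories minus one, and uses the closed-form upper-tail probability exp(-x/2), which is valid only when df = 2.

```python
from math import exp


def chi_square_statistic(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [14, 33, 13]          # less depressed, same, more depressed
expected = [20.0, 20.0, 20.0]    # 60 women times 1/3 each

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1

# Assumption: this shortcut for the chi-square upper tail holds for df = 2 only.
p_value = exp(-statistic / 2)

print(round(statistic, 3), df, round(p_value, 3))
```

The rounded values can be compared with the Chi-Square, df, and Asymp. Sig. entries in the SPSS output for this example.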

Page 8: Applied Statistics Using SAS and SPSS

SPSS Output

1. Weight your data by count first (Data >> Weight Cases)

2. Analyze >> Nonparametric Tests >> Legacy Dialogs >> Chi-Square, count as test variable

Postpartum Depression

                 Observed N   Expected N   Residual
less depressed   14           20.0         -6.0
same             33           20.0         13.0
more depressed   13           20.0         -7.0
Total            60

Test Statistics

                 Postpartum Depression
Chi-Square (a)   12.700
df               2
Asymp. Sig.      .002

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 20.0.

Page 9: Applied Statistics Using SAS and SPSS

Conclusion

Reject Ho

The proportions associated with a decrease, no change, and an increase in depression from before to after childbirth are significantly different from 1/3, 1/3, 1/3.

Page 10: Applied Statistics Using SAS and SPSS


Example: Postpartum Depression Study

Are the proportions associated with a change and no change from before to after childbirth the same?

Page 11: Applied Statistics Using SAS and SPSS

Example: Postpartum Depression Study

Depression after birth in comparison with before birth   Observed frequencies   Hypothesized proportions   Expected frequencies
Same amount of depression (0)                            33                     1/2                        30
More or less depressed (1)                               27                     1/2                        30

From a random sample of 60 women
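The recoding into two categories and the resulting test can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPSS procedure: the dictionary keys are hypothetical labels, and the p-value uses the identity that a chi-square upper tail with df = 1 equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt

# Original three-category counts from the first postpartum depression table.
counts = {"less": 14, "same": 33, "more": 13}

# Recode: merge "less" and "more" into a single "changed" category.
recoded = {"same": counts["same"], "changed": counts["less"] + counts["more"]}

n = sum(recoded.values())
expected = n / 2  # hypothesized proportion 1/2 for each category

statistic = sum((o - expected) ** 2 / expected for o in recoded.values())
df = len(recoded) - 1

# Assumption: this shortcut for the chi-square upper tail holds for df = 1 only.
p_value = erfc(sqrt(statistic / 2))

print(round(statistic, 3), df, round(p_value, 3))
```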

Page 12: Applied Statistics Using SAS and SPSS

SPSS Output

Postpartum Depression--Recoded

                         Observed N   Expected N   Residual
same                     33           30.0         3.0
more or less depressed   27           30.0         -3.0
Total                    60

Test Statistics

                 Postpartum Depression--Recoded
Chi-Square (a)   .600
df               1
Asymp. Sig.      .439

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.

Page 13: Applied Statistics Using SAS and SPSS


Two-way Contingency Tables

Report frequencies on two variables

Such tables are also called crosstabs.

Page 14: Applied Statistics Using SAS and SPSS

Contingency Tables (Crosstabs)

1991 General Social Survey

Frequency       Party Identification
Race            Democrat   Independent   Republican
White           341        105           405
Black           103        15            11

Page 15: Applied Statistics Using SAS and SPSS

Crosstabs Analysis (Two-way Chi-square test)

Chi-square test for testing the independence between two variables. Under independence:

1. For a fixed column, the distribution of frequencies over rows stays the same regardless of the column

2. For a fixed row, the distribution of frequencies over columns stays the same regardless of the row
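Under independence, the expected count in each cell is (row total × column total) / n, and the test statistic sums (observed − expected)² / expected over all cells. A minimal Python sketch of that computation on the 1991 General Social Survey table follows; the function name `chi_square_independence` is hypothetical, and the degrees of freedom use the standard (rows − 1)(columns − 1).

```python
def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and sample size for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, n


# Rows: White, Black. Columns: Democrat, Independent, Republican.
gss = [[341, 105, 405], [103, 15, 11]]
statistic, df, n = chi_square_independence(gss)
print(round(statistic, 4), df, n)
```

The result can be compared with the Chi-Square row and the sample size in the SAS output for this table.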

Page 16: Applied Statistics Using SAS and SPSS

Measure of dependence for 2x2 tables

The phi coefficient measures the association between two categorical variables

• -1 ≤ phi ≤ 1

• | phi | indicates the strength of the association

• If the two variables are both ordinal, then the sign of phi indicates the direction of association

Page 17: Applied Statistics Using SAS and SPSS

SPSS Output

P. 332: Data >> Weight Cases >> Weight cases by, select count variable

P. 333: Analyze >> Descriptive Statistics >> Crosstabs, cell

Page 18: Applied Statistics Using SAS and SPSS

SAS Output

Statistic                     DF   Value     Prob
Chi-Square                    2    79.4310   <.0001
Likelihood Ratio Chi-Square   2    90.3311   <.0001
Mantel-Haenszel Chi-Square    1    79.3336   <.0001
Phi Coefficient                    0.2847
Contingency Coefficient            0.2738
Cramer's V                         0.2847

Sample Size = 980

Page 19: Applied Statistics Using SAS and SPSS

Measure of dependence for non-2x2 tables

Cramer's V

• Ranges from 0 to 1

• V may be viewed as the association between two variables as a percentage of their maximum possible variation.

• V = phi for 2x2, 2x3 and 3x2 tables
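These coefficients can be derived from the chi-square statistic and the sample size. The sketch below uses the standard textbook definitions, which are an assumption here since the slides do not state the formulas: phi = sqrt(χ²/n), contingency coefficient = sqrt(χ²/(χ² + n)), and Cramer's V = sqrt(χ²/(n · min(r − 1, c − 1))). Computed this way phi is non-negative, so it gives the strength of association only, not the direction. The inputs are the Chi-Square value and sample size reported in the SAS output for the General Social Survey table.

```python
from math import sqrt


def association_measures(chi_square, n, n_rows, n_cols):
    """Phi, contingency coefficient, and Cramer's V from a chi-square statistic."""
    phi = sqrt(chi_square / n)
    contingency = sqrt(chi_square / (chi_square + n))
    cramers_v = sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))
    return phi, contingency, cramers_v


# Chi-Square = 79.4310 and Sample Size = 980 for the 2x3 race by party table.
phi, contingency, cramers_v = association_measures(79.4310, 980, 2, 3)
print(round(phi, 4), round(contingency, 4), round(cramers_v, 4))
```

Because the smaller dimension of a 2x3 table is 2, the divisor min(r − 1, c − 1) is 1 and V coincides with phi.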

Page 20: Applied Statistics Using SAS and SPSS

Fisher’s Exact Test for Independence

The Chi-squared tests are ONLY for large samples:

The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories

When this condition is not met, use Fisher’s exact test instead.

Page 21: Applied Statistics Using SAS and SPSS

SAS/SPSS Output

• SAS output:

Fisher's Exact Test
Table Probability (P)   3.823E-22
Pr <= P                 2.787E-20

• SPSS output: in “crosstabs” window, click “exact”, then tick “exact”:
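To show what the two SAS quantities mean, here is a minimal Python sketch of an exact test for a 2x3 table. It enumerates every table with the same row and column totals, computes each table's hypergeometric probability, and adds up the probabilities that do not exceed the probability of the observed table. The function name `fisher_exact_2x3` is hypothetical, and applying it to the 1991 General Social Survey table is an assumption: the slide does not say which data produced the SAS output, so the printed values should only be expected to be of a similar order if that table was used.

```python
from math import exp, lgamma


def log_factorial(k):
    """Natural log of k! via the log-gamma function."""
    return lgamma(k + 1)


def fisher_exact_2x3(table):
    """Return (probability of the observed table, exact p-value) for a 2x3 table."""
    row1, row2 = table
    r1, r2 = sum(row1), sum(row2)
    cols = [a + b for a, b in zip(row1, row2)]
    n = r1 + r2
    # Part of the log-probability that depends only on the fixed margins.
    const = (log_factorial(r1) + log_factorial(r2)
             + sum(log_factorial(c) for c in cols) - log_factorial(n))

    def log_prob(a, b, c):
        cells = [a, b, c, cols[0] - a, cols[1] - b, cols[2] - c]
        return const - sum(log_factorial(x) for x in cells)

    log_p_obs = log_prob(*row1)
    p_value = 0.0
    # Enumerate all first rows (a, b, c) compatible with the margins.
    for a in range(min(r1, cols[0]) + 1):
        for b in range(min(r1 - a, cols[1]) + 1):
            c = r1 - a - b
            if c > cols[2]:
                continue
            lp = log_prob(a, b, c)
            # Small tolerance so ties with the observed table are counted.
            if lp <= log_p_obs + 1e-7:
                p_value += exp(lp)
    return exp(log_p_obs), p_value


gss = [[341, 105, 405], [103, 15, 11]]
table_probability, p_value = fisher_exact_2x3(gss)
print(table_probability, p_value)
```

Full enumeration is practical here because a 2x3 table has only two free cells; larger tables generally need the specialized algorithms that statistical packages implement.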

Page 22: Applied Statistics Using SAS and SPSS

Matched-pair Data

Comparing categorical responses for two “paired” samples, when either:

• Each sample has the same subjects (that is, subjects are measured twice), or

• A natural pairing exists between each subject in one sample and a subject from the other sample (e.g. twins)

Page 23: Applied Statistics Using SAS and SPSS

Example: Rating for Prime Minister

                Second Survey
First Survey    Approve   Disapprove
Approve         794       150
Disapprove      86        570

Page 24: Applied Statistics Using SAS and SPSS

Marginal Homogeneity

The probabilities of “success” for both samples are identical

E.g. the probability of “approve” is the same at the first and second surveys

Page 25: Applied Statistics Using SAS and SPSS

McNemar Test (for 2x2 Tables only)

SAS: Section 3.L; SPSS: Lesson 44

Ho: marginal homogeneity

Ha: no marginal homogeneity

• Exact p-value

• Approximate p-value (when n12 + n21 > 10), where n12 and n21 are the two off-diagonal counts of the table
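A minimal Python sketch of both p-values follows, using the standard McNemar statistic (n12 − n21)² / (n12 + n21) with 1 degree of freedom. The formula and the two-sided binomial form of the exact test are assumptions in the sense that the slides do not spell them out; the function name `mcnemar` is hypothetical. The counts are the off-diagonal cells of the Prime Minister rating table (150 and 86).

```python
from math import comb, erfc, sqrt


def mcnemar(n12, n21):
    """McNemar statistic with its approximate (chi-square, df = 1) and exact p-values."""
    m = n12 + n21
    statistic = (n12 - n21) ** 2 / m

    # Chi-square upper tail for df = 1.
    approximate_p = erfc(sqrt(statistic / 2))

    # Exact test: under Ho the discordant pairs split as Binomial(m, 1/2).
    k = max(n12, n21)
    upper_tail = sum(comb(m, i) for i in range(k, m + 1)) / 2 ** m
    exact_p = min(1.0, 2 * upper_tail)
    return statistic, approximate_p, exact_p


statistic, approximate_p, exact_p = mcnemar(150, 86)
print(round(statistic, 4), approximate_p, exact_p)
```

Only the discordant pairs enter the test; the 794 and 570 respondents who gave the same answer in both surveys carry no information about a change in the margins.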

Page 26: Applied Statistics Using SAS and SPSS

SAS Output

McNemar's Test
Statistic (S)        17.3559
DF                   1
Asymptotic Pr > S    <.0001
Exact Pr >= S        3.716E-05

Simple Kappa Coefficient (level of agreement)
Kappa                   0.6996
ASE                     0.0180
95% Lower Conf Limit    0.6644
95% Upper Conf Limit    0.7348

Sample Size = 1600

In SPSS: Analyze >> Descriptive statistics >> crosstabs, in “statistics” tick “Kappa” and “McNemar”
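The kappa coefficient compares the observed proportion of agreement with the agreement expected by chance from the margins. A minimal Python sketch, assuming the standard definition kappa = (p_o − p_e) / (1 − p_e) and using the Prime Minister rating table, is shown here; the function name `cohen_kappa` is hypothetical.

```python
def cohen_kappa(table):
    """Simple kappa coefficient for a square agreement table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Observed agreement: proportion on the main diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: product of matching row and column proportions.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

    return (observed - expected) / (1 - expected)


# Rows: first survey (approve, disapprove). Columns: second survey.
ratings = [[794, 150], [86, 570]]
kappa = cohen_kappa(ratings)
print(round(kappa, 4))
```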

Page 27: Applied Statistics Using SAS and SPSS

SPSS Output


• SPSS(p. 361): Analyze >> Nonparametric tests >> Legacy dialogs >> 2 related samples; in “two-samples tests” tick “McNemar” and click “exact”, then tick “exact” again

Page 28: Applied Statistics Using SAS and SPSS

Stratified 2 by 2 Tables (Meta-Analysis)

Goal: to investigate the effect of the risk factor (lack of sleep) on the outcome (failing a test)

Test Results, Boys

Sleep   Fail   Pass
Low     20     100
High    15     150

Test Results, Girls

Sleep   Fail   Pass
Low     30     100
High    25     200

Page 29: Applied Statistics Using SAS and SPSS

Cochran Mantel-Haenszel Test

After importing your dataset, and providing names to variables, click on: ANALYZE >> DESCRIPTIVE STATISTICS >> CROSSTABS

• For ROWS, select the Independent Variable
• For COLUMNS, select the Dependent Variable
• For LAYERS, select the Strata Variable
• Under STATISTICS, click on COCHRAN’S AND MANTEL-HAENSZEL STATISTICS

NOTE: You will want to code the data so that the outcome present (Yes) category has the lower value (e.g. 1) and the outcome absent (No) category has the higher value (e.g. 2). Do the same for risk factor: 1 for exposure; 2 for no exposure. Use Value Labels to keep output straight.
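The quantities this analysis reports can also be computed directly. The sketch below is a minimal Python illustration using the standard formulas, which are an assumption here since the slides do not give them: the Mantel-Haenszel common odds ratio Σ(a·d/n) / Σ(b·c/n), and the Cochran-Mantel-Haenszel statistic (Σ(a − E[a]))² / Σ Var(a) without continuity correction. The function name `mantel_haenszel` is hypothetical; the data are the boys and girls sleep tables, with low sleep as exposure and failing as the outcome.

```python
def mantel_haenszel(strata):
    """Common odds ratio and CMH statistic for a list of 2x2 tables [[a, b], [c, d]]."""
    or_num = or_den = 0.0
    diff = variance = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n

        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        # Observed minus expected count in the top-left cell, under no association.
        diff += a - row1 * col1 / n
        # Hypergeometric variance of the top-left cell.
        variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return or_num / or_den, diff ** 2 / variance


boys = [[20, 100], [15, 150]]    # rows: low sleep, high sleep; columns: fail, pass
girls = [[30, 100], [25, 200]]
odds_ratio, cmh = mantel_haenszel([boys, girls])
print(round(odds_ratio, 4), round(cmh, 4))
```

The two values can be compared with the Mantel-Haenszel odds ratio and the Cochran-Mantel-Haenszel statistic in the SAS output for these tables.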

Page 30: Applied Statistics Using SAS and SPSS

SPSS Output


Page 31: Applied Statistics Using SAS and SPSS

SAS Output

Common Odds Ratio and Relative Risks

Statistic                  Method            Value    95% Confidence Limits
Odds Ratio                 Mantel-Haenszel   2.2289   1.4185   3.5024
                           Logit             2.2318   1.4205   3.5064
Relative Risk (Column 1)   Mantel-Haenszel   1.9775   1.3474   2.9021
                           Logit             1.9822   1.3508   2.9087
Relative Risk (Column 2)   Mantel-Haenszel   0.8891   0.8283   0.9544
                           Logit             0.8936   0.8334   0.9582

Breslow-Day Test for Homogeneity of the Odds Ratios

Chi-Square   0.1501
DF           1
Pr > ChiSq   0.6985

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic   Alternative Hypothesis   DF   Value     Prob
1           Nonzero Correlation      1    12.4770   0.0004
2           Row Mean Scores Differ   1    12.4770   0.0004
3           General Association      1    12.4770   0.0004