SADC Course in Statistics Comparing several proportions (Session 15)

Upload: autumn-pugh

Post on 28-Mar-2015


Page 1: SADC Course in Statistics Comparing several proportions (Session 15)

SADC Course in Statistics

Comparing several proportions

(Session 15)


Learning Objectives

By the end of this session, you will be able to

• conduct and interpret results from a chi-square test for comparing several proportions

• explain how results above can be extended to the study of associations in general r x c tables

• state assumptions underlying the above test and actions to take if assumptions fail


Example with several proportions

Below is a 5x2 table of observed frequencies showing animals who did or did not get diseased after inoculation with one of five vaccines.

Vaccine   Diseased   Healthy   Total
A               43       237     280
B               52       198     250
C               25       245     270
D               48       212     260
E               57       233     290
Total          225      1125    1350

Question:

Does the proportion of diseased animals vary according to type of vaccine?


Null and alternative hypotheses

Translate the above question into a null hypothesis and an alternative hypothesis, i.e.

H0: proportions are the same for each vaccine

H1: proportions vary over vaccines

If H0 is true, the best estimate of the common proportion diseased is the diseased total divided by the grand total = 225/1350 = 0.167

We use this to compute expected values in each row of the table.
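The expected values can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the variable names (`observed`, `p_hat`, `expected`) are chosen here and are not part of the course material.

```python
# Observed counts from the vaccine table: (diseased, healthy) per vaccine.
observed = {
    "A": (43, 237),
    "B": (52, 198),
    "C": (25, 245),
    "D": (48, 212),
    "E": (57, 233),
}

total_diseased = sum(d for d, _ in observed.values())
grand_total = sum(d + h for d, h in observed.values())
p_hat = total_diseased / grand_total  # pooled proportion diseased under H0

# Expected counts under H0: row total times the pooled proportion
# (diseased) and row total times its complement (healthy).
expected = {
    v: ((d + h) * p_hat, (d + h) * (1 - p_hat))
    for v, (d, h) in observed.items()
}

for v, (e_dis, e_healthy) in expected.items():
    print(f"{v}: expected diseased = {e_dis:.2f}, expected healthy = {e_healthy:.2f}")
```

For vaccine C, for instance, the row total of 270 gives 270 x 225/1350 = 45 expected diseased animals, against 25 observed.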


Chi-square test

Thus the procedure is exactly the same as in the case of comparing two proportions. Again we compute the chi-square test statistic given by

$$X^2 = \sum_{\text{all cells}} \frac{(O-E)^2}{E} = 16.56$$

If H0 is true, X² follows a χ² distribution with 4 d.f. (number of proportions being compared minus 1).

Comparing 16.56 to $\chi^2_4$, we get a p-value of 0.0024, a highly significant result.
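As a check on the figures quoted for this test, the calculation can be sketched in Python using only the standard library. The helper `chi2_sf_even_df` is a name introduced here; it relies on a closed form for the upper tail of the chi-square distribution that holds only when the d.f. is even, which happens to suit this example with 4 d.f.

```python
import math

# Observed (diseased, healthy) counts for vaccines A to E.
observed = [(43, 237), (52, 198), (25, 245), (48, 212), (57, 233)]

col_totals = [sum(row[j] for row in observed) for j in range(2)]
grand_total = sum(col_totals)

x2 = 0.0
for row in observed:
    row_total = sum(row)
    for j, o in enumerate(row):
        e = row_total * col_totals[j] / grand_total  # expected count under H0
        x2 += (o - e) ** 2 / e

df = len(observed) - 1  # number of proportions compared minus 1


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_value = chi2_sf_even_df(x2, df)
print(f"X2 = {x2:.2f}, d.f. = {df}, p = {p_value:.4f}")
```

In practice a statistics package would supply the tail probability for any d.f.; the closed form is used here only to keep the sketch self-contained.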


Conclusions

There is strong evidence that the proportions of diseased animals are not the same for all vaccines.

Note: The chi-square value above is nearly the same as that obtained in the previous session with data from a 2x2 table. This is a coincidence, but note the difference in d.f.: previously, for the 2x2 table, d.f. = 1; in the above example, d.f. = 4.

So although the test statistics are nearly the same, the p-values are different.
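The effect of the d.f. on the p-value can be illustrated numerically. The exact statistic from the previous session is not reproduced here, so this sketch simply assumes the same value, 16.56, for both cases and uses closed forms for the upper tail that are specific to 1 and 4 d.f.

```python
import math

x2 = 16.56  # assumed identical for both cases, purely for illustration

# Upper-tail probabilities of the chi-square distribution.
p_df1 = math.erfc(math.sqrt(x2 / 2))      # 1 d.f.: P(|Z| > sqrt(x2))
p_df4 = math.exp(-x2 / 2) * (1 + x2 / 2)  # 4 d.f.: closed form for even d.f.

print(f"d.f. = 1: p = {p_df1:.6f}")
print(f"d.f. = 4: p = {p_df4:.6f}")
```

The same statistic is far less extreme on 4 d.f. than on 1 d.f., which is why the two p-values differ.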


Extensions to r x c tables

Survey results are often expressed in terms of 2-way tables. In general, such tables may contain r rows and c columns. Questions of interest in such tables centre on whether there is an association between the two variables that have been tabulated.

For example, if the table tabulates education level of the household (HH) head (none, primary, secondary, tertiary) by poverty level (not poor, poor, very poor), the question “Is poverty related to education?” may be asked.


Chi-square test for an rxc table

To answer the above question, the null hypothesis is that the two variables are NOT related, against the alternative that they are.

Under the null hypothesis, comparison of expected values with observed values leads to a chi-square test. The d.f. associated with this test = (r-1)(c-1).

In the above example, the d.f.=(4-1)(3-1)=6
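A general r x c version of the calculation might look like the sketch below. The function name `chi_square_rxc` is hypothetical. No counts are given for the education-by-poverty table, so the function is exercised on the 5x2 vaccine table from earlier in the session, and only the d.f. is computed for the 4x3 layout.

```python
def chi_square_rxc(table):
    """Return (X2, d.f.) for an r x c table of observed counts (list of rows)."""
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    n = sum(row_totals)
    x2 = 0.0
    for i in range(r):
        for j in range(c):
            # Expected count if the row and column variables are unrelated.
            e = row_totals[i] * col_totals[j] / n
            x2 += (table[i][j] - e) ** 2 / e
    return x2, (r - 1) * (c - 1)


# The 5x2 vaccine table (diseased, healthy).
vaccines = [[43, 237], [52, 198], [25, 245], [48, 212], [57, 233]]
x2, df = chi_square_rxc(vaccines)
print(f"X2 = {x2:.2f} on {df} d.f.")

# Degrees of freedom for a 4x3 layout such as education by poverty level.
r, c = 4, 3
print((r - 1) * (c - 1))
```

Treating the vaccine data as a 5x2 table gives (5-1)(2-1) = 4 d.f., which agrees with the "number of proportions minus 1" rule used earlier.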


Assumption underlying the test

• The chi-square test is approximate
– Validity relies on “large” samples
– Small samples of unbalanced data (large and small counts together) may invalidate the approximation

• Rules of thumb for validity involve the expected values, E
– Need large expected values under H0
– Say, most E ≥ 5 and none less than 1
– If rule of thumb is not satisfied, may have an unreliable p-value
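The rule of thumb can be turned into a simple check on the expected values. The sketch below is one possible reading: the slides say only "most", so the `min_fraction=0.8` default is an assumption made here, and both function names are hypothetical. The small 2x2 table at the end is made up purely to show a failing case.

```python
def expected_counts(table):
    """Expected counts under H0 for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def rule_of_thumb_ok(expected, min_fraction=0.8):
    """True if 'most' E >= 5 and no E < 1.

    min_fraction is an assumed interpretation of 'most', not a value
    taken from the course material.
    """
    cells = [e for row in expected for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return share_at_least_5 >= min_fraction and min(cells) >= 1


vaccines = [[43, 237], [52, 198], [25, 245], [48, 212], [57, 233]]
print(rule_of_thumb_ok(expected_counts(vaccines)))

small = [[3, 1], [2, 4]]  # made-up counts, for illustration only
print(rule_of_thumb_ok(expected_counts(small)))
```

The vaccine table passes comfortably, while every expected value in the made-up table is below 5.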


Actions when assumptions fail

(a) Simple approaches:

• Collect more data if this is possible

• Collapse rows or columns if the table has more than two rows/columns. But need to recognise that
– this leads to loss of information
– with some types of variables, there may be no natural way of combining rows/columns


Actions when assumptions fail

(b) Use a continuity correction

This method is often called Yates' correction and is applicable only to 2x2 tables.

First we show the standard chi-square value corresponding to a table with cell counts a, b, c, d as below. (Verify later that this is correct.)

        col1   col2   Total
row1    a      b      r1
row2    c      d      r2
Total   n1     n2     N

$$X^2 = \frac{(ad - bc)^2 \, N}{r_1 r_2 n_1 n_2}$$
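One way to verify the shortcut formula is to compare it numerically with the usual sum of (O-E)²/E over the four cells. The sketch below does this for the counts in the 2x2 bus-staff smoking table used as the example in this session; the two function names are introduced here for illustration.

```python
def x2_from_cells(a, b, c, d):
    """Sum of (O - E)^2 / E over the four cells of a 2x2 table."""
    r1, r2 = a + b, c + d
    n1, n2 = a + c, b + d
    n = r1 + r2
    total = 0.0
    for o, row_total, col_total in ((a, r1, n1), (b, r1, n2), (c, r2, n1), (d, r2, n2)):
        e = row_total * col_total / n  # expected count under H0
        total += (o - e) ** 2 / e
    return total


def x2_shortcut(a, b, c, d):
    """Shortcut form (ad - bc)^2 N / (r1 r2 n1 n2)."""
    r1, r2 = a + b, c + d
    n1, n2 = a + c, b + d
    n = r1 + r2
    return (a * d - b * c) ** 2 * n / (r1 * r2 * n1 * n2)


# Counts from the bus-staff smoking table: rows No/Yes, columns Driver/Conductor.
a, b, c, d = 40, 52, 19, 14
print(x2_from_cells(a, b, c, d), x2_shortcut(a, b, c, d))
```

A numerical agreement on one table is a check, not a proof; the algebraic verification is left as the exercise suggested above.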


Actions when assumptions fail

(b) Continuity correction (continued)…

The approximation of X² to the chi-square distribution is improved by reducing the absolute value of O-E by ½ before calculating X². This results in X² taking the value below.

$$X^2 = \frac{\left(|ad - bc| - \tfrac{1}{2}N\right)^2 N}{r_1 r_2 n_1 n_2}$$

Note: The equivalent when comparing two proportions using a z-test is to reduce by ½ the r value for the first proportion p = r/n and to increase by ½ the r value for the second proportion, where the first proportion is taken to be the larger of the two.
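The link between "reduce |O-E| by ½" and the corrected formula can be made explicit with a short derivation. This is a sketch in the notation of the 2x2 table with cells a, b, c, d, row totals r1, r2, column totals n1, n2 and grand total N; the subscripted O and E symbols are introduced here.

```latex
% Cells a, b, c, d; row totals r_1, r_2; column totals n_1, n_2; grand total N.
\begin{align*}
O_{11} - E_{11} &= a - \frac{r_1 n_1}{N}
  = \frac{a(a+b+c+d) - (a+b)(a+c)}{N} = \frac{ad - bc}{N}, \\
|O_{ij} - E_{ij}| &= \frac{|ad - bc|}{N}
  \quad \text{for every cell, since rows and columns of } O - E \text{ sum to zero}, \\
\sum_{i,j} \frac{1}{E_{ij}} &= N\left(\frac{1}{r_1 n_1} + \frac{1}{r_1 n_2}
  + \frac{1}{r_2 n_1} + \frac{1}{r_2 n_2}\right) = \frac{N^3}{r_1 r_2 n_1 n_2}, \\
X^2_{\text{corrected}} &= \left(\frac{|ad - bc|}{N} - \frac{1}{2}\right)^2
  \frac{N^3}{r_1 r_2 n_1 n_2}
  = \frac{\left(|ad - bc| - \tfrac{1}{2}N\right)^2 N}{r_1 r_2 n_1 n_2}.
\end{align*}
```

Dropping the ½ in the last line recovers the uncorrected shortcut formula.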


Example of use of continuity correction

                        Job
Whether smoker?   Driver         Conductor      Total
No                40 (67.8%)     52 (78.8%)     92 (73.6%)
Yes               19 (32.2%)     14 (21.2%)     33 (26.4%)
Total             59 (100.0%)    66 (100.0%)    125 (100%)

Above is the example on Bus data used during the practical sessions. The question of interest is whether the proportion of smokers differs across job types.


Example of use of continuity correction (continued)

The usual chi-square test leads to X² = 1.937

Applying the continuity correction, we get

X² = 1.412

Here, there is little difference because the sample sizes are reasonably large.

It is more important to apply the continuity correction when sample sizes are small.
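Both values can be reproduced from the bus table with a short Python sketch. The function names are hypothetical, the `max(..., 0.0)` guard is an addition made here so that the correction cannot overshoot zero, and the p-value helper uses a closed form that applies only to 1 d.f.

```python
import math


def chi_square_2x2(a, b, c, d, corrected=False):
    """Chi-square statistic for a 2x2 table, optionally with Yates' correction."""
    r1, r2 = a + b, c + d
    n1, n2 = a + c, b + d
    n = r1 + r2
    diff = abs(a * d - b * c)
    if corrected:
        # Guard added in this sketch: never let the corrected difference go negative.
        diff = max(diff - n / 2, 0.0)
    return diff ** 2 * n / (r1 * r2 * n1 * n2)


def p_value_1df(x2):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x2 / 2))


# Bus data: rows No/Yes (smoker), columns Driver/Conductor.
a, b, c, d = 40, 52, 19, 14
plain = chi_square_2x2(a, b, c, d)
yates = chi_square_2x2(a, b, c, d, corrected=True)

print(f"uncorrected: X2 = {plain:.3f}, p = {p_value_1df(plain):.3f}")
print(f"corrected:   X2 = {yates:.3f}, p = {p_value_1df(yates):.3f}")
```

The correction always lowers the statistic and so raises the p-value; neither version gives evidence of a difference between job types in this table.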


Actions when assumptions fail (ctd)

(c) Using an Exact Test

• When actions suggested in (a) or (b) are not possible, consider using an Exact Test.

• Details of such tests are beyond the scope of this module. However, the basic approach is to compute all possible tables having the same marginal totals, and examine how extreme the observed table is, in comparison.

• Some software packages (e.g. Stata) have the facility to perform Fisher’s exact test. SPSS does this only for 2x2 tables. Specialised software also exists for such tests, e.g. StatXact.
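For a 2x2 table the basic approach described above can be sketched directly. The function below enumerates every table with the observed margins, gives each its hypergeometric probability, and sums the probabilities of tables no more likely than the observed one. That two-sided rule is one common convention and is an assumption here, as is the function name; packaged routines should be preferred for real analyses.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, n1 = a + b, c + d, a + c
    n = r1 + r2
    denom = comb(n, n1)

    def prob(k):
        # Probability that the top-left cell equals k with all margins fixed.
        return comb(r1, k) * comb(r2, n1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, n1 - r2), min(r1, n1)  # feasible range of the top-left cell
    # Small relative tolerance so ties with the observed table are included.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Bus data: rows No/Yes (smoker), columns Driver/Conductor.
print(fisher_exact_2x2(40, 52, 19, 14))
```

With larger r x c tables the number of possible tables grows very quickly, which is one reason dedicated software is normally used.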


Limitations

• Chi-square tests are limited, in that only two factors are examined at a time.

• This may cause erroneous inferences to be made (see Practical 15 for an example).

• The inter-relations between more than two factors can be investigated using more sophisticated statistical techniques, e.g. log-linear modelling.




Some practical work follows…