Post on 12-Feb-2017

Page 1: More than 2 groups.ppt

Statistical Inference for more than two groups

Peter T. Donnan

Professor of Epidemiology and Biostatistics

Statistics for Health Research

Page 2: More than 2 groups.ppt

Tests to be covered

• Chi-squared test
• One-way ANOVA
• Logrank test

Page 3: More than 2 groups.ppt

Significance testing – general overview

1. Define the null and alternative hypotheses under study
2. Acquire data
3. Calculate the value of the test statistic
4. Compare the value of the test statistic to values from a known probability distribution
5. Interpret the p-value and draw a conclusion

Page 4: More than 2 groups.ppt
Page 5: More than 2 groups.ppt

Categorical data > 2 groups

Unordered categories – Nominal
- Chi-squared test for association

Ordered categories – Ordinal
- Chi-squared test for trend

Page 6: More than 2 groups.ppt

Example

Does the proportion of mothers developing pre-eclampsia vary by parity (birth order)?

Page 7: More than 2 groups.ppt

Contingency table (r x c)

                 Birth Order
Pre-eclampsia    1st             2nd            3rd           4th
No               1170 (79.4%)    278 (84.8%)    83 (86.5%)    86 (92.5%)
Yes              304 (20.6%)     50 (15.2%)     13 (13.5%)    7 (7.5%)

Page 8: More than 2 groups.ppt

Null Hypotheses

1. Null hypothesis: No association between pre-eclampsia and birth order
2. Null hypothesis: There is no trend in pre-eclampsia with parity

Page 9: More than 2 groups.ppt

Test of association
Test of linear trend
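The test of association can be sketched outside SPSS. The block below is a minimal illustration using only the Python standard library; it takes the counts from the pre-eclampsia by birth order table, and the function names are hypothetical. The p-value uses the closed-form upper tail of the chi-squared distribution, which in this form is valid for 3 degrees of freedom only.

```python
import math

# Observed counts from the pre-eclampsia by birth order table
# (columns: 1st, 2nd, 3rd, 4th birth).
observed = [
    [1170, 278, 83, 86],  # pre-eclampsia: No
    [304, 50, 13, 7],     # pre-eclampsia: Yes
]


def chi_squared_association(table):
    """Return (Pearson chi-squared statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_upper_tail_3df(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi2, df = chi_squared_association(observed)
p_value = chi2_upper_tail_3df(chi2)
print(f"chi-squared = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

With these counts the statistic comes out at about 15.42 on 3 degrees of freedom.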

Page 10: More than 2 groups.ppt

Conclusions

1. Significant association between pre-eclampsia and birth order (Χ² = 15.42, p = 0.001)
2. Significant linear trend in incidence of pre-eclampsia with parity (Χ² = 15.03, p < 0.001): incidence falls from 20.6% at first birth to 7.5% at fourth
3. Note 3 degrees of freedom for association test and 1 df for test for trend
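The 1 df trend statistic can be reproduced by hand. The sketch below assumes the linear-by-linear association form, (N - 1) r², where r is the correlation between birth order score and the 0/1 outcome, and assumes equally spaced scores 1 to 4. Both are assumptions about what the software computed, and the function name is hypothetical.

```python
import math

scores = [1, 2, 3, 4]            # birth order, treated as equally spaced (assumption)
no_counts = [1170, 278, 83, 86]  # pre-eclampsia: No
yes_counts = [304, 50, 13, 7]    # pre-eclampsia: Yes


def linear_trend_chi2(scores, no_counts, yes_counts):
    """Linear-by-linear association statistic, (N - 1) * r^2, on 1 df."""
    totals = [a + b for a, b in zip(no_counts, yes_counts)]
    n = sum(totals)
    sum_x = sum(s * t for s, t in zip(scores, totals))
    sum_x2 = sum(s * s * t for s, t in zip(scores, totals))
    sum_y = sum(yes_counts)  # outcome is 0/1, so sum(y) equals sum(y^2)
    sum_xy = sum(s * y for s, y in zip(scores, yes_counts))
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    r_squared = sxy ** 2 / (sxx * syy)
    return (n - 1) * r_squared


trend_chi2 = linear_trend_chi2(scores, no_counts, yes_counts)
trend_p = math.erfc(math.sqrt(trend_chi2 / 2))  # chi-squared upper tail, 1 df
print(f"trend chi-squared = {trend_chi2:.2f}, p = {trend_p:.5f}")
```

Under these assumptions the statistic is about 15.03, in line with the value quoted for the test for trend.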

Page 11: More than 2 groups.ppt

Contingency table (r x c)

                 Birth Order
Pre-eclampsia    1st             2nd            3rd           4th
No               1170 (79.4%)    278 (84.8%)    83 (86.5%)    86 (92.5%)
Yes              304 (20.6%)     50 (15.2%)     13 (13.5%)    7 (7.5%)

Page 12: More than 2 groups.ppt

Contingency Tables (r x c)

1. Tables can be any size. For example, SIMD deciles by parity would be a 10 x 4 table
2. But with very large tables it is difficult to interpret tests of association
3. Crosstabulations in SPSS can give odds ratios as an option when the row or column variable has two categories
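As an illustration of the odds ratio option, the sketch below computes odds ratios by hand from the pre-eclampsia counts. Using first births as the reference category is an assumption made for this example, and the function name is hypothetical.

```python
# Counts from the pre-eclampsia by birth order table.
birth_orders = ["1st", "2nd", "3rd", "4th"]
no_counts = [1170, 278, 83, 86]
yes_counts = [304, 50, 13, 7]


def odds_ratio(yes_a, no_a, yes_b, no_b):
    """Odds of the outcome in group A divided by the odds in group B."""
    return (yes_a / no_a) / (yes_b / no_b)


# Compare each later birth order with first births
# (the choice of reference category is an assumption).
odds_ratios = {
    label: odds_ratio(yes_counts[i], no_counts[i], yes_counts[0], no_counts[0])
    for i, label in enumerate(birth_orders)
    if i > 0
}

for label, value in odds_ratios.items():
    print(f"{label} vs 1st: OR = {value:.2f}")
```

All three odds ratios are below 1, consistent with the lower proportion of pre-eclampsia at higher parity.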

Page 13: More than 2 groups.ppt

Numerical data > 2 groups

Compare means from several groups

1-way analysis of variance (ANOVA)

Single global test of difference in means
Also test for linear trend

Page 14: More than 2 groups.ppt

Extend t-test to >2 groups, i.e. Analysis of Variance (ANOVA)

Consider scores for contribution to energy intake from fat groups, milk groups and alcohol groups

Does the mean score differ across the three categories of intake groups?

Koh ET, Owen WL. Introduction to Nutrition and Health Research. Kluwer, Boston, 2000

Page 15: More than 2 groups.ppt

One-Way ANOVA of scores

[Figure: mean score by contributor to energy intake (Fat, Milk, Alcohol), n = 6 in each group; the group means shown are 4.22, 0.167 and 2.01]

Page 16: More than 2 groups.ppt

One-Way ANOVA of Scores

The null hypothesis (H0) is ‘there are no differences in mean score across the three groups’

$\bar{x}_1 = \bar{x}_2 = \bar{x}_3$

Use SPSS One-Way ANOVA to carry out this test

Page 17: More than 2 groups.ppt

Assumptions of 1-Way ANOVA

1. Standard deviations are similar
2. Test variable (scores) is approximately Normally distributed

If assumptions are not met, use the non-parametric equivalent, the Kruskal-Wallis test
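For readers who want to see what the Kruskal-Wallis test does, here is a minimal sketch of the H statistic. It ranks all observations together and compares rank sums between groups. The data are made up for illustration, tied values receive average ranks, and the usual tie correction is omitted for brevity.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    # Map each distinct value to the average of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    return 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)


# Hypothetical scores for three groups (NOT data from the slides).
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(example_groups)

# H is referred to a chi-squared distribution with (groups - 1) df;
# for 2 df the upper tail has the closed form exp(-H / 2).
p_approx = math.exp(-h_stat / 2)
print(f"H = {h_stat:.2f}, approximate p = {p_approx:.4f}")
```

The chi-squared approximation is rough for samples this small; statistical packages may use exact methods instead.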

Page 18: More than 2 groups.ppt

Results of ANOVA

ANOVA partitions variation into Within and Between group components

Results in F-statistic – compared with values in F-tables

F = 108.6, with 2 and 15 df, p<0.001
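The partition into between-group and within-group variation can be sketched as follows. The slides report only group sizes and means, so the raw scores below are invented purely to show the calculation and will not reproduce F = 108.6; the function name is hypothetical.

```python
def one_way_anova(groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores, 6 per group (NOT the data behind the slides).
fat = [4.0, 4.5, 3.9, 4.4, 4.1, 4.4]
milk = [2.1, 1.8, 2.2, 1.9, 2.0, 2.1]
alcohol = [0.1, 0.3, 0.2, 0.1, 0.2, 0.1]

f_stat, df_between, df_within = one_way_anova([fat, milk, alcohol])
print(f"F = {f_stat:.1f} with {df_between} and {df_within} df")
```

With three groups of six, the degrees of freedom are 2 and 15, matching the result quoted above.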

Page 19: More than 2 groups.ppt

Results of ANOVA

The groups differ significantly and it is clear the Fat group contributes most to the energy score, with a mean of 4.22

Further pair-wise comparisons can be made (3 possible) using a multiple comparisons test, e.g. Bonferroni
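A Bonferroni adjustment multiplies each unadjusted p-value by the number of comparisons and caps the result at 1. The sketch below enumerates the three possible pairs; the unadjusted p-values are hypothetical placeholders, not results from the slides.

```python
from itertools import combinations

groups = ["Fat", "Milk", "Alcohol"]
pairs = list(combinations(groups, 2))  # the 3 possible pairwise comparisons

# Hypothetical unadjusted p-values, one per pair (NOT from the slides).
unadjusted = {
    ("Fat", "Milk"): 0.004,
    ("Fat", "Alcohol"): 0.0002,
    ("Milk", "Alcohol"): 0.03,
}


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


adjusted = bonferroni(unadjusted)
for pair, p in adjusted.items():
    print(f"{pair[0]} vs {pair[1]}: adjusted p = {p:.4f}")
```

Bonferroni is conservative; it is shown here only because it is the method named above.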

Page 20: More than 2 groups.ppt

Example 2

Does income vary by highest level of education achieved?

Page 21: More than 2 groups.ppt

Null Hypothesis and alternative

H0: no difference in mean income by education level achieved

H1: mean income varies with education level achieved

Page 22: More than 2 groups.ppt

Assumptions of 1-Way ANOVA

1. Standard deviations or variances are similar
2. Test variable (income) is approximately Normally distributed

If assumptions are not met, use the non-parametric equivalent, the Kruskal-Wallis test

Page 23: More than 2 groups.ppt
Page 24: More than 2 groups.ppt
Page 25: More than 2 groups.ppt

Table of mean income for each level of educational achievement

Page 26: More than 2 groups.ppt
Page 27: More than 2 groups.ppt
Page 28: More than 2 groups.ppt
Page 29: More than 2 groups.ppt

Analysis of Variance Table

F-test gives P < 0.001, showing a significant difference in mean income between levels of education

Page 30: More than 2 groups.ppt

Table of each pairwise comparison.

Note lower income for ‘did not complete school’ compared to all other groups.

All p-values adjusted for multiple comparisons

Page 31: More than 2 groups.ppt

Summary of ANOVA

• ANOVA is useful when there are a number of groups with a continuous summary in each
• SPSS does all pairwise group comparisons adjusted for multiple testing
• Note that ANOVA is just a form of linear regression – see later

Page 32: More than 2 groups.ppt

Extending Kaplan-Meier and logrank test in SPSS

You need to specify:
• Survival time – time from surgery (tfsurg)
• Status – Dead = 1, censored = 0 (dead)
• Factor – Dukes’ stage at baseline (A, B, C, D, Unknown)
• Select compare factor and logrank
• Optionally select plot of survival

Page 33: More than 2 groups.ppt

Implementing Logrank test in SPSS

Page 34: More than 2 groups.ppt

Select options to obtain plot and median survival

Select Compare Factor to obtain logrank test

Select linear trend for this test

Page 35: More than 2 groups.ppt

Overall Comparisons

                         Chi-Square    df    Sig.
Log Rank (Mantel-Cox)    80.534        1     .000

The vector of trend weights is -2, -1, 0, 1, 2. This is the default.

The test for trend in survival across Dukes’ stage is highly significant
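To make the 1 df trend test less of a black box, here is a minimal sketch of a logrank test for trend. At each death time it compares the summed trend weights of those who died with what would be expected if deaths were spread evenly over everyone still at risk. The data are made up, the weights -1, 0, 1 are an assumption for a three-group example, and the function is an illustration rather than the SPSS implementation.

```python
import math


def logrank_trend(times, events, groups, weights):
    """Logrank test for trend: returns (chi-squared on 1 df, p-value).

    times   : follow-up time for each subject
    events  : 1 if the death was observed, 0 if censored
    groups  : group label for each subject
    weights : dict mapping each group label to its trend weight
    """
    u = 0.0         # weighted sum of (observed - expected) deaths
    variance = 0.0  # variance of u under the null hypothesis
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        deaths = [g for ti, e, g in zip(times, events, groups) if ti == t and e == 1]
        n, d = len(at_risk), len(deaths)
        mean_w = sum(weights[g] for g in at_risk) / n
        mean_w2 = sum(weights[g] ** 2 for g in at_risk) / n
        u += sum(weights[g] for g in deaths) - d * mean_w
        if n > 1:  # a single subject at risk contributes no variance
            variance += d * (n - d) / (n - 1) * (mean_w2 - mean_w ** 2)
    chi2 = u ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up data for three stages (NOT the colorectal cancer data).
times = [5, 8, 12, 3, 6, 9, 2, 4, 7]
events = [0, 1, 0, 1, 1, 0, 1, 1, 1]
stages = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
trend_weights = {"A": -1, "B": 0, "C": 1}

chi2_trend, p_trend = logrank_trend(times, events, stages, trend_weights)
print(f"logrank trend chi-squared = {chi2_trend:.2f}, p = {p_trend:.3f}")
```

With two groups and weights 0 and 1, the same function reduces to the ordinary two-group logrank test.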

Page 36: More than 2 groups.ppt

Interpret SPSS output

• Note the logrank statistic, degrees of freedom and statistical significance (p-value).
• Note in which direction survival is worst or best and back up visual information from the Kaplan-Meier plot with median survival and 95% confidence intervals from the output.
• Finally, interpret the results!
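Median survival is read off the Kaplan-Meier curve as the first time at which estimated survival drops to 50% or below. A minimal sketch, using made-up follow-up times and hypothetical function names, assuming the same coding as above (1 = dead, 0 = censored):

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival probability)."""
    survival = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        n_at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / n_at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up times in days (NOT the colorectal cancer data).
times = [2, 3, 4, 5, 6]
events = [1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
median_days = median_survival(curve)
print(curve)
print("median survival:", median_days)
```

If the curve never reaches 50%, the median is undefined and the function returns None, which is one reason to report it alongside the plot.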

Page 37: More than 2 groups.ppt

Interpret test result in relation to median survival

Dukes’ Stage    Median Survival (days)    Mean Survival (days)
A               2770                      1978
B               1749                      1866
C               1120                      1304
D               375                       646
Unknown         581                       1297

Page 38: More than 2 groups.ppt

Output from Kaplan-Meier in SPSS

Note that SPSS gives three possible tests:

• Logrank, Tarone-Ware and Breslow
• In general, logrank gives greater weight to later events compared to the other two tests.
• If all are similar, quote the logrank test.
• If the results differ, quote more than one test result

Page 39: More than 2 groups.ppt

Editing SPSS output

• Note that everything in the SPSS output window can be copied and pasted into Word and PowerPoint.
• Double-clicking on plots also allows editing of the plot such as changing axes, colours, fonts, etc.

Page 40: More than 2 groups.ppt

Diabetic patients LDL data

• Try carrying out extended Crosstabulations and ANOVA where appropriate in the LDL data…
• E.g. APOE genotype

Page 41: More than 2 groups.ppt

Colorectal cancer patients: survival following surgery

• Try carrying out Kaplan-Meier plots and logrank tests for other factors such as WHO Functional Performance, smoking, etc…

Page 42: More than 2 groups.ppt

Extending tests to more than 2 groups

Summary
• Define H0 and H1
• Choose the appropriate test according to type of variables
• Interpret output carefully

Page 43: More than 2 groups.ppt