The Kruskal-Wallis H Test. Sporiš Goran, PhD. http://kif.hr/predmet/mki http://www.science4performance.com/

Post on 30-Mar-2015


Page 1: The Kruskal-Wallis H Test Sporiš Goran, PhD

The Kruskal-Wallis H Test

Sporiš Goran, PhD.

http://kif.hr/predmet/mki

http://www.science4performance.com/

Page 2: The Kruskal-Wallis H Test Sporiš Goran, PhD

• The Kruskal-Wallis H Test is a nonparametric procedure that can be used to compare more than two populations in a completely randomized design.

• All n = n1 + n2 + … + nk measurements are jointly ranked (i.e., treated as one large sample).

• We use the sums of the ranks of the k samples to compare the distributions.

The Kruskal-Wallis H Test

Page 3: The Kruskal-Wallis H Test Sporiš Goran, PhD

Rank the total measurements in all k samples from 1 to n. Tied observations are assigned the average of the ranks they would have received if not tied.

Calculate Ti = rank sum for the ith sample, i = 1, 2, …, k, and the test statistic

The Kruskal-Wallis H Test

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(n+1)$$
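The ranking rule and the H statistic on this slide can be sketched in Python. This is a minimal illustration using only the standard library; the function names `average_ranks` and `kruskal_wallis_h` and the toy data are assumptions made for the example, and no tie correction is applied to H.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """Return (rank_sums, H) for a list of samples, without a tie correction."""
    pooled = [x for sample in samples for x in sample]
    ranks = average_ranks(pooled)  # joint ranking of all n measurements
    n = len(pooled)
    rank_sums, start = [], 0
    for sample in samples:
        rank_sums.append(sum(ranks[start:start + len(sample)]))
        start += len(sample)
    total = sum(t * t / len(s) for t, s in zip(rank_sums, samples))
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return rank_sums, h


# Hypothetical toy data with one tie (the two 2s share rank 2.5).
toy = [[1, 2, 2], [3, 4, 5]]
print(kruskal_wallis_h(toy))
```

Pooling the samples before ranking is what makes the ranks "joint"; the rank sums are then recovered by slicing the pooled ranks back into the original groups.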

Page 4: The Kruskal-Wallis H Test Sporiš Goran, PhD

H0: the k distributions are identical versus

Ha: at least one distribution is different

Test statistic: Kruskal-Wallis H

When H0 is true, the test statistic H has an approximate chi-square distribution with df = k-1.

Use a right-tailed rejection region or p-value based on the Chi-square distribution.

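The right-tailed p-value mentioned above can be computed without statistical tables. The sketch below is one possible standard-library implementation of the chi-square upper-tail probability for integer degrees of freedom; the function name `chi2_sf` and the values of `h` and `k` in the usage lines are assumptions made for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite sum with half-integer gamma terms.
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Hypothetical values: H = 6.5 observed with k = 3 samples, so df = k - 1 = 2.
h, k = 6.5, 3
p_value = chi2_sf(h, k - 1)
print(p_value, p_value < 0.05)
```

Rejecting H0 when the p-value is below the chosen significance level is equivalent to H falling in the right-tailed rejection region.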

The Kruskal-Wallis H Test

Page 5: The Kruskal-Wallis H Test Sporiš Goran, PhD

Example

Four groups of students were randomly assigned to be taught with four different techniques, and their achievement test scores were recorded. Are the distributions of test scores the same, or do they differ in location?

Teaching method      1     2     3     4
                    65    75    59    94
                    87    69    78    89
                    73    83    67    80
                    79    81    62    88

Page 6: The Kruskal-Wallis H Test Sporiš Goran, PhD

Teaching Methods

H0: the distributions of scores are the same
Ha: the distributions differ in location

Rank the 16 measurements from 1 to 16, and calculate the four rank sums.

Scores, with ranks in parentheses:

Teaching method       1          2          3          4
                   65 (3)     75 (7)     59 (1)     94 (16)
                   87 (13)    69 (5)     78 (8)     89 (15)
                   73 (6)     83 (12)    67 (4)     80 (10)
                   79 (9)     81 (11)    62 (2)     88 (14)
Ti                   31         35         15         55

Test statistic:

$$H = \frac{12}{n(n+1)} \sum \frac{T_i^2}{n_i} - 3(n+1) = \frac{12}{16(17)} \left( \frac{31^2 + 35^2 + 15^2 + 55^2}{4} \right) - 3(17) = 8.96$$
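The ranking step for this example can be reproduced in a few lines of Python. The sketch assumes the column layout of the example table is method 1: 65, 87, 73, 79; method 2: 75, 69, 83, 81; method 3: 59, 78, 67, 62; method 4: 94, 89, 80, 88, which is consistent with the rank sums on the slide. Variable names are illustrative.

```python
# Achievement test scores from the example, one list per teaching method.
scores = {
    1: [65, 87, 73, 79],
    2: [75, 69, 83, 81],
    3: [59, 78, 67, 62],
    4: [94, 89, 80, 88],
}

# Joint ranking: these data contain no ties, so the rank of a score is its
# 1-based position in the sorted pooled sample.
pooled = sorted(x for group in scores.values() for x in group)
rank_of = {x: position for position, x in enumerate(pooled, start=1)}

rank_sums = {m: sum(rank_of[x] for x in group) for m, group in scores.items()}
n = len(pooled)
h = (12 / (n * (n + 1))
     * sum(t ** 2 / len(scores[m]) for m, t in rank_sums.items())
     - 3 * (n + 1))

print(rank_sums)    # {1: 31, 2: 35, 3: 15, 4: 55}
print(round(h, 2))  # 8.96
```

The position-based ranking shortcut only works because no score occurs twice; with ties, the average-rank rule would be needed instead.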

Page 7: The Kruskal-Wallis H Test Sporiš Goran, PhD

Teaching Methods

H0: the distributions of scores are the same
Ha: the distributions differ in location

Test statistic:

$$H = \frac{12}{n(n+1)} \sum \frac{T_i^2}{n_i} - 3(n+1) = \frac{12}{16(17)} \left( \frac{31^2 + 35^2 + 15^2 + 55^2}{4} \right) - 3(17) = 8.96$$

Rejection region: For a right-tailed chi-square test with α = .05 and df = 4 - 1 = 3, reject H0 if H ≥ 7.81.

Since H = 8.96 exceeds 7.81, reject H0. There is sufficient evidence to indicate that there is a difference in test scores for the four teaching techniques.

Page 8: The Kruskal-Wallis H Test Sporiš Goran, PhD

Key Concepts

I. Nonparametric Methods

These methods can be used when

• The data cannot be measured on a quantitative scale, or when

• The numerical scale of measurement is arbitrarily set by the researcher, or when

• The parametric assumptions such as normality or constant variance are seriously violated.

Page 9: The Kruskal-Wallis H Test Sporiš Goran, PhD

Key Concepts

Kruskal-Wallis H Test: Completely Randomized Design

1. Jointly rank all the observations in the k samples (treat as one large sample of size n, say). Calculate the rank sums, Ti = rank sum of sample i, and the test statistic

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(n+1)$$

2. If the null hypothesis of equality of distributions is false, H will be unusually large, resulting in a one-tailed test.

3. For sample sizes of five or greater, the rejection region for H is based on the chi-square distribution with (k - 1) degrees of freedom.

Page 10: The Kruskal-Wallis H Test Sporiš Goran, PhD

Testing for trends: the Jonckheere-Terpstra test

This test looks at the differences between the medians of the groups, just as the Kruskal-Wallis test does.

Additionally, it includes information about whether the medians are ordered.

In our example, we do predict an order for the sperm counts in the 4 groups:

no meal > 1 meal > 4 meals > 7 meals

In the coding variable, we have already encoded the order that we expect (1 > 2 > 3 > 4).

Page 11: The Kruskal-Wallis H Test Sporiš Goran, PhD

Output of the J-T test

Jonckheere-Terpstra Test (b)                                    Sperm count (million)

Number of levels in Number of Soya Meals Per Week                   4.000
N                                                                  80.000
Observed J-T statistic                                            912.000
Mean J-T statistic                                               1200.000
SD of J-T statistic                                               116.330
Std. J-T statistic                                                 -2.476
Asymp. Sig. (2-tailed)                                              0.013
Monte Carlo Sig. (2-tailed)   Sig.                                  0.013 (a)
                              99% Confidence Interval Lower Bound   0.010
                              99% Confidence Interval Upper Bound   0.016
Monte Carlo Sig. (1-tailed)   Sig.                                  0.006 (a)
                              99% Confidence Interval Lower Bound   0.004
                              99% Confidence Interval Upper Bound   0.008

a = Grouping variable: Number of Soya meals per week
b = Based on 10000 sampled tables with starting seed 846668601

Z-score =(912-1200)/116.33=-2.476

The J-T test should always be 1-tailed (since we have a directed hypothesis!). We compare the absolute value of -2.476 against 1.65, which is the z-value for an α-level of 5% for a 1-tailed test. Since 2.476 > 1.65, the result is significant. The negative sign means that the medians are in descending order (a positive sign would have meant ascending order).

If you have J-T in your version of SPSS, it would look like this.
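The mean, SD and z-score in the output can be checked by hand. The sketch below uses the standard no-ties formulas for the null mean and variance of the J-T statistic; it assumes the 80 participants are split evenly into 4 groups of 20, which the slides do not state explicitly, and the function name `jt_null_moments` is illustrative.

```python
import math


def jt_null_moments(group_sizes):
    """Mean and SD of the J-T statistic under H0, assuming no tied observations."""
    n = sum(group_sizes)
    mean = (n ** 2 - sum(g ** 2 for g in group_sizes)) / 4
    var = (n ** 2 * (2 * n + 3)
           - sum(g ** 2 * (2 * g + 3) for g in group_sizes)) / 72
    return mean, math.sqrt(var)


# Assumption: 20 participants in each of the 4 soya-meal groups (N = 80).
mean_j, sd_j = jt_null_moments([20, 20, 20, 20])
observed_j = 912  # observed J-T statistic from the SPSS output
z = (observed_j - mean_j) / sd_j
p_one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # normal approximation

print(mean_j, round(sd_j, 2), round(z, 3))  # 1200.0 116.33 -2.476
print(round(p_one_tailed, 4))
```

Under the equal-groups assumption the computed mean and SD agree with the SPSS table, and doubling the one-tailed probability gives a value close to the reported asymptotic 2-tailed significance.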

Page 12: The Kruskal-Wallis H Test Sporiš Goran, PhD

Differences between several related groups: Friedman's ANOVA

• Friedman's ANOVA is the non-parametric analogue to a repeated-measures ANOVA (see chapter 11), where the same subjects have been subjected to various conditions.

• Example here: Testing the effect of a new diet called 'Andikins diet' on n = 10 women. Their weight (in kg) was measured 3 times:
– Start
– Month 1
– Month 2

• Would they lose weight in the course of the diet?

Page 13: The Kruskal-Wallis H Test Sporiš Goran, PhD

Theory of Friedman's ANOVA

• Each subject's weight on each of the 3 dates is listed in a separate column. Then the ranks for the 3 dates are determined and listed in separate columns.

• Then the ranks are summed for each condition (Ri).

Diet data with ranks

          Weight (kg)                      Ranks
Person    Start    Month 1   Month 2      Start   Month 1   Month 2
1          63.75     65.38     81.34        1        2         3
2          62.98     66.24     69.31        1        2         3
3          65.98     67.70     77.89        1        2         3
4         107.27    102.72     91.33        3        2         1
5          66.58     69.45     72.87        1        2         3
6         120.46    119.96    114.26        3        2         1
7          62.01     66.09     68.01        1        2         3
8          71.87     73.62     55.43        2        3         1
9          83.01     75.81     71.63        3        2         1
10         76.62     67.66     68.60        3        1         2
Ri                                          19       20        21

For each person, the 3 scores are compared: the smallest one gets rank 1, the next 2, and the biggest one 3.

Page 14: The Kruskal-Wallis H Test Sporiš Goran, PhD

The test statistic Fr

From the sum of ranks for each group, the test statistic Fr is derived:

$$F_r = \left[ \frac{12}{Nk(k+1)} \sum_{i=1}^{k} R_i^2 \right] - 3N(k+1)$$

$$F_r = \frac{12}{(10 \times 3)(3+1)} \left( 19^2 + 20^2 + 21^2 \right) - (3 \times 10)(3+1) = \frac{12}{120}(361 + 400 + 441) - 120 = 0.1(1202) - 120 = 120.2 - 120 = 0.2$$

Rank sums Ri: Start = 19, Month 1 = 20, Month 2 = 21
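The within-person ranking and the Fr calculation can be reproduced from the diet data. This is a minimal sketch with illustrative names (`weights`, `row_ranks`); it assumes no ties within a person's three measurements, which holds for these data.

```python
# Weights in kg at Start, Month 1 and Month 2 for the 10 participants.
weights = [
    (63.75, 65.38, 81.34),
    (62.98, 66.24, 69.31),
    (65.98, 67.70, 77.89),
    (107.27, 102.72, 91.33),
    (66.58, 69.45, 72.87),
    (120.46, 119.96, 114.26),
    (62.01, 66.09, 68.01),
    (71.87, 73.62, 55.43),
    (83.01, 75.81, 71.63),
    (76.62, 67.66, 68.60),
]


def row_ranks(row):
    """Rank one participant's scores from 1 (smallest); assumes no ties in the row."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0] * len(row)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


n, k = len(weights), len(weights[0])
# Sum the ranks column by column to get one rank sum per condition.
rank_sums = [sum(col) for col in zip(*(row_ranks(r) for r in weights))]
fr = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)

print(rank_sums)     # [19, 20, 21]
print(round(fr, 3))  # 0.2
```

Ranking is done separately within each row because Friedman's ANOVA compares conditions within subjects, unlike the joint ranking used by the Kruskal-Wallis test.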

Page 15: The Kruskal-Wallis H Test Sporiš Goran, PhD

Data input and provisional analysis (using diet.sav)

First, test for normality: Analyze → Descriptive Statistics → Explore; tick 'Normality plots with tests' in the 'Plots' window.

[Figure: SPSS data sheet]

Tests of Normality

                                      Kolmogorov-Smirnov (a)        Shapiro-Wilk
                                      Statistic   df    Sig.        Statistic   df    Sig.
START   Weight at Start (Kg)            .228      10    .149          .785      10    .012
MONTH1  Weight after 1 month (Kg)       .335      10    .002          .684      10    .010**
MONTH2  Weight after 2 month (Kg)       .203      10    .200*         .874      10    .127

**. This is an upper bound of the true significance.
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

In the Shapiro-Wilk test (which is more accurate than the K-S test), two groups (Start, 1 month) show non-normal distributions. This violation of a parametric assumption justifies the choice of a non-parametric test.

Page 16: The Kruskal-Wallis H Test Sporiš Goran, PhD

Running Friedman's ANOVA

Analyze → Non-parametric Tests → K Related Samples...

Request everything there is (it is not much).

Exact...: If you have 'Exact', tick 'Exact' and limit the calculation time to 5 minutes.

Page 17: The Kruskal-Wallis H Test Sporiš Goran, PhD

Other options

• Kendall's W: Similar to Friedman's ANOVA, but looks specifically at agreement between raters. For example: to what extent (from 0 to 1) women agree when rating Justin Timberlake, David Beckham, or Tony Blair on their attractiveness. This is like a correlation coefficient.

• Cochran's Q: This is an extension of McNemar's test. It is like a Friedman's test for dichotomous data. For example, if women had to judge whether they would like to kiss Justin Timberlake, David Beckham, or Tony Blair and could only answer Yes or No.

Page 18: The Kruskal-Wallis H Test Sporiš Goran, PhD

Output from Friedman's ANOVA

Descriptive Statistics

                                                                                 Percentiles
                                    N    Mean      Std. Deviation  Minimum  Maximum   25th      50th (Median)  75th
START   Weight at Start (Kg)        10   78.0543   20.2301         62.01    120.46    63.5549   69.2288        89.0709
MONTH1  Weight after 1 month (Kg)   10   77.4635   18.6150         65.38    119.96    66.2065   68.5728        82.5385
MONTH2  Weight after 2 month (Kg)   10   77.0668   16.1061         55.43    114.26    68.4525   72.2493        83.8365

Ranks

                                    Mean Rank
START   Weight at Start (Kg)        1.90
MONTH1  Weight after 1 month (Kg)   2.00
MONTH2  Weight after 2 month (Kg)   2.10

Test Statistics (a)

N              10
Chi-Square     .200
df             2
Asymp. Sig.    .905

a. Friedman Test

The Fr statistic is called Chi-Square here. It has df = 2 (k - 1, where k is the number of groups). The statistic is not significant (n.s.).
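As a side check (not part of the SPSS output), the reported significance can be verified by hand, because the chi-square upper-tail probability has a simple closed form when df = 2:

$$P(\chi^2_2 \geq x) = e^{-x/2}, \qquad \text{so} \qquad p = e^{-0.2/2} = e^{-0.1} \approx 0.905$$

This agrees with the Asymp. Sig. value in the table and is far above the .05 level.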

Page 19: The Kruskal-Wallis H Test Sporiš Goran, PhD

Posthoc tests for Friedman's ANOVA

Wilcoxon signed-rank tests, but correcting for the number of tests we do; here α = .05/3 = .0167.

Analyze → Nonparametric Tests → 2 Related Samples; tick 'Wilcoxon' and specify the 3 pairs of groups.

Ranks (mean ranks and sum of ranks for all 3 comparisons)

                                                        N        Mean Rank   Sum of Ranks
MONTH1 Weight after 1 month (Kg)    Negative Ranks      4 (a)    7.00        28.00
- START Weight at Start (Kg)        Positive Ranks      6 (b)    4.50        27.00
                                    Ties                0 (c)
                                    Total               10
MONTH2 Weight after 2 month (Kg)    Negative Ranks      5 (d)    6.00        30.00
- START Weight at Start (Kg)        Positive Ranks      5 (e)    5.00        25.00
                                    Ties                0 (f)
                                    Total               10
MONTH2 Weight after 2 month (Kg)    Negative Ranks      4 (g)    7.25        29.00
- MONTH1 Weight after 1 month (Kg)  Positive Ranks      6 (h)    4.33        26.00
                                    Ties                0 (i)
                                    Total               10

a. MONTH1 Weight after 1 month (Kg) < START Weight at Start (Kg)
b. MONTH1 Weight after 1 month (Kg) > START Weight at Start (Kg)
c. START Weight at Start (Kg) = MONTH1 Weight after 1 month (Kg)
d. MONTH2 Weight after 2 month (Kg) < START Weight at Start (Kg)
e. MONTH2 Weight after 2 month (Kg) > START Weight at Start (Kg)
f. START Weight at Start (Kg) = MONTH2 Weight after 2 month (Kg)
g. MONTH2 Weight after 2 month (Kg) < MONTH1 Weight after 1 month (Kg)
h. MONTH2 Weight after 2 month (Kg) > MONTH1 Weight after 1 month (Kg)
i. MONTH1 Weight after 1 month (Kg) = MONTH2 Weight after 2 month (Kg)

Test Statistics (b)

                          MONTH1 - START   MONTH2 - START   MONTH2 - MONTH1
Z                         -.051 (a)        -.255 (a)        -.153 (a)
Asymp. Sig. (2-tailed)     .959             .799             .878

a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test

All comparisons are n.s., as expected from the overall n.s. effect. So, actually, we do not have to calculate any further...
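The rank sums and z-values in the Wilcoxon output can be recomputed from the diet data. The sketch below uses the normal approximation without a tie correction and assumes there are no zero differences and no tied absolute differences (true for these data); the function name `wilcoxon_signed_rank` is illustrative, not an SPSS or library routine.

```python
import math

start = [63.75, 62.98, 65.98, 107.27, 66.58, 120.46, 62.01, 71.87, 83.01, 76.62]
month1 = [65.38, 66.24, 67.70, 102.72, 69.45, 119.96, 66.09, 73.62, 75.81, 67.66]
month2 = [81.34, 69.31, 77.89, 91.33, 72.87, 114.26, 68.01, 55.43, 71.63, 68.60]


def wilcoxon_signed_rank(before, after):
    """Return (negative rank sum, positive rank sum, z) for paired data.

    Assumes no zero differences and no tied absolute differences.
    """
    diffs = [a - b for b, a in zip(before, after)]
    ordered = sorted(diffs, key=abs)  # rank 1 = smallest absolute difference
    pos = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    neg = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    n = len(diffs)
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (pos - mean_t) / sd_t  # based on the positive rank sum
    return neg, pos, z


alpha = 0.05 / 3  # Bonferroni-corrected level for 3 comparisons
for label, before, after in [
    ("Month 1 - Start", start, month1),
    ("Month 2 - Start", start, month2),
    ("Month 2 - Month 1", month1, month2),
]:
    neg, pos, z = wilcoxon_signed_rank(before, after)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    print(label, neg, pos, round(z, 3), p < alpha)
# Month 1 - Start 28 27 -0.051 False
# Month 2 - Start 30 25 -0.255 False
# Month 2 - Month 1 29 26 -0.153 False
```

Each two-tailed p-value is compared against .0167 rather than .05, which is the Bonferroni correction described on this slide.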

Page 20: The Kruskal-Wallis H Test Sporiš Goran, PhD

Posthoc tests for Friedman's ANOVA- calculation by hand

We take the difference between the mean ranks of the different groups and compare it to a value based on the value of z (corrected for the number of comparisons) and a constant based on the total sample size (N = 10) and the number of conditions (k = 3):

$$|\bar{R}_u - \bar{R}_v| \geq z_{\alpha/k(k-1)} \sqrt{\frac{k(k+1)}{6N}}$$

$$k(k-1) = 3(3-1) = 6, \qquad \alpha/k(k-1) = .05/6 = .00833$$

We need the value of z for which only a proportion of .00833 of the other z values are bigger. As before, we look in Appendix A.1 under the column 'Smaller Portion'. The number corresponding to .00833 is the critical value: it is between 2.39 and 2.40.

Page 21: The Kruskal-Wallis H Test Sporiš Goran, PhD

Calculating the critical differences

$$\text{Critical difference} = z_{\alpha/k(k-1)} \sqrt{\frac{k(k+1)}{6N}} = 2.4 \sqrt{\frac{3(3+1)}{6 \times 10}} = 2.4 \sqrt{\frac{12}{60}} = 2.4 \sqrt{0.2} = 1.07$$

If a difference between mean ranks is ≥ the critical difference 1.07, then that difference is significant.

Page 22: The Kruskal-Wallis H Test Sporiš Goran, PhD

Calculating the differences between mean ranks for diet data

None of the differences is ≥ the critical difference 1.07, hence none of the comparisons is significant.

Comparison             Ru     Rv     Ru - Rv    |Ru - Rv|
Start – 1 month        1.9    2.0    -0.1       0.1
Start – 2 months       1.9    2.1    -0.2       0.2
1 month – 2 months     2.0    2.1    -0.1       0.1

(Ru and Rv are the mean ranks of the two conditions being compared.)
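The hand calculation of the critical difference can be sketched in Python, replacing the table lookup in Appendix A.1 with the inverse normal CDF from the standard library (`statistics.NormalDist`, Python 3.8+). Variable names are illustrative; the mean ranks are taken from the Friedman output.

```python
import math
from statistics import NormalDist

n, k, alpha = 10, 3, 0.05
mean_ranks = {"Start": 1.9, "1 month": 2.0, "2 months": 2.1}

# z value that leaves alpha / (k(k-1)) = .00833 in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
critical_difference = z_crit * math.sqrt(k * (k + 1) / (6 * n))

print(round(z_crit, 2), round(critical_difference, 2))  # 2.39 1.07

for u, v in [("Start", "1 month"), ("Start", "2 months"), ("1 month", "2 months")]:
    diff = abs(mean_ranks[u] - mean_ranks[v])
    # A comparison is significant only if the difference reaches the critical value.
    print(u, "vs", v, round(diff, 2), diff >= critical_difference)
```

Using the exact z-value instead of the rounded 2.4 changes the critical difference only in the third decimal place, so the conclusion is the same.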

Page 23: The Kruskal-Wallis H Test Sporiš Goran, PhD

Calculating the effect size

Again, we will only calculate the effect sizes for single comparisons:

$$r = \frac{z}{\sqrt{2n}}$$

Group comparison        z-value
Start – 1 month         -0.051
Start – 2 months        -0.255
1 month – 2 months      -0.153

With n = 10 participants, each comparison is based on 2n = 20 observations:

r (Start – 1 month) = -0.051/√20 = -.01

r (Start – 2 months) = -0.255/√20 = -.06

r (1 month – 2 months) = -0.153/√20 = -.03

All three are tiny effects.
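The three effect sizes follow directly from the z-values of the Wilcoxon tests. A minimal sketch, assuming n = 10 participants so that each comparison is based on 2n = 20 observations (variable names are illustrative):

```python
import math

n = 10  # participants; each comparison uses 2n = 20 observations
z_values = {
    "Start - 1 month": -0.051,
    "Start - 2 months": -0.255,
    "1 month - 2 months": -0.153,
}

# Effect size r = z / sqrt(2n) for each pairwise comparison.
effect_sizes = {name: z / math.sqrt(2 * n) for name, z in z_values.items()}
for name, r in effect_sizes.items():
    print(f"{name}: r = {r:.2f}")
# Start - 1 month: r = -0.01
# Start - 2 months: r = -0.06
# 1 month - 2 months: r = -0.03
```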

Page 24: The Kruskal-Wallis H Test Sporiš Goran, PhD

Reporting the results of Friedman's ANOVA (Field_2005_566)

"The weight of participants did not significantly change over the 2 months of the diet (χ²(2) = 0.20, p > .05). Wilcoxon tests were used to follow up on this finding. A Bonferroni correction was applied and so all effects are reported at a .0167 level of significance. It appeared that weight didn't significantly change from the start of the diet to 1 month, T = 27, r = -.01, from the start of the diet to 2 months, T = 25, r = -.06, or from 1 month to 2 months, T = 26, r = -.03. We can conclude that the Andikins diet (...) is a complete failure."