
Page 1: Statistical Tests for Computational Intelligence Research ...homepages.ecs.vuw.ac.nz/.../StatisticalTests.pdf · Parametric Test Non-parametric Test (normality) unpaired (independent)

Statistical Tests for Computational Intelligence Research and Human Subjective Tests

Hideyuki TAKAGI, Kyushu University, Japan

http://www.design.kyushu-u.ac.jp/~takagi/
ver. March 26, 2015; ver. July 15, 2013; ver. July 11, 2013; ver. April 23, 2013

Slides are downloadable from http://www.design.kyushu-u.ac.jp/~takagi

Contents

Parametric tests (normality):
・2 groups, unpaired (independent): unpaired t-test
・2 groups, paired (related): paired t-test
・n groups (n > 2), one-way data (unpaired): one-way ANOVA (Analysis of Variance)
・n groups (n > 2), two-way data (paired): two-way ANOVA

Non-parametric tests (no normality):
・2 groups, unpaired (independent): Mann-Whitney U-test
・2 groups, paired (related): sign test, Wilcoxon signed-ranks test
・n groups (n > 2), unpaired (independent): Kruskal-Wallis test
・n groups (n > 2), paired (related): Friedman test

+ Scheffé's method of paired comparison for Human Subjective Tests


How to Show Significance?

[Figure: fitness vs. generations for a conventional EC, proposed EC1, and proposed EC2.]
Fig. XX Average convergence curves of n trial runs.

Just compare the averages visually? That is not scientific.

How to Show Significance?

[Figure: sounds made by a conventional IEC, by proposed IEC1, and by proposed IEC2.]
Sound design concept: exciting

Which method is better at making an exciting sound? How can we show it?


You cannot show the superiority of your method without statistical tests. Papers without statistical tests may be rejected.

[Figure: statistical test → "My method is significantly better!"]

Which Test Should We Use?


Normality Test

Whether a parametric or a non-parametric test applies depends on whether the data at the n-th generation follow a normal distribution. Check this with a normality test, for example:
• Anderson-Darling test
• D'Agostino-Pearson test
• Kolmogorov-Smirnov test
• Shapiro-Wilk test
• Jarque–Bera test
• etc.
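As one illustration, the Jarque–Bera statistic from the list above can be computed with the Python standard library alone. This is a minimal sketch, not a validated implementation: it uses biased (divide-by-n) moment estimates for skewness S and kurtosis K, JB = n/6 (S² + (K − 3)²/4), and the fact that JB is asymptotically χ² with 2 degrees of freedom, whose upper tail is exp(−JB/2). Because the test is only asymptotically valid, small samples from a few trial runs are usually better checked with Shapiro-Wilk from a statistics package. The sample reuses the ten values of group A from the paired t-test example.

```python
import math
from statistics import fmean


def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) for one sample."""
    n = len(data)
    m = fmean(data)
    # Central moments with divisor n (biased estimates, as in the JB definition)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Upper tail of chi-square with 2 degrees of freedom is exp(-x / 2)
    p = math.exp(-jb / 2)
    return jb, p


group_a = [4.23, 3.21, 3.63, 4.42, 4.08, 3.98, 3.68, 4.18, 3.85, 3.71]
jb, p = jarque_bera(group_a)
print(f"JB = {jb:.3f}, p = {p:.3f}")
```

A p-value above 0.05 only means that normality is not rejected; it does not prove that the data are normal.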

Which Test Should We Use?

Paired data (related): the values in each row come from runs with the same initial data.

initial data #   conventional   proposed
1                4.23           2.51
2                3.21           3.30
3                3.63           3.75
4                4.42           3.22
5                4.08           3.99
6                3.98           3.65

Unpaired data (independent): the rows do not correspond to each other.

group A   group B
4.23      2.51
3.21      3.3
3.63      3.75
4.42      3.22
4.08      3.99
3.98      3.65



Q1: Which tests are more sensitive, those for unpaired data or those for paired data?

A1: Tests for paired data, because they use more information (the correspondence between the paired values).


Q2: How should you design your experimental conditions so that you can use statistical tests for paired data and reduce the number of trial runs?

A2: Use the same initial data for each pair (method A, method B) in each trial run.

[Figure: convergence curves of methods A and B; is the difference at the n-th generation significant?]


Q3: Which statistical tests are more sensitive, parametric or non-parametric ones, and why?

A3: Parametric tests, because they can use information about the assumed data distribution.

t-Test

How to Show Significance?

[Figure: is the difference between methods A and B at the n-th generation significant?]

A    B
12   10
14   9
14   7
11   15
16   11
19   10

Is the difference significant? Test this difference by assuming that there is no difference (null hypothesis).

Conditions for using t-tests: (1) normality; (2) equal variances (not essential, though).

Condition (1) can be checked with a normality test (Anderson-Darling test, D'Agostino-Pearson test, Kolmogorov-Smirnov test, Shapiro-Wilk test, Jarque–Bera test, etc.), and condition (2) with an F-test: when p > 0.05, we assume that there is no significant difference between σA² and σB².

Excel (32-bit version only?) has t-tests and ANOVA in its Data Analysis tools. You must enable the add-in first (File -> Options -> Add-ins, then enable the Analysis ToolPak).


t-Test

(1) t-Test: Paired Two Sample for Means

(2) t-Test: Two-Sample Assuming Equal Variances

(3) t-Test: Two-Sample Assuming Unequal Variances (Welch's t-test)
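For option (3), Welch's t statistic and its approximate degrees of freedom (Welch-Satterthwaite) can be computed directly. This standard-library sketch stops at t and df; the p-value would come from a t-distribution table or a statistics package. The A/B sample data from the following example are treated as unpaired here purely for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


a = [4.23, 3.21, 3.63, 4.42, 4.08, 3.98, 3.68, 4.18, 3.85, 3.71]
b = [2.51, 3.31, 3.75, 3.22, 3.99, 3.65, 3.35, 3.93, 3.91, 3.82]
t, df = welch_t(a, b)
print(f"t = {t:.4f}, df = {df:.2f}")
```

With equal group sizes, Welch's t equals the pooled-variance t of option (2); only the degrees of freedom differ.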

Example: is the difference between methods A and B at the n-th generation significant? Here each pair of runs of the two methods starts from the same initial condition, so the paired test (1) is used.

Sample data:
A      B
4.23   2.51
3.21   3.31
3.63   3.75
4.42   3.22
4.08   3.99
3.98   3.65
3.68   3.35
4.18   3.93
3.85   3.91
3.71   3.82

t-Test: Paired Two Sample for Means
                               Variable 1    Variable 2
Mean                           3.897         3.544
Variance                       0.125823333   0.208693333
Observations                   10            10
Pearson Correlation            -0.161190073
Hypothesized Mean Difference   0
df                             9
t Stat                         1.794964241
P(T<=t) one-tail               0.053116886
t Critical one-tail            1.833112933
P(T<=t) two-tail               0.106233772
t Critical two-tail            2.262157163
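The t Stat in the Excel output above can be reproduced from the per-pair differences d = A − B as t = mean(d) / (sd(d) / √n) with df = n − 1. This is a standard-library sketch; the two-tailed critical value is copied from the Excel output rather than computed.

```python
import math
from statistics import mean, stdev

a = [4.23, 3.21, 3.63, 4.42, 4.08, 3.98, 3.68, 4.18, 3.85, 3.71]
b = [2.51, 3.31, 3.75, 3.22, 3.99, 3.65, 3.35, 3.93, 3.91, 3.82]

# Paired data: a[i] and b[i] come from runs with the same initial condition
d = [x - y for x, y in zip(a, b)]
n = len(d)
t_stat = mean(d) / (stdev(d) / math.sqrt(n))
df = n - 1

T_CRIT_TWO_TAIL = 2.262157163  # copied from the Excel output (df = 9, p = 0.05)
print(f"t = {t_stat:.6f}, df = {df}")
print("significant (p < 0.05)" if abs(t_stat) > T_CRIT_TWO_TAIL
      else "not significant at p < 0.05")
```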


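[Figure: t-distribution with a 2.5% rejection region in each tail (two-tailed test) or a single 5% region (one-tailed test).]

When the p-value is less than 0.01 or 0.05, we conclude that there is a significant difference at the significance level of (p < 0.01) or (p < 0.05), respectively. In the paired example above, the two-tailed p-value is 0.106 (> 0.05) and t Stat = 1.795 < t Critical two-tail = 2.262, so we cannot say that A and B are significantly different.

The possible outcomes are A > B, A < B, and A ≈ B. When one direction (e.g. A > B) can never happen, you may use a one-tailed test.

Example results shown on the slide:
(1) t-Test: Paired Two Sample for Means: the difference between the two groups is significant (p < 0.01).
(2) t-Test: Two-Sample Assuming Equal Variances: we cannot say that there is a significant difference between the two groups.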

ANOVA: Analysis of Variance

[Figure: are the differences among the groups at the n-th generation significant?]

A      B      C
11.0   12.8   9.4
9.3    11.3   12.4
11.5   9.5    16.8
16.4   14.0   14.3
16.0   15.2   17.0
15.0   13.0   14.6
12.8   12.4   17.0
13.6   15.0   14.3
13.0   12.4   15.6
12.0   17.8   15.0
13.4   12.6   18.6
10.0   13.4   12.4
10.8   16.8   15.4

1. Analysis of more than two data groups.
2. Normality and equal variances are required.

Excel has ANOVA in its Data Analysis tools.


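Comparing three groups with three t-tests (C vs. A, A vs. B, B vs. C) at (p < 0.05) each, instead of one ANOVA, raises the risk of at least one false significance to 1 − (1 − 0.05)^3 ≈ 0.14. That is, three t-tests at (p < 0.05) are equivalent to one ANOVA at (p < 0.14).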
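The arithmetic above in Python, assuming (as the formula does) that the three tests are independent. They are not strictly independent, because they share groups, so this is an approximation.

```python
alpha = 0.05  # significance level of each individual t-test
k = 3         # number of pairwise t-tests among A, B, and C

# Probability of at least one false significance among k independent tests
familywise = 1 - (1 - alpha) ** k
print(f"{familywise:.4f}")
```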


Check the equal-variance condition (condition 2 above) using the Bartlett test.
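Bartlett's statistic can be computed with the standard library. This sketch follows the usual textbook formula and returns a p-value only for three groups, where the χ² reference distribution has 2 degrees of freedom and its upper tail is exactly exp(−χ²/2); for other numbers of groups, compare the statistic with a χ² table (df = k − 1). The data are the A, B, C columns of the table above.

```python
import math
from statistics import variance


def bartlett(groups):
    """Bartlett's test for equal variances; returns (statistic, df, p).

    p is computed only for k = 3 groups (df = 2), where the chi-square
    upper tail has the closed form exp(-stat / 2); otherwise p is None.
    """
    k = len(groups)
    ns = [len(g) for g in groups]
    big_n = sum(ns)
    vs = [variance(g) for g in groups]  # unbiased sample variances
    pooled = sum((n - 1) * v for n, v in zip(ns, vs)) / (big_n - k)
    numerator = (big_n - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(ns, vs))
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (big_n - k)) / (3 * (k - 1))
    stat = numerator / correction
    df = k - 1
    p = math.exp(-stat / 2) if df == 2 else None
    return stat, df, p


a = [11.0, 9.3, 11.5, 16.4, 16.0, 15.0, 12.8, 13.6, 13.0, 12.0, 13.4, 10.0, 10.8]
b = [12.8, 11.3, 9.5, 14.0, 15.2, 13.0, 12.4, 15.0, 12.4, 17.8, 12.6, 13.4, 16.8]
c = [9.4, 12.4, 16.8, 14.3, 17.0, 14.6, 17.0, 14.3, 15.6, 15.0, 18.6, 12.4, 15.4]
stat, df, p = bartlett([a, b, c])
print(f"Bartlett chi2 = {stat:.3f}, df = {df}, p = {p:.3f}")
```

If p > 0.05, equal variances are not rejected, and the ANOVA condition is considered satisfied.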

ANOVA: Analysis of Variance

[Figure: data of several methods at the n-th generation.]

When the data are independent, use one-way ANOVA (single-factor ANOVA).

When the data correspond to each other, use two-way ANOVA (two-factor ANOVA).

Q1: What are a "single factor" and "two factors"?

A1: A column factor (e.g. the three groups) and a sample factor (e.g. the initial condition).

[Figure: one-way data have only a column factor; two-way data have a column factor (columns) and a sample factor (rows).]

One-factor (one-way) ANOVA:
group A   group B   group C
4.23      2.51      3.04
3.21      3.3       2.89
3.63      3.75      3.55
4.42      3.22      4.39
4.08      3.99      3.86
3.98      3.65      3.5
3.75      2.62      3.6
3.22      2.93      3.21

Result: we cannot say that the three groups are significantly different (p = 0.089).

Two-factor (two-way) ANOVA:
initial condition   group A   group B   group C
#1                  4.23      2.51      3.04
#2                  3.21      3.3       2.89
#3                  3.63      3.75      3.55
#4                  4.42      3.22      4.39
#5                  4.08      3.99      3.86
#6                  3.98      3.65      3.5
#7                  3.75      2.62      3.6
#8                  3.22      2.93      3.21

Result: there is a significant difference somewhere among the three groups (p < 0.05).
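The contrast between these two results can be reproduced with a short standard-library sketch (not Excel's implementation). The same 8 × 3 data give F(2, 21) ≈ 2.72 for one-way ANOVA but F(2, 14) ≈ 4.02 for two-way ANOVA without replication, because the second analysis removes the variation among initial conditions (rows) from the error term. For reference, the 5% critical values are approximately 3.47 for F(2, 21) and 3.74 for F(2, 14); computing p-values would need an F-distribution routine from a statistics package, which is omitted here.

```python
from statistics import fmean

# Rows = initial conditions #1..#8, columns = groups A, B, C
data = [
    [4.23, 2.51, 3.04],
    [3.21, 3.30, 2.89],
    [3.63, 3.75, 3.55],
    [4.42, 3.22, 4.39],
    [4.08, 3.99, 3.86],
    [3.98, 3.65, 3.50],
    [3.75, 2.62, 3.60],
    [3.22, 2.93, 3.21],
]
r, c = len(data), len(data[0])
grand = fmean(x for row in data for x in row)
col_means = [fmean(row[j] for row in data) for j in range(c)]
row_means = [fmean(row) for row in data]

ss_total = sum((x - grand) ** 2 for row in data for x in row)
ss_cols = r * sum((m - grand) ** 2 for m in col_means)  # between groups
ss_rows = c * sum((m - grand) ** 2 for m in row_means)  # between initial conditions

# One-way ANOVA: everything except the column effect is within-group error
ss_within = ss_total - ss_cols
f_one_way = (ss_cols / (c - 1)) / (ss_within / (r * c - c))

# Two-way ANOVA without replication: the row (sample) effect is also removed
ss_error = ss_total - ss_cols - ss_rows
f_two_way = (ss_cols / (c - 1)) / (ss_error / ((r - 1) * (c - 1)))

print(f"one-way: F({c - 1}, {r * c - c}) = {f_one_way:.3f}")
print(f"two-way: F({c - 1}, {(r - 1) * (c - 1)}) = {f_two_way:.3f}")
```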

ANOVA: Analysis of Variance

Output of the one-way ANOVA:
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        6.11342    2    3.05671    15.30677   3.6E-05    3.354131
Within Groups         5.39181    27   0.199697
Total                 11.50523   29

Output of the two-way ANOVA:
Source of Variation   SS         df   MS         F          P-value    F crit
Sample                0.755233   2    0.377617   2.755097   0.103596   3.885294
Columns               3.582272   1    3.582272   26.13631   0.000256   4.747225
Interaction           0.139411   2    0.069706   0.508573   0.613752   3.885294
Within                1.644733   12   0.137061
Total                 6.12165    17

When the p-value is < 0.01 or 0.05, there is a significant difference somewhere among the data groups. For the two-way output above:

• A significant difference among Samples (e.g. initial conditions) cannot be found (p > 0.05).

• A significant difference can be found somewhere among Columns (e.g. the methods) (p < 0.01).

• We need not care about an interaction effect between the two factors (e.g. initial condition vs. methods) (p > 0.05).


Q1: Where is the significant difference among A, B, and C?

A1: Apply multiple comparisons between all pairs of columns (Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc.). Each has different characteristics.

Non-Parametric Tests

If normality and equal variances are not guaranteed, use non-parametric tests.

Mann-Whitney U-test (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)

[Figure: data of two methods at the n-th generation, without normality.]

1. Comparison of two groups.
2. The data have no normality.
3. There is no correspondence between the data of the two groups (independent).

Mann-Whitney U-test (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)

1. Calculate a U value. For each datum of one group, count the data of the other group that are smaller than it and sum these counts; when two values are the same, count 0.5. Counting in the opposite direction gives U'. In the example figure, U = 0 + 2 + 3 + 4 = 9 and U' = 11 (U + U' = n1 n2).

2. See a U-test table.

Mann-Whitney U-test (cont.) (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)

• Use the smaller value of U and U'.
• When n1 ≤ 20 and n2 ≤ 20, see a Mann-Whitney test table (where n1 and n2 are the numbers of data in the two groups).
• Otherwise, since U roughly follows the normal distribution below, normalize U as z and check a standard normal distribution table with z:

$$U \sim N(\mu_U, \sigma_U^2), \quad \mu_U = \frac{n_1 n_2}{2}, \quad \sigma_U^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$

$$z = \frac{U - \mu_U}{\sigma_U} = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$

Use an Excel function to calculate the one-tailed p-value for the z-value: p-value = 1 - NORM.S.DIST(|z|, TRUE). Double it for a two-tailed test.
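A minimal standard-library sketch of step 1 and of the normal approximation above. The data and the U value passed to the approximation are hypothetical, and no tie or continuity correction is applied to σ_U.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U, U'): U counts pairs with x_i > y_j, a tie counts 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1
            elif xi == yj:
                u += 0.5
    return u, len(x) * len(y) - u  # U + U' = n1 * n2


def normal_approx_p(u, n1, n2):
    """z and two-tailed p-value from the normal approximation of U."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


x = [1.0, 5.0, 7.0]  # hypothetical group 1
y = [2.0, 3.0, 6.0, 8.0]  # hypothetical group 2
u, u_prime = mann_whitney_u(x, y)
print(u, u_prime)

# Large-sample case: hypothetical U = 250 with n1 = 25, n2 = 30
z, p = normal_approx_p(250, 25, 30)
print(f"z = {z:.3f}, p = {p:.4f}")
```

`NormalDist().cdf` plays the role of Excel's NORM.S.DIST(z, TRUE).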

Examples: Mann-Whitney U-test (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)

[Figure: three example pairs of data groups with the per-datum counts.]
Ex.1: U = 9, U' = 11
Ex.2: U = 23.5, U' = 1.5
Ex.3: U = 12, U' = 13

Part of a Mann-Whitney test table (rows n1, columns n2):

(p < 0.05)
n1 \ n2   4   5   6   ...
4         0   1   2   ...
5             2   3   ...

(p < 0.01)
n1 \ n2   4   5   6   ...
4         -   -   0   ...
5             1   1   ...

Exercise: Mann-Whitney U-test (Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)

[Figure: two data groups with the per-datum counts.]
U = 29.5, U' = 6.5

(p < 0.05)
n1 \ n2   4   5   6   7
3         -   0   1   1
4         0   1   2   3
5             2   3   5
6                 5   6

(p < 0.01)
n1 \ n2   4   5   6   7
3         -   -   -   -
4         -   -   0   0
5             1   1   1
6                 2   3

Since U' (= 6.5) > 5, (p > 0.05): significance is not found.

Sign Test

(1) Sign test: a significance test on the numbers of wins and losses.
(2) Wilcoxon's signed-ranks test: a significance test using both the numbers of wins and losses and the sizes of the wins/losses.

data of 2 groups   win/loss   size of win/loss
173   174          -          -1
143   137          +          +6
158   151          +          +7
156   143          +          +13
176   180          -          -4
165   162          +          +3

Sign Test

1. Count the numbers of wins and losses at the n-th generation by comparing runs that start from the same initial data.

2. Check a sign test table to show the significance of the difference between the two methods.

[Figure: sign test results at each generation (0 to 50) on functions F1 to F8, comparing DE_N with DE_LR, DE_LS, DE_FR_GLB_nD, DE_FR_LOC_nD, DE_FR_GLB_1D, and DE_FR_LOC_1D.]

Fig.3 in Y. Pei and H. Takagi, "Fourier analysis of the fitness landscape for evolutionary search acceleration," IEEE Congress on Evolutionary Computation (CEC), pp.1-7, Brisbane, Australia (June 10-15, 2012).

The (+, -) marks show whether our proposed methods converge significantly better or worse than normal DE, respectively (p ≤ 0.05).

Fig.2 in the same paper.

Sign Test

Task Example: Are the performances of pattern recognition methods A and B significantly different?

n1 cases: Both methods succeeded.
n2 cases: Method A succeeded, and method B failed.
n3 cases: Method A failed, and method B succeeded.
n4 cases: Both methods failed.

How to check?
1. Set N = n2 + n3.
2. Check the sign test table with the N.
3. If min(n2, n3) is equal to or smaller than the number for the N, we can say that there is a significant difference at the corresponding level of significance.

[Table: sign test table giving, for each N, the critical value of min(n2, n3) at each level of significance (%).]

Exercise: Is there a significant difference for n2 = 12 and n3 = 28?

ANSWER: Check the table with N = 40. As n2 (= 12) is bigger than 11 (the 1% value) and smaller than 13 (the 5% value), we can say that there is a significant difference between the two with (p < 0.05), but we cannot say so with (p < 0.01).

Let's think about the case of N = 17. To say that n2 and n3 are significantly different:

(n2 vs. n3) = (17 vs. 0), (16 vs. 1), or (15 vs. 2) (p < 0.01)

or

(n2 vs. n3) = (14 vs. 3) or (13 vs. 4) (p < 0.05)
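Sign test tables come from the binomial distribution with probability 0.5 under the null hypothesis of no difference. This standard-library sketch computes the exact two-tailed p-value instead of looking up a table; it reproduces the N = 17 boundaries and the N = 40 answer above. It assumes that cases where both methods succeed or both fail (n1, n4) have already been excluded.

```python
from math import comb


def sign_test_p(wins, losses):
    """Exact two-tailed sign test p-value (ties already removed)."""
    n = wins + losses
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-tailed test
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


for w, l in [(13, 4), (12, 5), (15, 2), (14, 3), (12, 28)]:
    print(f"{w} vs. {l}: p = {sign_test_p(w, l):.4f}")
```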

Exercise: Sign Test

Check the significance of:
16 vs. 4
14 vs. 1
9 vs. 3
18 vs. 5

[Table: sign test table with levels of significance (%).]

Wilcoxon Signed-Ranks Test

[Figure: data of two methods at the n-th generation.]

Q: When a sign test could not show significance, what should we do?

A: Try the Wilcoxon signed-ranks test. It is more sensitive than the simple sign test because it uses more information: the sizes of the wins/losses as well as their numbers.

Wilcoxon Signed-Ranks Test

(step 1) Calculate the difference d = v (system A) − v (system B).
(step 2) Rank |d|.
(step 3) Add the sign of d to the ranks.
(step 4) Pick the ranks of the sign that appears fewer times.
(step 5) T = sum of the ranks picked in step 4.
(step 6) Check the Wilcoxon test table with n and T.

v (system A)  v (system B)  d (step 1)  rank of |d| (step 2)  signed rank (step 3)  rarer-sign rank (step 4)
182           163           19          7                     7
169           142           27          8                     8
172           173           -1          1                     -1                    1
143           137           6           4                     4
158           151           7           5                     5
156           143           13          6                     6
176           172           4           3                     3
165           168           -3          2                     -2                    2

Example: n = 8, T = 1 + 2 = 3

T = 3 ≤ 3 (n = 8, p < 0.05): the difference between systems A and B is significant.
T = 3 > 0 (n = 8, p < 0.01): we cannot say that there is a significant difference.

Wilcoxon test table: significance points of T
n    two-tail p < 0.05 (one-tail p < 0.025)   two-tail p < 0.01 (one-tail p < 0.005)
6    0                                        -
7    2                                        -
8    3                                        0
9    5                                        1
10   8                                        3
11   10                                       5
12   13                                       7
13   17                                       9
14   21                                       12
15   25                                       15
16   29                                       19
17   34                                       23
18   40                                       27
19   46                                       32
20   52                                       37
21   58                                       42
22   65                                       48
23   73                                       54
24   81                                       61
25   89                                       68

When n > 25, as T roughly follows the normal distribution below, normalize T as z and check a standard normal distribution table with z:

$$T \sim N(\mu_T, \sigma_T^2), \quad \mu_T = \frac{n(n+1)}{4}, \quad \sigma_T^2 = \frac{n(n+1)(2n+1)}{24}$$

$$z = \frac{T - \mu_T}{\sigma_T}$$
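Steps 1 to 5, including the two tips that follow (ignore d = 0, average tied ranks), can be sketched in standard-library Python. One assumption: the slides define T as the rank sum of the sign that appears fewer times, whereas this sketch uses the common definition T = min(W+, W−); the two give the same T in every example here. `wilcoxon_z` implements the large-sample approximation above without a tie correction.

```python
import math


def wilcoxon_T(a, b):
    """Return (n, T) for the Wilcoxon signed-ranks test."""
    d = [x - y for x, y in zip(a, b) if x != y]  # ignore pairs with d = 0
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend over a run of equal |d| (exact comparison suits integer data)
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return n, min(w_plus, w_minus)


def wilcoxon_z(n, t):
    """Normal approximation for n > 25 (no tie correction)."""
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mu) / sigma


a = [182, 169, 172, 143, 158, 156, 176, 165]
b = [163, 142, 173, 137, 151, 143, 172, 168]
print(wilcoxon_T(a, b))
```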

Wilcoxon Signed-Ranks Test

v (system A)  v (system B)  d (step 1)  rank of |d| (step 2)  signed rank (step 3)  rarer-sign rank (step 4)
176           163           13          7 → 6.5               6.5
142           142           0           (ignored)
172           173           -1          1                     -1                    1
143           137           6           4                     4
158           151           7           5                     5
156           143           13          6 → 6.5               6.5
176           172           4           3                     3
165           168           -3          2                     -2                    2

Tips:
1. When d = 0, ignore the data. (Tip #1)
2. When there are the same values of |d|, give them the average of their ranks. (Tip #2) For example, four tied values that would occupy ranks 5, 6, 7, and 8 each get 6.5 = (5+6+7+8)/4; in the table above, the two |d| = 13 values occupy ranks 6 and 7 and each get 6.5.

Exercise 1: Wilcoxon Signed-Ranks Test

Apply steps 1 to 6 to the following data:

v (system A)  v (system B)
182           163
169           142
173           172
143           137
158           151
156           143
176           172
165           168

n = ?, T = ?

Exercise 1: Wilcoxon Signed-Ranks Test (answer)

n = 8

v (system A)  v (system B)  d (step 1)  rank of |d| (step 2)  signed rank (step 3)  rarer-sign rank (step 4)
182           163           19          7                     7
169           142           27          8                     8
173           172           1           1                     1
143           137           6           4                     4
158           151           7           5                     5
156           143           13          6                     6
176           172           4           3                     3
165           168           -3          2                     -2                    2

T = 2

As T (= 2) < 3, there is a significant difference between A and B (p < 0.05). But, as 0 < T (= 2), we cannot say so at the significance level of (p < 0.01).

Exercise 2: Wilcoxon Signed-Ranks Test

Apply steps 1 to 6 to the following data:

v (system A)  v (system B)
27            31
20            25
34            33
25            27
31            31
23            29
26            27
24            30
35            34

n = ?, T = ?


Exercise 2: Wilcoxon Signed-Ranks Test (answer)

v (system A)  v (system B)  d (step 1)  rank of |d| (step 2)  signed rank (step 3)  rank of fewer # of signs (step 4)
27            31            -4          5                     -5
20            25            -5          6                     -6
34            33             1          2                     2                     2
25            27            -2          4                     -4
31            31             0          (ignored)
23            29            -6          7.5                   -7.5
26            27            -1          2                     -2
24            30            -6          7.5                   -7.5
35            34             1          2                     2                     2

n = 8 (the pair with d = 0 is not counted)
(step 5) T = 2 + 2 = 4
(step 6) As T (= 4) > 3, the significance point for n = 8 (p<0.05), we cannot say that there is a significant difference between A and B.

Exercise 3: Wilcoxon Signed-Ranks Test

[Figure: two groups compared at the n-th generation]

Explain how to apply this test to check whether the two groups are significantly different at the n-th generation.


Kruskal-Wallis Test: a non-parametric test (no normality) for n groups (n > 2) of unpaired (independent) data.

Kruskal-Wallis Test

[Figure: data of several groups at the n-th generation, without normality]

Use this test when:
1. more than two groups are compared,
2. the data have no normality, and
3. there are no corresponding data among the groups (independent).


Kruskal-Wallis Test: How to Test

Let's use the ranks of the data.

[Figure: all 17 data of the three groups ranked from 1 to 17]

N: total # of data
k: # of groups
n_i: # of data in group i
R_i: sum of the ranks of group i

1. Rank all data.
2. Calculate N, k, n_i and R_i. In the example, R1 = 38, R2 = 69, R3 = 46.
3. Calculate the statistic H:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

4. If k = 3 and N ≤ 17, compare H with a significance point in a Kruskal-Wallis test table. Otherwise, assume that H follows the χ² distribution and test H using a χ² distribution table with (k-1) degrees of freedom.
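Steps 1 to 3 and the χ² fallback of step 4 can be sketched as below. This is an illustrative sketch with hypothetical function names: it applies no tie correction to H, and the χ² tail is only given for k = 3 (2 degrees of freedom), where it has the closed form exp(-H/2); for N ≤ 17 the table should be used instead, as the slide says.

```python
import math

def average_ranks(values):
    """Ranks in ascending order (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic for k unpaired groups of raw data (no tie correction)."""
    data = [x for g in groups for x in g]
    ranks = average_ranks(data)            # step 1: rank all N data together
    N = len(data)
    total, start = 0.0, 0
    for g in groups:                        # step 2: rank sum R_i of each group
        R = sum(ranks[start:start + len(g)])
        total += R * R / len(g)
        start += len(g)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)   # step 3

def h_from_rank_sums(R, n):
    """H directly from the rank sums R_i and group sizes n_i."""
    N = sum(n)
    return 12 / (N * (N + 1)) * sum(r * r / m for r, m in zip(R, n)) - 3 * (N + 1)

def chi2_sf_df2(h):
    """Upper-tail probability of chi-square with 2 degrees of freedom (k = 3)."""
    return math.exp(-h / 2)

print(round(h_from_rank_sums([38, 69, 46], [6, 5, 6]), 3))                             # 6.609
print(round(kruskal_wallis_h([[1, 2, 4, 7, 10], [8, 11, 12, 13], [3, 5, 6, 9]]), 3))  # 6.227
```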


Example: Kruskal-Wallis Test

Kruskal-Wallis Test Table (for k = 3 and N ≤ 17): significance points of H

n1 n2 n3  p<0.05  p<0.01  |  n1 n2 n3  p<0.05  p<0.01
 2  2  2    -       -     |   3  3  3   5.606   7.200
 2  2  3  4.714     -     |   3  3  4   5.791   6.746
 2  2  4  5.333     -     |   3  3  5   5.649   7.079
 2  2  5  5.160   6.533   |   3  3  6   5.615   7.410
 2  2  6  5.346   6.655   |   3  3  7   5.620   7.228
 2  2  7  5.143   7.000   |   3  3  8   5.617   7.350
 2  2  8  5.356   6.664   |   3  3  9   5.589   7.422
 2  2  9  5.260   6.897   |   3  3 10   5.588   7.372
 2  2 10  5.120   6.537   |   3  3 11   5.583   7.418
 2  2 11  5.164   6.766   |   3  4  4   5.599   7.144
 2  2 12  5.173   6.761   |   3  4  5   5.656   7.445
 2  2 13  5.199   6.792   |   3  4  6   5.610   7.500
 2  3  3  5.361     -     |   3  4  7   5.623   7.550
 2  3  4  5.444   6.444   |   3  4  8   5.623   7.585
 2  3  5  5.251   6.909   |   3  4  9   5.652   7.614
 2  3  6  5.349   6.970   |   3  4 10   5.661   7.617
 2  3  7  5.357   6.839   |   3  5  5   5.706   7.578
 2  3  8  5.316   7.022   |   3  5  6   5.602   7.591
 2  3  9  5.340   7.006   |   3  5  7   5.607   7.697
 2  3 10  5.362   7.042   |   3  5  8   5.614   7.706
 2  3 11  5.374   7.094   |   3  5  9   5.670   7.733
 2  3 12  5.350   7.134   |   3  6  6   5.625   7.725
 2  4  4  5.455   7.036   |   3  6  7   5.689   7.756
 2  4  5  5.273   7.205   |   3  6  8   5.678   7.796
 2  4  6  5.340   7.340   |   3  7  7   5.688   7.810
 2  4  7  5.376   7.321   |   4  4  4   5.692   7.654
 2  4  8  5.393   7.350   |   4  4  5   5.657   7.760
 2  4  9  5.400   7.364   |   4  4  6   5.681   7.795
 2  4 10  5.345   7.357   |   4  4  7   5.650   7.814
 2  4 11  5.365   7.396   |   4  4  8   5.779   7.853
 2  5  5  5.339   7.339   |   4  4  9   5.704   7.910
 2  5  6  5.339   7.376   |   4  5  5   5.666   7.823
 2  5  7  5.393   7.450   |   4  5  6   5.661   7.936
 2  5  8  5.415   7.440   |   4  5  7   5.733   7.931
 2  5  9  5.396   7.447   |   4  5  8   5.718   7.992
 2  5 10  5.420   7.514   |   4  6  6   5.724   8.000
 2  6  6  5.410   7.467   |   4  6  7   5.706   8.039
 2  6  7  5.357   7.491   |   5  5  5   5.780   8.000
 2  6  8  5.404   7.522   |   5  5  6   5.729   8.028
 2  6  9  5.392   7.566   |   5  5  7   5.708   8.108
 2  7  7  5.398   7.491   |   5  6  6   5.765   8.124
 2  7  8  5.403   7.571   |

N = n1 + n2 + n3 = 17 data, k = 3 groups, (n1, n2, n3) = (6, 5, 6), (R1, R2, R3) = (38, 69, 46)

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) = \frac{12}{17(17+1)}\left(\frac{38^2}{6} + \frac{69^2}{5} + \frac{46^2}{6}\right) - 3(17+1) = 6.609$$

The table lists the group sizes in ascending order, so (n1, n2, n3) = (6, 5, 6) is looked up as (5, 6, 6). Since the significance points of (p<0.05) and (p<0.01) are 5.765 and 8.124, respectively, and 5.765 < 6.609 < 8.124, there are significant difference(s) somewhere among the three groups (p<0.05).


Q1: Where are the significant differences among A, B, and C?

A1: Apply multiple comparisons between all pairs of columns (groups).

(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)


Exercise: Kruskal-Wallis Test

Ranks of the data:
group 1: 1, 2, 4, 7, 10
group 2: 8, 11, 12, 13
group 3: 3, 5, 6, 9

N = n1 + n2 + n3 =    k =    (n1, n2, n3) =    (R1, R2, R3) =

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) =$$

Answer: N = 13 samples, k = 3 groups, (n1, n2, n3) = (5, 4, 4), (R1, R2, R3) = (24, 44, 23), and H = 6.227.

The significance points for the group sizes (4, 4, 5) are 5.657 (p<0.05) and 7.760 (p<0.01). Since 5.657 < 6.227 < 7.760, there is/are significant difference(s) somewhere among the three groups (p<0.05).

Friedman Test: a non-parametric test (no normality) for n groups (n > 2) of paired (related) data.


Friedman Test

When (1) more than two groups are compared, (2) the data have correspondence (not independent), but (3) the conditions of two-way ANOVA are not satisfied, let's use the ranks of the data and the Friedman test.

(ex.) Comparison of recognition rates of four methods (a, b, c, d) on four benchmark tasks (A, B, C, D):

benchmark task  a     b     c     d
A               0.92  0.75  0.65  0.81
B               0.48  0.45  0.41  0.52
C               0.56  0.41  0.47  0.50
D               0.61  0.50  0.56  0.54

Step 1: Make a ranking table (rank the methods within each benchmark task).
Step 2: Sum the ranks of the factor that you want to test.
Step 3: Calculate the Friedman test value, χ²_r:

$$\chi_r^2 = \frac{12}{nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3n(k+1)$$

where k and n are the # of levels of factors 1 and 2, respectively; here k = 4 (# of methods) and n = 4 (# of data per method, i.e. benchmark tasks).

Step 4: If k = 3 or 4, compare χ²_r with a significance point in a Friedman test table. Otherwise, use a χ² table with (k-1) degrees of freedom.

Ranking among methods:
benchmark task  a   b   c   d
A               4   2   1   3
B               3   2   1   4
C               4   1   2   3
D               4   1   3   2
Σ              15   6   7  12


Example: Friedman Test

Step 1: Make a ranking table.
Step 2: Sum the ranks of the factor that you want to test (k = 4 methods, n = 4 data per method).

benchmark task  a   b   c   d
A               4   2   1   3
B               3   2   1   4
C               4   1   2   3
D               4   1   3   2
Σ              15   6   7  12

Step 3: Calculate the Friedman test value, χ²_r:

$$\chi_r^2 = \frac{12}{nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3n(k+1) = \frac{12}{4 \cdot 4 \cdot 5}\left(15^2 + 6^2 + 7^2 + 12^2\right) - 3 \cdot 4 \cdot 5 = 8.1$$

Friedman test table (significance points of χ²_r):

k  n  p<0.05  p<0.01
3  3   6.00     -
3  4   6.50    8.00
3  5   6.40    8.40
3  6   7.00    9.00
3  7   7.14    8.86
3  8   6.25    9.00
3  9   6.22    9.56
3  ∞   5.99    9.21
4  3   7.40    9.00
4  4   7.80    9.60
4  5   7.80    9.96
4  ∞   7.81   11.34

Step 4: The significance points for (k, n) = (4, 4) are 7.80 (p<0.05) and 9.60 (p<0.01). Since 7.80 < 8.1 < 9.60, there is/are significant difference(s) somewhere among the four methods a, b, c, and d (p<0.05).
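Steps 1 to 3 can be reproduced with a short sketch. The function names are hypothetical; ranks are assigned in ascending order within each benchmark task (the highest recognition rate gets rank k), as in the ranking table, with average ranks for ties.

```python
def row_ranks(row):
    """Ranks within one row (1 = smallest); ties get the average rank."""
    ranks = []
    for x in row:
        less = sum(1 for y in row if y < x)
        equal = sum(1 for y in row if y == x)
        ranks.append(less + (equal + 1) / 2)
    return ranks

def friedman_chi2(table):
    """table[j][i]: value of method i on benchmark task j. Returns (R, chi2_r)."""
    n = len(table)        # number of benchmark tasks (data per method)
    k = len(table[0])     # number of methods being compared
    R = [0.0] * k
    for row in table:                          # step 1: rank within each task
        for i, r in enumerate(row_ranks(row)):
            R[i] += r                          # step 2: rank sum of each method
    chi2 = 12 / (n * k * (k + 1)) * sum(r * r for r in R) - 3 * n * (k + 1)  # step 3
    return R, chi2

rates = [[0.92, 0.75, 0.65, 0.81],
         [0.48, 0.45, 0.41, 0.52],
         [0.56, 0.41, 0.47, 0.50],
         [0.61, 0.50, 0.56, 0.54]]
R, chi2 = friedman_chi2(rates)
print(R, round(chi2, 2))  # [15.0, 6.0, 7.0, 12.0] 8.1
```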


Q1: Where are the significant differences among a, b, c, and d?

A1: Apply multiple comparisons between all pairs of columns (methods).

(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)


Multiple Comparisons

When there is a significant difference among groups, multiple comparisons are used to find which groups are significantly different from the others.

Example: comparing four groups requires 4C2 = 6 pair comparisons. If each is tested at (p < 0.05), the probability of at least one false significance (assuming independent comparisons) becomes

1 - (1 - 0.05)^6 = 26.5%!

The solution is to apply the multiple pair comparisons with a stricter significance level.
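The 26.5% figure can be checked with a one-line formula. This sketch assumes independent comparisons, as the formula 1 - (1 - α)^m does; the function name is hypothetical.

```python
from math import comb

def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one false significance among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

m = comb(4, 2)  # 6 pair comparisons among 4 groups
print(m, round(familywise_error(0.05, m), 3))  # 6 0.265
```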


Multiple Comparisons

-- Bonferroni method --

When pair comparisons are applied m times, use a significance level of p / m for each comparison.

Example: 4C2 = 6 pair comparisons, each tested with (p < 0.05/6 ≈ 0.0083).

Features: (1) Simple. (2) Rather strict, i.e. it is rather hard to show significance.

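Multiple Comparisons

-- Holm method --

A corrected Bonferroni method that detects significances more easily. Sort the p-values of the m pair comparisons in ascending order and multiply the i-th smallest p-value by (m - i + 1).

Example pair comparisons (m = 6):

comparison  p-value  corrected p-value eqn.  corrected p-value
1           0.0076   = p-value * 6           0.0456
2           0.0095   = p-value * 5           0.0475
3           0.0280   = p-value * 4           0.1120
4           0.0320   = p-value * 3           0.0960
5           0.0380   = p-value * 2           0.0760
6           0.0410   = p-value * 1           0.0410

Starting from the smallest p-value, comparisons are judged significant while their corrected p-values are below 0.05. Testing stops at the first corrected p-value of 0.05 or more (here 0.1120 for comparison 3), so comparisons 3 to 6 are all non-significant, including comparison 6 although its corrected value 0.0410 is below 0.05. Thus the Holm method finds two significant comparisons, whereas the Bonferroni method (p < 0.05/6 ≈ 0.0083) finds only comparison 1.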
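Both corrections can be sketched as below (function names are hypothetical). The Holm sketch returns step-down adjusted p-values made monotone with a running maximum, which encodes the "stop at the first failure" rule: once one comparison is non-significant, every comparison with a larger raw p-value is non-significant too.

```python
def bonferroni(pvals, alpha=0.05):
    """Significant if the p-value is below alpha / m."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm's step-down method: (adjusted p-values, significance flags) in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):          # rank = 0 for the smallest p-value
        running_max = max(running_max, min(1.0, pvals[i] * (m - rank)))
        adjusted[i] = running_max
    return adjusted, [a < alpha for a in adjusted]

p = [0.0076, 0.0095, 0.0280, 0.0320, 0.0380, 0.0410]
adj, rej = holm(p)
print([round(a, 4) for a in adj])  # [0.0456, 0.0475, 0.112, 0.112, 0.112, 0.112]
print(rej)                         # [True, True, False, False, False, False]
print(bonferroni(p))               # [True, False, False, False, False, False]
```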


Scheffé's Method of Paired Comparison (for Human Subjective Tests)

[Figure: Interactive Evolutionary Computation (IEC), in which evolutionary computation optimizes a target system using human subjective evaluations. Application examples: image enhancement processing, lighting design of 3-D CG, room layout planning design, room lighting design by optimizing LED assignments, MEMS design, geological simulation, hearing-aid fitting, and measuring mental scale.]


Scheffé's method of paired comparison: an ANOVA based on nC2 paired comparisons of n objects.

[Figure: each pair of objects is rated on a scale (better / slightly better / even / slightly better / better); the ratings are analyzed by ANOVA, and significance is checked using a yardstick.]

Scheffé's Method of Paired Comparison: Original method and three modified methods

                All subjects must evaluate all pairs:
order effect    no                               yes
yes             original method (原法, 1952)      Ura's variation (浦の変法, 1956)
no              Haga's variation (芳賀の変法)      Nakaya's variation (中屋の変法, 1970)

Order effect: presenting (1) one object and then the other, or (2) the same pair in the reverse order, may result in different evaluations.


Scheffé's Method of Paired Comparison

1. Ask N human subjects to compare pairs of t objects on a 3-, 5- or 7-grade scale.
2. Assign [-1, +1], [-2, +2] or [-3, +3], respectively, to these grades.
3. Then, start the calculation (see other material).

[Figure: questionnaire with one scale (better / slightly better / even / slightly better / better) per pair]

Total raw data: paired comparisons of t = 3 objects by six subjects (N = 6).

          O1  O2  O3  O4  O5  O6
A1 - A2    2   1   1   2   1   2
A1 - A3    2   2   1   1   1   1
A2 - A3    1   0   1   1  -1   0

Application Example:

Ex. Q. What is the best present to become her/his boyfriend/girlfriend?

[SITUATION] She/he is the person I long for. I want to be her/his boyfriend/girlfriend before we graduate from our university. To get over my one-way love, I decided to give her/him a present of about 3,000 JPY and express my feelings.

I show you 5C2 pairs of presents. Please compare each pair and mark your relative evaluation on five levels.

Presents: strap for a mobile phone, invitation to a dinner, tea/coffee, stuffed animal, fountain pen.


Results of Scheffé's Method of Paired Comparison (Nakaya's variation)

What is the best present to become her/his boyfriend/girlfriend?

[Figure: four panels plotting the average evaluations of the presents on a -1 to 1 axis from "less effective" to "more effective", for a present from a male and a present from a female, with significant differences marked. Comments in the panels: "I think it's effective. Reality is ...", "I will catch her heart by dinner.", "How about tea leaves or a stuffed animal?", "Eat! Eat! Eat!", and "I hesitate to accept it as we have not gone out with him."]

Scheffé's Method of Paired Comparison: Modified methods by Ura and Nakaya


Scheffé's Method of Paired Comparison: Modified method by Ura

Pairwise comparisons of objects that are affected by the display order (order effect).

[Figure: questionnaire in which every pair is presented in both orders, each rated on a five-grade scale (better / slightly better / even / slightly better / better) coded from -2 to +2]

Ask N human subjects to evaluate 2 × tC2 pairs (every pair in both orders) of t objects on a 3-, 5- or 7-grade scale and assign [-1, +1], [-2, +2] or [-3, +3], respectively.


Scheffé's Method of Paired Comparison: Modified method by Ura

Step 1: Make a paired comparison table for each human subject.

x_{ijl}: evaluation value when the l-th human subject compares the i-th object with the j-th object.

Example table of one subject (the tables of subjects O1, O2 and O3 are made in the same way):

      A1   A2   A3   A4
A1     -    0   -1   -1
A2     3    -    0    0
A3     3    1    -   -1
A4     3    3    1    -


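Scheffé's Method of Paired Comparison: Modified method by Ura

Step 2: Make a table summing all subjects' data and calculate the average evaluation $\hat{\alpha}_i$ of each object:

$$\hat{\alpha}_i = \frac{1}{2tN}\left(x_{i\cdot\cdot} - x_{\cdot i\cdot}\right), \qquad x_{i\cdot\cdot} = \sum_{j}\sum_{l} x_{ijl}, \qquad x_{\cdot i\cdot} = \sum_{j}\sum_{l} x_{jil}$$

where t is the # of objects (4) and N is the # of human subjects (3). In the example, x_{i··} - x_{·i·} = 27, 13, -12, -28 for A1, A2, A3, A4, so

α̂_1 = 1.1250 (A1), α̂_2 = 0.5417 (A2), α̂_3 = -0.5000 (A3), α̂_4 = -1.1667 (A4).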
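Step 2 can be sketched as follows. The function names are hypothetical, and the layout of `x_sum` (row i, column j holds x_ij. for object i presented first) is an assumption about how the summed table is stored.

```python
def ura_alpha(x_sum, N):
    """alpha_i = (x_i.. - x_.i.) / (2 t N) for Ura's variation.

    x_sum[i][j] holds x_ij. (scores of the ordered pair (i, j) summed over the N
    subjects); diagonal entries are ignored.
    """
    t = len(x_sum)
    row = [sum(x_sum[i][j] for j in range(t) if j != i) for i in range(t)]  # x_i..
    col = [sum(x_sum[j][i] for j in range(t) if j != i) for i in range(t)]  # x_.i.
    return [(r - c) / (2 * t * N) for r, c in zip(row, col)]

def alpha_from_differences(diffs, N):
    """Same estimate when only the differences x_i.. - x_.i. are known."""
    t = len(diffs)
    return [d / (2 * t * N) for d in diffs]

print([round(a, 4) for a in alpha_from_differences([27, 13, -12, -28], N=3)])
# [1.125, 0.5417, -0.5, -1.1667]
```

Because every x_ij. appears once with a plus sign and once with a minus sign, the estimates always sum to zero.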

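Scheffé's Method of Paired Comparison: Modified method by Ura

Step 3: Make an ANOVA table.

$$S_\alpha = \frac{1}{2tN}\sum_i \left(x_{i\cdot\cdot} - x_{\cdot i\cdot}\right)^2, \qquad f_\alpha = t-1$$

$$S_{\alpha(B)} = \frac{1}{2t}\sum_l \sum_i \left(x_{i\cdot l} - x_{\cdot i l}\right)^2 - S_\alpha, \qquad f_{\alpha(B)} = (t-1)(N-1)$$

$$S_\gamma = \frac{1}{2N}\sum_{i<j} \left(x_{ij\cdot} - x_{ji\cdot}\right)^2 - S_\alpha, \qquad f_\gamma = \frac{(t-1)(t-2)}{2}$$

$$S_\delta = \frac{1}{Nt(t-1)}\, x_{\cdot\cdot\cdot}^2, \qquad f_\delta = 1$$

$$S_{\delta(B)} = \frac{1}{t(t-1)}\sum_l x_{\cdot\cdot l}^2 - S_\delta, \qquad f_{\delta(B)} = N-1$$

$$S_T = \sum_l \sum_{i \neq j} x_{ijl}^2, \qquad f_T = Nt(t-1)$$

$$S_\varepsilon = S_T - S_\alpha - S_{\alpha(B)} - S_\gamma - S_\delta - S_{\delta(B)}, \qquad f_\varepsilon = f_T - f_\alpha - f_{\alpha(B)} - f_\gamma - f_\delta - f_{\delta(B)}$$

where S and f are the sum of squares and the degrees of freedom of the main effect (α), individual differences in the main effect (α(B)), combination effect (γ), order effect (δ), individual differences in the order effect (δ(B)), total (T), and error (ε).

For each factor, unbiased variance = S / f, and F = (unbiased variance) / (unbiased variance of S_ε) is used for the F tests.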
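The sums of squares of Step 3 can be sketched in Python. This is an illustrative implementation of the formulas above with a hypothetical function name; `x[l][i][j]` is assumed to hold x_ijl with object i presented first, and the demo scores are made up for illustration, not data from the slides.

```python
import math

def ura_anova(x):
    """ANOVA of Ura's variation. Returns {factor: (S, f, F)}; F is None for epsilon and T."""
    N, t = len(x), len(x[0])
    pairs = [(i, j) for i in range(t) for j in range(t) if i != j]

    def row(l, i):   # x_i.l: scores of subject l with object i presented first
        return sum(x[l][i][j] for j in range(t) if j != i)

    def col(l, i):   # x_.il: scores of subject l with object i presented second
        return sum(x[l][j][i] for j in range(t) if j != i)

    xij = {(i, j): sum(x[l][i][j] for l in range(N)) for i, j in pairs}      # x_ij.
    diff = [sum(row(l, i) - col(l, i) for l in range(N)) for i in range(t)]  # x_i.. - x_.i.

    S = {}
    S["alpha"] = sum(d * d for d in diff) / (2 * t * N)
    S["alpha(B)"] = sum((row(l, i) - col(l, i)) ** 2
                        for l in range(N) for i in range(t)) / (2 * t) - S["alpha"]
    S["gamma"] = sum((xij[i, j] - xij[j, i]) ** 2
                     for i, j in pairs if i < j) / (2 * N) - S["alpha"]
    S["delta"] = sum(xij.values()) ** 2 / (N * t * (t - 1))
    S["delta(B)"] = sum(sum(x[l][i][j] for i, j in pairs) ** 2
                        for l in range(N)) / (t * (t - 1)) - S["delta"]
    S_T = sum(x[l][i][j] ** 2 for l in range(N) for i, j in pairs)
    S_e = S_T - sum(S.values())

    f = {"alpha": t - 1, "alpha(B)": (t - 1) * (N - 1),
         "gamma": (t - 1) * (t - 2) // 2, "delta": 1, "delta(B)": N - 1}
    f_T = N * t * (t - 1)
    f_e = f_T - sum(f.values())
    var_e = S_e / f_e if f_e > 0 else math.nan       # unbiased variance of S_epsilon

    table = {}
    for k in S:
        F = (S[k] / f[k]) / var_e if f[k] > 0 and var_e > 0 else math.nan
        table[k] = (S[k], f[k], F)
    table["epsilon"] = (S_e, f_e, None)
    table["T"] = (S_T, f_T, None)
    return table

# Hypothetical scores of one subject for t = 3 objects (illustration only).
demo = [[[0, 2, 1], [-1, 0, 0], [-2, 1, 0]]]
for name, (S, f, F) in ura_anova(demo).items():
    print(name, round(S, 3), f, F)
```

With t = 4 and N = 3, as in the example, the error term gets f_ε = 21 degrees of freedom.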


Scheffé's Method of Paired Comparison: Modified method by Ura

ANOVA table: there are significant differences among A1 to A4 (average evaluations: A4 = -1.1667 < A3 = -0.5000 < A2 = 0.5417 < A1 = 1.1250).


Scheffé's Method of Paired Comparison: Modified method by Ura

Step 4: Apply multiple comparisons between all pairs and find which differences are significant.

Q1: Where are the significant differences among A1, A2, A3, and A4?

A1: Apply multiple comparisons between all pairs (Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)

Example of a simple multiple comparison:
• Calculate a studentized yardstick.
• When the difference between two average evaluations is larger than the studentized yardstick, the difference is significant.


Scheffé's Method of Paired Comparison: Modified method by Ura

Step 4: Example of a simple multiple comparison.

$$Y_\phi = q_\phi(t, f)\sqrt{\frac{\hat{\sigma}^2}{2tN}} \qquad \text{(studentized yardstick)}$$

where σ̂², t, and N are the unbiased variance of S_ε, the # of objects, and the # of human subjects, respectively; q_φ(t, f) is the studentized range obtained from a statistical test table for t, the degrees of freedom f of S_ε, and the significance level φ. See these variables in the ANOVA table.

When (t, f) = (4, 21), the studentized yardsticks for the significance levels of 5% and 1% use q_0.05(4, 21) and q_0.01(4, 21); q_0.05(t, f) is given in the following table.

Studentized range q_0.05(t, f) for the studentized yardstick (rows: degrees of freedom f; columns: # of objects t)

f \ t   2     3     4     5     6     7     8     9     10    12    15    20
1      18.0  27.0  32.8  37.1  40.4  43.1  45.4  47.4  49.1  52.0  55.4  59.6
2      6.09  8.30  9.80  10.9  11.7  12.4  13.0  13.5  14.0  14.7  15.7  16.8
3      4.50  5.91  6.82  7.50  8.04  8.48  8.85  9.18  9.46  9.95  10.5  11.2
4      3.93  5.04  5.76  6.29  6.71  7.05  7.35  7.60  7.83  8.21  8.66  9.23
5      3.64  4.60  5.22  5.67  6.03  6.33  6.58  6.80  6.99  7.32  7.72  8.21
6      3.46  4.34  4.90  5.31  5.63  5.89  6.12  6.32  6.49  6.79  7.14  7.59
7      3.34  4.16  4.68  5.06  5.36  5.61  5.82  6.00  6.16  6.43  6.76  7.17
8      3.26  4.04  4.53  4.89  5.17  5.40  5.60  5.77  5.92  6.18  6.48  6.87
9      3.20  3.95  4.42  4.76  5.02  5.24  5.43  5.60  5.74  5.98  6.28  6.64
10     3.15  3.88  4.33  4.65  4.91  5.12  5.30  5.46  5.60  5.83  6.11  6.47
11     3.11  3.82  4.26  4.57  4.82  5.03  5.20  5.35  5.49  5.71  5.99  6.33
12     3.08  3.77  4.20  4.51  4.75  4.95  5.12  5.27  5.40  5.62  5.88  6.21
13     3.06  3.73  4.15  4.45  4.69  4.88  5.05  5.19  5.32  5.53  5.79  6.11
14     3.03  3.70  4.11  4.41  4.67  4.83  4.99  5.10  5.25  5.46  5.72  6.03
15     3.01  3.67  4.08  4.37  4.60  4.78  4.94  5.08  5.20  5.40  5.66  5.96
16     3.00  3.65  4.05  4.33  4.56  4.74  4.90  5.03  5.15  5.35  5.59  5.90
17     2.98  3.63  4.02  4.30  4.52  4.71  4.86  4.99  5.11  5.31  5.55  5.84
18     2.97  3.61  4.00  4.28  4.49  4.67  4.82  4.96  5.07  5.27  5.50  5.79
19     2.96  3.59  3.98  4.25  4.47  4.65  4.79  4.92  5.04  5.23  5.46  5.75
20     2.95  3.58  3.96  4.23  4.45  4.62  4.77  4.90  5.01  5.20  5.43  5.71
24     2.92  3.53  3.90  4.17  4.37  4.54  4.68  4.81  4.92  5.10  5.32  5.59
30     2.89  3.49  3.84  4.10  4.30  4.46  4.60  4.72  4.83  5.00  5.21  5.48
40     2.86  3.44  3.79  4.04  4.23  4.39  4.52  4.63  4.74  4.91  5.11  5.36
60     2.83  3.40  3.74  3.98  4.16  4.31  4.44  4.55  4.65  4.81  5.00  5.24
120    2.80  3.36  3.69  3.92  4.10  4.24  4.36  4.48  4.56  4.72  4.90  5.13
∞      2.77  3.31  3.63  3.86  4.03  4.17  4.29  4.39  4.47  4.62  4.80  5.01
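Since f = 21 in the Ura example is not a row of the table, q must be read between the rows f = 20 and f = 24. A common approximation, assumed here and not stated on the slides, is linear interpolation in 1/f. The sketch also encodes Ura's yardstick; the function names are hypothetical, and σ̂² must come from the ANOVA table.

```python
import math

def interpolate_q(f, f_lo, q_lo, f_hi, q_hi):
    """Approximate q(t, f) for f_lo < f < f_hi by interpolating linearly in 1/f."""
    w = (1 / f_lo - 1 / f) / (1 / f_lo - 1 / f_hi)
    return q_lo + w * (q_hi - q_lo)

def ura_yardstick(q, var_e, t, N):
    """Studentized yardstick of Ura's variation: Y = q * sqrt(var_e / (2 t N))."""
    return q * math.sqrt(var_e / (2 * t * N))

# q_0.05(4, f) for f = 20 and f = 24, copied from the table above.
q05 = interpolate_q(21, 20, 3.96, 24, 3.90)
print(round(q05, 3))  # 3.943
# Y_0.05 = ura_yardstick(q05, var_e, 4, 3), with var_e = S_eps / f_eps from the ANOVA table.
```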


Scheffé's Method of Paired Comparison: Modified method by Nakaya

Pairwise comparisons of objects that can be compared without an order effect.

1. Ask N human subjects to compare pairs of t objects on a 3-, 5- or 7-grade scale.
2. Assign [-1, +1], [-2, +2] or [-3, +3], respectively, to these grades.
3. Then, start the calculation (see other material).

[Figure: questionnaire with one five-grade scale (better / slightly better / even / slightly better / better), coded from -2 to +2, per pair]

Paired comparisons of t = 3 objects by six human subjects (N = 6):

          O1  O2  O3  O4  O5  O6
A1 - A2    2   3   3   2   0   1
A1 - A3    2   0   0   1   1   0
A2 - A3   -3  -2  -1  -1  -3  -2


Scheffé's Method of Paired Comparison: Modified method by Nakaya

Step 1: Make a paired comparison table for each human subject.

x_{ijl}: evaluation value when the l-th human subject compares the i-th object with the j-th object.

Step 2: Make a table summing all subjects' data and calculate the average evaluation $\hat{\alpha}_i$ of each object:

$$\hat{\alpha}_i = \frac{1}{tN}\, x_{i\cdot\cdot}$$

where t is the # of objects (3) and N is the # of human subjects (6).
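Step 2 can be sketched with the questionnaire data above. The function name is hypothetical, and two assumptions are made that the slides do not state explicitly: without an order effect, x_jil = -x_ijl, and a positive score for the pair "Ai - Aj" favors Ai.

```python
def nakaya_alpha(pair_scores, t):
    """alpha_i = x_i.. / (t N) for Nakaya's variation.

    pair_scores maps (i, j) with i < j to the list of scores of the N subjects.
    """
    N = len(next(iter(pair_scores.values())))
    x = [0.0] * t                      # x_i.. : total evaluation of object i
    for (i, j), scores in pair_scores.items():
        s = sum(scores)
        x[i] += s                      # x_ij. counts for object i ...
        x[j] -= s                      # ... and x_ji. = -x_ij. for object j
    return [xi / (t * N) for xi in x]

scores = {(0, 1): [2, 3, 3, 2, 0, 1],        # A1 - A2, subjects O1..O6
          (0, 2): [2, 0, 0, 1, 1, 0],        # A1 - A3
          (1, 2): [-3, -2, -1, -1, -3, -2]}  # A2 - A3
print([round(a, 4) for a in nakaya_alpha(scores, 3)])  # [0.8333, -1.2778, 0.4444]
```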


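Scheffé's Method of Paired Comparison: Modified method by Nakaya

Step 3: Make an ANOVA table.

$$S_\alpha = \frac{1}{tN}\sum_i x_{i\cdot\cdot}^2, \qquad f_\alpha = t-1$$

$$S_{\alpha(B)} = \frac{1}{t}\sum_l \sum_i x_{i\cdot l}^2 - S_\alpha, \qquad f_{\alpha(B)} = (t-1)(N-1)$$

$$S_\gamma = \frac{1}{N}\sum_{i<j} x_{ij\cdot}^2 - S_\alpha, \qquad f_\gamma = \frac{(t-1)(t-2)}{2}$$

$$S_T = \sum_l \sum_{i<j} x_{ijl}^2, \qquad f_T = \frac{Nt(t-1)}{2}$$

$$S_\varepsilon = S_T - S_\alpha - S_{\alpha(B)} - S_\gamma, \qquad f_\varepsilon = f_T - f_\alpha - f_{\alpha(B)} - f_\gamma$$

Unbiased variance = S / f, and F = (unbiased variance) / (unbiased variance of S_ε).

ANOVA table: there are significant differences among A1 to A3.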
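The sums of squares of Step 3 can be computed for the questionnaire data with the sketch below (hypothetical function name; it assumes x_jil = -x_ijl as in Nakaya's variation). With these data the unbiased variance of S_ε comes out at 161/90 ≈ 1.79 with f_ε = 5, which matches the σ̂² = 1.79 and f = 5 used in the yardstick calculation for this example.

```python
def nakaya_anova(pair_scores, t):
    """Sums of squares and degrees of freedom of Nakaya's variation.

    pair_scores[(i, j)] (i < j) lists the scores x_ijl of the N subjects.
    Returns {factor: (S, f)}.
    """
    N = len(next(iter(pair_scores.values())))
    xil = [[0.0] * N for _ in range(t)]          # x_i.l: evaluation of object i by subject l
    for (i, j), scores in pair_scores.items():
        for l, s in enumerate(scores):
            xil[i][l] += s
            xil[j][l] -= s                       # x_jil = -x_ijl
    xi = [sum(row) for row in xil]               # x_i..
    S_a = sum(v * v for v in xi) / (t * N)
    S_aB = sum(v * v for row in xil for v in row) / t - S_a
    S_g = sum(sum(sc) ** 2 for sc in pair_scores.values()) / N - S_a
    S_T = sum(s * s for sc in pair_scores.values() for s in sc)
    S_e = S_T - S_a - S_aB - S_g
    f_a, f_aB, f_g = t - 1, (t - 1) * (N - 1), (t - 1) * (t - 2) // 2
    f_T = N * t * (t - 1) // 2
    return {"alpha": (S_a, f_a), "alpha(B)": (S_aB, f_aB), "gamma": (S_g, f_g),
            "epsilon": (S_e, f_T - f_a - f_aB - f_g), "T": (S_T, f_T)}

scores = {(0, 1): [2, 3, 3, 2, 0, 1],        # A1 - A2, subjects O1..O6
          (0, 2): [2, 0, 0, 1, 1, 0],        # A1 - A3
          (1, 2): [-3, -2, -1, -1, -3, -2]}  # A2 - A3
table = nakaya_anova(scores, 3)
S_e, f_e = table["epsilon"]
var_e = S_e / f_e                            # unbiased variance of S_epsilon
for name in ("alpha", "alpha(B)", "gamma"):
    S, f = table[name]
    print(name, round(S, 4), f, round((S / f) / var_e, 2))   # F = (S/f) / var_e
print("epsilon", round(S_e, 4), f_e, round(var_e, 4))
```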

Scheffé's Method of Paired Comparison: Modified method by Nakaya

Step 4: Apply multiple comparisons.

Q1: Where are the significant differences among A1, A2, and A3?

A1: Apply multiple comparisons between all pairs of objects.

(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)


Scheffé's Method of Paired Comparison: Modified method by Nakaya

Step 4: Apply multiple comparisons between all pairs and find which differences are significant.

Example of a simple multiple comparison:
• Calculate a studentized yardstick.
• When the difference between two average evaluations is larger than the studentized yardstick, the difference is significant.

$$Y_\phi = q_\phi(t, f)\sqrt{\frac{\hat{\sigma}^2}{tN}} \qquad \text{(studentized yardstick)}$$

where σ̂², t, and N are the unbiased variance of S_ε, the # of objects, and the # of human subjects, respectively; q_φ(t, f) is the studentized range obtained from a statistical test table for t, the degrees of freedom f of S_ε, and the significance level φ. See these variables in the ANOVA table. Here (t, f) = (3, 5), and q_0.05(3, 5) is read from the studentized range table q_0.05(t, f):

$$Y_{0.05} = q_{0.05}(3, 5)\sqrt{\frac{1.79}{3 \cdot 6}} = 4.60\sqrt{\frac{1.79}{3 \cdot 6}} = 1.4506$$

$$Y_{0.01} = q_{0.01}(3, 5)\sqrt{\frac{1.79}{3 \cdot 6}} = 6.97\sqrt{\frac{1.79}{3 \cdot 6}} = 2.1980$$
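The yardstick comparison can be sketched as below (hypothetical function names). The average evaluations used in the usage lines are computed from the questionnaire data with α̂_i = x_i../(tN), under the assumption that a positive score for "Ai - Aj" favors Ai; only their absolute differences matter here, so the result does not depend on that sign convention.

```python
import math
from itertools import combinations

def nakaya_yardstick(q, var_e, t, N):
    """Studentized yardstick of Nakaya's variation: Y = q * sqrt(var_e / (t N))."""
    return q * math.sqrt(var_e / (t * N))

def significant_pairs(alpha_hat, Y):
    """Index pairs (i, j) whose average evaluations differ by more than Y."""
    return [(i, j) for i, j in combinations(range(len(alpha_hat)), 2)
            if abs(alpha_hat[i] - alpha_hat[j]) > Y]

y05 = nakaya_yardstick(4.60, 1.79, 3, 6)   # q_0.05(3, 5) = 4.60
y01 = nakaya_yardstick(6.97, 1.79, 3, 6)   # q_0.01(3, 5) = 6.97
print(round(y05, 4), round(y01, 4))        # 1.4506 2.198

alpha_hat = [15 / 18, -23 / 18, 8 / 18]    # A1, A2, A3 (derived from the table data)
print(significant_pairs(alpha_hat, y05))   # [(0, 1), (1, 2)]
print(significant_pairs(alpha_hat, y01))   # []
```

Under these assumptions, A1 - A2 and A2 - A3 differ significantly at the 5% level, and no pair reaches the 1% level.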


SUMMARY

                                   2 groups                                n groups (n > 2)
Parametric test (normality)
  unpaired (independent)           unpaired t-test                         one-way ANOVA (one-way data)
  paired (related)                 paired t-test                           two-way ANOVA (two-way data)
Non-parametric test (no normality)
  unpaired (independent)           Mann-Whitney U-test                     Kruskal-Wallis test
  paired (related)                 sign test, Wilcoxon signed-ranks test   Friedman test

+ Scheffé's method of paired comparison for Human Subjective Tests

1. We gave an overview of which statistical test to use in which case.
2. Correct use of statistical tests lets us demonstrate the effectiveness of our experiments convincingly.