6. non-parametric methods - university of dundee · kolmogorov-smirnov test: summary 13 input...

32
P-values and statistical tests 6. Non-parametric methods Hand-outs available at http://is.gd/statlec Marek Gierliński Division of Computational Biology

Upload: trinhanh

Post on 04-May-2019

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

P-values and statistical tests6. Non-parametric methods

Hand-outsavailableathttp://is.gd/statlec

MarekGierlińskiDivisionofComputationalBiology

Page 2: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Kolmogorov-Smirnov test

Tест Колмогорова-Смирнова

Page 3: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Cumulative distribution of data

3

Data𝑛 points

Startatzero

"#$"

step

Lastvalueis1

Page 4: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Test statistic

n 𝐷 - maximumverticaldifferencebetweentwocumulativedistributionsn Itmeasuresdistancebetweensamples

4

𝐷 = 0.42

Page 5: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Test statistic

n 𝐷 - maximumverticaldifferencebetweentwocumulativedistributionsn Itmeasuresdistancebetweensamples

5

𝐷 = 0.42𝐷 = 0.83

Page 6: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Null distribution

6

𝐷 = 0.167

Populationofmice

Selecttwosampleswithsizesasindatainquestion

Find𝐷

Builddistributionof𝐷

×101

Page 7: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Null distribution

7

Populationofmice

Selecttwosampleswithsizesasindatainquestion

Find𝐷

Builddistributionof𝐷

×101

𝑛" = 12𝑛2 = 9

Kolmogorovdistribution

Nulldistributionrepresentsallpossiblesamplesunderthenullhypothesis.

Kolmogorovdistributionapproximatesit

Page 8: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Null distribution

8

𝑛" = 12𝑛2 = 9

𝐷 = 0.42

𝑝 = 0.25

𝐷 = 0.42

Nulldistributionrepresentsallpossiblesamplesunderthenullhypothesis.

Kolmogorovdistributionapproximatesit

Page 9: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

KS test is sensitive to location and shape

9

𝐷 = 0.57𝑝 = 6×10$1

𝐷 = 0.40𝑝 = 0.01

Page 10: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

KS-test does not work for small samples!n Considertwosamplesofsize𝑛6 = 𝑛7 = 3

n Thereareonlythreepossiblevaluesofstatistic𝐷

10

𝐷 𝑝

1/3 1

2/3 0.6

1 0.1

Page 11: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Comparison of two-sample tests

11

Test p-value

t-test 0.96

Mann-Whitney 0.41

Kolmogorov-Smirnov 0.33

Test p-value

t-test 0.07

Mann-Whitney 0.04

Kolmogorov-Smirnov 0.02

Page 12: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

How to do it in R?> mice = read.table('http://tiny.cc/mice_kruskal', header=T)> sco = mice[mice$Country=='Scottish', 'Lifespan']> eng = mice[mice$Country=='English', 'Lifespan']> ks.test(eng, sco)

Two-sample Kolmogorov-Smirnov test

data: eng and scoD = 0.41667, p-value = 0.3338alternative hypothesis: two-sided

Warning message:In ks.test(eng, sco) : cannot compute exact p-value with ties

12

Page 13: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Kolmogorov-Smirnov test: summary

13

Input two samplesof𝑛" and 𝑛2 valuesvaluescanbeordinal

Assumptions Samples arerandomandindependent(nobefore-after)Variablesshouldbecontinuous(nodiscretedata)

Usage Comparedistributionsoftwosamples

Nullhypothesis Bothsamplesaredrawnfromthesamedistribution

Comments Doesn’tcareaboutdistributionsNotveryusefulforsmallsamplesItistooconservativefordiscretedistributions

Page 14: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Permutation and bootstrap test

Page 15: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Permutation test

15

𝐷

𝐷9:;

𝑋 𝑌

Pooled

𝐷9:;

𝐷"

𝐷2

𝐷2...

𝑛6 𝑛7

𝑛6 𝑛7

Freechoiceofteststatistic:

𝐷 = �̅� − 𝑦B

𝐷 = 𝑥C − 𝑦C

𝐷 =�̅�𝑦B

...Randompartitioninto𝑛6 and𝑛7

Pooledsamplesapproximatethepopulation

Simulateddistributionof𝐷approximatesthenulldistribution

Page 16: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Permutation test

16

Page 17: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Permutation test

n Othermetricscanbeused:differencebetweenthemedians,trimmedmeans,ratio...

n Butagain:doesn’tworkforsmallsamples,only5discretep-valuesfor𝑛 = 3

17

𝐷9:; = −171

𝑝 = 0.07

Page 18: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

How to do it in R?> library(coin)> mice = read.table('http://tiny.cc/mice_kruskal', header=T)> mice2 = mice[mice$Country %in% c('English', 'N.Irish'),] > oneway_test(Lifespan ~ Country, mice2, alternative="greater", distribution=approximate(B=100000))

Approximative Two-Sample Fisher-Pitman Permutation Test

data: Lifespan by Country (English, N.Irish)Z = 1.5587, p-value = 0.06603alternative hypothesis: true mu is greater than 0

18

Page 19: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Efron-Tibshirani bootstrap testn Twosamples,size𝑛6 and𝑛7n Thenullhypothesis:𝜇" = 𝜇2

n 𝑀 - meanacrosstwosamplesn Shiftthesamplestocommonmean:

𝑥′G = 𝑥G − �̅� + 𝑀

𝑦′G = 𝑦G − 𝑦B + 𝑀

n Poolthemtogether

𝑍 = (𝑥"K , … , 𝑥#NK , 𝑦"K , … , 𝑦#O

K )

n Draw𝑛6 and𝑛7 pointsfromZwithreplacement

n Findt-statisticforthemn Builddistributionof𝑡

n Comparewith𝑡9:;

19

𝑡

𝑡9:;

Page 20: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Permutation vs bootstrap

Permutation

n Drawwithoutreplacement

Bootstrap

n Drawwithreplacement

20

1 2 3 4 5 6 7 8 9 10

6 9 5 2 10 1 3 4 8 7

3 8 2 5 6 1 7 10 4 9

6 7 8 5 2 10 9 1 3 4

1 2 3 4 5 6 7 8 9 10

8 10 5 2 5 3 3 8 2 6

9 2 7 2 8 8 7 3 2 2

7 1 4 1 8 6 6 2 6 9

Page 21: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Bootstrap test

n Two-sided𝑝 = 0.09n Lessaccuratethanpermutationtestn Bootstraphasmoreapplications

21

𝑡9:; = −2.0

𝑝 = 0.02

t

Page 22: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

How to do it in R?> mice = read.table('http://tiny.cc/mice_kruskal', header=TRUE)

> mice2 <- mice[mice$Country %in% c('English', 'N.Irish'),]

> nEng <- length(which(mice$Country == 'English'))

> nBoot <- 1000

> tstat <- function(data) {

x <- data[1:nEng, 2]

y <- data[(nEng+1):nrow(data), 2]

tobj <- t.test(x, y)

t <- tobj$statistic return(t)

}

> bootstat <- function(data, indices) {

d <- data[indices,] # allows boot to select sample

t <- tstat(d)

return(t)

}

> b <- boot(data=mice2, statistic=bootstat, R=nBoot)

> p <- length(which(b$t > b$t0)) / nBoot

> p

[1] 0.027

22

Page 23: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Two-sample test comparison

23

Test Statistic p-value(two-sided)

Comments

t-test 𝑡 = 2.00 0.068 Notappropriateforskeweddistributions

Mann-Whitney 𝑈 = 50 0.040 Compares locationandshape

Kolmogorov-Smirnov 𝐷 = 0.83 0.015 Comparesdistributions

permutation 𝐷 = −171 0.12 Comparesdistributions

E-Tbootstrap 𝑡 = −2.00 0.094 Comparesmeans,distribution-free

Page 24: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Monte Carlo chi-square test

Page 25: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Contingency tables

25

Drawn Notdrawn Total

White 𝑘 𝑚 − 𝑘 20

Black 𝑛 − 𝑘 𝑁 + 𝑘 − 𝑛 −𝑚 16

Total 10 26 36

0 20

10 6

1 19

9 7

2 18

8 8

3 17

7 9

4 16

6 10

5 15

5 11

6 14

4 12

7 13

3 13

8 12

2 14

9 11

1 15

10 10

0 16

Fisher’stest– countallpossiblecombinations

WT KO1 KO2 KO3

G1 50 61 78 43

S 172 175 162 178

G2 55 45 47 59

Chi-squaretest– findp-valuefromanasymptoticdistribution

𝑝 = 0.02

Page 26: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Generate a random subset of all combinations

26

WT KO1 KO2 KO3 Sum

G1 50 61 78 43 232

S 172 175 162 178 687

G2 55 45 47 59 206

Sum 277 281 287 280 1125

62 54 63 53167 175 176 16948 52 48 58

𝜒2 = 2.87

59 70 52 51164 161 186 17654 50 49 53

𝜒2 = 6.40

57 56 58 61164 173 172 17856 52 57 41

𝜒2 = 3.78Nullhypothesis:

proportionsinrowsandcolumnsareindependent

or

sumsinrowsandcolumnsarefixed

...

Page 27: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Realexperiment

27

Contingencytable

Redistributecounts,keepingrowandcolumnsumsfixed

Find𝜒2

Builddistributionof𝜒2

×101Observation

𝑝 = 0.019

Page 28: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

How to do it in R?# Flow cytometry experiment

> flcyt = rbind(c(50,61,78,43), c(172,175,162,178), c(55,45,47,59))

> chisq.test(flcyt, simulate.p.value = TRUE, B=100000)

Pearson's Chi-squared test with simulated p-value (based on 1e+05 replicates)

data: flcyt

X-squared = 15.22, df = NA, p-value = 0.01944

# Pearson’s test with asymptotic distribution

> chisq.test(flcyt)

Pearson's Chi-squared test

data: flcyt

X-squared = 15.122, df = 6, p-value = 0.01933

28

Page 29: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Monte Carlo chi-square test: summary

29

Input 𝑛W×𝑛X contingencytabletable containscounts

Assumptions Observationsarerandomandindependent(nobefore-after)Mutualexclusivity(nooverlapbetweencategories)Errorsdon’thavetobenormalCountscanbesmall

Usage Examineifthereisanassociation(contingency)betweentwovariables;whethertheproportionsin“groups”dependonthe“condition”(andviceversa)

Nullhypothesis The proportionsbetweenrowsdonotdependonthechoiceofcolumn

Comments Almostexact(with largenumberofbootstraps)Computationallyexpensive

Page 30: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Which test should I use?

30

Outcomeoftheexperiment

Goal Measurement(Gaussian)

Measurement(non-Gaussian) Rank,score Category

Comparecentralvalueoftwounpairedgroups

t-test t-test(if symmetric)Mann-WhitneyEfron-Tibshirani

Mann-Whitney Fisher’sChi-squareG-testMonte-Carlo

Comparedistributionsoftwounpairedgroups

Mann-WhitneyKolmogorov-Smirnov

permutation

Compare twopairedgroups

pairedt-test Wilcoxon signed-ranktestpermutationbootstrap

McNemar’s test

Comparethreeofmoregroups

ANOVA Kruskal-Wallis Chi-squareMonte-Carlo

Page 31: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Parametric or non-parametric?

31

Measurement Rankor score

Gaussian?

Parametric

CantransformtoGaussian? Non-parametric

Y Y

N N

Out-of-scalevalues?

N

Y

Experimentoutcome

Page 32: 6. Non-parametric methods - University of Dundee · Kolmogorov-Smirnov test: summary 13 Input twosamples of ! "and! 2values values can be ordinal Assumptions Samplesare random and

Hand-outsavailableathttp://tiny.cc/statlec