
If you liked it you should’ve put a p-value on it … or not.

Chris Gorgolewski, Max Planck Institute for Human Cognitive and Brain Sciences


DESCRIPTION

Statistical inference in neuroimaging

TRANSCRIPT

Page 1

If you liked it you should’ve put a p-value on it

… or not.

Chris Gorgolewski, Max Planck Institute for Human Cognitive and Brain Sciences

Page 2

SIGNAL DETECTION THEORY

Signal and noise

False positive and false negative errors

Power

Page 3

Signal detection theory

Page 4

Types of errors

Page 5

Vocabulary

• Type I error – false positive

• Type II error – false negative

• False positive rate

• False negative rate

• Statistical power = 1 – false negative rate

• Sensitivity = Power
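In symbols (standard definitions; FP, FN, TP, TN denote the counts of false positives, false negatives, true positives, and true negatives):

    \mathrm{FPR} = \alpha = \frac{FP}{FP + TN}, \qquad \mathrm{FNR} = \beta = \frac{FN}{FN + TP}, \qquad \mathrm{power} = \mathrm{sensitivity} = 1 - \beta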

Page 6

Inference = thresholding

Page 7

Inference = thresholding

Page 8

Signal-to-noise ratio

Page 9

Looking in the wrong places

Page 10

Lower SNR = we miss more stuff

Page 11

Lower SNR = higher FDR threshold

Page 12

VOXELWISE TESTS

P-maps

Multiple comparisons

FWE correction: Bonferroni, permutations

FDR correction: B-H, Local FDR

Page 13

Hypothesis testing

• Distinguish between two hypotheses

1. H0 – there is no difference between groups

2. H1 – there is a difference between groups

• Or…

1. H0 – there is no relation between two variables

2. H1 – there is some relation between the two variables

Page 14

From statistical values to p-values

• Various procedures give us statistical values

– T-tests (one-sample, two-sample, paired, etc.)

– F-Tests

– Correlation tests (r values)

• What is a p-value?

Page 15

P-value

• P(z) = the probability that, if we repeat our experiment (with all its analyses) and there is truly no effect, we will get this or a greater statistical value.
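In symbols, for an observed statistic value z:

    p(z) = P\left(Z \geq z \mid H_0\right)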

Page 16

t, z, F to p
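As a minimal sketch of these conversions in Python with SciPy (the statistic values and degrees of freedom below are made up for illustration):

    from scipy import stats

    # One-sided p-values via the survival function, sf(x) = 1 - cdf(x)
    p_from_t = stats.t.sf(2.0, df=19)          # t = 2.0 with 19 degrees of freedom
    p_from_z = stats.norm.sf(1.96)             # z = 1.96 gives p of about 0.025
    p_from_f = stats.f.sf(4.5, dfn=2, dfd=27)  # F = 4.5 with (2, 27) degrees of freedom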

Page 17

OK back to neuroimaging

• Assuming we are doing a mass-univariate analysis (we look at each voxel independently), we have a t-map

• Now, using a theoretical distribution (given the degrees of freedom), we can turn it into a p-map
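A minimal sketch of that conversion with nibabel and SciPy (the file name tmap.nii.gz and the 19 degrees of freedom are hypothetical):

    import nibabel as nib
    from scipy import stats

    t_img = nib.load("tmap.nii.gz")    # hypothetical input t-map
    t_map = t_img.get_fdata()
    p_map = stats.t.sf(t_map, df=19)   # one-sided p-value at every voxel
    nib.save(nib.Nifti1Image(p_map, t_img.affine), "pmap.nii.gz")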

Page 18

Inference!

• We take our p-map and discard all voxels with values > 0.05

– “The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant.”

• We are done – right?

Page 19

Not quite done yet…

• Let me generate two vectors of random values and use a t-test to check whether they differ

• What is the probability that p < 0.05?

– Well… 0.05

• Let me generate another set of values… and another… 100 pairs of vectors

• What is the probability that at least one of the tests comes out significant? (See the simulation sketch below.)
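A minimal simulation of this thought experiment (sample sizes, seed, and repetition counts are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_experiments, n_tests = 1000, 100

    false_alarms = 0
    for _ in range(n_experiments):
        # 100 pairs of vectors drawn from the SAME distribution, so H0 is true
        p_values = [stats.ttest_ind(rng.normal(size=20), rng.normal(size=20)).pvalue
                    for _ in range(n_tests)]
        false_alarms += any(p < 0.05 for p in p_values)

    print(false_alarms / n_experiments)

Analytically, the chance of at least one false positive among 100 independent tests at a 0.05 threshold is 1 − 0.95^100 ≈ 0.994.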

Page 20

The Salmon of Doubt

Page 21

Correcting for multiple comparisons

• Bonferroni correction (based on Boole’s inequality)

– Divide your p-threshold by the number of tests you have performed

– Or multiply your p-values by the number of tests you have performed (see the sketch below)
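Both formulations in a minimal NumPy sketch (the p-values are made up):

    import numpy as np

    p_values = np.array([0.001, 0.02, 0.04, 0.30])
    m, alpha = len(p_values), 0.05

    significant = p_values < alpha / m           # divide the threshold by m ...
    p_corrected = np.minimum(p_values * m, 1.0)  # ... or multiply the p-values by m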

Page 22

Bonferroni is a Family Wise Error correction

It guarantees that the chance of getting at least one false positive across all the tests is less than your p-threshold.

Page 23

Permutation based FWE correction

• The assumptions behind the theoretical distributions are often not met

• There are many dependencies between voxels

– The tests are not independent, so the Bonferroni correction can be overly conservative

• We can however establish an empirical distribution

Page 24

Permutation based FWE correction

1. Break the relation: shuffle the participants between the groups

2. Perform the test

3. Save the maximum statistical value across voxels

4. Repeat

Page 25

Permutation based FWE correction

Our FWE-corrected p-value is the proportion of permutations that yielded a maximum statistical value higher than the original (unshuffled) one. (A sketch follows below.)
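A minimal sketch of this max-statistic procedure (group data as (subjects, voxels) arrays; group sizes and the permutation count are illustrative):

    import numpy as np
    from scipy import stats

    def permutation_fwe_p(group_a, group_b, n_perm=5000, seed=0):
        """FWE-corrected p-values from a max-statistic permutation test."""
        rng = np.random.default_rng(seed)
        pooled = np.vstack([group_a, group_b])
        n_a = group_a.shape[0]
        t_obs = stats.ttest_ind(group_a, group_b).statistic  # observed t per voxel
        max_null = np.empty(n_perm)
        for i in range(n_perm):
            perm = rng.permutation(pooled)                   # shuffle group membership
            t_perm = stats.ttest_ind(perm[:n_a], perm[n_a:]).statistic
            max_null[i] = np.abs(t_perm).max()               # keep the maximum statistic
        # proportion of permutation maxima that beat each observed value
        return (max_null[:, None] >= np.abs(t_obs)[None, :]).mean(axis=0)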

Page 26

False Discovery Rate

• Even conceptually, FWE correction seems conservative

– At least one false positive out of 60,000 tests?

• Is there a more intuitive way of looking at this?

Page 27

False Discovery Rate

I present a number of voxels that I think show a strong effect, but I admit that a certain percentage of them might be false positives.

Page 28

False Discovery Rate

Percentage of false positive voxels among all significant voxels.

Page 29

FDR procedures

• Benjamini-Hochberg procedure (see the sketch below)

– With its variant for dependent tests (Benjamini-Yekutieli)

• Efron’s local FDR procedure

– Explicit modeling of the signal distribution
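A minimal sketch of the Benjamini-Hochberg step-up procedure:

    import numpy as np

    def benjamini_hochberg(p_values, q=0.05):
        """Return a boolean mask of tests declared significant at FDR level q."""
        p = np.asarray(p_values)
        m = len(p)
        order = np.argsort(p)
        # find the largest (1-based) rank k with p_(k) <= (k / m) * q
        below = p[order] <= np.arange(1, m + 1) / m * q
        significant = np.zeros(m, dtype=bool)
        if below.any():
            k = np.where(below)[0].max()
            significant[order[:k + 1]] = True  # reject everything up to rank k
        return significant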

Page 30

Interim Summary

• FWE corrections

– Bonferroni – simple but struggles with dependencies (overly conservative)

– Permutations – less dependent on assumptions, but time consuming

• FDR corrections

– B-H – simple but also struggles with dependencies

– Local FDR – data driven, but can fail in case of low SNR

Page 31

CLUSTER EXTENT TESTS

Test how big the blobs are

Random field theory

Smoothness estimation

Permutation tests

The problem of the cluster-forming threshold

Fun fact: FWE with RFT

Page 32

Intuition

If we are interested in continuous regions of activation, why are we looking at voxels and not blobs?

Page 33

Aww, patterns!

Page 34

No wait… it’s just smooth noise…

Page 35

What contributes to expected cluster size?

How likely is it to get a cluster of this size from pure noise?

It depends on:

1. cluster forming threshold

2. smoothness of the map

3. size of the map

Page 36

Where do we get those parameters?

1. cluster forming threshold

– Arbitrary decision

2. smoothness of the map

– Estimated from the residuals of the GLM

3. size of the map

– Calculated from the mask

Page 37

Permutation based cluster extent probability

1. Break the relation: shuffle the participants between the groups

2. Perform the test

3. Threshold the map to get clusters

4. Save the sizes of all clusters

5. Repeat

Page 38

Permutation based cluster extent probability

Our cluster-extent p-value is the proportion of permutations that yielded a cluster size bigger than the original (unshuffled) one. (A sketch follows below.)
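A minimal sketch of building that null distribution (the cluster-forming threshold, shapes, and counts are illustrative; scipy.ndimage.label finds connected suprathreshold clusters):

    import numpy as np
    from scipy import stats, ndimage

    def max_cluster_size(t_map, threshold):
        """Size of the largest suprathreshold cluster in a 3D t-map."""
        labels, n_clusters = ndimage.label(t_map > threshold)
        return 0 if n_clusters == 0 else np.bincount(labels.ravel())[1:].max()

    def cluster_size_null(group_a, group_b, vol_shape, threshold=2.3,
                          n_perm=1000, seed=0):
        """Null distribution of the maximum cluster size under permutation."""
        rng = np.random.default_rng(seed)
        pooled = np.vstack([group_a, group_b])
        n_a = group_a.shape[0]
        null = np.empty(n_perm)
        for i in range(n_perm):
            perm = rng.permutation(pooled)  # break the group assignment
            t_perm = stats.ttest_ind(perm[:n_a], perm[n_a:]).statistic
            null[i] = max_cluster_size(t_perm.reshape(vol_shape), threshold)
        return null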

Page 39

Cluster forming threshold conundrum

Page 40
Page 41

HONORABLE MENTIONS

TFCE

Mixture models

Page 42

Threshold Free Cluster Enhancement
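The slide itself is a figure; for reference, the TFCE score of Smith & Nichols (2009) integrates the extent e(h) of the cluster supporting voxel p over all cluster-forming thresholds h up to the voxel's own height h_p, with default weights E = 0.5 and H = 2:

    \mathrm{TFCE}(p) = \int_{h_0}^{h_p} e(h)^{E}\, h^{H}\, dh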

Page 43

Spatially Regularized Mixture Models

Page 44

IMPLEMENTATIONS

SPM

FSL

AFNI

Page 45

SPM

• RFT based voxelwise FWE correction

• Smoothness estimation

• Cluster extent p-values

• Peak height p-values

• Permutation tests through SnPM toolbox

Page 46

FSL

• RFT based voxelwise FWE correction

• Smoothness estimation

• Cluster extent p-values

• FDR

• Permutation tests through randomise

– Including TFCE

Page 47

AFNI

• Cluster extent p-values (3dClustSim)

– Simulations are not permutations

• Smoothness estimation (3dFWHMx)

Page 48

Interim summary

Clusterwise methods allow us to find surprising patterns in terms of spatially consistent clusters instead of individual voxels.

Page 49

LIMITATIONS OF P-VALUES

Page 50

P-VALUES ARE MEANINGLESS

Page 51

FORGET ALL I SAID SO FAR

Page 52

WE ARE ALL DOOMED

Page 53

P-value paradox

• There are no two entities or groups that are truly identical

• There are no two variables that are completely unrelated

• We just fail to obtain enough samples to see it

– Or our tools are not sensitive enough

Page 54

More samples, more “significance”

• The more subjects you have in your study, the more likely you are to find something significant

• The same applies to scan length and field strength

Page 55

H0 is never true

we just fail to show that

Page 56

P-value failure

• P-values do not tell us much about the actual size of the effect

• Neither do they tell us about the predictive power of the relation we found

Page 57

The interesting question

Is PCC involved in autism?

vs.

Given the cortical thickness of a subject’s PCC, how well am I able to predict his or her diagnosis?

Page 58

Why does this matter?

• More subjects, longer scans, stronger scanners – everything becomes significant

– We are getting there

• Lack of faith in science from the public

– Poor reproducibility

Page 59

What needs to be done

We need more replications

We need to start reporting null results

Page 60

What you can do

• Report effect sizes and their confidence intervals – for all tests/voxels, not just the significant ones

• Share the unthresholded statistical maps – It only takes 5 minutes on neurovault.org

• Report all the tests you have performed – not just the significant ones

Page 61

http://dx.doi.org/10.1016/j.neuron.2012.05.001

Page 62
Page 63

If you liked it you should’ve convinced a skeptical researcher to try to replicate your results.