Practical statistics for Neuroscience miniprojects Steven Kiddle Slides & data : http://bit.ly/1Jaor2r

Upload: elaine-morris

Post on 20-Jan-2016


Page 1: Practical statistics for Neuroscience miniprojects Steven Kiddle Slides & data :

Practical statistics for Neuroscience miniprojects

Steven Kiddle

Slides & data : http://bit.ly/1Jaor2r


We are unlikely to finish all the slides

Keep them, they may be helpful for your miniproject


Lecture outline

• Taught component
  – How to present statistics
  – Hypothesis testing
  – Normal distribution

• Practical component
  – Plots
  – Statistical tests
  – Multiple testing corrections


Taught component


Why are statistics important?

• Help to make science more repeatable and objective

• Help you to interpret your results

• Help you to assess the level of evidence you have supporting a hypothesis

• A vital skill for a scientific career!


How to report statistics

• Always report:
  (1) Statistical software you used
  (2) Statistical tests you used
  (3) Significance level you used
  (4) Sample size

I checked these in 17 randomly chosen neuroscience project posters


How not to report statistics!

• I found that:
  (1) 16/17 didn’t report the statistical software used
  (2) 11/17 didn’t report the statistical tests used
  (3) 9/17 didn’t report the significance level used
  (4) 2/17 didn’t report the sample size!


Commonly used analysis methods

• Plotting:
  – Box plots
  – Line plots

• Hypothesis testing
  – T-test
  – ANOVA
  – Chi-squared


Hypothesis testing

• Two types of hypothesis
  – Null hypothesis (H0)
    • Usually that there are no differences between groups or that two variables are unrelated
      – Example : (H0) Smoking and lung cancer are unrelated

  – Alternative hypothesis (H1)
    • There are differences between groups, or two variables are related
      – Example : (H1) Smoking and lung cancer are associated


Significance levels

• You reject the null hypothesis in favour of the alternative if the probability of seeing data at least as extreme as yours under the null hypothesis (the ‘p-value’) is below a pre-specified significance level α
  – Typically α = 0.05

• You should state the significance threshold you use in your report
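
The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration and is not part of the original slides: the two small groups are made-up numbers, `permutation_p_value` is a hypothetical helper, and an exact permutation test is used simply because it needs no distribution tables.

```python
from itertools import combinations
from statistics import mean

ALPHA = 0.05  # pre-specified significance level


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Counts the fraction of all ways of splitting the pooled data into two
    groups that give a mean difference at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Made-up example data (not from the slides)
p = permutation_p_value([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(f"p-value = {p:.4f}; reject H0 at alpha = {ALPHA}: {p < ALPHA}")
```

Enumerating every split is only practical for small groups; statistical software normally uses a reference distribution instead.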


Multiple hypothesis testing I

• Suppose you have a significance threshold of α = 0.05

• Suppose that you measure 100 variables that are NOT related to a disease

• You perform 100 hypothesis tests to compare your variables to disease state
  – H0 : Variable is not affected by disease state
  – H1 : Variable is affected by disease state

• For how many variables do you expect to reject the null hypothesis (H0) even though it is true?


Multiple hypothesis testing II

• α = 0.05 means that if the null hypothesis (H0) is true, we would expect to reject it 5% of the time

• So if H0 is true and we did 100 tests, we would expect to reject H0 5 times by chance alone
  – That is bad: these findings will not replicate

• How do we stop it?
  – Multiple testing corrections
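
The "5 out of 100" figure can be checked with a short simulation. This is a sketch under one stated assumption that is not on the slide: when H0 is true, the p-value of a test is uniformly distributed between 0 and 1. The seed and number of repeats are arbitrary choices.

```python
import random

ALPHA = 0.05
N_TESTS = 100


def count_false_positives(rng, n_tests=N_TESTS, alpha=ALPHA):
    """Count rejections among n_tests tests when every H0 is true.

    Assumption: under a true H0 the p-value is uniform on [0, 1],
    so each test is wrongly rejected with probability alpha.
    """
    return sum(rng.random() < alpha for _ in range(n_tests))


rng = random.Random(1)  # fixed seed so the run is repeatable
n_repeats = 2000
average = sum(count_false_positives(rng) for _ in range(n_repeats)) / n_repeats

print("Expected false positives per 100 tests:", N_TESTS * ALPHA)
print("Average over simulated experiments:", average)
```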


Multiple testing corrections

• Bonferroni correction
  – If we want α = 0.05, instead use α = 0.05/n, where n is the number of tests you perform
  – So for 100 tests, we would use α = 0.0005, and would have at most a 5% chance of any test rejecting a true null hypothesis

• Benjamini-Hochberg correction
  – Popular alternative that controls the false discovery rate
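
Both corrections can be expressed as adjusted p-values that are then compared with the original α. The sketch below is illustrative only: the function names and the four example p-values are made up, and statistical packages provide their own implementations.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (often called q-values)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for illustration
example = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:        ", bonferroni(example))
print("Benjamini-Hochberg:", benjamini_hochberg(example))
```

Comparing an adjusted p-value with α = 0.05 is equivalent to comparing the raw p-value with the corrected threshold.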


Normal distribution


Tests that rely on assumptions of normality

• T-tests
• ANOVA / linear models


How to check if your data is normally distributed

• Histograms
• Statistical tests

• Can apply to data

• But better to apply to residuals of the models
  – For t-test, that means looking at the groups separately
  – For ANOVA, that means extracting residuals from the model
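
To make the "residuals, not raw data" point concrete, here is a minimal sketch. The numbers are invented, and `group_residuals` and `text_histogram` are hypothetical helpers rather than anything from the slides or from SPSS. Pooled together the values form two clumps, but once each value has its own group mean subtracted, a single clump around zero remains.

```python
import math
from statistics import mean


def group_residuals(values, groups):
    """Return each value minus the mean of its own group."""
    group_means = {
        g: mean([v for v, vg in zip(values, groups) if vg == g])
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


def text_histogram(data, bin_width=1.0):
    """Print a crude histogram: one row per bin, one '#' per observation."""
    counts = {}
    for x in data:
        b = math.floor(x / bin_width)
        counts[b] = counts.get(b, 0) + 1
    for b in range(min(counts), max(counts) + 1):
        print(f"{b * bin_width:6.1f} | {'#' * counts.get(b, 0)}")


# Invented data: a continuous variable measured in two groups
d = [4.8, 5.0, 5.1, 5.3, 4.9, 9.9, 10.2, 10.0, 9.7, 10.1]
a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print("Raw values (two peaks):")
text_histogram(d)
print("Residuals after removing group means (one peak):")
text_histogram(group_residuals(d, a), bin_width=0.25)
```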


What do you do if your data is not normally distributed?

• If sample size is really small
  – Nothing you can do – use test anyway

• If data is skewed
  – Transform data (e.g. log? square root?)

• Use non-parametric tests
  – Mann-Whitney U instead of T-test
  – Spearman’s Rank Correlation

• Last resort - Remove outliers?
  – Systematically and preferably only if you know what causes them
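
The effect of a log transform on skewed data can be seen by comparing a skewness measure before and after. This is a sketch with made-up, strictly positive numbers; `skewness` here is the simple moment-based version, which may differ slightly from the bias-corrected figure that statistical packages report.

```python
import math


def skewness(data):
    """Moment-based sample skewness: 0 for symmetric data, > 0 for a long right tail."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed data (must be positive for the log to exist)
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in raw]

print("Skewness before log transform:", round(skewness(raw), 3))
print("Skewness after log transform: ", round(skewness(logged), 3))
```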


How to present plots

• Label both axes
  – Large enough to read

• Show units
• If using stars (*) for significance levels, explain what *, **, *** means

Lunnon et al., (2012) Journal of Alzheimer’s Disease


How to present statistics I

• Say what statistical software used, e.g.
  – SPSS, STATA, R, MATLAB, etc

• Say what the sample size is

• Say what statistical test is being performed
  – T-test, ANOVA, chi-squared, etc

• Say what significance level you are using for the study
  – Think, is it appropriate given my sample size and number of hypotheses being tested?


How to present statistics II

• Report p-value
  – And/or multiple testing corrected p-value
    • E.g. Q-values for Benjamini-Hochberg

• Report coefficient (β), and ideally its standard error, for each reported statistic
  – This can be more informative than a p-value, especially for small datasets
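
One reason a coefficient with its standard error is informative is that readers can turn it into an approximate confidence interval themselves. The sketch below assumes a normal approximation (estimate ± z × standard error), which is an assumption added here and is rough for small samples; the numbers are hypothetical, not taken from the slides.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Approximate confidence interval using the normal distribution.

    Assumption: the estimate is roughly normally distributed; for small
    samples a t-distribution would give a wider interval.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical coefficient and standard error
low, high = confidence_interval(2.0, 0.5)
print(f"Estimate 2.0, approximate 95% interval: ({low:.2f}, {high:.2f})")
```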


How to present statistics III

A more complete guide, tailored to SPSS and specific tests is given at:

http://statistics-help-for-students.com/


Be cautious in your interpretations

• Correlation does not equal causation!

• Can you hypothesise a mechanism by which causation could occur?


Why does correlation not equal causation?

• It looks like the variables are correlated when they are not
  – How does this happen?
    • By chance, especially when multiple testing is performed but not corrected for

• Variables are truly correlated but there is either:
  – Reverse causation
  – Confounding by other variables
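
Confounding can be demonstrated with simulated data. In this sketch (an illustration added here, with an arbitrary seed and made-up effect sizes) a third variable `z` drives both `x` and `y`, which have no direct link. They are strongly correlated until the part explained by `z` is removed from each.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, z):
    """Residuals of y after a least-squares straight-line fit on z."""
    mz, my = sum(z) / len(z), sum(y) / len(y)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


rng = random.Random(0)  # fixed seed so the run is repeatable
z = [rng.gauss(0, 1) for _ in range(5000)]   # the confounder
x = [2 * zi + rng.gauss(0, 1) for zi in z]   # depends on z only
y = [2 * zi + rng.gauss(0, 1) for zi in z]   # depends on z only

raw_r = pearson(x, y)
adjusted_r = pearson(residuals(x, z), residuals(y, z))
print("Correlation of x and y:", round(raw_r, 3))
print("Correlation after adjusting both for z:", round(adjusted_r, 3))
```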


Confounding


Statistical software

• Excel
  – Point and click, quite limited

• SPSS
  – Point and click, a little limited

• STATA
  – Command line

• R, MATLAB, etc
  – Command line, very useful, steep learning curve


http://www.r-project.org/

http://www.rstudio.com/


R introduction


Practical component - SPSS

Data is faked to show large differences; real data will not be so clear cut


Outline

• Data
• Tests for normality
• Plots
• T-test
• ANOVA
• Chi-squared
• Non-parametric tests


Data

• Create folder in ‘My Documents’
• Download data and save in your new folder:

Slides & data : http://bit.ly/1Jaor2r

• Open zip folder
• Double click on ‘neuroscience_example.sav’ to open SPSS


Introduction to the data

• 5 variables
  – a , b , c , d , e

• 2 are binary
  – a , b

• 3 are continuous
  – c , d , e


Normality checks I

• Need to check data is normally distributed when we want to apply
  – T-test
  – ANOVA
  – Linear regression


Normality checks II

Let’s see if the variable ‘d’ is normally distributed


Normality checks III

We can see that the data has two peaks

The test rejects the null hypothesis that the data is normally distributed


Normality checks III

When we take variable ‘a’ into account, we find that ‘d’ is normally distributed within each group of ‘a’


Plots

• Histograms (shown in normality check)
  – Show distribution of a continuous variable

• Boxplots
  – Show the distribution of a continuous variable between groups

• Line plot/scatter plot
  – Shows the relationship between two continuous variables


Generating a boxplot I


Generating a boxplot II


Generating a boxplot III


Generating a boxplot IV

Double click on plot to label axis


Labelling a plot I

Double click to change label


How to present plots

• Label both axes
  – Large enough to read

• Show units
• If using stars (*) for significance levels, explain what *, **, *** means

Lunnon et al., (2012) Journal of Alzheimer’s Disease


Labelling a plot II


Saving plots

Rename document and save in your folder

You can now open the document and extract the plot as an image


Boxplot exercises

• Make a few more boxplots comparing binary variables to continuous variables

• Try adding labels
• Try saving
• Try to interpret the boxplot
  – Do you see differences between the groups?


Generating a line plot I


Generating a line plot II


T-test

• Compares a binary variable (yes/no) to a continuous variable

• Example null hypothesis
  – Mean height is the same across males and females

• Example alternative hypothesis
  – Mean height is different between males and females


Performing a t-test I


Performing a t-test II

Descriptive statistics


Performing a t-test II

Levene test null hypothesis : the variances (and standard deviations) of the two groups are the same.

At the significance level 0.05 we can reject the null hypothesis.

Therefore we should use the second row (‘Equal variances not assumed’).


Performing a t-test II

T-test null hypothesis : the means of the two groups are the same

At the significance level 0.05 we can reject the null hypothesis (p-value is less than 0.001).

I.e. the data supports the hypothesis that the mean of variable ‘d’ is different between the two groups.
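
The ‘Equal variances not assumed’ row corresponds to what is usually called Welch's t-test. As a rough sketch of the quantities involved (mean difference, standard error, t statistic and degrees of freedom), the Python below uses made-up numbers and a hypothetical `welch_t` helper; it is not the SPSS output. Turning t and the degrees of freedom into a p-value requires the t-distribution, which is left to the statistical software.

```python
import math
from statistics import mean, variance


def welch_t(group_0, group_1):
    """Welch's two-sample t-test quantities (unequal variances).

    Returns (mean difference, standard error, t statistic, degrees of freedom).
    """
    n0, n1 = len(group_0), len(group_1)
    # Squared standard error contributed by each group
    v0 = variance(group_0) / n0
    v1 = variance(group_1) / n1
    diff = mean(group_0) - mean(group_1)
    se = math.sqrt(v0 + v1)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))
    return diff, se, diff / se, df


# Made-up example data
diff, se, t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"mean difference = {diff:.2f}, standard error = {se:.2f}, "
      f"t = {t:.2f}, df = {df:.2f}")
```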


How to report findings (option A)

• A two-sample t-test assuming unequal variances performed in SPSS showed differences in variable ‘d’ between groups 0 (N = 19) and 1 (N = 31) in variable ‘a’ at the 5% significance level (mean difference = 1.6, standard error = 0.89, p-value < 0.001 ).


How to report findings (option B)

• Materials and methods:
  – Statistical analysis
    • Statistical analysis was performed in SPSS 20.
    • Group differences were analysed using a two-sample t-test assuming unequal variances.
    • A significance level of 5% was applied to all hypothesis tests.

• Results:
  – Variable ‘d’ was found to differ between groups 0 (N = 19) and 1 (N = 31) in variable ‘a’ (mean difference = 1.6, standard error = 0.89, p-value < 0.001).


How that might look

• Materials and methods:
  – Statistical analysis
    • Statistical analysis was performed in SPSS 20.
    • Group differences were analysed using a two-sample t-test assuming unequal variances.
    • A significance level of 5% was applied to all hypothesis tests.

• Results:
  – Change in blood glucose levels differed between males (N = 19) and females (N = 31) (mean difference = 1.6 ng/ml, standard error = 0.89, p-value < 0.001).


How to present statistics

A more complete guide, tailored to SPSS and specific tests is given at:

http://statistics-help-for-students.com/


One-way ANOVA

• Extension of t-test idea
• Compares a binary variable (yes/no) to several variables, continuous or nominal (including binary)
• Example null hypothesis
  – Mean height is the same across males and females, regardless of age
• Example alternative hypotheses
  – Mean height is different between males and females
  – Mean height differs across ages
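
At its core, an ANOVA compares the variation between group means with the variation inside the groups, summarised as an F statistic. The sketch below covers only the simplest case of one grouping variable, uses made-up numbers, and `one_way_anova_f` is a hypothetical helper; the p-value for F comes from the F-distribution, which is left to the statistical software.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA on a list of groups of numbers.

    Returns (F, between-groups degrees of freedom, within-groups degrees of freedom).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up example: one continuous variable measured in three groups
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```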


Performing ANOVA I


Performing ANOVA II

Which variables differ between variable ‘a’ group 0 and 1?

How would you perform a Bonferroni multiple testing correction?

How would you report these findings? (clue: http://statistics-help-for-students.com/ )


Chi-squared test

• Compares multiple nominal variables
• Example null hypothesis
  – Lung cancer and smoking are unrelated

• Example alternative hypotheses
  – Smokers are more likely to have lung cancer
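
The chi-squared statistic compares the observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below is illustrative: the 2×2 counts are invented (they are not real smoking data), no continuity correction is applied, and the p-value shortcut `p_value_df1` is only valid for one degree of freedom, i.e. a 2×2 table.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Invented counts: rows = group, columns = outcome (yes / no)
counts = [[10, 20],
          [30, 40]]
stat, df = chi_squared(counts)
print(f"chi-squared = {stat:.3f}, df = {df}, p-value = {p_value_df1(stat):.3f}")
```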


Performing chi-squared I


Performing chi-squared II

How would you report these findings? (clue: http://statistics-help-for-students.com/ )


Non-parametric tests

• If a continuous variable is not normally distributed, using parametric tests may give you misleading results
  – T-test and ANOVA are parametric tests

• Solution, use non-parametric tests
  – Such as
    • Mann-Whitney U
    • Spearman’s Rank Correlation


Mann-Whitney U

• A non-parametric equivalent of a t-test
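
The U statistic only uses the ordering of the values, which is why it does not rely on normality. A minimal sketch: for every pair made of one value from each group, count which group has the larger value. The data and function names are made up, and the p-value uses a normal approximation without tie or continuity corrections, which is an assumption added here and is unreliable for very small groups.

```python
import math
from statistics import NormalDist


def mann_whitney_u(group_0, group_1):
    """Return (U for group_0, U for group_1); ties count as half a win each."""
    u0 = sum(1.0 if a > b else 0.5 if a == b else 0.0
             for a in group_0 for b in group_1)
    u1 = len(group_0) * len(group_1) - u0
    return u0, u1


def approx_p_value(u, n0, n1):
    """Two-sided p-value from the normal approximation to U."""
    mu = n0 * n1 / 2
    sigma = math.sqrt(n0 * n1 * (n0 + n1 + 1) / 12)
    z = (u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up example data
g0 = [1.1, 2.3, 2.9, 3.4, 3.8, 4.0, 4.6, 5.2]
g1 = [3.9, 4.8, 5.5, 6.1, 6.4, 7.0, 7.7, 8.3]
u0, u1 = mann_whitney_u(g0, g1)
print("U =", min(u0, u1), " approximate p-value =",
      round(approx_p_value(u0, len(g0), len(g1)), 4))
```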


Performing Mann-Whitney U I


Performing Mann-Whitney U II

How would you report these findings? (clue: http://statistics-help-for-students.com/ )


Spearman’s Rank Correlation

• A non-parametric equivalent of Pearson’s correlation, or of a one-way ANOVA between a binary variable and a continuous variable
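
Spearman's rank correlation is the ordinary (Pearson) correlation calculated on the ranks of the data rather than the raw values. The sketch below uses made-up numbers and hypothetical helper names; tied values share the average of their ranks.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Made-up example: a curved but always-increasing relationship
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print("Spearman's rho:", rho)
```

Because only the ordering matters, any relationship that always increases gives the same result as a straight line would.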


Performing Spearman’s Rank Correlation I


Performing Spearman’s Rank Correlation II

How would you report these findings? (clue: http://statistics-help-for-students.com/ )