![Page 1: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/1.jpg)
Biol 500: basic statistics
Goals:
1) understand basics of experimental design - controls - replication
2) understand how to report quantitative data
3) be able to interpret the reporting of statistical results in a journal article
![Page 2: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/2.jpg)
Replication: allows you to determine if the difference between treatments or groups of samples is greater than the variation within a treatment or group
Is there a difference in how effective the 3 drugs are in curing headaches?
![Page 3: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/3.jpg)
[Figure: graph panels labeled "?", "no", and "yes"]
Replication: allows you to determine if the difference between treatments or groups of samples is greater than the variation within a treatment or group
Is there a difference in how effective the 3 drugs are in curing headaches?
Generally, overlapping error bars indicate no significant difference between the mean values that are being graphed
Bars don’t overlap = probably different
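The overlap rule of thumb can be sketched in a few lines. This is only an illustration: the scores are made up, the function names are hypothetical, and the error bars are assumed to be the mean ± 1 standard error (SD divided by the square root of n).

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error) for a list of replicate measurements."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


def error_bars_overlap(group_a, group_b):
    """True if the mean +/- SE intervals of the two groups overlap."""
    mean_a, se_a = mean_and_se(group_a)
    mean_b, se_b = mean_and_se(group_b)
    return (mean_a - se_a) <= (mean_b + se_b) and (mean_b - se_b) <= (mean_a + se_a)


# Hypothetical headache-relief scores, invented for illustration only
drug_1 = [62, 58, 65, 60, 61]
drug_2 = [75, 79, 72, 77, 74]
drug_3 = [60, 66, 57, 63, 59]

print(error_bars_overlap(drug_1, drug_2))  # bars far apart: probably different
print(error_bars_overlap(drug_1, drug_3))  # bars overlap: probably not different
```

The overlap check is only a visual shortcut; a formal test is still needed to report a P-value.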
![Page 4: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/4.jpg)
Controls: From these data, could you tell if the least effective drug has any effect at all?
![Page 5: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/5.jpg)
Controls: From these data, could you tell if the least effective drug has any effect at all?
Key to interpreting your results: Include a control that is the same in all respects except the one variable you will manipulate experimentally
![Page 6: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/6.jpg)
Controls: Procedural controls allow you to diagnose problems in your experiment, samples or technique
When we amplify DNA from unknown samples by PCR, we include a positive control (a DNA sample that always works) and a negative control (all the PCR reagents, but no DNA)
This allows us to interpret the results of our gels, and to troubleshoot any problems
![Page 7: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/7.jpg)
Do squirrels bury acorns?
My experiment: I remove all the squirrels from 3 clumps of trees in one park, but leave the squirrels in 3 ‘control’ tree clumps in another park, on the other side of town
Park A Park B
![Page 8: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/8.jpg)
Pseudoreplication
In this example the unit of replication is the park, not the clump of trees – I have no actual replication
Park A Park B
Any difference I measure could be due to differences between the two parks, and not due to my squirrel-removal treatment
![Page 9: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/9.jpg)
Avoiding pseudoreplication
Correct design would be to have squirrel-removal and control areas in each of several replicate parks
This lets you assess differences between treatment and control areas, while simultaneously measuring variation among parks
Park A Park B Park C
![Page 10: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/10.jpg)
n = 10
Did these two classes do differently on my 418 midterm?
![Page 11: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/11.jpg)
n = 20
![Page 12: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/12.jpg)
n = 44
![Page 13: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/13.jpg)
n = 44
mean = 133.9 ± 29.7 SD; range: 59 - 183
mean = 126.3 ± 38.8 SD; range: 42 - 188
The statistical approach is to ask if the means of these two populations are significantly different
![Page 14: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/14.jpg)
n = 44
the standard deviation (SD) is what you should report if you are actually interested in the variation itself, e.g., for deciding where to draw the line between grades
mean = 133.9 ± 29.7 SD; range: 59 - 183
mean = 126.3 ± 38.8 SD; range: 42 - 188
![Page 15: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/15.jpg)
n = 44
mean = 133.9 ± 29.7 SD or ± 4.3 SE; range: 59 - 183
mean = 126.3 ± 38.8 SD or ± 5.8 SE; range: 42 - 188
the standard error (SE, or SEM) is $\mathrm{SE} = \dfrac{\mathrm{SD}}{\sqrt{n}}$, where $n$ is the sample size
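As a quick check of the standard error calculation (SD divided by the square root of the sample size), here is a minimal sketch using the second class's reported SD of 38.8 and n = 44. The function name is hypothetical.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


se = standard_error(38.8, 44)
print(round(se, 1))  # 5.8
```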
![Page 16: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/16.jpg)
n = 44
the standard error is what you report when you want to compare the means of different treatments or samples
mean = 133.9 ± 29.7 SD or ± 4.3 SE; range: 59 - 183
mean = 126.3 ± 38.8 SD or ± 5.8 SE; range: 42 - 188
![Page 17: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/17.jpg)
mean = 133.9 ± 29.7 SD
mean = 126.3 ± 38.8 SD
unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
a t-test compares the means of 2 groups by calculating a test statistic called t and determining the probability (P, or p) of getting a value of t at least that large, with that sample size, by chance alone (that is, if the groups do not really differ)
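To make the t statistic concrete, here is a minimal sketch of an unpaired (two-sample) t calculation using the pooled variance. The scores are invented and the function name is hypothetical; this is the textbook Student's t formula, not necessarily what the software behind the slide computed. Turning t and df into a P-value needs a t-distribution table or a statistics package, which is left out here.

```python
import math
import statistics


def unpaired_t(sample_a, sample_b):
    """Student's t for two independent samples, using the pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2  # (total sample size) - (number of groups)
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se_diff
    return t, df


# Hypothetical scores for two small classes (illustration only)
class_a = [10, 12, 14, 16, 18]
class_b = [8, 10, 12, 14, 16]

t, df = unpaired_t(class_a, class_b)
print(t, df)
```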
![Page 18: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/18.jpg)
unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
a paired test would be used if, for example, you compared the % scores on the midterm versus the final for each student; most tests are unpaired
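For contrast, a paired test works on the per-student differences rather than on the two groups separately. The sketch below assumes made-up midterm and final scores for five students; the function name is hypothetical.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t: mean of the per-subject differences divided by its standard error."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1  # df = (number of pairs) - 1


# Hypothetical % scores for the same five students (illustration only)
midterm = [70, 65, 80, 75, 60]
final = [74, 66, 85, 77, 63]

t_paired, df_paired = paired_t(midterm, final)
print(round(t_paired, 2), df_paired)
```

Because each student serves as their own control, variation among students drops out of the comparison.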
mean = 133.9 ± 29.7 SD
mean = 126.3 ± 38.8 SD
![Page 19: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/19.jpg)
unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
one-tailed if you have some reason to think, in advance, that the 2009 scores will only be higher (or lower) than 2007
- cuts your P-value in half, but you need a reason to do this!
mean = 133.9 ± 29.7 SD
mean = 126.3 ± 38.8 SD
![Page 20: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/20.jpg)
unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
power of your test will depend on your degrees of freedom, which is (total sample size) – (number of groups)
- in this case: (44 + 44 students) – (2 groups) = 88 – 2 = 86
mean = 133.9 ± 29.7 SD
mean = 126.3 ± 38.8 SD
![Page 21: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/21.jpg)
unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
P values below 0.05 are accepted as significant, meaning there is less than a 5% chance of getting a test statistic this large if the groups are not really any different
mean = 133.9 ± 29.7 SD
mean = 126.3 ± 38.8 SD
![Page 22: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/22.jpg)
3 or more samples can be compared using a one-way Analysis of Variance, or ANOVA
instead of calculating a t statistic, ANOVA calculates an F-ratio, which compares variation within groups (error bars) to the differences in mean values among groups
there are 2 degrees of freedom: 1st = (# of groups) – 1; 2nd = (total sample size) – (# of groups)
$F_{2,129} = 7.12$, P < 0.001 (the df are subscripted under the F ratio; this is the overall P for the 3-way comparison of means)
n = 44 in each of the 3 groups
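The F-ratio and its two degrees of freedom can be sketched by hand as follows. The three small groups are invented for illustration and the function name is hypothetical; the last line just confirms that 3 groups of 44 give the df of 2 and 129 reported above.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation among group means
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within groups (what the error bars reflect)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical data: three groups of three measurements each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df1, df2 = one_way_anova(groups)
print(f_ratio, df1, df2)

# Degrees of freedom for 3 groups of 44, as in the example above
print(3 - 1, 3 * 44 - 3)  # 2 129
```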
![Page 23: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/23.jpg)
If your overall P-value is significant, you can then do a post-hoc (“after the fact”) test to work out which specific means are different from each other
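Bonferroni - simple correction for multiple comparisons; less conservative than Scheffe when only a few comparisons are made, but loses power as the number of comparisons grows
Scheffe - very conservative; if it sees a difference, you can be confident the difference is real
Dunnett - compares each mean to a control only; most powerful when that is the comparison you care about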
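Of these, the Bonferroni correction is the easiest to show: each raw pairwise P-value is multiplied by the number of comparisons (capped at 1). The P-values below are invented for illustration and the function name is hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw P-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P-values for the three pairwise comparisons among 3 groups
raw_p = [0.004, 0.020, 0.300]
adjusted = bonferroni_adjust(raw_p)
print(adjusted)

# Only comparisons whose adjusted P stays below 0.05 are called significant
significant = [p < 0.05 for p in adjusted]
print(significant)
```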
$F_{2,129} = 7.12$, P < 0.001; n = 44 in each group
pairwise comparisons: Scheffe P = 0.002; Scheffe P = 0.050; P = 0.474
![Page 24: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/24.jpg)
2-way ANOVA tests for interactions among 2 or more factors
[Bar graph: y-axis from 0 to 100; groups: control, aspirin only, tylenol only, aspirin + tylenol]
factors: aspirin (yes/no); tylenol (yes/no)
![Page 25: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/25.jpg)
2-way ANOVA tests for interactions among 2 or more factors
[Bar graph: y-axis from 0 to 100; groups: control, aspirin only, tylenol only, aspirin + tylenol]
when the response to two treatments combined is not what you would expect from adding their individual effects, this is an interaction
interactions are usually the most biologically interesting result!
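The idea of "not what you would expect from adding their individual effects" can be sketched with four group means. The numbers are invented and the function name is hypothetical; a real 2-way ANOVA would also test whether the departure from additivity is larger than expected from the variation within groups.

```python
def interaction_effect(control, a_only, b_only, both):
    """Observed combined response minus the response expected if effects simply add."""
    effect_a = a_only - control
    effect_b = b_only - control
    expected_both = control + effect_a + effect_b
    return both - expected_both


# Hypothetical mean responses for the four treatment groups
synergy = interaction_effect(control=10, a_only=40, b_only=35, both=95)
additive = interaction_effect(control=10, a_only=40, b_only=35, both=65)
print(synergy)   # 30: the combination does more than the sum of the parts
print(additive)  # 0: no interaction
```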
![Page 26: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/26.jpg)
2-way ANOVA tests for interactions among 2 or more factors
[Bar graph: y-axis from 0 to 100; groups: control, aspirin only, tylenol only, aspirin + tylenol]
NOT appropriate to do a 1-way ANOVA on these data, because that requires that each treatment be independent of the other treatments
- since 2 treatments involve aspirin, they are not independent
- also, you miss the interaction, which is the important result
[Figure: the four groups are labeled A, B, C, D]
![Page 27: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/27.jpg)
Correlation analysis is appropriate when you think 2 variables are related, but not in a cause-and-effect way
- arm length and leg length are related, but longer arms do not cause you to have longer legs; both are due to your height
Regression analysis is when you believe a change in one predictor variable (what you manipulate) causes a change in the response variable (the thing you measure)
- adding more water makes plants grow taller
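For the correlation case, the usual summary is the Pearson correlation coefficient r, which treats the two variables symmetrically. A minimal sketch, with invented arm and leg lengths and a hypothetical function name:

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical arm and leg lengths (cm) for five people
arm = [60, 62, 65, 70, 72]
leg = [80, 83, 85, 92, 95]

r = pearson_r(arm, leg)
print(round(r, 3))
# Swapping the variables gives the same r: neither one is the "predictor"
print(round(pearson_r(leg, arm), 3))
```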
![Page 28: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/28.jpg)
Output of a regression analysis includes:
1) ANOVA table
tells you if your model explains a significant amount of the variation in the response
![Page 29: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/29.jpg)
Output of a regression analysis includes:
1) ANOVA table
2) equation of the best-fit line
summarizes the relationship between predictor and response
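A best-fit line is normally found by least squares. The sketch below assumes that method, uses invented water and plant-height data, and the function name is hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept for response ys against predictor xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data: water added (predictor) and plant height (response)
water = [1, 2, 3, 4, 5]
height = [2.9, 5.2, 6.8, 9.1, 11.0]

slope, intercept = fit_line(water, height)
print(f"height = {intercept:.2f} + {slope:.2f} * water")
```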
![Page 30: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/30.jpg)
Output of a regression analysis includes:
1) ANOVA table
2) equation of the best-fit line
3) table testing the effect of each predictor
in multiple regression, you can test many possible predictors that might matter, and see which significantly affect the response variable
![Page 31: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/31.jpg)
Output of a regression analysis includes:
1) ANOVA table
2) equation of the best-fit line
3) table testing the effect of each predictor
4) $r^2$
$r^2$ is the proportion (often given as a %) of the variation in the response that is explained by the predictor
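One common way to compute this quantity is 1 minus (residual variation / total variation). The sketch below assumes a least-squares line and uses two invented data sets, one tight and one scattered; the function name is hypothetical.

```python
import statistics


def r_squared(xs, ys):
    """Share of the variation in ys explained by a least-squares line on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total


# Hypothetical data: same predictor, two responses with different scatter
water = [1, 2, 3, 4, 5]
tight = [2.9, 5.2, 6.8, 9.1, 11.0]
scattered = [5.0, 3.0, 9.0, 6.0, 12.0]

r2_tight = r_squared(water, tight)
r2_scattered = r_squared(water, scattered)
print(round(r2_tight, 3), round(r2_scattered, 3))
```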
![Page 32: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/32.jpg)
More scatter = lower $r^2$
You can have a low $r^2$, but still have a significant slope
![Page 33: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/33.jpg)
ANOVA and regression are both types of linear models, which test the same basic equation:
response variable = model + error
- response variable: the thing you measure
- model: predictors, and coefficients that tell you how they affect the response
- error: variance in the response that is not explained by the model
this is what a simple linear regression model looks like
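In common textbook notation (the symbols are an assumption here, not taken from the slide), a simple linear regression with one predictor can be written as:

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
$$

where $y_i$ is the response variable (the thing you measure), $\beta_0 + \beta_1 x_i$ is the model (the predictor $x_i$ with its coefficient $\beta_1$, plus an intercept $\beta_0$), and $\varepsilon_i$ is the error, the part of the response not explained by the model.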
![Page 34: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/34.jpg)
Does predictor X affect response?
test is to set the coefficient = 0, which drops out the predictor, and see if the model (now just the residual error term) is really any worse
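This comparison of the model with and without the predictor can be sketched as an F-ratio. The sketch assumes a single predictor and a least-squares fit, and takes the model with the coefficient set to 0 to be one that predicts the mean response for every point. The data are invented and the function name is hypothetical.

```python
import statistics


def slope_f_test(xs, ys):
    """F-ratio comparing the fitted line to the model with the slope set to 0."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Unexplained variation when the predictor is dropped (coefficient = 0)
    ss_null = sum((y - my) ** 2 for y in ys)
    # Unexplained variation with the predictor in the model
    ss_full = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    df_model, df_error = 1, n - 2
    f_ratio = ((ss_null - ss_full) / df_model) / (ss_full / df_error)
    return f_ratio, df_model, df_error


# Hypothetical data (illustration only)
water = [1, 2, 3, 4, 5]
height = [5.0, 3.0, 9.0, 6.0, 12.0]

f_ratio, df_model, df_error = slope_f_test(water, height)
print(round(f_ratio, 2), df_model, df_error)
```

A large F means the model gets much worse when the predictor is dropped, so the predictor matters.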
![Page 35: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/35.jpg)
All of the tests we have discussed are parametric tests
- they use the numerical values of your actual data
- however, they also have built-in assumptions that your data, and the residual errors, fit a normal distribution (bell curve)
Parametric versus non-parametric tests
[Histogram: Count (0 to 16) versus Column 1 (75 to 300)]
![Page 36: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/36.jpg)
If your data do not fit a normal distribution, you can transform the raw numbers to make them more “normal” – put the data through a mathematical function
Parametric versus non-parametric tests
[Two histograms of % scores: Count (0 to 16) versus Column 1 (0.3 to 1), and Count (0 to 20) versus Column 2 (0.5 to 1.5)]
arcsine(square-root(%)) is a standard transformation for percentages, which cannot exceed 100% and are often not normally distributed
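A minimal sketch of the arcsine square-root transformation, assuming the scores are given as percentages from 0 to 100 and are converted to proportions first. The scores are invented and the function name is hypothetical.

```python
import math


def arcsine_sqrt(percent):
    """Arcsine of the square root of a percentage expressed as a proportion (radians)."""
    return math.asin(math.sqrt(percent / 100.0))


# Hypothetical % scores
scores = [30, 50, 90, 100]
transformed = [arcsine_sqrt(s) for s in scores]
print([round(t, 3) for t in transformed])
```

The transformed values run from 0 (for 0%) to about 1.571 (for 100%), which stretches out the values bunched near the upper limit.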
![Page 37: Biol 500: basic statistics Goals: 1) understand basics of experimental design - controls - replication 2) understand how to report quantitative data 3)](https://reader035.vdocuments.mx/reader035/viewer/2022062713/56649f515503460f94c74e93/html5/thumbnails/37.jpg)
Parametric versus non-parametric tests
Alternatively, there are non-parametric versions of most common statistical tests that use ranked values instead of the raw data
- are typically more conservative: if they see a difference, it is likely real
- make no assumptions about the shape of the distribution
| raw | ranked (high to low) |
|-----|----------------------|
| 3   | 5                    |
| 2   | 6                    |
| 6   | 3                    |
| 4   | 4                    |
| 9   | 2                    |
| 1   | 7                    |
| 12  | 1                    |
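The ranking step can be sketched as below, using the raw values from the example above. The function name is hypothetical, and the sketch assumes there are no tied values (real rank-based tests usually give ties their average rank).

```python
def rank_high_to_low(values):
    """Rank each value, with 1 for the highest; assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


raw = [3, 2, 6, 4, 9, 1, 12]
ranks = rank_high_to_low(raw)
print(ranks)  # [5, 6, 3, 4, 2, 7, 1]
```

A non-parametric test then works on the ranks, so an extreme value such as 12 counts only as "the highest", not as a number three times larger than 4.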