Statistical Analysis
Null hypothesis: observed differences are due to chance (no causal relationship)
• Ex: If light intensity increases, then the rate of photosynthesis will not be affected
Alternative hypothesis: states that a causal relationship exists between the independent variable and the observed data
• Ex: If light intensity increases, then the rate of photosynthesis will increase
In statistics, the world is null until proven alternative
Null vs. Alternative Hypothesis
A mean is an average of all data points in a set
A median is the middle value in a data set
A mode is the most common value in a data set
Percent difference shows the difference between the means of the experimental and control groups
• % difference = (│experimental – control│ / control) x 100
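As a rough illustration of these four summary measures, the Python sketch below uses only the standard library. The values list reuses Data Set 2 from the SD example in these notes; the experimental and control means (12 and 10) are made-up numbers chosen only to exercise the % difference formula.

```python
from statistics import mean, median, mode

# Data Set 2 from the SD example in these notes
values = [5, 5, 5, 4, 4, 6, 6, 3, 3, 7, 7, 1, 1, 9, 9]

mean_value = mean(values)      # average of all data points
median_value = median(values)  # middle value once the data are sorted
mode_value = mode(values)      # most common value


def percent_difference(experimental, control):
    """% difference = (|experimental - control| / control) x 100"""
    return abs(experimental - control) / control * 100


# Hypothetical group means, not taken from the notes
pct = percent_difference(12, 10)

print(mean_value, median_value, mode_value, pct)
```

For this data set the mean, median, and mode all come out to 5, and the hypothetical groups differ by 20%.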
Standard deviation is a measure of how much, on average, each value differs, or deviates, from the mean
With 2 data sets, you could have the same mean but very different standard deviations.
A small standard deviation shows more consistency
Mean, Median, Mode, % Difference, & Standard Deviation
Formula for Standard Deviation
What does this mean?
[Formula figure; labeled terms: the mean, N = total # of values, and each individual value]
SD example
Data Set 1: 4,4,4,4,4,6,6,6,6,6,5,5,5,5,5
Data Set 2: 5,5,5,4,4,6,6,3,3,7,7,1,1,9,9
Both sets have an identical mean…which data set has a smaller standard deviation?
Set 1 has less spread around the mean, which would give it a lower standard deviation
Mean and SD
For our data sets:
Set 1: Mean = 5, SD = 0.8
Set 2: Mean = 5, SD = 2.4
What these numbers really mean is that, given a normal (bell curve) distribution, 68% of data points fall within 1 SD of the mean, and 95% fall within 2 standard deviations
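The means and SDs quoted for the two data sets can be checked with a short Python sketch. The slide's formula figure did not survive, so it is an assumption which version was used: dividing by N (population SD, pstdev) or by N – 1 (sample SD, stdev). Both are computed here, and for these two sets both round to the quoted values.

```python
from statistics import mean, pstdev, stdev

set_1 = [4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5]
set_2 = [5, 5, 5, 4, 4, 6, 6, 3, 3, 7, 7, 1, 1, 9, 9]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    population_sd = pstdev(data)  # divides by N
    sample_sd = stdev(data)       # divides by N - 1
    print(name, "mean =", mean(data),
          "SD (N) =", round(population_sd, 1),
          "SD (N - 1) =", round(sample_sd, 1))
```

Both sets have a mean of 5, but Set 2's SD is about three times larger, which matches the wider spread visible in the raw numbers.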
Precision of Data- BE CONSISTENT
Which data set is more useful? Why?
Error Bars
When we graph our data, we can use error bars to show the SD for each mean
What is the approximate standard deviation of mealworms per tray in the canopy cover group at 4 m from cover?
Answer: ~1 mealworm per tray
Chi-square Analysis
A chi-square analysis tests the significance of results
Answers the question: were the differences between the observed and expected values large enough to reject the null hypothesis (and support the alternative hypothesis)?
Tests the probability of observed differences being random and NOT due to the independent variable
In the chi-square formula, the expected (e) values are those that you would expect if the null were true.
χ² = Σ (o – e)² / e
o = observed values
e = expected values
A p-value of .05 means that, if the null hypothesis were true, there would be only a 5% chance of seeing a difference between observed and expected data at least this large by chance alone
Critical value – predetermined value establishing the boundary for rejecting/accepting the null hypothesis
• Maximum chi-square value that would fail to reject the null hypothesis (i.e., a chi-square value higher than the critical value shows support for the alternative hypothesis)
• Critical values will be provided in a chi-square table
• Dependent on degrees of freedom: number of possible outcomes minus 1 (d = N – 1)
Null vs. Alternative Hypothesis
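CHI-SQUARE DISTRIBUTION TABLE: Critical values
Columns are probabilities (p-values); the first number in each row is the degrees of freedom.
Columns with p greater than 0.05: Accept Null Hypothesis (difference due to chance). Columns with p of 0.05 or less: Reject Null Hypothesis.
Degrees of Freedom 0.95 0.90 0.80 0.70 0.50 0.30 0.20 0.10 0.05 0.01 0.001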
1 0.004 0.02 0.06 0.15 0.46 1.07 1.64 2.71 3.84 6.64 10.83
2 0.10 0.21 0.45 0.71 1.39 2.41 3.22 4.60 5.99 9.21 13.82
3 0.35 0.58 1.01 1.42 2.37 3.66 4.64 6.25 7.82 11.34 16.27
4 0.71 1.06 1.65 2.20 3.36 4.88 5.99 7.78 9.49 13.28 18.47
5 1.14 1.61 2.34 3.00 4.35 6.06 7.29 9.24 11.07 15.09 20.52
6 1.63 2.20 3.07 3.83 5.35 7.23 8.56 10.64 12.59 16.81 22.46
7 2.17 2.83 3.82 4.67 6.35 8.38 9.80 12.02 14.07 18.48 24.32
8 2.73 3.49 4.59 5.53 7.34 9.52 11.03 13.36 15.51 20.09 26.12
9 3.32 4.17 5.38 6.39 8.34 10.66 12.24 14.68 16.92 21.67 27.88
10 3.94 4.86 6.18 7.27 9.34 11.78 13.44 15.99 18.31 23.21 29.59
Chi-square Analysis
For example, using a p-value of .05 and 3 degrees of freedom, a chi-square value must be greater than __________ (the critical value) to reject the null hypothesis and support the alternative hypothesis.
Put another way, a calculated chi-square value that is greater than 7.82 means p is less than .05: if the null hypothesis were true, a difference this large between the observed and expected data would arise by chance less than 5% of the time, so we reject the null hypothesis.
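The calculation and the table lookup can be sketched in a few lines of Python. The observed counts below are invented for illustration (100 observations across 4 possible outcomes, so 3 degrees of freedom), and the null hypothesis is assumed to predict equal counts in every category. The critical values are the p = .05 column of the table above.

```python
# p = 0.05 critical values copied from the table (degrees of freedom -> value)
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}


def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [30, 14, 34, 22]  # hypothetical counts, not from the notes
expected = [25, 25, 25, 25]  # what the null hypothesis predicts here

chi2 = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1  # possible outcomes minus 1
critical_value = CRITICAL_05[degrees_of_freedom]
reject_null = chi2 > critical_value

print(f"chi-square = {chi2:.2f}, critical value = {critical_value}")
print("reject null" if reject_null else "fail to reject null")
```

With these made-up counts the chi-square value is 9.44, which is above 7.82, so the null hypothesis would be rejected at p = .05.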
1.1.5 T-test
A T-test determines whether or not there is a significant difference between 2 samples
Assume we’re measuring the wingspan of 2 populations of eagles, 1 wild and 1 captive bred
We want to know if the difference between the wingspans is significant (as opposed to being due to chance)
Captive (cm): 180, 187, 212, 196, 200, 204, 194, 189
Wild (cm): 188, 205, 201, 214, 194, 189, 206, 203
Degrees of Freedom = 8 + 8 – 2 = 14
When we apply the T-test and use a T value chart, we obtain a 66% confidence level that the differences are significant. Not enough.
We need a confidence level of 95%, with a minimum sample size of 5.
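The T value behind that result can be reproduced with the Python sketch below. It assumes the standard two-sample T-test with pooled variance, which is consistent with the 8 + 8 – 2 degrees of freedom used above, though the notes do not say which variant was applied. The critical value 2.145 (two-tailed, p = .05, 14 degrees of freedom) comes from a standard T table, not from these notes.

```python
from math import sqrt
from statistics import mean, variance

captive = [180, 187, 212, 196, 200, 204, 194, 189]  # wingspans in cm
wild = [188, 205, 201, 214, 194, 189, 206, 203]

n_captive, n_wild = len(captive), len(wild)
degrees_of_freedom = n_captive + n_wild - 2

# Pooled variance: assumes both populations share the same variance
pooled_variance = (
    (n_captive - 1) * variance(captive) + (n_wild - 1) * variance(wild)
) / degrees_of_freedom
standard_error = sqrt(pooled_variance * (1 / n_captive + 1 / n_wild))

t_value = (mean(wild) - mean(captive)) / standard_error

# Two-tailed critical T for p = 0.05 and 14 degrees of freedom (standard table)
T_CRITICAL_05 = 2.145
significant = abs(t_value) > T_CRITICAL_05

print(f"t = {t_value:.2f}, df = {degrees_of_freedom}, significant: {significant}")
```

The T value is about 0.99, well below 2.145, so the 4.75 cm difference in mean wingspan does not reach the 95% confidence level.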
1.1.6 Correlation and Causality
A correlation in the data does not, by itself, imply causation.
Causation requires that one variable causes the other to occur.
The number of cavities in children shows a strong positive correlation with their vocabulary level.
We should not assume that well-spoken children will have dentures by college.
Stats Quiz
1. Define standard deviation as required on the syllabus. (2)
2. State the usefulness of knowing a standard deviation. (2)
3. Give the minimum confidence level for results to be significant in science. (1)
4. If I told you, based on measurements in a previous class, that blue-haired people are hard of hearing, how would you respond (regarding the relationship)? (2)
Stats Quiz Answers
1. Summarize spread of values around mean, 68% of data lies within 1 SD of mean (95% within 2)
2. Comparing samples/ data points, large SD = bad, low SD = consistent
3. Greater than 95%
4. Correlation does not imply causation.