histograms and distributions experiment: do athletes have faster reflexes than non-athletes?...
Post on 04-Jan-2016
Histograms and Distributions
Experiment:
Do athletes have faster reflexes than non-athletes?
Questions:
- First, you go out and collect the reaction times of 25 non-athletes.
Non-Athletes
Individual   Reaction Time (ms)
1            230
2            268
3            243
4            233
5            210
6            329
7            314
8            278
9            324
10           311
11           210
12           225
13           295
14           282
15           274
16           270
17           307
18           247
19           298
20           276
21           257
22           233
23           256
24           298
25           300
Calculate the mean: 278.5 ms
Non-athletes' reaction time in milliseconds (ms)
Compare:

Athletes
Individual   Reaction Time (ms)
1            215
2            218
3            223
4            226
5            230
6            231
7            231
8            245
9            251
10           255
11           261
12           265
13           268
14           270
15           275
16           275
17           284
18           287
19           290
20           294
21           294
22           298
23           301
24           307
25           315

Calculate the mean score: 264.4 ms

            Athletes   Non-athletes
Mean (ms)   264.4      278.5

Athletes' reaction time in milliseconds (ms)
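The mean is the sum of all the values divided by how many values there are. As a minimal sketch, here is that calculation in Python for the 25 athlete reaction times listed above (the variable and function names are illustrative, not part of the original slides):

```python
# Athletes' reaction times in ms, copied from the athletes table.
athletes = [215, 218, 223, 226, 230, 231, 231, 245, 251, 255, 261, 265, 268,
            270, 275, 275, 284, 287, 290, 294, 294, 298, 301, 307, 315]

def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)

athlete_mean = mean(athletes)
print(round(athlete_mean, 1))  # 264.4
```

The unrounded result is 264.36 ms, which the slides report as 264.4.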
Make a histogram to display the data…
Non-athletes' reaction time in milliseconds (ms), arranged from low to high reaction time
Histogram = a plot of frequency: how many measurements fall within each range of values (bin)
[Figure: Histogram of non-athletes. x-axis: reaction time (ms), in bins from 200-210 to 330-339; y-axis: frequency. Sample size: 25]
[Figure: Histogram of athletes. x-axis: reaction time (ms), in bins from 200-210 to 330-339; y-axis: frequency. Sample size: 25]
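The frequencies behind a histogram like the ones above come from counting how many values land in each bin. The sketch below does this for the athletes' data. It assumes equal 10 ms bins starting at 200 ms, each covering [low, low + 10); the bin labels on the slides have slightly inconsistent edges, so this binning is an assumption rather than the slides' exact scheme.

```python
from collections import Counter

# Athletes' reaction times in ms, copied from the athletes table.
athletes = [215, 218, 223, 226, 230, 231, 231, 245, 251, 255, 261, 265, 268,
            270, 275, 275, 284, 287, 290, 294, 294, 298, 301, 307, 315]

BIN_START = 200  # assumption: the first bin starts at 200 ms
BIN_WIDTH = 10   # assumption: every bin is 10 ms wide

def histogram(values, start=BIN_START, width=BIN_WIDTH):
    """Map each bin's lower edge to the number of values in [low, low + width)."""
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(counts.items()))

freq = histogram(athletes)
for low, count in freq.items():
    # One '#' per measurement gives a rough text version of the bar chart.
    print(f"{low}-{low + BIN_WIDTH - 1} ms: {'#' * count} ({count})")
```

Bins with no measurements are simply absent from the result. The tallest bar has a frequency of 4, which matches the range of the frequency axis on the slide.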
Compare the histograms of non-athletes to athletes:
[Figure: Histograms of athletes and non-athletes shown side by side. x-axis: reaction time (ms); y-axis: frequency]
MEAN: athletes 264.4, non-athletes 278.5
[Figure: Histograms of athletes and non-athletes overlaid on one plot. x-axis: reaction time (ms); y-axis: number of students (frequency)]
Q: Is there really a difference between these two groups???
The student decided to collect more data (larger sample size), which is really the only option at this point…

bin (ms)      non-athletes   athletes
200-210       0              3
210-220       1              6
221-230       2              8
231-240       2              12
241-250       1              15
251-260       2              10
261-270       2              8
271-280       6              6
281-290       12             3
291-300       17             3
301-310       15             2
311-320       9              1
321-329       4              0
330-339       0              0
sample size   73             77
[Figure: Histograms of athletes and non-athletes for the larger sample. x-axis: reaction time (ms); y-axis: number of students (frequency)]
MEAN: athletes 251, non-athletes 298
Comparison of histograms with small vs. large sample size:
[Figure: Small sample. Sample size: 25 in each group (N=50). MEAN: athletes 264, non-athletes 279. x-axis: reaction time (ms); y-axis: number of students (frequency)]
[Figure: Large sample. Sample size: 73 non-athletes, 77 athletes (N=150). MEAN: athletes 251, non-athletes 298. x-axis: reaction time (ms); y-axis: number of students (frequency)]
Let’s go back to the small sample size data…
[Figure: Histograms of athletes and non-athletes, small sample. x-axis: reaction time (ms); y-axis: number of students (frequency)]
MEAN: athletes 264.4, non-athletes 278.5
How can we determine if there is a significant difference between these two groups?
Standard deviation (sigma)
Normal or Gaussian Distribution
First, one needs to determine the standard deviation, which is basically a measure of the width of the histogram.
For example, the mean of the non-athletes is 278.5 ms. If the standard deviation is determined to be 30 ms, then, assuming the data follow a normal distribution, about 68.2% of the data are expected to fall within 278.5 ± 30 ms (between 248.5 and 308.5 ms).
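The 68.2% figure is a property of the normal distribution: it is the fraction of values lying within one standard deviation of the mean. A short sketch, using only Python's standard library, shows where the number comes from (the function name is illustrative):

```python
import math

mean_ms = 278.5  # non-athletes' mean from the example
sd_ms = 30.0     # the example's standard deviation

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

low, high = mean_ms - sd_ms, mean_ms + sd_ms
print(f"{fraction_within(1):.1%} of values expected between {low} and {high} ms")
```

The computed fraction is roughly 0.6827, which is commonly quoted as 68.2% or 68.3% depending on rounding. Calling `fraction_within(2)` gives the familiar figure of about 95% for two standard deviations.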
Would you prefer your standard dev. to be larger or smaller in value?
How do we determine the standard deviation (sigma)?
1. Find the distance between each value and the mean

Non-Athletes
Individual   Reaction Time (ms)   Difference from mean (ms)
1            210                  -68.52
2            225                  -53.52
3            233                  -45.5
4            233                  -45.5
5            247                  -31.5
6            256                  -22.5
7            257                  -21.5
8            268                  -10.5
9            270                  -8.5
10           274                  -4.5
11           276                  -2.5
12           278                  -0.5
13           282                  3.5
14           286                  7.5
15           287                  8.5
16           295                  16.5
17           298                  19.5
18           298                  19.5
19           300                  21.5
20           305                  26.5
21           307                  28.5
22           311                  32.5
23           314                  35.5
24           324                  45.5
25           329                  50.5

210 - 278.5, 225 - 278.5, 233 - 278.5, 233 - 278.5, …
This will tell you how far away each value is from the mean and begin to help you understand the width of your distribution.
2. Square all the differences

Non-Athletes
Individual   Reaction Time (ms)   Difference from mean (ms)   Squared difference
1            210                  -68.52                      4694.9904
2            225                  -53.52                      2864.3904
3            233                  -45.5                       2070.25
4            233                  -45.5                       2070.25
5            247                  -31.5                       992.25
6            256                  -22.5                       506.25
7            257                  -21.5                       462.25
8            268                  -10.5                       110.25
9            270                  -8.5                        72.25
10           274                  -4.5                        20.25
11           276                  -2.5                        6.25
12           278                  -0.5                        0.25
13           282                  3.5                         12.25
14           286                  7.5                         56.25
15           287                  8.5                         72.25
16           295                  16.5                        272.25
17           298                  19.5                        380.25
18           298                  19.5                        380.25
19           300                  21.5                        462.25
20           305                  26.5                        702.25
21           307                  28.5                        812.25
22           311                  32.5                        1056.25
23           314                  35.5                        1260.25
24           324                  45.5                        2070.25
25           329                  50.5                        2550.25
3. Sum all the squares
Sum of the squared differences (non-athletes): 23957.13

4. Divide the sum by the number of scores minus 1
23957.13 / 24 = 998.2 (variance)

5. Take the square root of the variance
√998.2 = 31.6 (standard deviation)
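The five steps can be sketched in a few lines of Python. The values are the sorted non-athlete reaction times from the step tables; the variable names are illustrative. Note one assumption: this sketch uses the unrounded mean (278.52) for every row, whereas the slide tables mix 278.5 and 278.52, so the sum of squares and the variance come out slightly lower here than the slide's 23957.13 and 998.2.

```python
import math

# Non-athletes' reaction times in ms, as listed (sorted) in the step tables.
non_athletes = [210, 225, 233, 233, 247, 256, 257, 268, 270, 274, 276, 278, 282,
                286, 287, 295, 298, 298, 300, 305, 307, 311, 314, 324, 329]

n = len(non_athletes)
mean = sum(non_athletes) / n

deviations = [x - mean for x in non_athletes]  # step 1: distance from the mean
squares = [d ** 2 for d in deviations]         # step 2: square each difference
sum_sq = sum(squares)                          # step 3: sum the squares
variance = sum_sq / (n - 1)                    # step 4: divide by n - 1
sd = math.sqrt(variance)                       # step 5: square root of the variance

print(round(mean, 2), round(variance, 1), round(sd, 1))
```

Despite the small rounding difference in the intermediate values, the standard deviation still rounds to 31.6 ms.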
Standard deviation formula (what we just did):
- the square root of the sum of the squared deviations from the mean divided by the number of scores minus one
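Written out in symbols, that sentence corresponds to the usual sample standard deviation. The notation here is an assumption, since the slide's own formula image is not in the transcript: x_i are the individual scores, x̄ is their mean, and n is the number of scores.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```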
Standard deviation formula, applied to both groups:
Non-athletes: mean = 278.5 ms, SD (σ) = 31.6 ms
Athletes: mean = 264.4 ms, SD (σ) = 30.6 ms
Are these groups statistically different from each other??
T-Test
The t-test assesses whether the means of two groups are statistically different from each other.
t = (difference between the group means) / (standard error of the difference)
Therefore the t-value is related to how different the means are and how broad your data are. A high t-value (in absolute terms) is what you hope for…
Calculate the t-score
t = -1.61
- Degrees of freedom is the sum of the people in both groups minus 2
df = 48
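As a rough check, the t-score can be recomputed from the summary statistics on the earlier slides. The formula images are missing from this transcript, so the sketch assumes the common form in which the standard error of the difference is the square root of (variance of group A / size of group A + variance of group B / size of group B). The function name and argument order are illustrative.

```python
import math

def t_score(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t = (mean_a - mean_b) / standard error of the difference."""
    # Assumed form of the standard error: sqrt(var_a / n_a + var_b / n_b).
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff

# Athletes first, then non-athletes, using the rounded values from the slides.
t = t_score(264.4, 30.6, 25, 278.5, 31.6, 25)
df = 25 + 25 - 2  # people in both groups minus 2

print(f"t = {t:.2f}, df = {df}")
```

With the rounded means and standard deviations this gives a t-score of about -1.60, close to the -1.61 reported above; the small gap is consistent with rounding of the inputs. The sign is negative only because the athletes' mean is the smaller one.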
The null hypothesis vs the hypothesis
1. The hypothesis:
Athletes will have a quicker reaction time than non-athletes.
2. The null hypothesis:
The null hypothesis always states that there is no relationship between the variables being studied. Here, it states that there is no difference in reaction time between athletes and non-athletes.
The p-value
1. The p-value is a number between 0 and 1.
2. It is the probability (hence the p-value) of seeing a difference at least as large as the one in our data if the null hypothesis were true, that is, if there were really no difference between the groups.
3. Therefore, a small p-value means that data like ours would be unlikely if there were no difference between the two groups. It is not the probability that the null hypothesis is true, and 1 minus the p-value is not the probability that there is a difference.
4. In order for the data to support the hypothesis, must the p-value be high or low?
The p-value should be low (<0.05), which says that there is less than a 5% chance of seeing a difference this large if there were no real difference between the two groups.
Statistical Significance
When the p-value is less than 0.05, we say that the data is statistically significant, and there may be a real difference between the two groups.
Be warned that just because p is less than 0.05 between two groups doesn’t mean that there is actually a difference. For example, if we find p < 0.05 for the reaction time experiment, it doesn’t mean that there is a definite difference between athletes and non-athletes. It only means that there is a difference in our data, but our data might be flawed or there is not enough data yet (sample size too small) or we measured the data improperly, or the sampling wasn’t random, or the experiment was garbage, etc…
Doubt is the greatest tool of any scientist (person).
How is the p-value determined?
The p-value is found by using a standard t-table in combination with the t-value and the degrees of freedom previously determined:
http://bioinfo-out.curie.fr/ittaca/documentation/Images/ttable.gif
http://davidmlane.com/hyperstat/t-table.html
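A t-table is a lookup of areas under Student's t distribution. As an alternative to the table (not the method the slides use), the sketch below computes the same kind of value numerically, integrating the t distribution's density with Simpson's rule using only the standard library. It assumes a two-tailed test; the function names and the `steps` parameter are illustrative.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return 1 - central_area

p = two_tailed_p(-1.61, 48)
print(f"p = {p:.3f}")
```

For t = -1.61 with df = 48 this gives a p-value of roughly 0.11, above the 0.05 threshold, so the small-sample data would not count as statistically significant. A one-tailed test, which matches a directional hypothesis such as "athletes are quicker", would use half of this value.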
Now you try it:
1. On Edmodo you will find data collected by Tom and Ileana regarding one’s ability to estimate the length of a line or the number of spots on a screen.
2. The questions were accompanied by a survey that asked for the subject’s grade level, ethnicity, participation in sports, and honors vs. regents level.
3. They wanted to know if any of these differences would correlate with the subjects’ ability to estimate.
How should we analyze this data?
1. Begin by choosing the variable used to group the subjects (the independent variable), grade for example.
Since the t-test can only compare two groups at a time and there are four grades, we need to perform all the possible combinations (there was apparently only one 9th grader, so the sample size is too small to look at this grade):
10th vs 11th
10th vs 12th
11th vs 12th
We also would want to know if the mean of each group is significantly different from the actual value.
Actual value vs 10th
Actual value vs 11th
Actual value vs 12th
This needs to be done twice, once for the line estimation and once for the dots estimation!!
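Listing the comparisons by hand works for three grades, but it is easy to miss one as the number of groups grows. A small sketch using Python's `itertools.combinations` generates every pair; the grade labels come from the slide, and the list name is illustrative.

```python
from itertools import combinations

# 9th grade is left out because there was only one 9th grader.
grades = ["10th", "11th", "12th"]

# Every unordered pair of grades: one two-group t-test per pair.
pairs = list(combinations(grades, 2))
for a, b in pairs:
    print(f"{a} vs {b}")

# One comparison per grade against the actual value.
for g in grades:
    print(f"Actual value vs {g}")
```

This gives three pairwise comparisons plus three comparisons against the actual value, and the whole set is repeated for the line estimation and for the dots estimation.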
These are the tables you need to fill out:

Grade   Mean   SD   Variance
10th
11th
12th

Grades           Difference of means   Variability of groups   T-score   P-value
10th vs actual
11th vs actual
12th vs actual
10th vs 11th
10th vs 12th
11th vs 12th
Write a conclusion based on your analysis. Remember, just because p < 0.05, it doesn’t necessarily mean your hypothesis is supported!