Organizing and Analyzing Data
Types of statistical analysis
• DESCRIPTIVE STATISTICS: Organizes data
  - Measures of central tendency: mean, median, mode
  - Measures of variability: range, standard deviation
• INFERENTIAL STATISTICS: Analyzes data
  - Measures of association: correlation coefficient
  - Measures of causation: statistical significance
Descriptive Statistics: Measures of Central Tendency
Mode: the most frequently occurring score in a distribution
Median: the middle score in a distribution; half the scores are above it and half are below it
Mean: the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores
Central Tendency PRACTICE
Using the data set below, compute the three measures of central tendency.
2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11
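To check answers to the practice problem, the three measures can be computed with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, not part of the lesson.

```python
import statistics

# Practice data set from the slide
scores = [2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11]

mean = statistics.mean(scores)      # sum of the scores / number of scores
median = statistics.median(scores)  # middle score once the list is sorted
mode = statistics.mode(scores)      # most frequently occurring score

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

This prints `mean = 7.36, median = 8, mode = 9`: the 11 scores add up to 81, the 6th score in sorted order is 8, and 9 appears three times.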
Descriptive Statistics: Measures of Variability
Range: the difference between the highest and lowest scores in a set of data
Example: 2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11 (range = 11 - 2 = 9)
Standard deviation: a computed measure of how much scores vary around the mean
Calculated by finding the square root of the variance
Determines the spread (width) of the normal distribution curve
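Both measures of variability can be sketched for the same practice data. This assumes the sample variance (dividing by n - 1), which is what `statistics.variance` computes and what the hand calculation later in this lesson uses.

```python
import math
import statistics

scores = [2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11]

# Range: highest score minus lowest score
data_range = max(scores) - min(scores)

# Sample variance: sum of squared distances from the mean, divided by (n - 1)
variance = statistics.variance(scores)

# Standard deviation: the square root of the variance
std_dev = math.sqrt(variance)

print(f"range = {data_range}, variance = {variance:.2f}, sd = {std_dev:.2f}")
```

This prints `range = 9, variance = 7.45, sd = 2.73`.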
Bell Curve: “Normal” Distribution
The red area represents the first standard deviation from the mean: about 68% of the data falls within this area. The green area represents the second standard deviation: about 95% of the data falls within the green PLUS the red area. The blue area represents the third standard deviation: about 99.7% of the data falls within the blue PLUS the green PLUS the red area.
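The 68% / 95% / 99.7% figures come from the normal curve itself. A short sketch, assuming a perfectly normal distribution, computes the share of data within k standard deviations of the mean using the identity P(|Z| ≤ k) = erf(k / √2):

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal variable Z, P(-k <= Z <= k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

This prints about 68.3%, 95.4%, and 99.7%, matching the rounded values on the slide.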
Standard Deviation
Two classes took a recent quiz. There were 10 students in each class, and each class had an average score of 81.5.
Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the quiz? Why or why not?
The answer is… No.
The average (mean) does not tell us anything about the distribution or variation in the grades.
Here are dot plots of the grades in each class:
[Figure: dot plots of each class's quiz grades, with the mean marked]
So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of our data.
Why not just give an average and the range of data (the highest and lowest values) to describe the distribution of the data?
Well, for example, let's say that a set of data has an average of 17.95 and a range of 23.
But what if the data looked like this?
[Figure: a data set with the average and the range marked]
Really, most of the numbers are clustered in one area; they are not evenly distributed throughout the range.
The Standard Deviation is a number that measures how far away the numbers in a set of data typically are from their mean.
If the Standard Deviation is large, the numbers are spread out from their mean.
If the Standard Deviation is small, the numbers are close to their mean.
[Figure: examples of a small and a large standard deviation]
Here are the scores on the math quiz for Team A:
72, 76, 80, 80, 81, 83, 84, 85, 85, 89
Average: 81.5
The Standard Deviation measures how far away each number in a set of data is from their mean. For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5?
72 - 81.5 = -9.5
Or, start with the highest score, 89. How far away is 89 from the mean of 81.5?
89 - 81.5 = 7.5
So, the first step to finding the Standard Deviation is to find all the distances from the mean.
Score                    72     76     80     80     81     83     84     85     85     89
Distance from Mean     -9.5   -5.5   -1.5   -1.5   -0.5    1.5    2.5    3.5    3.5    7.5
Next, you need to square each of the distances to turn them all into positive numbers.
Score                    72     76     80     80     81     83     84     85     85     89
Distance from Mean     -9.5   -5.5   -1.5   -1.5   -0.5    1.5    2.5    3.5    3.5    7.5
Distance Squared      90.25  30.25   2.25   2.25   0.25   2.25   6.25  12.25  12.25  56.25
Add up all of the squared distances:
Sum: 214.5
Divide the sum by (n - 1), where n represents the number of scores you have (here, n = 10). The result is called the variance:
$\frac{214.5}{10 - 1} \approx 23.8$
(Dividing by n - 1 rather than n gives the sample standard deviation.)
Finally, take the square root of the variance:
$\sqrt{23.8} \approx 4.88$
This is the Standard Deviation of the Team A scores.
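The five hand-calculation steps (distances, squares, sum, divide by n - 1, square root) can be sketched in Python. This is a minimal illustration; the function name `sample_std_dev` is our own, not something from the lesson.

```python
import math

# Team A quiz scores from the example
team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]

def sample_std_dev(scores: list[float]) -> float:
    """Standard deviation computed step by step, dividing by (n - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    distances = [x - mean for x in scores]  # step 1: distance of each score from the mean
    squared = [d ** 2 for d in distances]   # step 2: square each distance
    total = sum(squared)                    # step 3: add up the squared distances
    variance = total / (n - 1)              # step 4: divide by (n - 1)
    return math.sqrt(variance)              # step 5: take the square root

print(f"{sample_std_dev(team_a):.2f}")
```

This prints `4.88`, matching the hand calculation.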
The Standard Deviation for the other class's grades (Team B) is about 15.92:
Score                      57      65      83      94      95      96      98      93      71      63
Distance from Mean       -24.5   -16.5     1.5    12.5    13.5    14.5    16.5    11.5   -10.5   -18.5
Distance Squared        600.25  272.25    2.25  156.25  182.25  210.25  272.25  132.25  110.25  342.25
Sum: 2280.5
$\frac{2280.5}{10 - 1} \approx 253.4$
$\sqrt{253.4} \approx 15.92$
Now, let's compare the two classes again:
                      Team A    Team B
Average on the Quiz   81.5      81.5
Standard Deviation    4.88      15.92
Which is the “smarter” class and why?
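The comparison can be reproduced with Python's `statistics` module, whose `stdev` function also divides by (n - 1). A minimal sketch:

```python
import statistics

team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
team_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

for name, scores in (("Team A", team_a), ("Team B", team_b)):
    # Same mean, very different spread around it
    print(f"{name}: mean = {statistics.mean(scores)}, sd = {statistics.stdev(scores):.2f}")
```

This prints a mean of 81.5 for both teams, with standard deviations of 4.88 (Team A) and 15.92 (Team B).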
Skew: “Non-Normal” distribution
INFERENTIAL STATISTICS
• Correlational design (correlation coefficient): How strong is the relationship between the two variables? As one goes up, does the other tend to go up or down, and how consistently?
• Experimental design (statistical significance): How confident am I that the difference between my experimental group and control group is a result of the treatment?
Correlation Coefficient
A statistic that quantifies a relation between two variables
Can be either positive or negative
Falls between -1.00 and 1.00
The value of the number (not the sign) indicates the strength of the relation
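The slides do not name a specific formula, so this sketch assumes the Pearson correlation coefficient, the most common choice. The data below are hypothetical, invented only to show one positive and one negative correlation.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Do the two variables sit above/below their means together (+) or oppositely (-)?
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # always between -1.00 and +1.00

# Hypothetical data, for illustration only
hours_studied = [1, 2, 3, 4, 5]
quiz_score = [60, 65, 75, 80, 90]
hours_tv = [5, 4, 4, 2, 1]

r_study = pearson_r(hours_studied, quiz_score)
r_tv = pearson_r(hours_studied, hours_tv)
print(round(r_study, 2), round(r_tv, 2))
```

This prints `0.99 -0.96`: both relations are strong, one direct and one inverse. Because strength is read from the number and not the sign, comparing `abs(r)` values tells you which correlation is stronger.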
Positive Correlation
Association between variables such that high scores on one variable tend to go with high scores on the other variable; a direct relation between the variables
Negative Correlation
Association between variables such that high scores on one variable tend to go with low scores on the other variable; an inverse relation between the variables
Correlational Research
The correlation technique indicates the degree of association between two variables.
Correlations vary in direction:
• Positive association: increases in the value of variable X are associated with increases in the value of variable Y
• Negative association: increases in the value of variable X are associated with decreases in the value of variable Y
• No relation: values of variable X are not related to values of variable Y
Correlation and the Correlation Coefficient
Correlation: a statistical measure of the extent to which two factors vary together, and thus how well either factor predicts the other
Correlation coefficient (example: r = +.37)
• The sign indicates the direction of the relationship (positive or negative)
• The number indicates the strength of the relationship (0.00 to 1.00)
Check Your Learning
Which is stronger: a correlation of 0.25 or -0.74?
Misleading Correlations: Correlation is NOT Causation
Something to think about: There is a 0.91 correlation between ice cream consumption and drowning deaths.
Does eating ice cream cause drowning? Does grief cause us to eat more ice cream?
Correlation
Correlation is NOT causation: e.g., arm span and height are strongly correlated, yet neither causes the other.
The Limitations of Correlation
Correlation is not causation: an invisible third variable may be driving both variables.
Three Possible Causal Explanations for a Correlation
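One of those explanations, a hidden third variable, can be shown with a small simulation. This is a hypothetical model, assumed for illustration only: daily temperature drives both ice cream sales and drowning risk, and neither outcome affects the other. Removing the known temperature component is a deliberate simplification; real studies would have to estimate it.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

rng = random.Random(1)
# Hypothetical third variable: daily temperature
temperature = [rng.gauss(25, 5) for _ in range(1000)]
# Each outcome depends ONLY on temperature plus independent noise
ice_cream = [t + rng.gauss(0, 2.5) for t in temperature]
drowning_risk = [t + rng.gauss(0, 2.5) for t in temperature]

r_raw = pearson_r(ice_cream, drowning_risk)
# "Hold temperature constant" by subtracting its contribution from each outcome
r_controlled = pearson_r(
    [i - t for i, t in zip(ice_cream, temperature)],
    [d - t for d, t in zip(drowning_risk, temperature)],
)
print(f"raw r = {r_raw:.2f}, r with temperature removed = {r_controlled:.2f}")
```

The raw correlation comes out strong (around 0.8), yet it nearly vanishes once the third variable is taken out, even though neither outcome ever influenced the other.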
Inferential Statistics
Statistical Significance: a computation that determines the degree of confidence that your experimental results occurred due to the treatment and not to chance
How likely is it that results like mine would occur by chance? Statistical significance is a statistical computation and statement of how likely it is that an obtained result occurred by chance.
Statistical significance is calculated by determining the probability that the differences between sets of data occurred by chance rather than as a result of the experimental treatment. The significance level (α) is the pre-determined cutoff for that probability.
The most common pre-determined value is 5% (.05), which means a result counts as significant only if there is a 5% or lower chance that it was obtained by chance.
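"How likely are results like mine to occur by chance?" can be estimated directly with a permutation test. This is one technique among several (the slides do not specify which test is used), and the quiz scores below are hypothetical. If the treatment truly did nothing, the group labels are arbitrary, so we shuffle them many times and count how often chance alone produces a difference at least as large as the observed one.

```python
import random

def permutation_p_value(control, treatment, n_perm=10_000, seed=0):
    """Estimate the probability of a difference in means at least this large
    if group labels were assigned purely by chance."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    k = len(treatment)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # assume the null hypothesis: labels don't matter
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate from ever being exactly 0
    return (as_extreme + 1) / (n_perm + 1)

# Hypothetical scores, invented for illustration
control = [70, 72, 68, 75, 71, 69, 73, 70]
treatment = [80, 82, 79, 85, 81, 78, 84, 83]

p = permutation_p_value(control, treatment)
print(f"p is about {p:.4f}; significant at .05: {p <= 0.05}")
```

Here almost no shuffles match the real gap, so p falls far below .05 and the result would be called statistically significant.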
Statistical Significance and the Null Hypothesis
Two hypotheses need to be formed:
• Research hypothesis: the one being tested by the researcher.
• Null hypothesis: the one that assumes that any differences within the set of data are due to chance and are not significant.
Instead of testing to find the intended result, researchers test the “null,” which is the OPPOSITE of their hypothesis.
If there is a statistically significant difference between the control and the experimental group, and the researcher is confident it is because of the independent variable (IV), he or she REJECTS THE NULL.
Example 1: Caffeine has NO effect on students’ ability to stay awake past 2 a.m.
Example 2: Music has NO effect on subjects’ memory.
The Null Hypothesis
If the experiment reveals an effect (a statistically significant difference between the experimental and control groups), then we REJECT THE NULL.
If the Null Hypothesis is rejected, what does that mean?
• Caffeine and ability to stay awake past 2 a.m.
• Music and memory
If the experiment reveals NO effect (no statistically significant difference between the experimental and control groups), then we ACCEPT THE NULL (strictly speaking, we fail to reject it).
If the Null Hypothesis is accepted, what does that mean?
• Caffeine and ability to stay awake past 2 a.m.
• Music and memory
STATISTICAL SIGNIFICANCE
The significance level (α) is the pre-determined cutoff for how probable it is that results could be obtained by chance. The most common value is 5% (.05), which means a result counts as significant only if there is a 5% or lower chance that it was obtained by chance.
Null hypothesis: “Energy drinks have no effect on AP Calculus exam results.”
The results reveal a probability of occurring by chance (p) of:
• .06 (Reject or accept the null hypothesis?)
• .0008 (Reject or accept the null hypothesis?)
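The decision rule amounts to one comparison. This sketch assumes the "5% or below" wording, so a p-value exactly equal to α counts as significant; some textbooks use a strict "less than" instead. The function name `decide` is illustrative.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a result's p-value with the pre-set significance level alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject (accept) the null hypothesis"

for p in (0.06, 0.0008):
    print(f"p = {p}: {decide(p)}")
```

With α = .05, the .06 result is not significant (accept the null), while the .0008 result is (reject the null).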
Statistical Significance
Null hypothesis: “There is no difference in students’ performance on CSTs whether or not they are fed breakfast beforehand.”
Statistical significance (p):
• .0555
• .04
• .008
ERRORS: Type I (false positive) and Type II (false negative)
Null hypothesis: “Drug X has no effect on anxiety.”
• Type I error (false positive): rejecting the null hypothesis when it is true (Drug X really has no effect!)
• Type II error (false negative): failing to reject the null hypothesis when it is false (Drug X actually does have an effect!)
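A simulation shows why α is also the Type I error rate. The sketch assumes a hypothetical world where the null hypothesis is TRUE (Drug X does nothing, so both groups come from the same normal distribution) and uses a simple two-sample z-test with a known population standard deviation of 1, a simplifying assumption chosen so the p-value can be computed with `math.erfc`.

```python
import math
import random

def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming known population SD sigma."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = diff / se
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

def type_i_error_rate(trials=4000, n=20, alpha=0.05, seed=42):
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Null is true: placebo and drug groups are drawn from the same distribution
        placebo = [rng.gauss(0, 1) for _ in range(n)]
        drug = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p_value(drug, placebo) <= alpha:
            false_positives += 1  # rejected a true null: a Type I error
    return false_positives / trials

rate = type_i_error_rate()
print(f"share of experiments that wrongly rejected the null: {rate:.3f}")
```

The printed share lands close to .05: when the null is true, testing at α = .05 produces a false positive in roughly 5% of experiments.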