Organizing and Analyzing Data

Types of statistical analysis

• DESCRIPTIVE STATISTICS: Organizes data

Measures of central tendency: mean, median, mode

Measures of variability: range, standard deviation

• INFERENTIAL STATISTICS: Analyzes data

Measures of association: correlation coefficient

Measures of causation: statistical significance

Descriptive Statistics: Measures of Central Tendency

Mode: the most frequently occurring score in a distribution

Median: the middle score in a distribution; half the scores are above it and half are below it

Mean: the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores

Central Tendency PRACTICE

Using the data set below, compute the three measures of central tendency.

2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11
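One way to check the practice answers is a short Python sketch using the standard `statistics` module. The variable names are illustrative and not part of the original slides.

```python
import statistics

# Practice data set from the slide
scores = [2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11]

mean = statistics.mean(scores)      # add the scores, divide by how many there are
median = statistics.median(scores)  # middle score of the sorted list
mode = statistics.mode(scores)      # most frequently occurring score

print("mean:", round(mean, 2))
print("median:", median)
print("mode:", mode)
```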

Descriptive Statistics: Measures of Variability

Range: the difference between the highest and lowest score in a set of data

2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11

Standard deviation: a computed measure of how much scores vary around the mean

Calculated by finding the square root of the variance

Defines the shape of the normal distribution curve
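A minimal sketch of both measures of variability for the same data set, assuming the sample version of the variance (dividing by n - 1) is the one intended. `statistics.variance` and `statistics.stdev` are the standard-library sample versions; the variable names are illustrative.

```python
import statistics

scores = [2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11]

# Range: highest score minus lowest score
score_range = max(scores) - min(scores)

# Sample variance (divides by n - 1) and its square root, the standard deviation
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)

print("range:", score_range)
print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2))
```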

Bell Curve: “Normal” Distribution

The red area represents the first standard deviation. About 68% of the data falls within this area. The green area represents the second standard deviation. About 95% of the data falls within the green PLUS the red area. The blue area represents the third standard deviation. About 99.7% of the data falls within the blue PLUS the green PLUS the red area.
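These percentages can be checked with a short sketch, assuming a perfectly normal distribution. It uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the `shares` dictionary is an illustrative name.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
curve = NormalDist(mu=0, sigma=1)

# Share of the data within 1, 2, and 3 standard deviations of the mean
shares = {k: curve.cdf(k) - curve.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```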

Standard Deviation

Two classes took a recent quiz. There were 10 students in each class, and each class had an average score of 81.5.

Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the quiz? Why or why not?

The answer is… No.

The average (mean) does not tell us anything about the distribution or variation in the grades.

Here are Dot-Plots of the grades in each class:

So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of our data.

Why not just give an average and the range of data (the highest and lowest values) to describe the distribution of the data?

Well, for example, let's say that from a set of data, the average is 17.95 and the range is 23.

But what if the data looked like this:

Here is the average

And here is the range

But really, most of the numbers are in this area, and are not evenly distributed throughout the range.

The Standard Deviation is a number that measures how far away each number in a set of data is from their mean.

If the Standard Deviation is large, it means the numbers are spread out from their mean.

If the Standard Deviation is small, it means the numbers are close to their mean.

Here are the scores on the math quiz for Team A:

72, 76, 80, 80, 81, 83, 84, 85, 85, 89

Average: 81.5

The Standard Deviation measures how far away each number in a set of data is from their mean.

For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5?

72 - 81.5 = -9.5

Or, start with the highest score, 89. How far away is 89 from the mean of 81.5?

89 - 81.5 = 7.5

So, the first step to finding the Standard Deviation is to find all the distances from the mean.

Score   Distance from Mean
72      -9.5
76      -5.5
80      -1.5
80      -1.5
81      -0.5
83       1.5
84       2.5
85       3.5
85       3.5
89       7.5

Next, you need to square each of the distances to turn them all into positive numbers.

Score   Distance from Mean   Distance Squared
72      -9.5                 90.25
76      -5.5                 30.25
80      -1.5                  2.25
80      -1.5                  2.25
81      -0.5                  0.25
83       1.5                  2.25
84       2.5                  6.25
85       3.5                 12.25
85       3.5                 12.25
89       7.5                 56.25

Add up all of the squared distances.

Sum: 214.5

Divide by (n - 1), where n represents the amount of numbers you have.

Sum: 214.5

214.5 / (10 - 1) = 23.8

Finally, take the Square Root of the average squared distance.

214.5 / (10 - 1) = 23.8

√23.8 = 4.88

This is the Standard Deviation for Team A: 4.88

The Standard Deviation for the other class grades is 15.91

Score   Distance from Mean   Distance Squared
57      -24.5                600.25
65      -16.5                272.25
83        1.5                  2.25
94       12.5                156.25
95       13.5                182.25
96       14.5                210.25
98       16.5                272.25
93       11.5                132.25
71      -10.5                110.25
63      -18.5                342.25

Sum: 2280.5

2280.5 / (10 - 1) = 253.4

√253.4 = 15.91
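The same steps can be sketched in Python for both classes. This is a minimal illustration of the hand calculation, not a replacement for it; the function and variable names are made up for this example. The printed results should land close to the hand-computed 4.88 and 15.91, with any difference in the last digit coming from rounding.

```python
import math

def sample_std_dev(scores):
    """Standard deviation, following the slide's steps."""
    n = len(scores)
    mean = sum(scores) / n
    distances = [score - mean for score in scores]  # step 1: distance from mean
    squared = [d ** 2 for d in distances]           # step 2: square each distance
    total = sum(squared)                            # step 3: add them up
    variance = total / (n - 1)                      # step 4: divide by (n - 1)
    return math.sqrt(variance)                      # step 5: take the square root

team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
team_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

print("Team A:", round(sample_std_dev(team_a), 2))
print("Team B:", round(sample_std_dev(team_b), 2))
```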

Now, let's compare the two classes again

                      Team A   Team B
Average on the Quiz   81.5     81.5
Standard Deviation    4.88     15.91

Which is the “smarter” class and why?

Class A St. Dev = 4.88

Class B St. Dev = 15.91

Skew: “Non-Normal” distribution

INFERENTIAL STATISTICS

Correlational design

Correlation Coefficient: How strong is the relationship between the two variables? As one goes up, does the other go slightly or more extremely up or down?

Experimental design

Statistical Significance: How confident am I that the difference between my experimental group and control group is a result of the treatment?

Correlation Coefficient

A statistic that quantifies a relation between two variables

Can be either positive or negative

Falls between -1.00 and 1.00

The absolute value of the number (not the sign) indicates the strength of the relation
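As a rough illustration, here is a sketch of the Pearson correlation coefficient computed from scratch. The data (hours studied, quiz scores, errors made) are hypothetical and invented for this example, and `pearson_r` is an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables vary together
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical data, for illustration only
hours_studied = [1, 2, 3, 4, 5]
quiz_score = [55, 60, 70, 72, 85]   # tends to rise with hours studied
errors_made = [9, 7, 6, 4, 1]       # tends to fall with hours studied

r_positive = pearson_r(hours_studied, quiz_score)
r_negative = pearson_r(hours_studied, errors_made)

print("positive correlation:", round(r_positive, 2))
print("negative correlation:", round(r_negative, 2))
```

Strength is compared by ignoring the sign, so `abs(r_negative)` and `abs(r_positive)` are the numbers to compare.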

Positive Correlation

Association between variables such that high scores on one variable tend to go with high scores on the other variable

A direct relation between the variables

Negative Correlation

Association between variables such that high scores on one variable tend to go with low scores on the other variable

An inverse relation between the variables

Correlational Research

The correlation technique indicates the degree of association between 2 variables

Correlations vary in direction:

Positive association: increases in the value of variable 1 are associated with increases in the value of variable 2

Negative association: increases in the value of variable 1 are associated with decreases in the value of variable 2

No relation: values of variable 1 are not related to variable 2 values

Correlation Coefficient

a statistical measure of the extent to which two factors vary together, and thus how well either factor predicts the other

Correlation coefficient: r = +.37

Indicates direction of relationship (positive or negative)

Indicates strength of relationship (0.00 to 1.00)

Check Your Learning

Which is stronger? A correlation of 0.25 or -0.74?

Misleading Correlations: Correlation is NOT Causation

Something to think about

There is a 0.91 correlation between ice cream consumption and drowning deaths.

Does eating ice cream cause drowning? Does grief cause us to eat more ice cream?

Correlation

Correlation is NOT causation, e.g., armspan and height

The Limitations of Correlation

Correlation is not causation. Invisible third variables

Three Possible Causal Explanations for a Correlation

Inferential statistics

Statistical Significance: Computation that determines the degree of confidence that your experimental results occurred due to the treatment and not other factors

How likely/probable are results like mine to occur by chance?

A statistical computation and statement of how likely it is that an obtained result occurred by chance

Statistical significance is calculated by determining the probability that the differences between sets of data occurred by chance or were the result of the experimental treatment.

Statistical Significance (α) reveals the probability level that results could be obtained by chance.

Most common pre-determined value = 5% / .05 (…which means that there is a 5% chance or below that results were obtained by chance)

Statistical Significance and the Null Hypothesis

Two hypotheses need to be formed:

Research hypothesis: the one being tested by the researcher.

Null hypothesis: the one that assumes that any differences within the set of data are due to chance and are not significant.

Instead of testing to find the intended result, researchers test the “Null,” which is the OPPOSITE of one’s hypothesis.

If there is ANY difference between the control and the experimental group, and the researcher is confident it’s because of the IV, he/she REJECTS THE NULL.

Example 1: Caffeine has NO effect on students’ ability to stay awake past 2 a.m.

Example 2: Music has NO effect on subjects’ memory

The Null Hypothesis

If the experiment reveals ANY effect (a statistically significant difference between the experimental and control groups), then we REJECT THE NULL.

If the Null Hypothesis is rejected, what does that mean?

Caffeine and ability to stay awake past 2 a.m.

Music and memory

The Null Hypothesis

If the experiment reveals NO effect (no statistically significant difference between the experimental and control groups), then we ACCEPT THE NULL.

If the Null Hypothesis is accepted, what does that mean?

Caffeine and ability to stay awake past 2 a.m.

Music and memory

The Null Hypothesis

STATISTICAL SIGNIFICANCE

Statistical Significance (α) reveals the probability level that results could be obtained by chance.

Most common pre-determined value = 5% / .05 (…which means that there is a 5% chance or below that results were obtained by chance)

“Energy Drinks have no effects on AP Calculus exam results”

The results reveal a level of significance:

.06 (Reject or Accept the null hypothesis?)

.0008 (Reject or Accept the null hypothesis?)

Statistical significance

Null Hypothesis: “There is no difference between students’ performance on CSTs when they are fed breakfast before or not”

Statistical Significance (α)

- .0555

- .04

- .008
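The decision rule can be sketched in a few lines, assuming each reported significance level is compared against the common 5% cutoff ("5% chance or below" means reject). `ALPHA` and `decide` are illustrative names, not part of the original slides.

```python
ALPHA = 0.05  # most common pre-determined cutoff (5%)

def decide(significance_level, alpha=ALPHA):
    """Compare a reported significance level against the cutoff."""
    if significance_level <= alpha:
        return "reject the null"
    return "accept (fail to reject) the null"

# Values from the breakfast example
for level in (0.0555, 0.04, 0.008):
    print(level, "->", decide(level))
```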

ERRORS: Type I: false positive / Type II: false negative

Null hypothesis: Drug X has no effects on anxiety.

Type I error (False Positive): Reject the null hypothesis when it is true (Drug X really has no effect!)

Type II error (False Negative): Fail to reject the null hypothesis when it is false (Drug X actually does have an effect!)
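The four possible outcomes can be laid out as a small sketch. The function name and arguments are hypothetical, chosen only to mirror the definitions above.

```python
def classify_outcome(null_is_true, rejected_null):
    """Label a decision about the null hypothesis given the (unknown) truth."""
    if null_is_true and rejected_null:
        return "Type I error (false positive)"
    if not null_is_true and not rejected_null:
        return "Type II error (false negative)"
    return "correct decision"

# Drug X really has no effect, but we rejected the null
print(classify_outcome(null_is_true=True, rejected_null=True))
# Drug X actually does have an effect, but we failed to reject the null
print(classify_outcome(null_is_true=False, rejected_null=False))
```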
