Correlation and Causation
Part II – Correlation Coefficient
This video is designed to accompany pages 19-24 in Making Sense of Uncertainty: Activities for Teaching Statistical Reasoning (Van-Griner Publishing Company).
Defining a Need
The Correlation Coefficient is simply a numerical way of summarizing the relationship you’d see between two variables that you could represent with a scatterplot.
Positive association. How strong is it?
Formula for “r”
The correlation coefficient “r” measures the strength of the linear relationship between two variables “x” and “y”.
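The formula shown on this slide did not survive in the text. The column sums used later (Σx, Σy, Σxy, Σx², Σy²) are exactly the inputs to the standard computational form of Pearson's r, so that form is given here as a likely reconstruction. This is an assumption, not a copy of the original slide; n denotes the number of (x, y) pairs.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```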
Before we compute it …
1. It is only appropriate to compute r if the scatterplot of y versus x exhibits a linear trend.
2. r will always be between -1 and 1.
3. r will be negative if the points in the scatterplot have a downward trend from left to right.
4. r will be positive if the points in the scatterplot have an upward trend from left to right.
5. The closer r is to 1 in absolute value, the tighter the cluster of points about the linear trend and the stronger the association between x and y.
6. If r is close to 0 then the association is weak.
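The properties listed above can be checked on small made-up data sets. The sketch below is only illustrative: the function name pearson_r and all of the numbers are hypothetical and do not come from the video. It assumes the standard computational formula for Pearson's r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the sums n, Σx, Σy, Σxy, Σx², Σy² (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Made-up data, chosen only to illustrate the properties.
xs = [1, 2, 3, 4, 5]
upward = [2, 4, 6, 8, 10]     # points fall exactly on a rising line
downward = [10, 8, 6, 4, 2]   # points fall exactly on a falling line
no_trend = [3, 1, 4, 1, 3]    # no upward or downward linear trend

r_up = pearson_r(xs, upward)
r_down = pearson_r(xs, downward)
r_none = pearson_r(xs, no_trend)

print(round(r_up, 2), round(r_down, 2), round(r_none, 2))
```

The rising line gives r at the upper limit of 1, the falling line gives r at the lower limit of -1, and the data with no linear trend gives r equal to 0.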
Simple Scatterplot
[Scatterplot: Glucose Levels (vertical axis, 50 to 110) versus Age (horizontal axis, 15 to 65)]
Moderate, positive correlation?
Compute It!
Subject   Age (x)   Glucose Level (y)   xy     x²     y²
1         43        99                  4257   1849   9801
2         21        65                  1365   441    4225
3         25        79                  1975   625    6241
4         42        75                  3150   1764   5625
5         57        87                  4959   3249   7569
6         59        81                  4779   3481   6561
Σx = 247
Σy = 486
Σxy = 20485
Σx² = 11409
Σy² = 40022
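The slide's final value of r is not preserved in the text. As a check, the sketch below recomputes the column sums from the six (age, glucose) pairs in the table and plugs them into the standard computational formula for Pearson's r. The use of that particular formula is an assumption, and the variable names are illustrative.

```python
from math import sqrt

# The six subjects from the table: age (x) and glucose level (y).
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

n = len(ages)
sum_x = sum(ages)                                   # Σx
sum_y = sum(glucose)                                # Σy
sum_xy = sum(x * y for x, y in zip(ages, glucose))  # Σxy
sum_x2 = sum(x * x for x in ages)                   # Σx²
sum_y2 = sum(y * y for y in glucose)                # Σy²

# Computational form of Pearson's r (assumed to match the slide's formula).
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 2))  # about 0.53
```

The sums agree with the ones listed above, and r comes out near 0.53, which is consistent with reading the scatterplot as a moderate, positive correlation.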
Scatterplots Revisited
[Scatterplot: Student Grades versus Time Spent Studying] r = 0.75
[Scatterplot: Final Exam Score versus Quiz Average] r = 0.02
[Scatterplot: Life Expectancy at Birth versus GNP per capita] Not appropriate to use r since plot is curved
[Scatterplot: LDL Levels versus Hours Exercised] r = -0.93
Got it!
One-Sentence Reflection
The correlation coefficient is the most common numerical measure of the strength of a straight-line relationship between two variables that can be represented by a scatterplot.