rxy
• When two variables are correlated, we can predict a score on one variable from a score on the other
• The stronger the correlation, the more accurate our prediction will be
• We need a measure of the “strength” of a correlation
• We need a number that gets bigger when big numbers are paired with big numbers and small numbers are paired with small numbers
• We need a number that gets smaller when big numbers are paired with small numbers and small numbers are paired with big numbers
• Remember the height/weight example
• A big number indicates this (strong positive correlation):
[Figure: heights (5’, 5’2, 5’4, 5’6, 5’8, 5’10) paired with weights (100, 110, 120, 130, 140, 150) for cases a to f; taller cases tend to be heavier]
• In the same height/weight example, a small number indicates this (strong negative correlation):
[Figure: the same heights and weights for cases a to f; taller cases tend to be lighter]
• Two sets of scores, xi and yi
• What could we do?
$$\sum_{i=1}^{n} x_i y_i$$
• When pairs are multiplied and the products are summed up, the sum is:
– Greatest when big numbers are paired with big numbers and small numbers with small numbers
– Least when small numbers are paired with big numbers and big numbers are paired with small numbers
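A quick way to check the claim about pairings is to try it on a few numbers. The sketch below uses made-up scores (not data from the lecture) and a hypothetical helper named `sum_of_products`.

```python
def sum_of_products(xs, ys):
    """Return x_1*y_1 + x_2*y_2 + ... + x_n*y_n."""
    return sum(x * y for x, y in zip(xs, ys))


# Made-up scores, chosen only for illustration.
xs = [1, 2, 3, 4, 5, 6]

# Big paired with big, small paired with small.
same_order = sum_of_products(xs, [1, 2, 3, 4, 5, 6])
# An arbitrary pairing.
mixed_order = sum_of_products(xs, [3, 1, 6, 2, 5, 4])
# Big paired with small, small paired with big.
opposite_order = sum_of_products(xs, [6, 5, 4, 3, 2, 1])

print(same_order, mixed_order, opposite_order)  # prints 91 80 56
```

The same six numbers appear in every pairing; only the order of the second list changes the sum.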
• Analogy: counts of 3, 2, and 1 paired with pennies, quarters, and loonies
– Pairing the largest count with the most valuable coin gets you the most money
– Pairing the largest count with the least valuable coin gets you the least
• Because 3 x $1 plus 2 x $0.25 plus 1 x $0.01 (= $3.51) is more than 1 x $1 plus 2 x $0.25 plus 3 x $0.01 (= $1.53)
• But there’s a problem
$$\sum_{i=1}^{n} x_i y_i$$
• Not a good measure, because the value ultimately depends on n AND the size of the numbers
• Try this: divide the sum by n
$$\frac{\sum_{i=1}^{n} x_i y_i}{n}$$
• Still not so good: this doesn’t depend on n anymore, but it does depend on the size of the x’s and y’s
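Both problems can be seen on a small made-up data set. This is only a sketch: the scores and the helper names `sum_of_products` and `mean_of_products` are assumptions for illustration.

```python
def sum_of_products(xs, ys):
    """Raw sum of cross-products."""
    return sum(x * y for x, y in zip(xs, ys))


def mean_of_products(xs, ys):
    """Sum of cross-products divided by n."""
    return sum_of_products(xs, ys) / len(xs)


# Made-up scores, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]

# Same pairs listed twice: n doubles, the relationship does not change.
xs_twice, ys_twice = xs * 2, ys * 2

# Add 100 to every score: the relationship does not change, the numbers get bigger.
xs_big = [x + 100 for x in xs]
ys_big = [y + 100 for y in ys]

print(sum_of_products(xs, ys), sum_of_products(xs_twice, ys_twice))    # prints 28 56
print(mean_of_products(xs, ys), mean_of_products(xs_twice, ys_twice))  # prints 7.0 7.0
print(mean_of_products(xs_big, ys_big))                                # prints 10507.0
```

Dividing by n removes the dependence on how many pairs there are, but adding a constant to every score still changes the result.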
• How about multiplying deviation scores?
– This compares each score relative to its respective mean
$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$
• With deviation scores, the value now depends on the spread of the data
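A small sketch of what deviation scores fix and what they do not, again on made-up scores. The helper name `mean_deviation_product` is hypothetical.

```python
def mean_deviation_product(xs, ys):
    """Mean of the cross-products of deviations from each variable's mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Made-up scores, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]

original = mean_deviation_product(xs, ys)
# Adding 100 to every score leaves the deviations unchanged.
shifted = mean_deviation_product([x + 100 for x in xs], [y + 100 for y in ys])
# Multiplying every score by 10 spreads the data out.
stretched = mean_deviation_product([10 * x for x in xs], [10 * y for y in ys])

print(original, shifted, stretched)  # prints 0.75 0.75 75.0
```

Shifting the scores no longer matters, but stretching them (changing the spread) multiplies the result by 100 even though the pattern of the pairs is the same.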
• So standardize the scores
$$\frac{\sum_{i=1}^{n} \frac{(x_i - \bar{x})}{S_x}\,\frac{(y_i - \bar{y})}{S_y}}{n}$$
• This measures the strength of a correlation:
$$\frac{\sum_{i=1}^{n} \frac{(x_i - \bar{x})}{S_x}\,\frac{(y_i - \bar{y})}{S_y}}{n} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n} = r_{xy}$$
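The z-score formula can be sketched in a few lines of Python. This assumes Sx and Sy are standard deviations computed by dividing by n (to match the division by n in the formula), so it uses `statistics.pstdev`. The scores and the names `z_scores` and `pearson_r` are made up for illustration.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Convert raw scores to z-scores using the divide-by-n standard deviation."""
    mean = fmean(values)
    sd = pstdev(values)  # assumption: S is computed with n, not n - 1
    return [(v - mean) / sd for v in values]  # fails if all values are equal (sd = 0)


def pearson_r(xs, ys):
    """Mean of the cross-products of z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Made-up scores, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]

r = pearson_r(xs, ys)
# Shifting and stretching the scores should not change r.
r_rescaled = pearson_r([10 * x + 100 for x in xs], [10 * y + 100 for y in ys])

print(round(r, 3), round(r_rescaled, 3))  # prints 0.6 0.6
```

Unlike the earlier attempts, this value stays the same when the scores are shifted or stretched.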
• rxy ranges from -1.0, indicating a perfect negative correlation, to +1.0, indicating a perfect positive correlation
• An rxy of zero indicates no linear relationship whatsoever: scores on one variable are random with respect to scores on the other
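To see why a perfect correlation comes out at exactly +1.0 or -1.0, here is a short derivation. It assumes (as the division by n in the formulas suggests) that Sx is also computed by dividing by n. If every zy equals the matching zx, then

$$r_{xy} = \frac{\sum_{i=1}^{n} z_{x_i} z_{x_i}}{n} = \frac{1}{S_x^2}\cdot\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{S_x^2}{S_x^2} = 1$$

If instead every zy is the negative of the matching zx, the same steps give -1.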
• rxy also has a geometric meaning
• Recall that the mean of the zx and zy distributions is zero and each z-score is a deviation from the mean
• Each point (zx, zy) lands in one of four quadrants
[Figure: a point (zx, zy) plotted against the zx and zy axes]
• Notice that, for each term in
$$r_{xy} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n}$$
– in quadrant I, both zx and zy are positive, so the cross-product is positive
– in quadrant II, zx is negative and zy is positive, so the cross-product is negative
– in quadrant III, both zx and zy are negative, so the cross-product is positive
– in quadrant IV, zx is positive and zy is negative, so the cross-product is negative
• So:
– If most points tend to fall around a line with a positive (45 degree) slope (quadrants I and III), the cross-products will tend to be positive
– If most points tend to fall around a line with a negative slope (quadrants II and IV), the cross-products will tend to be negative
– If the points are randomly scattered about, the negative and positive cross-products tend to cancel
[Figure: the four quadrants, labelled II and I (top row), III and IV (bottom row)]
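The quadrant picture can be checked by counting where the points land. The sketch below uses made-up scores with a mostly positive relationship. The helper names `z_scores` and `quadrant` are hypothetical, and the standard deviation is assumed to divide by n.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Convert raw scores to z-scores using the divide-by-n standard deviation."""
    mean = fmean(values)
    sd = pstdev(values)  # assumption: S is computed with n
    return [(v - mean) / sd for v in values]


def quadrant(zx, zy):
    """Return the quadrant label of a point, or None if it lies on an axis."""
    if zx > 0 and zy > 0:
        return "I"
    if zx < 0 and zy > 0:
        return "II"
    if zx < 0 and zy < 0:
        return "III"
    if zx > 0 and zy < 0:
        return "IV"
    return None


# Made-up scores, chosen only for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
cross_products = []
for zx, zy in zip(z_scores(xs), z_scores(ys)):
    label = quadrant(zx, zy)
    if label is not None:
        counts[label] += 1
    cross_products.append(zx * zy)

r = sum(cross_products) / len(xs)
print(counts)  # prints {'I': 2, 'II': 1, 'III': 2, 'IV': 1}
print(r > 0)   # prints True
```

Four of the six points fall in quadrants I and III, so the positive cross-products outweigh the negative ones and r comes out positive.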
Covariance
• A related measure of the relationship between scores on two different variables is the covariance:
$$S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$
• Notice that the variance (Sx²) is the covariance between a variable and itself!
$$S_{xx} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})}{n} = S_x^2$$
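A minimal sketch of the covariance formula, assuming division by n as written above. The scores and the helper name `covariance` are made up for illustration. It also checks two relationships that follow from the formulas: the covariance of a variable with itself equals its variance, and dividing the covariance by Sx times Sy gives rxy.

```python
from statistics import fmean, pstdev, pvariance


def covariance(xs, ys):
    """Mean cross-product of deviation scores (divide by n)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


# Made-up scores, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]

s_xy = covariance(xs, ys)
s_xx = covariance(xs, xs)              # covariance of x with itself
variance_x = pvariance(xs)             # divide-by-n variance
r = s_xy / (pstdev(xs) * pstdev(ys))   # standardizing the covariance gives r

print(s_xy, s_xx, variance_x)  # prints 0.75 1.25 1.25
print(round(r, 3))             # prints 0.6
```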
Regression
• If two variables are perfectly correlated (r = +1.0 or -1.0), then one can exactly predict a score on one variable given a score on the other
• For example: a university charges a $250 registration fee plus $100 per credit
• tuition = $100(X) + $250
– where X is the number of credits
• Notice that this is a linear relationship, an equation of the form y = ax + b, where:
– a = $100/credit
– b = $250
– x = number of credits
• Tuition as a function of credit hours is a straight line
• There is a perfect correlation between credit hours and tuition
• You could perfectly predict the tuition required given the number of credit hours
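The tuition example can be checked directly. The fee and per-credit cost come from the example above. The credit loads and the helper names `tuition` and `pearson_r` are made up for illustration, and the standard deviation is assumed to divide by n.

```python
from statistics import fmean, pstdev

REGISTRATION_FEE = 250  # b in y = ax + b
COST_PER_CREDIT = 100   # a in y = ax + b


def tuition(credits):
    """Tuition in dollars for a given number of credits."""
    return COST_PER_CREDIT * credits + REGISTRATION_FEE


def pearson_r(xs, ys):
    """Mean cross-product of z-scores (divide-by-n standard deviation)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / len(xs)


# Hypothetical credit loads, chosen only for illustration.
credit_loads = [3, 6, 9, 12, 15]
fees = [tuition(c) for c in credit_loads]

print(fees)  # prints [550, 850, 1150, 1450, 1750]
print(round(pearson_r(credit_loads, fees), 6))  # prints 1.0
```

Because every fee lies exactly on the line, the correlation is 1.0 (up to floating-point rounding) and `tuition` predicts each value without error.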
Next Time
• Regression - read chapter 8