biserial correlation
Dr. Meenakshi Shukla
Assistant Professor
Department of Psychology
Magadh University
Bodh Gaya
Biserial Correlation
What is Biserial Correlation?
❑ Suppose you have a set of bivariate data from a bivariate normal distribution. The
two variables have a correlation, usually called the product-moment correlation
coefficient. Now suppose one of the variables is dichotomized by creating a binary
variable that is zero if the original variable is less than a certain value and one
otherwise.
❑ For example, you may want to calculate the correlation between IQ and the score on
a certain test, but the only measurement available is whether the test was passed
or failed. You could then use the biserial correlation to estimate the more meaningful
product-moment correlation.
• The biserial correlation is a correlation between a continuous variable
and a binary variable, where the binary variable is not a true binary
variable but a continuous variable that has been dichotomized to create a
binary variable.
• Biserial correlation (rbis or rb) is a correlational index that estimates the
strength of a relationship between an artificially dichotomous variable
and a true continuous variable. Both variables are assumed to be
normally distributed in their underlying populations.
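The idea above can be illustrated with a small simulation. This is a sketch under stated assumptions: the true correlation (0.6), the sample size, the seed and the cut point at the mean are all illustrative choices, not values from the slides. It dichotomizes one of two correlated normal variables and compares the plain correlation on the 0/1 variable with the biserial estimate.

```python
import random
from statistics import NormalDist, mean, pstdev

rng = random.Random(0)   # fixed seed so the sketch is reproducible
rho = 0.6                # assumed "true" product-moment correlation
n = 20000                # assumed sample size

latent, scores = [], []
for _ in range(n):
    z1 = rng.gauss(0, 1)
    z2 = rng.gauss(0, 1)
    latent.append(z1)                                   # variable to be dichotomized
    scores.append(rho * z1 + (1 - rho**2) ** 0.5 * z2)  # correlated continuous variable

# Artificial dichotomy: 1 if the latent variable is at or above 0, else 0
group = [1 if v >= 0 else 0 for v in latent]

g1 = [s for s, g in zip(scores, group) if g == 1]
g0 = [s for s, g in zip(scores, group) if g == 0]
p = len(g1) / n
q = 1 - p
sd_t = pstdev(scores)
y = NormalDist().pdf(NormalDist().inv_cdf(p))  # ordinate at the cut point

r_pb = (mean(g1) - mean(g0)) / sd_t * (p * q) ** 0.5  # correlation with the 0/1 variable
r_b = (mean(g1) - mean(g0)) / sd_t * (p * q) / y      # biserial estimate

print(f"point-biserial: {r_pb:.3f}  biserial: {r_b:.3f}  true rho: {rho}")
```

In runs like this the correlation computed on the 0/1 variable understates the underlying correlation, while the biserial estimate lands close to it.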
Assumptions of Biserial Correlation
• Assumption #1: Both of your two variables should be measured on a continuous scale.
• Assumption #2: One of the variables should be made dichotomous. Examples of such
artificial dichotomous variables include Pass or Fail, above 75 or below 75 attendance,
Happy or Sad, and so forth.
• Assumption #3: There should be no outliers in either of the continuous variables. You can
test for outliers using boxplots.
• Assumption #4: Your continuous variables should be approximately normally
distributed. You can test this using the Shapiro-Wilk test of normality.
• Assumption #5: Your continuous variables should have equal variances. You can test this
using Levene's test of equality of variances.
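As a rough illustration of the outlier check in Assumption #3, the sketch below applies the usual boxplot rule (points more than 1.5 × IQR beyond the quartiles) to the study-hours data from the worked example in this transcript. The 1.5 × IQR fence and the quartile method used by Python's `statistics.quantiles` are assumptions here; a statistics package may compute quartiles slightly differently.

```python
from statistics import quantiles

# Study hours of the 14 students in the worked example
study_hours = [2, 3, 3, 4, 5, 3, 3, 2, 1, 0, 3, 5, 0, 1]

q1, _median, q3 = quantiles(study_hours, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values a boxplot would draw as individual outlier points
outliers = [h for h in study_hours if h < lower_fence or h > upper_fence]
print("fences:", lower_fence, upper_fence, "outliers:", outliers)
```

The normality and equal-variance checks (Shapiro-Wilk, Levene) are not in the standard library and are left out of this sketch.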
Formula:
$$r_b = \frac{M_1 - M_0}{SD_t} \times \frac{pq}{y}$$
Where,
$M_0$ = mean score for data pairs for x=0,
$M_1$ = mean score for data pairs for x=1,
q = proportion of data pairs for x=0,
p = proportion of data pairs for x=1,
$SD_t$ = standard deviation of the scores of the total group (all data pairs),
y = ordinate or the height of the standard normal distribution at the point which divides the
proportions of p and q
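The formula can be sketched as a small Python function. The function name `biserial` and its arguments are hypothetical. Two choices are assumptions: the ordinate y is computed from the standard normal distribution instead of being read from a table, and SDt is computed with the n − 1 denominator, which is what the worked example in this transcript uses.

```python
from statistics import NormalDist, mean, stdev

def biserial(scores, groups):
    """Biserial correlation between continuous `scores` and 0/1 `groups`.

    Hypothetical helper: implements r_b = (M1 - M0) / SD_t * (p * q) / y.
    """
    g1 = [s for s, g in zip(scores, groups) if g == 1]
    g0 = [s for s, g in zip(scores, groups) if g == 0]
    if not g1 or not g0:
        raise ValueError("both groups (0 and 1) must be present")

    p = len(g1) / len(scores)   # proportion coded 1
    q = 1 - p                   # proportion coded 0
    sd_t = stdev(scores)        # SD of all scores, n - 1 denominator (assumption)

    # Ordinate of the standard normal curve at the point dividing p and q
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p))

    return (mean(g1) - mean(g0)) / sd_t * (p * q) / y
```

It would be called as `biserial(hours, passed)`, where `passed` holds 1 for Pass and 0 for Fail.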
A teacher wants to determine whether there is a relationship between the results of the
students (Pass or Fail) and the number of hours per week that they devoted to their studies.
The data for 14 students are given below. Calculate the biserial correlation from these
data:
Result Study hours
Pass 2
Pass 3
Pass 3
Pass 4
Pass 5
Pass 3
Pass 3
Pass 2
Pass 1
Fail 0
Fail 3
Fail 5
Fail 0
Fail 1
Result Study hours
Pass (p) 2
Pass (p) 3
Pass (p) 3
Pass (p) 4
Pass (p) 5
Pass (p) 3
Pass (p) 3
Pass (p) 2
Pass (p) 1
Fail (q) 0
Fail (q) 3
Fail (q) 5
Fail (q) 0
Fail (q) 1
Let’s code Pass as 1 and Fail as 0. Then, the proportion of students who passed is denoted by p and the
proportion of students who failed is denoted by q.
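$$M_1 = \frac{2 + 3 + 3 + 4 + 5 + 3 + 3 + 2 + 1}{9} = \frac{26}{9} = 2.89$$
$$M_0 = \frac{0 + 3 + 5 + 0 + 1}{5} = \frac{9}{5} = 1.80$$
$$SD_t = \sqrt{\frac{33.50}{13}} = \sqrt{2.5769} = 1.605$$
Here 33.50 is the sum of squared deviations of all 14 scores from their overall mean (2.5), and 13 = N − 1.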
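The two group means and the total standard deviation can be cross-checked with a few lines of Python. This is only a sketch; the variable names are illustrative, and `statistics.stdev` is used because it divides by n − 1, matching the 13 in the calculation above.

```python
from statistics import mean, stdev

pass_hours = [2, 3, 3, 4, 5, 3, 3, 2, 1]   # x = 1 (Pass)
fail_hours = [0, 3, 5, 0, 1]               # x = 0 (Fail)
all_hours = pass_hours + fail_hours

m1 = mean(pass_hours)                      # 26 / 9
m0 = mean(fail_hours)                      # 9 / 5
grand_mean = mean(all_hours)               # 35 / 14

# Sum of squared deviations of all 14 scores from the grand mean
sum_sq = sum((h - grand_mean) ** 2 for h in all_hours)
sd_t = stdev(all_hours)                    # sqrt(sum_sq / (14 - 1))

print(round(m1, 2), round(m0, 2), sum_sq, round(sd_t, 3))
```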
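$$p = \frac{9}{14} = .64, \qquad q = \frac{5}{14} = .36$$
Area from mean = .50 − .36 = .14
• In the ordinate table, look up .14 under ‘Area from mean’ and read off the
corresponding value of ‘y’, which is the ordinate (here, y = .3739).
[Figure: standard normal curve split 50% / 50% at the mean, with the areas p and q marked]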
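If an ordinate table is not at hand, the same value can be approximated from the standard normal distribution in Python's standard library. This is a sketch, not the slide's method: it works from the unrounded proportions, so the result differs slightly from the tabled .3739, which is based on the rounded area .14.

```python
from statistics import NormalDist

p = 9 / 14   # proportion passed
q = 5 / 14   # proportion failed

std_normal = NormalDist()           # mean 0, SD 1
z_cut = std_normal.inv_cdf(p)       # z-score that leaves proportion p below it
y = std_normal.pdf(z_cut)           # height of the curve at that point (about 0.373)

# The curve is symmetric, so cutting off q in the lower tail gives the same height
y_from_q = std_normal.pdf(std_normal.inv_cdf(q))

area_from_mean = 0.5 - q            # the quantity looked up in the table (about .14)
print(round(area_from_mean, 2), round(y, 4))
```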
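$$
\begin{aligned}
r_b &= \frac{M_1 - M_0}{SD_t} \times \frac{pq}{y} \\
    &= \frac{2.89 - 1.80}{1.605} \times \frac{.64 \times .36}{.3739} \\
    &= \frac{1.09}{1.605} \times \frac{.2304}{.3739} \\
    &= .6791 \times .6162 \\
    &= .42
\end{aligned}
$$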
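The arithmetic of this last step can be re-run directly from the rounded values used on the slides. This is only a check of the calculation; the variable names are illustrative.

```python
# Rounded intermediate values taken from the worked example
m1, m0 = 2.89, 1.80     # group means
sd_t = 1.605            # total standard deviation
p, q = 0.64, 0.36       # proportions passed / failed
y = 0.3739              # ordinate from the table

left = (m1 - m0) / sd_t         # first factor
right = (p * q) / y             # second factor
r_b = left * right

print(round(left, 4), round(right, 4), round(r_b, 2))
```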
Calculating biserial correlation from point-biserial correlation, and vice versa
$$r_b = r_{pb} \times \frac{\sqrt{pq}}{y}, \qquad r_{pb} = r_b \times \frac{y}{\sqrt{pq}}$$
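The conversion in both directions can be sketched as two small functions. The function names are hypothetical. The sketch assumes the standard form of the relation, in which the factor is √(pq) / y, and it takes y from the standard normal distribution instead of a table.

```python
from math import sqrt
from statistics import NormalDist

def ordinate(p):
    """Height of the standard normal curve at the point dividing p and 1 - p."""
    std_normal = NormalDist()
    return std_normal.pdf(std_normal.inv_cdf(p))

def biserial_from_point_biserial(r_pb, p):
    """r_b = r_pb * sqrt(p * q) / y (hypothetical helper)."""
    q = 1 - p
    return r_pb * sqrt(p * q) / ordinate(p)

def point_biserial_from_biserial(r_b, p):
    """r_pb = r_b * y / sqrt(p * q) (hypothetical helper)."""
    q = 1 - p
    return r_b * ordinate(p) / sqrt(p * q)

# Example with the worked example's values: r_b = .42, p = 9/14
r_pb = point_biserial_from_biserial(0.42, 9 / 14)
print(round(r_pb, 3))
```

Because y is never larger than √(pq), the point-biserial value is always smaller in absolute size than the corresponding biserial value.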
Significance testing
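[Equation: computation of the test statistic; the layout was lost in extraction and only the terms 2 × .42, 5, 12 and 514 survive]
z = -.27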
• Using the z-table, check the p-value to determine the significance of the biserial
correlation.
• Remember to multiply the table value by 2 to get a two-tailed p-value in
case of a two-tailed hypothesis.
• The p-value obtained from the table is .39358. Since it is the p-value for a one-tailed test,
multiply it by 2 to get the p-value for a two-tailed test.
• If you have a specific one-tailed hypothesis, then you can use the one-tailed value from the
table and will not need to multiply it by 2.
• To recap the concept of one-tailed and two-tailed tests, see the next two slides.
p-value for two-tailed test:
= .39358 × 2
= .78716
= .79
Since the p-value is .79, the biserial correlation is non-significant. This means that there
is not a significant relationship between result and study hours.
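The table lookup can be cross-checked in Python. This is a sketch that takes the test statistic −.27 from the slide as given and uses the standard normal distribution from the standard library in place of the printed z-table.

```python
from statistics import NormalDist

z = -0.27                                  # test statistic from the slide

# Area in one tail beyond |z| under the standard normal curve
one_tailed_p = NormalDist().cdf(-abs(z))
two_tailed_p = 2 * one_tailed_p            # double it for a two-tailed hypothesis

alpha = 0.05                               # conventional significance level (assumption)
significant = two_tailed_p < alpha

print(round(one_tailed_p, 5), round(two_tailed_p, 2), significant)
```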
Practice question 1:
Question: From the following data, obtain biserial correlation
and interpret the result.
Negative affectivity Scores on Beck Depression Inventory
High 0
Low 12
High 14
High 54
Low 12
High 60
Low 43
Low 36
Low 9
High 58
Practice question 2:
Question: From the following data, obtain biserial correlation
and interpret the result.
Results IQ
Above average 80
Above average 85
Above average 90
Above average 104
Above average 88
Above average 110
Below average 100
Below average 110
Below average 98
Below average 88
Help: https://www.youtube.com/watch?v=RwqkiTDCgnc&t=699s
Practice question 3:
Question: From the following data, obtain biserial correlation and interpret the result. (Hint: Use the Assumed Mean
method to calculate the mean and then follow the regular process of calculating biserial correlation)
Class Interval X (Trained) Y (Untrained)
46-50 2 3
41-45 1 4
36-40 3 5
31-35 4 5
26-30 2 2
21-25 5 5
16-20 2 4
11-15 2 1
6-10 2 2
0-5 1 1
Biserial Correlation using SPSS
• A teacher wants to determine whether there is a relationship between the results of the students (Pass or
Fail) and the number of hours per week that they devoted to their studies. The data of 40 students is
available.
• Therefore, two variables were created in the Variable View of SPSS Statistics: Result, which had two
categories (“Pass” and “Fail”) and StudyHours (i.e., a variable denoting the number of hours per week that a
student devoted to studies).
Click Analyze > Correlate > Bivariate... on the top menu, as shown below:
You will be presented with the following Bivariate Correlations screen:
SPSS Output
Thank you…