correlation
Posted on 20-Nov-2014
04/08/23 Dr Tarek Amin 1
Investigating the Relationship between Two or More Variables (Correlation)
Dr. Tarek Tawfik
The Relationship Between Variables
The variables can be categorized into two main types when investigating their relationship:
I-Dependent: A dependent variable is explained by, or affected by, an independent variable.
Example: age and height
II-Independent: Two variables are independent if the pattern of variation in the scores for one variable is not related to, or associated with, the variation in the scores for the other variable.
Example: the level of education in Ecuador and infant mortality in Mali
Techniques Used to Analyze the Relationship between Two Variables
Method: Tabular and graphical methods. These present data in a way that reveals a possible relationship between two variables.
Examples:
- Bivariate table (e.g., a 2x2 table) for categorical (nominal/ordinal) data
- Scatter plot for interval/ratio data
Method: Numerical methods. These are mathematical operations that quantify, in a single number, the strength and direction of a relationship (measures of association).
Examples:
- Lambda, Cramér’s V (nominal)
- Gamma, Somers’ d, Kendall’s tau-b/c (ordinal with few values)
- Spearman’s rank-order correlation coefficient (ordinal scales with many values)
- Pearson’s product-moment correlation (interval/ratio)
- Regression
Collectively, these techniques are called bivariate descriptive statistics.
Correlation
Correlational techniques are used to study relationships. They may be used in exploratory studies, in which one needs to determine whether relationships exist, and in hypothesis testing about a particular relationship.
Pearson Correlation: Numeric (interval/ratio) Data
The Pearson product-moment correlation coefficient (r for a sample; ρ, or rho, for the population) is the usual method by which the relationship between two variables is quantified.
Type of data required: interval/ratio (sometimes ordinal) data, with at least two measures on each subject at the interval/ratio level.
Assumptions: certain assumptions must be met if we are to generalize beyond the sample statistics, that is, if we are to make inferences about the population itself:
- The sample must be representative of the population.
- The variables being correlated must be normally distributed.
- The relationship between the variables must be LINEAR.
Directions of Correlations on Scatter Plot
(Figure: four scatter plots showing positive, negative, no correlation, and non-linear (curvilinear) patterns.)
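Computing r for a perfect curvilinear pattern shows why the linearity assumption matters. The minimal Python sketch below uses hypothetical data (Y = X², not taken from the slides) and a hand-written `pearson_r` helper. Y is completely determined by X, yet r comes out as 0 because the relationship is not linear.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curvilinear data: Y is exactly X squared
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

# Prints r = 0.00 even though Y depends perfectly (but non-linearly) on X
print(f"r = {pearson_r(x, y):.2f}")
```

A scatter plot should therefore be inspected before r is interpreted. A value near 0 rules out a linear relationship, not every relationship.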
Correlation Coefficient
The correlation coefficient r allows us to state mathematically the relationship that exists between two variables.
The correlation coefficient may range from +1.00 through 0.00 to -1.00.
A value of +1.00 indicates a perfect positive relationship, 0.00 indicates no (linear) relationship, and -1.00 indicates a perfect negative relationship.
The correlation coefficient also tells us the direction of the relationship, that is, whether it is positive or negative.
- Negative (inverse): when one variable increases, the other decreases. For example, the relationship between job satisfaction and job turnover has been shown to be negative.
- Positive: an increase in the score of one variable is accompanied by an increase in the other. For example, students with higher grades have lower dropout rates, that is, higher retention rates, so grades and retention are positively related.
Relationships Measured with the Correlation Coefficient
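The correlation coefficient is the mean of the cross-products of the z-scores:
$$ r = \frac{\sum z_X z_Y}{n} $$
The terms are defined as follows:
- z_X = the z-score of variable X
- z_Y = the z-score of variable Y
- n = the number of observations (pairs)

In this form, the z-scores are computed with the population standard deviation (dividing by n).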
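The formula above can be sketched in Python. This is a minimal illustration and rests on one assumption: the z-scores are computed with the population standard deviation (dividing by n), which is what makes ∑zXzY / n equal to r. The helper names and the five pairs of scores are hypothetical. The deviation-score form of Pearson's r is included only as a cross-check.

```python
import math

def z_scores(values):
    """z-scores using the population SD (divide by n), as the formula assumes."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_from_z(x, y):
    """r = sum(zX * zY) / n: the mean cross-product of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def r_from_deviations(x, y):
    """Deviation-score form of Pearson's r, used here only as a cross-check."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five subjects (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(r_from_z(x, y), 3), round(r_from_deviations(x, y), 3))
```

Both functions give r = 0.8 for these made-up scores. If the z-scores were computed with the sample SD (dividing by n - 1), the sum of cross-products would have to be divided by n - 1 instead.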
Relationships Measured by Correlation Coefficients:
When the formula is used with z-scores, r is the average of the cross-products of the z-scores:
$$ r = \frac{\sum z_X z_Y}{n} $$
Five subjects took a quiz (X), on which scores ranged from 6 to 10, and an examination (Y), on which scores ranged from 82 to 98. Calculate r and determine the pattern of correlation.
Formula for calculating the correlation coefficient r:
$$ r = \frac{\sum z_X z_Y}{n} $$
A perfect positive relationship between two variables
Subject   X (quiz)   Y (examination)   zX      zY      zXzY
1         6          82                -1.42   -1.42   2.0
2         7          86                -0.71   -0.71   0.5
3         8          90                0.00    0.00    0.0
4         9          94                0.71    0.71    0.5
5         10         98                1.42    1.42    2.0
Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = 5.00
r = ∑zXzY / n = 5.00 / 5 = +1.00
(Figure: scatter plot titled "Positive Correlation", with X score on the horizontal axis and Y score on the vertical axis.)
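The quiz/examination table above can be reproduced with a short Python sketch. This is an illustration, not the author's procedure. It uses the population SD, which matches the slide's SD = 1.41 and 5.66. Because the SD is not rounded before dividing, the extreme z-scores print as ±1.41 rather than the slide's ±1.42. The cross-products (2.0, 0.5, 0.0, 0.5, 2.0), their sum of 5.00, and r = +1.00 are unaffected.

```python
import math

def z_scores(values):
    """z-scores using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

quiz = [6, 7, 8, 9, 10]        # X
exam = [82, 86, 90, 94, 98]    # Y

zx, zy = z_scores(quiz), z_scores(exam)
products = [a * b for a, b in zip(zx, zy)]

# One line per subject, mirroring the table's zX, zY and zXzY columns
for subject, (a, b, p) in enumerate(zip(zx, zy, products), start=1):
    print(f"{subject}: zX={a:5.2f} zY={b:5.2f} zXzY={p:4.1f}")

r = sum(products) / len(quiz)
print(f"sum of zXzY = {sum(products):.2f}, r = {r:+.2f}")
```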
Perfect negative relationship
Subject   X    Y    zX      zY      zXzY
1         6    98   -1.42   1.42    -2.0
2         7    94   -0.71   0.71    -0.5
3         8    90   0.00    0.00    0.0
4         9    86   0.71    -0.71   -0.5
5         10   82   1.42    -1.42   -2.0
Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = -5.00
r = ∑zXzY / n = -5.00 / 5 = -1.00
(Figure: scatter plot titled "Negative Correlation", with X score on the horizontal axis and Y score on the vertical axis.)
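Reversing the order of the Y scores, as in the negative table, flips the sign of each zY and therefore of every cross-product. The sketch below uses hypothetical helper names and the population SD. It checks that the products become -2.0, -0.5, 0.0, -0.5, -2.0 and that r = -1.00.

```python
import math

def z_scores(values):
    """z-scores using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_from_z(x, y):
    """r as the mean cross-product of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

quiz = [6, 7, 8, 9, 10]               # X
exam_reversed = [98, 94, 90, 86, 82]  # Y, the same scores paired in reverse order

products = [a * b for a, b in zip(z_scores(quiz), z_scores(exam_reversed))]
print("zXzY:", [round(p, 1) for p in products])
print(f"r = {r_from_z(quiz, exam_reversed):+.2f}")
```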
No relationship
Subject   X    Y    zX      zY      zXzY
1         6    94   -1.42   0.71    -1.0
2         7    82   -0.71   -1.42   1.0
3         8    90   0.00    0.00    0.0
4         9    98   0.71    1.42    1.0
5         10   86   1.42    -0.71   -1.0
Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = 0.00
r = ∑zXzY / n = 0.00 / 5 = 0.00
(Figure: scatter plot titled "No Correlation", with X score on the horizontal axis and Y score on the vertical axis.)
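In the no-relationship table, the positive and negative cross-products cancel. The minimal sketch below uses hypothetical helper names and the population SD. It confirms the products -1.0, 1.0, 0.0, 1.0, -1.0 and an r of zero, tested with a tolerance to allow for floating-point rounding.

```python
import math

def z_scores(values):
    """z-scores using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_from_z(x, y):
    """r as the mean cross-product of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

quiz = [6, 7, 8, 9, 10]                # X
exam_scrambled = [94, 82, 90, 98, 86]  # Y, paired with no systematic order

products = [a * b for a, b in zip(z_scores(quiz), z_scores(exam_scrambled))]
print("zXzY:", [round(p, 1) for p in products])

r = r_from_z(quiz, exam_scrambled)
print("r is 0:", math.isclose(r, 0.0, abs_tol=1e-12))
```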
Kass et al., 1991
Five variables were included:
- Smoking history is ordinal, scored from 0 to 2 (0 = never, 1 = quit, 2 = still smoking).
- Depressed state of mind is also ordinal, ranging from 1 (rarely) to 4 (routinely).
- Overall state of health is a 10-point rating (1 = very ill to 10 = very healthy).
- Quality of life in the past 6 months is a 6-point scale (1 = very dissatisfied to 6 = extremely happy).
- The total score on the Inventory of Positive Psychological Attitude (IPPA) ranges from 30 to 210.
• Correlation coefficients were calculated to draw the following conclusions regarding smoking behavior and quality of life in the sample (the 0.05 significance level, i.e., 95% confidence, was selected).
Correlations (each cell: Pearson r, Sig. (2-tailed), No.)
                          Smoking history      Depressed state of mind   Overall state of health   Quality of life
Depressed state of mind   .227*, .000, 442
Overall state of health   .200*, .000, 441     -.409*, .000, 444
Quality of life           -.102*, .033, 440    -.513**, .000, 443        .437**, .000, 420
Total IPPA score          -.147**, .000, 418   -.674**, .000, 421        .457**, .000, 420         .599**, .000, 419
Because the means and standard deviations of two variables usually differ, we cannot compare their raw scores directly.
However, we can transform the raw scores into z-scores, which have a mean of 0 and an SD of 1.
The correlation is then the mean of the cross-products of the z-scores for each pair of values. It is a measure of how much each pair of observations (scores) varies together.
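The sketch below makes the z-score idea concrete. It uses hypothetical scores on two different scales, and the helper names are illustrative. It checks that z-scores have a mean of 0 and an SD of 1. It also shows that multiplying every X score by a constant (here, converting a score out of 20 into a percentage) leaves r unchanged, because the z-scores themselves do not change.

```python
import math

def z_scores(values):
    """Standardize raw scores: subtract the mean, divide by the population SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_from_z(x, y):
    """Correlation as the mean cross-product of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical paired scores measured on different scales
x = [12, 15, 11, 18, 14]   # e.g., a test scored out of 20
y = [55, 70, 60, 85, 65]   # e.g., a test scored out of 100

zx = z_scores(x)
z_mean = sum(zx) / len(zx)
z_sd = math.sqrt(sum(z * z for z in zx) / len(zx))
print("mean of z is 0:", math.isclose(z_mean, 0.0, abs_tol=1e-12))
print("SD of z is 1:", math.isclose(z_sd, 1.0))

# Rescaling X to a percentage changes the raw scores but not the z-scores,
# so r stays the same
x_percent = [v * 5 for v in x]
print(f"r = {r_from_z(x, y):.3f} vs {r_from_z(x_percent, y):.3f}")
```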
Strength of the Correlation Coefficient
How large should r be for it to be useful? For decision making, r should be at least 0.95, whereas in studies concerning human behavior an r of 0.5 is considered fair. The strength of r is described as follows:
- 0.00-0.25: little, if any
- 0.26-0.49: low
- 0.50-0.69: moderate
- 0.70-0.89: high
- 0.90-1.00: very high
The direction of the relationship does not affect its strength: a correlation of -.90 is just as high, or just as strong, as one of +.90.
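The bands above can be expressed as a small lookup. This Python sketch rests on one assumption not stated on the slide: |r| is rounded to two decimals before comparison, so that values with more decimals fall into one of the printed bands. Taking the absolute value reflects the point that direction does not affect strength. The function name is hypothetical.

```python
def describe_strength(r: float) -> str:
    """Verbal label for |r| using the slide's bands (two-decimal rounding assumed)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = round(abs(r), 2)  # strength ignores the sign of r
    if size <= 0.25:
        return "little if any"
    if size <= 0.49:
        return "low"
    if size <= 0.69:
        return "moderate"
    if size <= 0.89:
        return "high"
    return "very high"

# A few coefficients from the Kass et al. table, plus the +/-.90 pair
for r in (0.227, -0.409, -0.674, 0.90, -0.90):
    print(f"r = {r:+.3f}: {describe_strength(r)}")
```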
Significance of the Correlation
The level of statistical significance is greatly affected by the sample size n. If r is based on a sample of 1,000, it is much more likely to represent the population correlation (minimal random variation) than if it were based on a sample of 10.
With a two-tailed test and a sample of 100, r = 0.20 is statistically significant at the 0.05 level. With a sample of 10, however, the correlation must be high (0.632 or more) to be significant.
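The sample-size effect can be checked with the usual t test for a correlation, t = r√(n - 2) / √(1 - r²), with n - 2 degrees of freedom. Rearranging this gives the smallest significant r for a given critical t. In the Python sketch below, the two-tailed 0.05 critical t values (2.306 for df = 8 and 1.984 for df = 98) are typed in from a standard t table rather than computed. The function names are hypothetical.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """t statistic for H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| reaching significance, given the critical t for df = n - 2."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Two-tailed critical t values at the 0.05 level, copied from a t table (assumed)
T_CRIT = {8: 2.306, 98: 1.984}

print(f"n = 10:  r must reach {critical_r(T_CRIT[8], 10):.3f}")
print(f"n = 100: r must reach {critical_r(T_CRIT[98], 100):.3f}")
```

The first value rounds to 0.632, as on the slide. The second is just under 0.20, so r = 0.20 with n = 100 is (narrowly) significant. As a further illustration, `t_from_r(-0.102, 440)` is about -2.15. That lies beyond the two-tailed critical value of roughly 1.97 for df = 438, which is consistent with the single asterisk on that cell of the Kass et al. table.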
With large sample sizes, values of r described as showing "little, if any" relationship can still be statistically significant.
Statistical significance implies that r is unlikely to have occurred by chance alone, that is, the population correlation is unlikely to be zero.
The following table is SPSS output describing the correlations among age, education in years, smoking history, satisfaction with current weight, and overall state of health for a randomly selected sample of subjects.
Correlations (each cell: Pearson Correlation, Sig. (2-tailed), N)
                                   Subject's age        Education in years   Smoking history      Satisfaction with current weight
Education in years                 .022, .649, 419
Smoking history                    .143**, .003, 432    -.108*, .026, 423
Satisfaction with current weight   -.077, .109, 432     .033, .493, 424      -.009, .849, 440
Overall state of health            -.126**, .009, 433   .149**, .000, 425    -.200*, .000, 441    .370*, .000, 443
*Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
Spearman’s Rank-Order Correlation Coefficient
Spearman’s rho is used when ordinal data have a wide range of possible scores (more than five categories) and the data cannot be collapsed into fewer categories.
When we have two ordinal scales with a large number of values, or one ordinal and one interval/ratio variable, Spearman’s rho (Spearman’s rank-order correlation coefficient) is indicated.
Example
A physiotherapist uses a new treatment on a group of patients and is interested in whether their ages affect their ability to respond to the treatment.
Each patient is given a mobility score out of 15, according to his or her ability to perform certain tasks.
Age and mobility score with rankings
Patient   Age   Ranking on age   Mobility score   Ranking on mobility
1         23    1                14               15
2         25    2                15               16
3         28    3                12               13
4         30    4                8                5
5         35    5                13               14
6         37    6                10               10
7         38    7                11               12
8         39    8                8                5
9         40    9                10               10
10        41    10               9                7.5
11        45    11               10               10
12        50    12               9                7.5
13        52    13               7                3
14        55    14               8                5
15        60    15               4                1
16        62    16               6                2
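Ranking with ties can be sketched in Python. Tied scores share the mean of the ranks they occupy. This is how the three mobility scores of 8 (sorted positions 4 to 6) all receive rank 5, and the two scores of 9 receive rank 7.5. The `average_ranks` helper is hypothetical and was written only for this illustration.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1
        shared_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks

ages = [23, 25, 28, 30, 35, 37, 38, 39, 40, 41, 45, 50, 52, 55, 60, 62]
mobility = [14, 15, 12, 8, 13, 10, 11, 8, 10, 9, 10, 9, 7, 8, 4, 6]

print(average_ranks(ages))
print(average_ranks(mobility))
```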
Calculating the difference in rank (D) for each person
Patient   Ranking on age   Ranking on mobility   Rank difference D   D²
1         1                15                    1 - 15 = -14        196
2         2                16                    2 - 16 = -14        196
3         3                13                    3 - 13 = -10        100
4         4                5                     -1                  1
5         5                14                    -9                  81
6         6                10                    -4                  16
7         7                12                    -5                  25
8         8                5                     3                   9
9         9                10                    -1                  1
10        10               7.5                   2.5                 6.25
11        11               10                    1                   1
12        12               7.5                   4.5                 20.25
13        13               3                     10                  100
14        14               5                     9                   81
15        15               1                     14                  196
16        16               2                     14                  196
∑D² = 1225.5
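The D and D² columns can be checked in a few lines of Python. The rank lists are copied from the table above. As a sanity check, the differences D always sum to zero when both variables are ranked 1 to n, and here ∑D² = 1225.5.

```python
# Ranks copied from the age/mobility table (ties already averaged)
rank_age = list(range(1, 17))
rank_mobility = [15, 16, 13, 5, 14, 10, 12, 5, 10, 7.5, 10, 7.5, 3, 5, 1, 2]

# D = ranking on age minus ranking on mobility, for each patient
d = [a - m for a, m in zip(rank_age, rank_mobility)]
d_squared = [x * x for x in d]

print("D:", d)
print("sum of D:", sum(d))          # 0 when both variables are fully ranked
print("sum of D^2:", sum(d_squared))
```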
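Calculating Spearman’s rho
$$ r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 1225.5}{16(16^2 - 1)} = 1 - \frac{7353}{4080} \approx -0.80 $$
Here D is the difference between each patient's two ranks and n = 16 is the number of patients. The result is a high negative correlation: older patients tended to have lower mobility scores.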
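The final step can be sketched in Python; the helper names are hypothetical. The shortcut formula gives about -0.80. Because the mobility scores contain ties, the shortcut is only approximate. Pearson's r computed directly on the (averaged) ranks, which is the usual way to handle ties, gives about -0.81 for the same data. Either way, the correlation is high and negative.

```python
import math

def spearman_shortcut(sum_d_squared: float, n: int) -> float:
    """r_s = 1 - 6*sum(D^2) / (n(n^2 - 1)); exact only when there are no tied ranks."""
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))

def pearson_r(x, y):
    """Pearson's r; applied to ranks, it gives Spearman's rho with ties handled exactly."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rank_age = list(range(1, 17))
rank_mobility = [15, 16, 13, 5, 14, 10, 12, 5, 10, 7.5, 10, 7.5, 3, 5, 1, 2]

print(f"shortcut formula: {spearman_shortcut(1225.5, 16):.3f}")
print(f"Pearson r on ranks: {pearson_r(rank_age, rank_mobility):.3f}")
```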
Thank you