correlation

06/06/22 Dr Tarek Amin 1 Investigating the Relationship between Two or More Variables (Correlation) Dr. Tarek Tawfik


Page 1: Correlation

04/08/23 Dr Tarek Amin 1

Investigating the Relationship between Two or More Variables (Correlation)

Dr. Tarek Tawfik

Page 2: Correlation

The Relationship Between Variables

The variables can be categorized into two main types when investigating their relationship:

I-Dependent: A dependent variable is explained or affected by an independent variable.

Example: age and height (height depends on age).

Page 3: Correlation

II-Independent: Two variables are independent if the pattern of variation in the scores for one variable is not related to, or associated with, variation in the scores for the other variable.

Example: the level of education in Ecuador and infant mortality in Mali.

Page 4: Correlation

Techniques used to Analyze the Relationship between Two Variables

Method: Tabular and graphical methods. These present data in a way that reveals a possible relationship between two variables.

Examples:

- Bivariate table (2x2 table) for categorical data (nominal/ordinal data)
- Scatter plot for interval/ratio data

Method: Numerical methods. Mathematical operations used to quantify, in a single number, the strength and direction of a relationship (measures of association).

Examples:

- Lambda, Cramer’s V (nominal)
- Gamma, Somers’ d, Kendall’s tau-b/c (ordinal with few values)
- Spearman’s rank-order correlation coefficient (ordinal scales with many values)
- Pearson’s product-moment correlation (interval/ratio)
- Regression

These techniques are collectively called bivariate descriptive statistics.

Page 5: Correlation

Correlation

Correlational techniques are used to study relationships. They may be used in exploratory studies, in which one needs to determine whether relationships exist, and in hypothesis testing about a particular relationship.

Page 6: Correlation

Pearson Correlation: Numeric (interval/ratio)

The Pearson product-moment correlation coefficient (r for a sample, rho for the population) is the usual method by which the relationship between two variables is quantified.

Type of data required:

Interval/ratio, sometimes ordinal data. At least two measures on each subject at the interval/ratio level.

Page 7: Correlation

Assumptions: Certain assumptions must be made if we are to generalize beyond the sample statistics; that is, if we are to make inferences about the population itself:

- The sample must be representative of the population.
- The variables that are being correlated must be normally distributed.
- The relationship between the variables must be LINEAR.

Page 8: Correlation

Directions of Correlations on Scatter Plot

[Figure: four scatter plot panels labelled Positive, Negative, No Correlation, and Non-linear (Curvilinear).]

Page 9: Correlation

Correlation Coefficient

The correlation coefficient r allows us to state mathematically the relationship that exists between two variables.

The correlation coefficient may range from +1.00 through 0.00 to -1.00.

+1.00 indicates a perfect positive relationship, 0.00 indicates no (linear) relationship, and -1.00 indicates a perfect negative relationship.

Page 10: Correlation

The correlation coefficient also tells us the type of relationship that exists; that is, whether it is positive or negative.

- The relationship between job satisfaction and job turnover has been shown to be negative; an inverse relationship exists between them.

- When one variable increases, the other decreases.

- Those with higher grades are more likely to stay enrolled, that is, they have lower dropout rates (a positive relationship between grades and retention).

- An increase in the score of one variable is accompanied by an increase in the other.

Page 11: Correlation

Relationships Measured with Correlation Coefficient

The correlation coefficient is the mean of the cross-products of the z-scores:

$$ r = \frac{\sum z_X z_Y}{n} $$

Where:

z_X = the z-score of variable X
z_Y = the z-score of variable Y
n = number of observations

Page 12: Correlation

Relationships Measured by Correlation Coefficients:

When using the formula with z-scores, r is the average of the cross-products of the z-scores:

$$ r = \frac{\sum z_X z_Y}{n} $$

Five subjects took a quiz X, on which the scores ranged from 6 to 10, and an examination Y, on which the scores ranged from 82 to 98. Calculate r and determine the pattern of correlation.

Page 13: Correlation

Formula for calculating the correlation coefficient r:

$$ r = \frac{\sum z_X z_Y}{n} $$

Page 14: Correlation

A perfect positive relationship between two variables

Subject   X (quiz)   Y (examination)   zX      zY      zXzY
1         6          82                -1.42   -1.42   2.0
2         7          86                -0.71   -0.71   0.5
3         8          90                 0.00    0.00   0.0
4         9          94                 0.71    0.71   0.5
5         10         98                 1.42    1.42   2.0

Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = 5.00

r = ∑zXzY/n = 5.00/5 = +1.00
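The worked example above can be reproduced with a short script. This is a minimal sketch, assuming the z-scores use the population standard deviation (dividing by n), which matches the SD values of 1.41 and 5.66 on the slide; the function names `z_scores` and `pearson_r` are illustrative and not part of the original slides.

```python
from math import sqrt


def z_scores(values):
    """Standardize values using the population SD (divide by n), as on the slide."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson r as the mean of the cross-products of the z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Quiz (X) and examination (Y) scores from the perfect positive example.
quiz = [6, 7, 8, 9, 10]
exam = [82, 86, 90, 94, 98]

r_positive = pearson_r(quiz, exam)
print(round(r_positive, 2))
```

Because the script keeps full precision instead of rounding each z-score to two decimals, its intermediate values differ slightly from the table, but the resulting r is the same.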

Page 15: Correlation

[Figure: Positive Correlation. Scatter plot of Y score (axis 80 to 100) against X score (axis 0 to 15).]

Page 16: Correlation

Perfect negative relationship

Subject   X    Y    zX      zY      zXzY
1         6    98   -1.42    1.42   -2.0
2         7    94   -0.71    0.71   -0.5
3         8    90    0.00    0.00    0.0
4         9    86    0.71   -0.71   -0.5
5         10   82    1.42   -1.42   -2.0

Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = -5.00

r = ∑zXzY/n = -5.00/5 = -1.00

Page 17: Correlation

[Figure: Negative Correlation. Scatter plot of Y score (axis 80 to 100) against X score (axis 0 to 15).]

Page 18: Correlation

No relationship

Subject   X    Y    zX      zY      zXzY
1         6    94   -1.42    0.71   -1.0
2         7    82   -0.71   -1.42    1.0
3         8    90    0.00    0.00    0.0
4         9    98    0.71    1.42    1.0
5         10   86    1.42   -0.71   -1.0

Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = 0.00

r = ∑zXzY/n = 0.00/5 = 0.00
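The three worked tables (perfect positive, perfect negative, no relationship) use the same quiz scores with the examination scores in a different order. The sketch below recomputes r for all three. It uses the form r = covariance / (SD of X times SD of Y), a standard identity that is equivalent to the mean of the z-score cross-products but is not stated on the slides; the name `pearson_r_cov` and the dictionary labels are illustrative assumptions.

```python
from math import sqrt


def pearson_r_cov(x, y):
    """Pearson r as population covariance divided by the product of the SDs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


quiz = [6, 7, 8, 9, 10]

# Examination scores as listed in the three worked tables.
exams = {
    "positive": [82, 86, 90, 94, 98],
    "negative": [98, 94, 90, 86, 82],
    "none": [94, 82, 90, 98, 86],
}

results = {label: pearson_r_cov(quiz, y) for label, y in exams.items()}
for label, r in results.items():
    print(label, round(r, 2))
```

Only the pairing of X and Y changes between the three cases; the means and SDs stay at 8 and 1.41 for X and 90 and 5.66 for Y.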

Page 19: Correlation

[Figure: No Correlation. Scatter plot of Y score (axis 80 to 100) against X score (axis 0 to 15).]

Page 20: Correlation

Kass et al., 1991

Five variables were included. Smoking history is ordinal, scored from 0 to 2 (0 = never, 1 = quit, 2 = still smoking); depressed state of mind is also ordinal, ranging from 1 (rarely) to 4 (routinely); overall state of health is a 10-point rating (1 = very ill to 10 = very healthy); quality of life in the past 6 months is a 6-point scale (1 = very dissatisfied to 6 = extremely happy).

The total score on the Inventory of Positive Psychological Attitude (IPPA) ranges from 30 to 210.

Page 21: Correlation

• Correlation coefficients were calculated to draw conclusions regarding smoking behavior and the quality of life among the included sample (a 0.05 level of significance, that is, 95% confidence, was selected).

Page 22: Correlation

Correlations between smoking history, depressed state of mind, overall state of health, quality of life, and total IPPA score (lower triangle; each cell gives Pearson r, Sig. (2-tailed), and N)

                                             Smoking    Depressed       Overall state   Quality
                                             history    state of mind   of health       of life
Depressed state of mind   Pearson r          .227*
                          Sig. (2-tailed)    .000
                          N                  442

Overall state of health   Pearson r          .200*      -.409*
                          Sig. (2-tailed)    .000       .000
                          N                  441        444

Quality of life           Pearson r          -.102*     -.513**         .437**
                          Sig. (2-tailed)    .033       .000            .000
                          N                  440        443             420

Total IPPA score          Pearson r          -.147**    -.674**         .457**          .599**
                          Sig. (2-tailed)    .000       .000            .000            .000
                          N                  418        421             420             419

Page 23: Correlation

Because the means and standard deviations of any two given variables are different, we cannot directly compare the two sets of scores.

However, we can transform them from the ordinary absolute figures to z-scores with a mean of 0 and an SD of 1.

The correlation is the mean of the cross-products of the z-scores for each pair of values included, a measure of how much each pair of observations (scores) varies together.

Page 24: Correlation

Strength of the Correlation Coefficient

How large should r be for it to be useful? In decision making, at least 0.95; for studies concerning human behavior, 0.5 is fair. The strengths of r are as follows:

0.00 - 0.25  Little if any
0.26 - 0.49  Low
0.50 - 0.69  Moderate
0.70 - 0.89  High
0.90 - 1.00  Very high

The direction of the relationship does not affect the strength of the relationship: a correlation of -.90 is just as high, or just as strong, as one of +.90.
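The strength bands above can be expressed as a small lookup. This is a sketch only: the function name `describe_strength` is hypothetical, and rounding r to two decimals before comparing is an assumption made here to close the gaps between bands (for example, between 0.25 and 0.26), which the slide does not address.

```python
def describe_strength(r):
    """Label the strength of r using the bands listed on the slide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    # Direction is ignored: only the absolute size of r determines strength.
    # Rounding to two decimals is an assumption, not part of the slide.
    size = round(abs(r), 2)
    if size <= 0.25:
        return "little if any"
    if size <= 0.49:
        return "low"
    if size <= 0.69:
        return "moderate"
    if size <= 0.89:
        return "high"
    return "very high"


for value in (-0.90, 0.90, 0.50, 0.20):
    print(value, describe_strength(value))
```

Taking the absolute value first reflects the point that -.90 and +.90 are equally strong.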

Page 25: Correlation

Significance of the Correlation

The level of statistical significance is greatly affected by the sample size n. If r is based on a sample of 1,000, there is a much greater likelihood that it represents the r of the population (minimum random variation) than if it were based on 10.

With a two-tailed test and a sample of 100, r = 0.20 is statistically significant at the 0.05 level, but with a sample of 10, the correlation must be high (0.632 or more) to be significant.
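The two thresholds quoted above can be checked with the usual t test for a correlation, t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom, rearranged to give the smallest significant r. The sketch below is illustrative: the slides do not show this derivation, the names `critical_r` and `is_significant` are hypothetical, and the two critical t values are copied from a standard two-tailed t table at the 0.05 level rather than computed.

```python
from math import sqrt

# Two-tailed critical t values at the 0.05 level, keyed by degrees of freedom.
# Assumption: values taken from a standard t table, not from the slides.
T_CRIT_05 = {8: 2.306, 98: 1.984}


def critical_r(n, t_table=T_CRIT_05):
    """Smallest |r| that is significant for sample size n (df = n - 2)."""
    df = n - 2
    t = t_table[df]
    return t / sqrt(t ** 2 + df)


def is_significant(r, n):
    """True if |r| reaches the critical value for sample size n."""
    return abs(r) >= critical_r(n)


print(round(critical_r(10), 3))   # threshold for a sample of 10
print(round(critical_r(100), 3))  # threshold for a sample of 100
print(is_significant(0.20, 100), is_significant(0.20, 10))
```

The same r of 0.20 passes the test with 100 subjects and fails it with 10, which is the sample-size effect the slide describes.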

Page 26: Correlation

‘With large sample sizes, values of r that are described as demonstrating (little if any) relationship are statistically significant.’

Statistical significance implies that r is unlikely to have occurred by chance alone; that is, the relationship in the population differs from zero.

Page 27: Correlation

The following table is SPSS output describing the correlations between age, education in years, smoking history, satisfaction with current weight, and the overall state of health for randomly selected subjects.

Column variables: Overall state of health, Satisfaction with current weight, Smoking history, Education in years, Subject's age

Each entry gives Pearson Correlation, Sig. (2-tailed), N.

Subject's age: .022, .649, 419

Education in years: -.108*, .026, 423; .143**, .003, 432

Smoking history: -.009, .849, 440; .033, .493, 424; -.077, .109, 432

Satisfaction with current weight: .370*, .000, 443; -.200*, .000, 441; .149**, .000, 425; -.126**, .009, 433

Overall state of health: (no entries)

* Correlation is significant at the 0.05 level (2-tailed).

** Correlation is significant at the 0.01 level (2-tailed).

Page 28: Correlation

Spearman’s rank-order correlation coefficient

Used when ordinal data have a wide range of possible scores (more than 5 categories) and collapsing such data is not possible.

Spearman’s rho, or Spearman’s rank-order correlation coefficient, is indicated where we have two ordinal scales with a large number of values, or one ordinal and one interval/ratio scale.

Page 29: Correlation

Example

A physiotherapist uses a new treatment on a group of patients and is interested in whether their ages affect their ability to respond to treatment.

Each patient is given a mobility score out of 15, according to his or her ability to perform certain tasks.

Page 30: Correlation

Age and mobility score with rankings

Patient   Age   Ranking on age   Mobility   Ranking on mobility
1         23    1                14         15
2         25    2                15         16
3         28    3                12         13
4         30    4                8          5
5         35    5                13         14
6         37    6                10         10
7         38    7                11         12
8         39    8                8          5
9         40    9                10         10
10        41    10               9          7.5
11        45    11               10         10
12        50    12               9          7.5
13        52    13               7          3
14        55    14               8          5
15        60    15               4          1
16        62    16               6          2
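Tied mobility scores in the table share the average of the rank positions they occupy (for example, the three scores of 8 occupy positions 4, 5 and 6 and each receive rank 5). The following sketch shows one way to compute such average ranks; the function name `average_ranks` is illustrative and not from the slides.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) upward, giving ties their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first_position = ordered.index(v) + 1   # 1-based position of first tie
        tie_count = ordered.count(v)
        # Average of positions first_position .. first_position + tie_count - 1.
        rank_of[v] = first_position + (tie_count - 1) / 2
    return [rank_of[v] for v in values]


# Mobility scores for patients 1 to 16, as listed in the table.
mobility = [14, 15, 12, 8, 13, 10, 11, 8, 10, 9, 10, 9, 7, 8, 4, 6]

mobility_ranks = average_ranks(mobility)
print(mobility_ranks)
```

The ages contain no ties, so their ranks are simply 1 to 16 in the order listed.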

Page 31: Correlation

Calculating the difference in rank for each person

Patient   Ranking on age   Ranking on mobility   Rank difference D   D2
1         1                15                    1 - 15 = -14        196
2         2                16                    2 - 16 = -14        196
3         3                13                    3 - 13 = -10        100
4         4                5                     -1                  1
5         5                14                    -9                  81
6         6                10                    -4                  16
7         7                12                    -5                  25
8         8                5                     3                   9
9         9                10                    -1                  1
10        10               7.5                   2.5                 6.25
11        11               10                    1                   1
12        12               7.5                   4.5                 20.25
13        13               3                     10                  100
14        14               5                     9                   81
15        15               1                     14                  196
16        16               2                     14                  196

∑D2 = 1225.5

Page 32: Correlation

Calculating Spearman’s rho

$$ r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6(1225.5)}{16(16^2 - 1)} = -0.80 $$
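The calculation can be reproduced from the two columns of ranks. This is a minimal sketch using the rank-difference formula from the slide; the function names are illustrative. Note that this shortcut formula is exact only when there are no tied ranks, so with the ties in the mobility scores the result is an approximation of the value a tie-corrected method would give.

```python
def sum_squared_differences(rank_x, rank_y):
    """Sum of squared rank differences, the ∑D2 term."""
    return sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))


def spearman_rho(rank_x, rank_y):
    """Spearman's rho from ranks: 1 - 6*sum(D^2) / (n*(n^2 - 1))."""
    n = len(rank_x)
    return 1 - 6 * sum_squared_differences(rank_x, rank_y) / (n * (n ** 2 - 1))


# Ranks for patients 1 to 16, as listed in the ranking table.
age_ranks = list(range(1, 17))
mobility_ranks = [15, 16, 13, 5, 14, 10, 12, 5, 10, 7.5, 10, 7.5, 3, 5, 1, 2]

sum_d2 = sum_squared_differences(age_ranks, mobility_ranks)
rho = spearman_rho(age_ranks, mobility_ranks)
print(sum_d2, round(rho, 2))
```

The negative sign indicates that older patients tended to have lower mobility scores in this sample.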

Page 33: Correlation


Thank you