correlation coefficient linear regression معامل الارتباط و الانحدار

By: Amani Albraikan

Upload: eze

Post on 24-Feb-2016


TRANSCRIPT

Page 1: CORRELATION COEFFICIENT LINEAR  REGRESSION معامل الارتباط  و  الانحدار

CORRELATION COEFFICIENT AND LINEAR REGRESSION

معامل الارتباط و الانحدار

By: Amani Albraikan

Page 2

Pearson r Spearman rho

Page 3

Factors Affecting Correlation

• Linearity
• Range restrictions
• Outliers
• Beware of spurious correlations: take care in interpretation
◦ Example: a high positive correlation between a country’s infant mortality rate and the number of physicians per 100,000 population

Page 4

General Overview of Correlational Analysis

• The purpose is to measure the strength of a linear relationship between two variables.
• A correlation coefficient does not establish “causation” (i.e. that a change in X causes a change in Y).
• X is typically the Input, Measured, or Independent variable.
• Y is typically the Output, Predicted, or Dependent variable.
• If, as X increases, there is a predictable shift in the values of Y, a correlation exists.

Page 5

General Properties of Correlation Coefficients

Values can range between -1 and +1

The value of the correlation coefficient represents the scatter of points on a scatterplot

You should be able to look at a scatterplot and estimate what the correlation would be

You should be able to look at a correlation coefficient and visualize the scatterplot

Page 6

Perfect Linear Correlation

Occurs when all the points in a scatterplot fall exactly along a straight line.

Page 7

Positive Correlation: Direct Relationship

As the value of X increases, the value of Y also increases

Larger values of X tend to be paired with larger values of Y (and consequently, smaller values of X and Y tend to be paired)

Page 8

Negative Correlation: Inverse Relationship

• As the value of X increases, the value of Y decreases

• Small values of X tend to be paired with large values of Y (and vice versa).

Page 9

Non-Linear Correlation

• As the value of X increases, the value of Y changes in a non-linear manner

Page 10

No Correlation

• As the value of X changes, Y does not change in a predictable manner.

• Large values of X seem just as likely to be paired with small values of Y as with large values of Y

Page 11

Interpretation

Depends on what the purpose of the study is… but here is a “general guideline”...

• Value = magnitude of the relationship

• Sign = direction of the relationship

Page 12

Some of the many Types of Correlation Coefficients (there are lots more…)

Name            X variable            Y variable
Pearson r       Interval/Ratio        Interval/Ratio
Spearman rho    Ordinal               Ordinal
Kendall's Tau   Ordinal               Ordinal
Phi             Dichotomous           Dichotomous
Intraclass R    Interval/Ratio Test   Interval/Ratio Retest

Page 13

Some of the many Types of Correlation Coefficients (there are lots more… these are the ones we will focus on this semester)

[Same table as on Page 12: Pearson r, Spearman rho, Kendall's Tau, Phi, Intraclass R]

Included in SPSS “Bivariate Correlation” procedure

Page 14

The Pearson Product-Moment Correlation (r)

Named after Karl Pearson (1857-1936)

Both X and Y measured at the Interval/Ratio level

Most widely used coefficient in the literature

Page 15

The Pearson Product-Moment Correlation (r)

A measure of the extent to which paired scores occupy the same or opposite positions within their own distributions

From: Pagano (1994)

Page 16

Computing Pearson r

Hand Calculation

Page 17

Interpretation

r = 0.73, p = .161

The researchers found a moderate but non-significant relationship between X and Y

Page 18

Interpretation

r = 0.73, p < .001 (displayed by SPSS as .000)

The researchers found a significant moderate relationship between X and Y
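
The two Interpretation slides report the same r with very different p-values: for a fixed r, significance depends on the sample size. A minimal Python sketch of the usual t statistic for testing r against zero, t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom, may make this concrete. The sample sizes 5 and 50 are hypothetical (the slides do not state n), and the function name is illustrative.

```python
import math

def t_statistic_for_r(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical sample sizes (not given on the slides).
for n in (5, 50):
    t = t_statistic_for_r(0.73, n)
    print(f"n = {n:2d}, d.f. = {n - 2:2d}, t = {t:.2f}")
```

The resulting t is compared with the critical value from a t table at n − 2 d.f.; the same r yields a much larger t in the larger sample.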

Page 19

Calculation of Pearson’s Correlation Coefficient r

$$
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}
$$
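
The computational formula on this slide can be sketched in Python as follows. This is a minimal illustration, not the author's code; the function name and the small data set are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from raw sums (the computational formula)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have equal length and at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Illustrative data (made up): a perfect positive and a perfect negative line.
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # close to +1
print(pearson_r(xs, [10 - 3 * x for x in xs]))  # close to -1
```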

Page 20

Pearson’s Correlation Coefficient r

Source data (p.202): Spice sales vs. shelf space

Correlations
                                   shelf_sp   week_sale
shelf_sp    Pearson Correlation    1          .833**
            Sig. (2-tailed)                   .003
            N                      10         10
week_sale   Pearson Correlation    .833**     1
            Sig. (2-tailed)        .003
            N                      10         10

**. Correlation is significant at the 0.01 level (2-tailed).

Page 21

Neither the first approach nor the second can compete numerically with the so-called Pearson product-moment correlation coefficient, despite its complex and seemingly unattractive form, shown below:

CORRELATION COEFFICIENT

Page 22

Choosing the significance level at [value not preserved in transcript], we find that [value not preserved in transcript] for 18 d.f., which allows us to reject the null hypothesis that the correlation coefficient is equal to zero even at such a high significance level.

Our further considerations turn to linear regression, in order to approach the same problem from a somewhat different angle.

On the other hand, it is reasonable to add that the correlation coefficient measures the strength of the linear relation between the two variables considered. In practice, it is convenient to use the guidelines shown below for statistical inference:

CORRELATION COEFFICIENT

Page 23

The Pearson Product-Moment Correlation Coefficient

The relationship between IQ scores and grade point average? (N = 12 uni students)

IQ and Grade Point Average

Student No   X      Y      X²        Y²      XY
1            110    1.0    12,100    1.00    110.0
2            112    1.6    12,544    2.56    179.2
3            118    1.2    13,924    1.44    141.6
4            119    2.1    14,161    4.41    249.9
5            122    2.6    14,884    6.76    317.2
6            125    1.8    15,625    3.24    225.0
7            127    2.6    16,129    6.76    330.2
8            130    2.0    16,900    4.00    260.0
9            132    3.2    17,424    10.24   422.4
10           134    2.6    17,956    6.76    348.4
11           136    3.0    18,496    9.00    408.0
12           138    3.6    19,044    12.96   496.8
Total        1503   27.3   189,187   69.13   3488.7

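Page 24

$$
r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}
= \frac{3488.7 - \frac{1503(27.3)}{12}}{\sqrt{\left[189{,}187 - \frac{(1503)^2}{12}\right]\left[69.13 - \frac{(27.3)^2}{12}\right]}}
= \frac{69.375}{81.088} = 0.856 \approx 0.86
$$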
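
The hand calculation can be checked from the column totals of the IQ and grade point average table. The following Python sketch uses only those totals; the variable names are illustrative.

```python
import math

# Column totals from the IQ / grade point average table (N = 12 students).
N = 12
sum_x, sum_y = 1503, 27.3
sum_x2, sum_y2, sum_xy = 189_187, 69.13, 3488.7

numerator = sum_xy - sum_x * sum_y / N   # sum of cross-products of deviations
ss_x = sum_x2 - sum_x ** 2 / N           # sum of squares for X
ss_y = sum_y2 - sum_y ** 2 / N           # sum of squares for Y
denominator = math.sqrt(ss_x * ss_y)
r = numerator / denominator

print(f"numerator   = {numerator:.3f}")
print(f"denominator = {denominator:.3f}")
print(f"r           = {r:.3f}")
```

Carried at full precision the denominator comes out marginally below the slide's 81.088, a rounding difference that leaves r at about 0.856.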

Page 25

Example

Serotonin Levels and Aggression in Rhesus Monkeys

Subject No.   Serotonin level (microgm/gm), X   Number of Aggressive Acts/day, Y   XY       X²
1             0.32                              6.0                                1.920    0.1024
2             0.35                              3.8                                1.330    0.1225
3             0.38                              3.0                                1.140    0.1444
4             0.41                              5.1                                2.091    0.1681
5             0.43                              3.0                                1.290    0.1849
6             0.51                              3.8                                1.938    0.2601
7             0.53                              2.4                                1.272    0.2809
8             0.60                              3.5                                2.100    0.3600
9             0.63                              2.2                                1.386    0.3969
Total         4.16                              32.8                               14.467   2.0202

Page 26

[Figure: scatterplot, X axis 40 to 140, Y axis 0 to 140; r = 1]

Page 27

[Figure: scatterplot, X axis 40 to 140, Y axis 0 to 140; r = 0.95]

Page 28

[Figure: scatterplot, X axis 40 to 140, Y axis 0 to 140; r = 0.7]

Page 29

[Figure: scatterplot, X axis 40 to 140, Y axis 0 to 160; r = 0.4]

Page 30

[Figure: scatterplot, X axis 40 to 140, Y axis 0 to 140; r = -0.4]

Page 31

[Figure: scatterplot, X axis 40 to 140, Y axis 0 to 140; r = -0.7]

Page 32

[Figure: scatterplot, X axis 40 to 140, Y axis 0 to 140; r = -0.8]

Page 33

[Figure: scatterplot, X axis 40 to 140, Y axis 0 to 140; r = -0.95]

Page 34

[Figure: scatterplot, X axis 40 to 140, Y axis 0 to 140; r = -1]

Page 35

[Figure: scatterplot of Variable Y against Variable X over the full range (both axes 0 to 100), with a second panel, HIGH, restricted to the high group (both axes 60 to 100)]

High Group: r = 0.67

Page 36

Here’s another problem with interpreting Correlation Coefficients that you should watch out for…

[Figure: scatterplot of Y variable against X variable (both axes 0 to 140), with separate markers for Men and Women]

Men: r = -0.21

Women: r = +0.22

All data combined: r = +0.89
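
The effect on this slide (weak correlations within each group, a strong one when the groups are pooled) can be reproduced with a small Python sketch. The data below are made up purely for illustration and are not the data behind the slide; the function name is also illustrative.

```python
import math

def pearson_r(points):
    """Pearson r for a list of (x, y) pairs, using deviations from the means."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: within each group X and Y are unrelated,
# but group B sits higher than group A on both variables.
group_a = [(1, 2), (2, 1), (3, 2), (2, 3)]
group_b = [(x + 10, y + 10) for x, y in group_a]

print(f"group A:  r = {pearson_r(group_a):+.2f}")
print(f"group B:  r = {pearson_r(group_b):+.2f}")
print(f"combined: r = {pearson_r(group_a + group_b):+.2f}")
```

The pooled coefficient is driven almost entirely by the gap between the two group means, which is why subgroup structure should be checked before interpreting a combined r.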

Page 37

Reporting a set of Correlation Coefficients in a table

Complete correlation matrix. Notice redundancy.

Lower triangular correlation matrix. Values are not repeated. There is also an upper triangular matrix!

Page 38

Spearman Rho (rs)

Named after Charles E. Spearman (1863-1945)

Assumptions:

◦ Data consist of a random sample of n pairs of numeric or non-numeric observations that can be ranked.

◦ Each pair of observations represents two measurements taken on the same object or individual.

Photo from: http://www.york.ac.uk/depts/maths/histstat/people/sources.htm

Page 39

Why choose Spearman rho instead of a Pearson r?

Both X and Y are measured at the ordinal level

Sample size is small

X and Y are measured at the interval/ratio level, but are not normally distributed (e.g. are severely skewed)

X and Y do not follow a bivariate normal distribution

Page 40

Table 1. Mean blood pressure readings, millimeters mercury, by doctor.

Doctor ID   Systolic   Diastolic
1           141.8      89.7
2           140.2      74.4
3           131.8      83.5
4           132.5      77.8
5           135.7      85.8
6           141.2      86.5
7           143.9      89.4
8           140.2      89.3
9           140.8      88.0
10          131.7      82.2
11          130.8      84.6
12          135.6      84.4
13          143.6      86.3
14          133.2      85.9

Table 1 sorted by systolic reading, with ranks R(systolic); the two tied readings of 140.2 share rank 8.5.

Doctor ID   Systolic   Diastolic   R(systolic)
11          130.8      84.6        1
10          131.7      82.2        2
3           131.8      83.5        3
4           132.5      77.8        4
14          133.2      85.9        5
12          135.6      84.4        6
5           135.7      85.8        7
2           140.2      74.4        8.5
8           140.2      89.3        8.5
9           140.8      88.0        10
6           141.2      86.5        11
1           141.8      89.7        12
13          143.6      86.3        13
7           143.9      89.4        14

Table 1 sorted by diastolic reading, with ranks R(diastolic).

Doctor ID   Systolic   Diastolic   R(systolic)   R(diastolic)
2           140.2      74.4        8.5           1
4           132.5      77.8        4             2
10          131.7      82.2        2             3
3           131.8      83.5        3             4
12          135.6      84.4        6             5
11          130.8      84.6        1             6
5           135.7      85.8        7             7
14          133.2      85.9        5             8
13          143.6      86.3        13            9
6           141.2      86.5        11            10
9           140.8      88.0        10            11
8           140.2      89.3        8.5           12
7           143.9      89.4        14            13
1           141.8      89.7        12            14

Table 1 back in Doctor ID order, with both sets of ranks.

Doctor ID   Systolic   Diastolic   R(systolic)   R(diastolic)
1           141.8      89.7        12            14
2           140.2      74.4        8.5           1
3           131.8      83.5        3             4
4           132.5      77.8        4             2
5           135.7      85.8        7             7
6           141.2      86.5        11            10
7           143.9      89.4        14            13
8           140.2      89.3        8.5           12
9           140.8      88.0        10            11
10          131.7      82.2        2             3
11          130.8      84.6        1             6
12          135.6      84.4        6             5
13          143.6      86.3        13            9
14          133.2      85.9        5             8

Page 41

Table 1. Mean blood pressure readings, millimeters mercury, by doctor, with ranks, rank differences di = R(systolic) - R(diastolic), and squared differences di².

Doctor ID   Systolic   Diastolic   R(systolic)   R(diastolic)   di     di²
1           141.8      89.7        12            14             -2     4
2           140.2      74.4        8.5           1              7.5    56.25
3           131.8      83.5        3             4              -1     1
4           132.5      77.8        4             2              2      4
5           135.7      85.8        7             7              0      0
6           141.2      86.5        11            10             1      1
7           143.9      89.4        14            13             1      1
8           140.2      89.3        8.5           12             -3.5   12.25
9           140.8      88.0        10            11             -1     1
10          131.7      82.2        2             3              -1     1
11          130.8      84.6        1             6              -5     25
12          135.6      84.4        6             5              1      1
13          143.6      86.3        13            9              4      16
14          133.2      85.9        5             8              -3     9

Σ di² = 132.50

Page 42

Spearman’s Rank Correlation Coefficient

$$
r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}
$$

d_i = the difference between the ranks of corresponding values of x and y

n = the number of pairs of values

Page 43

Spearman’s Rank Correlation Coefficient (example)

$$
r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}
$$

x     y     x_rank   y_rank   rank difference (d)   d²
200   100
100   150
500   185
300   180
400   170
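
The rank, difference, and sum steps for Spearman's coefficient can be sketched in Python. This is a minimal illustration applied to the five (x, y) pairs on this slide and to the blood pressure data of Table 1; the function names are illustrative, and tied values are given the average of their ranks, as in the R(systolic) column of Table 1.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rs(xs, ys):
    """Return (sum of d^2, r_s) using r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Five-pair example from this slide.
x = [200, 100, 500, 300, 400]
y = [100, 150, 185, 180, 170]
print(spearman_rs(x, y))

# Blood pressure data (Table 1), doctors 1 to 14.
systolic = [141.8, 140.2, 131.8, 132.5, 135.7, 141.2, 143.9,
            140.2, 140.8, 131.7, 130.8, 135.6, 143.6, 133.2]
diastolic = [89.7, 74.4, 83.5, 77.8, 85.8, 86.5, 89.4,
             89.3, 88.0, 82.2, 84.6, 84.4, 86.3, 85.9]
print(spearman_rs(systolic, diastolic))
```

When ties are present, as with the two systolic readings of 140.2, this shortcut formula is only an approximation to the Pearson correlation of the ranks; the sketch follows the slide's formula regardless.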

Page 44

Interpretation of Correlation

• Issue of Causality
  - The existence of a correlation between two variables does not imply causality
  - It is possible that other, confounding variables are responsible for the observed correlation, either in whole or in part
• Description
  - Correlation analysis serves a data-reduction, descriptive function that helps in understanding key variables
• Prediction
  - The descriptive power of correlation analysis gives it potential for prediction
• Common variance
  - The square of the correlation coefficient between two variables, r², indicates the proportion of variance in one variable that is explained by the variance of the other variable.
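
As a worked instance of common variance, take the r of about 0.856 from the IQ and grade point average example earlier in the deck (the choice of example is ours, not the slide's):

```latex
r = 0.856 \quad\Longrightarrow\quad r^2 = (0.856)^2 \approx 0.733
```

That is, roughly 73% of the variance in grade point average is shared with IQ in that sample of 12 students.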

Page 45

Linear Correlation and Linear Regression - Closely Linked

Linear correlation refers to the presence of a linear relationship between two variables, i.e. a relationship that can be expressed as a straight line

Linear regression refers to the set of procedures by which we actually establish that particular straight line, which can then be used to predict a subject’s score on one of the variables from knowledge of the subject’s score on the other variable

Page 46

To draw the regression line, choose two convenient values of X (often near the extremes of the X values to ensure greater accuracy) and substitute them in the formula to obtain the corresponding Y values, and then plot these points and join with a straight line

With the regression equation, we now have a means by which to predict a score on one variable given the information (score) of another variable

◦ E.g. SAT score and collegiate GPA
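
The transcript refers to "the formula" for the regression line without preserving it. A minimal Python sketch, assuming the ordinary least-squares line Y' = a + bX with b = (ΣXY − ΣXΣY/N) / (ΣX² − (ΣX)²/N) and a = mean(Y) − b·mean(X), applied to the serotonin and aggression data from the earlier example slide. The function name and the two X values chosen for plotting are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line Y' = a + b*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b

# Serotonin level (X) and aggressive acts per day (Y) for the nine monkeys.
serotonin = [0.32, 0.35, 0.38, 0.41, 0.43, 0.51, 0.53, 0.60, 0.63]
aggression = [6.0, 3.8, 3.0, 5.1, 3.0, 3.8, 2.4, 3.5, 2.2]

a, b = least_squares_line(serotonin, aggression)
print(f"Y' = {a:.3f} + ({b:.3f}) X")

# Two convenient X values near the extremes give two points to plot and join.
for x in (0.30, 0.65):
    print(f"X = {x:.2f} -> predicted Y = {a + b * x:.2f}")
```

The slope is negative for these data: higher serotonin levels go with fewer predicted aggressive acts per day.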

Page 47

What to do with Outliers

You are stuck with them unless…

Check to see if there has been a data entry error. If so, fix the data.

Check to see if these values are plausible. Is this score within the minimum and maximum score possible? If values are impossible, delete the data. Report how many scores were deleted.

Examine other variables for these subjects to see if you can find an explanation for these scores being so different from the rest. You might be able to delete them if your reasoning is sound.