kendall’s tau correlation
DESCRIPTION
Kendall’s Tau Correlation. 2013/12/10. Definition. The Kendall’s tau correlation is another non-parametric correlation coefficient - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/1.jpg)
Kendall’s Tau Correlation
2013/12/10
![Page 2: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/2.jpg)
The Kendall’s tau correlation is another non-parametric correlation coefficient
Let x1, …, xn be a sample for random variable x and let y1, …, yn be a sample for random variable y of the same size n. There are C(n, 2) possible ways of selecting distinct pairs (xi, yi) and (xj, yj). For any such assignment of pairs, define each pair as concordant, discordant or neither as follows:
Definition
![Page 3: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/3.jpg)
concordant if (xi > xj and yi > yj) or (xi < xj and yi < yj) discordant if (xi > xj and yi < yj) or (xi < xj and yi > yj) neither if xi = xj or yi = yj (i.e. ties are not
counted).
![Page 4: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/4.jpg)
Now let C = the number of concordant pairs and D = the number of discordant pairs. Then define tau as
)2,(nCDC
![Page 5: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/5.jpg)
To easily calculate C – D, it is best to first put all the x data elements in ascending order. If x and y are perfectly correlated, then all the values of y would be in ascending order too. Otherwise, there will be some inversions. For each i, count the number of j > i for which xj < xi. This sum is D. If there are no ties, then C = C(n, 2) – D.
Observation from definition
![Page 6: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/6.jpg)
The value of τ is : This is a result of the fact that there are C(n, 2) pairings. If there are a large number of ties, then
C(n,2) should be replaced by where nx is the number of ties
involving x and ny is the number of ties involving y.
Property 111
])2,([])2,([ yx nnCnnC
![Page 7: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/7.jpg)
The calculation of ny is similar to that of D given above, namely for each i, count the number of j > i for which xi = xj. This sum is ny. Calculating nx is similar, although easier since the xi are in ascending order. Once D, nx and ny are determined then C = C(n, 2) – D– nx – ny. This works well assuming that there are no values of i and j for which xi = xj and yi = yj.
![Page 8: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/8.jpg)
there is a commonly accepted measure of standard error for Kendall’s tau, namely
For sufficiently large n (generally n ≥ 10), the following statistic has a standard normal distribution and so can be used for testing the null hypothesis of zero correlation.
Property 2
)2,(52
31
nCns
52)2,(3
nnC
sz
![Page 9: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/9.jpg)
For smaller values of n the table of critical values found in Kendall’s Tau Table can be used.
![Page 10: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/10.jpg)
A study is designed to check the relationship between smoking and longevity. A sample of 15 men fifty years and older was taken and the average number of cigarettes smoked per day and the age at death was recorded, as summarized in the table in Figure. Can we conclude from the sample that longevity is independent of smoking?
Example of Kendall’s τ Correlation
![Page 11: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/11.jpg)
![Page 12: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/12.jpg)
We begin by sorting the original data in ascending order by longevity and then creating entries for inversions as ties as described above
take a look at how we calculate the value in cell C8, i.e. the number of inversions for the data element in row 8. Since the number of cigarettes smoked by that person is 14 (the value in cell B8), we count the entries in column B below B8 that have value smaller than 14. This is 5 since only the entries in cells B10, B14, B15, B16 and B18 have smaller values. We carry out the same calculation for each of the rows and sum the result to get 76 (the value in cell C19).
![Page 13: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/13.jpg)
This calculation is carried out by putting the array formula =COUNTIF(B5:B18,”<”&B4) in cell C4.
Ties are handled in a similar way, using, for example, the array formula =COUNTIF(B5:B18,”=”&B4) in cell E4.
Since p-value < α, the null hypothesis is rejected, and so we conclude there is a negative correlation between smoking and longevity.
![Page 14: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/14.jpg)
We can also establish 95% confidence interval for tau as follows:
τ ± Zcritical ∙ sτ = -0.471 ± (1.96)(.192) = (-0.848 , -0.094)
![Page 15: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/15.jpg)
Correlation in Relationship to t-test
2013/12/10
![Page 16: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/16.jpg)
Two sample comparison of means testing can be turned into a correlation problem by combining the two samples into one (random valuable x) and setting the random variable y (the dichotomous variable) to 0 for elements in one sample and to 1 for elements in the other sample.
It turns out that the two-sample analysis using the t-test is equival to the analysis of the correlation coefficient using the t-test.
![Page 17: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/17.jpg)
To investigate the effect of a new hay fever drug on driving skills, a researcher studies 24 individuals with hay fever: 12 who have been taking the drug and 12 who have not. All participants then entered a simulator and were given a driving test which assigned a score to each driver as summarized in Figure.
calculate the correlation coefficient γ for x and y, and then test the null hypothesis H0: ρ = 0.
Example
![Page 18: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/18.jpg)
![Page 19: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/19.jpg)
H0: μcontrol = μdrug
Since t = 0.1 < 2.07 = tcrit ( p-value = 0.921 > α =0.05 ) we retain the null
hypothesis; i.e. we are 95% confident that any difference between the two groups is due to chance.
Two sample data analysis
![Page 20: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/20.jpg)
Using correlation testing
![Page 21: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/21.jpg)
The values for p-value and t are exactly the same as those that result from the t-test in Example, again we conclude that the hay fever drug did not offer any significant improvement in driving results as compared to the control.
correlation coefficient γ correlation degree
0.8 above very high
0.6 - 0.8 high
0.4 - 0.6 normal
0.2 - 0.4 low
0.2 below very low
![Page 22: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/22.jpg)
A variable is dichotomous if it only takes two values (usually set to 0 and 1).
The point-biserial correlation coefficient is simply the Pearson’s product-moment correlation coefficient where one or both of the variables are dichotomous.
Property 1
dftt
2
22
![Page 23: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/23.jpg)
where t is the test statistic for two means hypothesis testing of variables x1 and x2 with t ~T(df), x is a combination of x1 and x2 and y is the dichotomous variable
yx
yx
nns
yxt11
)()(
![Page 24: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/24.jpg)
The effect size for the comparison of two means is given by
This means that the difference between the average memory recall score between the control group and the sleep deprived group is only about 4.1% of a standard deviation. Note that this is the same effect size that was calculated in Example
Observation from Example
041.0)1()(2
21
22121
nnnndf
sxxd
![Page 25: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/25.jpg)
Alternatively, we can use φ (phi) as a measure of effect size. Phi is nothing more than r. For this example φ = r = 0.0214. Since r2 = 0.00046, we know that 0.46% of the variation in the memory recall scores is based on the amount of sleep.
A rough estimate of effect size is that r = 0.5 represents a large effect size (explains 25% of the variance), r = 0.3 represents a medium effect size (explains 9% of the variance), and r = 0.1 represents a small effect size (explains 1% of the variance).
![Page 26: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/26.jpg)
Correlation and Chi-square Test for Independence
2013/12/10
![Page 27: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/27.jpg)
In Independence Testing we used the chi-square test to determine whether two variables were independent. We now look at the Example using dichotomous variables.
![Page 28: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/28.jpg)
A researcher wants to know whether there is a significant difference in two therapies for curing patients of cocaine dependence (defined as not taking cocaine for at least 6 months). She tests 150 patients and obtains the results in the figure. Calculate the point-biserial correlation coefficient for the data using dichotomous variables.
Example 1
![Page 29: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/29.jpg)
the point-biserial correlation coefficient :
: the average of group A St : the standard deviation of group A and B A : the ratio of chosen one of A
AX
![Page 30: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/30.jpg)
Chi-square tests for independence
![Page 31: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/31.jpg)
This time let x = 1 if the patient is cured and x = 0 if the patient is not cured, and let y = 1 if therapy 1 is used and y = 0 if therapy 2 is used. Thus for 28 patients x = 1 and y = 1, for 10 patients x = 0 and y = 1, for 48 patients x = 1 and y = 0 and for 46 patients x = 0 and y = 0. If we list all 150 pairs of x and y in a column we can calculate the correlation coefficient to get r = 0.207.
Using dichotomous variables
![Page 32: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/32.jpg)
if ρ = 0 (the null hypothesis), then This property provides an alternative
method for carrying out chi-square tests such as the one we did.
Property 1)1(~ 22 nr
![Page 33: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/33.jpg)
Using Property 1 in Example 1, determine whether there is a significant difference in the two therapies for curing patients of cocaine dependence based on the data in Figure.
Example 2
![Page 34: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/34.jpg)
the p-value = CHITEST(5.67,1) = 0.017 < α = 0.05, we again reject the null hypothesis and conclude there is a significant difference between the two therapies.
![Page 35: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/35.jpg)
If we calculate the value of for independence as in Independence Testing, from the previous observation we conclude that
This gives us a way to measure the effect of the chi-square test of independence.
Observation from Example 2
2
nr
2
![Page 36: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/36.jpg)
there is clearly an important difference between the two therapies (not just a significant difference), but if you look at r we see that only 4.3% of the variance is explained by the choice of therapy.
![Page 37: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/37.jpg)
we calculated the correlation coefficient of x with y by listing all 132 values and then using Excel’s correlation function CORREL. The following is an alternative approach for calculating r, which is especially useful if n is very large.
data needed for calculating
Observation from Example 1
![Page 38: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/38.jpg)
First we repeat the data from Figure in Example 1 using the dummy variables x and y (in range F4:H7). Essentially this is a frequency table. We then calculate the mean of x and y. E.g. the mean of x (in cell F10) is calculated by the formula =SUMPRODUCT(F4:F7,H4:H7)/H8.
![Page 39: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/39.jpg)
Next we calculate , , (in cells L8, M8 and N8). E.g. the first of
these terms is calculated by the formula =SUMPRODUCT(L4:L7,O4:O7). Now the point-serial correlation coefficient is the first of these terms divided by the square root of the product of the other two, i.e.
))(( yyxx ii 2)( xxi 2)( yyi
)88(8NM
Lr
![Page 40: Kendall’s Tau Correlation](https://reader035.vdocuments.mx/reader035/viewer/2022081420/568162e0550346895dd364c3/html5/thumbnails/40.jpg)
Thank you for listening !