Non-Parametric Statistics
Interdepartmental Postgraduate Programme (MSc) in Occupational and Environmental Health: Management and Economic Evaluation
Dimitris Fouskakis
Introduction
So far in the course we've assumed that the data come from some known distribution, e.g. the Normal, or that the Central Limit Theorem holds. Methods of estimation and hypothesis testing have been based on these assumptions. These procedures are usually called parametric statistical methods. If these assumptions are not met, nonparametric statistical methods must be used.
Revision - Inferential Statistics
- Hypothesis testing versus Confidence Intervals
- Parametric versus Nonparametric
- Quantitative data
- Categorical data
- Relation between two variables
- Relation between several variables
What does inferential statistics do?
It helps to quantify how certain we can be when we make inferences from a given sample.
The three approaches:
a) Hypothesis testing
b) Confidence Intervals
c) Both
I know how to do a t-test, but I don’t know when!
Hypothesis Testing
$H_0: W = w_a$
$H_A: W \neq w_a$
α: the probability of a Type I error, or significance level of the test; usually set to a value like 5%.
Power = (1 - β): the power of the test; a common value is 80%.
Power calculations: have I chosen a correct number of observations?

                          Is H0 really true?
Researcher's decision     Yes                  No
Reject H0                 Type I error (α)     Correct decision (power)
Accept H0                 Correct decision     Type II error (β)
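As a rough illustration of the power question, the sketch below uses the textbook Normal-approximation sample size formula for a two-sided one-sample test, n = ((z_{α/2} + z_β)·σ/δ)². The formula, the function name `sample_size` and the effect size in the example are assumptions made for illustration; they do not come from the slides.

```python
import math
from statistics import NormalDist

def sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample z-test (Normal approximation).

    delta: smallest difference from the hypothesized value worth detecting
    sigma: assumed population standard deviation
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 80%
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical example: detect a shift of half a standard deviation.
n_needed = sample_size(delta=0.5, sigma=1.0)
print(n_needed)
```

Halving the detectable difference roughly quadruples the required sample size, which is why the target difference has to be fixed before the study starts.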
Statistical and clinical significance
Statistical significance (P value):
If H0 were true, a sample like this one would be unlikely enough that we reject H0 (usual rule: reject H0 if P value < 0.05; why 0.05 and not 0.04?).
Clinical (practical) significance:
An important finding with implications for your clinical practice.
Summary points for P values
- P values, or significance levels, measure the strength of the evidence against the null hypothesis; the smaller the P value, the stronger the evidence.
- An arbitrary division of results into "significant" or not, according to the P value, was not the intention of the founders.
- A P value of 0.05 provides some but not strong evidence against the null hypothesis, but it is reasonable to say that a P value < 0.001 does.
- Results of medical research should not be reported as "significant" or not but should be interpreted in the context of the type of study and other available evidence.
Correct Definition of the P value
The P value is the chance, if the null hypothesis is true, of getting a test statistic as extreme as or more extreme than the observed one.
The P value is NOT the chance of the null hypothesis being right.
Confidence Intervals (C.I.)
The wrong definition:
There is a 95% (e.g.) chance that the parameter of interest will fall within the particular interval.
The exact definition:
If we take a series of samples from the same population and construct e.g. a 95% confidence interval from each sample, then 95% of these confidence intervals will contain the true parameter.
Application to hypothesis testing:
Check if the interval includes $w_a$, in order to decide if you are going to reject the null hypothesis.
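The repeated-sampling definition can be checked by simulation. The sketch below is illustrative only: the population (Normal with mean 10 and SD 2), the sample size and the number of repetitions are arbitrary assumptions, and a known-sigma z-interval is used to keep the code short.

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean=10.0, sigma=2.0, n=25, reps=2000, level=0.95, seed=0):
    """Fraction of z-intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = mean(sample)
        # Does this sample's interval contain the true parameter?
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps

cov = coverage()
print(f"Proportion of intervals containing the true mean: {cov:.3f}")
```

The proportion should be close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure, not one interval.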
How to choose a statistical test . . .
- The type of data: continuous versus categorical
- The distribution: parametric versus non-parametric
- The sample size
- The number of samples
- The relation of samples to each other: paired versus unpaired
- The number of variables: univariate versus multivariate
Parametric versus Non-Parametric
Parametric methods:
- make distributional assumptions
- usually assume a Normal distribution or use the Central Limit Theorem
- comparable Standard Deviations
Non-parametric methods:
- "distribution-free"
- P value (non-parametric) > P value (parametric), typically, since the non-parametric tests are less powerful when the parametric assumptions hold
- usually no confidence intervals in the non-parametric tests
Statistical methods for continuous data
Univariate tests to compare means:

Number of samples    Parametric           Non-parametric
1                    One-sample t-test    Wilcoxon signed rank sum test
2, paired            Paired t-test        Wilcoxon matched pairs signed rank sum test
2, unpaired          Two-sample t-test    Mann-Whitney U test
3 or more            One-way ANOVA        Kruskal-Wallis test
One Sample
Table 1: Average daily energy intake (kJ) over 10 days of 11 healthy women.

Subject    Average daily energy intake (kJ)
1          5260
2          5470
3          5640
4          6180
5          6390
6          6515
7          6805
8          7515
9          7515
10         8230
11         8770
Mean       6753.6
SD         1142.1

What can we say about the energy intake of these women in relation to a recommended daily intake of 7725 kJ?
One Sample
To answer the question we can carry out a test of the null hypothesis that our data are a sample from a population with a specific hypothesized mean. The test is called the one sample t-test.

$$t = \frac{\text{sample mean} - \text{hypothesized mean}}{\text{standard error of sample mean}} = \frac{\bar{x} - k}{s/\sqrt{n}} = \frac{6753.6 - 7725}{1142.1/\sqrt{11}} = -2.821$$

From the table of the t distribution with n - 1 = 10 df: P value < 0.02, so reject H0.
P value = 2 × (area to the right of |t| under the t distribution with 10 df).
If $t > t_{n-1,\alpha/2}$ or $t < -t_{n-1,\alpha/2}$, reject H0.
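The arithmetic of the one sample t-test can be reproduced from the summary statistics alone. This is a minimal sketch; the function name `one_sample_t` is made up for illustration, and the critical value 2.228 is the tabulated $t_{10,0.025}$ rather than something the code computes.

```python
import math

def one_sample_t(sample_mean, sd, n, hypothesized_mean):
    """t statistic for H0: population mean = hypothesized_mean."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Summary statistics of the 11 women and the recommended intake of 7725 kJ.
t_stat = one_sample_t(6753.6, 1142.1, 11, 7725)
t_crit = 2.228  # t_{10, 0.025}, read from a t table
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 3), reject_h0)
```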
One Sample
Alternatively we could calculate a 95% C.I. for the mean intake:

$$\bar{x} \pm t_{10,0.025} \cdot s/\sqrt{n} = 6753.6 \pm 2.228 \cdot 344.4 = (5986, 7521)$$

This range does not include the recommended level of 7725 kJ. If we assume that the women are a representative sample, then we can infer that for all women of this age the average daily energy consumption is less than is recommended.
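The interval can be checked the same way. In this sketch `mean_ci` is a hypothetical helper name, and the multiplier 2.228 is again the tabulated $t_{10,0.025}$.

```python
import math

def mean_ci(sample_mean, sd, n, t_crit):
    """Confidence interval for a mean: sample_mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

lower, upper = mean_ci(6753.6, 1142.1, 11, t_crit=2.228)
recommended = 7725
inside = lower <= recommended <= upper
print(round(lower), round(upper), inside)
```

Because 7725 falls outside the interval, the interval leads to the same decision as the t-test at the 5% level.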
One Sample
Assumptions:
- The data come from a Normal distribution.
- If the sample size is > 30 then, because of the Central Limit Theorem, we can perform the test even if the data don't look very near to Normal.
- For small samples that are not Normally distributed we should use a non-parametric method like the Sign Test or the Wilcoxon signed rank sum test.
One Sample - The Sign Test (or Binomial Test)
If there were no difference on average between the sample values and the hypothesized specific value, we would expect an equal number of observations above and below the specific value. We can thus use the Binomial distribution, or the Normal approximation of it, to evaluate the probability of the observed frequencies when the true probability of exceeding the hypothesized value is p = 1/2. In our dataset 2 women had daily intakes above 7725 kJ and 9 below. We calculate the following test statistic, where r is the number of observations below (or above) the value and n is the sample size:

$$z = \frac{r - np}{\sqrt{np(1-p)}} = \frac{9 - 5.5}{1.658} = 2.11 \quad \text{or} \quad z = \frac{r - np}{\sqrt{np(1-p)}} = \frac{2 - 5.5}{1.658} = -2.11$$

From the Normal table: P value = 0.035, so reject H0.
P value = 2 × (area to the right of |z| under the N(0,1) distribution).
If $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$, reject H0.
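A short sketch of the Normal approximation to the sign test, using only the two counts. The function name `sign_test_z` is an assumption for illustration; `NormalDist` from the Python standard library plays the role of the Normal table.

```python
import math
from statistics import NormalDist

def sign_test_z(r, n, p=0.5):
    """Normal approximation to Binomial(n, p): z = (r - np) / sqrt(np(1 - p))."""
    return (r - n * p) / math.sqrt(n * p * (1 - p))

n_above, n_below = 2, 9          # intakes above and below 7725 kJ
n = n_above + n_below
z = sign_test_z(n_below, n)
# Two-sided P value: 2 x (area to the right of |z| under N(0,1)).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_value, 3))
```

Using the count above instead of the count below only flips the sign of z, so the two-sided P value is unchanged.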
One Sample - The Sign Test (or Binomial Test)
- If any of the observations is exactly the same as the hypothesized value then we ignore it in the calculation. Thus the sample size is the number of observations that differ from the hypothesized value.
- Because of the small sample size it would be better in the normal approximation to use the continuity correction, i.e. subtract ½ from the absolute value of the numerator.

$$z = \frac{|r - np| - 1/2}{\sqrt{np(1-p)}} = 1.81$$

From the Normal table: P value = 0.07, so do not reject H0.
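The continuity-corrected statistic can be compared with the exact Binomial calculation that the Normal approximation stands in for. This is a sketch under the assumption p = 1/2; the helper name `sign_test` is hypothetical, and the exact P value is computed here as twice the upper tail of Binomial(n, 1/2), which is one common convention.

```python
import math
from statistics import NormalDist

def sign_test(n_above, n_below):
    """Return (z with continuity correction, Normal P value, exact Binomial P value)."""
    n = n_above + n_below
    r = max(n_above, n_below)
    # Normal approximation with continuity correction (p = 1/2).
    z = (abs(r - n / 2) - 0.5) / math.sqrt(n / 4)
    p_normal = 2 * (1 - NormalDist().cdf(z))
    # Exact two-sided P value: twice the upper tail of Binomial(n, 1/2).
    tail = sum(math.comb(n, k) for k in range(r, n + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return z, p_normal, p_exact

z_cc, p_normal, p_exact = sign_test(2, 9)
print(round(z_cc, 2), round(p_normal, 3), round(p_exact, 3))
```

For these data the corrected approximation and the exact calculation agree that the evidence falls short of the 5% level.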
One Sample - The Wilcoxon Signed Rank Test
- Calculate the difference between each observation and the value of interest.
- Ignoring the signs of the differences, rank them in order of magnitude.
- Calculate the sum of the ranks of all the negative (or positive) differences and find the P value from the corresponding table.
More powerful test than the sign test.
One Sample - The Wilcoxon Signed Rank Test
Sum of the ranks of the positive differences: 3 + 5 = 8. From the Wilcoxon Signed Rank Test table: P value < 0.05, so reject H0.
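The rank sum of 8 can be reproduced from the intake data. In this sketch the helper `midrank` is a made-up name; tied absolute differences share the average of the ranks they occupy, which is the usual convention. The P value still has to come from the Wilcoxon table.

```python
def midrank(x, values):
    """Rank of x within values (1-based); ties share the average rank."""
    below = sum(v < x for v in values)
    equal = sum(v == x for v in values)
    return below + (equal + 1) / 2

# Table 1 intakes (kJ); the seventh value is taken as 6805, the value
# consistent with the reported mean of 6753.6.
intake = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
hypothesized = 7725

# Differences from the hypothesized value; exact zeros would be dropped.
diffs = [x - hypothesized for x in intake if x != hypothesized]
abs_diffs = [abs(d) for d in diffs]
ranks = [midrank(a, abs_diffs) for a in abs_diffs]

rank_sum_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
rank_sum_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
print(rank_sum_pos, rank_sum_neg)
```

The two rank sums always add up to n(n+1)/2, here 66, so either one can be looked up in the table.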
Two Groups of Paired Observations
- Paired data arise when the same individuals are studied more than once, usually in different circumstances.
- Also, when we have 2 different groups of subjects who have been individually matched, for example in a matched pair case-control study.
- Very common in Medical Research.
- We are interested in the average difference between the observations for each individual and the variability of these differences.
Two Groups of Paired Observations
Table 2: Mean daily intake over 10 pre-menstrual and 10 post-menstrual days

Dietary intake (kJ)
Subject    Pre-menstrual    Post-menstrual    Difference
1          5260             3910              1350
2          5470             4220              1250
3          5640             3885              1755
4          6180             5160              1020
5          6390             5645              745
6          6515             4680              1835
7          6805             5265              1540
8          7515             5975              1540
9          7515             6790              725
10         8230             6900              1330
11         8770             7335              1435
Mean       6753.6           5433.2            1320.5
SD         1142.1           1216.8            366.7

We can use the one sample t-test to calculate a P value for the comparison of the observed mean difference of 1320.5 kJ with the hypothetical value of zero, i.e. the null hypothesis is that pre- and post-menstrual dietary intake is the same.

$$t = \frac{\bar{d} - 0}{se(\bar{d})} = \frac{1320.5 - 0}{366.7/\sqrt{11}} = 11.94$$

From the table of the t distribution with n - 1 = 10 df: P value < 0.001, so reject H0.
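A paired t-test is a one sample t-test on the within-subject differences, which the following sketch makes explicit. It uses the Difference column of Table 2 and only the Python standard library; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Pre-menstrual minus post-menstrual intake (kJ) for the 11 women.
differences = [1350, 1250, 1755, 1020, 745, 1835, 1540, 1540, 725, 1330, 1435]

n = len(differences)
d_bar = mean(differences)        # mean difference
s_d = stdev(differences)         # sample SD of the differences
se = s_d / math.sqrt(n)          # standard error of the mean difference
t_paired = (d_bar - 0) / se      # H0: mean difference = 0
print(round(d_bar, 1), round(s_d, 1), round(t_paired, 2))
```

Only the differences enter the calculation, so the spread of the pre- and post-menstrual columns themselves plays no role.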
Two Groups of Paired Observations
Alternatively we could calculate a 95% C.I. for the mean difference:

$$\bar{d} \pm t_{10,0.025} \cdot s/\sqrt{n} = 1320.5 \pm 2.228 \cdot 110.6 = (1074.2, 1566.8)$$

This range does not include the hypothesized value of 0 kJ. If we assume that the women are a representative sample, then we can infer that dietary intake is much lower in the post-menstrual period.
Two Groups of Paired Observations
The same assumptions as before hold for the difference data (thus we require normality for the differences, not for each set of data). If these assumptions are not met then we can apply the same non-parametric techniques as before to the difference data. For example, we see that all 11 differences have the same sign, so the test statistic of the sign test with the continuity correction is:

$$z = \frac{|r - np| - 0.5}{\sqrt{np(1-p)}} = \frac{|11 - 5.5| - 0.5}{1.658} = 3.02$$

From the Normal table: P value = 0.003, so reject H0.
Two Independent Groups of Observations
The most common statistical analysis, e.g. clinical trials or observational studies comparing different groups of subjects.

Table: 24 hour total energy expenditure (MJ/day) in groups of lean and obese women.

        Lean (n=13)    Obese (n=9)
1       6.13           8.79
2       7.05           9.19
3       7.48           9.21
4       7.48           9.68
5       7.53           9.69
6       7.58           9.97
7       7.90           11.51
8       8.08           11.85
9       8.09           12.79
10      8.11
11      8.40
12      10.15
13      10.88
Mean    8.066          10.298
SD      1.238          1.398

Is there a true difference in the 24 hour total energy expenditure between lean and obese women?
Two Independent Groups of Observations
To answer this question we can carry out a test of the null hypothesis that the two populations, obese and lean women, have the same mean total energy expenditure. The test is called the two sample t-test.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{se(\bar{x}_1 - \bar{x}_2)} = \frac{10.298 - 8.066}{s_p \sqrt{1/n_1 + 1/n_2}} = 3.95$$

where $s_p$ is the pooled standard deviation given by

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},$$

with $s_i^2$ the variance of the i-th group.

P value < 0.001 (t distribution with n1 + n2 - 2 = 20 df), so reject H0.
If $t > t_{n_1+n_2-2,\alpha/2}$ or $t < -t_{n_1+n_2-2,\alpha/2}$, reject H0.
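The pooled standard deviation and the t statistic can be reproduced from the group summaries. This is a minimal sketch; the function names `pooled_sd` and `two_sample_t` are made up for illustration, and group 1 is taken to be the obese women so that the difference is positive.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled SD: square root of the df-weighted average of the two variances."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance two sample t statistic and the SE of the difference."""
    se = pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, se

# Obese: mean 10.298, SD 1.398, n = 9; lean: mean 8.066, SD 1.238, n = 13.
t_stat, se_diff = two_sample_t(10.298, 1.398, 9, 8.066, 1.238, 13)
df = 9 + 13 - 2
print(round(t_stat, 2), round(se_diff, 4), df)
```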
Two Independent Groups of Observations
Alternatively we could calculate a 95% C.I. for the mean difference:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,0.025} \cdot s_p \sqrt{1/n_1 + 1/n_2} = 2.232 \pm 2.086 \cdot 0.5656 = (1.05, 3.41)$$

This range does not include the value of 0 MJ/day. Thus the total energy expenditure in the obese women is greater than that of the lean women.
Two Independent Groups of Observations
Assumptions:
- Each set of observations is sampled from a population with a Normal distribution and the variances of the two populations are the same.
- If the sample sizes of both groups are > 30 then, because of the Central Limit Theorem, we can perform the test even if the data don't look very near to Normal in either or both groups.
- For small samples that are not Normally distributed, and/or for populations with unequal variances, we should use a non-parametric method, the Mann-Whitney test (or the Wilcoxon rank sum test).
Two Independent Groups of Observations - Mann-Whitney Test
The Mann-Whitney test requires all observations to be ranked as if they were from a single sample. Then T = sum of the ranks in the smaller group (either group can be taken if they have equal size) is calculated and a P value is found from tables. In our case T = 150; from the Mann-Whitney table, P value < 0.01, so reject H0.
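The value T = 150 can be checked by ranking the pooled energy expenditure data. In this sketch `midrank` is a hypothetical helper that gives tied values the average of the ranks they occupy; the P value itself still comes from the Mann-Whitney table.

```python
def midrank(x, values):
    """Rank of x within values (1-based); ties share the average rank."""
    below = sum(v < x for v in values)
    equal = sum(v == x for v in values)
    return below + (equal + 1) / 2

# 24 hour total energy expenditure (MJ/day).
lean = [6.13, 7.05, 7.48, 7.48, 7.53, 7.58, 7.90, 8.08, 8.09, 8.11,
        8.40, 10.15, 10.88]
obese = [8.79, 9.19, 9.21, 9.68, 9.69, 9.97, 11.51, 11.85, 12.79]

pooled = lean + obese            # rank as if from a single sample
rank_sum_lean = sum(midrank(x, pooled) for x in lean)
rank_sum_obese = sum(midrank(x, pooled) for x in obese)

# T is the rank sum of the smaller group (here the obese women, n = 9).
T = rank_sum_obese if len(obese) <= len(lean) else rank_sum_lean
print(T)
```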
Testing the Assumptions
How to test normality? Most people just make a histogram of the data and check if it looks like a bell shape. Remember, though, that the assumption is not that the sample has the Normal distribution but that it comes from a population which does. For large samples we expect to see a histogram with a bell shape if the population is Normal, but with small samples it is quite unlikely to get a symmetric distribution even if the population is Normally distributed. There are formal methods that test for normality, and you can find them in most statistical packages, like the Shapiro-Wilk test or the Shapiro-Francia test. You can also use common sense and ask whether it is reasonable to assume that the population of interest is Normally distributed.
When the data are not Normally distributed and are skewed, it is better to try some transformations first, like the logarithmic one, in order to make their shape symmetric, and then perform a parametric test on the transformed data, instead of directly doing a non-parametric test.
Testing the Assumptions
How to test equality of variances? Most people just see how close the 2 sample variances are. Instead you can perform a hypothesis test with the null hypothesis that the two variances are equal; this test is called the F test.
Testing the Assumptions
Table: Serum thyroxine level (nmol/l) in 16 hypothyroid infants by severity of symptoms (Hulse et al., 1979)

        Marked symptoms (n=7)    Slight or no symptoms (n=9)
1       5                        34
2       8                        45
3       18                       49
4       24                       55
5       60                       58
6       84                       59
7       96                       60
8                                62
9                                86
Mean    42.1                     56.4
SD      37.48                    14.22

We wish to compare thyroxine levels in the two groups defined by severity of symptoms, but the sample standard deviations are markedly different.

$$F = \frac{s_1^2}{s_2^2} = \left(\frac{37.48}{14.22}\right)^2 = 6.95,$$

where $s_i$ is the standard deviation of the i-th group.

F distribution with n1 - 1 = 6 and n2 - 1 = 8 df: P value < 0.01, so reject H0.
(P value: area to the right of F under the F distribution with 6, 8 df.)
If $F < F_{n_1-1,n_2-1,1-\alpha/2}$ or $F > F_{n_1-1,n_2-1,\alpha/2}$, reject H0.
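The variance ratio can be recomputed directly from the thyroxine data. This sketch uses only the Python standard library; the comparison with the F distribution is left to tables or a statistical package, since the standard library has no F distribution.

```python
from statistics import stdev, variance

# Serum thyroxine (nmol/l) by severity of symptoms.
marked = [5, 8, 18, 24, 60, 84, 96]
slight = [34, 45, 49, 55, 58, 59, 60, 62, 86]

# F statistic: ratio of the two sample variances.
f_stat = variance(marked) / variance(slight)
df1, df2 = len(marked) - 1, len(slight) - 1
print(round(stdev(marked), 2), round(stdev(slight), 2))
print(round(f_stat, 2), df1, df2)
```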
Testing the Assumptions
Alternatively we could calculate a 95% C.I. for the ratio of the variances:

$$\left(\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{n_1-1,n_2-1,0.025}},\ \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{n_1-1,n_2-1,0.975}}\right) = \left(\left(\frac{37.48}{14.22}\right)^2 \cdot \frac{1}{4.65},\ \left(\frac{37.48}{14.22}\right)^2 \cdot \frac{1}{0.18}\right) = (1.49, 38.61)$$

where $F_{\nu_1,\nu_2,p}$ is the value with area p to its right under the F distribution.

This range does not include the value of 1. Thus the variance in the marked symptoms group is larger than the one in the slight or no symptoms group. Thus we cannot use the t-test and we have to use a non-parametric method.
Testing the Assumptions
The F test is not robust to a violation of Normality. Alternatively one can use Levene's Test, available in statistical packages, which is not strongly dependent on the assumption of Normality of the two groups.