Critical review of significance testing, F. D’Ancona, from a lecture by Alain Moren, 2006
Epidemiology Training
Critical review of significance testing
F. D’Ancona
from a lecture by Alain Moren
2006
Botulism outbreak in Italy
“The relative risk of illness was higher among diners who ate home preserved green olives (RR=2.9)”
Is it statistically significant?
Tests of statistical significance
Many of them concern differences between means or proportions
These tests help to establish whether the observed difference is real (that is, not due to chance alone)
The two hypotheses!
There is a difference between people that ate olives and people that didn’t eat them
Hypothesis (H1) (alternative hypothesis)
When you perform a test of statistical significance you either reject or do not reject the Null Hypothesis (H0)
There is NO difference between the two groups
Null Hypothesis (H0)
(example: RR = 1, OR = 1)
Hypothesis testing and null hypothesis
If data provide evidence against the Null Hypothesis then this hypothesis can be rejected in favour of some alternative hypothesis H1 (the objective of our study).
If you don’t reject the Null Hypothesis, you can never say that the Null Hypothesis is true. You can only reject it or not reject it.
p = probability of observing a result (for example a difference between proportions or a RR) at least as extreme as the one obtained if chance alone is at work, that is, if H0 is true
Significance testing: H0 rejected using reported p value
Small p values = low degree of compatibility between H0 and the observed data: you reject H0 and the test is significant.
Large p values = high degree of compatibility between H0 and the observed data: you don’t reject H0, the test is not significant
We can never reduce to zero the probability that our result was observed by chance alone
Levels of significance
We need a cut-off!
0.01 0.05 0.10
p value > 0.05 = H0 not rejected (not significant)
p value ≤ 0.05 = H0 rejected (significant)
Avoid submitting for publication if p > 0.05
Referees commonly relied on tests of significance
p = 0.05 and its errors
Level of significance, usually p = 0.05
The p value was used for decision making
but still 2 possible errors
H0 should not be rejected, but it was rejected (Type I or alpha error or “false positive”)
H0 should be rejected but it was not rejected (Type II or beta error or “false negative”)
• H0 is “true” but rejected: Type I or α error
• H0 is “false” but not rejected: Type II or β error
Types of errors
                     Truth
Test result          H0 to be not rejected        H0 to be rejected
H0 not rejected      OK!                          false - (1 - sensitivity)
H0 rejected          false + (1 - specificity)    OK!
The significance level (the p value cut-off) is the level of Type I error that we are willing to accept (usually 5%)
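The meaning of an accepted 5% error can be illustrated with a small simulation. This is a minimal sketch, not part of the lecture: the risk of 0.3, the group size of 100, the number of simulated studies and the uncorrected chi-square test are all assumptions chosen for illustration. Both groups are given the same risk, so H0 is true and every rejection is a false positive.

```python
import math
import random


def chi2_p_value(a, b, c, d):
    """Two-sided p value of the uncorrected chi-square test on a 2x2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence against H0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def type_i_error_rate(risk=0.3, n_per_group=100, n_trials=2000,
                      alpha=0.05, seed=1):
    """Fraction of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    rejections = 0
    for _ in range(n_trials):
        # Both groups share the same risk, so H0 (RR = 1) holds.
        cases_1 = sum(rng.random() < risk for _ in range(n_per_group))
        cases_2 = sum(rng.random() < risk for _ in range(n_per_group))
        p = chi2_p_value(cases_1, n_per_group - cases_1,
                         cases_2, n_per_group - cases_2)
        if p <= alpha:
            rejections += 1
    return rejections / n_trials


rate = type_i_error_rate()
print(f"Share of false positives at alpha = 0.05: {rate:.3f}")
```

The printed share should be close to the chosen cut-off: the cut-off is the rate of false positives we agree to tolerate.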
Treatment Successful Unsuccessful Total
B 14 8 22
A 7 13 20
Treatment B, success = 64%
Treatment A, success = 35%
χ² = 3.44, p = NS
Hypothetical data from a clinical trial of a new treatment
p > 0.05
p = 0.06
Different ways to write the same concept but with more information
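The test statistic and p value for the hypothetical trial can be recomputed in a few lines. This is a sketch under the assumption that the slide uses the uncorrected chi-square test with 1 degree of freedom; the function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square statistic and two-sided p value for a 2x2 table.

    a, b = successes / failures in the first group
    c, d = successes / failures in the second group
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Treatment B: 14 successful, 8 unsuccessful; treatment A: 7 and 13.
chi2, p = chi_square_2x2(14, 8, 7, 13)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")
```

Reporting the p value itself (about 0.06) says more than "NS" or "p > 0.05": the reader can see how close the result is to the cut-off.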
The epidemiologist needs measurements rather than probabilities
χ² is a test of association.
OR and RR are measures of association on a continuous scale (infinite number of possible values)
The best estimate = point estimate
Range of values allowing for random variability = confidence interval (precision of the point estimate)
Width of confidence interval depends on:
• the amount of variability in the data
• the size of the sample
• the arbitrary level of confidence (usually 90%, 95%, 99%)
One way to use a confidence interval is:
• If 1 is included in the CI, then NON SIGNIFICANT
• If 1 is not included in the CI, then SIGNIFICANT
Confidence intervals provide more information than p values
• magnitude of the effect (strength of association)
• direction of the effect (RR > or < 1)
• precision around the point estimate of the effect (variability)
The p value cannot provide them!
Level of confidence interval at 95%
If the data collection and analysis could be replicated many times, the CI should include within it the TRUE value of the measure 95% of the time
The only thing that should bring variability is chance!
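The "replicated many times" reading of a 95% CI can be checked by simulation. This is a minimal sketch with assumed values that do not come from the lecture: true risks of 0.3 and 0.2, 200 subjects per group, 2000 replicated studies, and a log-scale Wald interval for the RR. Chance is the only source of variability here.

```python
import math
import random


def rr_with_ci(a, n1, c, n0, z=1.96):
    """RR and log-scale Wald CI; a cases among n1 exposed, c among n0 unexposed."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


def coverage(risk_exposed=0.3, risk_unexposed=0.2, n=200,
             n_studies=2000, seed=1):
    """Fraction of replicated studies whose 95% CI contains the true RR."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    true_rr = risk_exposed / risk_unexposed
    covered = 0
    usable = 0
    for _ in range(n_studies):
        a = sum(rng.random() < risk_exposed for _ in range(n))
        c = sum(rng.random() < risk_unexposed for _ in range(n))
        if a == 0 or c == 0:
            continue  # RR or its standard error is undefined; skip the study
        usable += 1
        _, low, high = rr_with_ci(a, n, c, n)
        if low <= true_rr <= high:
            covered += 1
    return covered / usable


cov = coverage()
print(f"Share of intervals containing the true RR: {cov:.3f}")
```

The share should be close to 0.95. The statement is about the procedure over many replications, not about any single interval.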
Treatment Successful Unsuccessful Total
B 14 8 22
A 7 13 20
Treatment B, success = 64%
Treatment A, success = 35%
p = NS
RR = 1.82, 95% CI (0.93 - 3.57)
Hypothetical data from a clinical trial of a new treatment
p > 0.05
p = 0.06
Different ways to write the same concept but with more information
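The RR and its interval for the same hypothetical trial can be reproduced as follows. This is a sketch assuming a log-scale Wald interval (standard error of ln RR from the four cell counts), which reproduces the figures on the slide; the function name is illustrative.

```python
import math


def relative_risk_ci(a, n1, c, n0, z=1.96):
    """Relative risk with a log-scale Wald confidence interval.

    a successes among n1 subjects in the first group,
    c successes among n0 subjects in the reference group.
    """
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    low = rr * math.exp(-z * se_log_rr)
    high = rr * math.exp(z * se_log_rr)
    return rr, low, high


# Treatment B: 14 successes out of 22; treatment A: 7 out of 20.
rr, low, high = relative_risk_ci(14, 22, 7, 20)
print(f"RR = {rr:.2f}, 95% CI ({low:.2f} - {high:.2f})")
print("1 inside CI:", low <= 1 <= high)
```

The interval contains 1, which agrees with "not significant", but it also shows that the data are compatible with anything from a slight disadvantage to a more than threefold benefit.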
Are more studies better or worse?
Decisions based on results from a collection of studies are not facilitated when each study is classified as a YES or NO decision.
You have to look at the CI and the point estimate
But also consider its clinical or biological significance
[Figure: 20 studies with different results, plotted on an RR axis marked at 1]
Study A, large sample, precise results, narrow CI - SIGNIFICANT
Study B, small size, wide CI - NON SIGNIFICANT
Looking at the CI
Study A, effect close to NO EFFECT
Study B, no information about absence of large effect
[Figure: confidence intervals of studies A and B on an RR axis running from RR = 1 to large RR]
χ² = A test of association. It depends on sample size
p value = Probability that equal (or more extreme) results can be observed by chance alone
OR, RR = Direction & strength of association
if > 1 risk factor
if < 1 protective factor
(independently from sample size)
CI = Magnitude and precision of effect
What we have for evaluating the study
Remember that these values do not provide any information on the possibility that the observed association is due to a bias or confounding.
This possibility should be investigated
        Cases   Non cases   Total
E           9          51      60
NE          5          55      60
Total      14         106     120
χ² = 1.3, p = 0.13, RR = 1.8, 95% CI [0.6 - 4.9]

        Cases   Non cases   Total
E          90         510     600
NE         50         550     600
Total     140        1060    1200
χ² = 12, p = 0.0002, RR = 1.8, 95% CI [1.3 - 2.5]

        Cases   Non cases   Total
E         600        1400    2000
NE        500        1500    2000
Total    1100        2900    4000
χ² = 12, p = 0.0002, RR = 1.2, 95% CI [1.1 - 1.3]

χ² and Relative Risk
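The three tables can be recomputed side by side to see how χ² follows sample size while RR does not. This is a sketch assuming the uncorrected chi-square statistic and a log-scale Wald interval; the slide may have used a slightly different interval method, so the limits can differ in the last digit.

```python
import math


def summarize(a, b, c, d, z=1.96):
    """chi-square, RR and log-scale Wald CI for a 2x2 table.

    a, b = cases / non cases among the exposed (E)
    c, d = cases / non cases among the non exposed (NE)
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return chi2, rr, rr * math.exp(-z * se), rr * math.exp(z * se)


tables = {
    "n = 120": (9, 51, 5, 55),
    "n = 1200": (90, 510, 50, 550),
    "n = 4000": (600, 1400, 500, 1500),
}
results = {label: summarize(*cells) for label, cells in tables.items()}

for label, (chi2, rr, low, high) in results.items():
    print(f"{label}: chi2 = {chi2:.1f}, RR = {rr:.1f}, "
          f"95% CI [{low:.1f} - {high:.1f}]")
```

The first two tables share the same RR but not the same χ²; the last two share almost the same χ² but not the same RR. The test statistic alone does not describe the strength of the association.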
Exposure   Cases   Non cases   AR%
Yes           15          20   42.8%
No            50         200   20.0%
Total         65         220
Common source outbreak suspected
Remember that these values do not provide any information on the possibility that the observed association is due to a bias or confounding.
HOW COULD YOU EXPLAIN THAT ONLY 23% OF CASES WERE EXPOSED?
χ² = 9.1, p = 0.002
RR = 2.1
95% CI = 1.4 - 3.4
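The figures behind the question can be derived directly from the outbreak table. This is a short sketch using only the four cell counts given above; the variable names are illustrative.

```python
# Cell counts from the outbreak table.
exposed_cases, exposed_non_cases = 15, 20
unexposed_cases, unexposed_non_cases = 50, 200

# Attack rate = cases / people at risk in each exposure group.
ar_exposed = exposed_cases / (exposed_cases + exposed_non_cases)
ar_unexposed = unexposed_cases / (unexposed_cases + unexposed_non_cases)

# Relative risk = ratio of the two attack rates.
rr = ar_exposed / ar_unexposed

# Share of all cases that reported the exposure.
share_of_cases_exposed = exposed_cases / (exposed_cases + unexposed_cases)

print(f"AR exposed = {ar_exposed:.1%}, AR unexposed = {ar_unexposed:.1%}")
print(f"RR = {rr:.1f}")
print(f"Cases exposed = {share_of_cases_exposed:.0%}")
```

A significant RR sits next to a low share of exposed cases: the association is statistically clear, yet the exposure accounts for a minority of the cases, which a single common source would not explain without further investigation.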
Recommendations
Hypothesis testing and CI evaluate only the role of chance as an alternative explanation of the association.
Interpret with caution every association that achieves statistical significance.
Double caution if this statistical significance is not expected.
P < 0.05
Rothman
It is not a good description of the information in the data