critical review of significance testing f.dancona from a alain morens lecture 2006

Epidemiology Training

Critical review of significance testing

F.D’Ancona

From a lecture by Alain Moren

2006

Botulism outbreak in Italy

“The relative risk of illness was higher among diners who ate home preserved green olives (RR=2.9)”

Is it statistically significant?

Tests of statistical significance

Many of them concern differences between means or proportions.

These tests help establish whether the observed difference is real (i.e., not due to chance alone).

The two hypotheses

Alternative hypothesis (H1): there is a difference between people who ate olives and people who did not eat them.

When you perform a test of statistical significance, you either reject or do not reject the null hypothesis (H0).

Null hypothesis (H0): there is NO difference between the two groups (for example, RR = 1 or OR = 1).

Hypothesis testing and the null hypothesis

If the data provide evidence against the null hypothesis, it can be rejected in favour of an alternative hypothesis H1 (the objective of our study).

If you do not reject the null hypothesis, you can never say that it is true. You can only reject it or not reject it.

p = probability of observing, by chance alone (that is, if H0 were true), a result (for example a difference between proportions or an RR) as extreme as the one observed, or more extreme
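To make "by chance alone" concrete, the sketch below estimates a p value by permutation: if H0 is true, the group labels are interchangeable, so shuffling them many times shows how often a difference at least as large as the observed one arises by chance. This is a minimal illustration, not the method used in the lecture; the function name `permutation_p_value`, the seed and the 10,000 shuffles are arbitrary assumptions, and the counts are the hypothetical trial data used later in the lecture (treatment B 14/22 successes, treatment A 7/20).

```python
import random

def permutation_p_value(succ_a, n_a, succ_b, n_b, n_perm=10_000, seed=1):
    """Two-sided permutation p value for a difference between two proportions.

    Under H0 the group labels are exchangeable: shuffle them and count how
    often a difference at least as extreme as the observed one appears.
    """
    observed = abs(succ_a / n_a - succ_b / n_b)
    # Pool all outcomes: 1 = success, 0 = failure
    total_succ = succ_a + succ_b
    outcomes = [1] * total_succ + [0] * (n_a + n_b - total_succ)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(outcomes)
        diff = abs(sum(outcomes[:n_a]) / n_a - sum(outcomes[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / n_perm

# Hypothetical trial: treatment B 14/22 successes, treatment A 7/20
p = permutation_p_value(14, 22, 7, 20)
print(f"permutation p = {p:.3f}")
```

For such a small table this permutation p value comes out larger than the p = 0.06 of the uncorrected χ² test reported later in the lecture; different tests can give noticeably different p values when samples are small.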

Significance testing: deciding whether to reject H0 using the reported p value

Small p values = low degree of compatibility between H0 and the observed data: you reject H0 and the test is significant.

Large p values = high degree of compatibility between H0 and the observed data: you do not reject H0, and the test is not significant.

We can never reduce to zero the probability that our result was due to chance alone.

Levels of significance

We need a cut-off!

Commonly used cut-offs: 0.01, 0.05, 0.10

p value > 0.05 = H0 not rejected (not significant)

p value ≤ 0.05 = H0 rejected (significant)
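The cut-off rule can be written as a tiny function; the name `significance_decision` and its default α = 0.05 are illustrative assumptions. Applied to the same p value at the three common cut-offs, it shows that the verdict depends entirely on the arbitrary threshold chosen.

```python
def significance_decision(p_value: float, alpha: float = 0.05) -> str:
    """Conventional cut-off rule: p <= alpha rejects H0, otherwise H0 is kept."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p_value <= alpha:
        return "H0 rejected (significant)"
    return "H0 not rejected (not significant)"

# The same p value can lead to different decisions at different cut-offs
for alpha in (0.01, 0.05, 0.10):
    print(alpha, significance_decision(0.06, alpha))
```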

Avoid submitting for publication if p > 0.05

Referees commonly relied on tests of significance

p = 0.05 and its errors

Level of significance, usually 0.05: the p value was used for decision making,

but there are still 2 possible errors

H0 should not be rejected, but it was rejected (Type I or alpha error or “false positive”)

H0 should be rejected but it was not rejected (Type II or beta error or “false negative”)

• H0 is “true” but rejected: Type I or α error

• H0 is “false” but not rejected: Type II or β error

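Types of errors

Test result compared with the truth:

• H0 not rejected, and it truly should not be rejected: OK!

• H0 not rejected, but it should have been rejected: false − (1 − sensitivity)

• H0 rejected, but it should not have been rejected: false + (1 − specificity)

• H0 rejected, and it truly should be rejected: OK!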

The significance level (α) is the probability of a Type I error that we are willing to accept (usually 5%).
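A quick simulation shows what the 5% means: when H0 is true by construction (both groups share the same risk), a test at α = 0.05 still rejects H0 in roughly 5% of repeated studies. This is a sketch assuming an uncorrected Pearson χ² test with 1 degree of freedom; the 100 subjects per group, the common risk of 0.3, the 2,000 repetitions and the function names are all hypothetical choices.

```python
import math
import random

def chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def type_i_error_rate(n_per_group=100, risk=0.3, alpha=0.05, reps=2000, seed=42):
    """Share of simulated studies in which a TRUE H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Same true risk in both groups: H0 is true by construction
        a = sum(rng.random() < risk for _ in range(n_per_group))
        c = sum(rng.random() < risk for _ in range(n_per_group))
        chi2 = chi2_2x2(a, n_per_group - a, c, n_per_group - c)
        p = math.erfc(math.sqrt(chi2 / 2))  # p value, chi-square with 1 df
        if p <= alpha:
            rejections += 1
    return rejections / reps

print(f"Type I error rate = {type_i_error_rate():.3f}")
```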

Treatment Successful Unsuccessful Total

B 14 8 22

A 7 13 20

Treatment B, success = 64%; Treatment A, success = 35%; χ² = 3.44, p = NS

Hypothetical data from a clinical trial of a new treatment

p > 0.05; p = 0.06

Different ways of writing the same result, each giving more information
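The χ² value on the slide can be reproduced, assuming it is the uncorrected Pearson χ² (no Yates continuity correction). With 1 degree of freedom the p value equals erfc(√(χ²/2)), so only the Python standard library is needed; the helper name `chi2_2x2` is an illustrative choice.

```python
import math

def chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: treatment B, treatment A; columns: successful, unsuccessful
a, b, c, d = 14, 8, 7, 13
chi2 = chi2_2x2(a, b, c, d)
p = math.erfc(math.sqrt(chi2 / 2))  # p value for chi-square with 1 df

print(f"success B = {a / (a + b):.0%}, success A = {c / (c + d):.0%}")
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")  # chi2 = 3.44, p = 0.06
```

Reporting p = 0.06 instead of "NS" or "p > 0.05" tells the reader how close the result is to the cut-off.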

The epidemiologist needs measurements rather than probabilities

χ² is a test of association.

OR and RR are measures of association on a continuous scale (an infinite number of possible values).

The best estimate = point estimate

Range of values allowing for random variability = confidence interval (precision of the point estimate)

Width of confidence interval depends on:

• the amount of variability in the data

• the sample size

• the arbitrary level of confidence (usually 90%, 95% or 99%)

One way to use a confidence interval:

• if 1 is included in the CI, the result is NOT SIGNIFICANT

• if 1 is not included in the CI, the result is SIGNIFICANT
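A sketch of how an RR and its confidence interval can be computed, assuming the common log-scale (Katz) method: the standard error of ln(RR) is √(1/a − 1/n₁ + 1/c − 1/n₀) and the limits are exp(ln RR ± z·SE), with z = 1.96 for 95%. For the hypothetical trial data in this lecture it reproduces RR = 1.82, 95% CI (0.93 - 3.57). The function name `risk_ratio_ci` is hypothetical.

```python
import math

def risk_ratio_ci(a, n1, c, n0, z=1.96):
    """Risk ratio with a log-scale (Katz) confidence interval.

    a / n1: events / total in the index group (e.g. treatment B)
    c / n0: events / total in the reference group (e.g. treatment A)
    """
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

rr, lo, hi = risk_ratio_ci(14, 22, 7, 20)  # treatment B vs treatment A
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f} - {hi:.2f})")  # RR = 1.82, 95% CI (0.93 - 3.57)
print("not significant" if lo <= 1 <= hi else "significant")
```

Changing `z` changes the level of confidence: z = 1.645 gives a narrower 90% CI and z = 2.576 a wider 99% CI around the same point estimate.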

Confidence intervals provide more information than p values:

• magnitude of the effect (strength of association)

• direction of the effect (RR > 1 or < 1)

• precision around the point estimate of the effect (variability)

A p value cannot provide them!

Interpreting a 95% confidence interval

If data collection and analysis could be replicated many times, the CIs obtained would include the TRUE value of the measure 95% of the time.

The only source of variability should be chance!
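The "95% of the time" statement refers to repeated sampling, which a simulation can mimic: draw many hypothetical studies from a population with a known true RR, build a 95% CI in each, and count how often the interval contains the truth. The risks (0.30 vs 0.15, so a true RR of 2), the 200 subjects per group, the log-scale CI method and the name `simulate_coverage` are assumptions for illustration. Because chance is the only source of variability in this simulation, coverage should come out close to 95%.

```python
import math
import random

def simulate_coverage(risk_exposed=0.30, risk_unexposed=0.15, n=200,
                      reps=2000, z=1.96, seed=7):
    """Share of simulated CIs (log-scale method) that contain the true RR."""
    rng = random.Random(seed)
    true_rr = risk_exposed / risk_unexposed
    covered = valid = 0
    for _ in range(reps):
        # One hypothetical cohort study: n exposed and n unexposed subjects
        a = sum(rng.random() < risk_exposed for _ in range(n))
        c = sum(rng.random() < risk_unexposed for _ in range(n))
        if a == 0 or c == 0:
            continue  # log-scale CI is undefined with a zero cell
        valid += 1
        rr = (a / n) / (c / n)
        se = math.sqrt(1 / a - 1 / n + 1 / c - 1 / n)
        lower = math.exp(math.log(rr) - z * se)
        upper = math.exp(math.log(rr) + z * se)
        if lower <= true_rr <= upper:
            covered += 1
    return covered / valid

print(f"coverage = {simulate_coverage():.3f}")
```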

Treatment Successful Unsuccessful Total

B 14 8 22

A 7 13 20

Treatment B, success = 64%; Treatment A, success = 35%; p = NS; RR = 1.82, 95% CI (0.93 - 3.57)

Hypothetical data from a clinical trial of a new treatment

p > 0.05; p = 0.06

Different ways of writing the same result, each giving more information

Are more studies better or worse?

Decisions based on results from a collection of studies are not made easier when each study is classified as a YES or NO decision.

You have to look at the CI and the point estimate,

but also consider their clinical or biological significance.

[Figure: 20 studies with different results, shown on an RR axis with a reference line at RR = 1]

• Study A: large sample, precise results, narrow CI: SIGNIFICANT

• Study B: small size, wide CI: NOT SIGNIFICANT

Looking at the CI:

• Study A: effect close to NO EFFECT

• Study B: no information about the absence of a large effect

[Figure: CIs of studies A and B on an RR axis running from RR = 1 to large RR]

χ² = a test of association; it depends on sample size

p value = Probability that equal (or more extreme) results can be observed by chance alone

OR, RR = direction and strength of association: if > 1, risk factor; if < 1, protective factor (independent of sample size)

CI = Magnitude and precision of effect

What we have to evaluate a study

Remember that these values do not provide any information on the possibility that the observed association is due to bias or confounding.

This possibility should be investigated.

Exposure Cases Non-cases Total (E = exposed, NE = not exposed)

E 9 51 60

NE 5 55 60

Total 14 106 120

χ² = 1.3, p = 0.13, RR = 1.8, 95% CI [0.6 - 4.9]

Exposure Cases Non-cases Total

E 90 510 600

NE 50 550 600

Total 140 1060 1200

χ² = 12, p = 0.0002, RR = 1.8, 95% CI [1.3 - 2.5]

Exposure Cases Non-cases Total

E 600 1400 2000

NE 500 1500 2000

Total 1100 2900 4000

χ² = 12, p = 0.0002, RR = 1.2, 95% CI [1.1 - 1.3]

χ² and Relative Risk
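The three summaries can be recomputed from the tables, assuming an uncorrected χ² and a log-scale (Katz) CI; the function name `summarize` is hypothetical. Multiplying all counts by ten (first vs second table) leaves the RR unchanged but multiplies χ² by ten, while the third table reaches a similar χ² with a much weaker association. Recomputed values may differ slightly from the slide (the log method gives an upper limit of about 5.1 for the first table and χ² ≈ 12.9 for the second), which may reflect rounding or a different CI method.

```python
import math

def summarize(a, b, c, d):
    """Uncorrected chi2, RR and 95% log-scale CI.

    a, b: cases and non-cases among the exposed (E)
    c, d: cases and non-cases among the not exposed (NE)
    """
    n1, n0 = a + b, c + d
    n = n1 + n0
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n0 * (a + c) * (b + d))
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return chi2, rr, lo, hi

tables = {
    "first table": (9, 51, 5, 55),
    "second table": (90, 510, 50, 550),
    "third table": (600, 1400, 500, 1500),
}
for name, counts in tables.items():
    chi2, rr, lo, hi = summarize(*counts)
    print(f"{name}: chi2 = {chi2:.1f}, RR = {rr:.1f}, 95% CI [{lo:.1f} - {hi:.1f}]")
```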

Exposure Cases Non-cases AR%

Yes 15 20 42.9%

No 50 200 20.0%

Total 65 220

Common source outbreak suspected

Remember that these values do not provide any information on the possibility that the observed association is due to a bias or confounding.

HOW COULD YOU EXPLAIN THAT ONLY 23% OF CASES WERE EXPOSED?

χ² = 9.1, p = 0.002, RR = 2.1, 95% CI = 1.4-3.4
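A short recomputation of the outbreak table, assuming an uncorrected Pearson χ², reproduces the attack rates, RR and χ² above and the arithmetic behind the question: only 15 of the 65 cases (23%) were exposed, so most cases occurred among unexposed people despite the significant RR. The variable names are illustrative.

```python
# Counts from the suspected common-source outbreak table
exp_cases, exp_noncases = 15, 20
unexp_cases, unexp_noncases = 50, 200

ar_exp = exp_cases / (exp_cases + exp_noncases)          # attack rate, exposed
ar_unexp = unexp_cases / (unexp_cases + unexp_noncases)  # attack rate, unexposed
rr = ar_exp / ar_unexp

# Uncorrected Pearson chi-square for the 2x2 table
a, b, c, d = exp_cases, exp_noncases, unexp_cases, unexp_noncases
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Proportion of all cases that were exposed
share_exposed = exp_cases / (exp_cases + unexp_cases)

print(f"AR exposed = {ar_exp:.1%}, AR unexposed = {ar_unexp:.1%}")
print(f"RR = {rr:.1f}, chi2 = {chi2:.1f}")
print(f"cases exposed = {exp_cases}/{exp_cases + unexp_cases} = {share_exposed:.0%}")
```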

Recommendations

Hypothesis testing and CIs evaluate only the role of chance as an alternative explanation for the association.

Interpret with caution every association that achieves statistical significance.

Be doubly cautious if this statistical significance was not expected.

P < 0.05

It is not a good description of the information in the data (Rothman)
