gy zou - publish.uwo.ca
TRANSCRIPT
We have discussed inference procedures for 2 × 2 tables in the
context of comparing two groups.
Yes No
Group 1 a b n1
Group 2 c d n2
m1 m2 n
For hypothesis testing, we use Pearson chi-square test;
For interval estimation, we use methods for p1 − p2 (of course,
NNT ).
However, Pearson chi-square test will work only if expected value for
every cell is greater than 5.
For data with small cells, we can use Fisher’s ‘exact’ test. The idea of
this test is to fix the row and column totals as the observed table, and
compute the probabilities of observing as or more extreme tables in their
departure from the null hypothesis (recall the definition of P -value).
Fisher (1935, The logic of inductive inference JRSS A 98: 39-54)
presented his test at the annual Christmas meeting of the Royal
Statistical Society.
The title is very good, because I’ve heard that the most important
contribution of statistics to science is not the formula, but logic.
Still, right after his talk, a speaker compared Fisher’s talk to “the
braying of the Golden Ass”.
The probability of observing the table is given by
Pr(a, b, c, d|marginals = n1, n2, m1, m2) =n1!n2!m1!m2!
n!a!b!c!d!(1)
Fisher’s procedure requires the probability of all more extreme
tables to be computed, using Eq (1) repeatedly.
The p-value of the test is obtained by definition:
Sum of all those probabilities.
Thus, Fisher’s exact test is essentially one-sided.
If two-sided is called for, the simplest way to do it is to double the
p−value. This is exactly SAS proc freq gives you whenn1 = n2.
Example of Fisher’s exact test (p. 375).
Yes No
Group 1 8 16 24
Group 2 1 23 24
9 39 48
Pr(a, b, c, d|48, 9, 24) =24!24!9!39!
48!8!16!1!23!= 0.010525
A more extreme table is:
Yes No
Group 1 9 15 24
Group 2 0 24 24
9 39 48
Pr(a, b, c, d|48, 9, 24) =24!24!9!39!
48!9!15!0!24!= .000779619
The p-value is then .010525+.000779619=0.011304.
SAS function for hypergeometric probability
* pdf(’HYPER’, a, n, n1, m1);
data;
bb=pdf(’HYPER’, 8, 48, 24, 9);
cc=pdf(’HYPER’, 9, 48, 24, 9);
dd=bb+cc;
proc print;
run;
Obs bb cc dd
1 0.010525 .000779619 0.011304
Two-sided p-value is then 2 × 0.011304 = 0.022608
Yes No
Group 1 8 16 24
Group 2 1 19 20
9 35 44
data fisher;
do i = 0 to 9;
bb=pdf(’HYPER’, i, 44, 9, 24);
output;
end;
proc print;
run;
The SAS System
Obs i bb
1 0 0.00024
2 1 0.00426
3 2 0.03018
4 3 0.11066
5 4 0.23239
6 5 0.29048
7 6 0.21644
8 7 0.09276
9 8 0.02075
10 9 0.00184
One-sided p-value is 0.02075 + 0.00184 = 0.02259.
Another way called mid-p-value (Lancaster 1961 JASA 56: 223–234):
One sided mid-p−value is 1/2 × 0.02075 + 0.00184 = 0.012215
Two-sided p-value is 2 × mid − p = 2 × 0.012215 = 0.02443
Two-sided p-values is given by
0.02075 + 0.00184 + 0.00426 + 0.00024 = 0.02709. This is how SAS obtains
two-sided p-value.
The way to present the results is: Rate in group I was ? , in Group
II was ?; difference ? (95% confidence interval ? to ? ), P = ?
(Fisher’s Exact test two-sided mid P).
McKinney et al (1989 The inexact use of Fisher’s exact test in six
major medical journals JAMA 261:3430–3433).
Half of 70 articles reviewed either had used a one-tailed test when
a two-tailed test was called for, or the authors simply had not
bothered to state which test they had used.
If only hypothesis testing, an epidemiologist’s life would be too
easy.
Effect estimation makes it hard, also interesting.
Besides randomized studies, there are more ways of generating
2 × 2 table:
• cross-sectional (naturalistic, multinomial) sampling: select a
total of N subjects, followed by the determination for each
subject of presence or absence of characteristics of A and B;
• retrospective sampling: predetermine n1 of subjects who
possess A and n2 who do not possess A, followed by the
determination of B in each group, where A is usually a
disease of interest and B is a risk factor. Case-control study
• prospective sampling: similar to case-control, except A and B
is switched. Cohort study.
Cross-sectional sample to estimate risk ratio (relative risk, RR)
Outcome (D)
Exposure (E) Yes (+) No (-)
1(Yes, +) a b n1
2(No, -) c d n2
m1 m2 n
Risk ratio is defined by
RR =Pr(D+|E+)Pr(D+|E−)
The estimated RR is
RR =a/n1
c/n2
The estimated variance for ln RR estimated by
var[ln(RR)] =1a− 1
n1+
1c− 1
n2.
95% CI for RR is obtained by obtaining CI for ln(RR) because the
sampling distribution of ln RR is closer to Normal than that of RR
l, u = ln(RR) ± 1.96√
var(ln RR)
The CI for RR is then given by
exp(l), exp(u)
Example. Data for 200 mothers and their baby birthweight are as
Outcome
Maternal Age ≤ 2500 > 2500
< 20 a = 10 b = 40 n1 = 50
≥ 20 c = 15 b = 135 n2 = 150
m1 = 25 m2 = 175 n = 200
RR =10/5015/150
= 2
with variance estimate for ln RR given by
var[ln(RR)] =1a− 1
n1+
1c− 1
n2=
110
− 150
+115
− 1150
= 0.14
95% CI for RR is exp[ln(2) ± 1.96√
0.14] = (0.96, 4.16)
Levin’s attributable risk fraction:
How much risk would be reduced if the exposure is eliminated ?
e.g., force all the smoker in London leave town.
Since people with disease include two exclusive types: those who
were exposed, and those who were not exposed, we have
Pr(D+) = Pr(D+ ∩ E+) + Pr(D+ ∩ E−)
= Pr(D+|E+) Pr(E+) + Pr(D+|E−) Pr(E−)
If E+ cannot cause disease, we would expected people with
exposure (E+) have the same disease rate as those who were not
exposed, i.e., Pr(D+|E−).
Thus, the proportion of exposed people will have disease, if theexposure could not cause disease, is given by
Pr(D+|E−) × Pr(E+)
Levin (1953, Acta Unio Int contra Cancrum 19: 531–541)
defined Attributable Fraction as
RA = actual−counterfactualactual
= Pr(D+|E+) Pr(E+)−Pr(D+|E−) Pr(E+)Pr(D+)
=Pr(E+)[Pr(D+|E+)−Pr(D+|E−)]
Pr(D+|E+) Pr(E+)+Pr(D+|E−) Pr(E−)
=Pr(E+)[Pr(D+|E+)/ Pr(D+|E−)−1]
Pr(E+) Pr(D+|E+)/ Pr(D+|E−)+Pr(E−)
= Pr(E+)[RR−1]Pr(E+)RR+1−Pr(E−)
RA =Pr(E+) [RR − 1]
1 + Pr(E+)(RR − 1)
RA =Pr(E+)(RR − 1)
1 + Pr(E+)(RR − 1)
Estimated by
RA =Pr(E+)
[RR − 1
]1 + Pr(E+)(RR − 1)
with estimated variance for ln(1 − RA) given by
var[ln(1 − RA)
]=
1nc
[b + RA(a + d)
]see Fleiss (1979 Am J Epidemiol 110: 103–104).
Example: Infant mortality by birthweight for n = 72730 live births
in New York City in 1974. What % of deaths could have been
prevented if low birthweight had been eliminated ?
Outcome @ 1 yr
BW Dead Alive Total
≤ 2500g a/n = 0.0085 b/n = 0.0632 (a + b)/n = 0.0717
> 2500g c/n = 0.0058 d/n = 0.9225 (c + d)/n = 0.9283
Total n/n = 1
RR =0.0085/0.07170.0058/0.9283
= 18.97405
P (E+) = 0.0717
RA =Pr(E+)
[RR − 1
]1 + Pr(E+)(RR − 1)
=.0717(18.97405 − 1)
1 + .0717(18.97405 − 1)= 0.563.
Variance for ln(1 − RA) is
var[ln(1 − RA)
]=
.0632 + .563(.0085 + .9225)72730 × .0058
= 0.001392379
i.e.,
s.e.[ln(1 − RA)
]=
√var = 0.037
95% CI for ln(1 − RA) is
ln(1 − RA) ± 1.96se[ln(1 − RA)] = ln(1 − 0.563) ± 1.96 × 0.037
= (−0.900,−0.755)
CI for RA is then given by
[1 − exp(−0.755), 1 − exp(−0.900)] = (0.530, 0.593)
With 95% confidence, between 53% and 59% of all infant death in
New York City in 1974 could have been prevented if low birth
weight had been eliminated.
Attributable risk among the exposed
RE = 1 − 1RR
which is widely used in the law to describe the excess risk as a
fraction of the risk among those exposed to the antecedent factor.
The estimator is
RE = 1 − 1
RR
Estimation can be conducted through RR.
Case-control study and odds ratio
Rothman, Modern Epidemiology, 1986, p.62
The sophisticated use and understanding of case-control
studies is the most outstanding methodological
development of modern epidemiology
My understanding of case-control study is from
Breslow. 1996. Statistics in epidemiology: The case-control study.
J Am Stat Assoc 91:14–28.
Recall case-control design involves selecting n1 of subjects who
have disease D+ and n2 who do not possess D−, followed by the
determination of exposure X+ or X− in each group.
Case-control design in general can only provides the ratio of
exposure odds of case group to that of the control group, i.e.
ORe =exposure odds|case
exposure odds|control
Since odds is defined as P1−P ,
ORe =Pr(X+|D+)
1−Pr(X+|D+)
Pr(X+|D−)1−Pr(X+|D−)
=Pr(X+|D+) Pr(X−|D−)Pr(X+|D−) Pr(X−|D+)
For ORe to be useful, it must have some relationship with risk ratioPr(D+|X+)
Pr(D+|X−).
Entered Cornfield (1951 J. Natl Cancer Inst 11:1269–1275) who showed that
1)
ORe =Pr(X+|D+) Pr(X−|D−)
Pr(X+|D−) Pr(X−|D+)=
Pr(D+|X+) Pr(X+)Pr(D+)
×Pr(D−|X−) Pr(X−)Pr(D−)
Pr(D−|X+) Pr(X+)Pr(D−)
×Pr(D+|X−) Pr(X−)Pr(D+)
=Pr(D+|X+) Pr(D−|X−)
Pr(D−|X+) Pr(D+|X−)= ORd
2)
Pr(D−|X−)
Pr(D+|X−)≈ 1, when Pr(D+) → 0.
which implies that ORd ≈ RRd.
To see 2), observe
Pr(D−|X−) =Pr(X−|D−)P (D−)
P (X−)=
Pr(X−|D−)P (D−)
Pr(X−∩D−)+Pr(X−∩D+)
=Pr(X−|D−)P (D−)
Pr(X−|D−) Pr(D−)+Pr(X−|D+) Pr(D+)≈ 1, whenPr(D+) → 0
Similarly Pr(D−|X+) ≈ 1 when Pr(D+) → 0.
Thus, case-control studies are indeed useful.
In fact, Mantel & Haenszel (1959, J Natl Cancer Inst 22:719-748) stated
Among the desirable attributes of the retrospective study is the ability
to yield results from presently collectible data.... The retrospective
approach is also adapted to the limited resources of an individual
investigator... For especially rare disease a retrospective study may be
the only feasible approach... In the absence of important biases in the
study setting, the retrospective method could be regarded, according
to sound statistical theory, as the study method of choice (p. 720).
This was almost 50 years ago. Recent epidemiologic literature has seen more
and more prospective studies, except genetic epidemiologic literature in which
case-control design is almost universal. More detailed discussion of statistical
issues may be found in Zou (2006 Annals of Human Genetics 70: 262–276).
Status
Exposure Case Control
Yes a b n1
No c d n2
Odds ratio in a case control study is estimated by
OR =ad
bc
with variance of ln(OR) estimated by
var[ln(OR)] =1a
+1b
+1c
+1d
Example. Sun protection during childhood by case-control status for
cutaneous melanoma in Belgium, France and Germany.
Exposure
Sun protection Case Control
Yes 99 132
No 303 290
402 422 824
OR =99 × 290
132 × 303= 0.72
var[ln(OR)] =1
99+
1
132+
1
290+
1
303= 0.02442537
95% CI for OR is
exp[ln(0.72) ± 1.96√
0.02442537] = exp(−0.634825,−0.02218313)
= (0.53, 0.98)
Status
Exposure Case Control
Yes a b n1
No c d n2
Odds ratio in a case control study is estimated by
OR =ad
bc
Status
trt Yes No
1 a = 0 b = 14 n1 = 14
2 c = 0 d = 11 n2 = 11
This is a data set discussed by
Parzen M, Lipsitz S, Ibrahim J, Klar N. 2002. An estimate of the odds ratio
that always exists. J Comput Graph Stat 11: 420-436.
Exact confidence interval for OR
It is available in SAS proc freq, which computes exact
confidence limits for the odds ratio with an algorithm by Thomas
(1971, Applied Statistics 20:105–110 ), ie., the limits L and U are
iterative solutions for the following two equations:∑m1i=a
(n1i
)(n2
m1−i
)Li∑m1
i=0
(n1i
)(n2
m1−i
)Li
= α/2
∑ai=0
(n1i
)(n2
m1−i
)U i∑m1
i=0
(n1i
)(n2
m1−i
)U i
= α/2
Exact may not be the best in categorical data analysis because the results may
be too conservative
Agresti A. 2003. Dealing with discreteness: making ’exact’ confidence
intervals for proportions, differences of proportions, and odds ratios more
exact Stat Meth Med Res 12 (1): 3–21.
... the inversion of the asymptotic score test seems to be a good
choice. This tends to have actual level fluctuating around the
nominal level. If one prefers that level to be a bit more conservative,
mid-p adaptations of exact methods work well. For situations that
require a bound on the error, it appears that basing conservative
intervals on inverting the exact score test has reasonable
performance. For teaching, the Wald-type interval of point estimate
plus and minus a normal-score multiple of a standard error is
simplest. Unfortunately, this can perform poorly, but simple
adjustments sometimes provide much improved performance.
Odds ratio versus risk ratio.
Most traditional statistical methods in epidemiology were developed in the
case of case-control design. Specifically, OR was the effect measure of choice.
Unfortunately, when it comes to prospective or cross-sectional design, the rare
disease assumption may not be satisfied. In such cases, OR becomes very
difficulty to interpret, sometimes misleading (see NEJM 1999;341:279–83).
For a RR = 2,
p1 .05 .1 .2 .3 .4 .6 .8
p2 0.025 0.050 0.100 0.150 0.200 0.300 0.400
OR =p1/(1−p1)p2/(1−p2)
2.05 2.11 2.25 2.43 2.67 3.50 6.00
My view is that we must always remember why Cornfield (1951)
proposed OR.
Excellent discussion on the choice of effect measures may be found
in Greenland (Interpretation and choice of effect measures in
epidemiologic analyses. Am J Epidemiol 1987;125:761–8).
Converting OR to RR (Zhang & Yu, 1998 JAMA 280: 1690–1):
RR =OR
1 − p2 + p2OR
RR =OR
1 − p2 + p2OR(2)
• Eq. (2) results in correct point estimate only if they are no
confounder;
• Substituting confidence limits for OR to obtain CI for RR
yields invalid interval for RR.
More discussion can be found in McNutt et al (2003 Am JEpidemiol 157:940–3) and Zou (2004 Am J Epidemiol 159:
702–6).
A little trick to check calculations when a confidence interval is
constructed through log-transformation (Lee PN. 1999. Stat Med18:1973–81):
Such a interval should satisfies: square of the point estimateshould equal to the product of lower and upper limits.
l = exp[ln point − Z × √var(ln point)]
u = exp[ln point + Z × √var(ln point)]
thus
l × u = exp(2 × ln point) = (point)2
Sample size estimation with SAS proc power
proc power;
twosamplefreq test=pchi
relativerisk = 1.5
refproportion = 0.2
power=0.8
ntotal=.;
run;
The POWER Procedure
Pearson Chi-square Test for Two Proportions
Fixed Scenario Elements
Distribution Asymptotic normal
Method Normal approximation
Reference (Group 1) Proportion 0.2
Relative Risk 1.5
Nominal Power 0.8
Number of Sides 2
Null Relative Risk 1
Alpha 0.05
Group 1 Weight 1
Group 2 Weight 1
Computed N Total
Actual N
Power Total
0.801 588
proc power;
twosamplefreq test=pchi
oddsratio = 2.5
refproportion = 0.3
groupweights = (1 2)
ntotal = .
power = 0.8;
run;
The POWER Procedure
Pearson Chi-square Test for Two Proportions
Fixed Scenario Elements
Distribution Asymptotic normal
Method Normal approximation
Reference (Group 1) Proportion 0.3
Odds Ratio 2.5
Group 1 Weight 1
Group 2 Weight 2
Nominal Power 0.8
Number of Sides 2
Null Odds Ratio 1
Alpha 0.05
Computed N Total
Actual N
Power Total
0.801 180
The POWER Procedure
Fisher’s Exact Conditional Test for Two Proportions
Fixed Scenario Elements
Distribution Exact conditional
Method Walters normal approximation
Group 1 Proportion 0.35
Group 2 Proportion 0.15
Nominal Power 0.8
Number of Sides 2
Alpha 0.05
Computed N Per Group
Actual N Per
Power Group
0.801 81
Outcome
Exposure Yes No
Yes ak bk n1k
No ck dk n2k
m1k m2k nk
MH test for no association between exposure and outcome (Mantel &
Haenszel, 1959 J Natl Cancer Inst 22: 719–48)
χ2MH =
[∑k
(ak − m1kn1k
nk
)]2
∑k
m1km2kn1kn2k
n2k(nk−1)
which is distributed as chi-square with one degree-of-freedom, under
H0.
5 years earlier, Cochran (1954 Biometrics 10:417-451) proposed a test
that is virtually identical to χ2MH (difference ?)
When k = 1, χ2MH reduce to Pearson chi-square test for 2 × 2 table.
Mantel-Haenszel odds ratio estimator (1959)
ORMH =∑
k akdk/nk∑k bkck/nk
For 20 years, nobody knew what was the standard error forORMH .
Hauck (1979, Biometrics 35: 817-819) provided a formula that is
valid when each table are large.
Outcome
Exposure Yes No
Yes ak bk n1k
No ck dk n2k
m1k m2k nk
The popular variance formula for ORMH is the one derived by Robins,
Breslow, & Greenland (1986, Biometrics 42: 311-323).
Recall
ORMH =
∑k akdk/nk∑k bkck/nk
=
∑k Rk∑k Sk
Define two more terms Pk = (ak + dk)/nk and Qk = (bk + ck)/nk
var[ln( ORMH)] =
∑k PkRk
2(∑
k Rk
)2+
∑k PkSk +
∑k QkRk
2(∑
k Rk
) (∑k Sk
) +
∑k QkSk
2(∑
k Sk
)2
Outcome
Exposure Yes No
Yes ak bk n1k
No ck dk n2k
m1k m2k nk
Example (case-control study): ORMH . Case-control studies on the role
of high voltage power lines in the etiology of leukemia in children
(Hanley & Thriault, 2000 Epidemiology 11(5): 613)
Study 1 Study 2
Case Control Case Control
< 100m ak = 18 bk = 25 n1k = 43 12 123 135
> 100m ck = 162 dk = 252 n2k = 252 26 431 457
m1k = 180 m2k = 277 nk = 457 38 554 592
OR1 = 1.12 OR2 = 1.62
If not stratify, we have
Status
Case Control
< 100m a = 30 b = 148
> 100m c = 188 d = 683
OR = 30×683188×148 = 0.74, living closer to powerlines protects children
from leukemia. However,
ORMH =18×252
457 + 12×431592
25×162457 + 123×26
592
=18.6620914.26417
= 1.308319
Mantel-Haenszel technique has also been used to derive RR estimator
(Tarone, 1981 J Chronic Dis 34: 463–468):
RRMH =
∑k akn2k/nk∑k ckn1k/nk
with variance given by
var[ln( RRMH)] =
∑k [n1kn2km1k − akcknk] /n2
k(∑k akn2k/nk
) (∑k ckn1k/nk
)
Outcome
Exposure Yes No
Yes ak bk n1k
No ck dk n2k
m1k m2k nk
Ex 8.5. Example (Clinical trial): RRMH
Age 65+ Age 65-
Drug Yes No Yes No
B ak = 32 bk = 8 n1k = 40 24 36 60
A ck = 24 dk = 36 n2k = 60 8 32 40
m1k = 56 m2k = 44 nk = 100 32 68 100
χ2MH =
[∑k
(ak−n1km1k
nk
)]2
∑k
n1kn2km1km2kn2
k(nk−1)
= (32− 56×40100 +24− 32×60
100 )2
40×60×56×441002(100−1)
+ 60×40×32×681002(100−1)
= 18.435
RRMH =∑
k akn2k/nk∑k ckn1k/nk
=32×60100 + 24×40
10024×40100 + 8×60
100= 28.8
14.4 = 2.0
var[ln(RR)] =∑
k (n1kn2km1k−akcknk)/n2k
(∑k akn2k/nk)(
∑k ckn1k/nk)
=40×60×100−32×24×100
1002+ 60×40×32−24×8×100
1002
28.8×14.4 = 0.02777778
95% CI is then
exp(ln 2 ± 1.96√
0.0278) = (1.44, 2.77)
1.44 × 2.77 ≈ 4 = 22 (checked)
Summary: Application of Mantel-Haenszel methods
Adjust for confounding (the original purpose): combat Simpson’s
paradox;
Meta-analysis: considering each study as a stratum.
The method of meta-analysis is commonly referred to as fix-effect
model with intention of summarizing available evidence, but not to
predict future study results. For that, random-effect model must
have to be adopted.