

The following paper will probably be a helpful guide in arriving at rapid estimates of sample sizes for retrospective studies.

A GRAPH OF SAMPLE SIZES FOR RETROSPECTIVE STUDIES

Gerald Chase, M.S., and Melville R. Klauber, Ph.D.

In retrospective studies, a group of cases with a disease and a group of well controls are examined for the presence or absence of a factor which possibly may be associated with the occurrence of the disease. Cornfield, Mantel, and Haenszel [1, 2] discuss the biases that may enter retrospective studies and compare their attributes with those of prospective studies. In the latter, a group with the factor and a group without the factor are followed for a period of time and are observed for the occurrence of the disease. The sample numbers and population proportions are denoted as shown by the tables below:

Table 1-Population Proportions

                   Factor
Disease      +          -          Total
+            P1         P2         P1+P2
-            P3         P4         P3+P4
Total        P1+P3      P2+P4      1

Table 2-Sample Numbers

                   Factor
Disease      +          -          Total
+            a          b          a+b
-            c          d          c+d
Total        a+c        b+d        n

It is assumed that the cases and controls are representative samples from a partition of a single population into two parts, one with and one free of the disease. Of course, it is desirable, although not usually possible, to obtain all cases of the disease in the population. A test of association can be performed, using the corrected chi square with one degree of freedom,

$$\chi^2 = \frac{(|ad-bc| - n/2)^2\, n}{(a+b)(c+d)(a+c)(b+d)}.$$
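As an illustration of the corrected chi square above, the following Python sketch evaluates the statistic for a 2 x 2 table laid out as in Table 2. The function names and the counts in the usage lines are hypothetical, chosen only to show the arithmetic. The tail probability relies on the fact that a chi square with one degree of freedom is the square of a standard normal deviate.

```python
import math


def corrected_chi_square(a, b, c, d):
    """Corrected chi square (1 d.f.) for the 2 x 2 layout of Table 2."""
    n = a + b + c + d
    numerator = (abs(a * d - b * c) - n / 2) ** 2 * n
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def upper_tail_probability(chi_square):
    """P(chi square with 1 d.f. exceeds the value), via the normal distribution."""
    return math.erfc(math.sqrt(chi_square / 2))


# Hypothetical counts: 30 of 50 cases and 15 of 50 controls have the factor.
chi2 = corrected_chi_square(a=30, b=20, c=15, d=35)
print(round(chi2, 3), round(upper_tail_probability(chi2), 4))
```

The sketch applies the formula exactly as written and makes no special provision for tables in which |ad-bc| is smaller than n/2.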

A measure of the association between the factor and the disease is given by the ratio of the proportion of the population with the factor having the disease to the proportion of those free of the factor with the disease, $R = P_1(P_2+P_4)/[P_2(P_1+P_3)]$. Cornfield has noted that when the disease is rare, $R$ is well approximated by $r = P_1P_4/(P_2P_3)$. This quantity, $r$, is referred to as the "relative risk." The "vertical proportions" $P_1/[P_1+P_3]$ and $P_2/[P_2+P_4]$ cannot be estimated from retrospective data; however, the "horizontal proportions" $P_3/[P_3+P_4]$ and $P_1/[P_1+P_2]$ can be estimated. Inferences on the magnitude of the relative risk, $r$, are based on the "horizontal proportions." Cornfield gives confidence intervals for $r$ [3, 4], which involve solving a quartic equation. To avoid this computation Cox [5] recommends the logistic model when none of the entries in the contingency table is small, and gives approximate confidence intervals. The approximate $100(1-\alpha)$ per cent confidence interval for $r$ is given by $(e^{\gamma_1}, e^{\gamma_2})$, where $\gamma_1$ and $\gamma_2$ are the solutions to the equations,

$$m + \gamma\sigma^2 = a \pm \left(U_{1-\alpha/2}\,\sigma + \tfrac{1}{2}\right),$$

$$m = \frac{(a+c)(a+b)}{n},$$

$$\sigma^2 = \frac{(a+c)(n-a-c)(a+b)(n-a-b)}{n^2(n-1)},$$

and $U_{1-\alpha/2}$ is the upper $100(1-\alpha/2)$ per cent point of the normal distribution. Gart [6] gives alternative methods of obtaining confidence intervals for large and small samples and compares a number of existing methods.

We now consider some methods for determining adequate sample sizes for a test of association in a retrospective study. Assume we wish a one-sided test at the $\alpha$ level of significance with power $1-\beta$ to detect a relative risk of magnitude $C$. Let $P = P_3/[P_3+P_4]$ be the proportion of the well population with the factor; $P$ is often well approximated by the general population proportion. The proportion of the diseased population with the factor, given a relative risk of magnitude $C$, is $P' = CP/(1-P+CP)$. Let $\bar{P} = (P+P')/2$. The necessary sample size, $N$, where $N$ is the number of cases and also the number of controls, is given by

$$N^{1/2} = (P'-P)^{-1}\left\{U_{1-\alpha}\,[2\bar{P}(1-\bar{P})]^{1/2} + U_{1-\beta}\,[P(1-P)+P'(1-P')]^{1/2}\right\}. \tag{1}$$

The formula (1) is based on the normal test for two proportions, which can be shown in the two-sided case to be equivalent to the uncorrected chi square test.
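Formula (1) can be sketched in a few lines of Python. The function names are assumptions made for illustration, and the normal percentage points come from the standard library's `statistics.NormalDist` (Python 3.8 or later) rather than from printed tables. The inputs in the usage line are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def diseased_proportion(P, C):
    """P' = CP / (1 - P + CP): proportion of the diseased with the factor."""
    return C * P / (1 - P + C * P)


def sample_size_normal(P, C, alpha=0.05, beta=0.05):
    """N per group from formula (1): one-sided level alpha, power 1 - beta.

    C must differ from 1, otherwise P' equals P and no finite N exists.
    """
    u_alpha = NormalDist().inv_cdf(1 - alpha)
    u_beta = NormalDist().inv_cdf(1 - beta)
    P1 = diseased_proportion(P, C)
    P_bar = (P + P1) / 2
    root_n = (
        u_alpha * sqrt(2 * P_bar * (1 - P_bar))
        + u_beta * sqrt(P * (1 - P) + P1 * (1 - P1))
    ) / (P1 - P)
    return root_n ** 2


# Illustrative inputs: 40 per cent of the well population has the factor, C = 3.
print(round(sample_size_normal(P=0.40, C=3), 1))
```

The result is the number of cases, with an equal number of controls; in practice it would be rounded up to the next whole number.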

Mantel [7] gives the necessary sample sizes for the case $\alpha = \beta$ by the formula

$$N^{1/2} = \frac{U_{1-\alpha}}{P'-P}\left\{[(P+P')(1-P'/2-P/2)]^{1/2} + [P(1-P)+P'(1-P')]^{1/2}\right\} \tag{2}$$

which follows immediately from (1).

The necessary sample sizes we give in the figure below are not based on (2) but on

$$(N/2)^{1/2} = \frac{2U_{1-\alpha}}{\left|\,2\arcsin\sqrt{P} - 2\arcsin\sqrt{P'}\,\right|}. \tag{3}$$
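A minimal Python sketch of formula (3) follows, assuming $\alpha = \beta$ as in (2). The function name is hypothetical, and the percentage point $U_{1-\alpha}$ is taken from `statistics.NormalDist` instead of a table. With P = 0.40 and C = 3 at the 0.05 level it gives a value just under 74, which agrees with the Figure 1 reading of 74 quoted in the paper's worked example.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def sample_size_arcsin(P, C, alpha=0.05):
    """N per group from formula (3); level alpha and power 1 - alpha."""
    P1 = C * P / (1 - P + C * P)          # P' for a relative risk of C
    u = NormalDist().inv_cdf(1 - alpha)   # U_{1-alpha}
    separation = abs(2 * asin(sqrt(P)) - 2 * asin(sqrt(P1)))
    # (N/2)^(1/2) = 2U / separation, so N = 2 * (2U / separation)^2
    return 2 * (2 * u / separation) ** 2


n = sample_size_arcsin(P=0.40, C=3)
print(round(n, 1), ceil(n))
```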

The formula (3) is based on the normal test for two proportions using the arcsin transformation. This formula has some advantages in that the computations are easier (if one uses a table of $2\arcsin\sqrt{X}$ [8]) and in some instances it gives sample sizes smaller than (2). Even though the sample sizes are in close agreement over part of the range of $P$, it is recommended that the test of significance based on the arcsin transformation given by R. A. Fisher [9] be used when sample sizes are obtained from (3). The critical region of the test is

$$(N/2)^{1/2}\left\{2\arcsin\left[\frac{c}{c+d} - \frac{1}{2N}\right]^{1/2} - 2\arcsin\left[\frac{a}{a+b} - \frac{1}{2N}\right]^{1/2}\right\} > U_{1-\alpha}. \tag{4}$$

The solid lines in Figure 1 show the regions of the curves where $NP^*(1-P^*) \ge 7$, and $P^*(1-P^*) = \min[P(1-P),\, P'(1-P')]$. In these regions the normal approximation appears to be good. The log x probability paper is used here only to give more accurate readings for the small sample sizes and to allow a smaller size graph. The figure can be used to find sample sizes for other values of $\alpha$ (here also the power to detect a relative risk of magnitude $C$ is $1-\alpha$) by multiplying the Figure 1 value by the appropriate $K$ in Table 3.

Figure 1-Sample Sizes Necessary to Detect a Relative Risk of Magnitude C with Probability 0.95 and Level of Significance 0.05. ($NP^*(1-P^*) \ge 7$ for C=4 and C=5 in the Regions 11 per cent to 70 per cent and 26 per cent to 35 per cent, Respectively.)

[Figure not reproduced. Abscissa: percentage of well population with the factor. Ordinate: sample size.]

Table 3-K for Various Values of α

α         K         α         K
0.050     1.000     0.006     2.332
0.040     1.133     0.004     2.559
0.025     1.420     0.002     3.061
0.020     1.559     0.001     3.528
0.010     1.999     0.0005    4.002
0.008     2.145     0.0001    5.111
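The paper does not say how K is obtained. Since (3) makes N proportional to the square of $U_{1-\alpha}$ when the power is $1-\alpha$, one plausible reading is that K is the squared ratio of $U_{1-\alpha}$ to $U_{0.95}$. The sketch below computes K under that assumption; the function name is hypothetical. Under this rule the values agree with Table 3 to within about 0.002, except at α = 0.004, where the rule gives about 2.60 against the tabulated 2.559.

```python
from statistics import NormalDist


def k_multiplier(alpha, reference_alpha=0.05):
    """Assumed rule: K = (U_{1-alpha} / U_{1-reference_alpha}) ** 2."""
    normal = NormalDist()
    ratio = normal.inv_cdf(1 - alpha) / normal.inv_cdf(1 - reference_alpha)
    return ratio ** 2


for alpha in (0.050, 0.025, 0.010, 0.001):
    print(alpha, round(k_multiplier(alpha), 3))
```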

For example, if a test is made at the $\alpha = 0.01$ level, and it is desired to detect a relative risk of magnitude 3 with probability $1-\beta = 0.99$ (suppose $P = 0.40$), the ordinate 74 is read from Figure 1 at the point on the line $C = 3$ which has abscissa equal to 0.40. The sample size for each population is then $N = 1.999 \times 74 \approx 148$, where $K = 1.999$ is read from Table 3.

When $P$ is known, the sample size can easily be computed by (1), (2), or (3). The figure has its greatest usefulness in situations where $P$ is not known exactly. In this case, one can determine at a glance the range of values in $P$ for which a given sample size is adequate.
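Without the graph at hand, the range of P for which a given sample size is adequate can be approximated numerically. The sketch below scans a grid of P values with formula (3); the function names, the grid spacing, and the planning figures in the usage line are assumptions for illustration. It reports the smallest and largest adequate grid values, which describes the whole adequate set only when that set is a single interval.

```python
from math import asin, sqrt
from statistics import NormalDist


def sample_size_arcsin(P, C, alpha=0.05):
    """N per group from formula (3); level alpha and power 1 - alpha."""
    P1 = C * P / (1 - P + C * P)
    u = NormalDist().inv_cdf(1 - alpha)
    return 2 * (2 * u / abs(2 * asin(sqrt(P)) - 2 * asin(sqrt(P1)))) ** 2


def adequate_range(N, C, alpha=0.05, steps=1000):
    """Smallest and largest grid value of P at which N per group suffices."""
    grid = [i / steps for i in range(1, steps)]
    adequate = [P for P in grid if sample_size_arcsin(P, C, alpha) <= N]
    return (adequate[0], adequate[-1]) if adequate else None


# Hypothetical planning question: 100 cases and 100 controls, C = 3.
print(adequate_range(N=100, C=3))
```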

Control Factors

In order to reduce the likelihood of spurious association, it is usually desirable to control for other factors which could introduce a bias. Each cell contained in a cross-classification under the controlling factors may be viewed in the framework of Table 1. If one wishes to detect an over-all relative risk of magnitude $C$, under certain specifications of level of significance and power, Table 3 and Figure 1 give a conservative total number of cases necessary, assuming at least as many controls are chosen in each classification. Mantel and Haenszel [1] discuss in detail the subject of controlling for other factors, and give an over-all test of significance for the case where there appears to be a consistent association between the disease and the factor under study. Significance may be tested by the corrected chi square with one degree of freedom,

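$$\chi^2 = \frac{\left(\left|\sum_i a_i - \sum_i E(a_i)\right| - 1/2\right)^2}{\sum_i \mathrm{Var}(a_i)},$$

where $i$ indexes all classifications under the control factors and

$$E(a_i) = \frac{(a_i+b_i)(a_i+c_i)}{n_i}, \qquad \mathrm{Var}(a_i) = \frac{(a_i+c_i)(b_i+d_i)(a_i+b_i)(c_i+d_i)}{n_i^2(n_i-1)};$$

a suggested formula for the estimated relative risk is given by

$$\frac{\sum_i (a_i d_i/n_i)}{\sum_i (b_i c_i/n_i)}.$$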
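The over-all test and the pooled estimate can be sketched as follows. This is a minimal illustration of the formulas just given; the function name and the two classifications in the usage line are hypothetical, and each tuple holds the counts (a, b, c, d) of one 2 x 2 table laid out as in Table 2.

```python
def mantel_haenszel(tables):
    """Over-all corrected chi square (1 d.f.) and pooled relative risk.

    `tables` is a list of (a, b, c, d) counts, one tuple per classification.
    """
    sum_a = sum_expected = sum_variance = 0.0
    numerator = denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        sum_a += a
        sum_expected += (a + b) * (a + c) / n
        sum_variance += (a + c) * (b + d) * (a + b) * (c + d) / (n ** 2 * (n - 1))
        numerator += a * d / n
        denominator += b * c / n
    chi_square = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_variance
    return chi_square, numerator / denominator


# Two hypothetical classifications (for example, two age groups).
chi2, relative_risk = mantel_haenszel([(10, 20, 5, 25), (8, 12, 4, 16)])
print(round(chi2, 3), round(relative_risk, 3))
```

Every classification needs at least two subjects, and the products $b_i c_i$ must not all be zero, or the divisions fail.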

ACKNOWLEDGMENT - The authors are indebted to Mrs. Shinko Obata who performed the computations for the figure.

REFERENCES

1. Mantel, N., and Haenszel, W. Statistical Aspects of the Analysis of Data from Retrospective Studies. J. Nat. Cancer Inst. 22:719-748, 1959.

2. Cornfield, J., and Haenszel, W. Some Aspects of Retrospective Studies. J. Chronic Dis. 11:523-534, 1960.

3. Cornfield, J. A Method of Estimating Comparative Rates from Clinical Data. Applications to Cancer of the Lung, Breast, and Cervix. J. Nat. Cancer Inst. 11:1269-1275, 1951.

4. Cornfield, J. A Statistical Problem Arising from Retrospective Studies. Proc. Third Berkeley Symposium on Mathematical Statistics and Probability 4:135-148, 1956.

5. Cox, D. R. The Regression Analysis of Binary Sequences. J. Roy. Stat. Soc. B 20:215-232, 1958.

6. Gart, J. J. Approximate Confidence Limits for the Relative Risk. Ibid. B 24:454-463, 1962.

7. Dunn, J. E., Jr., and Buell, P. Association of Cervical Cancer with Circumcision of Sexual Partner. J. Nat. Cancer Inst. 22:749-764, 1959.

8. Hald, A. Statistical Tables and Formulas. New York, N. Y.: John Wiley and Sons, Inc., 1952, p. 70.

9. Hald, A. Statistical Theory with Engineering Applications. New York, N. Y.: Wiley, 1952, pp. 685-686, 705-710.

Dr. Klauber is on the staff of the California Cancer Field Research Program, California State Department of Public Health, Berkeley, Calif., and Mr. Chase is at the Department of Statistics, Stanford University.

This study was supported by Grant No. C-5924 and Grant No. 63-34823 of the U. S. Public Health Service.

(Originally submitted for publication in December, 1963, this paper was revised and resubmitted in June, 1964.)

VOL. 55, NO. 12, A.J.P.H., DECEMBER, 1965, pp. 1993-1996