Encyclopedia of Quantitative Risk Analysis and Assessment || Case–Control Studies




Case–Control Studies

There are two primary types of nonexperimental studies in epidemiology. The first, the cohort study (see Cohort Studies) (also called the follow-up study or incidence study), is a direct analog of the experiment. Different exposure groups are compared, but it is the investigator who selects subjects to observe, and classifies these subjects by exposure status, rather than assigning them to exposure groups. The second, the incident case–control study, or simply the case–control study, employs an extra step of sampling from the source population for cases. A cohort study includes all persons in the population at risk of becoming a study case. In contrast, a case–control study selects only a sample of those persons, and does so partly on the basis of their final disease status. Thus, by design, a person’s outcome influences their chance of becoming a subject in the case–control study. This extra sampling step can make a case–control study much more efficient than a cohort study of the same population, but it introduces a number of subtleties and avenues for bias that are absent in typical cohort studies.

Conventional wisdom about case–control studies is that they do not yield estimates of effect that are as valid as measures obtained from cohort studies. This thinking may reflect common misunderstandings in conceptualizing case–control studies, but it also reflects concern about quality of exposure information and biases in case or control selection. For example, if exposure information comes from interviews, then cases will usually have reported the exposure information after learning of their diagnosis, which can lead to errors in the responses that are related to the disease (recall bias). While it is true that recall bias does not occur in prospective cohort studies, neither does it occur in all case–control studies. Exposure information that is taken from records whose creation predated disease occurrence will not be subject to recall bias. Similarly, while a cohort study may log information on exposure for an entire source population at the outset of the study, it still requires tracing of subjects to ascertain exposure variation and outcomes, and the success of this tracing may be related to exposure. These concerns are analogous to case–control problems of loss of subjects with unknown exposure and to biased selection of controls and cases. Each study, whether cohort or case–control, must be considered on its own merits.

Conventional wisdom also holds that cohort studies are useful for evaluating the range of effects related to a single exposure, while case–control studies provide information only about the one disease that afflicts the cases. This thinking conflicts with the idea that case–control studies can be viewed simply as more efficient cohort studies. Just as one can choose to measure more than one disease outcome in a cohort study, it is possible to conduct a set of case–control studies nested within the same population using several disease outcomes as the case series. The case–cohort study (see the section titled “Case–Cohort Studies”) is particularly well suited to this task, allowing one control group to be compared with several series of cases. Whether or not the case–cohort design is the form of case–control study that is used, case–control studies do not have to be characterized as being limited with respect to the number of disease outcomes that can be studied.

For diseases that are sufficiently rare, cohort studies become impractical, and case–control studies become the only useful alternative. On the other hand, if exposure is rare, ordinary case–control studies are inefficient, and one must use methods that selectively recruit additional exposed subjects, such as special cohort studies or two-stage designs. If both the exposure and the outcome are rare, two-stage designs may be the only informative option, as they employ oversampling of both exposed and diseased subjects.

Ideally, a case–control study can be conceptualized as a more efficient version of a corresponding cohort study. Under this conceptualization, the cases in the case–control study are the same cases as would ordinarily be included in the cohort study. Rather than including all of the experience of the source population that gave rise to the cases (the study base), as would be the usual practice in a cohort design, controls are selected from the source population. The sampling of controls from the population that gave rise to the cases affords the efficiency gain of a case–control design over a cohort design. The controls provide an estimate of the prevalence of the exposure and covariates in the source population. When controls are selected from members of the population who were at risk for disease at the beginning of the study’s follow-up period, the case–control odds ratio (see Odds and Odds Ratio) estimates the risk ratio that would be obtained from a cohort design. When controls are selected from members of the population who were noncases at the times that each case occurs, or otherwise in proportion to the person-time accumulated by the cohort, the case–control odds ratio estimates the rate ratio that would be obtained from a cohort design. Finally, when controls are selected from members of the population who were noncases at the end of the study’s follow-up period, the case–control odds ratio estimates the incidence odds ratio that would be obtained from a cohort design. With each control selection strategy, the odds ratio calculation is the same, but the measure of effect estimated by the odds ratio differs. Study designs that implement each of these control selection paradigms will be discussed after topics that are common to all designs.
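The correspondence between the three control-selection strategies and the three effect measures can be checked numerically. The following is a minimal sketch, assuming a hypothetical closed cohort with invented counts, no loss to follow-up, and controls drawn at exactly their expected frequencies (no sampling error). All names and numbers are illustrative assumptions, not values from the text.

```python
def odds_ratio(a1, a0, b1, b0):
    """Case-control odds ratio: (A1/B1) / (A0/B0)."""
    return (a1 / b1) / (a0 / b0)

# Hypothetical closed cohort (all numbers invented for illustration).
N1, N0 = 1000, 2000        # exposed / unexposed persons at risk at the start
A1, A0 = 100, 50           # exposed / unexposed cases during follow-up
T1, T0 = 950.0, 1975.0     # exposed / unexposed person-years at risk (assumed)

f = 0.1  # sampling fraction for persons, the same for exposed and unexposed
r = 0.1  # sampling rate per person-year, the same for exposed and unexposed

# (1) Controls sampled from everyone at risk at the start of follow-up.
or_start = odds_ratio(A1, A0, f * N1, f * N0)
# (2) Controls sampled in proportion to person-time at risk.
or_density = odds_ratio(A1, A0, r * T1, r * T0)
# (3) Controls sampled from those who are still noncases at the end.
or_end = odds_ratio(A1, A0, f * (N1 - A1), f * (N0 - A0))

# Cohort measures computed from the full population for comparison.
risk_ratio = (A1 / N1) / (A0 / N0)
rate_ratio = (A1 / T1) / (A0 / T0)
incidence_odds_ratio = (A1 / (N1 - A1)) / (A0 / (N0 - A0))
```

The odds ratio function is identical in all three cases; only the population from which the expected control counts are derived changes, and with it the cohort measure that the odds ratio reproduces.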

Common Elements of Case–Control Studies

In a cohort study, the numerator and denominator of each disease frequency (incidence proportion, incidence rate, or incidence odds) are measured, which requires enumerating the entire population and keeping it under surveillance. A case–control study attempts to observe the population more efficiently by using a control series in place of complete assessment of the denominators of the disease frequencies. The cases in a case–control study should be the same people who would be considered cases in a cohort study of the same population.

Pseudofrequencies and the Odds Ratio

The primary goal for control selection is that the exposure distribution among controls be the same as it is in the source population of cases. The rationale for this goal is that, if it is met, we can use the control series in place of the denominator information in measures of disease frequency to determine the ratio of the disease frequency in exposed people relative to that among unexposed people. This goal will be met if we can sample controls from the source population such that the ratio of the number of exposed controls ($B_1$) to the total exposed experience of the source population is the same as the ratio of the number of unexposed controls ($B_0$) to the unexposed experience of the source population, apart from sampling error. For most purposes, this goal need only be followed within strata defined by the factors that are used for stratification in the analysis, such as factors used for restriction or matching.

Using person-time to illustrate, the goal requires that $B_1$ has the same ratio to the amount of exposed person-time ($T_1$) as $B_0$ has to the amount of unexposed person-time ($T_0$):

\[ \frac{B_1}{T_1} = \frac{B_0}{T_0} \tag{1} \]

Here $B_1/T_1$ and $B_0/T_0$ are the control sampling rates – that is, the number of controls selected per unit of person-time. Suppose $A_1$ exposed cases and $A_0$ unexposed cases occur over the study period. The exposed and unexposed rates are then

\[ I_1 = \frac{A_1}{T_1} \tag{2} \]

and

\[ I_0 = \frac{A_0}{T_0} \tag{3} \]

We can use the frequencies of exposed and unexposed controls as substitutes for the actual denominators of the rates to obtain exposure-specific case–control ratios, or pseudorates:

\[ \mathrm{Pseudorate}_1 = \frac{A_1}{B_1} \tag{4} \]

and

\[ \mathrm{Pseudorate}_0 = \frac{A_0}{B_0} \tag{5} \]

These pseudorates have no epidemiologic interpretation by themselves. Suppose, however, that the control sampling rates $B_1/T_1$ and $B_0/T_0$ are equal to the same value $r$, as would be expected if controls are selected independently of exposure. If this common sampling rate $r$ is known, the actual incidence rates can be calculated by simple algebra, since apart from sampling error, $B_1/r$ should equal the amount of exposed person-time in the source population, and $B_0/r$ should equal the amount of unexposed person-time in the source population: $B_1/r = B_1/(B_1/T_1) = T_1$ and $B_0/r = B_0/(B_0/T_0) = T_0$. To get the incidence rates, we need only multiply each pseudorate by the common sampling rate, $r$.
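The algebra for a known sampling rate can be illustrated with a short calculation. This is a sketch under stated assumptions: the person-time totals, case counts, and sampling rate are invented, the function name is hypothetical, and controls are taken at their expected counts so that sampling error is ignored.

```python
def incidence_rates_from_pseudorates(a1, a0, b1, b0, r):
    """Return (I1, I0) from case counts, control counts, and a known
    common control sampling rate r (controls per unit of person-time)."""
    pseudorate1 = a1 / b1
    pseudorate0 = a0 / b0
    # Multiplying each pseudorate by r recovers the incidence rate,
    # because B/r equals the person-time T apart from sampling error.
    return pseudorate1 * r, pseudorate0 * r

# Hypothetical source population (numbers invented for illustration).
T1, T0 = 10_000.0, 40_000.0   # exposed / unexposed person-years
A1, A0 = 30, 40               # exposed / unexposed cases
r = 0.01                      # controls sampled per person-year (assumed known)

B1, B0 = r * T1, r * T0       # expected control counts: 100 and 400
I1, I0 = incidence_rates_from_pseudorates(A1, A0, B1, B0, r)
```

With these invented inputs the recovered rates match $A_1/T_1$ and $A_0/T_0$ computed directly from the person-time, which is the point of the derivation.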

If the common sampling rate is not known, which is often the case, we can still compare the sizes of the pseudorates by division. Specifically, if we divide the pseudorate for exposed by the pseudorate for unexposed, we obtain

\[
\frac{\mathrm{Pseudorate}_1}{\mathrm{Pseudorate}_0}
= \frac{A_1/B_1}{A_0/B_0}
= \frac{A_1/[(B_1/T_1)\,T_1]}{A_0/[(B_0/T_0)\,T_0]}
= \frac{A_1/(r \cdot T_1)}{A_0/(r \cdot T_0)}
= \frac{A_1/T_1}{A_0/T_0} \tag{6}
\]

In other words, the ratio of the pseudorates for the exposed and unexposed is an estimate of the ratio of the incidence rates in the source population, provided that the control sampling rate is independent of exposure. Thus, using the case–control study design, one can estimate the incidence rate ratio in a population without obtaining information on every subject in the population. Similar derivations in the section titled “Variants of the Case–Control Design” show that one can estimate the risk ratio by sampling controls from those at risk for disease at the beginning of the follow-up period (case–cohort design) and that one can estimate the incidence odds ratio by sampling controls from the noncases at the end of the follow-up period (cumulative case–control design). With these designs, the pseudofrequencies correspond to the incidence proportions and incidence odds, respectively, divided by the common sampling rates.
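The role of the proviso that sampling be independent of exposure can be seen in a small numerical sketch. It assumes invented person-time and case counts and expected (error-free) control counts. The exposure-dependent sampling rates in the second half are a hypothetical violation added for contrast, and the resulting bias factor $r_0/r_1$ follows from the same algebra as equation (6) rather than from the text.

```python
def pseudorate_ratio(a1, a0, b1, b0):
    """Ratio of pseudorates, (A1/B1) / (A0/B0)."""
    return (a1 / b1) / (a0 / b0)

# Hypothetical source population (numbers invented for illustration).
T1, T0 = 10_000.0, 40_000.0   # exposed / unexposed person-years
A1, A0 = 30, 40               # exposed / unexposed cases
true_rate_ratio = (A1 / T1) / (A0 / T0)   # 3.0

# Common sampling rate: the ratio does not depend on the value of r.
ratios_common = [
    pseudorate_ratio(A1, A0, r * T1, r * T0)
    for r in (0.001, 0.01, 0.1)
]

# Hypothetical violation: exposed person-time sampled twice as densely.
r1, r0 = 0.02, 0.01
ratio_biased = pseudorate_ratio(A1, A0, r1 * T1, r0 * T0)
# Expected value: true_rate_ratio * (r0 / r1) = 1.5
```

In this sketch the unknown value of $r$ cancels whenever it is shared by both exposure groups, while unequal sampling rates distort the estimate by the ratio of those rates.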

There is a statistical penalty for using a sample of the denominators, rather than measuring the person-time experience for the entire source population: the precision of the estimates of the incidence rate ratio from a case–control study is less than the precision from a cohort study of the entire population that gave rise to the cases (the source population). Nevertheless, the loss of precision that stems from sampling controls will be small if the number of controls selected per case is large. Furthermore, the loss is balanced by the cost savings of not having to obtain information on everyone in the source population. The cost savings might allow the epidemiologist to enlarge the source population and so obtain more cases, resulting in a better overall estimate of the incidence rate ratio, statistically and otherwise, than would be possible using the same expenditures to conduct a cohort study.
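The size of this penalty can be gauged with the standard large-sample approximation for the variance of the log odds ratio in unmatched data, $1/A_1 + 1/A_0 + 1/B_1 + 1/B_0$. That formula is not stated in the text and is used here as an assumption. The case counts and the exposure prevalence among controls are likewise invented, and the function names are hypothetical.

```python
import math

def se_log_odds_ratio(a1, a0, b1, b0):
    """Approximate standard error of ln(OR) for an unmatched 2x2 table."""
    return math.sqrt(1 / a1 + 1 / a0 + 1 / b1 + 1 / b0)

# Hypothetical case series (numbers invented for illustration).
A1, A0 = 60, 140          # exposed / unexposed cases (200 in total)
exposure_prev = 0.2       # assumed exposure prevalence among controls

def se_for_ratio(controls_per_case):
    """Standard error when controls_per_case controls are drawn per case."""
    n_controls = controls_per_case * (A1 + A0)
    b1 = exposure_prev * n_controls
    b0 = (1 - exposure_prev) * n_controls
    return se_log_odds_ratio(A1, A0, b1, b0)

se_by_ratio = {k: se_for_ratio(k) for k in (1, 2, 4, 10)}

# Lower bound reached when the denominators are known without sampling
# error, as in a cohort study of the whole source population.
se_floor = math.sqrt(1 / A1 + 1 / A0)
```

Under these assumptions the standard error falls toward the floor as the control-to-case ratio rises, with most of the gain achieved by the first few controls per case.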

The ratio of the two pseudorates in a case–control study is usually written as $A_1B_0/(A_0B_1)$, and is sometimes called the cross-product ratio. The cross-product ratio in a case–control study can be viewed as the ratio of cases to controls among the exposed subjects ($A_1/B_1$), divided by the ratio of cases to controls among the unexposed subjects ($A_0/B_0$). This ratio can also be viewed as the odds of being exposed among cases ($A_1/A_0$) divided by the odds of being exposed among controls ($B_1/B_0$), in which case it is termed the exposure odds ratio. While either interpretation will give the same result, viewing this odds ratio as the ratio of case–control ratios shows more directly how the control group substitutes for the denominator information in a cohort study and how the ratio of pseudofrequencies gives the same result as the ratio of the incidence rates, incidence proportions, or incidence odds in the source population, if sampling is independent of exposure.
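Written out with the symbols already defined, the equivalence of the two readings is a one-line rearrangement (a restatement of the paragraph above, not an additional result):

\[
\frac{A_1/B_1}{A_0/B_0} = \frac{A_1 B_0}{A_0 B_1} = \frac{A_1/A_0}{B_1/B_0}
\]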

Defining the Source Population

If the cases are a representative sample of all cases in a precisely defined and identified population, and the controls are sampled directly from this source population, the study is said to be population-based or a primary-base study. For a population-based case–control study, random sampling of controls may be feasible if a population registry exists or can be compiled. When random sampling from the source population of cases is feasible, it is usually the most desirable option.

Random sampling of controls does not necessarily mean that every person should have an equal probability of being selected to be a control. As explained above, if the aim is to estimate the incidence rate ratio, then we would employ longitudinal (density) sampling, in which a person’s control selection probability is proportional to the person’s time at risk. For example, in a case–control study nested within an occupational cohort, workers on an employee roster will have been followed for varying lengths of time, and a random sampling scheme should reflect this varying time to estimate the incidence rate ratio.

When it is not possible to identify the source population explicitly, simple random sampling is not feasible, and other methods of control selection must be used. Such studies are sometimes called studies of secondary bases, because the source population is identified secondarily to the definition of a case-finding mechanism. A secondary source population or secondary base is therefore a source population that is defined from (secondary to) a given case series.


Consider a case–control study in which the cases are patients treated for severe psoriasis at the Mayo Clinic. These patients come to the Mayo Clinic from all corners of the world. What is the specific source population that gives rise to these cases? To answer this question, we would have to know exactly who would go to the Mayo Clinic if he or she had severe psoriasis. We cannot enumerate this source population because many people in it do not know themselves that they would go to the Mayo Clinic for severe psoriasis, unless they actually developed severe psoriasis. This secondary source might be defined as a population spread around the world that constitutes those people who would go to the Mayo Clinic if they developed severe psoriasis. It is this secondary source from which the control series for the study would ideally be drawn. The challenge to the investigator is to apply eligibility criteria to the cases and controls so that there is good correspondence between the controls and this source population. For example, cases of severe psoriasis and controls might be restricted to those in counties within a certain distance of the Mayo Clinic, so that at least a geographic correspondence between the controls and the secondary source population can be assured. This restriction might, however, leave very few cases for study.

Unfortunately, the concept of a secondary base is often tenuously connected to underlying realities, and can be highly ambiguous. For the psoriasis example, whether a person would go to the Mayo Clinic depends on many factors that vary over time, such as whether the person is encouraged to go by their regular physicians and whether the person can afford to go. It is not clear, then, how or even whether one could precisely define the secondary base, let alone draw a sample from it; thus it is not clear one could ensure that controls were members of the base at the time of sampling. We therefore prefer to conceptualize and conduct case–control studies as starting with a well-defined source population, and then identify and recruit cases and controls to represent the disease and exposure experience of that population. When one takes a case series as a starting point instead, it is incumbent upon the investigator to demonstrate that a source population can be operationally defined to allow the study to be recast and evaluated relative to this source. Similar considerations apply when one takes a control series as a starting point, as is sometimes done [1].

Case Selection

Ideally, case selection will amount to a direct sampling of cases within a source population. Therefore, apart from random sampling, all people in the source population who develop the disease of interest are presumed to be included as cases in the case–control study. It is not always necessary, however, to include all cases from the source population. Cases, like controls, can be randomly sampled for inclusion in the case–control study, so long as this sampling is independent of the exposure under study within the strata defined by the stratification factors that are used in the analysis. Of course, if fewer than all cases are sampled, the study precision will be lower in proportion to the sampling fraction.

The cases identified in a single clinic or treated by a single medical practitioner are possible case series for case–control studies. The corresponding source population for the cases treated in a clinic comprises all people who would attend that clinic and would be recorded with the diagnosis of interest, if they had the disease in question. It is important to specify “if they had the disease in question” because clinics serve different populations for different diseases, depending on referral patterns and the reputation of the clinic in specific speciality areas. As noted above, without a precisely identified source population, it may be difficult or impossible to select controls in an unbiased fashion.

Control Selection

The definition of the source population determines the population from which controls are sampled. Ideally, control selection will amount to a direct sampling of people within the source population. On the basis of the principles explained above regarding the role of the control series, many general rules for control selection can be formulated. Two basic rules are as follows: (a) Controls should be selected from the same population – the source population – that gives rise to the study cases. If this rule cannot be followed, there needs to be solid evidence that the population supplying controls has an exposure distribution identical to that of the population that is the source of cases, which is a very stringent demand that is rarely demonstrable. (b) Within strata defined by factors that are used for stratification in the analysis, controls should be selected independently of their exposure status, in that the sampling rate for controls ($r$ in the above discussion) should not vary with exposure.

If these rules and the corresponding case rule are met, then the ratio of pseudofrequencies will, apart from sampling error, equal the ratio of the corresponding measure of disease frequency in the source population. If the sampling rate is known, then the actual measures of disease frequency can also be calculated [2]. Wacholder et al. have elaborated on the principles of control selection in case–control studies [3–5].

When one wishes controls to represent person-time, sampling of the person-time should be constant across exposure levels. This requirement implies that the sampling probability of any person as a control should be proportional to the amount of person-time that person spends at risk of disease in the source population. For example, if in the source population one person contributes twice as much person-time during the study period as another person, the first person should have twice the probability of the second of being selected as a control.

This difference in probability of selection is automatically induced by sampling controls at a steady rate per unit time over the period in which cases occur (longitudinal, or density, sampling), rather than by sampling all controls at a point in time (such as the start or end of the study). With longitudinal sampling of controls, a population member present for twice as long as another will have twice the chance of being selected.

If the objective of the study is to estimate a risk or rate ratio, it should be possible for a person to be selected as a control and yet remain eligible to become a case, so that person might appear in the study as both a control and a case. This possibility may sound paradoxical or wrong, but is, nevertheless, correct. It corresponds to the fact that in a cohort study, a case contributes to both the numerator and the denominator of the estimated incidence. If the controls are intended to represent person-time and are selected longitudinally, similar arguments show that a person selected as a control should remain eligible to be selected as a control again, and thus might be included in the analysis repeatedly as a control [6, 7].
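The requirement that selection probability be proportional to person-time, together with continued eligibility after selection, can be sketched as weighted sampling with replacement. This is a simplified illustration rather than a full risk-set sampling procedure: the roster, person-time values, and function names are hypothetical, and sampling over calendar time is collapsed into a single weighted draw.

```python
import random

def selection_probabilities(person_time):
    """Per-draw probability of selection, proportional to person-time at risk."""
    total = sum(person_time.values())
    return {pid: t / total for pid, t in person_time.items()}

def sample_density_controls(person_time, n_controls, seed=0):
    """Draw controls with probability proportional to person-time.

    Sampling is with replacement, so a person can be selected as a
    control more than once.
    """
    rng = random.Random(seed)
    ids = list(person_time)
    weights = [person_time[pid] for pid in ids]
    return rng.choices(ids, weights=weights, k=n_controls)

# Hypothetical employee roster: person-years at risk per worker.
roster = {"w1": 10.0, "w2": 5.0, "w3": 5.0}

probs = selection_probabilities(roster)
controls = sample_density_controls(roster, n_controls=4, seed=1)
```

Here worker `w1`, with twice the person-time of `w2`, has twice the per-draw probability of selection, and the returned list may contain the same worker more than once.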

Common Fallacies in Control Selection

In cohort studies, the study population is restricted to people at risk for the disease. Because they viewed case–control studies as if they were cohort studies done backwards, some authors argued that case–control studies ought to be restricted to those at risk for exposure (i.e., those with exposure opportunity). Excluding sterile women from a case–control study of an adverse effect of oral contraceptives and matching for duration of employment in an occupational study are examples of attempts to control for exposure opportunity. Such restrictions do not directly address validity issues, and can ultimately harm study precision by reducing the number of unexposed subjects available for study [8]. If the factor used for restriction (e.g., sterility) is unrelated to the disease, it will not be a confounder, and hence the restriction will yield no benefit to the validity of the estimate of effect. Furthermore, if the restriction reduces the study size, the precision of the estimate of effect will be reduced.

Another principle sometimes used in cohort studies is that the study cohort should be “clean” at the start of follow-up, including only people who have never had the disease. Misapplying this principle to case–control design suggests that the control group ought to be “clean”, including only people who are healthy, for example. Illness arising after the start of the follow-up period is not a reason to exclude subjects from a cohort analysis, and such exclusion can lead to bias. Similarly, controls with illness that arose after exposure should not be removed from the control series. Nonetheless, in studies of the relation between cigarette smoking and colorectal cancer, certain authors recommended that the control group should exclude people with colon polyps, because colon polyps are associated with smoking and are precursors of colorectal cancer [9]. But such exclusion reduces the prevalence of the exposure in the controls below that in the actual source population of cases, and hence biases the effect estimates upward [10].
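The upward bias from a “clean” control group can be made concrete with invented numbers. The sketch below assumes a hypothetical control series that mirrors the source population and assumes, purely for illustration, that polyps are twice as common among smokers as among nonsmokers. None of the counts or prevalences come from the cited studies.

```python
def odds_ratio(a1, a0, b1, b0):
    """Cross-product ratio A1*B0 / (A0*B1)."""
    return (a1 * b0) / (a0 * b1)

# Hypothetical colorectal cancer cases: smokers (A1) and nonsmokers (A0).
A1, A0 = 60, 70

# Controls representative of the source population (30% smokers, assumed).
B1, B0 = 300, 700

# Assumed polyp prevalence among control candidates, by smoking status.
polyp_prev_smokers = 0.20
polyp_prev_nonsmokers = 0.10

# "Clean" control series: everyone with polyps is removed.
B1_clean = B1 * (1 - polyp_prev_smokers)      # about 240 smokers remain
B0_clean = B0 * (1 - polyp_prev_nonsmokers)   # about 630 nonsmokers remain

or_representative = odds_ratio(A1, A0, B1, B0)
or_clean = odds_ratio(A1, A0, B1_clean, B0_clean)
```

Because the exclusion removes proportionally more smokers than nonsmokers from the controls, the odds ratio from the “clean” series exceeds the one from the representative series.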

Sources for Control Series

The methods suggested below for control sampling apply when the source population cannot be explicitly enumerated, so random sampling is not possible. These methods should only be implemented subject to the reservations about secondary bases described above.

Neighborhood Controls. If the source population cannot be enumerated, it may be possible to select controls through sampling of residences. This method is not straightforward. Usually, a geographic roster of residences is not available, so a scheme must be devised to sample residences without enumerating them all. For convenience, investigators may sample controls who are individually matched to cases from the same neighborhood. That is, after a case is identified, one or more controls residing in the same neighborhood as that case are identified and recruited into the study. If neighborhood is related to exposure, the matching should be taken into account in the analysis.

Neighborhood controls are often used when the cases are recruited from a convenient source, such as a clinic or hospital. Such usage can introduce bias, however, for the neighbors selected as controls may not be in the source population of the cases. For example, if the cases are from a particular hospital, neighborhood controls may include people who would not have been treated at the same hospital had they developed the disease. If being treated at the hospital from which cases are identified is related to the exposure under study, then using neighborhood controls would introduce a bias. For any given study, the suitability of using neighborhood controls needs to be evaluated with regard to the study variables on which the research focuses.

Hospital- or Clinic-Based Controls. As noted above, the source population for hospital- or clinic-based case–control studies is not often identifiable, since it represents a group of people who would be treated in a given clinic or hospital if they developed the disease in question. In such situations, a random sample of the general population will not necessarily correspond to a random sample of the source population. If the hospitals or clinics that provide the cases for the study only treat a small proportion of cases in the geographic area, then referral patterns to the hospital or clinic are important to take into account in the sampling of controls. For these studies, a control series comprising patients from the same hospitals or clinics as the cases may provide a less biased estimate of effect than general-population controls (such as those obtained from case neighborhoods or by random-digit dialing). The source population does not correspond to the population of the geographic area, but only to the people who would seek treatment at the hospital or clinic were they to develop the disease under study. While the latter population may be difficult or impossible to enumerate or even define very clearly, it seems reasonable to expect that other hospital or clinic patients will represent this source population better than general-population controls.

The major problem with any nonrandom sampling of controls is the possibility that they are not selected independently of exposure in the source population. Patients hospitalized with other diseases, for example, may be unrepresentative of the exposure distribution in the source population either because exposure is associated with hospitalization, or because the exposure is associated with the other diseases, or both. For example, suppose the study aims to evaluate the relation between tobacco smoking and leukemia using hospitalized cases. If controls are people hospitalized with other conditions, many of them will have been hospitalized for conditions associated with smoking. A variety of other cancers, as well as cardiovascular diseases and respiratory diseases, are related to smoking. Thus, a control series of people hospitalized for diseases other than leukemia would include a higher proportion of smokers than would the source population of the leukemia cases.
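The consequence of this overrepresentation for the effect estimate can be illustrated with invented numbers. The sketch assumes a hypothetical source population in which 30% of potential controls are smokers, and assumes, purely for illustration, that smokers are hospitalized for the control conditions at twice the rate of nonsmokers. None of these figures come from the text.

```python
def odds_ratio(a1, a0, b1, b0):
    """Cross-product ratio A1*B0 / (A0*B1)."""
    return (a1 * b0) / (a0 * b1)

# Hypothetical hospitalized leukemia cases: smokers (A1), nonsmokers (A0).
A1, A0 = 60, 70

# Controls that mirror the source population (30% smokers, assumed).
B1_source, B0_source = 300, 700

# Assumed: smokers are hospitalized for the control diagnoses at twice
# the rate of nonsmokers, so they are over-sampled by a factor of 2.
hospitalization_rate_ratio = 2.0
B1_hospital = B1_source * hospitalization_rate_ratio
B0_hospital = B0_source

or_source = odds_ratio(A1, A0, B1_source, B0_source)
or_hospital = odds_ratio(A1, A0, B1_hospital, B0_hospital)

smoker_share_source = B1_source / (B1_source + B0_source)
smoker_share_hospital = B1_hospital / (B1_hospital + B0_hospital)
```

Under these assumptions the hospital control series contains a larger share of smokers than the source population, and the odds ratio is attenuated by exactly the assumed hospitalization rate ratio.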

Limiting the diagnoses for controls to conditions for which there is no prior indication of an association with the exposure improves the control series. For example, in a study of smoking and hospitalized leukemia cases, one could exclude from the control series anyone who was hospitalized with a disease known to be related to smoking. Such an exclusion policy may exclude most of the potential controls, since cardiovascular disease by itself would represent a large proportion of hospitalized patients. Nevertheless, even a few common diagnostic categories should suffice to find enough control subjects, so that the exclusions will not harm the study by limiting the size of the control series. Indeed, in limiting the scope of eligibility criteria, it is reasonable to exclude categories of potential controls even on the suspicion that a given category might be related to the exposure. If wrong, the cost of the exclusion is that the control series becomes more homogeneous with respect to diagnosis and perhaps a little smaller. But if right, then the exclusion is important to the ultimate validity of the study.

On the other hand, an investigator can rarely be sure that an exposure is not related to a disease or to hospitalization for a specific diagnosis. Consequently, it would be imprudent to use only a single diagnostic category as a source of controls. Using a variety of diagnoses has the advantage of potentially diluting the biasing effects of including a specific diagnostic group that is related to the exposure.

Excluding a diagnostic category from the list of eligibility criteria for identifying controls is intended simply to improve the representativeness of the control series with respect to the source population. Such an exclusion criterion does not imply that there should be exclusions based on disease history [11]. For example, in a case–control study of smoking and hospitalized leukemia patients, one might use hospitalized controls but exclude any who are hospitalized because of cardiovascular disease. This exclusion criterion for controls does not imply that leukemia cases who have had cardiovascular disease should be excluded; only if the cardiovascular disease was a cause of the hospitalization should the case be excluded. For controls, the exclusion criterion should only apply to the cause of the hospitalization used to identify the study subject. A person who was hospitalized because of a traumatic injury and who is thus eligible to be a control would not be excluded if he or she had previously been hospitalized for cardiovascular disease. The source population includes people who have had cardiovascular disease, and they should be included in the control series. Excluding such people would lead to an underrepresentation of smoking relative to the source population and produce an upward bias in the effect estimates.

If exposure directly affects hospitalization (for example, if the decision to hospitalize is in part based on exposure history), the resulting bias cannot be remedied without knowing the hospitalization rates, even if the exposure is unrelated to the study disease or the control diseases. This problem was in fact one of the first problems of hospital-based studies to receive detailed analysis [12], and is often called Berksonian bias.

Other Diseases. In many settings, especially in populations with established disease registries or insurance-claims databases, it may be most convenient to choose controls from people who are diagnosed with other diseases. The considerations needed for valid control selection from other diagnoses parallel those just discussed for hospital controls. It is essential to exclude any diagnoses known or suspected to be related to exposure, and better still to include only diagnoses for which there is some evidence to indicate they are unrelated to exposure.

These exclusion and inclusion criteria apply only to the diagnosis that brought the person into the registry or database from which controls are selected. The history of an exposure-related disease should not be a basis for exclusion. If, however, the exposure directly affects the chance of entering the registry or database, the study will be subject to the Berksonian bias mentioned earlier for hospital studies.

Other Considerations for Subject Selection

Representativeness. Some textbooks have stressed the need for representativeness in the selection of cases and controls. The advice has been that cases should be representative of all people with the disease and that controls should be representative of the entire nondiseased population. Such advice can be misleading. A case–control study may be restricted to any type of case that may be of interest: female cases, old cases, severely ill cases, cases that died soon after disease onset, mild cases, cases from Philadelphia, cases among factory workers, and so on. In none of these examples would the cases be representative of all people with the disease, yet, in each one, perfectly valid case–control studies are possible [13]. The definition of a case can be virtually anything that the investigator wishes.

Ordinarily, controls should represent the source population for cases, rather than the entire nondiseased population. The latter may differ vastly from the source population for the cases by age, race, sex (e.g., if the cases come from a Veterans Administration hospital), socioeconomic status, occupation, and so on – including the exposure of interest. One of the reasons for emphasizing the similarities rather than the differences between cohort and case–control studies is that numerous principles apply to both types of study, but are more evident in the context of cohort studies. In particular, many principles relating to subject selection apply identically to both types of study. For example, it is widely appreciated that cohort studies can be based on special cohorts, rather than on the general population. It follows that case–control studies can be conducted by sampling cases and controls from within those special cohorts. The resulting controls should represent the distribution of exposure across those cohorts, rather than the general population, reflecting the more general rule that controls should represent the source population of the cases in the study, not the general population.


Comparability of Information. Some authors have recommended that information obtained about cases and controls should be of comparable or equal accuracy, to ensure nondifferentiality (equal distribution) of measurement errors [3]. The rationale for this principle is the notion that nondifferential measurement error biases the observed association toward the null, and so will not generate a spurious association, and that bias in studies with nondifferential error is more predictable than in studies with differential error.

The comparability-of-information principle is often used to guide selection of controls and collection of data. For example, it is the basis for using proxy respondents instead of direct interviews for living controls, whenever case information is obtained from proxy respondents. Unfortunately, in most settings, the arguments for the principle are logically unsound. For example, in a study that used proxy respondents for cases, use of proxy respondents for the controls might lead to greater bias than use of direct interviews with controls, even if measurement error is differential. The comparability-of-information principle is therefore applicable only under very limited conditions. In particular, it would seem to be useful only when confounders and effect modifiers are measured with negligible error, and when measurement error is reduced by using comparable sources of information. Otherwise, the effect of forcing comparability of information may be as unpredictable as the effect of using noncomparable information.

Timing of Classification and Diagnosis. The principles for classifying persons, cases, and person-time units in cohort studies according to exposure status also apply to cases and controls in case–control studies. If the controls are intended to represent person-time (rather than persons) in the source population, one should apply principles for classifying person-time to the classification of controls. In particular, principles of person-time classification lead to the rule that controls should be classified by their exposure status as of their selection time. Exposures accrued after that time should be ignored. The rule necessitates that information (such as exposure history) be obtained in a manner that allows one to ignore exposures accrued after the selection time. In a similar manner, cases should be classified as of time of diagnosis or disease onset, accounting for any built-in lag periods or induction-period hypotheses.

Variants of the Case–Control Design

Nested Case–Control Studies

Epidemiologists sometimes refer to specific case–control studies as nested case–control studies when the population within which the study is conducted is a fully enumerated cohort, which allows formal random sampling of cases and controls to be carried out. The term is usually used in reference to a case–control study conducted within a cohort study, in which further information (perhaps from expensive tests) is obtained on most or all cases, but for economy is obtained from only a fraction of the remaining cohort members (the controls). Nonetheless, many population-based case–control studies can be thought of as nested within an enumerated source population.

Case–Cohort Studies

The case–cohort study is a case–control study in which the source population is a cohort, and every person in this cohort has an equal chance of being included in the study as a control, regardless of how much time that person has contributed to the person-time experience of the cohort or whether the person developed the study disease. This is a logical way to conduct a case–control study when the effect measure of interest is the ratio of incidence proportions rather than a rate ratio, as is common in perinatal studies. The average risk (or incidence proportion) of falling ill during a specified period may be written as

$$R_1 = \frac{A_1}{N_1} \tag{7}$$

for the exposed subcohort and

$$R_0 = \frac{A_0}{N_0} \tag{8}$$

for the unexposed subcohort, where R1 and R0 are the incidence proportions among the exposed and unexposed, respectively, and N1 and N0 are the initial sizes of the exposed and unexposed subcohorts. (This discussion applies equally well to exposure variables with several levels, but, for simplicity, we consider only a dichotomous exposure.) Controls should be selected such that the exposure distribution among them will estimate without bias the exposure distribution in the source population. In a case–cohort study, the distribution we wish to estimate is among the N1 + N0 cohort members, not among their person-time experience [14–16].

The objective is to select controls from the source cohort such that the ratio of the number of exposed controls (B1) to the number of exposed cohort members (N1) is the same as the ratio of the number of unexposed controls (B0) to the number of unexposed cohort members (N0), apart from sampling error:

$$\frac{B_1}{N_1} = \frac{B_0}{N_0} \tag{9}$$

Here, B1/N1 and B0/N0 are the control sampling fractions (the number of controls selected per cohort member). Apart from random error, these sampling fractions will be equal if controls have been selected independently of exposure.

We can use the frequencies of exposed and unexposed controls as substitutes for the actual denominators of the incidence proportions to obtain “pseudorisks”:

$$\text{Pseudorisk}_1 = \frac{A_1}{B_1} \tag{10}$$

and

$$\text{Pseudorisk}_0 = \frac{A_0}{B_0} \tag{11}$$

These pseudorisks have no epidemiologic interpretation by themselves. Suppose, however, that the control sampling fractions are equal to the same fraction, f. Then, apart from sampling error, B1/f should equal N1, the size of the exposed subcohort; and B0/f should equal N0, the size of the unexposed subcohort: B1/f = B1/(B1/N1) = N1 and B0/f = B0/(B0/N0) = N0. Thus, to get the incidence proportions, we need only multiply each pseudorisk by the common sampling fraction, f. If this fraction is not known, we can still compare the sizes of the pseudorisks by division:

$$\frac{\text{Pseudorisk}_1}{\text{Pseudorisk}_0} = \frac{A_1/B_1}{A_0/B_0} = \frac{A_1/[(B_1/N_1)N_1]}{A_0/[(B_0/N_0)N_0]} = \frac{A_1/(f N_1)}{A_0/(f N_0)} = \frac{A_1/N_1}{A_0/N_0} \tag{12}$$

In other words, the ratio of pseudorisks is an estimate of the ratio of incidence proportions (risk ratio) in the source cohort if control sampling is independent of exposure. Thus, using a case–cohort design, one can estimate the risk ratio in a cohort without obtaining information on every cohort member.
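The pseudorisk argument can be checked numerically. The following is a minimal sketch with hypothetical cohort counts; the function name and all numbers are illustrative assumptions. Control counts are set to their expected values under a common sampling fraction f, so sampling error is ignored.

```python
def pseudorisk_ratio(a1, a0, b1, b0):
    """Ratio of pseudorisks: (A1/B1) / (A0/B0)."""
    return (a1 / b1) / (a0 / b0)


# Hypothetical source cohort: subcohort sizes and case counts.
N1, N0 = 1000, 4000   # exposed and unexposed cohort members
A1, A0 = 50, 100      # exposed and unexposed cases

# Controls drawn independently of exposure with a common sampling fraction f.
# Expected control counts are used here, so there is no sampling error.
f = 0.05
B1, B0 = f * N1, f * N0

risk_ratio = (A1 / N1) / (A0 / N0)              # true ratio of incidence proportions
estimate = pseudorisk_ratio(A1, A0, B1, B0)     # ratio of pseudorisks
recovered_risk_exposed = (A1 / B1) * f          # pseudorisk times f gives R1

print(risk_ratio, estimate, recovered_risk_exposed)
```

With these assumed counts, the ratio of pseudorisks coincides with the cohort risk ratio even though only 5% of cohort members are sampled as controls, and multiplying a pseudorisk by f recovers the corresponding incidence proportion.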

Thus far, we have implicitly assumed that there is no loss to follow-up or competing risks in the underlying cohort. If there are such problems, it is still possible to estimate risk or rate ratios from a case–cohort study, provided that we have data on the time spent at risk by the sampled subjects or we use certain sampling modifications [17]. These procedures require the usual assumptions for rate-ratio estimation in cohort studies, namely, that loss to follow-up and competing risks are either not associated with exposure or not associated with disease risk.

An advantage of the case–cohort design is that it facilitates conduct of a set of case–control studies from a single cohort, all of which use the same control group. Just as one can measure the incidence rate of a variety of diseases within a single cohort, one can conduct a set of simultaneous case–control studies using a single control group. A sample from the cohort is the control group needed to compare with any number of case groups. If matched controls are selected from people at risk at the time a case occurs (as in risk-set sampling, which is described in the section titled “Density Case–Control Studies”), the control series must be tailored to a specific group of cases. To have a single control series serve many case groups, another sampling scheme must be used. The case–cohort approach is a good choice in such a situation.

Wacholder has discussed the advantages and disadvantages of the case–cohort design relative to the usual type of case–control study [18]. One point to note is that, because of the overlap of membership in the case and control groups (controls who are selected may also develop disease and enter the study as cases), one will need to select more controls in a case–cohort study than in an ordinary case–control study with the same number of cases, if one is to achieve the same amount of statistical precision. Extra controls are needed because the statistical precision of a study is strongly determined by the numbers of distinct cases and noncases. Thus, if 20% of the source cohort members will become cases, and all cases will be included in the study, one will have to select 1.25 times as many controls as cases in a case–cohort study to ensure that there will be as many controls who never become cases in the study. On average, only 80% of the controls in such a situation will remain noncases; the other 20% will become cases. Of course, if the disease is uncommon, the number of extra controls needed for a case–cohort study will be small.
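The factor of 1.25 follows from dividing by the expected proportion of controls who remain noncases. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def control_inflation_factor(case_fraction):
    """Controls to select per desired noncase control in a case-cohort study.

    case_fraction: expected proportion of source cohort members who become cases.
    A fraction (1 - case_fraction) of selected controls is expected to stay noncases.
    """
    if not 0 <= case_fraction < 1:
        raise ValueError("case_fraction must be in [0, 1)")
    return 1 / (1 - case_fraction)


common_disease = control_inflation_factor(0.20)    # the 20% example in the text
uncommon_disease = control_inflation_factor(0.01)  # hypothetical rare disease

print(f"{common_disease:.2f} {uncommon_disease:.3f}")
```

For a disease affecting 1% of the cohort (an assumed figure), the factor is close to 1, which matches the remark that few extra controls are needed when the disease is uncommon.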

Density Case–Control Studies

Earlier, we described how case–control odds ratios will estimate rate ratios if the control series is selected so that the ratio of the person-time denominators T1/T0 is validly estimated by the ratio of exposed to unexposed controls B1/B0. That is, to estimate rate ratios, controls should be selected so that the exposure distribution among them is, apart from random error, the same as it is among the person-time in the source population. Such control selection is called density sampling because it provides for estimation of relations among incidence rates, which have been called incidence densities.

If a subject’s exposure may vary over time, then a case’s exposure history is evaluated up to the time the disease occurred. A control’s exposure history is evaluated up to an analogous index time, usually taken as the time of sampling; exposure after the time of selection must be ignored. This rule helps ensure that the number of exposed and unexposed controls will be in proportion to the amount of exposed and unexposed person-time in the source population.

The time during which a subject is eligible to be a control should be the time in which that person is also eligible to become a case, should the disease occur. Thus, a person in whom the disease has already developed or who has died is no longer eligible to be selected as a control. This rule corresponds to the treatment of subjects in cohort studies. Every case that is tallied in the numerator of a cohort study contributes to the denominator of the rate until the time that the person becomes a case, when the contribution to the denominator ceases. One way to implement this rule is to choose controls from the set of people in the source population who are at risk of becoming a case at the time that the case is diagnosed. This set is sometimes referred to as the risk set for the case, and this type of control sampling is sometimes called risk-set sampling. Controls sampled in this manner are matched to the case with respect to sampling time; thus, if time is related to exposure, the resulting data should be analyzed as matched data [19]. It is also possible to conduct unmatched density sampling using probability sampling methods if one knows the time interval at risk for each population member. One then selects a control by sampling members with probability proportional to time at risk, and then randomly samples a time to measure exposure within the interval at risk.
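The unmatched density sampling procedure just described can be sketched in a few lines. This is an illustrative sketch only: the function name, the person identifiers, and the at-risk intervals are hypothetical, and each person is assumed to have a single contiguous interval at risk. Sampling is with replacement, so the same person can be drawn more than once.

```python
import random


def density_sample_controls(at_risk, n_controls, seed=0):
    """Draw controls with probability proportional to time at risk.

    at_risk: dict mapping person id -> (start, end) of that person's time at risk.
    Returns a list of (person_id, index_time) pairs, where index_time is a
    uniformly sampled time within the person's interval at risk; exposure
    would be assessed as of that index time.
    """
    rng = random.Random(seed)
    ids = list(at_risk)
    weights = [at_risk[pid][1] - at_risk[pid][0] for pid in ids]
    chosen = rng.choices(ids, weights=weights, k=n_controls)  # with replacement
    return [(pid, rng.uniform(*at_risk[pid])) for pid in chosen]


# Hypothetical cohort: person "d" contributes no time at risk.
cohort = {"a": (0.0, 10.0), "b": (2.0, 4.0), "c": (5.0, 6.0), "d": (3.0, 3.0)}
controls = density_sample_controls(cohort, n_controls=20)

for person, index_time in controls[:3]:
    print(person, round(index_time, 2))
```

A person with no time at risk has zero sampling weight and is never selected, while a person with a long interval at risk is selected more often, which is what makes the exposure distribution among controls track the distribution of person-time.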

As mentioned earlier, a person who is selected as a control and who remains in the study population at risk after selection should remain eligible to be selected once again as a control. Thus, although unlikely in typical studies, the same person may appear in the control group two or more times. Note, however, that including the same person at different times does not necessarily lead to exposure (or confounder) information being repeated, because this information may change with time. For example, in a case–control study of an acute epidemic of intestinal illness, one might ask about food ingested within the previous day or days. If a contaminated food item was a cause of the illness for some cases, then the exposure status of a case or control chosen 5 days into the study might well differ from what it would have been 2 days into the study when the subject might also have been included as a control.

Cumulative (“Epidemic”) Case–Control Studies

In some research settings, case–control studies may address a risk that ends before subject selection begins. For example, a case–control study of an epidemic of diarrheal illness after a social gathering may begin after all the potential cases have occurred (because the maximum induction time has elapsed). In such a situation, an investigator might select controls from that portion of the population that remains after eliminating the accumulated cases; that is, one selects controls from among noncases (those who remain noncases at the end of the epidemic follow-up).

Suppose that the source population is a cohort and that a fraction f of both exposed and unexposed noncases are selected to be controls. Then the ratio of pseudofrequencies will be

$$\frac{A_1/B_1}{A_0/B_0} = \frac{A_1/[f(N_1 - A_1)]}{A_0/[f(N_0 - A_0)]} = \frac{A_1/(N_1 - A_1)}{A_0/(N_0 - A_0)} \tag{13}$$

which is the incidence odds ratio for the cohort. The latter ratio will provide a reasonable approximation to the rate ratio, provided that the proportions falling ill in each exposure group during the risk period are low, that is, less than about 20%, and that the prevalence of exposure remains reasonably steady during the study period. If the investigator prefers to estimate the risk ratio rather than the incidence rate ratio, the study odds ratio can still be used [20], but the accuracy of this approximation is only about half as good as that of the odds ratio approximation to the rate ratio [21]. The use of this approximation in the cumulative design is the basis for the common and mistaken notion that a rare-disease assumption is needed to estimate risk ratios in all case–control studies.
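A small numerical illustration may help compare the three measures. The sketch below is not part of the original derivation: it assumes a closed cohort with constant incidence rates in each exposure group and no competing risks, so that risk over a period equals 1 − exp(−rate × duration), and the rates and duration are hypothetical.

```python
import math


def cumulative_design_measures(rate_exposed, rate_unexposed, duration):
    """Rate ratio, risk ratio, and incidence odds ratio for a closed cohort.

    Assumes constant rates and no competing risks (an illustrative assumption),
    so that risk = 1 - exp(-rate * duration) in each exposure group.
    """
    risk1 = 1 - math.exp(-rate_exposed * duration)
    risk0 = 1 - math.exp(-rate_unexposed * duration)
    rate_ratio = rate_exposed / rate_unexposed
    risk_ratio = risk1 / risk0
    odds_ratio = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))
    return rate_ratio, risk_ratio, odds_ratio


# Hypothetical rates per person-year over a 10-year risk period
# (risks of roughly 18% and 10%).
rate_ratio, risk_ratio, odds_ratio = cumulative_design_measures(0.02, 0.01, 10.0)

print(f"rate ratio {rate_ratio:.3f}, risk ratio {risk_ratio:.3f}, odds ratio {odds_ratio:.3f}")
```

Under these assumptions the rate ratio lies between the risk ratio and the odds ratio, so the odds ratio is roughly twice as far from the risk ratio as it is from the rate ratio.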

Prior to the 1970s, the standard conceptualization of case–control studies involved the cumulative design, in which controls are selected from noncases at the end of a follow-up period. As discussed by numerous authors [19, 22, 23], density designs and case–cohort designs have several advantages outside of the acute epidemic setting, including potentially much less sensitivity to bias from exposure-related loss to follow-up.

Case-Specular and Case-Crossover Studies

When the exposure under study is defined by proximity to an environmental source (e.g., a power line), it may be possible to construct a specular (hypothetical) control for each case, by conducting a “thought experiment”. Either the case or the exposure source is imaginarily moved to another location that would be equally likely were there no exposure effect; the case exposure level under this hypothetical configuration is then treated as the (matched) “control” exposure for the case [24]. When the specular control arises by examining the exposure experience of the case outside of the time in which exposure could be related to disease occurrence, the result is called a case-crossover study.

The classic crossover study is a type of experiment in which two (or more) treatments are compared, as in any experimental study. In a crossover study, however, each subject receives both treatments, with one following the other. Preferably, the order in which the two treatments are applied is randomly chosen for each subject. Enough time should be allocated between the two administrations so that the effect of each treatment can be measured and can subside before the other treatment is given. A persistent effect of the first intervention is called a carryover effect. A crossover study is only valid to study treatments for which effects occur within a short induction period and do not persist, i.e., carryover effects must be absent, so that the effect of the second intervention is not intermingled with the effect of the first.

The case-crossover study is a case–control analogue of the crossover study [25]. For each case, one or more predisease or postdisease time periods are selected as matched “control” periods for the case. The exposure status of the case at the time of the disease onset is compared with the distribution of exposure status for that same person in the control periods. Such a comparison depends on the assumption that neither exposure nor confounders change with time in a systematic way.

Only a limited set of research topics are amenable to the case-crossover design. The exposure must vary over time within individuals, rather than stay constant. If the exposure does not vary within a person, then there is no basis for comparing exposed and unexposed time periods of risk within the person. As in the crossover study, the exposure must also have a short induction time and a transient effect; otherwise, exposures in the distant past could be the cause of a recent disease onset (a carryover effect).

Maclure [25] used the case-crossover design to study the effect of sexual activity on incident myocardial infarction. This topic is well suited to a case-crossover design because the exposure is intermittent and is presumed to have a short induction period for the hypothesized effect. Any increase in risk for a myocardial infarction from sexual activity is presumed to be confined to a short time following the activity. A myocardial infarction is an outcome well suited to this type of study because it is thought to be triggered by events close in time.

Each case in a case-crossover study is automatically matched with its control on all characteristics (e.g., sex and birth date) that do not change within individuals. A matched analysis of case-crossover data automatically adjusts for all such fixed confounders, whether or not they are measured. Control for measured time-varying confounders is possible using modeling methods for matched data. It is also possible to adjust case-crossover estimates for bias owing to time trends in exposure through use of longitudinal data from a nondiseased control group (case-time controls) [26]. Nonetheless, these trend adjustments themselves depend on additional no-confounding assumptions and may introduce bias if those assumptions are not met [27].
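As an illustration of a matched analysis in the simplest setting, the sketch below assumes exactly one control period per case and a dichotomous exposure, and uses the standard matched-pair (discordant-pair) odds ratio. This is a simplification chosen for illustration, not the specific analysis used in the cited studies; the function name and the counts are hypothetical.

```python
def case_crossover_odds_ratio(pairs):
    """Matched-pair odds ratio for a 1:1 case-crossover analysis.

    pairs: one (exposed_in_case_period, exposed_in_control_period) tuple of
    booleans per case. Only discordant pairs contribute to the estimate.
    """
    pairs = list(pairs)
    exposed_case_only = sum(1 for case_p, ctrl_p in pairs if case_p and not ctrl_p)
    exposed_control_only = sum(1 for case_p, ctrl_p in pairs if ctrl_p and not case_p)
    if exposed_control_only == 0:
        raise ValueError("no pairs exposed only in the control period")
    return exposed_case_only / exposed_control_only


# Hypothetical data for 52 cases.
pairs = (
    [(True, False)] * 12    # exposed only in the case period
    + [(False, True)] * 4   # exposed only in the control period
    + [(True, True)] * 6    # concordant, uninformative
    + [(False, False)] * 30 # concordant, uninformative
)
estimate = case_crossover_odds_ratio(pairs)
print(estimate)
```

Because each person serves as his or her own control, characteristics that are fixed within a person cancel out of this comparison, which is why only the discordant pairs carry information.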

Two-Stage Sampling

Another variant of the case–control study uses two-stage or two-phase sampling [28, 29]. In this type of study, the control series comprises a relatively large number of people (possibly everyone in the source population), from whom exposure information or perhaps some limited amount of information on other relevant variables is obtained. Then, for only a subsample of the controls, more detailed information is obtained on exposure or on other study variables that may need to be controlled in the analysis. More detailed information may also be limited to a subsample of cases. This two-stage approach is useful when it is relatively inexpensive to obtain the exposure information (e.g., by telephone interview), but the covariate information is more expensive to obtain (say, by laboratory analysis). It is also useful when exposure information already has been collected on the entire population (e.g., job histories for an occupational cohort), but covariate information is needed (e.g., genotype). This situation arises in cohort studies when more information is required than was gathered at baseline. This type of study requires special analytic methods to take full advantage of the information collected at both stages.

Case–Control Studies with Prevalent Cases

Case–control studies are sometimes based on prevalent cases rather than incident cases. When it is impractical to include only incident cases, it may still be possible to select existing cases of illness at a point in time. If the prevalence odds ratio in the population is equal to the incidence rate ratio, then the odds ratio from a case–control study based on prevalent cases can unbiasedly estimate the rate ratio. The conditions required for the prevalence odds ratio to equal the rate ratio are very strong, however, and a simple relation does not exist for age-specific ratios. If exposure is associated with duration of illness or migration out of the prevalence pool, then a case–control study based on prevalent cases cannot by itself distinguish exposure effects on disease incidence from the exposure association with disease duration or migration, unless the strengths of the latter associations are known. If the size of the exposed or the unexposed population changes with time or there is migration into the prevalence pool, the prevalence odds ratio may be further removed from the rate ratio. Consequently, it is always preferable to select incident rather than prevalent cases when studying disease etiology.
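The dependence of the prevalence odds ratio on disease duration can be made concrete under a strong simplifying assumption that is not stated in the text above: a stationary population with no migration, in which the prevalence odds in each exposure group equal the incidence rate multiplied by the mean disease duration. The sketch below uses that assumed relation with hypothetical numbers.

```python
def prevalence_odds_ratio(rate_ratio, mean_duration_exposed, mean_duration_unexposed):
    """Prevalence odds ratio under an assumed steady state with no migration.

    Assumes prevalence odds = incidence rate * mean duration in each exposure
    group, so the prevalence odds ratio is the rate ratio times the ratio of
    mean durations (an illustrative assumption, not a general result).
    """
    return rate_ratio * (mean_duration_exposed / mean_duration_unexposed)


# Exposure unrelated to duration: the prevalence odds ratio equals the rate ratio.
same_duration = prevalence_odds_ratio(2.0, 5.0, 5.0)

# Exposure halves mean duration (e.g., faster death or recovery): a true
# rate ratio of 2 appears as a prevalence odds ratio of 1.
shorter_duration = prevalence_odds_ratio(2.0, 2.5, 5.0)

print(same_duration, shorter_duration)
```

In the second hypothetical scenario a real effect on incidence is completely masked, which illustrates why prevalent cases cannot by themselves separate effects on incidence from associations with duration.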

Prevalent cases are usually drawn in studies of congenital malformations. In such studies, cases ascertained at birth are prevalent because they have survived with the malformation from the time of its occurrence until birth. It would be etiologically more useful to ascertain all incident cases, including affected abortuses that do not survive until birth. Many of these, however, do not survive until ascertainment is feasible, and thus it is virtually inevitable that case–control studies of congenital malformations are based on prevalent cases. In this example, the source population comprises all conceptuses, and miscarriage and induced abortion represent emigration before the ascertainment date. Although an exposure will not affect duration of a malformation, it may very well affect risks of miscarriage and abortion.

Other situations in which prevalent cases are commonly used are studies of chronic conditions with ill-defined onset times and limited effects on mortality, such as obesity and multiple sclerosis, and studies of health services utilization.

Conclusion

Epidemiologic research employs a range of study designs, including both experimental and nonexperimental studies. Among nonexperimental studies, cohort designs are sometimes thought to be inherently less susceptible to bias than case–control designs. Nonetheless, most of the biases that are associated with case–control studies are not inherent to the design, nor are cohort studies immune from bias. For example, recall bias will not occur in a case–control study when exposure information comes from records made before disease onset, and selection bias can occur in cohort studies that suffer from loss to follow-up. No epidemiologic study is perfect, and this caution applies to cohort studies as well as case–control studies. A clear understanding of the principles of study design is essential for valid study design, conduct, and analysis, and for proper interpretation of results, regardless of the design.


Acknowledgment

This article is adapted from Modern Epidemiology, Third Edition, Rothman KJ, Greenland S, and Lash TL, eds., Lippincott Williams & Wilkins, 2008.

References

[1] Greenland, S. (1985). Control-initiated case-control studies, International Journal of Epidemiology 14, 130–134.

[2] Rothman, K.J. & Greenland, S. (1998). Modern Epidemiology, 2nd Edition, Lippincott, Philadelphia, Chapter 21.

[3] Wacholder, S., McLaughlin, J.K., Silverman, D.T. & Mandel, J.S. (1992). Selection of controls in case-control studies. I. Principles, American Journal of Epidemiology 135, 1019–1028.

[4] Wacholder, S., McLaughlin, J.K., Silverman, D.T. & Mandel, J.S. (1992). Selection of controls in case-control studies. II. Types of controls, American Journal of Epidemiology 135, 1029–1041.

[5] Wacholder, S., McLaughlin, J.K., Silverman, D.T. & Mandel, J.S. (1992). Selection of controls in case-control studies. III. Design options, American Journal of Epidemiology 135, 1042–1050.

[6] Lubin, J.H. & Gail, M.H. (1984). Biased selection of controls for case-control analyses of cohort studies, Biometrics 40, 63–75.

[7] Robins, J.M., Gail, M.H. & Lubin, J.H. (1986). More on biased selection of controls for case-control analyses of cohort studies, Biometrics 42, 293–299.

[8] Poole, C. (1986). Exposure opportunity in case-control studies, American Journal of Epidemiology 123, 352–358.

[9] Terry, M.B. & Neugut, A.L. (1998). Cigarette smoking and the colorectal adenoma-carcinoma sequence: a hypothesis to explain the paradox, American Journal of Epidemiology 147, 903–910.

[10] Poole, C. (1999). Controls who experienced hypothetical causal intermediates should not be excluded from case-control studies, American Journal of Epidemiology 150, 547–551.

[11] Lubin, J.H. & Hartge, P. (1984). Excluding controls: misapplications in case-control studies, American Journal of Epidemiology 120, 791–793.

[12] Berkson, J. (1946). Limitations of the application of 4-fold tables to hospital data, Biometrics Bulletin 2, 47–53.

[13] Cole, P. (1979). The evolving case-control study, Journal of Chronic Diseases 32, 15–27.

[14] Thomas, D.B. (1972). Relationship of oral contraceptives to cervical carcinogenesis, Obstetrics and Gynecology 40, 508–518.

[15] Kupper, L.L., McMichael, A.J. & Spirtas, R. (1975). A hybrid epidemiologic design useful in estimating relative risk, Journal of the American Statistical Association 70, 524–528.

[16] Miettinen, O.S. (1982). Design options in epidemiologic research: an update, Scandinavian Journal of Work Environment Health 8(Suppl. 1), 7–14.

[17] Flanders, W.D., DerSimonian, R. & Rhodes, P. (1990). Estimation of risk ratios in case-base studies with competing risks, Statistics in Medicine 9, 423–435.

[18] Wacholder, S. (1991). Practical considerations in choosing between the case-cohort and nested case-control design, Epidemiology 2, 155–158.

[19] Greenland, S. & Thomas, D.C. (1982). On the need for the rare disease assumption in case-control studies, American Journal of Epidemiology 116, 547–553.

[20] Cornfield, J. (1951). A method of estimating comparative rates from clinical data. Application to cancer of the lung, breast and cervix, Journal of the National Cancer Institute 11, 1269–1275.

[21] Greenland, S. (1987). Interpretation and choice of effect measures in epidemiologic analysis, American Journal of Epidemiology 125, 761–768.

[22] Sheehe, P.R. (1962). Dynamic risk analysis in retrospective matched-pair studies of disease, Biometrics 18, 323–341.

[23] Miettinen, O.S. (1976). Estimability and estimation in case-referent studies, American Journal of Epidemiology 103, 226–235.

[24] Zaffanella, L.E., Savitz, D.A., Greenland, S. & Ebi, K.L. (1998). The residential case-specular method to study wire codes, magnetic fields, and disease, Epidemiology 9, 16–20.

[25] Maclure, M. (1991). The case-crossover design: a method for studying transient effects on the risk of acute events, American Journal of Epidemiology 133, 144–153.

[26] Suissa, S. (1995). The case-time-control design, Epidemiology 6, 248–253.

[27] Greenland, S. (1996). Confounding and exposure trends in case-crossover and case-time-control designs, Epidemiology 7, 231–239.

[28] Walker, A.M. (1982). Anamorphic analysis: sampling and estimation for confounder effects when both exposure and disease are known, Biometrics 38, 1025–1032.

[29] White, J.E. (1982). A two stage design for the study of the relationship between a rare exposure and a rare disease, American Journal of Epidemiology 115, 119–128.

Related Articles

Absolute Risk Reduction

Epidemiology as Legal Evidence

History of Epidemiologic Studies

KENNETH J. ROTHMAN, SANDER GREENLAND AND TIMOTHY L. LASH