Post on 22-Dec-2015
Study Design
Topics for today
• Different kinds of study designs and their advantages and disadvantages
• Power and sample size calculations
Readings
• Jewell Chapter 5
Conceptual framework
Target population – group about whom we want to draw inferences
Study population – convenient population (possibly, but not necessarily, a subpopulation of the target population) from which we can sample
Study sample – actual individuals sampled for study
Examples
Study | Target population | Study population | Study sample
Arsenic | US population | Southwest Taiwan | Population of 42 villages
NHANES | US population | US population | 5000 households
Epileptic drugs | All pregnant women | Births at MGH | 312 drug exposed, 96 seizure history and 506 controls
MeHg | All live births. US population for purpose of regulation | Faroes, Seychelles and New Zealand | Specific samples of recent births
Examples (cont’d)
Study | Target population | Study population | Study sample
Breast Cancer | Japanese women | ? | ?
CPP | All births | 12 Major Medical Centers in the US | Enrolled women
Issues
• In essence, we are interested in the following table

            | Diseased | Not Diseased
Exposed     |          |
Unexposed   |          |

P(E)? P(D)? Or their association?
• Differences between target and study population may lead to bias
• How to appropriately draw samples from the study population
• Depending on study design, we may not be able to estimate everything
Types of study design
• Population-based study
• Cohort study
• Case-control study
Population-based studies
Idea is simple – choose a random sample from the study population
How do we do this? It is easier said than done!
– Door to door survey?
– Phone survey?
Ideally, we want a sampling frame: a listing of all subjects in the population.
Sampling frames can be hard to come by, except for small, well-defined populations (e.g. a list of students enrolled at a university)
Population-based studies (cont’d)
Many population-based studies (e.g. NHANES) use a classic multi-stage probability-based sampling design, often exploiting population structure characterized through the national census:
– State
– County
– Census tract
– Census block group
– Households enumerated within block
Because the sampling probability at each stage is known, we can account for it in order to estimate disease and exposure prevalences for the whole population
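One common way to account for unequal sampling probabilities is to weight each sampled subject by the inverse of his or her inclusion probability. The sketch below is only an illustration of that idea, not the procedure NHANES itself uses; the function name and the six-subject data set are hypothetical.

```python
def weighted_prevalence(outcomes, inclusion_probs):
    """Weighted (ratio-type) prevalence estimate.

    outcomes        -- 1 if the sampled subject is diseased, else 0
    inclusion_probs -- probability that each subject was sampled
    """
    # Each subject "stands for" 1/p members of the study population.
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical sample: the first three subjects come from an
# oversampled stratum (p = 0.5), the last three from a stratum
# sampled more sparsely (p = 0.1).
outcomes = [1, 1, 0, 1, 0, 0]
inclusion_probs = [0.5, 0.5, 0.5, 0.1, 0.1, 0.1]

naive = sum(outcomes) / len(outcomes)          # ignores the design
weighted = weighted_prevalence(outcomes, inclusion_probs)
print(f"naive: {naive:.3f}  weighted: {weighted:.3f}")
```

The naive proportion overstates prevalence here because the oversampled stratum has more disease; the weights correct for that.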
Population-based studies (cont’d)
Can be
• Retrospective – questionnaires and medical tests used to assess past and current exposures and disease outcomes (NHANES). Allows estimation of disease prevalence. Incidence possible based on past reconstruction, but difficult and prone to bias.
• Prospective – exposures measured at the time of sampling, then subjects followed forward in time to assess disease onset. Allows estimation of disease incidence
NHANES is retrospective. The CPP study has both a retrospective and a prospective component
Ecological studies
A special kind of population-based study where individuals are not examined, but exposures and possibly even outcomes are measured at the population level
– Arsenic (exposure measured at village level)
– Seatbelt laws vs state accident rates
Many people argue that ecological studies are a low-quality epidemiological design because they are subject to confounding as well as ecological bias.
Cohort studies
One of the most common epidemiological designs
• Identify exposed and unexposed subgroups from the study population and select randomly from each
• Measure disease outcome, other characteristics and confounders. Disease can be assessed
– At the time of sampling (prevalence)
– Retroactively, e.g. when did person get disease
– Prospectively (need to follow cohort forward in time)
• Cannot estimate exposure prevalence, but can estimate relative risks, odds ratios etc.
• Cannot estimate attributable risk (it depends on the exposure prevalence)
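As a small illustration of the measures a cohort design does support, the following sketch computes the relative risk and odds ratio from a 2x2 table. The counts are hypothetical and the function names are made up for this example.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed.

    a -- exposed, diseased      b -- exposed, not diseased
    c -- unexposed, diseased    d -- unexposed, not diseased
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)


# Hypothetical cohort: 100 exposed and 100 unexposed subjects.
a, b, c, d = 30, 70, 10, 90
print(f"RR = {relative_risk(a, b, c, d):.2f}")
print(f"OR = {odds_ratio(a, b, c, d):.2f}")
```

Note that the row totals (100 and 100) were fixed by the investigator, so the table says nothing about how common exposure is in the study population.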
Examples of cohort studies
• Epilepsy study – women identified at the time they gave birth. Subsets of controls and exposed identified for more detailed study.
• Women's Health Initiative (University of Washington, Seattle). This is a randomized cohort study since women are randomly assigned to Hormone Replacement Therapy (HRT) vs placebo
Case-control studies
Like a cohort study, except the roles of exposure and disease are reversed:
• Identify a subpopulation of diseased subjects
• Identify a subpopulation of non-diseased subjects
• Take a random sample from each and measure exposure (and other characteristics and confounders of interest).
• Analyze with logistic regression, treating disease status as the outcome. Why does this work?
Case/control theory
Let $Y$ denote disease presence/absence,
$X$ denote exposure (yes or no), and
$\delta$ indicate whether the subject was sampled.
Case/control analysis is based on modeling the probability of $Y$, given the observed exposure (and other factors), conditional on being sampled:
\[ \Pr(Y=1 \mid X, \delta=1) \]
Suppose that
\[ \mathrm{logit}\,\Pr(Y=1 \mid X) = \beta_0 + \beta_1 X \]
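Case/control theory (cont’d)
Write $s_1 = \Pr(\delta=1 \mid Y=1)$ and $s_0 = \Pr(\delta=1 \mid Y=0)$ for the sampling probabilities of cases and controls, which do not depend on $X$. Then
\[
\begin{aligned}
\Pr(Y=1 \mid X, \delta=1)
&= \frac{\Pr(Y=1 \mid X)\,\Pr(\delta=1 \mid X, Y=1)}{\Pr(\delta=1 \mid X)} \\
&= \frac{\Pr(Y=1 \mid X)\,\Pr(\delta=1 \mid X, Y=1)}{\Pr(Y=1 \mid X)\,\Pr(\delta=1 \mid X, Y=1) + \Pr(Y=0 \mid X)\,\Pr(\delta=1 \mid X, Y=0)} \\
&= \frac{\dfrac{\exp(\beta_0+\beta_1 X)}{1+\exp(\beta_0+\beta_1 X)}\, s_1}{\dfrac{\exp(\beta_0+\beta_1 X)}{1+\exp(\beta_0+\beta_1 X)}\, s_1 + \dfrac{1}{1+\exp(\beta_0+\beta_1 X)}\, s_0} \\
&= \frac{\exp(\beta_0+\beta_1 X)\, s_1}{\exp(\beta_0+\beta_1 X)\, s_1 + s_0}
\end{aligned}
\]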
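Case/control theory (cont’d)
Dividing numerator and denominator by $s_0$:
\[
\begin{aligned}
\Pr(Y=1 \mid X, \delta=1)
&= \frac{\exp(\beta_0+\beta_1 X)\, s_1/s_0}{\exp(\beta_0+\beta_1 X)\, s_1/s_0 + 1} \\
&= \frac{\exp(\beta_0^*+\beta_1 X)}{\exp(\beta_0^*+\beta_1 X) + 1}
\end{aligned}
\]
where
\[ \beta_0^* = \beta_0 + \log(s_1/s_0) \]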
This result tells us that logistic regression applied to case/control data will result in the correct odds ratios associated with exposure X. Only the intercept will be affected by the sampling process. Note that if s1 = s0 (cases and controls have the same sampling probability), then the usual model applies.
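The result can be checked numerically for a binary exposure. The sketch below is an illustration under assumed parameter values, not part of the original lecture: it works with expected cell proportions rather than simulated data, and uses the fact that with a single binary X the fitted logistic intercept and slope are just the log odds in the unexposed and the log odds ratio. The function name and all numbers are hypothetical.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def sampled_logit_coefficients(beta0, beta1, p_exposed, s1, s0):
    """Intercept and slope of the logistic model that holds among
    sampled subjects when cases are sampled with probability s1 and
    controls with probability s0 (independently of exposure X)."""
    cells = {}
    for x, p_x in ((0, 1.0 - p_exposed), (1, p_exposed)):
        p_disease = expit(beta0 + beta1 * x)
        cells[(x, 1)] = p_x * p_disease * s1          # sampled cases
        cells[(x, 0)] = p_x * (1.0 - p_disease) * s0  # sampled controls
    intercept = math.log(cells[(0, 1)] / cells[(0, 0)])
    slope = math.log(cells[(1, 1)] / cells[(1, 0)]) - intercept
    return intercept, slope


# Assumed values: rare disease, all cases sampled, 1% of controls sampled.
beta0, beta1 = -4.0, 0.7
intercept, slope = sampled_logit_coefficients(beta0, beta1, 0.3, s1=1.0, s0=0.01)
print(f"slope:     {slope:.4f}  (beta1 = {beta1})")
print(f"intercept: {intercept:.4f}  "
      f"(beta0 + log(s1/s0) = {beta0 + math.log(1.0 / 0.01):.4f})")
```

The slope is unchanged by the sampling, while the intercept shifts by log(s1/s0), which is why the baseline risk cannot be read off a case/control fit unless the sampling fractions are known.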
Case/control practical challenges
While appealing in theory, case/control studies can be notoriously difficult to execute properly. Cases can usually be found easily (e.g. from hospitals, disease registries etc). Picking appropriate controls is extremely hard. Biggest challenge is ensuring that sampling mechanisms are random with respect to everything except disease status. Best illustrated with some examples.
Case/control example 1
Professor David Christiani from HSPH has been conducting a case-control study in lung cancer.
Exposures of interest include smoking, genetic characteristics etc. Cases are patients diagnosed at Massachusetts General Hospital. Controls are spouses or friends of cases.
Dr Christiani has successfully recruited about 1000 cases and 1000 controls and is getting good results for this study.
Case/control example 2
Dr Christiani has also been conducting a study of arsenic and skin cancer in Taiwan. Exposures of interest are whether or not patients live in an arsenic endemic area, as well as measures of toenail arsenic. Controls are patients diagnosed with other diseases at the same hospitals where the cases were diagnosed.
This was not a successful study in the sense that it did not show an association between skin cancer and arsenic. It was probably “overmatched”. Dr Christiani has another study underway in Bangladesh.
Case/control example 3
Dr Christiani has also been conducting a study of petrochemical exposure and childhood leukemia in Taiwan. Exposures of interest are whether or not patients live near a petrochemical plant. Cases are children diagnosed at one of the local hospitals. Controls are randomly selected from a list of population identification numbers maintained by the city and known to be independent with regard to geography.
Case/Control variants
• Nested case/control. Cases and controls are both selected from a larger cohort. Economical because data collected on the large cohort can be very simple, while avoiding the bias associated with poor control selection since the cohort is well defined. Several specialized strategies possible
– Risk set sampling
– Case-cohort
• Matching – pick controls to match certain case characteristics (e.g. age, sex).
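Power and Sample Size
Consider designing a study to assess association between binary exposure X and binary outcome Y. Let subscript 1 denote the control or unexposed group and 2 denote the exposed group. A popular version of the two-sample binomial test is:
\[
\text{Reject } H_0: p_1 = p_2 \ \text{ if } \ \frac{|\hat p_2 - \hat p_1|}{\left[\bar p(1-\bar p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right]^{1/2}} > z_{1-\alpha/2}
\]
where $\bar p$ is the pooled proportion, $\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2$ estimates the variance of $\hat p_2 - \hat p_1$ without pooling, and $\alpha$ is the significance level.
Suppose the true difference is $\Delta = p_2 - p_1$.
Power is the probability of satisfying this rejection rule when the true probabilities in each group are $p_1$ and $p_2$.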
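Power and Sample Size (cont’d)
With $\Delta = p_2 - p_1$, $\bar p$ the pooled proportion evaluated at the true $p_1$ and $p_2$, and $\Phi$ the standard normal distribution function:
\[
\begin{aligned}
\text{Power} ={}& 1 - \Phi\!\left( \frac{z_{1-\alpha/2}\left[\bar p(1-\bar p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right]^{1/2}}{\left[\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}\right]^{1/2}} - \frac{\Delta}{\left[\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}\right]^{1/2}} \right) \\
&+ \Phi\!\left( -\frac{z_{1-\alpha/2}\left[\bar p(1-\bar p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right]^{1/2}}{\left[\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}\right]^{1/2}} - \frac{\Delta}{\left[\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}\right]^{1/2}} \right)
\end{aligned}
\]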
See SAS PROC POWER
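SAS PROC POWER
proc power;
  /* Pearson chi-square test for two independent proportions */
  twosamplefreq test=pchi
    relativerisk=1.5      /* p2 / p1 */
    refproportion=.1      /* p1, proportion in the reference group */
    groupns=800 | 800     /* n1 | n2 */
    power=.;              /* solve for power */
run;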
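
For readers without SAS, the normal-approximation power of the pooled two-sample test can be sketched in a few lines of Python. This is an illustrative calculation under stated assumptions, not a reimplementation of PROC POWER: it assumes the pooled proportion is (n1*p1 + n2*p2)/(n1 + n2), and the function name is hypothetical. The inputs mirror the SAS call above (reference proportion 0.1, relative risk 1.5, so p2 = 0.15, with 800 subjects per group).

```python
from statistics import NormalDist


def two_sample_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of the two-sided pooled test of H0: p1 = p2."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)

    # Assumption: pooled proportion weighted by group size.
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5
    se_alt = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    delta = p2 - p1

    # Probability of rejecting in the upper tail plus the lower tail.
    upper = 1 - std_normal.cdf((z * se_null - delta) / se_alt)
    lower = std_normal.cdf((-z * se_null - delta) / se_alt)
    return upper + lower


power = two_sample_power(p1=0.10, p2=0.10 * 1.5, n1=800, n2=800)
print(f"approximate power: {power:.3f}")
```

When p1 equals p2 the function returns alpha, as it should, since the power of a test under the null hypothesis is its significance level.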