study design topics for today different kinds of study designs and their advantages and...

Study Design

Topics for today

• Different kinds of study designs and their advantages and disadvantages

• Power and sample size calculations

Readings

• Jewell Chapter 5

Conceptual framework

Target population – group about whom we want to draw inferences

Study population – convenient population (possibly, but not necessarily, a subpopulation of the target population) from which we can sample

Study sample – actual individuals sampled for study

Examples

Study | Target population | Study population | Study sample
Arsenic | US population | Southwest Taiwan | Population of 42 villages
NHANES | US population | US population | 5000 households
Epileptic drugs | All pregnant women | Births at MGH | 312 drug exposed, 96 seizure history and 506 controls
MeHg | All live births. US population for purpose of regulation | Faroes, Seychelles and New Zealand | Specific samples of recent births

Examples (cont’d)

Study | Target population | Study population | Study sample
Breast Cancer | Japanese women | ? | ?
CPP | All births | 12 Major Medical Centers in the US | Enrolled women

Issues

• In essence, we are interested in the following table:

          | Diseased | Not Diseased
Exposed   |          |
Unexposed |          |

P(E)? P(D)? Or their association?

• Differences between target and study population may lead to bias

• How to appropriately draw samples from the study population

• Depending on study design, we may not be able to estimate everything

Types of study design

• Population-based study

• Cohort study

• Case-control study

Population-based studies

Idea is simple – choose a random sample from the study population

How do we do this? It is easier said than done!

– Door to door survey?

– Phone survey?

Ideally, want a sampling frame – a listing of all subjects in the population.

Sampling frames can be hard to come by, except for small, well-defined populations (e.g. a list of students enrolled at a university)

Population-based studies (cont’d)

Many population-based studies (e.g. NHANES) use a classic multi-stage probability-based sampling design, often exploiting population structure characterized through the national census:

State → County → Census tract → Census block group → Households enumerated within block

Can account for sampling probabilities in order to estimate disease and exposure prevalences for the whole population
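
As a rough illustration of accounting for sampling probabilities, the sketch below weights each sampled subject by the inverse of that subject's selection probability before computing prevalence. The function name, the two strata and all counts are hypothetical and chosen only to show the arithmetic; a real multi-stage survey supplies its own weights and also needs design-based variance estimates.

```python
def weighted_prevalence(records):
    """Inverse-probability-weighted prevalence.

    records: iterable of (has_disease, sampling_prob) pairs, one per sampled
    subject, where sampling_prob is that subject's overall selection probability.
    """
    records = list(records)
    total_weight = sum(1.0 / prob for _, prob in records)
    diseased_weight = sum(1.0 / prob for diseased, prob in records if diseased)
    return diseased_weight / total_weight


# Hypothetical sample: stratum A sampled at 1%, stratum B oversampled at 5%.
sample = (
    [(True, 0.01)] * 2 + [(False, 0.01)] * 8      # stratum A: 2 of 10 diseased
    + [(True, 0.05)] * 10 + [(False, 0.05)] * 10  # stratum B: 10 of 20 diseased
)

prev_weighted = weighted_prevalence(sample)
prev_unweighted = sum(d for d, _ in sample) / len(sample)
print(f"weighted: {prev_weighted:.3f}, unweighted: {prev_unweighted:.3f}")
```

In this made-up sample the unweighted proportion overstates prevalence, because the oversampled stratum happens to carry more disease.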

Population-based studies (cont’d)

Can be

• Retrospective – questionnaires and medical tests used to assess past and current exposures and disease outcomes (NHANES). Allows estimation of disease prevalence. Incidence possible based on past reconstruction, but difficult and prone to bias.

• Prospective – exposures measured at the time of sampling, then subjects followed forward in time to assess disease onset. Allows estimation of disease incidence.

NHANES is retrospective. The CPP study has both a retrospective and a prospective component.

Ecological studies

A special kind of population-based study where individuals are not examined, but exposures and possibly even outcomes are measured at the population level

– Arsenic (exposure measured at village level)

– Seatbelt laws vs state accident rates

Many people argue that ecological studies are low-quality epidemiological studies because they are subject to confounding as well as ecological bias.

Cohort studies

One of the most common epidemiological designs

• Identify exposed and unexposed subgroups from the study population and select randomly from each

• Measure disease outcome, other characteristics and confounders. Disease can be assessed

– At the time of sampling (prevalence)

– Retroactively, e.g. when did the person get the disease

– Prospectively (need to follow cohort forward in time)

• Cannot estimate exposure prevalence, but can estimate relative risks, odds ratios etc.

• Cannot estimate attributable risk
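
To make the estimable quantities concrete, here is a minimal sketch that computes the relative risk and odds ratio from cohort counts. The function name and the counts are hypothetical. The numbers of exposed and unexposed subjects are fixed by the investigator, which is why they carry no information about exposure prevalence.

```python
def cohort_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Relative risk and odds ratio from a cohort 2x2 table."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return {
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": odds_exposed / odds_unexposed,
    }


# Hypothetical cohort: 200 exposed (30 diseased), 400 unexposed (20 diseased).
result = cohort_measures(30, 200, 20, 400)
print(result)
```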

Examples of cohort studies

• Epilepsy study – women identified at the time they gave birth. Subsets of controls and exposed identified for more detailed study.

• Women’s Health Initiative (University of Washington, Seattle). This is a randomized cohort study since women are randomly assigned to Hormone Replacement Therapy (HRT) vs placebo

Case-control studies

Like a cohort study, except the roles of exposure and disease are reversed:

• Identify a subpopulation of diseased subjects

• Identify a subpopulation of non-diseased subjects

• Take a random sample from each and measure exposure (and other characteristics and confounders of interest).

• Analyze with logistic regression, treating disease status as the outcome. Why does this work?

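Case/control theory

Let $Y$ denote disease presence/absence, $X$ denote exposure (yes or no), and $\delta$ indicate whether the subject was sampled.

Case/control analysis is based on modeling the probability of $Y$, given the observed exposure (and other factors), conditional on being sampled:

$$\Pr(Y=1 \mid X, \delta=1)$$

Suppose that

$$\operatorname{logit}\{\Pr(Y=1 \mid X)\} = \beta_0 + \beta_1 X$$

Case/control theory (cont’d)

With $s_1 = \Pr(\delta=1 \mid Y=1)$ and $s_0 = \Pr(\delta=1 \mid Y=0)$ denoting the sampling probabilities for cases and controls,

$$\Pr(Y=1 \mid X, \delta=1) = \frac{\exp(\beta_0+\beta_1 X)\, s_1/s_0}{\exp(\beta_0+\beta_1 X)\, s_1/s_0 + 1} = \frac{\exp(\beta_0^*+\beta_1 X)}{\exp(\beta_0^*+\beta_1 X) + 1}$$

where $\beta_0^* = \beta_0 + \log(s_1/s_0)$.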

This result tells us that logistic regression applied to case/control data yields the correct odds ratios associated with exposure X. Only the intercept is affected by the sampling process. Note that if $s_1 = s_0$ (cases and controls have the same sampling probability), then the usual model applies.
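
The claim can be checked numerically. The sketch below takes a population logistic model, applies different sampling probabilities to cases and controls, and computes the logit of disease among sampled subjects directly from Bayes' rule. The parameter values and the function name are illustrative assumptions, not taken from any study.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def sampled_logit_params(beta0, beta1, s1, s0):
    """Intercept and slope of logit Pr(Y=1 | X, sampled) for binary X."""
    def logit_given_sampled(x):
        p = expit(beta0 + beta1 * x)        # population Pr(Y=1 | X=x)
        joint_case = s1 * p                 # Pr(Y=1 and sampled | X=x)
        joint_control = s0 * (1.0 - p)      # Pr(Y=0 and sampled | X=x)
        q = joint_case / (joint_case + joint_control)
        return math.log(q / (1.0 - q))

    intercept = logit_given_sampled(0)
    slope = logit_given_sampled(1) - intercept
    return intercept, slope


# Illustrative values: rare disease, odds ratio 2, cases heavily oversampled.
beta0, beta1 = -4.0, math.log(2.0)
s1, s0 = 0.9, 0.01
intercept, slope = sampled_logit_params(beta0, beta1, s1, s0)
print("slope:", slope, "population beta1:", beta1)
print("intercept:", intercept, "beta0 + log(s1/s0):", beta0 + math.log(s1 / s0))
```

The slope among sampled subjects matches the population log odds ratio, while the intercept is shifted by the log ratio of the sampling probabilities.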

Case/control practical challenges

While appealing in theory, case/control studies can be notoriously difficult to execute properly. Cases can usually be found easily (e.g. from hospitals, disease registries etc). Picking appropriate controls is extremely hard. Biggest challenge is ensuring that sampling mechanisms are random with respect to everything except disease status. Best illustrated with some examples.

Case/control example 1

Professor David Christiani from HSPH has been conducting a case-control study in lung cancer.

Exposures of interest include smoking, genetic characteristics etc. Cases are patients diagnosed at Massachusetts General Hospital. Controls are spouses or friends of cases.

Dr Christiani has successfully recruited about 1000 cases and 1000 controls and is getting good results for this study.

Case/control example 2

Dr Christiani has also been conducting a study of arsenic and skin cancer in Taiwan. Exposures of interest are whether or not patients live in an arsenic-endemic area, as well as measures of toenail arsenic. Controls are patients diagnosed with other diseases at the same hospitals as the cases.

This was not a successful study in the sense that it did not show an association between skin cancer and arsenic. It was probably “overmatched”. Dr Christiani has another study underway in Bangladesh.

Case/control example 3

Dr Christiani has also been conducting a study of petrochemical exposure and childhood leukemia in Taiwan. Exposures of interest are whether or not patients live near a petrochemical plant. Cases are children diagnosed at one of the local hospitals. Controls are randomly selected from a list of population identification numbers maintained by the city and known to be independent with regard to geography.

Case/Control variants

• Nested case/control. Cases and controls are both selected from a larger cohort. Economical because data collected on the large cohort can be very simple, yet it avoids the bias associated with poor control selection since the cohort is well defined. Several specialized strategies possible

– Risk set sampling

– Case-cohort

• Matching – pick controls to match certain case characteristics (e.g. age, sex).
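
As one possible reading of risk set sampling, the sketch below draws, for each case, a fixed number of controls from the cohort members who are still under follow-up at the case's event time. The cohort records, the function name and the choice of two controls per case are all hypothetical.

```python
import random


def risk_set_sample(cohort, controls_per_case, seed=0):
    """cohort: list of (subject_id, follow_up_time, is_case) tuples.

    Returns a list of (case_id, [control_ids]) matched sets.
    """
    rng = random.Random(seed)
    matched_sets = []
    for case_id, case_time, is_case in cohort:
        if not is_case:
            continue
        # Risk set: everyone else still under observation at the case's event
        # time. Subjects who become cases later are eligible as controls now.
        at_risk = [sid for sid, t, _ in cohort if t >= case_time and sid != case_id]
        k = min(controls_per_case, len(at_risk))
        matched_sets.append((case_id, rng.sample(at_risk, k)))
    return matched_sets


# Hypothetical cohort of six subjects, two of whom develop disease.
cohort = [
    (1, 2.0, True), (2, 5.0, False), (3, 3.5, True),
    (4, 6.0, False), (5, 1.0, False), (6, 4.0, False),
]
matched = risk_set_sample(cohort, controls_per_case=2)
print(matched)
```

Detailed covariates then need to be collected only for the subjects in the matched sets, which is where the economy of the nested design comes from.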

Power and Sample Size

Consider designing a study to assess association between binary exposure X and binary outcome Y. Let subscript 1 denote the control or unexposed group and 2 denote the exposed group. A popular version of the two-sample binomial test is:

$$\text{Reject } H_0: p_1 = p_2 \ \text{ if } \ \frac{|\hat p_2 - \hat p_1|}{\left[\bar p(1-\bar p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right]^{1/2}} > z_{1-\alpha/2}$$

where $\bar p = (n_1\hat p_1 + n_2\hat p_2)/(n_1+n_2)$ is the pooled proportion and $\alpha$ is the significance level.

Suppose the true difference is $\Delta = p_2 - p_1$.

Power is the probability of satisfying this rejection rule when the true probabilities in each group are $p_1$ and $p_2$.

Power and Sample Size (cont’d)

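Using the normal approximation to $\hat p_2 - \hat p_1$,

$$\text{Power} \approx 1 - \Phi\!\left(\frac{z_{1-\alpha/2}\left[\bar p(1-\bar p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right]^{1/2}}{\left[\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}\right]^{1/2}} - \frac{\Delta}{\left[\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}\right]^{1/2}}\right)$$

$$\qquad + \ \Phi\!\left(-\frac{z_{1-\alpha/2}\left[\bar p(1-\bar p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right]^{1/2}}{\left[\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}\right]^{1/2}} - \frac{\Delta}{\left[\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}\right]^{1/2}}\right)$$

where $\Phi$ is the standard normal distribution function, $\Delta = p_2 - p_1$, and $\bar p = (n_1 p_1 + n_2 p_2)/(n_1+n_2)$ is evaluated at the true probabilities.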
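
The normal-approximation power for the pooled two-sample binomial test can be evaluated in a few lines. This is a sketch in Python rather than SAS, with a hypothetical function name. The example values (reference proportion 0.1, relative risk 1.5, 800 subjects per group) mirror the settings in the PROC POWER call, but the result is an approximation and need not agree with SAS to every digit.

```python
from statistics import NormalDist


def two_sample_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of the two-sided pooled two-sample binomial test."""
    normal = NormalDist()
    z = normal.inv_cdf(1 - alpha / 2)
    delta = p2 - p1
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    # Standard error used in the test statistic (pooled, null form).
    se_null = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5
    # Standard error of the estimated difference under the true p1, p2.
    se_alt = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    upper_tail = 1 - normal.cdf((z * se_null - delta) / se_alt)
    lower_tail = normal.cdf((-z * se_null - delta) / se_alt)
    return upper_tail + lower_tail


p1 = 0.10
p2 = 1.5 * p1  # relative risk of 1.5
power = two_sample_power(p1, p2, 800, 800)
print(f"approximate power: {power:.3f}")
```

When the two true proportions are equal, the same function returns the significance level, which is a quick sanity check on the formula.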

See SAS PROC POWER

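SAS PROC POWER

proc power;
   twosamplefreq test=pchi   /* Pearson chi-square test for two proportions */
      relativerisk=1.5       /* p2/p1 */
      refproportion=.1       /* p1, proportion in the reference group */
      groupns=800 | 800      /* n1 and n2 */
      power=.;               /* missing value: solve for power */
run;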
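
The normal approximation behind the two-sample binomial test can also be turned around to give an approximate sample size for a target power. The sketch below is in Python rather than SAS, assumes equal group sizes, and ignores the usually negligible chance of rejecting in the wrong direction. The function name and the 80% target are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1, p2, power=0.80, alpha=0.05):
    """Approximate n per group for a two-sided pooled two-sample binomial test."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_power = normal.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion when the groups are equal in size
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(((null_term + alt_term) / (p2 - p1)) ** 2)


# Same scenario as the PROC POWER call: reference proportion 0.1, relative risk 1.5.
n_needed = sample_size_per_group(0.10, 0.15)
print("subjects needed per group:", n_needed)
```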
