Confounding in epidemiology

Maura Pugliatti, MD, PhD

Associate Professor of Neurology

Dept. of Clinical and Experimental Medicine, Unit of Clinical Neurology

University of Sassari, Italy

1st International Course of Neuroepidemiology

Chisinau, Moldova, 24-28 Sept. 2012

“Confounding, the situation in which an apparent effect of an exposure on risk is explained by its association with other factors, is probably the most important cause of spurious associations in observational epidemiology”

BMJ Editorial: “The scandal of poor epidemiological research” BMJ 2004;329:868-869

Definitions

“Bias of the estimated effect of an exposure on an outcome, due to the presence of a common cause of the exposure and the outcome” Porta, 2008

Overview

Causality: central concern of epidemiology

Confounding: central concern when establishing causality

Four approaches to understand confounding

Avoiding and controlling for confounding is essential in health research

Causality

Main application of epidemiology:

to identify etiologic (causal) associations between exposure(s) and outcome(s)

Exposure → ? → Outcome

Key biases in identifying causal effects:

RRcausal (“truth”)

Causal Effect
Random Error
Confounding
Information bias (misclassification)
Selection bias
Bias in inference
Reporting & publication bias
Bias in knowledge use

RRassociation

Adapted from: Maclure M, Schneeweiss S. Epidemiology 2001;12:114-122.

Confounding: four approaches

1. “Mixing of effects”
2. Based on a priori criteria (classical approach)
3. Data-based criteria
4. “Counterfactual” and non-comparability approaches

Overlapping

“Confounding is confusion, or mixing, of effects; the effect of the exposure is mixed together with the effect of another variable, leading to bias”

Rothman KJ. Epidemiology. An introduction. Oxford: Oxford University Press, 2002

Latin: “confundere” = “to mix together”

Association between birth order and Down Syndrome

Data from Stark and Mantel (1966)

Association between maternal age and Down Syndrome


Association between maternal age and Down Syndrome, stratified by birth order


A factor is a confounder if 3 criteria are met:

1. A confounder must be causally or non-causally associated with the exposure in the source population (study base) being studied (C associated with E);

2. A confounder must be a causal risk factor (or a surrogate measure of a cause) for the disease in the unexposed cohort (C → D); and

3. A confounder must not be an intermediate cause (not an intermediate step in the causal pathway between the exposure and the disease), i.e. not E → C → D

[Diagram: Exposure (E) → Disease (outcome) (D), with Confounder (C) linked to both E and D]

Szklo M, Nieto FJ. Epidemiology: Beyond the basics. Aspen Publishers, Inc., 2000.

Gordis L. Epidemiology. Philadelphia: WB Saunders, 4th Edition.

Intermediate cause: Exposure (E) → C → Disease (D)

Confounder: C → Exposure (E) and C → Disease (D)

Confounder: ‘parent’ of the exposure, not ‘daughter’ of the exposure!

Example: Birth Order (E) and Down Syndrome (D)

Confounding factor (C): Maternal Age

Simple causal graphs

Maternal age (C) can confound the association between multivitamin use (E) and the risk of certain birth defects (D)

Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155:176-84.

Complex causal graphs

Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155:176-84.


History of birth defects (C) may increase the chance of periconceptional vitamin intake (E). A genetic factor (U) could have been the cause of previous birth defects in the family, and could again cause birth defects in the current pregnancy (D)

More complicated causal graphs

[Causal graph with nodes labelled A, B, C, E, D and U; variables shown: Smoking, BMI, Physical Activity, Calcium supplementation, Bone fractures]

Source: Hertz-Picciotto

A factor is a confounder if:

a) the effect measure is homogeneous across the strata defined by the confounder, and

b) the crude and common stratum-specific (adjusted) effect measures are unequal (“lack of collapsibility”)

Usually evaluated using 2x2 tables, and simple stratified analyses to compare crude effects with adjusted effects

“Collapsibility is equality of stratum-specific measures of effect with the crude (collapsed), unstratified measure” Porta, 2008, Dictionary

Crude vs. Adjusted Effects

Crude: does not take into account the effect of the confounder

Adjusted: accounts for the confounder

Mantel-Haenszel estimator

Multivariate analyses (e.g. logistic regression)

Confounding is likely when:

RRcrude ≠ RRadjusted

ORcrude ≠ ORadjusted

Stratified Analysis

1. Crude 2 x 2 table: calculate the crude OR (or RR), ORCrude
2. Stratify by confounder (Stratum 1, Stratum 2)
3. Calculate ORs for each stratum (OR1, OR2)
4. If stratum-specific ORs are similar, calculate adjusted OR (e.g. MH)
5. If Crude OR ≠ Adjusted OR, confounding is likely; if Crude OR = Adjusted OR, confounding is unlikely
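The stratified workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are hypothetical (invented so that each stratum has OR = 1), and the names `odds_ratio`, `collapse` and `mantel_haenszel_or` are introduced here, not taken from the slides. Each 2 x 2 table is written as (a, b, c, d) = (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).

```python
from typing import List, Tuple

# One 2 x 2 table: (a, b, c, d) =
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
Table = Tuple[int, int, int, int]


def odds_ratio(table: Table) -> float:
    """Odds ratio of a single 2 x 2 table: (a * d) / (b * c)."""
    a, b, c, d = table
    return (a * d) / (b * c)


def collapse(strata: List[Table]) -> Table:
    """Add the strata cell by cell to obtain the crude (unstratified) table."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a, b, c, d)


def mantel_haenszel_or(strata: List[Table]) -> float:
    """Mantel-Haenszel summary OR: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts for two strata of the confounder (illustration only).
strata = [(80, 20, 40, 10), (10, 40, 20, 80)]

or_crude = odds_ratio(collapse(strata))
or_strata = [odds_ratio(t) for t in strata]
or_mh = mantel_haenszel_or(strata)

print("Crude OR:", or_crude)
print("Stratum-specific ORs:", or_strata)
print("Mantel-Haenszel OR:", or_mh)
```

With these invented counts the crude OR is 2.25, while both stratum-specific ORs and the Mantel-Haenszel OR are 1.0: the stratum-specific estimates are similar and the crude and adjusted estimates differ, which is the pattern described as "confounding is likely".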

Ideal “causal contrast” between exposed and unexposed groups:

“A causal contrast compares disease frequency under two exposure distributions, but in one target population during one etiologic time period”

If the ideal causal contrast is met, the observed effect is the “causal effect”

Maldonado & Greenland, Int J Epi 2002;31:422-29

Ideal counterfactual comparison to determine causal effects:

Exposed cohort: Iexp

Unexposed cohort: Iunexp

RRcausal = Iexp / Iunexp

Initial conditions are identical in the exposed and unexposed groups, except for presence of exposure (=cause)

Maldonado & Greenland, Int J Epi 2002;31:422-29

What happens in reality?

Exposed cohort: Iexp

Unexposed cohort: Iunexp

Substitute, unexposed cohort: Isubstitute

IDEAL: RRcausal = Iexp / Iunexp

ACTUAL (in this case): RRassoc = Iexp / Isubstitute

“Confounding is present if the substitute population represents imperfectly what the target would have been like under the counterfactual condition”
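The ideal and the actual ratio can be related directly. As a sketch that uses only the quantities already defined (Iexp, Iunexp, Isubstitute), and assuming Iunexp and Isubstitute are both non-zero:

```latex
\[
RR_{assoc}
  = \frac{I_{exp}}{I_{substitute}}
  = \frac{I_{exp}}{I_{unexp}} \times \frac{I_{unexp}}{I_{substitute}}
  = RR_{causal} \times \frac{I_{unexp}}{I_{substitute}}
\]
```

So RRassoc equals RRcausal only when Isubstitute = Iunexp, that is, when the substitute cohort represents perfectly what the exposed cohort would have experienced without the exposure.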

Simulating the counterfactual comparison: Experimental Studies: Randomized Clinical Trials

Randomization helps to make the groups “comparable” (i.e. similar initial conditions) with respect to known and unknown confounders

Confounding is unlikely at randomization (time t0)

[Diagram: Eligible population → Randomization → Treated individuals (Disease + / Disease -) and Untreated individuals (Disease + / Disease -); compare rates]

Simulating the counterfactual comparison: Observational Studies: Cohort studies, case-control studies

[Diagram: Exposed cohort (Disease + / Disease -) and Unexposed cohort (Disease + / Disease -), followed from PRESENT to FUTURE; compare rates]

In observational studies, because exposures are not assigned randomly, attainment of exchangeability is impossible; “initial conditions” are likely to be different and the groups may not be comparable

Confounding: Observational studies vs randomized trials

Example: Aspirin to reduce cardiovascular mortality

Confounding: adjustment and controls

• Control at the design stage
  – Randomization
  – Restriction
  – Matching

• Control at the analysis stage
  – Conventional approaches
    • Stratified analyses
    • Multivariate analyses
  – Newer approaches
    • Graphical approaches using DAGs
    • Propensity scores
    • Instrumental variables
    • Marginal structural models

• Options at the design stage:
  – Randomization
    • Reduces potential for confounding by generating groups that are fairly comparable with respect to known and unknown confounding variables
  – Restriction
    • Eliminates variation in the confounder (e.g. only recruiting one gender)
  – Matching
    • Involves selection of a comparison group that is forced to resemble the index group with respect to the distribution of one or more potential confounders

Randomization

• Randomization
  – Only for intervention studies
  – Definition: random assignment of study subjects to exposure categories
  – To control/reduce the effect of confounding variables, including those of which the investigator is unaware (i.e. both known and unknown confounders get distributed evenly because of randomization)
  – Randomization does not always eliminate confounding
    • Covariate imbalance in small trials
    • “Maldistribution” of potentially confounding variables after randomization (“Table I: Baseline characteristics” in the randomized trial)
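The point about covariate imbalance in small trials can be illustrated with a small simulation. This is a hedged sketch, not part of the original course material: the function name `covariate_imbalance`, the 30% prevalence, the trial sizes and the seed are all assumptions chosen for illustration.

```python
import random


def covariate_imbalance(n_subjects: int, prevalence: float, rng: random.Random) -> float:
    """Randomize subjects 1:1 and return the absolute difference between arms
    in the proportion carrying a binary baseline covariate."""
    # True = the subject carries the (potential) confounder at baseline.
    subjects = [rng.random() < prevalence for _ in range(n_subjects)]
    rng.shuffle(subjects)  # random allocation order
    half = n_subjects // 2
    treated, control = subjects[:half], subjects[half:]
    return abs(sum(treated) / len(treated) - sum(control) / len(control))


rng = random.Random(2012)  # fixed seed (arbitrary) so the run is reproducible

small_trial = covariate_imbalance(20, 0.3, rng)
large_trial = covariate_imbalance(20_000, 0.3, rng)

print(f"Imbalance with 20 subjects:     {small_trial:.3f}")
print(f"Imbalance with 20,000 subjects: {large_trial:.3f}")
```

In a large trial the difference between arms is expected to be close to zero, whereas in a small trial a sizeable imbalance can arise by chance alone, which is why baseline characteristics are still inspected after randomization.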

Randomization breaks any links between treatment and prognostic factors

[Diagram: Exposure (E) → Disease (outcome) (D); the link between Confounder (C) and Exposure (E) is crossed out (X) by randomization]

Restriction

• The distribution of the potential confounding factors does not vary across exposure or disease categories
  – An investigator may restrict study subjects to only those falling within specific level(s) of a confounding variable

• Advantages of restriction
  – Straightforward, convenient, inexpensive (but reduces recruitment!)

• Disadvantages of restriction
  – Limits number of eligible subjects
  – Limits ability to generalize the study findings
  – Residual confounding
  – Impossible to evaluate the relationship of interest at different levels of the confounder

Matching

• Matching is commonly used in case-control studies

• Match on strong confounder

• Types:
  – Pair (individual) matching
  – Frequency matching

• The use of matching usually requires special analysis techniques (e.g. matched pair analyses and conditional logistic regression)
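As a sketch of the matched-pair analysis mentioned above: in 1:1 matched case-control data, the usual conditional estimate of the odds ratio uses only the discordant pairs. The pair counts and the function name `matched_pair_or` are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

# Each matched pair: (case_exposed, control_exposed)
Pair = Tuple[bool, bool]


def matched_pair_or(pairs: List[Pair]) -> float:
    """Matched-pair OR = (pairs with only the case exposed) /
    (pairs with only the control exposed). Concordant pairs are ignored."""
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if control_only == 0:
        raise ValueError("OR is undefined when no pair has only the control exposed")
    return case_only / control_only


# Hypothetical data: 10 pairs both exposed, 30 case-only,
# 15 control-only, 45 pairs both unexposed.
pairs = (
    [(True, True)] * 10
    + [(True, False)] * 30
    + [(False, True)] * 15
    + [(False, False)] * 45
)

print("Matched-pair OR:", matched_pair_or(pairs))  # 30 / 15
```

The 55 concordant pairs do not contribute to the estimate, which is one reason an unmatched analysis of matched data can be misleading.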

Matching

• Disadvantages of matching
  – Finding appropriate control subjects can be difficult and expensive, and can limit sample size
  – The confounder used to match subjects cannot be evaluated with respect to the outcome/disease
  – Matching does not control for confounders other than those used to match
  – The use of matching makes the use of stratified analysis very difficult
  – Matching is most often used in case-control studies (prohibitive in a large cohort study)
  – In a case-control study, matching may even introduce confounding

Controlling Confounding: At the analysis stage

Conventional approaches

• Confounding is one type of bias that can be adjusted for in the analysis (unlike selection and information bias)

• Options at the analysis stage:
  – Stratification
  – Multivariate methods

• To control for confounding in the analyses, confounders must be measured in the study

Confounding: control at the analysis stage

Stratification

• Produce groups within which the confounder does not vary

• Evaluate the exposure-disease association within each stratum of the confounder

[Figure: Cases of Down syndrome by birth order and mother's age. x-axis: birth order (1 to 5); y-axis: cases per 100000 (0 to 1000). Source: www.epiet.org]





Direction of Confounding

• Confounding “pulls” the observed association away from the true association
  – It can either exaggerate/over-estimate the true association (positive confounding)
    • Example: ORcausal = 1.0, ORobserved = 3.0
  – Or it can hide/under-estimate the true association (negative confounding)
    • Example: ORcausal = 3.0, ORobserved = 1.0

Multivariate Analysis

• Stratified analysis works best when there are only 1 or 2 confounders

• If the number of potential confounders is large, multivariate analyses offer the only real solution
  – Can handle large numbers of confounders (covariates) simultaneously
  – Based on statistical regression “models”
    • E.g. logistic regression, multiple linear regression
  – Always done with statistical software packages

Residual confounding

• Confounding that can persist, even after adjustment
  – Unmeasured confounding
  – Some variables were actually not confounders
  – Confounders were measured with error (e.g. misclassification)
  – Categories of the confounder improperly defined

Effect modification and interaction


Definition

Biological interaction

Effect modification (“effect-measure modification”)
Heterogeneity of effects
Subgroup effects

Statistical Interaction
Deviation from a specified model form (additive or multiplicative)

Biological interaction

“the interdependent operation of two or more biological causes to produce, prevent or control an effect” [Porta, Dictionary, 2008]

Multicausality and interdependent effects

Disease processes tend to be multifactorial: “multicausality”

The “one-variable-at-a-time” perspective has several limitations

Confounding and effect modification: manifestations of multicausality

Schoenbach, 2000

Effect modification and statistical interaction

Two definitions (related):

1. Based on homogeneity or heterogeneity of effects

Interaction occurs when the effect of a risk factor (X) on an outcome (Y) is not homogeneous in strata formed by a third variable (Z, effect modifier)

“Differences in the effect measure for one factor at different levels of another factor” [Porta, 2008]

This is often called “effect modification”

2. Based on the comparison between observed and expected joint effects of a risk factor and a third variable

Interaction occurs when the observed joint effect of the risk factor (X) and third variable (Z) differs from that expected on the basis of their independent effects

This is often called “statistical interaction”

Szklo & Nieto, Epidemiology: Beyond the basics. 2007

Definition based on homogeneity or heterogeneity of effects

Effect of exposure on the disease is modified depending on the value of a third variable: the “effect modifier”

Exposure Disease

Effect modifier




Stratified Analysis

1. Calculate the crude OR (ORCrude) from the crude table
2. Stratify (Stratum 1, Stratum 2) and calculate the stratum-specific ORs (OR1, OR2)
3. If stratum-specific ORs are the same or similar, calculate adjusted OR (e.g. MH)
   – If Crude OR ≠ Adjusted OR, confounding is likely. Report Adjusted OR
   – If Crude OR = Adjusted OR, confounding is unlikely. Report Crude OR
4. If stratum-specific ORs are not similar, effect modification is present. Report stratum-specific ORs

Confounding vs. interaction

Confounding is a problem we want to eliminate (control or adjust for) in our study
Comparing crude vs. adjusted effect estimates

Interaction is a natural occurrence that we want to describe and study further
Comparing stratum-specific estimates

Heterogeneity of effects

Can occur at the level of:

Individual study: within subgroups of a single study or trial
Seen in subgroup or stratified analyses within a study

Across studies: if several studies are done on the same topic, the effect measures may vary across studies
Seen in meta-analyses (across trials)

Definition based on the comparison between observed and expected joint effects of a risk factor and a third variable

Deviation from additive or multiplicative joint effects

This is often called “statistical interaction”

Observed vs expected joint effects of a risk factor and a third variable

Szklo & Nieto, Epidemiology: Beyond the basics. 2007

No interaction

Positive interaction

Negative interaction
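A minimal sketch of the observed-versus-expected comparison, assuming relative risks are all measured against the group exposed to neither factor. Under the commonly used formulation, the expected joint relative risk is RR(X only) + RR(Z only) - 1 on the additive scale and RR(X only) × RR(Z only) on the multiplicative scale. The numbers and function names below are hypothetical, and a real analysis would also account for sampling error.

```python
def expected_joint_rr_additive(rr_x: float, rr_z: float) -> float:
    """Expected joint RR if the excess risks of X and Z simply add up."""
    return rr_x + rr_z - 1.0


def expected_joint_rr_multiplicative(rr_x: float, rr_z: float) -> float:
    """Expected joint RR if the relative risks of X and Z multiply."""
    return rr_x * rr_z


def classify(observed: float, expected: float, tol: float = 1e-9) -> str:
    """Compare the observed joint effect with the expected joint effect."""
    if abs(observed - expected) <= tol:
        return "no interaction"
    return "positive interaction" if observed > expected else "negative interaction"


# Hypothetical relative risks, each versus the group exposed to neither factor.
rr_x_only = 2.0          # risk factor X alone
rr_z_only = 3.0          # third variable Z alone
rr_joint_observed = 6.0  # both X and Z

additive = classify(
    rr_joint_observed, expected_joint_rr_additive(rr_x_only, rr_z_only)
)
multiplicative = classify(
    rr_joint_observed, expected_joint_rr_multiplicative(rr_x_only, rr_z_only)
)

print("Additive scale:      ", additive)
print("Multiplicative scale:", multiplicative)
```

With these invented values the same data show positive interaction on the additive scale (observed 6.0 versus expected 4.0) and no interaction on the multiplicative scale (observed 6.0 versus expected 6.0), so the presence of statistical interaction depends on the scale chosen.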

Deviation from additive or multiplicative joint effects

Interaction on an “additive” scale (additive interaction)
Effect measure modification when risk difference is used as measure of effect
Additive statistical model:
Linear regression: y = a + b1x1 + b2x2

Interaction on a “multiplicative” scale (multiplicative interaction)
Effect measure modification when risk ratio is used as measure of effect
Multiplicative statistical model:
Logistic regression: ln[p / (1 - p)] = a + b1x1 + b2x2

Additive or multiplicative model?

The additive model underpins the methods for assessing biological interaction
  – Interaction here is a departure from additivity of disease rates (risk difference is the key measure)
  – Risk difference scale is of greatest public health importance (based on attributable risk)

Many of the models used in epidemiology are inherently multiplicative (e.g. logistic regression)
  – Vast majority of epi analyses implicitly use the multiplicative scale (risk ratio is the key measure)
  – This is because most epi studies report RR and OR estimates and use regression models such as logistic regression and survival analysis; these models inherently use ratio measures and are therefore multiplicative

Ahlbom A et al. Eur J Epi 2005

Why is interaction/effect modification important?

Better understanding of causation

Identification of “high-risk” groups

Target interventions at specific subgroups
