Sensitivity Analysis with Several Unmeasured Confounders
Lawrence
Faculty of Health Sciences, Simon Fraser University, Vancouver, Canada
Spring 2015
Outline: The problem of several unmeasured confounders
• Background: Sensitivity analysis for a single binary unmeasured confounder.
• New methodology: Sensitivity analysis for several unmeasured confounders.
• The role of Bayesian inference: We assign probability distributions to sensitivity parameters.
Background: Binary Unmeasured Confounders
To explore sensitivity to unmeasured confounding, researchers often assume that there is a single unmeasured confounder (typically binary).
This approach dates back to Rosenbaum and Rubin (1983), Lin, Psaty and Kronmal (1998), and others.
The assumption of one binary unmeasured confounder is appealing because it
• is mathematically tractable
• leads to simple bias-adjustment formulas
• is low-dimensional, with few bias-parameter inputs
• is easy to explain, interpret and implement
Background: Unmeasured Confounding in the Real World
In reality, there are often several unmeasured confounders, and the statistical literature offers little guidance on how to proceed.
Examples:
Statins and lower fracture risk in the elderly
Unmeasured confounders: health-seeking behaviors, BMI, physical activity, smoking, alcohol consumption
Lead exposure in childhood and lower IQ
Unmeasured confounders: pesticide exposure, breastfeeding, poor parenting, maternal depression, iron deficiency, tobacco exposure, poverty, and pica.
The Problem of Several Unmeasured Confounders
There are no off-the-shelf methods for exploring sensitivity to several (say, 5) unmeasured confounders, that is, no method of the form
Method input: “Please list your 5 confounders and their properties.”
Method output: Bias-corrected point and interval estimates for causal effects.
However, see the related methods listed below.
Why does the problem of several unmeasured confounders remain unsolved?
The challenges:
1. We need to specify the relationships between (1) the confounders and the outcome and (2) the confounders and the exposure. (High-dimensional.)
2. The confounders may be continuous, nominal or ordinal (e.g. race or smoking). (More dimensions.)
3. The unmeasured confounders may be correlated with one another. (Less important.)
4. The unmeasured confounders may be correlated with the measured covariates. (More important.)
5. The confounders may interact with the exposure. (Less important.)
More challenges:
6. Even if the laundry list of parameters is specified, we must still integrate U out of the model. This integration is often not available analytically, except in simple cases (such as those in Lin et al., 1998). Custom statistical software is required.
7. The inference is fundamentally Bayesian because there is uncertainty in the bias parameters.
8. We require content-area expertise because the problem is inherently qualitative. Statisticians often do not have this expertise.
Methods for sensitivity analysis for several unmeasured confounders
• Greenland (2005) JRSSA: treats U as a compound of other variables, or a “sufficient summary”
• Rosenbaum (2002): departures from random assignment by a factor Γ
• McCandless (2012) JASA: uses validation data
• Hsu (2013) Biometrics: calibrated sensitivity analysis
• Deng (2013) Biometrika: Cornfield conditions
• MacLehose & Kaufman (2005) Epidemiol: linear programming
• Brumback (2004) Stat Med: sensitivity analysis based on potential outcomes
• VanderWeele & Arah (2011) Epidemiol: formulas for a general scalar U
Greenland (2005) JRSSA
Lin et al. (1998) Biometrics
A New Methodology
Consider a prospective cohort study with equal follow-up, where
• Y is a continuous outcome measure
• X is a dichotomous 0/1 exposure at baseline
• C is a p × 1 vector of measured covariates
• U is a q × 1 vector of unmeasured confounders that are quantitative and continuous
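A New Methodology
Building on Lin et al. (1998), factorise $P(Y, U \mid X, C) = P(Y \mid X, C, U)\,P(U \mid X, C)$, and write
$$Y \mid X, C, U = \beta_0 + \beta_X X + \beta_C^T C + \beta_U \sum_{j=1}^{q} U_j + \varepsilon,$$
$$U \mid X, C \sim \mathrm{MVN}\{(\alpha_0 + \alpha_X X)\,\mathbf{1},\ \Sigma_\rho\},$$
where $\mathbf{1}$ is a q-vector of 1's, and $\Sigma_\rho$ is the q × q covariance matrix
$$\Sigma_\rho = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$
with diagonal elements 1 and off-diagonal elements ρ (compound symmetric).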
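A compound-symmetric $\Sigma_\rho$ has eigenvalues $1 + (q-1)\rho$ (once) and $1 - \rho$ (q − 1 times), so it is a valid (positive-definite) covariance matrix only when $-1/(q-1) < \rho < 1$. The sketch below builds $\Sigma_\rho$ in R and checks the eigenvalues for illustrative values q = 5 and ρ = 0.3; the helper `make_sigma_rho` is a hypothetical name, not part of the original slides.

```r
## Hypothetical helper: q x q compound-symmetric matrix (1 on the diagonal, rho elsewhere)
make_sigma_rho <- function(q, rho) {
  sigma <- matrix(rho, nrow = q, ncol = q)
  diag(sigma) <- 1
  sigma
}

q <- 5      # illustrative number of unmeasured confounders
rho <- 0.3  # illustrative common correlation
sigma <- make_sigma_rho(q, rho)

## Eigenvalues in decreasing order: 1 + (q - 1) * rho, then 1 - rho repeated q - 1 times
ev <- eigen(sigma, symmetric = TRUE)$values
print(ev)
```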
Key Assumptions
I will call these the Duplicate Unmeasured Confounder (DUC) assumptions:
1. $U_1, U_2, \ldots, U_q$ are equicorrelated with Y.
2. $U_1, U_2, \ldots, U_q$ are equicorrelated with X.
3. $U_1, U_2, \ldots, U_q$ are equicorrelated with one another.
Other assumptions: linearity; absence of interactions; $U \perp\!\!\!\perp C \mid X$ (zero correlation between measured and unmeasured confounders); no measurement error; ...
A New Methodology: Sensitivity analysis for several unmeasured confounders
Define a new variable $U^* = \sum_{i=1}^{q} U_i / \sqrt{\Theta}$, where
$$\Theta = \mathrm{Var}\Big(\sum_{i=1}^{q} U_i \,\Big|\, X, C\Big) = \mathbf{1}^T \Sigma_\rho \mathbf{1} = q\{1 + \rho(q - 1)\}.$$
The quantity $U^*$ is the sum of $U_1, \ldots, U_q$ rescaled to have unit conditional variance; it is normally distributed.
The idea: we replace the vector U with the scalar $U^*$, and we are then within the general framework of Lin, Psaty & Kronmal (1998).
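The identity $\mathbf{1}^T \Sigma_\rho \mathbf{1} = q\{1 + \rho(q-1)\}$ follows because the matrix has q diagonal entries equal to 1 and $q(q-1)$ off-diagonal entries equal to ρ. A small sketch that checks the quadratic form against the closed form over an illustrative grid of q and ρ (function names are hypothetical):

```r
## Hypothetical helper: compound-symmetric Sigma_rho
make_sigma_rho <- function(q, rho) {
  sigma <- matrix(rho, nrow = q, ncol = q)
  diag(sigma) <- 1
  sigma
}

## Theta = Var(sum_j U_j | X, C) as the quadratic form 1' Sigma_rho 1
theta_quadratic <- function(q, rho) {
  one <- rep(1, q)
  drop(t(one) %*% make_sigma_rho(q, rho) %*% one)
}

## Theta from the closed form on the slide
theta_closed_form <- function(q, rho) q * (1 + rho * (q - 1))

grid <- expand.grid(q = c(1, 5, 10), rho = c(0, 0.5, 0.999))
grid$quadratic <- mapply(theta_quadratic, grid$q, grid$rho)
grid$closed <- mapply(theta_closed_form, grid$q, grid$rho)
print(grid)
```

With ρ = 0 the summed confounders have variance q; as ρ → 1 the variance approaches q², because the $U_j$ become near-copies of one another.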
A New Methodology
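Therefore the original model
$$E(Y \mid X, C, U) = \beta_0 + \beta_X X + \beta_C^T C + \beta_U \sum_{j=1}^{q} U_j,$$
$$U \mid X, C \sim \mathrm{MVN}\{(\alpha_0 + \alpha_X X)\,\mathbf{1},\ \Sigma_\rho\},$$
becomes the new model
$$E(Y \mid X, C, U^*) = \beta_0 + \beta_X X + \beta_C^T C + \beta_U \sqrt{\Theta}\, U^*,$$
$$U^* \mid X, C \sim N\{q(\alpha_0 + \alpha_X X)/\sqrt{\Theta},\ 1\},$$
and this is embedded within the original framework of Lin, Psaty & Kronmal (1998).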
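The reparametrisation can be checked by simulation. The sketch below, which assumes illustrative parameter values and omits the measured covariates C for brevity, simulates U from the multivariate normal model, forms $U^*$, and compares (i) the group means of $U^*$ with $q(\alpha_0 + \alpha_X x)/\sqrt{\Theta}$, (ii) its conditional variance with 1, and (iii) the regression coefficient on $U^*$ with $\beta_U\sqrt{\Theta}$.

```r
set.seed(2015)
## Illustrative (hypothetical) parameter values
q <- 10; rho <- 0.5
alpha0 <- 0.2; alphaX <- 1
beta0 <- 0; betaX <- 0.5; betaU <- 0.3
n <- 200000

sigma <- matrix(rho, q, q); diag(sigma) <- 1
theta <- q * (1 + rho * (q - 1))

X <- rbinom(n, 1, 0.5)
## U | X ~ MVN((alpha0 + alphaX * X) * 1, Sigma_rho); the length-n mean vector recycles down every column
U <- (alpha0 + alphaX * X) + matrix(rnorm(n * q), n, q) %*% chol(sigma)
Ustar <- rowSums(U) / sqrt(theta)
Y <- beta0 + betaX * X + betaU * rowSums(U) + rnorm(n)

## (i) empirical vs theoretical mean of U* in each exposure group
emp_mean <- tapply(Ustar, X, mean)
theo_mean <- q * (alpha0 + alphaX * c(0, 1)) / sqrt(theta)
## (ii) conditional variance of U* within each exposure group
emp_var <- tapply(Ustar, X, var)
## (iii) outcome regression on the scalar U*
fit <- lm(Y ~ X + Ustar)
print(rbind(emp_mean, theo_mean, emp_var))
print(c(coef(fit)["Ustar"], target = betaU * sqrt(theta)))
```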
Inference
To conduct a sensitivity analysis, we can use maximum likelihood based on the observed-data likelihood $L(\cdot) = \prod_i P(Y_i \mid X_i, C_i)$, where
$$P(Y \mid X, C) = \int P(Y \mid X, U, C)\,P(U \mid X, C)\,dU.$$
Lin et al. (1998) show how to do the integration analytically for a Gaussian scalar U with a linear, log-linear or logistic response Y, so that we obtain, for example,
$$E(Y \mid X, C) = \beta_0 + q\beta_U\alpha_0 + \Big[\beta_X + \big(\beta_U\sqrt{\Theta}\big) \times \big(q\alpha_X/\sqrt{\Theta}\big)\Big] X + \beta_C^T C.$$
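For a Gaussian (linear) outcome, the integral over $U^*$ has a closed form: $Y \mid X, C$ is normal with the mean above and variance $\sigma_\varepsilon^2 + \beta_U^2\Theta$. The sketch below compares numerical integration with R's `integrate()` against that closed form at a few y values. All parameter values, including the residual SD `sigma_eps`, are assumptions for illustration, and C is omitted.

```r
## Illustrative (hypothetical) parameter values
q <- 5; rho <- 0.4
alpha0 <- 0.1; alphaX <- 0.8
beta0 <- 1; betaX <- 0.5; betaU <- 0.3
sigma_eps <- 1                      # assumed residual SD of epsilon
theta <- q * (1 + rho * (q - 1))

## P(Y = y | X = x) by numerically integrating over the scalar U*
marginal_density_numeric <- function(y, x) {
  mu_star <- q * (alpha0 + alphaX * x) / sqrt(theta)   # mean of U* | X
  integrand <- function(u) {
    dnorm(y, beta0 + betaX * x + betaU * sqrt(theta) * u, sigma_eps) * dnorm(u, mu_star, 1)
  }
  integrate(integrand, -Inf, Inf)$value
}

## Closed form: normal with mean E(Y | X) from the slide, variance sigma_eps^2 + betaU^2 * theta
marginal_mean <- function(x) beta0 + q * betaU * alpha0 + (betaX + q * betaU * alphaX) * x
marginal_density_closed <- function(y, x) {
  dnorm(y, marginal_mean(x), sqrt(sigma_eps^2 + betaU^2 * theta))
}

y_grid <- c(-1, 0.5, 2, 4)
num <- sapply(y_grid, marginal_density_numeric, x = 1)
cf <- sapply(y_grid, marginal_density_closed, x = 1)
print(cbind(y = y_grid, numeric = num, closed_form = cf))
```

For logistic or survival outcomes no such exact closed form is generally available, which is where numerical integration of this kind (or approximation) becomes necessary.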
A New Methodology
Therefore the bias in the causal effect parameter $\beta_X$ from q unmeasured confounders under the DUC assumptions is equal to
$$\text{Bias} = \big(\beta_U\sqrt{\Theta}\big) \times \big(q\alpha_X/\sqrt{\Theta}\big) = q\beta_U\alpha_X.$$
Lin et al. (1998), VanderWeele & Arah (2011)
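The bias formula can be written as a product of two factors: the coefficient of $U^*$ in the outcome model and the shift in $E(U^* \mid X)$ between exposure groups. The sketch below (function name hypothetical) evaluates that product over several values of ρ, with q = 10, $\beta_U = 1$, $\alpha_X = 1$ chosen for illustration; each factor changes with ρ, but their product does not.

```r
## Hypothetical helper: DUC bias computed as the product of the two factors on the slide
duc_bias <- function(q, betaU, alphaX, rho) {
  theta <- q * (1 + rho * (q - 1))
  effect_of_ustar <- betaU * sqrt(theta)           # coefficient of U* in the outcome model
  imbalance_of_ustar <- q * alphaX / sqrt(theta)   # E(U* | X = 1) - E(U* | X = 0)
  effect_of_ustar * imbalance_of_ustar
}

rhos <- c(0, 0.25, 0.5, 0.9, 0.999)
biases <- sapply(rhos, function(r) duc_bias(q = 10, betaU = 1, alphaX = 1, rho = r))
print(biases)  # each entry should equal q * betaU * alphaX, whatever rho is
```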
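A New Methodology
Consequently, within our modelling framework:
1. The confounding bias from q unmeasured confounders is equal to q × the confounding bias of a single $U_i$ (thus bias is additive).
2. The correlation among $U_1, \ldots, U_q$ (which is ρ) does not affect the magnitude of bias.
Result 2 is surprising, but makes sense: ρ changes only the variance Θ of the summed confounders, and Θ cancels in the product $(\beta_U\sqrt{\Theta}) \times (q\alpha_X/\sqrt{\Theta})$. The bias depends only on the shift $q\alpha_X$ in the mean of $\sum_j U_j$ between exposure groups, multiplied by the outcome coefficient $\beta_U$.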
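Demonstration with Numbers: One simulated dataset
correlation <- 0.999  ## Correlation among the Us (rho)
k <- 10               ## Dimension of U (q)
sigma <- matrix(correlation, nrow = k, ncol = k); diag(sigma) <- 1  ## Compound-symmetric Sigma_rho
n <- 10000            ## Sample size
X <- rbinom(n, 1, 0.5)  ## Binary exposure with P(X = 1) = 0.5
## U | X ~ MVN(X * 1, Sigma_rho), i.e. alpha_0 = 0 and alpha_X = 1
U <- X + matrix(rnorm(k * n), nrow = n, ncol = k) %*% chol(sigma)
## Y | X, U ~ N(0 * X + sum_j U_j, 1), i.e. beta_X = 0 (no causal effect) and beta_U = 1
Y <- rnorm(n, 0 * X + rowSums(U), 1)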
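With $\beta_X = 0$, $\beta_U = 1$, $\alpha_X = 1$ and q = 10, the DUC bias formula predicts that a regression of Y on X alone should estimate roughly $q\beta_U\alpha_X = 10$, while adjusting for the (here, actually observed) confounders should recover roughly 0. A minimal sketch that repeats the simulation above with an added seed (an assumption for reproducibility) and fits both regressions:

```r
set.seed(1)  # seed added for reproducibility; not part of the original simulation
correlation <- 0.999
k <- 10
sigma <- matrix(correlation, nrow = k, ncol = k); diag(sigma) <- 1
n <- 10000
X <- rbinom(n, 1, 0.5)
U <- X + matrix(rnorm(k * n), nrow = n, ncol = k) %*% chol(sigma)
Y <- rnorm(n, 0 * X + rowSums(U), 1)

S <- rowSums(U)                         # sum of the confounders
naive <- coef(lm(Y ~ X))[["X"]]         # ignores U: expected near q * betaU * alphaX = 10
adjusted <- coef(lm(Y ~ X + S))[["X"]]  # adjusts for sum_j U_j: expected near the true value 0
print(c(naive = naive, adjusted = adjusted))
```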
[Figure: Demonstration with Numbers, one simulated dataset]
[Figure: Demonstration with Numbers, ρ = 0.99999, high correlation among unmeasured confounders]
[Figure: Demonstration with Numbers, ρ = 0, zero correlation among unmeasured confounders]
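The two correlation settings named above can be compared with a small simulation. The sketch below reuses the data-generating model from the code above (q = 10, $\alpha_X = 1$, $\beta_U = 1$, $\beta_X = 0$) and returns the naive estimate of $\beta_X$ for ρ = 0 and ρ = 0.99999. Under the DUC model both should be close to 10; the high-ρ estimate is noisier because $\Theta = \mathrm{Var}(\sum_j U_j \mid X)$ is about 100 rather than 10. The function name is hypothetical.

```r
## Hypothetical helper: simulate one dataset and return the naive (unadjusted) estimate of beta_X
simulate_naive_estimate <- function(rho, k = 10, n = 10000) {
  sigma <- matrix(rho, nrow = k, ncol = k); diag(sigma) <- 1
  X <- rbinom(n, 1, 0.5)
  U <- X + matrix(rnorm(k * n), nrow = n, ncol = k) %*% chol(sigma)  # alpha_X = 1
  Y <- rnorm(n, rowSums(U), 1)                                         # beta_X = 0, beta_U = 1
  unname(coef(lm(Y ~ X))["X"])
}

set.seed(2015)
estimates <- sapply(c(rho_0 = 0, rho_high = 0.99999), simulate_naive_estimate)
print(estimates)
```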
Conclusion
Within our modelling framework:
1. The confounding bias from q unmeasured confounders is equal to q × the confounding bias of a single $U_i$ (thus bias is additive).
2. The correlation among $U_1, \ldots, U_q$ (which is ρ) does not affect the magnitude of bias.
Questions:
1. How general are these findings?
2. How useful are they in practice?
3. What about correlation between measured and unmeasured confounders?
Bias from Several Unmeasured Confounders: How general are these findings?
• What about binary outcomes and survival data?
• What if $U_1, \ldots, U_q$ are binary?
• What if $\Sigma_\rho$ is not compound symmetric?
• What about weakening the Duplicate Unmeasured Confounder (DUC) assumptions?
E.g.
$$Y = \beta_0 + \beta_X X + \beta_C^T C + \beta_{U_1} U_1 + \beta_{U_2} U_2 + \cdots + \beta_{U_q} U_q + \varepsilon,$$
where $\beta_{U_1}, \ldots, \beta_{U_q} \sim N(\mu, \sigma^2)$, instead of
$$Y = \beta_0 + \beta_X X + \beta_C^T C + \beta_U \sum_{j=1}^{q} U_j + \varepsilon.$$
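If U keeps the multivariate normal model $U \mid X, C \sim \mathrm{MVN}\{(\alpha_0 + \alpha_X X)\mathbf{1}, \Sigma_\rho\}$ but the outcome coefficients vary, the same averaging argument gives a bias of $\alpha_X \sum_j \beta_{U_j}$, which still does not involve ρ. With $\beta_{U_j} \sim N(\mu, \sigma^2)$ independently, this bias has mean $q\mu\alpha_X$ and SD $\sqrt{q}\,\sigma|\alpha_X|$. A sketch that draws coefficient vectors under illustrative, hypothetical values of q, μ, σ and $\alpha_X$:

```r
set.seed(42)
q <- 5
mu <- 0.2; sd_beta <- 0.1   # hypothetical mean and SD of the betaU_j
alphaX <- 0.5               # hypothetical confounder-exposure association
n_draws <- 100000

## Each row is one draw of (betaU_1, ..., betaU_q); the bias for that draw is alphaX * sum_j betaU_j
beta_draws <- matrix(rnorm(n_draws * q, mean = mu, sd = sd_beta), ncol = q)
bias_draws <- alphaX * rowSums(beta_draws)

summary_stats <- c(mean = mean(bias_draws), sd = sd(bias_draws))
theory <- c(mean = q * mu * alphaX, sd = sqrt(q) * sd_beta * abs(alphaX))
print(rbind(simulated = summary_stats, theory = theory))
```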
Bias from Several Unmeasured Confounders
How useful is this in practice?
Rule of thumb (?):
If we assume DUC, then “k unmeasured confounders means k times more bias.”
... Is this always true?
Brief Comment on the Role of Bayesian Statistics
The Bayesian approach is useful because it quantifies theuncertainty about unmeasured confounding.
We assign a prior probability distribution to the bias parameters. Bayes' theorem then gives posterior credible intervals that incorporate uncertainty from unmeasured confounding.
The Bayesian approach is also useful for obtaining simple summaries in sensitivity analysis when there are multiple bias-parameter inputs.
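A simple way to see how prior uncertainty about the bias parameters widens an interval is Monte Carlo sensitivity analysis: draw $\beta_U$ and $\alpha_X$ from priors, subtract the DUC bias $q\beta_U\alpha_X$ from draws of the naive estimate, and summarise. This only approximates a fully Bayesian analysis (the data are not used to update the bias parameters), and the naive estimate, its standard error, q and the priors below are all hypothetical values for illustration.

```r
set.seed(7)
M <- 100000          # number of Monte Carlo draws
q <- 5               # assumed number of unmeasured confounders
beta_naive <- 2.0    # hypothetical naive estimate of beta_X
se_naive <- 0.3      # hypothetical standard error of the naive estimate

## Hypothetical priors for the bias parameters
betaU <- rnorm(M, mean = 0.1, sd = 0.05)
alphaX <- rnorm(M, mean = 0.5, sd = 0.1)

## Bias-corrected draws: sampling error plus uncertainty about the DUC bias q * betaU * alphaX
beta_adj <- rnorm(M, beta_naive, se_naive) - q * betaU * alphaX
interval <- quantile(beta_adj, c(0.025, 0.5, 0.975))
print(interval)
```

The resulting interval is centred near $2.0 - q \times 0.1 \times 0.5 = 1.75$ and is wider than the conventional interval $2.0 \pm 1.96 \times 0.3$, reflecting the added uncertainty about confounding.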
McCandless et al. (2007) Stat Med
Gustafson & Greenland (2009) Statistical Science
Thank You