Bayesian Distributed Lag Models: Estimating Effects of Particulate Matter Air Pollution on Daily Mortality Leah J. Welty, PhD Department of Preventive Medicine Northwestern University Joint work with Scott Zeger, Roger Peng, Francesca Dominici Johns Hopkins Department of Biostatistics


Post on 27-Mar-2015


Bayesian Distributed Lag Models: Estimating Effects of Particulate Matter Air Pollution on Daily Mortality

Leah J. Welty, PhD

Department of Preventive Medicine

Northwestern University

Joint work with Scott Zeger, Roger Peng, Francesca Dominici

Johns Hopkins Department of Biostatistics


Extreme Air Pollution Events

London Fog, December 1952

When Smoke Ran Like Water, D. Davis, Perseus Books © 2002


Time Series: Daily Deaths & Daily Air Pollution


Current Air Quality Standards: Adverse Health Effects?

Energy Information Administration United States Dept. of Energy

United States EPA


Time Series Models for Mortality on Air Pollution

log(expected mortality) ~ long-term trends + season + weather + pollution + day of week

• Poisson regression

• Smooth functions of time, season, and weather

• GLM/GAM with natural/smoothing splines
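A minimal sketch of the log-linear Poisson model above, fit by iteratively reweighted least squares with NumPy. The simulated data, the sine/cosine season terms (standing in for the smooth spline terms), the coefficient values, and all names are illustrative assumptions, not the authors' specification.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Fit log E[y] = X @ coef by iteratively reweighted least squares.
    Assumes column 0 of X is an intercept."""
    coef = np.zeros(X.shape[1])
    coef[0] = np.log(y.mean())            # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ coef
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # Poisson working weights equal the mean
        new_coef = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef

# Simulated daily series (all values are made up for illustration)
rng = np.random.default_rng(0)
n_days = 2000
t = np.arange(n_days)
season = np.column_stack([np.sin(2 * np.pi * t / 365.25),
                          np.cos(2 * np.pi * t / 365.25)])
temp = rng.normal(size=n_days)            # standardized weather covariate
pm = rng.normal(size=n_days)              # standardized pollution level
dow = np.eye(7)[t % 7][:, 1:]             # six day-of-week indicators
X = np.column_stack([np.ones(n_days), t / n_days, season, temp, pm, dow])
pm_index = 5                              # position of the pollution column

true_coef = np.concatenate([[np.log(100.0), -0.05, 0.10, 0.05, 0.02, 0.03],
                            np.zeros(6)])
y = rng.poisson(np.exp(X @ true_coef))
coef_hat = fit_poisson_irls(X, y)
print("estimated pollution coefficient:", coef_hat[pm_index])
```

In practice the trend, season, and weather columns would be spline bases rather than the simple terms used here.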


Distributed Lag Models (DLMs)

• These are time series models with lagged exposure variables as predictors.

• The acute health effects from pollution exposure may take a day or more to develop.

• The time from exposure to event may vary among individuals.

• Compared to DLMs, models with single-day exposures may underestimate or overestimate the risk of mortality.


Distributed Lag Models

• Allow response to depend on exposure over several days

Effect of yesterday’s pollution on today’s mortality

“Total effect” = % increase in daily mortality associated with a 1-unit increase in PM on previous days
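The model equation on this slide did not survive extraction. A common way to write a distributed lag model is sketched below. The notation is assumed rather than taken from the slide: $Y_t$ is the death count on day $t$, $x_{t-i}$ is the pollution level $i$ days earlier, $\theta_i$ is the lag-$i$ coefficient, and $L$ is the maximum lag.

```latex
\log E[Y_t] \;=\; \sum_{i=0}^{L} \theta_i \, x_{t-i}
  \;+\; \text{(trends, season, weather, day of week)},
\qquad
\text{total effect (\%)} \;=\; 100\left\{\exp\!\Big(\sum_{i=0}^{L}\theta_i\Big) - 1\right\}.
```

In this notation $\theta_1$ is the effect of yesterday's pollution on today's mortality.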


Distributed Lag Functions

[Figure: distributed lag function plotted against lag i. Annotation: effect of unit increase in pollution 7 days ago on today’s mortality.]


Estimating Distributed Lag Models

• Allow response to depend on exposure over several days

Difficult to estimate since x’s are correlated
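To illustrate why correlated lagged exposures are hard to estimate separately, here is a small sketch that builds a lag matrix from a simulated autocorrelated series and computes variance inflation factors. The AR(1) exposure series, its autocorrelation of 0.9, and the helper `lag_matrix` are assumptions made for the example.

```python
import numpy as np

def lag_matrix(x, max_lag):
    """Columns are x_t, x_{t-1}, ..., x_{t-max_lag}; rows start at t = max_lag."""
    n = len(x)
    return np.column_stack([x[max_lag - i: n - i] for i in range(max_lag + 1)])

rng = np.random.default_rng(1)
n, max_lag, rho = 3000, 14, 0.9
x = np.zeros(n)
for t in range(1, n):                      # simulated AR(1) exposure series
    x[t] = rho * x[t - 1] + rng.normal()

X = lag_matrix(x, max_lag)
corr = np.corrcoef(X, rowvar=False)        # correlation between lag columns
lag1_corr = corr[0, 1]

# Variance inflation factors: diagonal of the inverse correlation matrix.
# A value of 1 means no inflation; larger values mean noisier coefficients.
vif = np.diag(np.linalg.inv(corr))
print("correlation between lag 0 and lag 1:", round(lag1_corr, 3))
print("smallest variance inflation factor:", round(vif.min(), 2))
```

With uncorrelated exposures the inflation factors would all be close to 1, so the unconstrained lag coefficients would be estimated about as precisely as a single-day effect.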


Example distributed lag functions

Yesterday and day before constrained to have same effect


Constrained Distributed Lag Models

• Unconstrained

• Constrained

- Step function constraints

- Polynomial

- Spline

- Other

How do we set up constraints that are consistent with the ‘true’ distributed lag function?
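One way to picture the step-function and polynomial constraints listed above is to write the lag coefficients as θ = Cγ for a fixed basis matrix C and estimate the shorter vector γ. The sketch below does this with ordinary least squares on simulated data. The Gaussian outcome (used in place of the Poisson model for brevity), the helper names, and the block boundaries are all assumptions for illustration.

```python
import numpy as np

def step_basis(n_lags, breaks):
    """Step-function constraint: lags in the same block share one coefficient.
    `breaks` lists the first lag of each block, e.g. [0, 1, 3] gives
    blocks {0}, {1, 2}, {3, ..., n_lags - 1}."""
    C = np.zeros((n_lags, len(breaks)))
    edges = list(breaks) + [n_lags]
    for k in range(len(breaks)):
        C[edges[k]:edges[k + 1], k] = 1.0
    return C

def poly_basis(n_lags, degree):
    """Polynomial constraint: coefficients are a polynomial in the (scaled) lag."""
    lags = np.arange(n_lags) / max(n_lags - 1, 1)
    return np.vander(lags, degree + 1, increasing=True)

def fit_constrained(X, y, C):
    """Least-squares fit of y on X with lag coefficients restricted to C @ gamma."""
    gamma, *_ = np.linalg.lstsq(X @ C, y, rcond=None)
    return C @ gamma

rng = np.random.default_rng(4)
n, n_lags = 500, 8
X = rng.normal(size=(n, n_lags))           # stand-in for lagged exposures
theta_true = np.array([0.5, 0.3, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1])
y = X @ theta_true + rng.normal(size=n)

theta_step = fit_constrained(X, y, step_basis(n_lags, [0, 1, 3]))
theta_lin = fit_constrained(X, y, poly_basis(n_lags, 1))
print("step-constrained estimate:", np.round(theta_step, 3))
print("linear-constrained estimate:", np.round(theta_lin, 3))
```

The step basis used here happens to match the simulated truth, so it recovers it well. The linear basis cannot represent this shape, which is the concern raised by the question on the slide.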


Air Pollution Distributed Lag Function: Prior Knowledge

• Constraint is application of prior knowledge

• Prior knowledge

Acute risk varies smoothly as function of lag

Acute risk goes to zero as time from exposure increases

• Not part of polynomial or spline constraints


Estimation Problems (Chicago, PM10)

[Figure: estimated distributed lag coefficients by lag (0 to 14); vertical axis from -0.0002 to 0.0002. Legend: max likelihood (-0.00038), natural spline (-0.00042), smoothing spline (-0.00038), smoothing spline (-0.00038).]


Bayesian Distributed Lag Models: Outline

• Propose Bayesian Distributed Lag Model (BDLM)

Approximate posterior distribution

Gibbs sampler implementation

• Compare to other constraints (spline, poly)

• Apply to Chicago PM & Mortality

• BDLMs in context of smoothing


Bayesian Constraints for DLMs

Prior on distributed lag coefficients

Construct so as to reflect:

1. No knowledge of early lag effects

2. Lag effects must eventually go to zero

3. Lag effects tend to zero smoothly


Constructing Distributed Lag Prior

1. No knowledge of early lag effects

2. Lag effects must eventually go to zero

Large Variances → Small Variances

3. Lag effects tend to zero smoothly

Uncorrelated → Correlated


Simulated Dist Lag Functions from Prior

[Figure: grid of panels showing distributed lag functions simulated from the prior, each plotted against lag (0 to 14). Panels indexed by eta1 = -0.35, -0.2, -0.05 and eta2 = -0.37, -0.185, 0. Axis annotations: "Fast / Variances → 0 / Slow" and "Less / Correlation / More".]
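The three prior requirements (large variances shrinking to small ones, and uncorrelated coefficients becoming correlated as lag grows) can be sketched with a covariance matrix such as the one below, from which lag functions are then simulated. This is one construction with those properties, not necessarily the one used in the talk. The exact roles of `eta1` and `eta2`, the rescaling to a correlation matrix, and the function name `dl_prior_cov` are assumptions.

```python
import numpy as np

def dl_prior_cov(n_lags, eta1, eta2, sigma=1.0):
    """Illustrative prior covariance for lag coefficients theta_0..theta_{n_lags-1}.

    eta1 <= 0 controls how fast the prior variances go to zero with lag.
    eta2 <= 0 controls how fast neighbouring coefficients become correlated.
    """
    lags = np.arange(n_lags)
    sd = sigma * np.exp(0.5 * eta1 * lags)       # variance = sigma^2 * exp(eta1 * lag)
    m = np.exp(eta2 * lags)                      # weight on the independent part
    # Each coefficient mixes an independent component (weight m) with one
    # component shared by all lags (weight 1 - m).
    W = np.diag(m ** 2) + np.outer(1 - m, 1 - m)
    d = np.sqrt(np.diag(W))
    R = W / np.outer(d, d)                       # rescale to a correlation matrix
    return np.outer(sd, sd) * R

n_lags = 15
Omega = dl_prior_cov(n_lags, eta1=-0.2, eta2=-0.185)

prior_var = np.diag(Omega)
corr = Omega / np.sqrt(np.outer(prior_var, prior_var))
adjacent_corr = np.diag(corr, k=1)               # corr(theta_i, theta_{i+1})

# Simulate ten distributed lag functions from the prior
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(np.zeros(n_lags), Omega, size=10)
print("prior variances:", np.round(prior_var, 3))
print("adjacent correlations:", np.round(adjacent_corr, 3))
```

More negative `eta1` makes the simulated functions collapse to zero at earlier lags. In this parameterization `eta2 = 0` gives uncorrelated coefficients at every lag.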


Bayesian Model

If the likelihood for the distributed lag coefficients is normal, then
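The formulas on this slide were lost in extraction. Under the standard conjugate normal setup the posterior has a closed form, sketched here with assumed notation: $\hat\theta$ is the unconstrained estimate with covariance $\Sigma$, and the prior covariance is written $\sigma^2\Omega(\eta)$.

```latex
\hat\theta \mid \theta \sim N(\theta, \Sigma), \qquad
\theta \mid \eta \sim N\big(0, \sigma^2 \Omega(\eta)\big)
\;\;\Longrightarrow\;\;
\theta \mid \hat\theta, \eta \sim N\big(B\,\Sigma^{-1}\hat\theta,\; B\big),
\qquad
B = \Big(\Sigma^{-1} + \sigma^{-2}\,\Omega(\eta)^{-1}\Big)^{-1}.
```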


Bayesian Model: Hyperparameters

If the likelihood for the distributed lag coefficients is normal, then the posterior, averaged over η, is a mixture of normals

Posterior distribution for η determined from data (smoothness of DL function estimated from data)
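A sketch of how the hyperparameters η could be learned from the data under a normal likelihood: evaluate the marginal likelihood of the unconstrained estimates on a grid of η values, turn it into posterior weights, and average the conditional posterior means. The grid, the uniform prior on the grid, the fixed value of sigma, the simulated estimates, and the covariance function `dl_prior_cov` are all assumptions for illustration.

```python
import numpy as np

def dl_prior_cov(n_lags, eta1, eta2, sigma):
    """Illustrative prior covariance: variances shrink, correlation grows with lag."""
    lags = np.arange(n_lags)
    sd = sigma * np.exp(0.5 * eta1 * lags)
    m = np.exp(eta2 * lags)
    W = np.diag(m ** 2) + np.outer(1 - m, 1 - m)
    d = np.sqrt(np.diag(W))
    return np.outer(sd, sd) * W / np.outer(d, d)

def log_marginal(theta_hat, Sigma, Omega):
    """log N(theta_hat; 0, Sigma + Omega): the likelihood with theta integrated out."""
    S = Sigma + Omega
    _, logdet = np.linalg.slogdet(S)
    quad = theta_hat @ np.linalg.solve(S, theta_hat)
    return -0.5 * (logdet + quad + len(theta_hat) * np.log(2 * np.pi))

def conditional_mean(theta_hat, Sigma, Omega):
    """E[theta | theta_hat, eta] = Omega (Omega + Sigma)^{-1} theta_hat."""
    return Omega @ np.linalg.solve(Omega + Sigma, theta_hat)

# Simulated "unconstrained estimates" of a smooth, decaying lag function
rng = np.random.default_rng(2)
n_lags, sigma = 15, 0.5
theta_true = 0.5 * np.exp(-0.4 * np.arange(n_lags))
Sigma = 0.1 ** 2 * np.eye(n_lags)
theta_hat = theta_true + 0.1 * rng.normal(size=n_lags)

# Discrete grid for eta = (eta1, eta2) with a uniform prior on the grid
grid = [(e1, e2) for e1 in np.linspace(-0.6, -0.05, 6)
                 for e2 in np.linspace(-0.6, -0.05, 6)]
log_post = np.array([log_marginal(theta_hat, Sigma, dl_prior_cov(n_lags, e1, e2, sigma))
                     for e1, e2 in grid])
weights = np.exp(log_post - log_post.max())
weights /= weights.sum()                       # posterior weights for eta

# Posterior mean of theta: a weighted mixture of conditional normal means
theta_bdlm = sum(w * conditional_mean(theta_hat, Sigma, dl_prior_cov(n_lags, e1, e2, sigma))
                 for w, (e1, e2) in zip(weights, grid))
print("most probable eta on the grid:", grid[int(np.argmax(weights))])
```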


Simulation Study

• BDLM vs.

- unconstrained, maximum likelihood

- polynomial of degree 4

- p-spline estimated via GCV

- p-spline estimated via REML

• 25 different ‘true’ distributed lag functions

- some consistent with prior knowledge

- some not, e.g. the distributed lag function a non-zero constant

• 500 outcome series for each function


Comparing BDLM to Common Methods

black = true DL function

white = estimated DL function

gray = 95% posterior/confidence bands


Comparing BDLM to Common Methods

black = true DL function

white = estimated DL function

gray = 95% posterior/confidence bands


Comparing BDLM to Common Methods: MSEs*

[Table: one row per distributed lag function; column groups TOTAL EFFECT and LAG 14, each with columns Bayes, Poly, GCV, REML. The values did not survive extraction.]

* Expressed as a percent of the MSE for the MLE (unconstrained) Distributed Lag Model


Comparing BDLM to Common Methods

• BDLM performs consistently well

Captures features of DL function

Narrower confidence bands

• When the goal is estimating the total effect

BDLM 10-15% better

Better estimation at longer lags


Application: Chicago Mortality & PM10

log(expected mortality) ~ long-term trends + season + weather + pollution + day of week

everything else: trends, season, weather, dow


Distributed Lag for PM10 & Mortality Chicago: Necessary Extensions

• Two problems

1. Likelihood Poisson → No closed form posterior

2. Prior on β?

• Two approaches

1. Pretend MLEs normal; ignore β uncertainty

2. Gibbs sampler


Gibbs Sampler

Rather than estimating the full model all at once:

Alternate between:

1. Sampling θ (normal approximation as proposal), using last β in offset:

2. Sampling β (random walk Metropolis, flat prior), using last θ in offset:
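The alternating scheme above can be sketched as a Metropolis-within-Gibbs loop. This is a simplified stand-in rather than the sampler from the talk: it uses random-walk Metropolis for both blocks (the slide uses a normal-approximation proposal for θ), holds the prior covariance of θ fixed instead of updating η, and runs on simulated data. The function names, step sizes, and the placeholder prior precision `Omega_inv` are assumptions.

```python
import numpy as np

def log_lik(y, log_mu):
    """Poisson log-likelihood, up to an additive constant."""
    return np.sum(y * log_mu - np.exp(log_mu))

def gibbs_sketch(y, X, Z, Omega_inv, n_iter=3000, step=0.005, seed=0):
    """Alternate updates of theta (lag coefficients, normal prior with
    precision Omega_inv) and beta (other covariates, flat prior)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    beta = np.zeros(Z.shape[1])
    beta[0] = np.log(y.mean())                   # assumes Z[:, 0] is an intercept
    draws_theta = np.empty((n_iter, X.shape[1]))
    draws_beta = np.empty((n_iter, Z.shape[1]))
    n_accept = 0
    for it in range(n_iter):
        # 1. Update theta, with the last beta entering as an offset
        offset = Z @ beta
        prop = theta + step * rng.normal(size=theta.size)
        log_ratio = (log_lik(y, offset + X @ prop) - 0.5 * prop @ Omega_inv @ prop
                     - log_lik(y, offset + X @ theta) + 0.5 * theta @ Omega_inv @ theta)
        if np.log(rng.uniform()) < log_ratio:
            theta = prop
            n_accept += 1
        # 2. Update beta (flat prior), with the last theta entering as an offset
        offset = X @ theta
        prop = beta + step * rng.normal(size=beta.size)
        log_ratio = log_lik(y, offset + Z @ prop) - log_lik(y, offset + Z @ beta)
        if np.log(rng.uniform()) < log_ratio:
            beta = prop
        draws_theta[it] = theta
        draws_beta[it] = beta
    return draws_theta, draws_beta, n_accept / n_iter

# Simulated data: three lag columns and two "everything else" columns
rng = np.random.default_rng(5)
n = 1000
X = rng.normal(size=(n, 3))
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([0.05, 0.03, 0.01])
beta_true = np.array([np.log(50.0), 0.1])
y = rng.poisson(np.exp(X @ theta_true + Z @ beta_true))

draws_theta, draws_beta, accept_rate = gibbs_sketch(y, X, Z, Omega_inv=np.eye(3))
theta_post_mean = draws_theta[1000:].mean(axis=0)   # discard the first 1000 draws
print("posterior mean of theta:", np.round(theta_post_mean, 3))
```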


Posterior Mean & 95% Posterior Region PM10 on Mortality Chicago 1987-2000

Using last 4000 of 5000 iterations of Gibbs sampler


Joint Posterior for Hyperparameters η PM10 on Mortality Chicago 1987-2000

Using last 4000 of 5000 iterations of Gibbs sampler

[Figure: joint posterior for the hyperparameters η. Axis annotations: "Fast / Variances → 0 / Slow" and "More / Correlation / Less".]


Estimation Method Comparison: Normal Approx vs Gibbs Sampler

Using last 4000 of 5000 iterations of Gibbs sampler


Sensitivity of BDLM for Chicago to value of σ and prior on η

[Figure panels: Altering σ; Altering η]


BDLMs in Context of Smoothing

P-spline approach to smoothing distributed lag coefficients:

To estimate p-spline, minimize over :

Prior on dist lag coefficients → penalty matrix:

penalty
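The link between a normal prior and a quadratic penalty can be checked numerically. The sketch below assumes a Gaussian outcome with known noise variance (the slide's own objective function and penalty matrix did not survive extraction) and an illustrative prior covariance `Omega`. It shows that the penalized least-squares solution with penalty matrix noise variance × Ω⁻¹ equals the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_lags, noise_sd = 400, 15, 1.0
lags = np.arange(n_lags)
X = rng.normal(size=(n, n_lags))                  # stand-in for lagged exposures
theta_true = 0.5 * np.exp(-0.4 * lags)
y = X @ theta_true + noise_sd * rng.normal(size=n)

# Illustrative prior covariance: variances decay with lag and nearby
# coefficients are positively correlated (an assumed form for this example).
sd = np.exp(-0.15 * lags)
Omega = np.outer(sd, sd) * np.exp(-np.abs(lags[:, None] - lags[None, :]) / 3.0)

# Penalized least squares: minimize ||y - X theta||^2 + theta' P theta
P = noise_sd ** 2 * np.linalg.inv(Omega)          # penalty matrix implied by the prior
theta_penalized = np.linalg.solve(X.T @ X + P, X.T @ y)

# Posterior mean under theta ~ N(0, Omega), y | theta ~ N(X theta, noise_sd^2 I)
S = X @ Omega @ X.T + noise_sd ** 2 * np.eye(n)
theta_bayes = Omega @ X.T @ np.linalg.solve(S, y)

print("max abs difference:", np.max(np.abs(theta_penalized - theta_bayes)))
```

Read this way, choosing a prior covariance for the lag coefficients is the same as choosing a penalty matrix.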


BDLM in the Context of Smoothing

• Equivalent results from BDLMs & specific p-spline

use reasonably flexible spline basis

• P-spline penalties related to jumps in 3rd derivative

Difficult to relate to biological or prior knowledge

BDLMs are a transparent method for eliciting P-spline penalties consistent with the objective function


Conclusions: BDLMs

• Introduce flexible Bayesian DLM

- Incorporates prior knowledge

- Degree of smoothness of DLM estimated from data

• Simulation Study

- Smaller MSEs than comparable methods

• Relates to P-splines for DLMs

- BDLM analogous to specific p-spline

- Method of eliciting a prior distribution


Conclusions: PM10 and Mortality Chicago 1987-2000

• Largest effect at lag day 3

10 μg/m3 increase in PM10 three days previous associated with 0.17% increase in daily mortality

• Total effect -0.21% (-0.86, 0.41) increase in mortality

• Normal approximation and Gibbs sampler similar results

Large # daily deaths (~116), t = 1, … 5114

Less agreement for shorter time series, less normally distributed outcomes
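For readers converting between log-scale coefficients and the percent increases quoted above, a small sketch of the arithmetic under a log-linear model. The function names are hypothetical. The only figure taken from the slides is the 0.17% increase per 10 μg/m3.

```python
import math

def percent_increase(theta, delta):
    """Percent change in expected mortality for a `delta`-unit increase in
    exposure, given a log-linear coefficient `theta` per unit of exposure."""
    return 100.0 * (math.exp(theta * delta) - 1.0)

def coef_from_percent(pct, delta):
    """Inverse of percent_increase: coefficient per unit of exposure."""
    return math.log1p(pct / 100.0) / delta

# Lag-3 effect from the slide: 0.17% per 10 ug/m3 increase in PM10
theta_lag3 = coef_from_percent(0.17, 10.0)
print("implied coefficient per 1 ug/m3:", theta_lag3)
print("percent increase per 10 ug/m3:", percent_increase(theta_lag3, 10.0))
```

The implied coefficient is on the order of 0.0002 per μg/m3, the same scale as the vertical axis of the Chicago coefficient plot earlier in the deck.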


Future Directions: BDLMs

• Naturally extends to additional hierarchy

Peng, Dominici, Welty. “Estimating the time course of hospitalization risk associated with air pollution using a Bayesian hierarchical distributed lag model.” in press JRSS-C

• Missing data in exposure time series

Many cities measure air pollution only every 3rd or 6th day


Bayesian Distributed Lag Models:

Leah J. Welty, Roger D. Peng, Scott L. Zeger, and Francesca Dominici, “Bayesian Distributed Lag Models: Estimating Effects of Particulate Matter Air Pollution on Daily Mortality” Biometrics [Epub ahead of print April 2008]

Roger Peng, Francesca Dominici, and Leah J. Welty, “A Bayesian hierarchical distributed lag model for estimating the time course of hospitalization risk associated with particulate matter air pollution” JRSS-C (in press)

NMMAPS (1987-2000) Data & Programs:

Roger D. Peng, Leah J. Welty, and Aidan McDermott, "The National Morbidity, Mortality, and Air Pollution Study Database in R" (2004).

http://www.ihapss.jhsph.edu/data/NMMAPS/R/

Contact:

[email protected]


Constructing Prior: More Details

3. Lag effects tend to zero smoothly

X has desired correlation structure



Comparing BDLM to Common Methods