
Page 1: BIOST 572 Final Talk - University of Washingtonfaculty.washington.edu/heagerty/Courses/b572/public/... · 2012. 6. 1. · BIOST 572 Final Talk David Benkeser University of Washington

BIOST 572 Final Talk

David Benkeser

University of Washington Department of Biostatistics

May 22, 2012

David Benkeser BIOST 572 Final Talk

When I was a boy there were only three kinds of sandwiches in common use: the ham, the chicken and the Swiss cheese. Others, to be sure, existed, but it was only as oddities.

H. L. Mencken (1880-1956), American essayist and journalist


Motivation

Non-linear, non-normal data, but we are still motivated to estimate a “linear trend”

Define a Bayesian analogue of the frequentist methods used in this situation: estimating equations and sandwich-based standard errors

Distinguish between fixed and random sampling

This informs what the frequentist sandwich is actually measuring


Methods: Defining β

Suppose we have Y and X sampled from a distribution with density function λ and

$$E_\lambda(Y \mid X = x) = \phi(x)$$

Define our quantity of interest as

$$\beta = \arg\min_{\alpha} E_\lambda\left[(\phi(x) - x\alpha)^2\right]$$

i.e. the set of coefficients minimizing the average squared error in approximating the mean value of Y by a linear function of X
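When λ is discrete, the definition of β reduces to least squares of φ on x, weighted by λ_k. The solution satisfies the normal equations E_λ[xᵀx] α = E_λ[xᵀ φ(x)]. Below is a minimal NumPy sketch. The function name `projection_beta` and the example inputs are illustrative assumptions and do not come from the talk.

```python
import numpy as np


def projection_beta(xi, lam, phi):
    """Return alpha minimizing sum_k lam_k * (phi_k - xi_k @ alpha)**2.

    xi  : (K, p) array, one design row per support point (include a column of 1s for an intercept)
    lam : (K,) probabilities lambda_k, summing to 1
    phi : (K,) conditional means phi(xi_k) = E(Y | X = xi_k)
    """
    xi = np.asarray(xi, dtype=float)
    lam = np.asarray(lam, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # Normal equations: E_lambda[x^T x] alpha = E_lambda[x^T phi(x)]
    gram = xi.T @ (lam[:, None] * xi)
    cross = xi.T @ (lam * phi)
    return np.linalg.solve(gram, cross)


# Hypothetical example: a quadratic mean on five equally likely points.
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
xi = np.column_stack([np.ones_like(t), t])
beta = projection_beta(xi, np.full(5, 0.2), t**2)  # approximately [2, 0]
```

Take φ(t) = t² on a symmetric support. The best linear approximation then has intercept 2 (the mean of t²) and slope 0, even though φ is far from linear. β is the "linear trend" of φ under λ, not a parameter of a correctly specified model.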


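Methods: Bayesian Model Specification

Likelihood:

$$Y \mid X = x, \phi(\cdot), \sigma^2(\cdot) \sim N\left(\phi(x), \sigma^2(x)\right)$$

Priors specified such that:

$$p\left(\lambda(\cdot), \phi(\cdot), \sigma^2(\cdot)\right) = p_\lambda\left(\lambda(\cdot)\right)\, p_{\phi,\sigma^2}\left(\phi(\cdot), \sigma^2(\cdot)\right)$$

This gives a posterior for φ(·) and λ(·):

$$\pi\left(\lambda(\cdot), \phi(\cdot) \mid X, Y\right)$$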


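Methods: Bayesian Model Specification

π(λ(·), φ(·) | X, Y) induces a posterior for β:

$$\pi(\beta) = \pi\left(\arg\min_{\alpha} \int (\phi(x) - x\alpha)^2 \lambda(x)\,dx \;\middle|\; X, Y\right)$$

Define the point estimate by the posterior mean

$$\hat\beta = E_\pi(\beta \mid X, Y)$$

and the measure of uncertainty by the posterior standard deviation

$$\hat\sigma_\beta = \operatorname{diag}\left(\mathrm{Cov}_\pi(\beta \mid X, Y)\right)^{1/2}$$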
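With Monte Carlo draws from π(λ, φ | X, Y), each draw maps to one draw of β. The point estimate and uncertainty defined above are then the mean and standard deviation of those β draws. Below is a sketch for a discrete support. It assumes the draws are stored as arrays of shape (S, K). The helper names `beta_from_draws` and `posterior_summary` are hypothetical.

```python
import numpy as np


def beta_from_draws(xi, lam_draws, phi_draws):
    """Map each posterior draw (lambda^(s), phi^(s)) to beta^(s).

    xi        : (K, p) design rows for the support points
    lam_draws : (S, K) draws of lambda
    phi_draws : (S, K) draws of phi evaluated at the support points
    """
    xi = np.asarray(xi, dtype=float)
    betas = []
    for lam, phi in zip(np.asarray(lam_draws, float), np.asarray(phi_draws, float)):
        gram = xi.T @ (lam[:, None] * xi)  # E_lambda[x^T x]
        cross = xi.T @ (lam * phi)         # E_lambda[x^T phi(x)]
        betas.append(np.linalg.solve(gram, cross))
    return np.array(betas)


def posterior_summary(beta_draws):
    """Posterior mean (point estimate) and posterior SD (uncertainty), per coefficient."""
    return beta_draws.mean(axis=0), beta_draws.std(axis=0, ddof=1)


# Hypothetical toy input: two draws of a linear phi that differ only in intercept.
t = np.array([-1.0, 0.0, 1.0])
xi = np.column_stack([np.ones_like(t), t])
lam_draws = np.full((2, 3), 1 / 3)
phi_draws = np.vstack([1 + t, 3 + t])
beta_hat, sigma_beta = posterior_summary(beta_from_draws(xi, lam_draws, phi_draws))
```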


Methods: Discrete Covariates

Let ξ = (ξ_1, ..., ξ_K) be the K values the covariate X can assume, and let n_k be the number of times X_i = ξ_k

Define a density λ with support ξ of the form

$$\Pr(x = \xi_k; \lambda(\cdot)) = \lambda_k, \qquad \sum_{k=1}^{K} \lambda_k = 1$$

Define an improper Dirichlet prior for λ(·)

$$p_\lambda(\lambda(\cdot)) \propto \prod_{k=1}^{K} \lambda_k^{-1}$$


Methods: Discrete Covariates

This gives a posterior that is also Dirichlet

$$p_{\lambda \mid X}(\lambda(\cdot)) \propto \prod_{k=1}^{K} \lambda_k^{n_k - 1}$$

This posterior is the Bayesian Bootstrap (Rubin, 1981)

Operationally similar to the regular bootstrap

Instead of resampling, it reweights observations

Why not just Bayesian Bootstrap the whole model?
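The Dirichlet(n_1, ..., n_K) posterior can be sampled directly. It coincides with Rubin's observation-level Bayesian bootstrap, which puts Dirichlet(1, ..., 1) weights on the n observations; summing those weights within cells gives the same law, by the aggregation property of the Dirichlet distribution. Below is a NumPy sketch. The toy covariate vector and the seed are made-up inputs.

```python
import numpy as np

rng = np.random.default_rng(1981)

# Hypothetical discrete covariate with K = 3 cells and counts n_k = (3, 2, 5).
x = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
support, counts = np.unique(x, return_counts=True)
S = 20_000

# Cell-level posterior: lambda ~ Dirichlet(n_1, ..., n_K).
lam = rng.dirichlet(counts.astype(float), size=S)

# Rubin's Bayesian bootstrap: Dirichlet(1, ..., 1) weights on the n observations,
# then summed within each cell (aggregation property of the Dirichlet).
w = rng.dirichlet(np.ones(x.size), size=S)
lam_from_obs = np.column_stack([w[:, x == v].sum(axis=1) for v in support])
```

Each row of `lam` is one reweighting of the cells. No observation is ever duplicated or dropped, which is the sense in which the Bayesian bootstrap reweights rather than resamples.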


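Methods: Discrete Covariates

Let φ(·) = (φ_1, ..., φ_K) and σ²(·) = (σ²_1, ..., σ²_K)

Assign independent noninformative priors for each (φ_k, σ²_k)

If n_k ≥ 4, φ_k has a posterior t-distribution with

$$E_\pi(\phi_k \mid X, Y) = \bar{y}_k, \qquad \mathrm{Var}_\pi(\phi_k \mid X, Y) = \frac{1}{n_k(n_k - 3)} \sum_{i: X_i = \xi_k} (Y_i - \bar{y}_k)^2$$

where $\bar{y}_k$ is the sample mean of the $Y_i$ with $X_i = \xi_k$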
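The slide does not name the noninformative prior. One standard choice that yields this t posterior is p(φ_k, σ²_k) ∝ 1/σ²_k, and it is assumed here. Under that prior, φ_k = ȳ_k + (s²_k / n_k)^{1/2} T with T ~ t_{n_k − 1}, where s²_k is the sample variance. Its variance s²_k/n_k · (n_k − 1)/(n_k − 3) equals the formula above. Below is a sketch that checks the closed form against simulated draws; the function names and the data are hypothetical.

```python
import numpy as np


def phi_posterior_moments(y_cell):
    """Closed-form posterior mean and variance of phi_k for one cell (requires n_k >= 4)."""
    y = np.asarray(y_cell, dtype=float)
    n = y.size
    if n < 4:
        raise ValueError("need n_k >= 4 for a finite posterior variance")
    ss = np.sum((y - y.mean()) ** 2)
    return y.mean(), ss / (n * (n - 3))


def phi_posterior_draws(y_cell, n_draws, rng):
    """Draws of phi_k = ybar_k + sqrt(s_k^2 / n_k) * T, T ~ t_{n_k - 1} (assumes the 1/sigma^2 prior)."""
    y = np.asarray(y_cell, dtype=float)
    n = y.size
    s2 = y.var(ddof=1)
    return y.mean() + np.sqrt(s2 / n) * rng.standard_t(n - 1, size=n_draws)


rng = np.random.default_rng(1981)
y_cell = np.arange(1.0, 11.0)  # hypothetical responses in one cell, n_k = 10
mean_k, var_k = phi_posterior_moments(y_cell)
draws = phi_posterior_draws(y_cell, 200_000, rng)
```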


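Methods: Discrete Covariates

It turns out that $\hat\beta$ is asymptotically equivalent to the OLS estimator $(X^TX)^{-1}X^TY$, and

$$\hat\sigma_\beta - \operatorname{diag}\left[(X^TX)^{-1}X^T \Sigma X (X^TX)^{-1}\right]^{1/2} = o(n^{-1})$$

where

$$\Sigma_{ij} = \begin{cases} \left(Y_i - X_i (X^TX)^{-1}X^TY\right)^2 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

i.e. the posterior standard deviation matches the classic sandwich standard error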
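The sandwich standard error on the right-hand side is the heteroscedasticity-consistent "HC0" form, and it takes a few lines of NumPy. Below is a sketch; `ols_sandwich` is a hypothetical helper name.

```python
import numpy as np


def ols_sandwich(X, y):
    """OLS coefficients and sandwich (HC0) standard errors.

    Returns (beta, se) with beta = (X^T X)^{-1} X^T y and
    se = sqrt(diag[(X^T X)^{-1} X^T Sigma X (X^T X)^{-1}]), Sigma = diag(residual^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (resid[:, None] ** 2 * X)  # X^T Sigma X without forming the n x n matrix
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Intercept-only check: beta is the sample mean and se = sqrt(sum (y - ybar)^2) / n.
y = np.array([1.0, 2.0, 3.0, 4.0])
beta, se = ols_sandwich(np.ones((4, 1)), y)
```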


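Methods: Discrete Covariates

The posterior of φ_k can be split into (uncorrelated) deterministic and random components

$$\phi_k = \bar{y}_k + \varepsilon_k$$

When calculating the posterior variance of β, we can therefore split it into two covariance terms

$$\mathrm{Cov}_\pi(\beta) = \mathrm{Cov}_\pi\left(E_{x;\lambda}[x^Tx]^{-1} E_{x;\lambda}[x^T \bar{y}(x)] \mid X, Y\right) + \mathrm{Cov}_\pi\left(E_{x;\lambda}[x^Tx]^{-1} E_{x;\lambda}[x^T \varepsilon(x)] \mid X, Y\right)$$

$$\to (X^TX)^{-1}X^T(\Sigma' + \tilde\Sigma)X(X^TX)^{-1} \quad \text{asymptotically}$$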
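A small simulation can illustrate the asymptotic claim. It combines Bayesian-bootstrap draws of λ with t draws of φ_k, maps each draw to β, and compares the posterior SD with the sandwich SE. Everything in it is an illustrative assumption: the data-generating model, the cell sizes, the seed, and the number of draws. The t draws for φ_k also assume the 1/σ² noninformative prior.

```python
import numpy as np

rng = np.random.default_rng(2012)

# Simulated data (illustrative only): five support points, a non-linear mean.
x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 100)
y = x**2 + rng.normal(size=x.size)
X = np.column_stack([np.ones_like(x), x])

# OLS with sandwich (HC0) standard errors.
XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y
resid = y - X @ beta_ols
cov_sw = XtX_inv @ (X.T @ (resid[:, None] ** 2 * X)) @ XtX_inv
se_sandwich = np.sqrt(np.diag(cov_sw))

# Bayesian posterior for beta under random sampling of X.
support, counts = np.unique(x, return_counts=True)
xi = np.column_stack([np.ones_like(support), support])
ybar = np.array([y[x == v].mean() for v in support])
s2 = np.array([y[x == v].var(ddof=1) for v in support])

S = 4000
lam = rng.dirichlet(counts.astype(float), size=S)             # lambda ~ Dirichlet(n_k)
tdraws = rng.standard_t(counts - 1, size=(S, support.size))   # one t_{n_k - 1} per cell
phi = ybar + np.sqrt(s2 / counts) * tdraws                    # phi_k = ybar_k + eps_k

betas = np.empty((S, 2))
for s in range(S):
    gram = xi.T @ (lam[s][:, None] * xi)
    cross = xi.T @ (lam[s] * phi[s])
    betas[s] = np.linalg.solve(gram, cross)

beta_hat = betas.mean(axis=0)
sd_bayes = betas.std(axis=0, ddof=1)
```

With 100 observations per cell, `sd_bayes` and `se_sandwich` should agree closely. The Dirichlet draws of λ carry the non-linearity part (Σ̃) and the t draws carry the residual part (Σ′).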


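Methods: Discrete Covariates

Three “meat” matrices (all diagonal; for row i, let k be the cell with X_i = ξ_k):

$$\Sigma'_{ii} = \frac{1}{n_k - 3} \sum_{j: X_j = \xi_k} (Y_j - \bar{y}_k)^2$$

$$\tilde\Sigma_{ii} = \left(\bar{y}_k - X_i (X^TX)^{-1}X^TY\right)^2$$

$$\Sigma_{ii} = \Sigma'_{ii} + \tilde\Sigma_{ii} = \left(Y_i - X_i (X^TX)^{-1}X^TY\right)^2$$

(the decomposition holds asymptotically once each matrix is sandwiched as $X^T(\cdot)X$, since cross terms cancel within each cell)

The classic sandwich (Σ) accounts for residual error (Σ′) as well as the error due to non-linearity of φ (Σ̃)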
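All three meat matrices are diagonal, so they can be stored as vectors. The sketch below computes them and checks the decomposition numerically. Replace Σ′ with the raw within-cell squared deviations (Y_i − ȳ_k)², and Xᵀ Σ X = Xᵀ (raw + Σ̃) X holds exactly, because the cross terms sum to zero within each cell. The 1/(n_k − 3) scaling in Σ′ only multiplies the residual part by n_k/(n_k − 3), which tends to 1. The data, seed, and helper names are hypothetical.

```python
import numpy as np


def meat_diagonals(x, y, X):
    """Diagonals of Sigma' (residual), tilde Sigma (non-linearity), and Sigma (classic sandwich).

    Also returns the raw within-cell squared deviations (Y_i - ybar_k)^2.
    x is the discrete covariate that defines the cells; X is the design matrix.
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    fitted = X @ np.linalg.solve(X.T @ X, X.T @ y)  # X_i (X^T X)^{-1} X^T Y
    resid_d = np.empty_like(y)
    nonlin_d = np.empty_like(y)
    raw_d = np.empty_like(y)
    for v in np.unique(x):
        cell = x == v
        n_k = cell.sum()
        ybar_k = y[cell].mean()
        ss_k = np.sum((y[cell] - ybar_k) ** 2)
        resid_d[cell] = ss_k / (n_k - 3)               # Sigma'_ii
        nonlin_d[cell] = (ybar_k - fitted[cell]) ** 2  # tilde Sigma_ii
        raw_d[cell] = (y[cell] - ybar_k) ** 2
    sandwich_d = (y - fitted) ** 2                     # Sigma_ii
    return resid_d, nonlin_d, raw_d, sandwich_d


def xt_d_x(X, d):
    """X^T diag(d) X."""
    return X.T @ (d[:, None] * X)


rng = np.random.default_rng(22)
x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 40)
y = x**2 + rng.normal(size=x.size)
X = np.column_stack([np.ones_like(x), x])
resid_d, nonlin_d, raw_d, sandwich_d = meat_diagonals(x, y, X)
```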


Methods: Fixed Design Matrix

Replace the random density λ with the deterministic density

$$\lambda_{\text{fixed};k} = \frac{n_k}{n}$$

and proceed as before, defining our quantity of interest as

$$\beta_{\text{fixed}} = \arg\min_{\alpha} \int (\phi(x) - x\alpha)^2 \lambda_{\text{fixed}}(x)\,dx$$

The posterior mean is our point estimate and the posterior standard deviation is our measure of uncertainty


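Methods: Fixed Design Matrix

It turns out that the point estimate $\hat\beta_{\text{fixed}}$ (the posterior mean) is exactly the OLS estimator, and

$$\hat\sigma_{\beta,\text{fixed}} = \operatorname{diag}\left[(X^TX)^{-1}X^T \Sigma' X (X^TX)^{-1}\right]^{1/2}$$

where Σ′ is the diagonal matrix

$$\Sigma'_{ij} = \begin{cases} \dfrac{1}{n_k - 3} \sum_{l: X_l = \xi_k} (Y_l - \bar{y}_k)^2 & \text{if } i = j \text{, with } X_i = \xi_k \\ 0 & \text{otherwise} \end{cases}$$

This accounts only for the variation of Y | X around its mean, and not for error due to non-linearity in φ (which does not change between samples when the design is fixed)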
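In the fixed design, λ is pinned at n_k/n. β_fixed is therefore a fixed linear function of (φ_1, ..., φ_K), and its posterior mean and SD have closed forms. The sketch below computes the closed form and checks it by Monte Carlo. The data, the seed, and the 1/σ² prior behind the t draws are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2020)

# Hypothetical data: fixed design with 50 observations at each of five x values.
x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 50)
y = x**2 + rng.normal(size=x.size)
X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y

support, counts = np.unique(x, return_counts=True)
ybar = np.array([y[x == v].mean() for v in support])
ss = np.array([np.sum((y[x == v] - y[x == v].mean()) ** 2) for v in support])
xi = np.column_stack([np.ones_like(support), support])

# beta_fixed = (X^T X)^{-1} sum_k n_k xi_k^T phi_k, i.e. a fixed linear map H applied to phi.
H = XtX_inv @ (xi.T * counts)
beta_fixed_mean = H @ ybar  # posterior mean: equals OLS exactly

# Closed-form posterior SD with Sigma'_ii = ss_k / (n_k - 3).
cell_of_i = np.searchsorted(support, x)
sigma_prime = (ss / (counts - 3))[cell_of_i]
cov_fixed = XtX_inv @ (X.T @ (sigma_prime[:, None] * X)) @ XtX_inv
sd_fixed = np.sqrt(np.diag(cov_fixed))

# Monte Carlo check using t posterior draws for each phi_k.
S = 20_000
s2 = ss / (counts - 1)
phi = ybar + np.sqrt(s2 / counts) * rng.standard_t(counts - 1, size=(S, support.size))
beta_draws = phi @ H.T
```

Nothing here varies λ, so the non-linearity term Σ̃ never enters the fixed-design uncertainty.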


Methods: Continuous Covariates

Some constraints on φ(·) and σ²(·) are needed for identifiability

In an applied setting, these functions can be approximated by semi-parametric smoothing methods

Use penalized O’Sullivan splines (Wand and Ormerod, 2008) to model φ(·) and σ²(·)

$$\phi(x_i; a) = \alpha_0 + \alpha_1 x_i + \sum_{q=1}^{Q} a_q Z_{iq}$$

$$\log \sigma(x_i; b) = \gamma_0 + \gamma_1 x_i + \sum_{q=1}^{Q} b_q Z_{iq}$$

where $Z_{iq}$ is the q-th spline basis function evaluated at $x_i$
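A penalized spline fit is easy to sketch with a simpler basis swapped in. The code below uses truncated-line functions Z_iq = (x_i − κ_q)_+ with knots κ_q at sample quantiles; this is not the O'Sullivan basis of Wand and Ormerod (2008). It also replaces the full Bayesian fit with its posterior mean under three assumptions: Gaussian priors a_q ~ N(0, σ²/ratio), flat priors on (α_0, α_1), and a fixed smoothing ratio. The helper names and the `ratio` value are hypothetical.

```python
import numpy as np


def truncated_line_basis(x, n_knots):
    """Z[i, q] = max(x_i - kappa_q, 0), knots at equally spaced interior sample quantiles."""
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    knots = np.quantile(x, probs)
    return np.maximum(x[:, None] - knots[None, :], 0.0), knots


def penalized_spline_mean(x, y, n_knots=20, ratio=1.0):
    """Fit phi(x) = alpha0 + alpha1 * x + sum_q a_q Z_q(x), penalizing only the a_q.

    Solves (C^T C + ratio * D) theta = C^T y with D = diag(0, 0, 1, ..., 1), which is the
    posterior mean under the Gaussian-prior assumptions stated above.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z, knots = truncated_line_basis(x, n_knots)
    C = np.column_stack([np.ones_like(x), x, Z])
    D = np.diag([0.0, 0.0] + [1.0] * Z.shape[1])
    theta = np.linalg.solve(C.T @ C + ratio * D, C.T @ y)
    return theta, C @ theta


x = np.linspace(-10.0, 10.0, 400)
theta_lin, fit_lin = penalized_spline_mean(x, 3.0 + 2.0 * x)
theta_quad, fit_quad = penalized_spline_mean(x, x**2)
```

The linear part is unpenalized, so an exactly linear mean is recovered with all a_q at zero. Curvature is absorbed by the penalized a_q. In the talk, the spline coefficients are instead sampled from their posterior with OpenBUGS.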


Methods: Continuous Covariates

Define diffuse priors on a_q and b_q and use OpenBUGS to simulate from the posterior

For the prior on λ, use the limiting case of the Dirichlet process, which gives the Bayesian Bootstrap (BB) distribution as the posterior

We expect similar results to hold as in the discrete case; simulations give supporting evidence.
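For continuous covariates, the limiting Dirichlet process puts Bayesian-bootstrap weights Dirichlet(1, ..., 1) on the observed x_i. Each posterior draw of β is then a weighted least-squares fit of one posterior draw of φ(x_i) on x_i. The sketch below assumes some sampler (for instance, evaluated spline draws) has already produced the φ draws, passed in as an (S, n) array. `beta_draws_continuous` is a hypothetical name.

```python
import numpy as np


def beta_draws_continuous(x, phi_draws, rng):
    """Posterior draws of beta for a continuous covariate.

    x         : (n,) observed covariate values
    phi_draws : (S, n) posterior draws of phi evaluated at the observed x_i
    Each phi^(s) is paired with fresh Bayesian-bootstrap weights w^(s) ~ Dirichlet(1, ..., 1).
    """
    x = np.asarray(x, dtype=float)
    phi_draws = np.asarray(phi_draws, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    n_draws, n = phi_draws.shape
    weights = rng.dirichlet(np.ones(n), size=n_draws)
    out = np.empty((n_draws, X.shape[1]))
    for s in range(n_draws):
        gram = X.T @ (weights[s][:, None] * X)
        cross = X.T @ (weights[s] * phi_draws[s])
        out[s] = np.linalg.solve(gram, cross)
    return out


rng = np.random.default_rng(400)
x = rng.uniform(-10.0, 10.0, size=400)
# Stand-in "posterior draws": the same curve repeated, so only lambda varies.
linear_draws = np.tile(1.0 + 2.0 * x, (500, 1))
curved_draws = np.tile(x**2, (500, 1))
betas_linear = beta_draws_continuous(x, linear_draws, rng)
betas_curved = beta_draws_continuous(x, curved_draws, rng)
```

When φ is exactly linear, every weighting returns the same β, so the uncertainty from λ vanishes. When φ is curved, the reweighting spreads the draws; this is the continuous counterpart of the non-linearity term in the discrete case.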


Results: Simulations (n=400)

[Figure: simulated data sets with x from −10 to 10, in four panels: Linear and Homoscedastic; Linear and Heteroscedastic; Non-linear and Homoscedastic; Non-linear and Heteroscedastic]


Results: Simulations (n=400)

[Figure: Nominal 95% Coverage (y-axis, 85 to 100) by Sampling (Random, Fixed) for the Model, Sandwich, and Bayes standard errors, in four panels: Linear and Homoscedastic; Linear and Heteroscedastic; Non-linear and Homoscedastic; Non-linear and Heteroscedastic]


Results: Health Care Cost Data

[Figure: three panels against Age (0 to 60): Outpatient Health Costs (annual cost, dollars); Smoothed Outpatient Health Costs (O'Sullivan splines, posterior mean and posterior samples); Smoothed Standard Deviation (O'Sullivan splines, posterior mean and posterior samples)]


Results: Health Care Cost Data

[Figure: linear regression of average annual outpatient health care cost on age; β (95% CI) for Model, Sandwich, Bayes (random), and Bayes (fixed), with the Bayesian estimates shown for both continuous and discrete covariates]


Conclusions

Developed a model-robust Bayesian framework for linear regression

The method distinguishes between random and fixed covariates and is asymptotically equivalent to the sandwich in the random case

Better frequentist coverage than sandwich-based estimates in the fixed design case

Contrasting the fixed and random cases shows that the sandwich implicitly accounts for random sampling


Questions?
