Graphical models for combining multiple sources of information in observational studies Nicky Best Sylvia Richardson Chris Jackson Virgilio Gomez Sara Geneletti ESRC National Centre for Research Methods – BIAS node


Post on 15-Jan-2016


Page 1

Graphical models for combining multiple sources of information in observational studies

Nicky Best, Sylvia Richardson, Chris Jackson, Virgilio Gomez, Sara Geneletti

ESRC National Centre for Research Methods – BIAS node

Page 2

Outline

• Overview of graphical modelling
• Case study 1: Water disinfection byproducts and adverse birth outcomes
  – Modelling multiple sources of bias in observational studies
• Bayesian computation and software
• Case study 2: Socioeconomic factors and heart disease (Chris Jackson)
  – Combining individual and aggregate level data
  – Application to Census, Health Survey for England, Hospital Episode Statistics (HES)

Page 3

Graphical modelling

Modelling, Inference, Mathematics, Algorithms

Page 4

1. Mathematics


• Key idea: conditional independence
• X and W are conditionally independent given Z if, knowing Z, discovering W tells you nothing more about X:

P(X | W, Z) = P(X | Z)
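A minimal sketch of this definition on a small made-up joint distribution over binary X, W and Z (all probabilities are hypothetical). The joint is built so that X and W each depend only on Z; the check confirms P(X | W, Z) = P(X | Z) for every configuration, while X and W remain dependent once Z is summed out.

```python
from itertools import product
from math import isclose

# Hypothetical numbers: X and W each depend only on Z,
# so the joint is constructed as P(Z) P(X | Z) P(W | Z).
P_Z = {0: 0.3, 1: 0.7}
P_X_GIVEN_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P_X_GIVEN_Z[z][x]
P_W_GIVEN_Z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # P_W_GIVEN_Z[z][w]

JOINT = {
    (x, w, z): P_Z[z] * P_X_GIVEN_Z[z][x] * P_W_GIVEN_Z[z][w]
    for x, w, z in product([0, 1], repeat=3)
}

def marginal(**fixed):
    """P(fixed variables), summing the joint over all unfixed variables."""
    return sum(
        p for (x, w, z), p in JOINT.items()
        if all({"x": x, "w": w, "z": z}[k] == v for k, v in fixed.items())
    )

def cond_indep_x_w_given_z():
    """Check P(X | W, Z) == P(X | Z) for every configuration."""
    return all(
        isclose(marginal(x=x, w=w, z=z) / marginal(w=w, z=z),
                marginal(x=x, z=z) / marginal(z=z))
        for x, w, z in product([0, 1], repeat=3)
    )

def marg_indep_x_w():
    """Check P(X | W) == P(X), i.e. independence without conditioning on Z."""
    return all(
        isclose(marginal(x=x, w=w) / marginal(w=w), marginal(x=x))
        for x, w in product([0, 1], repeat=2)
    )

print(cond_indep_x_w_given_z())  # True
print(marg_indep_x_w())          # False
```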

Page 5

Example: Mendelian inheritance
• Y, Z = genotypes of parents
• W, X = genotypes of 2 children
• If we know the genotypes of the parents, then the children's genotypes are conditionally independent

P(X | W, Y, Z) = P(X | Y, Z)

[Figure: directed graph with parent nodes Y and Z, each with arrows to the child nodes W and X]

Page 6

Joint distributions and graphical models

Graphical models can be used to:

• represent structure of a joint probability distribution…..

• …..by encoding conditional independencies

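Factorization theorem:

Joint distribution $P(V) = \prod_{v \in V} P(v \mid \text{parents}[v])$

[Figure: graph with Y and Z as parents of W and X; each node annotated with its factor P(Y), P(Z), P(W | Y, Z), P(X | Y, Z)]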

P(W,X,Y,Z) = P(W|Y,Z) P(X|Y,Z) P(Y) P(Z)
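The factorization above can be checked numerically for the Mendelian example. This is a sketch under assumptions that do not come from the slides: a single locus with two alleles, genotypes coded as the number of copies of allele "A", Hardy-Weinberg proportions for the parents with a hypothetical allele frequency Q = 0.3, and standard Mendelian transmission. The joint is assembled from the four local factors, then the conditional independence of the children given both parents is verified.

```python
from itertools import product
from math import isclose

Q = 0.3        # hypothetical allele frequency of allele "A"
G = (0, 1, 2)  # genotype = number of copies of "A"

def p_parent(g):
    """Assumed Hardy-Weinberg prior, used for both P(Y) and P(Z)."""
    return {0: (1 - Q) ** 2, 1: 2 * Q * (1 - Q), 2: Q ** 2}[g]

def p_child(c, y, z):
    """Mendelian transmission: a parent with genotype g passes on "A" w.p. g / 2."""
    a, b = y / 2, z / 2
    return {0: (1 - a) * (1 - b), 1: a * (1 - b) + (1 - a) * b, 2: a * b}[c]

# Factorization: P(W, X, Y, Z) = P(W | Y, Z) P(X | Y, Z) P(Y) P(Z)
JOINT = {
    (w, x, y, z): p_child(w, y, z) * p_child(x, y, z) * p_parent(y) * p_parent(z)
    for w, x, y, z in product(G, repeat=4)
}

def marg(w=None, x=None, y=None, z=None):
    """Marginal probability of the given values (None means summed out)."""
    target = (w, x, y, z)
    return sum(p for key, p in JOINT.items()
               if all(t is None or t == k for t, k in zip(target, key)))

# Children are conditionally independent given both parents...
ci_holds = all(
    isclose(marg(w=w, x=x, y=y, z=z) / marg(w=w, y=y, z=z),
            marg(x=x, y=y, z=z) / marg(y=y, z=z))
    for w, x, y, z in product(G, repeat=4)
    if marg(w=w, y=y, z=z) > 0
)
# ...but not marginally: one child's genotype is informative about the other's.
marg_dep = not isclose(marg(x=2, w=2) / marg(w=2), marg(x=2))

print(isclose(marg(), 1.0), ci_holds, marg_dep)  # True True True
```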

Page 7

Where does the graph come from?

• Genetics
  – pedigree (family tree)

• Physical, biological, social systems
  – supposed causal effects (e.g. regression models)

Page 8

• Conditional independence provides basis for splitting large system into smaller components

[Figure: directed graph over nodes A, B, C, D, W, X, Y, Z]

Page 9

• Conditional independence provides basis for splitting large system into smaller components

[Figure: the graph over nodes A, B, C, D, W, X, Y, Z split into smaller components]
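One way to see how conditional independence splits a large graph into small pieces is the Markov blanket: a node's parents, its children and its children's other parents. Given its blanket, a node is conditionally independent of everything else in the graph. The sketch below applies this to the case-study structure as read off the sub-model equations later in these slides; the ASCII node names are hypothetical labels, `pi_i` stands for the area-level confounder parameter, and the literature-based uptake factors are omitted for brevity.

```python
# Parent sets read off the case-study sub-models (hypothetical node labels).
PARENTS = {
    "THM_raw": {"THM_true", "sigma2"},
    "THM_mother_ik": {"THM_true"},
    "THM_mother_im": {"THM_true"},
    "c_ik": {"pi_i"},
    "c_im": {"pi_i"},
    "y_ik": {"b_c", "b_T", "c_ik", "THM_mother_ik"},
    "y_im": {"b_c", "b_T", "c_im", "THM_mother_im"},
}

def children(node):
    """All nodes that list `node` among their parents."""
    return {child for child, pars in PARENTS.items() if node in pars}

def markov_blanket(node):
    """Parents, children and the children's other parents of `node`."""
    blanket = set(PARENTS.get(node, set())) | children(node)
    for child in children(node):
        blanket |= PARENTS[child]
    blanket.discard(node)
    return blanket

print(sorted(markov_blanket("THM_true")))
# ['THM_mother_ik', 'THM_mother_im', 'THM_raw', 'sigma2']
```

Each piece of the model only has to "see" its blanket, which is what makes it possible to build and update the global model one small component at a time.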

Page 10

2. Modelling


Page 11

Building complex models

Key idea
• understand complex system
• through global model
• built from small pieces
  – comprehensible
  – each with only a few variables
  – modular

Page 12

Example: Case study 1

• Epidemiological study of low birth weight and mothers’ exposure to water disinfection byproducts

• Background
  – Chlorine added to tap water supply for disinfection
  – Reacts with natural organic matter in water to form unwanted byproducts (including trihalomethanes, THMs)
  – Some evidence of adverse health effects (cancer, birth defects) associated with exposure to high levels of THMs
  – SAHSU (Small Area Health Statistics Unit) is carrying out a study in Great Britain using routine data, to investigate the risk of low birth weight associated with exposure to different THM levels

Page 13

Data sources

• National postcoded births register
• Routinely monitored THM concentrations in tap water samples for each water supply zone within 14 different water company regions
• Census data – area level socioeconomic factors
• Millennium Cohort Study (MCS) – individual level outcomes and confounder data on a sample of mothers
• Literature relating to factors affecting personal exposure (uptake factors, water consumption, etc.)

Page 14

Model for combining data sources

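[Figure: graphical model linking all data sources. Nodes: low birth weight yik (national data) and yim (MCS); confounders cik and cim, linked through an area-level confounder parameter; regression coefficients b[c] and b[T]; raw THM measurements THMztj[raw]; true tap water THM concentration THMzt[true], with a measurement error variance; mothers' personal exposures THMik[mother] and THMim[mother]]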

Page 15

Regression sub-model (MCS)

[Figure: the full graphical model; this slide focuses on the MCS regression sub-model]

Regression model for MCS data relating risk of low birth weight (yim) to mother’s THM exposure and other confounders (cim)

Page 16

Regression sub-model (MCS)

[Figure: MCS regression sub-model: yim depends on cim, THMim[mother] and the coefficients b[c], b[T]]

Regression model for MCS data relating risk of low birth weight (yim) to mother’s THM exposure and other confounders (cim)

Logistic regression

$y_{im} \sim \text{Bernoulli}(p_{im})$

$\text{logit}\, p_{im} = b^{[c]} c_{im} + b^{[T]} \text{THM}_{im}^{[mother]}$

i indexes small area; m indexes mother

cim = potential confounders, e.g. deprivation, smoking, ethnicity
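A sketch of the log-likelihood implied by this sub-model, in plain Python with a single confounder per mother. It follows the slide's linear predictor exactly, which has no intercept; an intercept could be included as a constant column in c if wanted (our assumption, not stated on the slide). Function names and inputs are hypothetical.

```python
from math import exp, log

def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + exp(-eta))

def log_lik_mcs(y, c, thm, b_c, b_T):
    """Bernoulli log-likelihood with logit p_im = b_c * c_im + b_T * THM_im[mother].

    y, c, thm: equal-length sequences of 0/1 outcomes, confounder values and
    personal THM exposures for the MCS mothers (hypothetical inputs).
    """
    total = 0.0
    for y_m, c_m, thm_m in zip(y, c, thm):
        p = inv_logit(b_c * c_m + b_T * thm_m)
        total += log(p) if y_m == 1 else log(1.0 - p)
    return total

print(log_lik_mcs([1, 0, 0, 1], [1, 0, 1, 0], [0.6, 0.2, 0.4, 0.8], 0.5, 1.0))
```

With both coefficients set to zero every p_im is 0.5, so the same call with `0.0, 0.0` returns 4 log 0.5.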

Page 17

Regression sub-model (national data)

[Figure: the full graphical model; this slide focuses on the national-data regression sub-model]

Regression model for national data relating risk of low birth weight (yik) to mother’s THM exposure and other confounders (cik)

Page 18

Regression sub-model (national data)

[Figure: national-data regression sub-model: yik depends on cik, THMik[mother] and the coefficients b[c], b[T]]

Regression model for national data relating risk of low birth weight (yik) to mother’s THM exposure and other confounders (cik)

Logistic regression

$y_{ik} \sim \text{Bernoulli}(p_{ik})$

$\text{logit}\, p_{ik} = b^{[c]} c_{ik} + b^{[T]} \text{THM}_{ik}^{[mother]}$

i indexes small area; k indexes mother
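Because the same coefficients b[c] and b[T] appear in both regression sub-models, the two datasets inform them jointly: the combined log-likelihood is the sum of the MCS and national-data terms evaluated at shared parameter values. A minimal sketch, assuming the unobserved national-data confounders cik have already been filled in (for example by one draw from the missing confounders sub-model); helper names and data are hypothetical.

```python
from math import exp, log1p

def bernoulli_logit_ll(y, eta):
    """log P(y | eta) for y in {0, 1} with logit P(y = 1) = eta."""
    # Numerically stable log(1 + exp(eta))
    softplus = eta + log1p(exp(-eta)) if eta > 0 else log1p(exp(eta))
    return y * eta - softplus

def joint_log_lik(b_c, b_T, mcs, national):
    """Sum of MCS and national-data log-likelihoods with shared b_c and b_T.

    mcs, national: iterables of (y, c, thm) tuples. The national c values are
    assumed to be imputed already, since c_ik is not observed in those data.
    """
    total = 0.0
    for y, c, thm in list(mcs) + list(national):
        total += bernoulli_logit_ll(y, b_c * c + b_T * thm)
    return total

# Hypothetical toy data: (low birth weight, confounder, personal THM exposure)
mcs = [(1, 1, 0.8), (0, 0, 0.2), (0, 1, 0.3)]
national = [(1, 0, 0.9), (0, 1, 0.1), (1, 1, 0.7)]
for b_T in (-1.0, 0.0, 1.0):
    print(b_T, joint_log_lik(0.5, b_T, mcs, national))
```

Maximizing or sampling over (b_c, b_T) with this combined function lets the larger national dataset and the richer MCS data inform the same exposure effect.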

Page 19

Missing confounders sub-model

[Figure: the full graphical model; this slide focuses on the missing confounders sub-model]

Missing data model to estimate confounders (cik) for mothers in national data, using information on the within-area distribution of confounders in MCS

Page 20

Missing confounders sub-model

[Figure: missing confounders sub-model: cim and cik share the area-level parameter πi]

Missing data model to estimate confounders (cik) for mothers in national data, using information on the within-area distribution of confounders in MCS

$c_{im} \sim \text{Bernoulli}(\pi_i)$ (MCS mothers)

$c_{ik} \sim \text{Bernoulli}(\pi_i)$ (predictions for mothers in national data)
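A sketch of one imputation step for a single area i, assuming a Beta(1, 1) prior on the area-level prevalence (the slides do not state the prior; `a0`, `b0` and the name `pi_i` are hypothetical). The MCS mothers' observed cim update the prevalence, and the national-data cik are then drawn from the same Bernoulli distribution. In a full MCMC run this step would be repeated every iteration, so uncertainty about the confounders feeds through to the regression.

```python
import random

def impute_confounders(c_mcs, n_national, rng, a0=1.0, b0=1.0):
    """One draw of c_ik for the national-data mothers in a single area i.

    c_mcs: observed 0/1 confounder values c_im for MCS mothers in area i.
    a0, b0: hypothetical Beta prior parameters for the area prevalence pi_i.
    """
    s = sum(c_mcs)
    # Beta-Bernoulli conjugacy: pi_i | c_mcs ~ Beta(a0 + s, b0 + n - s)
    pi_i = rng.betavariate(a0 + s, b0 + len(c_mcs) - s)
    # Predict the unobserved confounders from the same Bernoulli(pi_i)
    return [1 if rng.random() < pi_i else 0 for _ in range(n_national)]

rng = random.Random(42)
c_mcs_area = [1, 0, 0, 1, 0, 0, 0, 1]  # hypothetical MCS mothers in one area
print(impute_confounders(c_mcs_area, 10, rng))
```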

Page 21

THM measurement error sub-model

[Figure: the full graphical model; this slide focuses on the THM measurement error sub-model]

Model to estimate true tap water THM concentration from raw data

Page 22

THM measurement error sub-model

[Figure: measurement error sub-model: THMztj[raw] depends on THMzt[true] and σ²]

Model to estimate true tap water THM concentration from raw data

$\text{THM}_{ztj}^{[raw]} \sim \text{Normal}(\text{THM}_{zt}^{[true]}, \sigma^2)$

z = water zone; t = season; j = sample

(Actual model used was a more complex mixture of Normal distributions)
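For the simplified Normal version on this slide (not the mixture actually used), the true zone-season concentration has a closed-form posterior if the measurement variance is treated as known and a Normal prior is placed on THMzt[true]. Both the known variance and the prior are assumptions for this sketch, and the function name is hypothetical.

```python
def posterior_true_thm(raw, sigma2, prior_mean, prior_var):
    """Posterior (mean, variance) of THM_zt[true] for one zone z and season t.

    Assumes THM_ztj[raw] ~ Normal(THM_zt[true], sigma2) with sigma2 known,
    and a Normal(prior_mean, prior_var) prior on THM_zt[true].
    """
    n = len(raw)
    precision = 1.0 / prior_var + n / sigma2  # prior and data precisions add
    mean = (prior_mean / prior_var + sum(raw) / sigma2) / precision
    return mean, 1.0 / precision

# Hypothetical raw samples for one zone and season, with a vague prior
print(posterior_true_thm([40.0, 50.0, 60.0], sigma2=100.0, prior_mean=0.0, prior_var=1e6))
```

With a vague prior the posterior mean is close to the sample mean (50 here) and the posterior variance close to the measurement variance divided by the number of samples.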

Page 23

THM personal exposure sub-model

[Figure: the full graphical model; this slide focuses on the THM personal exposure sub-model]

Model to predict personal exposure using estimated tap water THM level and literature on distribution of factors affecting individual uptake of THM

Page 24

THM personal exposure sub-model

[Figure: personal exposure sub-model: THMik[mother] and THMim[mother] depend on THMzt[true]]

Model to predict personal exposure using estimated tap water THM level and literature on distribution of factors affecting individual uptake of THM

$\text{THM}^{[mother]} = \sum_k \text{THM}_{zt}^{[true]} \times \text{quantity}_k \times \text{uptake factor}_k$

where k indexes different water use activities, e.g. drinking, showering, bathing
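A sketch of the exposure formula with Monte Carlo propagation of uncertainty in the activity-specific factors. The distributions and numbers in `ACTIVITIES` are placeholders invented for illustration, not the literature values used in the study; the function names are hypothetical.

```python
import random

def personal_exposure(thm_true, quantities, uptakes):
    """Sum over activities k of THM_zt[true] x quantity_k x uptake factor_k."""
    return sum(thm_true * q * u for q, u in zip(quantities, uptakes))

# Placeholder uncertainty per activity (NOT the literature values):
# quantity ~ Normal(mean, sd) clamped at 0; uptake factor ~ Uniform(low, high)
ACTIVITIES = {
    "drinking": {"quantity": (1.0, 0.3), "uptake": (0.5, 1.0)},
    "showering": {"quantity": (1.0, 0.3), "uptake": (0.1, 0.5)},
    "bathing": {"quantity": (0.5, 0.2), "uptake": (0.1, 0.5)},
}

def sample_exposure(thm_true, rng, n_draws=1000):
    """Monte Carlo draws of personal exposure for a given tap water THM level."""
    draws = []
    for _ in range(n_draws):
        quantities, uptakes = [], []
        for spec in ACTIVITIES.values():
            mean, sd = spec["quantity"]
            quantities.append(max(0.0, rng.gauss(mean, sd)))
            low, high = spec["uptake"]
            uptakes.append(rng.uniform(low, high))
        draws.append(personal_exposure(thm_true, quantities, uptakes))
    return draws

draws = sample_exposure(50.0, random.Random(1))
print(sum(draws) / len(draws))
```

In the full model THMzt[true] would itself be a posterior draw rather than a fixed number, so exposure uncertainty combines measurement error and between-mother variation in water use.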

Page 25

3. Inference


Page 26

Bayesian

Page 27

… or non-Bayesian

Page 28

Bayesian Full Probability Modelling

• Graphical approach to building complex models lends itself naturally to the Bayesian inferential process
• Graph defines a joint probability distribution on all the ‘nodes’ in the model

Recall: joint distribution $P(V) = \prod_{v \in V} P(v \mid \text{parents}[v])$

• Condition on parts of graph that are observed (data)
• Calculate posterior probabilities of remaining nodes using Bayes' theorem
• Automatically propagates all sources of uncertainty
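Conditioning on observed nodes and applying Bayes' theorem can be done by direct enumeration when the graph is small. This sketch uses the Mendelian inheritance graph under assumed conditions that are not from the slides: a single two-allele locus, genotypes coded as the number of copies of allele "A", and Hardy-Weinberg parent priors with a hypothetical allele frequency Q = 0.3. Observing both children's genotypes updates the distribution of parent Y, with the unobserved parent Z summed out.

```python
Q = 0.3        # hypothetical allele frequency of allele "A"
G = (0, 1, 2)  # genotype = number of copies of "A"

def p_parent(g):
    """Assumed Hardy-Weinberg prior P(Y = g) = P(Z = g)."""
    return {0: (1 - Q) ** 2, 1: 2 * Q * (1 - Q), 2: Q ** 2}[g]

def p_child(c, y, z):
    """Mendelian transmission: a parent with genotype g passes on "A" w.p. g / 2."""
    a, b = y / 2, z / 2
    return {0: (1 - a) * (1 - b), 1: a * (1 - b) + (1 - a) * b, 2: a * b}[c]

def posterior_parent_y(w_obs, x_obs):
    """P(Y | W = w_obs, X = x_obs) via Bayes' theorem, summing out Z."""
    unnorm = {
        y: sum(p_child(w_obs, y, z) * p_child(x_obs, y, z) * p_parent(y) * p_parent(z)
               for z in G)
        for y in G
    }
    evidence = sum(unnorm.values())  # P(W = w_obs, X = x_obs)
    return {y: p / evidence for y, p in unnorm.items()}

print(posterior_parent_y(2, 2))
```

Observing two AA children rules out Y = 0 entirely and moves probability towards Y = 2 compared with its prior.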

Page 29

[Figure: the full graphical model, with nodes labelled as either Data (observed) or Unknowns]

Page 30

4. Algorithms


• MCMC algorithms are able to exploit graphical structure for efficient inference

• Bayesian graphical models implemented in WinBUGS
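WinBUGS itself is not shown here. As a plain-Python illustration of how MCMC exploits graphical structure, this Gibbs sampler targets a simple Normal model for repeated measurements (in the spirit of the raw THM samples within one zone and season) with unknown mean and precision. Each update draws one node from its full conditional, which involves only that node's Markov blanket. The vague priors, starting values and data are hypothetical.

```python
import random

def gibbs_normal(y, n_iter=5000, burn_in=1000, seed=0,
                 m0=0.0, p0=1e-6, a0=0.001, b0=0.001):
    """Gibbs sampler for y_j ~ Normal(mu, 1 / tau).

    Hypothetical priors: mu ~ Normal(m0, 1 / p0), tau ~ Gamma(a0, rate = b0).
    Returns the (mu, tau) draws kept after burn-in.
    """
    rng = random.Random(seed)
    n, s = len(y), sum(y)
    mu, tau = s / n, 1.0  # starting values
    kept = []
    for it in range(n_iter):
        # mu | tau, y is Normal: needs only tau and the data (mu's Markov blanket)
        prec = p0 + n * tau
        mu = rng.gauss((p0 * m0 + tau * s) / prec, (1.0 / prec) ** 0.5)
        # tau | mu, y is Gamma: needs only mu and the data (tau's Markov blanket)
        ss = sum((yj - mu) ** 2 for yj in y)
        tau = rng.gammavariate(a0 + n / 2, 1.0 / (b0 + ss / 2))  # 2nd arg is the scale
        if it >= burn_in:
            kept.append((mu, tau))
    return kept

raw = [48.0, 52.0, 50.5, 49.0, 51.5, 47.5, 53.0, 50.0]  # hypothetical measurements
draws = gibbs_normal(raw)
post_mean_mu = sum(mu for mu, _ in draws) / len(draws)
print(round(post_mean_mu, 1))  # close to the sample mean of about 50.2
```

In a large graph the same pattern applies node by node: each full conditional is computed from a handful of neighbouring nodes, which is what keeps the per-update cost low.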