Page 1: Cross-study differential gene expression - Harvard University, bcb.dfci.harvard.edu/~gp/talks/PQG09.pdf

Cross-study differential gene expression

giovanni [email protected]

PQG, November 2009

Marshall et al., Science 2004; Tan et al., NAR 2004

Marshall: “Little overlap. Three array systems rated the activity of 185 genes differently in one test.”

FDA approves MammaPrint in Feb 2007

outline: three related questions

Which aspects of gene expression can be consistently measured across studies and platforms?
Integrative Correlation (Parmigiani et al., JCO 2004)

To what extent are the biological conclusions confirmed across studies?
Integrative Association (Zhong et al., almost done)

How do we perform a joint analysis?
Hierarchical Modeling (Scharpf et al., JASA 2009)

INTEGRATIVE CORRELATION

A PROFILE FOR BRCA1-LINKED TUMORS?

a profile for BRCA1-linked tumors?

Studies:
van’t Veer, Nature 2002 (Rosetta, Agilent long oligos)
Hedenfalk, NEJM 2001 (NHGRI, cDNA)

The overlap among the lists of BRCA1-related genes is meager, and reproducibility has been criticized.

Does breast cancer in BRCA1 germline mutation carriers have a specific molecular profile?

integrative correlation (IC)

Expression matrices, studies a and b: $A^*$, $B^*$

Gene-by-gene correlation matrices: $C^a$, $C^b$

Integrative correlation for gene g: $\mathrm{Cor}(C^a_g, C^b_g)$

[Figure: expression of example genes in Study A (samples A1 to A4) and Study B (samples B1 to B5), each followed by the resulting pairwise correlations: 0.63, −0.95, −0.36 in Study A and 0.52, −0.8, 0.046 in Study B.]

GP et al., CCR 2004; P Pavlidis; JK Lee
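The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the toy matrices are hypothetical, and excluding gene g's correlation with itself from its profile is an assumption the slide does not spell out.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def integrative_correlation(study_a, study_b, g):
    """Correlate gene g's correlation profile in study a with its profile in study b.

    study_a and study_b hold one expression vector per gene, in the same gene
    order; the two studies may have different numbers of samples.
    """
    others = [h for h in range(len(study_a)) if h != g]  # assumption: skip g itself
    profile_a = [pearson(study_a[g], study_a[h]) for h in others]
    profile_b = [pearson(study_b[g], study_b[h]) for h in others]
    return pearson(profile_a, profile_b)

# Toy data: 4 genes, 4 samples in study a and 5 samples in study b.
study_a = [[1.0, 2.0, 3.0, 4.0],
           [2.0, 1.0, 4.0, 3.0],
           [4.0, 3.0, 2.0, 1.5],
           [1.0, 3.0, 2.0, 5.0]]
study_b = [[1.0, 2.0, 3.0, 4.0, 5.0],
           [2.0, 1.0, 4.0, 3.0, 6.0],
           [5.0, 4.0, 3.0, 2.5, 1.0],
           [2.0, 1.0, 3.0, 5.0, 4.0]]
ic_gene0 = integrative_correlation(study_a, study_b, 0)
```

Because only the gene-by-gene correlation profiles are compared, the two studies never need matched samples, which is what makes the measure usable across platforms.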

integrative correlations: examples

[Figure: five scatterplots, one per gene, of the correlations of the gene in Study A (x-axis) against the correlations of the same gene in Study B (y-axis), both axes from −1.0 to 1.0. Gene 1: r = 0.75; Gene 2: r = −0.67; Gene 3: r = −0.03; Gene 4: r = 0.04; Gene 5: r = −0.02.]

integrative correlations: alternative representation

$x_a$, $x_b$: standardized gene expression of gene g in the two studies. The integrative covariance of x is

$x_a A^t I B x_b^t$

and the integrative variance of $x_a$ is

$x_a A^t I A x_a^t$.

The integrative correlation of gene g is

$\dfrac{x_a A^t I B x_b^t}{\sqrt{x_a A^t I A x_a^t}\,\sqrt{x_b B^t I B x_b^t}}$.

IC and sample correlations

[Figure: Hs.174070, Integrative Cor = 0.86; axes: Harvard and Stanford.]

integrative correlation and integrative association

[Figure: density of the integrative correlation, and two scatterplots of SAM statistics for BRCA1 versus wildtype, VAN’T VEER (x-axis) against HEDENFALK (y-axis). Unreproducible genes: CORR = −0.132. Reproducible genes: CORR = 0.712.]

Here “Reproducible” means high IC

observed vs expected IC and false discovery rates

[Figure: density of SAM statistics, observed (solid) and expected (dashed); x-axis from −5 to 5.]

BRCA1 profile

[Figure: heatmap of the BRCA1 profile in the HEDENFALK and VAN’T VEER studies; color scale from −3 to 3. Row labels, in extraction order:]

ALCAM, DKFZp762E1, ZNF22, VCAM1, SF3B5, BIRC3, SH3GLB1, GALNT10, DYSF, ITGB5, MTIF2, INPP4B, NRIP1, GATA3, MSN, TOB1, MYBL2, C20orf55, YES1, PPP1CB, GTPBP4, NUP160, MMP7, BCL2A1, SEC13L1, NUP155, C1GALT1, EML2, TIMP3, CTPS, TNFRSF1B, UGDH, KIAA1223, KIAA0063, RFC4, MTCP1, MFGE8, TP53BP2, POLR2F, CDKN2C, SAS, NCK1, FBXL5, GDI2, CSTB, NSEP1, PLOD, ILF2, TFAM, NOLC1, MDH1, DHCR24, CD58, TNFAIP1, GTF2E2, BTG3, CALU, HMGN3, DLG7, USP10, KIAA0232, KLK6, LYN, LPIN1, WARS, TRIM29, GART, RARRES1, VRK2, MPL, CKS2, CYB5, CTSK, INDO, HDGF, TOPBP1, SSBP1, TPX2, TCEAL1, DSC2

cross-validation

Concordance between .78 and .94 by “square one cross-validation”.
More informative than family history
Useful complement to genetic testing?
Useful surrogate for mutation analysis?

INTEGRATIVE ASSOCIATION:

BASAL SUBTYPE AND SURVIVAL IN BREAST CANCER

Sorlie PNAS 03: subtypes, profiles, prognosis

overlap of cross-referencing approaches

integrative association and filtering

[Figure: panels labelled V vs S, V vs H and S vs H, for all genes and for high-IC genes.]

integrative association and Taiwan “batch 2”

[Figure: panels labelled V vs S, V vs H2 and S vs H2, for all genes and for high-IC genes.]

integrative association and filtering lots of ways

genes and profiles, “old” intrinsic gene list

genes and profiles, “new” intrinsic gene list

XDE: A BAYESIAN MODEL

FOR COMBINED ANALYSIS

discordance

Important insight about technology, study design or the genetics of alternative splicing can potentially be gained by identifying and following discordant genes

Discordant patterns of expression could emerge from

genetic heterogeneity of samples across studies

alternative splicing
  two technologies that measure a gene’s expression by targeting portions of a gene that are associated with different transcripts that are negatively correlated with each other.

4 breast cancer studies

[Figure: scatterplot matrix comparing the Hedenfalk, Sorlie, Farmer and Huang studies.]

t-statistics for estrogen receptor status; in bold are genes significant in 3 out of 4 studies

Hierarchical model for gene expression

Figure: A graphical representation of the hierarchical Bayesian model [graph with nodes x, δ, ξ, ν, ∆, τ², ρ, γ², a, r, c², b, ϕ, θ, σ², t, l; the edges are not recoverable from the transcript]

$\Delta_{gp}$: effect size or ’offset’ (indexed by gene and platform)

$\delta_g$: binary indicator for differential expression (indexed by gene)

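Hierarchical model for gene expression

Level 1:

$x_{gsp} \mid \nu_{gp}, \delta_g, \Delta_{gp}, \sigma^2_{g0p}, \sigma^2_{g1p} \sim N\left(\nu_{gp} + \delta_g (2\psi_{sp} - 1)\Delta_{gp},\ \sigma^2_{g\psi_{sp}p}\right)$

Level 2:

$P(\delta_g = 1 \mid \xi) = \xi$, where $\xi \sim \mathrm{Beta}(\alpha_\xi, \beta_\xi)$

$\nu_g \sim N(0, \Sigma_g)$, $(\Sigma_g)_{pq} = \gamma^2 \rho_{pq} \sqrt{\tau_p^2 \tau_q^2 \sigma_{gp}^{2a_p} \sigma_{gq}^{2a_q}}$ and $\prod_p \tau_p^2 = 1$

$\Delta_g \sim N(0, R_g)$, $(R_g)_{pq} = c^2 r_{pq} \sqrt{\tau_p^2 \tau_q^2 \sigma_{gp}^{2b_p} \sigma_{gq}^{2b_q}}$

$\sigma^2_{gp} = \sqrt{\sigma^2_{g0p}\,\sigma^2_{g1p}}$, $\sigma^2_{g0p} = \sigma^2_{gp}\,\phi_{gp}$, $\sigma^2_{g1p} = \sigma^2_{gp} / \phi_{gp}$

$\sigma^2_{gp} \mid t_p, l_p \sim \mathrm{Gamma}(t_p, l_p)$, $\phi_{gp} \mid \theta_p, \lambda_p \sim \mathrm{Gamma}(\theta_p, \lambda_p)$

Level 3:

$P(a_p = 0) = p_a^0$, $P(a_p = 1) = p_a^1$, $a_p \mid a_p \in (0, 1) \sim \mathrm{Beta}(\alpha_a, \beta_a)$

$P(b_p = 0) = p_b^0$, $P(b_p = 1) = p_b^1$, $b_p \mid b_p \in (0, 1) \sim \mathrm{Beta}(\alpha_b, \beta_b)$

Barnard et al. priors for $r_{pq}$ and $\rho_{pq}$; joint uniform for $\tau_1^2, \dots, \tau_P^2$ and $\prod_p \tau_p^2 = 1$

$t_p \sim \mathrm{Unif}(0, \infty)$, $l_p \sim \mathrm{Unif}(0, \infty)$, $\gamma^2 \sim \mathrm{Unif}(0, \infty)$, $c^2 \sim \mathrm{Unif}(0, \infty)$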
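To make Level 1 concrete, here is a minimal Python sketch that draws expression values for a single platform given fixed values of the Level 2 quantities. It is only an illustration under stated assumptions: the function and argument names are hypothetical, the parameter values are made up, and the XDE package itself fits the model by MCMC rather than simulating from it.

```python
import random

def simulate_level1(nu, delta, offset, sigma2_0, sigma2_1, psi, rng):
    """Draw x_gsp for one platform p under Level 1 of the model.

    nu[g], delta[g], offset[g] (Delta_gp), sigma2_0[g] and sigma2_1[g] are
    per-gene values; psi[s] in {0, 1} is the phenotype of sample s.
    """
    x = []
    for g in range(len(nu)):
        row = []
        for s in range(len(psi)):
            # The mean shifts by -Delta for psi = 0 and by +Delta for psi = 1,
            # but only when the gene is differentially expressed (delta = 1).
            mean = nu[g] + delta[g] * (2 * psi[s] - 1) * offset[g]
            # The variance is phenotype specific: sigma2_0 or sigma2_1.
            var = sigma2_1[g] if psi[s] == 1 else sigma2_0[g]
            row.append(rng.gauss(mean, var ** 0.5))
        x.append(row)
    return x

# Made-up values for two genes and four samples (two per phenotype).
rng = random.Random(1)
x = simulate_level1(nu=[0.0, 1.0], delta=[0, 1], offset=[0.8, 0.5],
                    sigma2_0=[1.0, 0.25], sigma2_1=[1.0, 0.5],
                    psi=[0, 0, 1, 1], rng=rng)
```

With delta = 0 the phenotype has no effect on the mean, so the first gene's four values share one mean; for the second gene the two phenotype groups are centered at 0.5 and 1.5.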


Estimates of differential expression

Parameter                  XDE estimates
differentially expressed   PM_E(g)
concordantly expressed     PM_C(g)
discordantly expressed     PM_D(g)

PM_·(g) denotes the posterior mean of the following indicators:

$E_g \equiv \delta_g$

$C_g \equiv 1$ if $\delta_g = 1$ and the $\Delta_{g\cdot}$ have the same sign, 0 otherwise

$D_g \equiv 1$ if $\delta_g = 1$ and the $\Delta_{g\cdot}$ do not have the same sign, 0 otherwise
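As a sketch of how these posterior means could be computed from MCMC output, the Python below averages the three indicators over saved draws for one gene. The draw format and the function name are hypothetical, and treating an offset of exactly zero as "not the same sign" is an assumption.

```python
def posterior_means(draws):
    """Average the indicators E_g, C_g, D_g over MCMC draws for one gene.

    Each draw is a pair (delta, offsets), where delta is 0 or 1 and offsets
    holds one Delta_gp value per platform.
    """
    n = len(draws)
    e = c = d = 0
    for delta, offsets in draws:
        same_sign = all(o > 0 for o in offsets) or all(o < 0 for o in offsets)
        e += delta
        c += 1 if delta == 1 and same_sign else 0
        d += 1 if delta == 1 and not same_sign else 0
    return e / n, c / n, d / n

# Four made-up draws for a gene measured on two platforms.
draws = [(1, [0.5, 0.3]), (1, [0.4, -0.2]), (0, [0.1, 0.1]), (1, [-0.3, -0.6])]
pm_e, pm_c, pm_d = posterior_means(draws)  # 0.75, 0.5, 0.25
```

By construction C_g + D_g = E_g in every draw, so PM_C(g) + PM_D(g) = PM_E(g).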


Estimates of differential expression

                           Estimates
Parameter                  XDE        Alternative
differentially expressed   PM_E(g)    u_E(g) [1]
concordantly expressed     PM_C(g)    z-score [2]
discordantly expressed     PM_D(g)    u_D(g) [3]

[1] $u_E(g) \equiv \alpha_1 |U_{g1}| + \dots + \alpha_P |U_{gP}|$, where

$\alpha_p \equiv \dfrac{L_p \sqrt{S_p}}{\sum_{i=1}^{P} L_i \sqrt{S_i}}$ for $p \in \{1, \dots, P\}$

L is the covariance loading from the first principal components and S is the number of samples.

[2] cross-study estimator for differential expression described in Choi et al., 2003

[3] $u_D(g) \equiv u_E(g)$ if $\mathrm{sign}(U_{g1}) = \dots = \mathrm{sign}(U_{gP})$, and $-1 \times u_E(g)$ otherwise.
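The alternative scores in footnotes 1 and 3 can be sketched as follows. The slide does not define the study-specific statistic U_gp or how the loadings L are obtained, so both are taken as given inputs here; the function names and numbers are hypothetical.

```python
import math

def study_weights(loadings, n_samples):
    """alpha_p = L_p * sqrt(S_p) / sum_i L_i * sqrt(S_i)."""
    raw = [L * math.sqrt(S) for L, S in zip(loadings, n_samples)]
    total = sum(raw)
    return [w / total for w in raw]

def sign(u):
    """Return -1, 0 or 1."""
    return (u > 0) - (u < 0)

def u_e(U_g, alpha):
    """Weighted sum of absolute study-specific statistics for gene g."""
    return sum(a * abs(u) for a, u in zip(alpha, U_g))

def u_d(U_g, alpha):
    """u_E(g) when all studies agree in sign, -u_E(g) otherwise."""
    same_sign = len({sign(u) for u in U_g}) == 1
    score = u_e(U_g, alpha)
    return score if same_sign else -score

# Two studies with equal loadings and 4 and 16 samples: weights 1/3 and 2/3.
alpha = study_weights([1.0, 1.0], [4, 16])
agree = u_d([3.0, 3.0], alpha)      # same sign: positive score
disagree = u_d([3.0, -3.0], alpha)  # opposite signs: negated score
```

The square-root weighting gives larger studies more influence without letting them dominate in proportion to their raw sample size.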

Simulation using three lung cancer studies

1 Randomly assign a binary covariate ($\psi^*$) to early stage lung adenocarcinomas

2 For each gene, simulate $\delta^*$ from a Bernoulli($\xi^*$)

3 Simulate offsets $\Delta^*$:

$\begin{pmatrix} \Delta^*_{g1} \\ \Delta^*_{g2} \\ \Delta^*_{g3} \end{pmatrix} \sim N\left( k^* \begin{pmatrix} s_{g1} \\ s_{g2} \\ s_{g3} \end{pmatrix},\ \dfrac{1}{c^*} \begin{pmatrix} s_{g1}^2 & r^*_1 s_{g1} s_{g2} & r^*_2 s_{g1} s_{g3} \\ r^*_1 s_{g2} s_{g1} & s_{g2}^2 & r^*_3 s_{g2} s_{g3} \\ r^*_2 s_{g3} s_{g1} & r^*_3 s_{g3} s_{g2} & s_{g3}^2 \end{pmatrix} \right)$

4 Compute the simulated expression values:

$x^*_{gsp} = \begin{cases} x_{gsp} + (2\psi^*_{sp} - 1)\Delta^*_{gp} & \text{if } \delta^*_g = 1 \\ x_{gsp} & \text{otherwise.} \end{cases}$
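Steps 3 and 4 can be sketched in plain Python as below. This is an illustration under assumptions, not the code used for the talk: the function names are hypothetical, the multivariate normal draw uses a hand-written Cholesky factor to stay within the standard library, and the per-gene scales s_g are simply passed in.

```python
import math
import random

def cholesky(m):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def simulate_offsets(s_g, k, c, r, rng):
    """Step 3: draw (Delta*_g1, Delta*_g2, Delta*_g3) for one gene.

    s_g holds the three per-study scales; r = (r1, r2, r3) are the correlations
    between studies (1,2), (1,3) and (2,3), as in the covariance matrix above.
    """
    r1, r2, r3 = r
    corr = [[1.0, r1, r2], [r1, 1.0, r3], [r2, r3, 1.0]]
    cov = [[corr[p][q] * s_g[p] * s_g[q] / c for q in range(3)] for p in range(3)]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    return [k * s_g[p] + sum(L[p][q] * z[q] for q in range(3)) for p in range(3)]

def simulate_expression(x_g, psi, delta, offsets):
    """Step 4: x_g[p][s] and psi[p][s] are indexed by study p and sample s."""
    if delta != 1:
        return [row[:] for row in x_g]
    return [[x + (2 * ps - 1) * offsets[p] for x, ps in zip(x_g[p], psi[p])]
            for p in range(len(x_g))]

# One gene under the settings of simulation A (k* = 0.5, c* = 0.5).
rng = random.Random(2009)
offsets = simulate_offsets([1.0, 2.0, 3.0], k=0.5, c=0.5, r=(0.1, 0.2, 0.4), rng=rng)
x_star = simulate_expression([[1.0, 2.0]] * 3, [[0, 1]] * 3, 1, offsets)
```

Since the covariance is divided by c*, larger values of c* concentrate the offsets around k* s_g, which is why the talk describes c* as a precision.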

Simulation

We simulated a number of artificial datasets of different sample sizes (S) by varying parameters that affect the location ($k^*$), precision ($c^*$), and inter-study correlation ($r^*$) of the simulated offsets, as well as the proportion of genes that are differentially expressed ($\xi^*$)

Simulation   k*    S    c*    r*                 ξ*
A†           0.5   4    0.5   (0.1, 0.2, 0.4)    0.10
B            ·     ·    ·     ·                  0.50
C            ·     ·    ·     (0.8, 0.9, 0.92)   0.10
D            ·     ·    ·     ·                  0.50
E†           ·     8    0.5   (0.1, 0.2, 0.4)    0.10
F            ·     ·    1     ·                  0.10
G            ·     ·    ·     ·                  0.50
H            ·     ·    ·     (0.8, 0.9, 0.92)   0.10
I            ·     ·    ·     ·                  0.50
J†           0     16   10    (0.1, 0.2, 0.4)    0.10
K            ·     ·    ·     ·                  0.50
L            ·     ·    ·     (0.8, 0.9, 0.92)   0.10
M            ·     ·    ·     ·                  0.50
O            ·     32   20    (0.1, 0.2, 0.4)    0.10
P            ·     ·    ·     ·                  0.50
Q            ·     ·    ·     (0.8, 0.9, 0.92)   0.10
R            ·     ·    ·     ·                  0.50

† used 10 different seeds to assess sensitivity to randomly generated values

4 genes, 2 studies

gene   δ*   sign(∆*)   E*   C*   D*
1      0    ·          0    0    0
2      1    {−,−}      1    1    0
3      1    {−,+}      1    0    1
4      1    {+,+}      1    1    0

Columns E*, C*, and D* are indicators for true differential expression, concordant differential expression, and discordant differential expression, respectively.

Simulations A - R

[Figure: four panels plotting AUC of XDE (x-axis) against AUC of the alternatives (y-axis), for simulations A to D (S=4), E to I (S=8), J to M (S=16) and O to R (S=32); plotting symbols z, t and s.]

Random seeds

[Figure: grid of panels plotting Bayesian AUC (x-axis) against z AUC (y-axis) for differential, concordant and discordant expression, at S = 4, S = 8 and S = 16.]

Split Study Validation

To assess the baseline behavior of XDE, we split the Huang [1] study into four disjoint parts, treating each part as an independent study

We randomly assigned 5 estrogen receptor (ER) negative and 16 ER positive samples to each split.

We denote the Bayesian effect size (BES) for gene g and platform p by $\dfrac{\delta_g \Delta_{gp}}{c\,\tau_p\,\sigma_{gp}^{b_p}}$ and use this as a study-specific Bayesian estimate of differential expression

[1] Huang et al. 2003, Lancet, 361(9369):1590–6
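The split and the effect size can be sketched as below. The function names are hypothetical, and leaving out the samples that do not fit into the four parts is an assumption; the ER counts for the Huang study (23 negative, 65 positive) are the ones reported elsewhere in this talk.

```python
import random

def split_study(er_status, n_splits=4, n_neg=5, n_pos=16, seed=0):
    """Return n_splits disjoint lists of sample indices, each with n_neg
    ER-negative and n_pos ER-positive samples. er_status[i] is 0 or 1."""
    neg = [i for i, er in enumerate(er_status) if er == 0]
    pos = [i for i, er in enumerate(er_status) if er == 1]
    if len(neg) < n_splits * n_neg or len(pos) < n_splits * n_pos:
        raise ValueError("not enough samples for the requested split sizes")
    rng = random.Random(seed)
    rng.shuffle(neg)
    rng.shuffle(pos)
    # Samples left over after filling the splits are not used (assumption).
    return [neg[k * n_neg:(k + 1) * n_neg] + pos[k * n_pos:(k + 1) * n_pos]
            for k in range(n_splits)]

def bayesian_effect_size(delta, offset, c, tau, sigma, b):
    """BES = delta_g * Delta_gp / (c * tau_p * sigma_gp ** b_p)."""
    return delta * offset / (c * tau * sigma ** b)

# Huang study: 23 ER-negative and 65 ER-positive samples.
er_status = [0] * 23 + [1] * 65
splits = split_study(er_status)
bes = bayesian_effect_size(delta=1, offset=2.0, c=2.0, tau=1.0, sigma=4.0, b=0.5)
```

Because all four parts come from one study and one platform, any disagreement between them reflects sampling noise alone, which is what makes the split a useful baseline.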

Split Study Validation

[Figure: two panels, T-test and XDE.]

4 breast cancer studies

Study       platform             ER-   ER+
Hedenfalk   cDNA                   6    10
Sorlie      cDNA                  30    81
Farmer      Affymetrix hu133a     22    27
Huang       Affymetrix hu95av2    23    65

Table: Distribution of the estrogen receptor status in the four studies

Platform-specific annotations for the features were cross-referenced by Entrez gene identifiers to yield a set of 2064 features measured in each platform.
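A minimal sketch of the cross-referencing step, assuming each platform's annotation is available as a mapping from feature ID to Entrez gene ID. The feature IDs below are hypothetical, and how several features mapping to one gene were collapsed is not stated in the talk, so that step is left out.

```python
def common_entrez_ids(annotations):
    """Return the Entrez gene IDs present on every platform.

    annotations holds one dict per platform mapping feature ID to Entrez
    gene ID, with None for features that have no Entrez annotation.
    """
    id_sets = [{gene for gene in ann.values() if gene is not None}
               for ann in annotations]
    return sorted(set.intersection(*id_sets))

# Hypothetical feature IDs; Entrez IDs 672 (BRCA1), 2099 (ESR1),
# 2625 (GATA3) and 7157 (TP53).
cdna = {"clone_1": 2099, "clone_2": 2625, "clone_3": 672, "clone_4": None}
hu133a = {"probe_a": 2099, "probe_b": 2625, "probe_c": 7157}
hu95av2 = {"probe_x": 2625, "probe_y": 2099, "probe_z": 2099}
shared = common_entrez_ids([cdna, hu133a, hu95av2])  # [2099, 2625]
```

Only genes annotated on every platform survive the intersection, which is why four studies with thousands of features each reduce to a much smaller common set.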

4 breast cancer studies: concordant genes

[Figure: histogram of PM_C(g) (x-axis: avg{Pr(concordant)}, y-axis: Frequency) and scatterplot matrix of the posterior average of BES for the Hedenfalk, Sorlie, Farmer and Huang studies.]

4 breast cancer studies: discordant genes

[Figure: histogram of PM_D(g) (x-axis: avg{Pr(discordant)}, y-axis: Frequency) and scatterplot matrix of the posterior average of BES for the Hedenfalk, Sorlie, Farmer and Huang studies.]

4 breast cancer studies: outlying studies

[Figure: two scatterplots with m = 1 on the x-axis against m = 3 and m = 4 on the y-axis; axes from 0.6 to 1.0.]

Probability of concordant differential expression in at least m studies

4 breast cancer studies: goodness of fit

[Figure: four panels of −log10(p-value) against Index (0 to 2000), with n = 16, n = 111, n = 49 and n = 88.]

R package: XDE

> library(XDE)
> data(expressionSetList)
> params <- new("XdeParameter",
+               esetList = expressionSetList,
+               phenotypeLabel = "adenoVsquamous")
> fit <- xde(params, expressionSetList)
> plot(fit)

[Figure: MCMC trace plots for potential, a, b, l, t, γ², c², τ², ξ, ρ and r.]

Look for it here: www.bioconductor.org

Credits

INTEGRATIVE CORRELATION: Les Cope, Ed Gabrielson, Liz Garrett-Mayer

INTEGRATIVE ASSOCIATION: Simens Zhong, Luigi Marchionni, Les Cope, Ed Gabrielson, Liz Garrett-Mayer

XDE: Rob Scharpf, Håkon Tjelmeland, Andrew Nobel