cross-study differential gene expression - harvard...
TRANSCRIPT
Marshall et al., Science 2004; Tan et al., NAR 2004
Marshall: “Little overlap. Three array systems rated the activity of 185 genes differently in one test.”
FDA approves MammaPrint in Feb 2007
outline: three related questions
Which aspects of gene expression can be consistently measured across studies and platforms?
Integrative Correlation (Parmigiani et al., JCO 2004)
To what extent are the biological conclusions confirmed across studies?
Integrative Association (Zhong et al., almost done)
How do we perform a joint analysis?
Hierarchical Modeling (Scharpf et al., JASA 2009)
INTEGRATIVE CORRELATION
A PROFILE FOR BRCA1-LINKED TUMORS?
a profile for BRCA1-linked tumors?
Studies:
van’t Veer, Nature 2002 (Rosetta, Agilent long oligos)
Hedenfalk, NEJM 2001 (NHGRI, cDNA)
The overlap among the lists of BRCA1-related genes is meager, and reproducibility has been criticized.
Does breast cancer in BRCA1 germline mutation carriers have a specific molecular profile?
integrative correlation (IC)
Expression matrices, studies a and b: $A^*$, $B^*$
Gene-by-gene correlation matrices: $C^a$, $C^b$
Integrative correlation for gene g: $\mathrm{Cor}(C^a_g, C^b_g)$
[Figure: expression values (y-axis: EXPRESSION) across samples A1 to A4 in STUDY A and B1 to B5 in STUDY B, with the PAIRWISE CORRELATIONS computed within each study. Study A: 0.63, −0.95, −0.36. Study B: 0.52, −0.8, 0.046.]
GP et al., CCR 2004, P Pavlidis, JK Lee
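The definition above can be sketched in a few lines of standard-library Python. This is only an illustration, not the authors' implementation: the function names are hypothetical, each study is assumed to be a list of genes (rows) by samples (columns) with the same genes in the same order, and dropping gene g's correlation with itself before comparing the two rows is an assumption made here.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def correlation_matrix(expr):
    """Gene-by-gene correlation matrix for one study (rows are genes)."""
    return [[pearson(gi, gj) for gj in expr] for gi in expr]


def integrative_correlation(expr_a, expr_b):
    """For each gene g, correlate row g of C^a with row g of C^b.

    Assumption: the diagonal entry (gene g with itself) is left out.
    """
    c_a = correlation_matrix(expr_a)
    c_b = correlation_matrix(expr_b)
    n_genes = len(c_a)
    scores = []
    for g in range(n_genes):
        row_a = [c_a[g][j] for j in range(n_genes) if j != g]
        row_b = [c_b[g][j] for j in range(n_genes) if j != g]
        scores.append(pearson(row_a, row_b))
    return scores


# Toy data: study B is a gene-wise rescaling of study A, so every gene
# keeps exactly the same correlation pattern in both studies.
study_a = [[1, 2, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1], [1, 3, 2, 5]]
study_b = [[2 * x + 1 for x in gene] for gene in study_a]
ic_scores = integrative_correlation(study_a, study_b)
```

The two studies only need to share genes; nothing in the calculation requires matched samples or equal sample sizes, which is the point of comparing correlation patterns rather than raw expression.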
integrative correlations: examples
[Figure: five scatterplots, one per gene, of the correlations of Gene k in Study A (x-axis) against the correlations of Gene k in Study B (y-axis), both axes from −1.0 to 1.0. Gene 1: r = 0.75. Gene 2: r = −0.67. Gene 3: r = −0.03. Gene 4: r = 0.04. Gene 5: r = −0.02.]
integrative correlations: alternative representation
$x_a$, $x_b$: standardized gene expression of gene g in the two studies. The integrative covariance of x is
$$x_a A^t I B x_b^t$$
and the integrative variance of $x_a$ is
$$x_a A^t I A x_a^t.$$
The integrative correlation of gene g is
$$\frac{x_a A^t I B x_b^t}{\sqrt{x_a A^t I A x_a^t}\,\sqrt{x_b B^t I B x_b^t}}.$$
IC and sample correlations
[Figure: Hs.174070, Integrative Cor = 0.86; axes: Harvard, Stanford.]
integrative correlation and integrative association
[Figure: density of the integrative correlation (x-axis: Integrative Correlation, y-axis: Density), and two scatterplots of SAM statistics for BRCA1 versus wildtype, van’t Veer (x-axis) against Hedenfalk (y-axis). UNREPRODUCIBLE GENES: CORR = −0.132. REPRODUCIBLE GENES: CORR = 0.712.]
Here “Reproducible” means high IC
observed vs expected IC and false discovery rates
[Figure: density of the SAM statistics, observed (solid) and expected (dashed).]
BRCA1 profile
[Figure: heatmap of the BRCA1 profile in the HEDENFALK and VAN’T VEER studies, color scale from −3 to 3. Gene labels: ALCAM, DKFZp762E1, ZNF22, VCAM1, SF3B5, BIRC3, SH3GLB1, GALNT10, DYSF, ITGB5, MTIF2, INPP4B, NRIP1, GATA3, MSN, TOB1, MYBL2, C20orf55, YES1, PPP1CB, GTPBP4, NUP160, MMP7, BCL2A1, SEC13L1, NUP155, C1GALT1, EML2, TIMP3, CTPS, TNFRSF1B, UGDH, KIAA1223, KIAA0063, RFC4, MTCP1, MFGE8, TP53BP2, POLR2F, CDKN2C, SAS, NCK1, FBXL5, GDI2, CSTB, NSEP1, PLOD, ILF2, TFAM, NOLC1, MDH1, DHCR24, CD58, TNFAIP1, GTF2E2, BTG3, CALU, HMGN3, DLG7, USP10, KIAA0232, KLK6, LYN, LPIN1, WARS, TRIM29, GART, RARRES1, VRK2, MPL, CKS2, CYB5, CTSK, INDO, HDGF, TOPBP1, SSBP1, TPX2, TCEAL1, DSC2.]
cross-validation
Concordance between .78 and .94 by “square one cross-validation”.
More informative than family history
Useful complement to genetic testing?
Useful surrogate for mutation analysis?
INTEGRATIVE ASSOCIATION:
BASAL SUBTYPE AND SURVIVAL IN BREAST CANCER
Sorlie PNAS 03: subtypes, profiles, prognosis
overlap of cross-referencing approaches
integrative association and filtering
[Figure: panels V vs S, V vs H, S vs H; rows: all genes, high IC.]
integrative association and Taiwan “batch 2”
[Figure: panels V vs S, V vs H2, S vs H2; rows: all genes, high IC.]
integrative association and filtering lots of ways
genes and profiles, “old” intrinsic gene list
genes and profiles, “new” intrinsic gene list
XDE: A BAYESIAN MODEL
FOR COMBINED ANALYSIS
discordance
Important insight about technology, study design, or the genetics of alternative splicing can potentially be gained by identifying and following discordant genes.
Discordant patterns of expression could emerge from
genetic heterogeneity of samples across studies
alternative splicing: two technologies that measure a gene’s expression by targeting portions of a gene that are associated with different transcripts that are negatively correlated with each other.
4 breast cancer studies
[Figure: pairwise scatterplot matrix of gene-level statistics for the Hedenfalk, Sorlie, Farmer, and Huang studies.]
t-statistics for estrogen receptor status; genes significant in 3 out of 4 studies are in bold.
Hierarchical model for gene expression
[Figure: directed graph with nodes x, δ, ξ, ν, ∆, τ², ρ, γ², a, r, c², b, ϕ, θ, σ², t, l.]
Figure: A graphical representation of the hierarchical Bayesian model
∆gp : effect size or ’offset’ (indexed by gene and platform)
δg : binary indicator for differential expression (indexed by gene)
Hierarchical model for gene expression
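Level 1:
$$x_{gsp} \mid \nu_{gp}, \delta_g, \Delta_{gp}, \sigma^2_{g0p}, \sigma^2_{g1p} \sim N\left(\nu_{gp} + \delta_g (2\psi_{sp} - 1)\Delta_{gp},\ \sigma^2_{g\psi_{sp}p}\right)$$
Level 2:
$$P(\delta_g = 1 \mid \xi) = \xi, \quad \text{where } \xi \sim \mathrm{Beta}(\alpha_\xi, \beta_\xi)$$
$$\nu_g \sim N(0, \Sigma_g), \quad (\Sigma)_{pq} = \gamma^2 \rho_{pq} \sqrt{\tau_p^2 \tau_q^2 \sigma_{gp}^{2a_p} \sigma_{gq}^{2a_q}} \quad \text{and} \quad \prod_p \tau_p^2 = 1$$
$$\Delta_g \sim N(0, R_g), \quad (R_g)_{pq} = c^2 r_{pq} \sqrt{\tau_p^2 \tau_q^2 \sigma_{gp}^{2b_p} \sigma_{gq}^{2b_q}}$$
$$\sigma^2_{gp} = \sqrt{\sigma^2_{g0p}\,\sigma^2_{g1p}}, \quad \sigma^2_{g0p} = \sigma^2_{gp}\varphi_{gp}, \quad \sigma^2_{g1p} = \frac{\sigma^2_{gp}}{\varphi_{gp}}$$
$$\sigma^2_{gp} \mid t_p, l_p \sim \mathrm{Gamma}(t_p, l_p), \quad \varphi_{gp} \mid \theta_p, \lambda_p \sim \mathrm{Gamma}(\theta_p, \lambda_p)$$
Level 3:
$$P(a_p = 0) = p_a^0, \quad P(a_p = 1) = p_a^1, \quad a_p \mid a_p \in (0, 1) \sim \mathrm{Beta}(\alpha_a, \beta_a)$$
$$P(b_p = 0) = p_b^0, \quad P(b_p = 1) = p_b^1, \quad b_p \mid b_p \in (0, 1) \sim \mathrm{Beta}(\alpha_b, \beta_b)$$
Barnard et al. priors for $r_{pq}$ and $\rho_{pq}$; joint uniform for $\tau_1^2, \ldots, \tau_P^2$ and $\prod_p \tau_p^2 = 1$
$$t_p \sim \mathrm{Unif}(0, \infty), \quad l_p \sim \mathrm{Unif}(0, \infty), \quad \gamma^2 \sim \mathrm{Unif}(0, \infty), \quad c^2 \sim \mathrm{Unif}(0, \infty)$$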
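To make Level 1 concrete, here is a minimal standard-library Python sketch of drawing one expression value given fixed parameter values. It is an illustration under stated assumptions, not the XDE sampler: the function names are hypothetical, ψ is taken to be the 0/1 phenotype of the sample, and the class-specific variances follow the Level 2 relations $\sigma^2_{g0p} = \sigma^2_{gp}\varphi_{gp}$ and $\sigma^2_{g1p} = \sigma^2_{gp}/\varphi_{gp}$.

```python
import math
import random


def level1_mean(nu, delta, psi, offset):
    """Mean of x_gsp: nu_gp + delta_g * (2 * psi_sp - 1) * Delta_gp."""
    return nu + delta * (2 * psi - 1) * offset


def class_variances(sigma2, phi):
    """Return (sigma2_g0p, sigma2_g1p) from sigma2_gp and phi_gp."""
    return sigma2 * phi, sigma2 / phi


def draw_expression(rng, nu, delta, psi, offset, sigma2, phi):
    """Draw one x_gsp from the Level 1 normal distribution."""
    variance = class_variances(sigma2, phi)[psi]
    return rng.gauss(level1_mean(nu, delta, psi, offset), math.sqrt(variance))


rng = random.Random(1)
# A differentially expressed gene (delta = 1) in a sample with psi = 1.
x_value = draw_expression(rng, nu=0.0, delta=1, psi=1, offset=0.8,
                          sigma2=1.0, phi=2.0)
```

With δ = 0 the phenotype drops out of the mean, so both classes share the mean ν; with δ = 1 the two class means sit at ν − Δ and ν + Δ.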
Estimates of differential expression
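Parameter                   XDE estimates
differentially expressed    PME(g)
concordantly expressed      PMC(g)
discordantly expressed      PMD(g)
PM·(g) denotes the posterior mean of the following indicators:
$$E_g \equiv \delta_g$$
$$C_g \equiv \begin{cases} 1 & \delta_g = 1 \text{ and } \Delta_{g\cdot} \text{ have the same sign,} \\ 0 & \text{otherwise} \end{cases}$$
$$D_g \equiv \begin{cases} 1 & \delta_g = 1 \text{ and } \Delta_{g\cdot} \text{ do not have the same sign,} \\ 0 & \text{otherwise} \end{cases}$$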
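The three posterior means can be approximated by averaging the indicators over MCMC draws. The sketch below assumes a hypothetical input format, one `(delta, offsets)` pair per saved draw for a single gene, where `offsets` holds the study-specific Δ values; it is not the XDE package's own interface.

```python
def indicators(delta, offsets):
    """Return (E, C, D) for one gene in one MCMC draw."""
    e = 1 if delta == 1 else 0
    same_sign = all(d > 0 for d in offsets) or all(d < 0 for d in offsets)
    c = 1 if e and same_sign else 0
    d = 1 if e and not same_sign else 0
    return e, c, d


def posterior_means(draws):
    """Average the indicators over draws: (PME, PMC, PMD) for one gene."""
    totals = [0, 0, 0]
    for delta, offsets in draws:
        for k, value in enumerate(indicators(delta, offsets)):
            totals[k] += value
    return tuple(total / len(draws) for total in totals)


# Four illustrative draws for one gene measured in two studies.
draws = [(0, [0.3, -0.2]), (1, [-1.0, -2.0]), (1, [-1.0, 2.0]), (1, [1.0, 2.0])]
pme, pmc, pmd = posterior_means(draws)
```

Because every draw with δ = 1 is either concordant or discordant, PMC and PMD always sum to PME.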
Estimates of differential expression
                            Estimates
Parameter                   XDE       Alternative
differentially expressed    PME(g)    uE(g) [1]
concordantly expressed      PMC(g)    z-score [2]
discordantly expressed      PMD(g)    uD(g) [3]
[1] $u_E(g) \equiv \alpha_1 |U_{g1}| + \ldots + \alpha_P |U_{gP}|$, where
$$\alpha_p \equiv \frac{L_p \sqrt{S_p}}{\sum_{i=1}^{P} L_i \sqrt{S_i}} \quad \text{for } p \in \{1, \ldots, P\}$$
L is the covariance loading from the first principal components and S is the number of samples.
[2] cross-study estimator for differential expression described in Choi et al., 2003
[3]
$$u_D(g) \equiv \begin{cases} u_E(g) & \mathrm{sign}(U_{g1}) = \cdots = \mathrm{sign}(U_{gP}) \\ -1 \times u_E(g) & \text{otherwise.} \end{cases}$$
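The alternative scores are simple enough to write down directly. The following Python sketch takes the study-specific statistics U, the loadings L, and the sample sizes S as given inputs; how U and L are obtained is not specified on the slide, so they are treated as plain numbers here, and the function names are hypothetical.

```python
import math


def study_weights(loadings, n_samples):
    """alpha_p = L_p * sqrt(S_p) / sum_i L_i * sqrt(S_i)."""
    raw = [l * math.sqrt(s) for l, s in zip(loadings, n_samples)]
    total = sum(raw)
    return [value / total for value in raw]


def u_e(stats, weights):
    """Weighted sum of absolute study-specific statistics."""
    return sum(w * abs(u) for w, u in zip(weights, stats))


def u_d(stats, weights):
    """u_E when all statistics share a sign, and -u_E otherwise."""
    same_sign = all(u > 0 for u in stats) or all(u < 0 for u in stats)
    value = u_e(stats, weights)
    return value if same_sign else -value


# Two studies with equal loadings and equal sample sizes get equal weight.
weights = study_weights([1.0, 1.0], [4, 4])
score_e = u_e([2.0, -4.0], weights)
score_d = u_d([2.0, -4.0], weights)
```

Note that, as defined on the slide, the sign of uD separates the two cases: it is positive when the studies agree in direction and negative when they do not.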
Simulation using three lung cancer studies
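1 Randomly assign a binary covariate ($\psi^*$) to early stage lung adenocarcinomas
2 For each gene, simulate $\delta^*$ from a Bernoulli($\xi^*$)
3 Simulate offsets $\Delta^*$:
$$\begin{bmatrix} \Delta^*_{g1} \\ \Delta^*_{g2} \\ \Delta^*_{g3} \end{bmatrix} \sim N\left( k^* \begin{bmatrix} s_{g1} \\ s_{g2} \\ s_{g3} \end{bmatrix},\ \frac{1}{c^*} \begin{bmatrix} s_{g1}^2 & r_1^* s_{g1} s_{g2} & r_2^* s_{g1} s_{g3} \\ r_1^* s_{g2} s_{g1} & s_{g2}^2 & r_3^* s_{g2} s_{g3} \\ r_2^* s_{g3} s_{g1} & r_3^* s_{g3} s_{g2} & s_{g3}^2 \end{bmatrix} \right)$$
4 Compute the simulated expression values:
$$x^*_{gsp} = \begin{cases} x_{gsp} + (2\psi^*_{sp} - 1)\Delta^*_{gp} & \text{if } \delta^*_g = 1 \\ x_{gsp} & \text{otherwise.} \end{cases}$$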
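Steps 3 and 4 above can be sketched with the Python standard library alone. This is an illustrative reading of the slide rather than the code used for the simulations: the function names are hypothetical, the three correlations are passed as a tuple `(r1, r2, r3)`, and the multivariate normal draw uses a small hand-written Cholesky factorization.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                low[i][j] = (matrix[i][j] - acc) / low[j][j]
    return low


def offset_covariance(c, r, s):
    """(1 / c*) times the 3 x 3 matrix from step 3; r = (r1, r2, r3)."""
    return [
        [s[0] ** 2 / c, r[0] * s[0] * s[1] / c, r[1] * s[0] * s[2] / c],
        [r[0] * s[1] * s[0] / c, s[1] ** 2 / c, r[2] * s[1] * s[2] / c],
        [r[1] * s[2] * s[0] / c, r[2] * s[2] * s[1] / c, s[2] ** 2 / c],
    ]


def simulate_offsets(rng, k, c, r, s):
    """Step 3: draw (Delta*_g1, Delta*_g2, Delta*_g3) for one gene."""
    low = cholesky(offset_covariance(c, r, s))
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    return [k * s[i] + sum(low[i][j] * z[j] for j in range(i + 1))
            for i in range(3)]


def simulated_value(x, psi, delta_star, offset):
    """Step 4: shift x_gsp by +/- the offset only when delta* = 1."""
    return x + (2 * psi - 1) * offset if delta_star == 1 else x


rng = random.Random(7)
offsets = simulate_offsets(rng, k=0.5, c=0.5, r=(0.1, 0.2, 0.4),
                           s=(1.0, 1.5, 2.0))
```

Since $c^*$ enters as $1/c^*$, larger values concentrate the offsets around the location $k^* s_{gp}$, which matches its description as a precision parameter.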
Simulation
We simulated a number of artificial datasets of different sample sizes (S) by varying parameters that affect the location ($k^*$), precision ($c^*$), and inter-study correlation ($r^*$) of the simulated offsets, as well as the proportion of genes that are differentially expressed ($\xi^*$)
Simulation   k*    S    c*    r*                 ξ*
A†           0.5   4    0.5   (0.1, 0.2, 0.4)    0.10
B            ·     ·    ·     ·                  0.50
C            ·     ·    ·     (0.8, 0.9, 0.92)   0.10
D            ·     ·    ·     ·                  0.50
E†           ·     8    0.5   (0.1, 0.2, 0.4)    0.10
F            ·     ·    1     ·                  0.10
G            ·     ·    ·     ·                  0.50
H            ·     ·    ·     (0.8, 0.9, 0.92)   0.10
I            ·     ·    ·     ·                  0.50
J†           0     16   10    (0.1, 0.2, 0.4)    0.10
K            ·     ·    ·     ·                  0.50
L            ·     ·    ·     (0.8, 0.9, 0.92)   0.10
M            ·     ·    ·     ·                  0.50
O            ·     32   20    (0.1, 0.2, 0.4)    0.10
P            ·     ·    ·     ·                  0.50
Q            ·     ·    ·     (0.8, 0.9, 0.92)   0.10
R            ·     ·    ·     ·                  0.50
† used 10 different seeds to assess sensitivity to randomly generated values
4 genes, 2 studies
gene   δ*   sign(∆*)   E*   C*   D*
1      0    ·          0    0    0
2      1    {−,−}      1    1    0
3      1    {−,+}      1    0    1
4      1    {+,+}      1    1    0
Columns E*, C*, and D* are indicators for true differential expression, concordant differential expression, and discordant differential expression, respectively.
Simulations A - R
[Figure: four scatterplots with axes labeled “AUC − XDE” and “AUC − alternatives”, one per group of simulations: A − D (S=4), E − I (S=8), J − M (S=16), O − R (S=32). Points are labeled z, t, and s.]
Random seeds
[Figure: grid of scatterplots with columns differential, concordant, and discordant and rows S = 4, S = 8, and S = 16; axes labeled “Bayesian AUC” and “z AUC”.]
Split Study Validation
To assess the baseline behavior of XDE, we split the Huang [1] study into four disjoint parts, treating each part as an independent study.
We randomly assigned 5 estrogen receptor (ER) negative and 16 ER positive samples to each split.
We denote the Bayesian effect size (BES) for gene g and platform p by
$$\frac{\delta_g \Delta_{gp}}{c\,\tau_p\,\sigma_{gp}^{b_p}}$$
and use this as a study-specific Bayesian estimate of differential expression.
[1] Huang et al. 2003, Lancet, 361(9369):1590–6
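As a rough illustration of the Bayesian effect size, the Python sketch below evaluates it for one set of parameter values and averages it over draws. The argument names and the dictionary-per-draw input format are assumptions made for this example; `c` stands for the square root of the model's c², `tau` for the square root of τ², and the variance σ² is raised to the power b/2 so that it matches σ to the power b.

```python
def bes(delta, offset, c, tau, sigma2, b):
    """delta_g * Delta_gp / (c * tau_p * sigma_gp ** b_p)."""
    return delta * offset / (c * tau * sigma2 ** (b / 2.0))


def posterior_average_bes(draws):
    """Average the BES over MCMC draws given as keyword dictionaries."""
    return sum(bes(**draw) for draw in draws) / len(draws)


# Two illustrative draws for one gene in one platform.
draws = [
    dict(delta=1, offset=2.0, c=1.0, tau=2.0, sigma2=16.0, b=0.5),
    dict(delta=0, offset=1.5, c=1.0, tau=2.0, sigma2=16.0, b=0.5),
]
average = posterior_average_bes(draws)
```

The denominator is the prior standard deviation of the offset implied by the diagonal of $R_g$, so the BES reads as a standardized offset that is zero whenever δ = 0.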
Split Study Validation
[Figure: two panels, T-test and XDE.]
4 breast cancer studies
             platform              ER-   ER+
Hedenfalk    cDNA                  6     10
Sorlie       cDNA                  30    81
Farmer       Affymetrix hu133a     22    27
Huang        Affymetrix hu95av2    23    65
Table: Distribution of the estrogen receptor status in the four studies
Platform-specific annotations for the features were cross-referenced by Entrez gene identifiers to yield a set of 2064 features measured in each platform.
4 breast cancer studies: concordant genes
[Figure: histogram titled avg{Pr(concordant)} (x-axis: PMC(g), y-axis: Frequency), and a pairwise scatterplot matrix of the posterior average of BES for the Hedenfalk, Sorlie, Farmer, and Huang studies.]
4 breast cancer studies: discordant genes
[Figure: histogram titled avg{Pr(discordant)} (x-axis: PMD(g), y-axis: Frequency), and a pairwise scatterplot matrix of the posterior average of BES for the Hedenfalk, Sorlie, Farmer, and Huang studies.]
4 breast cancer studies: outlying studies
[Figure: two scatterplots, m = 1 against m = 3 and m = 1 against m = 4, with axes from 0.6 to 1.0.]
Probability of concordant differential expression in at least m studies
4 breast cancer studies: goodness of fit
[Figure: four panels of −log10(p-value) against Index (0 to 2000), labeled n = 16, n = 111, n = 49, and n = 88.]
R package: XDE
> library(XDE)
> data(expressionSetList)
> params <- new("XdeParameter",
+               esetList = expressionSetList,
+               phenotypeLabel = "adenoVsquamous")
> fit <- xde(params, expressionSetList)
> plot(fit)
[Figure: trace plots for potential, a, b, l, t, γ², c², τ², ξ, ρ, and r.]
Look for it here: www.bioconductor.org
Credits
INTEGRATIVE CORRELATION: Les Cope, Ed Gabrielson, Liz Garrett-Mayer
INTEGRATIVE ASSOCIATION: Simens Zhong, Luigi Marchionni, Les Cope, Ed Gabrielson, Liz Garrett-Mayer
XDE: Rob Scharpf, Håkon Tjelmeland, Andrew Nobel