Biostatistics (2017), 0, 0, pp. 1–13 doi:10.1093/biostatistics/gel-supplementary-materials˙rev1
Supplementary Materials to
“Estimating AutoAntibody Signatures to Detect
Autoimmune Disease Patient Subsets”
ZHENKE WU∗,1, LIVIA CASCIOLA-ROSEN2, AMI A. SHAH2,
ANTONY ROSEN2, SCOTT L. ZEGER3
1 Department of Biostatistics and Michigan Institute of Data Science, University of Michigan, Ann Arbor, Michigan 48109
2 Division of Rheumatology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, 21224
3 Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205
Appendix S1. Raw Data and Standardization
Let $(t^{\mathrm{raw}}, M^{\mathrm{raw}}) = \{(t^{\mathrm{raw}}_{gs}, M^{\mathrm{raw}}_{gis})\}$ represent the raw, high-frequency GEA data, for pixel
$s = 1, \ldots, S_g$, lane $i = 1, \ldots, N_g$, gel $g = 1, \ldots, G$. Here $t^{\mathrm{raw}}$ is a grid that evenly splits the unit
interval $[0, 1]$ with $t^{\mathrm{raw}}_{gs} = s/S_g \in [0, 1]$, where a large $S_g$ represents a high imaging resolution.
Note that in the raw data, $S_g$ varies by gel from 1,437 to 1,522 in our application. $M^{\mathrm{raw}}_{gis}$ is the
radioactive intensity scanned at $t^{\mathrm{raw}}_{gs}$ for lane $i = 1, \ldots, N_g$, gel $g = 1, \ldots, G$. Let $N = \sum_g N_g$ be
the total number of samples tested.
For the rest of this section, we process the high-frequency data $(t^{\mathrm{raw}}, M^{\mathrm{raw}})$ into high-frequency
data $(t^0, M^0)$ that have been standardized across multiple gels. The latter will be used as input
for peak detection (Section 2.2).
∗To whom correspondence should be addressed.
© The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected]
i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS with a span
$h = 0.022$. Let $M = \{M_{gis}\}$ denote the smoothed mean function evaluated at the raw imaging
locations $t^{\mathrm{raw}}_{gs}$.
ii. Standardization Across Gels. Because imaging resolution may vary by gel, we standardize
the smoothed intensity values $M$ into $B_0 = 700$ bins using a set of evenly spaced break
points $\{0 = \kappa_0 < \kappa_1 < \ldots < \kappa_{B_0} = 1\}$ shared by all gels.
We clip the images at the right end $\{b : t_b > 0.956\}$ because they represent small molecular
weight molecules migrating at the dye front (that is, not separable by the gel type used). Their
exclusion from autoantibody comparisons is standard practice. We denote the standardized
data by $(t^0, M^0) = \{(t^0_b, M^0_{gib})\}$.
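The standardization in step ii can be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes the lane has already been smoothed, that the smoothed values falling in each bin are averaged (the text does not state which summary is used), and that each bin is located at its right break point; the function name `standardize_lane` and the synthetic input are hypothetical.

```python
import numpy as np

B0 = 700          # number of bins shared by all gels (from the text)
CLIP_AT = 0.956   # bins with t_b > 0.956 are discarded (from the text)

def standardize_lane(smoothed, n_bins=B0, clip_at=CLIP_AT):
    """Bin one smoothed lane onto the common grid and clip the dye front.

    Assumption: each bin value is the mean of the smoothed intensities whose
    raw location t_s = s / S_g falls in (kappa_{b-1}, kappa_b].
    """
    smoothed = np.asarray(smoothed, dtype=float)
    S = smoothed.size
    t_raw = np.arange(1, S + 1) / S               # raw grid s / S_g
    breaks = np.linspace(0.0, 1.0, n_bins + 1)    # kappa_0, ..., kappa_B0
    # 0-based index of the bin that contains each raw pixel.
    idx = np.searchsorted(breaks, t_raw, side="left") - 1
    idx = np.clip(idx, 0, n_bins - 1)
    sums = np.bincount(idx, weights=smoothed, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    m0 = sums / np.maximum(counts, 1)             # guard against empty bins
    t0 = breaks[1:]                               # assumed bin location
    keep = t0 <= clip_at
    return t0[keep], m0[keep]

# Synthetic "smoothed" lane with S_g = 1437 pixels.
t0, m0 = standardize_lane(np.sin(np.linspace(0, 3, 1437)) ** 2)
```

Because every gel is mapped onto the same break points, lanes from gels with different $S_g$ end up with the same number of bins.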
Appendix S2. Peak Calling Algorithm
Given a half-width $h$, collect the peak-candidate bins defined by $B^0_{gi}(h) = \{b \mid \mathrm{score}_{gi}(b) = 3\}$.
Because $h$ controls the locality of a peak, we perform the peak-candidacy search for a few values of
$h$. Denote the union of the identified peak-candidate bins by $B^0_{gi} = \cup_h B^0_{gi}(h)$. $B^0_{gi}$ is composed of
multiple blocks, each composed of contiguous peak-candidate bins. Among the blocks, we merge
nearby ones; for example, $B^0_{gij}$ and $B^0_{gi,j+1}$ are merged if $\min B^0_{gi,j+1} - \max B^0_{gij} \leq \gamma$ $(= 5)$. We also
remove short blocks of length less than three to obtain the final peak-candidate bins $\{B_{gij}\}_{j=1}^{J_{gi}}$.
Finally, within each block $B_{gij}$ we pick the bin $b_{gij}$ with the maximum intensity. We refer to these bins as peaks
$j = 1, \ldots, J_{gi}$ for lane $i = 1, \ldots, N_g$ on gel $g = 1, \ldots, G$.
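The block merging and peak picking described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the candidate bins are taken as given (the score is defined in the main paper), a merged block is treated as the union of its candidate bins without filling the gap, and block length counts candidate bins. The function name `call_peaks` and the toy data are hypothetical.

```python
import numpy as np

def call_peaks(candidate_bins, intensity, gamma=5, min_len=3):
    """Turn peak-candidate bins into one peak bin per retained block."""
    bins = sorted(set(int(b) for b in candidate_bins))
    if not bins:
        return []
    # 1. Split the candidates into blocks of contiguous bins.
    blocks = [[bins[0]]]
    for b in bins[1:]:
        if b == blocks[-1][-1] + 1:
            blocks[-1].append(b)
        else:
            blocks.append([b])
    # 2. Merge neighbouring blocks when min(next) - max(previous) <= gamma.
    merged = [blocks[0]]
    for blk in blocks[1:]:
        if blk[0] - merged[-1][-1] <= gamma:
            merged[-1].extend(blk)
        else:
            merged.append(blk)
    # 3. Drop blocks shorter than min_len, then keep the bin with the
    #    highest intensity in each remaining block.
    kept = [blk for blk in merged if len(blk) >= min_len]
    return [max(blk, key=lambda b: intensity[b]) for blk in kept]

# Toy lane: two real peaks (bins 12 and 41) and one isolated candidate (30).
intensity = np.zeros(60)
intensity[12], intensity[41] = 1.0, 0.5
peaks = call_peaks([10, 11, 12, 16, 17, 30, 40, 41, 42], intensity)
```

In the toy lane the blocks {10, 11, 12} and {16, 17} are merged because their gap is 4, while the single-bin block {30} is discarded as too short.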
The true and false peak detection rates are determined by several factors including the half-
peak width h, the minimum peak elevation C0, the true intensities at each bin and the measure-
ment errors inherent in autoradiography. We calibrate the first two parameters so that 1) the
reference lanes have exactly 7 detected peaks (perfect observed sensitivity and specificity), and
2) the actin peaks stand out clearly. For example, the middle panel of Figure 1(b) shows the
result of peak detection by blue asterisks for one set of the gels (Section 4.3), where the peaks
that are slightly higher than the neighboring intensities are effectively captured, most notably for
lanes 5, 10, and 15, where the small actin peaks are identified. Note that we have reduced the impact of
measurement noise on peak detection by computing the local difference scores from the smoothed
data rather than the raw data. In our analyses, we have chosen the minimum peak amplitude
parameter C0 = 0.01 that is of higher order than the noise level obtained from LOESS smoothing.
Alternative approaches to peak detection include random process modeling (e.g., Carlson and
others, 2015), multiplicity adjustment after local maxima hunting (e.g., Schwartzman and others,
2011) and filtering methods (e.g., Du and others, 2006). In our experience, they are designed for,
and hence more suitable for, data with appreciably higher noise levels; our data show a much lower
noise level in the measured autoradiographic intensities. For example, in random process models
that are motivated by the analysis of pulsatile, or episodic, time series data, the unknown locations
of peaks and the observed intensity values are modeled by doubly stochastic processes, such as
Cox processes, to fit the continuous intensity data (e.g., Carlson and others, 2015). However,
because a minor peak has a narrow span composed of few bins, its presence or absence is more
uncertain than that of major peaks under random process models. In addition, to detect peaks via the
random process models for hundreds of samples and hundreds of dimensions per sample, the
iterative MCMC model fitting algorithm could be computationally expensive.
Appendix S3. Reference Alignment via Piecewise Linear Dewarping
Let $(t = (t_1, \ldots, t_B), M_g = \{M_{gib}\})$ represent the high-frequency intensity data for a query gel
$g$ and $(t, M_{g_0} = \{M_{g_0 ib}\})$ the data for a template gel $g_0$. We describe a procedure to align gel $g$
towards $g_0$ using piecewise linear dewarping (Uchida and Sakoe, 2001). Let $\kappa_g$ represent the set of
knots composed of two endpoints $\kappa_{g1} = \nu_0$, $\kappa_{gR} = \nu_{L+1}$ and the reference peaks $\{\kappa_{g2}, \ldots, \kappa_{g,R-1}\}$
on gel $g$. The knot $\kappa_{gj}$ corresponds to a bin defined by $\{b_{gj} : t_{b_{gj}} = \kappa_{gj}\}$, $j = 1, \ldots, R$. For the
template gel $g_0$, we similarly denote the knots and the bins by $\kappa_{g_0}$ and $b_{g_0 j}$, $j = 1, \ldots, R$. Note
that $\kappa_g$ and $\kappa_{g_0}$ need to have identical cardinality $R$. In our application, $R = 9$: seven reference
peaks plus two endpoints.
In two steps, we define a piecewise linear function $W_g(\cdot; g_0)$ for reference alignment. The
function first matches the query knots $\kappa_g$ to the template knots $\kappa_{g_0}$ and then linearly stretches
or compresses gel $g$ between the matched knots. It finds a bin number $W_g(b; g_0)$ in the query gel
$g$ and matches it to bin $b$ in the template gel $g_0$. For example, we will have $W_g(b_{g_0 j}; g_0) = b_{gj}$, i.e.,
the $j$-th knots in both gels are exactly matched.
For each bin $b = 1, \ldots, B$ on the template gel $g_0$:
1. Find the nearest knot to its left: $\{j = j(b; g_0) \in \{1, \ldots, R-1\} : \kappa_{g_0 j} \leq t_b < \kappa_{g_0, j+1}\}$.
2. Find the bin in the query gel $g$ to be matched to bin $b$ by
$$W^{(j)}_g(\cdot; g_0) : b \mapsto \lfloor w \cdot b_{g,j+1} + (1 - w)\, b_{gj} \rfloor, \tag{A1}$$
where $w = w(b; g_0) = (b - b_{g_0 j}) / (b_{g_0, j+1} - b_{g_0 j})$, and $\lfloor a \rfloor$ is the largest integer smaller
than or equal to $a$.
We thus obtain the piecewise linear dewarping function
$$W_g(\cdot; g_0) : b \mapsto \sum_{j=1}^{R-1} W^{(j)}_g(b; g_0)\, 1\{\kappa_{g_0 j} \leq t_b < \kappa_{g_0, j+1}\}, \quad b = 1, \ldots, B,$$
that aligns the query gel $g$ to the template gel $g_0$ ($g \neq g_0$).
Figure S1 illustrates the results before and after the piecewise linear dewarping by Wg(·; g0).
The piecewise linear dewarping automatically matches the reference peaks from multiple gels to
facilitate cross-gel band comparisons.
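The two-step construction of $W_g(\cdot; g_0)$ in equation (A1) can be sketched as follows. This is a minimal illustration with assumptions of its own: knots are given directly as integer bin indices, the template bins run from the first to the last knot, and the final bin (which the half-open intervals leave uncovered) is assigned to the last segment. The function name `dewarp_map` and the toy knots are hypothetical.

```python
import numpy as np

def dewarp_map(template_knots, query_knots):
    """Return W, where W[k] is the query bin matched to template bin tk[0] + k.

    Both knot arrays hold strictly increasing integer bin indices of equal
    length R (two end points plus the reference peaks).
    """
    tk = np.asarray(template_knots, dtype=np.int64)
    qk = np.asarray(query_knots, dtype=np.int64)
    if tk.shape != qk.shape:
        raise ValueError("knot sets must have identical cardinality R")
    b = np.arange(tk[0], tk[-1] + 1)
    # Step 1: nearest template knot to the left of each bin.
    j = np.searchsorted(tk, b, side="right") - 1
    j = np.clip(j, 0, tk.size - 2)   # assumption: last bin joins last segment
    # Step 2: floor(w * b_{g,j+1} + (1 - w) * b_{gj}) with
    # w = (b - tk[j]) / (tk[j+1] - tk[j]), done in exact integer arithmetic.
    num = (b - tk[j]) * qk[j + 1] + (tk[j + 1] - b) * qk[j]
    return num // (tk[j + 1] - tk[j])

# Toy example: the middle reference peak sits at bin 10 on the template gel
# but at bin 5 on the query gel.
W = dewarp_map([0, 10, 20], [0, 5, 20])
query_lane = np.arange(21, dtype=float)   # stand-in intensities
aligned_lane = query_lane[W]              # query lane on the template grid
```

Integer floor division is used instead of floating-point interpolation so that matched knots map onto each other exactly.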
Appendix S4. Details on Shrinkage Prior for Hyperparameters $\{\sigma^{-2}_{gs}\}$
For the hyperpriors on the smoothing parameters $\{\tau^2_{gs} = \sigma^{-2}_{gs}\}$, $s = 2, \ldots, T_\nu - 1$, $g = 1, \ldots, G$,
we specify a two-component mixture distribution with one component favoring small and the
other favoring large values (Morrissey and others, 2011):
$$\tau^2_{gs} \sim \xi_{gs}\, \mathrm{Gamma}(\cdot \mid a_\tau, b_\tau) + (1 - \xi_{gs})\, \mathrm{InvPareto}(\cdot \mid a'_\tau, b'_\tau), \tag{A2}$$
$$\mathrm{InvPareto}(\tau; a, b) = \frac{a}{b}\left(\frac{\tau}{b}\right)^{a-1}, \quad a > 0,\ 0 < \tau < b, \tag{A3}$$
where the Gamma-distributed component ($a_\tau = 3$, $b_\tau = 2$) concentrates near smaller values while
the inverse-Pareto component prefers larger values ($a'_\tau = 1.5$, $b'_\tau = 400$).
We designed the mixture priors for $\tau^2_{gs}$ so that, given $s$, the coefficients $\beta_{gst}$, $t = 1, \ldots, T_u$, are similar or
discrepant according to whether $\tau^2_{gs}$ is large or small. The random smoothness indicator $\xi_{gs}$ represents
different (1) or the same amount of (0) warping across lanes. Let $\xi_{gs} \sim \mathrm{Bernoulli}(\rho_g)$ with
success probability $\rho_g$. We specify a hyperprior $\rho_g \sim \mathrm{Beta}(a_\rho, b_\rho)$ to let the data inform the degree of
smoothness. In our application, we use $a_\rho = b_\rho = 1$ so that the prior has a mean of 1/2 that
assigns equal prior probabilities to all the submodels defined by $(\xi_{g2}, \ldots, \xi_{g,T_\nu-1}) \in \{0, 1\}^{T_\nu-1}$.
We specify the hyperprior for the smoothing parameter $\tau^2_{g1} = \sigma^{-2}_{g1} \overset{d}{\sim} \mathrm{InvPareto}(\cdot \mid a'_\tau, b'_\tau)$. We
also fix the measurement error variance $\sigma_\epsilon = \Delta/3$, where $\Delta$ is the minimum distance among grid
points $\{\nu_\ell\}$ in the standardized scale. These parameters are chosen to constrain the shape of $S_g$
and are shown to have good dewarping performance, e.g., aligning all the actin peaks to a single
landmark (maximum a posteriori).
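To make the mixture prior (A2)-(A3) concrete, the sketch below evaluates the two component densities and draws from the prior. It is illustrative only. The Gamma component is assumed to use the shape-rate parameterization, which the text does not specify, and the function names are hypothetical. The inverse-Pareto draw uses the inverse CDF, which follows from (A3): $F(\tau) = (\tau/b)^a$ on $0 < \tau < b$.

```python
import math
import random

A_TAU, B_TAU = 3.0, 2.0      # Gamma component (rate convention is assumed)
A_INV, B_INV = 1.5, 400.0    # inverse-Pareto component a', b' (from the text)

def inv_pareto_pdf(tau, a=A_INV, b=B_INV):
    """Density (A3): (a / b) * (tau / b) ** (a - 1) on 0 < tau < b."""
    if not 0.0 < tau < b:
        return 0.0
    return (a / b) * (tau / b) ** (a - 1.0)

def gamma_pdf(tau, shape=A_TAU, rate=B_TAU):
    """Gamma density under the (assumed) shape-rate convention."""
    if tau <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(tau) - rate * tau)
    return math.exp(log_pdf)

def prior_pdf(tau, xi):
    """Prior (A2) for tau^2_gs given the indicator xi in {0, 1}."""
    return gamma_pdf(tau) if xi == 1 else inv_pareto_pdf(tau)

def sample_prior(rho, rng=random):
    """Draw (xi, tau^2) with xi ~ Bernoulli(rho)."""
    xi = 1 if rng.random() < rho else 0
    if xi == 1:
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        tau = rng.gammavariate(A_TAU, 1.0 / B_TAU)
    else:
        # Inverse CDF of (A3): tau = b * U ** (1 / a).
        tau = B_INV * rng.random() ** (1.0 / A_INV)
    return xi, tau
```

With these hyperparameters the Gamma component puts almost all of its mass on values of order one, while the inverse-Pareto component spreads its mass up to 400, which is the small-versus-large contrast the prior is designed to express.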
Appendix S5. Details on Posterior Sampling Algorithm
We sample from the joint posterior by the following algorithm:
1. Update peak-to-landmark indicators $Z_{gij}$ for peak $j = 1, \ldots, J_{gi}$, lane $i = 1, \ldots, N_g$ and
gel $g = 1, \ldots, G$ by the categorical distribution
$$P(Z_{gij} = \ell \mid \text{others}) \propto N(T_{gij}; S_g(u_i, \nu_\ell; \beta_g), \sigma^2_\epsilon)\, \{1 - \exp(-\lambda^*_\ell)\}, \tag{A4}$$
for $\ell = 1, \ldots, L$ that satisfy the support constraint $|\nu_\ell - T_{gij}| < A_0$ and the order restriction
$Z_{gij} < Z_{gi,j+1}$, $j = 1, \ldots, J_{gi} - 1$.
2. Update the B-spline basis coefficients $\beta_g = \{\beta_{gst}\}$ for gel $g = 1, \ldots, G$.
To establish notation, let $\Delta_1$ be the first-order difference operator of dimension $(T_\nu - 2) \times (T_\nu - 1)$
with entries $\Delta_{1kk'} = \delta(k + 1, k') - \delta(k, k')$, where $\delta(k, k')$ equals 1 if $k = k'$ and 0
otherwise. Similarly, we define $\Delta_2$ with $T_\nu$ replaced by $T_u + 1$. The random walk priors (2.6)
and (2.7) can then be written as
$$\beta_{g[-T_\nu]1} \overset{d}{\sim} N_{T_\nu-1}(\cdot; \beta^{id}_{[-T_\nu]}, \sigma^{-2}_{g1} \Delta_1' \Delta_1)\, 1\{\nu_0 = \beta_{g11} < \ldots < \beta_{g,T_\nu-1,1} < \nu_{L+1}\},$$
and
$$\beta_{gs\cdot} \overset{d}{\sim} N_{T_u}(\cdot; 0, \sigma^{-2}_{gs} \Delta_2' \Delta_2)\, 1\{\nu_0 = \beta_{g1t} < \beta_{g,s-1,t} < \beta_{gst} < \nu_{L+1}, \forall t > 2\}, \quad s = 2, \ldots, T_\nu - 1.$$
Although both $\Delta_1$ and $\Delta_2$ are rank deficient, we will show that the conditional posterior
for $\beta_g$ is proper under a scattering condition.
We now write the likelihood (2.2) of the $P_g = \sum_{i=1}^{N_g} J_{gi}$ observed peaks $T_g$. Let $\mathrm{vec}[\beta_g]$ be
a vector of length $T_\nu T_u$ that stacks the columns of $\beta_g$. Let $B_g$ be a $P_g$ by $T_\nu T_u$ matrix;
the row for peak $j$ in lane $i$ is defined by the Kronecker product of two sets of B-spline basis functions
evaluated at $(\nu_{Z_{gij}}, u_{gi})$: $B_{g2}(u_{gi})' \otimes B_{g1}(\nu_{Z_{gij}})'$.
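Update the B-spline basis coefficients $\beta_g = \{\beta_{gst}\}$ for gel $g = 1, \ldots, G$ by the multivariate
Gaussian distribution (with constraints (2.4) and (2.5))
$$[\mathrm{vec}[\beta_g] \mid \text{others}] \propto \exp\left(-\frac{1}{\sigma^2_\epsilon} \left\| T_g - B_g \mathrm{vec}[\beta_g] \right\|_2^2\right)$$
$$\times \exp\left(-\frac{1}{\sigma^2_{g1}} \left\| \Delta^{\mathrm{aug}}_1 \beta^{id,\mathrm{aug}} - \Delta^{\mathrm{aug}}_1 \mathrm{vec}[\beta_g] \right\|_2^2\right)$$
$$\times \exp\left(-\sum_{s=2}^{T_\nu-1} \frac{1}{\sigma^2_{gs}} \left\| \Delta^{\mathrm{aug}}_{2s} \mathrm{vec}[\beta_g] \right\|_2^2\right),$$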
where $\beta^{id,\mathrm{aug}}$ is a column vector formed by stacking $\beta^{id}$ $T_u$ times and hence is of length
$T_\nu T_u$. Here $\Delta^{\mathrm{aug}}_1$ is an augmented first-order difference operator applied to a vector of
length $T_\nu T_u$. It augments $\Delta_1$ to $[\Delta_1 \mid 0_{(T_\nu-2) \times (T_u T_\nu - T_\nu + 1)}]$. Similarly, we define a first-order
difference matrix $\Delta^{\mathrm{aug}}_{2s}$ that augments $\Delta_2$ to a $(T_u - 1) \times T_u T_\nu$ matrix; we replace the
$(s, s + T_\nu, \ldots, s + (T_u - 1) T_\nu)$-th columns of $0_{(T_u-1) \times T_\nu T_u}$ by the columns of $\Delta_2$.
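The conditional distribution above then simplifies to a multivariate Gaussian distribution
with mean vector
$$\Lambda_g^{-1} \left\{ \sigma^{-2}_\epsilon B_g' T_g + \sigma^{-2}_{g1} \Delta^{\mathrm{aug}\prime}_1 \Delta^{\mathrm{aug}}_1 \beta^{id,\mathrm{aug}} \right\},$$
where the precision matrix is
$$\Lambda_g = \sigma^{-2}_\epsilon B_g' B_g + \sigma^{-2}_{g1} \Delta^{\mathrm{aug}\prime}_1 \Delta^{\mathrm{aug}}_1 + \sum_{s=2}^{T_\nu-1} \sigma^{-2}_{gs} \Delta^{\mathrm{aug}\prime}_{2s} \Delta^{\mathrm{aug}}_{2s}.$$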
The matrix $B_g' B_g$ is of full rank when the observed peaks are well scattered across lanes
and along the gels. Let $(c_{s_0}, c_{s_0+1})$ be the support of the $s_0$-th B-spline basis $B_{1s_0}(\cdot)$ in
the $\nu$-direction. $B_g$ can be rank deficient, for example, when no peak appears in
$(c_{s_0}, c_{s_0+1})$. That is, we have $\nu_{Z_{gij}} \notin (c_{s_0}, c_{s_0+1})$ and $B_{g1s_0}(\nu_{Z_{gij}}) = 0$, for $i = 1, \ldots, N_g$,
$j = 1, \ldots, J_{gi}$. By the definition of $B_g$, it will have constant zeros in the columns $s_0$,
$s_0 + T_\nu$, $\ldots$, and $s_0 + (T_u - 1) T_\nu$. As a result, the marginal posterior of $\{\beta_{gs_0 t}\}_{t=1}^{T_u}$ can only
be learned indirectly through its neighboring coefficients via random walk priors (2.6) and
(2.7). Rank deficiency also occurs if multiple neighboring lanes have no observed peaks.
Given sparse peaks, $B_g$ can be made full rank by reducing the number of basis functions.
However, the warping function $S_g$ will then be less flexible. We therefore recommend a minimal
number of basis functions necessary to ensure parameter identifiability provided the family
of warping functions $S_g$ is flexible enough to characterize the image deformations (e.g.,
curvature of the actin peaks). We refer to the condition that $B_g$ is full rank as the
scattering condition.
In our application, failure of the scattering condition is rare. $\Lambda_g$ is then a sum of a positive
definite matrix and positive semi-definite matrices and hence is invertible. Also recall that $B_g' B_g$ is
sparse as constructed from sparse B-spline bases. Because $\Delta^{\mathrm{aug}\prime}_1 \Delta^{\mathrm{aug}}_1$ and
$\sum_{s=2}^{T_\nu-1} \sigma^{-2}_{gs} \Delta^{\mathrm{aug}\prime}_{2s} \Delta^{\mathrm{aug}}_{2s}$
are both sparse square matrices with at most $O(T_\nu T_u)$ nonzeros, $\Lambda_g$ preserves the sparsity
of $B_g' B_g$. So we use sparse Cholesky factorization of $\Lambda_g$ to produce its Cholesky factors.
We first block update $\{\beta_{gst}\}_{t=1}^{T_u}$ for $s = 2$ from $[\beta_{gst} \mid \beta_{g j_1 j_2}, j_1 \neq s, \text{others}]$ with the
constraint $\beta_{gst} > \beta_{g,s-1,t}$, $t = 1, \ldots, T_u$, and continue for $s = 3, \ldots, T_\nu - 1$. This step
requires inverting submatrices of $\Lambda_g^{-1}$ $T_\nu - 2$ times. In our application,
computing one such inverse when $T_\nu = 10$ and $T_u = 6$ requires < 0.001 seconds.
3. Update the smoothing parameters $\tau^2_{gs} = \sigma^{-2}_{gs}$ and smoothness selection indicator $\xi_{gs}$ (Appendix
S4). First randomly switch $\xi_{gs}$ to $\xi^*_{gs}$, either from 0 to 1 or from 1 to 0, for $s = 2, \ldots, T_\nu - 1$,
$g = 1, \ldots, G$. For the parameter $\tau^2_{gs}$, we propose its candidate $\tau^{*2}_{gs}$ from the log-normal
distribution with log-mean parameter $\tau^2_{gs}$. We accept $(\tau^{*2}_{gs}, \xi^*_{gs})$ with probability
$$\alpha^{(1)}_g = \min\left\{1, \frac{p(T_g; Z_g, \beta_g, \tau^{*2}_{gs})\, \pi(\tau^{*2}_{gs} \mid \xi^*_{gs})\, q(\tau^2_{gs} \mid \tau^{*2}_{gs})\, q(\xi_{gs} \mid \xi^*_{gs})}{p(T_g; Z_g, \beta_g, \tau^2_{gs})\, \pi(\tau^2_{gs} \mid \xi_{gs})\, q(\tau^{*2}_{gs} \mid \tau^2_{gs})\, q(\xi^*_{gs} \mid \xi_{gs})}\right\}, \tag{A5}$$
where $p(T_g; Z_g, \beta_g, \tau^{*2}_{gs})$ denotes the Gaussian likelihood (2.2) and $\pi(\tau^2_{gs} \mid \xi_{gs})$ is the prior
distribution (A2).
We update $\tau^2_{gs}$ again because it is continuous and therefore has a much bigger parameter
space than that of the discrete parameter. Using a random walk Metropolis-within-Gibbs
algorithm, we propose $\tau^{*2}_{gs}$ from the log-normal distribution with log-mean parameter $\tau^2_{gs}$ and
accept with probability
$$\alpha^{(2)}_g = \min\left\{1, \frac{p(T_g; Z_g, \beta_g, \tau^{*2}_{gs})\, \pi(\tau^{*2}_{gs} \mid \xi^*_{gs})\, q(\tau^2_{gs} \mid \tau^{*2}_{gs})}{p(T_g; Z_g, \beta_g, \tau^2_{gs})\, \pi(\tau^2_{gs} \mid \xi_{gs})\, q(\tau^{*2}_{gs} \mid \tau^2_{gs})}\right\}. \tag{A6}$$
4. Update the smoothing parameter in the $\nu$-direction $\tau^2_{g1} = \sigma^{-2}_{g1}$, $g = 1, \ldots, G$, by proposing
its candidate $\tau^{*2}_{g1}$ from the log-normal distribution with log-mean parameter $\tau^2_{g1}$. We accept
the proposal with probability
$$\min\left\{1, \frac{N_{T_\nu-1}(\{\beta_{gs1}\}_{s=1}^{T_\nu-1}; \mathbf{0}, \tau^{*2}_{g1} I)\, q(\tau^2_{g1} \mid \tau^{*2}_{g1})}{N_{T_\nu-1}(\{\beta_{gs1}\}_{s=1}^{T_\nu-1}; \mathbf{0}, \tau^2_{g1} I)\, q(\tau^{*2}_{g1} \mid \tau^2_{g1})}\right\}.$$
5. Update the smoothness selection hyperparameter $\rho_g$ (Appendix S4), for $g = 1, \ldots, G$, from
$$[\rho_g \mid \text{others}] \sim \mathrm{Beta}\Big(a_\rho + \sum_s 1\{\xi_{gs} = 1\},\ b_\rho + \sum_s 1\{\xi_{gs} = 0\}\Big). \tag{A7}$$
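Two of the conditional updates above, the categorical draw (A4) in Step 1 and the Beta draw (A7) in Step 5, can be sketched as follows. This is a hedged illustration, not the authors' sampler. It uses 0-based landmark indices, takes the warped landmark locations $S_g(u_i, \nu_\ell; \beta_g)$ as a precomputed array, and reads the order restriction as requiring the new indicator to lie strictly between the indicators of the neighbouring peaks in the same lane. The function and argument names are hypothetical.

```python
import numpy as np

def update_indicator(T, S_at_landmarks, nu, lam, sigma_eps, A0,
                     z_prev, z_next, rng):
    """Draw one Z_gij from the categorical distribution (A4).

    T: observed peak location T_gij.
    S_at_landmarks[l]: S_g(u_i, nu_l; beta_g) for each landmark l.
    lam[l]: lambda*_l, assumed strictly positive.
    z_prev, z_next: indicators of peaks j-1 and j+1 in the same lane
    (pass -1 and len(nu) when there is no such neighbour).
    """
    nu = np.asarray(nu, dtype=float)
    S = np.asarray(S_at_landmarks, dtype=float)
    lam = np.asarray(lam, dtype=float)
    ell = np.arange(nu.size)
    # Log of the Gaussian kernel times {1 - exp(-lambda*_l)}; the Gaussian
    # normalising constant is the same for every l and is dropped.
    log_w = -0.5 * ((T - S) / sigma_eps) ** 2 + np.log1p(-np.exp(-lam))
    # Support constraint |nu_l - T| < A0 and (assumed) order restriction.
    ok = (np.abs(nu - T) < A0) & (ell > z_prev) & (ell < z_next)
    if not ok.any():
        raise ValueError("no landmark satisfies the constraints")
    log_w = np.where(ok, log_w, -np.inf)
    w = np.exp(log_w - log_w[ok].max())
    return int(rng.choice(ell, p=w / w.sum()))

def update_rho(xi, rng, a_rho=1.0, b_rho=1.0):
    """Draw rho_g from its Beta full conditional (A7)."""
    xi = np.asarray(xi)
    return rng.beta(a_rho + np.sum(xi == 1), b_rho + np.sum(xi == 0))

rng = np.random.default_rng(0)
nu = np.linspace(0.1, 0.9, 9)
z = update_indicator(T=0.52, S_at_landmarks=nu, nu=nu, lam=np.ones(9),
                     sigma_eps=0.01, A0=0.15, z_prev=2, z_next=7, rng=rng)
rho = update_rho([1, 0, 0, 1, 1], rng)
```

Working on the log scale and subtracting the maximum before exponentiating avoids underflow when $\sigma_\epsilon$ is small relative to the landmark spacing.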
We calibrate the scale of the proposals in Steps 3-4 during the burn-in period of the MCMC to achieve
an acceptance rate between 30% and 70%. All the posterior analyses were based on three chains of
10,000 iterations with a burn-in period of 5,000 iterations. We monitor the MCMC convergence via
chain histories (visual inspection of good mixing), kernel density plots (unimodal for continuous
parameters) and the Brooks-Gelman-Rubin statistic $\hat{R}$ (Brooks and Gelman, 1998) (three chains with
random starting values; we used the criterion $\hat{R} < 1.01$ to declare posterior convergence for every
model parameter). Convergence is fast, within thousands of burn-in iterations.
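The convergence criterion can be illustrated with a simple potential scale reduction factor computed from several chains. The sketch below uses the basic between-chain versus within-chain variance form; Brooks and Gelman (1998) describe refinements that it omits, so it should be read as an approximation of the statistic used in the paper. The function name and the simulated chains are hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: array of shape (m, n), i.e. m chains of n post-burn-in draws.
    """
    x = np.asarray(chains, dtype=float)
    m, n = x.shape
    chain_means = x.mean(axis=1)
    W = x.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(0)
# Three well-mixed chains of 5,000 retained draws each.
mixed = rng.normal(size=(3, 5000))
r_mixed = gelman_rubin(mixed)
# Three chains stuck at different locations.
stuck = mixed + np.array([[0.0], [5.0], [10.0]])
r_stuck = gelman_rubin(stuck)
```

Chains that sample the same distribution give a value close to 1, whereas chains centred at different locations give a value well above the 1.01 threshold.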
Appendix Figures
[Figure S1 image. x-axis: Molecular Weight (kDa), with ticks at 21.5, 31, 45, 66, 97, 116, and 200; y-axis: Intensity. Legend: Reference Lane for Gel Set 1; Reference Lane for Gel Set 4; Gel 1 aligned towards Gel Set 4.]
Figure S1. Before (bottom) and after (top) piecewise linear dewarping towards Gel 4 (Section 4.3). Top: 21 intensity curves, 20 solid curves from one GEA experiment (g = 1) after reference alignment; one dashed curve for the reference lane in gel g0 = 4. The two curves not in purple denote the Lane 1 intensities from the two gels and are aligned. Bottom: The same 21 intensity curves without reference alignment. The reference lanes (non-purple ones) are mismatched.
[Figure S2 image. Panel labels: Long Exposure; Short Exposure. Axis labels: Serum Sample Lane (2 to 20); Landmark # (0 to 101). Molecular weight markers at 200, 116, 97, 66, 45, 31, and 21.5.]
Figure S2. Bayesian spatial dewarping results for the experiment with replicates (short/long exposure). Top: For each gel set, 19 serum lanes over a grid of L = 100 interior landmarks (reference lanes excluded). Each detected peak $T_{gij}$ (solid blue dots "•") is connected to its maximum a posteriori landmark $Z_{gij}$ (red triangle "∆"). The image deformations are shown by the bundle of black vertical curves $\{u \mapsto S_g(\nu_\ell, u) : u = 2, \ldots, 20\}$, $\ell = 1, \ldots, L$, $g = 1, 2$, each of which connects the estimated locations of identical molecular weight. Bottom: Marginal posterior probability of each landmark protein present in a sample.
[Figure S3 image. Two dendrograms of the replicate lanes labeled Pair.1 to Pair.20, titled "With Processing" (top) and "No Preprocessing" (bottom). Cluster method: complete; distance: cosine. y-axis: Height (0.0 to 2.0). Each edge is annotated with au and bp values and an edge number (1 to 38).]
Figure S3. Estimated dendrograms with (top) and without (bottom) pre-processing for the replication experiment. The red boxes show the subtrees with > 95% confidence levels. The actual confidence levels are shown in red on top of the subtrees. The numbers below the edges denote the edge numbers.
References
Brooks, Stephen P and Gelman, Andrew. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7(4), 434–455.
Carlson, Nichole E, Grunwald, Gary K and Johnson, Timothy D. (2015). Using Cox cluster processes to model latent pulse location patterns in hormone concentration data. Biostatistics, kxv046.
Du, Pan, Kibbe, Warren A and Lin, Simon M. (2006). Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics 22(17), 2059–2065.
Morrissey, Edward R, Juarez, Miguel A, Denby, Katherine J and Burroughs, Nigel J. (2011). Inferring the time-invariant topology of a nonlinear sparse gene regulatory network using fully Bayesian spline autoregression. Biostatistics 12(4), 682–694.
Schwartzman, Armin, Gavrilov, Yulia and Adler, Robert J. (2011). Multiple testing of local maxima for detection of peaks in 1D. Annals of Statistics 39(6), 3290.
Uchida, Seiichi and Sakoe, Hiroaki. (2001). Piecewise linear two-dimensional warping. Systems and Computers in Japan 32(12), 1–9.