arxiv:1808.03604v1 [cs.lg] 10 aug 2018 · 2019-08-15 · disease progression timeline estimation...

Disease Progression Timeline Estimation for Alzheimer’s Disease usingDiscriminative Event Based Modeling

Vikram Venkatraghavana,∗, Esther E. Brona, Wiro J. Niessena,b, Stefan Kleina, for the Alzheimer’s DiseaseNeuroimaging Initiative1

aBiomedical Imaging Group Rotterdam, Departments of Medical Informatics & Radiology, Erasmus MC, University MedicalCenter Rotterdam, The Netherlands

bQuantitative Imaging Group, Dept. of Imaging Physics, Faculty of Applied Sciences, Delft University of Technology, Delft,The Netherlands

Abstract

Alzheimer’s Disease (AD) is characterized by a cascade of biomarkers becoming abnormal, the pathophysiol-ogy of which is very complex and largely unknown. Event-based modeling (EBM) is a data-driven techniqueto estimate the sequence in which biomarkers for a disease become abnormal based on cross-sectional data.It can help in understanding the dynamics of disease progression and facilitate early diagnosis and prognosisby staging patients. In this work we propose a novel discriminative approach to EBM, which is shown to bemore accurate than existing state-of-the-art EBM methods. The method first estimates for each subject anapproximate ordering of events. Subsequently, the central ordering over all subjects is estimated by fittinga generalized Mallows model to these approximate subject-specific orderings based on a novel probabilisticKendall’s Tau distance. We also introduce the concept of relative distance between events which helps in cre-ating a disease progression timeline. Subsequently, we propose a method to stage subjects by placing themon the estimated disease progression timeline. We evaluated the proposed method on Alzheimer’s DiseaseNeuroimaging Initiative (ADNI) data and compared the results with existing state-of-the-art EBM meth-ods. We also performed extensive experiments on synthetic data simulating the progression of Alzheimer’sdisease. The event orderings obtained on ADNI data seem plausible and are in agreement with the currentunderstanding of progression of AD. The proposed patient staging algorithm performed consistently betterthan that of state-of-the-art EBM methods. Event orderings obtained in simulation experiments were moreaccurate than those of other EBM methods and the estimated disease progression timeline was observed tocorrelate with the timeline of actual disease progression. The results of these experiments are encouragingand suggest that discriminative EBM is a promising approach to disease progression modeling.

Keywords: Disease Progression Modeling, Event-Based Model, Alzheimer’s Disease

∗Corresponding authorEmail address: [email protected]

(Vikram Venkatraghavan)1Data used in preparation of this article were ob-

tained from the Alzheimer’s Disease Neuroimaging Initia-tive (ADNI) database (adni.loni.usc.edu). As such, theinvestigators within the ADNI contributed to the designand implementation of ADNI and/or provided data butdid not participate in analysis or writing of this report.A complete listing of ADNI investigators can be foundat: http://adni.loni.usc.edu/wp-content/uploads/how_

to_apply/ADNI_Acknowledgement_List.pdf

1. Introduction

Dementia is considered a major global healthproblem as the number of people living with de-mentia was estimated to be about 46.8 million in2015. It is expected to increase to 131.5 millionin 2050 (Prince et al., 2015). Alzheimer’s Dis-ease (AD) is the most common form of demen-tia. There is a gradual shift in the definitionof AD from it being a clinical-pathologic entity(based on clinical symptoms), to a biological onebased on neuropathologic change (change of imag-ing and non-imaging biomarkers from normal to ab-normal) (Jack et al., 2018). The latter definition is

Preprint submitted to NeuroImage August 15, 2019

arX

iv:1

808.

0360

4v1

[cs

.LG

] 1

0 A

ug 2

018

http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

more useful for understanding the mechanisms ofdisease progression.

Preventive and supportive therapy for patientsat risk of developing dementia due to AD could im-prove their quality of life and reduce costs relatedto care and lifestyle changes. To identify the at-risk individuals as well as monitor the effectivenessof these preventive and supportive therapies, meth-ods for accurate patient staging (estimating the dis-ease severity in each individual) are needed. Toenable accurate patient staging in an objective andquantitative way, it is important to understand howthe different imaging and non-imaging biomarkersprogress after disease onset.

Longitudinal models of disease progression re-construct biomarker trajectories in individual sub-jects (Donohue et al., 2014; Sabuncu et al., 2014;Schmidt-Richberg et al., 2016). However, the util-ity of such models is restricted by the fact that lon-gitudinal data in large groups of patients is scarce.Identifying at-risk individuals for clinical trials bystudying their biomarker trajectory evolution isalso not feasible.

To circumvent this problem, methods to infer theorder in which biomarkers become abnormal duringdisease progression based on cross-sectional datahave been proposed (Fonteijn et al., 2012; Huangand Alexander, 2012; Iturria-Medina et al., 2016).The model used in Iturria-Medina et al. (2016) re-lies on stratification of patients into several sub-groups based on symptomatic staging, for inferringthe aforementioned ordering. However, the problemwith using symptomatic staging is that it is verycoarse and qualitative. The models used in Fonteijnet al. (2012); Huang and Alexander (2012) are vari-ants of Event-Based Models (EBM). EBM algo-rithms neither rely on symptomatic staging nor onthe presence of longitudinal data for inferring thetemporal ordering of events, where an event is de-fined by a biomarker becoming abnormal. Figure 1shows these biomarker events on hypothetical tra-jectories as expected in a typical neuropathologicchange.

An important assumption made in Fonteijn et al.(2012) is that the ordering of events is common forall the subjects in a dataset. AD is known to bea heterogeneous disease with multiple disease sub-types. The assumptions in Fonteijn’s EBM maytherefore be too restrictive. The assumptions inHuang’s EBM on the other hand are more realis-tic, as they do assume that the disease is heteroge-neous. However the algorithm does not scale well

Figure 1: Illustration of the output expected in an EBM. Thebiomarker trajectories shown here are hypothetical trajecto-ries representing a change of biomarker value from normalstate. The dots on these trajectories are biomarker eventsas defined in an EBM. Output of an EBM is the ordering ofsuch events.

to a large number of biomarkers (Venkatraghavanet al., 2017).

To make EBM more scalable to large number ofbiomarkers and subjects, as well as make it robustto variations in ordering, we propose a novel ap-proach to EBM, discriminative event-based model(DEBM), for estimating the ordering of events2.We also introduce the concept of relative distancebetween events which helps in creating a diseaseprogression timeline. Subsequently, we propose amethod to stage subjects by placing them on theestimated disease progression timeline. The othercontributions of this paper include an optimizationtechnique for Gaussian mixture modeling that helpsin accurate estimation of event ordering in DEBMas well as improving the accuracies of other EBMs,and a novel probabilistic distance metric betweenevent orderings (probabilistic Kendall’s Tau).

The remainder of the paper is organized as fol-lows: An introduction to the existing EBM mod-els is given in Section 2. In Section 3, we proposeour novel method for estimating central ordering ofevents. We perform extensive sets of experimentson ADNI data as well as on simulation data, thedetails of which are in Section 4. Section 5 sum-marizes the results of the experiments. Section 6discusses the implications of these findings followedby concluding remarks in Section 7.

2An earlier version of the model was presented at theIPMI conference (Venkatraghavan et al., 2017). In the cur-rent manuscript, several methodological improvements andextensions are presented, and the experimental evaluationhas been expanded substantially.

2

2. Event-Based Models

EBM assumes monotonic increase or decrease ofbiomarker values with increase in disease sever-ity (with the exception of measurement noise). Itconsiders disease progression as a series of events,where each event corresponds to a new biomarkerbecoming abnormal. Fonteijn’s EBM (Fonteijnet al., 2012) finds the ordering of events (S) suchthat the likelihood that a dataset was generatedfrom subjects following this event ordering is max-imized. S is a set of integer indices of biomarkers,which represents the order in which they becomeabnormal. Thus, disease progression is defined by{ES(1), ES(2), ..., ES(N)}, where N is the number ofbiomarkers per subject in the dataset and ES(i) isthe i-th event that is associated with biomarker S(i)becoming abnormal.

In a cross-sectional dataset (X) of M subjects,Xj denotes a measurement of biomarkers for sub-ject j ∈ [1,M ], consisting of N scalar biomarkervalues xj,i. Probabilistic formulation of an EBM,as proposed in Fonteijn et al. (2012), can be givenby argmaxS(p(S|X)), where p(S|X) can be writtenusing Bayes’ rule as:

p(S|X) =p(S)p(X|S)

p(X)(1)

An important assumption in Fonteijn et al. (2012)is that p(S) is uniformly distributed. This makesinferring S, equivalent to the maximum likelihoodproblem of maximizing p (X|S)3 . This can be fur-ther written in terms of Xj as follows:

p (X|S) =

M∏j=1

p (Xj |S) (2)

where p (Xj |S) can be written as:

p (Xj |S) =

N∑k=0

p(k|S)p (Xj |k, S) (3)

where p(k|S) is the prior probability of a subjectbeing at position k of the event ordering, whichis assumed to be equal for each position. The

3Fonteijn’s EBM uses Markov Chain Monte Carlo(MCMC) sampling to estimate the posterior distributionP (S|X). Average position of events in all the MCMC sam-ples was used as a way for selecting the mean ordering byFonteijn et al. (2012) whereas further extensions of the worksuch as Young et al. (2014) prefer the maximum likelihoodsolution.

k which maximizes p (Xj |S) denotes subject j’sdisease stage. This method of identifying diseaseseverity for a subject results in discrete set of stages,where the number of stages is one more than thenumber of biomarkers used for creating the model.p (Xj |k, S) can be expressed as:

p (Xj |k, S) =

k∏i=1

p(xj,S(i)|ES(i)

)×

N∏i=k+1

p(xj,S(i)|¬ES(i)

)(4)

where p(xj,S(i)|ES(i)

)is the likelihood of observing

xj,S(i) in subject j, conditioned on event i having

already occurred. p(xj,S(i)|¬ES(i)

), on the other

hand, computes a similar likelihood, given thatevent i has not occurred.

With the assumption that all the biomarkersin the control population are normal and thatthe biomarker values follow a Gaussian distribu-tion, p

(xj,S(i)|¬ES(i)

)is computed. Abnormal

biomarker values in the patient population are as-sumed to follow a uniform distribution but not allbiomarkers of a patient could be assumed to be ab-normal. For this reason, the likelihoods were ob-tained using a mixture model of a Gaussian and auniform distribution, where only the parameters ofthe uniform distribution were allowed to be opti-mized.

This method was modified in Young et al. (2014)to estimate the optimal ordering in a sporadic ADdataset with significant proportions of controls ex-pected to have presymptomatic AD (Schott et al.,2010). A Gaussian distribution was used to de-scribe both the control and patient population, andthe mixture model allowed for optimization of pa-rameters for the Gaussians describing both con-trol and patient population. The Gaussian mixturemodel was also used to incorporate more subjectsfrom the dataset with clinical diagnosis of mild cog-nitive impairment (MCI).

After obtaining the central ordering S whichmaximizes the likelihood p (X|S), staging of pa-tients is done by finding a disease stage k for subjectj, such that p (Xj |k, S) is maximized.

The assumption that subjects follow a uniqueevent ordering was relaxed by Huang and Alexander(2012), who estimate a distribution of event order-ings with a central event ordering (S) and a spread(φ) as per a generalized Mallows model (Fligner and

3

Figure 2: Overview of the steps in DEBM. A) Biomark-ers measured from different subjects are converted to prob-abilities of abnormality for individual biomarkers. This isdone by estimating normal and abnormal distributions us-ing Gaussian mixture modeling before classifying individualbiomarkers using a Bayesian classifier. B) Subject-specificorderings of biomarker abnormalities are inferred from theseprobabilities which are then used to estimate the central or-dering and for creating the disease progression timeline. C)This is then used to stage subjects based on disease severity.

Verducci, 1988) using an expectation maximizationalgorithm. The E-step estimates the likelihood ofpatients’ biomarker value measurements followingsubject-specific event order sj , given S and φ. Inthe M-step, S and φ are estimated based on sj es-timated in the E-step. This is done iteratively tomaximize the likelihood of generation of patients’data based on S and φ. Patient staging in Huang’sEBM is also a maximum likelihood estimate, butunlike Fonteijn’s EBM, the staging is done on thesubject-specific event ordering sj .

In both Fonteijn’s and Huang’s EBM, relativedistances between events, that can be observed inFigure 1, are not captured4. Some events can becloser to each other than others and using theserelative distance between events could help createa more informative disease progression model.

3. Discriminative Event-Based Model

Fonteijn’s and Huang’s EBM are generative mod-els where the likelihood p (X|S) is maximized.

4 Fonteijn et al. (2012) briefly mention the idea of captur-ing relative distance between events, but it was not validatedor used in any of the experiments.

Huang’s EBM also estimates subject-specific order-ing based on a generative approach. Here, we pro-pose our novel method for estimating central or-dering of events (S), a discriminative event-basedmodel (DEBM).

The proposed framework is discriminative in na-ture, since we estimate sj directly based on the pos-terior probabilities of individual biomarkers becom-ing abnormal. We also introduce a new concept ofrelative distance between events. This subsequentlyleads to a novel continuous patient staging algo-rithm. Figure 2 shows the different steps involvedin our approach.

In Section 3.1, we present the method to robustlyestimate biomarker distributions in pre-event andpost-event classes, given a single cross-sectionalmeasurement of biomarkers. In Section 3.2, wepresent a way for estimating sj , and we addressthe problem of estimating a disease timeline fromnoisy estimates of sj . In Section 3.3, we present thecontinuous patient staging method.

3.1. Biomarker Progression

In this section, we propose a method to robustlyconvert xj,i to p (Ei|xj,i), which denotes the pos-terior probability of a biomarker measurement be-ing abnormal. Assuming a paradigm similar tothat in previous EBM variants (Huang and Alexan-der, 2012; Young et al., 2014), the probability den-sity functions (PDF) of pre-event (p (xj,i|¬Ei)) andpost-event (p (xj,i|Ei)) classes in the biomarkers areassumed to be represented by Gaussians, indepen-dently for each biomarker. There are two reasonswhy constructing these PDFs is non-trivial. Firstly,the labels (clinical diagnoses) for the subjects donot necessarily represent the true labels of all thebiomarkers extracted from the subject. Not allbiomarkers are abnormal for subjects with AD di-agnosis, while some of the cognitively normal (CN)subjects could have undiagnosed pre-symptomaticconditions. Secondly, the clinical diagnosis can benon-binary and include classes such as MCI, withsignificant number of biomarkers in normal and ab-normal classes.

In our approach we address these two issues in-dependently. We make an initial estimate of thePDFs using biomarkers from easily classifiable CNand easily classifiable AD subjects and later refinethe estimated PDF using the entire dataset.

A Bayesian classifier is trained for each biomarkerusing CN and AD subjects, based on the assump-tion that there are no biomarkers in the pre-

4

symptomatic stage for CN subjects and all thebiomarkers are abnormal for AD subjects. Thisclassifier is subsequently applied to the trainingdata, and the predicted labels are compared withthe clinical labels. The misclassified data in thedataset could either be outliers in each class result-ing from our aforementioned assumption or couldgenuinely belong to their respective classes and rep-resent the tails of the true PDFs. Irrespective of thereason of misclassification, we remove them for ini-tial estimation of the PDFs. This procedure thusresults, for each biomarker, in a set of easily classi-fiable CN subjects (whose biomarker values repre-sent normal values) and easily classifiable AD sub-jects (whose biomarker values represent abnormalvalues). This is shown in the top part of Figure 3.

As we use Gaussians to represent the PDFs, wecalculate initial estimates for mean and standarddeviation for both normal (µ¬Ei , σ¬Ei ) and abnor-mal classes (µEi , σ

Ei ) based on ‘easy’ CN and ‘easy’

AD subjects for each biomarker i. As these meansand standard deviations are estimated based ontruncated Gaussians, these are biased estimates.The initial estimates of standard deviations are al-ways smaller than the expected unbiased estimateswhereas the initial estimates of means are underes-timated for Gaussians with smaller means (as com-pared to the other class for corresponding biomark-ers) and overestimated for Gaussians with largermeans.

We refine the initial estimates using a Gaussianmixture model (GMM) and include all the avail-able data, including MCI subjects and previouslymisclassified cases. To obtain a robust GMM fit,a constrained optimization method is used, withbounds on the means, standard deviations and mix-ing parameters, based on the aforementioned rela-tionship between the initial estimates and their cor-responding expected unbiased estimates. The ob-jective function for optimization for biomarker i isa summation of log-likelihoods, for all subjects:

Ci =∑∀j

log f(xj,i) (5)

where the likelihood function f(xj,i) is computed asa function of mixing parameters (θEi , θ

¬Ei ) for the

groups corresponding to post-event and pre-eventrespectively and their corresponding Gaussian dis-tributions (µEi , σ

Ei ) and (µ¬Ei , σ¬Ei ):

Figure 3: Overview of the steps involved in the proposedGaussian Mixture Model optimization strategy. A) Illustra-tion of the initialization step for Gaussian Mixture Model.Rejecting the tails of the Gaussian distribution in CN andAD class is done to account for the fact that some of theCN subjects could be in pre-symptomatic stage of diseaseprogression and some of the biomarkers could still be nor-mal in AD subjects. B and C) This is followed by iterativeestimation of Gaussian parameter optimization and Mixingparameter optimization.

f(xj,i) = θEi p(xj,i|µEi , σEi )+θ¬Ei p(xj,i|µ¬Ei , σ¬Ei )

= θEi p(xj,i|Ei) + θ¬Ei p(xj,i|¬Ei) (6)

θEi and θ¬Ei are selected such that θEi + θ¬Ei = 1.The mixing parameters and the Gaussian parame-ters are optimized alternately, until convergence ofthe mixing parameters. The initialization and opti-mization strategy in GMM is illustrated in Figure 3.

The strategy of alternating between optimizingfor mixing parameter and optimizing for Gaussianparameters in combination with the initializationstrategy and the subsequent constraints is differentfrom all previous versions of EBM and it will beshown in Section 5 that this results in more accuratecentral ordering of events in most cases.

3.2. Estimating a disease progression timeline

3.2.1. Estimating Subject-Specific Orderings

The PDF thus obtained is used for classifica-tion of the biomarkers using a Bayesian classifier,where the mixing parameters (θEi and θ¬Ei ) areused as the prior probabilities (p(Ei) and p(¬Ei)respectively) when estimating posterior probabili-ties for each biomarker. We assume these posteriorprobabilities to be a measure of progression of abiomarker. Thus, sj is established such that:

5

sj 3 p(Esj(1)|xj,sj(1)) > p(Esj(2)|xj,sj(2)) >

... > p(Esj(N)|xj,sj(N)) (7)

However, the posterior probability is influencednot only by progression of the biomarker value toits abnormal state, but also by inherent variabilityin normal and abnormal biomarker values acrosssubjects, and by measurement noise. Disentan-gling measurement noise and inherent variability innormal biomarker values from progression of thebiomarker to its abnormal state can only be donebased on longitudinal data. This makes sj a noisyestimate.

3.2.2. Estimating a central ordering

Since the event ordering for each subject is es-timated independently, any heterogeneity in dis-ease progression is captured in the estimates of sj .The central event ordering (S) is the mean of thesubject-specific estimates of sj . To describe the dis-tribution of sj , we make use of a generalized Mal-lows model. The generalized Mallows model is pa-rameterized by a central (‘mean’) ordering as wellas spread parameters (analogous to the standarddeviation in a normal distribution). The centralordering is defined as the ordering that minimizesthe sum of distances to all subject-wise orderingssj . To measure distance between orderings, an of-ten used measure is Kendall’s Tau distance (Huangand Alexander, 2012). Kendall’s Tau distance be-tween a subject specific event ordering (sj) and cen-tral ordering (S) can be defined as:

K(S, sj) =

N−1∑i=1

Vi(S, sj) (8)

where Vi(S, sj) is the number of adjacent swapsneeded so that event at position i is the same insj and S.

Since the estimates of sj are based on rankingsof posterior probabilities, it would be desirable topenalize certain swaps more than others, based onhow close the posterior probabilities are to eachother. To this end, we introduce a probabilisticKendall’s Tau distance, which penalizes each swapbased on the difference in posterior probabilities ofthe corresponding events.

K(S, sj) =

N−1∑i=1

Vi(S, sj) (9)

Vi∀i ∈ [1, N − 1] is computed sequentially usingthe following algorithm5:

Algorithm 1 Probabilistic Kendall Tau distancebetween Subject-specific event orderings and cen-tral event ordering

1: for i ∈ [1, N − 1] do2: k ← s−1

j (S(i))3: if k > i then4: Vi(S, sj)←

∑kl=i+1 pi − pl

5: Move sj(k) to position i and update sj6: else7: Vi(S, sj)← 0

where pa is shortened notation forp(Esj(a)|xj,sj(a)

).

This variant of Kendall’s Tau distance is quiteclose to the weighted Kendall’s Tau distance de-fined in the permutation space introduced in Ku-mar and Vassilvitskii (2010). The difference stemsfrom the fact that since the probabilistic Kendall’sTau distance is between individual estimates anda central-ordering, the penalization of each swap isweighted asymmetrically as Vi(S, sj) 6= Vi(sj , S).

The optimum S is the one that minimizes∑∀j K(S, sj). However, computing a global opti-

mum S based on subject-wise orderings is NP-hard.Thus getting a good initial estimate of S is impor-tant to ensure the estimated S is not a suboptimallocal optimum. In our implementation the initialestimate of S is based on ordering θ¬Ei . The moti-vation for this is discussed in Section 3.3. S was fur-ther optimized based on the algorithm introducedby Fligner and Verducci (1988) to estimate thecentral ordering.

3.2.3. Estimating Event Centers

The S that has been derived in this manner, isan estimate of the sequence in which the biomarkersbecome abnormal during the progression of a dis-ease. However, it falls short of being a disease time-line, because it does not provide information aboutthe proximity of consecutive events. To address thisissue, we estimate distances between events by com-puting the cost of adjacent swaps in the event or-dering, as measured by summation of probabilistic

5The summation symbol in step 4 was missed accidentallyin Venkatraghavan et al. (2017).

6

Kendall’s Tau distance over all subjects.

Γi+1,i =∑∀j

K(Si+1,i, sj)− K(S, sj) (10)

where Si+1,i is identical to S except for the swapbetween events at locations i and i + 1, and Γi+1,i

is the cost of the swap. This represents the cost forthe central ordering to be Si+1,i instead of S. Wehypothesize that the closer the events i+1 and i areto each other, the lower the swapping cost would be.Hence we consider these costs to be proportionalto distance between events in terms of biomarkerprogression.

To estimate the distance of the first biomarkerbeing abnormal (event) in S to a hypotheticaldisease-free individual, we introduce a pseudo-eventwhich becomes abnormal at the beginning of thedisease timeline and hence is abnormal for all thesubjects in the database i.e. p (E0|xj,0) = 1 ∀j.Similarly, we introduce another pseudo-event whichbecomes abnormal at the end of the disease time-line and hence is normal for all the subjects in thedatabase i.e. p (EN+1|xj,N+1) = 0 ∀j. We scaleΓi+1,i∀i ∈ [0, N ] such that

∑Γi+1,i = 1. Event

center (λk) of event k in S for k > 0, is computedas follows:

λk =

k−1∑i=0

Γi+1,i (11)

In fact, the concept of event centers can also beextended to Fonteijn’s EBM by computing the costof adjacent swaps in the event ordering as the dif-ference in log-likelihoods as follows:

Γi+1,i = log (p(X|S))− log (p(X|Si+1,i)) (12)

Extension of this concept to Huang’s EBM is notstraightforward and is beyond this paper’s scope.

The set of event centers λ1,2,...,N , will henceforthbe referred to as Λ. This results in a disease time-line, with S giving information about the order ofprogression of biomarkers and Λ giving informationabout the event centers in this timeline.

3.3. Patient Staging

Once the central ordering of events (S) and eventcenters (Λ) have been determined, we propose a pa-tient staging algorithm where a patient stage (Υj)is interpreted as an expectation of λk with respect

to the conditional distribution p(k|S,Xj). Thus,Υj can be written as given below:

Υj =

∑Nk=1 λkp(k|S,Xj)∑Nk=1 p(k|S,Xj)

(13)

Multiplying p(S,Xj) in both numerator and de-nominator and using the chain rule of probabilityresults in:

Υj =

∑Nk=1 λkp(k, S,Xj)∑Nk=1 p(k, S,Xj)

(14)

Using chain rule of probability, we can writep(k, S,Xj) as:

p(k, S,Xj) = p(Xj |k, S)p(k, S) (15)

If we assume a uniform distribution of p(k|S)and p(S) as in Fonteijn et al. (2012), p(k, S,Xj)becomes equal to p(Xj |k, S), which was used forpatient staging in Fonteijn’s EBM as discussed inSection 2. However we use prior knowledge in orderto define a more informative distribution p(k, S):

p(k, S) =

∏ki=1 θ

ES(i)

∏Ni=k+1 θ

¬ES(i)

Z(16)

where Z is a normalizing factor, chosen so as tomake this a probability. This choice of p(k, S) canbe justified because biomarkers which become ab-normal earlier in the disease process are more likelyto have a higher value of θEi than the biomarkerswhich become abnormal later. Hence it is far morelikely to have a central-ordering based on ascend-ing values of θ¬Ei than an ordering with ascendingvalues of θEi . It should be noted that, the choiceof p(k, S) is not unique. For example, it could alsobe any n-th power of the above equation ∀n > 0.Thus, from Equations 15, 16 and 4, we get:

p (k, S,Xj) ∝k∏i=1

p(xj,S(i)|ES(i)

)θES(i)×

N∏i=k+1

p(xj,S(i)|¬ES(i)

)θ¬ES(i) (17)

Using the above value of p (k, S,Xj) in Equa-tion 14, results in continuous patient stages.

7

4. Experiments

This section describes the experiments per-formed to benchmark the accuracy of the proposedDEBM algorithm and compare it with state-of-the-art EBM methods. The EBM methods usedfor comparison in these experiments are Huang’sEBM (Huang and Alexander, 2012) and the variantof Fonteijn’s EBM that is suited for AD disease pro-gression modeling (Young et al., 2014). The sourcecode for DEBM and Fonteijn’s EBM, with differ-ent mixture modeling techniques and patient stag-ing techniques discussed in this paper have beenmade publicly available online under the GPL 3.0license: https://github.com/88vikram/pyebm/.The source code for Huang’s EBM used in ourexperiments was provided by the authors of themethod.

For brevity, Fonteijn’s EBM and Huang’s EBMwill henceforth be referred to as FEBM and HEBM,respectively. The mixture model used with an EBMmodel (as the one described in Section 3.1) will bedenoted by a subscript. For example, FEBM withthe Gaussian mixture model proposed in Younget al. (2014) will be referred to as FEBMay. TheGaussian mixture model optimization techniquesin Huang and Alexander (2012), Venkatraghavanet al. (2017) and the one introduced in this paperwill be denoted with subscripts ‘jh’, ‘vv1’ and ‘vv2’respectively.6

Data used in the experiments were obtainedfrom the Alzheimers Disease Neuroimaging Initia-tive (ADNI) database (adni.loni.usc.edu) 7 . We be-gin with the details of the experiments performedon ADNI data to estimate the event ordering inSection 4.1. Since the ground-truth event ordering

6Mixture model ‘ay’ optimizes for Gaussian and mixingparameters together. Initialization of Gaussian parametersfor optimization is done without rejecting the overlappingpart of Gaussians in CN and AD classes. ‘vv1’ also optimizesfor Gaussian and mixing parameters together (although withmuch stricter bounds) but the initialization of Gaussian pa-rameters is similar to the one in this paper. ‘jh’ couplesmixture modeling with estimation of subject-specific order-ing to estimate a combined optimum solution.

7The ADNI was launched in 2003 as a public-private part-nership, led by Principal Investigator Michael W. Weiner,MD. The primary goal of ADNI has been to test whetherserial magnetic resonance imaging (MRI), positron emissiontomography (PET), other biological markers, and clinicaland neuropsychological assessment can be combined to mea-sure the progression of mild cognitive impairment (MCI) andearly Alzheimers disease (AD). For up-to-date information,see www.adni-info.org.

is unknown for clinical datasets, we resort to us-ing the ability of patient staging to classify AD andCN subjects, as an indirect way of measuring thereliability of the event ordering. We also measurethe accuracy of event ordering and relative distancebetween events more directly by performing exten-sive experiments on synthetic data simulating theprogression of AD. The details of these experimentsare given in Section 4.2.

4.1. ADNI Data

We considered 1737 ADNI subjects (417 CN,978 MCI and 342 AD subjects) who had a struc-tural MRI (T1w) scan at baseline. The T1w scanswere non-uniformity corrected using the N3 algo-rithm (Tustison et al., 2010). This was followedby multi-atlas brain extraction using the methoddescribed in Bron et al. (2014). Multi-atlas seg-mentation was performed (Hammers et al., 2003;Gousias et al., 2008) using the structural MRI scansto obtain a region-labeling for 83 brain regions ineach subject using a set of 30 atlases. Probabilistictissue segmentations were obtained for white mat-ter, gray matter (GM), and cerebrospinal fluid onthe T1w image using the unified tissue segmen-tation method (Ashburner and Friston, 2005) ofSPM8 (Statistical Parametric Mapping, London,UK). The probabilistic GM segmentation was thencombined with region labeling to obtain GM vol-umes in the extracted regions. We also downloadedCSF (Aβ1−42 (ABETA), TAU and p-TAU) and cog-nitive score (MMSE, ADAS-Cog) values from theADNI database, making the total number of fea-tures equal to 88.

The features TAU and p-TAU were transformedto logarithmic scales to make the distributions lessskewed. GM volumes of segmented regions wereregressed with age, sex and intra-cranial volume(ICV) and the effects of these factors were subse-quently corrected for, before being used as biomark-ers. The effect of age and sex was regressed out ofCSF based features, whereas effects of age, sex andeducation was regressed out of cognitive scores.

We retained 52 biomarkers (GM volume basedbiomarkers of 47 regions, 3 CSF and 2 cognitivescores) having significant differences between CNand AD subjects using Student’s t-test with p <0.005, after Bonferroni correction. These biomarkervalues were used to perform two sets of experi-ments.

Experiment 1(a): A subset of 7 biomarkers in-cluding the 3 CSF features, MMSE score, ADAS-

8

https://github.com/88vikram/pyebm/

Cog score, gray matter volume of the hippocam-pus (combined volume of left and right hippocampi)and gray matter volume in whole brain was created.Event ordering of these 7 biomarkers was inferredusing DEBM. We studied the positional varianceof central ordering and variance of event centersinferred by DEBM by creating 100 bootstrappedsamples of the data.Experiment 1(b): The Biomarkers were

ranked based on their aforementioned p-value andthe above experiment was repeated with top 25 andtop 50 biomarkers to investigate if the event-centersestimated for the subset of Biomarkers used in Ex-periment 1(a), remain comparable to the ones esti-mated in Experiment 1(a).Experiment 2: As an indirect way of measur-

ing the accuracy of the estimated event ordering, weuse patient staging based on the estimated event or-derings as a way to classify CN and AD subjects inthe database. 10-fold cross validation was used forthis purpose. AUC measures were used to measurethe performance of these classifications and thusindirectly hint at the reliability of the event order-ing based on which the corresponding patient stag-ing were performed. We used varying number ofbiomarkers (ranked based on their p-value) rangingfrom 5 to 50 in steps of 5 for this experiment. Weused the methods FEBMay, HEBMjh, DEBMvv1

and DEBMvv2 for inferring the ordering. Patientstaging was done based on the methods describedin their respective papers. Since the earlier versionof DEBM (Venkatraghavan et al., 2017) had notintroduced a patient staging method, we use thepatient staging method described in this paper forevaluating the method.

4.2. Simulation Data

We used the framework developed by Young et al.(2015a) for simulating cross-sectional data consist-ing of scalar biomarker values for CN, MCI and ADsubjects. In this framework, disease progression ina subject is modeled by a cascade of biomarkers be-coming abnormal and individual biomarker trajec-tories are represented by a sigmoid. The equationfor generating biomarker values for different sub-jects is given below:

xj,i(Ψ) =Ri

1 + exp(−ρi(Ψ− ξj,i))+ βj,i (18)

Ψ denotes disease stage of a subject which wetake to be a random variable distributed uniformly

throughout the disease timeline. ρi signifies the rateof progression of a biomarker, which we take to beequal for all subjects. ξj,i denotes the disease stageat which the biomarker becomes abnormal. βj,i de-notes the value of the biomarker when the subject isnormal and Ri denotes the range of the sigmoidaltrajectory of the biomarker, which we take to beequal for all subjects.

In our experiments, βj,i and ξj,i ∀j are as-sumed to be random variables with Normal dis-tribution N(µβi ,Σβi) and N(µξi ,Σξi) respectively.µβi is equal to the mean value of the correspondingbiomarker in the CN group of the selected ADNIdata. Ri is equal to the difference between themean values of the biomarker in the CN and ADgroups of the selected ADNI data. Σβi

representsthe variability of biomarker values in the CN group.We consider a relative scale for Σβi , where 1 refersto the observed variation among the CN subjectsin ADNI data. Variation in ξj,i is controlled byΣξi and results in variation in ordering among sub-jects in population and could be seen as a param-eter controlling the disease heterogeneity within asimulated population. Σξi ∀i is varied in multiplesof ∆ξ, where ∆ξ is the average difference betweenadjacent µξi . µξi refers to the event centers of var-ious biomarkers. The set of µξi∀i will collectivelybe referred to as Λgt and they will be used to assessthe accuracy of estimated event centers (λi).

The parameters in the simulation framework thatcould have an effect on the performance of EBMsare Σβi

, µξi , Σξi , and ρi. Apart from this, the num-ber of subjects (M) and the number of biomarkers(N) in the dataset could also have an effect on theperformance of EBMs. Using this simulation frame-work, we study the effect of the aforementioned pa-rameters on the ability of different variants of EBMalgorithms to accurately infer the ground-truth cen-tral ordering in the population. Change in µβi re-sults only in a translational effect on biomarker val-ues and change in Ri results only in a scaling effecton biomarker values. These factors do not affectthe performance of the EBMs and hence were notevaluated in our experiments.

Performance of an EBM method can be measuredusing error in estimation of either S or Λ. Error inestimating S (εS) will henceforth be referred to as‘ordering error’ whereas the error in estimating Λ(εΛ) will henceforth be referred to as ‘event-center

9

error’. εS is computed using the following equation:

εS =K(S, Sgt)(

N2

) (19)

where Sgt is the ground truth ordering. εS is effec-tively a normalized Kendall’s Tau distance betweenS and Sgt. The normalization factor for

(N2

), was

chosen to make the accuracy measure interpretablefor different number of biomarkers.

For comparing Λ and Λgt, Λ were scaled andtranslated such that the mean and standard devi-ation of Λ were equal to that of Λgt. This is donebecause we are only interested in evaluating the er-rors in estimating relative distance between eventsand not the absolute position of event-centers. Thechoice of scale in event-centers are arbitrary andthe chosen scale for the estimated event-centers wasbased on pseudo-events, which need not necessarilycoincide with the simulation framework’s ground-truth event-centers.

εΛ =∑∀i

|λsti − µξi | (20)

where λsti is the scaled and translated version of λi.As mentioned before, the factors that can have

an effect on the performance of EBMs are Σβi, µξi ,

Σξi , ρi, M and N . In each of the following 5 ex-periments, a few of these factors were varied whilethe others were set to their default values. Thedefault value for Σβi was taken to be 1 as this cor-responds to the observed variation among CN sub-jects in ADNI. µξi were spaced equidistantly, i.e.,µξi+1

− µξi = 1/(N + 1). As the actual variation inevent centers among different subjects is not knownin a clinical dataset, the default value of Σξi wastaken to be 2∆ξ. For the sake of simplicity of nota-tion ∆ξ will be omitted henceforth, and the valuesof Σξi are implicitly in multiples of ∆ξ. ρi was con-sidered to be equal for all biomarkers by default.The default values for M and N were 1737 and 7respectively, mimicking the dataset used in Exper-iment 1(a). For each simulation setting, 50 repeti-tions of simulation data were created and used forbenchmarking the performance of EBMs on syn-thetic data.Experiment 3: The first simulation experi-

ment was performed to study the effect of Σβ ∈[0.2, 1.8] and Σξ ∈ [0, 4], varying one at a timewhile keeping the other at its mean value. TheεS of FEBMay, FEBMvv2, HEBMjh, HEBMvv2,DEBMvv1 and DEBMvv2 were determined.

Experiment 4: The above experiment was re-peated for DEBMvv2 and FEBMvv2 and the εΛ weremeasured for the two methods.

Experiment 5: This experiment was performedto study the effect of a non-uniform distribution ofµξi . Σβ and Σξ combinations of (0.6, 1), (1.0, 2),(1.4, 3) and (1.8, 4) were tested to study their ef-fect in non-uniformly spaced biomarkers. εS ofDEBMvv2, FEBMvv2 and HEBMvv2 were mea-sured. Additionally, εΛ of DEBMvv2 and FEBMvv2

were measured. To also study the effect of unequalrates of progression of biomarkers (ρi), the aboveexperiment was performed once with equal ρi forall biomarkers and once when they were unequal.The experiment with unequal biomarker rates hadthe same mean biomarker progression rate as thethe experiment with equal biomarker rates. Theprogression rates of different biomarkers has beenincluded as supplementary material (Figure S1).

Experiment 6: This experiment was performedto study the influence of the number of subjects(M). M was varied from 100 to 2100 in steps of200. εS of DEBMvv2, FEBMvv2 and HEBMvv2 weremeasured. DEBMvv2 and FEBMvv2 were also as-sessed based on εΛ.

Experiment 7: This experiment was performedto study the influence of the number of biomarkers(N). N was varied from 7 to 52 in steps of 5. Ineach random generation of a dataset, we randomlyselected (with replacement) the biomarkers to beused in the iteration. This was done to study theeffect of N on the EBM models and separate itfrom the effect of adding weaker biomarkers. εSof DEBMvv2, FEBMvv2 and HEBMvv2 were mea-sured. DEBMvv2 and FEBMvv2 were also assessedbased on εΛ.

5. Results

5.1. ADNI Data

Experiment 1: Figures 4, 5 and 6 show thepositional variance and event-center variance ob-tained using DEBMvv2 with 7, 25 and 50 numberof events respectively. Table 1 shows the abbrevia-tions used in Figures 5 and 6 along with their fullnames. Figure 7 maps the colors used for y-axislabels of Figures 5 and 6 to different lobes in thebrain.

It can be seen from Figure 4 (left) that CSF-based biomarkers ABETA becomes abnormal be-fore MMSE and CSF-based p-TAU. This is followed

10

Figure 4: Experiment 1(a): DEBMvv2 with 7 Events. The positional variance diagram (left) shows the uncertainty in estimatingthe central event ordering. The event-center variance diagram (right) shows the standard error of estimated event centers. Thesewere measured by 100 repetitions of bootstrapping.

Figure 5: Experiment 1(b): DEBMvv2 with 25 Events. The positional variance diagram (left) shows the uncertainty inestimating the central event ordering and the event-center variance diagram (right) shows the standard error of estimatedevent centers. These were measured by 100 repetitions of bootstrapping. The event centers of the biomarkers used in Figure 4are marked in red. Table 1 shows the full forms of the abbreviations used in the y-axis labels. Figure 7 maps the colors usedfor y-axis labels to different lobes in the brain.

11

Figure 6: Experiment 1(b): DEBMvv2 with 50 Events. Positional variance diagram (left) shows the uncertainty in estimatingthe central event ordering and event center variance diagram (right) shows the standard error of estimated event-centers. Thesewere measured by 100 repetitions of bootstrapping. The event-centers of the biomarkers used in Figure 4 are marked in red,whereas the ones used in Figure 5 are marked in blue. Table 1 shows the full forms of the abbreviations used in the y-axislabels. Figure 7 maps the colors used for y-axis labels to different lobes in the brain.

12

Abbreviation Full name

L LeftR Right

PHA Parahippocampalis et AmbiensMed. MedialInf. InferiorSup. Superior

Temp. TemporalPos. PosteriorLat. LateralAnt. AnteriorOT Occipitotemporal

Cent. CentralMid. MiddleRem. RemainderOcc. OccipitalPS Pre-subgenual

Table 1: Abbreviations used in Figures 5 and 6 along withtheir full names (Hammers et al., 2003).

Figure 7: Legend for the colors used in Figures 5 and 6. Thecolors map different biomarker labels to lobes in the brain.

by ADAS13, Hippocampal volume, TAU and wholebrain volume. However Figure 4 (right) shows thatthe event centers for MMSE, ADAS13, p-TAU areclose to each other and so are the event-centers ofTAU and hippocampus volume. The event associ-ated with the TAU biomarker seems closer to thewhole brain volume event as they are in positions6 and 7 of Figure 4 (left). However, the centersof these two events are quite far apart in Figure 4(right) and the p-TAU event (position 2) is closerto the TAU event than whole brain volume event.

As the number of biomarkers increases, the vari-ation in the positions also increases considerably, asseen in Figures 5 (left) and 6 (left). The event cen-ters of the biomarkers used in Experiment 1(a) re-main fairly consistent (±0.05) in Experiment 1(b).It can also be seen that biomarkers with lower p-values (biomarkers included in the model with 50biomarkers and not in the model with 25 biomark-ers), have larger variance in their event-center esti-mation.

Experiment 2: Figure 8 (a) shows the meanAUC when using patient stages for classifying CNversus AD subjects using DEBM and other vari-

ants of EBM methods. It can be observed that theAUC of all the methods decreases as the number ofevents increases. The proposed method DEBMvv2

followed by the proposed patient staging algorithmoutperforms all the existing EBM variants consis-tently.

Figure 8 (b) shows the distribution of patientstages for the whole population when the mostsignificant 25 features were given as input toDEBMvv2. This graph shows a peak at diseasestage 0 dominated by CN and MCI non-converters,which shows that these subjects are not progress-ing towards AD. The non-zero lower disease stagesare dominated by CN subjects and MCI non-converters, whereas MCI converters8 and the sub-jects with AD have higher disease stages.

5.2. Simulation Data

Experiment 3: Figures 9 shows the orderingerrors of DEBM, FEBM and HEBM models withdifferent mixture models as Σβ and Σξ increase.The error-bars depict mean and standard deviationof the errors obtained in 50 repetitions of simula-tions. It can be seen that the proposed optimiza-tion technique improves the performance of all threeEBM models. The change is particularly evidentwhen comparing the performance of FEBMvv2 andFEBMay.

It can also be seen that FEBMvv2 performsslightly better than DEBMvv2 when Σξ is low, butas Σξ increases, the performance of FEBMvv2 de-grades significantly. The performance of HEBMis almost always worse than its FEBM or DEBMcounterpart.

Experiment 4: Figure 10 (a) and (b) showsthe event-center errors in DEBMvv2 and FEBMvv2

as the variability in population (Σβ) and diseaseheterogeneity (Σξ) increases respectively. It shouldbe noted from Figure 9(b) and Figure 10 (b) that,even when the FEBMvv2 gets the ordering moreaccurately than DEBMvv2 in cases of low Σξ, theevent-center estimation of DEBMvv2 is on par withor better than its FEBM counterpart.

Figure 10 (c) shows the estimated event-centerlocations for Σβ = 1.0 and Σξ = 2 and the groundtruth event-centers.

Experiment 5: Figure 11 (a) shows the orderingerrors of DEBMvv2, FEBMvv2 and HEBMvv2 as Σβ

8MCI converters are subjects who convert to AD within3 years of baseline measurement

13

Number of Events

Areaunder

ROC

curve

(a)

Disease Stage

Frequen

cyofOccurren

ce

(b)

Figure 8: Experiment 2: In (a) we see the variation of AUCwith respect the number of biomarkers used for building themodel using DEBM, when the obtained patient stages wereused for classification of CN versus AD subjects. The AUCmeasure was obtained using 10-fold cross-validation. In (b)we see the frequency of occurrence of subjects in differentdisease stages, when the most significant 25 features weregiven as input to DEBMvv2 for inferring the ordering as wellas for patient staging.

and Σξ increase, when the ground-truth event cen-ters (µξi) are non-uniformly spaced. The spacingof µξi can be observed in Figure 11 (b), where theground truth event-centers as well as the estimatedevent-centers of DEBMvv2 and FEBMvv2 are shownfor Σβ = 1.0 and Σξ = 2. It can be observed thatthe estimated event-centers for DEBMvv2 are muchcloser to the ground-truth event centers than thoseof FEBMvv2 and also have a much lower varianceover different iterations of simulations.

Figure 11 (c) shows the ordering errors as Σβand Σξ increases, when µξi is non-uniformly spacedand ρi is not identical for all biomarkers. It shouldalso be noted that the mean of ρi over all i has not

Σβ

OrderingError

(a)

Σξ

OrderingError

(b)

(c)

Figure 9: Experiment 3: Ordering errors of DEBMvv1,DEBMvv2, FEBMay, FEBMvv2, HEBMjh and HEBMvv2 for50 repetitions of simulations. Figure (a) shows the orderingerror as a function of variability in population (Σβ). Fig-ure (a) shows the ordering error as a function of variationin ordering (Σξ). Error bars in (a) and (b) represent stan-dard deviations over the 50 repetitions. Figure (c) shows thelegend for the plots in (a) and (b).

changed between (a) and (c). The variation of er-rors in (c) is quite similar to the one in (a). Thisshows that performance of EBM methods that arereported in other experiments (where ρi is equalfor all biomarkers) can be expected to not deteri-orate in the more realistic scenario of ρi not beingequal for all biomarkers. The event-center variancefor Σβ = 1.0 and Σξ = 2 for the case of unequalρi is very similar to (b) and has been included assupplementary material (Figure S2).

14

Σβ

Even

t-CenterError

(a)

Σξ

Even

t-CenterError

(b)

Disease Stage

(c)

Figure 10: Experiment 4: Figures (a) and (b) show theevent-center errors of DEBMvv2 and FEBMvv2 as a func-tion of Σβ and Σξ respectively. Figure (c) shows the es-timated event-center locations for both methods as well asthe ground-truth event centers. Error bars in (a), (b) and(c) represent standard deviation over 50 repetitions of sim-ulation.

Experiment 6: Figure 12 shows the meanordering errors of DEBMvv2, FEBMvv2 andHEBMvv2 as a function of number of subjects inthe dataset on one vertical axis and shows the mean

{Σβ ,Σξ}

OrderingError

(a)

Disease Stage

(b)

{Σβ ,Σξ}

OrderingError

(c)

Figure 11: Experiment 5: Figures (a) and (c) show the or-dering errors of DEBMvv2, FEBMvv2 and HEBMvv2 whenµξi are not uniformly distributed. Σβ and Σξ increase aswe move from left to right. Figure (a) shows the errors inthe case when ρi are identical for all the biomarkers whereas(c) shows the errors when ρi are different. Figure (b) showsthe non-uniform µξi as well as the estimated event-centersby DEBMvv2 and FEBMvv2 for the case of ρi being equal.Error bars in (a), (b) and (c) represent standard deviationover 50 repetitions of simulation.

event-center errors of DEBMvv2 and FEBMvv2 onthe other vertical axis. As expected, the mod-els perform better as the number of subjects in-creases. DEBMvv2 is slightly better at inferringthe central ordering than FEBMvv2 when the num-

15

Number of Subjects (M)

Figure 12: Experiment 6: Ordering errors of DEBMvv2,FEBMvv2 and HEBMvv2 as a function of number of subjects(M) in the dataset. It also shows the event-center errors ofDEBMvv2 and FEBMvv2 as a function of M .

ber of subjects is very low, but FEBMvv2 outper-forms DEBMvv2 when the number of subjects ishigher. However, when the accuracy of event cen-ters are considered, DEBMvv2 consistently outper-forms FEBMvv2.

Experiment 7: Figure 13 shows the meanordering errors of DEBMvv2, FEBMvv2 andHEBMvv2 as a function of the number of events(biomarkers) in the dataset on one vertical axis andshows the mean event-center errors of DEBMvv2

and FEBMvv2 on the other vertical axis. Thebiomarkers were selected randomly after replace-ment so that the chances of selecting a badbiomarker remain equal as the number of eventsincreases. It can be noted that the errors of theEBM models increase as the number of events in-creases initially, even when the average quality ofbiomarkers remains the same. However the errorsstabilize beyond a certain point and do not increaseany more.

6. Discussion

We proposed a novel discriminative EBM frame-work to estimate the ordering in which biomark-ers become abnormal during disease progression,based on a cross-sectional dataset. The proposedframework outperforms state-of-the-art EBM tech-niques in estimating the event ordering. We alsointroduced the concept of relative distance betweenevent-centers, which enables creating a disease pro-gression timeline. This in turn led to the develop-ment of a new continuous patient staging mecha-

Number of Events (N)

Figure 13: Experiment 7: Ordering errors of DEBMvv2,FEBMvv2 and HEBMvv2 as a function of number of events(N) in the dataset. It also shows the event-center errors ofDEBMvv2 and FEBMvv2 as a function of N .

nism. In addition to the framework, we also pro-posed a novel probabilistic Kendall’s Tau distancemetric and a robust biomarker distribution estima-tion algorithm. In this section, we discuss differentaspects of the proposed algorithm.

6.1. Event Centers

Event-centers capture relative distance betweenevents. This helps in creating the disease progres-sion timeline from an ordering of events. Event cen-ters are an intrinsic property of the biomarker used,for the selected population. This was observed inExperiment 1(b) where the event-centers estimatedusing DEBMvv2 remained fairly consistent (±0.05)across models using different number of biomarkers.

The estimated disease progression timeline canbe used for inferring progression of the disease, withthe event centers being synonymous to milestonesof progression. A strict quantization of positionin ordering of events (as reported in Oxtoby andAlexander (2017), Venkatraghavan et al. (2017),Young et al. (2015a), Young et al. (2014), Fonteijnet al. (2012)) in the positional variance diagram cansometimes be non-intuitive in terms of inferring ac-tual progression of the disease. This was seen inExperiment 1(a), where the event center variancediagram showed that the TAU event (at position 6)was closer to the p-TAU event (at position 2) thanthe whole brain event (position 7).

The approach of scaling the event-centers be-tween [0, 1] has its advantages and disadvantages.The advantage of such a scaling is that modelsbuilt on different biomarkers, but within the same

16

population, remain comparable. For example, amodel built with CSF and MRI based biomarkerscan be compared with a model built on MRI basedbiomarkers alone, as the event-centers of MRI basedbiomarkers would approximately be the same. Onthe other hand, the position of the first event re-lies heavily on the number of ‘true’ controls inthe dataset (CN subjects who are not in an earlyasymptomatic stage of the disease). This is theresult of introducing pseudo-events for scaling theevents-centers.

Comparison of the event centers across differ-ent datasets with different number of controls (al-beit with the same biomarkers) can be done inthree ways. Event-centers can be scaled and trans-lated such that the mean and standard deviationof event centers computed across different datasetsare the same (similar to the comparisons betweenestimated and ground-truth event centers in thispaper). Alternately, the event center of the firstbiomarker can be set as 0 and the event center ofthe last biomarker can be set to 1, before compar-ison. Lastly, in a dataset where controls (i.e., sub-jects whose biomarker values are all normal) can beeasily identified, it would be better to exclude themfor event-center computation.

The estimated event centers have a good corre-lation with the groundtruth disease timeline. Thiscan be seen in the simulation experiments with andwithout uniform spacing of events (experiments 4and 5). It must however be noted that, the diseasestages Ψ of the simulated subjects were distributeduniformly throughout the disease timeline. If thedistribution is not uniform, we expect it to have aneffect on the estimation of event centers. Analyz-ing the exact effect of such non-uniform distribu-tions on the estimation of event centers and waysto estimate event centers invariant to the distribu-tion of subjects on the disease timeline could be aninteresting extension of the current work.

Experiment 5 also showed that different biomark-ers having different rates of progression does notdegrade the performance of EBM models, as longas the mean rate of progression is the same. Wedid not perform an experiment to benchmark theaccuracies by changing the mean rates of progres-sion of biomarkers. This experiment was alreadyperformed in Young et al. (2015b) and it was ob-served that FEBM ordering error decreases as themean rates of progression increase.

FEBM assumes that the disease is homogeneous,as it expects all the subjects in the dataset to fol-

low the same ordering. When the variability of or-dering in different subjects is low, FEBM with theproposed mixture model ‘vv2’ outperforms DEBMwith the proposed mixture model. This can be seenin the results of Experiments 3, 5 and 6. When theassumption becomes too restrictive, DEBM withthe proposed mixture model outperforms FEBM.Even when the assumption holds true, estimationof event-centers with DEBM is more accurate thanwith FEBM.

6.2. Patient Staging

Existing patient staging algorithms discretize thepatient stages based on event position, whereas thepatient staging algorithm introduced in this papertakes relative distance between events into consid-eration while staging new subjects. This makes pa-tient stages more useful for diagnosis and progno-sis as they correlate more with the actual diseaseprogression timeline. Discrete patient stages with-out considering the event centers could diminish theprognosis value of the obtained stages.

The cross-validation experiment on ADNI data(Experiment 2) showed that the CN and AD sub-jects are well separated after patient staging andthat the AUC of the proposed method is better thanthat of the state-of-the-art EBM techniques. It alsoshowed that MCI converters and non-converters arewell separated after patient staging, without explic-itly training the model to achieve this.

It must however be noted that even though het-erogeneity of the disease was considered while infer-ring the central ordering, it was not considered forpatient staging. Inferring multiple central orderingscorresponding to different disease subtypes (Younget al., 2015a) and staging patients on one of thesecentral orderings may help us overcome this draw-back. Patient staging with respect to subject-specific orderings (as done in HEBM) can also beconsidered when extending DEBM for longitudinaldata, where the subject-specific orderings might beestimated with higher confidence.

6.3. Scalability of Event-Based Models

Understanding the progression of several imagingand non-imaging biomarkers after disease onset isimportant for assessing the severity of the disease.Hence it is desirable to have a model scalable to alarge number of biomarkers. FEBM and DEBM arescalable to large number of events, whereas HEBMis not. This was seen in the simulation experiment

17

on varying number of events (Experiment 7), wherethe errors of FEBM and DEBM increased asymp-totically with increasing number of events. The or-dering errors of HEBM reached 0.5 for large numberof events, which is equivalent to random prediction.

In Experiment 6, we observed that the errorsof the EBMs decrease with increasing number ofsubjects in the dataset. We hence expect FEBM,DEBM and HEBMvv2 to be scalable to a large num-ber of subjects.

The performance of HEBMjh is seen to be consis-tently worse than FEBMay in Experiment 3. Thisis in contrast with the findings of Venkatragha-van et al. (2017), where HEBMjh performed bet-ter than FEBMay when the number of biomarkersused were 7, while it performed worse when thenumber of biomarkers used were 42. One of thekey differences between the experiment performedin Venkatraghavan et al. (2017) and Experiment 3is the number of subjects in the simulation dataset.While the previous study considered 509 subjects,Experiment 3 considered 1737 subjects. HEBMjh

jointly estimates the subject-specific orderings ofall the subjects and the mixture model to representthe biomarkers in different diagnostic groups. Wethink that while the joint estimation was good forlow number of subjects, increasing the number ofsubjects had an adverse effect on the convergenceof the algorithm. Hence HEBMjh is not scalable toa large number of subjects.

We decoupled the mixture model and estimationof subject-specific orderings in HEBMvv2 (Experi-ments 3, 5, 6 and 7). This made HEBM more scal-able as it improved the results in Experiment 3 with1737 subjects, but the decoupling had an adverseeffect on the algorithm when the number of subjectswas low, as seen in Experiment 6, where HEBMvv2

performs worse than FEBMvv2 even when the num-ber of subjects was low.

FEBM and HEBM are generative approaches forestimating the central ordering. Our results suggestthat HEBM is not very scalable. Although FEBMis scalable, the assumptions made in FEBM are toorestrictive for heterogeneous disease such as AD.DEBM is a discriminative approach to event-basedmodeling, which is both scalable and can robustlyestimate central ordering even when the disease isheterogeneous.

6.4. The Mixture Model

The optimization technique for the Gaussianmixture model that is presented in this paper de-

couples the optimization of Gaussian parametersand mixing parameters. When the Gaussians of thepre-event and post-event classes are highly overlap-ping, the optimum mixing parameter changes a loteven for small changes in the Gaussian parameters.By decoupling the optimizations for Gaussian pa-rameters and mixing parameters, we get more sta-ble mixing parameters. This helps in improving theaccuracy of all EBMs. This was observed in Exper-iment 3.

6.5. The Importance of Good Biomarkers

Quality of biomarkers plays a huge role in the ac-curacy of the EBMs. This was seen in Experiment7, where the mean error value for 7 biomarkers wasconsiderably higher than the mean error value withthe same number of biomarkers in Experiment 4(for the same Σβ and Σξ parameters). The ob-served difference can be explained by the choice ofthe biomarkers used in those experiments. Whilethe biomarkers chosen in Experiment 7 was at ran-dom, the ones chosen in Experiment 4 were the 7best biomarkers.

6.6. Interpretation of model results on ADNI

Experiment 1(a) showed that CSF biomarkersABETA and p-TAU are one of the first biomarkersthat become abnormal in AD, which is in agreementwith Jack’s hypothetical model (Jack Jr. et al.,2013). The event centers of Hippocampus volumeand TAU are quite close to each other as well, whichis also in agreement with the current understand-ing of the disease (de Souza et al., 2012). However,MMSE and ADAS biomarkers are seen to becomeabnormal quite early in the disease as well, which isnot in agreement with the hypothetical model. Thiscould suggest that functional abnormality starts be-fore structural abnormality. It could also be be-cause of the fact that we included everyone in ADNIbaseline measurement, for estimating the ordering.Subjects with MCI may not necessarily be progress-ing towards AD and including these subjects couldhave an effect on inferring the ordering.

Nucleus accumbens right and left are the firstbiomarkers to become abnormal as seen in Fig-ure 7. This was also observed by Young et al.(2017). However, the large standard error of theevent centers for the events before ABETA sug-gests that the exact position of those events areunreliable. Experiment 1(b) showed that weakbiomarkers (biomarkers excluded in Figure 5, but

18

included in Figure 6) could lead to greater uncer-tainty in event centers. This can be explained bythe fact that weak biomarkers are the ones wherethere is a lot of overlap between the Gaussians ofpre-event and post-event classes. Small variationin the sampling population during bootstrappingleads to large changes in the parameters estimatedin the mixture modeling step of the algorithm. Italso showed that majority of the early structuralbiomarkers are from Temporal lobe, followed byCentral structures, Frontal lobe, Parietal lobe andOccipital lobe.

7. Conclusion

We proposed a new framework for event-basedmodeling, called discriminative event-based model-ing (DEBM), which includes a new optimizationstrategy for Gaussian mixture modeling, a newparadigm for inferring the mean ordering, a wayfor estimating the proximity of events in the or-der to create a disease progression timeline, and anew way of staging patients that uses these rela-tive proximities of events while placing new sub-jects on the estimated timeline. The source codefor DEBM and FEBM was made publicly avail-able online under the GPL 3.0 license: https:

//github.com/88vikram/pyebm/.

We applied the DEBM framework to a set of 1737subjects from the baseline ADNI measurement, andalso performed an extensive set of simulation exper-iments verifying the technical validity of DEBM.The experiment on ADNI data illustrated a num-ber of advantages of the new approach. Firstly,we showed that strict quantization of position inordering of events in the positional variance dia-gram can sometimes be non-intuitive in terms ofinferring actual progression of a disease. Secondly,we showed that the patient staging based on theproposed approach separates CN and AD group ofsubjects much better than the previous EBM mod-els. Thirdly, we showed that the patient staging canbe used to identify individuals at-risk of developingAD as the MCI converters and non-converters werewell-separated. Staging patients based on the esti-mated disease progression timeline can thus makecomputer-aided diagnosis and prognosis more ex-plainable. The results of these experiments are en-couraging and suggest that DEBM is a promisingapproach to disease progression modeling.

Acknowledgement

This work is part of the EuroPOND initiative,which is funded by the European Union’s Hori-zon 2020 research and innovation programme undergrant agreement No. 666992. The authors thankDr. Jonathan Huang for sharing the implementa-tion of Huang’s EBM and Dr. Alexandra Young forsharing the implementation of the simulation sys-tem for biomarker evolution.

Data collection and sharing for this project wasfunded by the Alzheimer’s Disease NeuroimagingInitiative (ADNI) (National Institutes of HealthGrant U01 AG024904) and DOD ADNI (Depart-ment of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute onAging, the National Institute of Biomedical Imag-ing and Bioengineering, and through generous con-tributions from the following: AbbVie, AlzheimersAssociation; Alzheimer’s Drug Discovery Founda-tion; Araclon Biotech; BioClinica, Inc.; Biogen;Bristol-Myers Squibb Company; CereSpir, Inc.;Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.;Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genen-tech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.;Janssen Alzheimer Immunotherapy Research & De-velopment, LLC.; Johnson & Johnson Pharma-ceutical Research & Development LLC.; Lumos-ity; Lundbeck; Merck & Co., Inc.; Meso Scale Di-agnostics, LLC.; NeuroRx Research; NeurotrackTechnologies; Novartis Pharmaceuticals Corpora-tion; Pfizer Inc.; Piramal Imaging; Servier; TakedaPharmaceutical Company; and Transition Thera-peutics. The Canadian Institutes of Health Re-search is providing funds to support ADNI clinicalsites in Canada. Private sector contributions arefacilitated by the Foundation for the National In-stitutes of Health (www.fnih.org). The grantee or-ganization is the Northern California Institute forResearch and Education, and the study is coor-dinated by the Alzheimer’s Therapeutic ResearchInstitute at the University of Southern California.ADNI data are disseminated by the Laboratory forNeuro Imaging at the University of Southern Cali-fornia.

References

References

Ashburner, J., Friston, K.J., 2005. Unified segmentation.NeuroImage 26, 839 – 851.

19



Bron, E.E., Steketee, R.M., Houston, G.C., Oliver, R.A.,Achterberg, H.C., Loog, M., van Swieten, J.C., Ham-mers, A., Niessen, W.J., Smits, M., Klein, S., for theAlzheimer’s Disease Neuroimaging Initiative, 2014. Di-agnostic classification of arterial spin labeling and struc-tural MRI in presenile early stage dementia. Human BrainMapping 35, 4916–4931.

Donohue, M.C., Jacqmin-Gadda, H., Goff, M.L., Thomas,R.G., Raman, R., Gamst, A.C., Beckett, L.A., Jack, C.R.,Weiner, M.W., Dartigues, J.F., Aisen, P.S., 2014. Es-timating long-term multivariate progression from short-term data. Alzheimer’s & Dementia 10, S400 – S410.

Fligner, M.A., Verducci, J.S., 1988. Multistage ranking mod-els. Journal of the American Statistical Association 83,892–901.

Fonteijn, H.M., Modat, M., Clarkson, M.J., Barnes, J.,Lehmann, M., Hobbs, N.Z., Scahill, R.I., Tabrizi, S.J.,Ourselin, S., Fox, N.C., Alexander, D.C., 2012. An event-based model for disease progression and its applicationin familial Alzheimer’s disease and Huntington’s disease.NeuroImage 60, 1880 – 1889.

Gousias, I.S., Rueckert, D., Heckemann, R.A., Dyet, L.,Boardman, J.P., Edwards, A.D., Hammers, A., 2008. Au-tomatic segmentation of brain MRIs of 2-year-olds into 83regions of interest. NeuroImage 40, 672–684.

Hammers, A., Allom, R., Koepp, M.J., Free, S.L., Myers,R., Lemieux, L., Mitchell, T.N., Brooks, D.J., Duncan,J.S., 2003. Three-dimensional maximum probability at-las of the human brain, with particular reference to thetemporal lobe. Human Brain Mapping 19, 224–247.

Huang, J., Alexander, D., 2012. Probabilistic event cascadesfor Alzheimer’s disease, in: Pereira, F., Burges, C.J.C.,Bottou, L., Weinberger, K.Q. (Eds.), Advances in NeuralInformation Processing Systems 25. Curran Associates,Inc., pp. 3095–3103.

Iturria-Medina, Y., Sotero, R.C., Toussaint, P.J., Mateos-Prez, J.M., Evans, A.C., on behalf of the Alzheimer’s Dis-ease Neuroimaging Initiative, 2016. Early role of vasculardysregulation on late-onset Alzheimer’s disease based onmultifactorial data-driven analysis. Nature Communica-tions 7, 11934.

Jack, C.R., Bennett, D.A., Blennow, K., Carrillo, M.C.,Dunn, B., Haeberlein, S.B., Holtzman, D.M., Jagust, W.,Jessen, F., Karlawish, J., Liu, E., Molinuevo, J.L., Mon-tine, T., Phelps, C., Rankin, K.P., Rowe, C.C., Schel-tens, P., Siemers, E., Snyder, H.M., Sperling, R., Elliott,C., Masliah, E., Ryan, L., Silverberg, N., 2018. NIA-AA research framework: Toward a biological definition ofAlzheimer’s disease. Alzheimer’s & Dementia 14, 535 –562.

Jack Jr., C.R., Knopman, D.S., Jagust, W.J., Petersen,R.C., Weiner, M.W., Aisen, P.S., Shaw, L.M., Vemuri,P., Wiste, H.J., Weigand, S.D., Lesnick, T.G., Pankratz,V.S., Donohue, M.C., Trojanowski, J.Q., 2013. Trackingpathophysiological processes in Alzheimer’s disease: anupdated hypothetical model of dynamic biomarkers. TheLancet Neurology 12, 207 – 216.

Kumar, R., Vassilvitskii, S., 2010. Generalized distances be-tween rankings, in: Proceedings of the 19th InternationalConference on World Wide Web, ACM, New York, NY,USA. pp. 571–580.

Meila, M., Bao, L., 2010. An exponential model for infiniterankings. J. Mach. Learn. Res. 11, 3481–3518.

Oxtoby, N.P., Alexander, D.C., 2017. Imaging plus X: mul-timodal models of neurodegenerative disease. Current

Opinion in Neurology 30, 371–379.Prince, M., Wimo, A., Guerchet, M., Ali, G.C., Wu, Y.T.,

Prina, M., 2015. World Alzheimer’s report 2015, theglobal impact of dementia: An analysis of prevalence, in-cidence, cost and trends. Alzheimer’s Disease Int’l .

Sabuncu, M.R., Bernal-Rusiel, J.L., Reuter, M., Greve,D.N., Fischl, B., 2014. Event time analysis of longitu-dinal neuroimage data. NeuroImage 97, 9 – 18.

Schmidt-Richberg, A., Ledig, C., Guerrero, R., Molina-Abril, H., Frangi, A., Rueckert, D., on behalf of theAlzheimers Disease Neuroimaging Initiative, 2016. Learn-ing biomarker models for progression estimation ofAlzheimers disease. PLoS ONE 11, 1–27.

Schott, J.M., Bartlett, J.W., Fox, N.C., Barnes, J., for theAlzheimer’s Disease Neuroimaging Initiative Investiga-tors, 2010. Increased brain atrophy rates in cognitivelynormal older adults with low cerebrospinal fluid Aβ1−42.Annals of Neurology 68, 825–834.

de Souza, L.C., Chupin, M., Lamari, F., Jardel, C., Leclercq,D., Colliot, O., Lehricy, S., Dubois, B., Sarazin, M., 2012.CSF tau markers are correlated with hippocampal volumein Alzheimer’s disease. Neurobiology of Aging 33, 1253 –1257.

Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan,A., Yushkevich, P.A., Gee, J.C., 2010. N4ITK: ImprovedN3 bias correction. IEEE Transactions on Medical Imag-ing 29, 1310–1320.

Venkatraghavan, V., Bron, E.E., Niessen, W.J., Klein,S., 2017. A discriminative event based model forAlzheimer’s disease progression modeling, in: Nietham-mer, M., Styner, M., Aylward, S., Zhu, H., Oguz, I., Yap,P.T., Shen, D. (Eds.), Information Processing in Med-ical Imaging, Springer International Publishing, Cham.pp. 121–133.

Young, A.L., Marinescu, R.V.V., Oxtoby, N.P., Bocchetta,M., Yong, K., Firth, N., Cash, D.M., Thomas, D.L., Dick,K.M., Cardoso, J., van Swieten, J., Borroni, B., Gal-imberti, D., Masellis, M., Tartaglia, M.C., Rowe, J.B.,Graff, C., Tagliavini, F., Frisoni, G., Jr., R.L., Finger,E., Medona, A., Sorbi, S., Warren, J.D., Crutch, S., Fox,N.C., Ourselin, S., Schott, J.M., Rohrer, J.D., Alexander,D.C., GENFI, ADNI, 2017. Uncovering the heterogene-ity and temporal complexity of neurodegenerative diseaseswith subtype and stage inference. bioRxiv .

Young, A.L., Oxtoby, N.P., Daga, P., Cash, D.M., Fox,N.C., Ourselin, S., Schott, J.M., Alexander, D.C., 2014.A data-driven model of biomarker changes in sporadicAlzheimer’s disease. Brain 137, 2564–2577.

Young, A.L., Oxtoby, N.P., Huang, J., Marinescu, R.V.,Daga, P., Cash, D.M., Fox, N.C., Ourselin, S., Schott,J.M., Alexander, D.C., 2015a. Multiple orderings ofevents in disease progression. Springer International Pub-lishing, Cham. pp. 711–722.

Young, A.L., Oxtoby, N.P., Ourselin, S., Schott, J.M.,Alexander, D.C., 2015b. A simulation system forbiomarker evolution in neurodegenerative disease. Medi-cal Image Analysis 26, 47 – 56.

20

arxiv:1808.03604v1 [cs.lg] 10 aug 2018 · 2019-08-15 · disease progression timeline estimation...

Documents