
Prediction of the hemoglobin level in hemodialysis patients using machine learning techniques

José M. Martínez-Martínez a,∗, Pablo Escandell-Montero a, Carlo Barbieri b, Emilio Soria-Olivas a, Flavio Mari b, Marcelino Martínez-Sober a, Claudia Amato b, Antonio J. Serrano López a, Marcello Bassi b, Rafael Magdalena-Benedito a, Andrea Stopper b, José D. Martín-Guerrero a, Emanuele Gatti b,c

a IDAL, Intelligent Data Analysis Laboratory, University of Valencia, Electronic Engineering Department, Av de la Universidad, s/n, Burjassot, 46100 Valencia, Spain
b Fresenius Medical Care, Bad Homburg, Germany
c Department of Clinical Medicine and Biotechnology, Danube University Krems, Austria

Article info

Article history:

Received 2 December 2013

Received in revised form 2 July 2014

Accepted 4 July 2014

Keywords:

Prediction

Hemoglobin

Chronic renal failure

Hemodialysis
Machine learning

Abstract

Patients who suffer from chronic renal failure (CRF) tend to suffer from an associated anemia as well. Therefore, it is essential to know the hemoglobin (Hb) levels in these patients. The aim of this paper is to predict the hemoglobin (Hb) value using a database of European hemodialysis patients provided by Fresenius Medical Care (FMC), in order to improve the treatment of these patients. For the prediction of Hb, both analytical measurements and medication dosages of patients suffering from chronic renal failure (CRF) are used. Two kinds of models were trained, global and local models. In the case of local models, clustering techniques based on hierarchical approaches and the adaptive resonance theory (ART) were used as a first step, and then a different predictor was used for each obtained cluster. Different global models have been applied to the dataset, such as Linear Models, Artificial Neural Networks (ANNs), Support Vector Machines (SVM) and Regression Trees, among others. Also, a relevance analysis has been carried out for each predictor model, thus finding those features that are most relevant for the given prediction.

© 2014 Elsevier Ireland Ltd. All rights reserved.

∗ Corresponding author. Tel.: +34 963543421. E-mail address: [email protected] (J.M. Martínez-Martínez).
http://dx.doi.org/10.1016/j.cmpb.2014.07.001
0169-2607/© 2014 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Patients who suffer from chronic renal failure (CRF) tend to suffer from an associated anemia as well. Hemodialysis is the most common treatment for patients with end-stage renal disease (ESRD), an illness that has reached an important social and economic impact. Currently, erythropoietin (EPO) is the treatment of choice for this kind of anemia. The use of this drug has greatly reduced cardiovascular problems and the necessity of multiple transfusions. There are significant risks associated with erythropoietic stimulating factors (ESFs), such as thromboembolisms and vascular problems [1], if hemoglobin (Hb) levels are too high or if they increase too fast (they may oscillate considerably from one month to another). Because of this, predicting the hemoglobin value from both analytical measurements and medication dosage is of great interest [2–4].


Thus, it is important to have a predictor of hemoglobin for the next month, because the prescription of EPO can control the level of Hb in a future month. For example, if it is known that a patient will present very high levels of Hb in the next month if a given dose of EPO is administered, then this can be corrected by a modified prescription of EPO that keeps the patient at appropriate levels. This is an important field of research since it is a very challenging problem [5–10].

The National Kidney Foundation Kidney Disease Outcomes Quality Initiative (NKF-K/DOQI) guidelines recommend that patients with chronic kidney disease (CKD) should maintain a target hemoglobin (Hb) concentration between 11 and 12 g/dl; the upper limit is set at 13 g/dl, since none of the trials have shown a benefit of higher Hb targets [11]. Effective anemia management is complex and expensive. Although the common procedure is to follow the guidelines published by government agencies and international organizations (mainly NKF-K/DOQI), patients' hemoglobin levels oscillate through the target range during the treatment, and only approximately one-third (38%) are within the target range at any given time [12]. This behavior is not surprising given the economic and medical pressures to avoid falling below or exceeding specific levels, and due to the fact that the response to ESAs can change over time in the same patient or can be very different among different patients [13]. In the latter case, the so-called EPO-resistant patients are the most remarkable example. According to the literature, about 5–10% of patients have either a blunted or absent response to ESAs, despite high-dose therapy [14].

The goal of this work is to predict the hemoglobin level for the next month, from both analytical measurements and medication dosage (such as EPO), in patients undergoing hemodialysis, using machine learning techniques. For this purpose, two main approaches have been proposed:

1. Global models: this kind of model proposes a unique solution (model) for the whole cohort of patients.
2. Local models: local models propose different models for different patient profiles. The approach is based on first obtaining relevant profiles by means of clustering techniques and, after that, applying a different model to each profile. Therefore, the models are focused on sets formed by similar patients; hence the sets are more homogeneous, and local models tend to be simpler than global ones.

The organization of the paper is as follows. Section 2 points out some aspects of the data used in this work. Section 2.2 reviews the inclusion criteria and the processing of missing values. Section 3 presents the clustering approaches used as the first step in local Hb predictors. Section 4 presents the prediction models that were used both globally and locally. Section 5 discusses the achieved results. Finally, Section 6 gives some concluding remarks and suggestions for future work.


2. Patient data

The analysis presented in this paper is based on the data of an anemia database, which contains long-term records of the patients undergoing hemodialysis in the Fresenius Medical Care (FMC) clinics, collected with the EuCliD system [15]. The Clinical Management System EuCliD (European Clinical Database) is an electronic system designed by the FMC company, used to manage dialysis clinics' processes and to collect all relevant data related to dialysis clinical practice. EuCliD deployment started in 2004 with the first pilot clinic in Italy; currently the system is implemented in 626 clinics belonging to 25 countries all over the world. Using this system, all the data of patients treated in FMC clinics all over the world are stored in a database that is the property of the FMC company. In particular, this paper presents the results for Spanish and Italian clinics. Spain and Italy were selected by FMC as two good representatives of their clinics, since FMC has long experience in both countries. Moreover, it should also be emphasized that, since Italy and Spain showed different characteristics, this allowed working on two different scenarios which are representative of the rest of the countries.

2.1. Data merging

For each patient in the database, both analytical measurements and medication dosages are recorded at periodic time intervals. For the sake of reducing computational complexity, all the data from this huge database have been merged into a single Matlab numerical data matrix for each country under study, with one row (record) per single unit of data (either a blood test or a medication dosage) and as many columns as types of information (variables) are available in the database. Afterwards, these data are merged at hemoglobin sample intervals, as the main goal of the study is the modeling and prediction of the hemoglobin levels at the sampled instants. Data merging is the process of arranging the records according to the intervals between hemoglobin samples into a new dataset. This task generates a new dataset by merging all the records between each two consecutive hemoglobin samples into a single record. Due to the process of joining all the information present in the original database into a single table, each record (row of the data matrix) contains a single piece of information about a certain patient (it can be a drug prescription, a laboratory test result, etc.). The next step after data pre-processing is to arrange the different records of each patient according to the hemoglobin samples they contain. Taking the hemoglobin sample times as reference, we have joined all the records between Hb(t) and Hb(t + 1), where t and t + 1 are the times corresponding to two consecutive monitored hemoglobin samples. Those records are related to the dose of iron and EPO the patient receives during dialysis, and to the results of the laboratory measurements performed on the patient. All this information is stored in a single record Hb(t), which will contain the accumulated dose of iron and EPO and the summary of all the laboratory measurements carried out on the patient during this period. The idea in the prediction stage is to predict the hemoglobin at time t + 1 using information related to the state of the patient at time t plus the treatment received between t and t + 1. Fig. 1 shows the merging process graphically.

Fig. 1 – All the records between two hemoglobin values Hb(t − 1) and Hb(t) are merged and the information is included in a single record Hb(t − 1).


The merging process consists of finding all the different patients in the database, taking all the records that belong to each patient and sorting them according to the date of the record (laboratory test, drug administration, treatment, etc.). Then, hemoglobin values are used as reference for generating the new dataset. The next step is to extract all the records between two consecutive hemoglobin samples (the merged records subset). The process finally merges all these records into a single row and generates some additional variables. For instance, the medication column is separated into two columns, one for EPO and another one for iron; the accumulated dose and the number of administrations are also calculated and stored for model development. This task is carried out iteratively for all the variables within this subset of records.
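As an illustration of this merging step, the following Python/pandas sketch collapses all the records between two consecutive Hb samples into a single row. It is only a minimal sketch, not the authors' actual Matlab pipeline, and the column names 'patient_id', 'date', 'hb', 'epo_dose' and 'iron_dose' are hypothetical.

```python
import pandas as pd

def merge_between_hb_samples(records: pd.DataFrame) -> pd.DataFrame:
    """Collapse all records between two consecutive Hb samples into one row.

    `records` is assumed to have one row per event with (hypothetical) columns:
    'patient_id', 'date', 'hb' (NaN except on lab days), 'epo_dose', 'iron_dose'.
    """
    records = records.sort_values(["patient_id", "date"])
    merged_rows = []
    for pid, patient in records.groupby("patient_id"):
        patient = patient.reset_index(drop=True)
        # Each Hb measurement closes one merging interval.
        hb_idx = patient.index[patient["hb"].notna()]
        for start, end in zip(hb_idx[:-1], hb_idx[1:]):
            interval = patient.loc[start:end - 1]           # records up to the next Hb sample
            merged_rows.append({
                "patient_id": pid,
                "date": patient.loc[start, "date"],
                "hb": patient.loc[start, "hb"],              # Hb(t)
                "hb_next": patient.loc[end, "hb"],           # Hb(t + 1), the prediction target
                "epo_total": interval["epo_dose"].sum(),     # accumulated EPO dose in [t, t+1)
                "epo_admin": interval["epo_dose"].notna().sum(),
                "iron_total": interval["iron_dose"].sum(),   # accumulated iron dose in [t, t+1)
            })
    return pd.DataFrame(merged_rows)
```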

When the merging process is concluded, a new matrix is obtained. The number of rows of this matrix equals the number of hemoglobin values in the database, and the number of columns (number of variables) is 125, which will be reduced for training the models. Table 1 shows the number of records and patients for each country dataset.

In order to obtain useful models, a cross-validation method was carried out. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set) and validating the analysis on the other subset (called the validation set or test set). Since a particularly lucky (or unlucky) choice of the two subsets may be responsible for very good (or bad) results in terms of the committed error, different training and test datasets were used to assess the models' performance. That is, multiple rounds of cross-validation are performed using different partitions to reduce variability, and the validation results are averaged over the rounds. For this reason, as stated above, many different training and test datasets were generated in order to avoid the effects of the random distribution of records. After carrying out each different partition of the whole dataset into training and test data, each subset was standardized to zero mean and unit variance. Training datasets are formed by 66% of the records, and test datasets by the remaining 34%.

Table 1 – Information of imported files.

Country | Records | Patients
Italy (IT) | 1,094,597 | 2764
Spain (SP) | 2,432,262 | 10,247
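The repeated 66%/34% partitioning and the per-partition standardization described above can be sketched as follows. This is a minimal illustration using scikit-learn; the linear model merely stands in for any of the predictors discussed in Section 4.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def repeated_holdout_mae(X, y, n_rounds=10, seed=0):
    """Average test MAE over repeated random 66%/34% train/test partitions."""
    maes = []
    for r in range(n_rounds):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.34, random_state=seed + r)
        scaler = StandardScaler().fit(X_tr)     # zero mean, unit variance, fitted on training data
        model = LinearRegression().fit(scaler.transform(X_tr), y_tr)
        maes.append(mean_absolute_error(y_te, model.predict(scaler.transform(X_te))))
    return float(np.mean(maes))
```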

2.2. Inclusion criteria and missing values

Several inclusion criteria were applied to remove all the inconsistent records. Table 2 indicates the ranges of accepted values. Patients with values outside those intervals were removed from the database. Variables not included in this table were not affected by the inclusion criteria.

An additional condition was related to the minimum number of patients' records needed to be considered in the final dataset. This value was fixed to six because, when generating datasets for prediction, the dynamics of the system are taken into account by including the values of some variables in the previous periods; six was a reasonable figure for this purpose. Another condition was related to age: only patients older than 18 years were included in the study.

In addition to the inclusion criteria included in Table 2, some extra conditions were applied:

• Patients who presented an increase of weight during dialysis were removed, because obviously either no dialysis was carried out or a mistake in the data acquisition was committed.

• Patients with a variation of hemoglobin higher than 7 g/dl between two consecutive Hb samples were excluded.

After applying the conditions mentioned above, a series of routine actions were performed in order to solve some technical problems for the forecasting algorithms. An important problem deals with the number of gaps (missing values) present in the data matrix associated with measured values in patients' monitoring. In general, empty values were estimated from values in adjacent records or interpolated using mean or median values, depending on the type of variable. If a variable is empty in all the records of a patient, this patient was removed from the database.
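A minimal sketch of how the inclusion ranges and the gap-filling strategy could be applied is shown below. The ranges listed in INCLUSION_RANGES are only a hypothetical subset of Table 2, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical subset of the ranges in Table 2, keyed by (assumed) column names.
INCLUSION_RANGES = {
    "age": (18, 90),          # years
    "hb": (2, 24),            # g/dl
    "ferritin": (0, 5000),    # ng/dl
}

def apply_inclusion_criteria(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose values fall outside the accepted ranges."""
    mask = pd.Series(True, index=df.index)
    for col, (lo, hi) in INCLUSION_RANGES.items():
        mask &= df[col].between(lo, hi) | df[col].isna()
    return df[mask]

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps from adjacent records of the same patient, then fall back to column medians.

    Patients for whom a variable is empty in all their records would be removed
    instead of imputed (that step is not shown here).
    """
    df = df.sort_values(["patient_id", "date"])
    df = df.groupby("patient_id", group_keys=False).apply(lambda g: g.ffill().bfill())
    return df.fillna(df.median(numeric_only=True))
```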

3. Clustering for local models

The obvious heterogeneity shown by Chronic Renal Failure (CRF) patients in terms of their response to the anemia treatment suggests that clustering methods can help find different patient profiles. In this paper, two clustering approaches were applied, namely, Adaptive Resonance Theory (ART) [16] and hierarchical clustering [17].

Table 2 – Inclusion criteria. Values outside [minimum value, maximum value] were excluded.

Variable | Min. value | Max. value | Units
Patient age | 18 | 90 | years
Urea plasma level before the dialysis treatment (predialysis) | 0.6 | 700 | mg/ml
Urea plasma level after the dialysis treatment (postdialysis) | 0.6 | 300 | mg/ml
Hemoglobin | 2 | 24 | g/dl
Ferritin | 0 | 5000 | ng/dl
Transferrin Saturation Index (marker of iron utilization) | 0 | 100 | %
Albumin plasma level | 1 | 10 | g/dl
C-Reactive Protein plasma level (one of the main markers of inflammation) | 0 | 500 | mg/dl
Phosphate | 1 | 16 | g/dl
Leukocytes | 300 | 100,000 | cells/ml
Neutrophils (subtype of blood white cells; a high neutrophil plasma level is a marker of inflammation/infection) | 10 | 99 | %
OcmKtv (Kt/V measured by the Online Clearance Monitoring system) | 0 | 3 |
Mean Blood Flow (Blood Volume/Effective Dialysis Time) | 0 | 1000 | ml/min
Blood Volume (Total volume of processed blood) | 0 | 300 | l
Total Infusion | 5 | 120 | l
Accumulated EPO Darbe dose | 0 | 4000 | µg/month

Although it is out of the scope of this paper, finding patient profiles is also useful to detect anomalous situations, e.g., a high increase in Hb levels without any changes in EPO dosages, as well as more usual situations like EPO-resistant patients or good responders; even though these are relatively usual situations, the prediction/treatment for EPO-resistant patients or good responders must obviously be completely different.

Four different kinds of data have been taken into account to produce clusters for each country (Spain and Italy). Since the paramount interest is to profile patients according to their Hb response to EPO, not all the features were used in the clustering process¹; thus, the obtained clusters can be more easily interpreted by analyzing the values of the prototypes. Therefore, the following kinds of data were used:

1. Current value of Hb and the four previous values of Hb (the four previous values of Hb, corresponding to the four previous months, were selected because the lifespan of normal red blood cells is from 100 to 120 days).
2. The same but also including EPO dosages in the considered period of time (until four previous values of Hb); that will allow detecting not only performance in terms of the degree of anemia but also new profiles, since relationships between Hb and EPO can show EPO-resistant profiles, patients who do respond well to the treatment, etc.
3. Current value of Hb and time derivatives² of the differences between consecutive values of Hb in the four previous samples. The purpose of this approach was to find out whether working with Hb changes could make a difference, providing new profiles not found when dealing with raw values of Hb.
4. The same but including EPO dosages in the considered period of time.

¹ Only features related to EPO and Hb evolution have been taken into account.
² Derivative of the Hb level with respect to time.

Local models were obtained by training the models detailed in the next section taking into account only those records included in each cluster. That is, as many models as clusters were built. In general, the number of clusters for each country and kind of data used was between five and eight. In hierarchical clustering, the number of clusters was obtained using two criteria: on the one hand, the experts' opinion (the medical board suggested various choices for the number of clusters), and on the other hand, the "lifetime" approach in the dendrogram. This is an intuitive approach, which consists of searching the proximity dendrogram for clusters that have a large lifetime. The lifetime of a cluster is defined as the absolute value of the difference between the proximity level at which it is created and the proximity level at which it is absorbed into a larger cluster. It is worth mentioning that when using ART networks it is not necessary to choose the number of clusters in advance; the network finds the number corresponding to the degree of similarity chosen, as explained in Section 3.2. Therefore, with this technique the number of clusters is not extracted by intuition or by visualizing the dendrogram, so the clustering obtained can sometimes represent the data better than when using the hierarchical clustering method, since it is not biased by the subjectivity of the expert.
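As an illustration of the dendrogram-based criterion, the sketch below cuts a SciPy dendrogram at the largest gap between consecutive merge levels, which is a simple proxy for the "lifetime" rule described above; it is not the exact procedure followed by the authors and the medical board.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def clusters_by_lifetime(X, method="ward"):
    """Cut the dendrogram inside the largest gap between consecutive merge levels.

    A long-lived cluster survives over a wide range of proximity levels, which
    shows up as a large gap between consecutive merge heights in the linkage.
    """
    Z = linkage(X, method=method)
    heights = Z[:, 2]                                   # merge (proximity) levels, increasing
    gaps = np.diff(heights)
    cut = heights[np.argmax(gaps)] + gaps.max() / 2     # cut inside the largest gap
    return fcluster(Z, t=cut, criterion="distance")     # cluster label per observation
```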

The membership of an input pattern was found by calculating the distance from the pattern to the cluster centroids. Once it was known which cluster the input pattern belonged to, the model corresponding to that cluster was used (the model trained with data belonging to the cluster). As previously mentioned, it should be emphasized that the inputs of the clustering algorithm were not all patient records, but only those containing five consecutive hemoglobin samples. This means that a particular patient can show different behaviors (that is, belong to different clusters) at different time intervals.

3.1. Hierarchical clustering

Hierarchical Clustering Algorithms (HCA) are characterized, among other things, by the fact that they do not generate a single clustering but produce a set of hierarchically structured clusterings (one in each hierarchical step) [18].


In hierarchical clustering, m different partitions of the input data into clusters are generated, where m is the number of objects in the input data. One of these partitions corresponds to a single cluster made up of all m objects of the input data, while at the opposite extreme there is a partition corresponding to m clusters, each made up of just one object. Between these extremes there is a partition with 2 clusters, one with 3 clusters, and so on, up to a partition with m − 1 clusters. The key characteristic of these partitions, which makes them hierarchical, is that the partition with r clusters can be used to produce the partition with r − 1 clusters by merging two clusters, and it can also be used to produce the partition with r + 1 clusters by splitting a cluster into two [18].

3.2. Adaptive Resonance Theory (ART)

The Adaptive Resonance Theory (ART) proposes an approach to deal with the stability–plasticity dilemma [16]. ART operates as a two-stage process. Each time a pattern is presented, an appropriate cluster unit is chosen, and that cluster's weights are adjusted to let the cluster unit learn the pattern. The weights on a cluster unit are considered to be a prototype for the patterns assigned to that cluster. The second and crucial stage of the recognition process is to test whether the prototype forms an adequate representation of the input pattern. Once a good-enough winning prototype has been selected, the process is referred to as a vigilance test. From this, either the prototype is updated to form a running average of the input vector, or a new prototype is initiated. ART networks allow the algorithm's designer to control the degree of similarity of patterns placed in the same cluster; once this choice is made, it is not necessary to choose the number of clusters in advance, as the network finds the number corresponding to the chosen degree of similarity.
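The following sketch illustrates the vigilance mechanism in a strongly simplified, ART-like form for real-valued patterns: the closest prototype is updated as a running average when the vigilance test passes, and a new cluster is created otherwise. It is only meant to convey the idea and is not the ART network used in this paper; the distance-based similarity measure and the learning rate are assumptions.

```python
import numpy as np

def art_like_clustering(patterns, vigilance=0.9, lr=0.5):
    """Simplified ART-style clustering: the number of clusters follows from the vigilance."""
    prototypes, labels = [], []
    for x in np.asarray(patterns, dtype=float):
        if prototypes:
            dists = [np.linalg.norm(x - p) for p in prototypes]
            j = int(np.argmin(dists))
            similarity = 1.0 / (1.0 + dists[j])
            # Vigilance test: if the match is good enough, update the winning prototype.
            if similarity >= vigilance:
                prototypes[j] += lr * (x - prototypes[j])   # running average towards the pattern
                labels.append(j)
                continue
        prototypes.append(x.copy())                         # otherwise create a new cluster
        labels.append(len(prototypes) - 1)
    return np.array(labels), np.array(prototypes)
```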

4. Prediction models

The clusters obtained by hierarchical clustering and ART were used to model the problem locally, that is, the models explained in this section have been applied separately to each one of the clusters. Moreover, prediction models were also used as global models.

4.1. Linear model

A multiple linear regression is defined, for a k-dimensional data sample (xi, yi), i = 1, 2, . . ., n (with n being the number of instances), by

yi = b0 + b1 · xi1 + b2 · xi2 + · · · + bk · xik (1)

or, equivalently, in more compact matrix terms,

Y = X · b + E (2)

where, for all the n observations considered, Y is a column vector with n rows containing the values of the response variable; X is a matrix with n rows and k + 1 columns containing, in each column, the values of the explanatory variables for the n observations, plus a column (referring to the intercept) containing n values equal to 1; b is a vector with k + 1 rows containing all the model parameters to be estimated on the basis of the data (the intercept and the k slope coefficients relative to each explanatory variable); and E is a column vector of length n containing the error terms. It is necessary to estimate the vector of parameters (b0, b1, . . ., bk) on the basis of the available data.

4.2. Regression Trees

Tree models begin by producing a classification of observations into groups and then obtaining a score for each group [19,20]. Tree models are usually divided into Regression Trees, when the response variable is continuous, and classification trees, when the response variable is quantitative discrete or qualitative (categorical) [21]. Tree models can be defined as a recursive procedure through which a set of n statistical units is progressively divided into groups, according to a division rule that aims to maximize a homogeneity or purity measure of the response variable in each of the obtained groups. At each step of the procedure, a division rule is specified by the choice of an explanatory variable to split on and the choice of a splitting rule for the variable, which establishes how to partition the observations. The main result of a tree model is a final partition of the observations. To achieve this, it is necessary to specify stopping criteria for the division process. The output of the analysis is usually represented as a tree. This implies that the partition performed at a certain level is influenced by the previous choices. The two main aspects are the division criteria and the methods employed to reduce the dimension of the tree (pruning).
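As a hedged illustration, a regression tree for the next Hb value could be fitted as sketched below; the depth and minimum leaf size play the role of the stopping/pruning controls mentioned above, and the specific values are arbitrary placeholders rather than the settings used in this work.

```python
from sklearn.tree import DecisionTreeRegressor

def fit_regression_tree(X_train, y_train, max_depth=6, min_samples_leaf=50):
    """Fit a regression tree for Hb(t + 1); depth and leaf size act as stopping criteria."""
    return DecisionTreeRegressor(
        max_depth=max_depth, min_samples_leaf=min_samples_leaf).fit(X_train, y_train)
```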

4.3. Bagging

Bootstrap aggregation, or bagging, is a technique [22] that can be used with many classification and regression methods to reduce the variance associated with prediction, and thereby improve the prediction process. Bagging is the idea of collecting a random sample of observations into a bag (the term itself is an abbreviation of bootstrap aggregation). Multiple bags are made up of randomly selected observations obtained from the original observations of the training dataset. Many bootstrap samples are drawn from the available data, some prediction method is applied to each bootstrap sample, and then the results are combined, by averaging for regression and simple voting for classification, to obtain the overall prediction, with the variance being reduced due to the averaging. In each of these sets, some examples are not chosen and some are duplicated; on average, each set contains about 63% of the original examples in our experiments. Bagging works best with unstable learners, that is, those that produce different generalization patterns with small changes to the training data. Averaging over a collection of fitted values can help compensate for over-fitting [23].
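The bootstrap-and-average idea can be made concrete with the short sketch below, which draws bootstrap samples, fits one regression tree per bag and averages the predictions. It is illustrative only; the base learner, the number of bags and the assumption that the inputs are NumPy arrays are all choices made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_prediction(X_train, y_train, X_test, n_bags=25, seed=0):
    """Bootstrap aggregation: fit one tree per bootstrap sample and average the predictions."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)            # sampling with replacement (~63% unique rows)
        tree = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
        preds.append(tree.predict(X_test))
    return np.mean(preds, axis=0)                   # averaging reduces the variance
```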

4.4. Multilayer Perceptron (MLP)

A Multilayer Perceptron (MLP) is an Artificial Neural Network (ANN) formed by elementary processing units, the so-called neurons. A typical neuron model is shown in Fig. 2 [24].


Fig. 2 – Diagram of an artificial neuron.



As can be inferred from Fig. 2, a neuron without an activation function is equivalent to a multivariate analysis. Therefore, a non-linear combination should be more powerful than a multivariate analysis [25]. Neurons are arranged in layers to form an MLP. The first layer is known as the input layer, and the last one is called the output layer. All the other layers are called hidden layers [26]. This kind of arrangement enables the neuron outputs to be used as inputs to neurons of following layers (non-recurrent networks) and/or previous layers (recurrent networks). Fig. 3 shows a typical MLP structure.

Fig. 3 – Scheme of a Multilayer Perceptron.

The capability of this model should be taken into account: it is able to establish any relation between two datasets. This fact is mathematically proven by Cybenko's Theorem [24].
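For illustration, an MLP regressor with one hidden layer could be trained on the standardized features as sketched below; the hidden-layer size and the other hyperparameters are placeholders, not the configuration used in the paper.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_mlp(X_train, y_train, hidden=(20,)):
    """One-hidden-layer MLP regressor on standardized inputs (a stand-in configuration)."""
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=0))
    return model.fit(X_train, y_train)
```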

4.5. Support Vector Machines (SVM)

Support Vector Machines (SVM) are another model, which acts as a non-probabilistic binary linear classifier [27]. SVM works by constructing a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can produce a linearly separable solution to a classification or regression problem. In the same manner as in the non-linear SVM classification approach, a non-linear mapping can be used to map the data into a high-dimensional feature space where linear regression is performed [28]. The kernel approach is also employed to address the curse of dimensionality.
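A minimal support vector regression sketch with an RBF kernel is given below; the C and epsilon values are arbitrary placeholders rather than the settings used in this work.

```python
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_svr(X_train, y_train):
    """Epsilon-SVR with an RBF kernel: non-linear mapping plus linear regression in feature space."""
    return make_pipeline(StandardScaler(),
                         SVR(kernel="rbf", C=10.0, epsilon=0.1)).fit(X_train, y_train)
```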

4.6. Committee of experts

A committee is an ensemble method that consists in taking a combination of several models to form a new model. In the case of a linear combination, the committee learning algorithm tries to train a set of models {s1, . . ., sP} and chooses coefficients {β1, . . ., βP} to combine them as


y(xi) = Σ_{k=1}^{P} βk · sk(xi) = si^T · β (3)

where si = [s1(xi), . . ., sP(xi)]^T is the vector of predictions of the committee members on xi.

In this work, two methods have been used to compute the coefficients that combine the committee members. The first method uses least squares regression, and the second one uses quantile regression. Quantile regression [29] seeks to model the relationship between the input (x) and the output (y) for different quantiles of the distribution of the error committed by the model, whereas the method of least squares minimizes the sum of squared errors committed by the model.
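The least-squares variant of the committee can be written compactly as a linear system, as in the sketch below; the quantile-regression variant would replace the least-squares fit with a quantile loss (for example via a dedicated quantile regression routine), which is omitted here.

```python
import numpy as np

def combine_committee(member_preds, y):
    """Least-squares weights for a linear committee (Eq. (3)).

    member_preds: (n, P) matrix whose columns are the predictions s_1..s_P of the
    committee members on the same n samples; y: target values. Returns beta such
    that y_hat = member_preds @ beta.
    """
    beta, *_ = np.linalg.lstsq(member_preds, y, rcond=None)
    return beta

# At test time: y_hat = member_preds_test @ beta
```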

5. Results

This section summarizes the achieved results in order to compare the performance of the models previously explained.

5.1. Feature relevance

This section presents an analysis of the most relevant features. Due to the high number of variables available in the datasets, it is probable that some of them are irrelevant or redundant for predicting the Hb level. Adding irrelevant attributes to a dataset often confuses machine learning systems, thus becoming noise instead of useful information. Moreover, it entails worse generalization, and a larger number of patterns is needed to fit the model. Therefore, the data must be preprocessed to select a subset of variables before applying any learning method.

Two techniques have been applied in order to discriminate the irrelevant features. On the one hand, statistically significant features in a linear model were found; this set of features was used for training the linear model. On the other hand, a sensitivity analysis was performed using a Multilayer Perceptron (MLP) model; this latter selection of features was then used for the rest of the models presented in this paper. Table A.1 in Appendix A shows the results of both techniques for the Spain and Italy datasets. Only a subset of the most relevant features (30 variables) has been used for model development. For the sake of simplicity, only the 10 most relevant variables are presented in Table A.1, sorted by relevance (most relevant at the top). Table B.1 in Appendix B lists the meaning of the variables included in Table A.1.
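A generic way to obtain such relevance scores is a perturbation-based sensitivity analysis, sketched below. This is only an assumed illustration of the idea, not necessarily the exact sensitivity procedure applied by the authors.

```python
import numpy as np

def input_sensitivity(model, X, delta=0.1):
    """Crude perturbation-based relevance: mean absolute change in the prediction
    when each (standardized) input is shifted by `delta`, one variable at a time."""
    base = model.predict(X)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += delta
        scores.append(np.mean(np.abs(model.predict(Xp) - base)))
    return np.array(scores)          # higher score -> more relevant input
```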



Table 3 – Summary results (test dataset) for patients belonging to Italian clinics. Models are grouped in different blocks depending on the approach used. They are sorted with the best (lowest) MAE first in each block. MAE stands for 'Mean Absolute Error', RMSE for 'Root Mean Square Error', and Q25, Q50 and Q75 for the 25%, 50% and 75% quantiles of the error distribution, respectively. The standard deviation of the whole Italy dataset (training and test) is 1.379.

Model | MAE | RMSE | Q25 | Q50 | Q75

Global models
Bagging | 0.662 | 0.883 | −0.466 | 0.045 | 0.556
SVM* | 0.665 | 0.888 | −0.488 | 0.036 | 0.558
Quantiles* | 0.669 | 0.891 | −0.476 | 0.050 | 0.566
MLP* | 0.672 | 0.897 | −0.461 | 0.047 | 0.569
Random Forest (a) | 0.690 | 0.922 | −0.485 | 0.048 | 0.574
Regression Tree | 0.701 | 0.957 | −0.512 | 0.034 | 0.578

Models based on ART
Bagging Hb derivative | 0.676 | 0.906 | −0.458 | 0.050 | 0.583
Bagging Hb | 0.677 | 0.905 | −0.467 | 0.056 | 0.584
Random Forest Hb | 0.689 | 0.915 | −0.483 | 0.054 | 0.592
Random Forest Hb der | 0.689 | 0.915 | −0.483 | 0.054 | 0.592
Linear Hb derivative* | 0.690 | 0.922 | −0.485 | 0.048 | 0.574
Linear Hb* | 0.695 | 0.927 | −0.487 | 0.062 | 0.591
SVM Hb* | 0.711 | 0.955 | −0.533 | 0.014 | 0.558
Regression Tree Hb derivative | 0.713 | 0.949 | −0.511 | 0.067 | 0.609

Models based on hierarchical clustering
Tree Bagging Hb | 0.667 | 0.895 | −0.481 | 0.043 | 0.545
Tree Bagging Hb der | 0.669 | 0.891 | −0.445 | 0.077 | 0.600
Tree Bagging Hb EPO der | 0.688 | 0.912 | −0.469 | 0.060 | 0.605
SVM Hb derivative* | 0.689 | 0.915 | −0.483 | 0.054 | 0.592
SVM Hb* | 0.689 | 0.916 | −0.461 | 0.078 | 0.617
Tree Bagging Hb EPO | 0.692 | 0.917 | −0.519 | 0.031 | 0.562
SVM Hb EPO* | 0.697 | 0.923 | −0.476 | 0.077 | 0.617
Linear Hb EPO* | 0.701 | 0.932 | −0.495 | 0.062 | 0.603
Linear Hb derivative* | 0.701 | 0.941 | −0.547 | 0.025 | 0.557
Linear Hb* | 0.702 | 0.936 | −0.507 | 0.043 | 0.584

Other models
Committee experts MSE (b) | 0.687 | 0.913 | −0.494 | 0.053 | 0.592
Committee experts quantiles (a) | 0.691 | 0.922 | −0.517 | 0.030 | 0.572
Regression Tree with SVM* | 0.695 | 0.931 | −0.479 | 0.057 | 0.581

Bold highlights the best model in each group of models.
(a) Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [30].
(b) The models marked with * have been used to build the models based on committees of experts (MSE and quantiles).


5.2. Comparison of models’ performance

Due to the many different approaches that have been tested, and in order to present the results in a compact format that makes it easy to compare the performance of the different approaches, Tables 3 and 4 summarize the performance of the prediction carried out by the best models for Italy and Spain, respectively. Results refer to the corresponding test sets. The name of each model (in the case of local models) consists of two terms separated by a low hyphen. The first concerns the prediction model used and the second the type of clustering used for the local prediction. When the term is Hb, only the current value of Hb and the four previous values of Hb are taken into account for the clustering (1st type in Section 3). When it is EPO, the same variables as in the previous case are taken into account but also including EPO dosages in the considered period of time (until four previous values of Hb) (2nd type in Section 3). When the term is Hb derivative, the current value of Hb and the time derivatives (derivative of the Hb level with respect to time) of the differences between consecutive values of Hb in the four previous samples are taken into account (3rd type in Section 3). Finally, when the term is Hb EPO, the same variables as in the previous case are taken into account as well as the EPO dosages in the considered period of time (4th type in Section 3).

As shown in Tables 3 and 4, many different approaches have been tested, with different performances, but there is not a model that clearly outperforms the others. It should be noted that the error distribution between the quantiles Q25 and Q75 is between −0.5 and 0.5, so a MAE greater than 0.6 is due to a small number of patterns (outliers) with large errors. Prediction errors in the test cohorts of patients were around 0.6 g/dl. Moreover, it was thought that local models would present a significant improvement regarding the committed error, but this assumption was finally refuted. This could be because the dataset has already been exploited to obtain the maximum performance; hence, there is not a model that clearly outperforms the others.
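For reference, the figures of merit reported in Tables 3 and 4 can be computed as in the following sketch; the sign convention of the error (prediction minus observation) is an assumption.

```python
import numpy as np

def error_summary(y_true, y_pred):
    """MAE, RMSE and the 25%/50%/75% quantiles of the error distribution, as in Tables 3 and 4."""
    err = np.asarray(y_pred) - np.asarray(y_true)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "Q25": float(np.percentile(err, 25)),
        "Q50": float(np.percentile(err, 50)),
        "Q75": float(np.percentile(err, 75)),
    }
```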

Table 4 – Summary results (test dataset) for patients belonging to Spanish clinics. Models are sorted with the best (lowest) MAE first. MAE stands for 'Mean Absolute Error', RMSE for 'Root Mean Square Error', and Q25, Q50 and Q75 for the 25%, 50% and 75% quantiles, respectively. The standard deviation of the whole Spain dataset (training and test) is 1.445.

Model | MAE | RMSE | Q25 | Q50 | Q75

Global models
Bagging | 0.673 | 0.905 | −0.525 | 0.018 | 0.513
SVM* | 0.677 | 0.915 | −0.555 | −0.028 | 0.473
Quantiles | 0.680 | 0.918 | −0.512 | 0.021 | 0.516
Linear* | 0.688 | 0.925 | −0.562 | −0.016 | 0.487
MLP* | 0.698 | 0.936 | −0.545 | 0.009 | 0.520
Random Forest | 0.700 | 0.941 | −0.493 | 0.053 | 0.578
Regression Tree* | 0.734 | 0.981 | −0.556 | 0.024 | 0.591

Models based on ART
Bagging Hb derivative | 0.686 | 0.920 | −0.519 | 0.023 | 0.530
Bagging Hb | 0.691 | 0.927 | −0.529 | 0.022 | 0.540
Linear Hb derivative* | 0.699 | 0.941 | −0.524 | 0.014 | 0.541
SVM Hb | 0.700 | 0.941 | −0.493 | 0.053 | 0.578
Linear Hb* | 0.706 | 0.945 | −0.532 | 0.022 | 0.538
Random Forest Hb | 0.708 | 0.948 | −0.502 | 0.049 | 0.580
Random Forest Hb der | 0.708 | 0.948 | −0.502 | 0.049 | 0.580
Regression Tree Hb derivative | 0.727 | 0.975 | −0.536 | 0.006 | 0.570

Models based on hierarchical clustering
Tree Bagging Hb der | 0.688 | 0.925 | −0.477 | 0.062 | 0.566
Tree Bagging Hb | 0.692 | 0.929 | −0.498 | 0.029 | 0.552
Tree Bagging Hb EPO der | 0.698 | 0.943 | −0.499 | 0.035 | 0.561
Tree Bagging Hb EPO | 0.703 | 0.943 | −0.527 | 0.016 | 0.547
SVM Hb derivative* | 0.708 | 0.948 | −0.502 | 0.049 | 0.580
SVM Hb* | 0.711 | 0.955 | −0.539 | 0.024 | 0.530
SVM Hb EPO* | 0.714 | 0.955 | −0.516 | 0.049 | 0.563
Linear Hb* | 0.722 | 0.973 | −0.563 | 0.001 | 0.544
Linear Hb EPO* | 0.730 | 0.982 | −0.512 | 0.059 | 0.591
Linear Hb derivative* | 0.734 | 0.991 | −0.560 | 0.022 | 0.551

Other models
Committee experts MSE (a) | 0.686 | 0.924 | −0.518 | 0.007 | 0.514
Committee experts quantiles | 0.686 | 0.924 | −0.518 | 0.007 | 0.514
Regression Tree with Linear | 0.695 | 0.933 | −0.535 | 0.009 | 0.532
Regression Tree with SVM* | 0.702 | 0.941 | −0.517 | 0.045 | 0.573
Cluster ERI SVM (b) | 0.704 | 0.942 | −0.497 | 0.046 | 0.565
Cluster ERI linear* (a) | 0.705 | 0.946 | −0.533 | 0.027 | 0.549

Bold highlights the best model in each group of models.
(a) The models marked with * have been used to build the models based on committees of experts (MSE and quantiles).
(b) These clusters have been obtained using the EPO Responsiveness Index (required EPO dosage divided by hemoglobin).

6. Conclusions

This paper has presented a summary of the main results derived from the research on the database of European hemodialysis patients provided by Fresenius Medical Care (FMC). A set of many different prediction algorithms has been applied to the dataset. Prediction errors in the test cohorts of patients were around 0.6 g/dl. It should be taken into account that the Hb measure presents a systematic error of up to 0.2 g/dl, which can be accumulated in consecutive measures; in particular, between two consecutive hemoglobin samples, the systematic error may be up to 0.4 g/dl. Therefore, the predictions can be considered at least as acceptable, taking into account the difficulty of obtaining lower errors due to the precision of the measuring machine (0.2 g/dl). Moreover, it should be noted that the error distribution between the quantiles Q25 and Q75 is between −0.5 g/dl and 0.5 g/dl, so the MAE greater than 0.6 g/dl is due to a small number of patterns (outliers) with large error.

Clustering techniques based on hierarchical approaches and the Adaptive Resonance Theory (ART) have been used as the first step in local Hb predictors, in which a different predictor was used for each cluster.

Moreover, a relevance analysis has been carried out for each predictor model, thus finding those features that are most relevant for the given prediction.

Conflict of interest

None declared.

Appendix A. Feature relevance

Table A.1.


Table A.1 – Feature relevance analysis for the Spain and Italy datasets using linear and neural models.

Spain: Linear model | Spain: Neural model | Italy: Linear model | Italy: Neural model
HbActual | HbActual | HbActual | HbActual
Ferritin | EPO3Actual | Ferritin | DryBodyWeightAnteriorTrntP
TransferrinSaturation | DryBodyWeight | WeightPre | DryBodyWeight
HbDelay1 | DryBodyWeightAnteriorTrntP | Albumin | EPO3Actual
HbDelay2 | Inflamation | Leukocytes | Inflamation
ModalityCode | BloodVolume | OcmKtv | WeightedInflamation
HbDelay3 | N EPO3Actual | EffectiveTime | N EPO3Actual
Leukocytes | WeightedInflamation | MeanBloodFlow | IRON IVActual
HbDelay4 | MeanBloodFlow | BloodVolume | EPO3Delay3
EPO3Delay4 | EPO3Delay4 | EPO3Delay3 | N IRON IVActual

Appendix B. Description of variables

Table B.1 lists all the variables on the anemia database with their corresponding meanings.

Table B.1 – Description of the variables on the anemia database.

Variable | Description
HbActual | Accumulated value of Hemoglobin at the present instant (Hb(t))
HbDelay1 | Accumulated value of Hemoglobin at the first instant of time before the present value of hemoglobin (Hb(t − 1))
HbDelay2 | Accumulated value of Hemoglobin at the second instant of time before the present value of hemoglobin (Hb(t − 2))
HbDelay3 | Accumulated value of Hemoglobin at the third instant of time before the present value of hemoglobin (Hb(t − 3))
HbDelay4 | Accumulated value of Hemoglobin at the fourth instant of time before the present value of hemoglobin (Hb(t − 4))
EPO3Actual | Accumulated value of EPO at the present instant (EPO(t))
EPO3Delay4 | Accumulated value of EPO at the fourth instant of time before the present value of EPO (EPO(t − 4))
N EPO3Actual | Number of EPO administrations
Ferritin | Ferritin plasma level, marker of iron storages
TransferrinSaturation | Transferrin Saturation Index, marker of iron utilization
ModalityCode | Dialysis modality code
Leukocytes | Total white blood cells count (immune system)
DryBodyWeight | Weight that the patient should reach after the dialysis treatment (target weight)
DryBodyWeightAnteriorTrntP | DryBody weight in previous dialysis
BloodVolume | Total volume of processed blood
Inflammation | This variable is created from 4 variables related to inflammation (Albumin, C-Reactive Protein, Leukocytes and Neutrophils). If any of the 4 variables falls in the range indicated as a possible inflammation, this variable is set to 1
WeightedInflamation | Ratio between the number of variables that fall in the inflammation ranges and the number of non-empty values of the four variables related to inflammation (Albumin, C-Reactive Protein, Leukocytes and Neutrophils). This value ranges from 0 to 1
MeanBloodFlow | Automatically calculated as (Blood Volume/Effective Dialysis Time)
WeightPre | Actual patient weight before the dialysis treatment (pre-dialysis)
Albumin | Albumin plasma level during the dialysis treatment
OcmKtv | Kt/V measured by the OCM system (Online Clearance Monitoring)
EffectiveTime | Actual duration of the dialysis treatment
IRON IVActual | Accumulated value of intravenous iron at the present instant
N IRON IVActual | Number of intravenous iron administrations

References

[1] L. Lynne Peterson, FDA Oncologic Drugs Advisory Committee (ODAC) meeting on the safety of erythropoietin in oncology, Trends Med. (May) (2004) 1–4.
[2] R. Bellazzi, Drug delivery optimization through Bayesian networks, in: Proceedings of the Annual Symposium on Computer Application in Medical Care, 1992, pp. 572–578.
[3] M. Brier, A. Gaweda, Predictive modeling for improved anemia management in dialysis patients, Curr. Opin. Nephrol. Hypertens. 20 (6) (2011) 573–576.
[4] L. Gabutti, N. Lötscher, J. Bianda, C. Marone, G. Mombelli, M. Burnier, Would artificial neural networks implemented in clinical wards help nephrologists in predicting epoetin responsiveness? BMC Nephrol. 7 (2006) 13, PMID: 16981983.
[5] J.D. Martín-Guerrero, F. Gomez, E. Soria-Olivas, J. Schmidhuber, M. Climente-Martí, N.V. Jiménez-Torres, A reinforcement learning approach for individualizing erythropoietin dosages in hemodialysis patients, Expert Syst. Appl. 36 (6) (2009) 9737–9742.
[6] A. Gaweda, M. Muezzinoglu, G. Aronoff, A. Jacobs, J. Zurada, M. Brier, Individualization of pharmacological anemia management using reinforcement learning, Neural Netw. 18 (5/6) (2005) 826–834.
[7] J.S. Berns, H. Elzein, R.I. Lynn, S. Fishbane, I.S. Meisels, P.B. Deoreo, Hemoglobin variability in epoetin-treated hemodialysis patients, Kidney Int. 64 (4) (2003) 1514–1521.
[8] S. Fishbane, J.S. Berns, Hemoglobin cycling in hemodialysis patients treated with recombinant human erythropoietin, Kidney Int. 68 (3) (2005) 1337–1343.
[9] E. Lacson, N. Ofsthun, J.M. Lazarus, Effect of variability in anemia management on hemoglobin outcomes in ESRD, Am. J. Kidney Dis. 41 (1) (2003) 111–124, http://dx.doi.org/10.1053/ajkd.2003.50030, PMID: 12500228. http://www.ncbi.nlm.nih.gov/pubmed/12500228
[10] I.C. Macdougall, P. Wilson, A. Roche, Impact of hemoglobin variability in haemodialysis patients receiving erythropoiesis stimulating agents for the management of renal anaemia, J. Am. Soc. Nephrol. 16 (2005) 899–911.
[11] KDOQI clinical practice guideline and clinical practice recommendations for anemia in chronic kidney disease: 2007 update of hemoglobin target, Am. J. Kidney Dis. 50 (3) (2007) 471–530.
[12] E. Lacson, N. Ofsthun, J.M. Lazarus, Effect of variability in anemia management on hemoglobin outcomes in ESRD, Am. J. Kidney Dis. 41 (1) (2003) 111–124.
[13] M.E. Brier, A.E. Gaweda, A. Dailey, G.R. Aronoff, A.A. Jacobs, Randomized trial of model predictive control for improved anemia management, Clin. J. Am. Soc. Nephrol. 5 (2010) 814–820.
[14] M. Kanbay, M.A. Perazella, B. Kasapoglu, M. Koroglu, A. Covic, Erythropoiesis stimulatory agent-resistant anemia in dialysis patients: review of causes and management, Blood Purif. 29 (1) (2010) 1–12.
[15] A. Stopper, C. Amato, S. Gioberge, G. Giordana, D. Marcelli, E. Gatti, Managing complexity at dialysis service centers across Europe, Blood Purif. 25 (1) (2006) 77–89.
[16] G. Carpenter, S. Grossberg, The ART of adaptive pattern recognition by a self-organizing neural network, Computer 21 (3) (1988) 77–88.
[17] S.C. Johnson, Hierarchical clustering schemes, Psychometrika 32 (3) (1967) 241–254.
[18] S. Theodoridis, K. Koutroumbas, Pattern Recognition, 4th ed., Academic Press, Burlington, MA, USA, 2008.
[19] L. Breiman, J. Friedman, C.J. Stone, R.A. Olshen, Classification and Regression Trees, 1st ed., Chapman and Hall/CRC, Monterey, CA, USA, 1984.
[20] T. Hastie, R. Tibshirani, J.H. Friedman, The Elements of Statistical Learning, corrected ed., Springer, New York, NY, USA, 2003.
[21] E. Alpaydin, Introduction to Machine Learning, 2nd ed., The MIT Press, Boston, MA, USA, 2010.
[22] L. Breiman, Bagging predictors, Mach. Learn. 24 (1996) 123–140.
[23] A. Webb, K. Copsey, Statistical Pattern Recognition, Wiley, Chichester, West Sussex, UK, 2011. http://books.google.es/books?id=WpV9Xt-h3O0C
[24] S. Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson Education, Upper Saddle River, NJ, USA, 2009.
[25] B.D. Ripley, Pattern Recognition and Neural Networks, 1st ed., Cambridge University Press, Cambridge, UK, 1996.
[26] M.A. Arbib, The Handbook of Brain Theory and Neural Networks, 2nd ed., The MIT Press, Boston, MA, USA, 2002.
[27] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.
[28] A.J. Smola, B. Schölkopf, A tutorial on support vector regression, Stat. Comput. 14 (3) (2004) 199–222.
[29] R. Koenker, G. Bassett Jr., Regression quantiles, Econometrica 46 (1) (1978) 33–50.
[30] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.