analysis of cytokine release assay data using machine learning approaches

15
Analysis of cytokine release assay data using machine learning approaches Feiyu Xiong a , Marco Janko a , Mindi Walker b , Dorie Makropoulos b , Daniel Weinstock b , Moshe Kam a , Leonid Hrebien c, a Department of Electrical and Computer Engineering, Drexel University, PA 19104, United States b Janssen Research & Development, LLC, Spring House, PA 19002, United States c Department of Electrical and Computer Engineering, Drexel University, 3141 Chestnut St, Philadelphia, PA 19104, United States abstract article info Article history: Received 21 February 2014 Received in revised form 1 July 2014 Accepted 21 July 2014 Available online 5 August 2014 Keywords: Cytokine Release Syndrome Machine learning Monoclonal antibodies The possible onset of Cytokine Release Syndrome (CRS) is an important consideration in the development of monoclonal antibody (mAb) therapeutics. In this study, several machine learning approaches are used to analyze CRS data. The analyzed data come from a human blood in vitro assay which was used to assess the potential of mAb-based therapeutics to produce cytokine release similar to that induced by Anti-CD28 superagonistic (Anti-CD28 SA) mAbs. The data contain 7 mAbs and two negative controls, a total of 423 samples coming from 44 donors. Three (3) machine learning approaches were applied in combination to observations obtained from that assay, namely (i) Hierarchical Cluster Analysis (HCA); (ii) Principal Component Analysis (PCA) followed by K-means clustering; and (iii) Decision Tree Classication (DTC). All three approaches were able to identify the treatment that caused the most severe cytokine response. HCA was able to provide information about the expected number of clusters in the data. PCA coupled with K-means clustering allowed classication of treatments sample by sample, and visualizing clusters of treatments. DTC models showed the relative importance of various cytokines such as IFN-γ, TNF-α and IL-10 to CRS. The use of these approaches in tandem provides better selection of parameters for one method based on outcomes from another, and an overall improved analysis of the data through complementary approaches. Moreover, the DTC analysis showed in addition that IL-17 may be correlated with CRS reactions, although this correlation has not yet been corroborated in the literature. © 2014 Elsevier B.V. All rights reserved. 1. Introduction Cytokine Release Syndrome (CRS) is an adverse event resulting in sys- temic release of cytokines in humans upon exposure to therapeutic mAbs. Cytokine release reactions in humans range from mild to severe, with symptoms such as fatigue, headache, urticaria and asthenia among others [1]. Some CRS reactions can be controlled by slowing the infusion rate of the mAb or by administering anti-inammatory drugs [2]. However, when applied to human volunteers in a 2006 phase I clinical trial, the Anti-CD28 SA mAb TGN 1412 caused unexpected serious adverse events; healthy volunteers developed severe CRS within 90 min of receiving a dose of Anti-CD28 SA [3]. These adverse events were not predicted by pre-clinical safety assessments using animal testing of the drugs [4]. Both non-human primates and rodents are relatively resistant to CRS. Although release of cytokines has been observed in animal models, rarely did it progress to clinically relevant levels [58]. Differences in expression of target molecules, regulatory T cells, cytokines required for inammatory response, and cell surface receptors among humans, rodents and non-human primates [914] all indicate that it may not be appropriate to use animal models to predict CRS in humans. To further understand CRS in humans, an in vitro assay using human whole blood was developed and tested by Walker et al. [2]. This assay was designed to support First-In-Human readiness, assessing the poten- tial for mAbs to release of cytokines similar to Anti-CD28 SA. The study reported in this paper has used this assay for further analysis of CRS using several machine learning approaches. The assay tests mAbs that have been immobilized on Protein-A- coated polystyrene beads and incubated with whole human blood cul- tures to stimulate cytokine release. Following incubation, supernatants were harvested and frozen for later multiplex analysis of cytokines using Searchlighttechnology. The assay was used by Walker et al. to compare the release of 11 cytokines 2 from Anti-CD28 SA (clone International Immunopharmacology 22 (2014) 465479 Corresponding author. Tel.: +1 215 895 6755. E-mail addresses: [email protected] (F. Xiong), [email protected] (M. Janko), [email protected] (M. Walker), [email protected] (D. Makropoulos), [email protected] (D. Weinstock), [email protected] (M. Kam), [email protected] (L. Hrebien). 2 12 cytokines were measured in the assay. However, Walker et al. [2] showed no statis- tically relevant difference in the concentration of IL-8 release produced by AutoPlasma and Anti-CD28 SA. Therefore, IL-8 was not considered further, leaving 11 cytokine levels to be analyzed. http://dx.doi.org/10.1016/j.intimp.2014.07.024 1567-5769/© 2014 Elsevier B.V. All rights reserved. Contents lists available at ScienceDirect International Immunopharmacology journal homepage: www.elsevier.com/locate/intimp

Upload: leonid

Post on 03-Feb-2017

221 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Analysis of cytokine release assay data using machine learning approaches

International Immunopharmacology 22 (2014) 465–479

Contents lists available at ScienceDirect

International Immunopharmacology

j ourna l homepage: www.e lsev ie r .com/ locate / in t imp

Analysis of cytokine release assay data using machinelearning approaches

Feiyu Xiong a, Marco Janko a, Mindi Walker b, Dorie Makropoulos b, Daniel Weinstock b,Moshe Kam a, Leonid Hrebien c,⁎a Department of Electrical and Computer Engineering, Drexel University, PA 19104, United Statesb Janssen Research & Development, LLC, Spring House, PA 19002, United Statesc Department of Electrical and Computer Engineering, Drexel University, 3141 Chestnut St, Philadelphia, PA 19104, United States

⁎ Corresponding author. Tel.: +1 215 895 6755.E-mail addresses: [email protected] (F. Xiong), maj79@

[email protected] (M. Walker), [email protected]@ITS.JNJ.COM (D. Weinstock), [email protected]@coe.drexel.edu (L. Hrebien).

http://dx.doi.org/10.1016/j.intimp.2014.07.0241567-5769/© 2014 Elsevier B.V. All rights reserved.

a b s t r a c t

a r t i c l e i n f o

Article history:Received 21 February 2014Received in revised form 1 July 2014Accepted 21 July 2014Available online 5 August 2014

Keywords:Cytokine Release SyndromeMachine learningMonoclonal antibodies

The possible onset of Cytokine Release Syndrome (CRS) is an important consideration in the development ofmonoclonal antibody (mAb) therapeutics. In this study, severalmachine learning approaches are used to analyzeCRS data. The analyzed data come from a human blood in vitro assay which was used to assess the potential ofmAb-based therapeutics to produce cytokine release similar to that induced by Anti-CD28 superagonistic(Anti-CD28 SA) mAbs. The data contain 7 mAbs and two negative controls, a total of 423 samples coming from44 donors. Three (3) machine learning approaches were applied in combination to observations obtained fromthat assay, namely (i) Hierarchical Cluster Analysis (HCA); (ii) Principal Component Analysis (PCA) followedby K-means clustering; and (iii) Decision Tree Classification (DTC). All three approaches were able to identifythe treatment that caused the most severe cytokine response. HCA was able to provide information aboutthe expected number of clusters in the data. PCA coupled with K-means clustering allowed classification oftreatments sample by sample, and visualizing clusters of treatments. DTCmodels showed the relative importanceof various cytokines such as IFN-γ, TNF-α and IL-10 to CRS. The use of these approaches in tandem providesbetter selection of parameters for onemethod based on outcomes from another, and an overall improved analysisof the data through complementary approaches. Moreover, the DTC analysis showed in addition that IL-17 maybe correlated with CRS reactions, although this correlation has not yet been corroborated in the literature.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Cytokine Release Syndrome (CRS) is an adverse event resulting in sys-temic release of cytokines in humans upon exposure to therapeuticmAbs.Cytokine release reactions in humans range from mild to severe, withsymptoms such as fatigue, headache, urticaria and asthenia among others[1]. Some CRS reactions can be controlled by slowing the infusion rate ofthe mAb or by administering anti-inflammatory drugs [2]. However,when applied to human volunteers in a 2006 phase I clinical trial, theAnti-CD28 SAmAb TGN 1412 caused unexpected serious adverse events;healthy volunteers developed severe CRS within 90 min of receiving adose of Anti-CD28 SA [3]. These adverse events were not predicted bypre-clinical safety assessments using animal testing of the drugs [4].

Both non-human primates and rodents are relatively resistant toCRS. Although release of cytokines has been observed in animal models,rarely did it progress to clinically relevant levels [5–8]. Differences in

drexel.edu (M. Janko),m (D. Makropoulos),e.drexel.edu (M. Kam),

expression of target molecules, regulatory T cells, cytokines requiredfor inflammatory response, and cell surface receptors among humans,rodents and non-human primates [9–14] all indicate that it may notbe appropriate to use animal models to predict CRS in humans. Tofurther understand CRS in humans, an in vitro assay using humanwhole blood was developed and tested by Walker et al. [2]. This assaywas designed to support First-In-Human readiness, assessing the poten-tial for mAbs to release of cytokines similar to Anti-CD28 SA. The studyreported in this paper has used this assay for further analysis of CRSusing several machine learning approaches.

The assay tests mAbs that have been immobilized on Protein-A-coated polystyrene beads and incubated with whole human blood cul-tures to stimulate cytokine release. Following incubation, supernatantswere harvested and frozen for later multiplex analysis of cytokinesusing Searchlight™ technology. The assay was used by Walker et al.to compare the release of 11 cytokines2 from Anti-CD28 SA (clone

2 12 cytokinesweremeasured in the assay. However,Walker et al. [2] showedno statis-tically relevant difference in the concentration of IL-8 release produced byAutoPlasma andAnti-CD28 SA. Therefore, IL-8 was not considered further, leaving 11 cytokine levels to beanalyzed.

Page 2: Analysis of cytokine release assay data using machine learning approaches

Table 1List of cytokinesmeasured in the assay (theminimumvalues of all 11 cytokinesmeasuredwere below the level of quantitation. We use 0.1 pg/ml as minimum value for allcytokines.).

466 F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

ANC28.1/5D10) to those released by 8 othermAbs andcontrols usingHi-erarchical Cluster Analysis (HCA) [2]. In the present study, we analyze adata setwith 7mAbs and2 negative controls. They consist of (1) PBS (nobeads), which should not induce cytokines; and (2) beads coated withautologous plasma (AutoPlasma) from the same donors. The 7 mAbs,which are listed in Table 2, included 5 marketed mAbs whose ability toelicit CRS is known, and 2 research grade mAbs whose ability to elicitCRS is not known. Based on the mechanism of action of the mAb, wehave assigned these two “unknown”mAbs to the “Safe” category.

The objectives of our study, using this in vitro assay, were: 1) todetermine whether the cytokine response from a given mAb is similarto the severe cytokine response to Anti-CD28 SA; 2) to compare the util-ity of different machine learning approaches on the data collected fromthe assay; and 3) to determine the cytokines that are highly relevant forCRS detection. To achieve these objectives, three machine learningapproaches were applied to the data set. They are: 1) HierarchicalCluster Analysis (HCA) [15]; 2) Principal Component Analysis (PCA)[16] followed by K-means clustering [16]; and 3) Decision Tree Classifi-cation (DTC) [17,18].

Hierarchical Cluster Analysis (HCA) is a statistical method whichbuilds a hierarchy of clusters based on the similarity between samplesin a data set. This hierarchy is often displayed by means of a dendro-gram. HCA has beenwidely used to analyze various cytokine responses,for example, it was applied to group the pathogen exposures based onthemultivariate cytokine expression profiles induced in a host infectionmodel system [19].

Principal Component Analysis (PCA) is a method that identifiespatterns in high dimensional data and expresses these data in a waythat highlights their similarities and differences. The procedure usesan orthogonal transformation to convert the observations, which arepresented as functions of possibly correlated variables, into a set ofvalues of linearly uncorrelated variables called Principal Components[20]. PCA has been used to analyze cytokine responses in multiple stud-ies [21–23]. K-means clustering partitions a set of observations (whichin our case emanate from PCA) into a set of clusters. Each observationis placed in the cluster with the nearest mean (the mean of the clusterserves as its prototype). K-means clustering has also been used beforefor cytokine data analysis [24–26].

Decision Tree Classification (DTC) is a nonparametric statisticalsupervised learning method that incorporates feature selection andpossesses intuitive interpretability [27]. It maps observations about anitem into conclusions about the item's target value. DTC has been usedin several studies to analyze data related to cytokine response.McKinneyet al. [28] used DTC to study the serum cytokines that have the strongestassociationwith systemic adverse event of vaccinia virus. Patel et al. [29]used DTC to obtain amodel that classifies subjects based on 17 cytokinesinto a respiratory syncytial virus (RSV)-induced acute otitis media(AOM) group and an early treatment failure group, respectively.

Each of the machine learning techniques used in our study wasemployed in some form in past investigations of cytokine response.What is demonstrated here, however, is the advantage of using thesetechniques in tandem. This advantage includes better selection ofparameters for one method based on outcomes from another, andan overall improved analysis of the data through complementaryapproaches.

Cytokine Mean (pg/ml) Median (pg/ml) Maximum (pg/ml)

IL-1β 240.8 140.4 1932.3IL-2 12.0 2.3 406.2IL-4 2.2 0.5 45.2IL-6 4897.6 1620.7 39,195.5IL-10 6.3 2.7 100.8IL-12(p70) 8.3 5.4 55.5IL-17 22.1 0.1 263.2IL-18 13.0 10.6 71.1IFN-γ 4661.2 10.4 90,400.6

2. Materials and methods

2.1. Experimental design

Weused data from the in vitro assay described byWalker et al. [2] tostudy the cytokine concentrations elicited by various mAbs. For thisassay, blood was drawn aseptically under informed consent3 by

3 Quorum Review IRB approved protocol #NOCOMPOUNDNAP1001.

venipuncture using a 21-gauge needle from 44 normal human volun-teers into BD Heparin Vacutainer (San Jose, CA) tubes. Cultures wereset up within 2 h of blood collection. Previous reports on these typesof assays suggest the need to immobilize Anti-CD28 for maximal cyto-kine production [4]. For this purpose, Protein A coated polystyrenebeads were selected. Beads were coated with a saturating amount ofmAb and then distributed to a 96-well culture dish. Eachwell contained1 × 107 beads/well along with 200 μl of 1:10 diluted whole blood inRPMI 1640 media. After 48 h of culture, plates were centrifuged andsupernatants were collected, frozen, and stored for future multiplexcytokine analysis.

We used the assay to test the stimulation of human blood fromdifferent donors where the application of a given treatment (mAb) onblood from a particular donor constituted a sample. The concentrationsof the 11 cytokines shown in Table 1 were measured for each sample.These concentrations were measured in triplicate by multiplexenzyme-linked immunosorbent assay (ELISA) using SearchLight™technology fromAushon Biosystems (Billerica,MA). Datawere reportedin pg/ml for each sample and each cytokine. To allow calculationof mean values and graphic analysis, all concentrations below thelevel of quantitation were set to 0.1 [2]. The mean, median, and maxi-mum values of each cytokine for all the samples are also shown inTable 1.

The 7mAbs and 2 controls used in our study are described in Table 2,which shows the target, manufacturer, number of samples, expectedresults and class for each mAb. The “Expected results” column inTable 2 is based on the clinical literature (Tocilizumab and Palivizumab)and on themechanismof action of the research grademAb being similarto a compound that has clinical results (for Anti-CD28 SA, Anti-CD80,Anti-CD22, Anti-IL-1, or Anti-IL-5) [30]. The “class” column is based onthe expected reaction where severe CRS is caused by Anti-CD28 SA;no infusion reactions have been reported for the remaining treatments.The data were thus grouped into two categories, “CD28” and “Safe.” The“CD28” class contained samples only from cultures treated with Anti-CD28 SA. The “Safe” class contained mAbs that are not likely to causeCRS or an infusion reaction, and controls.

The data set analyzed in this paper contains a total of 432 samplesthat were measured through 11 runs of the assay. The information foreach run is shown in Table 3, including donors in each run, treatmentsused in each run, number of samples per treatment, and total numberof samples for each run. The sizes of sample sets corresponding todifferent treatments are uneven, an observation that would affect theperformance of subsequent analyses.

2.2. Hierarchical Cluster Analysis (HCA)

We will compare and integrate automatic methods to analyze andseparate mAbs and controls based on the known CRS responses of the

TNF-α (monomeric) 433.8 178.1 3729.6TNF-α (trimeric) 294.4 131.8 1809.5

Page 3: Analysis of cytokine release assay data using machine learning approaches

467F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

samples. The first method we used to analyze the treatments listed inTable 2 is agglomerative HCA, implemented inMatlab® 2012a software(The Mathworks, Natick, MA). Agglomerative HCA was applied tomeans of the cytokine samples from each one of the Table 2 treatments.It is a “bottom up” approach which first considers each treatment asbeing in its own cluster and then merges pairs of clusters by theirdistances from each other. The process repeats until all treatments arewithin one cluster. Each treatment was evaluated by using anunweighted group mean with Euclidean distance as the similaritymeasurement. The Euclidean distance, dab, between two means, ma

and mb of treatments a and b is defined as

dab ¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffima−mbð Þ mb−mað Þ

p: ð1Þ

HCA was first applied on the set {PBS, AutoPlasma, Anti-CD28 SA},then on all the treatments listed in Table 2.

2.3. Principal Component Analysis followed by K-means clustering

2.3.1. Principal Component AnalysisPrincipal Component Analysis (PCA) is an algorithmcommonly used

to reduce the number of attributes used to represent a set of data. PCAtransforms the original data (whichmay be given as a function of corre-lated variables) into linearly uncorrelated attributes, by projecting theoriginal data onto orthogonal components such that the variance ofthe projected data is maximized [16]. These orthogonal componentsare obtained by using singular value decomposition of the covariancematrix Σ associated with the data [18].

The eigenvectors of the covariancematrix Σ are known to character-ize the orthogonal components (Principal Components). The eigen-values of Σ are equal to the variances associated with each PrincipalComponent [16]. PCA was applied to reduce the number of attributesused to represent the data. We calculated the variance associated witheach of the Principal Components and chose the components with thelargest variances. We ignored the Principal Components that accountedfor small amount of variance.

The best results from PCA are obtained for data sets whose attributeshave similar dynamic ranges [16]. In our case, the data vary greatly, asseen in column 4 of Table 1, so we applied a logarithmic transformationon the data before proceeding with the analysis. This transformationmay be usedwhen the attributes show a linear or nearly-linear relation-ship between the standard deviation and the mean for each treatment[24,26,31], which our data do exhibit (as shown in Fig. 3).

2.3.2. K-means clusteringK-means clustering, which we applied to the data after PCA, assigns

the n observations in a data set into k clusters. Each observation isassigned to the cluster whose mean is the nearest to that observation.The standard K-means clustering algorithm is based on alternatingtwo procedures [32]. The first procedure is the assignment of observa-tions to clusters. The second procedure is the calculation of new cluster

Table 2List of mAbs and controls used in our CRS detection study.

mAb/clone no. Target Manufacturer

Anti-CD28 SA/ANC28.1/5D10 CD28 AncellAnti-CD80/2D10 CD80 AbcamAnti-CD22/LT22 CD22 AbcamAnti-IL-1β/2805 IL-1β R&D SystemsAnti-IL-5/QS-5 IL-5 AbcamTocilizumab IL-6 receptor RochePalivizumab RSV fusion MedimmunePBS (Control) –

AutoPlasma (Control) –

means based on the assignments. The process stops when no reassign-ment of an observation to a cluster would minimize the within-clustersum of distances between samples and the mean of each cluster. TheEuclidean measure was used here to calculate the distance betweenobservations.

2.4. Decision Tree Classification

The C4.5 DTC algorithm [17] implemented in the Weka 3.6.6 soft-ware [33] (University of Waikato, Hamilton, New Zealand) was alsoused to analyze the data set. This is a supervised machine learningalgorithm, meaning that a correctly-labeled data set is required to“train” the algorithm before the algorithm can be applied to unknowndata. Each observation in the data set is defined by a collection ofmeasured attributes (in our case cytokine levels) and a correspondinggroup or class label. The algorithm defines a set of rules that assigneach observation to a corresponding class. The input to the algorithmis the collection of attributes for a given observation, and the output isthe assigned class for that observation.

The DTC algorithm uses training data to infer rules that describe thecorrespondence between input attributes and the classes into whichthey are associated. The algorithm applies a “divide and conquer”approach, resulting in an iterative process that starts by analyzingeach attribute from the training data separately from the others. Itcalculates the information gain for each attribute with respect to thepossible class outcomes present in the training data [17]. The attributewith the highest information gain is denoted as the “root node.” Thisattribute is used to make the first separation of the data samples in aprocess called “branching” that assigns a “branch” to each sampleaccording to the attribute value present in the sample.

After the first branching operation, the samples belonging to aspecific branch may be associated with a class if all the samples in thebranch have the same class label. In this case, the algorithm is said tohave researched a “leaf node” denoted by the label associated with thecorresponding class. If the samples in the branch have different class la-bels, the sample is analyzed further tofind the attributewith the highestinformation gain. This process is repeated until all observations in thedata set are assigned a class.

Since the attributes in our data set have continuous values, it wasnecessary for the algorithm to define thresholds that separate the possi-ble attribute values into discrete groups which can then be associatedwith a corresponding class. Each threshold is chosen by iteratively cal-culating information gains for certain threshold candidates in the datafrom the attribute being analyzed (themethod for generating thresholdcandidates is described in [34, Section 6.1]). The candidate with thehighest information gain is used to split the values of the attributesand build the Decision Tree.

The rules created by the algorithm are displayed using a tree struc-ture consisting of test nodes and branches. Each node represents thetesting of a rule applied to a certain attribute. The branches representthe possible outcomes from the test, and point to either a class label

Samples Expected results Class

152 Severe CRS CD288 No CRS or infusion reactions Safe8 No CRS or infusion reactions Safe8 No CRS or infusion reactions Safe8 No CRS or infusion reactions Safe8 No CRS or infusion reactions Safe8 No CRS or infusion reactions Safe

80 No CRS or infusion reactions Safe152 No CRS or infusion reactions Safe

Page 4: Analysis of cytokine release assay data using machine learning approaches

Table 4Training data set 1 for DTC.

Class MAbs and controls # samples

Safe PBS (56)AutoPlasma (80)

136

CD28 Anti-CD28 SA (80) 80

468 F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

or to another node for further testing. The top node in the tree is the“root node,”which represents the testing of the most relevant attributeobtained by the algorithm. The class assigned for a given sample isdenoted by a “leaf node,” which is located at the bottom of the treeand contains the label assigned to a given sample.

The DTC algorithm analyzes all the attributes in the training set andselects the best attribute that maximizes the information gain as theroot node. This attribute is considered the most relevant for classifica-tion among all attributes [17]. The second most relevant attribute isfound by removing the attribute in the root node and applying the algo-rithm to the remaining attributes [35]. The new root node is consideredthe second most relevant attribute for classification. This process iscontinued until all attributes were considered.

2.4.1. The DTC model construction (definition of training data sets and testdata set)

The DTCmodel is constructed using the data set shown in Table 3. Inorder to build and validate the model, we divided the data into trainingdata and test data.Wedefined two trainingdata sets using someor all ofthe 264 samples from 32 donors in runs 1–8 shown in Table 3. Theremaining 168 samples from 15 donors in runs 9–11, were reserved astest data set. Before applying the test data from runs 9–11 we rancross validations with the two training data sets to determine theaccuracy of our models. This process is described in the next section.

Training data set 1, consisting of 216 samples shown in Table 4, wasassembled using only the control samples for the “Safe” class and the

Table 3Donor information for 11 runs in the data set.

Run ID Donor IDs

Run 1 Donor #5, donor #9, donor #30, donor #36

Run 2 Donor #1, donor #4, donor #27, donor #29

Run 3 Donor #2, donor #8, donor #10, donor #21, donor #22, donor #23, donor #25,

Run 4 Donor #1, donor #5, donor #7, donor #9, donor #10, donor #12, donor #19, do

Run 5 Donor #1, donor #5, donor #10, donor #25, donor #36, donor #37, donor #39,

Run 6 Donor #1, donor #7, donor #10, donor #13, donor #19, donor #25, donor #37,

Run 7 Donor #6, donor #12, donor #20, donor #21, donor #28, donor #34, donor #37

Run 8 Donor #13, donor #14 donor #17, donor #19, donor #20, donor #24, donor #2

Run 9 Donor #11, donor #19, donor #15, donor #31, donor #32, donor #33, donor #4

Run 10 Donor #11, donor #19, donor #15, donor #31, donor #32, donor #33, donor #4

Run 11 Donor #3, donor #16, donor #17, donor #18, donor #35, donor #41, donor #42

Anti-CD28 SA samples for the “CD28” class. The use of these data forthe training set is expected tomaximize thedifferencebetween samplesfrom the two classes.

For training data set 2, consisting of the 264 samples shown inTable 5, we used additional mAbs in the “Safe” class. The “Safe” classincluded Tocilizumab [36] and Palivizumab [37], two mAbs thatwere tested clinically and have shown no CRS reactions, and four(4) research-grade compounds that were not tested clinically but havethe same target as mAbs that were tested clinically. The four research-grade compounds are assumed – like their clinical counterparts – notto cause CRS reactions. They were: 1) CD80 which may be similarto Galiximab [38]; 2) CD22 which is similar to Epratuzumab [38]; 3)IL-1βwhich is similar to Canakinumab [39]; and 4) IL-5 which is similarto Mepolizumab [40]. The controls, PBS and AutoPlasma, were alsoincluded in the “Safe” class.

The test data set was selected from the 168 samples in runs 9–11shown in Table 3. Since there are 3 common donors (donor #17,donor #19 and donor #41) between runs 1–8 and runs 9–11, the 42samples of these three donors were removed from the test data set,

Treatments used Sample number per treatment Total samples

PBS 4 20AutoPlasma 4Anti-CD28 SA 4CD80 4CD22 4PBS 4 20AutoPlasma 4Anti-CD28 SA 4CD80 4CD22 4

donor #40 PBS 8 40AutoPlasma 8Anti-CD28 SA 8IL-1β 8IL-5 8

nor #25 PBS 8 24AutoPlasma 8Anti-CD28 SA 8

donor #40 PBS 8 40AutoPlasma 16Anti-CD28 SA 16

donor #39 PBS 8 56AutoPlasma 24Anti-CD28 SA 24

, donor #38 PBS 8 24AutoPlasma 8Anti-CD28 SA 8

6, donor #41 PBS 8 40AutoPlasma 8Anti-CD28 SA 8Tocilizumab 8Palivizumab 8

1, donor #43 PBS 8 56AutoPlasma 24Anti-CD28 SA 24

1, donor #43 PBS 8 56AutoPlasma 24Anti-CD28 SA 24

, donor #44 PBS 8 56AutoPlasma 24Anti-CD28 SA 24

Page 5: Analysis of cytokine release assay data using machine learning approaches

Table 5Training data set 2 for DTC.

Class MAbs and controls # samples

Safe PBS (56) 184AutoPlasma (80)Tocilizumab (8)Palivizumab (8)CD80 (8)CD22 (8)IL-1β (8)IL-5 (8)

CD28 Anti-CD28 SA (80) 80

469F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

leaving 126 samples in the test data set, as shown in Table 6. The testdata samples therefore came from different donors than those used intraining data sets 1 and 2, and could be used to assess the ability ofthe DTC models to classify new data.

Fig. 1. HCA dendrogram for Anti-CD28 SA, AutoPlasma and PBS (Table 4).

2.4.2. Cross validation method for estimating DTC classification accuracyIn order to estimate the classification accuracy of the DTC model on

new data with unknown class labels, the following cross validationmethodology was used: the entire training data set was split randomlyinto two sets: one set was used for training (2/3 of the data) and one setwas used for cross validation (1/3 of the data). To prevent samples fromthe same donor from appearing in both sets,we grouped the samples bydonor, and then randomly selected donors whose sample numbersadded up to one third of the total number of samples. The samplesfrom the rest of the donors were used for training the DTC model. Thisprocess was repeated multiple times until samples from all the donorswere considered equally often for both training and validation. Theclassification accuracy estimatewas defined as the average classificationaccuracy for all the training/testing set partitions.

3. Results

3.1. Hierarchical Cluster Analysis

Hierarchical Cluster Analysis was applied to the data shown inTables 4 and 5. The resulting dendrograms are shown in Figs. 1 and 2.The dendrograms represent the cluster hierarchy of the data set; thehorizontal axis is the standardized Euclidean distance between pairsof clusters. Both dendrograms show that the distances between theAnti-CD28 SA cluster and all other mAb clusters are quite large, indicat-ing that the average response for Anti-CD28 SA is very different fromthat of the other mAbs. Fig. 1 shows the HCA for Anti-CD28 SA and con-trols only, indicating clear separation between themeans of both sets ofdata. Fig. 2 shows the HCA for Anti-CD28 SA, the controls, and the 6mAbs from the “Safe” class. Again, we see clear separation betweenAnti-CD28 SA and the other treatments. In addition, the dendrogramin Fig. 2 appears to separate the controls and mAbs in the “Safe” classinto two separate clusters with Tocilizumab, IL-5, Palivizumab, CD22,and CD80 in one cluster and PBS, AutoPlasma and IL-1β in the othercluster. However, the method does not provide direct explanationas to why these treatments from the “Safe” class form two separateclusters and how the treatments within each cluster are related.

Table 6Test data set for DTC.

Class MAbs and controls # samples

Safe PBS (18)AutoPlasma (54)

72

CD28 Anti-CD28 SA (54) 54

3.2. Principal Component Analysis (PCA) with K-means clustering

Principal Component Analysis (PCA) was applied in order to revealthe internal structure of the data in away that best explains the variancein the data. It is well documented that in order to be successful, theattributes subject to PCA need to have similar dynamic ranges [16,31].However, the attributes from our data set have a large variation intheir dynamic ranges, as shown in column 4 of Table 1. As shown inFig. 3, our attributes also exhibit a nearly linear relationship betweenthe mean and the standard deviation for samples of different treat-ments. This property allows us to reduce the differences in the dynamicranges of the attributes by using the logarithmic transformation [24,26,31]. This transformation replaces each sample data value by its loga-rithm (base 10) before applying PCA.

We applied PCA to the samples from Anti-CD28 SA and the controlsin Table 4 and sorted the Principal Components in descending order bythe amount of variance associated with them. The variances associatedwith each of the Principal Components are listed in Table 7, where wesee that the first three Principal Components account for more than90% of the variance of the data. This observation suggests that we canpossibly ignore the rest of the Principal Components, with little impacton the underlying structure of the data. We then represented the datagraphically using a three-dimensional scatter plot against the threeprinciple components (Fig. 4(a)).

Principal Component Analysis (PCA) helps with visualizing but doesnot cluster the data point in the new coordinate system. In order to iden-tify sample populations in the new attribute space, we used K-means

Fig. 2. HCA dendrogram for all the treatments (Table 5).

Page 6: Analysis of cytokine release assay data using machine learning approaches

Fig. 3. Standard deviation vs. means for all cytokines showing the least squares linear approximation for all points (IFN-γ, IL-4 and IL-17 were plotted with and without Anti-CD28 SA toconfirm the linear relationships).

470 F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

clustering. The technique requires thatwe specify the number of clustersinto which observations will be assigned before the clustering processstarts. We used the previous HCA to determine the number of clusters,which in this case is 3. Fig. 4(b) shows the samples corresponding toeach cluster in different colors, and Table 8 shows the number of samplesin each cluster and the label associated with them. It is notable thatCluster 1 consists mostly of Anti-CD28 SA samples. Clusters 2 and 3 aredominated by AutoPlasma and PBS samples, respectively. These resultsconfirm that samples from Anti-CD28 SA and the two controls can beseparated from each other using this method.

Applying the known class labels of the samples to visualize the dataas shown in Fig. 4(c), we see that each sample population, Anti-CD28 SAand the two controls, is located in a different region within the three-dimensional space. Using PCA we are thus able to graphically representthe differences in the cytokine responses for each sample, and can

Table 7Variance associated with Principal Components for Anti-CD28 SA, PBS and AutoPlasma.

Variance included in 11 Principal Components (PC)

PC 1 PC 2 PC 3 PC 4–11

77.6% 10.9% 3.1% 8.4%91.6% 8.4%

observe how samples from a given population are grouped in a specificregion. These observations are consistent with the dendrogramfrom Fig. 1 in which the differences between the mean response ofAnti-CD28 SA and the controls are highlighted. Using PCA we can notonly corroborate this separation, but also see the differences betweensamples. Finally, using the information from the labels associated witheach sample and the results obtained from K-means clustering, wecan identify and visualize the misclassified samples. Fig. 4(d) showsthe misclassified samples. Not surprisingly, they are located on theboundaries of the established cluster.

Next, we repeated the analysis (PCA with K-means) using thesamples from all treatments shown in Table 5. This analysis wouldattempt to classify all treatment samples, not just the {Anti-CD28 SA,AutoPlasma, PBS} sample set. The variances associated with each ofthe Principal Components are listed in Table 9. Again, the first threePrincipal Components accounted for more than 90% of the variance ofthe data; we used these three Principal Components to create a three-dimensional scatter plot of the samples fromAnti-CD28 SA, “Safe” treat-ments, and controls (Fig. 5(a)).

After applying PCA, we applied K-means clustering to the results ofPCA, considering the first three Principal Components. As before, weused HCA to guide us as to the number of clusters. The dendrogramin Fig. 2 suggested 3 clusters, which we have used to run K-means clus-tering on the transformed data. The clustering results are shown in

Page 7: Analysis of cytokine release assay data using machine learning approaches

Fig. 4. (a) Graphical representation of the data using the first three Principal Components of PCA. (b) K-means clustering results based on the first three Principal Components. (c) Visualrepresentation of the data using the known labels to identify populations after applying PCA. (d) K-means clustering showing misclassified samples.

471F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

Fig. 5(b) and are detailed further in Table 10. The table shows the threeclusters and their constituents. The first cluster consists mostly of Anti-CD28 samples; the second cluster consists mostly of samples fromAutoPlasma, and the third cluster consists mostly of samples from PBS.

The samples from CD28 and the two controls (AutoPlasma and PBS)can again be distinguished from one another using this technique. Mostsamples from the “Safe” treatments are clustered with the controls,indicating the relative similarity of response between the controls andthe safe treatments. The small number of the “Safe” and control samplesmakes it impractical to cluster these samples separately. The results ofthis analysis are consistent with the results from HCA and the observa-tions made on PCAwith labeled data.Whenwe use the known labels of

Table 8This table shows the number of samples for each treatment in each cluster found byK-means on the data using the first three Principal Components.

CD28 AutoPlasma PBS

Cluster 1 132 10 0

Cluster 2 20 142 5

Cluster 3 0 0 75

the samples to visualize the data after PCA is applied (as shown inFig. 5(c)), we see that most of the samples from the “Safe” treatmentswere placed closer to the control samples than to the Anti-CD28 SAsamples. This observation is consistent with the dendrogram obtainedfrom the HCA analysis in Fig. 2.

3.3. Decision Tree Classification (DTC)

Using HCA and PCA with K-means clustering we can automaticallyseparate samples from Anti-CD28 SA and other treatments, and visual-ize the data in a three-dimensional space. However we still have notdetermined how the different cytokine releases led to this separation.To add this dimension to the study, we have employed Decision TreeClassification (DTC). The two training data sets described in Tables 4and 5were used to train the DTC algorithm. Training data set 1 included136 samples in the “Safe” class and 80 samples in the “CD28” class, for atotal of 216 samples (Table 4). The cytokine that best separated the twoclasses among the 11measured cytokines was selected first by the DTC,and the selection process was then repeated with the next best cyto-kine, etc. The tree model shown in Fig. 6(a) was built using trainingdata set 1. As can be seen from the root node, the cytokine which pro-duced the first split of data was IFN-γ. The 131 samples in this dataset with values of IFN-γ ≤ 31.5 pg/ml were classified into the “Safe”class while the 66 samples in this data set with values of IFN-γ greater

Page 8: Analysis of cytokine release assay data using machine learning approaches

Table 9Variance associated with Principal Components for all treatments.

Variance included in 11 Principal Components (PC)

PC 1 PC 2 PC 3 PC 4–11

75.2% 11.9% 3.2% 9.7%90.3% 9.7%

Table 10The number of samples for each treatment in each cluster found by K-means on the datausing the first three Principal Components.

CD28 AutoPlasma PBS CD80 CD22 IL–1β IL–5 Tocilizumab Palivizumab

Cluster 1 132 12 0 2 2 2 4 0 0

Cluster 2 20 140 4 6 6 0 4 8 8

Cluster 3 0 0 76 0 0 6 0 0 0

472 F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

than 136.9 pg/ml were classified into the “CD28” class. Samples withvalues of IFN-γ between 31.5 pg/ml and 136.9 pg/ml required furtheranalysis using the IL-2 node to identify the corresponding classes.Values of IL-2 ≤ 28.1 pg/ml were in the “CD28” class, and values of IL-2 greater than 28.1 pg/ml were in the “Safe” class. On the left branchof IL-2 node, the notation “(15/1)” indicates that of the 15 samplesassigned to the “CD28” class, one (1) sample was misclassified. The ac-curacy of this tree model in classifying new data correctly is estimatedby the cross validation procedure described in Section 2.4.2 (in addition,10-fold cross validation was also used, and the results are discussed inAppendix A). A square Confusion Matrix (CM) is used here to recordthe performance of the DTC model. Each column of the CM represents

Fig. 5. (a) Data (all treatments) representation based on Principal Components after selecting thPrincipal Components. (c) Representation of the data based on the labels known for each samp

the instances of the predicted class. Each row represents the instancesof the actual class. CMij is the number of samples in the true class iwhichwere assigned to predicted class j. Ideally CM is a diagonalmatrix.

Cross validation was performed 100 times. The average accuracyafter 100 times was 95.5%; the corresponding Confusion Matrix isshown in Fig. 6(b). The false alarm or false positive rate of the DTCmodel was 5.1% and the misdetection or false negative rate was 3.8%.The classification accuracy obtained from the cross validation methodis high, indicating that the developed model was reliable in this casein terms of its ability to classify samples from unknown donors.

e three first Principal Components. (b) K-means clustering results based on the first threele.

Page 9: Analysis of cytokine release assay data using machine learning approaches

4 We applied K-means clustering to both raw data and the outcomes of PCA on the rawdata. The clustering error rateswere comparable (8%–10% on average in both cases). How-ever, using PCA gave a clear low-dimensional representation of the data. Furthermore, wecould visualize the results of K-means clustering using the 3 principal components fromPCA, which could not be done directly using the 11-dimensional raw data.

(a) Decision Tree model

Classifier Outcome Classifier accuracy (95.5%)

Safe CD28

KnownSafe 129 7 False alarm/False posi�ve 5.1%CD28 3 77 Misdetec�on/False nega�ve 3.8%

(b) Confusion Matrix for cross validation

Fig. 6. (a) Decision Tree model using training data set 1 with 11 cytokines. (b) The Confusion Matrix shows the performance of the cross validation.

473F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

Trainingdata set 2 included184 samples from the “Safe” class and 80samples from the “CD28” class, for a total of 264 samples (Table 5). Thetree model and its Confusion Matrix are shown in Fig. 7. The averageaccuracy of the classification for this tree model was 93.1% for 100 iter-ations of the random donor splitting method. The Confusion Matrix isshown in Fig. 7(b). The classification rules can be inferred from themodel in the same manner as for training data set 1. Here we see thatIL-17, IL-10 and TNF-α (monomeric) are used in addition to IFN-γ andIL-2 to classify the samples.

The DTC model generated using training data set 2 is shown inFig. 7(a), where we observe that the root node was IFN-γ. It producedthe first data split identifying 164 samples in the “Safe” class with valuesof IFN-γ≤ 34.1 pg/ml and 100 sampleswith values of IFN-γ N 34.1 pg/mlthat required further classification. The cytokine IL-17 was used as thesecond cytokine for separation, suggesting that it is relevant for classifica-tion, although IL-17 release is not commonly measured in CRS assays(e.g., [30,41]).

Both DTC models identified IFN-γ as the most relevant cytokine forclassification. Next, we have removed IFN-γ samples from the data set,and ran our models with the remaining 10 cytokines. The results areshown in Fig. 8, where both models identify IL-17 as the root node.The resulting tree structures are more complex than the previous anal-ysis with IFN-γ as the root node; however, the classification accuraciesare still relatively high at 90.7% and 95.1% for training data sets 1 and2, respectively. These results indicate that IL-17 may also be relevantfor classification of these cytokine data.

In order to assess the ability of the DTC models (shown inFigs. 6(a) and 7(a)) to analyze newdata, we used test sets for themodelsthat did not include any of the data in the training set. Specifically, weused the (hitherto unseen) test data described in Table 6. The accuracyof the two models in Figs. 6(a) and 7(a) is provided in Tables 11 and

12, respectively. The accuracy percentages are 92.9% for the DTC inFig. 6(a) and 93.7% for the DTC in Fig. 7(a).

4. Discussion

The results obtained from the analysis of our data set, using the threemachine learning approaches, are consistent in discriminating betweenAnti-CD28 SA and other mAbs with respect to CRS. However, eachapproach provides different information about the studied data set.

HCA highlights the differences between the treatments usingthe means of their cytokine response. It also provides a simple clusterhierarchy of the treatments, e.g., in Fig. 2 we identified three clusterscorresponding to Anti-CD28 SA, controls, and the other treatments inthe “Safe” class. These findings are consistent with the results fromWalker et al. [2] and give guidance on how many clusters may exist inthe data. On the other hand, HCA does not allow classification of individ-ual samples from the data set since it uses the average response for eachmAb as the basis for the measure of “distance” in order to find thecluster hierarchy. Using HCA it is not clear whether some cytokineshave more discriminatory relevance than others.

PCAwas used in this study to build scatter plots to visualize the datasample-by-sample. The first three Principal Components that accountfor most of the variance were used to visualize the data, showingwhere each sample is located in the suggested Principal Componentspace. We used K-means clustering to further analyze the PCA results4

Page 10: Analysis of cytokine release assay data using machine learning approaches

Classifier Outcome Classifier accuracy (93.1%)

Safe CD28

KnownSafe 173 11 False alarm/False posi�ve 6.0%CD28 7 73 Misdetec�on/False nega�ve 8.75%

(a) Decision Tree model

(b) Confusion Matrix for cross validation

Fig. 7. (a) Decision Tree model using training data set 2 with 11 cytokines. (b) The Confusion Matrix shows the performance of the cross validation.

474 F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

seeking three clusters, whichwas suggested by HCA. Samples fromAnti-CD28 SAwere placed far away from the other samples, and samples fromthe treatments in the “Safe” class were closer to PBS and AutoPlasmathan they were to Anti-CD28 SA. However, PCA and K-means clusteringprovided, in addition to the information provided by HCA, a method ofclustering actual samples rather than clustering the means of samples.The resulting classification can also be used to assign unlabeled samplesto a particular cluster, based on the distance of the sample from thecentroid of the cluster. The Principal Components used in this approachdo not have a direct biological interpretation.

DTCfills this biological interpretation gap. AswithHCA and PCA, DTCcan also distinguish Anti-CD28 SA from all other treatments and con-trols, but in addition it provides information about which cytokinesare most relevant to the separation of Anti-CD28 SA. The rules createdby DTC specified thresholds that separated cytokine release levels intogroups. These groups can then be associated with the “Safe” and the“CD28” classes. For example, in the DTC model in Fig. 6(a), when thevalue of IFN-γ is greater than 136.9 pg/ml, the model classifies 66 sam-ples into the “CD28” class. The value “136.9 pg/ml” is a threshold chosenby iteratively calculating information gains for threshold candidatesassociated with each cytokine and using the value with the highestinformation gain (in [34, Section 6.1]).

In addition to ease of interpretation, DTC shows other advantages inanalyzing biological data. First, once the tree model is constructed froma training data set, it can easily be used to classify unknown samples aswe demonstrated using the test data set in Table 6. Our results alsoshow that DTC provided high classification accuracy, and required

relatively little effort from users for data preparation. Similarity conclu-sions were reported in [42], which compared several machine learningalgorithms in cancer tissue classification, and in [43], which studied theuse of Decision Tree based classification of uncertain data.

Our study needs to be viewed in the context of several other inves-tigations of cytokine release through machine learning approaches[19,21–26,28,29] as mentioned in Section 1. Collectively these studiespoint to the usefulness of applying the three machine learningapproaches studied here (HCA, PCA, K-means clustering and DTC) tomAb induced cytokine release data from a variety of assays and condi-tions. In order to apply these approaches to a new assay, the followingrequirements should be met. First, there should be some labeled refer-ence data from mAbs with known CRS potential. In some cases wemay know the CRS potential of all the mAbs and controls. In othercases, when new mAbs with unknown CRS potential are studied, wecould measure their similarity to the known mAbs or controls to inferthe CRS potential of the new mAbs. Second, enough samples shouldbe collected to get useful and consistent results. This has been done,in our case, by starting with a “reasonable” number of samples (5–10),categorizing the samples using the machine learning approaches, thenrepeating with about ten more samples each time. Once the resultsstarted stabilizing (which in our case was at about 120 samples), weconcluded that enough samples have been used (see Appendix B).

Prior studies have shown that CRS is mainly related to high levels ofIFN-γ, TNF-α, IL-6, IL-2, IL-8, and IL-10 [30,41]. The analysis of TGN 1412treated patients shows that TNF-α was increased at 1 h after infusionand IFN-γ, IL-2, IL-6 and IL-10were increased after 4 h [44]. In addition,

Page 11: Analysis of cytokine release assay data using machine learning approaches

(a) Decision Tree model using Training data set 1

(b) Decision Tree model using Training data set 2

Fig. 8. In order to verify the importance of IL-17, IFN-γ has been removed from training data sets 1 and 2. DTC was applied to the remaining 10 cytokines and the two tree models are:(a) Tree model corresponding to training data set 1. (b) Tree model corresponding to training data set 2. Both two tree models show IL-17 as the root node.

Table 11Test results for the tree model in Fig. 6(a).

Classifieroutcome

Test accuracy (92.9%)

Safe CD28

Known Safe 63 9 False alarm/false positive 12.5%CD28 0 54 Misdetection/false negative 0%

475F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

recent work usingmathematical modeling suggests that there is a causeand effect relationship among someof the cytokines [45]. Serum collect-ed from the TGN 1412 trial showed that INF-γ and TNF-α are the firstcytokines to be produced and from the mathematical modeling INF-γis thought to subsequently induce IL-10 and IL-6, whereas, TNF-α isthought to induce IL-8 and IL-10. In fact, our DTC models identifiedIFN-γ, TNF-α and IL-10 as important cytokines in CRS, corroboratingthese findings (as noted earlier we excluded IL-8 from the analysisdue to lack of differentiation of responses between Anti-CD28 SA andcontrols).

Page 12: Analysis of cytokine release assay data using machine learning approaches

Table A2Confusion Matrix of DTC model in Fig. 7(a) for 10-fold cross validation.

Classifieroutcome

Classification accuracy (96.2%)

Safe CD28

Known Safe 179 5 False alarm/false positive 2.7%CD28 5 75 Misdetection/false negative 6.25%

Table 12Test results for the tree model in Fig. 7(a).

Classifieroutcome

Test accuracy (93.7%)

Safe CD28

Known Safe 72 0 False alarm/false positive 0%CD28 8 46 Misdetection/false negative 14.8%

476 F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

Our DTC analysis suggested IL-17 as a potentially important cytokinein classifying treatments as “Safe” class or “CD28” class. IL-17 has notbeen identified so far in the literature as an important cytokine in thecontext of CRS. Literature reports of cytokine analysis of the TGN 1412trial did not measure IL-17. In the published literature, this cytokinehas not been measured in assays developed for detecting mAb-inducedCRS. Since IL-17 is a pro-inflammatory cytokine, it should contribute toCRS since it has been shown to stimulate a highly pro-inflammatorygene signature [46,47]. It induces NF-Kβ [48], a transcription factor his-torically known for fast-acting pro-inflammatory cellular responses[49]. IL-17 is also known to induce IL-6, IL-8, G-CSF, and prostaglandinE2 production by mesenchymal cells and cause accumulation of neutro-phils in the blood and tissues [50]. This IL-6 production could act on acutephase proteins and lead to inflammation. Moreover, IL-17 has beenshown to increase the effects of TNF-α partially by increasing Tumor Ne-crosis Factor Receptor 2 (TNFR2) or Lipopolysaccharide-Induced CXCChemokine (LIX) [51,52], possibly contributing to the CRS cascade,which is thought to start with TNF-α and IFN-γ. IL-17 is thought to beimportant in chronic inflammatory conditions such as autoimmune dis-eases, transplantation, and infections [53,54]. In addition to IL-17 beingpro-inflammatory in nature, IL-17 has been shown to be involved inother forms of cytokine storm including those induced by bacteria andtransplant [55–57]. Therefore, it is not surprising that our DTC modelwas able to identify IL-17 as an important cytokine in detecting CRSeven though this correlation has not been noted so far in the literature.

To conclude, we used different approaches to analyze data from anin vitro assay that uses human blood to assess the potential of CRSfrom different mAbs. All of the approaches were able to identify thetreatment that caused the most severe cytokine response. Additionally,PCA and K-means clustering allowed classifying treatments sample bysample and visualizing them in a low dimensional space. DTC modelsshowed the relative importance of various cytokines such as IFN-γ,TNF-α and IL-10 to CRS. The combined use of the techniques provideda more comprehensive view of the data and better-informed processesfor selection of parameters and thresholds.

Acknowledgments

This work was supported by Janssen Research and Development.

Appendix A. 10-Fold cross validation for Decision Tree Classification

A 10-fold cross validation [58] was used here to estimate the DTCmodel's classification accuracy on new data with unknown class labels.This method consists of splitting known data into 10 equal parts, about90% of the data were used to train the algorithm and the classificationaccuracy was tested with the remaining 10% of the data (cross valida-tion). This process was repeated 10 times using different parts of the

Table A1Confusion Matrix of DTC model in Fig. 6(a) for 10-fold cross validation.

Classifieroutcome

Classification accuracy (98.1%)

Safe CD28

Known Safe 132 4 False alarm/false positive 2.9%CD28 0 80 Misdetection/false negative 0%

data for training and cross validation. The classification accuracyestimate was defined as the average classification accuracy for the 10iterations.

The classification accuracy estimate of our DTC model built usingtraining data set 1 was 98.1%; the corresponding Confusion Matrix isshown in Table A1. For training data set 2, the average classificationaccuracy for the DTC model was 96.2%. The Confusion Matrix is shownin Table A2.

Appendix B. Sample size requirement assessment

In order to perform clustering (by PCA followed by K-means cluster-ing) or classification (by DTC), we need to estimate sample size. Thereexist several approaches that could be used to perform this estimation,

Fig. A1. Percentage error as function of the number of samples used. (a) Measured data.(b) Artificially generated data.

Page 13: Analysis of cytokine release assay data using machine learning approaches

477F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

depending on the data set and focus of study [59–61].Wepresent in thisappendix several approaches that proved useful for our data sets.

B.1. PCA followed by K-means clustering

First we estimate sample size for PCA followed by K-means clustering.We select 80 samples at random from each of the negative control sets(PBS and AutoPlasma) and the Anti-CD28 SA set for a total of 240samples. This subset of data was used to estimate sample size.

Before we estimate sample size for unlabeled data, it is instructive touse labeled sets for reference. In our case we had a labeled set of PBS,AutoPlasma and Anti-CD28 SA. We performed PCA followed by K-meansclustering, specifying 3 clusters. First we used only 6 samples from thedata set, and applied PCA followed by K-means clustering 10 times. Wethen computed the average error rate by comparing the clusteringresults to the labels of the data. Next, we included three more samples(selected at random) in the data set and applied the algorithm again.We repeated this procedure until we reached all 240 samples availablefor analysis. The results are illustrated in Fig. A1(a). It shows that theerror rates stabilized after about 100 samples, indicating that this isapproximately the lower bound on sample size.

When labels are not available, we can generate artificial data withthe same distribution as the data we need to process, and employ asimilar process. In our case, we generated three groups of artificialdata using the statistics from the samples of PBS, AutoPlasma andAnti-CD28 SA. We generated 240 samples of artificial data in total. Thesample size estimation was performed through the same procedure aswith assay data, and the results of this assessment are illustrated inFig. A1(b). Here the error rates seem to stabilize after about 80 samples,a result sufficiently close to the values found with the real data (whichwas 100). The use of artificial data would probably require the use of a“guard band,” adopting a somewhat higher sample size in practicethan the one that was estimated in simulations.

When the labels for the handled data are unknown, there are severalother valid approaches. As an example, we used the silhouette metric[62] to estimate the adequate number of samples necessary to get con-sistent results with PCA followed by K-means clustering. The silhouettemetric provides an assessment of how good the clustering results are,

Fig. A2. Optimal number of clusters for different s

irrespective of the clustering method. It measures how close data sam-ples are placed together within defined clusters, and how far away theclusters are from each other. This metric is commonly used to assessthe optimal number of clusters ‘k’ in K-means clustering [62]. Thevalue provided by the silhouettemetric is calculated for different valuesof ‘k’ (i.e. k=2, 3, 4,…, 9), and the value of ‘k’ thatmaximizes themetricis chosen.

We used the silhouette metric to obtain optimal values of ‘k’ for dif-ferent sample sizes out of our 240-sample set. We started by choosing12 samples at random to develop the estimate.We repeated the estima-tion by increasing the sample set by 12 samples at a time, until all 240sampleswere included. The processwas repeated 10 times. The averageoptimal values for ‘k’ for each sample size are plotted in Fig. A2. Thenumber of clusters stabilized at k= 3 after 120 samples. The approachcould be used for other data sets in a similar manner.

B.2. Decision Tree Classification

We applied DTC on a training data set with samples from the “CD28”class and samples from the AutoPlasma and PBS classes. We startedwith five (5) samples in each class, and increased the sample set ofeach class by 1 sample each time, till there were 80 samples in eachclass. We built the DTC model for each sample set. The accuracy of themodels is shown in Fig. A3(a). We observed that the accuracy levelswere inconsistent for small sample sizes. The accuracy level stabilizedfor sample sizes that exceed 20. Moreover, when the sample size ofeach class was below 20, the root nodes of the DTC models appearedto be picked randomly from all 11 cytokines. When the sample size ofeach class was above 20, almost all the root nodes of the DTC modelswere “IFN-γ”.

Next we performed the process of estimating sample size by usingsynthetic data. Synthetic data were generated based on the statisticsof two classes: “CD28” class (class 1) and samples from PBS andAutoPlasma (class 2). DTCwas applied on these two classes. The accura-cies for different sample sizes are shown in Fig. A3(b). The accuracystabilizes after 50–60 samples. Since we have two classes, the totalnumber of required samples is 100–120.

ample sizes, based on the silhouette metric.

Page 14: Analysis of cytokine release assay data using machine learning approaches

Fig. A3. (a) The accuracy of DTCmodels generated by different number of samples in eachclass using raw data. (b) The accuracy of DTC models generated by different number ofsamples in each class using synthetic data.

478 F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

References

[1] Common toxicity criteria (CTC) v2.0. NCI Cancer Therapy Evaluation Program:Bethesda, MD, USA; 1999.

[2] Walker MR, Makropoulos DA, Rachuthanandam R, Van Arsdell S, Bugelski PJ. Devel-opment of a human whole blood assay for prediction of cytokine release similar toanti-CD28 superagonists using multiplex cytokine and hierarchical cluster analysis.Int Immunopharmacol 2011;11(11):1697–705.

[3] Sandilands GP, Wilson M, Huser C, Jolly L, Sands WA, McSharry C. Were monocytesresponsible for initiating the cytokine storm in the TGN1412 clinical trial tragedy?Clin Exp Immunol 2010;162(3):516–27.

[4] stebbings R, Findley L, Edwards C, Eastwood D, Bird C, North D, et al. “Cytokinestorm” in the phase I trial of monoclonal antibody TGN1412: better understandingthe causes to improve preclinical testing of immunotherapeutics. J Immunol 2007;179(5):3325–31.

[5] Ferraan C, DyM, Sheehan K, Merite S, Schreiber R, Landais P, et al. Inter-mouse straindifferences in the in vivo anti-CD3 induced cytokine release. Clin Exp Immunol1991;86(3):537–43.

[6] Hsu DH, Shi JD, Homola M, Rowell TJ, Moran J, Levitt D, et al. A humanized anti-CD3antibody, HuM291, with low mitogenic activity, mediates complete and reversibleT-cell depletion in chimpanzees. Transplantation 1999;68(4):545–54.

[7] Hod EA, Cadwell CM, Liepkalns JS, Zimring JC, Sokol SA, Schirmer DA, et al. Cytokinestorm in a mouse model of IgG-mediated hemolytic transfusion reactions. Blood2008;112(3):891–4.

[8] Muller N, van den Brandt J, Odoardi F, Tischner D, Herath J, Flugel A, et al. A CD28superagonistic antibody elicits 2 functionally distinct waves of T cell activation inrats. J Clin Invest 2008;118(4):1405–16.

[9] Ansari AA, Mayne A, Hunt D, Sundstrom JB, Villinger F. TH1/TH2 subset analysis. I.Establishment of criteria for subset identification in PBMC samples from nonhumanprimates. J Med Primatol 1994;23(2–3):102–7.

[10] Eastwood D, Findley L, Poole S, Bird C, Wadhwa M, Moore M, et al. Monoclonal an-tibody TGN1412 trial failure explained by species differences in CD28 expression onCD4+ effector memory T-cells. Br J Pharmacol 2010;161(3):512–26.

[11] Nguyen DH, Hurtoda-Ziola N, Gagneux P, Varki A. Loss of Siglec expression on T lym-phocytes during human evolution. Proc Natl Acad Sci U S A 2006;103(20):7765–70.

[12] Ober RJ, Radu CG, Ghetie V, Ward ES. Differences in promiscuity for antibody–FcRninteractions across species: implications for therapeutic antibodies. Int Immunol2001;13(12):1551–9.

[13] Rogers KA, Scinicariello F, Attanasio R. IgG Fc receptor III homologues in nonhumanprimate species: genetic characterization and ligand interactions. J Immunol 2006;177(6):3848–56.

[14] Zuckermann FA. Extrathymic CD4/CD8 double positive T cells. Vet ImmunolImmunopathol 1999;72(1–2):55–66.

[15] Hastie TT, Robert, Friedman, Jerome. 14.3.12 hierarchical clustering. The Elements ofStatistical Learning. New York: Springer; 2009. p. 520–8.

[16] Bishop CM. Pattern Recognition and Machine Learning (Information Science andStatistics). Springer-Verlag New York, Inc.; 2006

[17] Quinlan JR. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc.;1993 302.

[18] Duda RO, Hart PE, Stork DG. Pattern Classification. 2nd Ed. Wiley-Interscience; 2000.[19] Chromy BA, Fodor IK, Montgomery NK, Luciw PA, McCutchen-Maloney SL. Cluster

analysis of host cytokine responses to biodefense pathogens in a whole bloodex vivo exposure model (WEEM). BMC Microbiol 2012;12(1):79.

[20] Smith LI. A tutorial on Principal Components Analysis; 2002.[21] Lvovschi V, Arnaud L, Parizot C, Freund Y, Juillien G, Ghillani-Dalbin P, et al. Cytokine

profiles in sepsis have limited relevance for stratifying patients in the emergency de-partment: a prospective observational study. PLoS ONE 2011;6(12):e28870.

[22] Helmy A, Antoniades CA, Guilfoyle MR, Carpernter KLH, Hutchinson PJ. Principalcomponent analysis of the cytokine and chemokine response to human traumaticbrain injury. PLoS ONE 2012;7(6):e39677.

[23] Wong H-L, Pfiffer RM, Fears TR, Vermeulen R, Ji S, Rabbin CS. Reproducibility andcorrelations of multiplex cytokine levels in asymptomatic persons. Cancer EpidemiolBiomark Prev 2008;17(12):3450–6.

[24] Desai D, May RD, Halder P, Shah S, Gupta S, Bafadhel M, et al. Cytokine profiling insevere asthma subphenotypes using factor and cluster analysis. American J Resp &Crit Care 2011;183:A3719.

[25] Heard BJ, Fritzler MJ, Wiley JP, McAllister J, Martin L, El-Gabalawy H, et al.Intraarticular and systemic inflammatory profiles may identify patients with osteo-arthritis. J Rheumatol 2013;40(8):1379–87.

[26] Kumar Y, Liang C, Bo Z, Rajapakse JC, Ooi EE, Tannenbaum SR. Serum proteome andcytokine analysis in a longitudinal cohort of adults with primary dengue infectionreveals predictive markers of DHF. PLoS Negl Trop Dis 2012;6(11):e1887.

[27] Chen X, Wang M, Zhang H. The use of classification trees for bioinformatics. WileyInterdiscip Rev Data Min Knowl Disc 2011;1(1):55–63.

[28] McKinney BA, Reif DM, Rock MT, Edwards KM, Kingsmore SF, Moore JH, et al. Cyto-kine expression patterns associated with systemic adverse events following small-pox immunization. J Infect Dis 2006;194(4):444–53.

[29] Patel JA, Nair S, Grady J, Revai K, Victor S, Brasier AR, et al. Systemic cytokine re-sponse profiles associated with respiratory virus-induced acute otitis media. PediatrInfect Dis J 2009;28(5):407–11.

[30] Bugelski PJ, Achuthanandam R, Capocasale RJ, Treacy G, Bouman-Thio E. Monoclonalantibody-induced cytokine-release syndrome. Expert Rev Clin Immunol 2009;5(5):499–521.

[31] Perveen F, Hussain Z. Use of statistical techniques in analysis of biological data. BasicRes J Agric Sci Rev 2012;1(1):01–10.

[32] Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, et al. Machine learning inbioinformatics. Brief Bioinform 2006;7(1):86–112.

[33] Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA datamining software: an update. SIGKDD Explor Newsl 2009;11(1):10–8.

[34] Witten IH, Frank E, Hall MA. Data Mining: Practical Machine Learning Tools andTechniques. 3rd ed. Morgan Kaufmann; 2011.

[35] John G, Kohavi R, Pfleger K. Irrelevant features and the subset selection problem.International Conference on, Machine Learning; 1994.

[36] Bannwarth B, Richez C. Clinical safety of tocilizumab in rheumatoid arthritis. ExpertOpin Drug Saf 2011;10(1):123–31.

[37] Saez-Liorens X,MorenoM, Ramilo O, Sanchez PJ, Top FH, ConnorM. Safety and phar-macokinetics of palivizumab therapy in children hospitalized with respiratory syn-cytial virus infection. Pediatr Infect Dis J 2004;23(8):707–12.

[38] Fanale MA, Younes A. Monoclonal antibodies in the treatment of non-Hodgkin'slymphoma. Drugs 2007;67(3):333–50.

[39] Dhimolea E. Canakinumab. MAbs 2010;2(1):3–13.[40] Roufosse FE, Kahn J-M, Gleich GJ, Schwartz LB, Singh AD, Rosenwasser LJ, et al. Long-

term safety of mepolizumab for the treatment of hypereosinophilic syndromes. J Al-lergy Clin Immunol 2013;131(2):461–7 [.e5].

[41] Walker M, Makropoulos D, Achuthanandam R, Bugelski PJ. Recent advances in theunderstanding of drug-mediated infusion reactions and cytokine release syndrome.Curr Opin Drug Discov Dev 2010;13(1):124–35.

Page 15: Analysis of cytokine release assay data using machine learning approaches

479F. Xiong et al. / International Immunopharmacology 22 (2014) 465–479

[42] Ben-Dor A, Bruhu L, Friedman N, Nachman I, Schummer M, Yakhini Z. Tissue classi-fication with gene expression profiles. J Comput Biol 2000;7(3–4):559–83.

[43] Qin B, Xia Y, Li F. DTU: a decision tree for uncertain data. In: Theeramunkong T, et al,editors. Advances in Knowledge Discovery and Data Mining. Berlin Heidelberg:Springer; 2009. p. 4–15.

[44] Suntharalingam G, Perry MR, Ward S, Brett SJ, Castello-Cortes A, Brunner MD, et al.Cytokine storm in a phase 1 trial of the anti-CD28 monoclonal antibody TGN1412. NEngl J Med 2006;355(10):1018–28.

[45] Yiu HH, Graham AL, Stengel RF. Dynamics of a cytokine storm. PLoS ONE 2012;7(10):e45027.

[46] Park H, Li Z, Yang XO, Chang SH, Nurieva R, Wang Y-H, et al. A distinct lineage of CD4T cells regulates tissue inflammation by producing interleukin 17. Nat Immunol2005;6(11):1133–41.

[47] Shen F, RuddyMJ, Plamondon P, Gaffen SL. Cytokines link osteoblasts and inflamma-tion: microarray analysis of interleukin-17- and TNF-alpha-induced genes in bonecells. J Leukoc Biol 2005;77(3):388–99.

[48] Yao Z, FanslowWC, Seldin MF, Rousseau A-M, Painter SL, ComeauMR, et al. Herpes-virus Saimiri encodes a new cytokine, IL-17, which binds to a novel cytokine recep-tor. Immunity 1995;3(6):811–21.

[49] Gaffen SL. Structure and signalling in the IL-17 receptor family. Nat Rev Immunol2009;9(8):556–67.

[50] Fossiez F, Djossou O, Chomarat P, Flores-Romo L, Ait-Yahia S, Maat C, et al. T cellinterleukin-17 induces stromal cells to produce proinflammatory and hematopoieticcytokines. J Exp Med 1996;183(6):2593–603.

[51] Chabaud M, Fossiez F, Taupin J-L, Miossec P. Enhancing effect of IL-17 on IL-1-induced IL-6 and leukemia inhibitory factor production by rheumatoid arthritissynoviocytes and its regulation by Th2 cytokines. J Immunol 1998;161(1):409–14.

[52] RuddyMJ, Shen F, Smith JB, Sharma A, Gaffen SL. Interleukin-17 regulates expressionof the CXC chemokine LIX/CXCL5 in osteoblasts: implications for inflammation andneutrophil recruitment. J Leukoc Biol 2004;76(1):135–44.

[53] Gaffen SL. Recent advances in the IL-17 cytokine family. Curr Opin Immunol 2011;23(5):613–9.

[54] Miossec P, Kolls JK. Targeting IL-17 and TH17 cells in chronic inflammation. Nat RevDrug Discov 2012;11(10):763–76.

[55] Bachmann M, Horn K, Rudloff I, Goren I, Holdener M, Christen U, et al. Early produc-tion of IL-22 but not IL-17 by peripheral blood mononuclear cells exposed to liveBorrelia burgdorferi: the role of monocytes and interleukin-1. PLoS Pathog 2010;6(10):e1001144.

[56] Mohammadnia M, Solgi G, Ranjbar M, Shahrestani T, Edalat R, Razavi A, et al. Serumlevels of interleukin (IL)-10, IL-17, transforming growth factor (TGF)-beta1, andinterferon-gamma cytokines and expression levels of IL-10 and TGF-beta1 genesin renal allograft recipients after donor bone marrow cell infusion. Transplant Proc2011;43(2):495–9.

[57] Tilahun AY, Holz M, Wu T-T, David CS, Rajagopalan G. Interferon gamma-dependentintestinal pathology contributes to the lethality in bacterial superantigen-inducedtoxic shock syndrome. PLoS ONE 2011;6(2):e16764.

[58] Kohavi R. A study of cross-validation and bootstrap for accuracy estimation andmodel selection. Proceedings of the 14th International Joint Conference on ArtificialIntelligence. Montreal, Quebec, Canada: Morgan Kaufmann Publishers Inc.; 1995.p. 1137–43.

[59] Mukherjee S, Tamayo P, Rogers S, Rifkin R, Engel A, Campbell C, et al. Estimatingdataset size requirements for classifying DNA microarray data. J Comput Biol2003;10(2):119–42.

[60] Figueroa RL, Zeng-Treitler Q, Kandula S, Ngo LH. Predicting sample size required forclassification performance. BMC Med Informat Decis Mak 2012;12(1):8.

[61] Indira V, Vasanthakumari R, Sugumaran V. Minimum sample size determinationof vibration signals in machine learning approach to fault diagnosis using poweranalysis. Expert Syst Appl 2010;37(12):8650–8.

[62] Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation ofcluster analysis. J Comput Appl Math 1987;20:53–65.