

Page 1: Analysis of LC MS Data for Characterizing the Metabolic Changes in …omics.georgetown.edu/downloads/pr100185b.pdf · 2015-05-20 · Analysis of LC-MS Data for Characterizing the

Analysis of LC-MS Data for Characterizing the Metabolic Changes

in Response to Radiation

Rency S. Varghese,†,‡ Amrita Cheema,†,‡ Prabhdeep Cheema,§ Marc Bourbeau,‡ Leepika Tuli,‡

Bin Zhou,‡ Mira Jung,| Anatoly Dritschilo,*,‡,| and Habtom W. Ressom*,‡

Department of Oncology, Department of Radiation Medicine, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, and Hartwick College, Oneonta, New York

Received March 1, 2010

Abstract: Recent advances in mass spectrometry-based metabolomics have created the potential to measure the levels of hundreds of metabolites that are the end products of cellular regulatory processes. In this study, we investigate the metabolic changes in genetically engineered cell lines in response to radiation exposure. “Shrinkage t” statistic and partial least-squares-discriminant analysis methods are utilized to identify peaks whose signal intensities were significantly altered by radiation. This is accomplished through pairwise comparison of radiation treated cell lines at various time points following radiation against untreated cell lines. A pathway analysis is performed following identification of the metabolites represented by the selected peaks. The results indicate an ATM regulated induction of major pathways in response to radiation treatment.

Keywords: metabolomics • LC-MS • SIMCA • “shrinkage t” statistic • PLS-DA

Introduction

Genomics and proteomics have been used extensively to study radiation-induced molecular events. However, little is known about the alterations in the levels of small molecule metabolites in biological systems in response to radiation. Metabolomics, a methodology for measuring small-molecule metabolite profiles and fluxes in biological matrices following genetic modification or exogenous changes, has become an important component of systems biology, complementing genomics, transcriptomics, and proteomics.1,2 While still a relatively new member of the “omics” family, metabolomics is already finding applications in a wide variety of research fields including disease diagnosis,3 drug toxicity studies,4 and functional genomics.5 Metabolomics is the large-scale analysis of metabolites and as such requires bioinformatics tools for data analysis, visualization, and integration. Hundreds to thousands of compounds or spectral features can be seen in a typical high-throughput metabolomic assay.

Multiple analytical techniques are currently used for resolving and detecting thousands of metabolites, each characterized by its own set of advantages and disadvantages. The choice of analytical technique is based on the metabolites of interest and the subsequent methods of their analysis. The most commonly used platforms for metabolite profiling include mass spectrometry (MS), nuclear magnetic resonance (NMR), liquid chromatography coupled with electrochemical coulometric array detection (LCECA), Fourier transform infrared (FTIR) spectroscopy, and imaging technologies.4 Since no single method is best suited for detecting and identifying all metabolites (intra- and/or extracellular), a combination of techniques is needed to encompass most, if not all, of them.

With advances in instrumentation, MS is the technique of choice for metabolomic studies. Small as well as large metabolites can be detected with high sensitivity and specificity. Following sample preparation and chromatographic separation, the ionized metabolites are fragmented and analyzed for their molecular composition and key substructures.4 With tremendous progress being made in MS-based metabolomics, a variety of choices are available for chromatographic separation, mass analyzers, and use of spectral information for metabolite identification.

Gas chromatography coupled with mass spectrometry (GC-MS) is an established technique well suited for thermally stable metabolites including compounds such as sugars, fatty acids, amino acids, aromatic amines, and sugar alcohols.6 While volatile analytes can be directly separated and quantified by GC-MS, polar metabolites require chemical derivatization at the functional group to increase volatility and thermal stability prior to separation.6 An alternative to GC-MS is liquid chromatography coupled with mass spectrometry (LC-MS), which not only provides flexibility in terms of combination of different separation mechanisms but also precludes the need for metabolite derivatization.6 Since LC is not limited to one separation mode, it can be easily customized to a certain class of metabolites using reverse-phase, normal phase, size exclusion, hydrophilic interaction chromatography, etc.7 It is applicable to compounds with varying polarity such as lipids, peptides, nucleotides, etc., making it more suitable for global metabolomics. Ultra performance liquid chromatography (UPLC) further enhances chromatographic resolution and peak capacity while enabling low detection limits.7

* To whom correspondence should be addressed. H.W. Ressom, e-mail: [email protected]. A. Dritschilo, e-mail: [email protected].

† These authors contributed equally to this manuscript.

‡ Department of Oncology, Georgetown University Medical Center.

§ Hartwick College.

| Department of Radiation Medicine, Georgetown University Medical Center.

10.1021/pr100185b XXXX American Chemical Society Journal of Proteome Research XXXX, xxx, 000 A

Following chromatographic separation, a mass spectrometer separates ions based on their mass-to-charge ratio (m/z) in a mass analyzer. For example, the hybrid quadrupole-time-of-flight (Q-TOF), a high mass resolution (∼15 000) analyzer, filters ions in the quadrupole based on mass trajectory. Ions of interest having amplitudes less than half the radius of the rods are retained within the rods. The Q-TOF can select precursor ions for fragmentation after traversing through the collision cell and offers a mass accuracy in the 5-10 ppm range.10 Additionally, Q-TOF is easily maintained and can be efficiently coupled to LC.

Based on prevalent practices, metabolomic analysis by LC-MS can be categorized as either bottom-up (targeted) or top-down (nontargeted) profiling.11 The bottom-up approach is a targeted profiling scheme where metabolites are identified prior to quantification. Experimental MS/MS data are compared to a spectral library of reference compounds; however, accuracy can only be ensured if the experimental MS/MS data are collected and processed in a similar way to the ones in the reference library. Quantification can be achieved by comparing peak intensities with internal standards of known concentrations. The limitation of this approach is the lack of large spectral libraries currently available, which can skew metabolite identification and interpretation.11

In the top-down approach, peak intensities are analyzed without necessarily identifying all metabolites. With this approach, the focus is to look at all metabolites at the same time rather than quantifying the ones that are already known. Since no identification is required at the initial level, only MS scans are utilized. The benefit of this holistic approach is balanced by the drawback of requiring further validation of components of interest through comparison of their fragmentation pattern to pure compounds. Also, computational challenges exist in using top-down metabolomic analysis of LC-MS data for identification of metabolic changes between biological groups.

The data analysis begins with data preprocessing procedures including filtering, peak detection, peak alignment, and normalization. Filtering methods process the raw measurement signal to remove effects like measurement noise or background signals. Peak detection is used to transform the continuous spectral data into a discrete peak list where each peak is represented by a triplet of m/z, RT, and ion abundance (peak intensity). This transformation brings two advantages: (1) part of the noise in the continuous data is removed; (2) the feature space of the sample is reduced, as most of the variation in data can be captured by a relatively small number of peaks. Peak alignment is necessary since the retention time of the same ion can drift significantly between runs. This drift is generally nonuniform across the retention time range and is a result of various instrumental, experimental, and environmental factors which cannot be completely eliminated.15 Peak alignment methods cluster measurements across different samples and correct the drifts in retention time. One approach is to add reference compounds to the sample and use these compounds as landmarks to guide the alignment of the retention time.7 Peak alignment approaches that do not rely on reference compounds have also been proposed.16–18 The alignment is performed based on either total ion chromatograms (TICs) or the three-dimensional LC-MS maps. A critical assessment of several popular alignment software tools has been previously reported,19 where it was concluded that the performance of various methods is largely based on user experience and the proper choice of parameters. Finally, normalization is needed to reduce the systematic bias and to make the peaks from multiple runs comparable.

Several software tools have been developed to relieve users of the burden of LC-MS metabolomic data preprocessing. These include MarkerLynx, MetAlign,23 XCMS,24 and MZmine.25 Other packages, some of them specific for LC-MS-based metabolomics, have been reviewed.12 A comparison of MZmine and XCMS for peak detection and alignment has been reported.26 The comparison suggests that both software packages can be used with proper parameter adjustments.
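The landmark-based alignment idea mentioned above can be illustrated with a minimal sketch. It assumes that the retention times of a few reference compounds are known both in the run being corrected and in a reference run, and that drift between landmarks can be approximated by piecewise-linear interpolation. The function name, the interpolation choice, and the numbers are illustrative assumptions, not the method of any particular software tool.

```python
from bisect import bisect_right


def align_retention_time(rt, observed_landmarks, reference_landmarks):
    """Map a retention time from one run onto the reference time axis.

    observed_landmarks: retention times of the reference compounds in this run
                        (strictly increasing).
    reference_landmarks: retention times of the same compounds in the reference run.
    Between landmarks the correction is interpolated linearly, so the drift may be
    nonuniform; outside the landmark range the nearest landmark's shift is applied.
    """
    obs, ref = observed_landmarks, reference_landmarks
    if rt <= obs[0]:
        return rt + (ref[0] - obs[0])
    if rt >= obs[-1]:
        return rt + (ref[-1] - obs[-1])
    i = bisect_right(obs, rt) - 1  # obs[i] <= rt < obs[i + 1]
    frac = (rt - obs[i]) / (obs[i + 1] - obs[i])
    return ref[i] + frac * (ref[i + 1] - ref[i])


# Hypothetical landmark retention times in minutes.
observed = [1.0, 5.0, 9.0]
reference = [1.2, 5.0, 8.6]
aligned = [align_retention_time(rt, observed, reference) for rt in (0.5, 3.0, 7.0, 9.5)]
```

Tools such as XCMS estimate the correction from the data rather than from spiked landmarks, so this sketch only conveys the general idea of warping one time axis onto another.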

Following data preprocessing, statistical methods are typically used to identify significant differences in metabolic changes between distinct biological groups. Since preprocessed LC-MS and microarray data share common characteristics, statistical methods previously developed for microarray data analysis can be utilized for difference detection. For example, Gene Expression Dynamics Inspector,28 which was originally developed for microarray data analysis, was used to identify the subsets of dose-dependent metabolites in a study of cellular response to radiation.29 Other methods, such as Significance Analysis of Microarrays (SAM) or Empirical Bayesian Analysis of Microarrays (EBAM), have been incorporated into the metabolomics data analysis software MetaboAnalyst.27 Other tools such as MeltDB30 provide various methods including t test, principal component analysis (PCA), independent component analysis (ICA), clustering, etc. PCA and partial least-squares-discriminant analysis (PLS-DA) methods are widely used to reduce the dimension of metabolomics data and identify relevant peaks.31–34

Finally, the metabolites represented by the selected peaks are identified and verified. Database search is the most popular metabolite identification method for nontargeted metabolomics studies. For each ion of interest, its m/z value is used to retrieve those molecules in databases having monoisotopic weights within a prespecified mass measurement accuracy, given the ionization mode and modification of metabolites. Several databases have been assembled to retrieve putative identifications.35–37 The limitation of such a mass-based database search is that the identification result is rarely unique. It has been shown that even an m/z accuracy of 1 ppm is insufficient for unambiguous metabolite identification.38 Thus, it is important that the putative identifications are further verified by comparing their MS/MS spectra with those of authentic compounds.
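A mass-based search of this kind can be sketched as follows. The example assumes positive ionization with a protonated adduct ([M + H]+) and a tiny hand-made library. The `Compound` class, the `search_by_mass` function, and the 5 ppm tolerance are illustrative choices, not the interface of METLIN, MMCD, or HMDB.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da; mass added by protonation in positive mode


@dataclass(frozen=True)
class Compound:
    name: str
    monoisotopic_mass: float  # neutral monoisotopic mass in Da


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def search_by_mass(mz, library, tol_ppm=5.0, adduct_shift=PROTON_MASS):
    """Return (name, ppm error) for every compound whose adduct m/z lies within tol_ppm."""
    hits = []
    for compound in library:
        theoretical_mz = compound.monoisotopic_mass + adduct_shift
        err = ppm_error(mz, theoretical_mz)
        if abs(err) <= tol_ppm:
            hits.append((compound.name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical three-compound library; leucine and isoleucine are isomers
# (C6H13NO2) and therefore share the same monoisotopic mass.
LIBRARY = [
    Compound("leucine", 131.09463),
    Compound("isoleucine", 131.09463),
    Compound("glucose", 180.06339),
]

hits = search_by_mass(132.1019, LIBRARY, tol_ppm=5.0)
```

Both isomers are returned for the same query m/z, which illustrates why a mass match alone is rarely unique and why MS/MS comparison against authentic compounds is still needed.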

In this paper, we present analytical and computational methods to characterize the metabolic changes that occur in response to radiation exposure by a top-down metabolomic analysis. The goal is to understand the differential response to radiation exposure in two isogenic cell lines, namely AT5BIVA and ATCL8, at the metabolic level owing to the differential expression status of ATM, and in turn correlate this to the fate of the cells. ATCL8 contains an introduced ATM gene (mutated in ataxia telangiectasia (A-T)).

A-T is a genetic disorder characterized by cerebellar degeneration, immunodeficiency, defective recognition of damage to DNA, radiosensitivity, cell cycle checkpoint defects, and cancer predisposition.39 A-T cells in culture are also characterized by slow growth rate, extreme radiosensitivity, chromosomal instability, specific translocations, defective repair of a subtype of DNA strand breaks, and defective cell cycle checkpoint activation.39 The AT5BIVA cells used in this study were derived from a patient with A-T exhibiting extreme radiosensitivity.40 The ATCL8 cell line was derived by transfecting AT5BIVA cells with the wild type, full length “ATM” gene, which resulted in correction of radiosensitivity.


We used XCMS to perform peak detection and peak alignment. “Shrinkage t” statistic was used to select differentially expressed metabolites between radiation treated and untreated cell lines. A multivariate analysis was also performed using the SIMCA-P+ (Umetrics, Kinnelon, NJ) software. The metabolites corresponding to the selected peaks were identified using mass-based databases. The Ingenuity Pathway Analysis (IPA) tool (Ingenuity Systems, Redwood City, CA) was utilized to search for the pathways regulated by the identified metabolites. Some of the selected metabolites were further validated by MS/MS analysis. Figure 1 depicts an overview of the top-down metabolomic analysis we performed for the characterization of AT5BIVA and ATCL8 cells and identification of metabolic changes in the cells in response to radiation.

Materials and Methods

Samples. Ataxia telangiectasia fibroblast cells AT5BIVA were obtained from the National Institute of General Medical Sciences (NIGMS). AT5BIVA have been previously characterized as complementation Groups A, C, and D (3), and the others are unknown. Cells were maintained in modified Eagle’s medium with 20% fetal bovine serum, 100 U/mL penicillin, and 100 pg/mL streptomycin. For this study the cells were grown to 80% confluence, serum starved for 24 h, and subjected to a radiation dose of 5 Gy. The control cells were not given radiation treatment. The cells were allowed to recover for predetermined periods of time (30 min, 1, 2, 3, 6, and 24 h) by incubating at 37 °C. The cells were then washed twice with chilled phosphate buffered saline (PBS) and harvested by scraping and centrifuging. Total cell count was carried out using a hemocytometer, and an equal number of cells (10⁷) from each time point was aliquoted for further processing for metabolomic analysis. The AT5BIVA cells used in this study were derived from a patient with A-T exhibiting extreme radiation sensitivity.40 The ATCL8 cell line was derived by transfecting AT5BIVA cells with the wild type, full length “ATM” gene, which resulted in correction of radiation sensitivity. In this paper, we used these two isogenic cell lines (AT5BIVA and ATCL8) as a model system for characterizing the metabolic changes that occur in response to radiation exposure in a time-dependent manner.

Sample Preparation and LC-MS Data Acquisition. Sample Preparation. The sample preparation for small molecule metabolite profiling using LC-MS was carried out using the method described by Sana et al.41 Briefly, the cells were lysed using chilled water, and sequential metabolite extraction was done using methanol and chloroform. One micromolar debrisoquine was added to the extraction buffer as internal standard.

LC-MS Data Acquisition. An aliquot of each sample supernatant was deposited in an autosampler vial, and 5 µL was separated on a 50 mm × 2.1 mm Acquity 1.7 µm C18 column using an Acquity UPLC system online with a Q-TOF Premier (Waters Corp, Milford, MA). The method for data acquisition is described by Patterson et al.29 The gradient mobile phase consisted of 0.1% formic acid (A) and acetonitrile containing 0.1% formic acid (B). A typical 10 min sample run consisted of 0.5 min of 100% solvent A followed by an incremental increase of solvent B up to 100% for the remaining 9.5 min. The flow rate was set to 0.6 mL/min. The eluent was introduced by electrospray ionization into the Q-TOF Premier operating in positive ionization mode. The capillary and sampling cone voltages were set to 3000 and 30 V, respectively. Source and desolvation temperatures were set to 120 and 350 °C, respectively, and the cone and desolvation gas flows were set to 50.0 and 650.0 L/h, respectively. To maintain mass accuracy, sulfadimethoxine ([M + H]+ = 311.0814) at a concentration of 500 pg/µL in 50% acetonitrile was used as a lock mass and injected at a rate of 0.08 µL/min. For MS scanning, data were acquired in centroid mode from 50 to 850 m/z, and for MS/MS, the collision energy was ramped from 5 to 35 V.

LC-MS Data Preprocessing. The raw data obtained from the UPLC-QTOF machine were converted into Network Common Data Form (NetCDF) format using the MassLynx software (Waters Corp, Milford, MA).

We used XCMS to preprocess the LC-MS data. The XCMS tool uses several algorithms to preprocess the data.24 The first step is to filter and detect the peaks. In this step, the ion chromatograms are extracted and a model peak matched filter is applied. The peak detection algorithm is based on cutting the LC-MS data into slices, a fraction of a mass unit wide, and then operating on those individual slices in the chromatographic time domain. After detecting peaks in individual samples, the peaks are matched across samples to allow calculation of retention time deviations and relative ion intensity comparison. This is accomplished using a group method, which creates bins to group peaks in the mass domain. The peak matching algorithm in XCMS takes into account the two-dimensional anisotropic nature of LC-MS data. These groups are then used to identify and correct drifts in retention time from run to run. For every group, the median retention time and the deviation from the median for every sample in that group are calculated. To account for the missing peak data, a mathematical function is used to approximate the difference between deviations and interpolate in sections where no peak groups are present. Missing peaks can occur because they were missed during the peak detection step or because an analyte was not present in a sample. This creates a data matrix of ion abundance for each pair of m/z and retention time.

Figure 1. Overview of our top-down metabolomic analysis.

The data matrix obtained from XCMS was normalized using quantile normalization. Quantile normalization has been applied for normalization of microarray data.42 The method assumes that an underlying common distribution of intensities exists across samples.
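As a minimal sketch of the idea, quantile normalization can be written in a few lines. It assumes a peak-by-sample matrix (rows are peaks, columns are samples) stored as nested lists, and it does not average tied values as some implementations do. The function name and the toy data are illustrative.

```python
def quantile_normalize(matrix):
    """Force every column (sample) of a peak-by-sample matrix onto one distribution.

    Each value is replaced by the mean, across samples, of the values that share
    its within-sample rank. Ties are broken by row order (no tie averaging).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_columns = [sorted(col) for col in columns]
    # Mean intensity at each rank across all samples.
    rank_means = [
        sum(col[r] for col in sorted_columns) / n_cols for r in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized


# Toy matrix: 4 peaks (rows) by 3 samples (columns).
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(raw)
```

After normalization every sample contains the same set of values, so sample-wide intensity shifts are removed while the ranking of peaks within each sample is preserved.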

Peak Selection. “Shrinkage t” Statistic Analysis. We used the “shrinkage t” statistic and correlation adjusted t-score to identify peaks whose intensities are significantly different in a pairwise comparison. The “shrinkage t” statistic is based on a novel and model-free shrinkage estimate of the variance vector across peaks. It is a further development of the quasi-likelihood approach of Strimmer43 but with additional regularization. It is developed in the framework of James-Stein-type analytic shrinkage.44 This approach offers a highly efficient means for regularized inference, both in the statistical and computational sense. It is complementary to more well-known alternatives such as Bayesian and penalized likelihood inference.45 It is similar to the standard t statistic with the replacement of the sample variances by corresponding shrinkage estimates. These are derived in a distribution-free fashion and with few a priori assumptions.

The cat (“correlation-adjusted t”) score is the product of the square root of the inverse correlation matrix with a vector of t-scores. The cat score describes the contribution of each individual feature in separating the two groups, after removing the effect of all other features. The cat score is a natural criterion to rank features in the presence of correlation.46 If there is no correlation, the cat score reduces to the usual t-score.
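The verbal definition above can be written compactly. The symbols are notation assumed here for illustration: \(\tau\) is the vector of t-scores, one per peak, and \(P\) is the correlation matrix among peaks.

```latex
\tau^{\mathrm{adj}} = P^{-1/2}\,\tau ,
\qquad
P = I \;\Longrightarrow\; \tau^{\mathrm{adj}} = \tau .
```

The second relation restates the remark that the cat score reduces to the usual t-score when the features are uncorrelated.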

We used the “st” package in R, which implements the “shrinkage t” statistic and a shrinkage estimate of the cat score. For determining p-values and false discovery rate (FDR), a two-component mixture model is fitted to the observed cat scores.47 An example of ranking metabolomics markers of prostate cancer using “shrinkage t” and cat score is shown in Zuber and Strimmer.46 A comparison of methods for assigning significance to cat scores is also discussed. For computing FDR, we used the “fdrtool” software package in R.

Partial Least-Squares-Discriminant Analysis (PLS-DA). We used the SIMCA-P+ software to generate PLS-DA two-component models. PLS-DA is a supervised method that uses a multiple linear regression technique to find the direction of maximum covariance between a data set and the class labels. It sharpens the separation between groups of observations by rotating PCA components such that a maximum separation among classes is obtained, and it helps to understand which variables carry the class-separating information. PLS components are built by seeking a compromise between two purposes: describing the set of explanatory variables and predicting the response variables.

Figure 2. S-plot of a pairwise comparison between 0 and 24 h for ATCL8 cell line. We selected the peaks marked with “0”.

Table 1. Number of Statistically Significant Peaks

| selection | cell line | 0 vs 30 min | 0 vs 1 h | 0 vs 2 h | 0 vs 3 h | 0 vs 6 h | 0 vs 24 h |
|---|---|---|---|---|---|---|---|
| “shrinkage t” statistic analysis | ATCL8 | 94 | 21 | 124 | 111 | 135 | 128 |
| “shrinkage t” statistic analysis | AT5BIVA | 287 | 253 | 292 | 400 | 340 | 405 |
| PLS-DA | ATCL8 | 91 | 39 | 124 | 84 | 142 | 151 |
| PLS-DA | AT5BIVA | 239 | 273 | 294 | 348 | 332 | 330 |
| both “shrinkage t” statistic and PLS-DA | ATCL8 | 47 | 3 | 65 | 33 | 72 | 77 |
| both “shrinkage t” statistic and PLS-DA | AT5BIVA | 150 | 150 | 177 | 242 | 253 | 218 |
| used for pathway analysis by IPA (a) | ATCL8 | 36 | 22 | 51 | 39 | 67 | 53 |
| used for pathway analysis by IPA (a) | AT5BIVA | 57 | 53 | 89 | 79 | 56 | 88 |

(a) The peaks were selected by “shrinkage t” statistic and/or PLS-DA. Metabolites were identified by METLIN, MMCD, and HMDB.

Mass-Based Metabolite Identification. We used METLIN, the Madison Metabolomics Consortium Database (MMCD), and the Human Metabolome Database (HMDB) for mass-based metabolite identification. METLIN is a web-based database developed by the Scripps Research Institute to facilitate the identification of metabolites using accurate mass data. It includes an annotated list of structural information for known metabolites. MMCD is a database maintained by the National Magnetic Resonance Facility at Madison, WI. HMDB focuses on metabolites found in human body fluids. These databases have links to other public databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and PubChem.

Figure 3. Heatmap of ion intensities of peaks selected by “Shrinkage t” statistic and/or PLS-DA in ATCL8 cells.

Figure 4. Heatmap of ion intensities of peaks selected by “Shrinkage t” statistic and/or PLS-DA in AT5BIVA cells.

Pathway Analysis. Following identification of metabolites by METLIN, MMCD, and HMDB, we introduced the KEGG, HMDB, or PubChem IDs of the identified metabolites into the IPA. IPA-Metabolomics is a capability within IPA that extracts rich pathway information from metabolomics data. For a given data set, IPA automatically generates the pathways that are related to those metabolites. It uses two criteria (ratio and significance) to evaluate the potential association of the biomarkers to these pathways. The ratio is calculated by dividing the number of candidate biomarkers in a given pathway by the total number of biomolecules that make up the pathway. The significance of a given pathway (p-value) is a measure of the likelihood that the pathway is associated with the list of candidate biomarkers by chance. The null hypothesis is that there is no association. A very small p-value may be an indication that the pathway is more likely to explain the observed phenotype. While the ratio gives a good idea of the percentage of candidate biomarkers in a pathway, the significance addresses the question of whether the observed association is merely by chance.

Results and Discussion

LC-MS metabolic profiles from the two isogenic cell lines, AT5BIVA and ATCL8, were analyzed before radiation and at 30 min, 1 h, 2 h, 3 h, 6 h, and 24 h after radiation. Each time point consisted of five samples. Thus, the data matrix consisted of 35 samples, including five controls that were not exposed to radiation.

The raw LC-MS data were converted into NetCDF formatusing MassLynx. Filtering and peak detection were performedinitially using the XCMS software package. To enable furtheranalysis and visualization of data, all m/z values were binnedto fixed m/z values with a bin size of 100 ppm. As a result, thedata were transformed into a two-dimensional matrix of ionabundances. Rows of the matrix represent the ions with specificRT and m/z values and columns represent the samples. A listof all peaks in each sample was compiled. After identifyingpeaks in individual samples, the peaks were aligned acrosssamples to allow calculation of retention time deviations andrelative ion intensity comparison. The peak-matching algorithmin XCMS that takes into account the two-dimensional, aniso-tropic nature of LC-MS data was used.

After preprocessing, 2729 peaks for AT5BIVA and 2719 peaks for ATCL8 were detected. A quantile normalization method was then used to remove systematic bias. We performed statistical data analysis to identify differentially expressed metabolites in pairwise comparisons of 0 vs 30 min, 0 vs 1 h, 0 vs 2 h, 0 vs 3 h, 0 vs 6 h, and 0 vs 24 h for each of the two cell lines. Specifically, we used the “shrinkage t” statistic and cat score to select the differentially expressed metabolites with p-value less than 0.05 and FDR less than 10% for each pairwise comparison. To select individual metabolites according to their relative effect on group separation, we ranked the metabolites according to the magnitude of their respective “shrinkage t” statistic. For determining p-values and FDR values, a two-component mixture model was fitted to the observed cat scores.47 The output list table is then ordered by p-value with the most significant metabolites listed first.
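Quantile normalization can be illustrated with a minimal sketch, assuming the standard procedure in which every sample is forced onto a common reference distribution (the row-wise mean of the sorted columns). The text does not describe the implementation that was actually used; the function name and the toy matrix are hypothetical, and ties are resolved by order of appearance rather than by averaging.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a peaks-by-samples matrix given as a list of rows."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Reference distribution: mean of the i-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[i] for sc in sorted_cols) / n_cols for i in range(n_rows)]

    # Replace each value by the reference value at its within-sample rank.
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized

# Toy matrix: 4 peaks (rows) by 3 samples (columns).
raw = [[5.0, 4.0, 3.0],
       [2.0, 1.0, 4.0],
       [3.0, 4.0, 6.0],
       [4.0, 2.0, 8.0]]
norm = quantile_normalize(raw)
```

After normalization every sample has the same set of intensity values, so differences between samples reflect only the ranking of peaks within each sample.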

In addition, we used the SIMCA-P+ software to fit a PLS-DA model for each pairwise comparison. The software generates an S-plot, which is a scatter plot that can be utilized to explain the variable influence on the model. By combining covariance (x-axis, p1) and correlation loading profiles (y-axis, p(corr)1), the S-plot can be utilized for the extraction of putative metabolites (Figure 2). The upper right quadrant of the S-plot shows those components which are elevated in the control group, while the lower left quadrant shows components elevated in the treated group. The farther a point lies along the x-axis, the greater its contribution to the variance between the groups, while the farther it lies along the y-axis, the higher the reliability of the analytical result. Figure 2 presents the S-plot derived from a pairwise comparison (0 vs 24 h) for the ATCL8 cell line by PLS-DA. For this comparison, we chose the peaks with |p1| > 0.035 and |p(corr)1| > 0.6. The figure shows the selected peaks.
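The S-plot selection rule can be written as a simple filter. The sketch below assumes that the p1 and p(corr)1 values have already been exported from the PLS-DA model; the thresholds are those quoted for the 0 vs 24 h ATCL8 comparison, while the function name, the peak labels, and the example values are hypothetical.

```python
def select_splot_peaks(peaks, p1_cut=0.035, pcorr_cut=0.6):
    """Keep peaks with |p1| > p1_cut and |p(corr)1| > pcorr_cut.

    peaks: iterable of (label, p1, pcorr). The quadrant decides the
    direction: upper right is elevated in the control group, lower left
    is elevated in the treated group.
    """
    selected = []
    for label, p1, pcorr in peaks:
        if abs(p1) > p1_cut and abs(pcorr) > pcorr_cut:
            direction = "control" if p1 > 0 else "treated"
            selected.append((label, direction))
    return selected

# Hypothetical loadings for four peaks.
example = [
    ("peak_A", 0.050, 0.82),    # upper right, passes both cuts
    ("peak_B", -0.041, -0.75),  # lower left, passes both cuts
    ("peak_C", 0.060, 0.30),    # large covariance but unreliable
    ("peak_D", 0.010, 0.90),    # reliable but small contribution
]
chosen = select_splot_peaks(example)
```

Requiring both cuts removes peaks that contribute strongly but inconsistently (such as `peak_C`) as well as peaks that are consistent but contribute little (such as `peak_D`).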

Table 1 summarizes the number of peaks that were selected using the “shrinkage t” statistic and PLS-DA. The table presents the peaks selected by these methods separately and the number of overlapping peaks selected by both methods. A mass-based database search was performed using the METLIN, MMCD, and HMDB databases for identification of the metabolites represented by the peaks selected through the “shrinkage t” statistic and/or PLS-DA methods. The table presents the number of peaks for which the corresponding metabolites were identified at each time point. Overall, the database search led to identification of about 25% of the peaks. Specifically, 130 differentially abundant peaks were identified in ATCL8 cells across all time points combined, while 140 peaks were identified in AT5BIVA cells. These metabolites were used for the pathway analysis using IPA. Figures 3 and 4 depict heatmaps of the ion intensities of these metabolites for ATCL8 and AT5BIVA cells, respectively, with the metabolites annotated by their most enriched canonical pathways. Figures 5 and 6 present the top 12 canonical pathways enriched in the metabolites identified for ATCL8 and AT5BIVA cells, respectively. The figures show remarkable differences in metabolic responses to radiation treatment under ATM-proficient (ATCL8) or ATM-deficient (AT5BIVA) conditions. While ATCL8 shows significant enrichment of metabolites involved in purine metabolism, linoleic acid metabolism, pentose and glucuronate interconversions, fructose and mannose metabolism, etc., AT5BIVA shows a predominance of glycerophospholipid metabolism and phospholipid degradation. A number of metabolites involved in purine metabolism were affected by radiation. This is consistent with our previous proteomics analysis, where we found enzymes involved in purine metabolism to be differentially changed in the two cell lines.48 We observed that ATCL8 showed more enrichment of metabolites involved in purine metabolism than AT5BIVA. Taken together, these results show that the presence of ATM in the ATCL8 cells elicits a normal radiation response leading to inhibition of cell growth and proliferation and increased DNA repair. Future mechanistic studies will be needed in order to correlate the role of these pathways with respect to ATM functionality.

Figure 5. Canonical pathways associated with the metabolites that showed significant changes in expression in response to radiation exposure of ATCL8 cells.

Figure 6. Canonical pathways associated with the metabolites that showed significant changes in expression in response to radiation exposure of AT5BIVA cells.

We verified some of the identified metabolites using MS/MS analysis. For example, we verified by MS/MS analysis the identity of guanine (involved in purine metabolism), which was upregulated at nearly all time points in ATCL8. Figure 7 depicts the positive ion MS/MS fragmentation of the m/z 152.0569 ion from ATCL8 cell extracts (top panel) and the positive ion mode fragmentation spectrum for synthetic guanine (bottom panel).
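As a consistency check on the mass-based assignment, the theoretical m/z of protonated guanine can be compared with the observed value of 152.0569. The sketch below assumes the [M+H]+ adduct, which the text implies through positive ion mode but does not state explicitly; the function names are hypothetical, and the monoisotopic masses are standard values rounded to seven decimals.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.0078250, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.0072765  # mass of a proton (Da)

GUANINE = {"C": 5, "H": 5, "N": 5, "O": 1}  # C5H5N5O

def protonated_mz(formula):
    """m/z of the singly charged [M+H]+ ion for an elemental formula."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

theoretical = protonated_mz(GUANINE)
error = ppm_error(152.0569, theoretical)
print(f"[M+H]+ = {theoretical:.4f}, error = {error:.1f} ppm")
```

A small mass error supports the candidate formula but does not by itself distinguish isomers, which is why the comparison against the MS/MS spectrum of synthetic guanine is needed.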

Conclusion and Future Work

The study of metabolism at the global or “-omics” level, referred to as metabolomics, is a new but rapidly growing field that has the potential to impact our understanding of molecular mechanisms of disease. In our study, we used two isogenic cell lines (AT5BIVA and ATCL8) as a model system for characterizing the metabolic changes that occur in response to radiation exposure. The goal was to understand the underlying differences in metabolic changes due to the absence of ATM (AT5BIVA) or the presence of ATM (ATCL8). The “shrinkage t” statistic and partial least-squares-discriminant analysis methods were used to select peaks with a significant change in signal intensities between radiation-treated and untreated cell lines. The metabolites represented by the selected peaks were identified through a mass-based database search. A pathway analysis was then performed to identify the pathways in which these metabolites are involved. A limitation of this analysis is that only a quarter of the selected peaks were included in the pathway analysis, because the remaining peaks could not be assigned to metabolites by mass spectrometry.

The data analysis algorithms used in this study have successfully revealed interesting molecular differences in the response to radiation exposure in the presence and absence of ATM. Future studies will entail further validation of our findings with respect to the identification of the metabolites and their actual cellular levels. Identifying metabolites is a difficult task. Efficient identification tools are critical because of the large number of metabolites in any given metabolomics study. An extensive library of experimental MS/MS data is needed for a comprehensive characterization and identification of metabolites.

Acknowledgment. This work was supported in part by the National Science Foundation Grant (IIS-0812246) and National Institutes of Health PO1 CA74175. We acknowledge the contribution of Alfredo Velena for culturing the cell lines.

References

(1) Nicholson, J. K.; Wilson, I. D. Opinion: understanding ‘global’ systems biology: metabonomics and the continuum of metabolism. Nat. Rev. Drug Discovery 2003, 2 (8), 668–76.

(2) Griffin, J. L. The Cinderella story of metabolic profiling: does metabolomics get to go to the functional genomics ball. Philos. Trans. R. Soc. Lond., B: Biol. Sci. 2006, 361 (1465), 147–61.

(3) Moolenaar, S. H.; Engelke, U. F.; Wevers, R. A. Proton nuclear magnetic resonance spectroscopy of body fluids in the field of inborn errors of metabolism. Ann. Clin. Biochem. 2003, 40 (Pt 1), 16–24.

Figure 7. Determination of the chemical structure of an upregulated metabolite in ATCL8 by tandem mass spectrometry. The top panel shows the positive ion MS/MS fragmentation of the m/z 152.0569 ion from ATCL8 cell extracts. The bottom panel shows the positive ion mode fragmentation spectrum for synthetic guanine.


(4) Kaddurah-Daouk, R.; Kristal, B. S.; Weinshilboum, R. M. Metabolomics: a global biochemical approach to drug response and disease. Annu. Rev. Pharmacol. Toxicol. 2008, 48, 653–83.

(5) Bino, R. J.; Hall, R. D.; Fiehn, O.; Kopka, J.; Saito, K.; Draper, J.; Nikolau, B. J.; Mendes, P.; Roessner-Tunali, U.; Beale, M. H.; Trethewey, R. N.; Lange, B. M.; Wurtele, E. S.; Sumner, L. W. Potential of metabolomics as a functional genomics tool. Trends Plant Sci. 2004, 9 (9), 418–25.

(6) Shulaev, V. Metabolomics technology and bioinformatics. Brief. Bioinform. 2006, 7 (2), 128–39.

(7) Dettmer, K.; Aronov, P. A.; Hammock, B. D. Mass spectrometry-based metabolomics. Mass Spectrom. Rev. 2007, 26 (1), 51–78.

(8) Yates, J. R.; Ruse, C. I.; Nakorchevsky, A. Proteomics by mass spectrometry: approaches, advances, and applications. Annu. Rev. Biomed. Eng. 2009, 11, 49–79.

(9) Lu, W.; Bennett, B. D.; Rabinowitz, J. D. Analytical strategies for LC-MS-based targeted metabolomics. J. Chromatogr., B: Analyt. Technol. Biomed. Life Sci. 2008, 871 (2), 236–42.

(10) Dunn, W. B. Current trends and future requirements for the mass spectrometric investigation of microbial, mammalian and plant metabolomes. Phys. Biol. 2008, 5 (1), 11001.

(11) Wishart, D. S. Current progress in computational metabolomics. Brief. Bioinform. 2007, 8 (5), 279–93.

(12) Katajamaa, M.; Oresic, M. Data processing for mass spectrometry-based metabolomics. J. Chromatogr., A 2007, 1158 (1-2), 318–28.

(13) Du, P.; Kibbe, W. A.; Lin, S. M. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics 2006, 22 (17), 2059–65.

(14) Yang, C.; He, Z.; Yu, W. Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis. BMC Bioinformatics 2009, 10 (1), 4.

(15) Mathias, V.; Li-Thiao-Te, S.; Hans-Michael, K.; Runxuan, Z.; Tero, A.; Benno, S. Alignment of LC-MS images, with applications to biomarker discovery and protein identification. Proteomics 2008, 8 (4), 650–72.

(16) Kong, X.; Reilly, C. A Bayesian approach to the alignment of mass spectra. Bioinformatics 2009, 25 (24), 3213–20.

(17) Befekadu, G. K.; Tadesse, M. G.; Ressom, H. W. In A bayesian based functional mixed-effects model for analysis of LC-MS data, Engineering in Medicine and Biology Society; Annual International Conference of the IEEE; IEEE Press Books: New York, 2009; pp 43-6746.

(18) Hoffmann, N.; Stoye, J. ChromA: signal-based retention time alignment for chromatography-mass spectrometry data. Bioinformatics 2009, 25 (16), 2080–1.

(19) Lange, E.; Tautenhahn, R.; Neumann, S.; Gropl, C. Critical assessment of alignment procedures for LC-MS proteomics and metabolomics measurements. BMC Bioinform. 2008, 9, 375.

(20) Redestig, H.; Fukushima, A.; Stenlund, H.; Moritz, T.; Arita, M.; Saito, K.; Kusano, M. Compensation for Systematic Cross-Contribution Improves Normalization of Mass Spectrometry Based Metabolomics Data. Anal. Chem. 2009, 81 (19), 7974–80.

(21) Sysi-Aho, M.; Katajamaa, M.; Yetukuri, L.; Oresic, M. Normalization method for metabolomics data using optimal selection of multiple internal standards. BMC Bioinform. 2007, 8 (1), 93.

(22) Warrack, B. M.; Hnatyshyn, S.; Ott, K.-H.; Reily, M. D.; Sanders, M.; Zhang, H.; Drexler, D. M. Normalization strategies for metabonomic analysis of urine samples. J. Chromatogr., B 2009, 877 (5-6), 547–52.

(23) Tikunov, Y.; Lommen, A.; de Vos, C. H.; Verhoeven, H. A.; Bino, R. J.; Hall, R. D.; Bovy, A. G. A novel approach for nontargeted data analysis for metabolomics. Large-scale profiling of tomato fruit volatiles. Plant Physiol. 2005, 139 (3), 1125–37.

(24) Smith, C. A.; Want, E. J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 2006, 78 (3), 779–87.

(25) Katajamaa, M.; Oresic, M. Processing methods for differential analysis of LC/MS profile data. BMC Bioinform. 2005, 6, 179.

(26) Kind, T.; Tolstikov, V.; Fiehn, O.; Weiss, R. H. A comprehensive urinary metabolomic approach for identifying kidney cancer. Anal. Biochem. 2007, 363 (2), 185–95.

(27) Xia, J.; Psychogios, N.; Young, N.; Wishart, D. S. MetaboAnalyst: a web server for metabolomic data analysis and interpretation. Nucleic Acids Res. 2009, 37 (Suppl 2), W652–60.

(28) Eichler, G. S.; Huang, S.; Ingber, D. E. Gene Expression Dynamics Inspector (GEDI): for integrative analysis of expression profiles. Bioinformatics 2003, 19 (17), 2321–2.

(29) Patterson, A. D.; Li, H.; Eichler, G. S.; Krausz, K. W.; Weinstein, J. N.; Fornace, A. J.; Gonzalez, F. J.; Idle, J. R. UPLC-ESI-TOFMS-Based Metabolomics and Gene Expression Dynamics Inspector Self-Organizing Metabolomic Maps as Tools for Understanding the Cellular Response to Ionizing Radiation. Anal. Chem. 2008, 80 (3), 665–74.

(30) Neuweger, H.; Albaum, S. P.; Dondrup, M.; Persicke, M.; Watt, T.; Niehaus, K.; Stoye, J.; Goesmann, A. MeltDB: a software platform for the analysis and integration of metabolomics experiment data. Bioinformatics 2008, 24 (23), 2726–32.

(31) Ramautar, R.; van der Plas, A. A.; Nevedomskaya, E.; Derks, R. J. E.; Somsen, G. W.; de Jong, G. J.; van Hilten, J. J.; Deelder, A. M.; Mayboroda, O. A. Explorative Analysis of Urine by Capillary Electrophoresis-Mass Spectrometry in Chronic Patients with Complex Regional Pain Syndrome. J. Proteome Res. 2009, 8 (12), 5559–67.

(32) Bao, Y.; Zhao, T.; Wang, X.; Qiu, Y.; Su, M.; Jia, W.; Jia, W. Metabonomic Variations in the Drug-Treated Type 2 Diabetes Mellitus Patients and Healthy Volunteers. J. Proteome Res. 2009, 8 (4), 1623–30.

(33) Yin, P.; Zhao, X.; Li, Q.; Wang, J.; Li, J.; Xu, G. Metabonomics Study of Intestinal Fistulas Based on Ultraperformance Liquid Chromatography Coupled with Q-TOF Mass Spectrometry (UPLC/Q-TOF MS). J. Proteome Res. 2006, 5 (9), 2135–43.

(34) Kim, K.; Aronov, P.; Zakharkin, S. O.; Anderson, D.; Perroud, B.; Thompson, I. M.; Weiss, R. H. Urine Metabolomics Analysis for Kidney Cancer Detection and Biomarker Discovery. Mol. Cell. Proteomics 2009, 8 (3), 558–70.

(35) Wishart, D. S.; Tzur, D.; Knox, C.; Eisner, R.; Guo, A. C.; Young, N.; Cheng, D.; Jewell, K.; Arndt, D.; Sawhney, S.; Fung, C.; Nikolai, L.; Lewis, M.; Coutouly, M. A.; Forsythe, I.; Tang, P.; Shrivastava, S.; Jeroncic, K.; Stothard, P.; Amegbey, G.; Block, D.; Hau, D. D.; Wagner, J.; Miniaci, J.; Clements, M.; Gebremedhin, M.; Guo, N.; Zhang, Y.; Duggan, G. E.; Macinnis, G. D.; Weljie, A. M.; Dowlatabadi, R.; Bamforth, F.; Clive, D.; Greiner, R.; Li, L.; Marrie, T.; Sykes, B. D.; Vogel, H. J.; Querengesser, L. HMDB: the Human Metabolome Database. Nucleic Acids Res. 2007, 35 (Database issue), D521–6.

(36) Cui, Q.; Lewis, I. A.; Hegeman, A. D.; Anderson, M. E.; Li, J.; Schulte, C. F.; Westler, W. M.; Eghbalnia, H. R.; Sussman, M. R.; Markley, J. L. Metabolite identification via the Madison Metabolomics Consortium Database. Nat. Biotechnol. 2008, 26 (2), 162–4.

(37) Smith, C. A.; Maille, G. O.; Want, E. J.; Qin, C.; Trauger, S. A.; Brandon, T. R.; Custodio, D. E.; Abagyan, R.; Siuzdak, G. METLIN: A Metabolite Mass Spectral Database. Ther. Drug Monit. 2005, 27 (6), 747–51.

(38) Kind, T.; Fiehn, O. Metabolomics database annotations via query of elemental compositions: Mass accuracy is insufficient even at less than 1 ppm. BMC Bioinform. 2006, 7, 234.

(39) Lavin, M. F.; Shiloh, Y. The genetic defect in ataxia-telangiectasia. Annu. Rev. Immunol. 1997, 15, 177–202.

(40) Jung, M.; Zhang, Y.; Lee, S.; Dritschilo, A. Correction of radiation sensitivity in ataxia telangiectasia cells by a truncated I kappa B-alpha. Science 1995, 268 (5217), 1619–21.

(41) Sana, T. R.; Waddell, K.; Fischer, S. M. A sample extraction and chromatographic strategy for increasing LC/MS detection coverage of the erythrocyte metabolome. J. Chromatogr., B: Analyt. Technol. Biomed. Life Sci. 2008, 871 (2), 314–21.

(42) Bolstad, B. M.; Irizarry, R. A.; Astrand, M.; Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19 (2), 185–93.

(43) Strimmer, K. Modeling gene expression measurement error: a quasi-likelihood approach. BMC Bioinform. 2003, 4, 10.

(44) Schafer, J.; Strimmer, K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. Biol. 2005, 4, Article 32.

(45) Opgen-Rhein, R.; Strimmer, K. Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Stat. Appl. Genet. Mol. Biol. 2007, 6, Article 9.

(46) Zuber, V.; Strimmer, K. Gene ranking and biomarker discovery under correlation. Bioinformatics 2009, 25 (20), 2700–7.

(47) Efron, B. Microarrays, empirical Bayes, and the two-groups model. Statist. Sci. 2008, 23, 1–22.

(48) Hu, Z. Z.; Huang, H.; Cheema, A.; Jung, M.; Dritschilo, A.; Wu, C. H. Integrated Bioinformatics for Radiation-Induced Pathway Analysis from Proteomics and Microarray Data. J. Proteomics Bioinform. 2008, 1 (2), 47–60.

PR100185B
