

Toxicology and Applied Pharmacology 243 (2010) 300–314

Contents lists available at ScienceDirect

Toxicology and Applied Pharmacology

journal homepage: www.elsevier.com/locate/ytaap

Predicting the hepatocarcinogenic potential of alkenylbenzene flavoring agents using toxicogenomics and machine learning

Scott S. Auerbach a, Ruchir R. Shah b, Deepak Mav b, Cynthia S. Smith a, Nigel J. Walker a, Molly K. Vallant a, Gary A. Boorman a, Richard D. Irwin a,⁎

a National Toxicology Program, National Institute of Environmental Health Sciences, NIH, RTP, NC 27709, USA
b SRA International, RTP, NC 27709, USA

Abbreviations: RFE, recursive feature elimination; SVM, support vector machine.
⁎ Corresponding author. Fax: +1 919 541 4255.
E-mail address: [email protected] (R.D. Irwin).

0041-008X/$ – see front matter. Published by Elsevier Inc.
doi:10.1016/j.taap.2009.11.021

Article info

Article history:
Received 4 September 2009
Revised 18 November 2009
Accepted 20 November 2009
Available online 11 December 2009

Keywords: Toxicogenomics; Cancer; Liver; Prediction; Alkenylbenzene; Rat

Abstract

Identification of carcinogenic activity is the primary goal of the 2-year bioassay. The expense of these studies limits the number of chemicals that can be studied and therefore chemicals need to be prioritized based on a variety of parameters. We have developed an ensemble of support vector machine classification models based on male F344 rat liver gene expression following 2, 14 or 90 days of exposure to a collection of hepatocarcinogens (aflatoxin B1, 1-amino-2,4-dibromoanthraquinone, N-nitrosodimethylamine, methyleugenol) and non-hepatocarcinogens (acetaminophen, ascorbic acid, tryptophan). Seven models were generated based on individual exposure durations (2, 14 or 90 days) or a combination of exposures (2+14, 2+90, 14+90 and 2+14+90 days). All sets of data, with the exception of one, yielded models with 0% cross-validation error. Independent validation of the models was performed using expression data from the liver of rats exposed at 2 dose levels to a collection of alkenylbenzene flavoring agents. Depending on the model used and the exposure duration of the test data, independent validation error rates ranged from 47% to 10%. The variable with the most notable effect on independent validation accuracy was exposure duration of the alkenylbenzene test data. All models generally exhibited improved performance as the exposure duration of the alkenylbenzene data increased. The models differentiated between hepatocarcinogenic (estragole and safrole) and non-hepatocarcinogenic (anethole, eugenol and isoeugenol) alkenylbenzenes previously studied in a carcinogenicity bioassay. In the case of safrole the models correctly differentiated between carcinogenic and non-carcinogenic dose levels. The models predict that two alkenylbenzenes not previously assessed in a carcinogenicity bioassay, myristicin and isosafrole, would be weakly hepatocarcinogenic if studied at a dose level of 2 mmol/kg bw/day for 2 years in male F344 rats, suggesting that these chemicals should be a higher priority relative to other untested alkenylbenzenes for evaluation in the carcinogenicity bioassay. The results of the study indicate that gene expression-based predictive models are an effective tool for identifying hepatocarcinogens. Furthermore, we find that exposure duration is a critical variable in the success or failure of such an approach, particularly when evaluating chemicals with unknown carcinogenic potency.

Published by Elsevier Inc.

Introduction

The purpose of toxicity/carcinogenicity testing in rodents is to identify agents that may pose a carcinogenic risk to humans (Bucher and Portier, 2004). The current protocol used by the National Toxicology Program involves exposing a total of 800 rodents to 4 different levels of a test article for a duration of 24 months. Following the 2-year exposure period, a comprehensive histopathological assessment is performed on over 40 organs/tissues in all animals used in the study. All lesions identified by the primary pathologist are extensively reviewed by a panel of veterinary pathologists. Any calls related to carcinogenic activity are then evaluated by an independent panel of both private and public sector scientists with expertise in the area of toxicology (Chhabra et al., 1990). The high sensitivity of these studies makes them the current standard for identifying chemicals that pose a carcinogenic risk for humans (Huff, 1998).

The high sensitivity of the NTP carcinogenicity bioassay comes with significant cost in terms of money, time, animals and chemical. A 2-year bioassay can cost several millions of dollars and take up to 5 years to complete. In approximately 30 years since the inception of the bioassay only 1485 chemicals have been assessed (Gold et al., 2005). Currently there are over 75,000 chemicals on the US EPA's Toxic Substances Control Act Inventory (USEPA, 2004), an estimated 30,000 chemicals in widespread commercial use in the United States and Canada (Muir and Howard, 2006) and over 140,000 substances registered under REACH (REACH, 2008). Only a small fraction of these agents have undergone carcinogenicity testing (Judson et al., 2009).


Characterizing the carcinogenic activity of each of the untested chemicals using the 2-year bioassay is not a viable approach, especially considering that some of the chemicals will need to be assessed individually and as mixtures. In light of these issues, more efficient methods clearly need to be developed to identify chemicals that pose a carcinogenic risk.

Due to the combination of a broad landscape of untested chemicals in commerce and the current limitations of the bioassay there has been no shortage of attempts to identify methods that will allow for more rapid identification of potential human carcinogens. Efforts have ranged from purely computational SAR analysis (Benigni et al., 2007) to a range of biological approaches including various bacterial (Tennant et al., 1987) and mammalian cell-based in vitro genotoxicity assays (Isfort et al., 1996; Kerckaert et al., 1996), in vivo genotoxicity (Sasaki et al., 2000; Parry et al., 2002), mechanistic assessments based on receptor activation (Van den Berg et al., 1998), assessment of preneoplastic lesions (Elcombe et al., 2002; Ito et al., 2003; Allen et al., 2004), use of genetically modified animals (Eastin et al., 2001; Storer et al., 2001; Usui et al., 2001; van Kreijl et al., 2001; Lambert et al., 2005), and approaches that combine a number of these technologies (Benigni and Zito, 2004; Cohen, 2004). Most have either fallen short in their predictive ability or have not undergone extensive validation due to the expense of such a project (Jacobs, 2005). Some of the reasons for failure of the predictive strategies range from inadequate training data in the case of in silico predictive models to the inadequacy of the test systems to address certain modes of action (e.g. transgenic mice) (Bucher and Portier, 2004).

Genomic technology allows a researcher to query, in exquisite detail, molecular level changes in biology. When used in combination with machine learning, genomics has excelled in determining cancer diagnosis, prognosis and chemotherapeutic response (Garman et al., 2007). Recently, a number of groups have taken advantage of this technology to build carcinogenicity prediction models from gene expression data (Kramer et al., 2004; Nie et al., 2006; Fielden et al., 2007; Thomas et al., 2007; Ellinger-Ziegelbauer et al., 2008; Uehara et al., 2008). Most of these studies have focused on liver because it is a common target for chemical-induced carcinogenic transformation and predictive models have the potential to accelerate carcinogenicity hazard characterization. For comparison purposes the details of these studies are reviewed in the Supplementary Introduction. Overall these studies demonstrate the success and utility of such an approach. In addition, the studies that have incorporated a consideration of hepatocarcinogenic mode of action indicate that it is possible to differentiate between genotoxic and non-genotoxic modes of action, which is useful for determining human cancer risk and is not currently possible with the traditional in vivo toxicity/carcinogenicity assessment methods.

The toxic effects of a chemical are a function of both dose and duration of exposure (Rozman, 2000). Most of the genomic studies described above focused on identifying liver carcinogenicity signatures from animals exposed to chemical for 28 days or less, a duration that is referred to in traditional toxicology as subacute. We hypothesized that exposures up to 90 days would accentuate the expression of genes related to carcinogenic activity and therefore allow the models to achieve a higher degree of certainty when making predictions. Furthermore, we reasoned that longer exposure durations would limit the influence of mode of action genes and allow for better identification of predictive genes with biology related to processes involved in the formation of neoplasms that are typically manifest secondary to the primary toxicity. We feel that the idea of a shared precancerous biology (that is independent of a specific chemical challenge) is not unreasonable since the process of cancer manifestation is a continuum and most cancers share a degree of universal biology that is manifest in their gene expression (Whitfield et al., 2006). To test this hypothesis we exposed male F344 rats to a collection of structurally diverse hepatocarcinogens and non-hepatocarcinogens for 2, 14 or 90 days (carcinogen vs. non-carcinogen study (CVNC)), performed genome-wide hepatic gene expression analysis using Agilent 4X44K microarrays (41,000+ rat genes and transcripts) and created models that identified chemicals with hepatocarcinogenic activity. We then independently validated these models using hepatic expression data derived from rats exposed to a collection of alkenylbenzenes (flavoring agent study (FA)). Alkenylbenzenes are food additives with a range of hepatocarcinogenic properties (Miller et al., 1983). In the course of these studies we specifically address the effect of dose level and exposure duration on classification accuracy, in addition to evaluating the molecular biology that is associated with carcinogen exposure.

Materials and methods

Chemicals used for dosing. All chemicals administered to rats in this study are listed in Table 1. Pre-start chemistry assessments indicated that all chemicals were at least 98% pure. All feed formulations underwent homogeneity assessments. All dose formulations were within 10% of the target concentration throughout the study.

Chemical diversity analysis. Leadscope Enterprise 2.4.15-6 (Leadscope Inc., Columbus, OH) was used to evaluate the structural similarity between the 13 chemicals used in the study. The Tanimoto distance was calculated using the chemical fingerprint derived from the 2-dimensional molfile of each chemical. Leadscope uses a 27,000+ feature set to derive a chemical fingerprint. This feature set is much larger than is used by most applications and causes the Tanimoto distances to be smaller relative to those calculated by other structural analysis programs.
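As a rough illustration of the similarity measure named above, the following Python sketch computes a Tanimoto coefficient and the corresponding distance for two binary fingerprints stored as sets of feature indices. The feature indices and function names are hypothetical and chosen only for illustration; the Leadscope feature set and its implementation are not reproduced here.

```python
def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto coefficient: features shared by both / features present in either."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    return len(fp_a & fp_b) / union


def tanimoto_distance(fp_a, fp_b):
    """Distance form used for diversity analysis: 1 - similarity."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


# Hypothetical fingerprints: each integer is the index of a structural feature
chem_1 = {1, 4, 7, 9, 12}
chem_2 = {1, 4, 8, 12, 15, 20}

print(tanimoto_distance(chem_1, chem_2))
```

A distance of 0 means the two fingerprints are identical and a distance of 1 means they share no features.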

Animal treatments and tissue collection. All animal studies were performed at Battelle (Columbus, OH) under the direction of Milton Hejtmancik, Ph.D., D.A.B.T. and Laurene Fomby, D.V.M., Ph.D., D.A.B.T. Male F344/N rats approximately 8 to 10 weeks old were obtained from Taconic Farms (Germantown, NY). After a 10 to 14-day quarantine/acclimation period, the rats were randomly assigned to treatment and control groups. The light/dark cycle was 12 h on/12 h off with the lights coming on at 6 AM and going off at 6 PM. Rats were housed 3 per cage with ad libitum access to NTP 2000 feed and city water. In the CVNC study groups of 24 male rats were administered chemical either in feed, in drinking water, or by gavage (Table 1). On days 3 (2 nights of exposure), 15 (14 nights of exposure) and 91 (90 nights of exposure), 6 rats from each chemical group plus 6 rats from the appropriate control group were necropsied between 8 and 10 AM. Detailed information on the number of animals in each treatment group can be found in Table 1. In the alkenylbenzene flavoring agent (FA) study each chemical was given at 2 dose levels: 0.2 mmol/kg/day (low dose (L)) and 2.0 mmol/kg/day (high dose (H)) by corn oil gavage 5 days per week. Each dose group and the appropriate control consisted of six animals in the 2 and 14-day studies and 10 animals in the 90-day study. Necropsies were performed 24 h after the last administered dose. At necropsy rats were anesthetized with isoflurane, blood was drawn for clinical chemistry via cardiac puncture, the left and median lobes of the liver were removed, and animals were euthanized by exsanguination. A cross-section of each lobe was obtained for histopathology. The remainder of the left and median lobes of the liver was minced quickly into very small pieces, dropped in liquid nitrogen within 4 min of euthanasia and stored at −80 °C.
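The equimolar FA doses (0.2 and 2.0 mmol/kg/day) translate into different mg/kg/day values for each chemical. The short Python sketch below shows the arithmetic; the molecular weights are standard reference values assumed here, not figures given in the article, and the function name is hypothetical. The results should agree with the mg/kg/day doses in Table 1 to within rounding.

```python
# Molecular weights in g/mol: standard reference values, assumed (not from the article)
MOLECULAR_WEIGHT = {
    "methyleugenol": 178.23,
    "estragole": 148.20,
    "safrole": 162.19,
    "eugenol": 164.20,
    "myristicin": 192.21,
}


def mmol_to_mg_per_kg(dose_mmol_per_kg, mol_weight):
    """Convert a molar dose to a mass dose: mmol/kg x g/mol = mg/kg."""
    return dose_mmol_per_kg * mol_weight


for name, mw in MOLECULAR_WEIGHT.items():
    low = mmol_to_mg_per_kg(0.2, mw)   # low dose (L)
    high = mmol_to_mg_per_kg(2.0, mw)  # high dose (H)
    print(f"{name}: low {low:.1f} mg/kg/day, high {high:.1f} mg/kg/day")
```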

Hematology and clinical chemistry. Blood collected from all animals in both studies was analyzed for routine hematology and clinical chemistry markers including erythrocyte count, mean corpuscular volume, hemoglobin, packed cell volume, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, erythrocyte morphologic assessment, leukocyte count, leukocyte differential,


Table 1. Chemicals.

| Chemical | Class (a) | Abbreviation (b) | Formulation | Dose (mg/kg/day) (c) | TD50 liver (mg/kg bw/day) (d) | Ames assay (e) | GRAS status (f) | Animals per dose group: 2-day exposure (g) | 14-day exposure | 90-day exposure |
|---|---|---|---|---|---|---|---|---|---|---|
| CVNC study (training data) (h) | | | | | | | | | | |
| 5000 ppm 1-amino-2,4-dibromoanthraquinone | Hepatocarcinogen | DBAQ | Dose feed | 409 | 40.1 mg/kg/day | Positive (single strain) | No | 6 | 6 | 6 |
| 1 ppm aflatoxin B1 | Hepatocarcinogen | AFB1 | Dose feed | 0.1 | 932 ng/kg/day | Positive | No | 6 | 6 | 6 |
| 150 mg/kg methyleugenol | Hepatocarcinogen | MEG | MC (i) gavage | 150 | 21.1 mg/kg/day | Negative | No | 6 | 6 | 6 |
| 5 ppm N-nitrosodimethylamine | Hepatocarcinogen | NDMA | Drinking water | 0.4 | 42.4 μg/kg/day | Positive | No | 6 | 6 | 6 |
| 3000 ppm acetaminophen | Non-hepatocarcinogen | APAP | Dose feed | 242 | Non-hepatocarcinogenic | Negative | Yes | 6 | 6 | 6 |
| 25,000 ppm tryptophan | Non-hepatocarcinogen | TYP | Dose feed | 1994 | Non-hepatocarcinogenic | Negative | Yes | 6 | 6 | 6 |
| 25,000 ppm vitamin C | Non-hepatocarcinogen | VTC | Dose feed | 409.4 | Non-hepatocarcinogenic | Negative | Yes | 6 | 6 | 6 |
| Dosed feed control | Non-hepatocarcinogen | DFDC | X | X | X | NA | | 12 | 12 | 12 |
| Gavage control | Non-hepatocarcinogen | GAVC | X | X | X | NA | | 6 | 6 | 6 |
| Drinking water control | Non-hepatocarcinogen | DWC | X | X | X | NA | | 6 | 6 | 6 |
| FA study (test data) (j) | | | | | | | | | | |
| Methyleugenol | Hepatocarcinogen | MEG-L | Corn oil gavage | 35.6 | 21.1 mg/kg/day | Negative | No | 6 | 6 | 10 |
| Methyleugenol | Hepatocarcinogen | MEG-H | Corn oil gavage | 356 | 21.1 mg/kg/day | Negative | No | 6 | 6 | 10 |
| Estragole | Hepatocarcinogen | ESG-L | Corn oil gavage | 29.6 | NA (k) | Negative | Yes | 6 | 6 | 10 |
| Estragole | Hepatocarcinogen | ESG-H | Corn oil gavage | 296 | NA (k) | Negative | Yes | 6 | 6 | 10 |
| Safrole | Hepatocarcinogen (l) | SAF-L | Corn oil gavage | 32.4 | 539 mg/kg/day | Negative | No | 6 | 6 | 10 |
| Safrole | Hepatocarcinogen | SAF-H | Corn oil gavage | 324 | 539 mg/kg/day | Negative | No | 6 | 6 | 10 |
| Eugenol | Non-hepatocarcinogen | EGN-L | Corn oil gavage | 32.8 | NA | Negative | Yes | 6 | 6 | 10 |
| Eugenol | Non-hepatocarcinogen | EGN-H | Corn oil gavage | 328 | NA | Negative | Yes | 6 | 6 | 10 |
| Isoeugenol | Non-hepatocarcinogen | IGN-L | Corn oil gavage | 32.8 | NA | Negative | Yes | 6 | 6 | 10 |
| Isoeugenol | Non-hepatocarcinogen | IGN-H | Corn oil gavage | 328 | NA | Negative | Yes | 6 | 6 | 10 |
| Anethole | Non-hepatocarcinogen (m) | ANT-L | Corn oil gavage | 29.6 | NA | Negative | Yes | 6 | 6 | 10 |
| Anethole | Non-hepatocarcinogen (m) | ANT-H | Corn oil gavage | 296 | NA | Negative | Yes | 6 | 6 | 10 |
| Vehicle control | Non-hepatocarcinogen | TC | X | X | X | NA | NA | 12 | 12 | 18 |
| Untreated control | Non-hepatocarcinogen | UTC | X | X | X | NA | NA | 6 | 6 | 10 |
| Myristicin | Untested | MYR-L | Corn oil gavage | 38.4 | NA | Negative | No | 6 | 6 | 9 |
| Myristicin | Untested | MYR-H | Corn oil gavage | 384 | NA | Negative | No | 6 | 6 | 10 |
| Isosafrole | Untested | ISF-L | Corn oil gavage | 32.4 | NA | Negative | No | 6 | 6 | 10 |
| Isosafrole | Untested | ISF-H | Corn oil gavage | 324 | NA | Negative | No | 6 | 6 | 10 |

(a) Label applied to the samples for the purposes of training and validating models. For purposes of evaluating the training and test data, control treatments were labeled as non-hepatocarcinogen and treated as separate treatments. All treatments labeled.
(b) L and H indicate low and high dose, respectively.
(c) Estimated dose based on food or water consumption for chemicals administered in food or water.
(d) TD50 is an estimated dose (NA=not available) of chemical that will produce hepatic tumors in 50% of animals by 2 years. Only data derived from rat studies are presented in the table. Data are taken from CPDB=Carcinogen Potency Database (http://potency.berkeley.edu/).
(e) Ames studies performed by the National Toxicology Program.
(f) Generally recognized as safe (http://www.accessdata.fda.gov/scripts/fcn/fcnNavigation.cfm?rpt=eafusListing).
(g) CVNC study: 2-day exposure=2 days of treatment and terminate on the 3rd day of the study; FA study: 2-day exposure=1 gavage dose and terminate 24 h later.
(h) Treatments from the carcinogen vs. non-carcinogen study that were used to train the models described in the article.
(i) Methylcellulose.
(j) Treatments from the flavoring agent study that were used to independently validate the models described in the article.
(k) Mouse liver TD50 is 51.8.
(l) Not included in independent validation accuracy determination due to uncertainty about the carcinogenic effect at this dose level.
(m) Although not hepatocarcinogenic in male rats at doses administered in this study, these samples were not used to determine independent validation accuracy (Table 4).


reticulocyte count, platelet count and morphologic assessment, sorbitol dehydrogenase (SDH), alkaline phosphatase (ALP), creatine kinase (CK), creatinine, total protein, albumin, urea nitrogen (BUN), total bile acids, alanine aminotransferase (ALT), cholesterol, triglyceride and glucose.

Histopathology. Histopathological examination was limited to the liver. When the liver tissues were removed they were placed into specimen containers filled with 10% neutral buffered formalin for fixation. Samples were trimmed, dehydrated, cleared, and infiltrated with paraffin in a tissue processor. Tissues were embedded using the Tissue-Tek system, sectioned to approximately 5 μm on a microtome, and mounted on glass slides. After staining the slides with hematoxylin and eosin (H&E) a section of the median and left hepatic lobe from all study animals was evaluated microscopically.

Statistical analysis of histopathology and clinical chemistry. Histopathological changes were evaluated using the Fisher exact test. Hematology and clinical chemistry data typically have skewed distributions; therefore, significant changes (p < 0.05) in these parameters were determined using the nonparametric multiple comparison method developed by Shirley and Dunn (Shirley, 1977).
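To make the lesion-incidence comparison concrete, here is a minimal pure-Python sketch of a two-sided Fisher exact test on a 2x2 table. The lesion counts are hypothetical and the function name is an assumption; the article does not say which software was used or whether the test was one- or two-sided.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))


# Hypothetical counts: lesion present / absent in 6 treated rats vs. 6 control rats
p_value = fisher_exact_two_sided(5, 1, 0, 6)
print(p_value)
```

With group sizes of 6, only large differences in incidence can reach p < 0.05, which is worth keeping in mind when reading the histopathology results.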

RNA extraction, microarray hybridization, feature extraction and data normalization. Procedures described in this section were carried out at Icoria, Inc. (Research Triangle Park, NC) under the direction of Edward Lobenhofer, Ph.D. RNA was extracted using Qiagen RNeasy Midi Kits (Valencia, CA) in accordance with the manufacturer's protocol. Approximately 130 to 150 mg of liver tissue was used for each RNA extraction. After extraction the RNAs were resuspended in RNase-free water and concentrations were measured using a spectrophotometer. Integrity of RNA samples from both studies was determined by the size distribution of the 18S and 28S ribosomal RNA using an Agilent Bioanalyzer (Agilent Technologies, Palo Alto, CA). In both studies all samples passed quality control and showed no signs of degradation.

For microarray hybridization, 500 ng of total RNA was converted into labeled cRNA with nucleotides coupled to fluorescent dye Cy3 using the Low RNA Input Linear Amplification Kit (Agilent Technologies, Palo Alto, CA) following the manufacturer's protocol. The A260/280 ratio and yield of each of the cRNAs were determined and cRNA integrity was evaluated using an Agilent Bioanalyzer. Cy3-labeled cRNA (1.65 ng) from each sample was hybridized to Agilent Rat Whole Genome Oligonucleotide Microarrays in 4X44K format. The hybridized array was then washed and scanned and data were extracted from the scanned image using Feature Extraction version 9.5 (Agilent Technologies). All microarrays passed QC standards set forth in the Agilent Feature Extraction Software (V9.5) reference guide.

The gProcessed signal from the Agilent 4X44K arrays used in both the CVNC and FA studies was normalized using quantile normalization followed by per chip median centering. The data from the CVNC and FA studies were normalized separately. All microarray data will be available upon publication of the article through the Chemical Effects in Biological Systems (CEBS) database.
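The two normalization steps named above can be sketched in a few lines of pure Python. This is an illustrative sketch only: it assumes log-scale signals (so centering is a subtraction), breaks ties by probe position, and uses made-up values; the function names are hypothetical and the authors' actual software settings are not described in the text.

```python
from statistics import mean, median


def quantile_normalize(chips):
    """chips: list of equal-length lists, one list of log-scale signals per array."""
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across all chips
    reference = [mean(s[k] for s in sorted_chips) for k in range(n_probes)]
    normalized = []
    for chip in chips:
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]  # ties broken by position (simplification)
        normalized.append(out)
    return normalized


def median_center(chips):
    """Subtract each chip's own median so every chip is centered on zero."""
    centered = []
    for chip in chips:
        chip_median = median(chip)
        centered.append([value - chip_median for value in chip])
    return centered


# Made-up signals for 3 arrays x 4 probes
raw = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalized = median_center(quantile_normalize(raw))
print(normalized)
```

After quantile normalization every array has the same distribution of values, so only the assignment of values to probes differs between arrays.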

Statistical analysis of microarray data. Significant changes in gene expression (vehicle vs. control; hepatocarcinogen vs. non-hepatocarcinogen) were determined using a t-test (Benjamini and Hochberg multiple testing correction, with a false discovery rate of 0.05).
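For readers unfamiliar with the correction, the Benjamini and Hochberg step-up procedure can be sketched as follows. The p-values are invented for illustration and the function name is hypothetical; this is not the authors' implementation.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * fdr
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            max_rank = rank
    # Reject every hypothesis ranked at or below that cutoff
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected


# Invented per-probe p-values from a t-test
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_values, fdr=0.05))
```

Note that several p-values below 0.05 are not called significant here, which is the intended effect of controlling the false discovery rate across many probes.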

Unsupervised clustering of samples. All clustering, unsupervised or supervised, was done using GeneSpring GX 10. A hierarchical method was used to cluster both entities and sample dose groups. A Euclidean distance metric and centroid linkage rule were used in all cases.

Gene set enrichment analysis (GSEA). GSEA was performed using GeneSpring GX 10. Details of the statistical procedures can be found elsewhere (Subramanian et al., 2005). For each GSEA the minimum number of matching genes was set to 10, the maximum number of permutations was 1000 and the multiple testing correction was Benjamini and Hochberg. Only gene sets in the MIT Broad List c5 GO terms were evaluated for enrichment by GSEA. The gene sets exhibiting a q-value < 0.25 were reported.

Model generation, cross-validation and independent validation. Data from 10 unique treatments (CVNC study; 4 carcinogens, 6 non-carcinogens; Table 1) over 3 exposure periods were used to train the models. The normalized CVNC (training) data were partitioned by a single study duration (2, 14, or 90 days) or combination of study durations (2+14, 2+90, 14+90, 2+14+90). After partitioning, samples were assigned to the hepatocarcinogen or non-hepatocarcinogen class based on results from 2-year bioassays. For training purposes, all CVNC vehicle control samples were assigned to be non-hepatocarcinogens. All normalized expression data from each sample subset were used to create models using a modified support vector machine in combination with recursive feature elimination (SVM-RFE) (Mav et al., manuscript in preparation). RFE is an embedded feature selection method that selects probes (features) based on their contribution to the performance of the classifier. SVMs are maximum margin classifiers that use kernel based transformation methods to map sample dot products into higher dimensional space where the sample classes (in our case hepatocarcinogens and non-hepatocarcinogens) are separated by a linear hyperplane (Noble, 2006). SVMs consistently outperform most other machine learning methods when classifying microarray data (Statnikov et al., 2007; Statnikov et al., 2008).

A work flow for model generation is shown in Fig. 1. Each iteration involves training a classification model based on the training data (CVNC), evaluating the performance of the model via leave-one-out cross-validation (CV), and then using the model to classify the test set (FA). During cross-validation a mean weight (W value) is generated for each probe on the array that indicates the importance of that probe (and by inference the importance of the gene/transcript monitored by that probe) to the classification model. After each iteration the probes are ranked by their respective mean W values (from most to least informative), the bottom 10% of the probes (least informative) are removed, and the next cycle is initiated using the reduced probe set. The initial run starts with all probes on the array (approximately 41,000 probes) and after 87 iterations ends with a two probe set. Thus one model per iteration, or a total of 87 models, was evaluated for each set (2 days, 14 days, 90 days, 2+14 days, 2+90 days, 14+90 days, 2+14+90 days) of CVNC training data. For each set of data an optimal model was identified. An optimal model was defined as a model that achieved 0% cross-validation error with the lowest number of probes. During training, cross-validation and independent validation, data from each animal were treated as an individual sample. Because methyleugenol-treated samples were present in both the training and test data, all methyleugenol samples were removed when cumulative prediction errors were calculated for the test data. Cumulative prediction error rates of the FA study samples (independent validation set) are based on the classification of samples from the following treatment groups from 3 distinct exposure durations (2 days, 14 days, and 90 days): SAF-H, ESG-H, ESG-L, EGN-L, EGN-H, IEG-H, IEG-L, TC, UTC (9 unique treatments). Other treatments in the FA study (ANT-L, ANT-H, MYR-L, MYR-H, ISF-L, ISF-H) are reported in the article; however, they are not considered when determining cumulative prediction error of the FA study samples.
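The elimination schedule and the rule for choosing an optimal model can be sketched in Python as below. This is a sketch under stated assumptions, not the authors' modified SVM-RFE: the rounding rule (drop the floor of 10%, but at least one probe), the exact starting probe count, and the cross-validation errors in the usage example are all hypothetical, so the number of iterations it produces need not equal the 87 reported in the text.

```python
import math


def rfe_probe_schedule(n_probes, drop_fraction=0.10, min_probes=2):
    """Probe-set sizes visited when the least informative fraction is dropped each round."""
    sizes = [n_probes]
    while sizes[-1] > min_probes:
        current = sizes[-1]
        # Assumed rounding rule: drop floor(10%) of the probes, but always at least one
        n_drop = max(1, math.floor(current * drop_fraction))
        sizes.append(max(min_probes, current - n_drop))
    return sizes


def pick_optimal_model(results):
    """results: (n_probes, cv_error) pairs, one per iteration.

    Returns the smallest probe count with 0% cross-validation error, or None."""
    perfect = [n for n, cv_error in results if cv_error == 0.0]
    return min(perfect) if perfect else None


schedule = rfe_probe_schedule(41000)
print(len(schedule), schedule[:3], schedule[-3:])

# Hypothetical cross-validation errors for a handful of iterations
example_results = [(41000, 0.0), (100, 0.0), (50, 0.0), (20, 0.03), (2, 0.2)]
print(pick_optimal_model(example_results))
```

In the real workflow each entry of the schedule would correspond to one trained model, with probes ranked by their mean W value before the next reduction.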

To create models using published hepatocarcinogenicity signatures we first identified probes present on the Agilent 4X44K arrays that correspond to genes present in the published signatures using Entrez gene IDs. We then carried out one cycle of model building with all samples from the 2-, 14- and 90-day CVNC study. The 2+14+90-day SVM models built with each of the published signatures were used to classify the 90-day FA samples.

Fig. 1. Work flow for the creation of the RFE-SVM models. Details are described in the Materials and methods.

Results

Chemical and dose selection

Test article chemicals used in both studies (CVNC and FA) are listed in Table 1. The rat hepatocarcinogens (NDMA, AFB1, DBAQ, MEG) that were in the training data (CVNC study) were administered at doses that produced a tumor incidence of at least 40% in male F344 rats in completed 2-year bioassays. All hepatocarcinogens in the CVNC study were administered at doses well above their rat liver TD50 (the daily dose-rate in mg/kg/day for life to induce tumors in half of test animals that would have remained tumor-free at zero dose; Table 1), as found in the Carcinogen Potency Database. The doses of non-hepatocarcinogens (VTC, TYP and APAP) in the training data set were selected from completed 2-year bioassays, but were not associated with significant organ-specific toxicity. For training purposes the vehicle control groups (DFDC=dose feed control; DWC=dose water control; GAVC=gavage control) were treated as non-hepatocarcinogens. NDMA (CAS No. 62-75-9) was included as a classic hepatocarcinogen (IARC, 1976; IARC, 1978). NDMA is positive in the Ames assay and carcinogenic in all species tested (rats, mice, guinea pigs, hamsters, rabbits, ducks, fish). AFB1 (CAS No. 1162-65-8) is a widely studied, Ames positive, classic hepatocarcinogen that is positive for liver tumors in most species tested (Wogan and Newberne, 1967; IARC, 1976). DBAQ (CAS No. 81-49-2) is perhaps the most positive of all NTP chemicals for rat liver cancer (NTP, 1996). More than 90% of the male rats in the top two doses had solitary or multiple hepatocellular carcinomas, with nearly as high an incidence in female rats. DBAQ elicits a weak response in a single strain of Salmonella and is generally regarded as non-genotoxic. MEG (CAS No. 93-15-2) at doses of 150 mg/kg/day was a potent hepatocellular carcinogen that resulted in 50% of male rats developing hepatocellular carcinomas over a 2-year study (NTP, 2000). MEG is DNA reactive, but was negative in NTP Ames studies (Randerath et al., 1984). Administration of VTC (CAS No. 50-81-7) or TYP (CAS No. 73-22-3) up to 50,000 ppm in the feed to rats and mice in an NTP carcinogenicity study failed to cause increased tumors in rats or mice (NTP, 1983b). APAP (CAS No. 103-90-2) administered at up to 6000 ppm in the feed to rats and mice in an NTP carcinogenicity study failed to cause increased tumors in rats or mice (NTP, 1993). APAP is known to be hepatotoxic and the mechanisms of hepatotoxicity are well understood. VTC, TYP and APAP were all negative in the Ames assay. The average structural similarity (Tanimoto similarity coefficient) of the chemicals in the CVNC training data was 0.058. A maximum similarity of 0.14 was observed between AFB1 and DBAQ. Tanimoto similarity coefficients can be found in Supplementary Table 1b.
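For readers unfamiliar with the structural similarity metric, the following is a minimal sketch of a Tanimoto coefficient computed on sets of fingerprint bits, together with the mean over all chemical pairs. The fingerprint type used for the reported values is not stated in this section, so the bit sets and chemical names below are purely hypothetical.

```python
def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(union)


def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto coefficient over all unordered pairs of chemicals."""
    names = sorted(fingerprints)
    pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    total = sum(tanimoto_similarity(fingerprints[x], fingerprints[y])
                for x, y in pairs)
    return total / len(pairs)


# Hypothetical fingerprints (sets of 'on' bit positions), for illustration only.
toy_fingerprints = {
    "chem_A": {1, 2, 3, 4},
    "chem_B": {3, 4, 5, 6},
    "chem_C": {7, 8},
}
print(round(mean_pairwise_similarity(toy_fingerprints), 3))
```

A low average, as reported for the training chemicals, indicates that few structural features are shared across the set.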

For purposes of independent validation we evaluated gene expression elicited by exposure to 7 alkenylbenzenes. The selection of these chemicals was based on differential hepatocarcinogenic hazard and potency, in addition to their ongoing evaluation by NTP.

The doses of alkenylbenzenes used for the test set (FA study) were selected based on their molar equivalent doses to the potent rat hepatocarcinogen, methyleugenol (also included in the CVNC study at an intermediate dose level in relation to the FA study). All alkenylbenzenes in this set are classified as non-genotoxic as measured by the Ames assay; however, safrole and estragole both form DNA adducts in vivo (Randerath et al., 1984). SAF (CAS No. 94-59-7) was the first of this class to be discovered to have hepatocarcinogenic properties (Long et al., 1963). ESG has not undergone a 2-year carcinogenicity bioassay; however, in an NTP 90-day study, doses of 600 mg/kg/day produced liver adenomas in F344 rats (Bristol, 2009). SAF and ESG were positive for carcinogenicity in the newborn mouse assay and produce hepatic DNA adducts in mice (Miller et al., 1983; Randerath et al., 1984). ANT, IEG and EGN were tested for carcinogenicity in a 2-year bioassay and found to be negative in male rats (NTP, 1983a; Truhaut et al., 1989; NTP, 2008). The 2-year bioassay data for ISF are poorly documented, but appear to indicate hepatocarcinogenic activity in rats (Hagan et al., 1965). EGN, MYR, ANT, and ISF were all negative in the newborn mouse assays and produced little to no increase in DNA adducts (Miller et al., 1983; Randerath et al., 1984; Wiseman et al., 1987). The average structural similarity of the FA data set (if MEG is included) is 0.40. A maximum similarity of 0.69 was observed between MEG and ESG. Tanimoto similarity coefficients can be found in Supplementary Table 1b. Further details related to these chemicals can be found in Supplementary Table 1.
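As an illustration of what "molar equivalent dose" means, the sketch below converts a reference dose in mg/kg/day to mmol/kg/day and back to mg/kg/day for a second chemical. The molecular weights are standard values calculated from the molecular formulas and are not taken from the article; the 150 mg/kg/day reference dose is the methyleugenol bioassay dose quoted earlier and is used here only as an example, not as the dose actually given in the FA study (see Table 1 for those).

```python
# Molecular weights in g/mol, calculated from molecular formulas
# (assumed reference values, not reported in this article).
MOL_WEIGHT = {
    "methyleugenol": 178.23,  # C11H14O2
    "estragole": 148.20,      # C10H12O
    "safrole": 162.19,        # C10H10O2
}


def mg_to_mmol_per_kg(dose_mg_per_kg, mol_weight):
    """Convert mg/kg/day to mmol/kg/day (mg divided by g/mol gives mmol)."""
    return dose_mg_per_kg / mol_weight


def molar_equivalent_dose(reference_dose_mg_per_kg, reference_mw, test_mw):
    """Dose of the test chemical (mg/kg/day) with the same molar amount."""
    return mg_to_mmol_per_kg(reference_dose_mg_per_kg, reference_mw) * test_mw


reference_mmol = mg_to_mmol_per_kg(150.0, MOL_WEIGHT["methyleugenol"])
estragole_mg = molar_equivalent_dose(
    150.0, MOL_WEIGHT["methyleugenol"], MOL_WEIGHT["estragole"])
print(f"reference dose: {reference_mmol:.3f} mmol/kg/day")
print(f"estragole equivalent: {estragole_mg:.1f} mg/kg/day")
```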


Liver pathology, hematology and clinical chemistry

Few treatment related microscopic lesions occurred in the livers of rats exposed to hepatocarcinogens (DBAQ, MEG, and NDMA) or non-hepatocarcinogens (VTC, APAP, TYP) (Supplementary Table 2). Mitotic alteration consisting of increased mitoses (severity grade 2 on a scale of 1–4) occurred in all animals exposed to hepatocarcinogens and non-hepatocarcinogens after 2 days of exposure and in a few animals after 14 days of exposure. The exceptions were AFB1 exposed animals whose livers exhibited oval cell proliferation after 14 days of chemical exposure, and after 90 days one rat had developed a hepatocellular adenoma (Supplementary Table 2).

No microscopic lesions were observed in the livers of rats exposed to any alkenylbenzene after 2 or 14 days of chemical exposure. However, after 90 days of exposure, rats exposed to a high dose of estragole exhibited oval cell hyperplasia, cholangiofibrosis, hepatocyte hypertrophy and regeneration, and Kupffer cell pigmentation, and rats exposed to a high dose of safrole exhibited hepatocyte hypertrophy. Hematology and clinical chemistry assessments were largely unremarkable, with no treatment related elevation of ALT for any of the treatment groups (Supplementary Table 3). Thus none of the chemicals were administered at doses that were acutely cytotoxic or necrogenic.

General characterization of gene expression changes in the training and test data set

An unsupervised clustering analysis of the two data sets based on all probes on the array demonstrated that the primary contributor to gene expression variance is age (Supplementary Fig. 1a (CVNC) and b (FA)). The effect of age was particularly noteworthy in the FA study, where the 2- and 14-day dose groups clustered in a distinct branch of the dendrogram from the 90-day dose groups. Within individual exposure duration groups from the CVNC study the hepatocarcinogens and non-hepatocarcinogens intermingled in the dendrogram. There was slightly greater distinction between hepatocarcinogen and non-hepatocarcinogen treatments in the FA study dose groups, where the high doses of hepatocarcinogens tended to cluster together.

To better understand the age related processes driving the unsupervised clustering, GSEA was performed that compared the combined 2- and 14-day DFDC samples with the 90-day DFDC samples from the CVNC study (Supplementary Table 4). The 2- and 14-day samples exhibited a relative enrichment of genes associated with extracellular matrix (EXTRACELLULAR_MATRIX_PART, EXTRACELLULAR_MATRIX, and COLLAGEN), cell proliferation (MITOSIS, M_PHASE_OF_MITOTIC_CELL_CYCLE, CELL_CYCLE_PHASE among many others) and DNA damage (RESPONSE_TO_DNA_DAMAGE_STIMULUS and DNA_REPAIR), whereas the 90-day samples exhibited a relatively enriched expression of genes associated with angiogenesis (REGULATION_OF_ANGIOGENESIS and WOUND_HEALING), microsomes (MICROSOME) and steroid hormone signaling (STEROID_HORMONE_RECEPTOR_ACTIVITY). Such an observation is not surprising considering the rates of growth that take place over a typical pre-chronic study. The growth rate of male F344 rats is rapid at the start of the studies (7–8 weeks of age) and progressively decreases, eventually reaching a plateau at 30 weeks of life.

In order to create classification models that exhibit robust sensitivity and specificity with respect to identifying the hepatocarcinogenic potential of test chemicals it is essential that significant changes in gene expression are manifest in the training data at both the class level and at the level of the individual chemical treatments that are used in the model training data. A t-test comparing gene expression between combined hepatocarcinogens and combined non-hepatocarcinogens from the CVNC study was performed using individual exposure durations (Supplementary Table 5). Overall, the 14-day data demonstrated the greatest number of significant changes in gene expression (4778 probes). Ninety days of exposure also produced notable changes in expression related to hepatocarcinogenic activity (665 probes), whereas the expression changes by 2 days were less remarkable (109 probes). All individual chemical treatment/exposure duration combinations were subject to a comparison with their corresponding control group using a t-test. In all cases significant (p < 0.05) changes in gene expression were noted in the chemically treated dose groups from the CVNC study (Supplementary Table 6).

To identify the pathways differentially regulated between the hepatocarcinogens and non-hepatocarcinogens we performed GSEA that evaluated the 2-, 14- and 90-day CVNC data separately. The analysis of the 2-, 14- and 90-day data revealed 0, 390, and 35 GO gene sets enriched, respectively. After 14 days of exposure the carcinogen treated samples exhibited a relative increase in expression of gene sets related to cell proliferation (M_PHASE_OF_MITOTIC_CELL_CYCLE, DNA_REPLICATION, CELL_CYCLE_PROCESS among others), DNA repair (RESPONSE_TO_DNA_DAMAGE_STIMULUS and DNA_REPAIR) and apoptosis (PROGRAMMED_CELL_DEATH, NEGATIVE_REGULATION_OF_PROGRAMMED_CELL_DEATH and APOPTOSIS_GO). At 14 days there were no gene sets exhibiting relatively enriched expression in the non-carcinogen treated animals. After 90 days of exposure the carcinogen treated samples exhibited enriched expression of gene sets related to xenobiotic metabolism (OXIDOREDUCTASE_ACTIVITY__ACTING_ON_THE_ALDEHYDE_OR_OXO_GROUP_OF_DONORS), cell proliferation (M_PHASE, CELL_CYCLE_PROCESS and MITOSIS among others) and apoptosis (CASPASE_ACTIVATION and POSITIVE_REGULATION_OF_CASPASE_ACTIVITY). There was a notable absence of enriched gene sets related to DNA repair in the 90-day hepatocarcinogen treated samples. A full list of enriched gene sets from the GSEA can be found in Supplementary Table 7.

Creation of optimal models and their characteristics

After comparing the performance of different combinations of feature selection and supervised machine learning methods we selected recursive feature elimination in combination with a support vector machine to generate our hepatocarcinogenicity prediction models. Using this approach, 87 models were generated and cross-validated from sets of data representing a single exposure duration (2, 14 or 90 days) or a combination of exposure durations (2+14 days, 2+90 days, 14+90 days and 2+14+90 days). The 87 models represent the number of iterations (Fig. 1) to reduce 41,000 starting probes to 2, the final number of probes selected. Plots showing the cross-validation error rates of each of the 87 models from each data set are shown in Supplementary Fig. 2a–g.

Table 2
Optimal model characteristics.

Optimal model    No. of probes    Cross-validation error    Number of models achieving 0% CV error    Number of support vectors
2 days           3                0%                        18                                        6
14 days          6                0%                        1                                         5
90 days          15               0%                        56                                        10
2+14 days        28               1%                        0                                         17
2+90 days        59               0%                        32                                        34
14+90 days       4                0%                        1                                         5
2+14+90 days     13               0%                        1                                         13

Table 3
Adjusted W values^a of probes informative to more than one model.

In the source table the W values occupy seven columns (2-day, 14-day, 90-day, 2+14-day, 2+90-day, 14+90-day and 2+14+90-day optimal model). Blank cells are omitted below, so each probe's values are listed in left-to-right order for the models it informs.

Probe name       Common name           Gene symbol               Adjusted W values                     Occurrences^b
A_44_P869415     XM_001076479          Wwox                      0.31 0.20 0.13 0.11 0.07 0.26 0.07    7
A_44_P331276     NM_021774             Fhit                      0.44 0.08 0.10 0.07 0.20 0.10         6
A_44_P866659     XM_574584             Adam8                     0.30 0.09 0.11 0.07 0.21              5
A_44_P321009     NM_012540             Cyp1a1                    0.07 0.06 0.01 0.02 0.03              5
A_44_P884766     TC583017              RGD1561899                0.14 0.04 0.05 0.34                   4
A_44_P653701     TC606396              Mybl2                     0.11 0.02 0.05                        3
A_44_P141897     NM_012623             Abcb1b                    0.06 0.01 0.02                        3
A_42_P711139     NM_012541             Cyp1a2                    0.02 0.20 0.06                        3
A_44_P841532     TC584998              TC584998 (Gbf1(intron))   0.05 0.01                             2
A_44_P654434     DV726161              Unc5d                     0.03 0.01                             2
A_43_P12705      NM_031972             Aldh3a1                   0.26 0.01                             2
A_43_P18376      ENSRNOT00000015394    RGD1305928_predicted      0.02 0.03                             2
A_44_P1071620    XM_235795             Mmp27_predicted           0.05 0.01                             2
A_44_P145133     ENSRNOT00000016459    RGD1560681_predicted      0.00 0.01                             2
A_44_P169972     BI294262              Bag5                      0.01 0.04                             2
A_44_P414963     ENSRNOT00000030203    Slc2a10_predicted         0.02 0.03                             2
A_44_P478144     AF152002              LOC290595                 0.01 0.07                             2
A_44_P541213     XM_001059639          LOC681383(C10orf11)       0.04 0.02                             2
A_44_P550145     NM_139192             Scd1                      0.03 0.02                             2
A_44_P809199     TC614636              Dab1                      0.24 0.01                             2

a W values shown in table are adjusted so that the sum of all weights for an optimal model equals 1.
b The number of optimal models informed by the probe.
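The adjustment described in footnote a of Table 3 (weights rescaled so that they sum to 1 within an optimal model) can be sketched as follows. The use of absolute values, the function name, and the raw weights are assumptions made for illustration; the text states only that the adjusted weights of a model sum to 1.

```python
def adjust_weights(raw_weights):
    """Rescale per-probe weights so that they sum to 1 within one model.

    Assumption: importance is taken as the absolute value of the raw weight.
    """
    total = sum(abs(w) for w in raw_weights.values())
    return {probe: abs(w) / total for probe, w in raw_weights.items()}


# Hypothetical raw mean W values for a three-probe model.
raw = {"probe_1": 1.5, "probe_2": -0.9, "probe_3": 0.6}
adjusted = adjust_weights(raw)
for probe, w in adjusted.items():
    print(probe, round(w, 2))
```

Rescaling in this way makes the relative contribution of a probe comparable across models that contain different numbers of probes.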

With the exception of one combination of CVNC data (2+14 days) that yielded a minimum cross-validation error of 1%, all individual study durations and combinations thereof were capable of yielding 1 or more SVM models that achieved 0% cross-validation error. The number of models that are capable of achieving 0% cross-validation error reflects the number of probes that are being consistently influenced by the phenotype under consideration (in our case liver carcinogenesis). The subsets of CVNC data yielded a range (56 to 0) of SVM models that achieved 0% error by CV (Table 2). Of the 87 models built using the 90-day data, 56 achieved 0% CV error. In contrast, the 2+14-day exposure data did not yield a single model that achieved 0% CV error.

The goal of iterative CV was to identify an optimal model from each set of data, which we define as the model that contains the minimum number of probes that is capable of achieving the lowest possible cross-validation error. This criterion for optimal model selection favors models that should exhibit relatively broad generalizability. Generalizability is loosely defined as how well a model performs when classifying gene expression changes induced by a wide variety of chemicals. Models that are informed by relatively greater numbers of probes are typically more complex and therefore less generalizable. The number of probes informing the seven optimal models ranged from 3 (2-day model) to 59 (2+90-day model) (Table 2). The optimal models derived from combinations of study durations tended to possess greater complexity (meaning they contain more probes) compared to the optimal models derived from single exposure durations. Another metric commonly used to determine the generalizability of an SVM model is the number of support vectors it contains. The optimal models based on data from 14 days of exposure or 14+90 days of exposure contained only 5 support vectors (Table 2). The most complicated optimal model was the model based on data from a combination of 2 and 90 days of exposure (34 support vectors).
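The selection rule for the optimal model (lowest cross-validation error first, fewest probes as the tie-breaker) can be written as a one-line minimization. The function name and the candidate lists below are hypothetical and are not taken from the study data.

```python
def select_optimal_model(models):
    """Pick the model with the lowest CV error, breaking ties by fewest probes.

    `models` is an iterable of (n_probes, cv_error) tuples, one per
    elimination cycle.
    """
    return min(models, key=lambda m: (m[1], m[0]))


# Hypothetical results: several cycles reach 0% error.
cycles_a = [(100, 0.0), (10, 0.0), (5, 0.02), (3, 0.05)]
# Hypothetical results: no cycle reaches 0% error.
cycles_b = [(50, 0.01), (28, 0.01), (10, 0.03)]

print(select_optimal_model(cycles_a))
print(select_optimal_model(cycles_b))
```

The second case mirrors the situation in which no model achieves 0% error, where the rule falls back to the smallest model at the minimum error observed.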

Genes (and their corresponding probes) informing more than one optimal model are found in Table 3. Two of the most informative genes, Wwox (all 7 optimal models) and Fhit (6 optimal models), are tumor suppressors that are commonly deleted in multiple human cancers (Iliopoulos et al., 2006). Adam8 (5 optimal models) is a member of the disintegrin and metallopeptidase family and its expression is correlated with cancer progression (Mochizuki and Okada, 2007). A number of genes that lack functional characterization also informed a number of optimal models, including RGD1561899 (4 optimal models). All 89 probes that were informative to at least one optimal model, along with corresponding dose group mean, normalized expression values, are listed in Supplementary Table 8.

Independent validation of optimal models

We performed independent validation using data from the FA study. Only the hepatocarcinogens (ESG (L,H), SAF (H)) and the non-hepatocarcinogens (EGN (L,H), IEG (H,L), UTC, TC) were considered when calculating overall classification error rates that are plotted in Supplementary Fig. 2a–g and listed in Table 4. The integration of independent validation into each cycle of model building revealed, not surprisingly, that the independent validation accuracy of the models increased as the less informative probes were removed (Supplementary Fig. 2a–g). When all FA data were considered the error rates of optimal models ranged from 0.259 (14-day optimal model) to 0.162 (2+14+90-day optimal model). When the FA test data were separated by exposure duration and all optimal models were used to predict the data, error rates went down as test data exposure duration increased. For example, the 14+90-day optimal model error rate was 0.21, 0.21 and 0.1 when classifying the 2-, 14- and 90-day FA data, respectively. The set of FA test data that was consistently classified by all the optimal models with the lowest error was the 90-day data (all model mean error: 0.130). The range of 90-day FA data classification error was marginal for the 7 different optimal models, ranging from 0.1 to 0.155, with the 14+90-day optimal model producing the lowest classification error. These findings suggest that the highest classification accuracy of the chemicals with uncertain hepatocarcinogenic activity (ISF and MYR) in the flavoring agent study would come from prediction of the 90-day exposure data in combination with the cumulative classification by all 7 optimal models.
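The summary statistics quoted above can be checked against the 90-day column of Table 4 with a few lines of arithmetic. The sketch below uses the rounded table values, so it reproduces the reported figures only to two decimal places; `error_rate` is a hypothetical helper showing the underlying definition (misclassified samples divided by classified samples).

```python
def error_rate(n_misclassified, n_total):
    """Fraction of samples assigned to the wrong class."""
    return n_misclassified / n_total


# 90-day FA test data error rates of the seven optimal models (Table 4).
errors_90_day = {
    "2 days": 0.14,
    "14 days": 0.13,
    "90 days": 0.12,
    "2+14 days": 0.15,
    "2+90 days": 0.15,
    "14+90 days": 0.10,
    "2+14+90 days": 0.12,
}

mean_error = sum(errors_90_day.values()) / len(errors_90_day)
best_model = min(errors_90_day, key=errors_90_day.get)
print(f"mean 90-day error: {mean_error:.2f}")
print(f"lowest 90-day error: {best_model} optimal model")
```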

To further explore why exposure duration of the test samples had such an effect on classification accuracy we plotted the mean fraction of samples classified as “hepatocarcinogen” for all optimal models for each dose group. The hepatocarcinogens (Fig. 2a) show an increase in the number of samples being classified “hepatocarcinogen” with increasing exposure duration. This effect was most consistent with the high doses of the hepatocarcinogens, but was also noted, albeit less consistently, with the low doses of hepatocarcinogens. The non-hepatocarcinogens (Fig. 2b) show a consistent decrease in “hepatocarcinogen” classification with increasing exposure duration, suggesting that false positive classification decreased with exposure duration. MYR and ISF show decreasing rates of “hepatocarcinogen” classification as exposure duration increased (Fig. 2c).

Table 4
Independent validation error rates of the optimal models.

Optimal model         2-day FA test data^a    14-day FA test data^a    90-day FA test data^a    All FA test data^a
2 days                0.27                    0.25                     0.14                     0.21
14 days               0.47                    0.27                     0.13                     0.26
90 days               0.29                    0.24                     0.12                     0.20
2+14 days             0.18                    0.23                     0.15                     0.18
2+90 days             0.26                    0.24                     0.15                     0.21
14+90 days            0.21                    0.21                     0.10                     0.16
2+14+90 days          0.21                    0.19                     0.12                     0.16
All optimal models    0.27                    0.23                     0.13                     0.20

a Only SAF-H, ESG-H, ESG-L, EGN-L, EGN-H, IEG-H, IEG-L, TC, and UTC were considered when calculating cumulative error.

Fig. 2. Effect of dose and exposure duration of the FA test data on classification. Samples were classified as “carcinogen” or “non-carcinogen” once by each optimal model. Classifications for all optimal models were summed for each dose group and the fraction of the total classifications that were “carcinogen” was plotted. Cumulative prediction plots for the known or anticipated hepatocarcinogens (A), non-carcinogens (B) and the inadequately tested chemicals (C) are shown. 2D=2-day exposure samples, 14D=14-day exposure samples, 90D=90-day exposure samples, L=low dose, H=high dose.

To better understand the reason why increasing the exposure duration of the FA test samples led to a decrease in classification error we plotted the expression (relative to the treated control) of a subset of informative genes (Mybl2, Adam8, Wwox, and Fhit) that inform a number of the optimal models (Fig. 3). For purposes of simplicity and to illustrate the effects of dose and time we only plotted the high and low dose hepatocarcinogenic treatments from the FA study. The amplitude of the gene expression change relative to the TC group was both time and dose dependent. For the genes induced by hepatocarcinogen treatment, Mybl2 (Fig. 3a) and Adam8 (Fig. 3b), there was relatively little induction after 2 or 14 days even in the high dose hepatocarcinogen treatment groups; however, there was noteworthy induction of these genes after 90 days. The induction response of Mybl2 and Adam8 after 90 days of exposure was clearly dose dependent, with high doses of ESG, MEG and SAF producing notably higher levels of induction compared to their low dose counterparts. Wwox (Fig. 3c) and Fhit (Fig. 3d) were down-regulated by hepatocarcinogen treatment. The response of these genes was noteworthy even after 2 days of treatment in the high dose groups. The low doses of ESG and MEG produced a much more muted down-regulation of Wwox and Fhit following 2- or 14-day exposures, but 90 days of exposure led to more pronounced decreases in expression. SAF-L had little to no effect on Wwox and Fhit even after 90 days of exposure.

Fig. 3. Effect of dose and exposure duration on the expression of genes informative to the optimal models. The fold change (treated/control) for the different exposure durations to the hepatocarcinogens in the FA study. Data used to calculate fold change were unlogged, quantile normalized intensity values. Shown are two genes that are induced by carcinogen treatment, Mybl2 (A_44_P653701) (A) and Adam8 (A_44_P866659) (B), and two genes that are down-regulated by carcinogen treatment, Wwox (A_44_P869415) (C) and Fhit (A_44_P331276) (D).
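The fold change plotted in Fig. 3 is described as treated/control on unlogged, quantile normalized intensities. A minimal sketch of that ratio is given below; taking the ratio of dose group means is an assumption, and the intensity values are invented for illustration.

```python
from statistics import mean


def fold_change(treated_intensities, control_intensities):
    """Ratio of mean treated intensity to mean control intensity (unlogged)."""
    return mean(treated_intensities) / mean(control_intensities)


# Hypothetical normalized intensities for one probe.
treated = [200.0, 220.0, 180.0]
control = [100.0, 110.0, 90.0]
induced = fold_change(treated, control)      # above 1: induction
repressed = fold_change(control, treated)    # below 1: down-regulation
print(induced, repressed)
```

On this scale a value above 1 corresponds to induction (as for Mybl2 and Adam8) and a value below 1 to down-regulation (as for Wwox and Fhit).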

Another approach to understanding the effect of exposure duration on classification accuracy is to evaluate the effect it has on margin score (a classification confidence metric reported by SVM models). The models described here yield a positive margin score when a sample is more closely associated with the hepatocarcinogen class and the opposite for the non-hepatocarcinogen class. The more distal the margin score is from 0, the more confident the model is that a sample is associated with one class. To avoid biasing our assessment due to selection of a model trained on a subset of the data we used the 2+14+90-day model for this exercise. As shown in Fig. 4, the dose group mean margin scores of the high dose hepatocarcinogens were largely the same after 2 and 14 days of exposure, but dramatically increased by 90 days of exposure. The low dose hepatocarcinogens tended to produce margin scores that stayed close to 0 and became more positive with increasing exposure. The margin scores of the non-hepatocarcinogen dose groups generally became more negative with increasing exposure duration. Similar margin score trends were observed with the other 6 optimal models (Supplementary Table 9).
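To make the sign convention of the margin score concrete, the sketch below computes a signed score for a linear decision function and averages it over a dose group. A linear kernel, the probe weights, the bias, and the expression values are all assumptions chosen for illustration; they are not the fitted parameters of any model in this study.

```python
from statistics import mean


def margin_score(expression, weights, bias):
    """Signed score of a linear decision function: sum(w_i * x_i) + bias."""
    return sum(weights[p] * expression[p] for p in weights) + bias


def classify(score):
    """Positive scores map to the hepatocarcinogen class."""
    return "hepatocarcinogen" if score > 0 else "non-hepatocarcinogen"


# Hypothetical two-probe model: one induced gene, one repressed gene.
weights = {"Mybl2": 0.8, "Wwox": -1.2}
bias = -0.1

# Hypothetical baseline-transformed expression values for three animals.
high_dose_group = [
    {"Mybl2": 2.0, "Wwox": -1.5},
    {"Mybl2": 1.8, "Wwox": -1.0},
    {"Mybl2": 2.2, "Wwox": -2.0},
]
control_animal = {"Mybl2": 0.0, "Wwox": 0.0}

group_mean = mean(margin_score(s, weights, bias) for s in high_dose_group)
control_score = margin_score(control_animal, weights, bias)
print(round(group_mean, 2), classify(group_mean))
print(round(control_score, 2), classify(control_score))
```

The farther the group mean lies from 0, the more confidently the group is assigned to one class, which is the quantity plotted per dose group in Fig. 4.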

Prediction of individual chemicals from the flavoring agent study

The major determinant of independent validation error was the exposure duration of the FA test data, with 90-day data being classified with the lowest error. The low rates of error observed with the 90-day FA data were consistent across all optimal models. In considering these observations we came to the conclusion that the most effective way to evaluate the hepatocarcinogenic potential of the individual FA chemicals would be to classify the 90-day FA data with all models and report a cumulative incidence of “hepatocarcinogen” classification (Table 5). In performing such an analysis each of the 90-day FA samples was classified 7 different times (once for each model). The high dose groups of the hepatocarcinogens MEG, ESG, and SAF were consistently classified as “hepatocarcinogen” by all optimal models, whereas none of the high dose or low dose non-hepatocarcinogens, ANT, EGN, IEG, TC, UTC were classified “hepatocarcinogen”. The low dose MEG samples were disproportionately classified “hepatocarcinogen”, whereas low dose ESG was classified “hepatocarcinogen” approximately 50% of the time and low dose SAF was rarely classified “hepatocarcinogen”. A small fraction of the low dose ISF samples were classified “hepatocarcinogen”. The fraction of ISF samples classified “hepatocarcinogen” in the high dose group increased notably to 42%. None of the low dose MYR samples were classified “hepatocarcinogen”, but approximately 35% of the samples in the high dose were classified “hepatocarcinogen”. A more detailed discussion of the individual chemical treatment effects, the impact of these effects on classification and the potential implications for hazard characterization of the alkenylbenzenes can be found in the Supplementary Discussion.

Fig. 4. FA study dose group mean margin score from classification using the 2+14+90-day optimal model. Two-, 14- and 90-day hepatocarcinogen (A) and non-hepatocarcinogen (B) samples were classified using the 2+14+90-day optimal model. Dose groups were identified by chemical, exposure duration, and dose level. The mean margin score for the dose groups is the average of the margin scores reported by the model. A positive margin score indicates a stronger association with the hepatocarcinogen class, whereas a negative margin indicates a stronger association with the non-hepatocarcinogen class. As exposure duration increases the mean margin score becomes more distal to 0 (more positive or more negative), indicating that the model is relatively more confident in the classification of the longer exposure duration data. Data are plotted as mean±standard error. 2D=2-day exposure samples, 14D=14-day exposure samples, 90D=90-day exposure samples, L=low dose, H=high dose. a Statistically significant difference (p < 0.05; unpaired t-test) between the 2D and 90D mean margin scores. b Statistically significant difference (p < 0.05; unpaired t-test) between the 14D and 90D mean margin scores.

Table 5
All optimal model cumulative incidence of hepatocarcinogen classification of the 90-day FA samples.

Class                    Treatment    Dose level    All optimal model cumulative incidence of “hepatocarcinogen” classification
Hepatocarcinogens        ESG          L             31/70
                         ESG          H             70/70
                         MEG          L             65/70
                         MEG          H             70/70
                         SAF          L             9/70
                         SAF          H             70/70
Non-hepatocarcinogens    ANT          L             1/70
                         ANT          H             0/70
                         EGN          L             0/70
                         EGN          H             0/70
                         IEG          L             0/70
                         IEG          H             0/70
                         TC           X             0/126
                         UTC          X             0/70
Untested                 ISF          L             3/70
                         ISF          H             30/70
                         MYR          L             0/63
                         MYR          H             24/70
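The cumulative incidences in Table 5 pool the classifications made by all seven optimal models, so the denominator for a dose group is the number of samples multiplied by seven. The sketch below shows that pooling with a hypothetical per-model breakdown and converts several reported Table 5 counts to percentages; the function name and the example call counts are assumptions.

```python
def cumulative_incidence(calls_per_model, n_samples):
    """Pool 'hepatocarcinogen' calls over models.

    Returns (total calls, total classifications), where each sample is
    classified once by each model.
    """
    return sum(calls_per_model), n_samples * len(calls_per_model)


# Hypothetical dose group of 10 samples classified by 7 models.
toy_calls, toy_total = cumulative_incidence([1, 0, 2, 1, 0, 1, 1], n_samples=10)
print(f"{toy_calls}/{toy_total}")

# Counts reported in Table 5 (calls, total classifications).
table5_counts = {
    "ESG-L": (31, 70),
    "ISF-H": (30, 70),
    "MYR-H": (24, 70),
}
percent = {group: 100.0 * calls / total
           for group, (calls, total) in table5_counts.items()}
for group, value in percent.items():
    print(f"{group}: {value:.1f}%")
```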

In order to better visualize the expression changes that underlie the classifications of the individual treatment groups we clustered the 90-day FA data (Fig. 5) using the 20 probes informative to two or more of the models (Table 3). A hierarchical cluster diagram of the 90-day FA study dose groups using 89 features informative to at least one model (Supplementary Table 8) can be found in Supplementary Fig. 3. Few of the informative genes exhibit profound differential expression between the hepatocarcinogens and non-hepatocarcinogens (Supplementary Table 8). The high dose treatment groups of the three known/anticipated hepatocarcinogens, SAF, MEG and ESG, cluster together on a distinct arm of the dendrogram. The low dose SAF, MEG and ESG treatment groups clustered on the arm of the dendrogram that contained all of the non-hepatocarcinogens. MYR-H and ISF-H clustered in close proximity to the low dose hepatocarcinogen treatment groups (ESG-L and MEG-L). The high dose hepatocarcinogens caused a marked induction of Mybl2 and Adam8 and a decrease in expression of Fhit and Wwox. A similar pattern of Mybl2 and Adam8 induction and Fhit and Wwox reduction was observed, albeit to a lesser extent, with the low dose hepatocarcinogens. MYR-H and ISF-H do not cause a down-regulation of Wwox or Fhit, indicating a lack of potent genotoxic properties. MYR-H and ISF-H did, however, induce expression of Mybl2, Cyp1a1, Nol3 and Rhbg and decreased expression of Odz2, all changes in expression that parallel those produced by treatment with the hepatocarcinogenic alkenylbenzenes.


Fig. 5. Hierarchical cluster diagram of the 90-day FA dose groups created with the 20 probes informative to two or more of the optimal models. Treatment group mean intensity values used for the clustering were derived from data that were quantile normalized and baseline transformed. Chemical abbreviations can be found in Table 1. Yellow represents increased expression and blue indicates decreased expression. Increased or decreased expression is in relation to the treated control samples.


Predictions using published gene lists

Since a number of rat hepatocarcinogen gene expression signatures have been published recently (Nakayama et al., 2006; Fielden et al., 2007; Ellinger-Ziegelbauer et al., 2008; Uehara et al., 2008) (Supplementary Table 10) we wanted to evaluate their performance using the data described in this article. An initial comparison of the genes from the optimal models with those from the published rat hepatocarcinogenicity signatures revealed very little overlap (Supplementary Table 11).

Despite the absence of a noteworthy overlap in the signatures, we reasoned that if the biology behind the published models were similar in nature to our studies then use of gene lists from these published studies would provide an adequate substitute. To limit the complexity of this comparison we decided to generate only 3 models using each signature and all data from the CVNC study (2 days, 14 days and 90 days). The SVM models built using the four published gene signatures achieved 97% to 100% cross-validation accuracy (data not shown). Independent validation error rates determined using all FA samples indicate that our models outperform all the published signatures, with the exception of the Nakayama signature (Table 6). As was documented above, the 90-day FA data were classified with the highest accuracy. When predicting the 90-day FA data, the Nakayama signature produced error rates equivalent to the best performing optimal model (14+90-day model). Predictions of the individual dose groups using the 2-, 14- or 90-day FA data are found in Supplementary Table 12. Notably, there are indications from all the models based on published signatures that the high doses of MYR and ISF administered in this study would be hepatocarcinogenic in a 2-year study.

Table 6
Cumulative error rates for optimal models and published signatures.

Model                                         2-day FA data^a    14-day FA data^a    90-day FA data^a    All FA data
2-day optimal model                           0.27               0.25                0.14                0.21
14-day optimal model                          0.47               0.27                0.13                0.26
90-day optimal model                          0.29               0.24                0.12                0.20
2+14-day optimal model                        0.18               0.23                0.15                0.18
2+90-day optimal model                        0.26               0.24                0.15                0.21
14+90-day optimal model                       0.21               0.21                0.10                0.16
2+14+90-day optimal model                     0.21               0.19                0.12                0.16
Fielden, 2007, signature                      0.30               0.35                0.23                0.28
Ellinger-Ziegelbauer, RFE, 2008, signature    0.30               0.27                0.12                0.21
Uehara, 2008, signature                       0.40               0.32                0.14                0.26
Nakayama, 2006, signature                     0.15               0.15                0.10                0.13

a Only SAF-H, ESG-H, ESG-L, EGN-L, EGN-H, IEG-H, IEG-L, TC, and UTC were considered when calculating cumulative error rates.

Page 12: Predicting the hepatocarcinogenic potential of alkenylbenzene flavoring agents using toxicogenomics and machine learning

S.S. Auerbach et al. / Toxicology and Applied Pharmacology 243 (2010) 300–314

Discussion

In this article we described seven hepatocarcinogen SVM classification models generated using hepatic gene expression from male F344/N rats exposed to chemicals with characterized hepatocarcinogenic activity for 2, 14 or 90 days. With these models we have accurately classified a number of previously tested chemicals from the alkenylbenzene flavoring agent class, including the hepatocarcinogens SAF, MEG and ESG and the non-hepatocarcinogens EGN, IEG and ANT. In addition, we have classified two untested (MYR) or poorly characterized (ISF) flavoring agents, predicting that administration of these chemicals at 2 mmol/kg/day for 2 years would produce a weak, albeit significant, increase in hepatic tumor burden in male rats. Despite the success of these studies, enthusiasm must be tempered to some degree, as the independent validation was performed on a set of structurally and mechanistically similar chemicals (alkenylbenzenes). Determining the degree of global predictivity of the models presented here will require the evaluation of a much greater structural and mechanistic diversity of chemicals. Furthermore, the SVM training data, although more diverse than the independent validation set, represent only a fraction of the chemical and, more importantly, the mechanistic universe as it pertains to chemical-induced hepatocarcinogenesis. For this reason it is reasonable to hypothesize that these models would produce errant classifications of certain chemical classes.

There has been a great deal of discussion in the literature with respect to the most effective machine learning approach and feature selection method for model generation using genomic expression data (Saeys et al., 2007; Ressom et al., 2008). A variety of modeling approaches including SVM-RFE have been used by groups undertaking studies similar to the one described here (Nie et al., 2006; Fielden et al., 2007; Thomas et al., 2007; Ellinger-Ziegelbauer et al., 2008; Uehara et al., 2008). Initially we evaluated the CV performance of K-Nearest Neighbors, Decision Trees, Naïve Bayes, Multi-Layer Perceptron Neural Networks, Prediction Analysis for Microarrays (PAM) and SVM using the same set of probes selected using either a chi-square test or signal-to-noise statistic (Golub et al., 1999). The algorithm that yielded the best results independent of the feature selection method was SVM and it was therefore selected for the modeling described here. Another critical determinant of model performance is feature selection (Saeys et al., 2007; Pirooznia et al., 2008). The classification accuracy of an SVM can be improved by using an integrated feature selection method referred to as recursive feature elimination (Guyon et al., 2002; Pirooznia et al., 2008). By performing SVM-RFE we improved our performance compared to when SVM was coupled to filter-based feature selection methods. Based on these results we are confident that the use of SVM-RFE provided one of the best available approaches for model generation. Notably, others undertaking a similar line of work have also found that SVM-RFE outperforms other feature selection/machine learning combinations (Ellinger-Ziegelbauer et al., 2008).
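To make the recursive feature elimination idea concrete, the following is a minimal, self-contained sketch rather than the implementation used in this study. It assumes a linear SVM without a bias term, trained by plain subgradient descent on the regularized hinge loss, and it removes one feature per round using the squared weight as the ranking criterion (Guyon et al., 2002). The function names, hyperparameters and the tiny expression matrix are all hypothetical.

```python
# Toy SVM-RFE: repeatedly train a linear SVM and drop the feature with the
# smallest squared weight. The trainer is a simple stand-in for a real SVM
# solver and is only meant to illustrate the elimination loop.

def train_linear_svm(X, y, features, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    X: list of samples (lists of floats); y: labels in {-1, +1};
    features: indices of the columns still under consideration.
    No bias term is fitted (a simplification for this toy example).
    Returns a dict mapping feature index -> weight.
    """
    w = {j: 0.0 for j in features}
    n = len(X)
    for _ in range(epochs):
        grad = {j: lam * w[j] for j in features}
        for xi, yi in zip(X, y):
            margin = yi * sum(w[j] * xi[j] for j in features)
            if margin < 1.0:  # sample lies inside the margin
                for j in features:
                    grad[j] -= yi * xi[j] / n
        for j in features:
            w[j] -= lr * grad[j]
    return w


def svm_rfe(X, y, n_keep=1):
    """Return (surviving feature indices, features in elimination order)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        w = train_linear_svm(X, y, remaining)
        worst = min(remaining, key=lambda j: w[j] ** 2)  # lowest-ranked feature
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated


# Hypothetical expression matrix: 4 samples x 3 probes.
# Probe 0 separates the classes; probes 1 and 2 carry no class signal.
X = [[2.0, 0.1, 0.0],
     [1.8, -0.1, 0.0],
     [-2.0, 0.1, 0.0],
     [-1.9, -0.1, 0.0]]
y = [1, 1, -1, -1]  # +1 = hepatocarcinogen, -1 = non-hepatocarcinogen

selected, eliminated = svm_rfe(X, y, n_keep=1)
print(selected, eliminated)
```

Unlike a filter method, which scores each probe once and independently, this loop retrains the classifier after every removal, so each probe's rank reflects its contribution alongside the probes that remain.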

A number of other groups have explored the use of toxicogenomics for predicting hepatocarcinogenic risk in rats, but we are the first to employ gene expression from animals exposed for 90 days, the typical length of an NTP pre-chronic study. One of the objectives of these studies was to determine whether models informed by gene expression from longer exposure durations (90 days) exhibited more accurate carcinogen classification relative to those trained on data from shorter durations. During the model generation phase it became clear that as exposure duration increased the fraction of the gene expression aligned with hepatocarcinogenic or non-hepatocarcinogenic activity increased. This effect was best illustrated by the identification of 18 models that achieved 0% CV error using the 2-day training data compared to the 59 models that achieved 0% CV error using the 90-day training data. The difference in the number of 0% error models likely indicates a more robust alignment of molecular level processes after longer exposure durations. Despite our observation in the model generation phase, the findings from the optimal model cross-validation phase of the study indicated that when models are trained using gene expression from animals exposed to highly potent doses of hepatocarcinogens and high doses of non-hepatocarcinogens, longer exposure durations do not lead to the identification of models with enhanced prediction accuracy (as measured by independent validation). This is evidenced by the near equivalent accuracy of the 2-day and 90-day models when predicting the 90-day FA samples. The likely reason for near equivalent predictive accuracy of the 2-day and 90-day models is that both incorporated gene expression changes indicative of genotoxic stress. The gene most indicative of genotoxic stress that informs both models is Wwox (a tumor suppressor discussed more below), which is strongly down-regulated by hepatocarcinogens. All the chemicals in the FA study with documented or anticipated hepatocarcinogenic activity (MEG, SAF and ESG) are genotoxic and therefore cause a down-regulation of Wwox if administered for 90 days at a dose high enough to activate a DNA damage response. Hence, both models, when predicting the 90-day FA data, were not necessarily predicting general hepatocarcinogenic activity as much as they were identifying gene expression changes related to genotoxicity.

An observation that is critically important to the application of gene expression-based hepatocarcinogenicity prediction models is that weakly hepatocarcinogenic chemical/dose combinations require longer exposure durations to manifest changes reflective of hepatocarcinogenic activity. Such is the case with the low dose (0.2 mmol/kg/day) of estragole, which produced changes in gene expression that yielded largely “non-hepatocarcinogen” classifications after 2 and 14 days of exposure, but after 90 days of exposure yielded nearly a 50% all-model cumulative “hepatocarcinogen” classification rate. This finding suggests, not surprisingly, that gene expression changes following chemical exposure are a function of both dose and exposure duration. The essential message to be gleaned from these observations is that, in order to avoid misclassifying a weak hepatocarcinogen when using genomics-based predictive models, gene expression from exposure durations of 90 days or longer should be employed.
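One plausible reading of an all-model cumulative classification rate is the fraction of all (model, sample) calls for a dose group that fall in a given class. The sketch below illustrates that reading only; the pooling rule is an assumption, and the model names and calls are invented for illustration and are not data from this study.

```python
def cumulative_call_rate(predictions, label="hepatocarcinogen"):
    """Fraction of all (model, sample) calls that equal `label`.

    predictions: dict mapping model name -> list of per-sample class calls.
    """
    calls = [call for model_calls in predictions.values() for call in model_calls]
    return sum(call == label for call in calls) / len(calls)


H, N = "hepatocarcinogen", "non-hepatocarcinogen"

# Hypothetical calls from three models on four samples of one dose group,
# shown for a short and a long exposure duration.
predictions_by_duration = {
    "2-day": {
        "model_a": [N, N, N, N],
        "model_b": [N, N, N, H],
        "model_c": [N, N, N, N],
    },
    "90-day": {
        "model_a": [H, H, N, N],
        "model_b": [H, N, H, N],
        "model_c": [N, H, N, H],
    },
}

rates = {
    duration: cumulative_call_rate(preds)
    for duration, preds in predictions_by_duration.items()
}
for duration, rate in rates.items():
    print(f"{duration}: {rate:.2f}")
```

Pooling calls across models in this way gives one number per dose group and exposure duration, which makes a shift from mostly negative calls to an even split easy to see.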

An initial characterization of the two data sets indicated that the dominant effect on gene expression came from age of the animal. GSEA indicated that younger animals exhibited increased expression of genes related to tissue remodeling and cell proliferation. The increased activity of both processes is likely a byproduct of the more rapid rate of growth in the younger rats. Tissue remodeling and cell proliferation are integral components of the carcinogenic process and they involve genes whose expression is likely predictive of chemical-induced hepatocarcinogenic activity. This effect is best illustrated by the relatively high expression in young animals of Adam8 and Mybl2, two model-informative genes that are involved in tissue remodeling and cell proliferation, respectively. The higher level of expression of genes involved in these processes may potentially mask the activity of hepatocarcinogens or, alternatively, the high level of expression may, in the non-hepatocarcinogen-treated animals, lead to false positive classification. One potential way of overcoming this issue is to create training data using animals that have reached their growth plateau. The shortcoming of such an approach is that there would be no exposure in the growth phase when rates of mutation fixation are higher (Preston-Martin et al., 1990). The lower rate of mutagenesis may be borne out in lower overall responses to treatment and therefore decrease sensitivity for detecting chemical effects. A comparative study that evaluates hepatocarcinogen-elicited gene expression during and after the growth phase would provide better clarity on this issue and may potentially lead to predictive models that exhibit increased sensitivity and specificity.

A total of 89 probes informed one or more of the optimal models. A number of the consistently informative genes are intimately linked to processes related to carcinogenesis. Wwox (7 models) and Fhit (6 models) are down-regulated by the genotoxic hepatocarcinogens in our study. These genes have previously been shown to be down-regulated in mouse and rat liver following genotoxic challenge (Iida et al., 2005; Nakayama et al., 2006; Ellinger-Ziegelbauer et al., 2008). Both are found in common fragile sites (FRA16D and FRA3B) in the genome that are deleted in a variety of human cancers (Iliopoulos et al., 2006). Both genes have previously been shown to decrease in expression in response to genotoxic agents and this decrease in expression is correlated with S-phase cell cycle delay (Thavathiru et al., 2005). Other molecular level studies have linked Wwox to positive regulation of apoptosis (Yang and Zhang, 2008) and Fhit to repression of β-catenin activity (Weiske et al., 2007). Another informative gene with a noteworthy relationship to carcinogenesis is Mybl2. Mybl2 is a member of a 6-gene proliferation signature found in most cancers (Whitfield et al., 2006) and is generally induced by the hepatocarcinogens in our study. Mybl2 in combination with E2F transcription factors regulates expression of cyclin A2, cyclin B1 and cdc2, which are essential for progression of the G2/M phase of the cell cycle (Sala, 2005). In addition, Mybl2 antagonizes apoptosis through induction of genes such as Bcl2 and ApoJ (Sala, 2005). Adam8 is a member of the ADAM (a disintegrin and metalloprotease) family of genes that play a role in virtually all cancers (Mochizuki and Okada, 2007). The hepatocarcinogens in our study induce the expression of Adam8 to varying degrees. Adam8 is over-expressed in a number of human cancers (Mochizuki and Okada, 2007). In addition, it is induced in patients with chronic liver disease (Schwettmann et al., 2008). It is thought that expression of Adam8 is related to tissue remodeling and invasion (Mochizuki and Okada, 2007). Other model-informative genes with more poorly defined roles in carcinogenesis include Nol3 (apoptosis antagonist) (Mercier et al., 2005), Dab1 (regulation of SRC signaling) (McAvoy et al., 2008), Unc5d (neovascularization and cell migration) (Chedotal et al., 2005), Rhbg (β-catenin signaling) (Sekine et al., 2006; Takigawa and Brown, 2008) and Odz2 (development) (Ben-Zur et al., 2000). The presence of these genes and others in the optimal models demonstrates the biological plausibility of the models and by extension the veracity of the predictions made by them.

We hypothesized that models that were trained using data from longer exposure durations would be populated by genes that generally related to chemical-induced hepatocarcinogenic activity as opposed to specific modes of carcinogenic action. As discussed above, many of the genes that populate the models are related to the primary modes of action of the training chemicals. Genes that would populate a general signature of hepatocarcinogenic activity should be altered in the same manner following exposure to genotoxic or non-genotoxic hepatocarcinogens, but unaffected by non-hepatocarcinogens. Furthermore, the changes in the expression of these genes would likely occur secondary to the primary toxic mode of action and would be related to maladaptive changes such as proliferation, apoptosis, cell migration and tissue remodeling that occur with repeat dosing (assuming that the dose administered is not overtly toxic). A search for genes in our data set that meet these criteria identified Mybl2 and Adam8. In the CVNC study both genes exhibit a relatively weak induction after 2 days of exposure in the hepatocarcinogen-treated animals, with the response being progressively amplified by 14 and 90 days of exposure. Importantly, both genes are induced by the genotoxic (AFB1, NDMA, MEG) and non-genotoxic (DBAQ) hepatocarcinogens, but are unaffected by the non-hepatocarcinogen treatments.
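The selection criteria for a general signature gene can be written as a simple filter. The sketch below is illustrative only: it assumes each gene is summarized by one log2 fold change per treatment class and uses an arbitrary threshold of 1.0, neither of which is taken from this study. The gene names and values are placeholders.

```python
def general_signature_genes(log2fc, threshold=1.0):
    """Select genes changed in the same direction by genotoxic and
    non-genotoxic hepatocarcinogens and essentially unchanged by
    non-hepatocarcinogens.

    log2fc: dict mapping gene -> dict with keys "genotoxic",
    "non_genotoxic" and "non_carcinogen" (log2 fold change vs. control).
    threshold: hypothetical cutoff for calling a gene "affected".
    """
    selected = []
    for gene, fc in log2fc.items():
        geno = fc["genotoxic"]
        nongeno = fc["non_genotoxic"]
        noncarc = fc["non_carcinogen"]
        same_direction = geno * nongeno > 0
        both_responsive = min(abs(geno), abs(nongeno)) >= threshold
        unaffected_by_noncarc = abs(noncarc) < threshold
        if same_direction and both_responsive and unaffected_by_noncarc:
            selected.append(gene)
    return selected


# Placeholder genes with invented fold changes (not data from the study).
example = {
    "gene_a": {"genotoxic": 2.1, "non_genotoxic": 1.6, "non_carcinogen": 0.2},
    "gene_b": {"genotoxic": -1.8, "non_genotoxic": 0.1, "non_carcinogen": 0.0},
    "gene_c": {"genotoxic": 1.5, "non_genotoxic": 1.4, "non_carcinogen": 1.3},
    "gene_d": {"genotoxic": -2.0, "non_genotoxic": -1.2, "non_carcinogen": -0.3},
}

print(general_signature_genes(example))
```

In this toy input, gene_b responds only to the genotoxic class (a mode-specific change) and gene_c also responds to the non-hepatocarcinogen, so only gene_a and gene_d pass the filter.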

False negative predictions using the models described here may be in part a byproduct of training data derived from rats treated with highly hepatocarcinogenic doses of chemical. Training on these data may set the bar high for detecting hepatocarcinogenic activity and therefore may be the reason why the low doses of the hepatocarcinogens were in some cases disproportionately characterized as non-hepatocarcinogens. In order to address this issue it will be necessary to create training data from exposures to chemical/dose combinations with lesser hepatocarcinogenic potency. Creation of such data would not only address the detection of weak hepatocarcinogens, but may also allow for application of regression methods to provide quantitative predictions of tumor outcomes as opposed to binary classifications such as those presented here.

We created models with our data using published hepatocarcinogenicity signatures to characterize the degree to which signatures can be employed across array platforms and studies (Nakayama et al., 2006; Fielden et al., 2007; Ellinger-Ziegelbauer et al., 2008; Uehara et al., 2008). The Iconix signature (Fielden et al., 2007), which was derived from the most diverse collection of chemicals, surprisingly performed the worst of all the signatures when evaluating the entire set of FA data, whereas the signature derived by Nakayama et al. performed as well as our best optimal model (14+90-day model). The likely reason for this disparity is the abundance of genotoxins contained in each data set. The Iconix signature was derived solely from expression induced by non-genotoxic hepatocarcinogens whereas our data set is largely populated with chemicals that are genotoxic. Furthermore, the Iconix signature was derived from expression studies done in Sprague–Dawley rats, whereas the Nakayama signature was created using expression data from F344 rats, the same strain used in the studies described here. Following this line of reasoning it is not surprising that the Nakayama et al. signature performed comparatively well. In short, signature performance in the context of our studies is likely reflective of both how well aligned the hepatocarcinogenic modes of action are between our study and previously published analyses, and the strain of rat employed to generate the signatures.

In conclusion, we identified SVM models trained on hepatic gene expression elicited by hepatocarcinogenic and non-hepatocarcinogenic chemical treatments that were capable of identifying chemicals with hepatocarcinogenic activity. Our results indicate that accurate models can be derived from gene expression following 2-day exposures to minimally toxic doses of hepatocarcinogens and non-hepatocarcinogens. Many of the genes that informed these models have a documented or plausible association with modes of chemical hepatocarcinogenesis (genotoxicity) and processes more generally related to carcinogenic transformation. The critical observation from these studies was that dose levels which produce moderate, yet significant, increases in tumor incidence after 2 years require extended exposure periods (90 days) to produce changes in liver gene expression robustly indicative of hepatocarcinogenic activity. In considering this observation we suggest that, when evaluating chemicals of unknown hepatocarcinogenic potency using genomics-based predictive models, gene expression from exposure periods ≥90 days be employed. Furthermore, it is our assertion that the use of exposures ≥90 days will be more effective in identifying hepatocarcinogen hazard associated with minimally toxic exposures that approximate those used in a 2-year bioassay.

Acknowledgments

Funding: This research was supported [in part] by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences under Research Project Number 1 Z01 ESO45004-11BB. The authors would like to thank Drs. Julia Gohlke and Rick Paules for critical review of the article.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.taap.2009.11.021.

References

Allen, D.G., Pearse, G., Haseman, J.K., Maronpot, R.R., 2004. Prediction of rodent carcinogenesis: an evaluation of prechronic liver lesions as forecasters of liver tumors in NTP carcinogenicity studies. Toxicol. Pathol. 32, 393–401.


Ben-Zur, T., Feige, E., Motro, B., Wides, R., 2000. The mammalian Odz gene family: homologs of a Drosophila pair-rule gene with expression implying distinct yet overlapping developmental roles. Dev. Biol. 217, 107–120.

Benigni, R., Netzeva, T.I., Benfenati, E., Bossa, C., Franke, R., Helma, C., Hulzebos, E., Marchant, C., Richard, A., Woo, Y.T., Yang, C., 2007. The expanding role of predictive toxicology: an update on the (Q)SAR models for mutagens and carcinogens.

Benigni, R., Zito, R., 2004. The second National Toxicology Program comparative exercise on the prediction of rodent carcinogenicity: definitive results. Mutat. Res. 566, 49–63.

Bristol, D., 2009. NTP. Toxicity Report 82.

Bucher, J.R., Portier, C., 2004. Human carcinogenic risk evaluation, Part V: the National Toxicology Program vision for assessing the human carcinogenic hazard of chemicals. Toxicol. Sci. 82, 363–366.

Chedotal, A., Kerjan, G., Moreau-Fauvarque, C., 2005. The brain within the tumor: new roles for axon guidance molecules in cancers. Cell Death Differ. 12, 1044–1056.

Chhabra, R.S., Huff, J.E., Schwetz, B.S., Selkirk, J., 1990. An overview of prechronic and chronic toxicity/carcinogenicity experimental study designs and criteria used by the National Toxicology Program. Environ. Health Perspect. 86, 313–321.

Cohen, S.M., 2004. Human carcinogenic risk evaluation: an alternative approach to the two-year rodent bioassay. Toxicol. Sci. 80, 225–229.

Eastin, W.C., Mennear, J.H., Tennant, R.W., Stoll, R.E., Branstetter, D.G., Bucher, J.R., McCullough, B., Binder, R.L., Spalding, J.W., Mahler, J.F., 2001. Tg.AC genetically altered mouse: assay working group overview of available data. Toxicol. Pathol. 29 (Suppl), 60–80.

Elcombe, C.R., Odum, J., Foster, J.R., Stone, S., Hasmall, S., Soames, A.R., Kimber, I., Ashby, J., 2002. Prediction of rodent nongenotoxic carcinogenesis: evaluation of biochemical and tissue changes in rodents following exposure to nine nongenotoxic NTP carcinogens. Environ. Health Perspect. 110, 363–375.

Ellinger-Ziegelbauer, H., Gmuender, H., Bandenburg, A., Ahr, H.J., 2008. Prediction of a carcinogenic potential of rat hepatocarcinogens using toxicogenomics analysis of short-term in vivo studies. Mutat. Res. 637, 23–39.

Fielden, M.R., Brennan, R., Gollub, J., 2007. A gene expression biomarker provides early prediction and mechanistic assessment of hepatic tumor induction by nongenotoxic chemicals. Toxicol. Sci. 99, 90–100.

Garman, K.S., Nevins, J.R., Potti, A., 2007. Genomic strategies for personalized cancer therapy. Hum. Mol. Genet. 16 (Spec No. 2), R226–R232.

Gold, L.S., Manley, N.B., Slone, T.H., Rohrbach, L., Garfinkel, G.B., 2005. Supplement to the Carcinogenic Potency Database (CPDB): results of animal bioassays published in the general literature through 1997 and by the National Toxicology Program in 1997–1998. Toxicol. Sci. 85, 747–808.

Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S., 1999. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537.

Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422.

Hagan, E.C., Jenner, P.M., Jones, W.I., Fitzhugh, O.G., Long, E.L., Brouwer, J.G., Webb, W.K., 1965. Toxic properties of compounds related to safrole. Toxicol. Appl. Pharmacol. 7, 18–24.

Huff, J., 1998. Carcinogenesis results in animals predict cancer risks to humans. In: Wallace, R.B. (Ed.), Maxcy–Rosenau–Last Public Health & Preventive Medicine. Appleton & Lange, Stamford, Conn., pp. 543–550, 567–569.

IARC, 1976. Some naturally occurring compounds. In: IARC (Ed.), IARC Monographs on the Evaluation of the Carcinogenic Risk of Chemicals to Humans. IARC Scientific Publishing, Lyon, pp. 51–72.

IARC, 1978. Some N-Nitroso compounds. In: IARC (Ed.), IARC Monographs on the Evaluation of the Carcinogenic Risk of Chemicals to Humans. IARC Scientific Publishing, Lyon, pp. 125–176.

Iida, M., Anna, C.H., Holliday, W.M., Collins, J.B., Cunningham, M.L., Sills, R.C., Devereux, T.R., 2005. Unique patterns of gene expression changes in liver after treatment of mice for 2 weeks with different known carcinogens and non-carcinogens. Carcinogenesis 26, 689–699.

Iliopoulos, D., Guler, G., Han, S.Y., Druck, T., Ottey, M., McCorkell, K.A., Huebner, K., 2006. Roles of FHIT and WWOX fragile genes in cancer. Cancer Lett. 232, 27–36.

Isfort, R.J., Kerckaert, G.A., LeBoeuf, R.A., 1996. Comparison of the standard and reduced pH Syrian hamster embryo (SHE) cell in vitro transformation assays in predicting the carcinogenic potential of chemicals. Mutat. Res. 356, 11–63.

Ito, N., Tamano, S., Shirai, T., 2003. A medium-term rat liver bioassay for rapid in vivo detection of carcinogenic potential of chemicals. Cancer Sci. 94, 3–8.

Jacobs, A., 2005. Prediction of 2-year carcinogenicity study results for pharmaceutical products: how are we doing? Toxicol. Sci. 88, 18–23.

Judson, R., Richard, A., Dix, D.J., Houck, K., Martin, M., Kavlock, R., Dellarco, V., Henry, T., Holderman, T., Sayre, P., Tan, S., Carpenter, T., Smith, E., 2009. The toxicity data landscape for environmental chemicals. Environ. Health Perspect. 117, 685–695.

Kerckaert, G.A., Brauninger, R., LeBoeuf, R.A., Isfort, R.J., 1996. Use of the Syrian hamster embryo cell transformation assay for carcinogenicity prediction of chemicals currently being tested by the National Toxicology Program in rodent bioassays. Environ. Health Perspect. 104 (Suppl 5), 1075–1084.

Kramer, J.A., Curtiss, S.W., Kolaja, K.L., Alden, C.L., Blomme, E.A., Curtiss, W.C., Davila, J.C., Jackson, C.J., Bunch, R.T., 2004. Acute molecular markers of rodent hepatic carcinogenesis identified by transcription profiling. Chem. Res. Toxicol. 17, 463–470.

Lambert, I.B., Singer, T.M., Boucher, S.E., Douglas, G.R., 2005. Detailed review of transgenic rodent mutation assays.

Long, E.L., Nelson, A.A., Fitzhugh, O.G., Hansen, W.H., 1963. Liver tumors produced in rats by feeding safrole. Arch. Pathol. 75, 595–604.

McAvoy, S., Zhu, Y., Perez, D.S., James, C.D., Smith, D.I., 2008. Disabled-1 is a large common fragile site gene, inactivated in multiple cancers. Genes Chromosomes Cancer 47, 165–174.

Mercier, I., Vuolo, M., Madan, R., Xue, X., Levalley, A.J., Ashton, A.W., Jasmin, J.F., Czaja, M.T., Lin, E.Y., Armstrong, R.C., Pollard, J.W., Kitsis, R.N., 2005. ARC, an apoptosis suppressor limited to terminally differentiated cells, is induced in human breast cancer and confers chemo- and radiation-resistance. Cell Death Differ. 12, 682–686.

Miller, E.C., Swanson, A.B., Phillips, D.H., Fletcher, T.L., Liem, A., Miller, J.A., 1983. Structure–Activity studies of the carcinogenicities in the mouse and rat of some naturally occurring and synthetic alkenylbenzene derivatives related to safrole and estragole. Cancer Res. 43, 1124–1134.

Mochizuki, S., Okada, Y., 2007. ADAMs in cancer cell proliferation and progression. Cancer Sci. 98, 621–628.

Muir, D.C., Howard, P.H., 2006. Are there other persistent organic pollutants? A challenge for environmental chemists. Environ. Sci. Technol. 40, 7157–7166.

Nakayama, K., Kawano, Y., Kawakami, Y., Moriwaki, N., Sekijima, M., Otsuka, M., Yakabe, Y., Miyaura, H., Saito, K., Sumida, K., Shirai, T., 2006. Differences in gene expression profiles in the liver between carcinogenic and non-carcinogenic isomers of compounds given to rats in a 28-day repeat-dose toxicity study. Toxicol. Appl. Pharmacol. 217, 299–307.

Nie, A.Y., McMillian, M., Parker, J.B., Leone, A., Bryant, S., Yieh, L., Bittner, A., Nelson, J., Carmen, A., Wan, J., Lord, P.G., 2006. Predictive toxicogenomics approaches reveal underlying molecular mechanisms of nongenotoxic carcinogenicity. Mol. Carcinog. 45, 914–933.

Noble, W.S., 2006. What is a support vector machine? Nat. Biotechnol. 24, 1565–1567.

NTP, 1983a. Carcinogenesis studies of eugenol (CAS No. 97-53-0) in F344/N rats and B6C3F1 mice (feed studies). Natl. Toxicol. Program Tech. Rep. Ser. 223, 1–159.

NTP, 1983b. NTP carcinogenesis bioassay of l-ascorbic acid (vitamin C) (CAS No. 50-81-7) in F344/N rats and B6C3F1 mice (feed study). Natl. Toxicol. Program Tech. Rep. Ser. 247, 1–172.

NTP, 1993. NTP toxicology and carcinogenesis studies of acetaminophen (CAS No. 103-90-2) in F344 rats and B6C3F1 mice (feed studies). Natl. Toxicol. Program Tech. Rep. Ser. 394, 1–274.

NTP, 1996. NTP toxicology and carcinogenesis studies of 1-amino-2,4-dibromoanthraquinone (CAS No. 81-49-2) in F344/N rats and B6C3F1 mice (feed studies). Natl. Toxicol. Program Tech. Rep. Ser. 383, 1–370.

NTP, 2000. NTP toxicology and carcinogenesis studies of methyleugenol (CAS No. 93-15-2) in F344/N rats and B6C3F1 mice (gavage studies). Natl. Toxicol. Program Tech. Rep. Ser. 491, 1–412.

NTP, 2008. TR-551: NTP technical report on the toxicology and carcinogenesis studies of isoeugenol.

Parry, E.M., Parry, J.M., Corso, C., Doherty, A., Haddad, F., Hermine, T.F., Johnson, G., Kayani, M., Quick, E., Warr, T., Williamson, J., 2002. Detection and characterization of mechanisms of action of aneugenic chemicals. Mutagenesis 17, 509–521.

Pirooznia, M., Yang, J.Y., Yang, M.Q., Deng, Y., 2008. A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics 9 (Suppl. 1), S13.

Preston-Martin, S., Pike, M.C., Ross, R.K., Jones, P.A., Henderson, B.E., 1990. Increased cell division as a cause of human cancer. Cancer Res. 50, 7415–7421.

Randerath, K., Haglund, R.E., Phillips, D.H., Reddy, M.V., 1984. 32P-post-labelling analysis of DNA adducts formed in the livers of animals treated with safrole, estragole and other naturally-occurring alkenylbenzenes. I. Adult female CD-1 mice. Carcinogenesis 5, 1613–1622.

REACH, 2008. ECHA: list of pre-registered substances.

Ressom, H.W., Varghese, R.S., Zhang, Z., Xuan, J., Clarke, R., 2008. Classification algorithms for phenotype prediction in genomics and proteomics. Front. Biosci. 13, 691–708.

Rozman, K.K., 2000. The role of time in toxicology or Haber's c × t product. Toxicology 149, 35–42.

Saeys, Y., Inza, I., Larranaga, P., 2007. A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517.

Sala, A., 2005. B-MYB, a transcription factor implicated in regulating cell cycle, apoptosis and cancer. Eur. J. Cancer 41, 2479–2484.

Sasaki, Y.F., Sekihashi, K., Izumiyama, F., Nishidate, E., Saga, A., Ishida, K., Tsuda, S., 2000. The comet assay with multiple mouse organs: comparison of comet assay results and carcinogenicity with 208 chemicals selected from the IARC monographs and U.S. NTP Carcinogenicity Database. Crit. Rev. Toxicol. 30, 629–799.

Schwettmann, L., Wehmeier, M., Jokovic, D., Aleksandrova, K., Brand, K., Manns, M.P., Lichtinghagen, R., Bahr, M.J., 2008. Hepatic expression of A disintegrin and metalloproteinase (ADAM) and ADAMs with thrombospondin motives (ADAM-TS) enzymes in patients with chronic liver diseases. J. Hepatol. 49, 243–250.

Sekine, S., Lan, B.Y., Bedolli, M., Feng, S., Hebrok, M., 2006. Liver-specific loss of beta-catenin blocks glutamine synthesis pathway activity and cytochrome p450 expression in mice. Hepatology (Baltimore, Md.) 43, 817–825.

Shirley, E., 1977. A non-parametric equivalent of Williams' test for contrasting increasing dose levels of a treatment. Biometrics 33, 386–389.

Statnikov, A., Li, C., Aliferis, C.F., 2007. Effects of environment, genetics and data analysis pitfalls in an esophageal cancer genome-wide association study. PLoS ONE 2, e958.

Statnikov, A., Wang, L., Aliferis, C.F., 2008. A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics 9, 319.

Storer, R.D., French, J.E., Haseman, J., Hajian, G., LeGrand, E.K., Long, G.G., Mixson, L.A., Ochoa, R., Sagartz, J.E., Soper, K.A., 2001. P53+/− hemizygous knockout mouse: overview of available data. Toxicol. Pathol. 29 (Suppl), 30–50.


Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P., 2005. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102, 15545–15550.

Takigawa, Y., Brown, A.M., 2008. Wnt signaling in liver cancer. Curr. Drug Target 9, 1013–1024.

Tennant, R.W., Margolin, B.H., Shelby, M.D., Zeiger, E., Haseman, J.K., Spalding, J., Caspary, W., Resnick, M., Stasiewicz, S., Anderson, B., et al., 1987. Prediction of chemical carcinogenicity in rodents from in vitro genetic toxicity assays. Science 236, 933–941.

Thavathiru, E., Ludes-Meyers, J.H., MacLeod, M.C., Aldaz, C.M., 2005. Expression of common chromosomal fragile site genes, WWOX/FRA16D and FHIT/FRA3B is downregulated by exposure to environmental carcinogens, UV, and BPDE but not by IR. Mol. Carcinog. 44, 174–182.

Thomas, R.S., Pluta, L., Yang, L., Halsey, T.A., 2007. Application of genomic biomarkers to predict increased lung tumor incidence in 2-year rodent cancer bioassays. Toxicol. Sci. 97, 55–64.

Truhaut, R., Le Bourhis, B., Attia, M., Glomot, R., Newman, J., Caldwell, J., 1989. Chronic toxicity/carcinogenicity study of trans-anethole in rats. Food Chem. Toxicol. 27, 11–20.

Uehara, T., Hirode, M., Ono, A., Kiyosawa, N., Omura, K., Shimizu, T., Mizukawa, Y., Miyagishima, T., Nagao, T., Urushidani, T., 2008. A toxicogenomics approach for early assessment of potential non-genotoxic hepatocarcinogenicity of chemicals in rats. Toxicology 250 (1), 15–26.

USEPA, 2004. What is the TSCA chemical substance inventory?

Usui, T., Mutai, M., Hisada, S., Takoaka, M., Soper, K.A., McCullough, B., Alden, C., 2001. CB6F1-rasH2 mouse: overview of available data. Toxicol. Pathol. 29 (Suppl), 90–108.

Van den Berg, M., Birnbaum, L., Bosveld, A.T., Brunstrom, B., Cook, P., Feeley, M., Giesy, J.P., Hanberg, A., Hasegawa, R., Kennedy, S.W., Kubiak, T., Larsen, J.C., van Leeuwen, F.X., Liem, A.K., Nolt, C., Peterson, R.E., Poellinger, L., Safe, S., Schrenk, D., Tillitt, D., Tysklind, M., Younes, M., Waern, F., Zacharewski, T., 1998. Toxic equivalency factors (TEFs) for PCBs, PCDDs, PCDFs for humans and wildlife. Environ. Health Perspect. 106, 775–792.

van Kreijl, C.F., McAnulty, P.A., Beems, R.B., Vynckier, A., van Steeg, H., Fransson-Steen, R., Alden, C.L., Forster, R., van der Laan, J.W., Vandenberghe, J., 2001. Xpa and Xpa/p53+/− knockout mice: overview of available data. Toxicol. Pathol. 29 (Suppl), 117–127.

Weiske, J., Albring, K.F., Huber, O., 2007. The tumor suppressor Fhit acts as a repressor of beta-catenin transcriptional activity. Proc. Natl. Acad. Sci. U. S. A. 104, 20344–20349.

Whitfield, M.L., George, L.K., Grant, G.D., Perou, C.M., 2006. Common markers of proliferation. Nat. Rev. Cancer 6, 99–106.

Wiseman, R.W., Miller, E.C., Miller, J.A., Liem, A., 1987. Structure–Activity studies of the hepatocarcinogenicities of alkenylbenzene derivatives related to estragole and safrole on administration to preweanling male C57BL/6J x C3H/HeJ F1 mice. Cancer Res. 47, 2275–2283.

Wogan, G.N., Newberne, P.M., 1967. Dose-response of aflatoxin B1 carcinogenesis in the rat. Cancer Res. 27, 2370–2376.

Yang, J., Zhang, W., 2008. WWOX tumor suppressor gene. Histol. Histopathol. 23, 877–882.