Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief
P.K. Douglas a,*, Sam Harris b, Alan Yuille c, Mark S. Cohen a,b,d

a Department of Biomedical Engineering, University of California, Los Angeles, USA
b Interdepartmental Neuroscience Program, University of California, Los Angeles, USA
c Department of Statistics, University of California, Los Angeles, USA
d Department of Psychiatry, Neurology, Psychology, Radiological Sciences, Biomedical Physics, University of California, Los Angeles, USA
* Corresponding author.

NeuroImage 56 (2011) 544–553
doi:10.1016/j.neuroimage.2010.11.002
© 2010 Elsevier Inc. All rights reserved.

Article history: Received 1 December 2009; Revised 16 October 2010; Accepted 1 November 2010; Available online 10 November 2010

Keywords: Machine learning; Classifier; fMRI; Interpretability; Optimization; Independent component analysis

Abstract

Machine learning (ML) has become a popular tool for mining functional neuroimaging data, and there are now hopes of performing such analyses efficiently in real-time. Towards this goal, we compared the accuracy of six different ML algorithms applied to neuroimaging data of persons engaged in a bivariate task, asserting their belief or disbelief of a variety of propositional statements. We performed unsupervised dimension reduction and automated feature extraction using independent component (IC) analysis and extracted IC time courses. Optimization of classification hyperparameters across each classifier occurred prior to assessment. Maximum accuracy was achieved at 92% for Random Forest, followed by 91% for AdaBoost, 89% for Naïve Bayes, 87% for a J48 decision tree, 86% for K*, and 84% for support vector machine. For real-time decoding applications, finding a parsimonious subset of diagnostic ICs might be useful. We used a forward search technique to sequentially add ranked ICs to the feature subspace. For the current data set, we determined that approximately six ICs represented a meaningful basis set for classification. We then projected these six IC spatial maps forward onto a later scanning session within subject. We then applied the optimized ML algorithms to these new data instances, and found that classification accuracy results were reproducible. Additionally, we compared our classification method to our previously published general linear model results on this same data set. The highest ranked IC spatial maps show similarity to brain regions associated with contrasts for belief > disbelief, and disbelief > belief.

Introduction

In the early 1950s, Shannon developed an iterated penny-matching device designed to perform simple brain reading tasks (Shannon, 1953). Although this device performed only slightly better than chance, it created a fascination with brain reading technology (Budiansky et al., 1994). Recent advancements in neuroimaging have provided a quantitative method for visualizing brain activity that corresponds to mental processes (Cox and Savoy, 2003), and certain brain reading feats have been accomplished by applying pattern classification techniques to functional magnetic resonance imaging (fMRI) data (Norman et al., 2006).

The application of machine learning (ML) to fMRI analysis has become increasingly popular, following its initial application to Haxby's visual object recognition data (Hanson et al., 2004). Neural networks (Hanson et al., 2004), Naïve Bayes (Pereira et al., 2009), and support vector machine classifiers (LaConte et al., 2005) each have yielded varying levels of predictive capability. However, fMRI data sets are extremely large, and one of the key challenges in fMRI classification has been to mine these vast data effectively.

Although certain studies have used the entire fMRI data set for predictive purposes (LaConte et al., 2007), most investigators perform a preliminary dimension reduction step to reduce the number of inputs used for classification. Reducing dimensionality has a number of benefits; specifically, computational burden is reduced, classification efficiency is often improved, and problems with overfitting that may lead to poor generalization are reduced (Yamashita et al., 2008).

Determining an optimal method for dimension reduction is an active area of research. Physiologically driven approaches for data reduction have been employed to incorporate spatial dependencies. For example, selecting regions of interest (ROIs) in the brain diminishes input size substantially (Cox and Savoy, 2003). However, these techniques typically require user input and a priori knowledge about brain morphology associated with a given task (Mourão-Miranda et al., 2006). Searchlight analysis (Kriegeskorte et al., 2006) has yielded powerful results. Multivoxel pattern analysis (Norman et al., 2006) and sparse logistic regression (Yamashita et al., 2008) provide useful
methods for determining voxel subsets with high signal-to-noise ratio. However, many voxels, especially those in adjacent spatial locations, may provide approximately the same information, and the subsets of voxels selected may differ fold to fold. Using a method like independent component analysis (ICA) allows basis images to cover the entire brain, and may prove to be a more reproducible method for parsimonious dimension reduction.

Here, we utilized independent component analysis (ICA) as an unsupervised method for both dimension reduction and feature extraction. ICA is a powerful blind source separation technique that has found numerous applications in the field of functional neuroimaging, including data exploration (Beckmann et al., 2006), noise component elimination (Tohka et al., 2008), and more recently by our group (A. Anderson et al., 2010) as a dimension reduction technique in fMRI ML classification. We hypothesize that ICs themselves may represent a reasonable basis to describe certain brain states. To the extent that this is true, output from certain classifiers could be interpreted as a weighting of these primitive bases to describe higher-level cognitive states.
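The ICA-based reduction and feature extraction described above can be sketched with scikit-learn's FastICA. This is a minimal illustration, not the paper's actual pipeline (which used fMRI-specific ICA tooling); the array sizes, trial onsets, and component count below are invented placeholders.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_timepoints, n_voxels, n_ics = 120, 500, 6

# Synthetic stand-in for one preprocessed fMRI run, flattened to (time, voxels).
X = rng.standard_normal((n_timepoints, n_voxels))

# Spatial ICA: treat voxels as samples, so each estimated source is a spatial
# map; the mixing matrix then holds one time course per component.
ica = FastICA(n_components=n_ics, random_state=0, max_iter=1000)
spatial_maps = ica.fit_transform(X.T)    # shape (n_voxels, n_ics)
time_courses = ica.mixing_               # shape (n_timepoints, n_ics)

# Feature extraction: sample each IC time course at (hypothetical) trial
# onsets, giving one n_ics-dimensional feature vector per trial.
trial_onsets = [10, 30, 50, 70, 90]
features = time_courses[trial_onsets, :]  # shape (n_trials, n_ics)
print(features.shape)
```

Each row of `features` would then serve as a classifier input for one belief/disbelief trial.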

Lack of parameter optimization at key steps along an fMRI data processing pipeline can substantially affect analysis outcome (Strother et al., 2004). Here, we have paid particular attention to parameter optimization at multiple steps in the decoding process. First, we compare classification accuracy using six different ML classifiers, across a

    For the experiments performed here we utilized a previouslypublished data set (Harris et al., 2008). Some of the present resultshave been shown in abstract form (Douglas et al., 2009, 2010).



    Our methodology, in brief, is as follows:

i. Data Preprocessing
ii. Dimension Reduction & Feature Extraction Using ICA
iii. Machine Learning Algorithms
iv. Training, Testing, and Iterative Optimization
v. Comparison across Classifiers and Feature Subsets.

This methodology was applied within subject. A variety of decision criteria were applied to determine the optimal number of features. A block diagram illustrating this methodology is shown in Fig. 1.
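Steps iii–v of the methodology above can be roughly illustrated with scikit-learn. The classifier set here only approximates the paper's (e.g., DecisionTreeClassifier standing in for J48, GaussianNB for Naïve Bayes, and no K* analogue exists in scikit-learn), and the IC-feature matrix is synthetic.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in: 100 trials x 6 IC features, binary belief/disbelief labels.
X = rng.standard_normal((100, 6))
y = rng.integers(0, 2, size=100)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="rbf"),
}

# Step v: n-fold cross-validated accuracy for each classifier.
results = {}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {results[name]:.2f}")
```

On random labels such as these, all accuracies hover near chance; the point is only the shape of the comparison loop.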

    Study data


range of complexity. Where applicable, hyperparameters within ML algorithms were also optimized simultaneously with optimizing the number of IC features. Optimal feature subset selection has been useful in a number of fields where machine learning has been applied (Dash and Liu, 1997), including brain computer interfaces (Garrett et al., 2003), voice recognition (Pandit and Kittler, 1998), and classifying gene microarray expression data (Bo and Jonassen, 2002). Nonetheless, there have been relatively few efforts to perform this analysis on fMRI data. Optimal feature selection can clearly improve computational efficiency, which may be a consideration when applying tools in near real-time, especially in a neurofeedback protocol. However, there are multiple reasons why this optimization step is useful even when computation time is not a consideration. Improved classification accuracy (Kohavi, 1997), reduced data storage requirements (Aha, 1992), and diminished probability of overfitting (Hastie et al., 2001) are three motivating factors. Generalization capability also increases with the ratio of training patterns to features (Theodoridis and Koutroumbas, 2009).
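The forward search over ranked IC features described in the abstract can be sketched as follows. The ranking criterion (single-feature cross-validated accuracy), the classifier, and the stopping rule here are simplified assumptions for illustration, not the paper's exact procedure; scikit-learn's SequentialFeatureSelector offers a packaged alternative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))   # 100 trials x 20 candidate IC features
y = rng.integers(0, 2, size=100)

clf = RandomForestClassifier(n_estimators=25, random_state=0)

# Rank candidate ICs by their individual cross-validated accuracy.
ranks = sorted(
    range(X.shape[1]),
    key=lambda j: cross_val_score(clf, X[:, [j]], y, cv=5).mean(),
    reverse=True,
)

# Forward search: add ranked ICs one at a time, tracking CV accuracy.
selected, history = [], []
for j in ranks[:10]:
    selected.append(j)
    acc = cross_val_score(clf, X[:, selected], y, cv=5).mean()
    history.append(acc)

# A parsimonious subset size: the point where CV accuracy peaked.
best_size = int(np.argmax(history)) + 1
print(best_size)
```

With real IC features, `history` would typically rise and then plateau, motivating the paper's choice of roughly six ICs as a basis set.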

Fig. 1. Methodology flow diagram. Following preprocessing steps that included motion correction and brain extraction, independent component analysis (ICA) was performed and time courses associated with each spatial IC were sampled for belief and disbelief conditions. IC component features were ranked and then sent as inputs into machine learning for training and testing of the classifier, which proceeds over an n-fold cross-validated sample.

Prior to the experiment we obtained IRB approval from UCLA and written informed consent from fourteen healthy adult subjects (18–45 years old; 7 women) with no medical history of psychiatric disorder, as assessed by a medical history survey. Participants underwent three 7-minute functional MRI scans (Siemens Allegra 3T). Each scanning session consisted of ~100 trials, yielding a total of ~300 trials per subject. While in the scanner, subjects were presented with short statements through video goggle displays. The presentation of stimuli was self-paced, with an inter-stimulus interval of 500 ms. Subjects were asked to evaluate the truth content of statements from seven different categories: mathematical, ethical, factual, geographic, autobiographical, religious, and semantic. After reading each statement, subjects were asked to press a button