current bioinformatics, 2015, 10 machine learning approaches...

16
Send Orders for Reprints to [email protected] Current Bioinformatics, 2015, 10, 000-000 1 1574-8936/15 $58.00+.00 © 2015 Bentham Science Publishers Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction: A Survey Shantipriya Parida 1 , Satchidananda Dehuri *,2 and Sung-Bae Cho 3 1 Carrier Software & Core Network Department, Huawei Technologies India Pvt Ltd, Nallurahalli, Whitefield, Bangalore 560066, India 2 Department of Information and Communication Technology, Fakir Mohan University, Vyasa Vihar, Balasore-756019, Odisha, India 3 Soft Computing Laboratory, Department of Computer Science, Yonsei University,50 Yonsei-ro, Sudaemoon-gu, Seoul 120-749, South Korea Abstract: The application of machine learning approaches to decode cognitive states through functional Magnetic Resonance Imaging (fMRI) is one of the emerging fields of research over the past decade. Multivoxel Pattern Analysis (MVPA) treats the activation of multiple voxels from the fMRI data as a pattern to decode the brain states using machine learning based classifiers. The potential in designing a classifier to accurately classify the discriminating cognitive states has attracted great attention from machine learning researchers. Interest has been evinced in particular to the application of such classifiers to study brain functions, diagnose mental diseases, detect deception and develop a brain-computer-interface. This paper surveys the recent development of machine learning approaches in cognitive state classification and brain activity prediction. Comparative studies of various techniques have been investigated to appreciate their merits and demerits. Furthermore, feature selection is discussed in this survey as an important preprocessing step in MVPA because it incorporates those features that will be integrated in the classification task of fMRI data, thereby improving the performance of the classifier. Features can be selected by restricting the analysis to specific anatomical regions or by computing univariate (voxel-wise) or multivariate statistics. Besides a summary and the future perspective of this field, an extensive list of bibliography is included for the community of interest. Keywords: Classification, feature selection, feature extraction, functional magnetic resonance imaging, brain activity prediction, machine learning. 1. INTRODUCTION Recent advances in human neuro-imaging have shown the possibility of accurately decoding a person’s consciousness based on non-invasive measurements of their activity [1]. Using neuroscientific methods to gain insight into the individual mind is a common objective in many institutions, including education, the military and courts of law [2]. The current neurodynamic functional brain imaging methods offer unique insight into the human brain processes by capturing the neuronal activity in real time [3]. The discovery of the fMRI is a major breakthrough in the research of computational neuroscience and enjoys unprecedented popularity in its use as a tool for cognitive neuroscience research [4-5]. It radically improves our understanding of the functional organization of the human brain [6]. The fMRI technique is based on the observation that activated brain regions reveal a change in the blood supply as well as blood oxygenation, i.e., any increases in the neural activity are indicated by localized increases in blood oxygenation [7]. *Address correspondence to this author at the Department of Information and Communication Technology, Fakir Mohan University, Vyasa Vihar, Balasore-756019, Odisha, India; Tel: +031-219-3574; E-mail: [email protected] It provides a sequence of 3D brain images with intensities representing the Blood Oxygenation Level Dependent (BOLD) brain activations. Brain images obtained with fMRI have been used in combination with machine learning to build decoders for mind states [8]. An illustration of a small portion of the fMRI data (collected over a 15-second interval) is shown in Fig. (1), where a subject reads a word and decides whether it is a ‘noun’ or ‘verb’ [9]. It is clear from Fig. (1) that brain state classification via fMRI involves an analysis of complex and multivariate data. Neuroscientists and physicians conduct fMRI experiments to determine precisely the part of the brain that is active for critical functions such as thinking, speech, sensation and attention, to assess the effects of stroke, trauma or degenerative diseases (such as Alzheimer’s diseases) on brain function, to monitor the growth and the activity of the brain tumor regions and to plan surgery, radiotherapy or other surgical treatments of the brain [10]. In this survey, we attempt to focus on the machine learning techniques that have made a significant contribution to the fMRI based classification. The motivation for machine learning in fMRI analysis is to identify the best possible classification performance, in the quest for algorithms that can separate any kind of dataset regardless of its complexity Satchidananda Dehuri

Upload: others

Post on 03-Jun-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

Send Orders for Reprints to [email protected]

Current Bioinformatics, 2015, 10, 000-000 1

1574-8936/15 $58.00+.00 © 2015 Bentham Science Publishers

Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction: A Survey Shantipriya Parida1, Satchidananda Dehuri*,2 and Sung-Bae Cho3

1Carrier Software & Core Network Department, Huawei Technologies India Pvt Ltd, Nallurahalli, Whitefield, Bangalore 560066, India 2Department of Information and Communication Technology, Fakir Mohan University, Vyasa Vihar, Balasore-756019, Odisha, India

3Soft Computing Laboratory, Department of Computer Science, Yonsei University,50 Yonsei-ro, Sudaemoon-gu, Seoul 120-749, South Korea

Abstract: The application of machine learning approaches to decode cognitive states through functional Magnetic Resonance Imaging (fMRI) is one of the emerging fields of research over the past decade. Multivoxel Pattern Analysis (MVPA) treats the activation of multiple voxels from the fMRI data as a pattern to decode the brain states using machine learning based classifiers. The potential in designing a classifier to accurately classify the discriminating cognitive states has attracted great attention from machine learning researchers. Interest has been evinced in particular to the application of such classifiers to study brain functions, diagnose mental diseases, detect deception and develop a brain-computer-interface. This paper surveys the recent development of machine learning approaches in cognitive state classification and brain activity prediction. Comparative studies of various techniques have been investigated to appreciate their merits and demerits. Furthermore, feature selection is discussed in this survey as an important preprocessing step in MVPA because it incorporates those features that will be integrated in the classification task of fMRI data, thereby improving the performance of the classifier. Features can be selected by restricting the analysis to specific anatomical regions or by computing univariate (voxel-wise) or multivariate statistics. Besides a summary and the future perspective of this field, an extensive list of bibliography is included for the community of interest.

Keywords: Classification, feature selection, feature extraction, functional magnetic resonance imaging, brain activity prediction, machine learning.

1. INTRODUCTION

Recent advances in human neuro-imaging have shown the possibility of accurately decoding a person’s consciousness based on non-invasive measurements of their activity [1]. Using neuroscientific methods to gain insight into the individual mind is a common objective in many institutions, including education, the military and courts of law [2]. The current neurodynamic functional brain imaging methods offer unique insight into the human brain processes by capturing the neuronal activity in real time [3]. The discovery of the fMRI is a major breakthrough in the research of computational neuroscience and enjoys unprecedented popularity in its use as a tool for cognitive neuroscience research [4-5]. It radically improves our understanding of the functional organization of the human brain [6]. The fMRI technique is based on the observation that activated brain regions reveal a change in the blood supply as well as blood oxygenation, i.e., any increases in the neural activity are indicated by localized increases in blood oxygenation [7].

*Address correspondence to this author at the Department of Information and Communication Technology, Fakir Mohan University, Vyasa Vihar, Balasore-756019, Odisha, India; Tel: +031-219-3574; E-mail: [email protected]

It provides a sequence of 3D brain images with intensities representing the Blood Oxygenation Level Dependent (BOLD) brain activations. Brain images obtained with fMRI have been used in combination with machine learning to build decoders for mind states [8]. An illustration of a small portion of the fMRI data (collected over a 15-second interval) is shown in Fig. (1), where a subject reads a word and decides whether it is a ‘noun’ or ‘verb’ [9]. It is clear from Fig. (1) that brain state classification via fMRI involves an analysis of complex and multivariate data. Neuroscientists and physicians conduct fMRI experiments to determine precisely the part of the brain that is active for critical functions such as thinking, speech, sensation and attention, to assess the effects of stroke, trauma or degenerative diseases (such as Alzheimer’s diseases) on brain function, to monitor the growth and the activity of the brain tumor regions and to plan surgery, radiotherapy or other surgical treatments of the brain [10]. In this survey, we attempt to focus on the machine learning techniques that have made a significant contribution to the fMRI based classification. The motivation for machine learning in fMRI analysis is to identify the best possible classification performance, in the quest for algorithms that can separate any kind of dataset regardless of its complexity

Satchidananda Dehuri

Page 2: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

2 Current Bioinformatics, 2015, Vol. 10, No. 4 Parida et al.

and nonlinearity. When this is the primary goal of the analysis, it is useful to apply the most powerful classifier algorithm available.

Fig. (1). Typical fMRI data for a selected set of voxels in the cortex, from a two dimensional image plane through the brain [9].

Fig. (2) shows the general framework for fMRI classification.

Fig. (2). Framework for fMRI classification.

In connection with the cognitive state classification techniques, we classified the various methods under three categories: i) probabilistic method, ii) machine learning method, and iii) ensembles and other methods. The rest of the paper is presented as follows. The preprocessing techniques are discussed in Section 2. Feature extraction and selection are dealt with in Sections 3 and 4, respectively. Learning and classification techniques are the major topics examined in Section 5. The pros and cons of cognitive state classification and their performance in classification are summarized in Section 6. Section 7 includes the conclusions and future perspectives.

2. PREPROCESSING

Noise and weakly activated voxels cause researchers a plethora of problems while analyzing the fMRI data [11]. The data obtained from fMRI neuro-imaging is especially rich and complex [12], and hence it is a challenge for machine learning and data mining researchers [13] to uncover hidden knowledge. Prior to analysis, the fMRI data typically undergoes a series of preprocessing steps to remove the artifacts and validate the model assumptions. Preprocessing or voxel selection is a common step in many pattern-based classification models of functional neuro-imaging data [14]. In Sections 3 and 4, we have explored some of the works geared towards feature extraction and selection. However, the interested reader may be recommended to refer [15-17] for further details regarding feature selection techniques for fMRI based cognitive state classification. The steps involved in fMRI preprocessing are slice timing correction, realignment, co-registration of structural and functional images, normalization and smoothing [18]. • Slice Timing Correction: It involves shifting the time

course of each voxel to enable the assumption that they were measured simultaneously.

• Motion Correction: The step for motion correction is to find the best possible alignment between the input image and any target image (e.g., the first image or the mean image).

• Co-registration and Normalization: Co-registration, the process of aligning structural and functional images, is typically done using either a rigid body (6 parameters) or an affine (12 parameters) transformation.

• Spatial Smoothing: The process of spatially smoothing an image is equivalent to applying a low-pass filter to the sampled k-space data before reconstruction.

Some research efforts towards optimal fMRI preprocessing includes the following: Tanbe et al. [19] demonstrated the advantages of the spline technique in analyzing the Region of Interest (ROI) analysis; the importance of spatial smoothing in the multi-voxel pattern was investigated by Kamitani et al. [20]; to analyze the preprocessing parameters, Della-Maggiore et al. [21] conducted a simulation based comparative study; Worsley et al. [22] proposed a technique for reducing bias, to improve the Signal to Noise Ratio (SNR); and Ngan et al. [23] proposed a time-varying filter based on Stationary Wavelet Transform (SWT). Alongside, a data-analysis framework termed Nonparametric Prediction, Activation, Influence and Reproducibility re-sampling (NPAIRS) for data acquisition, preprocessing, data analysis and extraction of Statistical Parametric Maps (SPMs) were introduced by Strother et al. [24]. Further, LaConte et al. [25] utilized the NPAIRS framework to obtain prediction accuracy estimation and by adding to it, Churchill et al. [26] proposed a flexible general framework for evaluating the preprocessing choices by combining the NPAIRS framework [24] and DISTATIS

Page 3: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction Current Bioinformatics, 2015, Vol. 10, No. 4 3

(a three-way version of metrics multidimensional scaling [27]).

3. FEATURE EXTRACTION

Feature extraction involves obtaining new features from the original features, where the extracted features represent patterns with minimal loss of the information required for the best possible classification. Therefore, feature extraction methods compute a weighted projection of multiple features onto new dimensions and select a pre-determined number of dimensions for classification rather than on the original attributes. Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are widely used for feature extraction and dimensionality reduction.

3.1. Principal Component Analysis

PCA is the standard method for dimensionality reduction that eliminates redundancies in the data and reduces the number of dimensions required to model the available data [28-29]. The base of PCA formulation is the representation of an image in terms of its components (eigenvectors). Each eigenvector has an associated eigen-value. The highest eigen-value is the first principal Component (PC) of the image. The smaller ones represent the less significant components. Dimensionality reduction, therefore, entails choosing the less significant components to exclude the feature vector [30]. However, the features transformed by the principal components are not directly connected with the physical nature of original features, thereby complicating the interpretability of the classification. Assume that a set of N items is given and each item is an n-dimensional column vector, i.e. x! ∈  ℝ! , 1 ≤ i ≤ N. Assume that the data is mean centered, i.e. x!" = 0,!

!!! for 1 ≤ j ≤ N. The covariance matrix can subsequently be computed as follows:

C = !!

𝑥!𝑥!!!!!! . (1)

The PCA then diagonalizes the covariance matrix to obtain the principal components which can be achieved by solving the following eigen-value problem [31]:

𝜆𝑣 = 𝐶𝑣. (2) As explained by Viviani et al. [32] the functional PCA is more effective compared with the ordinary PCA in recovering the signal of interest even under limited or no prior knowledge. Friston et al. [33] demonstrated that it is the nonlinear PCA which identifies a small number of underlying components or sources that best explain the observed variance-covariance structure of any data. The nonlinear PCA is used in the neural network architecture for estimates, sources, and modes. The PCA plays a significant role in improving the classification accuracy by dimensionality reduction. You et al. [34] showed that the Nonlinear Decision Function (NDF) classifier with PCA results in highly accurate classification results of the fMRI activation map. The experimental study reveals that the PCA drastically reduces the input dimensions for the classifier. It shows that the five leading eigen-vectors represent 90% of the original data based on the eigen-values.

Hansen et al. [35] explained the uses of generalization error to select the number of principal components.

3.2. Independent Component Analysis (ICA)

ICA is essentially a method for extracting individual signals from a mixture of signals. The ICA can reveal inter-subject and inter-event differences in the temporal dynamics [36]. It is considered as an effective tool for denoising the fMRI with respect to random noise and confounding signals such as pulsation and breathing artifact [37]. The ICA has found applications in two fields of research relevant to cognitive science: analysis of biomedical data and computational modeling [38]. ICA algorithm decomposes the input matrix X into two new matrices:

X = A × S, (3) where X = (X!… X!)! is the 𝑚  ×1 continuous-valued random vector of the m observable signals, A = (a!")  is the unknown constant (i.e. non random) and invertible square mixing matrix of size m × m and S = (S!,… , S!)! is the 𝑚  ×1  continuous-valued random vector of the m unknown source signals to be recovered, referred as sources. Beckmann et al. [39] proposed the Probabilistic Independent Component Analysis (PICA) for fMRI data which allows for the non-square mixing process and assumes the data are contained by additive Gaussian noise. The PICA model is formulated as follows.

𝑥! =  𝐴𝑠! +  𝜇   +  𝑛!      ∀  𝑖  , (4) Here, x! denotes the p-dimensional column vector of the individual measurements at the voxel location i, 𝑠! denotes the q-dimensional column vector of the non-Gaussian source signals contained in the data and 𝑛! denotes Gaussian noise 𝑛!  ~    𝑁 0,𝜎!∑! . The vector µ defines the means of the observations 𝑥! where the index i is over the set of all voxel locations and the p × q matrix A is assumed to be non-degenerate, i.e., of rank q. Solving the blind separation problem requires finding a linear transformation matrix W such that

𝑠 =  𝑊!, (5) is a good approximation to the true source signal s. Smolders et al. [40] conducted an experiment based comparative study of ICA using the Fuzzy Clustering Method (FCM) in the context of time-resolved fMRI data, where ICA worked optimally on the original time series, averaging with respect to the task on set. Formisano et al. [41] proposed the cortex based spatial Independent Component Analysis (sICA) method for application to the fMRI data by selecting the potentially meaningful fMRI independent components. Liao et al. [42] extended the sICA for multitask fMRI data. Based on experimental results, it is observed that the sICA can separate two independent component patterns corresponding to each task in a multi-task experiment. Calhoun et al. [43] have extended the ICA which successfully analyzes the fMRI data for a single subject and proposed a new approach to the group of the subject.

Page 4: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

4 Current Bioinformatics, 2015, Vol. 10, No. 4 Parida et al.

McKeown et al. [44] developed a new method of analyzing the fMRI data based on ICA. They decomposed eight fMRI data sets from the various activities of four normal subjects into spatially independent components. From the results of their experiment, they demonstrated that the ICA can extract both transiently and consistently task-related, non-task-related, and artifactual components without a prior knowledge of their temporal or spatial structure. Typically, PCA and ICA are used to decompose the fMRI signal and separate the BOLD signal change from the structured and random noise. Thomas et al. [45] proposed a different approach to identify the PCA or ICA components contributing primarily to the noise using an unsupervised algorithm. The noise components are then removed before subsequent reconstruction of the time series data. Some interesting and significant behavior is observed between the PCA and ICA. The ICA seems to be a more effective decomposition method for isolation and identification of structured noise, whereas the PCA appears to be a more effective decomposition method for the isolation and identification of non-structured random noise. Bai et al. [46] proposed a new retrieval algorithm using ICA for the fMRI images. For selecting the task related components, they introduced the low frequency heuristic as well as used Maximum Weight Bipartite Matching (MWB) to match the image similarity. Based on the experimental results, the ICA was observed to provide a good feature for fMRI image retrieval and task-related component selection using low frequency as well as possessed the potential of finding image similarity in the context of cognitive tasks. A hierarchical model for patterns in multi-subject fMRI was proposed by Varoquaux et al. [47]. They introduced canonical ICA (CanICA), canonical correlation analysis for identification of data subspace common to the group, ICA based pattern extraction and procedure based on cross validation to quantify the ICA stability. Based on the experimental result, the proposed feature extraction was found to be more reproducible at the group level. To address the different important issues that are present in the ICA, Beckmann and Smith [39] extended the general ICA and presented an integrated approach to probabilistic ICA for the fMRI data which allows for non-square mixing in the presence of Gaussian noise. Their model improves the robustness and interpretability of the IC maps as currently generated on the fMRI data. They applied it to real fMRI data and showed that the model was capable of producing a relevant pattern of activation, which would neither be generated by the standard GLM nor the standard ICA framework. Calhoun et al. [48] proposed an analysis model for fMRI data using ICA. The model includes spatial smoothing, ICA algorithm and data reduction for fMRI data analysis. In the experimental study, they compared three Gaussian Kernels as well as two ICA methods like infomax and fastICA. The k-means algorithm was used to perform clustering. The experimental result shows that infomax algorithm performance is marginally better than the fastICA and the clustering performance is better than the PCA at lower noise levels; however, the combination of infomax with PCA is best for data reduction approach.

Hansen [49], following an analytical study discussed the aspect of multivariate fMRI modeling for dimensionality reduction. They focused on models capable of extracting the relevant generalizable training set to select k. The leave-one-out error can be computed by classifying the training data using the neighbors excluding the actual training data point. The optimal k can then be used in classifying the test set. Random Subspace (RS) ensembles are considered for problems involving large dimensionality and excessive feature-to-instance ratios. It is observed that the RS ensemble together with SVM based classifiers is the most accurate classification approach across a variety of voxel pre-selection methods [50]. Michel et al. [51] proposed the Total Variation (TV) regularization method applying it to the fMRI data, proving that it is well suited for the purpose of brain mapping and its use as a powerful tool for brain decoding. They integrate TV with a logical regression framework for classification, which is the first of its kind. The TV approach yields a higher accuracy than the reference voxel based methods.

4. FEATURE SELECTION

The purpose of feature selection is to determine a subset within a set of features in order to minimize the classification error based on various criteria. For instance, if voxels are being used as features, we may desire only those voxels in a particular Region of Interest (ROI) or their values at a particular point in a trial. It finds its greatest use in the research of application for which datasets with tens or hundreds of thousands of variables are available [52]. The feature selection is vital in cognitive classification as it is computationally infeasible to use all the available features [53]. Further, its importance is also recognized in a technique like MVPA, where voxels with especially high noise levels (and low signal levels) can sharply reduce classifier performance [54]. Feature selection is effective in reducing dimensionality, deleting irrelevant data, increasing learning accuracy and improving result comprehensibility. Pereira et al. [55] overviewed the following method for selecting informative features. • Activity: This method selects voxels that are active in

at least one condition relative to a control task baseline, scoring a voxel is measured by a t-test on the difference in the mean activity levels between the condition and baseline.

• Accuracy: This method scores a voxel on how accurately a Gaussian Bayesian classifier can predict the condition of each example in the training set. It performs a cross-validation within the training set for the voxel and the resulting accuracy indicates the voxel score.

• Searchlight Accuracy: It is similar to accuracy; however, instead of using the data from a single voxel, it uses data from the vessel and its immediately adjacent neighbors in three dimensions.

• Analysis of Variance (ANOVA): This method identifies voxels, where there are reliable differences in the mean values across conditions (e.g., A is

Page 5: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction Current Bioinformatics, 2015, Vol. 10, No. 4 5

different from BCD or AB from CD, etc) as measured by the ANOVA.

• Stability: This method selects voxels that react to the various conditions consistently across the cross-validation groups in the training set.

Niiniskorpi et al. [56] employed the Particle Swarm Optimization (PSO) [57] for feature selection and based on the experimental result, the PSO has been proven successful in increasing the classification performance (refer [58] for more information). For multi-subject data, the PSO algorithm in combination with multiple linear regressions (MLR) has also been observed to be successful in identifying highly relevant voxels. One of the main obstacles encountered in the classification of fMRI brain activity is the selection of informative features (voxels), as addressed by Michel et al. [59]. They proposed a multivariate approach based on Mutual Information (MI) criteria estimated by the nearest neighbor. From their experiment, it is observed that the MI-based feature selection can handle large dimensions and achieve better performance together with a sparse feature selection. It, therefore, appears that MI is an interesting way to locate areas of activity in the brain by selecting very few, but strongly informative voxels (from 1500 to 8 voxels). Honorio et al. [60] proposed a feature selection method termed ‘threshold-split region’ which does not require a predefined set of a region of interest. They used the majority of vote as their classification technique. This new method selects the biggest region (on the training set) that activates for one class and deactivates for the other class. The method proposed revealed approximately 90% generalization accuracy in two completely different datasets, different subjects, different tasks and different acquisition protocols. Further, to resolve the challenging problem of feature selection in the case of high dimensional data, Wu et al. [61] proposed an efficient feature selection method for classification of high dimensional numerical data. They proposed the Dynamic Feature Extraction method (DFE) which uses ‘balanced information gain’ to measure the contribution of each feature for classification and calculates the correlation among the features. From their experimental results, it is observed that the feature selection method performs better than even the state-of-the art approaches. Venkataraman et al. [62] proposed an alternative to the univariate feature selection method which searches across the subsets of the data to isolate a set of robust, predictive functional connections. They compared the Gini Importance (GI) with the univariate statistical test that evaluates functional connectivity changes for schizophrenia and GI was observed to be more stable and accurate when compared with the univariate features which have little classification power. Genetic algorithm based feature selection has shown promising results when applied to one-class neural network for classifying cognitive state brain activities [63].

5. LEARNING AND CLASSIFICATION TECHNIQUES

Multivariate Pattern Analysis (MPA) evolved as an emerging technology for the analysis of fMRI data and has found applications in a variety of fMRI research [64]. The

MPA includes a series of analytical steps including those of data splitting, preprocessing, activity estimation, voxel selection, training and testing the classifiers. Weil et al. [65] reviewed the advantages, application and limitations of MPA. MPA overcomes the limitation of the conventional fMRI analysis for perception study using a spatial resolution of 1.5-3 mm3 voxels. The limitation of MPA is that it can classify brain activity patterns associated with only a limited number of categories. The MPA shows potential in clinical applications in the assessment of patients with expressive aphasia and chronic pain management. One of the popular techniques used in recent development is machine learning classifiers for classifying cognitive states by analyzing the fMRI data. An overview of the pattern classification methods, typically referred to as MPA can be obtained in [66] and a detailed review of hybrid machine learning approaches to cognitive classification can also be obtained in [5, 67]. Classification techniques can identify many types of activation patterns, and patterns unique to each subject or shared across subjects [68]. The trained classifier represents a function of the form [69, 70]:

𝐟: 𝐟𝐌𝐑𝐈 𝐭, 𝐭 + 𝐧 → 𝐘, (6) where 𝐟𝐌𝐑𝐈(𝐭, 𝐭 + 𝐧) is the observed fMRI data during the interval from time t to t+n and Y is a finite set of cognitive states to be discriminated. Similarly, 𝐟(𝐟𝐌𝐑𝐈 𝐭, 𝐭 + 𝐧 ) is the classifier prediction regarding which cognitive state actually gave rise to the fMRI data observed 𝐟𝐌𝐑𝐈(𝐭, 𝐭 + 𝐧).

Fig. (3). A classifier is learned from the training set and their labels and used to predict labels for a test set, whose labels it cannot see [69].

Any set of examples can be divided into training and test sets, as some form of cross-validation generally used in practice. This consists of repeatedly making that division (e.g., if half of the examples are training and the other half involve testing, then this is reversed), training and testing a classifier for each division and averaging the accuracy scores across the divisions [69] (c.f., Fig. 3). Partitioning the data into training and testing sets requires particular attention when working with fMRI data, due to the numerous unavoidable temporal dependencies and typical organization into runs and events [71].

5.1. fMRI Analysis Based on the Probabilistic Method

In this section various techniques and models are discussed as well as their application in fMRI data analysis based on the probabilistic theory. In particular, the Bayesian theory and its variants are focused upon in this section, for the fMRI data classification. Frank et al. [72] explained the application of the probability theory to analyze fMRI data. The probability

Page 6: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

6 Current Bioinformatics, 2015, Vol. 10, No. 4 Parida et al.

theory involves the posterior distribution of parameters which includes available information from the data, the hypothesis and prior information. The posterior distribution can answer questions relating to data. The theory has the capacity to make predictions in the face of an unknown parameter such as noise. Penny et al. [73] used the variational Bayesian (VB) framework and explained the inference procedure for the fMRI time series based on general linear models with autoregressive (AR) error processes. The VB approach is an extension of the Bayesian analysis, which has the advantage of considering the variability of hyper-parameter estimation with additional computational effort. Kershaw et al. [74] revealed the application of Bayesian statistics in fMRI data analysis. They examined three specific models including the linear model with white Gaussian noise, linear Time-Invariant (LTI) and linear model with auto-regressive noise. The methods of Bayesian statistics have been shown to be applicable to derive a relevant statistical test for activation in an fMRI time series irrespective of the complexity of the model parameter. Friston et al. [75] introduced the Multivariate Bayesian scheme (MVB) to decode the various brain states from neuroimages. The MVB enables inference regarding different models of structure, function mapping such as distributed and sparse representations and permits optimizing the model and produces prediction, with better performance than SVM. Joshua et al. [76] introduced a Bayesian framework based theory for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations. In their Bayesian model of induction, they focused on three vital questions: What is the content of the probabilistic theories? How are they used to support rapid learning? and, How can they themselves be learned? Palatucci et al. [77] discussed the problem of classification when the number of features greatly exceeded the number of training examples. They proposed the Bayesian framework which shares information among the features by modeling the similarities among the parameters and demonstrated the case for classifying 80,000 original features and only two successful training examples per class. Ahn et al. [78] demonstrated the manner in which the hierarchical Bayesian analysis (HBA) is useful for the model based fMRI and compared this with the other non-hierarchical and non-Bayesian methods. It is observed that when the participant data contain adequate information, then the non-hierarchical analysis is sufficient for the model-based fMRI, whereas for the real world scenario involving limited trials, the HBA performs better. Hutchinson et al. [79] presented Hidden Process Models (HPMs) to model the fMRI data. HPM assumes that the observed data generated by a sequence of mental processes is triggered by stimuli and allows for processes whose timing may be unknown and which might not directly connected to specific stimuli. They submitted the HPM framework and its learning and inference algorithm with the experimental result which they demonstrated on both simulated and real fMRI data. Janoos et al. [80] proposed the Hidden Markov Model (HMM) based approach to determine the intrinsic spatio-temporal patterns for studying neurological and psychiatric disorders. The mental processes are represented by an HMM

which captures the concept of the functional brain transitioning through a cognitive state-space over time as it performs a task. They constructed a spatio-temporal representation of the global and instantaneous cognitive state of a subject in an unsupervised fashion using the fMRI data. They applied the method to group-wise study for development of dyscalculia and demonstrated task related differences between healthy controls and dyscalculics. Kriegeskorte et al. [81] addressed the question ‘Where in the brain does the activity pattern contain information about the experimental condition?’ They scanned the image volume with a ‘searchlight’ where the contents were analyzed multivariately at each location in the brain. Their experiment shows that the focally distributed effects which occur in the fMRI data are better detected by information-based mapping. Information based mapping is suitable, when one is interested in identifying ‘Where in the brain do the regional spatial activity patterns differ across experimental conditions?’. To address the concern regarding non-independence, Esterman et al. [82] submitted a simple and practical solution to reduce bias in the secondary test by incorporating the Leave-One-Subject-Out (LOSO) approach. Their LOSO procedure reduces the inflation of the estimated effect sizes using non-independent whole-brain group analysis relative to an independent test by defining the ROIs with an independent data set. Conroy et al. [83] proposed a novel multi-subject algorithm which produced functional correspondence across a set of subjects using functional connectivity. The experiment revealed that in order to align the spatial patterns of functional responses in the fMRI data, the proposed algorithm is efficient with respect to both time and space complexity. McIntosh et al. [84] proposed a new tool Partial Least Squares (PLS) for neuroimage analysis, which extracts spatial patterns of brain activity that are optimally associated with aspects of the experiment design or some behavioral measure. One of the major strengths of this tool is its ability to extract certain features that are inaccessible by other methods, although it overlooks some complexities for which other methods may be more suited. Mitchel et al. [85] revealed how the trained classifier is able to categorize successfully the discriminate cognitive states such as ‘observing picture’ versus ‘reading a sentence’ and ‘reading a word about people’ versus ‘reading a word about buildings’. For picture-sentence study, for each subject s, they trained a classifier of the form:

f!: fMRI! t, t + 8  →   P, S , (7) where fMRI! t, t + 8 is a sequence of 16 fMRI images observed by the subject s throughout the 8-second time interval t, t + 8 and the target value for f!: fMRI! t, t + 8 is P if the subject is viewing a picture or S if the subject is viewing a sentence during this interval. They employed the Gaussian Naïve Bayes (GNB) classifier, which uses the statistical approach for classifying the cognitive state.

Page 7: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction Current Bioinformatics, 2015, Vol. 10, No. 4 7

Mitchel et al. [86] extended their research to multiple human subjects and demonstrated how such a classifier may be applied across multiple human subjects, including subjects who were not exposed during training. The classifiers GNB, SVM and k-NN are used in the classification task and for feature selection, Average, ActiveAvg (n) and Active (n) are used in the experiment. As multiple subjects are involved in representing fMRI data across multiple subject regions of interest (ROI), mapping and Talairach coordinates are used. The learning task includes training a classifier to determine whether the subject is viewing a picture or a sentence in a particular 16-image interval of the fMRI data.

f:   I!,… , I!"  →   Picture, Sentence , (8) where I! is the captured image at the time of stimulus (picture or sentence). For the syntactic study, the classifiers trained as below form:

f:   I!,… , I!"  →   Ambiguous,Unambiguous . (9) The above mentioned two data (picture vs sentence, syntactic study) were used to train the classifier for both single and multi subjects and tested to obtain classification accuracy. The experimental result shows the success of distinguishing between a predefined set of states, while the subject performs specific tasks. Roussos et al. [87] extended the ICA and proposed a hybrid wavelet-ICA (W-ICA) model for transforming the signals into a domain and applying the Bayesian inference. The Directed Acyclic Graph (DAG), also termed the Bayesian network structure represents the probabilistic dependence relationships. The random variables are represented as nodes and ‘causal’ relationships between the nodes as edges. They applied this W-ICA model to the fMRI data. This new approach is drawing great attention, but further analysis is required to test its competitiveness against other hybrid techniques and its use on neuro-scientific data. Kim et al. [88] proposed a hybrid ICA-Bayesian approach to identify the differences between the brain regions of normal and abnormal subjects. The fMRI noise gets reduced by the ICA filtering and the discrete Dynamic Bayesian Network (dDBN) is applied to the denoised components. The correlation between the brain regions in one group versus another is measured by the Approximate Conditional Likelihood Score (ACL) and it is observed that the ICA filtered data significantly improves the magnitude of the ACL score. The ICA for the filtering noise components from the fMRI dataset is highly recommended, as it plays a more significant role in determining the higher-level correlations and is more sensitive to noisy datasets.

5.2. fMRI Analysis Based on Pattern Classification

Pattern classification in the fMRI is a technique that automatically identifies the differences in the distributed neural substrates resulting from cognitive tasks [89]. This section discusses the usage of machine learning methods for cognitive state classification through the fMRI data.

Yong et al. [90] attempted to solve the challenging problem of detecting the cognitive states from the fMRI images of multiple subjects using a nonlinear multivariate classification method. They proposed a comprehensive framework as shown in Fig. (4), applied to lie-detection. Pereira and Gordon [91] proposed the Support Vector Decomposition Machine (SVDM) which combines the

Fig. (4). Yong et al.’s [90] framework for fMRI image classification.

dimensionality reduction and classification into a single objective function and presented an efficient minimization algorithm for optimizing this objective. The proposed technique dataset (matrix of n examples (rows) with m features (columns) is denoted as follows.

𝐗𝐧×𝐦 =      𝐱𝟏 𝟏 𝐱𝟏 𝟐 … 𝐱𝟏 𝐦𝐱𝟐 𝟐 𝐱𝟐 𝟐 … 𝐱𝟐 𝐦𝐱𝐧 𝟏 𝐱𝐧 𝟐 … 𝐱𝐧 𝐦

    = 𝐱𝟏!

𝐱𝟐!

𝐱𝐧! (10)

The SVD approximates 𝐗 as the product of two lower rank matrices,

𝐗𝐧×𝐦 ≈  𝐙𝐧×𝐥𝐖𝐥×𝐦 (11) where 𝐥 is a rank of approximation, 𝐖 is the basis matrix whose 𝐥 rows denote the direction of variability of the training example, 𝐙 is a matrix of the coordinates which gives the coefficients necessary to reconstruct the training example. The SVD minimizes the sum of the squared approximation errors. Z and W can be computed by solving the optimization problem:

𝐦𝐢𝐧𝐙,𝐖 𝐗 − 𝐙𝐖 𝐅𝐫𝐨𝟐 (12)

where 𝐀 𝐅𝐫𝐨 indicates the Frobenius norm of the matrix  𝐀, 𝐀𝐢𝐣𝟐𝐢𝐣 .

To solve the classification problems using SVD, it is required to fine tune the parameters 𝛉𝐥×𝐤 such that matrix 𝐬𝐠𝐧 𝐙𝛉 is a good approximation to 𝐘. The proposed SVDM approach can achieve better accuracy by learning a low dimensional representation and a classifier simultaneously, rather than by learning the two separately. Peltier et al. [92] applied SVM for classifying the complex fMRI data, both in the image domain and in the

Page 8: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

8 Current Bioinformatics, 2015, Vol. 10, No. 4 Parida et al.

acquired k-space data. The SVM algorithm finds the linear decision boundary (separating hyperplane) using the decision function.

𝐃 𝐮𝐫 =   𝐰.𝐮𝐫 +  𝐰𝟎 (13) where 𝐰 denotes the linear decision boundary and selected to maximize the boundaries defined by 𝐃 =  +𝟏 and 𝐃 =  −𝟏 between two class distributions. In the case of fMRI, stimulus A and stimulus B are assigned unique classes and the input vector of the classifier consists of spatiotemporal image data. Each image is represented as an input vector u and in the experimental condition (behavioral state) associated with each u defines the class label, v. For complex data, the proposed approach uses the complex dot product

𝐰.𝐮 =   𝐮𝐧∗𝐰𝐧𝐍𝐧!𝟏 (14)

where N denotes the number of voxels in the complex image 𝐮 and 𝐮∗ represents the complex conjugate. The complex data is classified by using the complex Kernel. Based on the experimental results it is shown that the proposed approach obtains a higher classification accuracy for both k-space and image data. Balci et al. [93] showed memory encoding of fMRI data using weighted SVM. They used the General Linear Model (GLM) for feature extraction and t-test for feature selection. The GLM model for feature extraction is represented in equation (15):

𝐲 𝐯 = 𝐗  𝛃   𝐯 +  𝐞 𝐯 (15) 𝐲 𝐯 : fMRI signal of 𝐍 time points measured as spatial location 𝐯 𝐗   ∶matrix of regressor 𝛃 𝐯 : coefficients of regressors in the columns of 𝐗 𝐌 : total number of stimulus onsets 𝐞 𝐯 : Gaussian noise The classification accuracy showed over 90% in the major subject and it was observed that high accuracy resulted in a simple motor task, whereas the accuracy dropped for memory encoding task, which implies that classification accuracy is significantly affected by the complexity of the neuroimaging experiment. To address the two key issues of the current classifiers such as high data dimensionality and low sample size, Brodersen et al. [94] have proposed a generative-embedding approach which incorporates neurobiologically generative models in discriminative classifiers. In their approach they have introduced the generative-embedding for fMRI using a combination of dynamic causal models (DCMs) and SVMs. For DCM-based generative-embedding, which is used for subjective classification, they have proposed a general procedure which explains the guidelines for unbiased generative-embedding application with respect to fMRI. Cox et al. [95] used linear discriminant analysis and support vector machine to classify the patterns of fMRI activation evoked by the visual presentation of various categories of objects. It is observed that the support vector

machine performed much better than the linear discriminant classifier, particularly when large numbers of dimensions (voxels) were used. To evaluate the performance between SVM and Neural Network (NN), Hardoon et al. [96] performed a comparative analysis among the NN, one-class SVM, and two-class SVM methods. Based on the experiment, it is observed that the NN and one-class SVM are quite successful at learning the positive class, but not as successful as ruling out the negative one. Two class classification based on the SVM on visual data showed accuracy close to 90%. Formisano et al. [97] used the multivariate statistical method for the fMRI time series classification. They employed SVM and the Relevance Vector Machine (RVM) for the classification and regression of the fMRI patterns. The RVM which is based on the linear combination of Kernel function showed potential in the analysis of a complex dataset. In their experiment Onut et al. [98] indicated that when the number of selected features (voxels) increases, the accuracy of the system increases too, although after the saturation point the increase in the accuracy is very slow. They designed the classifier based on the neural network to distinguish between the cognitive states such as viewing pictures of houses, chairs, and faces. Hardoon et al. [99] proposed a new unsupervised machine learning approach to fMRI analysis in which a simple categorical description of the stimulus type (e.g., type of task) was replaced by a more informative vector of stimulus features. They used the Kernel Canonical Correlation Analysis (KCCA) to learn the correlation between the fMRI volume and the corresponding stimulus features presented at a particular time point. In their experiment to classify a new stimulus they computed the Kernel between the test and the training samples and projected that Kernel onto the learned semantic space. The output gives a score, which can be thresholded to allocate a category to each test example. Kernel based techniques have shown promising results when applied to fMRI analysis. In their experiment Yizhao et al. [100] selected the Kernel method for the following two useful properties. The Kernel trick reduces the computational complexity of high dimensional data as the parameter evaluation domain is reduced from the explicit feature space into the Kernel space. With an appropriate Kernel function one can map the input feature space into the higher dimensions. This allows the non-linear approaches in the original feature space to be achieved by linear approaches in the higher dimensional space. In their experiment, they used two Kernel methods, Kernel Ridge Regression (KRR) and Relevance Vector Regression (RVR) mentioned as follows for predicting the continuous brain states. The methods described above are more computationally efficient as the number of parameters to be learned (OM)) is much lesser than that for the approaches with explicit feature expression (O(P)).

Page 9: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction Current Bioinformatics, 2015, Vol. 10, No. 4 9

• Kernel Ridge Regression: for each task, ridge regression learns a linear operator w to minimize the squared difference between the predictions.

• Relevance Vector Regression: Relevance vector regression is formulated in a Bayesian framework while the general expression takes on an SVM-like form.

Observation revealed that if the fMRI pattern for a particular brain activity is consistent, then in most of the cases, the same cognitive and sensory state will act in the same fMRI pattern and sparse representation would incur the loss of information for further prediction. The fMRI pattern elicited by certain sensory experiences, such as hearing, is stable and easy to capture in contrast to the cognitive states such as emotions which are very unpredictable. Poldrack et al. [101] tried to solve one of the fundamental questions of cognitive classification, i.e., whether it is possible to predict the mental state of an individual using a statistical classifier, once it has been tried on other individuals. In their experiment a single statistical parametric (Z) map was obtained and the Z-statistic data was submitted for classification using a multi-class linear Support Vector Machine (SVM). The accuracy in predicting a task on a subject was estimated by the leave-one-out cross-validation. The classifier was trained on all subjects except for one, and then tested on that left-out individual, which was repeated for each individual. Based on their experiments, it was observed that the fMRI data contained sufficient information to accurately determine an individual’s mental state using the classifiers trained on data from other individuals. Jakel et al. [102] demonstrated the potential of the Kernel methods for analyzing behavioral data, particularly for identifying features in the categorization experiments. Hence, from the discussion above, it was concluded that the Kernel methods with regularization techniques are a particularly well understood way of assuring a good generalization performance in fMRI analysis. Schmah et al. [103] addressed one of the key issues concerned with neuroimaging data. They demonstrated that better discrimination can be achieved by fitting a generative model to each separate condition and then identifying which model is most likely to have generated the data. A set of fMRI volumes can be modeled using a two-layer network called the Restricted Boltzmann Machine (RBM), in which the stochastic visible units are connected to the stochastic hidden units using symmetrically weighted connections. The visible units of the RBM correspond to voxels, while the hidden units can be as assumed to be feature detectors. They trained two independent RBMs, one for each data class. For each subject, the data was split into three subsets: 75% training, 12.5% validation and 12.5% test. As per the experimental results, it was evident that in all cases the generatively-trained RBMs outperformed all the other methods. To analyze the real time data for classification, Sitaram et al. [104] demonstrated an online Support Vector Machine (SVM) which can recognize the discrete emotional states (happiness and disgust) from the fMRI signals. Their experiment included the following subsystems.

a) Image acquisition subsystem, and b) fMRI-BCI for image preprocessing, brain state

classification and visual feedback. Further, the experiment comprised three parts: experiment 1 for investigating real time classification; experiment 2 for feasibility testing of multi-class prediction; and experiment 3 for assessing the effect of the extended feedback training with a real time classifier. Another real time fMRI analysis method was proposed by Anderson et al. [105]. They demonstrated the real-time functional MRI (rt-fMRI) method to predict and detect online changes in the cognitive states. The method was applied for the prediction of video activity in nicotine-addicted subjects using both regional spatial averages and pre-constructed independent component spatial maps referred to as the ‘IC dictionary’. They explained the advantages and shortcomings of machine learning methods while applying them to predict and interpret cognitive states in the real-time context. One-class neural networks used to classify the cognitive states was proposed by Boehm et al. [63]. They used the genetic algorithm for feature selection. In the experiment, four subjects inside an MRI scanner were passively watching images belonging to five different semantic categories, as follows: human face, houses, patterns, objects and blank images. Based on the experimental results, the genetic algorithm together with the one-class neural network (i.e., a compression network) was shown capable of being used to identify appropriate features, that on the one hand, increases the accuracy of the classification to close to that obtained from the two-class method. Many researchers have applied hybrid techniques for fMRI data analysis. Wang [106] combined the advantages of both the General Linear Model (GLM) and SVM to develop a hybrid technique for fMRI data analysis. The technique proposed used the power of SVM for the data derive reference function and GLM for statistical inference. His experimental study confirmed that the combined approach made better sense than the regular GLM for detecting the sensorimotor task. Yang et al. [107] proposed the hybrid machine learning approach to classify schizophrenia and healthy control subjects. The hybrid approach includes the following three models and ultimately all three were combined into a single module using the majority of voting approach for the final decision. • Filter to remove irrelevant features and select the

candidate’s Single Nucleotide Polymorphism (SNP) subset from the SNP pool using the Forward Sequential Feature Selection (FSFS) as well as to combine the SNP selection with the AdaBoost SVM ensemble for constructing the Support Vector Machine Ensemble (SVME) was done.

• Another Voxel-SVME was built using the voxels in the fMRI map which contributed to the classification.

• Using the components obtained by the fMRI activation by Independent Component Analysis (ICA) a single SVM classifier (ICA-SVMC) was constructed.

Page 10: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

10 Current Bioinformatics, 2015, Vol. 10, No. 4 Parida et al.

Their experimental results show that the hybrid approach classification accuracy is promising and able to extract the discriminating information to classify schizophrenia efficiently. Abraham et al. [108] proposed hybrid schemes which involved the particle swarm optimization algorithm based rough set reduction for SVM and applied to the cognitive state classification. The rough set which can determine the most important attributes in the classification angle has been used for feature selection. The particle swarm optimization algorithm considered the Swarm Intelligence (SI) model can find optimal regions of complex search spaces through the interaction of individuals in the case of population studies and its strength is its fast convergence. Based on the fMRI experiment they have shown that the proposed scheme is feasible for the classification of single or multiple subjects. Brodersen et al. [109] proposed a hybrid generative-embedding approach to address the two major challenges encountered in the fMRI classification, i.e., first challenge was due to the high data dimensionality and low sample size, whereas the second challenge was that popular discriminative method like SVM rarely afforded mechanistic interpretability. The approach proposed combines the Dynamic Causal Models (DCMs) and SVMs. In its multi-steps, the first step (model inversion) uses the subject specific data to estimate the parameters of a generative model. In the second step (Kernel construction), a Kernel function is defined where the Kernel represents a generative score space and provides statistical representation of each subject. In the third step, a classifier is used to find a separating hyperplane between a group of subjects and in the final or fourth step it enables a mechanistic interpretation of feature weights. It is shown that the generative-embedding approach achieves 98% accurate classification and performs better when compared with the conventional activation-based and correlation-based methods. Xu et al. [110] submitted a real-time Conjugate Gradient (rt-CG) algorithm for fMRI classification to overcome the challenges faced by the real-time fMRI. They compared the performance of three algorithms: 1) rt-CG; 2) real-time Partial Least Square (rt-PLS); and 3) real-time Bridge PLS (rt-BPLS). From the experimental results it is shown that the rt-CG can process high dimensional fMRI data using about 0.5 sec and reach a prediction accuracy of around 90%. To overcome the problem of dimensionality and strong noise seen in the fMRI data, Ng et al. [111] proposed a spatiotemporal classification using the Generalized Sparse Classifier (GSC). The brain activity is intrinsically a spatiotemporal prior process and tends to be sparsely distributed in localized clusters. These properties of brain activity can be jointly captured by simultaneously enforcing sparsity and spatiotemporal smoothness using the GSC. To exploit the GSC, they built a spatiotemporally-regularized sparse LDA classifier, which improves the prediction accuracy over the widely used LDA and linear SVM classifier as well as a modest number of sparse LDA variants. Yarkoni et al. [112] proposed an automated brain-mapping framework which can generate a large database using text-mining, meta-analysis and machine learning

techniques for mapping between the neural and cognitive states. The framework proposed can be used to automatically conduct large-scale, high-quality neuroimaging meta-analysis and can support accurate decoding of broad cognitive states in both entire studies and individual human subjects. Chu et al. [113] demonstrated the Kernel regression approach for brain activity prediction using fMRI imaging. They trained different regression machines for different experimental conditions to compute a temporal profile for each condition. A decision function was used to classify by comparing the predicted temporal profile elicited by each event. Based on the experimental results, it was noted that this approach achieved 100% classification accuracy for six of the subjects with an average 94% accuracy across all 16 subjects, exceeding the SVM classification accuracy (83%).

5.3. fMRI Analysis Based on Ensembles and Other Approaches

In this section, the approaches based on an ensemble of classifiers, wavelet theory, cellular automata theory and others that cannot be included in Subsections 5.1 and 5.2, respectively, are discussed. Ludmila et al. [114] employed the pattern recognition methods such as classifier ensembles for the fMRI classification. In their experiment, they used seven standard voxel selection methods and compared 18 classifier models (7 single classifier and 11 ensembles). The individual classifier used in the experiment is as follows: • Linear discriminant classifier (LDC) • Logistic Classifier • Support Vector Machine (SVM) • Decision Tree Classifier (DT) • Naive Bayes (NB) • Nearest Neighbor (1-NN) • Multi-Layer Perceptron (MLP) Classifiers that are not very accurate individually and which tend to make mistakes on different objects they may form a very accurate ensemble. It is found that ensembles increase the classification margin, and therefore reduce the chances of a classification mistake. The different classifier ensembles of SVM and decision trees used in this experiment are as follows. • Bagging • AdaBoost • Random subspace • Random forest • Rotation forest • Random oracle The overall ranking placed the Random Subspace ensemble with SVM classifiers at the top of the list. The Table shows that among the single classifiers, SVM is favored for fMRI data analysis. The computational complexity of the ensemble classifiers is obviously higher

Page 11: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction Current Bioinformatics, 2015, Vol. 10, No. 4 11

than that of a single SVM classifier but not too much. The training complexity is not an issue here because the classifier seen here can be run on-line in an FMRI experiment. Recently, Parida et al. [115] proposed a genetic algorithm based feature selection for cognitive state classification using an ensemble of decision trees. They reported that an ensemble based classifier is better than an individual classifier with a weak learner. Latif et al. [116] proposed a complete new Cellular Automata (CA) classification for the fMRI brain image classification. In their experiment, they compared it with the SVM which is one of the best techniques used for fMRI brain image classification. They claimed that the Cellular Automata classifier is better than the SVM in terms of time complexity. The CA approach depends on sorting O(n) and searching lg(n), where n is the total number of voxels of all the images, the total time complexity is O(n), which outperforms the SVM (with the linear Kernel) algorithm time complexity of O(n2). Plumpton et al. [117] performed a comparative study based on an experiment between the random subspace ensemble and linear classifiers (perceptron, balanced winnow and On-line Linear Discriminant Classifier (O-LDC)). The linear classifiers trained off-line (batch) on training data. After training they presented the online data point, one at a time. The current classifier was tested on the data point; if the data point was misclassified then the classifier was updated. This experiment was repeated using classifier ensembles rather than individual classifiers. Based on the experiment results, the random subspace ensemble was found to perform better than the individual classifiers. The O-LDC outperformed the individual and ensemble experiments. Shirer et al. [118] demonstrated that free-streaming subject-driven cognitive states can be decoded using the novel whole-brain functional connectivity analysis. In their experiment they trained the classifier to identify whole-brain connectivity (as the subjects rested quietly, remembered the events of their day, subtracted numbers or silently sang lyrics) and the classifier identified the cognitive states with 84% accuracy; moreover, the accuracy increased to 85% when identifying these states in a second, independent cohort of subjects. It was observed that the classification accuracy remained high with imaging runs as short as 30-60s. It was also found that the functional ROIs outperform the structural ROIs such as 90 functional defined ROIs outperformed a set of 112 commonly used structural ROIs in classifying the

cognitive states. Michel et al. [119] proposed a new supervised clustering approach for the brain state inference from the fMRI image. In the proposed supervised clustering approach, they first constructed a hierarchical subdivision of the search domain using the Ward hierarchical clustering algorithm. The output parcel sets constructed from the functional data is isomorphic to a tree and by construction; there is a one-to-one mapping between the cuts of this tree and parcellations of the domain. Given a parcellation, the signal can be represented by parcel-based averages, thus providing a low dimensional representation of the data. To overcome the limitations of the common fMRI analysis techniques in identifying small changes against a noisy background, Brammer et al. [120] discussed the use of multidimensional wavelet analysis, which is best suited for transient time series events and will adapt to periodic signals of decreasing or increasing amplitudes.

Richiardi et al. [121] proposed a complete different approach using multi-band classification of connected graphs. They brought out the brain decoding and graph representations based on functional connectivity measures. The main task of their experiment is as follows.

a) Estimate connectivity at different temporal scales using the wavelet transform as a preprocessing step.

b) Build a classifier trained on the functional connectivity graphs of a group of subjects to distinguish between the brain states of an unseen subject.

The experiment identified the connections that were most discriminating between the brain states. The classification was obtained by deriving the training samples and growing the decision trees using a leave-one-subject-out cross-validation procedure. The combined decisions of the classifiers in each sub-band obtained a higher degree of accuracy than the best single band classifier. The flowchart and ensemble procedures are shown in Fig. (5). The experimental results confirm that the functional connectivity analysis can benefit from filtering.

6. PERFORMANCE EVALUATION

We compared the different machine learning approaches and their advantages and disadvantages in cognitive state

Fig. (5). Flowchart of the classification and ensembling procedure [121].

Page 12: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

12 Current Bioinformatics, 2015, Vol. 10, No. 4 Parida et al.

Table 1. Advantages and limitations of the different machine learning techniques for fMRI classification.

SL No Techniques for Cognitive State Classification Advantages Limitations

1 GNB High accuracy in single subject classification. Found low accuracy while applied on multi-subject.

2 SVM Most popular classifier for cognitive state classification.

Shows high accuracy when applied on multi-subject classification.

Tested on a set of predefined cognitive states and can be extended to other cognitive

activities, mental disorder etc.

3 Online SVM The use of new technique-Effective Mapping (EM) in real time classification is demonstrated showing robust

prediction rates for three discrete emotional states.

One needs to collect initial data for classifier training consumes.

4 Support Vector Decomposition Machine(SVDM)

Combined approach (classifier and dimensionality reduction method) showed high accuracy with a

consistent result compared with the other classifier. Not tested for multi-subject classification.

5 Neural Network High accuracy when applied on multi- subject classification.

Only tested for a pre-defined set of states occurring at specific times while the subject

performs a specific task.

6 Classifier Ensembles Computational complexity of the ensemble classifier is higher than the single classifier.

The experimental result reveals that ensemble classifiers are not universally better than the

SVM (single classifier).

7 Kernel method A new framework for process fMRI data and brain activity prediction was done using the Kernel method. Not tested in multi-subject classification.

8 Multiband Connectivity Graph New approaches inferring brain states from the

functional connectivity graph instead of the common brain voxel activation value.

The new approach was applied to the discriminative states; however, need to explore

for other cognitive state classification.

9 Cellular Automata Classifier The performance of the Cellular Automata model (accuracies, time complexity) is better than SVM.

The accuracy continues to remain low when compared with the other cognitive

classification techniques.

10 Generalized Sparse Classifier New approach to include spatiotemporal, prior to

classifier learning to handle the dimensionality and noise in the fMRI data.

Tested with limited dataset.

11 Hidden Markov Model (HMM)

HMM based model focused on neurological and psychiatric disorders opens up more research and

investigation in this area.

Abstract model, need more experimental results.

12 Multivariate analysis The new tool Relevance Vector Machine (RVM) shows potential in analyzing complex data.

Need to resolve the challenges of optimal feature selection.

13 Large Scale automated synthesis

New automated brain mapping framework for cognitive task decoding. Ability to map between the neural and

cognitive functions.

Many challenges still present, as follows, which need to be overcome in further research

work. NeuroSynth framework depends on

psychological terms NeuroSynth framework unable to extract

information regarding fine grained cognitive states.

14 WICA Hybrid model extending ICA and incorporating wavelet and Bayesian inference.

Lacking in experimental result for the proposed model.

15 Restricted Boltzmann Machines (RBM)

Good comparison study between Generative and Discriminative training for the discrimination tasks.

Discriminative training for the large dataset of all slices omitted and the number of hidden units reduced due to computational expense.

16 Partial Least Square New tool in Neuroimage analysis, which has great

strength to extract certain features that are inaccessible by other methods.

While extracting certain features which are inaccessible by other methods, it overlooks the complexities, for which other methods may be

more suited.

17 Real-Time Conjugate Gradient

High performance (accuracy around 90%) compared with other algorithms (rt-PLS and rt-BPLS).

Able to process high dimensional data in a short time (0.5 Sec).

New approaches need more testing on various datasets.

Page 13: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction Current Bioinformatics, 2015, Vol. 10, No. 4 13

classification and brain activity prediction. Table 1 shows the advantages and disadvantages of the various machine learning techniques ranging from the conventional approaches to the ensemble and the most recent hybrid approaches. The advantages of the techniques are discussed and their limitations and further improvements are highlighted for comparative visualization.

CONCLUSION AND FUTURE PERSPECTIVES

This paper discusses various machine learning techniques, their advantages and disadvantages and their applicability in the cognitive state classification. Many hybrid techniques and their limitations, which can be used as potential areas of research in future work have also been addressed. We discussed several classification techniques and their worth has been proven in different applications such as: lie detection [122], brain activity prediction [123], reverse inference [124], brain computer interface [125] and mental disorder discovery [126]. Now neuroscientists are archiving their studies in a publicly accessible database fMRI Data Center (fMRIDC) - a high performance computing center,

which allows the researcher to examine brain activity patterns beyond any single study [127]. Many opportunities continue to present themselves to analyze fMRI data and improve the classification accuracy for cognitive state identification and prediction. The extension of these methods into the real-world applications could prove very useful for medical diagnosis and neuroprosthesis [128]. Beginning with image acquisition right up to final data analysis, fMRI poses several challenges to the neuroscientists to make the best use of the technique [129]. The methods for analyzing the fMRI data and ways to increase in sophistication require much improvement [130]. The current learning algorithms lack robustness; hence, proper attention must be given to make them widely applicable [131]. Some of the challenges in fMRI data analysis, which can be explored are, inferring the causal role of pattern information, finding structure in the underlying high-dimensional neural representations and relating one person’s neural patterns to another [132]. Some of the possibilities for future investigations are to examine the internal representation of objects derived from the fMRI data of individual representations [133] and

(Table 1) contd…..

SL No Techniques for Cognitive State Classification Advantages Limitations

18 Supervised Clustering Approach

Better prediction accuracy for the inter-subjects study compared with the other state-of-the-art approaches

(SVR, Elastic net, SVC and SMLR). The method proposed not limited to brain imaging can

be applied in any dataset where the multi - scale structure is considered as important (e.g., medical or

satellite images).

There is no optimal solution for the proposed supervised-cut for the prediction task.

19 Hybrid(ICA-Bayesian) Hybrid dDBN approach along with the ICA filtering method applied for the fMRI classification of finding

brain disorders.

The hybrid dDBN-ICA filtering approach applied to brain disorders can extend to other

cognitive activities.

20 Hybrid (DCM-SVM) Hybrid generative-embedding approach shows high classification accuracy (98%) in the fMRI data.

The hybrid approach can be extended to apply for classifying other brain disorder problems.

21 Hybrid (PSO-SVM) Hybrid approach using rough set reduction for SVM and PSO for optimizing feature reduction.

Lacking in performance while testing for multi subject (accuracy trend same as single,

although the variance is higher).

22 Hybrid (SVM-GLM) New hybrid approach of conventional GLM with SVM. The new hybrid approach can be extended to test on other cognitive classification.

23 Hybrid (Markov SVM)

The online model is able to predict accurately real time tasks, such as subject viewing a video or indulging in

nicotine cravings. Demonstrated Independent Component Spatial Maps

referred to as “IC dictionary” and performed better than Region of Interest (ROI) based analysis, and using IC features; the results are much more stable with robust

prediction over time.

Feature dependency has not been studied.

24 Kernel regression method

New approach to compute a predicted temporal profile for each experimental condition, which uses the decision

function. High classification accuracy compared with SVM

classification result. Better performance in low training samples.

The classification depends on the predicted temporal profile.

25 One Class Neural Network Demonstrated successfully in one class classification and performed with high accuracy (90%) for multiple

subject classification.

The other class data which are not used during classifier training used in the testing phase

violated the purity of the one class approach.

Page 14: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

14 Current Bioinformatics, 2015, Vol. 10, No. 4 Parida et al.

explore brain connectivity [134]. The new field, Computational Cognitive Neuroscience (CCN) which lies at the intersection of computational neuroscience and machine learning and the neural network theory has great potential for prediction on how drugs, genes and focal lesions affect behavior [135].

CONFLICT OF INTEREST

The authors confirm that this article content has no conflict of interest.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the support of the Original Technology Research Program for Brain Science through the National Research Foundation (NRF) of Korea (NRF: 2010-0018948) funded by the Ministry of Education, Science and Technology.

REFERENCES

[1] Haynes J, Rees G. Decoding mental states from brain activity in humans. Nat Rev Neurosci 2006; 7(7): 523-34.

[2] Miller M, Donovan C, Bennett C, Aminoff E, Mayer R. Individual differences in cognitive style and strategy predict similarities in the patterns of brain activity between individuals. NeuroImage 2012;59(1):83-93.

[3] Supek S, Magjarevic R. Neurodynamic measures of functional connectivity and cognition. Med Biol Eng Comput 2011; 49(5): 507-9.

[4] Poldrack R. The role of fMRI in Cognitive Neuroscience: where do we stand? Curr Opin Neurobiol 2008; 18(2): 223-7.

[5] Parida S, Dehuri S. A Review of Hybrid Machine Learning Approaches in Cognitive Classification. Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), 2012; 2014: Springer.

[6] Op de Beeck H, Haushofer J, Kanwisher N. Interpreting fMRI data: maps, modules and dimensions. Nat Rev Neurosci 2008; 9(2): 123-35.

[7] Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, Pietrini P. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science. 2001; 293(5539): 2425-30.

[8] Zanzotto FM, Croce D. Comparing EEG/ERP-like and fMRI-like techniques for reading machine thoughts. Brain Informatics: Springer; 2010. pp. 133-44.

[9] Mitchell T, Hutchinson R, Niculescu R, Pereira F, Wang X, Just M et al. Learning to Decode Cognitive States from Brain Images. Mach Learn 2004;57(1/2):145-75.

[10] Behroozi M, Daliri MR, Boyaci H. Statistical analysis methods for the fMRI data. Basic and Clinical Neuroscience. 2011; 2(4): 67-74.

[11] Friman O, Borga M, Lundberg P, Knutsson H. Detection and detrending in fMRI data analysis. NeuroImage. 2004; 22(2): 645-55.

[12] Horwitz B, Tagamets M, McIntosh A. Neural modeling, functional brain imaging, and cognition. Trends Cogn Sci 1999; 3(3): 91-8.

[13] Nielsen FA, Christensen MS, Madsen KH, Lund TE, Hansen LK. fMRI neuroinformatics. IEEE Eng Med Biol Mag 2006; 25(2): 112-9.

[14] O'Toole A, Jiang F, Abdi H, Pénard N, Dunlop J, Parent M. Theoretical, Statistical, and Practical Perspectives on Pattern-based Classification Approaches to the Analysis of Functional Neuroimaging Data. J Cogn Neurosci 2007; 19(11): 1735-52.

[15] Parida S, Dehuri S. A Study of Feature Selection Techniques for fMRI based Cognitive State Classification. Proceedings of the Advancements in the Era of Multi Disciplinary Systems (AEMDS), 2013 Apr 6-7; Kurukshetra, India. 2013: Elsevier Science.

[16] Wang L, Lei Y, Zeng Y, Tong L, Yan B. Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data. Comput Math Methods Med 2013; 2013: 1-7.

[17] Mahmoudi A, Takerkart S, Regragui F, Boussaoud D, Brovelli A. Multivoxel Pattern Analysis for fMRI Data: A Review. Comput Math Methods Med 2012; 2012: 1-14.

[18] Lindquist M. The Statistical Analysis of fMRI Data. Statist Sci 2008; 23(4): 439-64.

[19] Tanabe J, Miller D, Tregellas J, Freedman R, Meyer F. Comparison of Detrending Methods for Optimal fMRI Preprocessing. NeuroImage 2002; 15(4): 902-7.

[20] Kamitani Y, Sawahata Y. Spatial smoothing hurts localization but not information: Pitfalls for brain mappers. NeuroImage 2010; 49(3): 1949-52.

[21] Della-Maggiore V, Chau W, Peres-Neto P, McIntosh A. An Empirical Comparison of SPM Preprocessing Parameters to the Analysis of fMRI Data. NeuroImage 2002; 17(1): 19-28.

[22] Worsley K, Liao C, Grabove M, Petre V, Ha B, Evans A. A general statistical analysis for fMRI data. NeuroImage 2000; 11(5): S648.

[23] Ngan S, LaConte S, Hu X. Temporal Filtering of Event-Related fMRI Data Using Cross-Validation. NeuroImage 2000; 11(6): 797-804.

[24] Strother S, Anderson J, Hansen L, et al. The Quantitative Evaluation of Functional Neuroimaging Experiments: The NPAIRS Data Analysis Framework. NeuroImage 2002; 15(4): 747-71.

[25] LaConte S, Anderson J, Muley S, et al. The Evaluation of Preprocessing Choices in Single-Subject BOLD fMRI Using NPAIRS Performance Metrics. NeuroImage 2003; 18(1): 10-27.

[26] Churchill NW, Oder A, Abdi H, et al. Optimizing preprocessing and analysis pipelines for single‐subject fMRI. I. Standard temporal motion and physiological noise correction methods. Hum Brain Mapp 2012; 33(3): 609-27.

[27] Abdi H, Dunlop J, Williams L. How to compute reliability estimates and display confidence and tolerance intervals for pattern classifiers using the Bootstrap and 3-way multidimensional scaling (DISTATIS). NeuroImage 2009; 45(1): 89-95.

[28] Zhang L, Samaras D, Tomasi D, Volkow N, Goldstein R, editors. Machine learning for clinical diagnosis from functional magnetic resonance imaging. Computer Vision and Pattern Recognition, 2005 CVPR 2005 IEEE Computer Society Conference on; 2005: IEEE.

[29] Jolliffe IT. ‘Principal Component Analysis, Second Edition’, Springer 2002; 30(3).

[30] Santo R, Sato J, Martin M. Discriminating brain activated area and predicting the stimuli performed using artificial neural network. Exacta 2007; 5(2).

[31] Yang K, Shahabi C, editors. A pca-based kernel for kernel pca on multivariate time series. Proceedings of ICDM 2005 workshop on temporal data mining: algorithms, theory and applications held in conjunction with the fifth IEEE international conference on data mining (ICDM’05); 2005.

[32] Viviani R, Grön G, Spitzer M. Functional principal component analysis of fMRI data. Hum Brain Mapp 2005; 24(2): 109-29.

[33] Friston K, Phillips J, Chawla D, Büchel C. Revealing interactions among brain systems with nonlinear PCA. Hum Brain Mapp 1999; 8(23): 92-7.

[34] You X, Guillen MR, Gaillard WD, Adjouadi M, editors. Application of Nonlinear Classifiers with the Principal Component Analysis in fMRI Language Activation Pattern Recognition in a Multisite Study for Pediatric Epilepsy. IPCV; 2009.

[35] Hansen L, Larsen J, Nielsen F, Strother S, Rostrup E, Savoy R et al. Generalizable Patterns in Neuroimaging: How Many Principal Components?. NeuroImage 1999; 9(5): 534-44.

[36] Calhoun VD, Adali T, Hansen LK, Larsen J, Pekar JJ. ICA of functional MRI data: an overview. 2003.

[37] McKeown MJ, Hansen LK, Sejnowsk TJ. Independent component analysis of functional MRI: what is signal and what is noise? Curr Opin Neurobiol 2003; 13(5): 620-9.

[38] Stone J. Independent component analysis: an introduction. Trends Cogn Sci 2002; 6(2): 59-64.

[39] Beckmann C, Smith S. Probabilistic Independent Component Analysis for Functional Magnetic Resonance Imaging. IEEE Trans Med Imaging. 2004; 23(2): 137-52.

[40] Smolders A, De Martino F, Staeren N, et al. Dissecting cognitive stages with time-resolved fMRI data: a comparison of fuzzy clustering and independent component analysis. Magn Reson Imaging 2007; 25(6): 860-8.

[41] Formisano E, Esposito F, Kriegeskorte N, Tedeschi G, Di Salle F, Goebel R. Spatial independent component analysis of functional magnetic resonance imaging time-series: characterization of the cortical components. Neurocomputing 2002; 49(1-4): 241-54.

Page 15: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

Machine Learning Approaches for Cognitive State Classification and Brain Activity Prediction Current Bioinformatics, 2015, Vol. 10, No. 4 15

[42] Liao W, Chen H, Huang H, Mao D, Yao D, Zhao X, et al. Spatial Independent Component Analysis for Multi-task Functional MRI Data Processing. Int J Mag Reson Imaging 2007; 1(1): 49-60.

[43] Calhoun V, Adali T, Pearlson G, Pekar J. A method for making group inferences from functional MRI data using independent component analysis. Hum Brain Mapp 2001; 14(3): 140-51.

[44] Mckeown M, Makeig S, Brown G, et al. Analysis of fMRI data by blind separation into independent spatial components. Hum Brain Mapp 1998; 6(3): 160-88.

[45] Thomas C, Harshman R, Menon R. Noise Reduction in BOLD-Based fMRI Using Component Analysis. NeuroImage 2002; 17(3): 1521-37.

[46] Bai B, Kantor P, Shokoufandeh A, Silver D, editors. fMRI brain image retrieval based on ICA components. Current Trends in Computer Science, 2007 ENC 2007 Eighth Mexican International Conference on; 2007: IEEE.

[47] Varoquaux G, Sadaghiani S, Pinel P, Kleinschmidt A, Poline J, Thirion B. A group model for stable multi-subject ICA on fMRI datasets. NeuroImage 2010; 51(1): 288-99.

[48] Calhoun V, Pearlson G, Adali T. Independent Component Analysis Applied to fMRI Data: A Generative Model for Validating Results. J VLSI Signal Process Syst Signal Image Video Technol 2004; 37(2/3): 281-91.

[49] HANSEN L. Multivariate strategies in functional magnetic resonance imaging. Brain and Language. 2007;102(2):186-91.

[50] Kuncheva LI, Plumpton CO. Choosing parameters for random subspace ensembles for fMRI classification. Multiple Classifier Systems: Springer; 2010; 54-63.

[51] Michel V, Gramfort A, Varoquaux G, Eger E, Thirion B. Total Variation Regularization for fMRI-Based Prediction of Behavior. IEEE Trans Med Imaging 2011; 30(7): 1328-40.

[52] Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res 2003; 3: 1157-82.

[53] Kohavi R, John G. Wrappers for feature subset selection. Artif Intell 1997; 97(1-2): 273-324.

[54] Norman K, Polyn S, Detre G, Haxby J. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends Cogn Sci 2006; 10(9): 424-30.

[55] Pereira F, Mitchell T, Botvinick M. Machine learning classifiers and fMRI: A tutorial overview. NeuroImage 2009; 45(1): S199-S209.

[56] Niiniskorpi T, Åberg MB, Wessberg J, editors. Particle Swarm Feature Selection for fMRI Pattern Classification. BIOSIGNALS; 2009.

[57] Kenndy J, Eberhart R, editors. Particle swarm optimization. Proceedings of IEEE International Conference on Neural Networks; 1995.

[58] Coello CAC, Dehuri S, Ghosh S. Swarm intelligence for multi-objective problems in data mining: Springer Science & Business Media; 2009.

[59] Michel V, Damon C, Thirion B, editors. Mutual information-based feature selection enhances fMRI brain activity classification. Biomedical Imaging: From Nano to Macro, 2008 ISBI 2008 5th IEEE International Symposium on; 2008: IEEE.

[60] Honorio J, Samaras D, Tomasi D, Goldstein R, editors. Simple fully automated group classification on brain fMRI. Biomedical Imaging: From Nano to Macro, 2010 IEEE International Symposium on; 2010: IEEE.

[61] Wu Y, Zhang A, editors. Feature selection for classifying high-dimensional numerical data. Computer Vision and Pattern Recognition, 2004 CVPR 2004 Proceedings of the 2004 IEEE Computer Society Conference on; 2004: IEEE.

[62] Venkataraman A, Kubicki M, Westin C-F, Golland P, editors. Robust feature selection in resting-state fmri connectivity based on population studies. Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on; 2010: IEEE.

[63] Boehm O, Hardoon D, Manevitz L. Classifying cognitive states of brain activity via one-class neural networks with feature selection by genetic algorithms. Int J Mach Learn Cybern 2011; 2(3): 125-34.

[64] Coutanche M, Thompson-Schill S, Schultz R. Multi-voxel pattern analysis of fMRI data predicts clinical symptom severity. NeuroImage 2011; 57(1): 113-23.

[65] Weil R, Rees G. Decoding the neural correlates of consciousness. Curr Opin Neurol 2010; 23(6): 649-55.

[66] Samuel Schwarzkopf D, Rees G. Pattern classification using functional magnetic resonance imaging. Wiley Interdiscip Rev Cogn Sci 2011; 2(5): 568-79.

[67] Parida S, Dehuri S, editors. Applying Machine Learning Techniques for Cognitive State Classification. IJCA Proceedings on International Conference in Distributed Computing and Internet Technology (ICDCIT), ICDCIT; 2013.

[68] Etzel JA, Gazzola V, Keysers C. An introduction to anatomical ROI-based fMRI classification analysis. Brain Res. 2009; 1282: 114-25.

[69] Pereira F, Botvinick M. Information mapping with pattern classifiers: A comparative study. NeuroImage 2011; 56(2): 476-96.

[70] Haynes J, Rees G. Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nat Neurosci 2005; 8(5): 686-91.

[71] Etzel JA, Valchev N, Keysers C. The impact of certain methodological choices on multivariate analysis of fMRI data with support vector machines. Neuroimage 2011; 54(2): 1159-67.

[72] Frank LR, Buxton RB, Wong EC. Probabilistic analysis of functional magnetic resonance imaging data. Magn Reson Med 1998; 39(1): 132-48.

[73] Penny W, Kiebel S, Friston K. Variational Bayesian inference for fMRI time series. NeuroImage 2003; 19(3): 727-41.

[74] Kershaw J, Ardekani B, Kanno I. Application of Bayesian inference to fMRI data analysis. IEEE Trans Med Imaging 1999; 18(12): 1138-53.

[75] Friston K, Chu C, Mourão-Miranda J, et al. Bayesian decoding of brain images. NeuroImage 2008; 39(1): 181-205.

[76] Tenenbaum J, Griffiths T, Kemp C. Theory-based Bayesian models of inductive learning and reasoning. Trends Cogn Sci 2006; 10(7): 309-18.

[77] Palatucci M, Mitchell TM. Classification in very high dimensional problems with handfuls of examples. Knowledge Discovery in Databases: PKDD 2007: Springer; 2007; 212-23.

[78] Ahn W-Y, Krawitz A, Kim W, Busemeyer JR, Brown JW. A model-based fMRI analysis with hierarchical Bayesian parameter estimation. J Neurosci Psychol Econ 2011; 4(2): 95.

[79] Hutchinson R, Niculescu R, Keller T, Rustandi I, Mitchell T. Modeling fMRI data generated by overlapping cognitive processes with unknown onsets using Hidden Process Models. NeuroImage 2009; 46(1): 87-104.

[80] Janoos F, Machiraju R, Singh S, Morocz I. Spatio-temporal models of mental processes from fMRI. NeuroImage 2011; 57(2): 362-77.

[81] Kriegeskorte N, Goebel R, Bandettini P. Information-based functional brain mapping. Proc Natl Acad Sci U S A. 2006; 103(10): 3863-8.

[82] Esterman M, Tamber-Rosenau B, Chiu Y, Yantis S. Avoiding non-independence in fMRI data analysis: Leave one subject out. NeuroImage 2010; 50(2): 572-76.

[83] Conroy B, Singer B, Haxby J, Ramadge PJ, editors. fMRI-based inter-subject cortical alignment using functional connectivity. Advances in Neural Information Processing Systems; 2009.

[84] McIntosh A, Bookstein F, Haxby JV, Grady C. Spatial pattern analysis of functional brain images using partial least squares. Neuroimage 1996; 3(3): 143-57.

[85] Mitchell TM, Hutchinson R, Just MA, Niculescu RS, Pereira F, Wang X, editors. Classifying instantaneous cognitive states from fMRI data. AMIA Annual Symposium Proceedings; 2003: American Medical Informatics Association.

[86] Wang X, Hutchinson R, Mitchell TM (2003). Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects. Neural Information Processing Systems.

[87] Roussos E, Roberts S, Daubechies I. Variational Bayesian learning for wavelet independent component analysis. Bayesian Inference and Maximum Entropy Methods in Science and Engineering. 2005; 803: 274-81.

[88] Kim D, Burge J, Lane T, Pearlson G, Kiehl K, Calhoun V. Hybrid ICA-Bayesian network approach reveals distinct effective connectivity differences in schizophrenia. NeuroImage 2008; 42(4): 1560-8.

[89] Martínez-Ramón M, Koltchinskii V, Heileman G, Posse S. fMRI pattern classification using neuroanatomically constrained boosting. NeuroImage 2006; 31(3): 1129-41.

[90] Fan Y, Shen D, Davatzikos C, editors. Detecting cognitive states from fmri images by machine learning and multivariate classification. Computer Vision and Pattern Recognition Workshop, 2006 CVPRW'06 Conference on; 2006: IEEE.

Page 16: Current Bioinformatics, 2015, 10 Machine Learning Approaches …sclab.yonsei.ac.kr/publications/Papers/IJ/2015_CBIO_Dehuri.pdf · Machine Learning Approaches for Cognitive State Classification

16 Current Bioinformatics, 2015, Vol. 10, No. 4 Parida et al.

[91] Pereira F, Gordon G, editors. The support vector decomposition machine. Proceedings of the 23rd international conference on Machine learning; 2006: ACM.

[92] Peltier SJ, Lisinski JM, Noll DC, LaConte SM, editors. Support vector machine classification of complex fMRI data. Engineering in Medicine and Biology Society; 2009.

[93] Balci S, Sabuncu M, Yoo J, et al. editors. Prediction of successful memory encoding from fMRI data. Medical image computing and computer-assisted intervention: MICCAI International Conference on Medical Image Computing and Computer-Assisted Intervention; 2008: NIH Public Access.

[94] Brodersen KH, Schofield TM, Leff AP, Ong CS, Lomakina EI, Buhmann JM, et al. Generative embedding for model-based classification of fMRI data. PLoS computational biology. 2011; 7(6): e1002079.

[95] Cox DD, Savoy RL. Functional magnetic resonance imaging (fMRI)“brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. Neuroimage 2003; 19(2): 261-70.

[96] Hardoon DR, Manevitz LM. Classifying Cognitive Tasks to fMRI Data Using Machine Learning Techniques.

[97] Formisano E, De Martino F, Valente G. Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning. Magn Reson Imaging 2008; 26(7): 921-34.

[98] Onut I-V, Ghorbani AA, editors. Classifying cognitive states from fMRI data using neural networks. Neural Networks, 2004 Proceedings 2004 IEEE International Joint Conference on; 2004: IEEE.

[99] Hardoon D, Mourao-Miranda J, Brammer M, Shawe-Taylor J. Unsupervised fMRI Analysis; 2006.

[100] Ni Y, Chu C, Saunders CJ, Ashburner J, editors. Kernel methods for fmri pattern prediction. Neural Networks, 2008 IJCNN 2008(IEEE World Congress on Computational Intelligence) IEEE International Joint Conference on; 2008: IEEE.

[101] Poldrack R, Halchenko Y, Hanson S. Decoding the Large-Scale Structure of Brain Function by Classifying Mental States Across Individuals. Psychol Sci 2009; 20(11): 1364-1372.

[102] Jäkel F, Schölkopf B, Wichmann F. Does Cognitive Science Need Kernels?. Trends Cogn Sci 2009; 13(9): 381-88.

[103] Schmah T, Hinton GE, Small SL, Strother S, Zemel RS, editors. Generative versus discriminative training of RBMs for classification of fMRI images. Advances in neural information processing systems; 2008.

[104] Sitaram R, Lee S, Ruiz S, Rana M, Veit R, Birbaumer N. Real-time support vector classification and feedback of multiple emotional brain states. NeuroImage 2011; 56(2): 753-65.

[105] Anderson A, Han D, Douglas PK, Bramen J, Cohen MS. Real-time functional MRI classification of brain states using markov-svm hybrid models: peering inside the rt-fMRI black box. Machine learning and interpretation in neuroimaging: Springer; 2012; 242-55.

[106] Wang Z. A hybrid SVM-GLM approach for fMRI data analysis. NeuroImage 2009; 46(3): 608-15.

[107] Yang H, Liu J, Sui J, Pearlson G, Calhoun V. A Hybrid Machine Learning Method for Fusing fMRI and Genetic Data: Combining both Improves Classification of Schizophrenia. Front Hum Neurosci 2010; 4: 192.

[108] Abraham A, Liu H, editors. Swarm intelligence based rough set reduction scheme for support vector machines. Proceedings of IEEE International Conference on Intelligence and Security Informatics, 2008 (ISI 2008); IEEE.

[109] Brodersen K, Schofield T, Leff A, et al. Generative Embedding for Model-Based Classification of fMRI Data. PLoS Comput Biol 2011; 7(6): e1002079.

[110] Xu H, Xi YT, Lee R, Ramadge PJ. Real-time Conjugate Gradients for Online fMRI Classification. Proceedings of IEEE Int. Conf. Acoustics, Speech and Signal Processing, May 2011; 565-8.

[111] Ng B, Abugharbieh R, editors. Modeling spatiotemporal structure in fMRI brain decoding using generalized sparse classifiers. Pattern Recognition in NeuroImaging (PRNI), 2011 International Workshop on; 2011: IEEE.

[112] Yarkoni T, Poldrack R, Nichols T, Van Essen D, Wager T. Large-scale automated synthesis of human functional neuroimaging data. Nat Meth 2011; 8(8): 665-70.

[113] Chu C, Mourão-Miranda J, Chiu Y, Kriegeskorte N, Tan G, Ashburner J. Utilizing temporal information in fMRI decoding: Classifier using kernel regression methods. NeuroImage 2011; 58(2): 560-71.

[114] Kuncheva L, Rodríguez J. Classifier ensembles for fMRI data analysis: an experiment. Magn Reson Imaging 2010; 28(4): 583-93.

[115] Parida S, Dehuri S, Wang G-N. Genetic algorithms based feature selection for cognitive state classification using ensemble of decision trees, In Z. Dimitrovova, et al. Eds. Proceedings of 11th International Conference on Vibration Problems; Lisbon, Portugal, 2013; 195-204.

[116] Latif A, Dalhoum A, Al-Dhamari I, editors. fMRI brain data classification using cellular automata. Proceedings of the 10th WSEAS international conference on applied informatics and communications, and 3rd WSEAS international conference on Biomedical electronics and biomedical informatics; 2010: World Scientific and Engineering Academy and Society (WSEAS).

[117] Plumpton CO, Kuncheva LI, Linden DE, Johnston SJ, editors. On-line fMRI data classification using linear and ensemble classifiers. Proceedings of 20th International Conference on Pattern Recognition (ICPR); 2010: IEEE.

[118] Shirer W, Ryali S, Rykhlevskaia E, Menon V, Greicius M. Decoding Subject-Driven Cognitive States with Whole-Brain Connectivity Patterns. Cereb Cortex 2011; 22(1): 158-165.

[119] Michel V, Gramfort A, Varoquaux G, Eger E, Keribin C, Thirion B. A supervised clustering approach for fMRI-based inference of brain states. Pattern Recognit 2012; 45(6): 2041-9.

[120] Brammer M. Multidimensional wavelet analysis of functional magnetic resonance images. Hum Brain Mapp 1998; 6(5-6): 378-82.

[121] Richiardi J, Eryilmaz H, Schwartz S, Vuilleumier P, Van De Ville D. Decoding brain states from fMRI connectivity graphs. NeuroImage 2011; 56(2): 616-26.

[122] Davatzikos C, Ruparel K, Fan Y, et al. Classifying spatial patterns of brain activity with machine learning methods: Application to lie detection. NeuroImage 2005; 28(3): 663-8.

[123] Boehm O, Hardoon D, Manevitz L. Classifying cognitive states of brain activity via one-class neural networks with feature selection by genetic algorithms. Int Mach Learn Cybern 2011; 2(3): 125-34.

[124] Poldrack R. Can cognitive processes be inferred from neuroimaging data? Trends Cogn Sci 2006; 10(2): 59-63.

[125] Christopher deCharms R. Applications of real-time fMRI. Nat Rev Neurosci 2008; 9(9): 720-9.

[126] Honorio J, Tomasi D, Goldstein R, Leung H, Samaras D. Can a Single Brain Region Predict a Disorder? IEEE Trans Med Imaging 2012; 31(11): 2062-72.

[127] Van Horn JD, Wolfe J, Agnoli A, et al. Neuroimaging databases as a resource for scientific discovery. Int Rev Neurobiol 2005; 66: 55-87.

[128] Tong F, Pratte M. Decoding Patterns of Human Brain Activity. Annu Rev Psychol 2012; 63(1): 483-509.

[129] Amaro E, Barker G. Study design in fMRI: Basic principles. Brain Cogn 2006; 60(3): 220-32.

[130] Lazar N, Eddy W, Genovese C, Welling J. Statistical Issues in fMRI for Brain Imaging. Int Statistical Rev 2001; 69(1): 105-27.

[131] Brodersen KH. Decoding mental activity from neuroimaging data—the science behind mind-reading. The New Collection; Oxford, 2009; 4: 50-61.

[132] Raizada R, Kriegeskorte N. Pattern-information fMRI: New questions which it opens up and challenges which face it. Int J Imaging Syst Tech 2010; 20(1): 31-41.

[133] Shinkareva S, Malave V, Just M, Mitchell T. Exploring commonalities across participants in the neural representation of objects. Hum Brain Mapp 2011; 33(6): 1375-83.

[134] Ramnani N, Lee L, Mechelli A, Phillips C, Roebroeck A, Formisano E. ‘Exploring brain connectivity: a new frontier in systems neuroscience’. Trends Neurosci 2002; 25(10): 496-7.

[135] Gregory Ashby F, Helie S. A tutorial on computational cognitive neuroscience: Modeling the neurodynamics of cognition. J Math Psychol 2011; 55(4): 273-89.

Received: December 27, 2013 Revised: April 29, 2014 Accepted: August 20, 2015