Accepted manuscripts are peer-reviewed but have not been through the copyediting, formatting, or proofreading process.
Copyright © 2018 the authors
This Accepted Manuscript has not been copyedited and formatted. The final version may differ from this version.
TechSights
A Shared Vision for Machine Learning in Neuroscience
Mai-Anh T. Vu1, Tulay Adali7, Demba Ba8, Gyorgy Buzsaki9, David Carlson3, Katherine Heller4, Conor
Liston10, Cynthia Rudin5,6, Vikaas Sohal11, Alik S. Widge12, Helen S. Mayberg13, Guillermo Sapiro5 and
Kafui Dzirasa1,2
1Dept. of Neurobiology, 2Dept. of Psychiatry and Behavioral Sciences, 3Dept. of Civil and Environmental Engineering, 4Dept. of Statistical Sciences, 5Dept. of Electrical and Computer Engineering, 6Dept. of Computer Science; Duke University, Durham, North Carolina 27710, USA. 7Dept. of Computer Science and Electrical Engineering; University of Maryland Baltimore County, Baltimore, Maryland 21250. 8Dept. of Electrical Engineering and Bioengineering; Harvard University, Cambridge, MA 02138. 9Dept. of Neuroscience; NYU School of Medicine, New York, NY 10016. 10Feil Family Brain and Mind Research Institute, Weill Cornell Medical College, New York, NY 10065. 11Dept. of Psychiatry, University of California San Francisco, San Francisco, CA 94158. 12Dept. of Psychiatry, Massachusetts General Hospital, Charlestown, MA 02129. 13Dept. of Psychiatry, Neurology, and Radiology; Emory University, Atlanta, GA 30322.
DOI: 10.1523/JNEUROSCI.0508-17.2018
Received: 25 September 2017
Revised: 2 January 2018
Accepted: 9 January 2018
Published: 26 January 2018
Conflict of Interest: The authors declare no competing financial interests.
Many of the themes presented in this manuscript reflect areas of consensus that emerged from the National Institute of Mental Health-sponsored Explainable Artificial Intelligence meeting on November 10th, 2017 in Washington, DC. This meeting was co-organized by Michele Ferrante, G. Sapiro, and H.S. Mayberg. We would like to thank S. Hakimi, S.D. Mague, R.A. Adcock, N.M. Gallagher, A. Talbot, S. Burwell, and R. Hultman for comments on this manuscript. Our work was supported by NIMH grant R01MH099192-05S2 to KD, an NSF GRFP to MTV, and a Duke Katherine Goodman Stern Fellowship to MTV.
Correspondence should be sent to: Kafui Dzirasa, M.D., Ph.D., Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, 421 Bryan Research Building, Box 3209, Durham, NC 27710, USA, Email: [email protected], Twitter: @KafuiDzirasa, or Mai-Anh Vu, Department of Neurobiology, Duke University Medical Center, 421 Bryan Research Building, Box 3209, Durham, NC 27710, USA, Email: [email protected], Twitter: @MaiAnh_T
Cite as: J. Neurosci; 10.1523/JNEUROSCI.0508-17.2018
Abstract
With ever-increasing advancements in technology, neuroscientists are able to collect data in greater volumes and with finer resolution. The bottleneck in understanding how the brain works is consequently shifting away from the amount and type of data we can collect and toward what we actually do with the data. There has been a growing interest in leveraging this vast volume of data across levels of analysis, measurement techniques, and experimental paradigms to gain more insight into brain function. Such efforts are visible at an international scale, with the emergence of big data neuroscience initiatives such as the BRAIN Initiative (Bargmann et al. 2014), the Human Brain Project, the Human Connectome Project, and the National Institute of Mental Health's (NIMH) Research Domain Criteria (RDoC) initiative. With these large-scale projects, much thought has been given to data-sharing across groups (Poldrack and Gorgolewski 2014; Sejnowski, Churchland, and Movshon 2014); however, even with such data-sharing initiatives, funding mechanisms, and infrastructure, there still exists the challenge of how to cohesively integrate all the data. At multiple stages and levels of neuroscience investigation, machine learning holds great promise as an addition to the arsenal of analysis tools for discovering how the brain works.
Main Text

What is Machine Learning?
The term machine learning was coined by Arthur Samuel in 1959 to describe the subfield of computer science that involves the "programming of a digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning" (Samuel 1959). In other words, the field investigates how computers can improve predictions, actions, decisions, or perceptions based on data and ongoing experience. The field of machine learning was driven by the development of algorithms for pattern recognition (for example, an algorithm for filtering out unwanted marketing emails) and, in general, investigates the development of algorithms that can learn from and make predictions on data. These algorithms largely fall into a few dominant categories: supervised machine learning, unsupervised machine learning, and reinforcement learning.
In supervised machine learning, the input data, or training data, have known labels, commonly supplied by human experts. The goal is to learn the relationship between the data and the labels such that the computer can predict the label of a previously unseen data item with accuracy comparable to the human expert. For instance, a training data set could consist of a set of emails that are already classified as spam or not spam, and the goal of the computer algorithm is to settle on a model that can accurately label new incoming email as "spam" or "not spam". As another example, in a functional magnetic resonance imaging (fMRI) study, Schuck et al. used a supervised machine learning classifier to classify the color of the stimuli seen by the subjects based on local fMRI brain activity (Schuck et al. 2015). Examining the classifier accuracy over time and in different brain regions allowed them to infer where and when color was represented in the brain. Regression models, which learn relationships among variables, also fall into the category of supervised machine learning.
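This supervised workflow — fit a model on expert-labeled examples, then predict labels for unseen items — can be sketched with a toy nearest-centroid classifier. The two "email features" below are hypothetical and chosen purely for illustration; this is not the classifier used in the studies cited above.

```python
# Toy supervised learning: nearest-centroid classification.
# Each email is summarized by two hypothetical features:
# [count of marketing keywords, count of known contacts mentioned].

def fit_centroids(X, y):
    """Average the feature vectors within each label to get one centroid per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Labeled training data supplied by a "human expert".
X_train = [[8, 0], [6, 1], [7, 0],   # spam-like
           [1, 4], [0, 5], [2, 3]]   # not-spam-like
y_train = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

model = fit_centroids(X_train, y_train)
print(predict(model, [7, 1]))  # label for a previously unseen, spam-like email
```

The same structure underlies fMRI decoding: replace the email features with voxel activity and the spam label with the stimulus color.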
Reinforcement learning is a branch of supervised machine learning that has inspired, and has been inspired by, behaviorist psychology. The "classes" to be learned are actions that could be taken in response to a data item. Machines are trained to make decisions through a dynamic trial-and-error process to maximize a desired outcome. The human expert no longer labels each item with the desired class (action), but instead creates a "scoring function" that tells the algorithm how good its move was. For example, a machine might have the goal of winning checkers games, and learn to select moves based on past interactions to maximize the chance of winning the checkers match (Sutton and Barto 1998). In a typical situation, a scoring function only provides a reward or information based upon the outcome of a complete task after several actions (i.e., in the checkers match, only after a win/loss is achieved; in a brain-computer interface task, only when the objective is successfully obtained).
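The trial-and-error loop — act, receive a score, update value estimates — can be sketched with tabular Q-learning on a hypothetical two-action task. The scoring function below is made up for illustration; real tasks (like checkers) deliver much sparser, delayed rewards.

```python
import random

random.seed(0)

# Hypothetical task: two possible actions; the scoring function silently
# rewards action 0 and not action 1. The learner discovers this by trial and error.
def score(action):
    return 1.0 if action == 0 else 0.0

Q = [0.0, 0.0]   # learned value estimate for each action
alpha = 0.1      # learning rate
epsilon = 0.2    # exploration probability

for trial in range(500):
    # Epsilon-greedy: mostly exploit the current best guess, sometimes explore.
    if random.random() < epsilon:
        action = random.choice([0, 1])
    else:
        action = max((0, 1), key=lambda a: Q[a])
    reward = score(action)
    # Nudge the estimate toward the observed score (trial-and-error update).
    Q[action] += alpha * (reward - Q[action])

print(Q)  # the machine has learned that action 0 scores higher
```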
In unsupervised machine learning, on the other hand, the training data have no labels. The goal is to discover hidden structure in the data, perhaps by taking advantage of similarity or redundancy. A well-known example is principal component analysis (PCA), a statistical dimension reduction technique that exploits redundancy in the data using only second-order statistics. For unsupervised machine learning, input data might be composed of the symptom profiles of patients believed to have the same general neuropsychiatric illness but also to comprise meaningful heterogeneity. In this case, the goal of the model would be to group similar patients together, thus uncovering important structure within this diagnosis. This would be an example of a family of algorithms aimed at clustering. Critically, a major challenge with unsupervised machine learning algorithms for clustering or dimensionality reduction is understanding the features that make up the groups or reduced dimensions and transforming them into testable scientific hypotheses. For example, Drysdale et al. used machine learning to discover subtypes of depression based on fMRI functional connectivity, and then validated their findings by testing the follow-up hypothesis that a treatment modulating cortical connectivity would yield different outcomes among these subgroups (Drysdale et al. 2017).
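Clustering of this kind can be sketched with a minimal k-means loop on hypothetical symptom-severity profiles. The two symptom scores per patient are invented for illustration; this is not the Drysdale et al. data or method.

```python
# Minimal k-means clustering of hypothetical patient symptom profiles.
# Each patient: [symptom A score, symptom B score] (made-up numbers).
patients = [[8, 1], [9, 2], [7, 1],   # one apparent subtype
            [2, 7], [1, 9], [2, 8]]   # another apparent subtype

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
    return [nearest(p) for p in points]

def update(points, labels, k):
    """Recompute each centroid as the mean of its assigned points."""
    new = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        new.append([sum(col) / len(members) for col in zip(*members)])
    return new

# Deterministic initialization from two of the patients.
centroids = [patients[0], patients[3]]
for _ in range(10):   # alternate assignment and update until stable
    labels = assign(patients, centroids)
    centroids = update(patients, labels, k=2)

print(labels)  # no labels were supplied, yet two patient groups emerge
```

The hard scientific work, as noted above, is interpreting what the discovered groups mean and turning them into testable hypotheses.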
These categories of machine learning differ in their inputs, outputs, and objectives, and thus encompass a powerful set of tools (e.g., classification, regression, clustering) that enable us to refine the ways we make predictions, make decisions, or discover structure in large sets of data. While machine learning has long been applied to the field of computational and theoretical neuroscience, its burgeoning role in broader cellular, systems, and cognitive neuroscience has been more recent, especially as statistical machine learning packages are made available in standard analysis software. Accordingly, questions arise as to its place in empirical research ("Focus on Big Data" 2014). Data-driven machine learning approaches are often pitted against the more traditional hypothesis-driven approach, in which an experiment is undertaken to mathematically evaluate the plausibility of a concrete, falsifiable proposed explanation or model, given the observed data ("Focus on Big Data" 2014). The question, however, should not be whether one approach is better than the other, but rather how and when we can take advantage of these two complementary strategies.
Machine Learning within the Hypothesis-Driven Framework
Despite the distinction between data-driven and hypothesis-driven approaches, there are already many applications of machine learning within the hypothesis-driven framework. Many of these serve as ways to save time and effort, mitigate human biases, or make large datasets computationally tractable. For example, in magnetic resonance imaging (MRI), image-processing algorithms allow for automated alignment of MRI scans from individual people to atlases (Andersson, Jenkinson, and Smith 2007; Jenkinson et al. 2002; Jenkinson and Smith 2001). This registration of individual scans to a unified atlas makes group-level spatial analyses and inferences possible. Other algorithms accomplish automated image segmentation, allowing for applications such as parcellation of structural MRI scans into labeled regions (Fischl et al. 2004), identification of white matter tracts from diffusion MRI (O'Donnell et al. 2017), or identification of neuronal structural boundaries in electron microscopy (EM) images (Jain, Seung, and Turaga 2010). Image-processing algorithms can further be applied to video recordings to automate the measurement, identification, and categorization of animal behaviors (Anderson and Perona 2014; Hong et al. 2015). Such tasks are otherwise accomplished manually, and the development of these technologies immensely reduces the required human time and effort, in turn enabling higher-throughput analysis pipelines. Beyond the added efficiency, automated processes leave far less room for human subjectivity, bias, or error in the coding of images or behaviors. Applied correctly, this can make results more objective, consistent, and reproducible.
In addition to these algorithms that aid in data processing, machine learning techniques can also serve within the hypothesis-driven framework as a strategy for hypothesis testing. For example, in a study on hippocampal-prefrontal input and spatial working memory encoding, Spellman et al. optogenetically inhibited the ventral hippocampus (vHPC) projection to the medial prefrontal cortex (mPFC) during a spatial working memory task. From behavioral results, they drew the conclusion that input from vHPC to mPFC is critical for spatial cue encoding (Spellman et al. 2015). To further test this hypothesis, they trained a classifier on the mPFC population firing rate to decode the spatial location of the animal's goal, and separately to decode the task phase. This classifier approach allowed them to quantify the reliability and strength of these mPFC neural representations. They were then able to show statistically that inhibition of vHPC-to-mPFC signaling decreased classifier accuracy for spatial goal location but not for task phase. In other words, the ability to classify an outcome from mPFC activity became a measure of how well that outcome was encoded in mPFC. This machine learning approach thus allowed them to test their hypothesis that these projections specifically support working memory encoding of space, and not other task-relevant features. As another example, Paul et al. used supervised clustering to analyze single-cell transcriptomes of a set of previously anatomically and physiologically characterized cortical GABAergic neurons, and discovered that these categories in fact differ by a transcriptional architecture that encodes their synaptic communication patterns (Paul et al. 2017), confirming the subcategorization of these neurons. Thus, there are already many applications of machine learning that bolster hypothesis-driven research, by automating aspects of data processing or by yielding additional strategies for hypothesis testing.
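The decoding-as-hypothesis-test logic above can be sketched as follows: compute decoding accuracy from neural data, then ask whether it exceeds what label shuffling would produce by chance. The firing rates and threshold decoder below are toy inventions, not the Spellman et al. recordings or classifier.

```python
import random

random.seed(1)

def accuracy(rates, labels, threshold=0.5):
    """Decode a binary outcome from a summary firing rate via a threshold."""
    preds = [1 if r > threshold else 0 for r in rates]
    return sum(p == lab for p, lab in zip(preds, labels)) / len(labels)

# Hypothetical per-trial summary firing rates and the true binary outcomes.
rates  = [0.2, 0.3, 0.1, 0.4, 0.8, 0.9, 0.7, 0.6]
labels = [0,   0,   0,   0,   1,   1,   1,   1]

observed = accuracy(rates, labels)

# Permutation test: shuffling labels destroys any real rate-outcome link,
# so the shuffled accuracies form a chance (null) distribution.
null = []
for _ in range(1000):
    shuffled = labels[:]
    random.shuffle(shuffled)
    null.append(accuracy(rates, shuffled))

p_value = sum(a >= observed for a in null) / len(null)
print(observed, p_value)
```

A drop in such decoding accuracy under a manipulation (as in the vHPC inhibition experiment) then becomes a quantitative, testable readout of how well the outcome was encoded.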
Machine Learning Beyond the Hypothesis-Driven Framework
Machine learning has applications well beyond the hypothesis-driven framework. The more exploratory data-driven approach allows us to examine data in a way that is less limited by our hypothesis space. After all, experiments are only as useful as the hypotheses they are designed to test, and exhaustive hypothesis testing over a drastically expanding dimensionality is intractable. To this end, machine learning methods allow us to extract from our data the dimensions that explain the most variance, or even to learn a data-driven taxonomy. For example, data-driven video analysis of behavior may not only serve as an automated replacement for human behavioral coding, but, if an unsupervised approach is used, may even generate new behavioral classifications, unlimited by a priori human behavioral categories or labels (Anderson and Perona 2014). In fMRI, data-driven algorithms have been used to parcellate the brain based on functional connectivity data, yielding a functionally relevant fMRI atlas free of the constraints of a priori brain parcellations and labels (Craddock et al. 2012). In another fMRI study, Chang et al. used machine learning to identify a sensitive and specific neural signature of affective responses to aversive images that was unresponsive to physical pain, allowing them to infer neural components differentiating negative emotion from pain, "providing a basis for new, brain-based taxonomies of affective processes" (Chang et al. 2015). Along these lines, independent component analysis (ICA) and classification algorithms have been used to infer neural networks, decode brain states, or separate noise from signal (Calhoun et al. 2014; Jung et al. 2001; Lemm et al. 2011; Thomas, Harshman, and Menon 2002; Whitmore and Lin 2016; Zuo et al. 2010). Such strategies for capitalizing on the high-dimensionality and multivariate nature of data are certainly not unique to neuroscience; the entire field of bioinformatics has emerged from this idea and is deeply rooted in machine learning (Larrañaga et al. 2006; Libbrecht and Noble 2015).
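"Extracting the dimensions that explain the most variance" is what PCA does. A plain-Python sketch of the idea, using power iteration on the covariance matrix of made-up two-dimensional data (real neural data would have thousands of dimensions, and one would use a linear-algebra library):

```python
# Sketch: find the direction of maximal variance (the first principal
# component) by power iteration on the covariance matrix. Toy 2-D data
# in which the two measured variables are largely redundant (y tracks x).

data = [[2.0, 1.9], [1.0, 1.1], [3.0, 3.2], [0.0, -0.1], [4.0, 3.9]]

# Center the data.
means = [sum(col) / len(data) for col in zip(*data)]
centered = [[x - m for x, m in zip(row, means)] for row in data]

# 2x2 sample covariance matrix.
n = len(centered)
cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(2)]
       for i in range(2)]

# Power iteration converges to the leading eigenvector (top component).
v = [1.0, 0.0]
for _ in range(100):
    w = [cov[0][0] * v[0] + cov[0][1] * v[1],
         cov[1][0] * v[0] + cov[1][1] * v[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    v = [w[0] / norm, w[1] / norm]

print(v)  # close to [0.707, 0.707]: one shared "diagonal" dimension
```

The recovered component exposes the redundancy in the two variables — the data-driven analogue of discovering that two measurements reflect one underlying process.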
By revealing structure in the data, such machine learning approaches may yield new, testable hypotheses. In a study examining the neural mechanisms underlying stress-induced behavioral adaptation, Carlson et al. recorded cellular activity and local field potentials from multiple brain regions. Using a supervised machine-learning approach, they found that cellular activity in two of the recorded brain regions, infralimbic cortex and medial dorsal thalamus, showed adaptation across repeated exposure to stress (D. Carlson et al. 2017). This led to a series of follow-up experiments to investigate whether and how these two regions were connected. The two regions were found to be functionally connected via cross-frequency phase coupling, which was further confirmed via an optogenetic manipulation experiment. In this case, machine learning revealed potential predictors, generated a relevant hypothesis space, and helped us to design a (successful) confirmatory experiment.
While certainly powerful as a brute-force approach to hypothesis generation, data-driven machine learning approaches may also yield more direct insight into brain function. After all, the problems solved by the brain have strong parallels to the problems solved by machine learning. For example, we as humans must be able to make sense of all the multisensory information coming into the brain to make relevant inferences, like whether a person we are talking to may be angry (Wegrzyn, Bruckhaus, and Kissler 2015). Similarly, machine learning algorithms must learn structure from large multidimensional data, and moreover, it can be shown that allowing data from multiple modalities to fully interact and inform each other leads to more powerful biomarkers (Levin-Schwartz, Calhoun, and Adali 2017). Such parallels have made for incredibly powerful cross-pollination between machine learning and neuroscience. For example, reinforcement learning first emerged in computer science in the 1950s (Bellman 1957, 2010; Sutton and Barto 1998). Quantitative models of reinforcement learning emerged to describe error-based learning (Bush and Mosteller 1951; Mackintosh 1975; Pearce and Hall 1980; Rescorla and Wagner 1972). Then, in a seminal paper, Schultz et al. showed neuronal evidence for reward prediction error, a key element of reinforcement learning, in the brain (Schultz, Dayan, and Montague 1997), invigorating a whole branch of neuroscience (Gershman and Daw 2017; Niv 2009). Another example is that of deep learning, a family of machine learning algorithms aimed at learning data representations; early artificial neural networks were inspired by neurobiology (McCulloch and Pitts 1990). The deep learning field continued to advance, and developments made on the computational front have now inspired hypotheses on the neuroscience front (Marblestone, Wayne, and Kording 2016). Thus, because of the similarity of the problems being solved by machine learning algorithms and the brain, statistical and computational developments can inform neuroscience and yield new theories of brain function.
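The error-based learning these models describe can be illustrated with a Rescorla-Wagner-style update, in which the change in associative strength is proportional to the reward prediction error. This is the standard textbook form simplified to a single cue; the trial numbers and learning rate are arbitrary.

```python
# Rescorla-Wagner-style error-driven learning for a single cue.
# V is the learned prediction of reward; each trial moves V toward the
# actual reward r by a fraction (alpha) of the prediction error (r - V).

def train(rewards, alpha=0.2):
    V = 0.0
    errors = []
    for r in rewards:
        delta = r - V        # reward prediction error
        V += alpha * delta   # error-proportional update
        errors.append(delta)
    return V, errors

# 30 trials in which the cue is always followed by reward.
V, errors = train([1.0] * 30)
print(round(V, 3))                                # prediction approaches 1.0
print(round(errors[0], 3), round(errors[-1], 3))  # error shrinks as learning proceeds
```

The shrinking prediction error is exactly the quantity for which Schultz et al. reported a dopaminergic neural correlate.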
Validation of Machine Learning Results
A common criticism of data-driven approaches is that they can be void of mechanism and thus can limit inferences and interpretation (T. Carlson et al. 2017). To return to the study on neural adaptation to repeated stress exposure (D. Carlson et al. 2017), the discovery that cell firing in the infralimbic cortex and medial dorsal thalamus was related to the behavior provided little insight into the mechanism per se. What were the cells doing? How were the regions connected? In which direction did the information flow? Did the behavior drive the activity, or vice versa? Only through our follow-up experiments was it determined that the two regions were functionally connected through cross-frequency phase coupling. Further optogenetic manipulation experiments revealed that the changes observed in this circuitry were part of a compensatory mechanism in response to repeated stress exposure.
As illustrated, one way to overcome the limitations of each of these approaches is to design experiments that leverage the advantages of both (Figure 1). A scientific study could be divided into two phases. The first phase would be aimed at exploration, discovery, and ultimately hypothesis generation. For instance, in an animal study, initial experiments would focus on collecting large, broad data sets, such as cellular activity, LFPs, motion, and behavior, from a mouse as it undergoes a contextual manipulation. Machine-learning approaches could identify relationships among the dimensions in ways that relate to the mouse's physiology, behavior, and context, yielding specific hypotheses regarding these relationships. The second phase of the experiment would then be designed to test these concrete hypotheses, using a variety of techniques for biological manipulation. Viral strategies enable us to generate mice with specific and conditional genetic mutations. Technologies such as optogenetics and DREADDs (designer receptors exclusively activated by designer drugs) allow us to manipulate the activity of specific cell types, and in the case of optogenetics, with precise timing and frequency (Armbruster et al. 2007; Boyden et al. 2005). Such technologies allow us to test more refined hypotheses and discover more specific mechanisms underlying brain function, as a recent multisite in vivo recording study showed by linking large-scale neural dynamics to stress-related behavioral dysfunction (Hultman et al. 2016). In human subjects, a similar chain of events could be carried out. From machine learning on a high-dimensional fMRI or electroencephalography (EEG) dataset, one could generate hypotheses about functional encoding of behavior, then manipulate the putative encoding through focused brain stimulation. Invasive and non-invasive stimulation techniques are far from the precision of optogenetics, but there is emerging evidence that they can be used to change specific features of brain activity (Philip et al. 2017; Smart, Tiruvadi, and Mayberg 2015; Widge et al. 2017).
Regardless of whether we use these machine learning approaches to generate hypotheses or to learn something about the computations being carried out in the brain, our success will depend heavily on the model space we choose. A common aphorism in statistics states, "All models are wrong, but some are useful" (Box 1979). In other words, no model we come up with can fully describe what is happening in a given system; however, a model can nonetheless be useful, capturing some amount of truth or making reliable predictions. So how do we know which models are useful? Validation is key. For example, while Drysdale et al. (Drysdale et al. 2017) were able to determine patterns of neural activity that clustered with behavioral symptoms and treatment responses in a cohort of depressed subjects, the scientific and potential clinical value of their finding would have been dramatically diminished if their model failed to predict which treatments a new patient with depression would respond to. Simply put, we can find regularities in nearly any data set, but the real test is whether those regularities or predictions hold up in additional (non-overlapping) data sets handled in the same manner (Woo et al. 2017). Generally, this level of validation is performed within research groups, where model predictions generated on a subset of data acquired during an experiment are validated on another subset of data that was held out during the model generation stage. For example, cellular firing data acquired for animals during a selected subset of experimental days may be used to develop the machine learning models, while data from other days may be used for model validation (Figure 2, top). The limitation of this strategy is that while the model solution may apply to the specific group of animals used for experimental testing, it may not extrapolate to new animals. Model validation is therefore stronger when performed out-of-subject. In this scenario, the model would be developed using all of the trials from a specific set of experimental animals, then validated using another set of experimental animals (Figure 2, bottom). The next level of validation is determining whether similar findings will hold up across research groups. In fact, in their study, Drysdale et al. replicated their initial findings from their unsupervised clustering analysis in a completely separate data set, adding confidence that these findings are robust (Drysdale et al. 2017). This level of validation has always been the gold standard for both data- and hypothesis-driven scientific discovery.
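The two hold-out schemes described above differ only in what is held out: experimental days within the same animals, versus whole animals. A minimal sketch of that grouping logic, with hypothetical trial records tagged by animal and day:

```python
# Sketch of two hold-out schemes for model validation, using hypothetical
# trial records tagged with the animal and experimental day they came from.

trials = [
    {"animal": "m1", "day": 1}, {"animal": "m1", "day": 2},
    {"animal": "m2", "day": 1}, {"animal": "m2", "day": 2},
    {"animal": "m3", "day": 1}, {"animal": "m3", "day": 2},
]

def split_by(trials, key, held_out):
    """Train on records whose `key` is not in `held_out`; test on the rest."""
    train = [t for t in trials if t[key] not in held_out]
    test = [t for t in trials if t[key] in held_out]
    return train, test

# Within-subject validation (days held out, all animals seen in training).
train_a, test_a = split_by(trials, "day", held_out={2})

# Out-of-subject validation (whole animals held out).
train_b, test_b = split_by(trials, "animal", held_out={"m3"})

# In the out-of-subject split, no animal contributes to both sets.
print(sorted({t["animal"] for t in train_b}))  # ['m1', 'm2']
print(sorted({t["animal"] for t in test_b}))   # ['m3']
```

Only the second split tests whether a model extrapolates to new animals, which is why it is the stronger form of within-group validation.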
Explainable Artificial Intelligence: A Vision for Machine Learning
With proper validation, machine learning has great promise both within and beyond the hypothesis-driven experimental framework. Moreover, machine learning holds the potential to generate unifying models of brain function and behavior. Maximizing this potential in neuroscience will require a different type of validation approach, one that emphasizes interpretability and generalizability. In other words, do the machine learning-discovered models capture fundamental principles of brain function and reflect causative phenomena that extrapolate across multiple biologically relevant contexts? Explainable artificial intelligence (XAI) emphasizes the development of more interpretable, explainable models and should be the ultimate goal of big-data neuroscience (Figure 3). To achieve XAI, machine learning models must accomplish a broad set of goals. First, models must be based on measurements of brain biology such as cell firing, LFP oscillations, protein levels, blood-oxygen-level-dependent (BOLD) MRI signal, and scalp EEG. Second, models must explain how measured features are organized relative to each other (explainable generative models, which provide a probabilistic description of how the complex observed data can be synthesized from simpler, explainable properties). Explainability, or interpretability, is crucial; while a very complex model might yield more accurate predictions than a simpler model, a high priority for XAI is understanding how the model works and how the variables interact. Third, models must predict specific outcomes, such as behavioral or physiological changes in response to perturbations of the system (predictive models and discriminative, or conditional, models). Fourth, models must extrapolate across multiple subjects (generalizable models). Fifth, models must extrapolate across multiple biological contexts (convergent models). An example of an explainable model is the Krebs cycle, which describes how multiple enzymes and substrates are organized together to generate energy. This model of energy production also extrapolates across individuals, and across many biological models of health and disease.
XAI fits well with the RDoC framework, which strives to integrate many levels of explanation to better understand the basic dimensions of human brain function underlying behavior. An XAI model may, for example, explain how gene expression, cell firing, and/or oscillations across multiple brain regions are organized within a biological network. The dependencies inferred among these neural network components should show predictable changes given an experimental perturbation of gene expression, cellular excitability, or oscillatory synchrony. The model would also explain how this network relates to a behavior, such as sociability, and it would extrapolate across multiple individuals. Finally, this model would explain why normal social behavior is disrupted in seemingly distinct clinical phenomena such as depression and autism. Thus, XAI could comprehensively explain phenomena at multiple levels and their relationships with each other, and demonstrate that this explanation withstands hypothesis-based perturbations and validation.
Developing these explainable models will require big data sets. One potential strategy for acquiring these data sets could involve longitudinal observations in a relatively small number of subjects (i.e., in the hundreds), facilitating many repeated observations of the brain during behavior. The explainable models would then be built using within-subject variance, isolating the relationship between brain function and behavior relative to a drifting biological baseline. The advantage of this approach is that it can initially be implemented by a smaller number of research groups. Another potential strategy could be to collect time-limited data across a much larger number of subjects (i.e., in the tens of thousands to millions). The explainable models would then be built using across-subject variance. This latter approach would likely require that many research groups collaborate in the data collection phase. A critical first step would be to align collection methods, standards, and paradigms across a broad research community, as many within the neuroimaging community are already doing (Gorgolewski et al. 2016; Poldrack et al. 2013).
Since developing these explainable models will ultimately also require observations and hypothesis testing in both human and animal studies, directed efforts to build transdisciplinary teams of neuroscience researchers, clinicians, and data scientists with varying levels of analytical expertise are warranted. These directed efforts may include developing novel funding mechanisms or revising current peer review processes to prioritize grant applications that include both human and animal studies. Nevertheless, it will not be sufficient simply to build teams that include expertise in genetics, cellular/molecular neuroscience, systems neuroscience, cognitive neuroscience, treatment paradigms including pharmacology and neuromodulation, behavior quantification in health and disease, statistics, and machine learning. Rather, a new group of scientists capable of bridging the broad gaps between these disciplines will be needed to realize the promise of XAI. These scientific “integrators” will need expertise in multiple disciplines, enabling them to translate successfully across the intellectual and cultural boundaries that separate fields. These boundaries may exist at the level of logical constructs (such as the difference between frequentist and Bayesian statistics) or at the level of simple language. For example, the word “model” has a different connotation in each of the fields described above.
So where do we find these scientific integrators? Simply put, we must train them. The medical scientist training program, in which students are trained as both basic scientists and clinicians, is a long-standing example of an approach for developing scientific integrators. Along these lines, our nation’s neuroscience leadership has highlighted the important role that psychiatrists cross-trained in engineering and mathematics will play in advancing neuroscience, and grant mechanisms that foster postdoctoral cross-training in neuroscience and data science have recently been promoted through the national BRAIN Initiative (“RFA-MH-17-250: BRAIN Initiative Fellows: Ruth L. Kirschstein National Research Service Award (NRSA) Individual Postdoctoral Fellowship (F32)” n.d.). Finally, we must continue to adapt our neuroscience ecosystem to promote studies that advance XAI. For example, federal agencies can optimize grant review processes both by promoting the broad participation of data scientists and scientific integrators in peer review panels and by educating peer reviewers on the strengths of data-driven science. There is without doubt still room, and a necessary scientific role, for traditional hypothesis-driven experiments, but if we are to expand neuroscience to incorporate big data and XAI, then we must allow for and encourage interdisciplinary, integrative peer review as well.
Machine learning thus holds great promise for advancing the field of neuroscience, not as a replacement for hypothesis-driven research but in conjunction with it. Machine learning tools can bolster large-scale hypothesis generation, and they have the potential to reveal interactions, structure, and mechanisms of brain and behavior. Importantly, given the dangers of spurious findings and of explanations devoid of mechanism, care must be taken to ensure the utility of such an approach. It is with proper replication, validation, and hypothesis-driven confirmation that machine learning analysis approaches will fulfill the great promise they hold, allowing us to make greater strides toward understanding how the brain works.
Figure 1: Model for data-driven science supporting hypothesis-driven science. Within the framework of hypothesis-driven science, machine learning can be used to generate hypotheses to be subsequently tested.
Figure 2: Hold-out-trial versus out-of-sample model validation. Validation is commonly accomplished within animals. For example, a model might be trained on subsets of each animal’s data (top, green) and tested on the remainder of data from the same animal (top, yellow). Here, we propose training on a subset of the animals (bottom, green) and testing on an independent set of animals (bottom, yellow).
Figure 3: Explainable Artificial Intelligence (XAI). Here we present a vision for leveraging machine learning toward developing unified models. The criteria for models achieving XAI are that they must be based on measurable brain biology and be generative, discriminative, generalizable, and convergent.
Acknowledgements: Many of the themes presented in this manuscript reflect areas of consensus that emerged from the National Institute of Mental Health sponsored Explainable Artificial Intelligence meeting on November 10th, 2017 in Washington, DC. This meeting was co-organized by Michele Ferrante, G. Sapiro, and H.S. Mayberg. We would like to thank S. Hakimi, S.D. Mague, R.A. Adcock, N.M. Gallagher, A. Talbot, S. Burwell, and R. Hultman for comments on this manuscript. Our work was supported by NIMH grant R01MH099192-05S2 to KD, NSF GRFP to MTV, and Duke Katherine Goodman Stern Fellowship to MTV.
References

Anderson, David J., and Pietro Perona. 2014. “Toward a Science of Computational Ethology.” Neuron 84 (1):18–31.

Andersson, Jesper L. R., Mark Jenkinson, and Stephen Smith. 2007. “Non-Linear Registration, Aka Spatial Normalisation.” FMRIB Technical Report TR07JA2. https://www.fmrib.ox.ac.uk/datasets/techrep/tr07ja2/tr07ja2.pdf.

Armbruster, Blaine N., Xiang Li, Mark H. Pausch, Stefan Herlitze, and Bryan L. Roth. 2007. “Evolving the Lock to Fit the Key to Create a Family of G Protein-Coupled Receptors Potently Activated by an Inert Ligand.” Proceedings of the National Academy of Sciences of the United States of America 104 (12):5163–68.

Bargmann, C., W. Newsome, A. Anderson, E. Brown, et al. 2014. “BRAIN 2025: A Scientific Vision. Brain Research through Advancing Neurotechnologies (BRAIN) Working Group Report to the Advisory Committee to the ….”

Bellman, Richard. 1957. “A Markovian Decision Process.” Journal of Mathematics and Mechanics 6 (5):679–84.

———. 2010. Dynamic Programming. Princeton, NJ, USA: Princeton University Press.

Box, G. E. P. 1979. “Robustness in the Strategy of Scientific Model Building.” MRC-TSR-1954. Madison, WI: University of Wisconsin–Madison Mathematics Research Center. http://www.dtic.mil/docs/citations/ADA070213.

Boyden, Edward S., Feng Zhang, Ernst Bamberg, Georg Nagel, and Karl Deisseroth. 2005. “Millisecond-Timescale, Genetically Targeted Optical Control of Neural Activity.” Nature Neuroscience 8 (9):1263–68.

Bush, R. R., and F. Mosteller. 1951. “A Mathematical Model for Simple Learning.” Psychological Review 58 (5):313–23.

Calhoun, Vince D., Robyn Miller, Godfrey Pearlson, and Tulay Adalı. 2014. “The Chronnectome: Time-Varying Connectivity Networks as the Next Frontier in fMRI Data Discovery.” Neuron 84 (2):262–74.

Carlson, David, Lisa K. David, Neil M. Gallagher, Mai-Anh T. Vu, Matthew Shirley, Rainbo Hultman, Joyce Wang, et al. 2017. “Dynamically Timed Stimulation of Corticolimbic Circuitry Activates a Stress-Compensatory Pathway.” Biological Psychiatry. https://doi.org/10.1016/j.biopsych.2017.06.008.

Carlson, Thomas, Erin Goddard, David M. Kaplan, Colin Klein, and J. Brendan Ritchie. 2017. “Ghosts in Machine Learning for Cognitive Neuroscience: Moving from Data to Theory.” NeuroImage. https://doi.org/10.1016/j.neuroimage.2017.08.019.

Chang, Luke J., Peter J. Gianaros, Stephen B. Manuck, Anjali Krishnan, and Tor D. Wager. 2015. “A Sensitive and Specific Neural Signature for Picture-Induced Negative Affect.” PLoS Biology 13 (6):e1002180.

Craddock, R. Cameron, G. Andrew James, Paul E. Holtzheimer, Xiaoping P. Hu, and Helen S. Mayberg. 2012. “A Whole Brain fMRI Atlas Generated via Spatially Constrained Spectral Clustering.” Human Brain Mapping 33 (8):1914–28.

Drysdale, Andrew T., Logan Grosenick, Jonathan Downar, Katharine Dunlop, Farrokh Mansouri, Yue Meng, Robert N. Fetcho, et al. 2017. “Resting-State Connectivity Biomarkers Define Neurophysiological Subtypes of Depression.” Nature Medicine 23 (1):28–38.

Fischl, Bruce, André van der Kouwe, Christophe Destrieux, Eric Halgren, Florent Ségonne, David H. Salat, Evelina Busa, et al. 2004. “Automatically Parcellating the Human Cerebral Cortex.” Cerebral Cortex 14 (1):11–22.

“Focus on Big Data.” 2014. Nature Neuroscience 17 (11):1429.

Gershman, Samuel J., and Nathaniel D. Daw. 2017. “Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework.” Annual Review of Psychology 68 (1):101–28.

Gorgolewski, Krzysztof J., Tibor Auer, Vince D. Calhoun, R. Cameron Craddock, Samir Das, Eugene P. Duff, Guillaume Flandin, et al. 2016. “The Brain Imaging Data Structure, a Format for Organizing and Describing Outputs of Neuroimaging Experiments.” Scientific Data 3 (June):160044.

Hong, Weizhe, Ann Kennedy, Xavier P. Burgos-Artizzu, Moriel Zelikowsky, Santiago G. Navonne, Pietro Perona, and David J. Anderson. 2015. “Automated Measurement of Mouse Social Behaviors Using Depth Sensing, Video Tracking, and Machine Learning.” Proceedings of the National Academy of Sciences of the United States of America 112 (38):E5351–60.

Hultman, Rainbo, Stephen D. Mague, Qiang Li, Brittany M. Katz, Nadine Michel, Lizhen Lin, Joyce Wang, et al. 2016. “Dysregulation of Prefrontal Cortex-Mediated Slow-Evolving Limbic Dynamics Drives Stress-Induced Emotional Pathology.” Neuron 91 (2):439–52.

Jain, Viren, H. Sebastian Seung, and Srinivas C. Turaga. 2010. “Machines That Learn to Segment Images: A Crucial Technology for Connectomics.” Current Opinion in Neurobiology 20 (5):653–66.

Jenkinson, Mark, Peter Bannister, Michael Brady, and Stephen Smith. 2002. “Improved Optimization for the Robust and Accurate Linear Registration and Motion Correction of Brain Images.” NeuroImage 17 (2):825–41.

Jenkinson, Mark, and Stephen Smith. 2001. “A Global Optimisation Method for Robust Affine Registration of Brain Images.” Medical Image Analysis 5 (2):143–56.

Jung, Tzyy-Ping, Scott Makeig, Martin J. McKeown, Anthony J. Bell, Te-Won Lee, and Terrence J. Sejnowski. 2001. “Imaging Brain Dynamics Using Independent Component Analysis.” Proceedings of the IEEE 89 (7):1107–22.

Larrañaga, Pedro, Borja Calvo, Roberto Santana, Concha Bielza, Josu Galdiano, Iñaki Inza, José A. Lozano, et al. 2006. “Machine Learning in Bioinformatics.” Briefings in Bioinformatics 7 (1):86–112.

Lemm, Steven, Benjamin Blankertz, Thorsten Dickhaus, and Klaus-Robert Müller. 2011. “Introduction to Machine Learning for Brain Imaging.” NeuroImage 56 (2):387–99.

Levin-Schwartz, Yuri, Vince D. Calhoun, and Tulay Adali. 2017. “Quantifying the Interaction and Contribution of Multiple Datasets in Fusion: Application to the Detection of Schizophrenia.” IEEE Transactions on Medical Imaging 36 (7):1385–95.

Libbrecht, Maxwell W., and William Stafford Noble. 2015. “Machine Learning Applications in Genetics and Genomics.” Nature Reviews Genetics 16 (6):321–32.

Mackintosh, N. J. 1975. “A Theory of Attention: Variations in the Associability of Stimuli with Reinforcement.” Psychological Review 82 (4):276–98.

Marblestone, Adam H., Greg Wayne, and Konrad P. Kording. 2016. “Toward an Integration of Deep Learning and Neuroscience.” Frontiers in Computational Neuroscience 10. https://doi.org/10.3389/fncom.2016.00094.

McCulloch, W. S., and W. Pitts. 1990. “A Logical Calculus of the Ideas Immanent in Nervous Activity. 1943.” Bulletin of Mathematical Biology 52 (1–2):99–115; discussion 73–97.

Niv, Yael. 2009. “Reinforcement Learning in the Brain.” Journal of Mathematical Psychology 53 (3):139–54.

O’Donnell, Lauren J., Yannick Suter, Laura Rigolo, Pegah Kahali, Fan Zhang, Isaiah Norton, Angela Albi, et al. 2017. “Automated White Matter Fiber Tract Identification in Patients with Brain Tumors.” NeuroImage: Clinical 13:138–53.

Paul, Anirban, Megan Crow, Ricardo Raudales, Miao He, Jesse Gillis, and Z. Josh Huang. 2017. “Transcriptional Architecture of Synaptic Communication Delineates GABAergic Neuron Identity.” Cell 171 (3):522–39.e20.

Pearce, J. M., and G. Hall. 1980. “A Model for Pavlovian Learning: Variations in the Effectiveness of Conditioned but Not of Unconditioned Stimuli.” Psychological Review 87 (6):532–52.

Philip, Noah S., Brent G. Nelson, Flavio Frohlich, Kelvin O. Lim, Alik S. Widge, and Linda L. Carpenter. 2017. “Low-Intensity Transcranial Current Stimulation in Psychiatry.” The American Journal of Psychiatry 174 (7):628–39.

Poldrack, Russell A., Deanna M. Barch, Jason P. Mitchell, Tor D. Wager, Anthony D. Wagner, Joseph T. Devlin, Chad Cumba, Oluwasanmi Koyejo, and Michael P. Milham. 2013. “Toward Open Sharing of Task-Based fMRI Data: The OpenfMRI Project.” Frontiers in Neuroinformatics 7 (July):12.

Poldrack, Russell A., and Krzysztof J. Gorgolewski. 2014. “Making Big Data Open: Data Sharing in Neuroimaging.” Nature Neuroscience 17 (11):1510–17.

Rescorla, R. A., and Allan Wagner. 1972. A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Nonreinforcement. Vol. 2.

“RFA-MH-17-250: BRAIN Initiative Fellows: Ruth L. Kirschstein National Research Service Award (NRSA) Individual Postdoctoral Fellowship (F32).” n.d. Accessed November 24, 2017. https://grants.nih.gov/grants/guide/rfa-files/RFA-MH-17-250.html.

Samuel, A. L. 1959. “Some Studies in Machine Learning Using the Game of Checkers.” IBM Journal of Research and Development 3 (3):210–29.

Schuck, Nicolas W., Robert Gaschler, Dorit Wenke, Jakob Heinzle, Peter A. Frensch, John-Dylan Haynes, and Carlo Reverberi. 2015. “Medial Prefrontal Cortex Predicts Internally Driven Strategy Shifts.” Neuron 86 (1):331–40.

Schultz, W., P. Dayan, and P. R. Montague. 1997. “A Neural Substrate of Prediction and Reward.” Science 275 (5306):1593–99.

Sejnowski, Terrence J., Patricia S. Churchland, and J. Anthony Movshon. 2014. “Putting Big Data to Good Use in Neuroscience.” Nature Neuroscience 17 (11):1440–41.

Smart, Otis Lkuwamy, Vineet Ravi Tiruvadi, and Helen S. Mayberg. 2015. “Multimodal Approaches to Define Network Oscillations in Depression.” Biological Psychiatry 77 (12):1061–70.

Spellman, Timothy, Mattia Rigotti, Susanne E. Ahmari, Stefano Fusi, Joseph A. Gogos, and Joshua A. Gordon. 2015. “Hippocampal-Prefrontal Input Supports Spatial Encoding in Working Memory.” Nature 522 (7556):309–14.

Sutton, Richard S., and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1. Cambridge, MA: MIT Press.

Thomas, Christopher G., Richard A. Harshman, and Ravi S. Menon. 2002. “Noise Reduction in BOLD-Based fMRI Using Component Analysis.” NeuroImage 17 (3):1521–37.

Wegrzyn, Martin, Isabelle Bruckhaus, and Johanna Kissler. 2015. “Categorical Perception of Fear and Anger Expressions in Whole, Masked and Composite Faces.” PloS One 10 (8):e0134790.

Whitmore, Nathan W., and Shih-Chieh Lin. 2016. “Unmasking Local Activity within Local Field Potentials (LFPs) by Removing Distal Electrical Signals Using Independent Component Analysis.” NeuroImage 132 (May):79–92.

Widge, Alik S., Kristen K. Ellard, Angelique C. Paulk, Ishita Basu, Ali Yousefi, Samuel Zorowitz, Anna Gilmour, et al. 2017. “Treating Refractory Mental Illness with Closed-Loop Brain Stimulation: Progress towards a Patient-Specific Transdiagnostic Approach.” Experimental Neurology 287 (Pt 4):461–72.

Woo, Choong-Wan, Luke J. Chang, Martin A. Lindquist, and Tor D. Wager. 2017. “Building Better Biomarkers: Brain Models in Translational Neuroimaging.” Nature Neuroscience 20 (3):365–77.

Zuo, Xi-Nian, Clare Kelly, Jonathan S. Adelstein, Donald F. Klein, F. Xavier Castellanos, and Michael P. Milham. 2010. “Reliable Intrinsic Connectivity Networks: Test-Retest Evaluation Using ICA and Dual Regression Approach.” NeuroImage 49 (3):2163–77.
[Figure 1 graphic: cycle of previous research, hypothesis generation, performing experiments, analyzing results, and drawing conclusions, with data-driven machine learning feeding hypothesis generation.]

[Figure 2 graphic: two 12-animal × 12-trial grids contrasting hold-out-trial splits (train and test within each animal) with out-of-sample splits (train on some animals, test on held-out animals).]

[Figure 3 graphic: flowchart from previous research through hypothesis-driven experiments and data-driven machine learning, feature selection on direct brain measurements, and model development, to XAI; candidate models are checked for being descriptive (explain feature organization), predictive (predict system perturbations), generalizable (extrapolate across multiple subjects), and convergent (extrapolate across multiple biological contexts).]