Accepted manuscripts are peer-reviewed but have not been through the copyediting, formatting, or proofreading process.
Copyright © 2018 the authors
This Accepted Manuscript has not been copyedited and formatted. The final version may differ from this version.
TechSights
A Shared Vision for Machine Learning in Neuroscience
Mai-Anh T. Vu1, Tulay Adali7, Demba Ba8, Gyorgy Buzsaki9, David Carlson3, Katherine Heller4, Conor
Liston10, Cynthia Rudin5,6, Vikaas Sohal11, Alik S. Widge12, Helen S. Mayberg13, Guillermo Sapiro5 and
Kafui Dzirasa1,2
1Dept. of Neurobiology, 2Dept. of Psychiatry and Behavioral Sciences, 3Dept. of Civil and Environmental Engineering, 4Dept. of Statistical Sciences, 5Dept. of Electrical and Computer Engineering, 6Dept. of Computer Science; Duke University, Durham, North Carolina 27710, USA. 7Dept. of Computer Science and Electrical Engineering; University of Maryland Baltimore County, Baltimore, Maryland 21250. 8Dept. of Electrical Engineering and Bioengineering; Harvard University, Cambridge, MA 02138. 9Dept. of Neuroscience; NYU School of Medicine, New York, NY 10016. 10Feil Family Brain and Mind Research Institute, Weill Cornell Medical College, New York, NY 10065. 11Dept. of Psychiatry, University of California San Francisco, San Francisco, CA 94158. 12Dept. of Psychiatry, Massachusetts General Hospital, Charlestown, MA 02129. 13Dept. of Psychiatry, Neurology, and Radiology; Emory University, Atlanta, GA 30322.
DOI: 10.1523/JNEUROSCI.0508-17.2018
Received: 25 September 2017
Revised: 2 January 2018
Accepted: 9 January 2018
Published: 26 January 2018
Conflict of Interest: The authors declare no competing financial interests.
Many of the themes presented in this manuscript reflect areas of consensus that emerged from the National Institute of Mental Health-sponsored Explainable Artificial Intelligence meeting on November 10th, 2017 in Washington, DC. This meeting was co-organized by Michele Ferrante, G. Sapiro, and H.S. Mayberg. We would like to thank S. Hakimi, S.D. Mague, R.A. Adcock, N.M. Gallagher, A. Talbot, S. Burwell, and R. Hultman for comments on this manuscript. Our work was supported by NIMH grant R01MH099192-05S2 to KD, an NSF GRFP to MTV, and a Duke Katherine Goodman Stern Fellowship to MTV.
Correspondence should be sent to: Kafui Dzirasa, M.D., Ph.D., Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, 421 Bryan Research Building, Box 3209, Durham, NC 27710, USA, Email: [email protected], Twitter: @KafuiDzirasa, or Mai-Anh Vu, Department of Neurobiology, Duke University Medical Center, 421 Bryan Research Building, Box 3209, Durham, NC 27710, USA, Email: [email protected], Twitter: @MaiAnh_T
Cite as: J. Neurosci; 10.1523/JNEUROSCI.0508-17.2018
Abstract
With ever-increasing advancements in technology, neuroscientists are able to collect data in greater volumes and with finer resolution. The bottleneck in understanding how the brain works is consequently shifting away from the amount and type of data we can collect and toward what we actually do with the data. There has been a growing interest in leveraging this vast volume of data across levels of analysis, measurement techniques, and experimental paradigms to gain more insight into brain function. Such efforts are visible at an international scale, with the emergence of big data neuroscience initiatives such as the BRAIN Initiative (Bargmann et al. 2014), the Human Brain Project, the Human Connectome Project, and the National Institute of Mental Health's (NIMH) Research Domain Criteria (RDoC) initiative. With these large-scale projects, much thought has been given to data-sharing across groups (Poldrack and Gorgolewski 2014; Sejnowski, Churchland, and Movshon 2014); however, even with such data-sharing initiatives, funding mechanisms, and infrastructure, there still exists the challenge of how to cohesively integrate all the data. At multiple stages and levels of neuroscience investigation, machine learning holds great promise as an addition to the arsenal of analysis tools for discovering how the brain works.
Main Text

What is Machine Learning?
The term machine learning was coined by Arthur Samuel in 1959 to describe the subfield of computer science that involves the "programming of a digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning" (Samuel 1959). In other words, the field investigates how computers can improve predictions, actions, decisions, or perceptions based on data and ongoing experience. The field of machine learning was driven by the development of algorithms for pattern recognition (for example, an algorithm for filtering out unwanted marketing emails) and, in general, investigates the development of algorithms that can learn from and make predictions on data. These algorithms largely fall into a few dominant categories: supervised machine learning, unsupervised machine learning, and reinforcement learning.
In supervised machine learning, the input data, or training data, have known labels, commonly supplied by human experts. The goal is to learn the relationship between the data and the labels such that the computer can predict the label of a previously unseen data item with accuracy comparable to the human expert. For instance, a training data set could consist of a set of emails that are already classified as spam or not spam, and the goal of the computer algorithm is to settle on a model that can accurately label new incoming email as "spam" or "not spam". As another example, in a functional magnetic resonance imaging (fMRI) study, Schuck et al. used a supervised machine learning classifier to classify the color of the stimuli seen by the subjects based on local fMRI brain activity (Schuck et al. 2015). Examining the classifier accuracy over time and in different brain regions allowed them to infer where and when color was represented in the brain. Regression models, which learn relationships among variables, also fall into the category of supervised machine learning.
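This supervised workflow — fit a model on expert-labeled examples, then predict labels for unseen items — can be sketched with a toy nearest-centroid classifier. The two "email features" below are hypothetical and chosen purely for illustration; this is not the classifier used in the studies cited above.

```python
# Toy supervised learning: nearest-centroid classification.
# Each email is summarized by two hypothetical features:
# [count of marketing keywords, count of known contacts mentioned].

def fit_centroids(X, y):
    """Average the feature vectors within each label to get one centroid per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Labeled training data supplied by a "human expert".
X_train = [[8, 0], [6, 1], [7, 0],   # spam-like
           [1, 4], [0, 5], [2, 3]]   # not-spam-like
y_train = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

model = fit_centroids(X_train, y_train)
print(predict(model, [7, 1]))  # label for a previously unseen, spam-like email
```

The same structure underlies fMRI decoding: replace the email features with voxel activity and the spam label with the stimulus color.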
Reinforcement learning is a branch of supervised machine learning that has inspired, and has been inspired by, behaviorist psychology. The "classes" to be learned are actions that could be taken in response to a data item. Machines are trained to make decisions through a dynamic trial-and-error process to maximize a desired outcome. The human expert no longer labels each item with the desired class (action), but instead creates a "scoring function" that tells the algorithm how good its move was. For example, a machine might have the goal of winning checkers games, and learn to select moves based on past interactions to maximize the chance of winning the checkers match (Sutton and Barto 1998). In a typical situation, a scoring function only provides a reward or information based upon the outcome of a complete task after several actions (i.e., in the checkers match, only after a win/loss is achieved; in a brain-computer interface task, only when the objective is successfully obtained).
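The trial-and-error loop — act, receive a score, update value estimates — can be sketched with tabular Q-learning on a hypothetical two-action task. The scoring function below is made up for illustration; real tasks (like checkers) deliver much sparser, delayed rewards.

```python
import random

random.seed(0)

# Hypothetical task: two possible actions; the scoring function silently
# rewards action 0 and not action 1. The learner discovers this by trial and error.
def score(action):
    return 1.0 if action == 0 else 0.0

Q = [0.0, 0.0]   # learned value estimate for each action
alpha = 0.1      # learning rate
epsilon = 0.2    # exploration probability

for trial in range(500):
    # Epsilon-greedy: mostly exploit the current best guess, sometimes explore.
    if random.random() < epsilon:
        action = random.choice([0, 1])
    else:
        action = max((0, 1), key=lambda a: Q[a])
    reward = score(action)
    # Nudge the estimate toward the observed score (trial-and-error update).
    Q[action] += alpha * (reward - Q[action])

print(Q)  # the machine has learned that action 0 scores higher
```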
In unsupervised machine learning, on the other hand, the training data have no labels. The goal is to discover hidden structure in the data, perhaps by taking advantage of similarity or redundancy. A well-known example is principal component analysis (PCA), a statistical dimension reduction technique that exploits redundancy in the data using only second-order statistics. For unsupervised machine learning, input data might be composed of the symptom profiles of patients believed to have the same general neuropsychiatric illness but also to comprise meaningful heterogeneity. In this case, the goal of the model would be to group similar patients together, thus uncovering important structure within this diagnosis. This would be an example of a family of algorithms aimed at clustering. Critically, a major challenge with unsupervised machine learning algorithms for clustering or dimensionality reduction is understanding the features that make up the groups or reduced dimensions and transforming them into testable scientific hypotheses. For example, Drysdale et al. used machine learning to discover subtypes of depression based on fMRI functional connectivity, and then validated their findings by testing the follow-up hypothesis that a treatment modulating cortical connectivity would yield different outcomes among these subgroups (Drysdale et al. 2017).
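Clustering of this kind can be sketched with a minimal k-means loop on hypothetical symptom-severity profiles. The two symptom scores per patient are invented for illustration; this is not the Drysdale et al. data or method.

```python
# Minimal k-means clustering of hypothetical patient symptom profiles.
# Each patient: [symptom A score, symptom B score] (made-up numbers).
patients = [[8, 1], [9, 2], [7, 1],   # one apparent subtype
            [2, 7], [1, 9], [2, 8]]   # another apparent subtype

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
    return [nearest(p) for p in points]

def update(points, labels, k):
    """Recompute each centroid as the mean of its assigned points."""
    new = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        new.append([sum(col) / len(members) for col in zip(*members)])
    return new

# Deterministic initialization from two of the patients.
centroids = [patients[0], patients[3]]
for _ in range(10):   # alternate assignment and update until stable
    labels = assign(patients, centroids)
    centroids = update(patients, labels, k=2)

print(labels)  # no labels were supplied, yet two patient groups emerge
```

The hard scientific work, as noted above, is interpreting what the discovered groups mean and turning them into testable hypotheses.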
These categories of machine learning differ in their inputs, outputs, and objectives, and thus encompass a powerful set of tools (e.g., classification, regression, clustering) that enable us to refine the ways we make predictions, make decisions, or discover structure in large sets of data. While machine learning has long been applied to the field of computational and theoretical neuroscience, its burgeoning role in broader cellular, systems, and cognitive neuroscience has been more recent, especially as statistical machine learning packages are made available in standard analysis software. Accordingly, questions arise as to its place in empirical research ("Focus on Big Data" 2014). Data-driven machine learning approaches are often pitted against the more traditional hypothesis-driven approach, in which an experiment is undertaken to mathematically evaluate the plausibility of a concrete, falsifiable proposed explanation or model, given the observed data ("Focus on Big Data" 2014). The question, however, should not be whether one approach is better than the other, but rather how and when we can take advantage of these two complementary strategies.
Machine Learning within the Hypothesis-Driven Framework
Despite the distinction between data-driven and hypothesis-driven approaches, there are already many applications of machine learning within the hypothesis-driven framework. Many of these serve as ways to save time and effort, mitigate human biases, or make large datasets computationally tractable. For example, in magnetic resonance imaging (MRI), image-processing algorithms allow for automated alignment of MRI scans from individual people to atlases (Andersson, Jenkinson, and Smith 2007; Jenkinson et al. 2002; Jenkinson and Smith 2001). This registration of individual scans to a unified atlas makes group-level spatial analyses and inferences possible. Other algorithms accomplish automated image segmentation, allowing for applications such as parcellation of structural MRI scans into labeled regions (Fischl et al. 2004), identification of white matter tracts from diffusion MRI (O'Donnell et al. 2017), or identification of neuronal structural boundaries in electron microscopy (EM) images (Jain, Seung, and Turaga 2010). Image-processing algorithms can further be applied to video recordings to automate the measurement, identification, and categorization of animal behaviors (Anderson and Perona 2014; Hong et al. 2015). Such tasks are otherwise accomplished manually, and the development of these technologies immensely reduces the required human time and effort, in turn enabling higher-throughput analysis pipelines. Beyond the added efficiency, automated processes leave far less room for human subjectivity, bias, or error in the coding of images or behaviors. Applied correctly, this can make results more objective, consistent, and reproducible.
In addition to these algorithms that aid in data processing, machine learning techniques can also serve within the hypothesis-driven framework as a strategy for hypothesis testing. For example, in a study on hippocampal-prefrontal input and spatial working memory encoding, Spellman et al. optogenetically inhibited the ventral hippocampus (vHPC) projection to the medial prefrontal cortex (mPFC) during a spatial working memory task. From behavioral results, they drew the conclusion that input from vHPC to mPFC is critical for spatial cue encoding (Spellman et al. 2015). To further test this hypothesis, they trained a classifier on the mPFC population firing rate to decode the spatial location of the animal's goal, and separately to decode the task phase. This classifier approach allowed them to quantify the reliability and strength of these mPFC neural representations. They were then able to show statistically that inhibition of vHPC-to-mPFC signaling decreased classifier accuracy for spatial goal location but not for task phase. In other words, the ability to classify an outcome from mPFC activity became a measure of how well that outcome was encoded in mPFC. This machine learning approach thus allowed them to test their hypothesis that these projections specifically support working memory encoding of space, and not other task-relevant features. As another example, Paul et al. used supervised clustering to analyze single-cell transcriptomes of a set of previously anatomically and physiologically characterized cortical GABAergic neurons, and discovered that these categories in fact differ by a transcriptional architecture that encodes their synaptic communication patterns (Paul et al. 2017), confirming the subcategorization of these neurons. Thus, there are already many applications of machine learning that bolster hypothesis-driven research, by automating aspects of data processing or by yielding additional strategies for hypothesis testing.
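The decoding-as-hypothesis-test logic above can be sketched as follows: compute decoding accuracy from neural data, then ask whether it exceeds what label shuffling would produce by chance. The firing rates and threshold decoder below are toy inventions, not the Spellman et al. recordings or classifier.

```python
import random

random.seed(1)

def accuracy(rates, labels, threshold=0.5):
    """Decode a binary outcome from a summary firing rate via a threshold."""
    preds = [1 if r > threshold else 0 for r in rates]
    return sum(p == lab for p, lab in zip(preds, labels)) / len(labels)

# Hypothetical per-trial summary firing rates and the true binary outcomes.
rates  = [0.2, 0.3, 0.1, 0.4, 0.8, 0.9, 0.7, 0.6]
labels = [0,   0,   0,   0,   1,   1,   1,   1]

observed = accuracy(rates, labels)

# Permutation test: shuffling labels destroys any real rate-outcome link,
# so the shuffled accuracies form a chance (null) distribution.
null = []
for _ in range(1000):
    shuffled = labels[:]
    random.shuffle(shuffled)
    null.append(accuracy(rates, shuffled))

p_value = sum(a >= observed for a in null) / len(null)
print(observed, p_value)
```

A drop in such decoding accuracy under a manipulation (as in the vHPC inhibition experiment) then becomes a quantitative, testable readout of how well the outcome was encoded.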
Machine Learning Beyond the Hypothesis-Driven Framework
Machine learning has applications well beyond the hypothesis-driven framework. The more exploratory data-driven approach allows us to examine data in a way that is less limited by our hypothesis space. After all, experiments are only as useful as the hypotheses they are designed to test, and exhaustive hypothesis testing over a drastically expanding dimensionality is intractable. To this end, machine learning methods allow us to extract from our data the dimensions that explain the most variance, or even to learn a data-driven taxonomy. For example, data-driven video analysis of behavior may not only serve as an automated replacement for human behavioral coding, but, if an unsupervised approach is used, may even generate new behavioral classifications, unlimited by a priori human behavioral categories or labels (Anderson and Perona 2014). In fMRI, data-driven algorithms have been used to parcellate the brain based on functional connectivity data, yielding a functionally relevant fMRI atlas free of the constraints of a priori brain parcellations and labels (Craddock et al. 2012). In another fMRI study, Chang et al. used machine learning to identify a sensitive and specific neural signature of affective responses to aversive images that was unresponsive to physical pain, allowing them to infer neural components differentiating negative emotion from pain, "providing a basis for new, brain-based taxonomies of affective processes" (Chang et al. 2015). Along these lines, independent component analysis (ICA) and classification algorithms have been used to infer neural networks, decode brain states, or separate noise from signal (Calhoun et al. 2014; Jung et al. 2001; Lemm et al. 2011; Thomas, Harshman, and Menon 2002; Whitmore and Lin 2016; Zuo et al. 2010). Such strategies for capitalizing on the high-dimensionality and multivariate nature of data are certainly not unique to neuroscience; the entire field of bioinformatics has emerged from this idea and is deeply rooted in machine learning (Larrañaga et al. 2006; Libbrecht and Noble 2015).
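"Extracting the dimensions that explain the most variance" is what PCA does. A plain-Python sketch of the idea, using power iteration on the covariance matrix of made-up two-dimensional data (real neural data would have thousands of dimensions, and one would use a linear-algebra library):

```python
# Sketch: find the direction of maximal variance (the first principal
# component) by power iteration on the covariance matrix. Toy 2-D data
# in which the two measured variables are largely redundant (y tracks x).

data = [[2.0, 1.9], [1.0, 1.1], [3.0, 3.2], [0.0, -0.1], [4.0, 3.9]]

# Center the data.
means = [sum(col) / len(data) for col in zip(*data)]
centered = [[x - m for x, m in zip(row, means)] for row in data]

# 2x2 sample covariance matrix.
n = len(centered)
cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(2)]
       for i in range(2)]

# Power iteration converges to the leading eigenvector (top component).
v = [1.0, 0.0]
for _ in range(100):
    w = [cov[0][0] * v[0] + cov[0][1] * v[1],
         cov[1][0] * v[0] + cov[1][1] * v[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    v = [w[0] / norm, w[1] / norm]

print(v)  # close to [0.707, 0.707]: one shared "diagonal" dimension
```

The recovered component exposes the redundancy in the two variables — the data-driven analogue of discovering that two measurements reflect one underlying process.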
By revealing structure in the data, such machine learning approaches may yield new, testable hypotheses. In a study examining the neural mechanisms underlying stress-induced behavioral adaptation, Carlson et al. recorded cellular activity and local field potentials from multiple brain regions. Using a supervised machine-learning approach, they found that cellular activity in two of the recorded brain regions, infralimbic cortex and medial dorsal thalamus, showed adaptation across repeated exposure to stress (D. Carlson et al. 2017). This led to a series of follow-up experiments to investigate whether and how these two regions were connected. The two regions were found to be functionally connected via cross-frequency phase coupling, which was further confirmed via an optogenetic manipulation experiment. In this case, machine learning revealed potential predictors, generated a relevant hypothesis space, and helped us to design a (successful) confirmatory experiment.
While certainly powerful as a brute-force approach to hypothesis generation, data-driven machine learning approaches may also yield more direct insight into brain function. After all, the problems solved by the brain have strong parallels to the problems solved by machine learning. For example, we as humans must be able to make sense of all the multisensory information coming into the brain to make relevant inferences, like whether a person we are talking to may be angry (Wegrzyn, Bruckhaus, and Kissler 2015). Similarly, machine learning algorithms must learn structure from large multidimensional data, and moreover, it can be shown that allowing data from multiple modalities to fully interact and inform each other leads to more powerful biomarkers (Levin-Schwartz, Calhoun, and Adali 2017). Such parallels have made for incredibly powerful cross-pollination between machine learning and neuroscience. For example, reinforcement learning first emerged in computer science in the 1950s (Bellman 1957, 2010; Sutton and Barto 1998). Quantitative models of reinforcement learning emerged to describe error-based learning (Bush and Mosteller 1951; Mackintosh 1975; Pearce and Hall 1980; Rescorla and Wagner 1972). Then, in a seminal paper, Schultz et al. showed neuronal evidence for reward prediction error, a key element of reinforcement learning, in the brain (Schultz, Dayan, and Montague 1997), invigorating a whole branch of neuroscience (Gershman and Daw 2017; Niv 2009). Another example is that of deep learning, a family of machine learning algorithms aimed at learning data representations; early artificial neural networks were inspired by neurobiology (McCulloch and Pitts 1990). The deep learning field continued to advance, and developments made on the computational front have now inspired hypotheses on the neuroscience front (Marblestone, Wayne, and Kording 2016). Thus, because of the similarity of the problems being solved by machine learning algorithms and the brain, statistical and computational developments can inform neuroscience and yield new theories of brain function.
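The error-based learning these models describe can be illustrated with a Rescorla-Wagner-style update, in which the change in associative strength is proportional to the reward prediction error. This is the standard textbook form simplified to a single cue; the trial numbers and learning rate are arbitrary.

```python
# Rescorla-Wagner-style error-driven learning for a single cue.
# V is the learned prediction of reward; each trial moves V toward the
# actual reward r by a fraction (alpha) of the prediction error (r - V).

def train(rewards, alpha=0.2):
    V = 0.0
    errors = []
    for r in rewards:
        delta = r - V        # reward prediction error
        V += alpha * delta   # error-proportional update
        errors.append(delta)
    return V, errors

# 30 trials in which the cue is always followed by reward.
V, errors = train([1.0] * 30)
print(round(V, 3))                                # prediction approaches 1.0
print(round(errors[0], 3), round(errors[-1], 3))  # error shrinks as learning proceeds
```

The shrinking prediction error is exactly the quantity for which Schultz et al. reported a dopaminergic neural correlate.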
Validation of Machine Learning Results
A common criticism of data-driven approaches is that they can be void of mechanism and thus can limit inferences and interpretation (T. Carlson et al. 2017). To return to the study on neural adaptation to repeated stress exposure (D. Carlson et al. 2017), the discovery that cell firing in the infralimbic cortex and medial dorsal thalamus was related to the behavior provided little insight into the mechanism per se. What were the cells doing? How were the regions connected? In which direction did the information flow? Did the behavior drive the activity, or vice versa? Only through our follow-up experiments was it determined that the two regions were functionally connected through cross-frequency phase coupling. Further optogenetic manipulation experiments revealed that the changes observed in this circuitry were part of a compensatory mechanism in response to repeated stress exposure.
As illustrated, one way to overcome the limitations of each of these approaches is to design experiments that leverage the advantages of both (Figure 1). A scientific study could be divided into two phases. The first phase would be aimed at exploration, discovery, and ultimately hypothesis generation. For instance, in an animal study, initial experiments would focus on collecting large, broad data sets, such as cellular activity, LFPs, motion, and behavior, from a mouse as it undergoes a contextual manipulation. Machine-learning approaches could identify relationships among the dimensions in ways that relate to the mouse's physiology, behavior, and context, yielding specific hypotheses regarding these relationships. The second phase of the experiment would then be designed to test these concrete hypotheses, using a variety of techniques for biological manipulation. Viral strategies enable us to generate mice with specific and conditional genetic mutations. Technologies such as optogenetics and DREADDs (designer receptors exclusively activated by designer drugs) allow us to manipulate the activity of specific cell types, and in the case of optogenetics, with precise timing and frequency (Armbruster et al. 2007; Boyden et al. 2005). Such technologies allow us to test more refined hypotheses and discover more specific mechanisms underlying brain function, as a recent multisite in vivo recording study showed by linking large-scale neural dynamics to stress-related behavioral dysfunction (Hultman et al. 2016). In human subjects, a similar chain of events could be carried out. From machine learning on a high-dimensional fMRI or electroencephalography (EEG) dataset, one could generate hypotheses about functional encoding of behavior, then manipulate the putative encoding through focused brain stimulation. Invasive and non-invasive stimulation techniques are far from the precision of optogenetics, but there is emerging evidence that they can be used to change specific features of brain activity (Philip et al. 2017; Smart, Tiruvadi, and Mayberg 2015; Widge et al. 2017).
Regardless of whether we use these machine learning approaches to generate hypotheses or to learn something about the computations being carried out in the brain, our success will depend heavily on the model space we choose. A common aphorism in statistics states, "All models are wrong, but some are useful" (Box 1979). In other words, no model we come up with can fully describe what is happening in a given system; however, a model can nonetheless be useful, capturing some amount of truth or making reliable predictions. So how do we know which models are useful? Validation is key. For example, while Drysdale et al. (Drysdale et al. 2017) were able to determine patterns of neural activity that clustered with behavioral symptoms and treatment responses in a cohort of depressed subjects, the scientific and potential clinical value of their finding would have been dramatically diminished if their model failed to predict which treatments a new patient with depression would respond to. Simply put, we can find regularities in nearly any data set, but the real test is whether those regularities or predictions hold up in additional (non-overlapping) data sets handled in the same manner (Woo et al. 2017). Generally, this level of validation is performed within research groups, where model predictions generated on a subset of data acquired during an experiment are validated on another subset of data that was held out during the model generation stage. For example, cellular firing data acquired for animals during a selected subset of experimental days may be used to develop the machine learning models, while data from other days may be used for model validation (Figure 2, top). The limitation of this strategy is that while the model solution may apply to the specific group of animals used for experimental testing, it may not extrapolate to new animals. Model validation is therefore stronger when performed out-of-subject. In this scenario, the model would be developed using all of the trials from a specific set of experimental animals, then validated using another set of experimental animals (Figure 2, bottom). The next level of validation is determining whether similar findings will hold up across research groups. In fact, in their study, Drysdale et al. replicated their initial findings from their unsupervised clustering analysis in a completely separate data set, adding confidence that these findings are robust (Drysdale et al. 2017). This level of validation has always been the gold standard for both data- and hypothesis-driven scientific discovery.
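The two hold-out schemes described above differ only in what is held out: experimental days within the same animals, versus whole animals. A minimal sketch of that grouping logic, with hypothetical trial records tagged by animal and day:

```python
# Sketch of two hold-out schemes for model validation, using hypothetical
# trial records tagged with the animal and experimental day they came from.

trials = [
    {"animal": "m1", "day": 1}, {"animal": "m1", "day": 2},
    {"animal": "m2", "day": 1}, {"animal": "m2", "day": 2},
    {"animal": "m3", "day": 1}, {"animal": "m3", "day": 2},
]

def split_by(trials, key, held_out):
    """Train on records whose `key` is not in `held_out`; test on the rest."""
    train = [t for t in trials if t[key] not in held_out]
    test = [t for t in trials if t[key] in held_out]
    return train, test

# Within-subject validation (days held out, all animals seen in training).
train_a, test_a = split_by(trials, "day", held_out={2})

# Out-of-subject validation (whole animals held out).
train_b, test_b = split_by(trials, "animal", held_out={"m3"})

# In the out-of-subject split, no animal contributes to both sets.
print(sorted({t["animal"] for t in train_b}))  # ['m1', 'm2']
print(sorted({t["animal"] for t in test_b}))   # ['m3']
```

Only the second split tests whether a model extrapolates to new animals, which is why it is the stronger form of within-group validation.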
Explainable Artificial Intelligence: A Vision for Machine Learning
With proper validation, machine learning has great promise both within and beyond the hypothesis-driven experimental framework. Moreover, machine learning holds the potential to generate unifying models of brain function and behavior. Maximizing this potential in neuroscience will require a different type of validation approach, one that emphasizes interpretability and generalizability. In other words, do the machine learning-discovered models capture fundamental principles of brain function and reflect causative phenomena that extrapolate across multiple biologically relevant contexts? Explainable artificial intelligence (XAI) emphasizes the development of more interpretable, explainable models and should be the ultimate goal of big-data neuroscience (Figure 3). To achieve XAI, machine learning models must accomplish a broad set of goals. First, models must be based on measurements of brain biology such as cell firing, LFP oscillations, protein levels, blood-oxygen-level-dependent (BOLD) MRI signal, and scalp EEG. Second, models must explain how measured features are organized relative to each other (explainable generative models, which provide a probabilistic description of how the complex observed data can be synthesized from simpler, explainable properties). Explainability, or interpretability, is crucial; while a very complex model might yield more accurate predictions than a simpler model, a high priority for XAI is understanding how the model works and how the variables interact. Third, models must predict specific outcomes, such as behavioral or physiological changes in response to perturbations of the system (predictive models and discriminative, or conditional, models). Fourth, models must extrapolate across multiple subjects (generalizable models). Fifth, models must extrapolate across multiple biological contexts (convergent models). An example of an explainable model is the Krebs cycle, which describes how multiple enzymes and substrates are organized together to generate energy. This model of energy production also extrapolates across individuals, and across many biological models of health and disease.
XAI fits well with the RDoC framework, which strives to integrate many levels of explanation to better understand the basic dimensions of human brain function underlying behavior. An XAI model may, for example, explain how gene expression, cell firing, and/or oscillations across multiple brain regions are organized within a biological network. The dependencies inferred among these neural network components should show predictable changes given an experimental perturbation of gene expression, cellular excitability, or oscillatory synchrony. The model would also explain how this network relates to a behavior, such as sociability, and it would extrapolate across multiple individuals. Finally, this model would explain why normal social behavior is disrupted in seemingly distinct clinical phenomena such as depression and autism. Thus, XAI could comprehensively explain phenomena at multiple levels and their relationships with each other, and demonstrate that this explanation withstands hypothesis-based perturbations and validation.
Developing these explainable models will require big data sets. One potential strategy for acquiring these data sets could involve longitudinal observations in a relatively small number of subjects (i.e., in the hundreds), facilitating many repeated observations of the brain during behavior. The explainable models would then be built using within-subject variance, isolating the relationship between brain function and behavior relative to a drifting biological baseline. The advantage of this approach is that it can initially be implemented by a smaller number of research groups. Another potential strategy could be to collect time-limited data across a much larger number of subjects (i.e., in the tens of thousands to millions). The explainable models would then be built using across-subject variance. This latter approach would likely require that many research groups collaborate in the data collection phase. A critical first step would be to align collection methods, standards, and paradigms across a broad research community, as many within the neuroimaging community are already doing (Gorgolewski et al. 2016; Poldrack et al. 2013).
Since developing these explainable models will ultimately also require observations and hypothesis testing in both human and animal studies, directed efforts to build transdisciplinary teams of neuroscience researchers, clinicians, and data scientists with varying levels of analytical expertise are warranted. These directed efforts may include developing novel funding mechanisms or revising current peer review processes to prioritize grant applications that include both human and animal studies. Nevertheless, it will not be sufficient simply to build teams that include expertise in genetics, cellular/molecular neuroscience, systems neuroscience, cognitive neuroscience, treatment paradigms including pharmacology and neuromodulation, behavior quantification in health and disease, statistics, and machine learning. Rather, a new group of scientists capable of bridging the broad gaps between these disciplines will be needed to realize the promise of XAI. These scientific “integrators” will need expertise in multiple disciplines, enabling them to translate successfully across the intellectual and cultural boundaries that separate fields. These boundaries may exist at the level of logical constructs (such as the difference between frequentist and Bayesian statistics) or at the level of simple language. For example, the word “model” has a different connotation in each of the fields described above.
So where do we find these scientific integrators? Simply put, we must train them. The medical scientist training program, in which students are trained as both basic scientists and clinicians, is a long-standing example of an approach for developing scientific integrators. Along these lines, our nation’s neuroscience leadership has highlighted the important role that psychiatrists cross-trained in engineering and mathematics will play in advancing neuroscience, and grant mechanisms that foster postdoctoral cross-training in neuroscience and data science have recently been promoted through the national BRAIN Initiative (“RFA-MH-17-250: BRAIN Initiative Fellows: Ruth L. Kirschstein National Research Service Award (NRSA) Individual Postdoctoral Fellowship (F32)” n.d.). Finally, we must continue to adapt our neuroscience ecosystem to promote studies that advance XAI. For example, federal agencies can optimize grant review processes both by promoting the broad participation of data scientists and scientific integrators in peer review panels and by educating peer reviewers on the strengths of data-driven science. There is without doubt still room, and a necessary scientific role, for traditional hypothesis-driven experiments, but if we are to expand neuroscience to incorporate big data and XAI, then we must allow for and encourage interdisciplinary, integrative peer review as well.
Machine learning thus holds great promise for advancing the field of neuroscience, not as a replacement for hypothesis-driven research but in conjunction with it. Machine learning tools can bolster large-scale hypothesis generation, and they have the potential to reveal interactions, structure, and mechanisms of brain and behavior. Importantly, given the dangers of spurious findings and of explanations devoid of mechanism, care must be taken to ensure the utility of such an approach. It is with proper replication, validation, and hypothesis-driven confirmation that machine learning analysis approaches will fulfill the great promise they hold, allowing us to make greater strides toward understanding how the brain works.
Figure 1: Model for data-driven science supporting hypothesis-driven science. Within the framework of hypothesis-driven science, machine learning can be used to generate hypotheses to be subsequently tested.
Figure 2: Hold-out-trial versus out-of-sample model validation. Validation is commonly accomplished within animals. For example, a model might be trained on subsets of each animal’s data (top, green) and tested on the remainder of data from the same animal (top, yellow). Here, we propose training on a subset of the animals (bottom, green) and testing on an independent set of animals (bottom, yellow).
Figure 3: Explainable Artificial Intelligence (XAI). Here we present a vision for leveraging machine learning toward developing unified models. The criteria for models achieving XAI are that they must be based on measurable brain biology and be generative, discriminative, generalizable, and convergent.
Acknowledgements: Many of the themes presented in this manuscript reflect areas of consensus that emerged from the National Institute of Mental Health sponsored Explainable Artificial Intelligence meeting on November 10th, 2017 in Washington, DC. This meeting was co-organized by Michele Ferrante, G. Sapiro, and H.S. Mayberg. We would like to thank S. Hakimi, S.D. Mague, R.A. Adcock, N.M. Gallagher, A. Talbot, S. Burwell, and R. Hultman for comments on this manuscript. Our work was supported by NIMH grant R01MH099192-05S2 to KD, NSF GRFP to MTV, and Duke Katherine Goodman Stern Fellowship to MTV.
References

Anderson, David J., and Pietro Perona. 2014. “Toward a Science of Computational Ethology.” Neuron 84 (1):18–31.

Andersson, Jesper L. R., Mark Jenkinson, and Stephen Smith. 2007. “Non-Linear Registration, Aka Spatial Normalisation.” FMRIB Technical Report TR07JA2. https://www.fmrib.ox.ac.uk/datasets/techrep/tr07ja2/tr07ja2.pdf.

Armbruster, Blaine N., Xiang Li, Mark H. Pausch, Stefan Herlitze, and Bryan L. Roth. 2007. “Evolving the Lock to Fit the Key to Create a Family of G Protein-Coupled Receptors Potently Activated by an Inert Ligand.” Proceedings of the National Academy of Sciences of the United States of America 104 (12):5163–68.

Bargmann, C., W. Newsome, A. Anderson, E. Brown, et al. 2014. “BRAIN 2025: A Scientific Vision. Brain Research through Advancing Neurotechnologies (BRAIN) Working Group Report to the Advisory Committee to the ….”

Bellman, Richard. 1957. “A Markovian Decision Process.” Journal of Mathematics and Mechanics 6 (5):679–84.

———. 2010. Dynamic Programming. Princeton, NJ, USA: Princeton University Press.

Box, G. E. P. 1979. “Robustness in the Strategy of Scientific Model Building.” MRC-TSR-1954. Madison, WI: University of Wisconsin–Madison Mathematics Research Center. http://www.dtic.mil/docs/citations/ADA070213.

Boyden, Edward S., Feng Zhang, Ernst Bamberg, Georg Nagel, and Karl Deisseroth. 2005. “Millisecond-Timescale, Genetically Targeted Optical Control of Neural Activity.” Nature Neuroscience 8 (9):1263–68.

Bush, R. R., and F. Mosteller. 1951. “A Mathematical Model for Simple Learning.” Psychological Review 58 (5):313–23.

Calhoun, Vince D., Robyn Miller, Godfrey Pearlson, and Tulay Adalı. 2014. “The Chronnectome: Time-Varying Connectivity Networks as the Next Frontier in fMRI Data Discovery.” Neuron 84 (2):262–74.

Carlson, David, Lisa K. David, Neil M. Gallagher, Mai-Anh T. Vu, Matthew Shirley, Rainbo Hultman, Joyce Wang, et al. 2017. “Dynamically Timed Stimulation of Corticolimbic Circuitry Activates a Stress-Compensatory Pathway.” Biological Psychiatry. https://doi.org/10.1016/j.biopsych.2017.06.008.

Carlson, Thomas, Erin Goddard, David M. Kaplan, Colin Klein, and J. Brendan Ritchie. 2017. “Ghosts in Machine Learning for Cognitive Neuroscience: Moving from Data to Theory.” NeuroImage. https://doi.org/10.1016/j.neuroimage.2017.08.019.

Chang, Luke J., Peter J. Gianaros, Stephen B. Manuck, Anjali Krishnan, and Tor D. Wager. 2015. “A Sensitive and Specific Neural Signature for Picture-Induced Negative Affect.” PLoS Biology 13 (6):e1002180.

Craddock, R. Cameron, G. Andrew James, Paul E. Holtzheimer, Xiaoping P. Hu, and Helen S. Mayberg. 2012. “A Whole Brain fMRI Atlas Generated via Spatially Constrained Spectral Clustering.” Human Brain Mapping 33 (8):1914–28.

Drysdale, Andrew T., Logan Grosenick, Jonathan Downar, Katharine Dunlop, Farrokh Mansouri, Yue Meng, Robert N. Fetcho, et al. 2017. “Resting-State Connectivity Biomarkers Define Neurophysiological Subtypes of Depression.” Nature Medicine 23 (1):28–38.

Fischl, Bruce, André van der Kouwe, Christophe Destrieux, Eric Halgren, Florent Ségonne, David H. Salat, Evelina Busa, et al. 2004. “Automatically Parcellating the Human Cerebral Cortex.” Cerebral Cortex 14 (1):11–22.

“Focus on Big Data.” 2014. Nature Neuroscience 17 (11):1429.

Gershman, Samuel J., and Nathaniel D. Daw. 2017. “Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework.” Annual Review of Psychology 68 (1):101–28.

Gorgolewski, Krzysztof J., Tibor Auer, Vince D. Calhoun, R. Cameron Craddock, Samir Das, Eugene P. Duff, Guillaume Flandin, et al. 2016. “The Brain Imaging Data Structure, a Format for Organizing and Describing Outputs of Neuroimaging Experiments.” Scientific Data 3 (June):160044.

Hong, Weizhe, Ann Kennedy, Xavier P. Burgos-Artizzu, Moriel Zelikowsky, Santiago G. Navonne, Pietro Perona, and David J. Anderson. 2015. “Automated Measurement of Mouse Social Behaviors Using Depth Sensing, Video Tracking, and Machine Learning.” Proceedings of the National Academy of Sciences of the United States of America 112 (38):E5351–60.

Hultman, Rainbo, Stephen D. Mague, Qiang Li, Brittany M. Katz, Nadine Michel, Lizhen Lin, Joyce Wang, et al. 2016. “Dysregulation of Prefrontal Cortex-Mediated Slow-Evolving Limbic Dynamics Drives Stress-Induced Emotional Pathology.” Neuron 91 (2):439–52.

Jain, Viren, H. Sebastian Seung, and Srinivas C. Turaga. 2010. “Machines That Learn to Segment Images: A Crucial Technology for Connectomics.” Current Opinion in Neurobiology 20 (5):653–66.

Jenkinson, Mark, Peter Bannister, Michael Brady, and Stephen Smith. 2002. “Improved Optimization for the Robust and Accurate Linear Registration and Motion Correction of Brain Images.” NeuroImage 17 (2):825–41.

Jenkinson, Mark, and Stephen Smith. 2001. “A Global Optimisation Method for Robust Affine Registration of Brain Images.” Medical Image Analysis 5 (2):143–56.

Jung, Tzyy-Ping, Scott Makeig, Martin J. McKeown, Anthony J. Bell, Te-Won Lee, and Terrence J. Sejnowski. 2001. “Imaging Brain Dynamics Using Independent Component Analysis.” Proceedings of the IEEE 89 (7):1107–22.

Larrañaga, Pedro, Borja Calvo, Roberto Santana, Concha Bielza, Josu Galdiano, Iñaki Inza, José A. Lozano, et al. 2006. “Machine Learning in Bioinformatics.” Briefings in Bioinformatics 7 (1):86–112.

Lemm, Steven, Benjamin Blankertz, Thorsten Dickhaus, and Klaus-Robert Müller. 2011. “Introduction to Machine Learning for Brain Imaging.” NeuroImage 56 (2):387–99.

Levin-Schwartz, Yuri, Vince D. Calhoun, and Tulay Adali. 2017. “Quantifying the Interaction and Contribution of Multiple Datasets in Fusion: Application to the Detection of Schizophrenia.” IEEE Transactions on Medical Imaging 36 (7):1385–95.

Libbrecht, Maxwell W., and William Stafford Noble. 2015. “Machine Learning Applications in Genetics and Genomics.” Nature Reviews Genetics 16 (6):321–32.

Mackintosh, N. J. 1975. “A Theory of Attention: Variations in the Associability of Stimuli with Reinforcement.” Psychological Review 82 (4):276–98.

Marblestone, Adam H., Greg Wayne, and Konrad P. Kording. 2016. “Toward an Integration of Deep Learning and Neuroscience.” Frontiers in Computational Neuroscience 10. https://doi.org/10.3389/fncom.2016.00094.

McCulloch, W. S., and W. Pitts. 1990. “A Logical Calculus of the Ideas Immanent in Nervous Activity. 1943.” Bulletin of Mathematical Biology 52 (1–2):99–115; discussion 73–97.

Niv, Yael. 2009. “Reinforcement Learning in the Brain.” Journal of Mathematical Psychology 53 (3):139–54.

O’Donnell, Lauren J., Yannick Suter, Laura Rigolo, Pegah Kahali, Fan Zhang, Isaiah Norton, Angela Albi, et al. 2017. “Automated White Matter Fiber Tract Identification in Patients with Brain Tumors.” NeuroImage: Clinical 13:138–53.

Paul, Anirban, Megan Crow, Ricardo Raudales, Miao He, Jesse Gillis, and Z. Josh Huang. 2017. “Transcriptional Architecture of Synaptic Communication Delineates GABAergic Neuron Identity.” Cell 171 (3):522–39.e20.

Pearce, J. M., and G. Hall. 1980. “A Model for Pavlovian Learning: Variations in the Effectiveness of Conditioned but Not of Unconditioned Stimuli.” Psychological Review 87 (6):532–52.

Philip, Noah S., Brent G. Nelson, Flavio Frohlich, Kelvin O. Lim, Alik S. Widge, and Linda L. Carpenter. 2017. “Low-Intensity Transcranial Current Stimulation in Psychiatry.” The American Journal of Psychiatry 174 (7):628–39.

Poldrack, Russell A., Deanna M. Barch, Jason P. Mitchell, Tor D. Wager, Anthony D. Wagner, Joseph T. Devlin, Chad Cumba, Oluwasanmi Koyejo, and Michael P. Milham. 2013. “Toward Open Sharing of Task-Based fMRI Data: The OpenfMRI Project.” Frontiers in Neuroinformatics 7 (July):12.

Poldrack, Russell A., and Krzysztof J. Gorgolewski. 2014. “Making Big Data Open: Data Sharing in Neuroimaging.” Nature Neuroscience 17 (11):1510–17.

Rescorla, R. A., and Allan Wagner. 1972. A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Nonreinforcement. Vol. 2.

“RFA-MH-17-250: BRAIN Initiative Fellows: Ruth L. Kirschstein National Research Service Award (NRSA) Individual Postdoctoral Fellowship (F32).” n.d. Accessed November 24, 2017. https://grants.nih.gov/grants/guide/rfa-files/RFA-MH-17-250.html.

Samuel, A. L. 1959. “Some Studies in Machine Learning Using the Game of Checkers.” IBM Journal of Research and Development 3 (3):210–29.

Schuck, Nicolas W., Robert Gaschler, Dorit Wenke, Jakob Heinzle, Peter A. Frensch, John-Dylan Haynes, and Carlo Reverberi. 2015. “Medial Prefrontal Cortex Predicts Internally Driven Strategy Shifts.” Neuron 86 (1):331–40.

Schultz, W., P. Dayan, and P. R. Montague. 1997. “A Neural Substrate of Prediction and Reward.” Science 275 (5306):1593–99.

Sejnowski, Terrence J., Patricia S. Churchland, and J. Anthony Movshon. 2014. “Putting Big Data to Good Use in Neuroscience.” Nature Neuroscience 17 (11):1440–41.

Smart, Otis Lkuwamy, Vineet Ravi Tiruvadi, and Helen S. Mayberg. 2015. “Multimodal Approaches to Define Network Oscillations in Depression.” Biological Psychiatry 77 (12):1061–70.

Spellman, Timothy, Mattia Rigotti, Susanne E. Ahmari, Stefano Fusi, Joseph A. Gogos, and Joshua A. Gordon. 2015. “Hippocampal-Prefrontal Input Supports Spatial Encoding in Working Memory.” Nature 522 (7556):309–14.

Sutton, Richard S., and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1. Cambridge, MA: MIT Press.

Thomas, Christopher G., Richard A. Harshman, and Ravi S. Menon. 2002. “Noise Reduction in BOLD-Based fMRI Using Component Analysis.” NeuroImage 17 (3):1521–37.

Wegrzyn, Martin, Isabelle Bruckhaus, and Johanna Kissler. 2015. “Categorical Perception of Fear and Anger Expressions in Whole, Masked and Composite Faces.” PloS One 10 (8):e0134790.

Whitmore, Nathan W., and Shih-Chieh Lin. 2016. “Unmasking Local Activity within Local Field Potentials (LFPs) by Removing Distal Electrical Signals Using Independent Component Analysis.” NeuroImage 132 (May):79–92.

Widge, Alik S., Kristen K. Ellard, Angelique C. Paulk, Ishita Basu, Ali Yousefi, Samuel Zorowitz, Anna Gilmour, et al. 2017. “Treating Refractory Mental Illness with Closed-Loop Brain Stimulation: Progress towards a Patient-Specific Transdiagnostic Approach.” Experimental Neurology 287 (Pt 4):461–72.

Woo, Choong-Wan, Luke J. Chang, Martin A. Lindquist, and Tor D. Wager. 2017. “Building Better Biomarkers: Brain Models in Translational Neuroimaging.” Nature Neuroscience 20 (3):365–77.

Zuo, Xi-Nian, Clare Kelly, Jonathan S. Adelstein, Donald F. Klein, F. Xavier Castellanos, and Michael P. Milham. 2010. “Reliable Intrinsic Connectivity Networks: Test-Retest Evaluation Using ICA and Dual Regression Approach.” NeuroImage 49 (3):2163–77.
[Figure 1 graphic: cycle of previous research, hypothesis generation, performing experiments, analyzing results, and drawing conclusions, with data-driven machine learning feeding hypothesis generation.]

[Figure 2 graphic: two 12-animal × 12-trial grids contrasting hold-out-trial splits (train and test within each animal) with out-of-sample splits (train on some animals, test on held-out animals).]

[Figure 3 graphic: flowchart from previous research through hypothesis-driven experiments and data-driven machine learning, feature selection on direct brain measurements, and model development, to XAI; candidate models are checked for being descriptive (explain feature organization), predictive (predict system perturbations), generalizable (extrapolate across multiple subjects), and convergent (extrapolate across multiple biological contexts).]