Machine learning and cognitive neuroimaging: new tools can answer new questions
Gaël Varoquaux
How machine learning is shaping cognitive neuroimaging [Varoquaux and Thirion 2014]
Cognitive neuroscience: linking psychology and neuroscience (neural implementations)
Vision: A computational investigation into the human representation and processing of visual information [Marr 1982]
G Varoquaux 2
Machine learning: computational statistics for prediction (out-of-sample properties)
Paradigm shift: the dimensionality of data grows, which enables richer models
Open-ended questions ⇒ large # features
From parameter inference to prediction
Understanding, not predicting
Danger of solving the wrong problem: lost in formalization
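Why out-of-sample properties matter once the number of features grows can be seen in a minimal sketch, assuming synthetic data and ordinary least squares via NumPy: with 45 features and only 50 samples, the in-sample error looks far better than the error on new data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 50, 1000, 45

# Synthetic data: only the first feature carries signal, the other 44 are noise
X_train = rng.standard_normal((n_train, n_features))
X_test = rng.standard_normal((n_test, n_features))
w_true = np.zeros(n_features)
w_true[0] = 1.0
y_train = X_train @ w_true + rng.standard_normal(n_train)
y_test = X_test @ w_true + rng.standard_normal(n_test)

# Ordinary least squares fitted on all 45 features
w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def mean_squared_error(X, y, w):
    return float(np.mean((y - X @ w) ** 2))


train_error = mean_squared_error(X_train, y_train, w_hat)  # in sample
test_error = mean_squared_error(X_test, y_test, w_hat)     # out of sample
print(f"in-sample MSE: {train_error:.2f}, out-of-sample MSE: {test_error:.2f}")
```

The noise variance is 1, so any in-sample error well below 1 is a sign of fitting noise rather than signal.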
Statistics ↔ Machine learning, with statistical machine learning in between
Hypothesis testing ↔ Prediction: T-test, tests on prediction, cross-validation
In sample ↔ Out of sample
Parametric ↔ Non-parametric: non-parametric tests, probabilistic modeling
Few parameters ↔ Many parameters
Univariate ↔ Multivariate: GLM ≠ correlations, naive Bayes, univariate selection
Differences mostly cultural: it’s a continuum
Cognitive neuroimaging and machine learning
Predicting the task: decoding
Predicting neural response: encoding
Unsupervised learning on brain activity
Unsupervised learning on behavior
Rest of this talk
1 Encoding
2 Decoding
1 Encoding
Towards richer models of brain activity
1 Uncovering neural coding: richer models
Insights on breaking down cognitive functions into atomic steps
[Hubel and Wiesel 1962] Neurons receptive to Gabors (edges)
[Logothetis... 1995] Shapes in inferior temporal cortex
Machine learning: computer-vision models mapped to brain activity [Yamins... 2014]
1 Uncovering neural coding: in fMRI
Model-based fMRI [O’Doherty... 2007]
[Harvey... 2013]
High-level descriptions [Mitchell... 2008]
Natural stimuli [Kay... 2008]
Machine learning for encoding models
Richer models of encoding capture fine descriptions of behavior / stimuli
They require forgoing the contrast methodology
Is this a good or a bad thing?
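An encoding model can be sketched as a ridge regression from a description of each stimulus to the voxel responses, judged by how well it predicts responses to held-out stimuli rather than by a contrast. This is a minimal sketch assuming simulated data; the 30 stimulus features, 10 voxels, and the penalty `alpha=10.0` are hypothetical choices.

```python
import numpy as np


def fit_ridge(F, Y, alpha):
    """Closed-form ridge regression: one weight vector per voxel (column of Y)."""
    n_features = F.shape[1]
    return np.linalg.solve(F.T @ F + alpha * np.eye(n_features), F.T @ Y)


def r2_per_voxel(Y_true, Y_pred):
    """Coefficient of determination computed separately for each voxel."""
    ss_res = np.sum((Y_true - Y_pred) ** 2, axis=0)
    ss_tot = np.sum((Y_true - Y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n_stimuli, n_features, n_voxels = 200, 30, 10
# Hypothetical stimulus description: 30 continuous features per stimulus
F = rng.standard_normal((n_stimuli, n_features))
# Simulated voxel responses: linear in the features plus noise
W_true = rng.standard_normal((n_features, n_voxels))
Y = F @ W_true + 2.0 * rng.standard_normal((n_stimuli, n_voxels))

# Fit on the first 150 stimuli, evaluate on the 50 held-out stimuli
train, test = slice(0, 150), slice(150, None)
W_hat = fit_ridge(F[train], Y[train], alpha=10.0)
scores = r2_per_voxel(Y[test], F[test] @ W_hat)
print(np.round(scores, 2))
```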
1 Models of the visual system
Image → V1 cortex → V2 cortex → Inferior temporal cortex → Fusiform face area → Jack?
Is there a “face” region? A “foot” region? A “left big toe” region?
1 Uncovering neural coding: cognitive oppositions
Is there a “face” region? A “foot” region? A “left big toe” region?
Mapping relies on cognitive subtraction: it is bound to a decomposition into mental processes
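Cognitive subtraction in the standard mass-univariate GLM can be sketched as a contrast between two condition regressors. This is a minimal sketch for a single simulated voxel, assuming a hypothetical block design of faces, houses, and rest, and omitting the hemodynamic convolution that a real fMRI design matrix would include.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans = 120
# Hypothetical design: repeated blocks of 10 scans of faces, houses, then rest
condition = np.tile(np.repeat([0, 1, 2], 10), n_scans // 30)
X = np.column_stack([
    (condition == 0).astype(float),  # "faces" regressor
    (condition == 1).astype(float),  # "houses" regressor
    np.ones(n_scans),                # intercept (rest baseline)
])
# Simulated voxel that responds more to faces than to houses
y = X @ np.array([2.0, 0.5, 10.0]) + rng.standard_normal(n_scans)

# Ordinary least squares fit of the GLM
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
dof = n_scans - np.linalg.matrix_rank(X)
sigma2 = residuals @ residuals / dof

# Cognitive subtraction: the "faces minus houses" contrast
c = np.array([1.0, -1.0, 0.0])
effect = c @ beta
t_stat = effect / np.sqrt(sigma2 * (c @ np.linalg.inv(X.T @ X) @ c))
print(f"faces - houses: {effect:.2f}, t = {t_stat:.1f}")
```

The map produced this way only answers the question encoded in the contrast vector `c`, which is why it stays tied to the chosen decomposition into mental processes.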
1 Decomposing visual stimuli
Low-level visual cortex is tuned to natural image statistics
[Olshausen et al. 1996]
What drives high-level representations?
Convolutional Net
Data-driven encoding models
Image → V1 cortex → V2 cortex → Inferior temporal cortex → Fusiform face area → Jack?
[Khaligh-Razavi and Kriegeskorte 2014, Güçlü and van Gerven 2015]
fMRI beyond a handful of contrasts ⇒ sets us free from the paradigm
2 Decoding
From brain activity to behavior
2 Increased sensitivity
An omnibus test: is there “information” about a stimulus in a given region?
“Given the goal of detecting the presence of a particular mental representation in the brain, the primary advantage of MVPA methods over individual-voxel-based methods is increased sensitivity.” — [Norman... 2006]
“However, these maps are not guaranteed to include all the voxels that are involved in representing the categories of interest.” — [Norman... 2006]
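Decoding as an omnibus test can be sketched as a permutation test on cross-validated accuracy: if the labels carry no information about the region's activity, shuffling them should not change the accuracy. This sketch assumes simulated data for a hypothetical 50-voxel region and swaps in a nearest-centroid classifier for simplicity; any decoder could be plugged in.

```python
import numpy as np


def nearest_centroid_cv_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier."""
    folds = np.arange(len(y)) % n_folds
    correct = 0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        classes = np.unique(y[train])
        centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in classes])
        # Squared distance of each test sample to each class centroid
        dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        correct += np.sum(classes[dists.argmin(axis=1)] == y[test])
    return correct / len(y)


rng = np.random.default_rng(0)
n_samples, n_voxels = 80, 50
y = np.repeat([0, 1], n_samples // 2)
X = rng.standard_normal((n_samples, n_voxels))
# Weak signal spread over every voxel of the simulated region
X[y == 1] += 0.4

observed = nearest_centroid_cv_accuracy(X, y)
n_permutations = 500
null = np.array([nearest_centroid_cv_accuracy(X, rng.permutation(y))
                 for _ in range(n_permutations)])
# Permutation p-value, counting the observed labeling as one permutation
p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
print(f"accuracy = {observed:.2f}, p = {p_value:.3f}")
```

The test is omnibus: a small p-value says the region as a whole carries information, not which voxels carry it.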
2 Increased sensitivity: an omnibus test
(Figure: stimuli → non-linear cognitive model → representations → linear predictive models)
Decoding used to test / compare encoding models [Naselaris... 2011]
2 Generalization as a test: cross-validation
High-dimensional models ⇒ important to test on independent data, to control for model complexity
(Figure: error of the cross-validation estimate, axis from −40% to +40%, for leave-one-sample-out, leave-one-subject/session-out, and 20% left out with 3, 10, or 50 splits, in intrasubject and intersubject settings)
No silver bullet. Poster 3829, Oral Th 12:45
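What “independent data” means depends on the split. A minimal sketch with NumPy index arrays, assuming a hypothetical study of 6 subjects with 20 samples each: holding out one subject at a time never puts a subject in both train and test, while random 20% splits always do here. The helper functions are hypothetical; scikit-learn's `LeaveOneGroupOut` and `ShuffleSplit` provide equivalent splitters.

```python
import numpy as np


def leave_one_group_out(groups):
    """Yield (train, test) index arrays, holding out one subject/session at a time."""
    for g in np.unique(groups):
        yield np.flatnonzero(groups != g), np.flatnonzero(groups == g)


def shuffle_split(n_samples, test_fraction=0.2, n_splits=10, seed=0):
    """Yield random (train, test) splits that ignore the group structure."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * n_samples))
    for _ in range(n_splits):
        perm = rng.permutation(n_samples)
        yield perm[n_test:], perm[:n_test]


# Hypothetical study: 6 subjects with 20 samples each
groups = np.repeat(np.arange(6), 20)


def shares_subjects(train, test):
    """True if at least one subject contributes samples to both train and test."""
    return bool(np.intersect1d(groups[train], groups[test]).size)


logo_leaks = [shares_subjects(tr, te) for tr, te in leave_one_group_out(groups)]
shuffle_leaks = [shares_subjects(tr, te) for tr, te in shuffle_split(len(groups))]
print("leave-one-subject-out, subject in both train and test:", any(logo_leaks))
print("random 20% splits, subject in both train and test:", all(shuffle_leaks))
```

For an intersubject claim, only the first scheme tests generalization to new subjects; the random splits answer a different, easier question.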
2 Behavioral predictions as a test
Increase “cognitive resolution”: one voxel’s information is not enough to distinguish many cognitive states ⇒ analysis combining information across voxels
Interpreting overlapping activations: psychology is not interested in where a task creates activation, but in whether two tasks create activations in the same areas
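The “cognitive resolution” argument can be illustrated with a toy simulation, assuming four hypothetical cognitive states encoded by two voxels that each take only two activation levels. A single voxel can at best separate two groups of states, while combining both voxels separates all four. A nearest-centroid decoder stands in for any multivariate decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 4
# Hypothetical activation levels of (voxel 1, voxel 2) for each cognitive state
patterns = np.array([[0.0, 0.0], [0.0, 4.0], [4.0, 0.0], [4.0, 4.0]])
n_per_state = 50
y = np.repeat(np.arange(n_states), n_per_state)
X = patterns[y] + rng.standard_normal((len(y), 2))


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit class centroids on the training set, score on the test set."""
    centroids = np.array([X_train[y_train == k].mean(axis=0) for k in range(n_states)])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(dists.argmin(axis=1) == y_test))


train = np.arange(len(y)) % 2 == 0  # even samples to train, odd samples to test
test = ~train
acc_one_voxel = nearest_centroid_accuracy(X[train][:, :1], y[train], X[test][:, :1], y[test])
acc_two_voxels = nearest_centroid_accuracy(X[train], y[train], X[test], y[test])
print(f"voxel 1 alone: {acc_one_voxel:.2f}, both voxels: {acc_two_voxels:.2f}")
```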
2 Inference in cognitive neuroimaging
What is the neural support of a function? Brain mapping = task-evoked activity + crafting “contrasts” to isolate effects
What is the function of a given brain module? Reverse inference
[Poldrack 2006, Henson 2006]
Is there a face area? [Kanwisher... 1997, Gauthier... 2000, Hanson and Halchenko 2008]
Decoding: find regions that predict observed cognition [Poldrack... 2009, Schwartz... 2013]
2 Decoding for reverse inference
[Poldrack... 2009, Schwartz... 2013]
Prediction = proxy for implication
Need large cognitive coverage
Interpretation of the “grandmother neuron”: “more than a neuron responds to one concept and [...] neurons do not necessarily respond to only one concept are given by the data itself” [Quian Quiroga and Kreiman 2010]
2 Brain decoding with linear models
Design matrix × Coefficients = Target
Coefficients are brain maps
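In this view each column of the design matrix is a voxel, so the fitted coefficient vector has one entry per voxel and can be folded back into the volume as a brain map. A minimal NumPy sketch, assuming a hypothetical 4×5×6 voxel grid, simulated data, and a ridge decoder with a hypothetical penalty `alpha = 1.0`; on real images, nilearn maskers provide `inverse_transform` for this step.

```python
import numpy as np

rng = np.random.default_rng(0)
volume_shape = (4, 5, 6)                  # hypothetical tiny brain grid
n_voxels = int(np.prod(volume_shape))     # 120 voxels
n_samples = 80

# Design matrix X: one row per sample (e.g. scan), one column per voxel
X = rng.standard_normal((n_samples, n_voxels))
w_true = np.zeros(n_voxels)
w_true[:10] = 1.0                         # only 10 voxels carry the target
y = X @ w_true + 0.5 * rng.standard_normal(n_samples)  # target, e.g. a task variable

# Ridge decoder: solve (X^T X + alpha I) w = X^T y
alpha = 1.0
w_hat = np.linalg.solve(X.T @ X + alpha * np.eye(n_voxels), X.T @ y)

# One coefficient per voxel, so the weights can be reshaped into a brain map
coef_map = w_hat.reshape(volume_shape)
print(coef_map.shape)  # (4, 5, 6)
```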
2 Brain decoding to recover predictive regions?
Face vs house visual recognition [Haxby... 2001]
SVM: error 26%
Sparse model: error 19%
Ridge: error 15%
Best predictor outlines the worst regions; best maps predict worst
2 Decoders as estimators [Gramfort... 2013]
Inverse problem: minimize the error term
$\hat{w} = \operatorname{argmin}_{w}\, l(y - Xw)$
Ill-posed: many different w give the same prediction error
Choice driven by the (implicit) priors of the decoder: SVM, sparse, ridge, TV-ℓ1
Inferences rely, explicitly or implicitly, on the regions estimated by the decoder
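The ill-posedness can be checked numerically: with fewer samples than voxels, X has a null space, and adding any vector from it to a decoder's weights leaves every prediction unchanged while producing a very different map. A minimal NumPy sketch on random data; it only illustrates the problem and is not the TV-ℓ1 approach of [Gramfort... 2013].

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_voxels = 20, 100   # fewer samples than voxels, as in fMRI
X = rng.standard_normal((n_samples, n_voxels))
y = rng.standard_normal(n_samples)

# Minimum-norm least-squares solution (the limit of ridge as its penalty vanishes)
w_min_norm = np.linalg.pinv(X) @ y

# Rows n_samples.. of Vt span the null space of X (X has full row rank here)
_, _, Vt = np.linalg.svd(X)
null_direction = Vt[n_samples:].T @ rng.standard_normal(n_voxels - n_samples)
w_other = w_min_norm + null_direction

same_predictions = np.allclose(X @ w_min_norm, X @ w_other)
map_correlation = np.corrcoef(w_min_norm, w_other)[0, 1]
print(same_predictions, round(map_correlation, 2))
```

Both weight vectors predict identically, so only the decoder's prior (here, minimum norm) decides which map is reported.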
Wrapping up
@GaelVaroquaux
Machine learning for cognitive neuroimaging
The description of cognition is hard ⇒ Encoding: rich models depend less on paradigms
Decoding as an omnibus test: for rich encoding models, to interpret overlapping activation; cross-validation error bars
Decoding for reverse inference: requires large cognitive coverage
Estimation of predictive regions is difficult: infinitely many maps predict equally well
Software: nilearn, in Python, http://nilearn.github.io
[Varoquaux and Thirion 2014] How machine learning is shaping cognitive neuroimaging
References
I. Gauthier, M. J. Tarr, J. Moylan, P. Skudlarski, J. C. Gore, and A. W. Anderson. The fusiform “face area” is part of a network that processes faces at the individual level. J Cognitive Neuroscience, 12:495, 2000.
A. Gramfort, B. Thirion, and G. Varoquaux. Identifying predictive regions from fMRI with TV-L1 prior. In PRNI, page 17, 2013.
U. Güçlü and M. A. van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. The Journal of Neuroscience, 35(27):10005–10014, 2015.
S. J. Hanson and Y. O. Halchenko. Brain reading using full brain support vector machines for object recognition: there is no “face” identification area. Neural Computation, 20:486, 2008.
B. Harvey, B. Klein, N. Petridou, and S. Dumoulin. Topographic representation of numerosity in the human parietal cortex. Science, 341(6150):1123–1126, 2013.
J. V. Haxby, I. M. Gobbini, M. L. Furey, ... Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425, 2001.
R. Henson. Forward inference using functional neuroimaging: Dissociations versus associations. Trends in Cognitive Sciences, 10:64, 2006.
D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160:106, 1962.
N. Kanwisher, J. McDermott, and M. M. Chun. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neuroscience, 17:4302, 1997.
K. N. Kay, T. Naselaris, R. J. Prenger, and J. L. Gallant. Identifying natural images from human brain activity. Nature, 452:352, 2008.
S.-M. Khaligh-Razavi and N. Kriegeskorte. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput Biol, 10(11):e1003915, 2014.
N. K. Logothetis, J. Pauls, and T. Poggio. Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5:552, 1995.
D. Marr. Vision: A computational investigation into the human representation and processing of visual information. The MIT Press, Cambridge, 1982.
T. M. Mitchell, S. V. Shinkareva, A. Carlson, K.-M. Chang, V. L. Malave, R. A. Mason, and M. A. Just. Predicting human brain activity associated with the meanings of nouns. Science, 320:1191, 2008.
T. Naselaris, K. N. Kay, S. Nishimoto, and J. L. Gallant. Encoding and decoding in fMRI. Neuroimage, 56:400, 2011.
K. A. Norman, S. M. Polyn, G. J. Detre, and J. V. Haxby. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10:424, 2006.
J. P. O’Doherty, A. Hampton, and H. Kim. Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104:35, 2007.
B. Olshausen ... Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607, 1996.
R. Poldrack. Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10:59, 2006.
R. A. Poldrack, Y. O. Halchenko, and S. J. Hanson. Decoding the large-scale structure of brain function by classifying mental states across individuals. Psychological Science, 20:1364, 2009.
R. Quian Quiroga and G. Kreiman. Postscript: About grandmother cells and Jennifer Aniston neurons. Psychological Review, 117:297, 2010.
Y. Schwartz, B. Thirion, and G. Varoquaux. Mapping cognitive ontologies to and from the brain. In NIPS, 2013.
G. Varoquaux and B. Thirion. How machine learning is shaping cognitive neuroimaging. GigaScience, 3:28, 2014.
D. L. Yamins, H. Hong, C. F. Cadieu, E. A. Solomon, D. Seibert, and J. J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci, page 201403112, 2014.