
Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, pages 10–17, Los Angeles, California, June 2010. ©2010 Association for Computational Linguistics

Concept Classification with Bayesian Multi-task Learning

Marcel van Gerven
Radboud University Nijmegen, Intelligent Systems
Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
[email protected]

Irina Simanova
Max Planck Institute for Psycholinguistics
Wundtlaan 1, 6525 XD Nijmegen, The Netherlands
[email protected]

Abstract

Multivariate analysis allows decoding of single trial data in individual subjects. Since different models are obtained for each subject, it becomes hard to perform an analysis on the group level. We introduce a new algorithm for Bayesian multi-task learning which imposes a coupling between single-subject models. Using the CMU fMRI dataset, it is shown that the algorithm can be used for concept classification based on the average activation of regions in the AAL atlas. The concepts which were most easily classified correspond to the categories shelter, manipulation and eating, which is in accordance with the literature. The multi-task learning algorithm is shown to find regions of interest that are common to all subjects, which facilitates interpretation of the obtained models.

1 Introduction

Multivariate analysis allows decoding of neural representations at the single trial level in single subjects. Its introduction into the field of cognitive neuroscience has led to novel insights about the neural representation of cognitive functions such as language (Mitchell et al., 2008), memory (Hassabis et al., 2009), and vision (Miyawaki et al., 2008).

However, interpretation of the models obtained using a multivariate analysis can be hard due to the fact that different models are obtained for individual subjects. For example, when analyzing K separately acquired datasets, K sets of model parameters will be obtained which may or may not show a common pattern. In some sense, we are in need of a second-level analysis such that we can draw inferences on the group level, as in the conventional analysis of neuroimaging data using the general linear model. One way to achieve this in the context of multivariate analysis is by means of multi-task learning, a special case of transfer learning (Thrun, 1996) where model parameters for different tasks (datasets) are estimated simultaneously and no longer assumed to be independent (Caruana, 1997). In an fMRI context, multi-task learning has been explored using canonical correlation analysis (Rustandi et al., 2009).

In a Bayesian setting, multi-task learning is typically realized by assuming a hierarchical Bayesian framework where shared prior distributions condition task-specific parameters (Gelman et al., 1995). In this paper, we explore a new Bayesian approach to multi-task learning in the context of concept classification, i.e., the prediction of the semantic category of concrete nouns from the BOLD response. Effectively, we are using a shared prior to induce parameter shrinkage. We show that Bayesian multi-task learning leads to more interpretable models, thereby facilitating the interpretation of the results obtained using multivariate analysis.

2 Bayesian multi-task learning

The goal of concept classification is to predict the semantic category y of a presented (and previously unseen) concrete noun from the measured BOLD response x. In this paper, we will use Bayesian logistic regression as the underlying classification model. Let B(y; p) = p^y (1 − p)^(1−y) denote the Bernoulli distribution and l(x) = log(x/(1 − x)) the logit link function.


Figure 1: Contour plots of samples drawn from the prior for two regression coefficients β1 and β2, for three values of the coupling strength (s = 0, s = 10, s = 100). For uncoupled covariates, the magnitude of one covariate has no influence on the magnitude of the other covariate. For strongly coupled covariates, in contrast, a large magnitude of one covariate increases the probability of a large magnitude in the other covariate.

We are interested in the following predictive density:

p(y | x, D, Θ) = ∫ B(y; l⁻¹(xᵀβ)) p(β | D, Θ) dβ

where we integrate out the regression coefficients β and condition on the response x, the observed training data D = (y, X) and the hyper-parameters Θ. Using Bayes' rule, we can write the second term on the right-hand side as

p(β | D,Θ) ∝ p(D | β)p(β | Θ) (1)

where

p(D | β) = ∏_n B(y_n; l⁻¹(x_nᵀβ))

is the likelihood term, which does not depend on the hyper-parameters Θ, and p(β | Θ) is the prior on the regression coefficients.
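To make the predictive density above concrete, the following numpy sketch (our illustration, not the authors' implementation) approximates it by Monte Carlo, assuming that draws from the approximate posterior p(β | D, Θ) are available, for instance from the expectation propagation approximation discussed below.

```python
import numpy as np

def inv_logit(z):
    """Inverse of the logit link l(x) = log(x / (1 - x))."""
    return 1.0 / (1.0 + np.exp(-z))

def predictive_prob(x, beta_samples):
    """Monte Carlo estimate of p(y = 1 | x, D, Theta).

    beta_samples has shape (S, P): S draws of the regression coefficients
    from (an approximation to) the posterior p(beta | D, Theta).
    """
    return inv_logit(beta_samples @ x).mean()

# Toy usage with hypothetical posterior draws for P = 3 covariates.
rng = np.random.default_rng(0)
beta_samples = rng.normal(size=(1000, 3))  # stand-in for real posterior draws
x = np.array([0.2, -1.0, 0.5])
print(predictive_prob(x, beta_samples))
```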

Let N(x; μ, Θ) denote a multivariate Gaussian with mean μ and covariance matrix Θ. In order to couple the tasks in a multi-task problem, we will use the multivariate Laplace prior, which can be written as a scale mixture using auxiliary variables u and v (van Gerven et al., 2010):

p(β | Θ) = ∫ ( ∏_k N(β_k; 0, u_k² + v_k²) ) N(u; 0, Θ) N(v; 0, Θ) du dv

The multivariate Laplace prior allows one to control the prior variance of the regression coefficients β through the covariance matrix Θ of the auxiliary variables u and v. This covariance matrix is conveniently specified in terms of the precision matrix:

Θ⁻¹ = (1/θ) V R V.

Here, θ is a scale parameter which controls regularization of the regression coefficients towards zero and R is a structure matrix in which r_ij = −s specifies a fixed coupling strength s between covariate i and covariate j. A negative r_ij penalizes differences between covariates i and j; see van Gerven et al. (2010) for details. V is a scaling matrix whose sole purpose is to ensure that the prior variance of the auxiliary variables is independent of the coupling strength.¹ Figure 1 shows the multivariate Laplace prior for two covariates and three different coupling strengths.

The specification of the prior in terms of θ and R promotes sparse solutions and allows the inclusion of prior knowledge about the relation between covariates. The posterior marginals for the latent variables (β, u, v) can be approximated using expectation propagation (Minka, 2001), and the posterior variance of the auxiliary variable u_i (or v_i by symmetry) can be interpreted as a measure of importance of the corresponding covariate x_i, since it eventually determines how large the regression coefficient β_i can become.

¹ V is a matrix with √diag(R⁻¹) on the diagonal.
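To illustrate how the coupling strength shapes the prior, the following numpy sketch draws samples from the scale mixture for two covariates. It is our illustration rather than the authors' code; in particular, the two-covariate structure matrix R = [[1 + s, −s], [−s, 1 + s]] is an assumption made by analogy with the multi-task construction given below.

```python
import numpy as np

def sample_mv_laplace_prior(s, theta=1.0, n=20000, seed=0):
    """Draw (beta_1, beta_2) from the multivariate Laplace scale mixture
    for two covariates coupled with strength s (cf. Figure 1).

    Assumption: the two-covariate structure matrix is taken analogously
    to the multi-task case, R = [[1 + s, -s], [-s, 1 + s]].
    """
    rng = np.random.default_rng(seed)
    R = np.array([[1.0 + s, -s], [-s, 1.0 + s]])
    R_inv = np.linalg.inv(R)
    # V carries sqrt(diag(R^-1)) on its diagonal so that the prior variance
    # of the auxiliary variables u and v does not depend on s.
    V = np.diag(np.sqrt(np.diag(R_inv)))
    cov = np.linalg.inv((1.0 / theta) * V @ R @ V)   # covariance Theta
    u = rng.multivariate_normal(np.zeros(2), cov, size=n)
    v = rng.multivariate_normal(np.zeros(2), cov, size=n)
    scale = np.sqrt(u ** 2 + v ** 2)                 # per-coefficient std. dev.
    return rng.normal(size=(n, 2)) * scale

# Coupling makes the *magnitudes* of beta_1 and beta_2 co-vary.
for s in (0, 10, 100):
    b = sample_mv_laplace_prior(s)
    print(s, np.corrcoef(np.abs(b[:, 0]), np.abs(b[:, 1]))[0, 1])
```

With s = 0 the magnitudes of β1 and β2 are essentially uncorrelated, while for s = 100 they become strongly correlated, mirroring the behaviour shown in Figure 1.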

Interpretation becomes complicated whenever we have collected multiple datasets for the same task, since each corresponding model may give different results regarding the importance of the covariates used when solving the classification problem. Multi-task learning presents a solution to this problem by dropping the assumption that datasets {D_1, …, D_K} are independent. Here, this is easily realized using the multivariate Laplace prior by working with the augmented dataset

D* = ( (y_1ᵀ, y_2ᵀ, …, y_Kᵀ)ᵀ , blockdiag(X_1, X_2, …, X_K) )

and by assuming that each covariate is coupled between datasets. I.e., the structure matrix is given by the elements

r_ij = −s                 if i ≠ j and (i − j) mod P = 0
       1 + (K − 1) · s    if i = j
       0                  otherwise

where P stands for the number of covariates in each dataset. In this way, we have coupled covariates over datasets with coupling strength s. Note that this coupling is realized on the level of the auxiliary variables and not on the regression coefficients. Hence, coupled auxiliary variables control the magnitude of the regression coefficients β, but the β's themselves can still be different for the individual subjects.
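The construction of the augmented dataset and the structure matrix can be sketched as follows with numpy/scipy; the function names and helpers are ours, shown only to make the block structure explicit.

```python
import numpy as np
from scipy.linalg import block_diag

def augment(Xs, ys):
    """Build the augmented dataset D* = (y*, X*) from K subject-specific
    datasets: labels are stacked and design matrices are placed on the
    block diagonal."""
    return np.concatenate(ys), block_diag(*Xs)

def structure_matrix(K, P, s):
    """Structure matrix R that couples covariate p of every dataset with
    covariate p of every other dataset with strength s (P covariates per
    dataset, K datasets)."""
    n = K * P
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                R[i, j] = 1.0 + (K - 1) * s
            elif (i - j) % P == 0:
                R[i, j] = -s
    return R

# Example: 9 subjects, 116 AAL regions, coupling strength s = 100.
R = structure_matrix(K=9, P=116, s=100)
```

The resulting R is sparse apart from the coupled diagonals, which is what keeps the computation tractable (see footnote 3).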

3 Experiments

In order to test our approach to Bayesian multi-task learning for concept classification we have made use of the CMU fMRI dataset², which consists of sixty concrete concepts in twelve categories. The dataset was collected while nine English speakers were presented with sixty line drawings of objects with text labels and were instructed to think of the same properties of the stimulus object consistently during each presentation. For each concept there are six instances per subject for which the BOLD response in multiple voxels was measured.

² http://www.cs.cmu.edu/~tom/science2008

In our experiments we assessed whether previously unseen concepts from two different categories (e.g., building-tool) can be classified correctly based on the measured BOLD response. To this end, all concepts belonging to two out of the twelve semantic categories were selected. Subsequently, we trained a classifier on all concepts belonging to these two categories save one. The semantic category of the six instances of the left-out concept was then predicted using the trained classifier. This procedure was repeated for each of the concepts and classification performance was averaged over all concepts. This performance was computed for all of the 66 possible category pairs.
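Our reading of this leave-one-concept-out procedure is summarized in the sketch below; fit and predict are hypothetical placeholders for the classifier of Section 2, and the array names are ours.

```python
import numpy as np

def leave_one_concept_out(X, concepts, categories, cat_a, cat_b, fit, predict):
    """Leave-one-concept-out accuracy for a single category pair.

    X:          trials x features (e.g. 116 AAL region activations)
    concepts:   concept label per trial (six trials per concept)
    categories: semantic category label per trial
    fit/predict: hypothetical hooks for the classifier described in Section 2
    """
    keep = np.isin(categories, [cat_a, cat_b])
    X, concepts, categories = X[keep], concepts[keep], categories[keep]
    y = (categories == cat_b).astype(int)
    accs = []
    for c in np.unique(concepts):
        test = concepts == c                 # hold out all instances of one concept
        model = fit(X[~test], y[~test])
        accs.append(np.mean(predict(model, X[test]) == y[test]))
    return np.mean(accs)                     # average over left-out concepts
```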

In order to determine the effect of multi-task learning, results were obtained when assuming no coupling between datasets (s = 0) as well as when assuming a very strong coupling between datasets (s = 100). The scale parameter was fixed to θ = 1. In order to allow the coupling to be made, all datasets are required to contain the same features. One way to achieve this is to warp the data for each subject from native space to normalized space and to perform the multi-task learning in normalized space. Here, in contrast, we computed the average activation in 116 predefined regions of interest (ROIs) using the AAL atlas (Tzourio-Mazoyer et al., 2002). ROI activations were used as input to the classifier. This considerably reduces computational overhead since we need to couple just 116 ROIs instead of approximately 20000 voxels between all nine subjects.³ Furthermore, it facilitates interpretation since results can be analyzed at the ROI level instead of at the single voxel level. Of course, this presupposes that category-specific information is captured by the average activation in predefined ROIs, which is an important open question we set out to answer with our experiments.
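The ROI-averaging step amounts to something like the following numpy sketch, assuming each voxel has already been assigned an AAL label; it is our illustration, not the authors' preprocessing code.

```python
import numpy as np

def roi_averages(voxel_data, atlas_labels):
    """Average voxel activations within each atlas region.

    voxel_data:   trials x voxels array of BOLD responses
    atlas_labels: per-voxel AAL region index (0 = outside the atlas);
                  voxel data and atlas are assumed to be in the same
                  (normalized) space.
    """
    regions = np.unique(atlas_labels)
    regions = regions[regions != 0]          # drop voxels outside the atlas
    return np.column_stack(
        [voxel_data[:, atlas_labels == r].mean(axis=1) for r in regions]
    )
```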

4 Results

4.1 Classification of category pairs

We achieved good classification performance for many of the category pairs, both with and without multi-task learning.

³ The efficiency of our algorithm depends on the sparseness of the structure matrix R.


Figure 2: Accuracies for concept classification of the 66 category pairs formed from the twelve semantic categories (animal, bodypart, building, buildpart, clothing, furniture, insect, kitchen, manmade, tool, vegetable, vehicle). The upper triangular part shows the results of multi-task learning whereas the lower triangular part shows the results of standard classification. Non-significant outcomes have been masked (Wilcoxon rank sum test on outcomes for all nine subjects, p = 0.05, Bonferroni corrected).

Figure 2 shows these results, where non-significant outcomes have been masked. Interestingly, outcomes for all subjects showed a preference for particular category pairs. The concepts from building-tool, building-kitchen and buildpart-tool had the highest mean classification accuracies (proportion of correctly classified trials) of 0.78, 0.76 and 0.74, closely followed by concepts from building-clothing and animal-buildpart with a mean classification accuracy of 0.71.

This result bears a strong resemblance to the recent work of Just et al. (2010). The authors conducted a factor analysis of fMRI brain activation in response to presentations of written words of different categories and discovered the three semantic factors with the highest predictive potential: manipulation, eating and shelter-entry. They subsequently used these factors to select voxels for a feature set and were able to accurately identify the activation generated by a concrete word using multivariate learning methods on the basis of the selected voxels. Moreover, using the factor-related activation profiles they were able to identify common neuronal signatures for particular words across participants.

Table 1: Stimulus words from the semantic categories that showed the best classification accuracies. Superscripts indicate words belonging to the list of ten words with the highest factor scores in the study by Just et al. (2010). We use the following abbreviations: s = shelter, m = manipulation, e = eating.

Building      Buildpart    Tool             Kitchen
apartment^s   window       chisel^m         glass^e
barn          door^s       hammer^m         knife^m
house^s       chimney      screwdriver^m    bottle
church^s      closet^s     pliers^m         cup^e
igloo         arch         saw^m            spoon^m

The authors suggest that the revealed factors represent major semantic dimensions that relate to the ways in which humans can interact with an object. Although they assume the existence of other semantic attributes that determine conceptual representation, the factors shelter, manipulation and eating are proposed to be dominant for this particular set of nouns. The analogy is easy to draw since the set of words used by Just and colleagues was exactly the same as in the current study. Although the taxonomic categorization used in our study does not exactly match the factor-based categorization, most of the items from the categories building, buildpart, tool and kitchen show a strong correspondence with one of the semantic factors and are listed among the ten words with the highest factor scores according to Just et al. (2010) (see Table 1).

The subsets of items that are set far apart along the suggested semantic dimensions appear to be preferred by the classifier in our study. The classifier was not able to identify the category of an unseen concept in the pairs building-buildpart and tool-kitchen, possibly because these categories share the same semantic features. Thus, the current study provides independent corroboration of the findings on the semantic dimensions underlying concrete noun representation.

4.2 Single versus multi-task learning

The use of AAL regions instead of native voxel activity patterns allowed efficient multi-task learning by coupling each region between the nine subjects. Reliable classification accuracies were obtained for all participants, although there were strong differences in individual performance (Fig. 3).


Figure 3: Classification performance per subject averaged over all category pairs for standard classification and multi-task learning (error bars show the standard error of the mean).

The move to multi-task learning seems to improve classification results slightly for most subjects, although the improvement is not significant.

The main outcome and advantage of our approach to multi-task learning is the convergence of the models obtained from different subjects. Figure 4 shows that the subject-specific models become strongly correlated when they are obtained in the multi-task setting, even for weak coupling strengths. For strong coupling strengths, the models are almost perfectly correlated, resulting in identical models for all nine subjects, as shown in Fig. 4 for the category pair building-tool. It is important to realize here that the model is defined in terms of the variance of the auxiliary variables, which acts as a proxy for the importance of a region. At the level of the regression coefficients β, the model will still find subject-specific parameters due to the likelihood term in Eq. (1). Even though the contribution of each brain region is constrained by the induced coupling, this does not impede but rather improves classification performance. This leads us to believe that our approach to multi-task learning tracks down the common task-specific activations while ignoring background noise.

Our study demonstrates that Bayesian multi-task learning allows generalization across subjects. Our algorithm identifies identical cortical locations as being important in solving the classification problem for all individuals within the group. The identified regions agree with previously published results on concept encoding. For example, the regions which were considered important for the category pair building-tool (Fig. 5) are almost indistinguishable from those described in a recent study by Shinkareva et al. (2008). These are regions that are traditionally considered to be involved in reading, object meaning retrieval and visual semantic tasks (Vandenberghe et al., 1996; Phillips et al., 2002).

Strikingly, very similar regions were picked by the classifier for the other two category pairs with high classification accuracy, i.e., building-kitchen and buildpart-tool. This fact brings back the issue of the semantic factors relevant for discriminating the entities from these categories. The factors shelter, manipulation and eating are associated with the concepts from the first three addressed category pairs. The locations of the voxel clusters associated with the semantic factors in (Just et al., 2010) match the brain regions that contributed to the classification for the three best pairs in our experiment. In the Just et al. study these were left and right fusiform gyri, left and right precuneus and left inferior temporal gyrus for shelter; left supramarginal gyrus, left postcentral gyrus and left inferior temporal gyrus for manipulation; and left inferior frontal gyrus, left middle/inferior frontal gyri, and left inferior temporal gyrus for eating. The occipital lobes detected exclusively in our experiment might be explained by the fact that our subjects were viewing picture-text pairs, in contrast to only text in (Just et al., 2010).

5 Discussion

We have demonstrated that Bayesian multi-task learning can be realized through Bayesian logistic regression when using a multivariate Laplace prior that couples features between multiple datasets. This approach has not been used before and yields promising results. As such it complements other Bayesian and non-Bayesian approaches to multi-task learning such as those reported in (Yu et al., 2005; Dunson et al., 2008; Argyriou et al., 2008; van Gerven et al., 2009; Obozinski et al., 2009; Rustandi et al., 2009).

Results show that many category pairs can be classified based on the average activation of regions in the AAL template.


Figure 4: Correlation matrices for the subject-specific models for standard classification (A) and multi-task learning with weak coupling (s = 1) (B) for building versus tool. The right panel (C) shows the difference between the models obtained for standard classification (s = 0) and strong coupling (s = 100) for the thirty most important AAL regions.

Although the obtained accuracies are lower than those that would have been obtained using single-voxel activations, it is interesting in its own right that the activation in just 116 predefined regions still allows concept decoding. However, it remains an open question to what extent classifiability truly reflects semantic processing instead of sensory processing of the words and/or pictures.

The coupling induced by multi-task learning leads to interpretable models when using the auxiliary variable variance as a measure of importance. The obtained models for the pairs which were easiest to classify corresponded well to the results reported in (Shinkareva et al., 2008) and mapped nicely onto the semantic features shelter, manipulation and eating identified in (Just et al., 2010).

In this paper we used the multivariate Laplace prior to induce a coupling between tasks. It is straightforward to combine this with other coupling constraints, such as coupling nearby regions within subjects. Our algorithm also does not preclude multi-task learning on thousands of voxels. Computation time depends on the number of non-zeros in the structure matrix R, and matrices containing hundreds of thousands of non-zero elements are still manageable, with computation times in the order of hours.

Another interesting application of multi-task learning in the context of concept learning is to couple the datasets of all condition pairs within a subject. This effectively tries to find a model in which the used regions of interest can predict multiple condition pairs. The correlation structure between the models for each condition pair then informs us about their similarity.


Figure 5: The brain regions contributing to the identification of building versus tool categories.

An interesting direction for future research is to perform multi-task learning on the level of the semantic features that define a concept instead of on the concepts themselves. If we are able to predict the semantic features reliably then we may be able to predict previously unseen concepts from their constituent features (Palatucci et al., 2009).

References

A. Argyriou, T. Evgeniou, and M. Pontil. 2008. Convex multi-task feature learning. Machine Learning, 73(3):243–272.

R. Caruana. 1997. Multitask learning. Machine Learning, 28(1):41–75.

D. Dunson, Y. Xue, and L. Carin. 2008. The matrix stick-breaking process: flexible Bayes meta-analysis. Journal of the American Statistical Association, 103(481):317–327.

A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. 1995. Bayesian Data Analysis. Chapman and Hall, London, UK, 1st edition.

D. Hassabis, C. Chu, G. Rees, N. Weiskopf, P. D. Molyneux, and E. A. Maguire. 2009. Decoding neuronal ensembles in the human hippocampus. Current Biology, 19:546–554.

M. A. Just, V. L. Cherkassky, S. Aryal, and T. M. Mitchell. 2010. A neurosemantic theory of concrete noun representation based on the underlying brain codes. PLoS ONE, 5(1):e8622.

T. Minka. 2001. Expectation propagation for approximate Bayesian inference. In J. Breese and D. Koller, editors, Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 362–369. Morgan Kaufmann.

T. M. Mitchell, S. V. Shinkareva, A. Carlson, K.-M. Chang, V. L. Malave, R. A. Mason, and M. A. Just. 2008. Predicting human brain activity associated with the meanings of nouns. Science, 320(5880):1191–1195.

Y. Miyawaki, H. Uchida, O. Yamashita, M. Sato, Y. Morito, H. C. Tanabe, N. Sadato, and Y. Kamitani. 2008. Visual image reconstruction from human brain activity using a combination of multiscale local image decoders. Neuron, 60(5):915–929.

G. Obozinski, B. Taskar, and M. I. Jordan. 2009. Joint covariate selection and joint subspace selection for multiple classification problems. In Statistics and Computing. Springer.

M. Palatucci, D. Pomerleau, G. Hinton, and T. Mitchell. 2009. Zero-shot learning with semantic output codes. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Neural Information Processing Systems, pages 1410–1418.

J. A. Phillips, U. Noppeney, G. W. Humphreys, and C. J. Price. 2002. Can segregation within the semantic system account for category-specific deficits? Brain, 125(9):2067–2080.

I. Rustandi, M. A. Just, and T. M. Mitchell. 2009. Integrating multiple-study multiple-subject fMRI datasets using canonical correlation analysis. In Proceedings of the MICCAI 2009 Workshop.

S. V. Shinkareva, R. A. Mason, V. L. Malave, W. Wang, and T. M. Mitchell. 2008. Using fMRI brain activation to identify cognitive states associated with perception of tools and dwellings. PLoS ONE, 3(1):e1394.

S. Thrun. 1996. Is learning the n-th thing any easier than learning the first? In Advances in Neural Information Processing Systems, pages 640–646. The MIT Press.

N. Tzourio-Mazoyer, B. Landeau, D. Papathanassiou, F. Crivello, O. Etard, N. Delcroix, B. Mazoyer, and M. Joliot. 2002. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage, 15(1):273–289.

M. A. J. van Gerven, C. Hesse, O. Jensen, and T. Heskes. 2009. Interpreting single trial data using groupwise regularisation. NeuroImage, 46:665–676.

M. A. J. van Gerven, B. Cseke, F. P. de Lange, and T. Heskes. 2010. Efficient Bayesian multivariate fMRI analysis using a sparsifying spatio-temporal prior. NeuroImage, 50(1):150–161.

R. Vandenberghe, C. Price, R. Wise, O. Josephs, and R. S. Frackowiak. 1996. Functional anatomy of a common semantic system for words and pictures. Nature, 383(6597):254–256.

K. Yu, V. Tresp, and A. Schwaighofer. 2005. Learning Gaussian processes from multiple tasks. In International Conference on Machine Learning, pages 1012–1019.