nonlinear pca: characterizing interactions between modes ... › ~karl › nonlinear pca.pdf ·...

12
Nonlinear PCA: characterizing interactions between modes of brain activity Karl Friston * , Jacquie Phillips, Dave Chawla and Christian Bu« chel The Wellcome Department of Cognitive Neurology, Institute of Neurology, Queen Square, London WC1N 3BG, UK This paper presents a nonlinear principal component analysis (PCA) that identi¢es underlying sources causing the expression of spatial modes or patterns of activity in neuroimaging time-series. The critical aspect of this technique is that, in relation to conventional PCA, the sources can interact to produce (second-order) spatial modes that represent the modulation of one (¢rst-order) spatial mode by another. This nonlinear PCA uses a simple neural network architecture that embodies a speci¢c form for the nonlinear mixing of sources that cause observed data. This form is motivated by a second-order approx- imation to any general nonlinear mixing and emphasizes interactions among pairs of sources. By introducing these nonlinearities principal components obtain with a unique rotation and scaling that does not depend on the biologically implausible constraints adopted by conventional PCA. The technique is illustrated by application to functional (positron emission tomography and functional magnetic resonance imaging) imaging data where the ensuing ¢rst- and second-order modes can be inter- preted in terms of distributed brain systems. The interactions among sources render the expression of any one mode context-sensitive, where that context is established by the expression of other modes. The examples considered include interactions between cognitive states and time (i.e. adaptation or plasticity in PETdata) and among functionally specialized brain systems (using a fMRI study of colour and motion processing). Keywords: functional neuroimaging; PCA; interactions; spatial modes; nonlinear unmixing; sources 1. INTRODUCTION This paper introduces a new technique that falls under the heading of nonlinear principal component analysis (PCA), in the characterization of functional neuro- imaging time-series. This technique identi¢es the under- lying dynamics that determine the expression of spatial modes or patterns of brain activity where, in contra- distinction to conventional PCA, the underlying causes can interact to produce second-order spatial modes. These second-order modes represent the modulation of one distributed brain system by another and provide for a parsimonious characterization of multivariate time-series that embody nonlinear interactions. (a) Eigenimage analysis In Friston et al. (1993) we introduced voxel-based PCA of functional neuroimaging time-series to characterize distributed brain systems implicated in sensorimotor, perceptual or cognitive processes. These distributed systems are identi¢ed with principal components or eigen- images that correspond to spatial modes of coherent brain activity. This approach represents one of the simplest multivariate characterizations of functional neuroimaging time-series and falls into the class of exploratory analyses. Principal component or eigenimage analysis generally uses singular value decomposition (SVD) to identify a set of orthogonal spatial modes that capture the greatest amount of variance, expressed over time. As such, the ensuing modes embody the most prominent aspects of the variance^covariance structure of a given time-series. Noting that the covariances among brain regions is equivalent to functional connectivity renders eigenimage analysis particularly interesting because it was among the ¢rst ways of addressing functional integration (i.e. connectivity) in the human brain. Subsequently eigen- image analysis has been elaborated in a number of ways. Notable among these are the application of canonical variate analysis (CVA; Friston et al. 1996a), multi- dimensional scaling (Friston et al. 1996b) and partial least squares (PLS; McIntosh et al. 1996). Canonical variate analysis was introduced in the context of ManCova (multiple analysis of covariance) and uses the generalized eigenvector solution to maximize the variance that can be explained by some explanatory variables relative to error. CVA can be thought of as an extension of eigenimage analysis that refers explicitly to some explanatory vari- ables and allows for statistical inference. Partial least squares is another name for SVD and can be thought of as an eigenimage analysis of the cross covariances between two sets of data. It was ¢rst applied to neuro- physiological data from di¡erent parts of the brain (the right and left hemispheres; Friston 1995) and has been developed to look at the relationships between imaging time-series and explanatory variables pertaining to experimental design and behaviour (McIntosh et al. 1996). Phil. Trans. R. Soc. Lond. B (2000) 355 , 135^146 135 © 2000 The Royal Society * Author for correspondence (k.friston@¢l.ion.ucl.ac.uk).

Upload: others

Post on 05-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

Nonlinear PCA characterizing interactionsbetween modes of brain activity

Karl Friston Jacquie Phillips Dave Chawla and Christian Bulaquo chelThe Wellcome Department of Cognitive Neurology Institute of Neurology Queen Square London WC1N 3BG UK

This paper presents a nonlinear principal component analysis (PCA) that identicentes underlying sourcescausing the expression of spatial modes or patterns of activity in neuroimaging time-series The criticalaspect of this technique is that in relation to conventional PCA the sources can interact to produce(second-order) spatial modes that represent the modulation of one (centrst-order) spatial mode by anotherThis nonlinear PCA uses a simple neural network architecture that embodies a specicentc form for thenonlinear mixing of sources that cause observed data This form is motivated by a second-order approx-imation to any general nonlinear mixing and emphasizes interactions among pairs of sources Byintroducing these nonlinearities principal components obtain with a unique rotation and scaling that doesnot depend on the biologically implausible constraints adopted by conventional PCA

The technique is illustrated by application to functional (positron emission tomography and functionalmagnetic resonance imaging) imaging data where the ensuing centrst- and second-order modes can be inter-preted in terms of distributed brain systems The interactions among sources render the expression of anyone mode context-sensitive where that context is established by the expression of other modes Theexamples considered include interactions between cognitive states and time (ie adaptation or plasticityin PETdata) and among functionally specialized brain systems (using a fMRI study of colour and motionprocessing)

Keywords functional neuroimaging PCA interactions spatial modes nonlinear unmixing sources

1 INTRODUCTION

This paper introduces a new technique that falls underthe heading of nonlinear principal component analysis(PCA) in the characterization of functional neuro-imaging time-series This technique identicentes the under-lying dynamics that determine the expression of spatialmodes or patterns of brain activity where in contra-distinction to conventional PCA the underlying causescan interact to produce second-order spatial modesThese second-order modes represent the modulation ofone distributed brain system by another and provide for aparsimonious characterization of multivariate time-seriesthat embody nonlinear interactions

(a) Eigenimage analysisIn Friston et al (1993) we introduced voxel-based PCA

of functional neuroimaging time-series to characterizedistributed brain systems implicated in sensorimotorperceptual or cognitive processes These distributedsystems are identicented with principal components or eigen-images that correspond to spatial modes of coherent brainactivity This approach represents one of the simplestmultivariate characterizations of functional neuroimagingtime-series and falls into the class of exploratory analysesPrincipal component or eigenimage analysis generallyuses singular value decomposition (SVD) to identify a set

of orthogonal spatial modes that capture the greatestamount of variance expressed over time As such theensuing modes embody the most prominent aspects of thevariance^covariance structure of a given time-seriesNoting that the covariances among brain regions isequivalent to functional connectivity renders eigenimageanalysis particularly interesting because it was among thecentrst ways of addressing functional integration (ieconnectivity) in the human brain Subsequently eigen-image analysis has been elaborated in a number of waysNotable among these are the application of canonicalvariate analysis (CVA Friston et al 1996a) multi-dimensional scaling (Friston et al 1996b) and partial leastsquares (PLS McIntosh et al 1996) Canonical variateanalysis was introduced in the context of ManCova(multiple analysis of covariance) and uses the generalizedeigenvector solution to maximize the variance that can beexplained by some explanatory variables relative to errorCVA can be thought of as an extension of eigenimageanalysis that refers explicitly to some explanatory vari-ables and allows for statistical inference Partial leastsquares is another name for SVD and can be thought ofas an eigenimage analysis of the cross covariancesbetween two sets of data It was centrst applied to neuro-physiological data from diiexclerent parts of the brain (theright and left hemispheres Friston 1995) and has beendeveloped to look at the relationships between imagingtime-series and explanatory variables pertaining toexperimental design and behaviour (McIntosh et al 1996)

Phil Trans R Soc Lond B (2000) 355 135^146 135 copy 2000 The Royal Society

Author for correspondence (kfristoncentlionuclacuk)

Eigenimage analysis has been applied widely topositron emission tomographic (PET) and subsequentlyfunctional magnetic resonance imaging (fMRI) data Itis generally used as an exploratory device to characterizethe main contributions to coherent brain activity Thesevariance components may or may not be related toexperimental design and recently spontaneous or endo-genous correlations have been observed in the motorsystem without experimental manipulation (Biswal et al1995) Despite its exploratory power eigenimage analysisis fundamentally limited for two reasons First it oiexclersonly a linear decomposition of any set of neurophysio-logical measurements and second the particular set ofeigenimages or spatial modes obtained are uniquelydetermined by constraints that are biologically implau-sible These aspects of PCA represent inherent limitationson the interpretability and usefulness of eigenimageanalysis of biological time-series

In this paper we introduce a new technique thatresolves both the problems of linearity and non-uniquenessof the modes This technique is a special case of nonlinearPCA that is motivated by a second-order approximationto any general interactions among a small number of`sourcesrsquo or causesrsquo of variance in multivariate time-seriesIn brief this technique identicentes a small number ofsources or components that explain the most variance inthe observed data while allowing for high-order inter-actions among these sources The sources are identicentedsubject to and only to the constraint that they are ortho-gonal or uncorrelated By virtue of the nonlinear inter-actions the sources are uniquely identicented eschewing theneed to refer to unnatural constraints

(b) The importance of nonlinear PCA incharacterizing distributed brain systems

As noted above the two main limitations of conven-tional eigenimage analysis are that the decomposition ofany observed time-series is in terms of linearly separablecomponents characterized by their spatial modes andscores Second that the spatial modes are somewhat arbi-trarily constrained to be orthogonal and account succes-sively for the largest amount of variance In general theidenticentcation of independent components (independentcomponent analysis or ICA) is only possible to withinsome permutation and scaling PCA relaxes the require-ment of independence and replaces it with orthogonalityintroducing the further problem that there is no uniquerotation of the principal components In PCA a uniquerotation and permutation is obtained by requiring succes-sive modes to account for the greatest amount of variancethat remains once higher components have beenremoved Scaling is constrained by ensuring the spatialmodes have unit sum of squares

From a biological perspective the linearity constraintis a rather severe one Because the decomposition is linearit precludes interactions among the causes of spatialmodes This is a highly unnatural restriction on theactivity expressed by distributed brain systems where oneexpects to see substantial interactions that render theexpression of one mode sensitive to the expression ofothers There are numerous examples of nonlinear inter-actions and modulatory eiexclects that shape the context-sensitive nature of neuronal dynamics and brain activity

Perhaps the most compelling example at a systems levelis attentional modulation Consider two distributed brainsystems one subserving the processing of dynamic visualstimuli and the other responsible for a particular atten-tional set In the context of attending to visual motionthe neuronal responses in the visual system elicited bymotion stimuli will depend on whether the subject isattending to this attribute or not Attentional status willbe repoundected in the activity of some attentional mode andtherefore the expression of the visual processing modewill be a function of the expression of the attentionalmode It is more than likely that the implicit interactionbetween the visual and attentional modes will result notonly in the degree to which the modes are expressed butin their form or relative regionally specicentc contributionsFor example activity in visual area V5 that has beenimplicated in the processing of visual motion (Zeki 1990)may be enhanced (relative to say V2 or V1) whenever theappropriate attentional mode is being expressed (Treue ampMaunsell 1996 Bulaquo chel et al 1998) This context-sensitiveexpression of spatial modes can be modelled conceptuallyin terms of centrst- and second-order eiexclects In thisexample there are two sources or causes of distributedneuronal responses namely the presence of visual motionin the visual centeld and attention Both these causes areexpressed in terms of activity in their respective spatialmodes and the interactions between these two causeswould correspond to a second-order eiexclect that wasexpressed in V5 These second-order eiexclects can bethought of as changes in the centrst mode that depend onthe expression of activity in the second mode or equiva-lently modulation of one mode that is sensitive to thecontext engendered by the other

The example considered in this paper is based on afMRI study of visual processing that was designed toaddress the interaction between colour and motionprocessing We had expected to demonstrate that a`colourrsquo mode and `motionrsquo mode would interact toproduce a second-order mode repoundecting (i) reciprocalinteractions between extrastriate areas functionallyspecialized for colour and motion (ii) interactions inlower visual areas mediated by convergent backwardseiexclerents or (iii) interactions in the pulvinar mediated bycorticothalamic loops) Two out of three of these predic-tions were seen (see frac12 3(b))

In summary to properly model the context-sensitivenature of distributed but coherent brain responses it maybe necessary to address interactions among spatial modesthat allow for the modulation of one mode by anotherThese modulatory eiexclects are second- or high-order innature and correspond to an interaction among theunderlying causes in determining a particular pattern ofcortical responses This is the principle motivation for thedevelopment and use of nonlinear forms of PCA

This paper is divided into two sections The centrst sectionreviews the theoretical background to nonlinear PCAcentrst in general terms and then the specicentc implementa-tion proposed here This section includes the theoreticalmotivation behind the particular form of the decomposi-tion employed how sources and modes are identicented andhow the ensuing modes can be interpreted The secondsection is an illustrative application of nonlinear PCA to

136 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

the original PET study used to illustrate eigenimageanalysis (Friston et al 1993) and a fMRI study of visualmotion and colour processing that exemplicentes the modu-lation of one brain system by another

2 THEORETICAL BACKGROUND

(a) Nonlinear PCANonlinear PCA (eg Kramer 1991 Softky amp Kammen

1991 Karhunen amp Joutsensalo 1994) is a natural exten-sion of PCA in a sense that it aims to identify a smallnumber of underlying components or sources causing amultivariate data set that best explain the observedvariance^covariance structure Nonlinear PCA is itself avariant of related nonlinear approaches to structuralanalysis that have been pioneered over the last fewdecades These include nonlinear factor analysis (egMcDonald 1984) and nonlinear partial least squares (egWold 1992)

Imagine that we had m observations of an n-variateFor example m scans where each scan comprised n voxelsWe can represent these data as m points in an n-dimensionalspace In conventional PCA the centrst principal compo-nent corresponds to the direction of a line running alongthe principal axis of the resulting cloud of points Thedirection of this line is specicented by an n-vector that corre-sponds to the eigenimage and the projections of any onepoint on to this line give the expression of this componentfor the observation (ie point) in question There are twoequivalent perspectives on the role that the principal axisserves The centrst is that the projection of the m points on tothis line has the maximum dispersion or variance Inother words the component scores of the centrst principalcomponent has the most variance The other perspectiveis that the average distance of any point from this line isminimized in relation to all possible lines In other wordsthe centrst principal component is the pattern over the nvoxels that minimizes the unexplained variance in thedata Nonlinear PCA adopts exactly the same principlesbut allows for curvilinear lines In brief a curve is centttedto the data in n-space such that the average distance ofthe data points from this principal curve is minimized(centgure 1) This heuristic description highlights the inti-mate relationship between nonlinear PCA and the identi-centcation of principal curves or surfaces (Dong amp McAvoy1996) In the case of linear PCA the principal axes aredetermined analytically using the eigenvector solution ofthe npound n covariance matrix In nonlinear PCA there is noclosed-form solution and iterative techniques are gener-ally employed These iterative approaches are usually bestframed in terms of simple neural networks using gradientascent or descent on the weights of the connections withinthe network One attractive architecture (Kramer 1991)that has been used in this context is based on the notionof `bottle-neck nodesrsquo These architectures have centve layerswith a mirror symmetry about the middle or third layerThe centrst and centfth layers represent inputs and outputsThe middle layer has typically a very small number Jof nodes The intermediate layers two and four havelarger numbers of nodes or neurons than the input oroutput layers and employ some nonlinear activation func-tion The network is trained to reproduce its input at theoutputs This simple training forces the network to learn

a nonlinear function of the inputs that best predicts theinputs themselves subject to the constraint that it can beexpressed as a function of small number J of sources(activities of the `bottle-neck nodesrsquo in the middle layer)The transformation from the input to the middle layerrepresents a projection of the data on to the J principalsurfaces of the data and the nonlinear transformationfrom the middle layer to the output layer decentnes the formof these surfaces There are some extremely interestingissues pertaining to the use of these architectures inidentifying principal components of a nonlinear sortbut we will not pursue them here In this paper we takea somewhat diiexclerent approach that embodies someexplicit constraints on the form of the nonlinearities thatmay cause biological data and develop an alternativearchitecture that retains the two basic principles of(i) using `bottle-neck nodesrsquo and (ii) training the

Characterizing interactions between modes of brain activity K Friston and others 137

Phil Trans R Soc Lond B (2000)

(a)

(b)

y(voxel 2)

x (voxel 1)

r

Vxx + Vyy

voxel 1

r

f(xy) = V0 + Vxx + Vyy + Vxyxy

Figure 1 Schematic illustrating the idea behind nonlinearPCA and principal curves In this simple example there areonly two voxels and a series of images corresponding to pointswith values x y in the plots A conventional PCA centnds theprincipal line or axis given by a linear function of x and y thatminimizes the average squared distance (r) of each point fromthat line (a) Nonlinear PCA is exactly the same but in thisinstance the axis is a curve given by some nonlinear functionof x and y (b)

network so that the output best predicts the input in aleast-squares sense

(b) Second-order PCAIn this subsection we will introduce a variant of

nonlinear PCA that uses a specicentc form for the assumednonlinear mixing of sources to produce the observedresponses in data This form is predicated on interactionsamong sources in the genesis of multivariate time-seriesIn what follows we shall assume that an n-variate obser-vation is caused by a small number of J underlyingsources and interactions among these sources Generallythe observation of the ith variate (eg at the ith voxel)will be some nonlinear function of the underlying sources

yi(t) ˆ fi(s(t)) (1)

where y(t) ˆ [ y1(t) yn(t)] is an n-vector function oftime Similarly for s(t) ˆ [s1(t) sJ(t)]

A second-order approximation of the Taylor expansionof equation (1) about some expected value -s(t) for thesources is given by

yi(t) ordm fi( middots) DaggerX

j

fi

ujuj Dagger

X

jk

2fi

ujukujuk (2)

where u(t) ˆ (s(t)7-s(t)) is an alternative representation ofthe sources Now incorporating all n observations (ievoxels) equation (2) can be expressed in matrix form interms of zeroth- centrst- and second-order modes repre-sented by the n-vectorsV 0V 1 and V 2

y(t) ordm V 0 DaggerX

j

ujV1j Dagger

X

jk

ujukV2jk

where

V 0 ˆ permil f1(s) fn(sdaggerŠ V1j ˆ f1

uj

fn

uj

V2jk ˆ 2f1

ujuk

2fn

ujuk (3)

The elements of the vectors V1 correspond to the centrstorder partial derivatives in equation (2) and the elementsof the vectors V 2 correspond to the second-order deriva-tives V1 and V 2 have the natural interpretation of centrst-and second-order modes respectively In other words thejth source is expressed in terms of the spatial mode V1

j

and the interaction between the jth and kth modesexpressed as a spatial mode V2

jk Equation (3) can beconsidered a special case of a more general equation thatembodies higher-order terms

y(t) ordm V0 DaggerX

j

ujV1j Dagger

X

jk

frac14(ujuk)V2jk (4)

frac14(cent) is some sigmoid or squashing function that allows forslightly more general forms of interactions among sourcesand ensures a unique scaling for the sources u(t)

The above gives a suitable form for a nonlinear decom-position or PCA of a multivariate data set y(t) To identifythe values of u(t) and the spatial modes it is necessary toassume a constraint of orthogonality for the sources Thisis a natural constraint in a sense that the underlying

causes of any biological data should be independent ifthey represent true causes and as such will be ortho-gonal Notice that the assumption of orthogonality is aweaker assumption than independence and it is thisassumption that decentnes the algorithm described here as anonlinear PCA Had we assumed independence then theproblem would become that of nonlinear independentcomponent analysis (Common 1994) which woulddemand a diiexclerent approach

(c) Neuronal architecture and identicentcation ofnonlinear components

In this section the neuronal architecture and gradientdescent scheme used to identify sources and their modesare described Note that equation (4) can be treated as ageneral linear model and as such if we knew the sourcesu(t) the modes could be estimated by minimizing theresiduals tracefRg in a least squares sense where

R ˆ (y iexcl XV)T (y iexcl XV) (5)

and

V ˆ permilV V1 V2Š ˆ (XT X)iexcl1XT y

X ˆ permil1 u1 uk frac14(u1u2) frac14(ukuk)Š

Here 1 is a column of ones and I below is the identitymatrix (XTX)71Xy is simply the least-squares estimatorof V given the inputs y and estimated sources (and theirinteractions) in X The problem therefore reduces toidentifying the variates u(t) corresponding to estimates ofthe sources that minimize the norm of the residualstracefRg By noting the existence of some vector Gi whereV 0Gi ˆ 0 V1

i Gi ˆ 1 for i ˆ j and 0 otherwise andV2

jkGi ˆ 0 then post-multiplying equation (4) throughoutby Gi gives ui(t) ˆ y(t)poundGi This means that there must bea linear combination of the inputs that gives the ithsource One simply has to centnd the linear combination ofinputs that minimizes tracefRg for a given input y(t)subject to the constraint that the sources are orthogonal

These observations lead to the following simple neuralnetwork the network has three layers comprising inputmiddle and output layers The input and output layershave n nodes and linear activation functions and can beimagined as lying next to each other (centgure 2) Themiddle layer comprises a small (J 5 n) number of centrst-order nodes with linear activation functions that receiveinputs from all the input nodes In addition the middlelayer includes p ˆ n(n71)2 second-order nodes thatreceive lateral inputs from the centrst-order nodes Eachsecond-order node receives two inputs that are multipliedand subject to the nonlinear function frac14(cent) to provide theiroutput The network is trained on the feedforwardconnection strengths from the input layer to the centrst-order nodes of the middle layer The weights to the ithcentrst-order node are eiexclectively estimates of Gi Theconnections from all middle-layer nodes to the outputsare determined using the least-squares estimators of themodes given the current estimate of the sources and theinputs according to equation (5) Anti-Hebbian lateralconnections (Foldiak 1993) among the centrst-order nodes inthe middle layer ensure that the sources u(t) are ortho-gonal These JpoundJ lateral connection strengths L are

138 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

determined at each iteration to render the oiexcl-diagonalelements of Covu) zero

L ˆ I iexcl paraiexcl1curren1=2ET (6)

para is a leading diagonal matrix whose elements correspondto the variance of the sources in the absence of decorre-lating lateral connections (ie para ˆ diagfucurrenT ucurreng whereucurren ˆ y pound G) curren and E are the eigenvalues and eigen-vectors of ucurrenT ucurren Estimates of the sources are given by

u ˆ yG Dagger uL ˆ y pound G(I iexcl L)iexcl1 (7)

Note that substituting equation (6) into equation (7) givesCovfug uT u ˆ para thereby ensuring orthogonality of theestimated sources Implementing changes in the lateralconnections in this way enforces orthogonality of theprojection eiexclected by the feed-forward connections ateach iteration L imposes this constraint because theeiexclective feed-forward connections are G(I7L)71 Essen-tially the architecture is centnding the rotation and scaling

of some projection on to a low-dimensional orthogonalsubspace that enables the interactions among theprojected inputs that ensue to predict as much of theoriginal inputs as possible u enters into equation (5) withy to compute tracefRg The learning rule for the esti-mates of Gi corresponds to gradient descent on the traceof the residuals In practice we use a Nelder^Meadsimplex search as implemented in MATLAB (MathWorksInc Natick MA USA) that minimizes tracefRg whichis simply a function of the input y and the feed-forwardconnections strengths G This neural network appears tobe robust and usually converges within a few tens of itera-tions to give estimates of the underlying sources u(t) andleast-square estimates of corresponding spatial modes VFigure 2 is a schematic that tries to convey the simplicityof the architecture Here the third or output layer hasbeen repoundected back on to the inputs to emphasize thesymmetry between the assumed structure of causes in theworld and the architecture used to identify them

(d) Interpreting the estimatesGenerally in PCA in the absence of any variance

maximization^minimization criterion there is no uniquerotation of the spatial modes (ie any linear combinationof the modes that conforms to an orthogonal rotation isas equally good as any other) The incorporation of theinteraction term into the decomposition implicit in equa-tion (3) ensures a unique rotation and furthermoreincorporating the sigmoid function in equation (4)ensures that the scaling is uniquely determined The latterfollows from the fact that there will be some optimumsquashing of each interaction term to best predict theobserved data The reason that unique solutions obtain inthis form of nonlinear PCA is that we have assumed avery specicentc form for the high-order interactions amongsources in causing the data This is based on the pairwiseinteractions between sources as modelled by the secondterm in equation (4) The interpretation of the sourcesand their modes is relatively straightforward

The observed multivariate data can be explained by asmall number of J sources whose expressions are given byu The values of uj scale the contribution of the centrst-orderspatial mode V1

j in a way that is directly analogous toconventional PCA where uj would be the jth componentscore In addition there are second-order eiexclects thatrepresent interactions between pairs of sources frac14(uj uk)These interactions are expressed in second-order modescorresponding to V2

jk Each second-order mode will havea variance component that may or may not be orthogonalto the centrst-order modes Although the sources are ortho-gonal there is no explicit requirement for the modes to beso The variance accounted for by each source and inter-action is given by

jujj cent jV1j j and jfrac14(ujuk)j cent jV2

jkj (8)

and can be used to rank the relative contributions of eachsource or interaction j cent j denotes the vector norm (iesum of squares)

The nonlinear PCA proposed here therefore decom-poses a multivariate data set into centrst- and second-ordercomponents that can be ascribed to a small number ofunderlying sources The number of sources will be

Characterizing interactions between modes of brain activity K Friston and others 139

Phil Trans R Soc Lond B (2000)

L12

u1

u1

(u1su2

u2

u2)

(u1s u2)

V 212

V12

G1 trace (R)

trainingerror

estimated sources andinteraction (layer 2)

predicted data(layer 3)

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

V 212

V12

interactions sources

observed data(layer 1)

Figure 2 The neural net architecture used to estimate sourcesand modes The lower half of the schematic represents the realworld with its sources and interactions (here only two sourcesand the subsequent interaction are shown) These sources andinteractions (u) cause signals ( y) in the input layer (layer 1)that here comprises ten voxels or channels The signals causedby the sources are weighted by the voxel-specicentc elements ofthe corresponding centrst- or second-order spatial modes (V1 andV 2) Feedforward connections (G) from the input layer tolayer 2 provide an estimate of the sources (u) in layer 2 Thisestimation obtains by changing G to minimize the sum ofsquared residuals (traceR) or diiexclerences between theobserved signals and those predicted by the activity in layer 3The activity in layer 3 results from backwards connectionsfrom the estimated source and interaction nodes in layer 2These backward connections are the estimates of the spatialmodes (V1 and V 2) and are determined using least-squaresgiven the input ( y) and the current estimate of the sources(u) Lateral decorrelating or anti-Hebbian connections Lbetween the centrst-order modes ensure orthogonality of thesource estimates Note that in the absence of any interactionthe solution would correspond to a conventional PCA whereG ˆ pinv(V1)

generally greater than the minimum number acquired toaccount for the rank of the multivariate data Forexample with just three voxels or channels two sourceswould be sucurrencient if the third dimension was explainedby the interactionbetween these two sourcesThree sourceswould be sucurrencient to account for a six-dimensional dataset and so on Clearly in the analysis of functional neuro-imaging time-series an initial dimension reduction isrequired before nonlinear PCA can be applied Forexample taking the centrst 36 spatial modes of a fMRIimaging time-series using conventional singular valuedecomposition would in principle require only eightunderlying sources In this sense nonlinear PCA representsa parsimonious characterization of the data that capturesnonlinear interactions among spatial modes or distributedbrain systems in a comprehensive but intuitive fashionThese and other issues will be demonstrated in the nextsection which uses real data to illustrate the technique

3 ILLUSTRATIVE EXAMPLES

In this section we will use a multisubject PET study ofverbal pounduency and a fMRI case study of visual processingto illustrate the use of nonlinear PCA and some aspects offunctional anatomy that can be addressed with this tech-nique The PET study is used to show that a considerableamount of variance can be accounted for by interactionsamong causes of the data and is presented for comparisonwith the original linear PCA characterization in Friston etal (1993) In brief we will show that the two experi-mental factors (task and time) combine to expressthemselves in a second-order mode that repoundects time-dependent adaptation of task-related responses Thiseiexclect can be construed as large-scale neurophysiologicalplasticity attributable to strategic changes in cognitiveprocessing during intrinsic relative to extrinsic genera-tion of words The second example using fMRI dealsmore explicitly with the modulation of one brain systemby another In particular the interactions betweenspecialized cortical systems that may be mediated bycorticothalamic loops

(a) A PET study of verbal pounduency(i) Data acquisition experimental design and preprocessing

The data were obtained from centve subjects scanned 12times (every 8 min) while performing one of two verbaltasks Scans were obtained with a CTI PETcamera (model953B CTI Knoxville TN USA) 15O was administeredintravenously as radiolabelled water infused over 2 minTotal counts per voxel during the build-up phase of radio-activity served as an estimate of regional cerebral bloodpoundow (rCBF) Subjects performed two tasks in alternationOne task involved repeating a letter presented aurally atone per two seconds (word shadowing) The other was apaced verbal pounduency task where the subjects respondedwith a word that began with the letter presented (intrinsicword generation)The data were realigned stereotacticallynormalized and smoothed with a 16 mm Gaussian kernel(Friston et al 1996c) The data were subject to a conven-tional SPM analysis using multiple linear regression with12 condition-specicentc eiexclects centve subject eiexclects and globalactivity as described in Friston et al (1995a) Parameterestimates representing condition-specicentc eiexclects averaged

over subjects were selected from voxels that exceeded athreshold of p5005 in the ensuing SPMfFgand subject tononlinear PCA as described below

(ii) Nonlinear PCAThe data were reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The ability of these two sources andtheir interaction to explain the observed regional activityis illustrated in centgure 3a Here an arbitrary voxel (thatshowing the highest F-value in the conventional SPManalysis) was selected from the left inferior frontal gyrus(Brodmann Area 47) The observed condition-specicentcactivity over 12 scans is shown in black and thatpredicted by the two sources is shown in white The rela-tive amount of variance accounted for by the two sourcesand their interaction is shown in the middle panel It canbe seen that 88 of the total variance over all voxelsincluded in the analysis can be explained by two sourcesThe second-order mode accounts of 22 of this (afterremoving that which can be modelled by the centrst-ordereiexclects) and would have been distributed over othermodes in a conventional PCA Figure 3b shows this distri-bution indicating that the centfth and sixth eigenimages ina conventional PCA largely comprise the interactionbetween the two modes identicented by nonlinear PCA

The centrst- and second-order modes are seen in centgure 4along with their expression over the 12 scans It is imme-diately apparent that the centrst mode repoundects task-relatedeiexclects paralleling the alternation between word genera-tion and word shadowing This procentle of brain regions istypical of verbal pounduency paradigms that isolate theintrinsic generation of semantic representations encodingand retrieval processes required to compare the currentoutput with previous words and the maintenance of anappropriate cognitive set The key regions involvedinclude the thalamus dorsolateral prefrontal cortex ante-rior cingulate temporal cortices and cerebellum Thesecond mode represents the other experimental factornamely time or order eiexclects A nonlinear eiexclect is evidentwith increases in activity in the cerebellar thalamic andleft basotemporal regions More interesting is the second-order mode that by implication repoundects an interactionbetween task-related responses and time ie time-dependent increases in physiological responses elicited bycognitive operations that distinguish between the twotasks employed This physiological adaptation in mostpronounced in Brocarsquos Area (Brodmann Area 44 in theleft prefrontal cortex and the right lateral thalamicregions) Brocarsquos Area is traditionally associated withspeech production and appears to undergo a profoundchange in its relative activation during word shadowingand generation after the centrst pair of scans that presum-ably repoundects an underlying change in cognitive archi-tecture or set

(b) A fMRI study of colour and motion processing(i) Data acquisition experimental design and preprocessing

The experiment was performed on a 2 Tesla MagnetomVISION (Siemens Erlangen Germany) whole body MRIsystem equipped with a head volume coil Contiguousmultislice T

2 -weighted fMRI images (TE ˆ 40 ms64 mmpound64 mmpound48 mm 3 mmpound3 mmpound3 mm voxels)

140 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

were obtained with echoplanar imaging using an axialslice orientation The eiexclective repetition time was 48 s Ayoung right-handed subject was scanned under fourdiiexclerent conditions in six scan epochs intercalated witha low level (visual centxation) baseline condition The fourconditions were repeated eight times in a pseudorandomorder giving 384 scans in total or 32 stimulation^baselineepoch pairs During all stimulation conditions the subjectlooked at dots back-projected on a screen by an LCDvideo projector The four experimental conditionscomprised the presentation of (i) radially moving dotsand (ii) stationary dots using (i) luminance contrast and(ii) chromatic contrast in a two-by-two factorial designLuminance contrast was established using isochromaticstimuli (red dots on a red background or green dots on agreen background) Hue contrast was obtained by usingred (or green) dots on a green (or red) background andestablishing isoluminance with poundicker photometry In thetwo movement conditions the dots moved radially fromthe centre of the screen at 88 s71 to the periphery wherethey vanished This creates the impression of optical poundowBy using these stimuli we hoped to excite activity invisual motion systems and those specialized for colourprocessing Any interaction between these systems wouldbe expressed in terms of motion-sensitive responses thatdepended on the hue or luminance contrast subtendingthat motion

The time-series were realigned corrected for movement-related eiexclects and spatially normalized into the standardspace of Talairach amp Tournoux (1988) using the subjectrsquosco-registered structural T1 scan and nonlinear deforma-tions (Friston et al 1996c) The data were spatiallysmoothed with a 6 mm isotropic Gaussian kernel As inthe PETexample voxels were selected that showed signicent-cant condition-specicentc eiexclects according to a conventionalSPM analysis (Friston et al 1995b Worsley amp Friston1995) This analysis used a multiple linear regression andcondition-specicentc box car regressors convolved with ahaemodynamic response function In this instance thenumber of voxels was exceeding large and we used ahigher threshold than in the PETanalysis ( p ˆ 0001) andincluded only those voxels that were posterior to theposterior commissure

(ii) Nonlinear PCAThe data were again reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The functional attribution of thesesources was established by looking at the expression of thecorresponding centrst-order modes over the four conditionsThe expression of epoch-related responses over all 32stimulation^baseline epoch pairs are shown in terms ofthe four conditions in centgure 5 This expression is simplythe score on the centrst principal component over all 32epoch-related responses for each source The centrst mode is

Characterizing interactions between modes of brain activity K Friston and others 141

Phil Trans R Soc Lond B (2000)

(a)

(b)

(c)

5

4

3

2

1

0

- 1

- 2

- 3

acti

vity

70

60

50

40

30

20

10

0

4

35

3

25

2

15

1

05

0

o

f va

rian

ce

1 2 3 4 5 6scan

7 8 9 10 11 12

mode 1652

mode 2207

interaction22

1 2 3 4eigenimage

5 6 7 8

Figure 3 Variance partitioning following a nonlinear PCA ofthe PET verbal pounduency study (a) Observed activity in a voxelin the left inferior frontal gyrus (centlled bars) and that predictedon the basis of two sources and their interaction (open bars)estimated with nonlinear PCA Activity is in units

Figure 3 (Cont) corresponding to regional cerebralperfusion in ml dl71 min71 (b) Variance over all voxelsincluded in the analysis accounted for by the two sources (ormodes) and their interaction Total ˆ 88 (c) Distributionof variance accounted for by second-order or interactioneiexclects over the conventional eigenimages obtained in theinitial SVD dimension reduction

clearly a motion-sensitive mode but one that embodiessome colour preference in the sense that the motion-dependent responses of this system are accentuated in thepresence of colour cues This was not quite what we hadanticipated the centrst-order eiexclect contains what would

functionally be called an interaction between motion andcolour processing The second source appears to beconcerned exclusively with colour processing in the sensethat its expression is uniformly higher under colourstimuli relative to isochromatic stimuli in a way that does

142 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

(a) (b)

(i)

(ii)

(iii)

15

1

05

0

- 05

- 1

- 15

1

05

0

- 05

- 1

- 15

acute 10- 5

5

0

- 5

acute 10- 6

1 2 3 4 5 6condition

7 8 9 10 11 12

Figure 4 Maximum intensity projections and expression of the centrst- and second-order spatial modes of the PET verbal pounduencystudy (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode The maximum intensity projections (a) are of thepositive values of each mode and are displayed in standard format The three orthogonal brain views are from the right theback and the top of the brain The projections have been scaled to the maximum intensity of each mode The time-dependentexpression of these modes are in terms of the 12 scans (b) The units are adimensional and their absolute values are not important(the variance they account for is determined by the scaling of the spatial modes which in turn is dictated by the sigmoidsquashing function see centgure 3)

not depend on motion The corresponding anatomicalprocentle is seen in centgure 6 (maximum intensity projectionsin centgure 6a and thresholded axial sections in centgure 6b)The centrst-order mode which shows both motion andcolour-related responses shows high loadings in bilateralmotion-sensitive complex V5 (Brodmann Areas 19 and 37at the occipto-temporal junction) and areas traditionallyassociated with colour processing (V4oumlthe lingualgyrus Brodmann Area 19 ventromedially) The secondcentrst-order mode is most prominent in the hippocampusparahippocampal and related lingual cortices on bothsides The two more lateral blobs subsume the tails of thecaudate nuclei (centgure 6b(ii)) This system is not onenormally associated with colour processing but it should

be noted that some of the main eiexclect of colour has beenexplained by the centrst mode that includes V4 Insummary the two centrst-order modes comprise (i) anextrastriate cortical system including V5 and V4 thatresponds to motion and preferentially so when motion issupported by colour cues and (ii) a (para)hippocampal^lingual system that is concerned exclusively with colourprocessing above and beyond that accounted for by thecentrst system The critical question is where do these modesinteract

The interaction between the extrastriate and (para)hippocampal lingual systems conforms to the second-order mode in the lower panels This mode highlights thepulvinar of the thalamus and V5 bilaterally This is apleasing result in that it clearly implicates the thalamus inthe integration of extrastriate and (para)hippocampalsystems This integration being mediated by recurrent(sub)corticothalamic connections It is also a result thatwould not have obtained from a conventional SPManalysis Indeed we looked for an interaction betweenmotion and colour processing and did not see any sucheiexclect in the pulvinar The reason that the nonlinear PCAwas able to centnd this interaction was that there were noconstraints on the sources underlying the interaction In aconventional SPM analysis the sources are explicitlyassumed to be colour and motion in the visual centeldwhereas the two interacting modes identicented by thenonlinear PCA were caused by complicated admixtures ofcolour and motion This result is presented to illustratethe potential usefulness of nonlinear PCA not to makeany statistical inferences about reproducible functionalarchitectures The exploratory analysis based on this casestudy could now be used to motivate hypothesis-ledanalyses of other subjects

4 CONCLUSION

In this paper we have described a specicentc form ofnonlinear PCA that is predicated on the interactionbetween underlying sources in modulating spatial modesof brain activity Its theoretical motivation stems directlyfrom a second-order approximation to the Taylor expan-sion of any nonlinear function of sources that can causemultivariate observations A simple three-layer neuronalnetwork architecture is sucurrencient to identify or estimatethe underlying causes and associated centrst- and second-order spatial modes The centrst-order modes correspond toconventional eigenimages or principal components andthe second-order modes describe the patterns of brainactivity that result from interactions among thesesources The ensuing decomposition into centrst- andsecond-order components represents an exploratoryanalysis of the data that eschews some of the shortcom-ings of conventional PCA In particular nonlinear PCAallows for the context-sensitive expression of spatialmodes through second-order modes that can be inter-preted as the anatomical substrate of integration ormodulation The highly constrained form of nonlinearPCA presented above has an intuitive interpretation interms of pairwise interactions among underlying sourcesand by virtue of this represents a useful and parsimo-nious characterization of functional neuroimaging time-series

Characterizing interactions between modes of brain activity K Friston and others 143

Phil Trans R Soc Lond B (2000)

04

03

02

01

00

- 01

- 02

- 03

(a)

04

02

0

- 02

- 04

- 06

(b)

cond

itio

nndashsp

ecif

ic e

xpre

ssio

n

motion

motion

stat

colour colour

isoch isoch

stat

1 2condition

3 4

Figure 5 Condition-specicentc expression of the two centrst-ordermodes ensuing from the visual processing fMRI studyThese data represent the degree to which the centrst principalcomponent of epoch-related waveforms over the 32 photicstimulation^baseline pairs was expressed These condition-specicentc responses are plotted in terms of the four conditionsfor the two modes motion motion present stat stationarydots colour isoluminant chromatic contrast stimuli isochisochromatic luminance contrast stimuli (a) First-ordermode 1 (b) centrst-order mode 2

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 2: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

Eigenimage analysis has been applied widely topositron emission tomographic (PET) and subsequentlyfunctional magnetic resonance imaging (fMRI) data Itis generally used as an exploratory device to characterizethe main contributions to coherent brain activity Thesevariance components may or may not be related toexperimental design and recently spontaneous or endo-genous correlations have been observed in the motorsystem without experimental manipulation (Biswal et al1995) Despite its exploratory power eigenimage analysisis fundamentally limited for two reasons First it oiexclersonly a linear decomposition of any set of neurophysio-logical measurements and second the particular set ofeigenimages or spatial modes obtained are uniquelydetermined by constraints that are biologically implau-sible These aspects of PCA represent inherent limitationson the interpretability and usefulness of eigenimageanalysis of biological time-series

In this paper we introduce a new technique thatresolves both the problems of linearity and non-uniquenessof the modes This technique is a special case of nonlinearPCA that is motivated by a second-order approximationto any general interactions among a small number of`sourcesrsquo or causesrsquo of variance in multivariate time-seriesIn brief this technique identicentes a small number ofsources or components that explain the most variance inthe observed data while allowing for high-order inter-actions among these sources The sources are identicentedsubject to and only to the constraint that they are ortho-gonal or uncorrelated By virtue of the nonlinear inter-actions the sources are uniquely identicented eschewing theneed to refer to unnatural constraints

(b) The importance of nonlinear PCA incharacterizing distributed brain systems

As noted above the two main limitations of conven-tional eigenimage analysis are that the decomposition ofany observed time-series is in terms of linearly separablecomponents characterized by their spatial modes andscores Second that the spatial modes are somewhat arbi-trarily constrained to be orthogonal and account succes-sively for the largest amount of variance In general theidenticentcation of independent components (independentcomponent analysis or ICA) is only possible to withinsome permutation and scaling PCA relaxes the require-ment of independence and replaces it with orthogonalityintroducing the further problem that there is no uniquerotation of the principal components In PCA a uniquerotation and permutation is obtained by requiring succes-sive modes to account for the greatest amount of variancethat remains once higher components have beenremoved Scaling is constrained by ensuring the spatialmodes have unit sum of squares

From a biological perspective the linearity constraintis a rather severe one Because the decomposition is linearit precludes interactions among the causes of spatialmodes This is a highly unnatural restriction on theactivity expressed by distributed brain systems where oneexpects to see substantial interactions that render theexpression of one mode sensitive to the expression ofothers There are numerous examples of nonlinear inter-actions and modulatory eiexclects that shape the context-sensitive nature of neuronal dynamics and brain activity

Perhaps the most compelling example at a systems levelis attentional modulation Consider two distributed brainsystems one subserving the processing of dynamic visualstimuli and the other responsible for a particular atten-tional set In the context of attending to visual motionthe neuronal responses in the visual system elicited bymotion stimuli will depend on whether the subject isattending to this attribute or not Attentional status willbe repoundected in the activity of some attentional mode andtherefore the expression of the visual processing modewill be a function of the expression of the attentionalmode It is more than likely that the implicit interactionbetween the visual and attentional modes will result notonly in the degree to which the modes are expressed butin their form or relative regionally specicentc contributionsFor example activity in visual area V5 that has beenimplicated in the processing of visual motion (Zeki 1990)may be enhanced (relative to say V2 or V1) whenever theappropriate attentional mode is being expressed (Treue ampMaunsell 1996 Bulaquo chel et al 1998) This context-sensitiveexpression of spatial modes can be modelled conceptuallyin terms of centrst- and second-order eiexclects In thisexample there are two sources or causes of distributedneuronal responses namely the presence of visual motionin the visual centeld and attention Both these causes areexpressed in terms of activity in their respective spatialmodes and the interactions between these two causeswould correspond to a second-order eiexclect that wasexpressed in V5 These second-order eiexclects can bethought of as changes in the centrst mode that depend onthe expression of activity in the second mode or equiva-lently modulation of one mode that is sensitive to thecontext engendered by the other

The example considered in this paper is based on afMRI study of visual processing that was designed toaddress the interaction between colour and motionprocessing We had expected to demonstrate that a`colourrsquo mode and `motionrsquo mode would interact toproduce a second-order mode repoundecting (i) reciprocalinteractions between extrastriate areas functionallyspecialized for colour and motion (ii) interactions inlower visual areas mediated by convergent backwardseiexclerents or (iii) interactions in the pulvinar mediated bycorticothalamic loops) Two out of three of these predic-tions were seen (see frac12 3(b))

In summary to properly model the context-sensitivenature of distributed but coherent brain responses it maybe necessary to address interactions among spatial modesthat allow for the modulation of one mode by anotherThese modulatory eiexclects are second- or high-order innature and correspond to an interaction among theunderlying causes in determining a particular pattern ofcortical responses This is the principle motivation for thedevelopment and use of nonlinear forms of PCA

This paper is divided into two sections The centrst sectionreviews the theoretical background to nonlinear PCAcentrst in general terms and then the specicentc implementa-tion proposed here This section includes the theoreticalmotivation behind the particular form of the decomposi-tion employed how sources and modes are identicented andhow the ensuing modes can be interpreted The secondsection is an illustrative application of nonlinear PCA to

136 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

the original PET study used to illustrate eigenimageanalysis (Friston et al 1993) and a fMRI study of visualmotion and colour processing that exemplicentes the modu-lation of one brain system by another

2 THEORETICAL BACKGROUND

(a) Nonlinear PCANonlinear PCA (eg Kramer 1991 Softky amp Kammen

1991 Karhunen amp Joutsensalo 1994) is a natural exten-sion of PCA in a sense that it aims to identify a smallnumber of underlying components or sources causing amultivariate data set that best explain the observedvariance^covariance structure Nonlinear PCA is itself avariant of related nonlinear approaches to structuralanalysis that have been pioneered over the last fewdecades These include nonlinear factor analysis (egMcDonald 1984) and nonlinear partial least squares (egWold 1992)

Imagine that we had m observations of an n-variateFor example m scans where each scan comprised n voxelsWe can represent these data as m points in an n-dimensionalspace In conventional PCA the centrst principal compo-nent corresponds to the direction of a line running alongthe principal axis of the resulting cloud of points Thedirection of this line is specicented by an n-vector that corre-sponds to the eigenimage and the projections of any onepoint on to this line give the expression of this componentfor the observation (ie point) in question There are twoequivalent perspectives on the role that the principal axisserves The centrst is that the projection of the m points on tothis line has the maximum dispersion or variance Inother words the component scores of the centrst principalcomponent has the most variance The other perspectiveis that the average distance of any point from this line isminimized in relation to all possible lines In other wordsthe centrst principal component is the pattern over the nvoxels that minimizes the unexplained variance in thedata Nonlinear PCA adopts exactly the same principlesbut allows for curvilinear lines In brief a curve is centttedto the data in n-space such that the average distance ofthe data points from this principal curve is minimized(centgure 1) This heuristic description highlights the inti-mate relationship between nonlinear PCA and the identi-centcation of principal curves or surfaces (Dong amp McAvoy1996) In the case of linear PCA the principal axes aredetermined analytically using the eigenvector solution ofthe npound n covariance matrix In nonlinear PCA there is noclosed-form solution and iterative techniques are gener-ally employed These iterative approaches are usually bestframed in terms of simple neural networks using gradientascent or descent on the weights of the connections withinthe network One attractive architecture (Kramer 1991)that has been used in this context is based on the notionof `bottle-neck nodesrsquo These architectures have centve layerswith a mirror symmetry about the middle or third layerThe centrst and centfth layers represent inputs and outputsThe middle layer has typically a very small number Jof nodes The intermediate layers two and four havelarger numbers of nodes or neurons than the input oroutput layers and employ some nonlinear activation func-tion The network is trained to reproduce its input at theoutputs This simple training forces the network to learn

a nonlinear function of the inputs that best predicts theinputs themselves subject to the constraint that it can beexpressed as a function of small number J of sources(activities of the `bottle-neck nodesrsquo in the middle layer)The transformation from the input to the middle layerrepresents a projection of the data on to the J principalsurfaces of the data and the nonlinear transformationfrom the middle layer to the output layer decentnes the formof these surfaces There are some extremely interestingissues pertaining to the use of these architectures inidentifying principal components of a nonlinear sortbut we will not pursue them here In this paper we takea somewhat diiexclerent approach that embodies someexplicit constraints on the form of the nonlinearities thatmay cause biological data and develop an alternativearchitecture that retains the two basic principles of(i) using `bottle-neck nodesrsquo and (ii) training the

Characterizing interactions between modes of brain activity K Friston and others 137

Phil Trans R Soc Lond B (2000)

(a)

(b)

y(voxel 2)

x (voxel 1)

r

Vxx + Vyy

voxel 1

r

f(xy) = V0 + Vxx + Vyy + Vxyxy

Figure 1 Schematic illustrating the idea behind nonlinearPCA and principal curves In this simple example there areonly two voxels and a series of images corresponding to pointswith values x y in the plots A conventional PCA centnds theprincipal line or axis given by a linear function of x and y thatminimizes the average squared distance (r) of each point fromthat line (a) Nonlinear PCA is exactly the same but in thisinstance the axis is a curve given by some nonlinear functionof x and y (b)

network so that the output best predicts the input in aleast-squares sense

(b) Second-order PCAIn this subsection we will introduce a variant of

nonlinear PCA that uses a specicentc form for the assumednonlinear mixing of sources to produce the observedresponses in data This form is predicated on interactionsamong sources in the genesis of multivariate time-seriesIn what follows we shall assume that an n-variate obser-vation is caused by a small number of J underlyingsources and interactions among these sources Generallythe observation of the ith variate (eg at the ith voxel)will be some nonlinear function of the underlying sources

yi(t) ˆ fi(s(t)) (1)

where y(t) ˆ [ y1(t) yn(t)] is an n-vector function oftime Similarly for s(t) ˆ [s1(t) sJ(t)]

A second-order approximation of the Taylor expansionof equation (1) about some expected value -s(t) for thesources is given by

yi(t) ordm fi( middots) DaggerX

j

fi

ujuj Dagger

X

jk

2fi

ujukujuk (2)

where u(t) ˆ (s(t)7-s(t)) is an alternative representation ofthe sources Now incorporating all n observations (ievoxels) equation (2) can be expressed in matrix form interms of zeroth- centrst- and second-order modes repre-sented by the n-vectorsV 0V 1 and V 2

y(t) ordm V 0 DaggerX

j

ujV1j Dagger

X

jk

ujukV2jk

where

V 0 ˆ permil f1(s) fn(sdaggerŠ V1j ˆ f1

uj

fn

uj

V2jk ˆ 2f1

ujuk

2fn

ujuk (3)

The elements of the vectors V1 correspond to the centrstorder partial derivatives in equation (2) and the elementsof the vectors V 2 correspond to the second-order deriva-tives V1 and V 2 have the natural interpretation of centrst-and second-order modes respectively In other words thejth source is expressed in terms of the spatial mode V1

j

and the interaction between the jth and kth modesexpressed as a spatial mode V2

jk Equation (3) can beconsidered a special case of a more general equation thatembodies higher-order terms

y(t) ordm V0 DaggerX

j

ujV1j Dagger

X

jk

frac14(ujuk)V2jk (4)

frac14(cent) is some sigmoid or squashing function that allows forslightly more general forms of interactions among sourcesand ensures a unique scaling for the sources u(t)

The above gives a suitable form for a nonlinear decom-position or PCA of a multivariate data set y(t) To identifythe values of u(t) and the spatial modes it is necessary toassume a constraint of orthogonality for the sources Thisis a natural constraint in a sense that the underlying

causes of any biological data should be independent ifthey represent true causes and as such will be ortho-gonal Notice that the assumption of orthogonality is aweaker assumption than independence and it is thisassumption that decentnes the algorithm described here as anonlinear PCA Had we assumed independence then theproblem would become that of nonlinear independentcomponent analysis (Common 1994) which woulddemand a diiexclerent approach

(c) Neuronal architecture and identicentcation ofnonlinear components

In this section the neuronal architecture and gradientdescent scheme used to identify sources and their modesare described Note that equation (4) can be treated as ageneral linear model and as such if we knew the sourcesu(t) the modes could be estimated by minimizing theresiduals tracefRg in a least squares sense where

R ˆ (y iexcl XV)T (y iexcl XV) (5)

and

V ˆ permilV V1 V2Š ˆ (XT X)iexcl1XT y

X ˆ permil1 u1 uk frac14(u1u2) frac14(ukuk)Š

Here 1 is a column of ones and I below is the identitymatrix (XTX)71Xy is simply the least-squares estimatorof V given the inputs y and estimated sources (and theirinteractions) in X The problem therefore reduces toidentifying the variates u(t) corresponding to estimates ofthe sources that minimize the norm of the residualstracefRg By noting the existence of some vector Gi whereV 0Gi ˆ 0 V1

i Gi ˆ 1 for i ˆ j and 0 otherwise andV2

jkGi ˆ 0 then post-multiplying equation (4) throughoutby Gi gives ui(t) ˆ y(t)poundGi This means that there must bea linear combination of the inputs that gives the ithsource One simply has to centnd the linear combination ofinputs that minimizes tracefRg for a given input y(t)subject to the constraint that the sources are orthogonal

These observations lead to the following simple neuralnetwork the network has three layers comprising inputmiddle and output layers The input and output layershave n nodes and linear activation functions and can beimagined as lying next to each other (centgure 2) Themiddle layer comprises a small (J 5 n) number of centrst-order nodes with linear activation functions that receiveinputs from all the input nodes In addition the middlelayer includes p ˆ n(n71)2 second-order nodes thatreceive lateral inputs from the centrst-order nodes Eachsecond-order node receives two inputs that are multipliedand subject to the nonlinear function frac14(cent) to provide theiroutput The network is trained on the feedforwardconnection strengths from the input layer to the centrst-order nodes of the middle layer The weights to the ithcentrst-order node are eiexclectively estimates of Gi Theconnections from all middle-layer nodes to the outputsare determined using the least-squares estimators of themodes given the current estimate of the sources and theinputs according to equation (5) Anti-Hebbian lateralconnections (Foldiak 1993) among the centrst-order nodes inthe middle layer ensure that the sources u(t) are ortho-gonal These JpoundJ lateral connection strengths L are

138 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

determined at each iteration to render the oiexcl-diagonalelements of Covu) zero

L ˆ I iexcl paraiexcl1curren1=2ET (6)

para is a leading diagonal matrix whose elements correspondto the variance of the sources in the absence of decorre-lating lateral connections (ie para ˆ diagfucurrenT ucurreng whereucurren ˆ y pound G) curren and E are the eigenvalues and eigen-vectors of ucurrenT ucurren Estimates of the sources are given by

u ˆ yG Dagger uL ˆ y pound G(I iexcl L)iexcl1 (7)

Note that substituting equation (6) into equation (7) givesCovfug uT u ˆ para thereby ensuring orthogonality of theestimated sources Implementing changes in the lateralconnections in this way enforces orthogonality of theprojection eiexclected by the feed-forward connections ateach iteration L imposes this constraint because theeiexclective feed-forward connections are G(I7L)71 Essen-tially the architecture is centnding the rotation and scaling

of some projection on to a low-dimensional orthogonalsubspace that enables the interactions among theprojected inputs that ensue to predict as much of theoriginal inputs as possible u enters into equation (5) withy to compute tracefRg The learning rule for the esti-mates of Gi corresponds to gradient descent on the traceof the residuals In practice we use a Nelder^Meadsimplex search as implemented in MATLAB (MathWorksInc Natick MA USA) that minimizes tracefRg whichis simply a function of the input y and the feed-forwardconnections strengths G This neural network appears tobe robust and usually converges within a few tens of itera-tions to give estimates of the underlying sources u(t) andleast-square estimates of corresponding spatial modes VFigure 2 is a schematic that tries to convey the simplicityof the architecture Here the third or output layer hasbeen repoundected back on to the inputs to emphasize thesymmetry between the assumed structure of causes in theworld and the architecture used to identify them

(d) Interpreting the estimatesGenerally in PCA in the absence of any variance

maximization^minimization criterion there is no uniquerotation of the spatial modes (ie any linear combinationof the modes that conforms to an orthogonal rotation isas equally good as any other) The incorporation of theinteraction term into the decomposition implicit in equa-tion (3) ensures a unique rotation and furthermoreincorporating the sigmoid function in equation (4)ensures that the scaling is uniquely determined The latterfollows from the fact that there will be some optimumsquashing of each interaction term to best predict theobserved data The reason that unique solutions obtain inthis form of nonlinear PCA is that we have assumed avery specicentc form for the high-order interactions amongsources in causing the data This is based on the pairwiseinteractions between sources as modelled by the secondterm in equation (4) The interpretation of the sourcesand their modes is relatively straightforward

The observed multivariate data can be explained by asmall number of J sources whose expressions are given byu The values of uj scale the contribution of the centrst-orderspatial mode V1

j in a way that is directly analogous toconventional PCA where uj would be the jth componentscore In addition there are second-order eiexclects thatrepresent interactions between pairs of sources frac14(uj uk)These interactions are expressed in second-order modescorresponding to V2

jk Each second-order mode will havea variance component that may or may not be orthogonalto the centrst-order modes Although the sources are ortho-gonal there is no explicit requirement for the modes to beso The variance accounted for by each source and inter-action is given by

jujj cent jV1j j and jfrac14(ujuk)j cent jV2

jkj (8)

and can be used to rank the relative contributions of eachsource or interaction j cent j denotes the vector norm (iesum of squares)

The nonlinear PCA proposed here therefore decom-poses a multivariate data set into centrst- and second-ordercomponents that can be ascribed to a small number ofunderlying sources The number of sources will be

Characterizing interactions between modes of brain activity K Friston and others 139

Phil Trans R Soc Lond B (2000)

L12

u1

u1

(u1su2

u2

u2)

(u1s u2)

V 212

V12

G1 trace (R)

trainingerror

estimated sources andinteraction (layer 2)

predicted data(layer 3)

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

V 212

V12

interactions sources

observed data(layer 1)

Figure 2 The neural net architecture used to estimate sourcesand modes The lower half of the schematic represents the realworld with its sources and interactions (here only two sourcesand the subsequent interaction are shown) These sources andinteractions (u) cause signals ( y) in the input layer (layer 1)that here comprises ten voxels or channels The signals causedby the sources are weighted by the voxel-specicentc elements ofthe corresponding centrst- or second-order spatial modes (V1 andV 2) Feedforward connections (G) from the input layer tolayer 2 provide an estimate of the sources (u) in layer 2 Thisestimation obtains by changing G to minimize the sum ofsquared residuals (traceR) or diiexclerences between theobserved signals and those predicted by the activity in layer 3The activity in layer 3 results from backwards connectionsfrom the estimated source and interaction nodes in layer 2These backward connections are the estimates of the spatialmodes (V1 and V 2) and are determined using least-squaresgiven the input ( y) and the current estimate of the sources(u) Lateral decorrelating or anti-Hebbian connections Lbetween the centrst-order modes ensure orthogonality of thesource estimates Note that in the absence of any interactionthe solution would correspond to a conventional PCA whereG ˆ pinv(V1)

generally greater than the minimum number acquired toaccount for the rank of the multivariate data Forexample with just three voxels or channels two sourceswould be sucurrencient if the third dimension was explainedby the interactionbetween these two sourcesThree sourceswould be sucurrencient to account for a six-dimensional dataset and so on Clearly in the analysis of functional neuro-imaging time-series an initial dimension reduction isrequired before nonlinear PCA can be applied Forexample taking the centrst 36 spatial modes of a fMRIimaging time-series using conventional singular valuedecomposition would in principle require only eightunderlying sources In this sense nonlinear PCA representsa parsimonious characterization of the data that capturesnonlinear interactions among spatial modes or distributedbrain systems in a comprehensive but intuitive fashionThese and other issues will be demonstrated in the nextsection which uses real data to illustrate the technique

3 ILLUSTRATIVE EXAMPLES

In this section we will use a multisubject PET study ofverbal pounduency and a fMRI case study of visual processingto illustrate the use of nonlinear PCA and some aspects offunctional anatomy that can be addressed with this tech-nique The PET study is used to show that a considerableamount of variance can be accounted for by interactionsamong causes of the data and is presented for comparisonwith the original linear PCA characterization in Friston etal (1993) In brief we will show that the two experi-mental factors (task and time) combine to expressthemselves in a second-order mode that repoundects time-dependent adaptation of task-related responses Thiseiexclect can be construed as large-scale neurophysiologicalplasticity attributable to strategic changes in cognitiveprocessing during intrinsic relative to extrinsic genera-tion of words The second example using fMRI dealsmore explicitly with the modulation of one brain systemby another In particular the interactions betweenspecialized cortical systems that may be mediated bycorticothalamic loops

(a) A PET study of verbal pounduency(i) Data acquisition experimental design and preprocessing

The data were obtained from centve subjects scanned 12times (every 8 min) while performing one of two verbaltasks Scans were obtained with a CTI PETcamera (model953B CTI Knoxville TN USA) 15O was administeredintravenously as radiolabelled water infused over 2 minTotal counts per voxel during the build-up phase of radio-activity served as an estimate of regional cerebral bloodpoundow (rCBF) Subjects performed two tasks in alternationOne task involved repeating a letter presented aurally atone per two seconds (word shadowing) The other was apaced verbal pounduency task where the subjects respondedwith a word that began with the letter presented (intrinsicword generation)The data were realigned stereotacticallynormalized and smoothed with a 16 mm Gaussian kernel(Friston et al 1996c) The data were subject to a conven-tional SPM analysis using multiple linear regression with12 condition-specicentc eiexclects centve subject eiexclects and globalactivity as described in Friston et al (1995a) Parameterestimates representing condition-specicentc eiexclects averaged

over subjects were selected from voxels that exceeded athreshold of p5005 in the ensuing SPMfFgand subject tononlinear PCA as described below

(ii) Nonlinear PCAThe data were reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The ability of these two sources andtheir interaction to explain the observed regional activityis illustrated in centgure 3a Here an arbitrary voxel (thatshowing the highest F-value in the conventional SPManalysis) was selected from the left inferior frontal gyrus(Brodmann Area 47) The observed condition-specicentcactivity over 12 scans is shown in black and thatpredicted by the two sources is shown in white The rela-tive amount of variance accounted for by the two sourcesand their interaction is shown in the middle panel It canbe seen that 88 of the total variance over all voxelsincluded in the analysis can be explained by two sourcesThe second-order mode accounts of 22 of this (afterremoving that which can be modelled by the centrst-ordereiexclects) and would have been distributed over othermodes in a conventional PCA Figure 3b shows this distri-bution indicating that the centfth and sixth eigenimages ina conventional PCA largely comprise the interactionbetween the two modes identicented by nonlinear PCA

The centrst- and second-order modes are seen in centgure 4along with their expression over the 12 scans It is imme-diately apparent that the centrst mode repoundects task-relatedeiexclects paralleling the alternation between word genera-tion and word shadowing This procentle of brain regions istypical of verbal pounduency paradigms that isolate theintrinsic generation of semantic representations encodingand retrieval processes required to compare the currentoutput with previous words and the maintenance of anappropriate cognitive set The key regions involvedinclude the thalamus dorsolateral prefrontal cortex ante-rior cingulate temporal cortices and cerebellum Thesecond mode represents the other experimental factornamely time or order eiexclects A nonlinear eiexclect is evidentwith increases in activity in the cerebellar thalamic andleft basotemporal regions More interesting is the second-order mode that by implication repoundects an interactionbetween task-related responses and time ie time-dependent increases in physiological responses elicited bycognitive operations that distinguish between the twotasks employed This physiological adaptation in mostpronounced in Brocarsquos Area (Brodmann Area 44 in theleft prefrontal cortex and the right lateral thalamicregions) Brocarsquos Area is traditionally associated withspeech production and appears to undergo a profoundchange in its relative activation during word shadowingand generation after the centrst pair of scans that presum-ably repoundects an underlying change in cognitive archi-tecture or set

(b) A fMRI study of colour and motion processing(i) Data acquisition experimental design and preprocessing

The experiment was performed on a 2 Tesla MagnetomVISION (Siemens Erlangen Germany) whole body MRIsystem equipped with a head volume coil Contiguousmultislice T

2 -weighted fMRI images (TE ˆ 40 ms64 mmpound64 mmpound48 mm 3 mmpound3 mmpound3 mm voxels)

140 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

were obtained with echoplanar imaging using an axialslice orientation The eiexclective repetition time was 48 s Ayoung right-handed subject was scanned under fourdiiexclerent conditions in six scan epochs intercalated witha low level (visual centxation) baseline condition The fourconditions were repeated eight times in a pseudorandomorder giving 384 scans in total or 32 stimulation^baselineepoch pairs During all stimulation conditions the subjectlooked at dots back-projected on a screen by an LCDvideo projector The four experimental conditionscomprised the presentation of (i) radially moving dotsand (ii) stationary dots using (i) luminance contrast and(ii) chromatic contrast in a two-by-two factorial designLuminance contrast was established using isochromaticstimuli (red dots on a red background or green dots on agreen background) Hue contrast was obtained by usingred (or green) dots on a green (or red) background andestablishing isoluminance with poundicker photometry In thetwo movement conditions the dots moved radially fromthe centre of the screen at 88 s71 to the periphery wherethey vanished This creates the impression of optical poundowBy using these stimuli we hoped to excite activity invisual motion systems and those specialized for colourprocessing Any interaction between these systems wouldbe expressed in terms of motion-sensitive responses thatdepended on the hue or luminance contrast subtendingthat motion

The time-series were realigned corrected for movement-related eiexclects and spatially normalized into the standardspace of Talairach amp Tournoux (1988) using the subjectrsquosco-registered structural T1 scan and nonlinear deforma-tions (Friston et al 1996c) The data were spatiallysmoothed with a 6 mm isotropic Gaussian kernel As inthe PETexample voxels were selected that showed signicent-cant condition-specicentc eiexclects according to a conventionalSPM analysis (Friston et al 1995b Worsley amp Friston1995) This analysis used a multiple linear regression andcondition-specicentc box car regressors convolved with ahaemodynamic response function In this instance thenumber of voxels was exceeding large and we used ahigher threshold than in the PETanalysis ( p ˆ 0001) andincluded only those voxels that were posterior to theposterior commissure

(ii) Nonlinear PCAThe data were again reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The functional attribution of thesesources was established by looking at the expression of thecorresponding centrst-order modes over the four conditionsThe expression of epoch-related responses over all 32stimulation^baseline epoch pairs are shown in terms ofthe four conditions in centgure 5 This expression is simplythe score on the centrst principal component over all 32epoch-related responses for each source The centrst mode is

Characterizing interactions between modes of brain activity K Friston and others 141

Phil Trans R Soc Lond B (2000)

(a)

(b)

(c)

5

4

3

2

1

0

- 1

- 2

- 3

acti

vity

70

60

50

40

30

20

10

0

4

35

3

25

2

15

1

05

0

o

f va

rian

ce

1 2 3 4 5 6scan

7 8 9 10 11 12

mode 1652

mode 2207

interaction22

1 2 3 4eigenimage

5 6 7 8

Figure 3 Variance partitioning following a nonlinear PCA ofthe PET verbal pounduency study (a) Observed activity in a voxelin the left inferior frontal gyrus (centlled bars) and that predictedon the basis of two sources and their interaction (open bars)estimated with nonlinear PCA Activity is in units

Figure 3 (Cont) corresponding to regional cerebralperfusion in ml dl71 min71 (b) Variance over all voxelsincluded in the analysis accounted for by the two sources (ormodes) and their interaction Total ˆ 88 (c) Distributionof variance accounted for by second-order or interactioneiexclects over the conventional eigenimages obtained in theinitial SVD dimension reduction

clearly a motion-sensitive mode but one that embodiessome colour preference in the sense that the motion-dependent responses of this system are accentuated in thepresence of colour cues This was not quite what we hadanticipated the centrst-order eiexclect contains what would

functionally be called an interaction between motion andcolour processing The second source appears to beconcerned exclusively with colour processing in the sensethat its expression is uniformly higher under colourstimuli relative to isochromatic stimuli in a way that does

142 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

(a) (b)

(i)

(ii)

(iii)

15

1

05

0

- 05

- 1

- 15

1

05

0

- 05

- 1

- 15

acute 10- 5

5

0

- 5

acute 10- 6

1 2 3 4 5 6condition

7 8 9 10 11 12

Figure 4 Maximum intensity projections and expression of the centrst- and second-order spatial modes of the PET verbal pounduencystudy (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode The maximum intensity projections (a) are of thepositive values of each mode and are displayed in standard format The three orthogonal brain views are from the right theback and the top of the brain The projections have been scaled to the maximum intensity of each mode The time-dependentexpression of these modes are in terms of the 12 scans (b) The units are adimensional and their absolute values are not important(the variance they account for is determined by the scaling of the spatial modes which in turn is dictated by the sigmoidsquashing function see centgure 3)

not depend on motion The corresponding anatomicalprocentle is seen in centgure 6 (maximum intensity projectionsin centgure 6a and thresholded axial sections in centgure 6b)The centrst-order mode which shows both motion andcolour-related responses shows high loadings in bilateralmotion-sensitive complex V5 (Brodmann Areas 19 and 37at the occipto-temporal junction) and areas traditionallyassociated with colour processing (V4oumlthe lingualgyrus Brodmann Area 19 ventromedially) The secondcentrst-order mode is most prominent in the hippocampusparahippocampal and related lingual cortices on bothsides The two more lateral blobs subsume the tails of thecaudate nuclei (centgure 6b(ii)) This system is not onenormally associated with colour processing but it should

be noted that some of the main eiexclect of colour has beenexplained by the centrst mode that includes V4 Insummary the two centrst-order modes comprise (i) anextrastriate cortical system including V5 and V4 thatresponds to motion and preferentially so when motion issupported by colour cues and (ii) a (para)hippocampal^lingual system that is concerned exclusively with colourprocessing above and beyond that accounted for by thecentrst system The critical question is where do these modesinteract

The interaction between the extrastriate and (para)hippocampal lingual systems conforms to the second-order mode in the lower panels This mode highlights thepulvinar of the thalamus and V5 bilaterally This is apleasing result in that it clearly implicates the thalamus inthe integration of extrastriate and (para)hippocampalsystems This integration being mediated by recurrent(sub)corticothalamic connections It is also a result thatwould not have obtained from a conventional SPManalysis Indeed we looked for an interaction betweenmotion and colour processing and did not see any sucheiexclect in the pulvinar The reason that the nonlinear PCAwas able to centnd this interaction was that there were noconstraints on the sources underlying the interaction In aconventional SPM analysis the sources are explicitlyassumed to be colour and motion in the visual centeldwhereas the two interacting modes identicented by thenonlinear PCA were caused by complicated admixtures ofcolour and motion This result is presented to illustratethe potential usefulness of nonlinear PCA not to makeany statistical inferences about reproducible functionalarchitectures The exploratory analysis based on this casestudy could now be used to motivate hypothesis-ledanalyses of other subjects

4 CONCLUSION

In this paper we have described a specicentc form ofnonlinear PCA that is predicated on the interactionbetween underlying sources in modulating spatial modesof brain activity Its theoretical motivation stems directlyfrom a second-order approximation to the Taylor expan-sion of any nonlinear function of sources that can causemultivariate observations A simple three-layer neuronalnetwork architecture is sucurrencient to identify or estimatethe underlying causes and associated centrst- and second-order spatial modes The centrst-order modes correspond toconventional eigenimages or principal components andthe second-order modes describe the patterns of brainactivity that result from interactions among thesesources The ensuing decomposition into centrst- andsecond-order components represents an exploratoryanalysis of the data that eschews some of the shortcom-ings of conventional PCA In particular nonlinear PCAallows for the context-sensitive expression of spatialmodes through second-order modes that can be inter-preted as the anatomical substrate of integration ormodulation The highly constrained form of nonlinearPCA presented above has an intuitive interpretation interms of pairwise interactions among underlying sourcesand by virtue of this represents a useful and parsimo-nious characterization of functional neuroimaging time-series

Characterizing interactions between modes of brain activity K Friston and others 143

Phil Trans R Soc Lond B (2000)

04

03

02

01

00

- 01

- 02

- 03

(a)

04

02

0

- 02

- 04

- 06

(b)

cond

itio

nndashsp

ecif

ic e

xpre

ssio

n

motion

motion

stat

colour colour

isoch isoch

stat

1 2condition

3 4

Figure 5 Condition-specicentc expression of the two centrst-ordermodes ensuing from the visual processing fMRI studyThese data represent the degree to which the centrst principalcomponent of epoch-related waveforms over the 32 photicstimulation^baseline pairs was expressed These condition-specicentc responses are plotted in terms of the four conditionsfor the two modes motion motion present stat stationarydots colour isoluminant chromatic contrast stimuli isochisochromatic luminance contrast stimuli (a) First-ordermode 1 (b) centrst-order mode 2

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 3: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

the original PET study used to illustrate eigenimageanalysis (Friston et al 1993) and a fMRI study of visualmotion and colour processing that exemplicentes the modu-lation of one brain system by another

2 THEORETICAL BACKGROUND

(a) Nonlinear PCANonlinear PCA (eg Kramer 1991 Softky amp Kammen

1991 Karhunen amp Joutsensalo 1994) is a natural exten-sion of PCA in a sense that it aims to identify a smallnumber of underlying components or sources causing amultivariate data set that best explain the observedvariance^covariance structure Nonlinear PCA is itself avariant of related nonlinear approaches to structuralanalysis that have been pioneered over the last fewdecades These include nonlinear factor analysis (egMcDonald 1984) and nonlinear partial least squares (egWold 1992)

Imagine that we had m observations of an n-variateFor example m scans where each scan comprised n voxelsWe can represent these data as m points in an n-dimensionalspace In conventional PCA the centrst principal compo-nent corresponds to the direction of a line running alongthe principal axis of the resulting cloud of points Thedirection of this line is specicented by an n-vector that corre-sponds to the eigenimage and the projections of any onepoint on to this line give the expression of this componentfor the observation (ie point) in question There are twoequivalent perspectives on the role that the principal axisserves The centrst is that the projection of the m points on tothis line has the maximum dispersion or variance Inother words the component scores of the centrst principalcomponent has the most variance The other perspectiveis that the average distance of any point from this line isminimized in relation to all possible lines In other wordsthe centrst principal component is the pattern over the nvoxels that minimizes the unexplained variance in thedata Nonlinear PCA adopts exactly the same principlesbut allows for curvilinear lines In brief a curve is centttedto the data in n-space such that the average distance ofthe data points from this principal curve is minimized(centgure 1) This heuristic description highlights the inti-mate relationship between nonlinear PCA and the identi-centcation of principal curves or surfaces (Dong amp McAvoy1996) In the case of linear PCA the principal axes aredetermined analytically using the eigenvector solution ofthe npound n covariance matrix In nonlinear PCA there is noclosed-form solution and iterative techniques are gener-ally employed These iterative approaches are usually bestframed in terms of simple neural networks using gradientascent or descent on the weights of the connections withinthe network One attractive architecture (Kramer 1991)that has been used in this context is based on the notionof `bottle-neck nodesrsquo These architectures have centve layerswith a mirror symmetry about the middle or third layerThe centrst and centfth layers represent inputs and outputsThe middle layer has typically a very small number Jof nodes The intermediate layers two and four havelarger numbers of nodes or neurons than the input oroutput layers and employ some nonlinear activation func-tion The network is trained to reproduce its input at theoutputs This simple training forces the network to learn

a nonlinear function of the inputs that best predicts theinputs themselves subject to the constraint that it can beexpressed as a function of small number J of sources(activities of the `bottle-neck nodesrsquo in the middle layer)The transformation from the input to the middle layerrepresents a projection of the data on to the J principalsurfaces of the data and the nonlinear transformationfrom the middle layer to the output layer decentnes the formof these surfaces There are some extremely interestingissues pertaining to the use of these architectures inidentifying principal components of a nonlinear sortbut we will not pursue them here In this paper we takea somewhat diiexclerent approach that embodies someexplicit constraints on the form of the nonlinearities thatmay cause biological data and develop an alternativearchitecture that retains the two basic principles of(i) using `bottle-neck nodesrsquo and (ii) training the

Characterizing interactions between modes of brain activity K Friston and others 137

Phil Trans R Soc Lond B (2000)

(a)

(b)

y(voxel 2)

x (voxel 1)

r

Vxx + Vyy

voxel 1

r

f(xy) = V0 + Vxx + Vyy + Vxyxy

Figure 1 Schematic illustrating the idea behind nonlinearPCA and principal curves In this simple example there areonly two voxels and a series of images corresponding to pointswith values x y in the plots A conventional PCA centnds theprincipal line or axis given by a linear function of x and y thatminimizes the average squared distance (r) of each point fromthat line (a) Nonlinear PCA is exactly the same but in thisinstance the axis is a curve given by some nonlinear functionof x and y (b)

network so that the output best predicts the input in aleast-squares sense

(b) Second-order PCAIn this subsection we will introduce a variant of

nonlinear PCA that uses a specicentc form for the assumednonlinear mixing of sources to produce the observedresponses in data This form is predicated on interactionsamong sources in the genesis of multivariate time-seriesIn what follows we shall assume that an n-variate obser-vation is caused by a small number of J underlyingsources and interactions among these sources Generallythe observation of the ith variate (eg at the ith voxel)will be some nonlinear function of the underlying sources

yi(t) ˆ fi(s(t)) (1)

where y(t) ˆ [ y1(t) yn(t)] is an n-vector function oftime Similarly for s(t) ˆ [s1(t) sJ(t)]

A second-order approximation of the Taylor expansionof equation (1) about some expected value -s(t) for thesources is given by

yi(t) ordm fi( middots) DaggerX

j

fi

ujuj Dagger

X

jk

2fi

ujukujuk (2)

where u(t) ˆ (s(t)7-s(t)) is an alternative representation ofthe sources Now incorporating all n observations (ievoxels) equation (2) can be expressed in matrix form interms of zeroth- centrst- and second-order modes repre-sented by the n-vectorsV 0V 1 and V 2

y(t) ordm V 0 DaggerX

j

ujV1j Dagger

X

jk

ujukV2jk

where

V 0 ˆ permil f1(s) fn(sdaggerŠ V1j ˆ f1

uj

fn

uj

V2jk ˆ 2f1

ujuk

2fn

ujuk (3)

The elements of the vectors V1 correspond to the centrstorder partial derivatives in equation (2) and the elementsof the vectors V 2 correspond to the second-order deriva-tives V1 and V 2 have the natural interpretation of centrst-and second-order modes respectively In other words thejth source is expressed in terms of the spatial mode V1

j

and the interaction between the jth and kth modesexpressed as a spatial mode V2

jk Equation (3) can beconsidered a special case of a more general equation thatembodies higher-order terms

y(t) ordm V0 DaggerX

j

ujV1j Dagger

X

jk

frac14(ujuk)V2jk (4)

frac14(cent) is some sigmoid or squashing function that allows forslightly more general forms of interactions among sourcesand ensures a unique scaling for the sources u(t)

The above gives a suitable form for a nonlinear decom-position or PCA of a multivariate data set y(t) To identifythe values of u(t) and the spatial modes it is necessary toassume a constraint of orthogonality for the sources Thisis a natural constraint in a sense that the underlying

causes of any biological data should be independent ifthey represent true causes and as such will be ortho-gonal Notice that the assumption of orthogonality is aweaker assumption than independence and it is thisassumption that decentnes the algorithm described here as anonlinear PCA Had we assumed independence then theproblem would become that of nonlinear independentcomponent analysis (Common 1994) which woulddemand a diiexclerent approach

(c) Neuronal architecture and identicentcation ofnonlinear components

In this section the neuronal architecture and gradientdescent scheme used to identify sources and their modesare described Note that equation (4) can be treated as ageneral linear model and as such if we knew the sourcesu(t) the modes could be estimated by minimizing theresiduals tracefRg in a least squares sense where

R ˆ (y iexcl XV)T (y iexcl XV) (5)

and

V ˆ permilV V1 V2Š ˆ (XT X)iexcl1XT y

X ˆ permil1 u1 uk frac14(u1u2) frac14(ukuk)Š

Here 1 is a column of ones and I below is the identitymatrix (XTX)71Xy is simply the least-squares estimatorof V given the inputs y and estimated sources (and theirinteractions) in X The problem therefore reduces toidentifying the variates u(t) corresponding to estimates ofthe sources that minimize the norm of the residualstracefRg By noting the existence of some vector Gi whereV 0Gi ˆ 0 V1

i Gi ˆ 1 for i ˆ j and 0 otherwise andV2

jkGi ˆ 0 then post-multiplying equation (4) throughoutby Gi gives ui(t) ˆ y(t)poundGi This means that there must bea linear combination of the inputs that gives the ithsource One simply has to centnd the linear combination ofinputs that minimizes tracefRg for a given input y(t)subject to the constraint that the sources are orthogonal

These observations lead to the following simple neuralnetwork the network has three layers comprising inputmiddle and output layers The input and output layershave n nodes and linear activation functions and can beimagined as lying next to each other (centgure 2) Themiddle layer comprises a small (J 5 n) number of centrst-order nodes with linear activation functions that receiveinputs from all the input nodes In addition the middlelayer includes p ˆ n(n71)2 second-order nodes thatreceive lateral inputs from the centrst-order nodes Eachsecond-order node receives two inputs that are multipliedand subject to the nonlinear function frac14(cent) to provide theiroutput The network is trained on the feedforwardconnection strengths from the input layer to the centrst-order nodes of the middle layer The weights to the ithcentrst-order node are eiexclectively estimates of Gi Theconnections from all middle-layer nodes to the outputsare determined using the least-squares estimators of themodes given the current estimate of the sources and theinputs according to equation (5) Anti-Hebbian lateralconnections (Foldiak 1993) among the centrst-order nodes inthe middle layer ensure that the sources u(t) are ortho-gonal These JpoundJ lateral connection strengths L are

138 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

determined at each iteration to render the oiexcl-diagonalelements of Covu) zero

L ˆ I iexcl paraiexcl1curren1=2ET (6)

para is a leading diagonal matrix whose elements correspondto the variance of the sources in the absence of decorre-lating lateral connections (ie para ˆ diagfucurrenT ucurreng whereucurren ˆ y pound G) curren and E are the eigenvalues and eigen-vectors of ucurrenT ucurren Estimates of the sources are given by

u ˆ yG Dagger uL ˆ y pound G(I iexcl L)iexcl1 (7)

Note that substituting equation (6) into equation (7) givesCovfug uT u ˆ para thereby ensuring orthogonality of theestimated sources Implementing changes in the lateralconnections in this way enforces orthogonality of theprojection eiexclected by the feed-forward connections ateach iteration L imposes this constraint because theeiexclective feed-forward connections are G(I7L)71 Essen-tially the architecture is centnding the rotation and scaling

of some projection on to a low-dimensional orthogonalsubspace that enables the interactions among theprojected inputs that ensue to predict as much of theoriginal inputs as possible u enters into equation (5) withy to compute tracefRg The learning rule for the esti-mates of Gi corresponds to gradient descent on the traceof the residuals In practice we use a Nelder^Meadsimplex search as implemented in MATLAB (MathWorksInc Natick MA USA) that minimizes tracefRg whichis simply a function of the input y and the feed-forwardconnections strengths G This neural network appears tobe robust and usually converges within a few tens of itera-tions to give estimates of the underlying sources u(t) andleast-square estimates of corresponding spatial modes VFigure 2 is a schematic that tries to convey the simplicityof the architecture Here the third or output layer hasbeen repoundected back on to the inputs to emphasize thesymmetry between the assumed structure of causes in theworld and the architecture used to identify them

(d) Interpreting the estimatesGenerally in PCA in the absence of any variance

maximization^minimization criterion there is no uniquerotation of the spatial modes (ie any linear combinationof the modes that conforms to an orthogonal rotation isas equally good as any other) The incorporation of theinteraction term into the decomposition implicit in equa-tion (3) ensures a unique rotation and furthermoreincorporating the sigmoid function in equation (4)ensures that the scaling is uniquely determined The latterfollows from the fact that there will be some optimumsquashing of each interaction term to best predict theobserved data The reason that unique solutions obtain inthis form of nonlinear PCA is that we have assumed avery specicentc form for the high-order interactions amongsources in causing the data This is based on the pairwiseinteractions between sources as modelled by the secondterm in equation (4) The interpretation of the sourcesand their modes is relatively straightforward

The observed multivariate data can be explained by asmall number of J sources whose expressions are given byu The values of uj scale the contribution of the centrst-orderspatial mode V1

j in a way that is directly analogous toconventional PCA where uj would be the jth componentscore In addition there are second-order eiexclects thatrepresent interactions between pairs of sources frac14(uj uk)These interactions are expressed in second-order modescorresponding to V2

jk Each second-order mode will havea variance component that may or may not be orthogonalto the centrst-order modes Although the sources are ortho-gonal there is no explicit requirement for the modes to beso The variance accounted for by each source and inter-action is given by

jujj cent jV1j j and jfrac14(ujuk)j cent jV2

jkj (8)

and can be used to rank the relative contributions of eachsource or interaction j cent j denotes the vector norm (iesum of squares)

The nonlinear PCA proposed here therefore decom-poses a multivariate data set into centrst- and second-ordercomponents that can be ascribed to a small number ofunderlying sources The number of sources will be

Characterizing interactions between modes of brain activity K Friston and others 139

Phil Trans R Soc Lond B (2000)

L12

u1

u1

(u1su2

u2

u2)

(u1s u2)

V 212

V12

G1 trace (R)

trainingerror

estimated sources andinteraction (layer 2)

predicted data(layer 3)

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

V 212

V12

interactions sources

observed data(layer 1)

Figure 2 The neural net architecture used to estimate sourcesand modes The lower half of the schematic represents the realworld with its sources and interactions (here only two sourcesand the subsequent interaction are shown) These sources andinteractions (u) cause signals ( y) in the input layer (layer 1)that here comprises ten voxels or channels The signals causedby the sources are weighted by the voxel-specicentc elements ofthe corresponding centrst- or second-order spatial modes (V1 andV 2) Feedforward connections (G) from the input layer tolayer 2 provide an estimate of the sources (u) in layer 2 Thisestimation obtains by changing G to minimize the sum ofsquared residuals (traceR) or diiexclerences between theobserved signals and those predicted by the activity in layer 3The activity in layer 3 results from backwards connectionsfrom the estimated source and interaction nodes in layer 2These backward connections are the estimates of the spatialmodes (V1 and V 2) and are determined using least-squaresgiven the input ( y) and the current estimate of the sources(u) Lateral decorrelating or anti-Hebbian connections Lbetween the centrst-order modes ensure orthogonality of thesource estimates Note that in the absence of any interactionthe solution would correspond to a conventional PCA whereG ˆ pinv(V1)

generally greater than the minimum number acquired toaccount for the rank of the multivariate data Forexample with just three voxels or channels two sourceswould be sucurrencient if the third dimension was explainedby the interactionbetween these two sourcesThree sourceswould be sucurrencient to account for a six-dimensional dataset and so on Clearly in the analysis of functional neuro-imaging time-series an initial dimension reduction isrequired before nonlinear PCA can be applied Forexample taking the centrst 36 spatial modes of a fMRIimaging time-series using conventional singular valuedecomposition would in principle require only eightunderlying sources In this sense nonlinear PCA representsa parsimonious characterization of the data that capturesnonlinear interactions among spatial modes or distributedbrain systems in a comprehensive but intuitive fashionThese and other issues will be demonstrated in the nextsection which uses real data to illustrate the technique

3 ILLUSTRATIVE EXAMPLES

In this section we will use a multisubject PET study ofverbal pounduency and a fMRI case study of visual processingto illustrate the use of nonlinear PCA and some aspects offunctional anatomy that can be addressed with this tech-nique The PET study is used to show that a considerableamount of variance can be accounted for by interactionsamong causes of the data and is presented for comparisonwith the original linear PCA characterization in Friston etal (1993) In brief we will show that the two experi-mental factors (task and time) combine to expressthemselves in a second-order mode that repoundects time-dependent adaptation of task-related responses Thiseiexclect can be construed as large-scale neurophysiologicalplasticity attributable to strategic changes in cognitiveprocessing during intrinsic relative to extrinsic genera-tion of words The second example using fMRI dealsmore explicitly with the modulation of one brain systemby another In particular the interactions betweenspecialized cortical systems that may be mediated bycorticothalamic loops

(a) A PET study of verbal pounduency(i) Data acquisition experimental design and preprocessing

The data were obtained from centve subjects scanned 12times (every 8 min) while performing one of two verbaltasks Scans were obtained with a CTI PETcamera (model953B CTI Knoxville TN USA) 15O was administeredintravenously as radiolabelled water infused over 2 minTotal counts per voxel during the build-up phase of radio-activity served as an estimate of regional cerebral bloodpoundow (rCBF) Subjects performed two tasks in alternationOne task involved repeating a letter presented aurally atone per two seconds (word shadowing) The other was apaced verbal pounduency task where the subjects respondedwith a word that began with the letter presented (intrinsicword generation)The data were realigned stereotacticallynormalized and smoothed with a 16 mm Gaussian kernel(Friston et al 1996c) The data were subject to a conven-tional SPM analysis using multiple linear regression with12 condition-specicentc eiexclects centve subject eiexclects and globalactivity as described in Friston et al (1995a) Parameterestimates representing condition-specicentc eiexclects averaged

over subjects were selected from voxels that exceeded athreshold of p5005 in the ensuing SPMfFgand subject tononlinear PCA as described below

(ii) Nonlinear PCAThe data were reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The ability of these two sources andtheir interaction to explain the observed regional activityis illustrated in centgure 3a Here an arbitrary voxel (thatshowing the highest F-value in the conventional SPManalysis) was selected from the left inferior frontal gyrus(Brodmann Area 47) The observed condition-specicentcactivity over 12 scans is shown in black and thatpredicted by the two sources is shown in white The rela-tive amount of variance accounted for by the two sourcesand their interaction is shown in the middle panel It canbe seen that 88 of the total variance over all voxelsincluded in the analysis can be explained by two sourcesThe second-order mode accounts of 22 of this (afterremoving that which can be modelled by the centrst-ordereiexclects) and would have been distributed over othermodes in a conventional PCA Figure 3b shows this distri-bution indicating that the centfth and sixth eigenimages ina conventional PCA largely comprise the interactionbetween the two modes identicented by nonlinear PCA

The centrst- and second-order modes are seen in centgure 4along with their expression over the 12 scans It is imme-diately apparent that the centrst mode repoundects task-relatedeiexclects paralleling the alternation between word genera-tion and word shadowing This procentle of brain regions istypical of verbal pounduency paradigms that isolate theintrinsic generation of semantic representations encodingand retrieval processes required to compare the currentoutput with previous words and the maintenance of anappropriate cognitive set The key regions involvedinclude the thalamus dorsolateral prefrontal cortex ante-rior cingulate temporal cortices and cerebellum Thesecond mode represents the other experimental factornamely time or order eiexclects A nonlinear eiexclect is evidentwith increases in activity in the cerebellar thalamic andleft basotemporal regions More interesting is the second-order mode that by implication repoundects an interactionbetween task-related responses and time ie time-dependent increases in physiological responses elicited bycognitive operations that distinguish between the twotasks employed This physiological adaptation in mostpronounced in Brocarsquos Area (Brodmann Area 44 in theleft prefrontal cortex and the right lateral thalamicregions) Brocarsquos Area is traditionally associated withspeech production and appears to undergo a profoundchange in its relative activation during word shadowingand generation after the centrst pair of scans that presum-ably repoundects an underlying change in cognitive archi-tecture or set

(b) A fMRI study of colour and motion processing(i) Data acquisition experimental design and preprocessing

The experiment was performed on a 2 Tesla MagnetomVISION (Siemens Erlangen Germany) whole body MRIsystem equipped with a head volume coil Contiguousmultislice T

2 -weighted fMRI images (TE ˆ 40 ms64 mmpound64 mmpound48 mm 3 mmpound3 mmpound3 mm voxels)

140 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

were obtained with echoplanar imaging using an axialslice orientation The eiexclective repetition time was 48 s Ayoung right-handed subject was scanned under fourdiiexclerent conditions in six scan epochs intercalated witha low level (visual centxation) baseline condition The fourconditions were repeated eight times in a pseudorandomorder giving 384 scans in total or 32 stimulation^baselineepoch pairs During all stimulation conditions the subjectlooked at dots back-projected on a screen by an LCDvideo projector The four experimental conditionscomprised the presentation of (i) radially moving dotsand (ii) stationary dots using (i) luminance contrast and(ii) chromatic contrast in a two-by-two factorial designLuminance contrast was established using isochromaticstimuli (red dots on a red background or green dots on agreen background) Hue contrast was obtained by usingred (or green) dots on a green (or red) background andestablishing isoluminance with poundicker photometry In thetwo movement conditions the dots moved radially fromthe centre of the screen at 88 s71 to the periphery wherethey vanished This creates the impression of optical poundowBy using these stimuli we hoped to excite activity invisual motion systems and those specialized for colourprocessing Any interaction between these systems wouldbe expressed in terms of motion-sensitive responses thatdepended on the hue or luminance contrast subtendingthat motion

The time-series were realigned corrected for movement-related eiexclects and spatially normalized into the standardspace of Talairach amp Tournoux (1988) using the subjectrsquosco-registered structural T1 scan and nonlinear deforma-tions (Friston et al 1996c) The data were spatiallysmoothed with a 6 mm isotropic Gaussian kernel As inthe PETexample voxels were selected that showed signicent-cant condition-specicentc eiexclects according to a conventionalSPM analysis (Friston et al 1995b Worsley amp Friston1995) This analysis used a multiple linear regression andcondition-specicentc box car regressors convolved with ahaemodynamic response function In this instance thenumber of voxels was exceeding large and we used ahigher threshold than in the PETanalysis ( p ˆ 0001) andincluded only those voxels that were posterior to theposterior commissure

(ii) Nonlinear PCAThe data were again reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The functional attribution of thesesources was established by looking at the expression of thecorresponding centrst-order modes over the four conditionsThe expression of epoch-related responses over all 32stimulation^baseline epoch pairs are shown in terms ofthe four conditions in centgure 5 This expression is simplythe score on the centrst principal component over all 32epoch-related responses for each source The centrst mode is

Characterizing interactions between modes of brain activity K Friston and others 141

Phil Trans R Soc Lond B (2000)

(a)

(b)

(c)

5

4

3

2

1

0

- 1

- 2

- 3

acti

vity

70

60

50

40

30

20

10

0

4

35

3

25

2

15

1

05

0

o

f va

rian

ce

1 2 3 4 5 6scan

7 8 9 10 11 12

mode 1652

mode 2207

interaction22

1 2 3 4eigenimage

5 6 7 8

Figure 3 Variance partitioning following a nonlinear PCA ofthe PET verbal pounduency study (a) Observed activity in a voxelin the left inferior frontal gyrus (centlled bars) and that predictedon the basis of two sources and their interaction (open bars)estimated with nonlinear PCA Activity is in units

Figure 3 (Cont) corresponding to regional cerebralperfusion in ml dl71 min71 (b) Variance over all voxelsincluded in the analysis accounted for by the two sources (ormodes) and their interaction Total ˆ 88 (c) Distributionof variance accounted for by second-order or interactioneiexclects over the conventional eigenimages obtained in theinitial SVD dimension reduction

clearly a motion-sensitive mode but one that embodiessome colour preference in the sense that the motion-dependent responses of this system are accentuated in thepresence of colour cues This was not quite what we hadanticipated the centrst-order eiexclect contains what would

functionally be called an interaction between motion andcolour processing The second source appears to beconcerned exclusively with colour processing in the sensethat its expression is uniformly higher under colourstimuli relative to isochromatic stimuli in a way that does

142 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

(a) (b)

(i)

(ii)

(iii)

15

1

05

0

- 05

- 1

- 15

1

05

0

- 05

- 1

- 15

acute 10- 5

5

0

- 5

acute 10- 6

1 2 3 4 5 6condition

7 8 9 10 11 12

Figure 4 Maximum intensity projections and expression of the centrst- and second-order spatial modes of the PET verbal pounduencystudy (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode The maximum intensity projections (a) are of thepositive values of each mode and are displayed in standard format The three orthogonal brain views are from the right theback and the top of the brain The projections have been scaled to the maximum intensity of each mode The time-dependentexpression of these modes are in terms of the 12 scans (b) The units are adimensional and their absolute values are not important(the variance they account for is determined by the scaling of the spatial modes which in turn is dictated by the sigmoidsquashing function see centgure 3)

not depend on motion The corresponding anatomicalprocentle is seen in centgure 6 (maximum intensity projectionsin centgure 6a and thresholded axial sections in centgure 6b)The centrst-order mode which shows both motion andcolour-related responses shows high loadings in bilateralmotion-sensitive complex V5 (Brodmann Areas 19 and 37at the occipto-temporal junction) and areas traditionallyassociated with colour processing (V4oumlthe lingualgyrus Brodmann Area 19 ventromedially) The secondcentrst-order mode is most prominent in the hippocampusparahippocampal and related lingual cortices on bothsides The two more lateral blobs subsume the tails of thecaudate nuclei (centgure 6b(ii)) This system is not onenormally associated with colour processing but it should

be noted that some of the main eiexclect of colour has beenexplained by the centrst mode that includes V4 Insummary the two centrst-order modes comprise (i) anextrastriate cortical system including V5 and V4 thatresponds to motion and preferentially so when motion issupported by colour cues and (ii) a (para)hippocampal^lingual system that is concerned exclusively with colourprocessing above and beyond that accounted for by thecentrst system The critical question is where do these modesinteract

The interaction between the extrastriate and (para)hippocampal lingual systems conforms to the second-order mode in the lower panels This mode highlights thepulvinar of the thalamus and V5 bilaterally This is apleasing result in that it clearly implicates the thalamus inthe integration of extrastriate and (para)hippocampalsystems This integration being mediated by recurrent(sub)corticothalamic connections It is also a result thatwould not have obtained from a conventional SPManalysis Indeed we looked for an interaction betweenmotion and colour processing and did not see any sucheiexclect in the pulvinar The reason that the nonlinear PCAwas able to centnd this interaction was that there were noconstraints on the sources underlying the interaction In aconventional SPM analysis the sources are explicitlyassumed to be colour and motion in the visual centeldwhereas the two interacting modes identicented by thenonlinear PCA were caused by complicated admixtures ofcolour and motion This result is presented to illustratethe potential usefulness of nonlinear PCA not to makeany statistical inferences about reproducible functionalarchitectures The exploratory analysis based on this casestudy could now be used to motivate hypothesis-ledanalyses of other subjects

4 CONCLUSION

In this paper we have described a specicentc form ofnonlinear PCA that is predicated on the interactionbetween underlying sources in modulating spatial modesof brain activity Its theoretical motivation stems directlyfrom a second-order approximation to the Taylor expan-sion of any nonlinear function of sources that can causemultivariate observations A simple three-layer neuronalnetwork architecture is sucurrencient to identify or estimatethe underlying causes and associated centrst- and second-order spatial modes The centrst-order modes correspond toconventional eigenimages or principal components andthe second-order modes describe the patterns of brainactivity that result from interactions among thesesources The ensuing decomposition into centrst- andsecond-order components represents an exploratoryanalysis of the data that eschews some of the shortcom-ings of conventional PCA In particular nonlinear PCAallows for the context-sensitive expression of spatialmodes through second-order modes that can be inter-preted as the anatomical substrate of integration ormodulation The highly constrained form of nonlinearPCA presented above has an intuitive interpretation interms of pairwise interactions among underlying sourcesand by virtue of this represents a useful and parsimo-nious characterization of functional neuroimaging time-series

Characterizing interactions between modes of brain activity K Friston and others 143

Phil Trans R Soc Lond B (2000)

04

03

02

01

00

- 01

- 02

- 03

(a)

04

02

0

- 02

- 04

- 06

(b)

cond

itio

nndashsp

ecif

ic e

xpre

ssio

n

motion

motion

stat

colour colour

isoch isoch

stat

1 2condition

3 4

Figure 5 Condition-specicentc expression of the two centrst-ordermodes ensuing from the visual processing fMRI studyThese data represent the degree to which the centrst principalcomponent of epoch-related waveforms over the 32 photicstimulation^baseline pairs was expressed These condition-specicentc responses are plotted in terms of the four conditionsfor the two modes motion motion present stat stationarydots colour isoluminant chromatic contrast stimuli isochisochromatic luminance contrast stimuli (a) First-ordermode 1 (b) centrst-order mode 2

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 4: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

network so that the output best predicts the input in aleast-squares sense

(b) Second-order PCAIn this subsection we will introduce a variant of

nonlinear PCA that uses a specicentc form for the assumednonlinear mixing of sources to produce the observedresponses in data This form is predicated on interactionsamong sources in the genesis of multivariate time-seriesIn what follows we shall assume that an n-variate obser-vation is caused by a small number of J underlyingsources and interactions among these sources Generallythe observation of the ith variate (eg at the ith voxel)will be some nonlinear function of the underlying sources

yi(t) ˆ fi(s(t)) (1)

where y(t) ˆ [ y1(t) yn(t)] is an n-vector function oftime Similarly for s(t) ˆ [s1(t) sJ(t)]

A second-order approximation of the Taylor expansionof equation (1) about some expected value -s(t) for thesources is given by

yi(t) ordm fi( middots) DaggerX

j

fi

ujuj Dagger

X

jk

2fi

ujukujuk (2)

where u(t) ˆ (s(t)7-s(t)) is an alternative representation ofthe sources Now incorporating all n observations (ievoxels) equation (2) can be expressed in matrix form interms of zeroth- centrst- and second-order modes repre-sented by the n-vectorsV 0V 1 and V 2

y(t) ordm V 0 DaggerX

j

ujV1j Dagger

X

jk

ujukV2jk

where

V 0 ˆ permil f1(s) fn(sdaggerŠ V1j ˆ f1

uj

fn

uj

V2jk ˆ 2f1

ujuk

2fn

ujuk (3)

The elements of the vectors V1 correspond to the centrstorder partial derivatives in equation (2) and the elementsof the vectors V 2 correspond to the second-order deriva-tives V1 and V 2 have the natural interpretation of centrst-and second-order modes respectively In other words thejth source is expressed in terms of the spatial mode V1

j

and the interaction between the jth and kth modesexpressed as a spatial mode V2

jk Equation (3) can beconsidered a special case of a more general equation thatembodies higher-order terms

y(t) ordm V0 DaggerX

j

ujV1j Dagger

X

jk

frac14(ujuk)V2jk (4)

frac14(cent) is some sigmoid or squashing function that allows forslightly more general forms of interactions among sourcesand ensures a unique scaling for the sources u(t)

The above gives a suitable form for a nonlinear decom-position or PCA of a multivariate data set y(t) To identifythe values of u(t) and the spatial modes it is necessary toassume a constraint of orthogonality for the sources Thisis a natural constraint in a sense that the underlying

causes of any biological data should be independent ifthey represent true causes and as such will be ortho-gonal Notice that the assumption of orthogonality is aweaker assumption than independence and it is thisassumption that decentnes the algorithm described here as anonlinear PCA Had we assumed independence then theproblem would become that of nonlinear independentcomponent analysis (Common 1994) which woulddemand a diiexclerent approach

(c) Neuronal architecture and identicentcation ofnonlinear components

In this section the neuronal architecture and gradientdescent scheme used to identify sources and their modesare described Note that equation (4) can be treated as ageneral linear model and as such if we knew the sourcesu(t) the modes could be estimated by minimizing theresiduals tracefRg in a least squares sense where

R ˆ (y iexcl XV)T (y iexcl XV) (5)

and

V ˆ permilV V1 V2Š ˆ (XT X)iexcl1XT y

X ˆ permil1 u1 uk frac14(u1u2) frac14(ukuk)Š

Here 1 is a column of ones and I below is the identitymatrix (XTX)71Xy is simply the least-squares estimatorof V given the inputs y and estimated sources (and theirinteractions) in X The problem therefore reduces toidentifying the variates u(t) corresponding to estimates ofthe sources that minimize the norm of the residualstracefRg By noting the existence of some vector Gi whereV 0Gi ˆ 0 V1

i Gi ˆ 1 for i ˆ j and 0 otherwise andV2

jkGi ˆ 0 then post-multiplying equation (4) throughoutby Gi gives ui(t) ˆ y(t)poundGi This means that there must bea linear combination of the inputs that gives the ithsource One simply has to centnd the linear combination ofinputs that minimizes tracefRg for a given input y(t)subject to the constraint that the sources are orthogonal

These observations lead to the following simple neuralnetwork the network has three layers comprising inputmiddle and output layers The input and output layershave n nodes and linear activation functions and can beimagined as lying next to each other (centgure 2) Themiddle layer comprises a small (J 5 n) number of centrst-order nodes with linear activation functions that receiveinputs from all the input nodes In addition the middlelayer includes p ˆ n(n71)2 second-order nodes thatreceive lateral inputs from the centrst-order nodes Eachsecond-order node receives two inputs that are multipliedand subject to the nonlinear function frac14(cent) to provide theiroutput The network is trained on the feedforwardconnection strengths from the input layer to the centrst-order nodes of the middle layer The weights to the ithcentrst-order node are eiexclectively estimates of Gi Theconnections from all middle-layer nodes to the outputsare determined using the least-squares estimators of themodes given the current estimate of the sources and theinputs according to equation (5) Anti-Hebbian lateralconnections (Foldiak 1993) among the centrst-order nodes inthe middle layer ensure that the sources u(t) are ortho-gonal These JpoundJ lateral connection strengths L are

138 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

determined at each iteration to render the oiexcl-diagonalelements of Covu) zero

L ˆ I iexcl paraiexcl1curren1=2ET (6)

para is a leading diagonal matrix whose elements correspondto the variance of the sources in the absence of decorre-lating lateral connections (ie para ˆ diagfucurrenT ucurreng whereucurren ˆ y pound G) curren and E are the eigenvalues and eigen-vectors of ucurrenT ucurren Estimates of the sources are given by

u ˆ yG Dagger uL ˆ y pound G(I iexcl L)iexcl1 (7)

Note that substituting equation (6) into equation (7) givesCovfug uT u ˆ para thereby ensuring orthogonality of theestimated sources Implementing changes in the lateralconnections in this way enforces orthogonality of theprojection eiexclected by the feed-forward connections ateach iteration L imposes this constraint because theeiexclective feed-forward connections are G(I7L)71 Essen-tially the architecture is centnding the rotation and scaling

of some projection on to a low-dimensional orthogonalsubspace that enables the interactions among theprojected inputs that ensue to predict as much of theoriginal inputs as possible u enters into equation (5) withy to compute tracefRg The learning rule for the esti-mates of Gi corresponds to gradient descent on the traceof the residuals In practice we use a Nelder^Meadsimplex search as implemented in MATLAB (MathWorksInc Natick MA USA) that minimizes tracefRg whichis simply a function of the input y and the feed-forwardconnections strengths G This neural network appears tobe robust and usually converges within a few tens of itera-tions to give estimates of the underlying sources u(t) andleast-square estimates of corresponding spatial modes VFigure 2 is a schematic that tries to convey the simplicityof the architecture Here the third or output layer hasbeen repoundected back on to the inputs to emphasize thesymmetry between the assumed structure of causes in theworld and the architecture used to identify them

(d) Interpreting the estimatesGenerally in PCA in the absence of any variance

maximization^minimization criterion there is no uniquerotation of the spatial modes (ie any linear combinationof the modes that conforms to an orthogonal rotation isas equally good as any other) The incorporation of theinteraction term into the decomposition implicit in equa-tion (3) ensures a unique rotation and furthermoreincorporating the sigmoid function in equation (4)ensures that the scaling is uniquely determined The latterfollows from the fact that there will be some optimumsquashing of each interaction term to best predict theobserved data The reason that unique solutions obtain inthis form of nonlinear PCA is that we have assumed avery specicentc form for the high-order interactions amongsources in causing the data This is based on the pairwiseinteractions between sources as modelled by the secondterm in equation (4) The interpretation of the sourcesand their modes is relatively straightforward

The observed multivariate data can be explained by asmall number of J sources whose expressions are given byu The values of uj scale the contribution of the centrst-orderspatial mode V1

j in a way that is directly analogous toconventional PCA where uj would be the jth componentscore In addition there are second-order eiexclects thatrepresent interactions between pairs of sources frac14(uj uk)These interactions are expressed in second-order modescorresponding to V2

jk Each second-order mode will havea variance component that may or may not be orthogonalto the centrst-order modes Although the sources are ortho-gonal there is no explicit requirement for the modes to beso The variance accounted for by each source and inter-action is given by

jujj cent jV1j j and jfrac14(ujuk)j cent jV2

jkj (8)

and can be used to rank the relative contributions of eachsource or interaction j cent j denotes the vector norm (iesum of squares)

The nonlinear PCA proposed here therefore decom-poses a multivariate data set into centrst- and second-ordercomponents that can be ascribed to a small number ofunderlying sources The number of sources will be

Characterizing interactions between modes of brain activity K Friston and others 139

Phil Trans R Soc Lond B (2000)

L12

u1

u1

(u1su2

u2

u2)

(u1s u2)

V 212

V12

G1 trace (R)

trainingerror

estimated sources andinteraction (layer 2)

predicted data(layer 3)

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

V 212

V12

interactions sources

observed data(layer 1)

Figure 2 The neural net architecture used to estimate sourcesand modes The lower half of the schematic represents the realworld with its sources and interactions (here only two sourcesand the subsequent interaction are shown) These sources andinteractions (u) cause signals ( y) in the input layer (layer 1)that here comprises ten voxels or channels The signals causedby the sources are weighted by the voxel-specicentc elements ofthe corresponding centrst- or second-order spatial modes (V1 andV 2) Feedforward connections (G) from the input layer tolayer 2 provide an estimate of the sources (u) in layer 2 Thisestimation obtains by changing G to minimize the sum ofsquared residuals (traceR) or diiexclerences between theobserved signals and those predicted by the activity in layer 3The activity in layer 3 results from backwards connectionsfrom the estimated source and interaction nodes in layer 2These backward connections are the estimates of the spatialmodes (V1 and V 2) and are determined using least-squaresgiven the input ( y) and the current estimate of the sources(u) Lateral decorrelating or anti-Hebbian connections Lbetween the centrst-order modes ensure orthogonality of thesource estimates Note that in the absence of any interactionthe solution would correspond to a conventional PCA whereG ˆ pinv(V1)

generally greater than the minimum number acquired toaccount for the rank of the multivariate data Forexample with just three voxels or channels two sourceswould be sucurrencient if the third dimension was explainedby the interactionbetween these two sourcesThree sourceswould be sucurrencient to account for a six-dimensional dataset and so on Clearly in the analysis of functional neuro-imaging time-series an initial dimension reduction isrequired before nonlinear PCA can be applied Forexample taking the centrst 36 spatial modes of a fMRIimaging time-series using conventional singular valuedecomposition would in principle require only eightunderlying sources In this sense nonlinear PCA representsa parsimonious characterization of the data that capturesnonlinear interactions among spatial modes or distributedbrain systems in a comprehensive but intuitive fashionThese and other issues will be demonstrated in the nextsection which uses real data to illustrate the technique

3 ILLUSTRATIVE EXAMPLES

In this section we will use a multisubject PET study ofverbal pounduency and a fMRI case study of visual processingto illustrate the use of nonlinear PCA and some aspects offunctional anatomy that can be addressed with this tech-nique The PET study is used to show that a considerableamount of variance can be accounted for by interactionsamong causes of the data and is presented for comparisonwith the original linear PCA characterization in Friston etal (1993) In brief we will show that the two experi-mental factors (task and time) combine to expressthemselves in a second-order mode that repoundects time-dependent adaptation of task-related responses Thiseiexclect can be construed as large-scale neurophysiologicalplasticity attributable to strategic changes in cognitiveprocessing during intrinsic relative to extrinsic genera-tion of words The second example using fMRI dealsmore explicitly with the modulation of one brain systemby another In particular the interactions betweenspecialized cortical systems that may be mediated bycorticothalamic loops

(a) A PET study of verbal pounduency(i) Data acquisition experimental design and preprocessing

The data were obtained from centve subjects scanned 12times (every 8 min) while performing one of two verbaltasks Scans were obtained with a CTI PETcamera (model953B CTI Knoxville TN USA) 15O was administeredintravenously as radiolabelled water infused over 2 minTotal counts per voxel during the build-up phase of radio-activity served as an estimate of regional cerebral bloodpoundow (rCBF) Subjects performed two tasks in alternationOne task involved repeating a letter presented aurally atone per two seconds (word shadowing) The other was apaced verbal pounduency task where the subjects respondedwith a word that began with the letter presented (intrinsicword generation)The data were realigned stereotacticallynormalized and smoothed with a 16 mm Gaussian kernel(Friston et al 1996c) The data were subject to a conven-tional SPM analysis using multiple linear regression with12 condition-specicentc eiexclects centve subject eiexclects and globalactivity as described in Friston et al (1995a) Parameterestimates representing condition-specicentc eiexclects averaged

over subjects were selected from voxels that exceeded athreshold of p5005 in the ensuing SPMfFgand subject tononlinear PCA as described below

(ii) Nonlinear PCAThe data were reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The ability of these two sources andtheir interaction to explain the observed regional activityis illustrated in centgure 3a Here an arbitrary voxel (thatshowing the highest F-value in the conventional SPManalysis) was selected from the left inferior frontal gyrus(Brodmann Area 47) The observed condition-specicentcactivity over 12 scans is shown in black and thatpredicted by the two sources is shown in white The rela-tive amount of variance accounted for by the two sourcesand their interaction is shown in the middle panel It canbe seen that 88 of the total variance over all voxelsincluded in the analysis can be explained by two sourcesThe second-order mode accounts of 22 of this (afterremoving that which can be modelled by the centrst-ordereiexclects) and would have been distributed over othermodes in a conventional PCA Figure 3b shows this distri-bution indicating that the centfth and sixth eigenimages ina conventional PCA largely comprise the interactionbetween the two modes identicented by nonlinear PCA

The centrst- and second-order modes are seen in centgure 4along with their expression over the 12 scans It is imme-diately apparent that the centrst mode repoundects task-relatedeiexclects paralleling the alternation between word genera-tion and word shadowing This procentle of brain regions istypical of verbal pounduency paradigms that isolate theintrinsic generation of semantic representations encodingand retrieval processes required to compare the currentoutput with previous words and the maintenance of anappropriate cognitive set The key regions involvedinclude the thalamus dorsolateral prefrontal cortex ante-rior cingulate temporal cortices and cerebellum Thesecond mode represents the other experimental factornamely time or order eiexclects A nonlinear eiexclect is evidentwith increases in activity in the cerebellar thalamic andleft basotemporal regions More interesting is the second-order mode that by implication repoundects an interactionbetween task-related responses and time ie time-dependent increases in physiological responses elicited bycognitive operations that distinguish between the twotasks employed This physiological adaptation in mostpronounced in Brocarsquos Area (Brodmann Area 44 in theleft prefrontal cortex and the right lateral thalamicregions) Brocarsquos Area is traditionally associated withspeech production and appears to undergo a profoundchange in its relative activation during word shadowingand generation after the centrst pair of scans that presum-ably repoundects an underlying change in cognitive archi-tecture or set

(b) A fMRI study of colour and motion processing(i) Data acquisition experimental design and preprocessing

The experiment was performed on a 2 Tesla MagnetomVISION (Siemens Erlangen Germany) whole body MRIsystem equipped with a head volume coil Contiguousmultislice T

2 -weighted fMRI images (TE ˆ 40 ms64 mmpound64 mmpound48 mm 3 mmpound3 mmpound3 mm voxels)

140 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

were obtained with echoplanar imaging using an axialslice orientation The eiexclective repetition time was 48 s Ayoung right-handed subject was scanned under fourdiiexclerent conditions in six scan epochs intercalated witha low level (visual centxation) baseline condition The fourconditions were repeated eight times in a pseudorandomorder giving 384 scans in total or 32 stimulation^baselineepoch pairs During all stimulation conditions the subjectlooked at dots back-projected on a screen by an LCDvideo projector The four experimental conditionscomprised the presentation of (i) radially moving dotsand (ii) stationary dots using (i) luminance contrast and(ii) chromatic contrast in a two-by-two factorial designLuminance contrast was established using isochromaticstimuli (red dots on a red background or green dots on agreen background) Hue contrast was obtained by usingred (or green) dots on a green (or red) background andestablishing isoluminance with poundicker photometry In thetwo movement conditions the dots moved radially fromthe centre of the screen at 88 s71 to the periphery wherethey vanished This creates the impression of optical poundowBy using these stimuli we hoped to excite activity invisual motion systems and those specialized for colourprocessing Any interaction between these systems wouldbe expressed in terms of motion-sensitive responses thatdepended on the hue or luminance contrast subtendingthat motion

The time-series were realigned corrected for movement-related eiexclects and spatially normalized into the standardspace of Talairach amp Tournoux (1988) using the subjectrsquosco-registered structural T1 scan and nonlinear deforma-tions (Friston et al 1996c) The data were spatiallysmoothed with a 6 mm isotropic Gaussian kernel As inthe PETexample voxels were selected that showed signicent-cant condition-specicentc eiexclects according to a conventionalSPM analysis (Friston et al 1995b Worsley amp Friston1995) This analysis used a multiple linear regression andcondition-specicentc box car regressors convolved with ahaemodynamic response function In this instance thenumber of voxels was exceeding large and we used ahigher threshold than in the PETanalysis ( p ˆ 0001) andincluded only those voxels that were posterior to theposterior commissure

(ii) Nonlinear PCAThe data were again reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The functional attribution of thesesources was established by looking at the expression of thecorresponding centrst-order modes over the four conditionsThe expression of epoch-related responses over all 32stimulation^baseline epoch pairs are shown in terms ofthe four conditions in centgure 5 This expression is simplythe score on the centrst principal component over all 32epoch-related responses for each source The centrst mode is

Characterizing interactions between modes of brain activity K Friston and others 141

Phil Trans R Soc Lond B (2000)

(a)

(b)

(c)

5

4

3

2

1

0

- 1

- 2

- 3

acti

vity

70

60

50

40

30

20

10

0

4

35

3

25

2

15

1

05

0

o

f va

rian

ce

1 2 3 4 5 6scan

7 8 9 10 11 12

mode 1652

mode 2207

interaction22

1 2 3 4eigenimage

5 6 7 8

Figure 3 Variance partitioning following a nonlinear PCA ofthe PET verbal pounduency study (a) Observed activity in a voxelin the left inferior frontal gyrus (centlled bars) and that predictedon the basis of two sources and their interaction (open bars)estimated with nonlinear PCA Activity is in units

Figure 3 (Cont) corresponding to regional cerebralperfusion in ml dl71 min71 (b) Variance over all voxelsincluded in the analysis accounted for by the two sources (ormodes) and their interaction Total ˆ 88 (c) Distributionof variance accounted for by second-order or interactioneiexclects over the conventional eigenimages obtained in theinitial SVD dimension reduction

clearly a motion-sensitive mode but one that embodiessome colour preference in the sense that the motion-dependent responses of this system are accentuated in thepresence of colour cues This was not quite what we hadanticipated the centrst-order eiexclect contains what would

functionally be called an interaction between motion andcolour processing The second source appears to beconcerned exclusively with colour processing in the sensethat its expression is uniformly higher under colourstimuli relative to isochromatic stimuli in a way that does

142 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

(a) (b)

(i)

(ii)

(iii)

15

1

05

0

- 05

- 1

- 15

1

05

0

- 05

- 1

- 15

acute 10- 5

5

0

- 5

acute 10- 6

1 2 3 4 5 6condition

7 8 9 10 11 12

Figure 4 Maximum intensity projections and expression of the centrst- and second-order spatial modes of the PET verbal pounduencystudy (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode The maximum intensity projections (a) are of thepositive values of each mode and are displayed in standard format The three orthogonal brain views are from the right theback and the top of the brain The projections have been scaled to the maximum intensity of each mode The time-dependentexpression of these modes are in terms of the 12 scans (b) The units are adimensional and their absolute values are not important(the variance they account for is determined by the scaling of the spatial modes which in turn is dictated by the sigmoidsquashing function see centgure 3)

not depend on motion The corresponding anatomicalprocentle is seen in centgure 6 (maximum intensity projectionsin centgure 6a and thresholded axial sections in centgure 6b)The centrst-order mode which shows both motion andcolour-related responses shows high loadings in bilateralmotion-sensitive complex V5 (Brodmann Areas 19 and 37at the occipto-temporal junction) and areas traditionallyassociated with colour processing (V4oumlthe lingualgyrus Brodmann Area 19 ventromedially) The secondcentrst-order mode is most prominent in the hippocampusparahippocampal and related lingual cortices on bothsides The two more lateral blobs subsume the tails of thecaudate nuclei (centgure 6b(ii)) This system is not onenormally associated with colour processing but it should

be noted that some of the main eiexclect of colour has beenexplained by the centrst mode that includes V4 Insummary the two centrst-order modes comprise (i) anextrastriate cortical system including V5 and V4 thatresponds to motion and preferentially so when motion issupported by colour cues and (ii) a (para)hippocampal^lingual system that is concerned exclusively with colourprocessing above and beyond that accounted for by thecentrst system The critical question is where do these modesinteract

The interaction between the extrastriate and (para)hippocampal lingual systems conforms to the second-order mode in the lower panels This mode highlights thepulvinar of the thalamus and V5 bilaterally This is apleasing result in that it clearly implicates the thalamus inthe integration of extrastriate and (para)hippocampalsystems This integration being mediated by recurrent(sub)corticothalamic connections It is also a result thatwould not have obtained from a conventional SPManalysis Indeed we looked for an interaction betweenmotion and colour processing and did not see any sucheiexclect in the pulvinar The reason that the nonlinear PCAwas able to centnd this interaction was that there were noconstraints on the sources underlying the interaction In aconventional SPM analysis the sources are explicitlyassumed to be colour and motion in the visual centeldwhereas the two interacting modes identicented by thenonlinear PCA were caused by complicated admixtures ofcolour and motion This result is presented to illustratethe potential usefulness of nonlinear PCA not to makeany statistical inferences about reproducible functionalarchitectures The exploratory analysis based on this casestudy could now be used to motivate hypothesis-ledanalyses of other subjects

4 CONCLUSION

In this paper we have described a specicentc form ofnonlinear PCA that is predicated on the interactionbetween underlying sources in modulating spatial modesof brain activity Its theoretical motivation stems directlyfrom a second-order approximation to the Taylor expan-sion of any nonlinear function of sources that can causemultivariate observations A simple three-layer neuronalnetwork architecture is sucurrencient to identify or estimatethe underlying causes and associated centrst- and second-order spatial modes The centrst-order modes correspond toconventional eigenimages or principal components andthe second-order modes describe the patterns of brainactivity that result from interactions among thesesources The ensuing decomposition into centrst- andsecond-order components represents an exploratoryanalysis of the data that eschews some of the shortcom-ings of conventional PCA In particular nonlinear PCAallows for the context-sensitive expression of spatialmodes through second-order modes that can be inter-preted as the anatomical substrate of integration ormodulation The highly constrained form of nonlinearPCA presented above has an intuitive interpretation interms of pairwise interactions among underlying sourcesand by virtue of this represents a useful and parsimo-nious characterization of functional neuroimaging time-series

Characterizing interactions between modes of brain activity K Friston and others 143

Phil Trans R Soc Lond B (2000)

04

03

02

01

00

- 01

- 02

- 03

(a)

04

02

0

- 02

- 04

- 06

(b)

cond

itio

nndashsp

ecif

ic e

xpre

ssio

n

motion

motion

stat

colour colour

isoch isoch

stat

1 2condition

3 4

Figure 5 Condition-specicentc expression of the two centrst-ordermodes ensuing from the visual processing fMRI studyThese data represent the degree to which the centrst principalcomponent of epoch-related waveforms over the 32 photicstimulation^baseline pairs was expressed These condition-specicentc responses are plotted in terms of the four conditionsfor the two modes motion motion present stat stationarydots colour isoluminant chromatic contrast stimuli isochisochromatic luminance contrast stimuli (a) First-ordermode 1 (b) centrst-order mode 2

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 5: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

determined at each iteration to render the oiexcl-diagonalelements of Covu) zero

L ˆ I iexcl paraiexcl1curren1=2ET (6)

para is a leading diagonal matrix whose elements correspondto the variance of the sources in the absence of decorre-lating lateral connections (ie para ˆ diagfucurrenT ucurreng whereucurren ˆ y pound G) curren and E are the eigenvalues and eigen-vectors of ucurrenT ucurren Estimates of the sources are given by

u ˆ yG Dagger uL ˆ y pound G(I iexcl L)iexcl1 (7)

Note that substituting equation (6) into equation (7) givesCovfug uT u ˆ para thereby ensuring orthogonality of theestimated sources Implementing changes in the lateralconnections in this way enforces orthogonality of theprojection eiexclected by the feed-forward connections ateach iteration L imposes this constraint because theeiexclective feed-forward connections are G(I7L)71 Essen-tially the architecture is centnding the rotation and scaling

of some projection on to a low-dimensional orthogonalsubspace that enables the interactions among theprojected inputs that ensue to predict as much of theoriginal inputs as possible u enters into equation (5) withy to compute tracefRg The learning rule for the esti-mates of Gi corresponds to gradient descent on the traceof the residuals In practice we use a Nelder^Meadsimplex search as implemented in MATLAB (MathWorksInc Natick MA USA) that minimizes tracefRg whichis simply a function of the input y and the feed-forwardconnections strengths G This neural network appears tobe robust and usually converges within a few tens of itera-tions to give estimates of the underlying sources u(t) andleast-square estimates of corresponding spatial modes VFigure 2 is a schematic that tries to convey the simplicityof the architecture Here the third or output layer hasbeen repoundected back on to the inputs to emphasize thesymmetry between the assumed structure of causes in theworld and the architecture used to identify them

(d) Interpreting the estimatesGenerally in PCA in the absence of any variance

maximization^minimization criterion there is no uniquerotation of the spatial modes (ie any linear combinationof the modes that conforms to an orthogonal rotation isas equally good as any other) The incorporation of theinteraction term into the decomposition implicit in equa-tion (3) ensures a unique rotation and furthermoreincorporating the sigmoid function in equation (4)ensures that the scaling is uniquely determined The latterfollows from the fact that there will be some optimumsquashing of each interaction term to best predict theobserved data The reason that unique solutions obtain inthis form of nonlinear PCA is that we have assumed avery specicentc form for the high-order interactions amongsources in causing the data This is based on the pairwiseinteractions between sources as modelled by the secondterm in equation (4) The interpretation of the sourcesand their modes is relatively straightforward

The observed multivariate data can be explained by asmall number of J sources whose expressions are given byu The values of uj scale the contribution of the centrst-orderspatial mode V1

j in a way that is directly analogous toconventional PCA where uj would be the jth componentscore In addition there are second-order eiexclects thatrepresent interactions between pairs of sources frac14(uj uk)These interactions are expressed in second-order modescorresponding to V2

jk Each second-order mode will havea variance component that may or may not be orthogonalto the centrst-order modes Although the sources are ortho-gonal there is no explicit requirement for the modes to beso The variance accounted for by each source and inter-action is given by

jujj cent jV1j j and jfrac14(ujuk)j cent jV2

jkj (8)

and can be used to rank the relative contributions of eachsource or interaction j cent j denotes the vector norm (iesum of squares)

The nonlinear PCA proposed here therefore decom-poses a multivariate data set into centrst- and second-ordercomponents that can be ascribed to a small number ofunderlying sources The number of sources will be

Characterizing interactions between modes of brain activity K Friston and others 139

Phil Trans R Soc Lond B (2000)

L12

u1

u1

(u1su2

u2

u2)

(u1s u2)

V 212

V12

G1 trace (R)

trainingerror

estimated sources andinteraction (layer 2)

predicted data(layer 3)

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

y1 y2 y3 y4 y5 y6 y7 y8 y9 y10

V 212

V12

interactions sources

observed data(layer 1)

Figure 2 The neural net architecture used to estimate sourcesand modes The lower half of the schematic represents the realworld with its sources and interactions (here only two sourcesand the subsequent interaction are shown) These sources andinteractions (u) cause signals ( y) in the input layer (layer 1)that here comprises ten voxels or channels The signals causedby the sources are weighted by the voxel-specicentc elements ofthe corresponding centrst- or second-order spatial modes (V1 andV 2) Feedforward connections (G) from the input layer tolayer 2 provide an estimate of the sources (u) in layer 2 Thisestimation obtains by changing G to minimize the sum ofsquared residuals (traceR) or diiexclerences between theobserved signals and those predicted by the activity in layer 3The activity in layer 3 results from backwards connectionsfrom the estimated source and interaction nodes in layer 2These backward connections are the estimates of the spatialmodes (V1 and V 2) and are determined using least-squaresgiven the input ( y) and the current estimate of the sources(u) Lateral decorrelating or anti-Hebbian connections Lbetween the centrst-order modes ensure orthogonality of thesource estimates Note that in the absence of any interactionthe solution would correspond to a conventional PCA whereG ˆ pinv(V1)

generally greater than the minimum number acquired toaccount for the rank of the multivariate data Forexample with just three voxels or channels two sourceswould be sucurrencient if the third dimension was explainedby the interactionbetween these two sourcesThree sourceswould be sucurrencient to account for a six-dimensional dataset and so on Clearly in the analysis of functional neuro-imaging time-series an initial dimension reduction isrequired before nonlinear PCA can be applied Forexample taking the centrst 36 spatial modes of a fMRIimaging time-series using conventional singular valuedecomposition would in principle require only eightunderlying sources In this sense nonlinear PCA representsa parsimonious characterization of the data that capturesnonlinear interactions among spatial modes or distributedbrain systems in a comprehensive but intuitive fashionThese and other issues will be demonstrated in the nextsection which uses real data to illustrate the technique

3 ILLUSTRATIVE EXAMPLES

In this section we will use a multisubject PET study ofverbal pounduency and a fMRI case study of visual processingto illustrate the use of nonlinear PCA and some aspects offunctional anatomy that can be addressed with this tech-nique The PET study is used to show that a considerableamount of variance can be accounted for by interactionsamong causes of the data and is presented for comparisonwith the original linear PCA characterization in Friston etal (1993) In brief we will show that the two experi-mental factors (task and time) combine to expressthemselves in a second-order mode that repoundects time-dependent adaptation of task-related responses Thiseiexclect can be construed as large-scale neurophysiologicalplasticity attributable to strategic changes in cognitiveprocessing during intrinsic relative to extrinsic genera-tion of words The second example using fMRI dealsmore explicitly with the modulation of one brain systemby another In particular the interactions betweenspecialized cortical systems that may be mediated bycorticothalamic loops

(a) A PET study of verbal pounduency(i) Data acquisition experimental design and preprocessing

The data were obtained from centve subjects scanned 12times (every 8 min) while performing one of two verbaltasks Scans were obtained with a CTI PETcamera (model953B CTI Knoxville TN USA) 15O was administeredintravenously as radiolabelled water infused over 2 minTotal counts per voxel during the build-up phase of radio-activity served as an estimate of regional cerebral bloodpoundow (rCBF) Subjects performed two tasks in alternationOne task involved repeating a letter presented aurally atone per two seconds (word shadowing) The other was apaced verbal pounduency task where the subjects respondedwith a word that began with the letter presented (intrinsicword generation)The data were realigned stereotacticallynormalized and smoothed with a 16 mm Gaussian kernel(Friston et al 1996c) The data were subject to a conven-tional SPM analysis using multiple linear regression with12 condition-specicentc eiexclects centve subject eiexclects and globalactivity as described in Friston et al (1995a) Parameterestimates representing condition-specicentc eiexclects averaged

over subjects were selected from voxels that exceeded athreshold of p5005 in the ensuing SPMfFgand subject tononlinear PCA as described below

(ii) Nonlinear PCAThe data were reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The ability of these two sources andtheir interaction to explain the observed regional activityis illustrated in centgure 3a Here an arbitrary voxel (thatshowing the highest F-value in the conventional SPManalysis) was selected from the left inferior frontal gyrus(Brodmann Area 47) The observed condition-specicentcactivity over 12 scans is shown in black and thatpredicted by the two sources is shown in white The rela-tive amount of variance accounted for by the two sourcesand their interaction is shown in the middle panel It canbe seen that 88 of the total variance over all voxelsincluded in the analysis can be explained by two sourcesThe second-order mode accounts of 22 of this (afterremoving that which can be modelled by the centrst-ordereiexclects) and would have been distributed over othermodes in a conventional PCA Figure 3b shows this distri-bution indicating that the centfth and sixth eigenimages ina conventional PCA largely comprise the interactionbetween the two modes identicented by nonlinear PCA

The centrst- and second-order modes are seen in centgure 4along with their expression over the 12 scans It is imme-diately apparent that the centrst mode repoundects task-relatedeiexclects paralleling the alternation between word genera-tion and word shadowing This procentle of brain regions istypical of verbal pounduency paradigms that isolate theintrinsic generation of semantic representations encodingand retrieval processes required to compare the currentoutput with previous words and the maintenance of anappropriate cognitive set The key regions involvedinclude the thalamus dorsolateral prefrontal cortex ante-rior cingulate temporal cortices and cerebellum Thesecond mode represents the other experimental factornamely time or order eiexclects A nonlinear eiexclect is evidentwith increases in activity in the cerebellar thalamic andleft basotemporal regions More interesting is the second-order mode that by implication repoundects an interactionbetween task-related responses and time ie time-dependent increases in physiological responses elicited bycognitive operations that distinguish between the twotasks employed This physiological adaptation in mostpronounced in Brocarsquos Area (Brodmann Area 44 in theleft prefrontal cortex and the right lateral thalamicregions) Brocarsquos Area is traditionally associated withspeech production and appears to undergo a profoundchange in its relative activation during word shadowingand generation after the centrst pair of scans that presum-ably repoundects an underlying change in cognitive archi-tecture or set

(b) A fMRI study of colour and motion processing(i) Data acquisition experimental design and preprocessing

The experiment was performed on a 2 Tesla MagnetomVISION (Siemens Erlangen Germany) whole body MRIsystem equipped with a head volume coil Contiguousmultislice T

2 -weighted fMRI images (TE ˆ 40 ms64 mmpound64 mmpound48 mm 3 mmpound3 mmpound3 mm voxels)

140 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

were obtained with echoplanar imaging using an axialslice orientation The eiexclective repetition time was 48 s Ayoung right-handed subject was scanned under fourdiiexclerent conditions in six scan epochs intercalated witha low level (visual centxation) baseline condition The fourconditions were repeated eight times in a pseudorandomorder giving 384 scans in total or 32 stimulation^baselineepoch pairs During all stimulation conditions the subjectlooked at dots back-projected on a screen by an LCDvideo projector The four experimental conditionscomprised the presentation of (i) radially moving dotsand (ii) stationary dots using (i) luminance contrast and(ii) chromatic contrast in a two-by-two factorial designLuminance contrast was established using isochromaticstimuli (red dots on a red background or green dots on agreen background) Hue contrast was obtained by usingred (or green) dots on a green (or red) background andestablishing isoluminance with poundicker photometry In thetwo movement conditions the dots moved radially fromthe centre of the screen at 88 s71 to the periphery wherethey vanished This creates the impression of optical poundowBy using these stimuli we hoped to excite activity invisual motion systems and those specialized for colourprocessing Any interaction between these systems wouldbe expressed in terms of motion-sensitive responses thatdepended on the hue or luminance contrast subtendingthat motion

The time-series were realigned corrected for movement-related eiexclects and spatially normalized into the standardspace of Talairach amp Tournoux (1988) using the subjectrsquosco-registered structural T1 scan and nonlinear deforma-tions (Friston et al 1996c) The data were spatiallysmoothed with a 6 mm isotropic Gaussian kernel As inthe PETexample voxels were selected that showed signicent-cant condition-specicentc eiexclects according to a conventionalSPM analysis (Friston et al 1995b Worsley amp Friston1995) This analysis used a multiple linear regression andcondition-specicentc box car regressors convolved with ahaemodynamic response function In this instance thenumber of voxels was exceeding large and we used ahigher threshold than in the PETanalysis ( p ˆ 0001) andincluded only those voxels that were posterior to theposterior commissure

(ii) Nonlinear PCAThe data were again reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The functional attribution of thesesources was established by looking at the expression of thecorresponding centrst-order modes over the four conditionsThe expression of epoch-related responses over all 32stimulation^baseline epoch pairs are shown in terms ofthe four conditions in centgure 5 This expression is simplythe score on the centrst principal component over all 32epoch-related responses for each source The centrst mode is

Characterizing interactions between modes of brain activity K Friston and others 141

Phil Trans R Soc Lond B (2000)

(a)

(b)

(c)

5

4

3

2

1

0

- 1

- 2

- 3

acti

vity

70

60

50

40

30

20

10

0

4

35

3

25

2

15

1

05

0

o

f va

rian

ce

1 2 3 4 5 6scan

7 8 9 10 11 12

mode 1652

mode 2207

interaction22

1 2 3 4eigenimage

5 6 7 8

Figure 3 Variance partitioning following a nonlinear PCA ofthe PET verbal pounduency study (a) Observed activity in a voxelin the left inferior frontal gyrus (centlled bars) and that predictedon the basis of two sources and their interaction (open bars)estimated with nonlinear PCA Activity is in units

Figure 3 (Cont) corresponding to regional cerebralperfusion in ml dl71 min71 (b) Variance over all voxelsincluded in the analysis accounted for by the two sources (ormodes) and their interaction Total ˆ 88 (c) Distributionof variance accounted for by second-order or interactioneiexclects over the conventional eigenimages obtained in theinitial SVD dimension reduction

clearly a motion-sensitive mode but one that embodiessome colour preference in the sense that the motion-dependent responses of this system are accentuated in thepresence of colour cues This was not quite what we hadanticipated the centrst-order eiexclect contains what would

functionally be called an interaction between motion andcolour processing The second source appears to beconcerned exclusively with colour processing in the sensethat its expression is uniformly higher under colourstimuli relative to isochromatic stimuli in a way that does

142 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

(a) (b)

(i)

(ii)

(iii)

15

1

05

0

- 05

- 1

- 15

1

05

0

- 05

- 1

- 15

acute 10- 5

5

0

- 5

acute 10- 6

1 2 3 4 5 6condition

7 8 9 10 11 12

Figure 4 Maximum intensity projections and expression of the centrst- and second-order spatial modes of the PET verbal pounduencystudy (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode The maximum intensity projections (a) are of thepositive values of each mode and are displayed in standard format The three orthogonal brain views are from the right theback and the top of the brain The projections have been scaled to the maximum intensity of each mode The time-dependentexpression of these modes are in terms of the 12 scans (b) The units are adimensional and their absolute values are not important(the variance they account for is determined by the scaling of the spatial modes which in turn is dictated by the sigmoidsquashing function see centgure 3)

not depend on motion The corresponding anatomicalprocentle is seen in centgure 6 (maximum intensity projectionsin centgure 6a and thresholded axial sections in centgure 6b)The centrst-order mode which shows both motion andcolour-related responses shows high loadings in bilateralmotion-sensitive complex V5 (Brodmann Areas 19 and 37at the occipto-temporal junction) and areas traditionallyassociated with colour processing (V4oumlthe lingualgyrus Brodmann Area 19 ventromedially) The secondcentrst-order mode is most prominent in the hippocampusparahippocampal and related lingual cortices on bothsides The two more lateral blobs subsume the tails of thecaudate nuclei (centgure 6b(ii)) This system is not onenormally associated with colour processing but it should

be noted that some of the main eiexclect of colour has beenexplained by the centrst mode that includes V4 Insummary the two centrst-order modes comprise (i) anextrastriate cortical system including V5 and V4 thatresponds to motion and preferentially so when motion issupported by colour cues and (ii) a (para)hippocampal^lingual system that is concerned exclusively with colourprocessing above and beyond that accounted for by thecentrst system The critical question is where do these modesinteract

The interaction between the extrastriate and (para)hippocampal lingual systems conforms to the second-order mode in the lower panels This mode highlights thepulvinar of the thalamus and V5 bilaterally This is apleasing result in that it clearly implicates the thalamus inthe integration of extrastriate and (para)hippocampalsystems This integration being mediated by recurrent(sub)corticothalamic connections It is also a result thatwould not have obtained from a conventional SPManalysis Indeed we looked for an interaction betweenmotion and colour processing and did not see any sucheiexclect in the pulvinar The reason that the nonlinear PCAwas able to centnd this interaction was that there were noconstraints on the sources underlying the interaction In aconventional SPM analysis the sources are explicitlyassumed to be colour and motion in the visual centeldwhereas the two interacting modes identicented by thenonlinear PCA were caused by complicated admixtures ofcolour and motion This result is presented to illustratethe potential usefulness of nonlinear PCA not to makeany statistical inferences about reproducible functionalarchitectures The exploratory analysis based on this casestudy could now be used to motivate hypothesis-ledanalyses of other subjects

4 CONCLUSION

In this paper we have described a specicentc form ofnonlinear PCA that is predicated on the interactionbetween underlying sources in modulating spatial modesof brain activity Its theoretical motivation stems directlyfrom a second-order approximation to the Taylor expan-sion of any nonlinear function of sources that can causemultivariate observations A simple three-layer neuronalnetwork architecture is sucurrencient to identify or estimatethe underlying causes and associated centrst- and second-order spatial modes The centrst-order modes correspond toconventional eigenimages or principal components andthe second-order modes describe the patterns of brainactivity that result from interactions among thesesources The ensuing decomposition into centrst- andsecond-order components represents an exploratoryanalysis of the data that eschews some of the shortcom-ings of conventional PCA In particular nonlinear PCAallows for the context-sensitive expression of spatialmodes through second-order modes that can be inter-preted as the anatomical substrate of integration ormodulation The highly constrained form of nonlinearPCA presented above has an intuitive interpretation interms of pairwise interactions among underlying sourcesand by virtue of this represents a useful and parsimo-nious characterization of functional neuroimaging time-series

Characterizing interactions between modes of brain activity K Friston and others 143

Phil Trans R Soc Lond B (2000)

04

03

02

01

00

- 01

- 02

- 03

(a)

04

02

0

- 02

- 04

- 06

(b)

cond

itio

nndashsp

ecif

ic e

xpre

ssio

n

motion

motion

stat

colour colour

isoch isoch

stat

1 2condition

3 4

Figure 5 Condition-specicentc expression of the two centrst-ordermodes ensuing from the visual processing fMRI studyThese data represent the degree to which the centrst principalcomponent of epoch-related waveforms over the 32 photicstimulation^baseline pairs was expressed These condition-specicentc responses are plotted in terms of the four conditionsfor the two modes motion motion present stat stationarydots colour isoluminant chromatic contrast stimuli isochisochromatic luminance contrast stimuli (a) First-ordermode 1 (b) centrst-order mode 2

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 6: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

generally greater than the minimum number acquired toaccount for the rank of the multivariate data Forexample with just three voxels or channels two sourceswould be sucurrencient if the third dimension was explainedby the interactionbetween these two sourcesThree sourceswould be sucurrencient to account for a six-dimensional dataset and so on Clearly in the analysis of functional neuro-imaging time-series an initial dimension reduction isrequired before nonlinear PCA can be applied Forexample taking the centrst 36 spatial modes of a fMRIimaging time-series using conventional singular valuedecomposition would in principle require only eightunderlying sources In this sense nonlinear PCA representsa parsimonious characterization of the data that capturesnonlinear interactions among spatial modes or distributedbrain systems in a comprehensive but intuitive fashionThese and other issues will be demonstrated in the nextsection which uses real data to illustrate the technique

3 ILLUSTRATIVE EXAMPLES

In this section we will use a multisubject PET study ofverbal pounduency and a fMRI case study of visual processingto illustrate the use of nonlinear PCA and some aspects offunctional anatomy that can be addressed with this tech-nique The PET study is used to show that a considerableamount of variance can be accounted for by interactionsamong causes of the data and is presented for comparisonwith the original linear PCA characterization in Friston etal (1993) In brief we will show that the two experi-mental factors (task and time) combine to expressthemselves in a second-order mode that repoundects time-dependent adaptation of task-related responses Thiseiexclect can be construed as large-scale neurophysiologicalplasticity attributable to strategic changes in cognitiveprocessing during intrinsic relative to extrinsic genera-tion of words The second example using fMRI dealsmore explicitly with the modulation of one brain systemby another In particular the interactions betweenspecialized cortical systems that may be mediated bycorticothalamic loops

(a) A PET study of verbal pounduency(i) Data acquisition experimental design and preprocessing

The data were obtained from centve subjects scanned 12times (every 8 min) while performing one of two verbaltasks Scans were obtained with a CTI PETcamera (model953B CTI Knoxville TN USA) 15O was administeredintravenously as radiolabelled water infused over 2 minTotal counts per voxel during the build-up phase of radio-activity served as an estimate of regional cerebral bloodpoundow (rCBF) Subjects performed two tasks in alternationOne task involved repeating a letter presented aurally atone per two seconds (word shadowing) The other was apaced verbal pounduency task where the subjects respondedwith a word that began with the letter presented (intrinsicword generation)The data were realigned stereotacticallynormalized and smoothed with a 16 mm Gaussian kernel(Friston et al 1996c) The data were subject to a conven-tional SPM analysis using multiple linear regression with12 condition-specicentc eiexclects centve subject eiexclects and globalactivity as described in Friston et al (1995a) Parameterestimates representing condition-specicentc eiexclects averaged

over subjects were selected from voxels that exceeded athreshold of p5005 in the ensuing SPMfFgand subject tononlinear PCA as described below

(ii) Nonlinear PCAThe data were reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The ability of these two sources andtheir interaction to explain the observed regional activityis illustrated in centgure 3a Here an arbitrary voxel (thatshowing the highest F-value in the conventional SPManalysis) was selected from the left inferior frontal gyrus(Brodmann Area 47) The observed condition-specicentcactivity over 12 scans is shown in black and thatpredicted by the two sources is shown in white The rela-tive amount of variance accounted for by the two sourcesand their interaction is shown in the middle panel It canbe seen that 88 of the total variance over all voxelsincluded in the analysis can be explained by two sourcesThe second-order mode accounts of 22 of this (afterremoving that which can be modelled by the centrst-ordereiexclects) and would have been distributed over othermodes in a conventional PCA Figure 3b shows this distri-bution indicating that the centfth and sixth eigenimages ina conventional PCA largely comprise the interactionbetween the two modes identicented by nonlinear PCA

The centrst- and second-order modes are seen in centgure 4along with their expression over the 12 scans It is imme-diately apparent that the centrst mode repoundects task-relatedeiexclects paralleling the alternation between word genera-tion and word shadowing This procentle of brain regions istypical of verbal pounduency paradigms that isolate theintrinsic generation of semantic representations encodingand retrieval processes required to compare the currentoutput with previous words and the maintenance of anappropriate cognitive set The key regions involvedinclude the thalamus dorsolateral prefrontal cortex ante-rior cingulate temporal cortices and cerebellum Thesecond mode represents the other experimental factornamely time or order eiexclects A nonlinear eiexclect is evidentwith increases in activity in the cerebellar thalamic andleft basotemporal regions More interesting is the second-order mode that by implication repoundects an interactionbetween task-related responses and time ie time-dependent increases in physiological responses elicited bycognitive operations that distinguish between the twotasks employed This physiological adaptation in mostpronounced in Brocarsquos Area (Brodmann Area 44 in theleft prefrontal cortex and the right lateral thalamicregions) Brocarsquos Area is traditionally associated withspeech production and appears to undergo a profoundchange in its relative activation during word shadowingand generation after the centrst pair of scans that presum-ably repoundects an underlying change in cognitive archi-tecture or set

(b) A fMRI study of colour and motion processing(i) Data acquisition experimental design and preprocessing

The experiment was performed on a 2 Tesla MagnetomVISION (Siemens Erlangen Germany) whole body MRIsystem equipped with a head volume coil Contiguousmultislice T

2 -weighted fMRI images (TE ˆ 40 ms64 mmpound64 mmpound48 mm 3 mmpound3 mmpound3 mm voxels)

140 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

were obtained with echoplanar imaging using an axialslice orientation The eiexclective repetition time was 48 s Ayoung right-handed subject was scanned under fourdiiexclerent conditions in six scan epochs intercalated witha low level (visual centxation) baseline condition The fourconditions were repeated eight times in a pseudorandomorder giving 384 scans in total or 32 stimulation^baselineepoch pairs During all stimulation conditions the subjectlooked at dots back-projected on a screen by an LCDvideo projector The four experimental conditionscomprised the presentation of (i) radially moving dotsand (ii) stationary dots using (i) luminance contrast and(ii) chromatic contrast in a two-by-two factorial designLuminance contrast was established using isochromaticstimuli (red dots on a red background or green dots on agreen background) Hue contrast was obtained by usingred (or green) dots on a green (or red) background andestablishing isoluminance with poundicker photometry In thetwo movement conditions the dots moved radially fromthe centre of the screen at 88 s71 to the periphery wherethey vanished This creates the impression of optical poundowBy using these stimuli we hoped to excite activity invisual motion systems and those specialized for colourprocessing Any interaction between these systems wouldbe expressed in terms of motion-sensitive responses thatdepended on the hue or luminance contrast subtendingthat motion

The time-series were realigned corrected for movement-related eiexclects and spatially normalized into the standardspace of Talairach amp Tournoux (1988) using the subjectrsquosco-registered structural T1 scan and nonlinear deforma-tions (Friston et al 1996c) The data were spatiallysmoothed with a 6 mm isotropic Gaussian kernel As inthe PETexample voxels were selected that showed signicent-cant condition-specicentc eiexclects according to a conventionalSPM analysis (Friston et al 1995b Worsley amp Friston1995) This analysis used a multiple linear regression andcondition-specicentc box car regressors convolved with ahaemodynamic response function In this instance thenumber of voxels was exceeding large and we used ahigher threshold than in the PETanalysis ( p ˆ 0001) andincluded only those voxels that were posterior to theposterior commissure

(ii) Nonlinear PCAThe data were again reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The functional attribution of thesesources was established by looking at the expression of thecorresponding centrst-order modes over the four conditionsThe expression of epoch-related responses over all 32stimulation^baseline epoch pairs are shown in terms ofthe four conditions in centgure 5 This expression is simplythe score on the centrst principal component over all 32epoch-related responses for each source The centrst mode is

Characterizing interactions between modes of brain activity K Friston and others 141

Phil Trans R Soc Lond B (2000)

(a)

(b)

(c)

5

4

3

2

1

0

- 1

- 2

- 3

acti

vity

70

60

50

40

30

20

10

0

4

35

3

25

2

15

1

05

0

o

f va

rian

ce

1 2 3 4 5 6scan

7 8 9 10 11 12

mode 1652

mode 2207

interaction22

1 2 3 4eigenimage

5 6 7 8

Figure 3 Variance partitioning following a nonlinear PCA ofthe PET verbal pounduency study (a) Observed activity in a voxelin the left inferior frontal gyrus (centlled bars) and that predictedon the basis of two sources and their interaction (open bars)estimated with nonlinear PCA Activity is in units

Figure 3 (Cont) corresponding to regional cerebralperfusion in ml dl71 min71 (b) Variance over all voxelsincluded in the analysis accounted for by the two sources (ormodes) and their interaction Total ˆ 88 (c) Distributionof variance accounted for by second-order or interactioneiexclects over the conventional eigenimages obtained in theinitial SVD dimension reduction

clearly a motion-sensitive mode but one that embodiessome colour preference in the sense that the motion-dependent responses of this system are accentuated in thepresence of colour cues This was not quite what we hadanticipated the centrst-order eiexclect contains what would

functionally be called an interaction between motion andcolour processing The second source appears to beconcerned exclusively with colour processing in the sensethat its expression is uniformly higher under colourstimuli relative to isochromatic stimuli in a way that does

142 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

(a) (b)

(i)

(ii)

(iii)

15

1

05

0

- 05

- 1

- 15

1

05

0

- 05

- 1

- 15

acute 10- 5

5

0

- 5

acute 10- 6

1 2 3 4 5 6condition

7 8 9 10 11 12

Figure 4 Maximum intensity projections and expression of the centrst- and second-order spatial modes of the PET verbal pounduencystudy (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode The maximum intensity projections (a) are of thepositive values of each mode and are displayed in standard format The three orthogonal brain views are from the right theback and the top of the brain The projections have been scaled to the maximum intensity of each mode The time-dependentexpression of these modes are in terms of the 12 scans (b) The units are adimensional and their absolute values are not important(the variance they account for is determined by the scaling of the spatial modes which in turn is dictated by the sigmoidsquashing function see centgure 3)

not depend on motion The corresponding anatomicalprocentle is seen in centgure 6 (maximum intensity projectionsin centgure 6a and thresholded axial sections in centgure 6b)The centrst-order mode which shows both motion andcolour-related responses shows high loadings in bilateralmotion-sensitive complex V5 (Brodmann Areas 19 and 37at the occipto-temporal junction) and areas traditionallyassociated with colour processing (V4oumlthe lingualgyrus Brodmann Area 19 ventromedially) The secondcentrst-order mode is most prominent in the hippocampusparahippocampal and related lingual cortices on bothsides The two more lateral blobs subsume the tails of thecaudate nuclei (centgure 6b(ii)) This system is not onenormally associated with colour processing but it should

be noted that some of the main eiexclect of colour has beenexplained by the centrst mode that includes V4 Insummary the two centrst-order modes comprise (i) anextrastriate cortical system including V5 and V4 thatresponds to motion and preferentially so when motion issupported by colour cues and (ii) a (para)hippocampal^lingual system that is concerned exclusively with colourprocessing above and beyond that accounted for by thecentrst system The critical question is where do these modesinteract

The interaction between the extrastriate and (para)hippocampal lingual systems conforms to the second-order mode in the lower panels This mode highlights thepulvinar of the thalamus and V5 bilaterally This is apleasing result in that it clearly implicates the thalamus inthe integration of extrastriate and (para)hippocampalsystems This integration being mediated by recurrent(sub)corticothalamic connections It is also a result thatwould not have obtained from a conventional SPManalysis Indeed we looked for an interaction betweenmotion and colour processing and did not see any sucheiexclect in the pulvinar The reason that the nonlinear PCAwas able to centnd this interaction was that there were noconstraints on the sources underlying the interaction In aconventional SPM analysis the sources are explicitlyassumed to be colour and motion in the visual centeldwhereas the two interacting modes identicented by thenonlinear PCA were caused by complicated admixtures ofcolour and motion This result is presented to illustratethe potential usefulness of nonlinear PCA not to makeany statistical inferences about reproducible functionalarchitectures The exploratory analysis based on this casestudy could now be used to motivate hypothesis-ledanalyses of other subjects

4 CONCLUSION

In this paper we have described a specicentc form ofnonlinear PCA that is predicated on the interactionbetween underlying sources in modulating spatial modesof brain activity Its theoretical motivation stems directlyfrom a second-order approximation to the Taylor expan-sion of any nonlinear function of sources that can causemultivariate observations A simple three-layer neuronalnetwork architecture is sucurrencient to identify or estimatethe underlying causes and associated centrst- and second-order spatial modes The centrst-order modes correspond toconventional eigenimages or principal components andthe second-order modes describe the patterns of brainactivity that result from interactions among thesesources The ensuing decomposition into centrst- andsecond-order components represents an exploratoryanalysis of the data that eschews some of the shortcom-ings of conventional PCA In particular nonlinear PCAallows for the context-sensitive expression of spatialmodes through second-order modes that can be inter-preted as the anatomical substrate of integration ormodulation The highly constrained form of nonlinearPCA presented above has an intuitive interpretation interms of pairwise interactions among underlying sourcesand by virtue of this represents a useful and parsimo-nious characterization of functional neuroimaging time-series

Characterizing interactions between modes of brain activity K Friston and others 143

Phil Trans R Soc Lond B (2000)

04

03

02

01

00

- 01

- 02

- 03

(a)

04

02

0

- 02

- 04

- 06

(b)

cond

itio

nndashsp

ecif

ic e

xpre

ssio

n

motion

motion

stat

colour colour

isoch isoch

stat

1 2condition

3 4

Figure 5 Condition-specicentc expression of the two centrst-ordermodes ensuing from the visual processing fMRI studyThese data represent the degree to which the centrst principalcomponent of epoch-related waveforms over the 32 photicstimulation^baseline pairs was expressed These condition-specicentc responses are plotted in terms of the four conditionsfor the two modes motion motion present stat stationarydots colour isoluminant chromatic contrast stimuli isochisochromatic luminance contrast stimuli (a) First-ordermode 1 (b) centrst-order mode 2

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 7: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

were obtained with echoplanar imaging using an axialslice orientation The eiexclective repetition time was 48 s Ayoung right-handed subject was scanned under fourdiiexclerent conditions in six scan epochs intercalated witha low level (visual centxation) baseline condition The fourconditions were repeated eight times in a pseudorandomorder giving 384 scans in total or 32 stimulation^baselineepoch pairs During all stimulation conditions the subjectlooked at dots back-projected on a screen by an LCDvideo projector The four experimental conditionscomprised the presentation of (i) radially moving dotsand (ii) stationary dots using (i) luminance contrast and(ii) chromatic contrast in a two-by-two factorial designLuminance contrast was established using isochromaticstimuli (red dots on a red background or green dots on agreen background) Hue contrast was obtained by usingred (or green) dots on a green (or red) background andestablishing isoluminance with poundicker photometry In thetwo movement conditions the dots moved radially fromthe centre of the screen at 88 s71 to the periphery wherethey vanished This creates the impression of optical poundowBy using these stimuli we hoped to excite activity invisual motion systems and those specialized for colourprocessing Any interaction between these systems wouldbe expressed in terms of motion-sensitive responses thatdepended on the hue or luminance contrast subtendingthat motion

The time-series were realigned corrected for movement-related eiexclects and spatially normalized into the standardspace of Talairach amp Tournoux (1988) using the subjectrsquosco-registered structural T1 scan and nonlinear deforma-tions (Friston et al 1996c) The data were spatiallysmoothed with a 6 mm isotropic Gaussian kernel As inthe PETexample voxels were selected that showed signicent-cant condition-specicentc eiexclects according to a conventionalSPM analysis (Friston et al 1995b Worsley amp Friston1995) This analysis used a multiple linear regression andcondition-specicentc box car regressors convolved with ahaemodynamic response function In this instance thenumber of voxels was exceeding large and we used ahigher threshold than in the PETanalysis ( p ˆ 0001) andincluded only those voxels that were posterior to theposterior commissure

(ii) Nonlinear PCAThe data were again reduced to an eight-dimensional

subspace using SVD and entered into the nonlinear PCAusing two sources The functional attribution of thesesources was established by looking at the expression of thecorresponding centrst-order modes over the four conditionsThe expression of epoch-related responses over all 32stimulation^baseline epoch pairs are shown in terms ofthe four conditions in centgure 5 This expression is simplythe score on the centrst principal component over all 32epoch-related responses for each source The centrst mode is

Characterizing interactions between modes of brain activity K Friston and others 141

Phil Trans R Soc Lond B (2000)

(a)

(b)

(c)

5

4

3

2

1

0

- 1

- 2

- 3

acti

vity

70

60

50

40

30

20

10

0

4

35

3

25

2

15

1

05

0

o

f va

rian

ce

1 2 3 4 5 6scan

7 8 9 10 11 12

mode 1652

mode 2207

interaction22

1 2 3 4eigenimage

5 6 7 8

Figure 3 Variance partitioning following a nonlinear PCA ofthe PET verbal pounduency study (a) Observed activity in a voxelin the left inferior frontal gyrus (centlled bars) and that predictedon the basis of two sources and their interaction (open bars)estimated with nonlinear PCA Activity is in units

Figure 3 (Cont) corresponding to regional cerebralperfusion in ml dl71 min71 (b) Variance over all voxelsincluded in the analysis accounted for by the two sources (ormodes) and their interaction Total ˆ 88 (c) Distributionof variance accounted for by second-order or interactioneiexclects over the conventional eigenimages obtained in theinitial SVD dimension reduction

clearly a motion-sensitive mode but one that embodiessome colour preference in the sense that the motion-dependent responses of this system are accentuated in thepresence of colour cues This was not quite what we hadanticipated the centrst-order eiexclect contains what would

functionally be called an interaction between motion andcolour processing The second source appears to beconcerned exclusively with colour processing in the sensethat its expression is uniformly higher under colourstimuli relative to isochromatic stimuli in a way that does

142 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

(a) (b)

(i)

(ii)

(iii)

15

1

05

0

- 05

- 1

- 15

1

05

0

- 05

- 1

- 15

acute 10- 5

5

0

- 5

acute 10- 6

1 2 3 4 5 6condition

7 8 9 10 11 12

Figure 4 Maximum intensity projections and expression of the centrst- and second-order spatial modes of the PET verbal pounduencystudy (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode The maximum intensity projections (a) are of thepositive values of each mode and are displayed in standard format The three orthogonal brain views are from the right theback and the top of the brain The projections have been scaled to the maximum intensity of each mode The time-dependentexpression of these modes are in terms of the 12 scans (b) The units are adimensional and their absolute values are not important(the variance they account for is determined by the scaling of the spatial modes which in turn is dictated by the sigmoidsquashing function see centgure 3)

not depend on motion The corresponding anatomicalprocentle is seen in centgure 6 (maximum intensity projectionsin centgure 6a and thresholded axial sections in centgure 6b)The centrst-order mode which shows both motion andcolour-related responses shows high loadings in bilateralmotion-sensitive complex V5 (Brodmann Areas 19 and 37at the occipto-temporal junction) and areas traditionallyassociated with colour processing (V4oumlthe lingualgyrus Brodmann Area 19 ventromedially) The secondcentrst-order mode is most prominent in the hippocampusparahippocampal and related lingual cortices on bothsides The two more lateral blobs subsume the tails of thecaudate nuclei (centgure 6b(ii)) This system is not onenormally associated with colour processing but it should

be noted that some of the main eiexclect of colour has beenexplained by the centrst mode that includes V4 Insummary the two centrst-order modes comprise (i) anextrastriate cortical system including V5 and V4 thatresponds to motion and preferentially so when motion issupported by colour cues and (ii) a (para)hippocampal^lingual system that is concerned exclusively with colourprocessing above and beyond that accounted for by thecentrst system The critical question is where do these modesinteract

The interaction between the extrastriate and (para)hippocampal lingual systems conforms to the second-order mode in the lower panels This mode highlights thepulvinar of the thalamus and V5 bilaterally This is apleasing result in that it clearly implicates the thalamus inthe integration of extrastriate and (para)hippocampalsystems This integration being mediated by recurrent(sub)corticothalamic connections It is also a result thatwould not have obtained from a conventional SPManalysis Indeed we looked for an interaction betweenmotion and colour processing and did not see any sucheiexclect in the pulvinar The reason that the nonlinear PCAwas able to centnd this interaction was that there were noconstraints on the sources underlying the interaction In aconventional SPM analysis the sources are explicitlyassumed to be colour and motion in the visual centeldwhereas the two interacting modes identicented by thenonlinear PCA were caused by complicated admixtures ofcolour and motion This result is presented to illustratethe potential usefulness of nonlinear PCA not to makeany statistical inferences about reproducible functionalarchitectures The exploratory analysis based on this casestudy could now be used to motivate hypothesis-ledanalyses of other subjects

4 CONCLUSION

In this paper we have described a specicentc form ofnonlinear PCA that is predicated on the interactionbetween underlying sources in modulating spatial modesof brain activity Its theoretical motivation stems directlyfrom a second-order approximation to the Taylor expan-sion of any nonlinear function of sources that can causemultivariate observations A simple three-layer neuronalnetwork architecture is sucurrencient to identify or estimatethe underlying causes and associated centrst- and second-order spatial modes The centrst-order modes correspond toconventional eigenimages or principal components andthe second-order modes describe the patterns of brainactivity that result from interactions among thesesources The ensuing decomposition into centrst- andsecond-order components represents an exploratoryanalysis of the data that eschews some of the shortcom-ings of conventional PCA In particular nonlinear PCAallows for the context-sensitive expression of spatialmodes through second-order modes that can be inter-preted as the anatomical substrate of integration ormodulation The highly constrained form of nonlinearPCA presented above has an intuitive interpretation interms of pairwise interactions among underlying sourcesand by virtue of this represents a useful and parsimo-nious characterization of functional neuroimaging time-series

Characterizing interactions between modes of brain activity K Friston and others 143

Phil Trans R Soc Lond B (2000)

04

03

02

01

00

- 01

- 02

- 03

(a)

04

02

0

- 02

- 04

- 06

(b)

cond

itio

nndashsp

ecif

ic e

xpre

ssio

n

motion

motion

stat

colour colour

isoch isoch

stat

1 2condition

3 4

Figure 5 Condition-specicentc expression of the two centrst-ordermodes ensuing from the visual processing fMRI studyThese data represent the degree to which the centrst principalcomponent of epoch-related waveforms over the 32 photicstimulation^baseline pairs was expressed These condition-specicentc responses are plotted in terms of the four conditionsfor the two modes motion motion present stat stationarydots colour isoluminant chromatic contrast stimuli isochisochromatic luminance contrast stimuli (a) First-ordermode 1 (b) centrst-order mode 2

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 8: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

clearly a motion-sensitive mode but one that embodiessome colour preference in the sense that the motion-dependent responses of this system are accentuated in thepresence of colour cues This was not quite what we hadanticipated the centrst-order eiexclect contains what would

functionally be called an interaction between motion andcolour processing The second source appears to beconcerned exclusively with colour processing in the sensethat its expression is uniformly higher under colourstimuli relative to isochromatic stimuli in a way that does

142 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

(a) (b)

(i)

(ii)

(iii)

15

1

05

0

- 05

- 1

- 15

1

05

0

- 05

- 1

- 15

acute 10- 5

5

0

- 5

acute 10- 6

1 2 3 4 5 6condition

7 8 9 10 11 12

Figure 4 Maximum intensity projections and expression of the centrst- and second-order spatial modes of the PET verbal pounduencystudy (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode The maximum intensity projections (a) are of thepositive values of each mode and are displayed in standard format The three orthogonal brain views are from the right theback and the top of the brain The projections have been scaled to the maximum intensity of each mode The time-dependentexpression of these modes are in terms of the 12 scans (b) The units are adimensional and their absolute values are not important(the variance they account for is determined by the scaling of the spatial modes which in turn is dictated by the sigmoidsquashing function see centgure 3)

not depend on motion The corresponding anatomicalprocentle is seen in centgure 6 (maximum intensity projectionsin centgure 6a and thresholded axial sections in centgure 6b)The centrst-order mode which shows both motion andcolour-related responses shows high loadings in bilateralmotion-sensitive complex V5 (Brodmann Areas 19 and 37at the occipto-temporal junction) and areas traditionallyassociated with colour processing (V4oumlthe lingualgyrus Brodmann Area 19 ventromedially) The secondcentrst-order mode is most prominent in the hippocampusparahippocampal and related lingual cortices on bothsides The two more lateral blobs subsume the tails of thecaudate nuclei (centgure 6b(ii)) This system is not onenormally associated with colour processing but it should

be noted that some of the main eiexclect of colour has beenexplained by the centrst mode that includes V4 Insummary the two centrst-order modes comprise (i) anextrastriate cortical system including V5 and V4 thatresponds to motion and preferentially so when motion issupported by colour cues and (ii) a (para)hippocampal^lingual system that is concerned exclusively with colourprocessing above and beyond that accounted for by thecentrst system The critical question is where do these modesinteract

The interaction between the extrastriate and (para)hippocampal lingual systems conforms to the second-order mode in the lower panels This mode highlights thepulvinar of the thalamus and V5 bilaterally This is apleasing result in that it clearly implicates the thalamus inthe integration of extrastriate and (para)hippocampalsystems This integration being mediated by recurrent(sub)corticothalamic connections It is also a result thatwould not have obtained from a conventional SPManalysis Indeed we looked for an interaction betweenmotion and colour processing and did not see any sucheiexclect in the pulvinar The reason that the nonlinear PCAwas able to centnd this interaction was that there were noconstraints on the sources underlying the interaction In aconventional SPM analysis the sources are explicitlyassumed to be colour and motion in the visual centeldwhereas the two interacting modes identicented by thenonlinear PCA were caused by complicated admixtures ofcolour and motion This result is presented to illustratethe potential usefulness of nonlinear PCA not to makeany statistical inferences about reproducible functionalarchitectures The exploratory analysis based on this casestudy could now be used to motivate hypothesis-ledanalyses of other subjects

4 CONCLUSION

In this paper we have described a specicentc form ofnonlinear PCA that is predicated on the interactionbetween underlying sources in modulating spatial modesof brain activity Its theoretical motivation stems directlyfrom a second-order approximation to the Taylor expan-sion of any nonlinear function of sources that can causemultivariate observations A simple three-layer neuronalnetwork architecture is sucurrencient to identify or estimatethe underlying causes and associated centrst- and second-order spatial modes The centrst-order modes correspond toconventional eigenimages or principal components andthe second-order modes describe the patterns of brainactivity that result from interactions among thesesources The ensuing decomposition into centrst- andsecond-order components represents an exploratoryanalysis of the data that eschews some of the shortcom-ings of conventional PCA In particular nonlinear PCAallows for the context-sensitive expression of spatialmodes through second-order modes that can be inter-preted as the anatomical substrate of integration ormodulation The highly constrained form of nonlinearPCA presented above has an intuitive interpretation interms of pairwise interactions among underlying sourcesand by virtue of this represents a useful and parsimo-nious characterization of functional neuroimaging time-series

Characterizing interactions between modes of brain activity K Friston and others 143

Phil Trans R Soc Lond B (2000)

04

03

02

01

00

- 01

- 02

- 03

(a)

04

02

0

- 02

- 04

- 06

(b)

cond

itio

nndashsp

ecif

ic e

xpre

ssio

n

motion

motion

stat

colour colour

isoch isoch

stat

1 2condition

3 4

Figure 5 Condition-specicentc expression of the two centrst-ordermodes ensuing from the visual processing fMRI studyThese data represent the degree to which the centrst principalcomponent of epoch-related waveforms over the 32 photicstimulation^baseline pairs was expressed These condition-specicentc responses are plotted in terms of the four conditionsfor the two modes motion motion present stat stationarydots colour isoluminant chromatic contrast stimuli isochisochromatic luminance contrast stimuli (a) First-ordermode 1 (b) centrst-order mode 2

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 9: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

not depend on motion The corresponding anatomicalprocentle is seen in centgure 6 (maximum intensity projectionsin centgure 6a and thresholded axial sections in centgure 6b)The centrst-order mode which shows both motion andcolour-related responses shows high loadings in bilateralmotion-sensitive complex V5 (Brodmann Areas 19 and 37at the occipto-temporal junction) and areas traditionallyassociated with colour processing (V4oumlthe lingualgyrus Brodmann Area 19 ventromedially) The secondcentrst-order mode is most prominent in the hippocampusparahippocampal and related lingual cortices on bothsides The two more lateral blobs subsume the tails of thecaudate nuclei (centgure 6b(ii)) This system is not onenormally associated with colour processing but it should

be noted that some of the main eiexclect of colour has beenexplained by the centrst mode that includes V4 Insummary the two centrst-order modes comprise (i) anextrastriate cortical system including V5 and V4 thatresponds to motion and preferentially so when motion issupported by colour cues and (ii) a (para)hippocampal^lingual system that is concerned exclusively with colourprocessing above and beyond that accounted for by thecentrst system The critical question is where do these modesinteract

The interaction between the extrastriate and (para)hippocampal lingual systems conforms to the second-order mode in the lower panels This mode highlights thepulvinar of the thalamus and V5 bilaterally This is apleasing result in that it clearly implicates the thalamus inthe integration of extrastriate and (para)hippocampalsystems This integration being mediated by recurrent(sub)corticothalamic connections It is also a result thatwould not have obtained from a conventional SPManalysis Indeed we looked for an interaction betweenmotion and colour processing and did not see any sucheiexclect in the pulvinar The reason that the nonlinear PCAwas able to centnd this interaction was that there were noconstraints on the sources underlying the interaction In aconventional SPM analysis the sources are explicitlyassumed to be colour and motion in the visual centeldwhereas the two interacting modes identicented by thenonlinear PCA were caused by complicated admixtures ofcolour and motion This result is presented to illustratethe potential usefulness of nonlinear PCA not to makeany statistical inferences about reproducible functionalarchitectures The exploratory analysis based on this casestudy could now be used to motivate hypothesis-ledanalyses of other subjects

4 CONCLUSION

In this paper we have described a specicentc form ofnonlinear PCA that is predicated on the interactionbetween underlying sources in modulating spatial modesof brain activity Its theoretical motivation stems directlyfrom a second-order approximation to the Taylor expan-sion of any nonlinear function of sources that can causemultivariate observations A simple three-layer neuronalnetwork architecture is sucurrencient to identify or estimatethe underlying causes and associated centrst- and second-order spatial modes The centrst-order modes correspond toconventional eigenimages or principal components andthe second-order modes describe the patterns of brainactivity that result from interactions among thesesources The ensuing decomposition into centrst- andsecond-order components represents an exploratoryanalysis of the data that eschews some of the shortcom-ings of conventional PCA In particular nonlinear PCAallows for the context-sensitive expression of spatialmodes through second-order modes that can be inter-preted as the anatomical substrate of integration ormodulation The highly constrained form of nonlinearPCA presented above has an intuitive interpretation interms of pairwise interactions among underlying sourcesand by virtue of this represents a useful and parsimo-nious characterization of functional neuroimaging time-series

Characterizing interactions between modes of brain activity K Friston and others 143

Phil Trans R Soc Lond B (2000)

04

03

02

01

00

- 01

- 02

- 03

(a)

04

02

0

- 02

- 04

- 06

(b)

cond

itio

nndashsp

ecif

ic e

xpre

ssio

n

motion

motion

stat

colour colour

isoch isoch

stat

1 2condition

3 4

Figure 5 Condition-specicentc expression of the two centrst-ordermodes ensuing from the visual processing fMRI studyThese data represent the degree to which the centrst principalcomponent of epoch-related waveforms over the 32 photicstimulation^baseline pairs was expressed These condition-specicentc responses are plotted in terms of the four conditionsfor the two modes motion motion present stat stationarydots colour isoluminant chromatic contrast stimuli isochisochromatic luminance contrast stimuli (a) First-ordermode 1 (b) centrst-order mode 2

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 10: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

In this paper we have chosen to illustrate the techniqueusing the interaction between modes associated withcolour and motion processing Nonlinear PCA could ofcourse be used in any situation where one expects theactivity of a distributed brain system to be modulated bythe expression of another system Many examples come tomind that may or may not be grounded in cognitivescience or neuroscience models For example Are the

modes implicated in the visual processing of word formsand graphemes modulated by semantic modes in moreanterior temporal and parietal cortices Althoughnonlinear PCA is an exploratory device and is implicitlydata-led careful experimental design can be used tocontrol the expression of various spatial modes that onewishes to characterize As a general point it is likely thatthe more powerful designs will be factorial in nature

144 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

z = - 4 mm

z = 0 mm

z = 4 mm

(a) (b)

(i)

(ii)

(iii)

Figure 6 Maximum intensity projections and axial (transverse) sections of the centrst- and second-order spatial modes of the fMRIphotic stimulation study The maximum intensity projections (a) adhere to the same format as in centgure 4 The axial slices havebeen selected to include the maxima of the corresponding spatial modes (i) Spatial mode 1 (ii) spatial mode 2 and (iii) second-order mode In this display format the modes have been thresholded at 164 of each modersquos standard deviation over all voxels(white areas) The resulting excursion set has been superimposed onto a structural T1-weighted MRI image conforming to thesame anatomical space (b)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 11: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

allowing the expression of one mode associated with oneexperimental factor to be assessed under diiexclerent levelsof the expression of a second mode elicited by a secondexperimental cognitive or sensory factor In someinstances factorial designs are not always easy toimplement (eg in selective attention because it is dicurrencultto attend selectively to a particular attribute when it isnot present in the visual centeld) However many multi-factorial experiments designed to look at languageprocessing and memory may lend themselves nicely tocharacterization using the techniques described in thispaper

(a) Extensions and limitationsThe limitations of the nonlinear PCA proposed above

are embodied in the constraints on the form of thedecomposition assumed The most obvious constraint isthat it only allows for second-order interactions amongsources or causes of the data whereas higher-order inter-actions may prevail It would of course be easy toextend the neural net architecture to include third- orhigher-order nodes and this may be justicented in some dataanalytic situations In neuroimaging however the time-series one deals with are usually quite short and noisy andsimply identifying second-order eiexclects can be quite ambi-tious The second limitation is that the number of sourceshas to be prespecicented Again this may be a drawback interms of system identicentcation and independent compo-nent analysis in general However in neuroimaging onehas experimental control over the number of factors (iesources) that are likely to cause neurophysiologicalchanges and specifying the number of sources is a muchmore tenable in terms of justicentable restrictions on thecasual model assumed for the data

Another important consideration is that in the specialapplication of nonlinear PCA to functional imagingdata an initial dimension reduction using SVD isrequired This is because there are many more voxelsthan observations It is well known that systematic errorscan creep into applying SVD to simple nonlinear depen-dencies and that these depend on the rate of convergenceof the Taylor series associated with equation (1) In thispaper the SVD is done centrst and then the nonlinearanalysis is performed It is always possible that the SVDhas not established the right bases for the subsequentanalysis and that some bias will ensue The result will bethat apparent modulations of the centrst-order modes willnot be correct These issues represent areas of futurework and could be addressed using `toyrsquo nonlinearsystems and synthetic imaging data and by examiningthe sensitivity of the second-order modes to the degree ofSVD dimension reduction

A more biological consideration relates to themechanism of the interaction In the examples presentedabove we have assumed that the interaction is expressedat a neuronal level in terms of modulation of neuronalresponses and dynamics themselves It should be borne inmind that interactions can also be expressed at the levelof the haemodynamic response to [non-interacting]neuronal responses (eg Friston et al 1998) This shouldbe considered where there is a high degree of anatomicalconvergence between centrst-order modes that evidence astrong interaction

By virtue of the iterative scheme used for learning inthe neural net there is always the problem of localminima and the associated dependency on starting esti-mates In our applications we start with the conventionalPCA solution and therefore ensure that the ensuingmodes and interactions always account for more variancethan the corresponding linear PCA In this sense there isa unique solution (for any given gradient descent scheme)and this is the nearest to the solution where the second-order eiexclects are zero

Perhaps the most interesting limitation of the techniquepresented in this paper is buried in the assumption thatthere exists a linear combination of the inputs that givesthe expression of the sources This depends on theassumption (see 2) that centrst- and second-order modesare not collinear As long as they are not collinear there isalways a set of feed-forward connection strengths thatspan the subspace of one centrst-order mode that is ortho-gonal to all other modes (centrst and second order) Whatare the implications of collinearity between a centrst- andsecond-order mode Collinearity means that the expres-sion of a centrst-order mode is itself sensitive to the expres-sion of another mode (ie the centrst- and second-ordermodes are the same thing) The possibility of this speaksto two fundamentally diiexclerent context-sensitive eiexclectsThe centrst is when the interaction between two modes orcauses is expressed as a second-order mode with a distri-bution that is distinct from both centrst-order modes This isthe situation considered in this paper and can beaddressed using nonlinear PCA as described aboveSecond the interaction may be expressed solely in termsof the expression of one of the two centrst-order modesHere there is no second-order mode only a contextualexpression of centrst-order modes This second form ofcontext-sensitivity requires a diiexclerent sort of approach(nonlinear ICA) and is interesting because it may repre-sents a true contextual eiexclect with which the brain has tocontend in everyday sensory processing

This work was funded by the Wellcome TrustWe would also liketo thank Gary Green and an anonymous reviewer for help inpresenting this work

REFERENCES

Biswal B Yetkin F Z Haughton V M amp Hyde J S 1995Functional connectivity in the motor cortex of resting humanbrain using echo-planar MRI Magn Res Med 34 537^541

Bulaquo chel C Josephs O Rees G Turner R Frith C D ampFriston K J 1998 The functional anatomy of attention tovisual motion a fMRI study Brain 121 1281^1294

Common P 1994 Independent component analysis a newconcept Signal Processing 36 287^314

Dong D amp McAvoy T J 1996 Nonlinear principal componentanalysisoumlbased on principal curves and neural networksComp Chem Engng 20 65^78

Foldiak P 1990 Forming sparse representations by local anti-Hebbian learning Biol Cybern 64 165^170

Friston K J 1995 Functional and eiexclective connectivity inneuroimaging a synthesis Hum Brain Mapp 2 56^78

Friston K J Frith C Liddle P amp Frackowiak R S J 1993Functional connectivity the principal component analysis oflarge data sets J Cerebr Blood-Flow Metab 13 5^14

Characterizing interactions between modes of brain activity K Friston and others 145

Phil Trans R Soc Lond B (2000)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)

Page 12: Nonlinear PCA: characterizing interactions between modes ... › ~karl › Nonlinear PCA.pdf · Eigenimage analysis has been applied widely to positron emission tomographic (PET)

Friston K J Holmes A P Worsley K J Poline J B FrithC D amp Frackowiak R S J 1995a Statistical parametricmaps in functional imaging a general linear approach HumBrain Mapp 2 189^210

Friston K J Holmes A P Poline J-B Grasby P JWilliams S C R Frackowiak R S J amp Turner R 1995bAnalysis of fMRI time-series revisited NeuroImage 2 45^53

Friston K J Poline J-B Holmes A P Frith C D ampFrackowiak R S J 1996a A multivariate analysis of PETactivation studies Hum Brain Mapp 4 140^151

Friston K J Frith C D Fletcher P Liddle P F ampFrackowiak R S J 1996b Functional topography multi-dimensional scaling and functional connectivity in the brainCerebr Cortex 6 156^164

Friston K J Ashburner J Frith C D Poline J-B HeatherJ D amp Frackowiak R S J 1996c Spatial registration andnormalisation of images Hum Brain Mapp 3 165^189

Friston K J Josephs O Rees G amp Turner R 1998Nonlinear event-related responses in fMRI Magn Reson Med39 41^52

Karhunen J amp Joutsensalo J 1994 Representation and separa-tion of signals using nonlinear PCA type learning NeuralNetworks 7 113^127

Krame M A 1991 Nonlinear principal component analysisusing auto-associative neural networks AIChE J 37 233^243

McDonald R P 1984 Concentrmatory models for non-linear structural analysis In Data analysis and informaticsIII (Versailles 1983) pp 425^432 Amsterdam North-Holland

McIntosh A R Bookstien F L Haxby J V amp Grady C L1996 Spatial pattern analysis of functional brain images usingpartial least squares NeuroImage 3 143^157

Softky W amp Kammen D 1991 Correlations in high dimen-sional or asymmetric data sets Hebbian neuronal processingNeural Networks 4 337^347

Talairach J amp Tournoux P 1988 A coplanar stereotaxic atlas of ahuman brain Stuttgart Germany Thieme

Taleb A amp Jutten C 1997 Nonlinear source separation thepost-nonlinear mixtures In ESANN rsquo97 Bruges April 1997pp 279^284

Treue S amp Maunsell H R 1996 Attentional modulation ofvisual motion processing in cortical areas MT and MSTNature 382 539^541

Wold S 1992 Nonlinear partial least squares modellingChemometrics Int Lab Syst 14 71^84

Worsley K J amp Friston K J 1995 Analysis of fMRI time-series revisitedoumlagain NeuroImage 2 173^181

Zeki S 1990 The motion pathways of the visual cortex InVision coding and ecurrenciency (ed C Blakemore) pp 321^345Cambridge University Press

146 K Friston and others Characterizing interactions between modes of brain activity

PhilTrans R Soc Lond B (2000)