

Journal of VLSI Signal Processing 18, 275–285 (1998)
© 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.

Nonparametric Regression Modeling with Equiprobable Topographic Maps and Projection Pursuit Learning with Application to PET Image Processing

MARC M. VAN HULLE
K.U. Leuven, Laboratorium voor Neuro- en Psychofysiologie, Campus Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium

Received October 20, 1996; Revised April 2, 1997

Abstract. A recently introduced rule for equiprobable topographic map formation, called the Vectorial Boundary Adaptation Rule (VBAR), is extended and applied to nonparametric projection pursuit regression. The performance of the regression procedure is compared to that of a number of other nonparametric regression procedures. The procedure is applied to positron emission tomography (PET) images for adaptive filtering and data compression purposes.

1. Introduction

The Self-Organizing (feature) Map (SOM) [1] has seen a wide range of statistical applications [2, 3]; however, the algorithm performs poorly on regression problems of even low dimensionality. This is mainly due to the fact that the topology-preserving ordering aimed for in the d-dimensional input space may be lost after projection onto the (d − 1)-dimensional subspace of independent variables (nonfunctional mapping). In order to solve this problem, Ritter and Schulten [4] assumed that the multi-variable function to be fitted could be decomposed into a number of single-variable functions, and that enough input samples were available.

Cherkassky and Lari-Najafi [5] modified the original SOM algorithm to alleviate the occurrence of nonfunctional mappings by constraining the learning process (constrained topological mapping). A disadvantage of performing regression in this way is the prohibitive number of neurons needed for function approximation in high-dimensional spaces, and the large number of input samples needed to allocate the neuron weight vectors reliably (cf. the curse of dimensionality). One of the very few approaches which perform well even in high-dimensional spaces is projection pursuit regression [6]:

the high-dimensional data are interpreted through optimally-chosen lower-dimensional projections. Recently, Heskes and Kappen [7] combined projection pursuit regression with a modified version of the original SOM algorithm. The modification has the advantage that the weight update process can now be derived from an energy function. The energy function suggests an EM algorithm for batch update learning. The authors claim that nonfunctional mappings are avoided by introducing a noise process for smoothing the dependent variable (output variable), but there is no formal guarantee for this.

Recently, we have introduced a rule, called the Vectorial Boundary Adaptation Rule (VBAR), which yields a correct topology-preserving map by performing local weight updates only [8–10]. Our rule can be derived from an energy function and is aimed at achieving an equiprobable quantization of the input space and, thus, an unbiased allocation of neuron weight vectors (equiprobable topographic map). In this article, we will extend VBAR with a neighborhood function (eVBAR) to allow for kernel smoothing (in topological space). We will apply our VBAR-based procedure to nonparametric projection pursuit regression and compare the regression performance obtained with that of the (modified) SOM algorithm [7], back-propagation learning, and projection pursuit learning [11].


Figure 1. (A) Definition of quantization region in a two-dimensional, rectangular lattice. The quadrilaterals denote the quantization regions labeled H_a, ..., H_d in the input space. The shaded region marks the receptive field of neuron j. The full and dashed lines represent the lattice before and after the weights are updated using VBAR (not to scale). The current input is indicated by the black dot. (B) Idem, but with the weights updated using eVBAR.

Finally, we will apply our regression procedure to positron emission tomography (PET) images for adaptive filtering and data compression purposes. Data compression is performed with our topographic map trained to minimize the regression error, rather than with a map trained to minimize the quantization error directly, as is usually done [12], also with topographic maps [2, 13, 14]. Furthermore, since our compression technique is based on a partitioning of the input space, but supplied with an interpolating function, it differs from conventional compression techniques such as the Karhunen-Loève transform (principal component analysis), which is based on a continuous representation of the input space using linear combinations of basis vectors [15].

2. Vectorial Boundary Adaptation Rule

In topographic map formation, a mapping is developed from the d-dimensional space V of inputs v = [v_1, ..., v_d] onto an equal- or lower-dimensional discrete lattice A of N formal neurons, in such a way that neighboring inputs are assigned to neighboring neurons. To each neuron i ∈ A corresponds a weight vector w_i = [w_{i1}, ..., w_{id}] defined in V space. Traditionally, the formal neurons quantize the input space V into a discrete number of mutually non-overlapping partition cells or quantization regions (Voronoi tessellation). However, in our case, the definition of quantization region is different from the one used in a Voronoi tessellation. Assume that A comprises a regular d-dimensional lattice with a rectangular topology.

Since the topology is rectangular, the neurons of the lattice define d-dimensional quadrilaterals (Fig. 1(A)). Each quadrilateral represents a quantization region, quantizing the input space V (we also consider the "overload" quantization regions facing the outer border of the lattice). In this way, every neuron of A is a vertex common to 2^d adjacent quadrilaterals. For example, in Fig. 1(A), neuron j is a vertex common to four quadrilaterals H_a, H_b, H_c, H_d: these four quadrilaterals represent neuron j's "receptive field" (shaded area). Conversely, each quadrilateral belongs to the receptive fields of 2^d neurons: hence, receptive fields overlap. To each of the Q quadrilaterals of A, we associate a code membership function 1_{H_j}(v), with 1_{H_j}(v) = 1 if v ∈ H_j and 1_{H_j}(v) = 0 otherwise. Let S_i be the set of 2^d quadrilaterals for which neuron i is a common vertex. Neuron i is said to be "active" if at least one of the quadrilaterals of the set S_i is active (i.e., with non-zero code membership function output). The Vectorial Boundary Adaptation Rule (VBAR) is defined as:

\Delta w_i = \eta \sum_{H_j \in S_i} \mathbb{1}_{H_j}(v)\, \mathrm{Sgn}(v - w_i), \qquad \forall\, i \in A, \qquad (1)

with η the learning rate, and where the "Sgn" operator symbolizes the sign function taken componentwise. Hence, by this operator, (v − w_i) becomes a vector with +1 or −1 components. The effect of the rule is shown in Fig. 1(A) (dashed lines). It can easily be verified that the average of Eq. (1), taken over a set {v^m} of M input samples, performs (stochastic) gradient descent on the energy function:


E = \sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{H_j \in S_i} \mathbb{1}_{H_j}(v^m)\, |v^m - w_i|, \qquad (2)

with |v^m - w_i| \doteq \sum_{j=1}^{d} |v^m_j - w_{ij}|.

It can be formally proven that in the one-dimensional case, VBAR yields an equiprobable quantization for any N [8], and that in the multi-dimensional case, VBAR yields a quantization which approximates an equiprobable one for large N [10]. In other words, the weight distribution p(w_i) obtained at convergence is a linear function of the input density p(v) for large N [10]. This should be contrasted with the SOM algorithm, for which the weight distribution at convergence is proportional to p^{2/3}(v) in the one-dimensional case, in the limit of an infinite density of neighbor neurons (continuum approximation) and vanishing neighborhood function range [16]. For a discrete map in d-dimensional space, the weight density is expected to be proportional to p^{1/(1+2/d)}(v) for large N and for minimum MSE quantization [17] (as is also the case with the standard unsupervised competitive learning rule [18]).

With VBAR, each weight vector will converge to the d-dimensional "median" of the corresponding receptive field, where the "median" is defined as the vector of the (scalar) medians in each of the d input dimensions separately. (Note that there exists no unique definition of the median in d dimensions.) In other words, each converged w_{ij} divides the set of inputs which activate neuron i into two equally-sized subsets along the jth input dimension.
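As an illustration, the following minimal Python sketch (not part of the original article) implements the rule of Eq. (1) for the provably equiprobable one-dimensional case: the quantization regions are the intervals between adjacent weights plus the two overload intervals at the lattice borders, and only the neurons that are vertices of the active interval are updated.

```python
import numpy as np

def vbar_1d(samples, n_neurons=7, eta=0.02, n_epochs=500, seed=0):
    """Minimal 1-D sketch of the Vectorial Boundary Adaptation Rule, Eq. (1).

    The quantization regions H_j are the intervals between adjacent weights,
    plus the two "overload" intervals facing the lattice borders.  For every
    input v only one interval is active, and the neurons that are vertices of
    that interval take a step of size eta towards v (sign rule), which drives
    the map towards an equiprobable partition of the inputs.
    """
    rng = np.random.default_rng(seed)
    w = np.sort(rng.uniform(samples.min(), samples.max(), n_neurons))
    for _ in range(n_epochs):
        for v in rng.permutation(samples):
            j = np.searchsorted(w, v)          # active interval (w[j-1], w[j]]
            for i in (j - 1, j):               # its vertices, clipped at the borders
                if 0 <= i < n_neurons:
                    w[i] += eta * np.sign(v - w[i])
        w.sort()                               # keep the ordering assumed by searchsorted
    return w

# example: weights spread so that each region captures roughly equal probability mass
# w = vbar_1d(np.random.default_rng(1).exponential(size=2000))
```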

We can easily extend VBAR by adding a neighborhood function. Define S_{H_j} as the set of neurons that are the vertices of quadrilateral H_j. The neighborhood function Λ is then defined as a decreasing function of the minimum distance, in lattice space, between neuron i and the neurons of the set S_{H_j}. The extended Vectorial Boundary Adaptation Rule (eVBAR) is defined as:

\Delta w_i = \eta \sum_{k=1}^{N} \sum_{H_j \in S_k} \Lambda(i, S_{H_j}, \sigma)\, \mathbb{1}_{H_j}(v)\, \mathrm{Sgn}(v - w_i), \qquad \forall\, i \in A, \qquad (3)

with σ the neighborhood range. The effect of eVBAR is shown in Fig. 1(B) (dashed lines). Also for eVBAR, it can easily be verified that Eq. (3) performs gradient descent on the energy function:

E = \sum_{m=1}^{M} \sum_{i,k=1}^{N} \sum_{H_j \in S_k} \Lambda(i, S_{H_j}, \sigma)\, \mathbb{1}_{H_j}(v^m)\, |v^m - w_i|. \qquad (4)

As a result of adding a neighborhood function, eVBAR performs kernel smoothing in topological space (predicted on theoretical grounds, see [19]). For the remainder of this article, we will refer to VBAR when we mean the original rule, without neighborhood function, and to eVBAR when the neighborhood function is present.
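A corresponding one-dimensional sketch of Eq. (3), again only illustrative, replaces the hard vertex update by a Gaussian neighborhood function Λ of the minimum lattice distance to the vertices of the active interval; the constant multiplicity of the double sum in Eq. (3) is absorbed into the learning rate here.

```python
import numpy as np

def evbar_1d(samples, n_neurons=7, eta=0.02, sigma=1.0, n_epochs=500, seed=0):
    """Minimal 1-D sketch of eVBAR, Eq. (3): every neuron is updated for the
    active interval, weighted by a Gaussian neighborhood function of its
    minimum lattice distance to the vertices of that interval (the set S_H),
    which performs kernel smoothing in topological space."""
    rng = np.random.default_rng(seed)
    w = np.sort(rng.uniform(samples.min(), samples.max(), n_neurons))
    lattice = np.arange(n_neurons)
    for _ in range(n_epochs):
        for v in rng.permutation(samples):
            j = np.searchsorted(w, v)                              # active interval
            verts = np.array([i for i in (j - 1, j) if 0 <= i < n_neurons])
            dist = np.min(np.abs(lattice[:, None] - verts[None, :]), axis=1)
            lam = np.exp(-dist ** 2 / (2.0 * sigma ** 2))          # neighborhood Lambda
            w += eta * lam * np.sign(v - w)                        # sign rule for all neurons
        w.sort()
    return w
```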

Finally, a drawback of the present scheme is that it assumes an equal dimensionality of the topographic map and the data space in which it is embedded. In the next section, we will introduce a technique which will enable us to develop lattices in lower-dimensional spaces, thereby avoiding the occurrence of nonfunctional mappings.

3. Projection Pursuit Regression

Consider the regression fitting of a scalar function y of d − 1 independent variables, denoted by the vector x = [x_1, ..., x_{d−1}], from a given set of M possibly noisy data points or measurements {(y^m, x^m), m = 1, ..., M} in d-dimensional space:

y^m = f(x^m) + \mathrm{noise}, \qquad (5)

where f is the unknown function to be estimated, and where the noise contribution has zero mean and is independent of the {x^m}. In projection pursuit regression, the d-dimensional data points are interpreted through optimally-chosen lower-dimensional projections [6]; the "pursuit" part refers to optimization with respect to the projection directions. For simplicity, we will consider the case where the function f is approximated by a sum of scalar functions:

\hat{f}(x) = \sum_{k=1}^{K} f_k(a_k x^T), \qquad (6)

with \hat{f} the approximation of f, and a_k the projection directions (unit vectors). The f_k are piecewise smooth activation functions or (nonsmoothing) splines that join continuously at points called "knots". The functions f_k and projections a_k are estimated sequentially in order to minimize the mean squared error (MSE) of the residuals:


C(a_k) = \frac{1}{M} \sum_{m=1}^{M} \left[ f_k(a_k x^{mT}) - \left\{ y^m - \sum_{k'=1}^{k-1} f_{k'}(a_{k'} x^{mT}) \right\} \right]^2, \qquad (7)

and where the term between curly brackets denotes the kth residual of the mth data point, r^m_k, with r^m_1 \doteq y^m. By considering the projected data points {(r^m_k, a_k x^{mT})} as the input samples {v^m}, and the knots as neurons of a one-dimensional topographic map A, we can determine the position of these knots adaptively by developing the map in the two-dimensional space of the projected data points. The topographic maps are determined with VBAR or eVBAR. Consider first the case of VBAR: we modify the original VBAR concept in such a way that a one-dimensional topographic map can be formed in two-dimensional space, as follows (Fig. 2(A)). Observing that r_k is the dependent variable, the code membership functions are defined for the projected independent variables (a_k x^T). The resulting code membership functions are demarcated by the thin dashed lines in Fig. 2(A): e.g., the shaded regions represent the quantization regions H_c and H_d, i.e., neuron j's receptive field. However, the weights are updated in the (r_k, a_k x^T) plane using VBAR in batch mode (thick dashed lines). When the one-dimensional topographic map has converged, the weights will represent the medians (our definition) of the corresponding quantization regions. This implies, e.g., for neuron j in Fig. 2(B), that there will be an equal number of data points above and below j (i.e., in the light and dark shaded regions).

Figure 2. (A) Definition of quantization region in a one-dimensional lattice. The thick full and dashed lines represent the lattice before and after the weights are updated using VBAR. The thin dashed lines represent the borders of the quantization regions prior to updating the weights. The shaded region corresponds to the receptive field of neuron j and comprises the quantization regions H_c and H_d. (B) At convergence, there will be an equal number of data points above (dark shaded region) and below neuron j (light shaded region).

Furthermore, since this is the case for every neuron of the map, there will be, roughly speaking, an equal number of data points from the set {(r^m_k, a_k x^{mT})} above and below the piecewise smooth activation function f_k. The occurrence of nonfunctional mappings is avoided, since VBAR yields a lattice without overlapping quadrilaterals in each projection direction [9]. In case eVBAR is used instead of VBAR, the above stays the same, except that, through the presence of a neighborhood function, a smoothing operation is performed in the one-dimensional topographic map, so that the converged neuron weights do not necessarily represent the medians of their receptive fields. The smoothing operation will be reflected by the shape of the resulting activation function f_k.
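The sequential scheme of Eqs. (6)–(7) can be summarized in the following illustrative Python sketch. It is only a stand-in for the procedure described above: the activation functions are piecewise-linear interpolants through knots placed at the medians of equiprobable bins of the projected data, instead of the cubic splines through VBAR-trained knots, and the candidate projections are restricted to the coordinate unit vectors.

```python
import numpy as np

def fit_knots_by_medians(u, r, n_knots=7):
    """Stand-in for the VBAR-trained 1-D map: knots at equiprobable positions
    of the projected inputs u, with ordinates set to the median residual r
    inside each quantization region."""
    edges = np.quantile(u, np.linspace(0.0, 1.0, n_knots + 1))
    ku, kr = [], []
    for a, b in zip(edges[:-1], edges[1:]):
        mask = (u >= a) & (u <= b)
        if mask.any():
            ku.append(np.median(u[mask]))
            kr.append(np.median(r[mask]))
    return np.array(ku), np.array(kr)

def projection_pursuit(X, y, n_terms=4, n_knots=7):
    """Sketch of the sequential loop: each term k selects the direction a_k
    with the lowest residual MSE (the criterion C(a_k) of Eq. (7)), fits f_k
    on the projected residuals, and subtracts the fit to form the next
    residuals r^m_{k+1}."""
    residual = y.astype(float)
    terms = []
    for _ in range(n_terms):
        best = None
        for j in range(X.shape[1]):                 # candidate unit vectors
            a = np.eye(X.shape[1])[j]
            u = X @ a
            ku, kr = fit_knots_by_medians(u, residual, n_knots)
            pred = np.interp(u, ku, kr)             # piecewise-linear f_k
            mse = np.mean((residual - pred) ** 2)
            if best is None or mse < best[0]:
                best = (mse, a, ku, kr, pred)
        _, a, ku, kr, pred = best
        terms.append((a, ku, kr))
        residual = residual - pred
    return terms

def predict(terms, X):
    """Evaluate the additive approximation of Eq. (6)."""
    return sum(np.interp(X @ a, ku, kr) for a, ku, kr in terms)
```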

4. Simulations

We apply our regression procedure to the test example used by Heskes and Kappen [7]: we compile a set of M = 50 data points {(y^m, x^m), m = 1, ..., M}, with the x^m homogeneously and independently drawn from the uniform distribution on [0, 1]^4, f(x) = sin(2πx_1x_2), and zero mean Gaussian noise with standard deviation σ = 0.1. We use a lattice of N = 7 neurons, cubic spline interpolation between the knots (i.e., nonsmoothing splines), K = 4 interpolating functions, and a learning rate η = 0.02. After each VBAR epoch, we determine the MSE between the actual and the desired equiprobable code membership function usage. We run VBAR until the magnitude of the difference between the present and the previous running averaged MSE is lower than 1.0 × 10^{-7}, or until 15,000 epochs have elapsed.


The present running average equals 10% of the present, unaveraged MSE added to 90% of the previous running average. In order to optimize C(a_k), the procedure is first run with the a_k taken as unit vectors.

Figure 3. (A)–(D) Smooth activation functions f_k obtained for k = 1, ..., 4. The weight vectors are indicated with open circles and the projected data points with small dots. (E), (F) Regression performance C(a_4) plotted as a function of the standard deviation of the added noise σ (E), and of the data set size M (F). The C(a_4) values are plotted for N = 7, 5 and 10 (thick, thin and dashed lines, respectively) and are averages over 20 runs with different, randomly-chosen data sets of the same size; the standard deviations are indicated with vertical bars (drawn in the corresponding line types). The black dot is the average performance for N = 50.

The components of the unit vector with the lowest residual error are then further optimized by performing hill descent in steps of 0.01. The results are shown in Fig. 3(A)–(D).
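For concreteness, the convergence test and the direction refinement described above can be sketched as follows (illustrative only; whether the perturbed directions are renormalized to unit length is an assumption on our part).

```python
import numpy as np

def converged(mse_history, tol=1e-7):
    """Running-average stopping rule of Section 4: smooth the per-epoch MSE
    with 10% of the current value plus 90% of the previous average, and stop
    once the running average changes by less than tol."""
    avg_prev = None
    for mse in mse_history:
        avg = mse if avg_prev is None else 0.1 * mse + 0.9 * avg_prev
        if avg_prev is not None and abs(avg - avg_prev) < tol:
            return True
        avg_prev = avg
    return False

def hill_descent(cost, a0, step=0.01, max_iter=1000):
    """Greedy refinement of a projection direction: perturb one component at
    a time by +/- step, keep the perturbation whenever the residual cost
    C(a_k) decreases, and stop when no perturbation helps."""
    a = np.asarray(a0, dtype=float) / np.linalg.norm(a0)
    best = cost(a)
    for _ in range(max_iter):
        improved = False
        for j in range(a.size):
            for s in (step, -step):
                cand = a.copy()
                cand[j] += s
                cand /= np.linalg.norm(cand)       # assumed renormalization
                c = cost(cand)
                if c < best:
                    a, best, improved = cand, c, True
        if not improved:
            break
    return a, best
```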


Figure 4. Nonlinear function (A) and obtained estimate (N = 7, K = 10) (B).

The optimal projection vectors obtained are a_1 = [1.01, 1.01, 0.01, 0.01], a_2 = [0.01, 1, 0, 0], a_3 = [1.02, 0.02, 0.02, 0.02], and a_4 = [−0.01, −0.01, 1, 0]. The C values obtained for each added projection vector are 3.79 × 10^{-2}, 2.25 × 10^{-2}, 1.75 × 10^{-2}, and 1.47 × 10^{-2}, respectively. Overfitting starts with the 4th projection vector (x_3 and x_4 are dummy variables in the definition of f). The quality of the reconstructed function can be assessed by determining the MSE between f(x^m) and \hat{f}(x^m) for a set of, e.g., 50 × 50 equidistant points taken on a rectangular grid in (x_1, x_2) space: the MSE value obtained in this way for the "reconstruction" error (or "generalization" error in the absence of noise in Eq. (5)) equals 3.44 × 10^{-2}. Finally, the performance of our procedure is statistically quantified by running the simulations 20 times for different standard deviations σ and different data set sizes M.

Figure 3(E) shows that C(a_4) increases more rapidly with σ for smaller network sizes N. Figure 3(F) reveals that overfitting starts below M ≈ 3N and that C(a_4) levels off above M ≈ 10N; when N increases, the asymptotic performance approaches 2 × 10^{-2} (black dot in Fig. 3(F)).

In the case of Heskes and Kappen's results, we estimate that the C values obtained for each added projection vector are 2.02 × 10^{-2}, 1.11 × 10^{-2}, 5.28 × 10^{-3}, and 3.70 × 10^{-3}, respectively. (We extracted the positions of the weight vectors and data points from the publicly-available PostScript file of their article.) These lower error values are somewhat misleading, since the sequentially-added projection vectors soon lead to overfitting: the "reconstruction" error obtained for the first projection vector equals 6.8 × 10^{-2}, but this value increases to 0.18 when the second projection vector is added, even though the training error C decreases. Hence, our approach yields a superior generalization performance (i.e., a lower "reconstruction" error for the test set).

The next example is taken from Hwang and co-workers' seminal article on regression modeling in back-propagation and projection pursuit learning [11]. A set of M = 225 data points is compiled with the {x^m} taken homogeneously and independently from the uniform distribution on [0, 1]^2,

f(x) = 1.3356 [ 1.5(1 − x_1) + exp(2x_1 − 1) sin(3π(x_1 − 0.6)^2) + exp(3(x_2 − 0.5)) sin(4π(x_2 − 0.9)^2) ]

(Fig. 4(A)), and zero mean Gaussian noise with standard deviation 0.25. They considered two-layer perceptrons with K = 5 and 10 hidden neurons when trained with back-propagation learning (based on the Gauss-Newton method) (BPL), or with K = 3 and 5 hidden neurons when trained with projection pursuit learning (PPL). (Note that the hidden neurons of the two-layer perceptrons correspond to the activation functions and that the weights of these neurons correspond to the projection directions.) For the latter, the modified supersmoother method [11] was applied to obtain smooth output function representations for the hidden neurons; the Gauss-Newton method was used to obtain the optimal projection vectors. The generalization performance was determined as the fraction of variance unexplained (FVU) for a set of M_t = 10,000 independent test data:

\mathrm{FVU} = \frac{\sum_{m=1}^{M_t} \left( f(x^m) - \hat{f}(x^m) \right)^2}{\sum_{m=1}^{M_t} \left( f(x^m) - \bar{f} \right)^2}, \qquad (8)

with \bar{f} the set average. For VBAR, we run the same learning strategy as before, but until 50,000 epochs have elapsed instead of 1500 epochs. In order to further optimize the projection directions, we also apply backfitting (B), as was done by Hwang and co-workers for PPL. Backfitting consists of cyclically minimizing C(a_k) for the residuals of projection direction k, until there is little or no change. We also run the simulations using eVBAR: as a neighborhood function, we take a Gaussian and determine the optimal standard deviation by cross-validation. Also for eVBAR, we always use backfitting.
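A small helper for the error measure of Eq. (8), included here for clarity (not from the original article):

```python
import numpy as np

def fvu(f_true, f_hat):
    """Fraction of variance unexplained, Eq. (8): the squared error of the
    estimate divided by the variance of the true function values over the
    test set."""
    f_true = np.asarray(f_true, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    return np.sum((f_true - f_hat) ** 2) / np.sum((f_true - f_true.mean()) ** 2)
```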


Table 1. Nonparametric regression.

Regression procedure    Parameters        FVU

BPL                     K = 5             0.577
                        K = 10            0.0198

PPL                     K = 3             0.334
                        K = 5             0.0186

VBAR                    N = 7, K = 3      0.0470
                        N = 7, K = 5      0.0342
                        N = 7, K = 10     0.0353

VBAR + B                N = 7, K = 3      0.0440
                        N = 7, K = 5      0.0282
                        N = 7, K = 10     0.0194

eVBAR                   N = 7, K = 3      0.0384
                        N = 7, K = 5      0.0214
                        N = 7, K = 10     0.0142

VBAR                    N = 10, K = 3     0.0372
                        N = 10, K = 5     0.0255
                        N = 10, K = 10    0.0353

VBAR + B                N = 10, K = 3     0.0165
                        N = 10, K = 5     0.0190
                        N = 10, K = 10    0.0240

eVBAR                   N = 10, K = 3     0.0155
                        N = 10, K = 5     0.0153
                        N = 10, K = 10    0.0151

The FVU results are summarized in Table 1: VBAR yields low FVU values for small-sized networks (K), but these values decrease only slightly when the network size increases. However, by applying backfitting (B), the projection directions can be further optimized: VBAR then also yields low FVU values for larger networks. On the other hand, backfitting soon leads to overfitting for N = 10. Finally, by applying eVBAR, the FVU values can be decreased well below those of VBAR, BPL and PPL. Furthermore, overfitting for N = 10 is avoided; however, the FVU values do not decrease further below that of K = 3. An example of a reconstructed function is shown in Fig. 4(B) for VBAR with backfitting.

5. Regression on PET Images

In order to test our regression procedure on a real-world example, we consider positron emission tomography (PET) images.

Figure 5. Application of regression fitting on images. Definition of the central pixel at coordinate (i, j) and its set of surrounding pixels, termed surround(i, j) (shaded area).

The images were obtained in our laboratory (in collaboration with the PET center of the University Hospital, K.U. Leuven) using a CTI scanner 931/08/12, which monitored the local blood flow with the H_2^{15}O method. Radioactivity was measured in 15 planes parallel to the orbito-meatal line. The image acquisition procedure was the same as the one described in [20], except that the dose was 40 mCi instead of 50 mCi. An example of a raw PET image is shown in Fig. 6(A); it is sized 128 × 128 pixels at 8 bits per pixel.

Regression fitting can be performed in the following way. Consider an image I(i, j) sized M × M pixels in which we select an m × m region or block (Fig. 5). The selected region is divided into two parts: (1) the central pixel at row i and column j, and (2) the surrounding pixels (shaded area), termed surround(i, j). In the case of Fig. 5, surround(i, j) is a vector comprising 7 × 7 − 1 pixels. In order to make the connection with regression (Section 3, Eq. (5)), consider the regression fitting of the grey value of the central pixel as a function of the vector of grey values of the surround: the grey values of the pixels in the surround define an (m × m − 1)-dimensional vector of independent variables, and the grey value of the central pixel represents the possibly noisy measurement of the unknown scalar function which is to be estimated. If we assume a toroidal extension of the image, we have M^2 measurements at our disposal in an (m × m)-dimensional space.
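The construction of the training data can be sketched as follows (a minimal illustration, assuming a square image and the toroidal extension mentioned above):

```python
import numpy as np

def surround_centre_pairs(image, m=7):
    """For every pixel (i, j), collect the m*m - 1 surrounding grey values
    (with toroidal wrap-around) as the vector of independent variables and
    the central grey value as the dependent variable, giving M^2 measurement
    pairs for an M x M image."""
    img = np.asarray(image, dtype=float)
    M = img.shape[0]
    half, centre = m // 2, (m * m) // 2
    X, y = [], []
    for i in range(M):
        for j in range(M):
            rows = np.arange(i - half, i + half + 1) % M     # toroidal extension
            cols = np.arange(j - half, j + half + 1) % M
            block = img[np.ix_(rows, cols)].ravel()
            y.append(block[centre])
            X.append(np.delete(block, centre))
    return np.array(X), np.array(y)
```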

Since our network in fact performs vector quantization in the (m × m)-dimensional space of the data points, we can use our regression approach to perform lossy image compression: for a network with K projection directions and N neurons per direction, the N × K converged weight vectors and the K projection directions act as the codebook vectors.


Figure 6. Original PET image (A) and images obtained by regression fitting the original using surrounds sized m = 3, 5 and 7, respectively (B)–(D). (E) Image obtained when using regression fitting for compression purposes (m = 7). (F) Decompressed image (E) smoothed with the standardized smoothing operation of the statistical parametric mapping software for PET image analysis (MRC Cyclotron, Hammersmith Hospital, London).

The image is then compressed by giving the binary code for each non-overlapping m × m image region in terms of the active quantization region in (m × m)-dimensional space (i.e., 1 out of K(N + 1) quantization regions). The original image is then reconstructed, up to the quantization errors, using the nearest weight vectors and the projection directions of the regression model. We would like to stress that the simulation results we will now report on are still preliminary.

We have considered three surrounds sized 3 × 3 − 1, 5 × 5 − 1, and 7 × 7 − 1 pixels (i.e., m = 3, 5, 7, respectively). For the network we take N = 10 and K = 5. We run eVBAR until the difference between the present and the previous MSE is lower than 1.0 × 10^{-7}, or until 1500 epochs have elapsed; we also apply backfitting.


Table 2. PET regression modeling results.

Surround size    C(a_5)     SNR_r (dB)    SNR_c (dB)    SNR_rc (dB)    CR (bpp)

m = 3            0.00306    28.02         23.54         26.22          5.33 (0.667)
m = 5            0.00314    28.63         24.45         27.52          18.75 (0.24)
m = 7            0.00307    28.11         24.41         27.78          33.33 (0.122)

We partition the raw image of Fig. 6(A) into 4 equally-sized quadrants, rescale the grey values in the 64 × 64 pixels subimage from the range [0, 255] to the range [0, 2], and use the corresponding 4096 data vectors for training the network. We test the regression model on the whole image (as well as on other images) by predicting the grey value of the central pixel, given the grey values of the surround. If the predicted grey value is retained instead, a new image is obtained, termed the regressed image. This new image can be regarded as the result of adaptively filtering the raw image; the filter's parameters were adapted during training so as to minimize the discrepancy between the grey value of the central pixel and its prediction.
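The adaptive-filtering step itself can be sketched as below; predict_centre stands for any fitted regression model mapping a surround vector to a predicted central grey value (for instance the projection pursuit model of Section 3) and is an assumption of this illustration, not code from the article.

```python
import numpy as np

def regressed_image(image, predict_centre, m=7):
    """Build the 'regressed image' of Section 5: replace every pixel by the
    model's prediction of its grey value given the m*m - 1 grey values of its
    surround (toroidal extension of the image)."""
    img = np.asarray(image, dtype=float)
    M = img.shape[0]
    half, centre = m // 2, (m * m) // 2
    out = np.empty_like(img)
    for i in range(M):
        for j in range(M):
            rows = np.arange(i - half, i + half + 1) % M
            cols = np.arange(j - half, j + half + 1) % M
            block = img[np.ix_(rows, cols)].ravel()
            out[i, j] = predict_centre(np.delete(block, centre))
    return out

# possible usage, with grey values rescaled to [0, 2] as for training:
# filtered = regressed_image(raw.astype(float) * 2.0 / 255.0, model_predict, m=7)
```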

The images obtained after regression modeling are displayed in Fig. 6(B)–(D) for the three surround sizes considered. The simulation results are quantified as follows (Table 2): (1) the (residual) mean-square error C(a_5) for the training image (sized 64 × 64 pixels), (2) the signal-to-noise ratio between the original (whole) image and the regressed image (SNR_r):

\mathrm{SNR}_r = 10 \log_{10} \frac{\sum_{i,j} I(i,j)^2}{\sum_{i,j} \left( I(i,j) - \hat{I}(i,j) \right)^2} \quad (\mathrm{dB}), \qquad (9)

with \hat{I} the regressed image, (3) the signal-to-noise ratio between the original image and the decompressed image (SNR_c), and (4) the signal-to-noise ratio between the regressed image and the decompressed image (SNR_rc). Finally, we have also listed the compression ratio (CR) and the average number of bits per pixel (bpp) needed to store the compressed image. The compression ratio is estimated as follows: for the original image we need 8 bpp, while for the compressed image we need (N + 1) × K = 55 values, or 6 bits, to code the vector of grey values of each m × m region into which the image is partitioned (plus the codebook vectors), i.e., 6/(m × m) bpp on average; the compression ratio is then CR = 8 / (6/(m × m)) (without the codebook vectors). An example of a decompressed image is shown in Fig. 6(E) for m = 7.
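The bookkeeping behind Table 2 can be sketched as follows (illustrative; it implements the bit count and the SNR of Eq. (9) exactly as stated in the text, without the codebook overhead):

```python
import numpy as np

def bits_per_pixel(m, N=10, K=5):
    """Average bpp when every non-overlapping m x m block is coded by the
    index of its active quantization region, one out of K*(N+1) = 55 regions,
    i.e. ceil(log2(55)) = 6 bits per block (codebook not counted)."""
    bits_per_block = int(np.ceil(np.log2(K * (N + 1))))
    return bits_per_block / (m * m)

def snr_db(original, reconstructed):
    """Signal-to-noise ratio of Eq. (9), in dB."""
    o = np.asarray(original, dtype=float)
    r = np.asarray(reconstructed, dtype=float)
    return 10.0 * np.log10(np.sum(o ** 2) / np.sum((o - r) ** 2))

# e.g. bits_per_pixel(7) gives 6/49, about 0.122 bpp, as listed in Table 2
```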

We observe from Table 2 that even for small-sized surrounds, the signal-to-noise ratios obtained are quite acceptable, so that, for regression modeling, small surrounds will suffice. Evidently, when compression is envisaged, larger surrounds will yield higher compression ratios. Note that signal-to-noise ratios such as SNR_c are somewhat inflated because most of the original PET image is dark. (We plan to explore our regression technique on other, non-PET images to verify this point.) Finally, as is presently done for the raw PET images (such as Fig. 6(A)), the decompressed image (Fig. 6(E)) is also subjected to a statistical parametric mapping analysis (SPM software package; MRC Cyclotron, Hammersmith Hospital, London) [21]. The first step in that analysis is a standardized smoothing operation with a Gaussian filter whose width at half height is 20 mm (or 10.652 pixels) in the image plane. If we perform the same smoothing operation on our decompressed image, the resulting smoothed image becomes perceptually indistinguishable from the smoothed raw image (Fig. 6(F); SNR = 39.27 dB).

6. Conclusion

Dynamical knot allocation is computationally hard in traditional regression procedures [22], but it is handled very naturally with self-organizing topographic maps: the neurons of the map represent the knots, and their weight vectors are dynamically allocated with an unsupervised competitive learning rule. In this article, we have applied a novel, and computationally simple, competitive learning rule to nonparametric projection pursuit regression. The rule, called the Vectorial Boundary Adaptation Rule (VBAR), differs from the SOM algorithm [1] in that the weights converge to the medians of the neurons' receptive fields instead of their averages. Medians are less sensitive to outliers than averages, and this explains the superior performance of VBAR when the number of activation functions f_k (hidden units) is small; for larger numbers, each added activation function introduces a bias in the distribution of the residuals, and this, in turn, affects the performance of the subsequently-added activation function, since medians are more sensitive to biased noise than averages. A way to reduce this effect is to apply backfitting to further optimize the projection directions. Furthermore, we have extended VBAR with a neighborhood function (eVBAR), the purpose of which was to perform kernel smoothing with the topographic map and in this way to further improve the generalization performance.


Finally, we have used our regression procedure to perform adaptive filtering on raw PET images and, although the results are still preliminary, we have obtained good regression models. These models in turn allowed for a data compression from 8 bpp down to 0.122 bpp. However, more research is needed to decide whether the use of a regression model will better capture the context of a given pixel than what can be achieved by using vector quantization directly [12], as is usually done with topographic maps [2, 13, 14]. In any case, projection pursuit learning is expected to yield better regression models in high-dimensional data spaces.

Acknowledgments

The author wishes to thank Dr. P. Dupont of the PET center, University Hospital K.U. Leuven, for providing the PET images. M.M.V.H. is a research associate of the Fund for Scientific Research—Flanders (Belgium) and is supported by research grants received from the Fund for Scientific Research (G.0185.96) and the European Commission (ECVnet EP8212).

References

1. T. Kohonen, "Self-organized formation of topologically correct feature maps," Biol. Cybern., Vol. 43, pp. 59–69, 1982.

2. T. Kohonen, Self-Organizing Maps, Springer, Heidelberg, 1995.

3. H. Ritter, T. Martinetz, and K. Schulten, Neural Computation and Self-Organizing Maps: An Introduction, Addison-Wesley, Reading, MA, 1992.

4. H. Ritter and K. Schulten, "Combining self-organizing maps," Proc. Intl. Joint Conference on Neural Networks, Vol. 2, pp. 499–502, 1989.

5. V. Cherkassky and H. Lari-Najafi, "Constrained topological mapping for nonparametric regression analysis," Neural Networks, Vol. 4, pp. 27–40, 1991.

6. J.H. Friedman and W. Stuetzle, "Projection pursuit regression," Journal of the American Statistical Association, Vol. 76, No. 376, pp. 817–823, 1981.

7. T. Heskes and B. Kappen, "Self-organization and nonparametric regression," Proc. ICANN'95, Vol. I, pp. 81–86, 1995.

8. D. Martinez and M.M. Van Hulle, "Generalized boundary adaptation rule for minimizing rth power law distortion in high resolution quantization," Neural Networks, Vol. 6, pp. 891–900, 1995.

9. M.M. Van Hulle, "Topographic map formation by maximizing unconditional entropy: A plausible strategy for 'on-line' unsupervised competitive learning and non-parametric density estimation," IEEE Trans. on Neural Networks, Vol. 7, No. 5, pp. 1299–1305, 1996.

10. M.M. Van Hulle, "Topology-preserving map formation achieved with a purely local unsupervised competitive learning rule," Neural Networks, Vol. 10, No. 3, pp. 431–446, 1997.

11. J.-N. Hwang, S.-R. Lay, M. Maechler, R.D. Martin, and J. Schimert, "Regression modeling in back-propagation and projection pursuit learning," IEEE Trans. on Neural Networks, Vol. 5, No. 3, pp. 342–353, 1994.

12. N.M. Nasrabadi and R.A. King, "Image coding using vector quantization: A review," IEEE Trans. on Communications, Vol. 36, pp. 957–971, 1988.

13. N.M. Nasrabadi and Y. Feng, "Vector quantization of images based upon the Kohonen self-organizing feature maps," Proc. IEEE Intl. Conference on Neural Networks (San Diego, 1988), Vol. I, pp. 101–108, IEEE, New York.

14. J. Naylor and K.P. Li, "Analysis of a neural network algorithm for vector quantization of speech parameters," Neural Networks Supplement, Vol. 1, p. 310, 1988.

15. R.C. Gonzales and R.E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA, 1993.

16. H. Ritter and K. Schulten, "On the stationary state of Kohonen's self-organizing sensory mapping," Biol. Cybern., Vol. 54, pp. 99–106, 1986.

17. A. Gersho, "Asymptotically optimal block quantization," IEEE Transactions on Information Theory, Vol. IT-25, pp. 373–380, 1979.

18. J. Hertz, A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991.

19. M.M. Van Hulle, "Nonparametric density estimation and regression achieved with a learning rule for equiprobabilistic topographic map formation," Proc. IEEE NNSP96, Seika, Kyoto, pp. 33–41, 1996.

20. P. Dupont, G.A. Orban, R. Vogels, G. Bormans, J. Nuyts, C. Schiepers, M. De Roo, and L. Mortelmans, "Different perceptual tasks performed with the same visual stimulus attribute activate different regions of the human brain: A positron emission tomography study," Proc. Natl. Acad. Sci. USA, Vol. 90, pp. 10927–10931, 1993.

21. K.J. Friston, A.P. Holmes, K.J. Worsley, J.-P. Poline, C.D. Frith, and R.S.J. Frackowiak, "Statistical parametric maps in functional imaging: A general linear approach," Human Brain Mapping, Vol. 2, pp. 189–210, 1995.

22. J.H. Friedman and B.W. Silverman, "Flexible parsimonious smoothing and additive modeling," Technometrics, Vol. 31, No. 1, pp. 3–21, 1989.

Marc M. Van Hulle received an M.Sc. degree in Electrotechnical Engineering (Electronics) and a Ph.D. in Applied Sciences from the K.U. Leuven, Leuven (Belgium), in 1985 and 1990, respectively.


He also holds B.Sc. Econ. and MBA degrees. In 1992, he was with the Brain and Cognitive Sciences department of the Massachusetts Institute of Technology, Boston (USA), as a postdoctoral scientist. Presently, he is affiliated with the Laboratory for Neuro- and Psychophysiology, Medical School, K.U. Leuven, as an associate professor, and as a research associate of the Fund for Scientific Research—Flanders (Belgium). He is also an Executive member of the IEEE Signal Processing Society, Neural Networks for Signal Processing Technical Committee (1996–1999). His research interests include neural networks, biological modeling, and vision.

[email protected]