Fourth International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies, 30 Aug - 1 Sept 2000, Brighton, UK

The Kernel Self Organising Map

Donald MacDonald and Colin Fyfe
Applied Computational Intelligence Research Unit,
The University of Paisley, Scotland.
email: mcdcwi0, fyfe-ci0@paisley.ac.uk

Abstract

We review a recently developed method of performing k-means clustering in a high dimensional feature space and extend it to give the resultant mapping topology preserving properties. We show the results of the new algorithm on the standard data set, random numbers drawn uniformly from $[0,1)^2$, and on the Olivetti database of faces. The new algorithm converges extremely quickly.

1 Introduction

The use of kernels in unsupervised learning has become popular, particularly in the field of Kernel Principal Component Analysis (KPCA) [6, 5, 4]. The method has recently been extended to other unsupervised techniques, e.g. Kernel Principal Factor Analysis, Kernel Exploratory Projection Pursuit and Kernel Canonical Correlation Analysis [1]. In this paper, we extend the method and create a Kernel equivalent of the Self Organising Map of Kohonen [2].

The set of methods known under the generic title of Kernel Methods use a nonlinear mapping to map data into a high dimensional feature space in which linear operations are performed. This gives us the computational advantages of linear methods but also the representational advantages of nonlinear methods. The result is a very efficient method of performing nonlinear operations on a data set. In more detail, let $\phi(x)$ be the nonlinear function which maps the data into feature space, $F$. Then in $F$, we can define a matrix, $K$, in terms of a dot product in that space, i.e. $K(i,j) = \phi(x_i) \cdot \phi(x_j)$. Typically we select the matrix $K$ based on our knowledge of the properties of the matrix rather than any knowledge of the function $\phi(\cdot)$. The kernel trick allows us to define every operation in feature space in terms of the kernel matrix rather than the nonlinear function $\phi(\cdot)$.
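As a brief illustration (a sketch that is not part of the original paper), the kernel matrix $K$ can be built directly from the data without ever evaluating $\phi$; the Gaussian (RBF) kernel and the width parameter sigma below are assumed choices, since the paper does not fix a particular kernel.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """K[i, j] = k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The nonlinear map phi into feature space is never evaluated explicitly;
    only the pairwise dot products phi(x_i) . phi(x_j) are computed.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Example: points drawn uniformly from [0, 1)^2, as in the paper's first data set.
X = np.random.rand(100, 2)
K = rbf_kernel_matrix(X)   # 100 x 100 kernel (Gram) matrix
```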

2 The Kohonen Feature Map

The interest in feature maps stems directly from their biological importance. A feature map uses the physical layout of the output neurons to model some feature of the input space. In particular, if two inputs $x_1$ and $x_2$ are close together with respect to some distance measure in the input space, then if they cause output neurons $y_a$ and $y_b$ to fire respectively, $y_a$ and $y_b$ must be close together in some layout of the output neurons. Further, we can state that the opposite should hold: if $y_a$ and $y_b$ are close together in the output layer, then those inputs which cause $y_a$ and $y_b$ to fire should be close together in the input space. When these two conditions hold, we have a feature map. Such maps are also called topology preserving maps. There are several ways of creating feature maps - the most popular is Kohonen's.

Kohonen's algorithm is exceedingly simple - the network is a 2-layer network and competition takes place between the output neurons; however, now not only are the weights into the winning neuron updated but also the weights into its neighbours. Kohonen defined a neighbourhood function $\Lambda(i, i^*)$ of the winning neuron $i^*$. The neighbourhood function is a function of the distance between $i$ and $i^*$. A typical function is the Difference of Gaussians function; thus if unit $i$ is at point $r_i$ in the output layer then

$\Lambda(i, i^*) = a \exp\left(-\frac{|r_i - r_{i^*}|^2}{2\sigma^2}\right) - b \exp\left(-\frac{|r_i - r_{i^*}|^2}{2\sigma_1^2}\right)$   (1)

Kohonen feature maps can take a long while to converge, a problem which the Kernel SOM solves.
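For comparison with the kernel version developed below, here is a minimal sketch of the classical input-space Kohonen update; the one dimensional grid, the Gaussian neighbourhood (used here instead of a difference of Gaussians) and the learning parameters are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def train_som(X, n_units=20, n_epochs=50, lr=0.1, width=2.0, seed=0):
    """Classical 1-D Kohonen SOM: find the winner, then update it and its neighbours."""
    rng = np.random.default_rng(seed)
    W = rng.random((n_units, X.shape[1]))     # weight vector into each output neuron
    r = np.arange(n_units, dtype=float)       # positions r_i of the neurons in the output layer

    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.sum((W - x) ** 2, axis=1))             # closest neuron i*
            neigh = np.exp(-(r - r[winner]) ** 2 / (2.0 * width ** 2))   # neighbourhood Lambda(i, i*)
            W += lr * neigh[:, None] * (x - W)                           # move winner and its neighbours
    return W

W = train_som(np.random.rand(500, 2))   # data drawn uniformly from [0, 1)^2
```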

3 Kernel K-means Clustering

We will follow the derivation of [5], who have shown that the k-means algorithm can be performed in Kernel space. The aim is to find $k$ means, $m_\mu$, so that each point is close to one of the means. Now, as with KPCA, each mean may be described as lying in the manifold spanned by the observations, $\phi(x_i)$, i.e. $m_\mu = \sum_i \gamma_{\mu i} \phi(x_i)$. Now the k-means algorithm chooses the means, $m_\mu$, to minimise the Euclidean distance between the points and the closest mean. Expanding the squared distance from a point to a mean gives

$\|\phi(x) - m_\mu\|^2 = k(x, x) - 2\sum_i \gamma_{\mu i} k(x, x_i) + \sum_{i,j} \gamma_{\mu i} \gamma_{\mu j} k(x_i, x_j)$

i.e. the distance calculation can be accomplished in Kernel space by means of the $K$ matrix alone.
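In code, that distance needs only rows of the kernel matrix and the coefficient vector $\gamma_\mu$; the helper below is an illustrative sketch rather than code from the paper.

```python
import numpy as np

def feature_space_sq_dist(K, gamma_mu, t):
    """||phi(x_t) - m_mu||^2 where m_mu = sum_i gamma_mu[i] * phi(x_i).

    K is the full kernel matrix with K[i, j] = k(x_i, x_j);
    gamma_mu is the length-n coefficient vector of the mean m_mu.
    """
    return K[t, t] - 2.0 * gamma_mu @ K[:, t] + gamma_mu @ K @ gamma_mu
```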

Let $M_{i\mu}$ be the cluster assignment variable, i.e. $M_{i\mu} = 1$ if $\phi(x_i)$ is in the $\mu$th cluster and 0 otherwise. [5] initialise the means to the first training patterns and then each new training point, $\phi(x_{t+1})$, $t+1 > k$, is assigned to the closest mean and its cluster assignment variable calculated using

$M_{t+1,\alpha} = \begin{cases} 1 & \text{if } \|\phi(x_{t+1}) - m_\alpha\| < \|\phi(x_{t+1}) - m_\mu\|,\ \forall \mu \neq \alpha \\ 0 & \text{otherwise} \end{cases}$   (2)

In terms of the kernel function (noting that $k(x, x)$ is common to all calculations) we have

$M_{t+1,\alpha} = \begin{cases} 1 & \text{if } \sum_{i,j} \gamma_{\alpha i}\gamma_{\alpha j} k(x_i, x_j) - 2\sum_i \gamma_{\alpha i} k(x_{t+1}, x_i) < \sum_{i,j} \gamma_{\mu i}\gamma_{\mu j} k(x_i, x_j) - 2\sum_i \gamma_{\mu i} k(x_{t+1}, x_i),\ \forall \mu \neq \alpha \\ 0 & \text{otherwise} \end{cases}$   (3)

We must then update the mean, $m_\alpha$, to take account of the $(t+1)$th data point,

$m_\alpha^{t+1} = m_\alpha^t + \zeta\,(\phi(x_{t+1}) - m_\alpha^t)$   (4)

where we have used the term $m_\alpha^{t+1}$ to designate the updated mean which takes into account the new data point and

$\zeta = \frac{M_{t+1,\alpha}}{\sum_{i=1}^{t+1} M_{i\alpha}}$   (5)

which leads to an update equation for the coefficients of

$\gamma_{\alpha i}^{t+1} = \gamma_{\alpha i}^{t}(1 - \zeta) \text{ for } i \le t, \qquad \gamma_{\alpha,t+1}^{t+1} = \zeta$   (6)
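Putting equations (2)-(6) together, a sequential kernel k-means pass might be sketched as follows; the initialisation to the first $k$ patterns follows the text, while the variable names and the single-pass structure are assumptions rather than the authors' implementation.

```python
import numpy as np

def kernel_kmeans_pass(K, k):
    """One sequential pass of kernel k-means over the n x n kernel matrix K.

    Means are never formed explicitly: m_mu = sum_i gamma[mu, i] * phi(x_i).
    """
    n = K.shape[0]
    gamma = np.zeros((k, n))
    gamma[np.arange(k), np.arange(k)] = 1.0   # means initialised to the first k patterns
    counts = np.ones(k)                       # number of points assigned to each mean

    for t in range(k, n):
        # distance to each mean, dropping the k(x_t, x_t) term common to all clusters
        dists = np.einsum('mi,ij,mj->m', gamma, K, gamma) - 2.0 * gamma @ K[:, t]
        a = int(np.argmin(dists))             # winning cluster, equation (3)
        counts[a] += 1.0
        zeta = 1.0 / counts[a]                # learning rate, equation (5)
        gamma[a] *= (1.0 - zeta)              # shrink the old coefficients, equation (6)
        gamma[a, t] = zeta                    # coefficient of the newly assigned point
    return gamma
```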

4 The Kernel Self Organising Map

Now the SOM algorithm is a k-means algorithm with an attempt to distribute the means in an organised manner, and so the first change to the above algorithm is to update the closest neuron's weights and those of its neighbours. Thus we find the winning neuron (the closest in feature space) as above, but now, instead of (3), we use

$M_{t+1,\mu} = \Lambda(\alpha, \mu)$   (7)

where $\alpha$ is the identifier of the closest neuron. Now the rest of the algorithm can be performed as before. However, there is one difficulty with this: the SOM requires a great number of iterations for convergence, and since $\zeta =$


which in terms of the kernel function may be written as

(9)

Notice that this grid (Figure 1) was obtained with a one dimensional neighbourhood function in feature space.
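For completeness, the kernel SOM of Section 4 can be sketched in the same style as the kernel k-means pass above; the Gaussian neighbourhood over a one dimensional grid, and the way the neighbourhood value replaces the hard assignment in the learning rate, are assumptions consistent with equation (7) rather than the authors' exact code.

```python
import numpy as np

def kernel_som_pass(K, n_units=20, width=2.0):
    """Sequential kernel SOM: the winner is found in feature space, and every
    neuron's coefficient vector is updated in proportion to Lambda(alpha, mu)."""
    n = K.shape[0]
    gamma = np.zeros((n_units, n))
    gamma[np.arange(n_units), np.arange(n_units)] = 1.0   # seed with the first patterns
    r = np.arange(n_units, dtype=float)                   # neuron positions on a 1-D output grid
    counts = np.ones(n_units)

    for t in range(n_units, n):
        dists = np.einsum('mi,ij,mj->m', gamma, K, gamma) - 2.0 * gamma @ K[:, t]
        alpha = int(np.argmin(dists))                             # closest neuron in feature space
        Lam = np.exp(-(r - r[alpha]) ** 2 / (2.0 * width ** 2))   # M_{t+1,mu} = Lambda(alpha, mu)
        counts += Lam
        zeta = Lam / counts                                       # per-neuron learning rate, cf. (5)
        gamma *= (1.0 - zeta)[:, None]                            # shrink existing coefficients
        gamma[:, t] = zeta                                        # attach the new point to every neuron
    return gamma

# Usage with a precomputed kernel matrix K (e.g. an RBF Gram matrix of the data):
# gamma = kernel_som_pass(K)
```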

The second simulation uses the same method on artificial data drawn from two concentric circles. Not only is the topology preservation maintained on the data set, but the two circles are readily separated in feature space (Figure 2). The first 9 nodes capture the inner circle, the others the outer.

The third data set is the Olivetti database of faces [3], which is composed of 6 individuals, each in 10 different poses, against a dark background with no preprocessing.

Results are shown in Figure 3, in which the numbers denote which individual was identified by the corresponding node on the graph. Note that this time the grid is a grid showing the mapping in feature space, and we are displaying on the graph the identity of a person in image space. We see that the faces have been very clearly grouped into clusters, each of which is specific to a particular individual.

Figure 1: The grid of points was not shown to the KSOM during training but is used to identify which node is closest to each point in feature space. The number at each point on the grid identifies the winning neuron in feature space. There is a clear topographic ordering of the data.

References

[1] D. Charles, C. Fyfe, P. L. Lai, D. MacDonald, and R. Rosipal. Unsupervised Learning using Radial Kernels. (submitted).

[2] Teuvo Kohonen. Self-Organising Maps. Springer, 1995.

[3] F. Samaria and A. Harter. Parameterisation of a stochastic model for human face identification. In 2nd IEEE Workshop on Applications of Computer Vision, 1994.

[4] B. Scholkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Muller, G. Ratsch, and A. J. Smola. Input space vs feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10:1000-1017, 1999.

[5] B. Scholkopf, A. Smola, and K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299-1319, 1998.

[6] A. J. Smola, O. L. Mangasarian, and B. Scholkopf. Sparse kernel feature analysis. Technical Report 99-04, University of Wisconsin Madison, 1999.


Figure 2: The KSOM identifies the two concentric data sets.

Figure 3: Each individual is identified by an integer 1, ..., 6. The nodes are arranged in a two dimensional grid as they were during training. We see that each individual person is identified by a specific region of feature space.
