Biol. Cybern. 75, 37-47 (1996) Biological Cybernetics © Springer-Verlag 1996

Self-organized formation of a set of scaling filters and their neighbouring connections Tamás Rozgonyi 1,2, László Balázs 1,3, Tibor Fomin 1, András Lőrincz 1

1 Department of Photophysics, Institute of Isotopes of The Hungarian Academy of Sciences, P.O. Box 77, H-1525 Budapest, Hungary 2 Department of Physics, Attila József University, Szeged, Hungary 3 Bolyai Institute of Mathematics, Attila József University, Szeged, Hungary

Received: 19 April 1995/Accepted in revised form: 25 March 1996

Abstract. A set of scaling feedforward filters is developed in an unsupervised way via inputting pixel-discretized extended objects into a winner-take-all artificial neural network. The system discretizes the input space by both position and size. Depending on the distribution of input samples and below a certain number of neurons, the spatial filters may form groups of similar filter sizes, with each group covering the whole input space in a quasi-uniform fashion. Thus a multi-discretizing system may be formed. Interneural connections of scaling filters are also developed with the help of extended objects. It is shown both theoretically and with the help of numerical simulation that competitive Hebbian learning is suitable for defining neighbours for the multi-discretizing system. Taking into account the neighbouring connections between filters of similar sizes only, i.e. within the groups of filters, the system may be considered as a self-organizing multi-grid system.

1 Introduction

The eventual goal of developing artificial neural networks (ANN) is to solve problems in unknown environments and in unexpected conditions. This task is carried out through self-organization with the help of unsupervised learning from input patterns. In many cases (e.g. in robotic control) the task is to determine positions, sizes and neighbourhood relations of objects in the real world. Determination of neighbourhood relations means identifying connected regions in the topology of the external world. This information - i.e. positions, sizes and neighbourhood relations - may then be used for categorization purposes if needed.

The problem of determining positions and neighbourhood relations without any a priori information about the topology of the external world has already

Correspondence to: A. Lőrincz

been investigated by ANN methods. Neurons of such networks usually store input sample vectors, so-called prototype vectors. Position determination involves choosing the neuron whose sample vector is the closest to the given input. These neurons work like spatial filters. The neighbourhood relations of these filters are then represented by interneural connections. As has been shown in many previous publications, competing neurons are able to create spatial filters in a self-organizing way through different approaches (Luttrell 1994), one of which is Kohonen discretization (Kohonen 1984). Kohonen based his method on the winner-take-all (WTA) algorithm (Grossberg 1976, 1987). The Kohonen method also uses so-called neighbour training, which means that neighbouring neurons can learn from inputs of the winning neuron. A major advantage of the Kohonen method is that neighbour training speeds up the training procedure. However, the shortcoming of this model is that the neighbourhood relations of neurons are prescribed, which assumes a priori knowledge of the dimension of the space to be discretized and the shape of the region covered by input patterns. After recognizing that the crucial point of the Kohonen model is the prewired neighbourhood relation of the competitive network, several attempts were made to improve the model. Suggestions included a self-growing structure (Fritzke 1991) that can produce new neural units at any part of the network if needed and can introduce the new units with only minor perturbation to the system. The weakness of this approach is that a priori knowledge of the dimensionality is still indispensable. Another approach is the so-called neural gas algorithm (NGA) (Martinetz and Schulten 1991). The NGA does not need a priori knowledge of the topology of the space to be discretized and can take advantage of the neighbour training. Two other methods that were proposed and that provide correct representation of topological spaces are the method of learning with the help of extended input objects (Szepesvári and Lőrincz 1993; Szepesvári et al. 1994) and the method of Voronoi polygons (Martinetz 1993). Both methods are capable of building up spatial filters and can


develop interneural connections to represent the correct neighbourhood relations of these filters. It has been shown that in the case of local extended input objects (Szepesvári and Lőrincz 1993; Szepesvári et al. 1994), interneural connections may be used for Kohonen-like cooperative neighbour training. In this way one may return to the original idea of the Kohonen network without prescribing the neighbourhood relations of neurons.

In many cases input objects of different sizes need to be distinguished, even if they are positioned in the same place in the external world. This means that the discretization of a parameter space defined by the positions and sizes of input objects is needed rather than the discretization of the input space itself. The solution of this task through single-sign localized spatial filters requires redundant discretization of the external world by spatial filters of different sizes that are sensitive not only to the position of a pattern but also to its size, thus discretizing the external world by both position and size. Such a filter system will be called a scaling filter system. Although the formation of spatial filters is fully treated in the literature, less attention has been paid to the problem of the formation of scaling filters. One of the possibilities is Marshall's scale-sensitive neural network (Marshall 1990, 1992), which was introduced to solve categorization tasks. His network consists of excitatory feedforward connections and inhibitory lateral connections, the latter ensuring the competition of neurons based on their input activities. The scale sensitivity is ensured by appropriate normalization accomplished by the Weber Law rule (see Marshall 1990, and references therein), according to which the input signals of the neurons are calculated. The output of the network is determined by a shunting equation. For the network training, a typical variant of the Hebbian learning rule was used. In theory, the network could be used to develop scaling filters, but no work has been done along these lines.

In the present paper a scale-sensitive neural network that differs from Marshall's is discussed. Rozgonyi et al. (1994) have shown that if a WTA network is trained according to the learning rule [equation (4.15) of Kohonen 1984], it leads to the formation of filters of different sizes, covering the whole input space in a quasi-uniform fashion. We now extend our previous work by a detailed study of filter formation.

As mentioned above, the neighbour linking connections play a crucial role in the Kohonen network. They make neighbour training possible and speed up the training provided that the connections between the neurons functioning as spatial filters are capable of representing the neighbourhood relations of the filters. Another application of neighbouring connections is path planning. Path planning is possible for a discretization by using neighbouring connections as either a linear (Lei 1990) or a nonlinear (Connolly and Grupen 1993; Glasius et al. 1995) resistive network. Path planning and motion control can be learnt in a self-organizing way (Fomin et al. 1994). Interneural connections can also be employed in ANN image processing, e.g. in the formation of the skeleton of a planar shape (i.e. the loci of the centres of those

bitangent circles that lie entirely within the shape) (Marczell et al. 1996). These studies showed that in the course of skeleton formation multi-grids (Dyer 1987) should be used to overcome the problem of noisy boundaries. Thus the possibility of self-organized formation of such multi-grids with the help of interneural connections on scaling filters may also be worth investigating. Consequently, the neighbourhood relations of scaling filters and the possibility of representing their neighbourhood relations by interneural connections should be examined. These investigations are also carried out in the present work.

The paper is organized as follows: Background information is given in Sect. 2, including a short description of spatial filters, the problem of scale sensitivity, appropriate methods of network training, definitions of the neighbourhood relations of spatial filters and the formation of interneural connections for representing the neighbourhood relations of filters. New results are given in Sect. 3, including a formula for calculating the sizes of the scaling filters and a general condition for the equivalence of the two definitions of neighbourhood relations. A short derivation of the formula for filter sizes is given in Appendix 1. The equivalence of the two definitions for neighbourhood relations of neurons is proved in Appendix 2. Section 4 presents the results of computer simulations for one-dimensional (1D) and two-dimensional (2D) tori (i.e. for 1D and 2D input worlds with periodic boundary conditions) as well as for bounded worlds. The development of both the spatial filters and the interneural connections is investigated. Results are then discussed in Sect. 5. Conclusions are drawn in Sect. 6.

2 Background information

2.1 Description of the network and the spatial filters

The network under investigation can be described as follows. Assume an n-dimensional Euclidean space as the external world and its pixel-discretized image as the input space. Let N_p denote the number of discretizing pixels. Thus the input space is ℝ^Np. Each pixel of the input space corresponds to a certain point in the external world. Let r_j denote the n-dimensional position vector corresponding to pixel j in the external world. An object in the external world will be characterized by its position and size. In the input space an input object is represented by its N_p-dimensional intensity distribution vector x = {x_1, x_2, ..., x_Np}. The network consists of N_n neurons, each of which receives the input vector x via input weights. Let us denote the weight connecting the ith neuron with the jth component of the input vector by q_ij (i = 1, ..., N_n; j = 1, ..., N_p). (Further on q_i is used to denote the set of input weights belonging to neuron i, that is q_i = {q_i1, q_i2, ..., q_iNp}.) When an input vector x is presented to the network, input activities a_i(x) are computed for all the neurons, with

$a_i(x) = \sum_j q_{ij}\, x_j$   (1)

The neuron with the maximum a_i(x) value is selected as the winner. This winning neuron marks the category (position, size, etc.) the present input belongs to. Let us denote the index of the winning neuron for an actual input x by s(x). According to the WTA mechanism the output of neuron i is given by δ_is(x), where δ_is is the Kronecker delta, i.e. δ_is = 1 if i = s, otherwise δ_is = 0. If there are two or more neurons having maximum input activity, one can be chosen as the winner in any optional way (e.g. at random) to ensure the WTA mechanism. Nevertheless, the values of the a_i(x) input activities form a continuous range; thus, the abovementioned case can be ignored in practice.
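For illustration, the activity computation (1) and the WTA choice can be sketched in a few lines of code. The following is a minimal sketch only; the array sizes and the random weights are assumed for the example and are not the parameters of the simulations reported below.

```python
import numpy as np

rng = np.random.default_rng(0)
N_n, N_p = 8, 50                     # illustrative numbers of neurons and pixels
q = rng.random((N_n, N_p))           # feedforward weights q_ij (non-negative)
x = rng.random(N_p)                  # pixel-discretized input pattern

a = q @ x                            # input activities a_i(x) = sum_j q_ij x_j, eq. (1)
s = int(np.argmax(a))                # index s(x) of the winning neuron
output = np.zeros(N_n)
output[s] = 1.0                      # WTA output delta_{i s(x)}
```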

The domain where the input connection strengths are high, i.e. the domain of pixels to which the neuron is sensitive when receiving inputs, will be called the receptive field of the neuron. If the receptive fields of neurons are localized then the neurons work as spatial filters, thus discretizing the external world by position. In the case of localized receptive fields, the position r^(i) of filter i in the external world, i.e. the position of the receptive field of neuron i, may be defined as the position of the centre of the q_ij distribution:

$r^{(i)} = \frac{\sum_j q_{ij}\, r_j}{\sum_j q_{ij}}$   (2)

where r_j is the position vector of pixel j. The radius R_i of filter i may then be defined on the basis of the expression

$R_i = \sqrt{\frac{\sum_j q_{ij}\,(r_j - r^{(i)})^2}{\sum_j q_{ij}}}$   (3)

The position r_x and the size (radius) R_x of the input pattern x can also be calculated with the help of the same expressions by replacing the q_ij values with the components of vector x.
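A direct way to evaluate (2) and (3) numerically is sketched below; the same function applies to an input pattern by passing x instead of the weight vector. The pixel grid and the Gaussian weight profile are assumptions made for the example.

```python
import numpy as np

def centre_and_radius(w, r):
    """Centre r^(i) (eq. 2) and radius R_i (eq. 3) of a non-negative
    weight distribution w over pixels at positions r (shape N_p x n)."""
    w = np.asarray(w, dtype=float)
    r = np.asarray(r, dtype=float)
    centre = (w[:, None] * r).sum(axis=0) / w.sum()
    radius = np.sqrt((w * ((r - centre) ** 2).sum(axis=1)).sum() / w.sum())
    return centre, radius

# 1D example: 50 pixels, Gaussian-like receptive field of width sigma = 3
r = np.arange(50, dtype=float)[:, None]
w = np.exp(-(r[:, 0] - 20.0) ** 2 / 9.0)
print(centre_and_radius(w, r))   # centre near 20, radius near 3/sqrt(2)
```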

The WTA mechanism - according to which it is only the winning neuron that can train its q_ij connections - has been shown to be able to produce localized receptive fields in a self-organizing way with the help of the common Hebbian learning rule for the training of the q_ij connections and inputting pixel-discretized, local extended objects. Thus the discretization of the input space by position can be achieved. Such discretization provides receptive fields of equal sizes (Szepesvári et al. 1994).

2.2 The problem of sensitivity to input sizes

Concerning scale sensitivity the problem consists of two parts: (i) that of finding the filters that are able to distinguish between input patterns of different sizes; (ii) that of finding a self-organizing learning rule that can lead to the development of such filters when the network is inputted by objects of different sizes.

It is obvious that filters expected to be sensitive to objects of different sizes should have receptive fields of different sizes, and their respective filters should suit the sizes of inputs that they are supposed to be sensitive to. The question is how the strengths of input connections should depend on the size of the neuron's receptive field. For quick orientation, let us assume that we have (i) a 1D external world, (ii) digital inputs (i.e. the pixel intensity is either 1 or 0), (iii) rectangular spatial filters as displayed in Fig. 1. First let us consider the case when input connection strengths are independent of the size of the receptive fields. Two filters with receptive fields, one of which contains the other, are depicted in Fig. 1a. Under these conditions, if an input is about the size of the smaller filter but does not overlap with this filter perfectly, the neuron with the larger filter will win for the small-sized input. To see the other extreme let us assume that the input connection strengths within the receptive field of the neuron are proportional to 1/N_r, where N_r is the number of connections within the receptive field for the neuron in question, i.e. connection strength is inversely proportional to the size of the receptive field. This case is depicted in Fig. 1b. Under these conditions, the neuron with the smaller receptive field will win for larger input sizes if the size of the input is just a bit smaller than the size of the larger filter, creating another unsatisfactory solution.

Fig. 1a, b. The size sensitivity problem. Two filters of different sizes and an input sample are shown for the two extremes: a the large filter wins for small-sized inputs, b the small filter wins for large-sized inputs. Filters are shown by the continuous lines; the input pattern is shown by the dashed line. The filter that can win for the presented pattern is drawn with a thick line

We have thus found that scale sensitivity is not ensured when q_ij values are independent of the size of the receptive field (the dependence has the power of 0), nor when q_ij values are inversely proportional to the size of the receptive field (the dependence has the power of -1). To solve the scale sensitivity problem, the dependence of q_ij on the size of the receptive field should be between these two values. For example, filters with weight vectors of normalized lengths (i.e. with q_ij distributions for which Σ_j q_ij² = 1 for all i) are expected to solve the problem of scale sensitivity.
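A small numerical check of this claim, using the rectangular filters of Fig. 1 with unit-length weight vectors; the pixel counts and positions below are chosen arbitrarily for the illustration.

```python
import numpy as np

def rect(lo, hi, n_pixels=50):
    """Rectangular profile over pixels [lo, hi), normalized to unit length."""
    v = np.zeros(n_pixels)
    v[lo:hi] = 1.0
    return v / np.linalg.norm(v)

small_filter, large_filter = rect(20, 26), rect(15, 33)   # widths 6 and 18
x_small = rect(21, 27)     # small input, slightly shifted against the small filter
x_large = rect(16, 33)     # large input, slightly smaller than the large filter

print(small_filter @ x_small, large_filter @ x_small)  # ~0.83 vs ~0.58: small filter wins
print(small_filter @ x_large, large_filter @ x_large)  # ~0.59 vs ~0.97: large filter wins
```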

2.3 Suitable ways of network training

One of the learning rules that provides the normalization of weight vectors automatically is the one investigated by Ritter et al. (Ritter et al. 1991; Obermayer et al. 1990). It can be expressed as follows:

$q_{ij}(t+1) = \frac{q_{ij}(t) + \varepsilon(t)\,\delta_{is(x)}\, x_j}{\sqrt{\sum_j \left(q_{ij}(t) + \varepsilon(t)\,\delta_{is(x)}\, x_j\right)^2}}$   (4)

where t is the iteration time and ε(t) is the time-dependent learning rate. (ε(t) usually tends to zero with t asymptotically.) Learning rule (4) satisfies the requirements of


the WTA mechanism and ensures the normalization as well. However, it strictly keeps the weight vectors normalized during the whole training, which can cause a problem if the initial q_ij distributions are chosen in an unsuitable way - as will be discussed in detail later.

Another suitable update rule is the one investigated first by Kohonen (1984) and used by Rozgonyi et al. (1994) for developing scaling filters. This learning rule may be expressed as follows:

$\Delta q_{ij} = \varepsilon(t)\,\delta_{is(x)}\,\left(-a_i(x)\, q_{ij} + x_j\right)$   (5)

where Δq_ij is the change of the jth component of the weight vector belonging to the ith neuron. Taking into account input vectors and initial weight vectors - both having components of positive or zero values only - a situation is created where each q_ij value remains positive or zero for sufficiently small, positive ε(t) values during the whole training. In this case it is easy to show that learning rule (5) minimizes the cost function E(q_1, ..., q_Nn) defined as

$E = \sum_i \left(\sum_j q_{ij}^2 - 1\right)^2$   (6)

To show this, consider the change ΔE of the cost function E for an arbitrary input x:

$\Delta E = \sum_{i,j} \frac{\partial E}{\partial q_{ij}}\, \Delta q_{ij}$   (7)

where Δq_ij is given by (5) for the input pattern x. Taking the corresponding derivatives of E and substituting them into (7) together with the expression for Δq_ij, one gets for ΔE:

$\Delta E = -4\,\varepsilon(t) \sum_i \delta_{is(x)}\, a_i(x) \left(\sum_j q_{ij}^2 - 1\right)^2$   (8)

It can be seen that ΔE ≤ 0 for any input vector x. This means that learning rule (5) drives the cost function towards its minimum, which is reached when the weight vectors become normalized. Moreover, taking only the first-order terms of ε(t) of (4) into consideration, it can be shown that learning rules (4) and (5) are equivalent in the vicinity of the stationary state, i.e. under the following two conditions: (i) if ε(t) ≪ 1, and (ii) if the lengths of the q_i vectors are approximately equal to 1.
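As a concrete illustration, one training step with learning rule (5) under the WTA mechanism might be coded as follows; the function below is a minimal sketch, not the exact implementation used in the experiments.

```python
import numpy as np

def train_step(q, x, eps):
    """One WTA step with learning rule (5): only the winner s(x) adapts,
    Delta q_sj = eps * (-a_s(x) * q_sj + x_j), driving |q_s| towards 1."""
    a = q @ x                         # input activities, eq. (1)
    s = int(np.argmax(a))             # winning neuron
    q[s] += eps * (-a[s] * q[s] + x)  # eq. (5)
    return q, s
```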

2.4 Neighbourhood relations of neurons and their representation by interneural connections

In this section the neighbourhood relations of neurons and their representation by interneural connections are treated. The notations of Martinetz and Schulten (1994) will be used. The term 'input space' was introduced in Sect. 2.1, referring to the N_p-dimensional space of the pixels, that is ℝ^Np. Each input pattern can be given by a vector x in ℝ^Np. The neurons are represented by q_i vectors in ℝ^Np. In the case of a WTA network the neurons divide the input space into winning domains. Let V_i denote the winning domain in ℝ^Np belonging to neuron i:

$V_i = \{x \in \mathbb{R}^{N_p} \mid a_i(x) \ge a_k(x)\ \forall k\}$   (9)

Let the V_ij sets be given by all the x ∈ ℝ^Np for which a_i(x) and a_j(x) are the two highest input activities; that is, V_ij is defined as

$V_{ij} = \{x \in \mathbb{R}^{N_p} \mid a_i(x) \ge a_k(x)\ \text{and}\ a_j(x) \ge a_k(x)\ \forall k,\ k \ne i, j\}$   (10)

Let V_i^(M) denote the common part of V_i and an optional manifold M ⊂ ℝ^Np; that is, V_i^(M) = V_i ∩ M. (Later we shall specify M as the set of input vectors.) Let V_ij^(M) denote the common part of V_ij and M; that is, V_ij^(M) = V_ij ∩ M. If all the x and q_i vectors are normalized and a_i(x) is defined by (1), then V_i is the first-order Voronoi polyhedron of point q_i in ℝ^Np, and V_ij is the second-order Voronoi polyhedron of the points q_i and q_j for all i and j. V_i^(M) and V_ij^(M) are the so-called masked Voronoi polyhedra of first and second order, respectively. The neighbourhood relations of neurons stand for the neighbourhood relations of their winning domains, that is:

Definition 1: Neuron i and neuron j are neighbours of each other if there exists x ∈ V_i^(M) ∩ V_j^(M).

A different definition, based on the V_ij^(M) sets, can be introduced as follows:

Definition 2: Neuron i and neuron j are neighbours of each other if there exists x ∈ V_ij^(M).

The neighbourhood relations of neurons are represented by the interneural connections. Let us denote the connection strength between neuron i and neuron k by w_ik (i, k = 1, ..., N_n). The representation is said to be correct if and only if neurons neighbouring according to Def. 1 are connected by nonzero connection strengths. As shown in Martinetz and Schulten (1994), in the case of fixed q_i vectors, taking w_ik = 0 for all i, k as the initial condition, competitive Hebbian learning can lead in an unsupervised way to w_ik values which differ from zero if and only if neuron i and neuron k are neighbours according to Def. 2. An analogue version of this learning rule, which we will use for the w_ik training, is the following:

$\Delta w_{ik} = \varepsilon_w\, Y_{ik}\,\left(-w_{ik} + a_i(x)\, a_k(x)\right)$   (11)

where Y_ik = 1 only if x ∈ V_ik^(M), otherwise Y_ik = 0. The index w of the learning rate ε_w in (11) is used to distinguish it from ε, showing that the setting of the w_ik interneural connection weights may be carried out on a timescale different from that of the q_i training. It is worth noting that the above w_ik learning rule is symmetrical in i and k. As a consequence, the interneural connection strength matrix will be symmetrical too.
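Rule (11) can be sketched as follows: for each presented pattern the connection between the two most active neurons (i.e. the pair for which x ∈ V_ik^(M)) is moved towards the product of their activities. The function below is an illustrative sketch with assumed array shapes.

```python
import numpy as np

def hebb_step(w, q, x, eps_w):
    """Competitive Hebbian update of eq. (11) for the interneural weights w."""
    a = q @ x                            # input activities, eq. (1)
    i, k = np.argsort(a)[-2:]            # the two neurons with the highest activities
    dw = eps_w * (-w[i, k] + a[i] * a[k])
    w[i, k] += dw
    w[k, i] += dw                        # keep the connection matrix symmetrical
    return w
```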

Using competitive Hebbian learning, correct representation of the neighbourhood relations by the interneural connections may be achieved only when Def. 1 and Def. 2 are equivalent. The equivalence of Def. 1 and Def. 2 is shown in Martinetz and Schulten (1994) under two conditions: (i) each q_i vector belongs to M, and (ii) the q_i vectors are dense on M (for details see Martinetz and Schulten 1994). In general, however, q_i ∈ M may not be assumed. Nevertheless, Defs. 1 and 2 may be equivalent even when neither of the two conditions (i) and (ii) is met. We will discuss this point further in Sect. 3.2.


3 Considerations on scaling filters

3.1 Considerations on filter sizes

Let us now consider the sizes of filters developed according to (5). To derive a formula for R_i in the case of learning rule (5), we will follow the train of thought used in Obermayer et al. (1990) for the same purpose but for learning rule (4). Let us assume that the network is inputted by local extended objects represented by a Gaussian 'excitation profile' with parameters σ_inp ('size' of the input pattern) and r_inp (position of the input pattern). Thus an input pattern vector x is generated by the expression

$x_j = N(\sigma_{inp})\, \exp\!\left(-(r_j - r_{inp})^2 / \sigma_{inp}^2\right)$   (12)

where N(σ_inp) is a normalization factor ensuring Σ_j x_j² = 1. Ignoring the effects arising from the truncation of input patterns at the edges of the image, it is easy to point out that the calculated position vector r_x (the position of the centre of the x_j distribution: see Sect. 2.1) is equal to the 'theoretical' position vector r_inp for an input vector x generated according to (12). In addition, the calculated radius R_x of such an input vector x is proportional to σ_inp. (Bearing in mind the definition of R_x given by (3), it is easy to show that for a 1D external world R_x is equal to σ_inp/√2; for a 2D external world R_x is equal to σ_inp.) Thus an input pattern defined by (12) is fully characterized by its position r_x and size R_x. Consequently the index of the winning neuron s(x) for input x depends on r_x and R_x only, so s(x) = s(r_x, R_x). Let P(r_x, R_x) denote the probability density of input patterns x. A further assumption that the system has reached its stationary state can be made. In this case, the expectation value of each q_ij weight is constant. Thus the expectation value of each filter size R_i is constant as well. Now the following expression may be derived for the filter size R_i:

$R_i^2 = \frac{\displaystyle\iint \delta_{i\, s(r_x, R_x)}\left[R_x^2 + (r_x - r^{(i)})^2\right] \sum_j x_j\; P(r_x, R_x)\, dr_x\, dR_x}{\displaystyle\iint \delta_{i\, s(r_x, R_x)} \sum_j x_j\; P(r_x, R_x)\, dr_x\, dR_x}$   (13)

where the integrations run over all r_x and R_x values. A short derivation of (13) is given in Appendix 1.

A very similar expression for R_i for learning rule (4) was derived by Obermayer et al. (1990). In view of the equivalence of (4) and (5) in the vicinity of the stationary state, the similarity is not surprising. However, there is some difference arising from the different training tasks. In contrast to the case in Obermayer et al.'s (1990) investigations, in our study it cannot be assumed that s(x) depends on the position of the pattern only, because here the network is inputted with patterns of very different sizes.

3.2 Considerations on correct neighbourhood relation of scaling filters

Let us first assume that each input pattern is fully characterized in the external world by some independent parameters. For our Gaussian-like excitation profiles these parameters are r_inp and σ_inp. Thus when the external world is an n-dimensional Euclidean space, each pattern can be defined unambiguously by a point ξ in ℝ^(n+1). Let, therefore, ℝ^(n+1) be called the parameter space. The domain of the parameter space from which the patterns are generated, according to some probability distribution function P(ξ), is denoted by X:

$X = \{\xi \in \mathbb{R}^{n+1} \mid P(\xi) > 0\}$   (14)

Assume, moreover, that there exists a bijective, continuous mapping Φ from X into the input space ℝ^Np. Such a mapping is defined by (12). A pattern characterized by ξ in the parameter space is represented in the input space by the vector x = Φ(ξ). Let M (M ⊂ ℝ^Np) denote the image of X in the input space:

$M = \{x \in \mathbb{R}^{N_p} \mid \exists\, \xi \in X,\ x = \Phi(\xi)\}$   (15)

We introduce the notation d_i(ξ) to represent a_i(Φ(ξ)) in cases when there exists ξ ∈ X so that x = Φ(ξ). Since both the a_i(x) functions defined by (1) and the mapping Φ are continuous, the d_i(ξ) functions are continuous in X, too. Taking into account the Φ mapping, it is obvious that X is divided into winning domains that correspond to the V_i^(M) domains in ℝ^Np. On this basis, the winning domain of neuron i in X, which is denoted by W_i, is introduced. This is the pre-image of V_i^(M) for the mapping Φ:

$W_i = \{\xi \in X \mid d_i(\xi) \ge d_k(\xi)\ \forall k\}$   (16)

Note that ∪_i V_i^(M) covers M entirely, while ∪_i W_i covers X entirely. Note also that all the V_i^(M) and W_i sets are closed sets. Let W_ij denote the set of ξ ∈ X for which d_i(ξ) and d_j(ξ) are the two highest input activities; that is, W_ij is defined by

$W_{ij} = \{\xi \in X \mid d_i(\xi) \ge d_k(\xi)\ \text{and}\ d_j(\xi) \ge d_k(\xi)\ \forall k,\ k \ne i, j\}$   (17)

The images of the W_ij sets in the input space are the V_ij^(M) sets. Focusing on the discretization of the parameter space, the neighbourhood relation of the neurons should be defined in the parameter space and not in the input space. By analogy with Def. 1 and Def. 2, the following definitions may be set up concerning the neighbourhood relation of neurons discretizing the parameter space:

Definition 1*: Neuron i and neuron j are neighbours of each other if there exists ξ ∈ W_i ∩ W_j.

Definition 2*: Neuron i and neuron j are neighbours of each other if there exists ξ ∈ W_ij.

For normalized x and q_i vectors, and if the input activity is defined by (1), Def. 1* and Def. 2* are the same as Def. 1 and Def. 2, respectively. Thus competitive learning can lead to a correct representation of the neighbourhood relation of scaling filters when Def. 1* and Def. 2* are equivalent. As is proved in Appendix 2, Def. 1* and Def. 2* are equivalent if the following two conditions are fulfilled:

(a) None of the W_i sets is a zero set. (b) The W_i(j) = {ξ | d_j(ξ) ≥ d_k(ξ) for all k, k ≠ i} domains are all pathwise connected regions in X for all i and j, i.e.


for all ξ, η ∈ W_i(j) there exists a continuous curve originating from ξ and ending in η so that each point of the curve belongs to W_i(j).

Conditions (a) and (b) are more general than conditions (i) and (ii) of Sect. 2.4.

4 Results of the simulations

4.1 Results of filter formation

To avoid edge effects in our simulations, the input space was a pixel-discretized image of either a 1D or a 2D torus surface. The network was input by local extended objects, each of which was represented by an input vector x and was generated according to (12). The parameters σ_inp and r_inp were generated independently from each other. Parameters σ_inp were generated randomly in the (σ_1, σ_2) interval with identical probabilities. Inputs were presented at random positions with uniform probabilities and were arranged according to the periodic boundary conditions. Filter formation based on (4) and (5) was investigated under two different initial conditions in both cases. According to the first one (i), each q_ij was initialized to some positive random number in such a way that all neurons had q_i vectors with length equal to L, i.e. the condition √(Σ_j q_ij²) = L was satisfied for all neurons i. In the second case (ii), there was only one q_ij weight selected for each neuron; it was initialized to some positive random number between 0.8 and 1.2, and all the other q_ij weights were set to zero ('one-weight initialization'). Network training according to learning rule (5) led to the same result, i.e. to a good discretization of the parameter space, under both initial conditions. 'Good discretization' means that none of the W_i sets is a zero set (i.e. each neuron takes part in the discretization) and X is discretized by each of its coordinates (i.e. by all input parameters). It should be mentioned that the condition L > 1 allows each neuron to have a chance to win. If L < 1 then some neurons with small initial q_ij values may never win and thus will not take part in the discretization of the input space. The explanation for this behaviour is that the length of the weight vectors - and thus the q_ij values, too - decreases when it is bigger than 1 and increases when it is smaller than 1. Since the chances of winning of the neurons are linear in the q_ij values, the chances of winning of those neurons that have not yet won, and whose q_ij values are therefore still high, increase compared with those neurons that have already won, provided that training is started from the initial condition L > 1. Thus each neuron can gain a chance of winning. When L < 1, the situation is the opposite and the chances of winning might drop for certain neurons before they have won at all; thus these neurons will no longer have a chance to win. Our experiments demonstrated this property.
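A pattern generator in the spirit of (12) on a 1D torus might look like the sketch below; the torus size, the way the periodic distance is taken, and the sampling of the parameters are assumptions made for the illustration (the interval (0.777, 14.0) is taken from experiment 1 in Table 1).

```python
import numpy as np

def make_pattern(r_inp, sigma_inp, n_pixels=50):
    """Gaussian 'excitation profile' of eq. (12) on a 1D torus of n_pixels,
    normalized so that sum_j x_j^2 = 1."""
    r = np.arange(n_pixels, dtype=float)
    d = np.minimum(np.abs(r - r_inp), n_pixels - np.abs(r - r_inp))  # periodic distance
    x = np.exp(-d ** 2 / sigma_inp ** 2)
    return x / np.linalg.norm(x)          # the factor N(sigma_inp)

rng = np.random.default_rng(0)
x = make_pattern(r_inp=50 * rng.random(), sigma_inp=rng.uniform(0.777, 14.0))
```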

The results of network training according to (4) were different under the two initial conditions. These results were found to be the same as the previous ones in the case of 'one-weight initialization', i.e. a good discretization by both position and size was developed. But under the other initial condition most of the neurons did not take part in the discretization (i.e. most of the neurons could never win) and thus a poor discretization by size was developed. The reason for this is that learning rule (4) strictly keeps the q_i vectors normalized to unit length during the whole training procedure. In other words, each q_i is constrained to move on the surface of the unit sphere in ℝ^Np, and the system gets trapped in a local minimum. Since in the case of learning rule (5) the length of the q_i vectors is also changing, learning rule (5) offers an extra degree of freedom for the settling of the spatial filters, and thus it is more robust in the discretization procedure. Hence, we used (5) for further investigations of filter formation.

Hereinafter results of network training according to (5) are reported. The q_ij weights were initialized to random numbers greater than zero in such a way that the length of the initial q_i vectors was equal to L. The initial length L was set to be greater than 1. The learning rate ε was initialized to a value denoted by ε_1, then it was decreased at each training step by the same amount (denoted by Δε) until it reached a final value denoted by ε_2. Beyond this training step ε remained constant until the end of training. Table 1 summarizes the parameters of each simulation. Experiments 1 and 2 include four simulations performed with different neuron numbers.
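The learning-rate schedule described above (a linear decrease from ε_1 by Δε per step, clamped at ε_2) could be written, for example, as:

```python
def learning_rate(step, eps_1=0.2, eps_2=1e-3, d_eps=5e-7):
    """Linearly decreasing learning rate, kept constant once eps_2 is reached
    (default values taken from experiment 1 of Table 1)."""
    return max(eps_1 - step * d_eps, eps_2)
```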

The first two experiments were run on a 50-pixel 1D torus surface. In experiment 1 different σ_inp values were generated with equal probabilities, and the σ_inp generation was independent of the generation of the r_inp values. This experiment included four simulations with different neuron numbers (Table 1). In experiment 2 the σ_inp generation and the r_inp generation were not independent of each other. Here parameter σ_inp was generated from [σ_1(r_inp), σ_2(r_inp)] instead of being generated from (σ_1, σ_2). The dependences σ_1,2(r_inp) were the following:

$\sigma_{1,2}(r_{inp}) = \sigma_{1,2} + \Delta\sigma\, |r_{inp} - r_0|$   (18)

where Δσ was equal to 4.0, and r_0 was the position vector of a selected pixel on the torus surface. Taking position r_0 as the 'middle' of the torus, it can be said that the network at the 'sides' of the torus was input by larger patterns than in the 'middle'. Figure 2 shows the borderlines of the winning position-size domains of the neurons, i.e. the boundaries of the W_i domains in X, for experiments 1c and 2. (From now on input patterns corresponding to the boundaries of some W_i domain will be called the borderline patterns of neuron i. That is, x is a borderline pattern of neuron i if x ∈ V_i^(M) ∩ V_j^(M) for some j.) The main result, which can be seen in Fig. 2, is that the neurons discretized the position-size space both horizontally and vertically, i.e. they discretized the external world by both position and size. As can be seen on the right-hand side of Fig. 2, the arrangement of the W_i domains shows the position dependence of the input-size range. On the left-hand side of Fig. 2 the W_i domains are ordered into four groups for experiment 1c on the basis of the corresponding input-pattern sizes. An important feature is that each

Table 1. Parameters of the networks and network training rules

No. of experiment          1                     2          3          4 and 5    6
No. of neurons             (a) 25, (b) 50,       100        400        100        200
                           (c) 100, (d) 200
Size of the input space    50                    50         18 x 18    50         15 x 15
L                          2.0                   2.0        2.0        2.0        2.0
ε_1                        0.2                   0.2        0.1        0.2        0.1
ε_2                        10^-3                 10^-3      5 x 10^-3  10^-3      5 x 10^-3
Δε                         5 x 10^-7             5 x 10^-7  10^-7      5 x 10^-7  10^-7
σ_1                        0.777                 0.777      0.777      0.777      2.0
σ_2                        14.0                  12.0       4.5        14.0       4.5
ε_w                        -                     -          -          10^-3      10^-3



Fig. 2. Winning domains of neurons for experiments 1c (left-hand side) and 2 (right-hand side). The network consisted of 100 neurons in both cases. For experiment 1, the input size range was independent of the position; for experiment 2, the input size range was a function of the input position - in the 'middle' of the input space σ_inp values were generated from the interval (0.777, 12.0), while at the sides σ_inp values were generated from (4.777, 16.0). Between these extremes the input size range varies linearly with the position

of these groups of W_i domains covers the surface of the torus more or less uniformly.

Radii of the stabilized spatial filters, ordered according to their sizes for experiment 1, are shown in Fig. 3. The different curves correspond to runs with different neuron numbers. As can be seen in Fig. 3, neurons form groups of different filter sizes in each simulation. These groups are separated by gaps. Within each neural group the sizes of the filters are nearly the same. The gap structure becomes finer (more groups are formed) and tends to disappear as the number of neurons grows. For the Gaussian-shaped inputs, the region defined by the σ_1 and σ_2 values corresponds to the range R_x ∈ [0.55, 9.90]. However, the minimum and maximum values of the pattern radii are somewhat different from these values. There are two reasons for these differences, one of which is the poor resolution of small patterns. The other reason is that the sizes of the biggest patterns are comparable to the size of the torus, so the edges of the Gaussian intensity distribution were cut off when they were projected onto the surface of the torus. It can, however, be stated that the range of filter sizes tends to suit the range of input sizes.

Fig. 3. Magnitude-ordered filter sizes R_i for different neuron numbers for experiment 1. Curves a, b, c and d correspond to 25, 50, 100 and 200 neurons, respectively

To achieve a deeper understanding of filter formation (see the discussion of the results in Sect. 5), the overlaps of the spatial filters, i.e. the overlaps of the receptive fields of the neurons, were also investigated in experiment 1. The overlap O(i, k) of filters i and k is defined by the expression:

$O(i, k) = \sum_j q_{ij}\, q_{kj}$   (19)

Within each filter size group, the overlaps of the neighbouring neurons were found to be almost the same, with standard deviations of less than 4%.
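For completeness, the full overlap matrix of (19) is just the Gram matrix of the weight vectors; a minimal sketch with an assumed weight matrix q:

```python
import numpy as np

q = np.random.default_rng(1).random((100, 50))  # assumed weight matrix (N_n x N_p)
O = q @ q.T                                     # O[i, k] = sum_j q_ij q_kj, eq. (19)
```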


The input activities of the neurons for their borderline patterns were also investigated. Each neuron was investigated in experiments 1c and 1d. The input activities of a neuron for its borderline patterns were found to be almost the same, with standard deviations of less than 2.5% for each neuron, except for neurons belonging to the smallest filter size group ('small neurons'). Here the poor resolution of the filters and the edge effect, i.e. the distortion of the shape of the 'borderlines' for neurons belonging to the smallest filter size group, set limits on achieving results of similar quality. With the exception of the 'small neurons', the average input activities of the neurons for their own borderline patterns were also found to be almost the same.

Filter formation governed by (1) and (5) in the case of a 2D external world was investigated in experiment 3. Using periodic boundary conditions (torus surface), with the parameters shown in Table 1, the filters became arranged into three distinct size groups. Three typical spatial filters representing the three different filter size groups are displayed in Fig. 4. The circular shape and the Gaussian character of the filters are well indicated in the figure. As can be observed, the larger the size (width) of a filter, the smaller its amplitude, in accordance with the normalized length of the filters. Experiment 3 was also performed for a 2D closed (non-toroidal) world. Gap formation tendencies were found in this case, too. Filter size groups were formed by 'volume neurons', and the gap between them was occupied by the 'surface neurons', i.e. by neurons situated around the edges of the image.

4.2 Results of interneural connection formation

Development of interneural connections according to (11) to represent the neighbourhood relations of the scaling filters was investigated in experiments 4, 5 and 6. In each experiment, the filters were considered to be stabilized, i.e. no q_ij training was performed in parallel with the w_ik training. Each connection strength w_ik was initialized to zero and the learning rate ε_w was kept constant during the whole training.

In experiment 4 the filter system that was developed in a 1D external world under periodic boundary conditions in experiment 1c was used for the w_ik training. This filter system consisted of 100 filters arranged into four distinct filter size groups (Fig. 3c). In experiment 5 the same parameters of network training were used (Table 1) but no periodic boundary condition was applied. Instead, input patterns were simply cut off at the edges of the input space. Results of experiments 4 and 5 are shown in Fig. 5. The upper panel of the figure applies to the periodic, the lower to the non-periodic boundary conditions. In both panels filters are represented by circles with sizes proportional to the sizes of the filters. Interneural connections are represented by connecting lines, the thicknesses of which are proportional to the strengths of the corresponding interneural connections. In both panels the circles representing the filters are shown vertically shifted in proportion to the size of the filters; thus the horizontal positions of the circles correspond to the positions of the filters (r^(i)) and the vertical positions of the circles correspond to the sizes of the filters (R_i). The lower panel shows the edge effect clearly: owing to the cutting off of the patterns, no spatial filters with large receptive fields located close to the boundaries of the image were developed. It can also be seen that far from the edges the filters tend to form the same group structure that can be observed in the upper panel. It follows that the formation of filter size groups is not a consequence of the periodic boundary condition.

As can be seen in Fig. 5, interneural connections have developed only between neighbouring neurons in both cases. This indicates that the interneural connections correctly represent the neighbourhood relations of the filters. Within a given distinct size group (see upper panel) the connections also correctly represent the topology of the external world (1D in this case).

In experiment 6, the w_ik training was performed on a filter system developed on a 2D torus surface. This filter system consisted of two distinct filter size groups. Both groups covered the whole external world homogeneously in an approximately hexagonal structure. The average radii of the small and the large filters were 2.82 and 3.66 respectively, with less than 1% standard deviation for both groups. Results of the w_ik formation for experiment 6 are shown in Fig. 6. This figure displays the small and the large filters in the left-hand and right-hand panels, respectively; the interneural connections within both filter size groups are also displayed. As can be seen, interneural connections have developed only between neighbouring neurons, i.e. the topology is represented correctly by both neural groups, which was made possible by the proper separation of the filter size groups. The homogeneous coverage by both groups of filters and the emergence of an approximate hexagonal structure within the neuron groups are well illustrated in Fig. 6. Interneural connections between two neurons with a large and a small filter size, respectively, also developed and show similar properties.

5 Discussion

The results of the computer simulations and the theoretical considerations indicate the following qualitative description of the behaviour of the network: learning rule (5) ensures the normalization of the q_i vectors. This means that the q_i points will be positioned close to the N_p-dimensional unit sphere of the input space ℝ^Np in the stationary state. For an n-dimensional external world the parameter space is (n + 1)-dimensional. As a result, the manifold M marked by the corresponding input pattern set is (n + 1)-dimensional. According to the WTA mechanism, the neurons divide this region into the V_i^(M) domains. The filters developed during network training were found to have approximately Gaussian shape. Hence the q_i points may be considered to be elements of M. (Actually, the q_i points cannot be elements of M, but can be as near to M as desired by increasing the number of neurons.) Based on the fact that the q_i points are situated close to M, it is reasonable to assume that the radius of curvature of the manifold M is large compared with the diameters of the


Fig. 4. Typical spatial filters belonging to the three different filter size groups developed in experiment 3. Corresponding radii are 3.93, 2.96 and 1.95, respectively

Fig. 5. The upper panel and lower panel show spatial filters and interneural connections of experiments 4 (periodic boundary conditions) and 5 (non-periodic boundary conditions), respectively. The circles and the connecting lines represent the filters and the interneural connections, respectively. The sizes of circles are proportional to the sizes of the spatial filters; the thicknesses of connecting lines are proportional to the strengths of the connections. (The circle radii are not to scale!) The horizontal position of a circle corresponds to the position of the filter (denoted by r^(i) for neuron i in the text), the vertical position of the circle corresponds to the size of the filter (denoted by R_i for neuron i in the text)

given V_i^(M) domains. Thus every V_i^(M) domain can be considered as a plane figure.¹ Consequently, the results of

¹ Note that for the present simulations the above approximation is not valid for the smallest patterns covering only a few pixels. This is the very reason why 'small neurons' differed from the others in the statistical investigations on 'borderline patterns'


Fig. 6. Spatial filters of experiment 6, represented by circles. Filters with small receptive fields are shown on the left; filters with large receptive fields are shown on the right. Circle radii are not to scale. Interneural connections are also shown. The thicknesses of the connecting lines are proportional to the strengths of the corresponding interneural weights

the statistical investigations on the activities for 'borderline patterns' can be summarized as follows: (i) The small deviations in the input activities of neurons for their borderline patterns mean that in the case of a 1D external world the boundary of each V_i^(M) domain approximates a 2D circle in M. (For an external world of larger dimensionality, they would approximate spheres.) (ii) The result that the average activity of the borderline patterns of a neuron is approximately the same for all the neurons means that the sizes of the V_i^(M) domains, i.e. the radii of the 'circles', are approximately the same. In view of the properties of the mapping defined by (12), the above assumption is supported by the observation that the greater the σ values are for the V_i^(M) domains of the same size, the greater the corresponding W_i domain in X (see Fig. 2). From the approximate equality of the sizes of the


V_i^(M) domains one can suppose that the q_i points are arranged homogeneously on M. A homogeneous arrangement then results in each neighbouring neuron pair having approximately equal overlap, in accordance with the numerical results.

In both one and two dimensions a tendency for size clusters to form has been observed within the parameter range investigated. The size groups are made up of neurons whose filters can, together, cover the external world more or less uniformly. The number of neural groups depends mainly on the number of neurons. If the number of neurons is increased, the gap structure becomes finer (more groups are formed); if the number of neurons is increased even further, the gap structure will further diminish. This behaviour may be explained in accordance with our finding that the sizes of the V_i^(M) domains are approximately the same. With the help of (12) it is easy to show that the set of curves characterized by constant σ_inp parameters (σ-curves from now on) and the set of curves characterized by constant r_inp parameters (p-curves from now on) form two sets of curves perpendicular to each other on M. The p-curves are geodetic. The σ-curves are 'parallel' to one another, i.e. the distance between two σ-curves along any p-curve is equal. This statement, however, is not true for the p-curves. M is bordered on two opposite sides by the σ-curves belonging to parameters σ_1 and σ_2 (hereinafter referred to as the σ_1-curve and the σ_2-curve, respectively). Since the V_i^(M) domains are approximately of the same size, the number of V_i^(M) domains along the p-curves on M is defined only by the ratio of the distance between the σ_1-curve and the σ_2-curve to the average diameter of the V_i^(M) domains, and this ratio is the same along any p-curve. As the distance between two p-curves is different along different σ-curves, unlike for the σ-curves, the number of filters between two p-curves will be different along different σ-curves. This is one of the reasons why the W_i domains can form size groups. A further condition for the formation of size groups is that the size of M along the p-curves should be small enough in comparison with its size along the σ-curves. (Note, however, that the size of a pattern cannot be larger than the size of the external world.) Still, under such conditions, with only a few neurons, the neurons form a single group and discretize X by position only. With an increase in the number of neurons the size of the V_i^(M) domains decreases, and more than one filter group may be formed. The groups settle according to the boundaries of M (see e.g. Fig. 2). Size groups form if long σ-curve boundaries exist. The order, however, decreases with increasing distance from the borders. This is illustrated by Fig. 3d, showing that the deviation of the filter sizes remains small within the two extreme size groups, while the 'inner' filters do not form distinct size groups: the effect of the boundary constraints set by M diminishes with increasing number of neurons.

6 Conclusions

In the present work, the problem of discretization of the external world by both position and size was discussed.

Suitable ways of network training were suggested as a means of developing scaling filters in a self-organizing way. The most important result was that the neurons could discretize the parameter space by each of the independent parameters, i.e. the scaling filters discretized the external world by both position and size. Properties of the scaling filter system were investigated both theoretically and by computer simulation. The results of these simulations show that the range of filter sizes scales with the size of the input samples. Below a certain number of neurons, the filters are ordered into size clusters with gaps in between. The neighbourhood relation of the scaling filters was also investigated. Two definitions - extensions of the definitions in Martinetz and Schulten (1994) - of the neighbourhood relation were introduced. Their equivalence was also shown under reasonable conditions. Using local extended input objects and competitive Hebbian learning for network training, an interneural connection set representing the neighbourhood relations of the scaling filters was also developed. For properly separated size groups, a correct representation of topology could be achieved within each filter size group. Consequently, a self-organizing multi-grid can be developed under certain conditions with the help of scaling filters.

It was not intended that biological systems should be modelled in the present investigations. The basic issue, however, namely the formation of receptive fields of different sizes, is of relevance, e.g. for the case of the visual cortex (Lund 1990). Bearing in mind this point, our conclusion is that very simple self-organizing rules may develop such systems.

Appendices

Appendix 1

In order to derive a formula for R_i of the scaling filters, the train of thought used in Obermayer et al. (1990) is followed. We assume that the system has reached its stationary state, i.e. the expectation value of each q_ij weight is constant. Thus the expectation value of each filter size R_i is constant as well. Hence the expectation value of the change of R_i² should be equal to zero:

$\langle \Delta R_i^2 \rangle = 0$   (20)

Denoting the probability density of the input patterns x by P(r_x, R_x), where r_x is the position and R_x is the size of input pattern x, the expectation value ⟨ΔR_i²⟩ can be calculated as:

$\langle \Delta R_i^2 \rangle = \iint \Delta R_i^2\; P(r_x, R_x)\, dr_x\, dR_x$   (21)

Here the integration runs over all r_x and R_x values. ΔR_i² can be calculated as:

$\Delta R_i^2 = \sum_j \frac{\partial R_i^2}{\partial q_{ij}}\, \Delta q_{ij}$   (22)

Using the definitions (2) and (3), it is easy to show that

$\frac{\partial R_i^2}{\partial q_{ij}} = \frac{(r_j - r^{(i)})^2 - R_i^2}{\sum_l q_{il}}$   (23)

Using the definitions of r_x and R_x - also given by (2) and (3), respectively (see Sect. 2.1) - and the expression (5) for Δq_ij, the expression for ΔR_i² is:

$\Delta R_i^2 = \varepsilon(t)\,\delta_{i s(x)}\,\frac{\sum_j x_j}{\sum_l q_{il}}\,\left(R_x^2 + (r_x - r^{(i)})^2 - R_i^2\right)$   (24)

Substituting this expression into (21) and taking (20) into account, one can easily derive (13).

Appendix 2

Here the equivalence of the two parameter-space-based definitions of the neighbourhood relations of neurons is proved under the conditions mentioned in Sect. 3.2. Since (W_i ∩ W_j) ⊂ W_ij, Def. 2* follows from Def. 1*.

To prove the equivalence, the opposite direction is also considered. We assume further on that ξ is a point in X such that ξ ∈ W_ij, that is, neuron i and neuron j are adjacent according to Def. 2*. First we prove in an indirect way that ξ ∈ W_i ∪ W_j. Let us assume that ξ ∉ W_i ∪ W_j. As a result, there exists k ≠ i, j so that d_k(ξ) > d_i(ξ) and d_k(ξ) > d_j(ξ), which contradicts the assumption that ξ ∈ W_ij. Without loss of generality ξ ∈ W_i may be assumed. Under this assumption it can easily be shown, in an indirect way again, that ξ ∈ W_i(j). Let us assume that ξ ∉ W_i(j), that is, there exists k ≠ i, j so that d_k(ξ) > d_j(ξ). This contradicts the assumption that ξ ∈ W_ij. Hence, if ξ ∈ W_ij, without loss of generality ξ ∈ W_i and ξ ∈ W_i(j) may be assumed. To prove that the above assumptions result in the existence of a ξ* ∈ W_i ∩ W_j (i.e. that Def. 1* follows from Def. 2*), let us assume that ξ ∉ W_j. (If ξ ∈ W_j, then ξ is the point ξ*.) It is true that ∪_k W_i(k) covers the whole X and that W_k ⊂ W_i(k) for all k. Let us consider a point η so that η ∈ W_j; such a point exists because W_j is not a zero set by condition (a). Since W_j ⊂ W_i(j), it is true that η ∈ W_i(j). Since each W_i(k) is assumed to be a pathwise connected domain in X, there exists a continuous curve γ originating at point η and ending at point ξ so that each point of γ belongs to W_i(j). Since, according to the assumptions, d_i(ξ) > d_j(ξ) and d_j(η) ≥ d_i(η), and each d_i(·) function is continuous, there must be a point ξ* ∈ γ for which d_i(ξ*) = d_j(ξ*). Since ξ* ∈ W_i(j), for ξ* it should be true that ξ* ∈ W_i ∩ W_j. This means that Def. 1* follows from Def. 2*. Thus the equivalence is proved.

It is easy to show that if conditions (i) and (ii) of Sect. 2.4 are fulfilled then conditions (a) and (b) of Sect. 3.2 are also fulfilled. The opposite is not true in general; take, for example, the case when the q_i values are not dense on M everywhere, so that condition (ii) is not fulfilled but conditions (a) and (b) may still be fulfilled.

Acknowledgement. The authors are grateful to the referees for their valuable comments. This work was partially supported by OTKA grants no. T 014566 and T 017110.


References

Connolly CI, Grupen RA (1993) On the applications of harmonic functions to robotics. J Robotic Syst 10:931-946

Dyer CL (1987) Multiscale image understanding. In: Parallel computer vision. Academic Press, New York, pp 171-213

Fomin T, Szepesvári C, Lőrincz A (1994) Self-organizing neurocontrol. In: Proceedings of the IEEE International Conference on Neural Networks, Orlando, Fla. IEEE, Piscataway, pp 2777-2780

Fritzke B (1991) Let it grow - self organizing feature maps with problem dependent cell structure. In: Proceedings of ICANN, vol 1. Elsevier, Amsterdam, pp 403-408

Glasius R, Komoda A, Gielen S (1995) Neural network dynamics for trajectory formation and obstacle avoidance. Neural Networks 8:125-133

Grossberg S (1976) Adaptive pattern classification and universal recoding. I. Parallel development and coding of neural feature detectors. Biol Cybern 23:121-134

Grossberg S (1987) From interactive activation to adaptive resonance theory. Cogn Sci 11:23-63 (and references therein)

Kohonen T (1984) Self organisation and associative memory. Springer, Berlin Heidelberg New York

Lei G (1990) A neuron model with fluid properties for solving labyrinthian puzzle. Biol Cybern 64:61-67

Lund JS (1990) Excitatory and inhibitory circuitry and laminar mapping strategies in the primary visual cortex of the monkey. In: Signal and sense: local and global order in perceptual maps. Wiley, New York

Luttrell SP (1994) A Bayesian analysis of self-organizing maps. Neural Comput 6:767-794

Marczell Z, Kalmár Z, Lőrincz A (1996) Generalized skeleton formation for texture segmentation. Neural Network World 6:79-87

Marshall JA (1990) A self-organizing scale-sensitive neural network. In: Proceedings of the International Joint Conference on Neural Networks, San Diego, CA. IEEE, Piscataway, vol 3, pp 649-654

Marshall JA (1992) Development of perceptual context-sensitivity in unsupervised neural networks: parsing, grouping, and segmentation. In: Proceedings of the International Joint Conference on Neural Networks, Baltimore, MD. IEEE, Piscataway, vol 3, pp 315-320

Martinetz T (1993) Competitive Hebbian learning rule forms perfectly topology preserving maps. In: Proceedings of ICANN. Springer, Berlin Heidelberg New York, pp 427-434

Martinetz T, Schulten K (1991) A 'neural-gas' network learns topologies. In: Proceedings of ICANN, vol 1. Elsevier, Amsterdam, pp 397-402

Martinetz T, Schulten K (1994) Topology representing networks. Neural Networks 7:507-522

Obermayer K, Ritter H, Schulten K (1990) A neural network model for the formation of topographic maps in the CNS: development of receptive fields. In: IJCNN-90, Conference Proceedings II. San Diego, IEEE Piscataway, pp 423-429

Ritter H, Obermayer K, Schulten K, Rubner J (1991) Self-organizing maps and adaptive filters. In: Models of neural networks. Springer, Berlin Heidelberg New York, pp 281-307

Rozgonyi T, Fomin T, Lőrincz A (1994) Self-organizing scaling filters for image segmentation. In: Proceedings of the IEEE International Conference on Neural Networks, Orlando, Fla. IEEE, Piscataway, vol 7, pp 4380-4383

Szepesvári C, Lőrincz A (1993) Topology learning solved by extended objects: a neural network model. In: Proceedings of the World Congress on Neural Networks, vol 2. Erlbaum, Hillsdale, NJ, pp 497-500

Szepesvári C, Balázs L, Lőrincz A (1994) Topology learning solved by extended objects: a neural network model. Neural Comput 6:441-458