
Pergamon

PII: S0893-6080(97)00038-5

Neural Networks, Vol. 10, No. 9, pp. 1627-1636, 1997 © 1997 Elsevier Science Ltd. All rights reserved

Printed in Great Britain 0893-6080/97 $17.00+.00

CONTRIBUTED ARTICLE

Associative Memory with a Sparse Encoding Mechanism for Storing Correlated Patterns

MAKOTO HIRAHARA,1 NATSUKI OKA1 AND TOSHIKI KINDO1,2

1Matsushita Research Institute Tokyo, Inc. and 2Japan Science and Technology Corporation

(Received 8 November 1996; accepted 26 March 1997)

Abstract--H. Gutfreund (Neural networks with hierarchically correlated patterns. Physical Review A, 37, 570-577, 1988) has proposed a model for storing hierarchically correlated patterns, in which ancestor patterns are correlated with descendant ones. However, its storage capacity is small. Furthermore, the ancestors must be given in the learning phase, and the value of a parameter on which the capacity and the basins of attraction strongly depend must be determined carefully. To overcome these problems, we present a model (CASM) consisting of a first associative memory, which forms ancestors from their descendants, and a second associative memory, which stores sparse difference patterns carrying only information on the differences between the formed ancestors and the descendants. To evaluate the performance of CASM, extensive simulations are carried out. The results show that the capacity increases with increasing correlation between the ancestors and the descendants, and is as large as that of sparsely encoded associative memory. The basins of attraction become larger with decreasing correlation, and do not depend on loading level. © 1997 Elsevier Science Ltd. All rights reserved.

Keywords--Associative memory, Cascade, Hierarchically correlated patterns, Difference patterns, Sparse encoding, Storage capacity, Basins of attraction, Ancestor formation.

1. INTRODUCTION

Associative memory models have been proposed by Anderson (1972), Kohonen (1972) and Nakano (1972). Since then, many studies on associative memory have followed. Hopfield (1982) demonstrated that the storage capacity $\alpha_c$ of associative memory for storing uncorrelated patterns is about 0.15, where $\alpha_c$ is the ratio between the maximum number of patterns that can be stored as equilibria and the number of units. If stored patterns are correlated, the standard associative memory models do not work well. This is due to the fact that, in recalling a stored pattern, the crosstalk noise generated by the other stored patterns does not average to zero (Amit, Gutfreund, & Sompolinsky, 1987). Hence, the standard models are not suitable for problems in practical

Acknowledgements: The authors thank Takehisa Tanaka for useful comments and helpful discussions, and Masaaki Satoh and Hideyuki Yoshida for help with the numerical simulations. We also express our appreciation to the referees for their useful comments that helped to improve this paper considerably.

Requests for reprints should be sent to Makoto Hirahara, Matsushita Research Institute Tokyo, Inc., 3-10-1, Higashimita, Tama-Ku, Kawasaki 214, Japan. Tel: +81-44-911-6351; Fax: +81-44-911-8760; E-mail: [email protected]

situations where many objects are similar to one another. This problem, as well as small storage capacity, greatly reduces the possibility of their application in the field of engineering.

Here we consider two types of correlation. One is caused by biased patterns, in each of which the number of +1 components is not equal to that of -1 components. The biased patterns, like uncorrelated patterns, are independent of one another. The other is caused by correlated patterns that are similar to each other.

Amit et al. (1987) proposed a model for storing biased patterns. When the bias of patterns is large, they are said to be sparse. Amari (1989) has shown that $\alpha_c$ increases drastically as stored patterns become sparser. However, these models do not work well if stored patterns are similar to one another.

There are also studies dealing with a special case of correlated patterns, namely hierarchically correlated patterns (Feigelman & Ioffe, 1987; Gutfreund, 1988; Hirahara, Oka, & Kindo, 1996). In the case of a two-level hierarchy, there are first- and second-level patterns, referred to as ancestors and descendants, respectively. The descendants are strongly correlated with their ancestors, and the descendants belonging to the same ancestors are also correlated with one another.


Feigelman and Ioffe (1987) proposed a model for storing such hierarchically correlated patterns. The ancestors and their descendants are embedded in the same weight matrix. They have shown that the storage capacity is small and is of the same order of magnitude as that of the Hopfield model. Gutfreund (1988) proposed a different type of model consisting of two associative memory models, each of which has N units. The first associative memory (ASM1) and the second one (ASM2) store the ancestors and their descendants, respectively. When the model receives a descendant as input, ASM1 recalls its ancestor. The recalled ancestor is projected down to ASM2, which restricts the dynamics of ASM2 and enables ASM2 to recall the descendant. The storage capacity $\alpha_c$ and the basins of attraction around the descendants strongly depend on a parameter which controls the strength of the restriction from ASM1. Although there is a range of parameter values which ensures the stability of the descendants, the range depends on the correlation between the ancestors and their descendants, and becomes narrower with increasing correlation. Hence, we must determine the parameter value carefully. With an appropriate parameter value, $\alpha_c$ is about 0.15, which also depends on the correlation between the ancestors and their descendants.

Although the above two models are very attractive in terms of organizing a hierarchical memory structure, the storage capacity $\alpha_c$ ($\approx 0.15$) is insufficient for their application in the field of engineering. To overcome this problem, we present a cascade of associative memory models, called CASM (Hirahara et al., 1996). Although CASM is similar in structure to Gutfreund's model, CASM does not have the troublesome parameter that Gutfreund's model has. The main difference is that CASM has a mechanism for generating sparse patterns, called "difference patterns", to be stored in ASM2. The difference patterns have only information on the differences between the ancestors and their descendants. They are biased patterns and become sparser with increasing correlation between the ancestors and their descendants. These characteristics of the difference patterns indicate that the storage capacity $\alpha_c$ increases with increasing correlation, and is as large as that of sparsely encoded associative memory (Amari, 1989). Although the size of the basins of attraction becomes smaller with increasing correlation, simulation results show that CASM has larger basins of attraction than both Gutfreund's model and standard associative memory storing uncorrelated patterns when the loading level $\alpha = 0.1$ (the ratio between the number of descendants to be stored and N). Furthermore, the size of the basins of attraction does not depend on the loading level $\alpha$, and is almost constant at a fixed value of the correlation.

In addition, we consider the situation where only the descendants are given and their ancestors are not. In order to cope with the situation, we extend ASM1 to perform ancestor formation, that is, automatic formation of ancestors by learning their descendants.

A procedure for generating hierarchically correlated patterns is described in Section 2. In Section 3, the learning and the recalling processes are introduced. In Section 4, ASM1 is extended to form ancestors from their descendants. To evaluate the performance of CASM, a large number of numerical simulations are presented in Section 5.

2. HIERARCHICALLY CORRELATED PATTERNS

We describe a procedure for generating a two-level hierarchical tree of correlated patterns (Gutfreund, 1988). At the first level, $p_1$ ancestors $\xi^\mu$ ($\mu = 1, \ldots, p_1$) are generated. Each component $\xi_i^\mu$ ($i = 1, \ldots, N$) takes $\pm 1$ independently with the probability

$$\mathrm{Pr}_1(\xi_i^\mu) = \tfrac{1}{2}(1 + a)\,\delta[\xi_i^\mu - 1] + \tfrac{1}{2}(1 - a)\,\delta[\xi_i^\mu + 1],$$

where $\delta[u] = 1$ for $u = 0$, and zero otherwise. The ancestors are characterized by a bias parameter $a$ ($-1 < a < 1$).

At the second level, $p_2$ descendants $\xi^{\mu\nu}$ ($\nu = 1, \ldots, p_2$) are generated for each ancestor $\xi^\mu$. Each component $\xi_i^{\mu\nu}$ ($i = 1, \ldots, N$) takes $\pm 1$ with the probability

$$\mathrm{Pr}_2(\xi_i^{\mu\nu}) = \tfrac{1}{2}(1 + \xi_i^\mu b)\,\delta[\xi_i^{\mu\nu} - 1] + \tfrac{1}{2}(1 - \xi_i^\mu b)\,\delta[\xi_i^{\mu\nu} + 1],$$

where $b$ ($0 < b < 1$) is a correlation parameter between $\xi^\mu$ and $\xi^{\mu\nu}$. This yields

$$E[\xi_i^{\mu'} \xi_i^{\mu\nu}] = \begin{cases} b & \text{if } \mu' = \mu, \\ a^2 b & \text{otherwise}, \end{cases} \qquad (1)$$

where $E[u]$ indicates the average of $u$. The generalization to any number of levels is straightforward. In order to simplify the following discussion, we assume that

$$0 < b_1 < b_2 < \cdots < b_i < b_{i+1} < \cdots < 1, \qquad (2)$$

where $b_i$ is the correlation parameter between the $i$th- and the $(i+1)$th-level patterns.
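As a concrete illustration, the two-level generation procedure above can be sketched in Python with NumPy. This is our own illustrative code, not from the original paper; the function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_ancestors(p1, N, a):
    """Each component xi_i^mu is +1 with probability (1 + a)/2."""
    return np.where(rng.random((p1, N)) < (1 + a) / 2, 1, -1)

def generate_descendants(ancestors, p2, b):
    """Each descendant component copies its ancestor's component with
    probability (1 + b)/2, i.e. Pr[xi_i^{mu nu} = +1] = (1 + xi_i^mu b)/2."""
    p1, N = ancestors.shape
    agree = np.where(rng.random((p1, p2, N)) < (1 + b) / 2, 1, -1)
    return ancestors[:, None, :] * agree

anc = generate_ancestors(20, 1000, 0.2)    # p1 = 20 ancestors, N = 1000
desc = generate_descendants(anc, 10, 0.8)  # p2 = 10 descendants each, b = 0.8

# Sample estimate of E[xi_i^mu xi_i^{mu nu}]; should be close to b = 0.8.
print((anc[:, None, :] * desc).mean())
```

Flipping each ancestor component with probability $(1-b)/2$ reproduces the conditional distribution $\mathrm{Pr}_2$ above exactly, so the empirical ancestor-descendant correlation approaches $b$ for large $N$.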

3. MODEL

We describe two levels of CASM, since the generalization to any number of levels is straightforward. The structure is shown in Figure 1, where ASM1 and ASM2 are the first and the second associative memory models, respectively. In the learning phase, ASM1 stores the ancestors $\xi^\mu$ ($\mu = 1, \ldots, p_1$), and ASM2 stores sparse patterns, called difference patterns, which have only information on the differences between the ancestors $\xi^\mu$ and their descendants $\xi^{\mu\nu}$. This sparse encoding is realised by CN1, which emits a pattern by multiplying corresponding components of $\xi^\mu$ and $\xi^{\mu\nu}$. In the recalling phase, the sparse encoding mechanism of CN1 generates a sparse pattern to be input to ASM2 by multiplying corresponding components of a pattern input to CASM


FIGURE 1. Structure of CASM. ASM1 and ASM2 are the first and the second associative memory models, respectively. ASM1 stores the ancestors $\xi^\mu$. ASM2 stores sparse difference patterns $\eta^{\mu\nu}$ generated by a sparse encoding mechanism of CN1. The difference patterns $\eta^{\mu\nu}$ contain only information on the differences between $\xi^\mu$ and their descendants $\xi^{\mu\nu}$. CN1 and CN2 have the same function, which emits a pattern made by multiplying corresponding components of two input patterns.

(ASM1) and an ancestor recalled by ASM1. After ASM2 recalls a difference pattern, CN2 produces an output pattern of CASM by multiplying corresponding components of the ancestor recalled by ASM1 and the difference pattern recalled by ASM2. Therefore, CN1 and CN2 have the same function, which emits a pattern made by multiplying corresponding components of two input patterns. In each of the learning and the recalling phases, ASM1 precedes ASM2.

3.1. ASM1

The ancestors $\xi^\mu$ ($\mu = 1, \ldots, p_1$) are stored in the weights $w_{ij}^{(1)}$ ($i = 1, \ldots, N$; $j = 1, \ldots, N$) of ASM1 as follows:

$$w_{ij}^{(1)} = \frac{1}{(N-1)(1-a^2)} \sum_\mu (\xi_i^\mu - a)(\xi_j^\mu - a). \qquad (3)$$

Let $\zeta^{(\mathrm{in})}$, $x^{(1)}(t)$ and $h^{(1)}(t)$ be a pattern to be input to CASM, the state of ASM1 at time $t$, and the inner state, respectively. Then, the dynamics of ASM1 is described by

$$x_i^{(1)}(0) = \zeta_i^{(\mathrm{in})},$$
$$h_i^{(1)}(t) = \sum_{j \neq i} w_{ij}^{(1)} \left( x_j^{(1)}(t-1) - s_1 \right) + \theta_1,$$
$$x_i^{(1)}(t) = \mathrm{sgn}[h_i^{(1)}(t)],$$

where $\mathrm{sgn}[u] = 1$ for $u \geq 0$, and $-1$ otherwise. The threshold of each unit in ASM1 is denoted by $\theta_1$, and $s_1$ is a parameter which shifts the state $x^{(1)}(t-1)$. Okada, Mimura, and Kurata (1993) have shown that the storage capacity of associative memory for storing biased patterns strongly depends on both $\theta_1$ and $s_1$. Taking this into account, we set $\theta_1 = a$ and $s_1 = a$. Iterating the above dynamics, ASM1 converges to a stable state $\bar{\zeta}^{(1)}(\zeta^{(\mathrm{in})})$.
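A minimal sketch of the covariance rule of eqn (3) and the shifted-threshold dynamics, in Python with NumPy. This is our own illustrative code under the paper's setting $\theta_1 = s_1 = a$; at low loading, a stored ancestor should be a fixed point:

```python
import numpy as np

def asm_weights(patterns, bias):
    """Covariance-rule weights as in eqn (3):
    w_ij = sum_mu (xi_i^mu - bias)(xi_j^mu - bias) / ((N-1)(1-bias^2))."""
    _, N = patterns.shape
    C = patterns - bias
    W = C.T @ C / ((N - 1) * (1 - bias ** 2))
    np.fill_diagonal(W, 0.0)           # the sum over j excludes j = i
    return W

def asm_run(W, x0, theta, s, max_steps=50):
    """Iterate x_i(t) = sgn[sum_j w_ij (x_j(t-1) - s) + theta],
    with sgn[u] = 1 for u >= 0, until a fixed point is reached."""
    x = x0.copy()
    for _ in range(max_steps):
        x_new = np.where(W @ (x - s) + theta >= 0, 1, -1)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

rng = np.random.default_rng(0)
N, p1, a = 500, 5, 0.2
anc = np.where(rng.random((p1, N)) < (1 + a) / 2, 1, -1)  # biased ancestors
W1 = asm_weights(anc, a)
recalled = asm_run(W1, anc[0], theta=a, s=a)
print(recalled @ anc[0] / N)           # overlap with the stored ancestor
```

With $p_1/N = 0.01$, the crosstalk noise is far below the signal, so the stored ancestor is recovered with overlap essentially 1.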

3.2. CN1

CN1 acts as a sparse encoder which emits a sparse pattern by multiplying corresponding components of a pattern $\zeta^{(\mathrm{in})}$ input to CASM and the stable state $\bar{\zeta}^{(1)}(\zeta^{(\mathrm{in})})$ of ASM1. In the learning phase, CN1 produces the patterns $\eta^{\mu\nu}$ to be stored in ASM2 as follows:

$$\eta_i^{\mu\nu} = \xi_i^{\mu\nu} \, \bar{\zeta}_i^{(1)}(\xi^{\mu\nu}). \qquad (4)$$

Here, we consider characteristics of the patterns $\eta^{\mu\nu}$, assuming that the storage capacity of ASM1 is larger than $p_1/N$ and the correlation $b$ is large enough. This assumption implies that $\xi^{\mu\nu}$ is in the basin of attraction around $\xi^\mu$. That is, $\bar{\zeta}^{(1)}(\xi^{\mu\nu})$ is assumed to be identical to $\xi^\mu$, which assumption is written in the form

$$\frac{1}{N} \sum_i \xi_i^\mu \, \bar{\zeta}_i^{(1)}(\xi^{\mu\nu}) = 1 \quad \text{for all } \mu \text{ and } \nu. \qquad (5)$$

Then eqn (4) can be rewritten as

$$\eta_i^{\mu\nu} = \begin{cases} 1 & \text{if } \xi_i^{\mu\nu} = \xi_i^\mu, \\ -1 & \text{otherwise}. \end{cases}$$

Each pattern $\eta^{\mu\nu}$ therefore contains only the information on the differences between $\xi^\mu$ and $\xi^{\mu\nu}$. We thereby refer to $\eta^{\mu\nu}$ as difference patterns.

From eqn (1), each component $\eta_i^{\mu\nu}$ of the difference patterns $\eta^{\mu\nu}$ ($\mu = 1, \ldots, p_1$; $\nu = 1, \ldots, p_2$) takes $\pm 1$ with the probability

$$\mathrm{Pr}_{2'}(\eta_i^{\mu\nu}) = \tfrac{1}{2}(1 + b)\,\delta[\eta_i^{\mu\nu} - 1] + \tfrac{1}{2}(1 - b)\,\delta[\eta_i^{\mu\nu} + 1],$$

which yields the following relationships:

$$E[\eta_i^{\mu\nu}] = b, \qquad (6)$$

$$E[\eta_i^{\mu\nu} \eta_i^{\mu'\nu'}] = \begin{cases} 1 & \text{if } \mu = \mu' \text{ and } \nu = \nu', \\ b^2 & \text{otherwise}. \end{cases} \qquad (7)$$

Therefore, the difference patterns are biased ones whose bias parameter is equal to the correlation $b$. If eqn (2) is satisfied, the probability that the components of difference patterns in a level are $+1$ increases as the level becomes higher. That is, difference patterns in a high level are sparser than those in a low level.

Therefore, in the learning phase, CN1 transforms the descendants $\xi^{\mu\nu}$ (correlated patterns) into the sparse difference patterns $\eta^{\mu\nu}$ (biased patterns). In the recalling phase, CN1 produces a pattern to be input to ASM2, as described in Section 3.3.
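The sparse encoding of CN1 is just a componentwise product. A quick numerical check of the bias in eqn (6), assuming the ideal case of eqn (5) in which ASM1 returns the exact ancestor (our own illustrative code):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p2, b = 2000, 5, 0.9

anc = np.where(rng.random(N) < 0.5, 1, -1)                       # one ancestor
desc = anc * np.where(rng.random((p2, N)) < (1 + b) / 2, 1, -1)  # descendants

# CN1, learning phase: eta_i = xi_i^{mu nu} * xi_i^mu, so eta_i = +1
# exactly where descendant and ancestor agree.
eta = desc * anc

print(eta.mean())   # sample estimate of E[eta_i] = b
```

The larger the correlation $b$, the closer the mean is to 1, i.e. the sparser the $-1$ components of the difference patterns become.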

3.3. ASM2

ASM2 stores the difference patterns, which are biased if eqn (5) is satisfied. Their bias is equal to the correlation $b$ between $\xi^\mu$ and $\xi^{\mu\nu}$. These sparse difference patterns $\eta^{\mu\nu}$ ($\mu = 1, \ldots, p_1$; $\nu = 1, \ldots, p_2$) are stored in the weights $w_{ij}^{(2)}$ ($i = 1, \ldots, N$; $j = 1, \ldots, N$) of ASM2 as follows:

$$w_{ij}^{(2)} = \frac{1}{(N-1)(1-b^2)} \sum_{\mu,\nu} (\eta_i^{\mu\nu} - b)(\eta_j^{\mu\nu} - b).$$

In Gutfreund's model (Gutfreund, 1988), ASM2 stores the descendants themselves, which are correlated patterns similar to one another.

The dynamics of ASM2 starts at time $t = t_c + 1$, where $t_c$ is the time when ASM1 converges to $\bar{\zeta}^{(1)}(\zeta^{(\mathrm{in})})$. Let $x^{(2)}(t)$ and $h^{(2)}(t)$ be the state of ASM2 at time $t$ ($\geq t_c + 1$) and the inner state, respectively. Then, CN1 produces the initial state $x^{(2)}(t_c + 1)$ of ASM2 given by

$$x_i^{(2)}(t_c + 1) = \zeta_i^{(\mathrm{in})} \, \bar{\zeta}_i^{(1)}(\zeta^{(\mathrm{in})}), \qquad (8)$$

following which $x^{(2)}(t)$ is changed as

$$h_i^{(2)}(t) = \sum_{j \neq i} w_{ij}^{(2)} \left( x_j^{(2)}(t-1) - s_2 \right) + \theta_2, \qquad (9)$$
$$x_i^{(2)}(t) = \mathrm{sgn}[h_i^{(2)}(t)], \qquad (10)$$

where $\theta_2$ ($= b$) is the threshold of each unit in ASM2, and $s_2$ ($= b$) is a parameter which shifts the state $x^{(2)}(t-1)$. Iterating the dynamics, ASM2 converges to a stable state $\bar{\zeta}^{(2)}(\zeta^{(\mathrm{in})})$.

3.4. CN2

From the above, the recalling process of CASM can be divided into two stages. In the first stage ($t \leq t_c$), ASM1 recalls an ancestor. In the second stage ($t > t_c$), ASM2 recalls a difference pattern. In accordance with the two recalling stages, CN2 produces the state $x(t)$ of CASM as follows:

$$x_i(t) = \begin{cases} x_i^{(1)}(t) & \text{if } t \leq t_c, \\ \bar{\zeta}_i^{(1)}(\zeta^{(\mathrm{in})}) \, x_i^{(2)}(t) & \text{otherwise}. \end{cases}$$

When ASM2 converges to $\bar{\zeta}^{(2)}(\zeta^{(\mathrm{in})})$, CASM also converges to the stable state $\zeta^{(\mathrm{out})}(\zeta^{(\mathrm{in})})$ given by

$$\zeta_i^{(\mathrm{out})}(\zeta^{(\mathrm{in})}) = \bar{\zeta}_i^{(1)}(\zeta^{(\mathrm{in})}) \, \bar{\zeta}_i^{(2)}(\zeta^{(\mathrm{in})}).$$

For example, if CASM receives $\xi^{\mu\nu}$ as input, ASM1 and ASM2 will recall $\xi^\mu$ and $\eta^{\mu\nu}$, respectively. Then, $\zeta^{(\mathrm{out})}(\xi^{\mu\nu})$ becomes

$$\zeta_i^{(\mathrm{out})}(\xi^{\mu\nu}) = \xi_i^\mu \, \eta_i^{\mu\nu} = \xi_i^{\mu\nu}.$$

Therefore, CASM recalls $\xi^{\mu\nu}$ through CN2.
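Because all components are $\pm 1$, CN2 exactly inverts the encoding of CN1 whenever ASM1 and ASM2 recall $\xi^\mu$ and $\eta^{\mu\nu}$: multiplying by the ancestor twice restores the descendant. A short numerical check of this identity (our own illustration, assuming ideal recall by both memories):

```python
import numpy as np

rng = np.random.default_rng(2)
N, b = 1000, 0.8

anc = np.where(rng.random(N) < 0.5, 1, -1)                 # xi^mu
desc = anc * np.where(rng.random(N) < (1 + b) / 2, 1, -1)  # xi^{mu nu}
eta = desc * anc                                            # difference pattern

# Ideal recall: ASM1 -> anc, ASM2 -> eta; CN2 multiplies componentwise.
out = anc * eta
assert np.array_equal(out, desc)   # xi^mu * eta^{mu nu} = xi^{mu nu}
```

The identity holds deterministically, since $\xi_i^\mu \eta_i^{\mu\nu} = (\xi_i^\mu)^2 \xi_i^{\mu\nu} = \xi_i^{\mu\nu}$.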

3.5. Storage Capacity

The storage capacity $\alpha_c$ is roughly estimated using signal-to-noise ratio analysis, assuming that eqn (5) is satisfied. In accordance with Gutfreund (1988), the loading level $\alpha$ is defined by $\alpha = p_1 p_2 / N$, although CASM, like Gutfreund's model, consists of $2N$ units. Then, $\alpha_c$ is defined as the maximum value of $\alpha$ below which all descendants are stored as equilibria.

When $\xi^{11}$ is input to CASM, the initial state $x^{(2)}(t_c + 1)$ of ASM2 becomes

$$x_i^{(2)}(t_c + 1) = \eta_i^{11},$$

since we assume that eqn (5) is satisfied. Then, the inner state at time $t = t_c + 2$ is written in the form

$$h_i^{(2)}(t_c + 2) = S_i + R_i,$$
$$S_i = \eta_i^{11},$$
$$R_i = \frac{1}{(N-1)(1-b^2)} \sum_{\mu,\nu}{}' \, (\eta_i^{\mu\nu} - b) \sum_{j \neq i} (\eta_j^{\mu\nu} - b)(\eta_j^{11} - b),$$

where $\sum_{\mu,\nu}'$ indicates that the term satisfying $\mu = 1$ and $\nu = 1$ is excluded from $\sum_{\mu,\nu}$. $S_i$ and $R_i$ indicate the signal and the crosstalk noise, respectively. From eqns (6) and (7), the average noise $E[R_i]$ is zero, and the variance $V[R_i]$ is about $p_1 p_2 (1 - b^2)/N$. These results imply that $\alpha_c$ increases with $b$ as follows:

$$\alpha_c = \frac{\alpha_c^0}{1 - b^2}, \qquad (11)$$

where $\alpha_c^0$ is about 0.15, the storage capacity of associative memory for storing uncorrelated patterns. If we use Amari's method (Amari, 1989), we can prove that $\alpha_c$ is as large as that of Amari's sparsely encoded associative memory. CASM thus has a larger storage capacity than Gutfreund's model (Gutfreund, 1988).
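Eqn (11) is easy to evaluate numerically. Taking $\alpha_c^0 \approx 0.15$, the predicted capacity grows quickly with $b$ (this is the rough signal-to-noise estimate only, not the simulated values reported in Section 5):

```python
# alpha_c = alpha_c0 / (1 - b^2), eqn (11)
alpha_c0 = 0.15
for b in (0.0, 0.5, 0.8, 0.9):
    print(f"b = {b}: alpha_c ~ {alpha_c0 / (1 - b ** 2):.3f}")
```

For instance, at $b = 0.8$ the estimate gives $\alpha_c \approx 0.15/0.36 \approx 0.42$, already well above the $\approx 0.15$ of the standard model.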

4. ANCESTOR FORMATION

Above, we assumed that the ancestors were given explicitly. In this section, we consider the situation where only the descendants are given and their ancestors are not. In order to cope with this situation, ASM1 is extended to form the ancestors from their descendants.

Here, we consider the situation where many patterns are distributed around some representative patterns. Automatic formation of the representatives, through the learning of the patterns generated from the above distribution, is called "concept formation". Amari (1977) proposed a model for concept formation which learns an infinite number of patterns one by one, based on a covariance learning rule with a weight decay term. If we regard the ancestors as the representatives and their descendants as distributed around them, we can apply Amari's model to ASM1. However, it is difficult to form the ancestors correctly when the number of descendants for each ancestor, $p_2$, is small. This is because the descendants belonging to the same ancestor are not always distributed uniformly around it if $p_2$ is small. Hence, we use, not the learning algorithm of Amari's model, but the following simpler one:

$$w_{ij}^{(1)} = \frac{1}{(N-1)(1-a^2)} \sum_{\mu,\nu} (\xi_i^{\mu\nu} - a)(\xi_j^{\mu\nu} - a). \qquad (12)$$


We refer to the patterns formed by ASM1 as concept patterns $\xi'^\mu$. Even if $\xi'^\mu$ is not identical to the corresponding ancestor $\xi^\mu$, CASM works well, as shown in Section 5.
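A sketch of eqn (12) in NumPy (our own illustrative code, with parameters chosen for a small demonstration): ASM1 stores all descendants with the covariance rule, and probing it with one descendant drives the dynamics to a concept pattern close to the unseen ancestor.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p1, p2, a, b = 1000, 3, 10, 0.2, 0.8

anc = np.where(rng.random((p1, N)) < (1 + a) / 2, 1, -1)
desc = anc[:, None, :] * np.where(rng.random((p1, p2, N)) < (1 + b) / 2, 1, -1)

# Eqn (12): store ALL descendants with the covariance rule; the
# superposition of each ancestor's p2 descendants forms a "concept".
flat = desc.reshape(-1, N) - a
W = flat.T @ flat / ((N - 1) * (1 - a ** 2))
np.fill_diagonal(W, 0.0)

x = desc[0, 0].copy()                 # probe ASM1 with one descendant
for _ in range(30):
    x_new = np.where(W @ (x - a) + a >= 0, 1, -1)
    if np.array_equal(x_new, x):
        break
    x = x_new

overlap = x @ anc[0] / N              # concept vs. the unseen ancestor
print(overlap)
```

Because the $p_2$ descendants of one ancestor overlap pairwise with strength $b^2$, their stored terms superpose into an attractor near the ancestor, which is the intuition behind the simple rule of eqn (12).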

5. NUMERICAL SIMULATIONS

In order to confirm the behaviour of CASM, we carried out four kinds of numerical simulations. Since $\eta^{\mu\nu}$ is sparse, we define "successful recall" so as to depend on its sparseness as follows:

$$\frac{1}{2N} \sum_i \left| \xi_i^{\mu\nu} - \zeta_i^{(\mathrm{out})}(\tilde{\xi}^{\mu\nu}) \right| \leq 0.1 \, \frac{N^{(-)}}{N},$$

where $N^{(-)}$ ($= N(1-b)/2$) is the number of $-1$ components in $\eta^{\mu\nu}$, and $\tilde{\xi}^{\mu\nu}$ is a noisy input pattern originating from $\xi^{\mu\nu}$. The relationship between $\xi^{\mu\nu}$ and $\tilde{\xi}^{\mu\nu}$ is expressed by the initial overlap $m(0)$:

$$m(0) = \frac{1}{N} \sum_i \xi_i^{\mu\nu} \tilde{\xi}_i^{\mu\nu}.$$
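Noisy inputs with a prescribed expected initial overlap $m(0)$ can be generated by independent componentwise flips. The paper does not spell out its noise procedure, so the following helper is one standard choice (illustrative code of our own):

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_input(pattern, m0, rng):
    """Keep each component with probability (1 + m0)/2, flip it otherwise,
    so the expected overlap with `pattern` is m0."""
    keep = np.where(rng.random(pattern.size) < (1 + m0) / 2, 1, -1)
    return pattern * keep

desc = np.where(rng.random(10_000) < 0.5, 1, -1)
probe = noisy_input(desc, 0.6, rng)
print(desc @ probe / desc.size)   # close to m(0) = 0.6
```

Each kept component contributes $+1/N$ and each flipped one $-1/N$ to the overlap, giving the expectation $(1+m_0)/2 - (1-m_0)/2 = m_0$.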

5.1. Simulation 1: Storage Capacity

The dependence of the storage capacity $\alpha_c$ on the correlation $b$ was examined by inputting descendants $\xi^{\mu\nu}$ ($m(0) = 1$), where $N = 1000$, $p_1 = 20$, and $a = 0.2$. In this simulation, $\alpha_c$ was defined as the maximum value of $\alpha$ below which the average probability of successful recall over 10 trials was above 0.99. Each trial involved presenting every input pattern once to CASM, after which new patterns were generated for the next trial. Furthermore, ancestors were given explicitly, so that ASM1 stored them in the form of eqn (3).

The results are shown in Figure 2, where the horizontal and vertical axes indicate the correlation $b$ and the storage capacity $\alpha_c$, respectively. By obtaining data from Figure 1 in Gutfreund (1988), we also plotted the storage capacity of Gutfreund's model, indicated by "+". CASM therefore has a larger storage capacity than Gutfreund's model. Furthermore, $\alpha_c$ increases sharply with $b$ up to $b = 0.96$, which almost agrees with our prediction in eqn (11). When $b > 0.96$, $\alpha_c$, on the contrary, decreases with increasing $b$, which is caused by the small value of $N$. In the case of $N = 2000$, we observed a similar phenomenon where $b > 0.97$.

FIGURE 2. Dependence of the storage capacity $\alpha_c$ on the correlation $b$, where $N = 1000$, $p_1 = 20$, and $a = 0.2$. The storage capacity of Gutfreund's model (Gutfreund, 1988) is indicated by "+". CASM has larger storage capacity than Gutfreund's model. The storage capacity $\alpha_c$ increases with $b$.

5.2. Simulation 2: Recalling Process

As in Amari and Maginu (1988), we demonstrated the dynamical process of recalling the descendant $\xi^{11}$ by inputting $\tilde{\xi}^{11}$, where $N = 1000$, $p_1 = 20$, $a = 0.2$, and $b = 0.8$. In this simulation, ancestors were given explicitly, so that ASM1 stored them in the form of eqn (3). In order to analyze the recalling process of CASM, we introduced the following three kinds of overlaps:

$$m_1(t) = \frac{1}{N} \sum_i \xi_i^{11} x_i^{(1)}(t),$$
$$m_2(t) = \frac{1}{N} \sum_i \eta_i^{11} x_i^{(2)}(t) \quad (t \geq t_c + 1),$$
$$m(t) = \frac{1}{N} \sum_i \xi_i^{11} x_i(t) = \begin{cases} m_1(t) & \text{if } t \leq t_c, \\ m_2(t) & \text{otherwise}. \end{cases}$$

The overlap between $\xi^{11}$ and the state $x^{(1)}(t)$ of ASM1 is indicated by $m_1(t)$. If the stable state $\bar{\zeta}^{(1)}(\tilde{\xi}^{11})$ of ASM1 is identical to $\xi^1$, $m_2(t)$ indicates the overlap between the difference pattern $\eta^{11}$ and the state $x^{(2)}(t)$ of ASM2. The overlap between $\xi^{11}$ and the state $x(t)$ of CASM is denoted by $m(t)$.

Figure 3 demonstrates the dynamical processes of recalling $\xi^{11}$, where $p_2 = 10$ ($\alpha = 0.2 < \alpha_c$). The time courses of $m_1(t)$ are shown in Figure 3(a). Because $E[\xi_i^{11} \xi_i^1] = b$, $m_1(t)$ approached $b$ ($= 0.8$). The critical overlap $m_{1c}$ of ASM1 was 0.2, which was the minimum value of $m(0)$ above which ASM1 recalled the ancestor $\xi^1$ correctly. The results show that ASM1 has a large basin of attraction around $\xi^1$, which supports our assumption eqn (5).

After ASM1 converges to $\xi^1$, the dynamics of ASM2 starts at time $t = t_c + 1$. The time courses of $m_2(t)$ are shown in Figure 3(b). Because of

$$m_2(t_c + 1) = \frac{1}{N} \sum_i \eta_i^{11} \, \tilde{\xi}_i^{11} \, \bar{\zeta}_i^{(1)}(\tilde{\xi}^{11}) = \frac{1}{N} \sum_i \xi_i^{11} \tilde{\xi}_i^{11},$$

the first overlap $m_2(t_c + 1)$ is equal to $m(0)$. The critical overlap $m_{2c}$ of ASM2 was 0.5. When $m(0) < m_{2c}$, $m_2(t)$ approached 0.8. This is because all components of the stable state $\bar{\zeta}^{(2)}(\tilde{\xi}^{11})$ of ASM2 take $+1$. These phenomena are useful for determining whether the recall in the second level is successful or not. We have observed such

FIGURE 3. Dynamical process of recalling descendant $\xi^{11}$, where $N = 1000$, $p_1 = 20$, $p_2 = 10$, $a = 0.2$, and $b = 0.8$ ($\alpha = 0.2$). (a), (b) and (c) show the time courses of the overlap $m_1(t)$ between $\xi^{11}$ and $x^{(1)}(t)$, the overlap $m_2(t)$ between $\eta^{11}$ and $x^{(2)}(t)$, and the overlap $m(t)$ between $\xi^{11}$ and $x(t)$, respectively. The critical overlaps of ASM1 and ASM2 were $m_{1c} = 0.2$ and $m_{2c} = 0.5$, respectively. When $m(0) \geq m_{2c}$, $\xi^{11}$ is recalled correctly. If $m_{1c} \leq m(0) < m_{2c}$, CASM recalls $\xi^1$.

phenomena through a large number of our simulations with $\alpha < \alpha_c$ and $m(0) < m_{2c}$.

Figure 3(c) shows the time courses of $m(t)$, which are given by connecting the curves in Figure 3(a) to the corresponding ones in Figure 3(b) at time $t = t_c + 1$. The results show that CASM has the following three cases of recall. In the first case ($m(0) \geq m_{2c}$), the descendant $\xi^{11}$ is recalled correctly. In the second case ($m_{1c} \leq m(0) < m_{2c}$), CASM recalls the ancestor $\xi^1$. In

FIGURE 4. Dynamical process of recalling descendant $\xi^{11}$, where $N = 1000$, $p_1 = 20$, $p_2 = 20$, $a = 0.2$ and $b = 0.8$ ($\alpha = 0.4$). The curves indicate the time course of the overlap $m(t)$ between $\xi^{11}$ and $x(t)$. The loading level $\alpha$ ($= 0.4$) was over $\alpha_c$, since CASM failed to recall $\xi^{11}$ even when $m(0) = 1$.

the final case ($m(0) < m_{1c}$), CASM (ASM1) fails to recall.

Figure 4 shows the time courses of $m(t)$, where $p_2 = 20$. In this simulation, the loading level $\alpha$ was 0.4. From the simulation results shown in Figure 2, the storage capacity $\alpha_c$ at $b = 0.8$ was about 0.26, so the loading level $\alpha$ ($= 0.4$) was over $\alpha_c$. Actually, CASM failed to recall $\xi^{11}$ even if the input pattern $\tilde{\xi}^{11}$ was identical to the descendant $\xi^{11}$ ($m(0) = 1$). The critical overlap $m_{1c}$ of ASM1 was 0.2, above which the ancestor $\xi^1$ was recalled successfully. Hence, the recall failures in this simulation were due to overload in ASM2. In the case of overload in ASM1, CASM does not work at all.

5.3. Simulation 3: Basins of Attraction

We examined the dependence of the basins of attraction around descendants on the correlation $b$, where $N = 1000$, $p_1 = 20$, and $a = 0.2$. In this simulation, ancestors were not given, so that ASM1 automatically formed concept patterns by learning their descendants, as in eqn (12). Figures 5(a) and 5(b) show the results for $p_2 = 10$ ($\alpha = 0.2$) and $p_2 = 15$ ($\alpha = 0.3$), respectively. Each surface represents the average probability of successful recall over 10 trials as a function of the initial overlap $m(0)$ and the correlation $b$, where each dot indicates that the average probability of successful recall is above 0.99.

In Figure 5(a) ($\alpha = 0.2$), the basins of attraction become larger as $b$ decreases to $b = 0.76$. When $b < 0.76$, there are no dots on the surface. A similar tendency is observed at $b = 0.86$ in Figure 5(b) ($\alpha = 0.3$). These results are explained as follows. As the correlation $b$ decreases, the sparseness of each difference pattern decreases, and the amount of information contained in it increases, which reduces the amount of relative noise input to ASM2. Hence, the basins of attraction become larger with decreasing correlation.

Page 7: Associative memory with a sparse encoding mechanism for storing correlated patterns

FIGURE 5. Dependence of the basins of attraction around descendants on the correlation $b$, where $N = 1000$, $p_1 = 20$, and $a = 0.2$: (a) $p_2 = 10$ ($\alpha = 0.2$); (b) $p_2 = 15$ ($\alpha = 0.3$). Each surface represents the average probabilities of successful recall, where each dot indicates that the average probability is above 0.99. In (a), the basins of attraction become larger as $b$ decreases until $b = 0.76$. When $b < 0.76$, there are no dots on the surface. A similar tendency is observed at $b = 0.86$ in (b). When $b \geq 0.86$, there are no significant differences between the two surfaces in (a) and (b). This means that the size of the basins of attraction does not depend on the loading level $\alpha$, and is almost constant at a fixed value of $b$.

However, this enlargement of the size of the basins of attraction is limited, since αc decreases with decreasing correlation. Once α exceeds αc, a further decrease in b sharply reduces the size of the basins of attraction. We should also note that there are no significant differences between the two surfaces in Figures 5(a) and 5(b) when b ≥ 0.86. Thus, when α is smaller than αc, the size of the basins of attraction does not depend on α, and is almost constant at a fixed value of the correlation b.

During the simulations, we have also calculated the overlap between the concept patterns ζ^μ and the corresponding ancestors ξ^μ, where ζ^μ was defined as the stable state x̂^(1)(ξ^μ) of ASM1. Figure 6 shows the average overlap between ζ^μ and ξ^μ over 10 trials as a function of b when P2 = 10 and 15. The results show that ζ^μ approaches ξ^μ as the number of descendants for each ancestor, P2, increases. This is because the average pattern of the descendants ξ^μν belonging to the same ancestor ξ^μ approaches ξ^μ as P2 increases. An increase in the correlation b also shows a similar tendency, since


FIGURE 6. Performance of ASM1, which automatically forms concept patterns. The horizontal and vertical axes indicate the correlation b and the average overlap between ζ^μ and ξ^μ, respectively. Each concept ζ^μ approaches the corresponding ancestor ξ^μ as P2 increases. A similar tendency is observed as b increases.

the variance among the descendants belonging to the same ancestor decreases with increasing b.

Figures 5 and 6 show that, even if ζ^μ is not identical to ξ^μ, CASM recalls a descendant correctly when m(0) is large enough. This is because each difference pattern η^μν defined by eqn (4) contains information on the differences between the descendant ξ^μν and the concept pattern ζ^μ, which in turn includes the differences between ξ^μν and the ancestor ξ^μ. Therefore, there is no great need for CASM to form the ancestors correctly. The necessary function of ASM1 is to form average patterns, each of which has common properties among the descendants ξ^μν belonging to the same ancestor ξ^μ. However, the differences between ζ^μ and ξ^μ cause the difference patterns η^μν to have different bias values, so that αc might be reduced with increasing differences.
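The "necessary function" of ASM1 described above can be sketched as a majority vote over the descendants. This is an illustrative stand-in for the learning rule of eqn (12), not the paper's implementation; the descendant construction (each bit copies the ancestor with probability (1 + b)/2) is an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P2, b = 1000, 15, 0.8

ancestor = rng.choice([-1, 1], size=N)
# Descendants: each bit copies the ancestor bit with probability (1 + b)/2
descendants = np.where(rng.random((P2, N)) < (1 + b) / 2, ancestor, -ancestor)

# Average (majority-vote) pattern: the role the text assigns to ASM1
concept = np.sign(descendants.sum(axis=0))   # P2 odd, so no ties occur

# Difference patterns carry only the disagreements with the concept
diffs = np.where(descendants != concept, descendants, 0)

print("concept-ancestor overlap:", np.mean(concept * ancestor))
print("mean difference density :", np.mean(diffs != 0))
```

Even when the concept is not identical to the ancestor, the difference patterns still encode each descendant relative to the concept, which is why exact ancestor formation is not required.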

5.4. Simulation 4: Performance Comparison

Gutfreund (1988) has performed numerical simulations to evaluate the basins of attraction around descendants, where N = 1000, P1 = 10, P2 = 10, a = 0, and b = 0.7 (α = 0.1). In the simulations, it was assumed that ancestors had already been recalled. In order to compare the size of the basins of attraction in Gutfreund's model and CASM, we have performed simulations under the same conditions. The results are shown in Figure 7, where the horizontal and vertical axes indicate the initial overlap m(0) and the average probability of successful recall over 10 trials, respectively. The broken line represents the performance of Gutfreund's model, where data were obtained from Figure 4 in Gutfreund (1988). As discussed earlier, Gutfreund's model has a troublesome parameter on which the basins of attraction as well as the storage capacity depend. The broken line shown in



FIGURE 7. Performance comparisons between CASM (solid line) and Gutfreund's model (dotted line), where P1 = 10, P2 = 10, a = 0.0, and b = 0.7 (α = 0.1). The horizontal and vertical axes indicate the initial overlap m(0) and the average probability of successful recall, respectively. The dashed line shows the performance of the standard associative memory which stores uncorrelated patterns when α = 0.1. CASM has larger basins of attraction than both Gutfreund's model and the standard one when α = 0.1.

Figure 7 is the best result among the simulations with different values of the parameter. On the other hand, CASM does not have such a troublesome parameter. Figure 7 indicates that CASM (solid line) has the advantage of larger basins of attraction than Gutfreund's model, which was also supported by the results with N = 500, P1 = 5, P2 = 10, a = 0, and b = 0.5 (Fig. 3 in Gutfreund (1988)).

Furthermore, we carried out additional simulations to compare CASM with a standard associative memory model of N (= 1000) units storing P1P2 (= 100) uncorrelated patterns (α = 0.1). The comparison is somewhat impractical, since they store different patterns and have different model structures. However, we consider that the comparison serves as a good reference, and it is useful in evaluating the size of the basins of attraction in CASM. Since a sparse difference pattern has a smaller amount of information than an uncorrelated one, the relative noise input to ASM2 is greater than that input to the standard model. Hence, we had predicted that the input noise would strongly affect the behaviour of ASM2, and that the standard model would have larger basins of attraction than CASM. Contrary to the prediction, the simulation results show the superiority of CASM. The dashed line in Figure 7 represents the performance of the standard model. When α = 0.1, CASM has larger basins of attraction than the standard model.

Note that these advantages of CASM over Gutfreund's model and the standard one when α = 0.1 do not mean that CASM always has larger basins of attraction than the others. As mentioned in Simulation 3, CASM has the characteristic that the basins of attraction do not depend on α, and the size of the basins at a fixed value of b is almost constant (see Figure 5). When we performed simulations with b = 0.7 and various α ranging from 0.02 to 0.2, the critical overlaps were almost constant (0.45-0.5). We could not observe any significant dependence of the size of the basins of attraction on α. For the standard model, we know that the basins of attraction strongly depend on the loading level α, and become larger as α decreases. When α = 0.08, the critical overlap in the standard model is known to be about 0.3 (Amari & Maginu, 1988; Okada, 1995), so that the standard model might have larger basins of attraction than CASM when α = 0.08.
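The basin-size measurements above depend on preparing noisy inputs with a prescribed initial overlap m(0) and checking whether recall succeeds. A minimal sketch of that protocol for the standard model (a toy synchronous Hopfield-style recall, used here as an assumed stand-in for the full CASM dynamics of eqns (9) and (10)):

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_input(pattern, m0):
    """Flip a fraction (1 - m0)/2 of the bits so that the
    initial overlap with `pattern` is approximately m0."""
    n_flip = int(round(pattern.size * (1.0 - m0) / 2.0))
    idx = rng.choice(pattern.size, size=n_flip, replace=False)
    noisy = pattern.copy()
    noisy[idx] *= -1
    return noisy

def recall(W, x, steps=20):
    """Synchronous updates until a fixed point (or step limit)."""
    for _ in range(steps):
        x_new = np.sign(W @ x)
        x_new[x_new == 0] = 1
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

# Standard model: N units storing P uncorrelated patterns (alpha = P/N = 0.05)
N, P = 500, 25
patterns = rng.choice([-1, 1], size=(P, N))
W = (patterns.T @ patterns) / N      # Hebbian weights
np.fill_diagonal(W, 0.0)

for m0 in (0.9, 0.7, 0.5):
    trials, ok = 20, 0
    for _ in range(trials):
        mu = rng.integers(P)
        final = recall(W, noisy_input(patterns[mu], m0))
        ok += np.mean(final * patterns[mu]) > 0.99
    print(f"m(0)={m0:.1f}  success rate={ok / trials:.2f}")
```

Sweeping m(0) downward until the success rate collapses locates the critical overlap, the quantity reported in this section.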

It is possible that also in Gutfreund's model the basins of attraction become larger as α decreases. At present, we do not know how the size of the basins of attraction in Gutfreund's model depends on α, so that we cannot conclude that CASM always has larger basins of attraction than Gutfreund's model. However, we can at least say that CASM has an advantage over Gutfreund's model, since Gutfreund's model, as discussed above, has a parameter on which the size of the basins of attraction as well as the storage capacity strongly depend. To have large basins of attraction in Gutfreund's model, we must determine the parameter value carefully. CASM does not have such a parameter, which is one of its advantages.

6. DISCUSSION

Amari (1989) has shown that sparsely encoded associative memory with a mechanism to keep activity constant has a large basin of attraction (one-step recalling region) around each stored pattern if the activity of a noisy input pattern is equal to those of the stored patterns. It was also proved that the size of the basins of attraction depends on the ratio between α and αc. Although ASM2 in CASM stores sparse difference patterns, ASM2 does not have such an activity control mechanism as Amari's model has (see eqns (9) and (10)). Furthermore, as easily understood from eqn (8), the activity of a noisy pattern input to ASM2 is, in general, not equal to those of the difference patterns to be stored. However, the basins of attraction in CASM (ASM2) are considered to be sufficiently large, since the simulation results (Figure 7) show that CASM has larger basins of attraction than both Gutfreund's model and the standard one when α = 0.1. Moreover, CASM (ASM2) has the characteristic that the basins of attraction do not depend so much on the ratio between α and αc (see Figure 5); their size at a fixed value of b is almost constant. These experimental results obtained by CASM do not agree with the characteristics of Amari's model. We do not know the reasons for this at present; we should further analyze the dynamics of ASM2 given by eqns (9) and (10), and clarify the differences in the dynamics between Amari's model and ASM2 in CASM.

Gutfreund (1988) pointed out that his model is uneconomical in terms of the number of units. Let αc^G (≈ 0.15) be the storage capacity of Gutfreund's

model. In order to store P1P2 descendants, ASM2 in Gutfreund's model needs more than P1P2/αc^G units. Since ASM1 and ASM2 consist of the same number of units, the total number of units that Gutfreund's model needs is more than 2P1P2/αc^G. In the same way, CASM needs more than 2P1P2/αc units in order to store P1P2 descendants. Since αc given by eqn (11) is larger than αc^G (≈ 0.15), CASM needs a smaller number of units than Gutfreund's model, and is more economical. This advantage of CASM comes out more clearly as the number of levels of the hierarchical tree of correlated patterns increases. In general, the number of descendants increases exponentially with the number of levels. CASM can absorb this exponential increase to a certain extent, since the storage capacity of an associative memory at a high level increases if eqn (2) is satisfied. In Gutfreund's model, however, the associative memory models at all levels have almost the same storage capacity. Therefore, CASM is more economical than Gutfreund's model.
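The unit-count comparison can be checked with a small worked example. The numbers are illustrative only: αc^G ≈ 0.15 is taken from the text, while αc = 0.3 is an assumed stand-in for the sparse-encoding capacity of eqn (11), which is not reproduced here.

```python
# Worked check of the unit-count comparison.
# alpha_c_G ~ 0.15 from Gutfreund (1988); alpha_c = 0.3 is an ASSUMED
# illustrative value for the sparse-encoding capacity of eqn (11).
P1, P2 = 20, 10
alpha_c_G = 0.15
alpha_c = 0.30                    # assumed; grows with the correlation b

units_gutfreund = 2 * P1 * P2 / alpha_c_G   # two memories of equal size
units_casm = 2 * P1 * P2 / alpha_c

print(f"Gutfreund's model needs > {units_gutfreund:.0f} units")
print(f"CASM needs            > {units_casm:.0f} units")
```

Since both models double the unit count (ASM1 plus ASM2), the whole advantage comes from αc exceeding αc^G, and it compounds level by level in a deeper hierarchy.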

In conclusion, we have proposed a cascade of associative memory models (CASM) with a sparse encoding mechanism for storing hierarchically correlated patterns. CASM has a larger storage capacity than Gutfreund's model (Gutfreund, 1988). When the loading level α = 0.1, we also found that CASM had larger basins of attraction than Gutfreund's model and standard models that store uncorrelated patterns. Moreover, Gutfreund's model has a parameter on which the storage capacity and the basins of attraction strongly depend, so that the value of the parameter must be determined carefully. In contrast, CASM does not have such a parameter. These advantages of CASM are due to the incorporation of the second associative memory model (ASM2), which stores sparse difference patterns produced by the sparse encoding mechanism (CN1). The difference patterns have only the information on the differences between ancestors and their descendants, and become sparser with increasing correlation between the ancestors and their descendants. The advantages of CASM have been confirmed by a large number of numerical simulations. The simulation results and theoretical considerations reveal the following characteristics of CASM.

• The storage capacity increases with increasing correlation between the ancestors and their descendants, and is as large as that of sparsely encoded associative memory.

• The size of the basins of attraction around descendants becomes larger with decreasing correlation.

• The basins of attraction do not depend on the loading level. The size of the basins of attraction is almost constant at a fixed value of the correlation.

In future work, as discussed above, we will analyze the dynamics of ASM2 in order to clarify the differences in characteristics between Amari's model (Amari, 1989)

and ASM2. We also plan to improve CASM (ASM2) so as to have larger basins of attraction. Furthermore, CASM will be extended to successively recall the descendants belonging to the same ancestor.

REFERENCES

Amari, S. (1977). Neural theory of association and concept-formation. Biological Cybernetics, 26, 175-185.

Amari, S. (1989). Characteristics of sparsely encoded associative memory. Neural Networks, 2, 451-457.

Amari, S., & Maginu, K. (1988). Statistical neurodynamics of associative memory. Neural Networks, 1, 63-73.

Amit, D.J., Gutfreund, H., & Sompolinsky, H. (1987). Information storage in neural networks with low levels of activity. Physical Review A, 35, 2293-2303.

Anderson, J.A. (1972). A simple neural network generating interactive memory. Mathematical Biosciences, 14, 197-220.

Feigelman, M.V., & Ioffe, L.B. (1987). The augmented models of associative memory: asymmetric interaction and hierarchy of patterns. International Journal of Modern Physics B, 1, 51-68.

Gutfreund, H. (1988). Neural networks with hierarchically correlated patterns. Physical Review A, 37, 570-577.

Hirahara, M., Oka, N., & Kindo, T. (1996). A cascade of associative memories for storing hierarchically correlated patterns. Proceedings of 1996 World Congress on Neural Networks, San Diego, California, 753-756.

Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79, 2554-2558.

Kohonen, T. (1972). Correlation matrix memories. IEEE Transactions on Computers, C-21, 353-359.

Nakano, K. (1972). Associatron: a model of associative memory. IEEE Transactions on Systems, Man and Cybernetics, SMC-2, 380-388.

Okada, M. (1995). A hierarchy of macrodynamical equations for associative memory. Neural Networks, 8, 833-838.

Okada, M., Mimura, K., & Kurata, K. (1993). Sparsely encoded associative memory: static synaptic noise and static threshold noise. Proceedings of 1993 International Joint Conference on Neural Networks, Nagoya, 2624-2627.

NOMENCLATURE

N: number of units in an associative memory model
P1: number of ancestors
P2: number of descendants for each ancestor
ξ^μ: ancestor
ξ^μν: descendant of ξ^μ
η^μν: difference pattern
ζ^μ: concept pattern corresponding to ξ^μ
a: bias parameter of ξ^μ
b: correlation parameter between ξ^μ and ξ^μν
b_i: correlation parameter between the ith- and the (i+1)th-level patterns
w^(1): weight in ASM1
w^(2): weight in ASM2
x^(1)(t): state of ASM1 at time t
x^(2)(t): state of ASM2 at time t
x(t): state of CASM at time t
h^(1)(t): inner state of ASM1 at time t
h^(2)(t): inner state of ASM2 at time t
θ1: threshold of units in ASM1
θ2: threshold of units in ASM2
S1: parameter which shifts x^(1)(t)
S2: parameter which shifts x^(2)(t)
ξ: pattern to be input to CASM
ξ^(in): noisy input pattern originating from ξ^μν
x̂^(1)(ξ^(in)): stable state of ASM1 when ξ^(in) is input to CASM
x̂^(2)(ξ^(in)): stable state of ASM2 when ξ^(in) is input to CASM
x̂(ξ^(in)): stable state of CASM when ξ^(in) is input to CASM
m1(t): overlap between ξ^μ and x^(1)(t) at time t
m2(t): overlap between η^μν and x^(2)(t) at time t
m(t): overlap between ξ^μν and x(t) at time t
m1c: critical overlap in ASM1
m2c: critical overlap in ASM2
α: loading level (= P1P2/N)
αc: storage capacity
t0: time when ASM1 converges to a stable state
N(−): number of −1 components in a difference pattern