Bidirectional Associative Memory


IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 18, NO. 1, JANUARY/FEBRUARY 1988

Bidirectional Associative Memories

BART KOSKO, MEMBER, IEEE

Abstract - Stability and encoding properties of two-layer nonlinear feedback neural networks are examined. Bidirectionality, forward and backward information flow, is introduced in neural nets to produce two-way associative search for stored associations $(A_i, B_i)$. Passing information through $M$ gives one direction; passing it through its transpose $M^T$ gives the other. A bidirectional associative memory (BAM) behaves as a heteroassociative content-addressable memory (CAM), storing and recalling the vector pairs $(A_1, B_1), \ldots, (A_m, B_m)$, where $A_i \in \{0,1\}^n$ and $B_i \in \{0,1\}^p$. We prove that every $n$-by-$p$ matrix $M$ is a bidirectionally stable heteroassociative CAM for both binary/bipolar and continuous neurons $a_i$ and $b_j$. When the BAM neurons are activated, the network quickly evolves to a stable state of two-pattern reverberation, or resonance. The stable reverberation corresponds to a system energy local minimum. Heteroassociative information is encoded in a BAM by summing correlation matrices. The BAM storage capacity for reliable recall is roughly $m < \min(n, p)$: no more heteroassociative pairs can be reliably stored and recalled than the lesser of the dimensions of the pattern spaces $\{0,1\}^n$ and $\{0,1\}^p$. The

Appendix shows that it is better on average to use bipolar $\{-1,1\}$ coding than binary $\{0,1\}$ coding of heteroassociative pairs $(A_i, B_i)$. BAM encoding and decoding are combined in the adaptive BAM, which extends global bidirectional stability to real-time unsupervised learning. Temporal patterns

$(A_1, \ldots, A_m)$ are represented as ordered lists of binary/bipolar vectors and stored in a temporal associative memory (TAM) $n$-by-$n$ matrix $M$ as a limit cycle of the dynamical system. Forward recall proceeds through $M$, backward recall through $M^T$. Temporal patterns are stored by summing contiguous bipolar correlation matrices, $X_1^T X_2 + \cdots + X_{m-1}^T X_m$, generalizing the BAM storage procedure. This temporal encoding scheme is seen to be equivalent to a form of Grossberg outstar avalanche coding for spatiotemporal patterns. The storage capacity is $m = m_1 + \cdots + m_k < n$, where $m_j$ is the length of the $j$th temporal pattern and $n$ is the dimension of the spatial pattern space. Limit cycles $(A_1, \ldots, A_m, A_1)$ are shown to be stored in local energy minima of the binary state space $\{0,1\}^n$.

I. STORING PAIRED AND TEMPORAL PATTERNS

HOW CAN paired-data associations $(A_i, B_i)$ be stored and recalled in a two-layer nonlinear feedback dynamical system? What is the minimal neural network that achieves this? We show that the introduction of bidirectionality, forward and backward associative search for stored associations $(A_i, B_i)$, extends the symmetric unidirectional autoassociators [30] of Cohen and Grossberg [7] and Hopfield [24], [25]. Every real matrix is both a discrete and continuous bidirectionally stable associative memory. The bidirectional associative memory (BAM) is the minimal two-layer nonlinear feedback network.

Manuscript received December 3, 1986; revised November 3, 1987. This work was supported in part by the Air Force Office of Scientific Research under Contract F49620-86-C-0070, and by the Advanced Research Projects Agency of the Department of Defense, ARPA Order 5794.

The author is with the Department of Electrical Engineering, Systems, Signal and Information Processing Institute, University of Southern California, Los Angeles, CA 90089.

    IEEE Log Number 8718862.

Information passes forward from one neuron field to the other by passing through the connection matrix $M$. Information passes backward through the matrix transpose $M^T$. All other two-layer networks require more information in the form of backward connections $N$ different from $M^T$. The underlying mathematics are closely related to the properties of adjoint operators in function spaces, in particular how quadratic forms are essentially linearized by real matrices and their adjoints (transposes).

Since every matrix $M$ is bidirectionally stable, we suspect that gradual changes in $M$ due to learning will also result in stability. We show that this is so quite naturally for real-time unsupervised learning. This extends Lyapunov convergence of neural networks for the first time to learning.

The neural network interpretation of a BAM is a two-layer hierarchy of symmetrically connected neurons. When the neurons are activated, the network quickly evolves to a stable state of two-pattern reverberation. The stable reverberation corresponds to a system energy local minimum. In the learning or adaptive BAM, the stable reverberation of a pattern $(A_i, B_i)$ across the two fields of neurons seeps pattern information into the long-term memory connections $M$, allowing input associations $(A_i, B_i)$ to dig their own energy wells in which to reverberate.

Temporal patterns are sequences of spatial patterns. Recalled temporal patterns are limit cycles. For instance, a sequence of binary vectors can represent a harmonized melody. A given note or chord of the melody is often sufficient to recollect the rest of the melody, to "name that tune." The same note or chord can be made to trigger the dual bidirectional memory to continue (recall) the rest of the melody backwards to the start, a whistling feat worthy of Mozart or Bach. Limit cycles can also be shown to be energy minimizers of simple networks of synchronous on-off neurons.

The forward and backward directionality of BAM correlation encoding naturally extends to the encoding of temporal patterns or limit cycles. The correlation encoding algorithm is a discrete approximation of Hebbian learning, in particular, a type of Grossberg outstar avalanche [9]-[12].

II. EVERY MATRIX IS BIDIRECTIONALLY STABLE

Traditional associative memories are unidirectional. Vector patterns $A_1, A_2, \ldots, A_m$ are stored in a matrix memory $M$.



Input pattern $A$ is presented to the memory by performing the multiplication $AM$ and some subsequent nonlinear operation, such as thresholding, with resulting output $A'$. $A'$ is either accepted as the recollection or fed back into $M$, which produces $A''$, and so on. A stable memory will eventually produce a fixed output $A_f$. If the memory is a proper content-addressable memory (CAM), then $A_f$ should be one of the stored patterns $A_1, \ldots, A_m$. This feedback procedure behaves as if input $A$ were unidirectionally fed through a chain of $M$'s: $A \to M \to A' \to M \to A'' \to \cdots$.

Unidirectional CAMs are autoassociative [28]-[30]. Pieces of patterns recall entire patterns. In effect, autoassociative memories store the redundant pairs $(A_1, A_1), (A_2, A_2), \ldots, (A_m, A_m)$. In general, associative memories are heteroassociative. They store pairs of different data: $(A_1, B_1), (A_2, B_2), \ldots, (A_m, B_m)$. $A_i$ and $B_i$ are vectors in different vector spaces. For instance, if $A_i$ and $B_i$ are binary and hence depict sets, they may come from the respective vector spaces $\{0,1\}^n$ and $\{0,1\}^p$. If they are unit-interval valued and hence depict fuzzy sets [38], they may come from $[0,1]^n$ and $[0,1]^p$.

Heteroassociative memories are usually used as one-shot memories. $A$ is presented to $M$, $B$ is output, and the process is finished. Hopefully, $B$ will be closer to stored pattern $B_i$ than to all other stored patterns $B_j$ if the input $A$ is closest to stored pattern $A_i$. Kohonen [28]-[30] has shown how to guarantee this for matrix memories by using pseudoinverses as optimal orthogonal projections. For instance, $M$ will always recall $B_i$ when presented with $A_i$ if all the stored input patterns $A_j$ are orthogonal.
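As a quick illustration of this orthogonality condition, the following NumPy sketch (our own toy example, not code from the paper; the patterns are made up) builds the linear correlation memory $M = \sum_i A_i^T B_i$ and shows that presenting an orthogonal stored input returns a scaled copy of its paired output with no crosstalk.

```python
import numpy as np

# Hypothetical orthogonal input patterns (rows) and arbitrary output patterns.
A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)   # A_1 . A_2 = 0, so the inputs are orthogonal
B = np.array([[1, 0, 1],
              [0, 1, 0]], dtype=float)

# Linear correlation memory: M = sum_i A_i^T B_i  (an n-by-p matrix).
M = A.T @ B

# Presenting A_1 returns (A_1 . A_1) * B_1 with no crosstalk from B_2,
# because the stored inputs are orthogonal.
recall = A[0] @ M
print(recall)                                         # [2. 0. 2.] == ||A_1||^2 * B_1
print(np.allclose(recall, (A[0] @ A[0]) * B[0]))      # True
```

Thresholding the scaled output would then recover $B_1$ exactly.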

What is the minimal nonlinear feedback heteroassociative memory that stores and accurately recalls binary associations $(A_i, B_i)$? Consider the chain $A \to M \to B$. Suppose $A$ is closer to $A_i$ than to all the other stored input patterns $A_j$. Suppose the memory $M$ is sufficiently reliable so that the recollection $B$ is relatively close to $B_i$. Suppose further that $M$ is an $n$-by-$p$ matrix memory. We would like to somehow feed back $B$ through the memory to increase the accuracy of the final recollection. The simplest way to do this is to multiply $B$ by some $p$-by-$n$ matrix memory (then threshold, say), and the simplest such memory is the transpose (adjoint) of $M$, $M^T$. Whether the network is implemented electrically, optically, or biologically, $M^T$ is locally available information if $M$ is. Any other feedback scheme requires additional information in the form of a $p$-by-$n$ matrix $N$ distinct from $M^T$. This gives the new chain $B \to M^T \to A'$, where, hopefully, $A'$ is at least as close to $A_i$ as $A$ is. We can then reverse direction again and feed $A'$ through $M$: $A' \to M \to B'$. Continuing this bidirectional process, we produce a sequence of paired approximations to the stored pair $(A_i, B_i)$: $(A, B), (A', B'), (A'', B''), \ldots$. Ideally, this sequence will quickly converge to some fixed pair $(A_f, B_f)$, and this fixed pair will be $(A_i, B_i)$ or nearly so.

A bidirectional associative memory (BAM) behaves as a heteroassociative CAM if it is represented by the following chain of recollection:

A → M → B
A' ← M^T ← B
A' → M → B'
A'' ← M^T ← B'
...
A_f → M → B_f
A_f ← M^T ← B_f

This BAM chain makes explicit that a fixed pair $(A_f, B_f)$ corresponds to a stable network reverberation or resonance, in the spirit of Grossberg's adaptive resonance [5], [6], [16]-[20]. It also makes clear that a fixed point $A_f$ of a symmetric autoassociative memory is a fixed pair $(A_f, A_f)$ of a BAM. Conversely, a BAM, indeed any heteroassociator, can be viewed as a symmetrized augmented autoassociator whose connection matrix has zero block-diagonal matrices, with $M$ and $M^T$ as the nonzero off-diagonal blocks, and with augmented states $C_i = [A_i \mid B_i]$.
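To make the back-and-forth chain concrete, here is a minimal sketch of one forward/backward BAM cycle (an illustration of ours, not code from the paper; the threshold helper, the toy matrix, and the starting states are assumptions):

```python
import numpy as np

def threshold_bipolar(x, prev):
    """Bipolar threshold: +1 if the input sum > 0, -1 if < 0, keep the previous state if == 0."""
    out = prev.copy()
    out[x > 0] = 1
    out[x < 0] = -1
    return out

# Toy n-by-p connection matrix M and an initial bipolar input A.
M = np.array([[ 1, -1,  1],
              [-1,  1, -1],
              [ 1,  1, -1],
              [-1, -1,  1]])
A = np.array([ 1, -1,  1, -1])
B = np.ones(3, dtype=int)            # arbitrary starting state for the B field

# One forward/backward cycle of the BAM chain: A -> M -> B, then B -> M^T -> A'.
B = threshold_bipolar(A @ M, B)      # forward pass through M
A = threshold_bipolar(B @ M.T, A)    # backward pass through the transpose
print(A, B)
```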

The fixed or stable points of autoassociative (autocorrelation) memories are often described as rocks on a stretched rubber sheet. An input pattern then behaves as a ball bearing on the rubber sheet as it minimizes its potential energy subject to frictional damping. Hecht-Nielsen [21] even defines artificial neural systems or neurocomputers as programmable dissipative dynamical systems. BAM fixed points are harder to visualize. Perhaps a frictionally damped pendulum dynamical system captures the back-and-forth operations of $A \to M$ and $B \to M^T$, or perhaps a product-space ball bearing rolling into product-space potential energy wells.

A pair $(A, B)$ defines the state of the BAM $M$. We prove stability by identifying a Lyapunov or energy function $E$ with each state $(A, B)$. In the autoassociative case when $M$ is symmetric and zero-diagonal, Hopfield [24], [25] has identified an appropriate $E$ by $E(A) = -A M A^T$ (actually, Hopfield uses half this quantity). We review Hopfield's [24], [35], [37] argument to prove unidirectional stability for zero-diagonal symmetric matrices in asynchronous operation. We will then generalize this proof technique to establish bidirectional stability of arbitrary matrices. Equation (21) generalizes this proof to a spectrum of asynchronous BAM update strategies.

Unidirectional stability follows since if $\Delta E = E_2 - E_1$ is caused by the $k$th neuron's state change $\Delta a_k = a_k^2 - a_k^1$, then $E$ can be expanded as

$$E(A) = -\sum_{i \ne k} \sum_{j \ne k} a_i a_j m_{ij} - a_k \sum_{j \ne k} a_j m_{kj} - a_k \sum_{i \ne k} a_i m_{ik} \qquad (1)$$

so that taking the difference $E_2 - E_1$ and dividing by $\Delta a_k$


gives

$$\frac{\Delta E}{\Delta a_k} = -\sum_{j \ne k} a_j m_{kj} - \sum_{i \ne k} a_i m_{ik} = -A M_k^T - A M^k \qquad (2)$$

where $M_k$ is the $k$th row of $M$ and $M^k$ is the $k$th column. If $M$ is symmetric, the right side of (2) is simply $-2 A M^k$. $A M^k$ is the input activation sum to neuron $a_k$. As in the classical McCulloch-Pitts [34] bivalent neuron model, $a_k$ thresholds to $+1$ if $A M^k > 0$ and to $-1$ if $A M^k < 0$. Hence $\Delta a_k$ and $A M^k$ agree in sign, and hence their product is positive (or zero). Hence $\Delta E = -2 \Delta a_k (A M^k) \le 0$. Since $E$ is bounded, the unidirectional procedure converges on some $A_f$ such that $E(A_f)$ is a local energy minimum.
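The unidirectional argument is easy to check numerically. The sketch below (our own illustration; the random symmetric zero-diagonal matrix and the bipolar start state are assumptions) performs asynchronous threshold updates and verifies that $E(A) = -A M A^T$ never increases.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8
M = rng.integers(-2, 3, size=(n, n)).astype(float)
M = (M + M.T) / 2.0           # make the matrix symmetric
np.fill_diagonal(M, 0.0)      # zero diagonal, as in the Hopfield case

A = rng.choice([-1.0, 1.0], size=n)   # bipolar state vector

def energy(A, M):
    # Autoassociative energy E(A) = -A M A^T (Hopfield uses half this quantity).
    return -float(A @ M @ A)

E_prev = energy(A, M)
for _ in range(200):
    k = rng.integers(n)
    s = float(A @ M[:, k])    # input activation sum A M^k to neuron a_k
    if s > 0:
        A[k] = 1.0
    elif s < 0:
        A[k] = -1.0
    # if s == 0 the neuron keeps its current state
    E = energy(A, M)
    assert E <= E_prev + 1e-9  # the energy never increases
    E_prev = E

print("final energy:", E_prev)
```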

The unidirectional autoassociative CAM procedure is in general unstable if $M$ is not symmetric. For then the term $A M_k^T$ in (2) is the output activation sum from $a_k$ to the other neurons, and, in general, $A M_k^T \ne A M^k$. If the magnitude of the output sum exceeds the magnitude of the input sum and the two sums disagree in sign, $\Delta E > 0$ occurs. The unidirectional CAM procedure is then no longer a nearest-neighbor classifier. Oscillation occurs.

    We propose the potential function

$$E(A, B) = -\tfrac{1}{2} A M B^T - \tfrac{1}{2} B M^T A^T \qquad (3)$$

as the BAM system energy of state $(A, B)$. Observe that $B M^T A^T = B (A M)^T = (A M B^T)^T = A M B^T$. The last equality follows since, trivially, the transpose of a scalar equals the scalar. Hence (3) is equivalent to

$$E(A, B) = -A M B^T. \qquad (4)$$

This establishes that the BAM system energy is a well-defined concept, since $E(A, B) = E(B, A)$, and makes clear that the Hopfield autoassociative energy corresponds to the special case when $B = A$. Analogously, if a two-dimensional pendulum has a stable equilibrium at the vertical, then the energy of the pendulum at a given angle is the same whether the angle is measured clockwise or counterclockwise from the vertical. Moreover, the equality $E(A, B) = E(B, A)$ holds even though the neurons in both the $A$ and $B$ fields behave asynchronously.
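A small sketch of the energy function (ours, with arbitrary toy values) makes the symmetry $E(A, B) = E(B, A)$ concrete: computing $-A M B^T$ and $-B M^T A^T$ gives the same number.

```python
import numpy as np

def bam_energy(A, B, M):
    """BAM system energy E(A, B) = -A M B^T of a state pair (A, B), as in (4)."""
    return -float(A @ M @ B)

# Toy values (assumptions for illustration only).
rng = np.random.default_rng(1)
M = rng.integers(-2, 3, size=(4, 3)).astype(float)
A = rng.choice([-1.0, 1.0], size=4)
B = rng.choice([-1.0, 1.0], size=3)

# The energy is symmetric in the two fields: running B through M^T gives the same value.
print(bam_energy(A, B, M))
print(-float(B @ M.T @ A))    # equals the line above
```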

The BAM recall procedure is a nonlinear feedback procedure. Each neuron $a_i$ in neuron population or field $A$, and each neuron $b_j$ in $B$, independently and asynchronously (or synchronously) examines its input sum from the neurons in the other population, then changes state or not according to whether the input sum exceeds, equals, or falls short of the threshold. Hence we make the neuro-classical assumption that each neuron is either on ($+1$) or off ($0$ or $-1$) according to whether its input sum exceeds or falls short of some numerical threshold; if the input sum equals the threshold, the neuron maintains its current state. The input sum to $b_j$ is the column inner product

$$A M^j = \sum_i a_i m_{ij} \qquad (5)$$

where $M^j$ is the $j$th column of $M$. The input sum to $a_i$ is

$$B M_i^T = \sum_j b_j m_{ij} \qquad (6)$$

where $M_i$ is the $i$th row (column) of $M$ ($M^T$). We take 0 as the threshold for all neurons. In summary, the threshold functions for $a_i$ and $b_j$ are

$$a_i = \begin{cases} 1, & \text{if } B M_i^T > 0 \\ a_i, & \text{if } B M_i^T = 0 \\ 0\ (-1), & \text{if } B M_i^T < 0 \end{cases} \qquad (7)$$

$$b_j = \begin{cases} 1, & \text{if } A M^j > 0 \\ b_j, & \text{if } A M^j = 0 \\ 0\ (-1), & \text{if } A M^j < 0. \end{cases} \qquad (8)$$

When a paired pattern $(A, B)$ is presented to the BAM, the neurons in populations $A$ and $B$ are turned on or off according to the occurrence of 1s and 0s ($-1$s) in the state vectors $A$ and $B$. The neurons continue their asynchronous (or synchronous) state changes until a bidirectionally stable state $(A_f, B_f)$ is reached. We now prove that such a stable state is reached for any matrix $M$ and that it corresponds to a local minimum of (3).
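The following sketch implements the recall procedure just described with synchronous field updates and the zero-threshold rules (7) and (8) on bipolar states (a minimal illustration of ours; the matrix and starting pair are made up, and the paper equally allows asynchronous updates):

```python
import numpy as np

def bam_recall(A, B, M, max_iters=100):
    """Synchronous BAM recall with the zero-threshold rules (7) and (8).

    A, B are bipolar state vectors; M is the n-by-p connection matrix.
    A sketch for illustration only.
    """
    for _ in range(max_iters):
        A_prev, B_prev = A.copy(), B.copy()
        # Forward pass through M: update field B from the input sums A M^j.
        s = A @ M
        B = np.where(s > 0, 1, np.where(s < 0, -1, B))
        # Backward pass through M^T: update field A from the input sums B M_i^T.
        s = B @ M.T
        A = np.where(s > 0, 1, np.where(s < 0, -1, A))
        if np.array_equal(A, A_prev) and np.array_equal(B, B_prev):
            break                      # bidirectionally stable pair (A_f, B_f)
    return A, B

# Toy usage with an arbitrary matrix (every real matrix is bidirectionally stable).
rng = np.random.default_rng(2)
M = rng.integers(-3, 4, size=(6, 4))
A = rng.choice([-1, 1], size=6)
B = rng.choice([-1, 1], size=4)
print(bam_recall(A, B, M))
```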

$E$ decreases along discrete trajectories in the phase space $\{0,1\}^n \times \{0,1\}^p$. We show this by showing that changes $\Delta a_i$ and $\Delta b_j$ in the state variables produce $\Delta E \le 0$. Note that $\Delta a_i, \Delta b_j \in \{-1, 0, 1\}$ for binary state variables and $\Delta a_i, \Delta b_j \in \{-2, 0, 2\}$ for bipolar variables. We need only consider nonzero changes in $a_i$ and $b_j$. Rewriting (4) as a double sum gives

$$E(A, B) = -\sum_i \sum_j a_i b_j m_{ij} = -\sum_{i \ne k} \sum_j a_i b_j m_{ij} - a_k \sum_j b_j m_{kj}. \qquad (9)$$

Hence the energy change $\Delta E = E_2 - E_1$ due to the state change $\Delta a_k$ is

$$\frac{\Delta E}{\Delta a_k} = -\sum_j b_j m_{kj} = -B M_k^T. \qquad (10)$$

We recognize $B M_k^T$ on the right side of (10) as the input sum to $a_k$ from the threshold rule (7). Hence if $\Delta a_k > 0$ ($\Delta a_k = 1 - 0 = 1$), then (7) ensures that $B M_k^T > 0$, and thus $\Delta E = -\Delta a_k (B M_k^T) < 0$. Similarly, if $\Delta a_k < 0$, then (7) again ensures that $a_k$'s input sum agrees in sign, $B M_k^T < 0$, and thus $\Delta E = -\Delta a_k (B M_k^T) < 0$. Similarly, the energy change due to the state change $\Delta b_k$ is

$$\frac{\Delta E}{\Delta b_k} = -\sum_i a_i m_{ik} = -A M^k. \qquad (11)$$

Again we recognize the right side of (11) as the negative of the input sum to $b_k$ from the threshold rule (8). Hence $\Delta b_k > 0$ only if $A M^k > 0$, and $\Delta b_k < 0$ only if $A M^k < 0$. In either case, $\Delta E = -\Delta b_k (A M^k) < 0$. When $\Delta a_i = \Delta b_j = 0$, $\Delta E = 0$. Hence $\Delta E \le 0$ along discrete trajectories in $\{0,1\}^n \times \{0,1\}^p$ (or in $\{-1,1\}^n \times \{-1,1\}^p$), as claimed.


Since $E$ is bounded below,

$$E(A, B) \ge -\sum_i \sum_j |m_{ij}|, \qquad \text{for all } A \text{ and all } B, \qquad (12)$$

the BAM converges to some stable point $(A_f, B_f)$ such that $E(A_f, B_f)$ is a local energy minimum. Since the $n$-by-$p$ matrix $M$ in (3) was an arbitrary (real) matrix, every matrix is bidirectionally stable.
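Continuing the recall sketch above, one can watch the Lyapunov function at work: the trace of $E(A, B) = -A M B^T$ is nonincreasing for an arbitrary real matrix, in line with (12) and the convergence claim (again an illustration of ours with made-up values):

```python
import numpy as np

rng = np.random.default_rng(3)

# Arbitrary real matrix and random bipolar starting pair (illustrative assumptions).
n, p = 10, 6
M = rng.normal(size=(n, p))
A = rng.choice([-1.0, 1.0], size=n)
B = rng.choice([-1.0, 1.0], size=p)

def energy(A, B, M):
    return -float(A @ M @ B)        # E(A, B) = -A M B^T, eq. (4)

energies = [energy(A, B, M)]
for _ in range(20):
    s = A @ M                        # input sums A M^j to field B
    B = np.where(s > 0, 1.0, np.where(s < 0, -1.0, B))
    s = B @ M.T                      # input sums B M_i^T to field A
    A = np.where(s > 0, 1.0, np.where(s < 0, -1.0, A))
    energies.append(energy(A, B, M))

# The energy trace is nonincreasing and settles at a local minimum (A_f, B_f).
print(energies)
print(all(e2 <= e1 + 1e-9 for e1, e2 in zip(energies, energies[1:])))
```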

III. BAM ENCODING

Suppose we wish to store the binary (bipolar) patterns $(A_1, B_1), \ldots, (A_m, B_m)$ at or near local energy minima. How can these association pairs be encoded in some BAM $n$-by-$p$ matrix $M$? In the previous section we showed how to decode an arbitrary $M$, but not how to construct a specific $M$. We now develop a simple but general encoding procedure based upon familiar correlation techniques.

The association $(A_i, B_i)$ can be viewed as a meta-rule or set-level logical implication: IF $A_i$, THEN $B_i$. However, bidirectionality implies that $(A_i, B_i)$ also represents the converse meta-rule: IF $B_i$, THEN $A_i$. Hence the logical relation between $A_i$ and $B_i$ is symmetric, namely two-way logical implication (set equivalence). The vector analogue of this symmetric biconditionality is correlation. The natural suggestion then is to memorize the association $(A_i, B_i)$ by forming the correlation matrix or vector outer product $A_i^T B_i$. The correlation matrix redundantly distributes the vector information in $(A_i, B_i)$ in a parallel storage medium, a matrix. The next suggestion is to superimpose the $m$ associations $(A_i, B_i)$ by simply adding up the correlation matrices pointwise:

$$M = \sum_i A_i^T B_i \qquad (13)$$

with the dual BAM memory $M^T$ given by

$$M^T = \sum_i B_i^T A_i. \qquad (14)$$

The associative memory defined in (13) is the emblem of linear associative network theory. It has been exhaustively studied in this context by Kohonen [27]-[30], Nakano [36], Anderson et al. [2]-[4], and several other researchers. In the overwhelming number of cases, $M$ is used in a simple one-shot feedforward linear procedure. Consequently, much research [22], [23], [30] has focused on preprocessing of stored input patterns $(A_i)$ to improve the accuracy of one-iteration synchronous recall. In contrast, the BAM procedure uses (13) and (14) as system components in a nonlinear multi-iteration procedure to achieve heteroassociative content addressability. The fundamental biconditional nature of the BAM process naturally leads to the selection of vector correlation for the memorization process.

However, the nonlinearity introduced by the thresholding in (7) and (8) renders the memories in (13) and (14) unsuitable for BAM storage. The candidate memory binary patterns $(A_1, B_1), \ldots, (A_m, B_m)$ must be transformed to bipolar patterns $(X_1, Y_1), \ldots, (X_m, Y_m)$ for proper memorization and superimposition. This yields the BAM memories

$$M = \sum_i X_i^T Y_i \qquad (15)$$

$$M^T = \sum_i Y_i^T X_i. \qquad (16)$$

Note that $(A_i, B_i)$ can be erased from $M$ ($M^T$) by adding $X_i^T Y_i^c = -X_i^T Y_i$ to the right side of (15), since the bipolar complement $Y^c = -Y$. Also note that $X_i^{cT} Y_i^c = (-X_i)^T (-Y_i) = X_i^T Y_i$. Hence encoding $(A_i, B_i)$ in memory encodes $(A_i^c, B_i^c)$ as well, and vice versa.
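A short sketch of the encoding step (ours; the binary pairs are invented for illustration) converts binary patterns to bipolar form, sums the outer products as in (15), and shows how adding $-X_1^T Y_1$ erases the first pair:

```python
import numpy as np

def to_bipolar(binary_vec):
    """Map a binary {0,1} vector to the bipolar {-1,1} vector X = 2A - 1."""
    return 2 * np.asarray(binary_vec) - 1

# Hypothetical binary association pairs (A_i, B_i) for illustration.
A = np.array([[1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0]])
B = np.array([[1, 1, 0],
              [0, 1, 1]])

X = to_bipolar(A)     # bipolar versions X_i
Y = to_bipolar(B)     # bipolar versions Y_i

# BAM encoding (15): M = sum_i X_i^T Y_i; the dual memory (16) is just M^T.
M = sum(np.outer(X[i], Y[i]) for i in range(len(X)))

# Erasing the pair (A_1, B_1) amounts to adding -X_1^T Y_1, i.e. X_1^T Y_1^c.
M_without_first = M + np.outer(X[0], -Y[0])
print(M)
print(M_without_first)    # equals np.outer(X[1], Y[1]) alone
```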

The fundamental reason why (13) and (14) are unsuitable, but (15) and (16) are suitable, for BAM storage is that 0s in binary patterns are ignored when added, but $-1$s in bipolar patterns are not: $1 + 0 = 1$ but $1 + (-1) = 0$. If the numbers are matrix entries that represent synaptic strengths, then multiplying and adding binary quantities can only produce excitatory connections or zero-weight connections. (We note, however, that (13) and (14) are functionally suitable if bipolar state vectors are used, although the neuronal interpretation is less clear than when (15) and (16) are used.)

Multiplying and adding bipolar quantities produces excitatory and inhibitory connections. The connection strengths represent the frequency of excitatory and inhibitory connections in the individual correlation matrices. If $e_{ij}$ is the edge or connection strength between $a_i$ and $b_j$, then $e_{ij}$ is positive, zero, or negative according as the number of $+1$ $ij$th entries in the $m$ correlation matrices $X_k^T Y_k$ exceeds, equals, or falls short of the number of $-1$ $ij$th entries. The magnitude of $e_{ij}$ measures the preponderance of $1$s over $-1$s, or $-1$s over $1$s, in the summed matrices.

Coding details aside, (15) encodes $(A_i, B_i)$ in $M$ by forming discrete reciprocal outstars [8]-[12], in the language of Grossberg associative learning; see Figs. 1 and 2. Grossberg [8] has long since shown that the outstar is the minimal network capable of perfectly learning a spatial pattern. The reciprocal outstar framework provides a fertile context in which to interpret BAM convergence.


[Fig. 1 and Fig. 2: reciprocal outstar encoding of an association pair; figures not reproduced in this transcript.]

The neurons $\{a_1, \ldots, a_n\}$ and $\{b_1, \ldots, b_p\}$ can be interpreted as two symmetrically connected fields [5], [6], [15], [16], [20] $F_A$ and $F_B$ of bivalent threshold functions. BAM convergence then corresponds to a simple type of adaptive resonance [5], [6], [16]-[20]. Adaptive resonance occurs when recurrent neuronal activity (short-term memory) and variable connection strengths (long-term memory) equilibrate or resonate. The resonance is adaptive because the connection strengths gradually change. Hence BAM convergence represents nonadaptive resonance, since the connections $m_{ij}$ are fixed by (15). (Later, and in Kosko [31], we allow BAMs to learn.) Since connections typically change much more slowly than neuron activations change, BAM resonance may still accurately model interesting distributed behavior.

Let us examine the synchronous behavior of BAMs when $M$ and $M^T$ are given by (15) and (16). Suppose we have stored $(A_1, B_1), \ldots, (A_m, B_m)$ in the BAM, and we are presented with the pair $(A, B)$. We can initiate the recall process using $A$ first or $B$ first, or using them simultaneously. For simplicity, suppose we present the BAM with the stored pattern $A_i$. Then we obtain the signal-noise expansion

$$A_i M = (A_i X_i^T) Y_i + \sum_{j \ne i} (A_i X_j^T) Y_j \qquad (17)$$

or, if we use the bipolar version $X_i$ of $A_i$, which, as established in the Appendix, improves recall reliability on average, then

$$X_i M = (X_i X_i^T) Y_i + \sum_{j \ne i} (X_i X_j^T) Y_j$$
$$= n Y_i + \sum_{j \ne i} (X_i X_j^T) Y_j$$
$$\approx (c_1 y_1^i, c_2 y_2^i, \ldots, c_p y_p^i), \qquad c_k > 0. \qquad (18)$$

Observe that the signal in (18) is given the maximum positive amplification factor $n > 0$. This exaggerates the bipolar features of $Y_i$, thus tending to produce $B_i$ when the input sum $X_i M$ is thresholded according to (8).

The noise amplification coefficients $x_{ij} = X_i X_j^T$ correct the noise terms $Y_j$ according to the Hamming distances $H(A_i, A_j)$. In particular,

$$x_{ij} \gtreqless 0 \quad \text{iff} \quad H(A_i, A_j) \lesseqgtr n/2. \qquad (19)$$

This relationship holds because $x_{ij}$ is the number of vector slots in which $A_i$ and $A_j$ agree, $n - H(A_i, A_j)$, minus the number of slots in which they differ, $H(A_i, A_j)$. Hence

$$x_{ij} = n - 2 H(A_i, A_j), \qquad (20)$$

which implies (19). If $H(A_i, A_j) = n/2$, $Y_j$ is zeroed out of the input sum. If $H(A_i, A_j) < n/2$, and hence if $A_i$ and $A_j$ are close, then $x_{ij} > 0$ and $Y_j$ is positively amplified in direct proportion to the strength of match between $A_i$ and $A_j$. If $H(A_i, A_j) > n/2$, then $x_{ij} < 0$ and the complement $Y_j^c$ is positively amplified in direct proportion to $H(A_i, A_j)$, since $Y_j^c = -Y_j$. Thus the correction coefficients $x_{ij}$ convert the additive noise vectors $Y_j$ into a distance-weighted signal sum, thereby increasing the probability that the right side of (18) will approximate some positive multiple of $Y_i$, and then threshold to $B_i$. This argument still applies when an arbitrary vector $A$, not necessarily a stored $A_i$, is presented to the BAM.
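The signal-noise expansion and the Hamming-distance relation (20) can be checked directly. In the sketch below (our illustration with random toy patterns), $X_i M$ is reassembled from the signal term $n Y_i$ plus the weighted noise terms, and each coefficient $X_i X_j^T$ is compared against $n - 2 H(A_i, A_j)$:

```python
import numpy as np

rng = np.random.default_rng(4)

n, p, m = 12, 5, 3
A = rng.integers(0, 2, size=(m, n))     # binary stored inputs A_i (toy data)
B = rng.integers(0, 2, size=(m, p))     # binary stored outputs B_i
X = 2 * A - 1                           # bipolar versions X_i
Y = 2 * B - 1                           # bipolar versions Y_i

M = sum(np.outer(X[i], Y[i]) for i in range(m))   # BAM memory, eq. (15)

i = 0
recall = X[i] @ M

# Signal-noise expansion (18): X_i M = n Y_i + sum_{j != i} (X_i X_j^T) Y_j.
noise = sum((X[i] @ X[j]) * Y[j] for j in range(m) if j != i)
print(np.array_equal(recall, n * Y[i] + noise))   # True

# Correction coefficients (20): x_ij = X_i X_j^T = n - 2 H(A_i, A_j).
for j in range(m):
    hamming = np.sum(A[i] != A[j])
    print(X[i] @ X[j], n - 2 * hamming)           # the two columns agree
```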

    Th e BAM storage capacity is ultimately d etermined bythe noise sum in (18). Roughly speakmg, this sum can beexpected to outweigh the signal term(s) ifm > n , where mis the number of stored pairs( A , ,B , ) , since n is themaximum signal amplification factor. Similarly, when pre-senting M T with B , the maximum signal term isp X , ; som > p can b e expected to produce un reliable recall. Hencea rough estimateof the BAM storage capacity ism