
Page 1: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

Bioinspired Computing, Lecture 16
Associative Memories with Artificial Neural Networks

Netta Cohen

Page 2: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

2

Last time

• Biologically realistic architecture

• Dynamic interactive behaviour

• Natural learning protocols

• Biologically-inspired associative memories

• Also… steps away from biologically realistic model

• Unsupervised learning

• Applications

Today

• Recurrent neural nets
• Attractor neural nets

Page 3: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

3

Recurrent Nets: Pros & Cons

Pros
• Biologically-realistic architecture/performance
• Complex self-sustained activity
• Distributed representations
• Dynamic interactions with environment
• Powerful computation
• Noise tolerance
• Graceful degradation

Cons
• Hard to formalise in information processing terms
• Hard to visualise activity
• Hard to train, with no guarantee of convergence
• No guaranteed solution

Attractor neural nets are a special case of recurrent nets.

Page 4: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

4

Training Methods for Recurrent Networks

• Genetic algorithms
  – Usually in a controller
  – Fitness evaluated on the controller, not the network
  – Requires a sophisticated genome

• Backpropagation through time (BPTT)
  – Understand the principle

Page 5: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

5

Page 6: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

6

[Figure-only slide; only the output labels +1 and -1 are recoverable.]

Page 7: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

7

Training Methods for Recurrent Networks

• Jets and Sharks: interesting dynamic behaviour, but weights set by hand

• Looking for an automatic, ‘grounded’ way of training networks

• Hopfield (today)!

• Elman (also today … turn a feed-forward net into something recurrent)

Page 8: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

8

Associative Memory

The imprinting and recollection of memories is an important component of what we do & how we process information.

If we were to model these processes, here are a few conditions we might want to include in our model:

• Store and reliably recall multiple independent memories.
• Given only partial input, recall the complete information, or
• Given noisy input, recall the noise-free prototype information.
• Learn new memories in a biologically realistic manner.
• Recall memories fast enough (before the next input is received).
• Once recalled, maintain attention or memory long enough (for information processing & transmission elsewhere in the brain).

Page 9: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

9

Attractor Neural Nets

In some cases, trajectories in state-space can be guaranteed to lead to one of several stable states (fixed points, cycles or generic attractors).

In RNNs, the state of the system is dictated both by internal dynamics & the system response to inputs from the environment.

[Figure: cross-section of the energy landscape over state space]

Page 10: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

10

Attractor Neural Nets (cont.)

Dynamical systems such as RNNs could serve as models of associative memory if it were possible to encode each memory in a specific stable state or attractor.

In 1982, John Hopfield realised that by imposing a couple of restrictions on the architecture of the nets, he could guarantee the existence of attractors, such that every initial condition would necessarily evolve to a stable solution, where it would stay.

This is tantamount to the requirement that the above picture be described in terms of an energy landscape.

Page 11: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

11

Attractor Neural Nets: Architecture

• No self-connections: wii = 0

• All connections are symmetric: wij = wji

“Gerard Toulouse has called Hopfield’s use of symmetric connections a ‘clever step backwards from biological realism’. The cleverness arises from the existence of an energy function.”*

* Hertz, Krogh & Palmer Introduction to the theory of neural computation (1990).

The existence of an energy function provides us with:
• A formalism of the process of memory storage and recall
• A tool to visualise the activity (both learning and recall)
• A straightforward way to train the net
• Once trained, a guaranteed solution (recall of the correct memory).

Page 12: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

12

How Does It Work?

Nodes are modelled by conventional binary MP (McCulloch-Pitts) neurons.

Each neuron serves both as an input and output unit. (There are no hidden units.)

States are given by the pattern of activity of the neurons (e.g. 101 for a network with three neurons). The number of neurons sets the maximum length of a memory bit-string.

Different patterns can be simultaneously stored in the network. The number of independent patterns that can be remembered is less than or equal to the number of nodes.

Memory recall corresponds to a trajectory taking the system from some initial state (input) to the local energy minimum (closest association).

Each step along the (recall) trajectory results in the same or a lower energy. Since energy is bounded from below, a solution is guaranteed for every input.

Page 13: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

13

A working example

Weight matrix (wij; symmetric, zero diagonal; rows = neuron i, columns = neuron j):

              j=1   j=2   j=3   j=4   j=5
   i=1          0     1    -1     2    -3
   i=2          1     0     3    -1     0
   i=3         -1     3     0     1    -2
   i=4          2    -1     1     0     1
   i=5         -3     0    -2     1     0

Update rule (synchronous, threshold = 0): at each step every node j computes its weighted input ∑i xi wij and outputs 1 if this sum is ≥ the threshold, 0 otherwise.

   time           state (nodes 1–5)     weighted sums ∑i xi wij
   Input (t=0)    0 1 0 0 0              1   0   3  -1   0
   t=1            1 1 1 0 1             -3   4   0   3  -5
   t=2            0 1 1 1 0              2   2   4   0  -1
   t=3            1 1 1 1 0              2   3   3   2  -4
   t=4            1 1 1 1 0             (stable: the state no longer changes)

Exercise: repeat this example with an initial input of [ 0 1 0 1 0 ].
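A minimal Python sketch (my own illustration, using NumPy) that reproduces this recall trajectory from the weight matrix and update rule above:

```python
import numpy as np

# Weight matrix from the worked example above (symmetric, zero diagonal).
W = np.array([[ 0,  1, -1,  2, -3],
              [ 1,  0,  3, -1,  0],
              [-1,  3,  0,  1, -2],
              [ 2, -1,  1,  0,  1],
              [-3,  0, -2,  1,  0]])

def synchronous_step(x, threshold=0):
    """One synchronous update: each node outputs 1 iff its weighted input >= threshold."""
    sums = x @ W                        # input to node j: sum_i x_i * w_ij
    return (sums >= threshold).astype(int), sums

x = np.array([0, 1, 0, 0, 0])           # input at t = 0
for t in range(5):
    x_next, sums = synchronous_step(x)
    print(f"t={t}: state {x}, weighted sums {sums}")
    if np.array_equal(x_next, x):       # fixed point reached: state no longer changes
        break
    x = x_next
```

Running this prints the same weighted sums as the table and stops once the state settles at 1 1 1 1 0.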

Page 14: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

14

More general examples

Stability:

The stable pattern reached in the working example represents a fixed point in the dynamics.

While stable solutions are guaranteed, not all stable solutions are fixed point solutions.

State update rule:

This example used a “synchronous updating” method.

Asynchronous (sequential or random) updating methods can also be implemented.

Page 15: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

15

Asynchronous random updating

• Pick any node
• Update it
• Repeat

Example of the difference between asynchronous and synchronous updating (see the sketch below):
• Hopfield network with two nodes
• w12 = w21 = 1
• Initial state (-1, 1)
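A minimal sketch of this two-node example (my own illustration, assuming ±1 states and the sign update rule introduced on the following slides): synchronous updating oscillates between (-1, 1) and (1, -1) forever, while asynchronous updating settles into a stable state.

```python
import numpy as np

W = np.array([[0, 1],
              [1, 0]])                      # two nodes, w12 = w21 = 1, no self-connections

def sign_update(h, old):
    """Sign function with ties keeping the old value."""
    return 1 if h > 0 else (-1 if h < 0 else old)

# Synchronous updating: both nodes change together -> the state oscillates forever.
x = np.array([-1, 1])
for t in range(4):
    x = np.array([sign_update((W @ x)[i], x[i]) for i in range(2)])
    print("sync  t=%d:" % (t + 1), x)       # (1,-1), (-1,1), (1,-1), ...

# Asynchronous updating: one node at a time -> reaches a fixed point immediately.
x = np.array([-1, 1])
for t, i in enumerate([0, 1, 0, 1]):        # fixed visiting order, for reproducibility
    x[i] = sign_update((W @ x)[i], x[i])
    print("async t=%d:" % (t + 1), x)       # settles at (1, 1)
```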

Page 16: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

16

0,1 or -1,1

• No essential difference!

• Suppose we use weights w_ij for nodes that take the values -1, 1; an equivalent 0,1 network can be obtained by adjusting the thresholds (see the sketch below).
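One way to make the equivalence concrete (a sketch under my own assumptions about the mapping, since the slide's derivation isn't reproduced here): writing x = (s + 1)/2 turns a ±1 network with threshold 0 into a 0,1 network with the same weights and per-node thresholds θi = ½ ∑j wij.

```python
import numpy as np

# Illustrative mapping between the two codings (an assumption for this sketch,
# not necessarily the slide's own derivation): with x = (s + 1) / 2,
#   sum_j w_ij s_j > 0   <=>   sum_j w_ij x_j > (1/2) sum_j w_ij,
# so the same weights work for 0/1 nodes if each node's threshold is shifted.
rng = np.random.default_rng(0)
N = 5
A = rng.normal(size=(N, N))
W = (A + A.T) / 2                      # random symmetric weights
np.fill_diagonal(W, 0)                 # no self-connections

s = rng.choice([-1, 1], size=N)        # -1/+1 state
x = (s + 1) // 2                       # the corresponding 0/1 state
theta = 0.5 * W.sum(axis=1)            # shifted thresholds for the 0/1 coding

for _ in range(20):
    i = rng.integers(N)                # asynchronously update the same node in both codings
    s[i] = 1 if W[i] @ s > 0 else -1
    x[i] = 1 if W[i] @ x > theta[i] else 0
    assert np.array_equal(x, (s + 1) // 2)   # the two descriptions stay in lockstep

print("final -1/+1 state:", s)
print("final  0/1  state:", x)
```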

Page 17: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

17

Trajectories in the energy landscape

Where does energy come in? The formalism we need to answer this question comes from physics (the spin glass) and requires slight modifications to our notation:

1) Neuron coding: [0, 1] → [-1, 1]

2) The threshold now becomes a “sign” function:

   sign(input) =  +1       if input > 0
                  old x    if input = 0
                  -1       if input < 0

3) Asynchronous updating

Page 18: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

18

From energies to MP neurons

Define the energy of node i as

   Ei = - xi inputi ,   where   inputi = ∑j xj wij

is the weighted sum over all other neurons.

This is called the mean field approximation: the “magnetic field” at each node (each spin) corresponds to a weighted average over all the fields generated by all other spins. When a specific node senses this field, it wants to align with the mean field, thus reducing its energy. This is the update rule:

   xi ← sign(inputi)

We have just re-discovered that the MP update rule exactly corresponds to magnetic field alignment in Spin Glasses!

Page 19: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

19

Asynchronous random updating

• Easy to show that updating minimises the energy: each asynchronous update leaves the energy the same or lower

• Example given … (see the sketch below)
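A minimal sketch of such an example (my own illustration, reusing the 5-node weight matrix from the working example and the global energy E = -½ ∑ij wij xi xj, i.e. the per-node energies with each pair counted once):

```python
import numpy as np

# Global energy of a ±1 state for symmetric weights with zero diagonal
# (the per-node energies Ei = -xi * inputi, with each pair counted once).
def energy(x, W):
    return -0.5 * x @ W @ x

# Reuse the 5-node weight matrix from the working example (an illustrative choice).
W = np.array([[ 0,  1, -1,  2, -3],
              [ 1,  0,  3, -1,  0],
              [-1,  3,  0,  1, -2],
              [ 2, -1,  1,  0,  1],
              [-3,  0, -2,  1,  0]], dtype=float)

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=5)            # random ±1 starting state
print("E =", energy(x, W))
for step in range(20):
    i = rng.integers(5)                     # pick any node at random
    h = W[i] @ x                            # its input (local field)
    if h != 0:
        x[i] = 1 if h > 0 else -1           # align the node with its field
    print("E =", energy(x, W))              # the printed energy never increases
```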

Page 20: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

20

Attractor Neural Nets

The restrictions imposed on the recurrent net are now:

• No self-connections: wii = 0

• All connections are symmetric: wij = wji

“Gerard Toulouse has called Hopfield’s use of symmetric connections a ‘clever step backwards from biological realism’. The cleverness arises from the existence of an energy function.”*

The existence of an energy function provides us with:
• A formalism of the process of memory storage and recall
• A tool to visualise the activity (both learning and recall)
• A straightforward way to train the net
• Once trained, a guaranteed solution (recall of the correct memory).

Page 21: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

21

Training the Net

We need to find the set of weights that encode a single pattern p of length N bits as a minimum energy solution.

The minimum energy is obtained when the output of node i exactly matches the inputs to that node.

For every node i we need

   sign( ∑j wij pj ) = pi ,   j = 1, …, N.

Plug in a guess for w:

   wij = (1/N) pi pj

so that

   sign( ∑j wij pj ) = sign( (1/N) pi ∑j pj pj ) = sign( pi ) = pi    OK

(since pj pj = 1 for every ±1 bit, the argument reduces to pi).
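A quick numerical check of this (my own sketch; the random pattern and the size N = 8 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
p = rng.choice([-1, 1], size=N)        # a single ±1 pattern to store

W = np.outer(p, p) / N                 # the guess: w_ij = (1/N) p_i p_j
np.fill_diagonal(W, 0)                 # no self-connections

# Every node's weighted input has the same sign as its target value,
# so the stored pattern is a stable (minimum-energy) state.
print(np.sign(W @ p) == p)             # -> all True
```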

Page 22: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

22

Training the Net (cont.)

Now generalising for M memories or patterns:

   wij = (1/N) ∑m pi(m) pj(m) ,   m = 1, …, M        (the Generalised Hebb Rule)

This weight assignment is remarkably reminiscent of Hebbian learning: if two nodes are spiking at the same time, then the weight connecting them is strengthened. Here, anti-correlated nodes result in negative (inhibitory) weights.

For gradual learning, the weights are built up incrementally, Δwij = (1/N) pi pj for each presented pattern. Only patterns introduced repeatedly will result in the formation of new memories; noise will be ignored.
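A sketch of the full loop — storage with the generalised Hebb rule followed by recall from a corrupted cue (the network size, the number of patterns and the 10% noise level are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 100, 5                                   # 100 neurons, 5 random ±1 patterns (M << N)
patterns = rng.choice([-1, 1], size=(M, N))

# Generalised Hebb rule: w_ij = (1/N) * sum_m p_i(m) p_j(m), with no self-connections.
W = patterns.T @ patterns / N
np.fill_diagonal(W, 0)

def recall(x, steps=10):
    """Synchronous recall; at a tie (zero input) a node keeps its old value."""
    for _ in range(steps):
        h = W @ x
        x = np.where(h > 0, 1, np.where(h < 0, -1, x))
    return x

# Corrupt 10% of the bits of pattern 0, then let the net settle.
cue = patterns[0].copy()
flip = rng.choice(N, size=N // 10, replace=False)
cue[flip] *= -1
result = recall(cue)
print("bits matching the stored prototype:", int((result == patterns[0]).sum()), "out of", N)
```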

Page 23: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

23

Does it work?

http://www.cs.tcd.ie/Padraig.Cunningham/applets/Hopfield/Hopfield.htm
(originally from http://suhep.phy.syr.edu/courses/modules/MM/SIM/Hopfield/, which is no longer available)

This applet demonstrates:
• The distributed representation
• The ability to perform powerful computation
• High storage capacity (7 100-bit patterns in 100 neurons)
• High fidelity and noise tolerance
• Graceful degradation (for more memories)

Eliminated features:
• Dynamic inputs in training
• Intrinsic background activity

Page 24: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

24

Adding extra training patterns

• For a single training pattern, it is clear that the pattern is an energy minimum (why?)

• For two training patterns, each of them is still an energy minimum, slightly perturbed by the other (explicit example; see the sketch below)

• General condition: p << N
• Consequence: resistance to damage
• No localisation of memory: ‘holographic’
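A minimal numerical version of the two-pattern case (my own illustration; N = 50 and random patterns are arbitrary choices): the field at each node of a stored pattern is the pattern itself plus a small crosstalk perturbation from the other pattern, so both patterns remain stable.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
p1, p2 = rng.choice([-1, 1], size=(2, N))       # two stored ±1 patterns (2 << N)

W = (np.outer(p1, p1) + np.outer(p2, p2)) / N   # Hebbian weights for both patterns
np.fill_diagonal(W, 0)

# At p1, each node's field is ~ p1_i plus a small perturbation ("crosstalk") from p2.
field = W @ p1
print("fields at p1, min/max:", field.min(), field.max())
print("p1 still stable:", np.array_equal(np.sign(field), p1))
print("p2 still stable:", np.array_equal(np.sign(W @ p2), p2))
```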

Page 25: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

25

Storage Capacity

How many memories can be stored in the network? To store M memories, each of length N bits, in a network of N neurons, we first ask how many stable patterns can be reached. In 1987, McEliece et al. derived an upper limit for the number of memories that can be stored accurately: M = N/(2 log N) (natural logarithm).

e.g. for N = 100 neurons, M = 11 distinct memories, each 100 bits long, can be faithfully stored and recalled. Writing out these 11 distinct memories would take 1100 bits!

In general, the coding efficiency of the network can be summarised as 2 log N neurons per pattern (each N bits long). This enormous capacity is paid for by a potentially lengthy recall process.
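A quick check of the formula (assuming the natural logarithm, which matches the N = 100 example above):

```python
from math import log

# McEliece et al. (1987): about N / (2 ln N) patterns can be stored reliably.
for N in (100, 1000, 10000):
    M = N / (2 * log(N))
    print(f"N = {N:5d} neurons  ->  roughly {M:.0f} reliably storable patterns")
```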

McEliece et al., (1987) IEEE Trans. Inf. Theor. IT-33:461-482.

Page 26: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

26

Applications

Hopfield nets have obvious applications for any problem that can be posed in terms of optimisation, in the sense of maximising or minimising some function that can be likened to an energy function.

The distance matching problem: given points, match them in pairs so that the total length of the matching is shortest.

The travelling salesman problem: given points, find the shortest path through them.

Page 27: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

27

What about the brain (pros)?

Hopfield nets maintain some very attractive features of recurrent net architectures. However, the imposition of symmetric weights was a conscious move away from biological realism and toward engineering-like reliability.

In contrast, Hopfield nets seem more biologically realistic in disallowing self-connected neurons.

Hebbian-like learning is also a great appeal of Hopfield nets, capturing several important principles:

(1) unsupervised learning

(2) natural synaptic plasticity

(3) no necessary distinction between training & testing

(4) robustness to details of training procedure

Page 28: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

28

What about the brain (cons)?

While we now have dynamics in training and in recall, we might still ask: are these dynamics realistic in the brain?

1) In the memory recall stage, we consider inputs one at a time, waiting for the association to be made before proceeding to the next pattern. Is this how the brain works?

2) The aspiration of every Hopfield net is to arrive at a stable solution. Is this a realistic representation of association, or of cognition in general, in the brain? In other words, do we represent solutions to problems by relaxing to stable states of activity, or does the brain represent solutions according to very different, dynamic paradigms that handle continuous inputs and actively resist falling into the abyss of equilibrium?

Page 29: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

29

What about the brain? (cont.)

How memories are implanted in our brains remains an exciting research question.

While Hopfield nets no longer participate in this discourse, their formative role in shaping our intuition about associative memories remains admirable.

Page 30: Bioinspired Computing Lecture 16 Associative Memories with Artificial Neural Networks Netta Cohen

30

Next time…

• Final lecture about neural networks (for the time being)

Reading

• John Hopfield (1982) “Neural networks and physical systems with emergent collective computational abilities”, Proc. Nat. Acad. Sci. 79: 2554-2558.

• A highly accessible introduction to the subject, including both non-technical and technical approaches, can be found at: www.shef.ac.uk/psychology/gurney/notes/contents.html

• Some food for thought: a popular article on CNN: “Researchers: It’s easy to plant false memories”, CNN.com, Feb 16, 2003.