14th February 2005 17:37 WSPC/Trim Size: 9in x 6in for Proceedings ncpw9
A PROPOSED MODEL OF REPETITION BLINDNESS
COLM G. CONNOLLY
Dept. of Computer Science,
University College Dublin,
Belfield, Dublin 4, Ireland.
E-mail: [email protected]
RONAN G. REILLY
Dept. of Computer Science,
National University of Ireland, Maynooth
Co. Kildare, Ireland.
E-mail: [email protected]
We describe a model of repetition blindness which draws on the dichotomous division of the visual system into two subsystems which process identity and location information. The model is constructed from self-organising networks of spiking neurons which are connected by plastic inhibitory and excitatory synapses. In particular, we describe how these networks are capable of learning translation invariant letter representations and learning to locate a stimulus in the input array using the SOM learning algorithm.
Various psychological phenomena require the establishment of separate
episodic representations for each of the objects perceived in the environ-
ment. From the point of view of the brain these representations need not
be stored in the same place, but may be distributed over many regions.
Indeed, there is neurobiological evidence supporting the theory that the
episodic context of an object is stored separately from its identity. We de-
scribe a model of one such phenomenon which uses separate representations
for the identity and location of word information.
1. Repetition Blindness
Repetition blindness (RB) is the failure to detect the repeated occurrence of
a stimulus which is presented in a rapid serial visual presentation (RSVP)
experimental paradigm. If the exposure of the first occurrence of the stim-
ulus, or critical word (C1), is set so that it is just identifiable—about
100 ms—then identification of the second occurrence (C2) is impeded (Kan-
wisher, 1987).
RB can be observed under a variety of stimulus conditions, including
compounds, homographs, homophones and single letters (Kanwisher & Pot-
ter, 1990). RB is not confined to the level of whole words, as overlapping
fragments of words can be subject to blindness (Harris & Morris, 2000).
Furthermore, it appears not to be restricted to letter clusters. It has been
found between numbers in their verbal (nine) and Arabic (9) forms and
between homophones, such as I / eye (Bavelier & Potter, 1992).
1.1. Types, Tokens and Cortical Streams
The distinction between types and tokens is well known in language and is
critical to the understanding of language and perception in general. Con-
sider the following example: When extracting the different meanings from
the following sentences, “Big fish eat little fish” and “Big fish eat little”,
it is necessary to clearly establish that there are two tokens of the type
“fish” (Kanwisher, 1987). Normally, we are proficient at this task. How-
ever, in RSVP experiments containing repeated stimuli this process can
break down.
Kanwisher (1987) proposed that RB occurs when words are recognised
as separate types but not tokenised, hence the description type activation
without token individuation. Kanwisher also suggested that, as each word
is presented, separate type and token nodes are set up, with links established
between the two to tokenise the type.
One possible source of the type-token division is the what-where divi-
sion of the visual system (Ungerleider & Mishkin, 1982; Baylis, Driver, &
Rafal, 1993). In this view, the types are processed in the temporal “what”
pathway whereas the tokens, or episodic representations, are the domain of
the parietal “where” stream. It is assumed that the dorsal pathway is also
responsible for the tokens’ temporal order. The hypothesis is that the sep-
arate spatiotemporal and identity representations are then bound together,
hence tokenising the type. Activation of the type representation and creation
of the episodic token can happen in parallel; the problem arises when
the two are bound together to form a stable percept. In RB, failure to
bind the type-token representations causes the disappearance of the second
critical stimulus.
One popular proposal is that the brain uses the synchronisation of neu-
ral spikes to solve this binding problem (Singer, 1994). In the context of
accounting for RB within this framework, one could hypothesise that under
normal circumstances there are two ensembles of neurons, one in each of
the what and where streams. Over a short period of time these entrain
producing the neuronal correlate of a stable percept. Disruption of this
entrainment, however, gives rise to RB. In RB experiments, the time scale
of stimulus presentation is so fast that sometimes entrainment fails to occur
when the type is repeated.
2. A Model of RB
Drawing on the presence of the functional streams in the visual cortex, the
model we propose for repetition blindness, shown in figure 1, follows this
roughly Y-shaped division. It consists of two networks that share a com-
mon input source. One of the networks is specialised for localising where
a letter is presented in the input and the other is specialised for identi-
fying the letter regardless of location in the input. The final network in
the model plays the role of what is commonly called association cortex. It
is bidirectionally connected to both the “what” and the “where” networks
and allows for identity and location information to be bound together even
though two different networks sub-serve these functions. These three net-
works are laterally connected self-organising maps (SOM) (Kohonen, 1995)
of excitatory and inhibitory spiking neurons.
The input layer roughly models the pattern of orientation sensitive line
detectors found in V1 (Hubel & Wiesel, 1965). The layer is fully connected
to the excitatory neurons of the what layer while the connections between
it and the where layer are arranged in overlapping topographic receptive
fields. This allows only one side of the where map to activate in response
to a letter presentation on one side of the input. However, some of this
activity spreads, via the lateral connections, to the other side of the map.
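The two connection schemes described above can be illustrated with a pair of boolean connectivity masks. This is only a sketch under stated assumptions: the map side lengths (`WHAT_SIDE`, `WHERE_SIDE`) and the width of the receptive-field overlap (`OVERLAP`) are hypothetical values chosen for illustration, and splitting the where map into left and right halves is one simple way to realise overlapping topographic fields, not necessarily the arrangement used in the model.

```python
import numpy as np

N_PER_GROUP = 16   # oriented-line features per letter position (from the text)
N_GROUPS = 2       # left and right letter positions (from the text)
WHAT_SIDE = 16     # assumed side length of the what map
WHERE_SIDE = 8     # assumed side length of the where map
OVERLAP = 1        # assumed number of columns each field extends past the midline


def what_mask():
    """Full connectivity: every input neuron projects to every what neuron."""
    return np.ones((N_GROUPS * N_PER_GROUP, WHAT_SIDE * WHAT_SIDE), dtype=bool)


def where_mask(overlap=OVERLAP):
    """Topographic connectivity: each input group projects to its own half
    of the where map, extended by `overlap` columns across the midline."""
    n_inputs = N_GROUPS * N_PER_GROUP
    mask = np.zeros((n_inputs, WHERE_SIDE, WHERE_SIDE), dtype=bool)
    half = WHERE_SIDE // 2
    # Left letter position -> left half of the map plus the overlap columns.
    mask[:N_PER_GROUP, :, :half + overlap] = True
    # Right letter position -> right half of the map plus the overlap columns.
    mask[N_PER_GROUP:, :, half - overlap:] = True
    return mask.reshape(n_inputs, -1)
```

In this sketch only the columns near the midline receive input from both letter positions; any further spread of activity across the map would have to come from the lateral connections, which are not modelled here.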
The final part of the model consists of an “association” cortex layer.
Its function is to mediate the binding of letter identity and letter location
information into a stable percept. This is accomplished by using a third
self-organising layer of laterally connected excitatory and inhibitory neu-
rons. This layer receives input from both the “where” and “what” layers.
In addition to this, it also sends an extensive network of reciprocal connec-
tions to both the “what” and “where” layers. These connections promote
inter-areal synchronisation thus facilitating the binding of the “what” and
“where” information (Singer, 1994).
[Figure 1: block diagram with components labelled Visual Feature Input, What System, Where System and Association Area.]
Figure 1. Gross outline of the model of RB.
2.1. Visual Input
Input to the model consists of a simplified version of the simple cells
described by Hubel and Wiesel (1965). Four different orientations, namely 0°,
45°, 90° and 135°, and 16 possible spatial locations are used. This
representation was chosen as it allowed the use of the font described by Rumelhart
and Siple (1974). The font together with the oriented line detectors and
their locations can be seen in figure 2. Each one of these features corre-
sponds to a neuron capable of generating a single spike and each letter is
represented by a spike on a subset of these oriented line detectors.
[Figure 2: three panels. (a) The letter font. (b) The feature template, with features numbered 0 to 15. (c) Spike raster plotting neuron number (0 to 15) against time (0 to 100 ms).]
Figure 2. (a): The font used for letter presentation to the model is similar to that used by Rumelhart and Siple (1974). (b): The template showing the location and arrangement of all 16 features. After figure 2, Rumelhart and Siple (1974). (c): The encoding of the letter “A” by spike times. In this case, neurons 0, 1, 2, 3, 4, 7, 8 and 12 signal its presence. These spikes are solid lines. The other neurons, which do not take an active part in encoding the feature, have their spikes shown as dashed lines. The neuron numbers correspond to the features numbered in (b).
The visual input to the model consists of two groups of 16 neurons.
Each of these groups can represent one letter; consequently, the two groups
are capable of registering two simultaneously presented letters.
The presence of a particular feature in the input is encoded in the firing
time of the appropriate neurons. In this case, the encoding is a delay or
latency code (Gerstner & Kistler, 2002). In the training pattern set, those
neurons signalling the presence of a particular feature fire early and all other
neurons fire much later. This arrangement can be seen in figure 2(c). The
inclusion of this positive and negative information is necessitated by the
design of the synapses used in the network. Since they only change efficacy
in response to spike events, some means must be included to indicate that
the efficacy must either be increased or decreased. This is accomplished by
having spikes encoding the presence of a feature fire early and roughly in
synchrony. All the others fire randomly much later than those signalling the
presence of a feature. Given an appropriate membrane time constant, τm,
these early spikes summate and cause the SOM neurons to fire. When the
efficacy of the synapses is updated the early spikes will cause an increase
in the synaptic efficacy since they, for the most part, lie in the time period
before a SOM neuron spike when afferent spikes can potentiate a synapse.
The later firing neurons, on the other hand, lie in the time window after a
SOM neuron spike which causes a decrease in synaptic efficacy.
This pattern encoding scheme also has the advantage that a large number
of patterns can easily be generated for a single letter by adding or
deleting spikes from a canonical letter representation or by shifting the
times of one or more spikes.
The testing pattern set is similar to the training pattern set save for the
absence of the late, dashed spikes in figure 2(c).
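As an illustration of the latency code described above, the following sketch generates a training or testing pattern for one letter. The set of active neurons for “A” is taken from the caption of figure 2; the early firing time, the jitter and the late firing window (`early_ms`, `jitter_ms`, `late_window`) are assumed values, since the exact timings are not given here.

```python
import random

N_FEATURES = 16
LETTER_A = {0, 1, 2, 3, 4, 7, 8, 12}  # active features, from the figure 2 caption


def make_pattern(active, rng, training=True,
                 early_ms=5.0, jitter_ms=1.0, late_window=(60.0, 100.0)):
    """Return a dict mapping neuron number to spike time in ms.

    Neurons signalling a feature fire early and roughly in synchrony.
    In training patterns every other neuron fires at a random late time;
    testing patterns omit those late spikes.
    """
    pattern = {}
    for neuron in range(N_FEATURES):
        if neuron in active:
            pattern[neuron] = early_ms + rng.uniform(-jitter_ms, jitter_ms)
        elif training:
            pattern[neuron] = rng.uniform(*late_window)
    return pattern


rng = random.Random(0)
train_a = make_pattern(LETTER_A, rng)                 # 16 spikes, early and late
test_a = make_pattern(LETTER_A, rng, training=False)  # early spikes only
```

Because each call draws fresh jitter, repeated calls yield distinct patterns for the same letter, which is one way of realising the time shifting mentioned above.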
Having described how patterns are created for the letters, we now pro-
ceed to outline the neural model used in the SOM layers which learn these
input patterns.
3. The Neural Model
The spiking model used in the network is the Spike Response Model (SRM)
(Gerstner & Kistler, 2002). In order to obtain a computationally efficient
function which computes the response of a neuron to afferent spikes, several
simplifications described by Connolly, Marian, and Reilly (2004), have to
be made to the SRM equations. Firstly, between the arrival of any two
spikes at a neuron its membrane potential evolves in a deterministic manner.
Therefore if we know what the membrane potential is at time $t_{int}$ we can
compute what it will be at some later time $t_{int} + \Delta t$. We then need only
compute the additional change in the membrane voltage caused by spikes
arriving since $t_{int}$. Secondly, if we assume that the post-synaptic potentials
due to the arrival of a pre-synaptic spike contribute instantaneously to the
membrane potential of the model neuron then we can eliminate the need to
use a biophysically more realistic function to model the change of the
post-synaptic potential. In its place we can simply use a Dirac delta function.
The modified equation is shown in (1).
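\[
V_i(t) = \sum_{t_i^{(f)} \in F_i} \eta\left(t - t_i^{(f)}\right)
  + \epsilon(t_{int}) \cdot \exp\left(\frac{-(t - t_{int})}{\tau_m}\right)
  + \sum_{j \in \Gamma_i} \; \sum_{\substack{t_j^{(f)} \in F_j \\ t_j^{(f)} > t_{int}}}
    w_{ij}\, \delta\left(t - t_j^{(f)} - \hat{d}_{ij}\right)
\tag{1}
\]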
where $\eta$ describes the response of a neuron to its own spikes, $\epsilon(t_{int})$ is the
post-synaptic potential at time $t_{int}$, $w_{ij}$ is the efficacy of the synapse
between neurons $i$ and $j$, $t_i^{(f)}$ is the most recent spike time of neuron $i$, $\Gamma_i$ is
the set of neurons pre-synaptic to neuron $i$, $F_j$ is the set of spike times of
neuron $j$, $\hat{d}_{ij}$ is a noisy spike transmission delay between neurons $i$ and $j$,
and $\delta$ is the Dirac delta function, used here as a unit impulse ($\delta(x) = 1$ iff
$x = 0$, 0 otherwise). This simplification of the neural model, though
physiologically unrealistic (Gerstner, 2001), has considerable computational
benefits since we no longer
consider the time course of synaptic input to the neuron (Marian, 2002).
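The event-driven reading of equation (1) can be sketched as follows: the stored potential is decayed from the time of the last update to the arrival time of the next spike, and the synaptic weight is then added instantaneously. The firing threshold and the exponential form and parameters of the refractory kernel (`threshold`, `eta0`, `tau_r`) are assumptions made for illustration, as is the restriction of $\eta$ to the most recent output spike.

```python
import math


class EventDrivenNeuron:
    """Minimal event-driven neuron in the spirit of equation (1)."""

    def __init__(self, tau_m=10.0, threshold=1.0, eta0=2.0, tau_r=5.0):
        self.tau_m = tau_m          # membrane time constant (ms)
        self.threshold = threshold  # assumed firing threshold
        self.eta0 = eta0            # assumed refractory amplitude
        self.tau_r = tau_r          # assumed refractory time constant (ms)
        self.psp = 0.0              # epsilon(t_int): potential stored at t_int
        self.t_int = 0.0            # time of the last update
        self.last_spike = None      # most recent output spike time

    def eta(self, t):
        """Refractory contribution of the most recent output spike."""
        if self.last_spike is None:
            return 0.0
        return -self.eta0 * math.exp(-(t - self.last_spike) / self.tau_r)

    def receive(self, t, weight):
        """Process one afferent spike arriving at time t with efficacy weight.

        Arrivals must be supplied in non-decreasing time order.
        Returns True if the neuron fires in response.
        """
        # Deterministic decay since t_int, then the instantaneous delta input.
        self.psp = self.psp * math.exp(-(t - self.t_int) / self.tau_m) + weight
        self.t_int = t
        if self.psp + self.eta(t) >= self.threshold:
            self.last_spike = t
            return True
        return False
```

With these assumed parameters, two spikes of weight 0.6 arriving 0.5 ms apart summate and cross the threshold, whereas the same two spikes 50 ms apart do not, which is the behaviour the latency code relies on.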
Learning in the model is accomplished by using plastic excitatory and
inhibitory spike time-dependent synapses (Abbott & Nelson, 2000). It is
thought that in the brain, plasticity processes at synapses mediate long-
term learning. At the excitatory synapses a type of Hebbian learning is
modelled. If a pre-synaptic spike precedes that of a post-synaptic neuron,
potentiation of the synapse can occur. If the reverse occurs then depression
of the synapse takes place. Equations (2) and (3), below, govern these two
processes.
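\[
w_{ij}(t + \Delta t) = w_{ij}(t) + \kappa_E\left(\hat{t}_j + \hat{d}_{ij} - \hat{t}_i\right)
\tag{2}
\]
\[
\kappa_E(\Delta) =
\begin{cases}
A_+ \cdot \exp(\Delta / \tau_P) & \text{if } \Delta < 0 \\
B_- \cdot \exp(-\Delta / \tau_D) & \text{if } \Delta > 0 \\
0 & \text{otherwise}
\end{cases}
\tag{3}
\]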
where $\hat{t}_j$ and $\hat{t}_i$ are the most recent spike times of the pre- and post-synaptic
neuron, respectively. The amplitude and polarity of the change are
determined by the function $\kappa$, which implements what is known as the learning
window (Gerstner & Kistler, 2002). $\tau_P$ and $\tau_D$ control the time course over
which potentiation and depression occur, respectively, and the constants
$A_+ > 0$ and $B_- < 0$ control the amplitude of potentiation and depression,
respectively. This function differentiates between the case where the
pre-synaptic spike, $t_{pre}$, occurs before the post-synaptic spike, $t_{post}$, and vice
versa, with potentiation occurring in the former instance and depression in
the latter.
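The excitatory learning window of equations (2) and (3) can be written down directly. The numerical values of the amplitudes and time constants below are placeholders chosen for illustration, not the parameters used in the model, and the function names are hypothetical.

```python
import math


def kappa_e(delta, a_plus=0.1, b_minus=-0.12, tau_p=10.0, tau_d=10.0):
    """Excitatory learning window, equation (3).

    delta is the delayed pre-synaptic spike time minus the post-synaptic
    spike time (ms). Negative delta (pre before post) potentiates;
    positive delta (pre after post) depresses.
    """
    if delta < 0:
        return a_plus * math.exp(delta / tau_p)
    if delta > 0:
        return b_minus * math.exp(-delta / tau_d)
    return 0.0


def update_excitatory(w, t_pre, t_post, delay=0.0, **window_params):
    """Weight update of equation (2) for one pre/post spike pairing."""
    return w + kappa_e(t_pre + delay - t_post, **window_params)
```

For example, `update_excitatory(0.5, t_pre=10.0, t_post=15.0)` increases the weight, while swapping the two spike times decreases it; in both cases the magnitude of the change shrinks as the spikes move further apart in time.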
A similar process takes place at inhibitory synapses (Roberts, 2000).
When spiking of the pre-synaptic inhibitory neuron follows that of the
post-synaptic excitatory neuron, potentiation can result. Conversely, when
pre-synaptic activity precedes post-synaptic activation, depression can be
observed. Two similar equations, (4) and (5), can also be defined, with all
parameters as previously mentioned.
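\[
w_{ij}(t + \Delta t) = w_{ij}(t) + \kappa_I\left(\hat{t}_j - \hat{t}_i\right)
\tag{4}
\]
\[
\kappa_I(\Delta) =
\begin{cases}
A_+ \cdot \exp(-\Delta / \tau_P) & \text{if } \Delta > 0 \\
B_- \cdot \exp(\Delta / \tau_D) & \text{if } \Delta < 0 \\
0 & \text{otherwise}
\end{cases}
\tag{5}
\]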
In both cases, a realistic description of synaptic plasticity requires that
the efficacy, $w_{ij}$, be constrained to vary within a certain range of values,
namely $[w^{EI}_{min}, w^{EI}_{max}]$. The excitatory (inhibitory) synapse described here
must maintain a positive (negative) weight which must not be greater (less)
than $w^{EI}_{max}$. These constraints can be easily realised by setting $B_- = w_{ij} b_-$,
$b_- < 0$ and $A_+ = (w^{EI}_{max} - w_{ij}) a_+$, $a_+ > 0$ in equations (3) and (5). Thus
each positive (negative) term which increments (decrements) the synaptic
efficacy is proportional to $w^{EI}_{max} - w_{ij}$ whereas each negative (positive) term
which leads to a decrement (increment) of the efficacy is proportional to
$w_{ij}$. When the efficacy reaches its maximum (minimum) value it is no
longer incremented (decremented) (Gerstner & Kistler, 2002).
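A minimal sketch of these weight-dependent amplitudes, applied to both the excitatory and the inhibitory window, is given below. It assumes that the bound is signed (positive for excitatory synapses, negative for inhibitory ones) and that the other end of the range is zero; the values of `a_plus`, `b_minus` and the time constants are illustrative only.

```python
import math


def soft_bounded_update(w, delta, w_max, inhibitory=False,
                        a_plus=0.1, b_minus=-0.1, tau_p=10.0, tau_d=10.0):
    """Return the efficacy after one pre/post spike pairing.

    delta is the argument of the learning window (pre minus post time, ms).
    w_max is signed: positive for excitatory, negative for inhibitory synapses.
    """
    amp_plus = (w_max - w) * a_plus   # A+ = (w_max - w_ij) * a+
    amp_minus = w * b_minus           # B- = w_ij * b-
    if inhibitory:
        # Equation (5): the A+ branch applies when delta > 0.
        if delta > 0:
            dw = amp_plus * math.exp(-delta / tau_p)
        elif delta < 0:
            dw = amp_minus * math.exp(delta / tau_d)
        else:
            dw = 0.0
    else:
        # Equation (3): the A+ branch applies when delta < 0.
        if delta < 0:
            dw = amp_plus * math.exp(delta / tau_p)
        elif delta > 0:
            dw = amp_minus * math.exp(-delta / tau_d)
        else:
            dw = 0.0
    return w + dw
```

Provided `a_plus` and the magnitude of `b_minus` do not exceed 1, each step moves the efficacy only part of the way towards `w_max` or towards zero, so repeated pairings approach either limit without crossing it.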
4. Results
Thus far eliciting synchronous oscillation from the network has proved very
difficult as the network is very sensitive to small changes in parameter sets.
Seemingly innocuous modifications can change the regular output shown in
figure 3 into self-propagating epileptiform spiking patterns where all infor-
mation about letter identity and location is lost. Balancing excitation and
inhibition in a way which maintains the topographic regularity extracted
by the SOM algorithm with the need to have regions of the three maps
spiking together, and binding what and where information in so doing, is a
demanding task. Solving this problem is our current goal.
[Figure 3: two panels of activity maps, one map per presentation of the letter “T”. Presentations 2, 6, 7, 8 and 9 are labelled Left and presentations 0, 1, 3, 4 and 5 are labelled Right; an accompanying scale runs from 0 to 1.]
(a) Letter identity (map axes ticked at 5, 10 and 15).
(b) Letter location (map axes ticked at 2, 4, 6 and 8).
Figure 3. The figures above show the number of spikes generated by 10 presentations of the letter “T” to the model.
Currently the model is capable of correctly learning both letter location
and a translation invariant representation of letter identity. This can be
seen in figure 3. The letter “T” was presented to a network 10 times. Half
of these presentations were to the left side of the input array and the other
half to the right. The model learned letter identity, shown in figure 3(a),
regardless of presentation location. Though the individual spike patterns
are not identical, due to noise in spike transmission, they are substantially
similar. Likewise in figure 3(b), the model easily located the letter in the
input array. This is a consequence of the receptive field connection between
the input and where layers.
5. Summary and Conclusions
In this paper we have described a model of orthographic repetition blind-
ness, a disruption in the ability to recognise repeated words, or word frag-
ments, which are rapidly visually presented.
Drawing on the division of the visual system into streams which pro-
cess object identity and location separately, the model consists of two self-
organising maps performing these functions. These two maps are recipro-
cally connected, via a third SOM, which promotes synchronous firing of the
neurons in each map.
We have described a model spiking neuron based on Gerstner’s SRM
with networks of these neurons connected with plastic excitatory and
inhibitory synapses exhibiting spike time-dependent learning.
So far the network has successfully learned both letter and location in-
formation. In attempting to get the model to demonstrate synchronous
oscillations in different parts of the three maps the model has almost con-
sistently become epileptic, a known problem with these types of networks
(Huyck, n.d.).
This may be due to a number of factors. Firstly, the balance between
slow and fast excitatory and inhibitory synaptic transmission which can
stabilise oscillatory persistent activity (Brunel, 2003) may not be present in
the network. Secondly, the simple fire and quickly forget refractory function
may decay too swiftly, thus depriving the network of its activity history. We
are currently investigating whether addressing these two problems will allow
the network to enter stable oscillatory persistently active states.
Acknowledgements
We wish to thank Harry Erwin for his useful comments and suggestions for
solving the simulated epilepsy demonstrated by the model.
References
Abbott, L. F., & Nelson, S. B. (2000). Synaptic plasticity: taming the
beast. Nature Neuroscience, 3, 1178–1183.
Bavelier, D., & Potter, M. C. (1992). Visual and phonological codes in
repetition blindness. Journal of Experimental Psychology: Human
Perception and Performance, 18 (1), 134–147.
Baylis, G., Driver, J., & Rafal, R. D. (1993). Visual extinction and stimulus
repetition. Journal of Cognitive Neuroscience, 5, 453–466.
Brunel, N. (2003). Dynamics and plasticity of stimulus-selective persistent
activity in cortical network models. Cerebral Cortex, 13 (11), 1151–1161.
Connolly, C. G., Marian, I., & Reilly, R. G. (2004). Approaches to ef-
ficient simulation with spiking neural networks. In H. Bowman &
C. Labiouse (Eds.), Connectionist models of cognition and perception
II (Vol. 15, pp. 231–240). London, UK: World Scientific.
Gerstner, W. (2001). What’s different with spiking neurons. In H. A. K.
Mastebroek & J. E. Vos (Eds.), Plausible neural networks for bio-
logical modelling (Vol. 13, pp. 23–48). Dordrecht, The Netherlands:
Kluwer Academic Publishers.
Gerstner, W., & Kistler, W. M. (2002). Spiking neuron models: Single neu-
rons, populations, plasticity. Cambridge, UK: Cambridge University
Press.
Harris, C. L., & Morris, A. L. (2000). Orthographic repetition blindness.
The Quarterly Journal of Experimental Psychology, 53A, 1039–1060.
Hubel, D. H., & Wiesel, T. N. (1965). Receptive fields and functional
architecture of two non-striate visual areas (18 and 19) of the cat.
Journal of Neurophysiology, 28, 229–289.
Huyck, C. (n.d.). Creating hierarchical categories using cell assemblies.
(Manuscript under review. Retrieved 14th October 2004 from
http://www.cwa.mdx.ac.uk/chris/hebb/hier/hier.ps)
Kanwisher, N. G. (1987). Repetition blindness: Type recognition without
token individuation. Cognition, 27, 117–143.
Kanwisher, N. G., & Potter, M. C. (1990). Repetition blindness: Levels of
processing. Journal of Experimental Psychology: Human Perception
and Performance, 16 (1), 30–47.
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer.
Marian, I. D. (2002). A biologically inspired model of motor control of
direction. Unpublished master’s thesis, National University of Ireland,
Dublin, Dublin, Ireland.
Roberts, P. D. (2000). Modeling inhibitory plasticity in the electrosensory
system of mormyrid electric fish. Journal of Neurophysiology, 84 (1),
2035–2047.
Rumelhart, D. E., & Siple, P. (1974). Process of recognizing tachistoscop-
ically presented words. Psychological Review, 81 (2), 99–118.
Singer, W. (1994). Putative functions of temporal correlations in neocorti-
cal processing. In C. Koch & J. L. Davis (Eds.), Large-scale neuronal
theories of the brain (pp. 210–237). Cambridge, MA: The MIT Press.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In
D. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of
visual behavior (pp. 549–586). Cambridge, MA: The MIT Press.