[ieee 2009 ieee symposium on computational intelligence for multimedia signal and vision processing...

A Model of Angle Selectivity in Area V2 with Local Divisive Normalization

Mohsen Malmir, Saeed Shiry Ghidary

M. Malmir is with the Department of Computer Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran (e-mail: [email protected]).

S. S. Ghidary is with the Department of Computer Engineering, Islamic Azad University of Parand, Tehran, Iran (phone: +98-21-64542737; e-mail: [email protected]).

978-1-4244-2771-0/09/$25.00 ©2009 IEEE


Abstract— The efficient coding hypothesis states that the goal of the brain's sensory systems is to remove redundancies in the sensory input. Several models have tried to remove redundancy in the visual input and have successfully modeled the functional properties of neurons in the primary visual cortex. However, there has been no progress in extending these models to simulate the properties of neurons in extrastriate visual areas. In this paper, we propose that the visual cortex removes higher order dependencies in a hierarchical architecture. In each layer, a nonlinear mechanism removes redundancies in a local neighborhood. We used the biologically plausible divisive normalization mechanism in a two-layer model network to remove nonlinear dependencies in the input. Units in this model can simulate the responses of neurons in area V2 to angle stimuli.

I. INTRODUCTION

The ventral visual pathway is thought to be the locus for processing object identity information in the visual cortex [1]. This processing pathway extends from V1 in the occipital cortex through V2 and V4 to the inferotemporal cortex [2]. Along this pathway, the receptive fields of neurons grow in size and the preferred stimuli change from simple edge segments to complex shapes such as faces [3]–[6]. These observations have led to the idea of hierarchical visual processing in the object recognition pathway [1]–[3], [7], [8].

Information theoretic methods are one approach to studying the functions of the visual system (and other sensory systems). These methods are usually based on Barlow’s “efficient coding” hypothesis [9], which was itself motivated by information theory. In his hypothesis, Barlow stated that the sensory input to the brain is largely redundant because of the structural regularities in the natural environment. This redundant input may place a burden on the limited processing capabilities of the brain. One way for the brain to overcome this problem is to remove these redundancies in the sensory system and feed only the most informative features to the higher levels. This theory has been described in various forms by others [10]–[13].

There are a number of successful applications of the “efficient coding” hypothesis that have modeled the functional properties of neurons in the primary visual cortex. In an early study, Olshausen and Field represented the input image through the sparse activation of linear filters [14]. They found localized band-pass basis functions that were very similar to the selectivity of simple cells in the primary visual cortex. Similar results were achieved by Bell and Sejnowski, who linearly transformed the input image into a set of independent components [15].

Hyvärinen and Hoyer used energies of responses of filters in independent subspaces to model the shift and phase invariance properties of complex cells [16]. Schwartz and Simoncelli tried to model nonlinear dependencies in natural images using a nonlinear behavior observed in the neurons of visual and auditory systems [17]. They used divisive normalization to remove variance dependency among responses of a set of linear filters. Their model could account for a number of nonlinear physiological behaviours that had been observed in primary visual and auditory cortices.

Despite great success in modeling different properties of neurons in the primary visual cortex, no progress has been made in extending these results to the properties of neurons in extrastriate visual cortical areas. Some studies tried to learn higher order structures in natural images by capturing variance dependencies among the responses of linear filters [18]–[20]. However, the features learned in these models showed no similarity to the properties of neurons in extrastriate visual areas. In order to model the properties of neurons in higher order areas of the visual cortex, we develop a model that removes higher order dependencies in the input image. We use divisive normalization in a hierarchical model, which allows higher order dependencies to be removed in the successive stages of the hierarchy.

II. HIGH ORDER DEPENDENCIES AND HIERARCHICAL DIVISIVE NORMALIZATION MODEL

The problem with linear redundancy reduction methods such as ICA is that a simple linear transformation cannot yield fully independent components. Therefore, some residual redundancies remain between the responses of the linear filters. Several methods have used nonlinear interactions to remove these residual redundancies and yield independent components [17], [18], [21], [22]. These models assume that the responses of distant linear filters are independent but that the responses of adjacent filters exhibit variance dependency.

The question is whether the responses of distant filters are really independent or whether they have some form of nonlinear dependency that is not easily detectable. In order to answer this question, we implemented a local divisive normalization mechanism as described in [17]. We calculated the joint histogram of the activities of two filters before and after normalization and found that before normalization the responses of linear filters seemed to be independent (Fig. 1a). However, as can be seen in the joint histogram of normalized responses in Fig. 1b, the variance of the normalized responses of one filter depends on the normalized responses of the other.

An intuitive explanation for this observation is that there is a higher order dependency between linear filter responses beyond variance dependency. This higher order dependency cannot be detected in the joint histogram of the responses of two linear filters because it does not affect the mean or variance of these responses. In topographic ICA and in the divisive normalization model, it was assumed that ICA results in independence between the responses of these filters. Yet we can see that there is a nonlinear dependency between the energies of these responses which cannot be eliminated using these transformations. This dependency, which was hidden in the joint histogram of the linear filter responses, is revealed in the joint histogram of their energies.

Fig. 1. Joint histogram of responses of two distant filters (a) before normalization and (b) after normalization
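The kind of dependency read off a joint histogram such as Fig. 1 can be sketched numerically: sort the samples by the magnitude of one response, split them into bins, and compare the variance of the other response across bins. The sketch below is only an illustration under stated assumptions. The data are synthetic (two variables sharing a random scale, and two independent Gaussians), not filter responses from the paper, and the function name `conditional_variance` and the bin count are hypothetical.

```python
import random


def conditional_variance(a, b, n_bins=5):
    """Variance of b inside equal-count bins of |a|, ordered from small to large |a|."""
    pairs = sorted(zip(a, b), key=lambda p: abs(p[0]))
    size = len(pairs) // n_bins
    variances = []
    for k in range(n_bins):
        chunk = [p[1] for p in pairs[k * size:(k + 1) * size]]
        mean = sum(chunk) / len(chunk)
        variances.append(sum((v - mean) ** 2 for v in chunk) / len(chunk))
    return variances


rng = random.Random(0)
n = 20000

# Synthetic stand-in for two dependent responses: both share a random scale s.
# They are uncorrelated, yet the spread of one grows with the magnitude of the other.
dep_a, dep_b = [], []
for _ in range(n):
    s = rng.expovariate(1.0)
    dep_a.append(s * rng.gauss(0.0, 1.0))
    dep_b.append(s * rng.gauss(0.0, 1.0))

# Two independent Gaussian variables for comparison.
ind_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
ind_b = [rng.gauss(0.0, 1.0) for _ in range(n)]

dependent = conditional_variance(dep_a, dep_b)
independent = conditional_variance(ind_a, ind_b)
print("shared scale:", [round(v, 2) for v in dependent])
print("independent: ", [round(v, 2) for v in independent])
```

For the shared-scale pair the per-bin variance rises from the first bin to the last, while for the independent pair it stays roughly flat. This is the signature that a joint histogram shows as a widening spread.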

In order to remove this redundancy, we added a second layer of divisive normalization to the model of Schwartz and Simoncelli. The normalized responses of the linear filters in the first layer are fed to the units in the second layer. Divisive normalization is then used to remove the nonlinear variance dependency between the activities of units in the second layer. Applied in this way, divisive normalization can remove higher order nonlinear dependencies among the responses of linear filters, and it does so with a biologically plausible mechanism.

III. MODEL DESCRIPTION

The model we propose here is an extension of the model of Schwartz and Simoncelli. In this model the input image is transformed with a bank of linear filters similar to those found in ICA [15]. Then the responses of these linear filters are transformed with a nonlinear divisive normalization mechanism in order to remove the variance dependency between them. The variance of the responses of each linear filter is modeled as a sum of squared responses of filters in a neighboring region, as in (1).

$$\operatorname{var}\left(L_x \mid L_y,\ y \in C_x\right) = \sum_{y \in C_x} w^{l}_{yx} L_y^{2} + \sigma_x^{2} \qquad (1)$$

where $L_x$ and $L_y$ are the responses of linear filters $x$ and $y$, $C_x$ is a neighboring area around filter $x$, $w^{l}_{yx}$ is the weight of the response of filter $y$ in normalizing the responses of $x$, and $\sigma_x^{2}$ is the part of the variance of $x$ which is independent of the other filters' responses. In order to remove the variance dependency, the responses of the filters are transformed using (2).

$$R_x = \frac{L_x^{2}}{\sum_{y \in C_x} w^{l}_{yx} L_y^{2} + \sigma_x^{2}} \qquad (2)$$
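The normalization in (2) can be written out directly. The following is a minimal sketch rather than the authors' implementation. The function name `divisive_normalize`, the list-based layout of weights and neighborhoods, and all numeric values are assumptions chosen for illustration. In the paper the weights and $\sigma_x^2$ are learned from natural images.

```python
def divisive_normalize(responses, weights, sigma2, neighbors):
    """Apply R_x = L_x^2 / (sum_{y in C_x} w_yx * L_y^2 + sigma_x^2) to every filter x.

    responses: linear filter responses L_x
    weights:   weights[x][y] is the weight of filter y when normalizing filter x
    sigma2:    sigma2[x] is the additive constant sigma_x^2
    neighbors: neighbors[x] lists the indices y in the neighborhood C_x
    """
    normalized = []
    for x, L_x in enumerate(responses):
        pool = sum(weights[x][y] * responses[y] ** 2 for y in neighbors[x])
        normalized.append(L_x ** 2 / (pool + sigma2[x]))
    return normalized


# Hypothetical toy values: three filters on a line, each normalized by its adjacent filters.
L = [2.0, 1.0, 0.0]
neighbors = [[1], [0, 2], [1]]
weights = [[0.5] * 3 for _ in range(3)]
sigma2 = [1.0, 1.0, 1.0]

R = divisive_normalize(L, weights, sigma2, neighbors)
print(R)
```

With these toy values the first filter gives 4 / (0.5 · 1 + 1) ≈ 2.67, while the second filter is strongly suppressed by its active neighbor: 1 / (0.5 · 4 + 0 + 1) ≈ 0.33.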

The parameters of this transformation are calculated using a maximum likelihood estimate over an ensemble of natural images, as in (3).

$$\left\{\hat{w}^{l}_{yx},\ \hat{\sigma}_x\right\} = \arg\max_{w^{l}_{yx},\ \sigma_x} \prod_{i} P\left(L_x(i) \mid L_y(i),\ y \in C_x\right) \qquad (3)$$

where $P$ denotes the conditional probability distribution function of the linear filter responses. For simplicity of computation, the joint distribution function in the first and second layers is set to the normal distribution.
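To make the objective in (3) concrete, the sketch below evaluates the negative log-likelihood under one reading of the normality assumption: each $L_x(i)$ is taken to be zero-mean Gaussian with the variance given by (1). The zero-mean assumption, the synthetic data, the two-neighbor setup, and the coarse grid search are illustrative assumptions and not the paper's procedure. The paper does not state which optimizer was used.

```python
import math
import random


def neg_log_likelihood(center, surround, weights, sigma2):
    """Negative log-likelihood of center responses L_x(i) given neighbor responses L_y(i).

    Assumes L_x(i) ~ N(0, sum_y w_y * L_y(i)^2 + sigma2), i.e. the variance model of (1).
    """
    total = 0.0
    for L_x, L_ys in zip(center, surround):
        var = sum(w * L_y ** 2 for w, L_y in zip(weights, L_ys)) + sigma2
        total += 0.5 * math.log(2.0 * math.pi * var) + L_x ** 2 / (2.0 * var)
    return total


def fit_by_grid(center, surround, w_grid, s_grid):
    """Coarse grid search for two neighbor weights and sigma2 (illustration only)."""
    best = None
    for w1 in w_grid:
        for w2 in w_grid:
            for s2 in s_grid:
                nll = neg_log_likelihood(center, surround, [w1, w2], s2)
                if best is None or nll < best[0]:
                    best = (nll, [w1, w2], s2)
    return best


# Synthetic data generated from the assumed model with known (hypothetical) parameters.
rng = random.Random(1)
true_w, true_s2 = [0.8, 0.4], 0.2
surround = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(2000)]
center = [
    math.sqrt(sum(w * y ** 2 for w, y in zip(true_w, ys)) + true_s2) * rng.gauss(0.0, 1.0)
    for ys in surround
]

W_GRID = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
S_GRID = [0.1, 0.2, 0.4, 0.8]
best_nll, best_w, best_s2 = fit_by_grid(center, surround, W_GRID, S_GRID)
nll_true = neg_log_likelihood(center, surround, true_w, true_s2)
nll_flat = neg_log_likelihood(center, surround, [0.0, 0.0], 0.8)
print("best grid point:", best_w, best_s2)
```

Maximizing the product in (3) is equivalent to minimizing this sum. A model that ignores the neighbors (`nll_flat`) fits the synthetic data worse than the generating parameters do, which is what the estimation procedure exploits.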

Normalization in the second layer of the model incorporates signals from larger vicinities than in the first layer. Note that simply extending the normalization area in the first layer does not remove this higher order dependency, because the dependency involves the variance of the energies of the linear filters, not the variance of the responses themselves. In the second layer, responses are normalized using (4).

$$I_u = \frac{R_u^{2}}{\sum_{v \in C_u} w^{l}_{vu} R_v^{2} + \sigma_u^{2}} \qquad (4)$$

where $I_u$ is the response of unit $u$ after normalization and $C_u$ denotes the region in the second layer in which responses are used to normalize the responses of $u$.

We calculated the parameters of normalization in the first layer by maximum likelihood estimation over an ensemble of natural images, as in (3). After learning these parameters, the responses of the filters in the first layer were calculated and used to estimate the parameters of the second layer. The normalization parameters of the second layer were estimated in the same way, with the likelihood taken over the responses of the first layer.
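The two-stage structure described above, (2) followed by (4) over a larger neighborhood, can be sketched as the same operation applied twice. This is a simplified illustration under several assumptions: filters are arranged on a 1-D line, the neighborhood $C_x$ excludes the unit itself, and a single uniform weight and $\sigma^2$ are used in place of the learned parameters. The function name `normalize_layer` is hypothetical.

```python
def normalize_layer(inputs, radius, weight, sigma2):
    """Divisively normalize each unit by the squared activity of units within `radius`.

    Assumptions for illustration: 1-D arrangement, the unit itself is excluded from
    its own pool, and one shared weight and sigma2 replace the learned parameters.
    """
    n = len(inputs)
    outputs = []
    for x in range(n):
        pool = sum(
            weight * inputs[y] ** 2
            for y in range(max(0, x - radius), min(n, x + radius + 1))
            if y != x
        )
        outputs.append(inputs[x] ** 2 / (pool + sigma2))
    return outputs


# Hypothetical linear filter responses.
L = [1.0, 2.0, 0.0, 1.0]

# First layer: small neighborhood, as in (2).
layer1 = normalize_layer(L, radius=1, weight=1.0, sigma2=1.0)

# Second layer: the first-layer outputs are normalized over a larger vicinity, as in (4).
layer2 = normalize_layer(layer1, radius=2, weight=1.0, sigma2=1.0)

print(layer1)
print(layer2)
```

In the paper the second-layer parameters are fitted only after the first layer has been learned, so in a full implementation the second call would use its own estimated weights and constants.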

IV. COMPARISON OF MODEL UNITS WITH REAL V2 NEURONS

We compared the properties of the units in the second layer of the proposed model to the properties of neurons in area V2. This is an important area of the visual cortex and is as large as area V1. However, few studies have systematically examined the selectivity of V2 neurons. The difficulty in studying stimulus selectivity in this area is selecting a stimulus set that can elicit neuronal firing. It has been suggested that this area might be selective to complex stimuli such as gratings and curves [4], [23]–[25].

One of the most prominent studies of visual area V2 is the work of Ito and Komatsu, who examined the selectivity of neurons in this area to angle stimuli [26]. They used a set of angles with different orientations and angle widths to determine whether neurons in V2 are selective to angles. They found that most V2 neurons were responsive to these stimuli and that some neurons demonstrated sharp selectivity to a specific angle stimulus. We adapted their stimulus set (Fig. 2) to examine the selectivity of units in the second layer of our proposed model. This stimulus set is composed of 66 angles, each formed by two half lines. The directions of these half lines vary from 0˚ to 330˚ in 30˚ steps. Angles in the upper triangle in Fig. 2 are replicated in the lower triangle for better visualization.

Fig. 2. Stimuli set used to examine the selectivity of model V2 neurons. Angles in the shaded region are a mirror image of angles in the white region.
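The size of the stimulus set follows from the description: 12 half-line directions taken two at a time give 66 angles. The short sketch below enumerates them and computes the width of each angle. The helper name `angle_width` is an assumption for illustration, and a width of 180˚ corresponds to two collinear half lines, that is, a long bar.

```python
from collections import Counter
from itertools import combinations

# Half-line directions from 0 to 330 degrees in 30-degree steps.
directions = list(range(0, 360, 30))

# Each stimulus is an unordered pair of distinct half-line directions.
stimuli = list(combinations(directions, 2))


def angle_width(d1, d2):
    """Width in degrees of the angle between two half-line directions (at most 180)."""
    diff = abs(d1 - d2) % 360
    return min(diff, 360 - diff)


# Number of stimuli at each angle width.
width_counts = Counter(angle_width(a, b) for a, b in stimuli)

print(len(stimuli))                  # 66
print(angle_width(150, 270))         # 120
print(sorted(width_counts.items()))
```

Each width from 30˚ to 150˚ occurs 12 times and the 180˚ case occurs 6 times, which together account for the 66 stimuli.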

Fig. 3a shows the responses of a typical unit in the second layer of the proposed model to the stimulus set. This unit is selective to a 120˚ angle composed of 150˚ and 270˚ half lines. The topmost row and leftmost column in Fig. 3 display the responses of this unit to the individual half lines. The unit showed no selectivity to the 270˚ half line and only a mild response to the 150˚ half line. In this case, the unit is selective to the combination of these two half lines. The response of this unit to the optimal stimulus is a nonlinear function of the responses to the constituent half lines. A similar nonlinear behavior was observed in the responses of neurons in area V2.

Fig. 3b displays the responses of another model unit to the stimulus set. The maximum response of this unit is elicited by an angle composed of 0˚ and 330˚ half lines. This unit also responded to other angles that have a 0˚ half line component. This is evident in Fig. 3b, where the responses are elongated along the row and column corresponding to the 0˚ half line. This means that the unit is selective to a 0˚ half line and that its response to the optimal angle is a nonlinear function of the responses to the constituent half lines. A similar behavior was reported in [26].

Fig. 3. Responses of two typical units of the proposed model to the angle stimuli set. (a) unit selective to the combination of 150˚ and 270˚ half lines (b) unit selective to a 30˚ angle composed of 0˚ and 330˚ half lines.

A. Distribution of Optimal Angle Stimuli in the Model

We compared the distribution of optimal angle width over a sample population of model units with that observed in real V2 neurons as reported in [26]. Fig. 4 shows this distribution for model units and real V2 neurons. A large number of model units (41%) showed their maximum response to a 30˚ angle. This is close to the 43% found for real V2 neurons. Wide angles between 60˚ and 150˚ were the optimal stimulus for 49% of model units and 41% of real V2 neurons. Long bars were the optimal stimulus for only 9% of model units and 16% of real V2 neurons.

Fig. 4. Comparison of optimal angle stimuli distribution in (a) real V2 neurons regenerated from [26] and (b) in model units

Another characteristic that we compared between the model units and real V2 neurons is the size of the peak response area in the response profile. In this experiment, we measured the continuous area around the maximum response angle in which angles elicited responses higher than half of the maximum response. The result of this experiment is shown in Fig. 5 and can be compared to the results of the same experiment in actual V2 neurons. As shown, 25% of the peak response areas of model units include fewer than 4 angles. This is close to the data in [26], where 21% of peak response areas were reported to include fewer than 4 angles.

Fig. 5. Size of the peak response area for (a) real V2 neurons reported in [26] and (b) model neurons.
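The peak response area measure described above can be sketched as a flood fill on the stimulus grid: start at the best stimulus and grow through adjacent stimuli whose response exceeds half of the maximum. The details below are assumptions for illustration. Stimuli are indexed by a pair of half-line indices, adjacency means changing one half line by one step, and no wraparound between 330˚ and 0˚ is applied. The exact neighborhood definition used in [26] may differ.

```python
def peak_area_size(profile):
    """Count stimuli in the connected region around the peak with response > max / 2.

    profile: dict mapping (i, j) grid positions to responses.
    """
    peak = max(profile, key=profile.get)
    threshold = profile[peak] / 2.0
    seen = {peak}
    stack = [peak]
    while stack:
        i, j = stack.pop()
        # Visit the four grid neighbors (one step in either half-line index).
        for cell in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if cell in profile and cell not in seen and profile[cell] > threshold:
                seen.add(cell)
                stack.append(cell)
    return len(seen)


# Hypothetical response profile on a small upper-triangular grid (i < j).
size = 5
profile = {(i, j): 0.1 for i in range(size) for j in range(i + 1, size)}
profile[(1, 2)] = 1.0   # peak response
profile[(1, 3)] = 0.6   # adjacent to the peak and above half maximum
profile[(0, 4)] = 0.9   # above half maximum but not connected to the peak

area = peak_area_size(profile)
print(area)
```

Only the peak and its connected neighbor are counted here. The isolated strong response at (0, 4) is excluded because the measure is restricted to the continuous area around the maximum.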

V. CONCLUSION

The “efficient coding” hypothesis has greatly influenced the direction of research on the sensory systems of the brain. Several studies used this hypothesis and successfully modeled a number of properties of neurons in the visual cortex. Early works used linear transformations to remove redundancies in the input image [14], [15]. The nonlinear behavior of neurons in the visual cortex inspired subsequent models to use nonlinear interactions to remove redundancy. These models successfully generated nonlinear units which replicated the behavior of neurons in the primary visual cortex [16], [17].

Dependencies in the natural environment are highly nonlinear and we cannot claim that one stage of nonlinear transformation can completely remove them. A possible mechanism in the brain to remove these dependencies is a hierarchical processing system in which successive layers remove higher order dependencies in the input. In each layer nonlinear interactions such as divisive normalization are used to remove the visible redundancies in the input. Moreover these nonlinear interactions reveal higher order redundancies for the next layer. In this way a simple nonlinear interaction can remove higher order redundancies.

We implemented a two-layer neural network and used divisive normalization in each layer to remove variance dependencies between the responses of the network units. The output of the first layer is calculated by normalizing the responses of a set of linear filters in a neighboring region. In the second layer these normalized responses are normalized again over larger vicinities. This mechanism can remove dependencies between the variances of the energies of the linear filter responses. We compared the properties of units in the second layer of the proposed network to the properties of real V2 neurons and found that these units could simulate a number of behaviors of real V2 neurons.

One important property observed in neuronal responses of area V2 is that responses to angle stimuli are a nonlinear function of responses to the constituent half lines. Units in the second layer of the proposed model can simulate similar nonlinear behaviours. A large number of V2 neurons are selective to sharp angle stimuli. This was also observed in the selectivity of units in the model. The distribution of the size of the peak response area was similar in the model units and real V2 neurons. These results indicate that our proposed model can simulate the properties of neurons in area V2.

REFERENCES

[1] M. Mishkin, L. G. Ungerleider, K. A. Macko, “Object vision and spatial vision: two cortical pathways,” Trends. Neuroscience, vol. 6, pp. 414–417, 1983.

[2] J. H. R. Maunsell, W. T. Newsome, “Visual processing in monkey extrastriate cortex,” Ann. Review. Neuroscience, vol. 10, pp. 363-401, 1987.

[3] D. H. Hubel, T. N. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat's visual cortex,” J. Physiology, vol. 160, pp. 106–154, 1962.

[4] J. Hegdé, D. C. Van Essen, “Strategies of shape representation in macaque visual area V2,” Visual Neuroscience, vol. 20, pp. 313–328, 2003.

[5] A. Pasupathy, C. E. Connor, “Shape representation in area V4: position-specific tuning for boundary conformation,” J. Neurophysiology, vol. 86 No. 5, pp. 2505-2519, Nov. 2001.

[6] K. Tanaka, “Inferotemporal cortex and object vision,” Ann. Rev. Neuroscience, vol. 19, pp. 109–139, 1994.

[7] D. C. Van Essen, J. H. Maunsell, “Hierarchical organization and functional streams in the visual cortex,” Trends. Neuroscience, vol. 6, pp. 370–375, 1983.

[8] D. J. Felleman, D. C. Van Essen, “Distributed hierarchical processing in primate cerebral cortex,” Cerebral Cortex, vol. 1, pp. 1-47, 1991.

[9] H. B. Barlow, “Possible principles underlying the transformation of sensory messages,” Sensory Communication. ed. W. A. Rosenblith, pp. 217–234, 1961.

[10] F. Attneave, “Some informational aspects of visual perception,” Psychological Review, vol. 61, pp. 183–193, 1954.

[11] S. Laughlin, “A simple coding procedure enhances a neuron's information capacity,” Z. Naturforsch, vol. 36, pp. 910-912, 1981.

[12] J. Atick, “Could information theory provide an ecological theory of sensory processing?” Network: Computation. Neural Systems, vol. 3, pp. 213–251, 1992.

[13] D. Field, “What is the goal of sensory coding?” Neural Computation, vol. 6, pp. 559–601, 1994.

[14] B. A. Olshausen, D. J. Field, “Emergence of simple cell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, pp. 607–609, Jun.1996.

[15] A. J. Bell, T. J. Sejnowski, “The ‘independent components’ of natural scenes are edge filters,” Vision Research, vol. 37, No. 23, pp. 3327–3338, Dec. 1997.

[16] A. Hyvärinen, P. O. Hoyer, “Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces,” Neural Computation, vol. 12, No. 7, pp. 1705-1720, 2000.

[17] O. Schwartz, E. P. Simoncelli, “Natural signal statistics and sensory gain control,” Nature Neuroscience, vol. 4, pp. 819–825, 2001.

[18] Y. Karklin, M. S. Lewicki, “Learning higher-order structures in natural images,” Network: Computation. Neural Systems, vol. 14, pp. 483–499, 2003.

[19] H. Valpola, M. Harva, J. Karhunen, “Hierarchical models of variance sources,” Signal Processing, vol. 84, pp. 267–282, 2004.

[20] H. J. Park, T. W. Lee, “Unsupervised learning of nonlinear dependencies in natural images,” International J. Imaging Systems. Technology, vol. 15, No. 1, pp. 34–47, 2005.

[21] A. Hyvärinen, P. Hoyer, M. Inki, “Topographic independent component analysis,” Neural Computation, vol. 13, pp. 1527–1558, 2001.

[22] M. Wainwright, E. Simoncelli, A. Willsky, “Random cascades on wavelet trees and their use in modeling and analyzing natural imagery,” in Proc SPIE, 45th Annual Meeting, (San Diego), July 2000.

[23] J. Hegdé, D. C. Van Essen, “Selectivity for Complex Shapes in Primate Visual Area V2,” J. Neuroscience, vol. 20, pp. 1–6, 2000.

[24] G. M. Boynton, J. Hegdé, “Visual Cortex: The Continuing Puzzle of Area V2,” Current Biology, vol. 14, issue. 13, pp. 523–524, 2004.

[25] J. Hegdé, D. C. Van Essen, “A Comparative Study of Shape Representation in Macaque Visual Areas V2 and V4,” Cerebral Cortex, vol. 17, pp. 1100–1116, 2007.

[26] M. Ito, H. Komatsu, “Representation of Angles Embedded within Contour Stimuli in Area V2 of Macaque Monkeys,” J. Neuroscience, vol. 24, pp. 3313–3324, 2004.