Binary Image Registration Using Cellular Simultaneous Recurrent Networks
Keith Anderson*, Khan Iftekharuddin*, Eddie White*, and Paul Kim*
2009 IEEE Symposium on Computational Intelligence for Multimedia Signal and Vision Processing (CIMSVP), Nashville, TN, USA, March 30 - April 2, 2009


* Intelligent Systems and Image Processing Lab, University of Memphis, Memphis, TN 38018. Corresponding author: iftekhar@memphis.edu

This material is based upon work supported by the National Science Foundation under Grant Nos. ECCS-0738519 and ECCS-0715116. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Abstract

Cellular simultaneous recurrent networks (CSRNs) have been successfully exploited to solve the conventional maze traversing problem. In this work, for the first time, we investigate the use of CSRNs for image registration under affine transformations. In our simulations, we consider binary images with in-plane rotations between ±20°. First, we experiment with a readily available CSRN with generalized multilayer perceptrons (GMLPs) as the basic core. We identify performance criteria for such CSRNs in affine correction. We then propose a modified MLP architecture with multi-layered feedback as the core for a CSRN to improve binary image registration performance. Simulation results show that while both the GMLP network and our modified network are able to achieve localized image registration, our modified architecture is more effective in moving pixels for registration. Finally, we use sub-image processing with our modified MLP architecture to reduce training time and increase global registration accuracy. Overall, both CSRN architectures show promise for correctly registering a binary image.

I. INTRODUCTION

A. Artificial Neural Networks

The artificial neural network (ANN) has been researched extensively [1][7][8][9]. The ANN attempts to simulate the biological neural system. ANNs have been used successfully for pattern recognition, function approximation, and similar tasks. ANNs work by adjusting weights based on an error calculation. In general, the more weights a network has, the better it is able to learn the task at hand.

There are several varieties of ANNs designed for many different tasks. Some of the most common are based on the multi-layer perceptron (MLP). MLPs have been shown to be universal function approximators for adequately smooth functions [1]. When functions are not smooth enough, MLPs become increasingly complex. In such cases, one must either restrict the complexity of the network and settle for a higher error, or find another type of network to use.

B. Simultaneous Recurrent Networks

Simultaneous recurrent networks (SRNs) are a type of neural network that has been proven to be more powerful than MLPs [2][10]. Previous research has shown that SRNs can always learn functions that are generated by MLPs, but the opposite is not true. The recurrent behavior of SRNs is similar to the way the brain works: SRNs use the output of the current iteration as input for the next iteration. This can be seen in the basic topology of the SRN shown in Fig. 1.

Figure 1. The basic topology of an SRN.

In Fig. 1, f is the forward function of the network, W is the weight matrix, x is the network input, and z is the network output. The key feature of the SRN is that the network outputs are fed back as inputs to the network.
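A minimal sketch of this feedback loop may help. The snippet below (our own illustration, not the authors' code) repeatedly applies a small feedforward core f(W, x, z), feeding each output z back in alongside the external input x; the function name, the tanh core, and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np

def srn_forward(W, x, z0, n_iter=20):
    """Relax an SRN: repeatedly apply the feedforward core f(W, x, z),
    feeding the previous output z back in as an extra input (Fig. 1)."""
    z = z0
    for _ in range(n_iter):
        v = np.concatenate(([1.0], x, z))   # bias + external input + fed-back output
        z = np.tanh(W @ v)                  # one pass through the feedforward core
    return z                                # approximate fixed point z = f(W, x, z)

# Example with 2 external inputs and 3 recurrent/output nodes (shapes assumed)
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((3, 1 + 2 + 3))
z = srn_forward(W, x=np.array([0.5, -0.2]), z0=np.zeros(3))
```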
C. Cellular Neural Networks

Another type of ANN is the cellular neural network (CNN), which consists of identical elements arranged in some sort of geometry [2]. This configuration allows for a reduction in the required number of weights: because of the symmetry of the network, every element shares the same weights. This weight sharing can significantly decrease the number of weights needed, which, in turn, can significantly decrease the time needed to train the network. The symmetry of CNNs can also be useful in solving problems that contain a similar type of inherent geometry. Each element of such a network can be as simple as an artificial neuron or as complex as an MLP. A typical cellular architecture is shown in Fig. 2. The similarity between this architecture and a typical image is immediately evident and is discussed further in Section II-A.

Figure 2. A typical cellular architecture.

D. Cellular Simultaneous Recurrent Networks

By combining an SRN and a CNN one obtains the cellular simultaneous recurrent network (CSRN). The CSRN is more powerful than a regular MLP, as demonstrated by the fact that a CSRN can solve the maze traversal problem, which the MLP cannot [2]. The behavior of the CSRN mimics the cortex of the brain, which consists of columns similar to one another. In early work the CSRN was trained with backpropagation through time (BPTT); however, BPTT is very slow. In Ilin et al.'s work [4] the extended Kalman filter (EKF) is used to train the network via state estimation. The architecture of the CSRN is shown in Fig. 3.

Figure 3. A CSRN architecture.

In Fig. 3, the geometry of the input pattern is reflected in the geometry of the cellular structure of the CSRN, with one network cell (gray box) for each cell in the input pattern. As with CNNs, each cell can be a single artificial neuron or a more complex network, such as an MLP. The outputs of the cells can be brought together by an output transformation to produce an overall network output.

E. The 2D Maze Traversing Application

Until now, one of the primary applications of the CSRN has been the generalized 2D maze traversal problem, which is an extremely difficult problem [4]. Tests conducted by Pang et al. [2] were unable to solve this maze traversing problem with MLPs. The maze navigation problem can be represented as a square grid of locations, each of which is an obstacle, a pathway, or the goal. Figure 4 shows a visualization of the problem in which pathways are represented by blank cells, obstacles by black cells, and the goal by a red circle.

The CSRN attempts to learn the shortest distance to the goal from any cell. This is done by computing a cost function for each cell in the grid. Because of the difficulty of approximating this cost function, the results are not always completely correct. However, the results do generally reveal the shortest path to the goal by repeatedly moving to the neighboring cell with the lowest value. An example of a target maze and the cost function calculated using EKF training are shown in Figs. 5 and 6, respectively.

Figure 4. Sample maze. Blank cells represent an open pathway, black cells an obstacle, and the red cell the target.
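To make the training target concrete, the following sketch computes the exact cost function for a small maze by breadth-first search: every open cell receives its shortest-path distance to the goal, which is the value the CSRN is trained to approximate. This is our own illustration, not the authors' code; the 0/1 grid encoding and 4-neighbour moves are assumptions.

```python
import numpy as np
from collections import deque

def maze_cost_function(grid, goal):
    """Shortest-path distance from every open cell to the goal (BFS).
    grid: 2-D array, 0 = open pathway, 1 = obstacle (assumed encoding)."""
    rows, cols = grid.shape
    cost = np.full((rows, cols), np.inf)
    cost[goal] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-neighbour moves
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == 0 \
                    and cost[nr, nc] > cost[r, c] + 1:
                cost[nr, nc] = cost[r, c] + 1
                queue.append((nr, nc))
    return cost   # an agent reaches the goal by descending to the lowest-valued neighbour

grid = np.array([[0, 0, 0],
                 [1, 1, 0],
                 [0, 0, 0]])
print(maze_cost_function(grid, goal=(2, 0)))
```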
Figure 5. The cost function array for the solution of the maze shown in Fig. 4.

Figure 6. Output of the CSRN for the maze shown in Fig. 4, i.e., the CSRN's approximation of the cost function for the solution of the maze.

The detailed structure of a CSRN cell is shown in Fig. 7. It contains 17 nodes in all: 1 bias node, two external input nodes (obstacle and goal), 4 recurrent inputs from the 4 neighboring cells, 5 self-recurrent input nodes that are fed back from the output nodes, and 5 output nodes. The obstacle input contains a one if the cell contains an obstacle and a zero if it does not. The goal node works similarly. The 4 neighbor nodes take advantage of the symmetry of the problem and tie the cells together in the desired cellular structure. The 5 self-recurrent nodes carry the information from the output nodes of the previous iteration. The 5 output nodes are fully connected within their layer; due to this connectivity, the calculated value for the cell is taken from the last output node, since it contains information from the other 4 output nodes.

Figure 7. The network structure used in the maze application.
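The per-cell computation just described can be sketched as follows. This is a hedged reconstruction of one cell's forward pass, not the authors' implementation: the weight shapes, the tanh nonlinearity, and the way the five output nodes feed one another are assumptions consistent with the description above.

```python
import numpy as np

def csrn_cell_forward(W_in, W_lat, obstacle, goal, neighbors, prev_outputs):
    """One maze-CSRN cell (Fig. 7), 17 nodes in all.
    Inputs : 1 bias, 2 external (obstacle, goal), 4 neighbour feedbacks,
             5 self-recurrent values from the previous iteration (12 in total).
    Outputs: 5 nodes; each later output also sees the earlier ones (our reading
             of "fully connected within their layer"), so the cell value is
             read from the last output node."""
    x = np.concatenate(([1.0, obstacle, goal], neighbors, prev_outputs))   # 12 inputs
    y = np.zeros(5)
    for i in range(5):
        y[i] = np.tanh(W_in[i] @ x + W_lat[i, :i] @ y[:i])   # assumed squashing nonlinearity
    return y, y[-1]   # all 5 outputs (fed back next iteration), and the cell's value

# Weights are shared by every cell of the grid (cellular weight sharing); shapes assumed.
rng = np.random.default_rng(1)
W_in, W_lat = 0.1 * rng.standard_normal((5, 12)), 0.1 * rng.standard_normal((5, 5))
outs, value = csrn_cell_forward(W_in, W_lat, obstacle=0, goal=1,
                                neighbors=np.zeros(4), prev_outputs=np.zeros(5))
```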
In Ilin et al.'s work [4] the CSRN is trained using the EKF, and the maze traversing results are compared to those of the network trained using BPTT. For training on a single maze, the authors report that the CSRN trained with the EKF converged within 10-15 epochs, while BPTT required between 500 and 1000 epochs. That paper also introduces the goodness of navigation measurement, which tests whether the correct direction is taken if an agent moves from a cell to the neighboring cell with the lowest value. The results show that the goodness of navigation for the testing mazes is much lower than the same measurement for the training cases, implying that the CSRN trained with the EKF does not generalize well. However, with additional training mazes (25 to 30), the goodness of navigation for the test cases approaches that of the training cases. Figure 8 shows the convergence of the sum squared error (SSE) over all test mazes for a typical run.

Figure 8. Convergence of the SSE of the CSRN for the maze problem.

II. IMAGE REGISTRATION USING CSRN

A. Background

Several aspects of the CSRN lead us to investigate its application to image processing. First is the CSRN's ability to approximate complex non-linear functions [2]. Second is its ability to approximate functions in a constrained geometric setting, as demonstrated by its ability to solve the 2D maze problem [2]. Third is the similarity of the architectures of both CNNs and CSRNs (see Figs. 2 and 3), as well as of the maze images (see Fig. 4), to typical binary images. Finally, Ilin et al.'s use of the EKF for training [4] reduces training times sufficiently to make the application of CSRNs to image processing tasks computationally more attractive.

B. Processing Binary Images Similar to the Maze Application

Because of the similarity between a standard binary image and the maze images used in Pang et al.'s maze application [2], our first attempt at using the CSRN in an image processing application is simply to replace the maze image with a binary image of the same size. This requires a slight modification to the encoding of the cost function images used in training the network.

Viewed from an image processing perspective, the maze problem is similar to finding a path from one location to another in an image. This is very similar to edge or contour detection, and so our initial expectation is to see some sort of edge detection. However, the CSRN in our experiments shows no indication of an ability to detect edges. Because of our interest in facial recognition, we select a binary eye patch extracted from the Lena image as our test image. In some sample runs we notice slight registration of the image. This observation suggests that CSRNs merit further investigation for image registration.

C. Reformulating the Maze Application for Image Registration

Our problem becomes how to change the goal of the CSRN in the maze application to address image registration. Instead of training with images that represent the cost function for the solution of the maze, we need to train the network with images representing the cost functions for affine transformations. In our case, the affine transform is in-plane rotation, since we are interested in pose-invariant face recognition. In essence, we need the output of each cell in our network to give the new position of the pixel represented by that cell. That is, given the cell's (pixel's) location as inputs to the cell's network, we want the output of the cell to be the new, registered, x and y location of the pixel.

Because rotation in an image is separable in the x and y directions, we may accomplish this in two ways: 1) increase the number of outputs of our CSRN so that we have two outputs, one each for x and y registration, respectively; or 2) keep the same network architecture and apply it twice to the image, once for x and then again for y registration. We choose the latter option to avoid the complications of recoding.

The learning cost function for the maze application was reported by Pang et al. [2] and was used to generate training mazes for the CSRN. In our case, the cost function is given by the standard affine transformation for rotation. Using this transformation, we generate training images by rotating a test image by various angles; here we use -20°, -10°, 0°, 10°, and 20°. For each of the rotated training images we construct two translation matrices, which encode the registration cost functions for the x and y directions, respectively. The translation matrices can be encoded with either a position cost function or a movement cost function. A position cost function contains the actual registered position of each pixel, while a movement cost function contains how far the pixel needs to be moved in a particular direction. In this work, we test both methods. With small rotations many pixels move very little, and some not at all; while such movement cost functions cause the network to do less work, they are too sparse and result in many zero-valued weights, and therefore in less effective registration. Consequently, in this work we use the position cost function.

With this structure in place, one network is trained to perform x registration, and another to perform y registration.
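One way to construct such position cost functions is sketched below: for a given rotation angle, the registered (de-rotated) x and y position of every pixel is obtained from the standard affine rotation about the image centre and stored in two target matrices, one per network. This is an illustrative sketch, not the authors' code; the function name, the image size, and the centre-of-image convention are assumptions.

```python
import numpy as np

def position_cost_functions(shape, angle_deg):
    """For an image rotated by angle_deg, return two matrices holding the
    registered (de-rotated) x and y position of every pixel: the position
    cost functions used as training targets for the x- and y-networks."""
    rows, cols = shape
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    theta = np.deg2rad(-angle_deg)               # inverse rotation registers the pixel
    yy, xx = np.mgrid[0:rows, 0:cols]
    x0, y0 = xx - cx, yy - cy                    # coordinates relative to the image centre
    x_reg = np.cos(theta) * x0 - np.sin(theta) * y0 + cx
    y_reg = np.sin(theta) * x0 + np.cos(theta) * y0 + cy
    return x_reg, y_reg

# Training angles used in the paper: -20, -10, 0, 10, 20 degrees (image size assumed)
targets = {a: position_cost_functions((15, 15), a) for a in (-20, -10, 0, 10, 20)}
```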
To test the network we use the same test image rotated to an angle between ±20° that is not included in the training set. The resulting cell outputs give us the information needed to register each pixel and thus the entire image. Figure 9 shows a typical run.

Figure 9. Typical registration result for the CSRN using the GMLP architecture. Test image rotated by 12°.

Figure 10. Plot of the sum-squared error for the case shown in Fig. 9.

Figure 9 shows the unrotated test image, the rotated image, and the registered image, while Fig. 10 shows the convergence of the CSRN's SSE. The CSRN shows fair error convergence and, therefore, signs of learning. Some registration has also occurred in Fig. 9: in this example, the upper half of the test image is registered correctly, while the lower half is not. This sort of local registration appears in many sample runs, with some areas of the image being registered better than others. The best approximation to the registration cost function yields around 46% accuracy, with a final image registration accuracy of around 96%, up from a baseline accuracy of 82.2% for the rotated (12°) image.

D. A modified CSRN architecture ...