Page 1: [IEEE 2009 IEEE Symposium on Computational Intelligence for Multimedia Signal and Vision Processing (CIMSVP) - Nashville, TN, USA (2009.03.30-2009.04.2)] 2009 IEEE Symposium on Computational

*Corresponding author: [email protected]

* Intelligent Systems and Image Processing Lab, University of Memphis, Memphis, TN 38018.

Abstract—

Cellular simultaneous recurrent networks (CSRNs) have been successfully exploited to solve the conventional maze traversing problem. In this work, for the first time, we investigate the use of CSRNs for image registration under affine transformations. In our simulations, we consider binary images with in-plane rotations between ±20°. First, we experiment with a readily available CSRN with generalized multilayer perceptrons (GMLPs) as the basic core. We identify performance criteria for such CSRNs in affine correction. We then propose a modified MLP architecture with multi-layered feedback as the core for a CSRN to improve binary image registration performance. Simulation results show that while both the GMLP network and our modified network are able to achieve localized image registration, our modified architecture is more effective in moving pixels for registration. Finally, we use sub-image processing with our modified MLP architecture to reduce training time and increase global registration accuracy. Overall, both CSRN architectures show promise for correctly registering a binary image.

I. INTRODUCTION

A. Artificial Neural Networks

The artificial neural network (ANN) has been researched extensively [1][7][8][9]. The ANN attempts to simulate the biological neural system. ANNs have been applied successfully to pattern recognition, function approximation, and similar tasks. ANNs work by adjusting weights based on an error calculation. In general, the more weights a network has, the better it is able to learn the task at hand.

There are several varieties of ANNs which have been designed for many different tasks. Some of the most common ANNs are based on the multi-layer perceptron (MLP). MLPs have been shown to be universal function approximators for adequately smooth functions [1]. When functions are not smooth enough, the MLPs become increasingly complex. For more complex networks, one must either restrict the complexity of the network and settle for a higher error, or find another network to use.

This material is based upon work supported by the National Science Foundation under Grant Nos. ECCS-0738519 and ECCS-0715116. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

B. Simultaneous Recurrent Networks

Simultaneous recurrent networks (SRNs) are a type of neural network that has been proven to be more powerful than MLPs [2][10]. Previous research has shown that SRNs can always learn functions that are generated by MLPs, but the opposite is not true. The recurrent behavior of SRNs is similar to the way the brain works. SRNs use the output of the current iteration as input for the next iteration. This can be seen in the basic topology of the SRN shown in Fig. 1.

Figure 1. The basic topology of the SRN.

In Fig. 1, f is the forward function of the NN, W is the weight matrix, x is the network input, and z is the network output. The key feature of the SRN is the fact that the network outputs are fed back as inputs to the network.
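The feedback loop of Fig. 1 can be illustrated with a minimal sketch. The function names (`srn_forward`, `tanh_layer`), the single tanh layer, the weight values, and the fixed iteration count are all assumptions made for illustration; they are not taken from the networks used in this work.

```python
import math

def srn_forward(f, W, x, z0, n_iter=10):
    """Iterate z <- f(W, x, z) a fixed number of times and return the final z."""
    z = list(z0)
    for _ in range(n_iter):
        z = f(W, x, z)  # the previous output is fed back as an input
    return z

def tanh_layer(W, x, z):
    """Toy forward function: one tanh layer over the concatenated [x, z]."""
    v = list(x) + list(z)
    return [math.tanh(sum(w * vi for w, vi in zip(row, v))) for row in W]

# Hypothetical weights: 2 outputs, each seeing 1 external input and 2 fed-back outputs.
W = [[0.5, 0.1, -0.2],
     [-0.3, 0.4, 0.1]]
z_final = srn_forward(tanh_layer, W, x=[1.0], z0=[0.0, 0.0], n_iter=20)
```

With small feedback weights such as these, repeated iteration settles toward a fixed point z = f(W, x, z); larger weights need not converge.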

C. Cellular Neural Networks

Another type of ANN is the cellular neural network (CNN), which consists of identical elements arranged in some sort of geometry [2]. This configuration allows for a reduction in the required number of weights. Each element is able to share the same weights due to the symmetry of the network. This weight sharing can significantly decrease the number of weights needed, which, in turn, can significantly decrease the time needed to train the network. The symmetry of CNNs can also be useful in solving problems that contain a similar type of inherent geometry. Each element of such a network can be as simple as an artificial neuron or as complex as an MLP. A typical cellular architecture is shown in Fig. 2. The similarity between this architecture and a typical image is immediately evident and is discussed further in Section II-A.

Figure 2. A typical cellular architecture

Binary Image Registration using Cellular Simultaneous Recurrent Networks

Keith Anderson*, Khan Iftekharuddin*, Eddie White*, and Paul Kim*

[Figure 1 diagram: the input x and the fed-back output z enter the feedforward network f(W, x, z), which produces the output z.]

978-1-4244-2771-0/09/$25.00 ©2009 IEEE


D. Cellular Simultaneous Recurrent Networks

By combining an SRN and a CNN, one obtains the cellular simultaneous recurrent network (CSRN). The CSRN is more powerful than a regular MLP, as demonstrated by the fact that a CSRN can solve the maze traversal problem, which the MLP cannot [2]. The behavior of the CSRN mimics the cortex of the brain, which consists of columns similar to each other. In early trials, the CSRN was trained with backpropagation through time (BPTT). However, BPTT is very slow. In Ilin et al.’s work [4], the extended Kalman filter (EKF) is used to train the network via state estimation. The architecture of the CSRN is shown in Fig. 3.

Figure 3. A CSRN architecture.

In Fig. 3, the geometry of the input pattern is reflected in the geometry of the cellular structure of the CSRN, with one network cell (gray box) for each cell in the input pattern. As with CNNs, each cell can be a single artificial neuron or a more complex network, such as an MLP. The outputs of each cell can be brought together to produce an overall network output.
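The combination of weight sharing and recurrence can be sketched as follows. This is a toy illustration only: each cell is reduced to a single tanh neuron with four hypothetical shared weights (`bias`, `w_in`, `w_nb`, `w_self`), which is far simpler than the GMLP or MLP cells discussed in this paper.

```python
import math

def csrn_forward(grid, w, n_iter=5):
    """Run a toy CSRN on a 2D grid; every cell applies the same weights w.

    w = (bias, w_in, w_nb, w_self) is a hypothetical 4-weight cell shared by
    all cells, so the weight count does not grow with the grid size.
    """
    rows, cols = len(grid), len(grid[0])
    bias, w_in, w_nb, w_self = w
    z = [[0.0] * cols for _ in range(rows)]
    for _ in range(n_iter):
        z_next = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Sum the previous outputs of the 4-connected neighbors;
                # neighbors that fall off the grid contribute nothing.
                nb = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        nb += z[rr][cc]
                z_next[r][c] = math.tanh(
                    bias + w_in * grid[r][c] + w_nb * nb + w_self * z[r][c]
                )
        z = z_next  # simultaneous update: all cells use the previous iteration
    return z
```

Because every cell shares the same four weights, a 15x15 grid needs no more weights than a 2x2 grid in this sketch.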

E. The 2D Maze Traversing Application

Until now, one of the primary applications of the CSRN has been to solve the generalized 2D maze traversal problem, which is an extremely difficult problem [4]. Tests conducted by Pang et al. [2] were unable to solve this maze traversing problem with MLPs. The maze navigation problem can be represented as a square grid of locations, each of which is an obstacle, a pathway, or the goal. Figure 4 shows a visualization of the problem in which the pathways are represented by blank cells, obstacles are black cells, and the goal is a red circle.

The CSRN attempts to learn the shortest distance to the goal from any cell. This is done by computing the cost function for each cell in the grid. Because of the difficulty of calculating the cost function, the results are not always completely correct. However, the results do generally show the shortest distance to the goal by moving to the neighboring cell with the lowest value. An example of a target maze and calculated cost function using EKF training are shown in Figs. 5 and 6, respectively.
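For reference, a target cost function of this kind can be computed exactly with a breadth-first search. The sketch below assumes unit step cost and 4-connected moves, which may differ from the exact cost function of [2]; the function names and the 3x3 example maze are hypothetical.

```python
from collections import deque

STEPS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connected moves

def maze_cost(obstacles, goal):
    """Shortest step count from every free cell to the goal.

    obstacles: 2D list of 0/1 flags; goal: (row, col).
    Obstacle and unreachable cells are left as None.
    """
    rows, cols = len(obstacles), len(obstacles[0])
    cost = [[None] * cols for _ in range(rows)]
    cost[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS:
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols
                    and not obstacles[rr][cc] and cost[rr][cc] is None):
                cost[rr][cc] = cost[r][c] + 1
                queue.append((rr, cc))
    return cost

def next_step(cost, cell):
    """Return the neighboring cell with the lowest cost value, or None."""
    r, c = cell
    best = None
    for dr, dc in STEPS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(cost) and 0 <= cc < len(cost[0]) and cost[rr][cc] is not None:
            if best is None or cost[rr][cc] < cost[best[0]][best[1]]:
                best = (rr, cc)
    return best

maze = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
cost = maze_cost(maze, goal=(2, 0))
```

Following `next_step` repeatedly from any free cell traces a shortest path to the goal, which is the behavior the CSRN output is meant to approximate.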

Figure 4. Sample maze. Blank cells represent open pathways, black cells represent obstacles, and the red cell represents the goal.

Figure 5. The cost function array for the solution of the maze shown in Fig. 4.

Figure 6. Output of the CSRN for the maze shown in Fig. 4, i.e., the CSRN’s approximation of the cost function for the solution of the maze.

[Figure 3 diagram labels: Input Pattern, Cell of CSRN, Output Transformation, Network Output.]

The detailed structure of a CSRN is shown in Fig. 7. Figure 7 contains 17 nodes in all: 1 bias node, 2 external input nodes (obstacle and goal), 4 recurrent inputs from the 4 neighboring cells, 5 self-recurrent input nodes that are fed back from the output nodes, and 5 output nodes. The obstacle input contains a one if that cell contains an obstacle and a zero if it does not. The goal node works similarly. The 4 neighbor nodes take advantage of the symmetry of the problem and tie the cells together in the desired cellular structure. The 5 recurrent nodes contain the information from the output nodes of the previous iteration. The 5 output nodes are fully connected within their layer. Due to this connectivity, the calculated value for the cell is taken from the last output node, since it contains information from the other 4 nodes.

Figure 7. The network structure used in the maze application.

In Ilin et al.’s work [4], the CSRN is trained using the EKF and the maze traversing results are compared to those of the network trained using BPTT. For training a single maze, the authors report that the CSRN trained with EKF converged within 10-15 epochs, while BPTT required between 500 and 1000 epochs. That paper also introduces the goodness of navigation measurement. This measurement is basically a test to determine whether the correct direction will be taken if an agent moves from one cell to the neighboring cell with the lowest value. The results show that the goodness of navigation for the testing maze is much lower than the same measurement for the training cases. This implies that the CSRN trained with EKF is not generalizing well. However, it is observed that with additional training mazes (25 to 30), the goodness of navigation for the test cases approaches that of the training cases. Figure 8 shows the convergence of the sum squared error (SSE) over all test mazes for a typical run.

Figure 8. Convergence of the SSE of the CSRN for the maze problem.
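One possible reading of the goodness of navigation measurement is sketched below. The exact definition in [4] may differ; here it is assumed to be the fraction of free, non-goal cells at which the lowest-valued neighbor under the approximated cost function is also a lowest-valued neighbor under the true cost function. The function names are hypothetical, and obstacle cells are assumed to be marked with `None` in both arrays.

```python
def lowest_neighbors(values, r, c):
    """Set of 4-connected neighbors of (r, c) sharing the lowest value."""
    rows, cols = len(values), len(values[0])
    cand = [(values[rr][cc], (rr, cc))
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if 0 <= rr < rows and 0 <= cc < cols and values[rr][cc] is not None]
    if not cand:
        return set()
    low = min(v for v, _ in cand)
    return {pos for v, pos in cand if v == low}

def goodness_of_navigation(true_cost, approx_cost):
    """Fraction of free, non-goal cells where the approximate cost function
    points to a neighbor that is also optimal under the true cost function."""
    correct = total = 0
    for r, row in enumerate(true_cost):
        for c, v in enumerate(row):
            if v is None or v == 0:  # skip obstacles and the goal itself
                continue
            best_true = lowest_neighbors(true_cost, r, c)
            if not best_true:
                continue
            total += 1
            best_approx = lowest_neighbors(approx_cost, r, c)
            if best_approx and best_approx <= best_true:
                correct += 1
    return correct / total if total else 0.0
```

Under this reading, an approximation can have a large SSE and still navigate well, as long as the ordering of neighboring cell values is preserved.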

II. IMAGE REGISTRATION USING CSRN

A. Background

Several aspects of the CSRN lead us to investigate its application to image processing. First is the CSRN’s ability to approximate complex non-linear functions [2]. Second is its ability to approximate functions in a constrained geometric application, as demonstrated by its ability to solve the 2D maze problem [2]. Third is the similarity of the CNN and CSRN architectures (see Figs. 2 and 3), as well as of the maze “images” (see Fig. 4), to typical binary images. Finally, Ilin et al.’s use of the EKF for training [4] reduces training times sufficiently to make the application of CSRNs to image processing tasks computationally more attractive.

B. Processing Binary Images Similar to the Maze Application

Because of the similarity between a standard binary image and the maze “images” used in Pang et al.’s maze application [2], our first attempt at using the CSRN in an image processing application is to simply replace the maze image with a binary image of the same size. This requires a slight modification to the encoding of the cost function images used in training the network.

Viewed from an image processing perspective, the maze problem is similar to finding a path from one location to another in an image. This is very similar to edge or contour detection, and so our initial expectation is to see some sort of edge detection. However, the CSRN in our experiments shows no indication of having the ability to detect edges. Because of our interest in facial recognition we select a binary eye patch extracted from the Lena image as our test image. In some sample runs, we notice some slight registration of the image. This observation suggests that CSRNs can be investigated further for image registration.

C. Reformulating the Maze Application for Image Registration

Our problem becomes how to change the goal of the CSRN in the maze application to address image registration. Instead of training with images that represent the cost function for the solution of the maze, we need to train the network with images representing the cost functions for affine transformations. In our case, the affine transform is in-plane rotation, since we are interested in pose-invariant face recognition. In essence, we need the output of each cell in our network to give the new position of the pixel represented by that cell. That is, given the cell’s (pixel’s) location as input to the cell’s network, we want the output of the cell to be the new, registered x & y location of the pixel.

Because rotation in an image is separable in the x & y directions, we may accomplish this in two ways: 1) increase the number of outputs of our CSRN so that we have two outputs, one each for x & y registration, respectively; or 2) simply keep the same network architecture and apply it twice to the image, once for x and then again for y registration. We choose the latter option to avoid the complications of recoding.

The learning cost function for the maze application was reported by Pang et al. [2]. This cost function was used to generate training “mazes” for the CSRN. In our case, the cost function is given by the standard affine transformation for rotation. Using this transformation, we generate training images by rotating a test image by various angles; here we use -20°, -10°, 0°, 10°, and 20°. For each of the rotated training images we construct two translation matrices, which encode the registration cost functions for the x & y directions, respectively. The translation matrices can be encoded with a position cost function or a movement cost function. A position cost function contains the actual registered position of the pixel, while a movement cost function contains how much the pixel needs to be moved in a particular direction. In this work, we test both methods. With small rotations many pixels move very little, and some not at all. While such movement cost functions cause the network to do “less” work, they are too sparse and result in many “zero” valued weights, resulting in less effective registration. Therefore, in this work, we use the position cost function.
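The two encodings can be illustrated with a short sketch that builds both kinds of matrices for a given rotation. Several details here are assumptions not stated in the text: rotation about the image center, rounding to the nearest pixel, clamping to the image frame, and the sign convention of the angle. The function name `rotation_cost_matrices` is hypothetical.

```python
import math

def rotation_cost_matrices(size, angle_deg):
    """Position and movement cost matrices for undoing an in-plane rotation.

    Returns (pos_x, pos_y, mov_x, mov_y), each a size x size list indexed
    as [y][x]. pos_* holds the registered coordinate of each pixel and
    mov_* holds the displacement needed to reach it.
    """
    theta = math.radians(-angle_deg)  # registration applies the inverse rotation
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    c0 = (size - 1) / 2.0             # assumed center of rotation
    pos_x = [[0] * size for _ in range(size)]
    pos_y = [[0] * size for _ in range(size)]
    mov_x = [[0] * size for _ in range(size)]
    mov_y = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            dx, dy = x - c0, y - c0
            xr = cos_t * dx - sin_t * dy + c0
            yr = sin_t * dx + cos_t * dy + c0
            # Round to the pixel grid and clamp to the frame (an assumption).
            px = min(max(int(round(xr)), 0), size - 1)
            py = min(max(int(round(yr)), 0), size - 1)
            pos_x[y][x], pos_y[y][x] = px, py
            mov_x[y][x], mov_y[y][x] = px - x, py - y
    return pos_x, pos_y, mov_x, mov_y

pos_x, pos_y, mov_x, mov_y = rotation_cost_matrices(15, 10)
zero_moves = sum(v == 0 for row in mov_x for v in row)  # sparsity of the movement encoding
```

Counting the zero entries of `mov_x` for a small angle shows the sparsity of the movement encoding, whereas `pos_x` is non-zero everywhere except the first column.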

With this structure in place, one network is trained to perform x registration, and another to perform y registration. To test the network we use the same test image rotated to a given angle between ±20° that is not included in the training set. The resulting cell outputs give us the information we need to register each pixel and thus the entire image. Figure 9 shows a typical run.

Figure 9: Typical registration result for the CSRN using the GMLP architecture. Test image rotated by 12°.

[Plot: SSE over all test images versus epochs (0 to 100); SSE axis from 1450 to 1600.]

Figure 10: Plot of the Sum-Squared Error for the case shown in Fig. 9.

Figure 9 shows the unrotated test image, the rotated image, and the registered image. Figure 10 shows the convergence of the CSRN’s SSE. The CSRN shows fair error convergence and, therefore, signs of learning. Some registration has also occurred in Fig. 9. In this example, the upper half of the test image is registered correctly, while the lower half is not. This sort of local registration appears in many sample runs, with some areas of the image being registered better than others. The best approximation to the registration cost function yields around 46% accuracy, with a final image registration accuracy of around 96%, from a baseline accuracy of 82.2% for the rotated image (12°).

D. A Modified CSRN Architecture for Image Registration

As previously discussed, Pang et al. [2] use a GMLP network for each cell of the CSRN in solving the 2D maze traversal problem. That application only uses two external inputs. For our image registration application, we modify the existing CSRN cell of Fig. 7 to obtain the cell shown in Fig. 11.


Figure 11. Modified MLP architecture with multi-layered feedback used for each cell in the CSRN network. Nodes are fully connected between layers. Network weights are shown in blue and labeled with green text. Feedback paths for recurrent networks are shown in red.

Our modified architecture utilizes a more standard MLP network, with separate input, hidden, and output layers. We also utilize 3 external inputs (pixel x location, pixel y location, and pixel value, i.e., intensity) to obtain better control of each of the three free parameters in a typical intensity image. All weights between nodes in the same layer have been zeroed out. Using an MLP with separate layers allows us to feed back the outputs from the hidden nodes as well as that of the output node. This is done to create an “inner” feedback loop to improve network stability, in much the same way an inner velocity loop is used to stabilize an outer position loop in control theory [6].
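The idea of feeding back both the hidden-node outputs and the output node can be sketched as below. This is not the network of Fig. 11: the layer sizes, the tanh hidden units, the linear output node, and the omission of the neighbor inputs are all simplifying assumptions, and the names `cell_forward` and `run_cell` are hypothetical.

```python
import math

def cell_forward(ext, hidden_prev, out_prev, W_h, W_o):
    """One pass of a hypothetical MLP cell with multi-layered feedback.

    ext:         3 external inputs (pixel x, pixel y, pixel value)
    hidden_prev: hidden-node outputs from the previous iteration (inner loop)
    out_prev:    output-node value from the previous iteration (outer loop)
    W_h:         one weight row per hidden node, over [bias, ext, hidden_prev, out_prev]
    W_o:         output weights over [bias, hidden]
    """
    inp = [1.0] + list(ext) + list(hidden_prev) + [out_prev]
    hidden = [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in W_h]
    out = sum(w * h for w, h in zip(W_o, [1.0] + hidden))  # linear output node
    return hidden, out

def run_cell(ext, W_h, W_o, n_iter=10):
    """Iterate the cell from a zero state, feeding both layers back each time."""
    hidden, out = [0.0] * len(W_h), 0.0
    for _ in range(n_iter):
        hidden, out = cell_forward(ext, hidden, out, W_h, W_o)
    return hidden, out
```

With 2 hidden nodes, each row of `W_h` has 1 + 3 + 2 + 1 = 7 weights and `W_o` has 3. There are no weights between nodes of the same layer, matching the zeroed intra-layer weights described above.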

In choosing to modify the basic network architecture, we had to commit ourselves to a complete rewrite of the forward and backward propagation modules used in implementing the NN. The backpropagation module [3], which computes the Jacobian used by the EKF training method, is particularly time consuming.

Figure 12. Position control loop with inner velocity loop for loop stabilization.

Figures 13 and 14 show the results of applying the new network architecture to the same test image as before. Figure 14 shows good convergence. The convergence of the modified network is consistently smoother than that of the GMLP network in Fig. 10. Figure 13 shows the registration results. While both architectures yield about the same amount of image registration, the modified architecture appears to have more impact on pixel position, with a mean pixel movement of 2.00 pixels vs. 1.68 pixels for the GMLP architecture. This indicates an 18.5% increase in pixel movement. The modified architecture displays the same ability for local registration. In this case, the upper and lower portions of the vertical bar are registered properly, but the center portions are not.

Figure 13. Registration results using the modified MLP architecture with multi-layered feedback. Test image rotated by 12°.

[Plot: SSE over all test images versus epochs (0 to 100); SSE axis from 1500 to 5500.]

Figure 14. Plot of the Sum-Squared Error for the case shown in Fig. 13.

E. Global Image Registration

Now we ask whether we can exploit the CSRN’s ability to perform local registration to register the entire image. To achieve this, we divide our 15x15 image into smaller 5x5 sub-images. Next we train a separate CSRN for each of the sub-images. Once training is complete, an image is registered by dividing it into sub-images, each of which is processed by its corresponding CSRN. The outputs of the CSRNs, i.e., the registered sub-images, are then recombined into the final registered image.
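The divide, process, and recombine steps can be sketched as follows. The per-tile registrar is passed in as a plain function, standing in for a trained CSRN; the function names and the dictionary keyed by tile position are assumptions for illustration.

```python
def split_blocks(img, block):
    """Split a square 2D list into block x block tiles keyed by (tile_row, tile_col)."""
    n = len(img)
    assert n % block == 0, "image size must be a multiple of the block size"
    return {
        (br, bc): [row[bc * block:(bc + 1) * block]
                   for row in img[br * block:(br + 1) * block]]
        for br in range(n // block)
        for bc in range(n // block)
    }

def merge_blocks(tiles, block, n):
    """Reassemble tiles produced by split_blocks into an n x n image."""
    out = [[0] * n for _ in range(n)]
    for (br, bc), tile in tiles.items():
        for r, row in enumerate(tile):
            out[br * block + r][bc * block:(bc + 1) * block] = row
    return out

def register_by_tiles(img, block, registrars):
    """Apply a separate registrar (e.g., one trained network) to each tile."""
    tiles = split_blocks(img, block)
    done = {key: registrars[key](tile) for key, tile in tiles.items()}
    return merge_blocks(done, block, len(img))
```

For a 15x15 image with 5x5 tiles this yields 9 tiles, so 9 registrars are needed, one per tile position.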

Training our modified CSRN for a 5x5 image using 5 training images takes only 16 secs vs. 400 secs for a 15x15 image using the same number of training images. This makes training with additional images practical. We increase the number of training images from 5 to 11, training with images rotated from 0° to 20° in steps of 2°. This allows processing of the 15x15 image in just 150 secs.

Figure 15 shows the results of registering the test image using our modified CSRN architecture with sub-image processing.

Figure 15. Test image (rotated to 16°) registered using our modified CSRN architecture with sub-image processing.

In this example, we see a marked improvement in global registration, as indicated by the cost function accuracy of 64% and image accuracy of 98.2%, from a baseline accuracy of 68% for the 16° rotated image. Some local registration errors remain. These errors are primarily located in the sub-image boundary pixels.

III. RESULTS

A. Performance Metrics

To evaluate how well the CSRN performs image registration on a rotated image, we define a few metrics for measuring registration accuracy. As mentioned earlier, the output of each CSRN cell is the registered position of the corresponding pixel. To perform registration on the rotated image, we simply move each pixel in the rotated image to its new position, as specified by the cell’s output. We settled on four metrics: the SSE of the network output, the cost function percent accuracy, the image percent accuracy, and the mean pixel movement.

1.) The SSE of the network output is simply the summed squared error between the target registration cost function and the actual network output. This value is plotted as a function of epochs in order to evaluate how well the network converges (learns). The magnitude of the SSE varies with image size, so quantitative comparisons between images can only be made for images of the same size.

2.) The cost function percent accuracy, Jacc, compares the output of each cell to the known cost function value for that cell. If the values are equal, we consider that cell a match. The total number of matches normalized by the total number of cells gives the cost function percent accuracy.

3.) The image percent accuracy, Imacc, compares the final registered image with the known unrotated image on a pixel-by-pixel basis. If the corresponding pixels have equal values, we consider that pixel a match. The total number of matches normalized by the total number of pixels gives the image percent accuracy.

4.) The mean pixel movement, mpm, gives an indication of how much effect the network is having on the image, in particular, how much the network is moving each pixel. To compute this value, we sum the Euclidean distance between each pixel’s original location and the registered position computed by the CSRN, and take the average over the total number of pixels in the image. To compare the GMLP and modified MLP architectures, we perform identical registration tasks using both architectures for a large number of trials, and compute the overall mean pixel movement for each architecture. This metric does not indicate how well the network registers the image. It is only used as a tool to provide insight into how much effect the network is having on pixel location.
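The four metrics above can be sketched in a few lines. The function names are hypothetical, arrays are plain nested lists indexed as [y][x], and any rounding of the network output before the equality test in Jacc is assumed to be done by the caller. Jacc and Imacc share one routine because both are match percentages.

```python
import math

def sse(target, output):
    """Summed squared error between target cost function and network output."""
    return sum((t - o) ** 2
               for t_row, o_row in zip(target, output)
               for t, o in zip(t_row, o_row))

def percent_match(a, b):
    """Percentage of positions where two equally sized arrays hold equal values.

    Used for Jacc (cell outputs vs. known cost function) and for Imacc
    (registered image vs. known unrotated image).
    """
    total = sum(len(row) for row in a)
    matches = sum(1 for ra, rb in zip(a, b) for va, vb in zip(ra, rb) if va == vb)
    return 100.0 * matches / total

def mean_pixel_movement(pos_x, pos_y):
    """Mean Euclidean distance between each pixel's original location (x, y)
    and its registered position (pos_x[y][x], pos_y[y][x])."""
    dists = [math.hypot(pos_x[y][x] - x, pos_y[y][x] - y)
             for y in range(len(pos_x))
             for x in range(len(pos_x[0]))]
    return sum(dists) / len(dists)
```

Because background pixels that stay in place still count as matches, the image percent accuracy can be high even when the cost function percent accuracy is modest.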

B. Training/Testing Times

For tests where the CSRN is applied to the entire 15x15 image, we train with 5 images and test with 1. Training/testing is done over 100 epochs. These tests consistently run in about 400 secs. For tests on 15x15 images using sub-image processing, 11 training images, 1 test image, and 100 epochs are used. These tests run in approximately 150 secs, which is a 62.5% decrease in processing time.

C. Registration Results

As we notice in Fig. 14, the CSRN’s SSE converges smoothly during training, indicating that the network is learning.

When applying a single CSRN to an image, our best efforts at approximating the correct registration cost function fall in the range of 42%-46%, with an accuracy of 92%-96% for moving image pixels to correct locations in the registered image. Note that the discrepancy between the accuracy of the learned cost function and the accuracy of the registered image is due to the fact that a good number of pixels in an image do not move much during the registration process. This is even more pronounced when registering two images that differ by a small rotation.

When using our modified CSRN architecture with sub-image processing we are able to achieve a cost function accuracy around 64% and an image accuracy of 98%.

We also run tests with a synthetic eye patch, a manually constructed binary image of a single eye. Tests with this eye patch image produce results similar to those performed with our simple test image, as described earlier. Registration results using the synthetic eye patch are shown in Fig. 16.

Figure 16. Results of registering the synthetic eye patch using the modified CSRN architecture with sub-image processing.

D. Registration Errors

Analyzing Figs. 9, 13 and 15, we notice three major sources of error in our registration results. These errors are as follows.


1.) Local Registration Errors: In tests without sub-image processing, we notice that in many runs, one part of the image is registered correctly, while another is not. This is the case for both the GMLP architecture and our modified MLP architecture. This indicates that, given the symmetric nature of in-plane rotation, as the network works to reduce the error in one area of the image, it increases the error in another part of the image. In such cases, the local critical points determine which part of the image is correctly registered.

2.) Overwrite Errors: To better understand how the image is being registered, we develop a MATLAB utility that allows us to view the registration process pixel by pixel. We notice that in many cases when an error occurs, the misplaced pixel overwrites a pixel that is correctly registered, in effect penalizing our accuracy measurement twice for a single error. This overwrite error occurs in all tests and affects the global registration.

3.) Sub-Image Boundary Errors: In tests using sub-image processing, we see that a small number of pixels located on the boundaries of the sub-images are not correctly registered. In any affine rotation with fixed image size, some boundary pixels will be lost. These boundary errors also affect the global registration of an image.
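The overwrite and boundary effects described above can be reproduced with a minimal sketch of the pixel-moving step. This is an illustrative Python stand-in, not the MATLAB utility mentioned in the text. It assumes that every pixel (foreground and background) is moved in raster order, that unwritten destinations default to 0, and that pixels mapped outside the frame are dropped; the function name is hypothetical.

```python
def apply_registration(img, pos_x, pos_y):
    """Move every pixel of img to (pos_x[y][x], pos_y[y][x]).

    Returns (registered_image, overwrites), where overwrites counts writes
    to a destination that had already received a pixel.
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    written = [[False] * cols for _ in range(rows)]
    overwrites = 0
    for y in range(rows):
        for x in range(cols):
            nx, ny = pos_x[y][x], pos_y[y][x]
            if not (0 <= nx < cols and 0 <= ny < rows):
                continue  # pixel leaves the frame: a boundary loss
            if written[ny][nx]:
                overwrites += 1  # a later pixel replaces an earlier one
            out[ny][nx] = img[y][x]
            written[ny][nx] = True
    return out, overwrites
```

When a background pixel is wrongly sent to a location that already holds a correctly placed foreground pixel, the result loses the foreground pixel and also leaves the background pixel's true destination unfilled, which is the double penalty noted in item 2.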

IV. CONCLUSION

In this work, for the first time, we attempt registration of rotated binary images using a CSRN. The image registration results for binary images show promise. Both the GMLP architecture and the modified MLP architecture demonstrate the ability to do localized registration, with the latter showing more effect on pixel movement. The modified MLP for the CSRN offers better error convergence than the GMLP. Further, when used in conjunction with sub-image processing, registration accuracy begins to approach percentages that make CSRNs viable for practical binary image registration.

V. FUTURE WORK

In the future, we plan to investigate the problems of both local and global registration errors by 1) using the cellular network structure without weight sharing, such that each cell can effectively tune its weights to perform the registration required for that cell, and 2) applying interpolation as a post-processing step to reduce the effects of the sub-image boundary pixel errors. In addition, we plan to further automate our MATLAB application to allow for batch testing in order to perform more rigorous tests and to gather statistical data on the performance of our CSRN. In the long run, we plan to investigate the use of CSRNs on other relevant face patches (eyes, nose, and mouth) for facial registration, and on gray-scale images.

ACKNOWLEDGMENT

The authors thank Roman Ilin for many rewarding discussions and for making the code for the CSRN trained with EKF publicly available [4].

REFERENCES

[1] C. M. Bishop, Neural Networks for Pattern Recognition. Cambridge, UK: Oxford University Press, 1995.

[2] X. Pang, P. Werbos, “Neural Network Design for J Function Approximation in Dynamic Programming,” arXiv:adap-org/9806001v1, June 1998.

[3] P. Werbos, “Backpropagation Through Time: What It Does and How to Do It,” Proceedings of the IEEE, vol. 78, no. 10, Oct. 1990.

[4] R. Ilin, R. Kozma, P. Werbos, “Cellular SRN Trained by Extended Kalman Filter Shows Promise for ADP,” 2006 International Joint Conference on Neural Networks. Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada. July, 2006.

[5] B. Smith, “An approach to graphs of linear forms (Unpublished work style),” unpublished.

[6] B. Kuo, Automatic Control Systems, 5th edition, Englewood Cliffs, New Jersey: Prentice-Hall, Inc., p. 161, 1997.

[7] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd edition, Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1999.

[8] D. White, D. Sofge (eds.), Handbook of Intelligent Control: Neural, Adaptive and Fuzzy Approaches, Van Nostrand, 1992.

[9] M. Minsky, Perceptrons, MIT Press, 1990.

[10] P. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, Wiley, 1994.

