
*Correspondence to: B. E. Shi, Department of Electrical and Electronic Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
Contract/grant sponsor: Air Force Office of Scientific Research (AFSC/JSEP); contract/grant number: F49620-90-C-0029.
Contract/grant sponsor: National Science Foundation; contract/grant number: MIP-8912639.

CCC 0098-9886/98/040343-22 $17.50
© 1998 John Wiley & Sons, Ltd.
Received 19 September 1996
Revised 15 December 1997

INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS

Int. J. Circ. Theor. Appl., 26, 343-364 (1998)

ESTIMATING OPTICAL FLOW WITH CELLULAR NEURAL NETWORKS

B. E. SHI1,*, T. ROSKA2 AND L. O. CHUA3

1Department of Electrical and Electronic Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

2Computer and Automation Institute, Hungarian Academy of Sciences, Budapest, Hungary
3Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, U.S.A.

SUMMARY

The cellular neural network is a locally interconnected neural network capable of high-speed computation when implemented in analog VLSI. This work describes a CNN algorithm for estimating the optical flow from an image sequence. The algorithm is based on the spatio-temporal filtering approach to image motion analysis and is shown to estimate the optical flow more accurately than a comparable approach proposed previously. Two innovative features of the algorithm are the exploitation of a biological model for hyperacuity and the development of a new class of spatio-temporal filter better suited for image motion analysis than the commonly used space-time Gabor filter. © 1998 John Wiley & Sons, Ltd.

KEY WORDS: CNN; cellular neural network; VLSI; image motion analysis; optical flow

1. INTRODUCTION

Cellular neural networks1,2 (CNNs) have been successfully applied to many different problems in image processing.3 A CNN consists of a locally interconnected array of neurons or 'cells' where each cell corresponds to a pixel in the image. The restriction to local interconnections has enabled analog VLSI implementations of the CNN with unprecedented computational speeds to be designed and fabricated.4,5

Because of the temporal dynamics inherent in the cells of the CNN, it is well suited to processing time-varying images. This work describes a CNN algorithm to estimate optical flow. The optical flow6 is the two-dimensional velocity vector field, defined at every pixel in an image, describing the apparent motion of the brightness patterns in the image. Because it is often close to the two-dimensional projection of the three-dimensional velocity vector field resulting from relative motion between the imaging camera and the objects in the environment, it is commonly used as a first step in computer vision algorithms which extract three-dimensional shape from moving images. An important feature of this algorithm is that the entire transformation from time-varying image intensity to the computation and storage of the optical flow vectors can be accomplished using CNNs. Thus, it is a good candidate for implementation on a programmable CNN architecture such as the CNN Universal Machine.7

The algorithm can be divided into two stages. The first stage estimates the components of the optical flow in different directions. These estimates are called component velocity estimates. The second stage

combines the component velocity estimates into an estimate of the optical flow vector at each pixel in the image. In this sense, the algorithm is similar to that proposed by Fleet.8

The first stage uses an approach based upon spatio-temporal filtering. To estimate the component of the optical flow vector parallel to a direction n, we apply the input image simultaneously to a bank of spatio-temporal filters. The filters are broadly tuned to respond to a set of velocities parallel to n. In other words, the output of a given filter is large if the component velocity is within a wide range of its tuned velocity. Despite the apparently poor resolution of the individual sensors, the component velocity can be estimated accurately by pooling their outputs using a model for the biological phenomenon of hyperacuity proposed by Heiligenberg.9 Unlike most other spatio-temporal filtering based approaches to optical flow estimation, our algorithm does not use space-time Gabor filters. Instead, we define the class of space-velocity separable filters. The relationship between their output and the velocity and spatial frequency content of the input is much easier to interpret than that for Gabor filters. In addition, these filters can also be implemented easily using CNN arrays. The idea to combine CNN arrays with the Heiligenberg model of hyperacuity was first suggested by Roska and Heiligenberg.10

The possibly conflicting estimates of component velocities can be combined using a CNN which minimizes the errors between the component velocities of the estimated flow vector and the estimated component velocities. The CNN also incorporates a smoothness constraint which limits the variation between neighbouring estimates of the optical flow vectors, especially at image locations where the component velocity estimates are unreliable. Experimental results indicate that the algorithm produces more accurate estimates than the algorithm proposed by Heeger,11 which uses the outputs of Gabor filters. We believe that the improvement is primarily due to the simpler dependence of the output of the space-velocity filters on the component velocity.

The remainder of the paper is organized as follows. Section 2 reviews Heiligenberg's model for hyperacuity and proves that it is valid in a wider set of conditions than previously considered. In particular, it can be applied to the outputs of the filters used in our algorithm. Section 3 elucidates the first stage of the algorithm by describing a CNN algorithm to estimate the local image velocity in one-dimensional images and then generalizing it to estimate component velocity in two-dimensional images. Section 4 describes the second stage, where the component velocity estimates are combined. Finally, Section 5 concludes by summarizing our results.

2. THE HEILIGENBERG HYPERACUITY MODEL

Typically, neurons which sense an environmental stimulus are tuned to respond maximally if the stimulus lies within a given range. For example, the cells in the inner ear are tuned to respond maximally to different frequency ranges. Thus, the inner ear is often modelled as a bank of bandpass filters. Hyperacuity is a phenomenon observed in biological systems where an organism can resolve changes in the stimulus which are much smaller than the sensitivities of the individual neurons. In the example of the inner ear, humans can resolve changes in frequency which are much smaller than the bandwidths of the bandpass filters modelling the inner ear.

In Reference 9, Heiligenberg proposes a simple mechanism to explain hyperacuity. Consider a set of 2M+1 sensors which produce a real-valued output in response to a real-valued stimulus v. Indexing the sensors by k ∈ {−M, …, M}, assume that the sensitivity of the kth sensor is a Gaussian function of v centred at k. In other words, the output of sensor k to a stimulus v₀ is b(v₀ − k) where

    b(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2/2\sigma^2}

Assume the standard deviation of the Gaussian sensitivity is the same for all sensors; see Figure 1(a).


Figure 1. (a) The Heiligenberg model of hyperacuity assumes a population of sensors tuned to respond maximally to integer values of a stimulus with Gaussian tuning characteristics; in this plot, the standard deviation of the Gaussians, σ, is 0.5. (b) The weighted sum of the responses of these sensors according to (1) is nearly linear in the centre region of the range covered by the sensors

Heiligenberg has demonstrated that the weighted sum of the sensor responses

    f(v) = \sum_{k=-M}^{M} k\, b(v-k)    (1)

is approximately linear in v between −M and M if the standard deviation σ is large enough; see Figure 1(b). Baldi and Heiligenberg12 bounded the error in the approximation for the case M = ∞ and noted that (1) is approximately linear even if b(v) is not Gaussian, but merely bell-shaped.
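To make the mechanism concrete, the following sketch (Python with NumPy; the paper itself contains no code, so this and the later listings are illustrations, not the authors' implementation) evaluates the weighted sum (1) for Gaussian tuning curves and measures its deviation from the identity near the centre of the range.

    import numpy as np

    M, sigma = 10, 0.5   # 2M+1 sensors centred at the integers; sigma as in Figure 1(a)

    def b(x):
        # Gaussian tuning curve used in equation (1)
        return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

    k = np.arange(-M, M + 1)
    v = np.linspace(-3, 3, 601)                            # stimuli well inside [-M, M]
    f = (k[:, None] * b(v[None, :] - k[:, None])).sum(0)   # equation (1)
    print(np.max(np.abs(f - v)))                           # small => f(v) nearly linear

Even though each sensor only localizes the stimulus to within roughly σ, the pooled estimate tracks v far more finely than the unit spacing of the sensors.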

Proposition 1 generalizes the bound derived by Baldi and Heiligenberg to b(x) which are not Gaussian, proving that f(v) is approximately linear for a wider class of sensitivities than previously considered. Intuitively, f(v) is a discrete approximation to the first moment or centre of mass of b(v−x):

    f(v) = \sum_{k=-\infty}^{\infty} k\, b(v-k) \approx \int_{-\infty}^{\infty} x\, b(v-x)\, dx    (2)

If b(x) is symmetric about the origin, then b(v−x) is symmetric about v and its first moment is exactly equal to Av, where A is the area under b(x). The approximation f(v) will be valid if b(x) is 'smooth enough' that the error introduced by discretization is small. The discussion following Proposition 1 elucidates the relationship between the smoothness of b(x) and the error bound.

Proposition 1. Consider an even symmetric function b: ℝ → ℝ whose Fourier transform B(ω) = \int_{-\infty}^{\infty} b(x) e^{-j\omega x}\, dx exists and is differentiable for all ω ∈ ℝ. Define

    f(v) = \sum_{k=-\infty}^{\infty} k\, b(v-k)

    g(v) = \sum_{k=-\infty}^{\infty} b(v-k)

    A = \int_{-\infty}^{\infty} b(x)\, dx

If B(ω) satisfies

    \frac{1}{\pi} \sum_{k=1}^{\infty} |B(2\pi k)| = \varepsilon_1 < \infty

    \frac{1}{\pi} \sum_{k=1}^{\infty} |B'(2\pi k)| = \varepsilon_2 < \infty

then g(v) approximates A in the sense that |g(v) − A| ≤ ε₁, and f(v) is approximately linear in the sense that |f(v) − Av| ≤ ε₁|v| + ε₂.

Proof. Define the error function e₁(v) = g(v) − A. By Reference 13,

    |e_1(v)| \le \frac{1}{2\pi} \int_{-\infty}^{\infty} |E_1(\omega)|\, d\omega    (3)

where E₁(ω) = G(ω) − Aδ(ω) is the Fourier transform of e₁(v) and G(ω) is the Fourier transform of g(v). Since g(v) = b(v) * [\sum_{k=-\infty}^{\infty} \delta(v-k)] and \sum_{k=-\infty}^{\infty} \delta(v-k) has Fourier transform \sum_{k=-\infty}^{\infty} \delta(\omega - 2\pi k), we have G(\omega) = B(\omega) \sum_{k=-\infty}^{\infty} \delta(\omega - 2\pi k). Since A = B(0), E_1(\omega) = B(\omega) \sum_{k \ne 0} \delta(\omega - 2\pi k). Substituting into equation (3),

    |e_1(v)| \le \frac{1}{2\pi} \int_{-\infty}^{\infty} \Big| \sum_{k \ne 0} B(\omega)\, \delta(\omega - 2\pi k) \Big|\, d\omega
             \le \frac{1}{2\pi} \sum_{k \ne 0} \int_{-\infty}^{\infty} |B(\omega)|\, \delta(\omega - 2\pi k)\, d\omega
             = \frac{1}{2\pi} \sum_{k \ne 0} |B(2\pi k)| = \frac{1}{\pi} \sum_{k=1}^{\infty} |B(2\pi k)| = \varepsilon_1

In the last step, we used the fact that since b(x) is real, |B(ω)| = |B(−ω)|. Thus |g(v) − A| ≤ ε₁.

To complete the proof, notice that f(v) can be expressed as f(v) = v g(v) − h(v) where

    h(v) = \sum_{k=-\infty}^{\infty} (v-k)\, b(v-k)

By the triangle inequality, |f(v) − Av| ≤ |g(v) − A| · |v| + |h(v)|. We proved above that |g(v) − A| ≤ ε₁. A similar proof, using the fact that the Fourier transform of x b(x) is jB′(ω), shows that |h(v)| ≤ ε₂. ∎

The sizes of ε₁ and ε₂ control the accuracy of the approximation. These depend upon the rates at which |B(ω)| and |B′(ω)| decay as |ω| → ∞. The faster the decay, the smaller the error. The decay rates are related to the smoothness of b(x). Intuitively, the faster the decay, the smaller the high-frequency content in b(x). Formally,13 if b(x) and its first n−1 derivatives are continuous and absolutely integrable, then

    \lim_{|\omega| \to \infty} |\omega|^n B(\omega) \to 0, \qquad \lim_{|\omega| \to \infty} |\omega|^n B'(\omega) \to 0

It is clear why the Gaussian sensitivity leads to a good approximation to a linear function: all derivatives of a Gaussian are continuously differentiable and absolutely integrable.

Proposition 1 assumed an infinite number of integer-spaced sensors. If a finite number of sensors is used, the approximations will be best in the middle of the range covered by the sensors. The range over which the approximations are valid is larger the faster b(x) decays as |x| → ∞, since the effect of truncating the infinite sums will be less. According to the uncertainty relation, the faster b(x) decays, the slower |B(ω)| and |B′(ω)| decay as |ω| → ∞. This increases the error terms ε₁ and ε₂. Thus, there is a trade-off between the accuracy of


the approximation in the middle of the range and the range over which the approximation holds. In this respect, the Gaussian function is again optimal in some sense, since it minimizes the uncertainty relation.

Example 1. Consider a triangular response characteristic with width d:

    b(x) = \begin{cases} (1 - |x|/d)/d, & |x| \le d \\ 0, & |x| > d \end{cases}

Since B(ω) = sinc²(ωd/2), where |sinc(x)| = |x^{-1} \sin x| \le |x^{-1}|,

    \varepsilon_1 \le \frac{2}{d^2 \pi^2} \sum_{k=1}^{\infty} \frac{1}{k^2}, \qquad
    \varepsilon_2 \le \frac{2}{d^2 \pi^2} \sum_{k=1}^{\infty} \frac{1}{k^2} + \frac{1}{d^2 \pi^3} \sum_{k=1}^{\infty} \frac{1}{k^3}

As d increases, the error decreases.
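As a quick numerical check of Example 1 (again an illustrative sketch; the value of d is chosen arbitrarily), the pooled sum built from triangular tuning curves is already very close to linear:

    import numpy as np

    def tri(x, d=1.5):
        # Triangular tuning curve of Example 1 (area 1, half-width d)
        return np.where(np.abs(x) <= d, (1 - np.abs(x) / d) / d, 0.0)

    k = np.arange(-50, 51)            # wide enough that truncation is negligible
    v = np.linspace(-3, 3, 601)
    f = (k[:, None] * tri(v[None, :] - k[:, None])).sum(0)
    print(np.max(np.abs(f - v)))      # bounded by eps1*|v| + eps2 from Example 1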

Example 2. There is a large set of tuning curves b(x) for which f(v) is exactly linear. In particular, suppose b(x) is the result of convolving any even symmetric function a(x) with

    c(x) = \frac{1}{\pi d}\, \mathrm{sinc}\!\left(\frac{x}{d}\right)

where d > (2π)^{-1}. Let A(ω) and C(ω) denote the Fourier transforms of a(x) and c(x). Since C(ω) = 0 for all |ω| > 1/d, B(ω) = A(ω)C(ω) vanishes for all |ω| ≥ 1/d; in particular, B(2πk) = B′(2πk) = 0 for all k ≥ 1. Thus ε₁ = ε₂ = 0.

3. COMPONENT IMAGE VELOCITY ESTIMATION

The first stage of the optical flow algorithm generates estimates of component velocities at each pixel in the image. If the true optical flow vector is v, the true component velocity in the direction of a unit vector n is the scalar product v_n = v · n. In estimating the component velocities for a set of directions, we decompose the two-dimensional estimation problem into a set of one-dimensional estimation problems. Thus, to clarify our explanation, we discuss the estimation of image velocity in one-dimensional images before generalizing to estimating the component velocities in two dimensions.

Basically, each pixel has a set of sensors which are tuned to respond maximally to different velocities. These sensors are a particular type of spatio-temporal image filter defined as 'space-velocity separable' in Section 3.1. Section 3.2 explains how to implement space-velocity separable filters with cellular neural networks. In Section 3.3, the Heiligenberg model of hyperacuity is used to estimate local image velocity in one-dimensional images by combining the outputs of CNN filters tuned to different velocities. Finally, the component velocity estimation algorithm is outlined in Section 3.4. For the remainder of the paper, we use capital letters to denote Fourier transforms in space, tilde notation to denote Fourier transforms in time, and a combination to denote transforms in space and time.

3.1. Space-velocity separable image filters

The motivation for using spatio-temporal filters to analyse image motion lies in the following interpretation of image motion in the spatio-temporal frequency domain. Consider a one-dimensional image i(x, t) undergoing uniform translation at velocity v. The Fourier transform of i(x, t) in space and time, Ĩ(ω_x, ω_t), is non-zero only along the line ω_t = −vω_x, where ω_t is the temporal frequency variable and ω_x is the spatial frequency variable.14-16 For example, suppose the image is a translating sine wave grating with spatial frequency Ω_x, i(x, t) = sin(Ω_x(x − vt)); see Figure 2. The image intensity oscillates in space and time with frequencies Ω_x and Ω_t, i.e. i(x, t) = sin(Ω_x x + Ω_t t). Equating the arguments in the two expressions yields Ω_t = −vΩ_x.


Figure 2. (a) A translating sine wave grating oscillates both in space and in time. (b) The relationship between the spatial frequency and the temporal frequency of the oscillation is given by Ω_t = −vΩ_x, where v is the speed at which the grating is translated

More generally, i(x, t) = i₀(x − vt), where i₀(x) is the image intensity at t = 0. Fourier transforming in space and time yields

    \tilde{I}(\omega_x, \omega_t) = 2\pi I_0(\omega_x)\, \delta(\omega_t + v\omega_x)    (4)

which is non-zero only along ω_t = −vω_x.

Using the outputs of filters tuned to different regions of the spatio-temporal frequency domain, several authors have proposed different algorithms to measure the optical flow.8,11,17 The filter commonly used in these algorithms is the space-time separable Gabor filter, which bandpass filters an image in space and time. It has the complex-valued impulse response

    h(x, t) = \left( \frac{1}{\sqrt{2\pi}\,\sigma_x} e^{-x^2/2\sigma_x^2} e^{j\Omega_x x} \right) \left( \frac{1}{\sqrt{2\pi}\,\sigma_t} e^{-t^2/2\sigma_t^2} e^{j\Omega_t t} \right)

and a Gaussian frequency response centred at (Ω_x, Ω_t):

    H(\omega_x, \omega_t) = e^{-(\sigma_x^2/2)(\omega_x - \Omega_x)^2}\, e^{-(\sigma_t^2/2)(\omega_t - \Omega_t)^2}

The algorithms in References 11 and 17 convert the complex-valued output of the Gabor filter to a scalar quantity at each pixel, called the 'motion energy',14 by squaring the magnitude of the output. Intuitively, due to the spatial and temporal bandpass filtering of the Gabor filter, one might expect the motion energy output to be maximal for an input image moving at velocity v = −Ω_t/Ω_x. This is not the case. Consider a sine wave grating input with fixed spatial frequency ω_{x0} but varying velocity v, i(x, t) = sin(ω_{x0}(x − vt)). A space-time separable filter can be considered to be the cascade of two independent filters: a spatial filter and a temporal filter. Changing the velocity only changes the temporal frequency of the input, ω_t. Since the effect of the spatial filter is independent of velocity, the variation in the motion energy output is entirely due to the temporal filter. Since the gain of the temporal filter is maximum for ω_t = Ω_t, the motion energy output is maximum for velocity v = −Ω_t/ω_{x0}. Lowering the spatial frequency of the grating raises the velocity at which the motion energy output is maximal. For an image with multiple spatial frequency components, there is a complex dependency of the motion energy output on both the spatial frequency content of the image and its velocity. To deal with this complexity, the algorithms in References 11 and 17 must make simplifying assumptions which are not satisfied in practice. Violating these assumptions leads to errors in the estimated optical flow.
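This dependence of the peak on the grating frequency is easy to reproduce numerically. The sketch below (all constants are illustrative, not taken from References 11 or 17) probes the Gaussian frequency response above along the line ω_t = −vω_{x0} and locates the velocity of maximum motion energy for two grating frequencies:

    import numpy as np

    sx, st = 2.0, 2.0                    # illustrative bandwidth parameters
    Ox, Ot = 0.4 * np.pi, -0.2 * np.pi   # nominally tuned to v = -Ot/Ox = 0.5

    def motion_energy(wx0, v):
        # A grating sin(wx0 (x - v t)) probes the filter at (wx0, -v wx0)
        wt = -v * wx0
        H = np.exp(-sx**2 * (wx0 - Ox)**2 / 2) * np.exp(-st**2 * (wt - Ot)**2 / 2)
        return H**2

    v = np.linspace(-2, 2, 4001)
    for wx0 in (0.4 * np.pi, 0.2 * np.pi):
        print(wx0, v[np.argmax(motion_energy(wx0, v))])
    # Peaks at v = -Ot/wx0: halving the grating frequency doubles the peak velocity.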

348 B. E. SHI, T. ROSKA AND L. O. CHUA

Int. J. Circ. ¹heor. Appl., 26, 343—364 (1998)( 1998 John Wiley & Sons, Ltd.

If instead of space-time separable motion energy filters we consider space-velocity separable filters, the motion energy output can be split into two parts, one of which depends only upon the spatial frequency content of the image and the other of which depends only upon the velocity of translation. A space-velocity separable filter can also be considered as a cascade of two independent filters. The first filter is a spatial filter whose spatio-temporal frequency response is constant in ω_t for fixed ω_x. The second filter is a velocity filter in the sense that its spatio-temporal frequency response is constant along the lines ω_t = −vω_x for fixed v. Formally,

Definition 1. A spatio-temporal filter for one-dimensional images is said to be space-velocity separable if its frequency response can be expressed in the form

    \tilde{H}(\omega_x, \omega_t) = \begin{cases} H_s(\omega_x)\, \tilde{H}_v(\omega_x, \omega_t) & \text{for } \omega_x > 0 \\ 0 & \text{for } \omega_x \le 0 \end{cases}

where H_s(ω_x) ∈ ℂ and there exists a V: ℝ → ℂ such that \tilde{H}_v(\omega_x, -\nu\omega_x) = V(\nu) for all ω_x > 0 and ν ∈ ℝ.

The impulse response of a space-velocity separable filter must be complex, since it is not conjugate symmetric about the origin. Define the motion energy output to be the squared modulus of the filter output. If the image input to a space-velocity separable filter is uniformly translating at velocity v, its spectrum is non-zero only along the line ω_t = −vω_x. Since the frequency response of the velocity filter H̃_v is constant along this line, its effect is simply a velocity-dependent scaling of the motion energy output.

Proposition 2. Consider a one-dimensional real image i(x). Define o(x, t) to be the output of a space-velocity separable filter with input i(x − vt). Using the notation in Definition 1,

    \|o(x, t)\|^2 = \|V(v)\|^2\, c(x - vt)

where

    c(x) = \left\| \frac{1}{2\pi} \int_0^{\infty} H_s(\omega_x) I(\omega_x) e^{j\omega_x x}\, d\omega_x \right\|^2

Proof. Since the spatio-temporal Fourier transform of i(x − vt) is 2πI(ω_x)δ(ω_t + vω_x), as in (4),

    \tilde{O}(\omega_x, \omega_t) = \begin{cases} H_s(\omega_x)\, \tilde{H}_v(\omega_x, \omega_t)\, I(\omega_x)\, 2\pi\, \delta(\omega_t + v\omega_x) & \text{for } \omega_x > 0 \\ 0 & \text{for } \omega_x \le 0 \end{cases}

Taking the inverse Fourier transform,

    o(x, t) = \frac{1}{2\pi} \int_0^{\infty} H_s(\omega_x) I(\omega_x) \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{H}_v(\omega_x, \omega_t)\, 2\pi\, \delta(\omega_t + v\omega_x)\, e^{j\omega_t t}\, d\omega_t \right] e^{j\omega_x x}\, d\omega_x
            = \frac{V(v)}{2\pi} \int_0^{\infty} H_s(\omega_x) I(\omega_x)\, e^{j\omega_x (x - vt)}\, d\omega_x

Thus,

    \|o(x, t)\|^2 = \|V(v)\|^2 \left\| \frac{1}{2\pi} \int_0^{\infty} H_s(\omega_x) I(\omega_x)\, e^{j\omega_x (x - vt)}\, d\omega_x \right\|^2

Define

    c(x) = \left\| \frac{1}{2\pi} \int_0^{\infty} H_s(\omega_x) I(\omega_x)\, e^{j\omega_x x}\, d\omega_x \right\|^2

By the shift property of the Fourier transform, \|o(x, t)\|^2 = \|V(v)\|^2 c(x - vt). ∎


Figure 3. A space-velocity separable filter whose output is complex can be constructed from a real-valued spatio-temporal filter and two real-valued spatial filters

Proposition 2 shows that V(v) determines the velocity tuning of the motion energy output. Define a space-velocity filter to be velocity bandpass if ‖V(v)‖² is maximal for some centre velocity v̄ and decreases as |v − v̄| increases. Define the velocity bandwidth Δv to be the difference between the velocities at which ‖V(v)‖² falls to half its maximum value.

The definition of a space-velocity separable filter implies that its impulse response is complex. However, it is possible to construct a space-velocity separable filter from spatial and spatio-temporal filters with real-valued impulse responses using the architecture shown in Figure 3. The filter H̃₁(ω_x, ω_t) must satisfy the conditions of Definition 1 for ω_x > 0, and the filters H_{s1}(ω_x) and H_{s2}(ω_x) must satisfy H_{s2}(ω_x) = −j sgn(ω_x) H_{s1}(ω_x). The latter condition implies that the impulse responses of the two spatial filters are Hilbert pairs. Define H̃(ω_x, ω_t) to be the frequency response of the composite filter:

    \tilde{H}(\omega_x, \omega_t) = (H_{s1}(\omega_x) + j H_{s2}(\omega_x))\, \tilde{H}_1(\omega_x, \omega_t)

To show that the composite filter is space-velocity separable, suppose first that ω_x > 0. Substituting H̃₁(ω_x, ω_t) = H_s(ω_x) H̃_v(ω_x, ω_t) and H_{s2}(ω_x) = −jH_{s1}(ω_x) yields H̃(ω_x, ω_t) = [2H_{s1}(ω_x) H_s(ω_x)] H̃_v(ω_x, ω_t). Thus, the filter response can be separated into a spatial filter and a velocity filter. Suppose now ω_x ≤ 0. Substituting H_{s2}(ω_x) = jH_{s1}(ω_x) yields H̃(ω_x, ω_t) = 0. Thus, the composite filter satisfies the conditions of Definition 1.
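The one-sidedness of the composite response is easy to verify numerically. A minimal sketch (an arbitrary Gaussian bandpass prototype stands in for H_{s1}; the H̃₁ factor is omitted since it multiplies both branches identically):

    import numpy as np

    N = 256
    wx = 2 * np.pi * np.fft.fftfreq(N)       # spatial frequency grid

    # Illustrative real, even spatial prototype H_s1
    Hs1 = np.exp(-(np.abs(wx) - 0.4 * np.pi)**2 / (2 * (0.15 * np.pi)**2))
    Hs2 = -1j * np.sign(wx) * Hs1            # Hilbert pair of H_s1

    H = Hs1 + 1j * Hs2                       # composite spatial response
    print(np.max(np.abs(H[wx < 0])))                     # ~0: negative frequencies removed
    print(np.max(np.abs(H[wx > 0] - 2 * Hs1[wx > 0])))   # ~0: equals 2 H_s1 for wx > 0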

A useful interpretation of any spatio-temporal filter is as a set of independent temporal filters, each operating on a different spatial frequency component of the input. Define the discrete Fourier transform in space of the input i(x, t) to be I(ω_x, t). For ω̄_x fixed, I(ω̄_x, t) is a complex-valued waveform representing the temporal variation of the ω̄_x spatial frequency component. This waveform is filtered by a temporal filter with frequency response H̃(ω̄_x, ω_t).

For a space-velocity separable filter, the temporal frequency responses are proportional to V(−ω_t/ω_x). If v̄ is the centre velocity and Δv is the velocity bandwidth, V(ν) is bandpass with centre frequency v̄ and 3 dB bandwidth Δv. The temporal filter for the ω_x spatial frequency component is also bandpass, but with centre frequency −v̄ω_x and bandwidth Δv·ω_x. Thus, the centre frequency and the bandwidth of the temporal filter scale linearly with the spatial frequency. In contrast, for a space-time separable filter, H̃(ω_x, ω_t) = H_s(ω_x) h̃_t(ω_t). If h̃_t(ω_t) is a bandpass filter, the centre frequency and the bandwidth of the corresponding temporal filter are independent of ω_x.

In summary, unlike space-time separable Gabor filters, the motion energy output of space-velocity separable filters for uniformly translating input images is easy to interpret. At each pixel, the output is the product of two terms: ‖V(v)‖², which depends only upon the velocity of translation, and c(x − vt), which depends only upon the spatial frequency content of the image. The latter term is the squared magnitude of the output of the spatial filter H_s(ω_x) applied to the input image. The next section elucidates the implementation of velocity bandpass filters using cellular neural networks.


3.2. Implementing space-velocity separable filters with cellular neural networks

A CNN is a neural network consisting of an array of neurons, called 'cells'. Each cell is a dynamical system with a real-valued input and output. To process an N-pixel one-dimensional image, we use a linear array of N cells where each cell corresponds to a pixel. The input to cell x ∈ {1, …, N}, u(x, t), is the image intensity at the xth pixel at time t. The output is the result of the computation performed by the network. In this work, we consider a special class of CNN known as the discrete-time CNN linear filter array.18,19 The output of cell x is real-valued and satisfies the difference equation

    y(x, t) = \sum_{m=-r}^{r} a(m)\, y(x+m, t-1) + \sum_{m=-r}^{r} b_1(m)\, u(x+m, t-1) + \sum_{m=-r}^{r} b_0(m)\, u(x+m, t)    (5)

The array is locally interconnected in the sense that y(x, t) is an explicit function of the outputs and inputs of the cells within a small neighbourhood of x. The connection radius r determines the size of the neighbourhood. Despite the local interconnectivity, the feedback interconnections allow information to be propagated through the entire array. The coefficients A = {a(m)} are defined as the feedback cloning template. The coefficients B₀ = {b₀(m)} and B₁ = {b₁(m)} are defined as the feedforward cloning templates. The discrete-time CNN linear filter array is essentially a recursively implementable spatio-temporal filter. The cloning template coefficients determine what type of filter is implemented.
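In software, one time step of the recursion (5) is a pair of local correlations plus feedback. A direct sketch (zero boundary conditions are assumed here; the paper does not specify the boundary handling):

    import numpy as np

    def cnn_step(y_prev, u_prev, u_now, a, b1, b0):
        # One step of equation (5); a, b1, b0 have length 2r+1, index m = -r..r
        r = (len(a) - 1) // 2
        n = len(y_prev)
        pad = lambda z: np.pad(z, r)           # zero boundary conditions
        y = np.zeros(n, dtype=complex)         # complex dtype also covers simulating
        for i, m in enumerate(range(-r, r + 1)):   # the composite branch directly
            y += a[i] * pad(y_prev)[r + m : r + m + n]
            y += b1[i] * pad(u_prev)[r + m : r + m + n]
            y += b0[i] * pad(u_now)[r + m : r + m + n]
        return y

Iterating cnn_step over the frames of a sequence gives the filter output at every cell.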

Since y(x, t) is real, we build the space-velocity separable filter using the architecture in Figure 3, where each of the three filters is implemented using a CNN. The spatial convolution required for H_{s1} or H_{s2} can be performed by setting a(m) = b₁(m) = 0 for all m and b₀(m) to be the space-reversed impulse response of the filter. In this work, we use the odd Gabor filter20 and its Hilbert pair, which can often be approximated by an even Gabor filter. The spatial frequency responses of the odd Gabor filter and its Hilbert pair are

    j\left( e^{-(\omega_x - \Omega_x)^2/2\sigma^2} - e^{-(\omega_x + \Omega_x)^2/2\sigma^2} \right) \quad \text{and} \quad \left| e^{-(\omega_x - \Omega_x)^2/2\sigma^2} - e^{-(\omega_x + \Omega_x)^2/2\sigma^2} \right|

Convolution kernels of radius r can be obtained by taking the (2r+1)-point inverse DFT of uniformly sampled versions of these frequency responses; see Figure 4.
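A sketch of this sampling-and-inverse-DFT step, using the parameter values quoted in the caption of Figure 4:

    import numpy as np

    def gabor_kernels(r=10, Omega=0.4 * np.pi, sigma=0.15 * np.pi):
        k = np.arange(-r, r + 1)
        w = 2 * np.pi * k / (2 * r + 1)          # uniform frequency samples
        G = (np.exp(-(w - Omega)**2 / (2 * sigma**2))
             - np.exp(-(w + Omega)**2 / (2 * sigma**2)))
        odd, hilb = 1j * G, np.abs(G)            # odd Gabor and its Hilbert pair
        E = np.exp(1j * 2 * np.pi * np.outer(k, k) / (2 * r + 1))
        inv = lambda H: (E @ H) / (2 * r + 1)    # (2r+1)-point inverse DFT
        return inv(odd).real, inv(hilb).real     # imaginary parts vanish by symmetry

    b_odd, b_even = gabor_kernels()              # kernels plotted in Figure 4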

The design of H̃₁ is based on pole-zero placement. Assume an infinite array of cells and that b₀(0) = 1 and b₀(m) = 0 for all m ≠ 0. Taking the discrete Fourier transform of (5) in space and time, we obtain the spatio-temporal frequency response of the array

    \tilde{H}_1(\omega_x, \omega_t) = \frac{1 + B_1(\omega_x) e^{-j\omega_t}}{1 - A(\omega_x) e^{-j\omega_t}} = \frac{\tilde{Y}(\omega_x, \omega_t)}{\tilde{U}(\omega_x, \omega_t)}

where A(\omega_x) = \sum_{m=-r}^{r} a(m) e^{j\omega_x m} and B_1(\omega_x) = \sum_{m=-r}^{r} b_1(m) e^{j\omega_x m}. The pole and zero locations of the temporal filter corresponding to the ω_x spatial frequency component are A(ω_x) and −B₁(ω_x). These can be chosen to ensure that H̃₁(ω_x, ω_t) approximately satisfies the conditions of Definition 1. Given a connection radius r, we find the cloning templates A and B₁ by taking (2r+1)-point inverse DFTs of uniformly sampled versions of the desired A*(ω_x) and B₁*(ω_x), i.e.

    a(m) = \frac{1}{2r+1} \sum_{k=-r}^{r} A^*\!\left( \frac{2\pi k}{2r+1} \right) e^{j2\pi km/(2r+1)}, \qquad
    b_1(m) = \frac{1}{2r+1} \sum_{k=-r}^{r} B_1^*\!\left( \frac{2\pi k}{2r+1} \right) e^{j2\pi km/(2r+1)}

Figure 4. The odd Gabor convolution kernel (a) and its Hilbert pair (b) for Ω_x = 0.4π, σ = 0.15π and r = 10

We discuss first the specification of the pole locations A(ω_x). Recall that the frequencies near the poles of a transfer function will be enhanced at the output of the filter, while the frequencies near the zeros will be suppressed. For a velocity bandpass filter, the pole locations will be critical in determining the characteristics in the velocity passband. The zeros determine the behaviour in the stop band and are specified similarly to the poles.

Assume that b₁(m) = 0 for all m. Since B₁(ω_x) ≡ 0,

    \tilde{H}_1(\omega_x, \omega_t) = \frac{1}{1 - A(\omega_x) e^{-j\omega_t}} = \frac{1}{1 - \|A(\omega_x)\| e^{-j(\omega_t - \angle A(\omega_x))}}

The temporal filter corresponding to the ω_x spatial frequency component is a bandpass filter with centre frequency Ω_t = ∠A(ω_x). The amplitude of the pole, ‖A(ω_x)‖, must be less than one to ensure stability, and controls the 3 dB bandwidth of the filter, Δω_t:

    \|A(\omega_x)\| = a(\Delta\omega_t) = (2 - \cos(0.5\,\Delta\omega_t)) - \sqrt{(2 - \cos(0.5\,\Delta\omega_t))^2 - 1}

A necessary condition for space-velocity separability is that the centre frequency and the bandwidth of the temporal filters scale linearly with ω_x. Denoting the centre velocity by v̄ and the velocity bandwidth by Δv, Ω_t = −v̄ω_x and Δω_t = Δv·ω_x. Thus, the desired pole locations are

    A(\omega_x) = a(\Delta v \cdot \omega_x)\, e^{-j\bar{v}\omega_x}

The constraints on the centre frequency and the bandwidth completely specify the pole locations. However, because they are only necessary but not sufficient conditions for space-velocity separability, the resulting filter is only space-velocity separable in the velocity passband. In particular, the phase varies significantly with ω_x for velocities far from v̄; see Figures 5(a) and 5(b).

Adding a zero tuned to velocity v̄ + δv_z improves the space-velocity separability for velocities far from v̄, because the zero cancels the effect of the pole. Unfortunately, the zero eliminates the bandpass behaviour of the filter. Choosing δv_z < 0 results in a velocity high-pass filter: the zero suppresses the response to velocities below v̄ + δv_z while the pole enhances the response to velocities above v̄. Choosing δv_z > 0 results in a velocity low-pass filter. A bandpass filter can be regained by cascading two filters: a velocity high-pass filter passing velocities greater than v̄ − δv_p and a velocity low-pass filter passing velocities less than v̄ + δv_p. The desired pole and zero locations are

    -B_1(\omega_x) = a(\Delta v_z \cdot \omega_x)\, e^{-j(\bar{v} \pm \delta v_z)\omega_x} \quad \text{and} \quad A(\omega_x) = a(\Delta v_p \cdot \omega_x)\, e^{-j(\bar{v} \pm \delta v_p)\omega_x}

with the minus signs for the high-pass stage and the plus signs for the low-pass stage,


where δv_z > δv_p > 0. Choosing Δv_z > Δv_p ensures that the filter is sharply tuned to image velocities near v̄ and decreases nearly monotonically as the image velocity moves away. The frequency responses of the high- and low-pass velocity filters are

    \tilde{H}_{h1}(\omega_x, \omega_t) = \frac{1 - a(\Delta v_z \omega_x)\, e^{-j(\omega_t - (\bar{v} - \delta v_z)\omega_x)}}{1 - a(\Delta v_p \omega_x)\, e^{-j(\omega_t - (\bar{v} - \delta v_p)\omega_x)}}, \qquad
    \tilde{H}_{l1}(\omega_x, \omega_t) = \frac{1 - a(\Delta v_z \omega_x)\, e^{-j(\omega_t - (\bar{v} + \delta v_z)\omega_x)}}{1 - a(\Delta v_p \omega_x)\, e^{-j(\omega_t - (\bar{v} + \delta v_p)\omega_x)}}

The velocity sensitivity of the motion energy filter is

    \|V(v)\|^2 = \left\| \frac{\tilde{H}_{l1}(1, -v)\, \tilde{H}_{h1}(1, -v)}{\tilde{H}_{l1}(1, -\bar{v})\, \tilde{H}_{h1}(1, -\bar{v})} \right\|^2

Figures 5(c) and 5(d) show that the filter with parameters Δv_z = 2.0, δv_z = 1.0, Δv_p = 1.5, δv_p = 0.375 and v̄ = 0 is approximately space-velocity separable. Figure 6 plots the cloning templates obtained by taking the inverse DFTs of uniformly sampled versions of A*(ω_x) and B₁*(ω_x) with r = 10. Figure 7 shows that the motion energy output for a translating bar input is approximately proportional to ‖V(v)‖².

At each pixel in the image, the motion energy output can be considered as the output of a sensor which is tuned to respond to image velocities near the centre velocity. We have been assuming that the input image is translating uniformly. If the image velocity is different in different regions of the image, each pixel's output depends on the image velocity in a neighbourhood of it. The size of this neighbourhood depends upon the spatial and temporal support of the impulse response of the spatio-temporal filter.

3.3. Velocity estimation

Consider a set of the CNN velocity sensors designed above, each tuned to a centre velocity k ∈ {−M, …, M}. If these filters have the same parameters δv_p, δv_z, Δv_p and Δv_z, the shapes of the velocity sensitivities are identical modulo the offset by the centre velocity. Fix x and t. Since the transfer function of filter k can be approximated by H̃_k(ω_x, ω_t) = H_s(ω_x) V(−ω_t/ω_x − k), the motion energy output of filter k is o_k(x, t) = ‖V(v − k)‖² c(x − vt), where c(x) is the same for all k and v is the local image velocity. By Proposition 1,

    f(x, t) = \sum_{k=-M}^{M} k\, o_k(x, t) = c(x - vt) \sum_{k=-M}^{M} k\, \|V(v-k)\|^2 \approx A\, c(x - vt)\, v

where A = \int \|V(m)\|^2\, dm. Since A c(x − vt) can be approximated by g(x, t) = \sum_{k=-M}^{M} o_k(x, t), a plausible estimate of the velocity at pixel x and time t is

    v_{est}(x, t) = f(x, t) / g(x, t)    (6)

Unfortunately, the estimates generated by this simple scheme are corrupted by the aliasing effects of temporal sampling. The highest unaliased normalized temporal frequency is ω_t = π. For each ω_x, velocities with magnitudes greater than π/ω_x are aliased to lower velocities. This aliasing effect causes the contour lines in Figure 5 to curve towards the ω_x axis as ω_x increases. This degrades the space-velocity separability of the CNN filters for large spatial frequencies and velocities far from the centre velocity.

The aliasing effect can be viewed as adding 'noise' to the response of an idealized bell-shaped curve. The function f(x, t) is extremely sensitive to random noise added to the sensor outputs. Consider (1), where the output of each sensor b(x − k) is replaced by b(x − k) + n_k(x), where n_k(x) is independent from sensor to sensor with zero mean and variance σ². The variance of the error in the output is

    \sum_{k=-M}^{M} k^2 \sigma^2 = \frac{M(M+1)(2M+1)}{3}\, \sigma^2

which increases as the cube of the number of sensor outputs combined. To minimize the effect of noise, the number of sensors combined should be limited. In addition, the 'noise' due to aliasing is larger for velocities far from the centre velocity of the filter.


Figure 5. Given H̃₁(ω_x, ω_t), the velocity filter can be found by H̃_v(ω_x, ω_t) = H̃₁(ω_x, ω_t)/H̃₁(ω_x, −v̄ω_x), where v̄ is the centre velocity. For space-velocity separability, H̃_v(ω_x, −vω_x) should be constant in ω_x for fixed v in the (ω_x, v) plane. Contour plots show that the amplitude (a) of the single-pole filter tuned to v̄ = 0 with bandwidth Δv = 1 is approximately space-velocity separable for velocities near the centre velocity, but the phase (b) is not. However, both the amplitude (c) and phase (d) of the cascade of a velocity low-pass and a velocity high-pass filter are approximately space-velocity separable

However, limiting the number of sensors decreases the range of velocities over which f(x, t) is linear and g(x, t) is constant. To overcome this limitation, we can roughly estimate the velocity by the centre velocity of the filter with the maximum response: v̄(x, t) = argmax_k {o_k(x, t)}. The Heiligenberg model can refine this estimate by using the responses of the filters with centre velocities closest to the rough estimate to compute a correction Δv(x, t) = f_c(x, t)/g_c(x, t), where f_c(x, t) = \sum_{k=-1}^{1} l_k\, o_{\bar{v}(x,t)+k}(x, t) and g_c(x, t) = \sum_{k=-1}^{1} c_k\, o_{\bar{v}(x,t)+k}(x, t). The final velocity estimate can be expressed in the form (6) by setting

    f(x, t) = g_c(x, t)\, \bar{v}(x, t) + f_c(x, t), \qquad g(x, t) = g_c(x, t)    (7)
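At a single pixel, the coarse-plus-correction estimator therefore reduces to an argmax followed by two three-tap dot products. A sketch (assuming unit-spaced centre velocities stored in centers, and precomputed weights l and c as derived below):

    import numpy as np

    def estimate_velocity(o, centers, l, c):
        # o: motion energies of the filters at one pixel, one per centre velocity
        k0 = int(np.argmax(o))                 # rough estimate: best-tuned filter
        k0 = min(max(k0, 1), len(o) - 2)       # keep the 3-tap window in range
        window = o[k0 - 1 : k0 + 2]
        f_c = float(np.dot(l, window))         # nearly linear in v - centers[k0]
        g_c = float(np.dot(c, window))         # nearly constant; confidence measure
        return centers[k0] + f_c / g_c, g_c    # equation (7) divided through by g_c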


Figure 6. The feedforward (a, b) and feedback (c, d) templates, for a template radius of 10 pixels, of the second-order space-velocity separable CNN filters

To ensure that f_c(x, t) is nearly linear and g_c(x, t) is nearly constant over the range [−v_lim, v_lim], the l_k and c_k are chosen to minimize discrete approximations to the error functions

    E_1(l_{-1}, l_0, l_1) = \frac{1}{2v_{lim}} \int_{-v_{lim}}^{v_{lim}} \left[ v - \sum_{k=-1}^{1} l_k \|V(v-k)\|^2 \right]^2 dv

    E_2(c_{-1}, c_0, c_1) = \frac{1}{2v_{lim}} \int_{-v_{lim}}^{v_{lim}} \left[ 1 - \sum_{k=-1}^{1} c_k \|V(v-k)\|^2 \right]^2 dv

The limit v_lim should be greater than 0.5 to ensure that the correction term is valid over the range where v̄(x, t) is constant. The larger the range, the better the network can compensate for errors in the rough estimate v̄(x, t), but the poorer the estimate when v̄(x, t) is correct. We have chosen v_lim = 0.75.
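Minimizing the discrete versions of E₁ and E₂ is an ordinary linear least-squares problem in the three weights. A sketch (the Gaussian-shaped ‖V‖² passed in at the bottom is purely illustrative; in the algorithm it would be the velocity sensitivity of the designed filters):

    import numpy as np

    def correction_weights(V2, vlim=0.75, n=301):
        # V2(v): squared velocity sensitivity ||V(v)||^2 of the centre filter
        v = np.linspace(-vlim, vlim, n)
        Phi = np.stack([V2(v - k) for k in (-1, 0, 1)], axis=1)    # design matrix
        l, *_ = np.linalg.lstsq(Phi, v, rcond=None)                # E1: target v
        c, *_ = np.linalg.lstsq(Phi, np.ones_like(v), rcond=None)  # E2: target 1
        return l, c

    l, c = correction_weights(lambda v: np.exp(-v**2 / (2 * 0.7**2)))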

We tested this algorithm with one-dimensional image sequences obtained by translating the input image intensity distribution plotted in Figure 8. CNN motion energy filters were tuned to velocities −2.25, −1.5, −0.75, 0, 0.75, 1.5 and 2.25 pixels per frame, with filter parameters the same as those used in Section 3.2.


Figure 7. The actual output (solid line) of an approximately space-velocity separable discrete-time CNN motion energy filter at the pixel corresponding to the right-hand edge of a translating bar is nearly equal to a scaled version of ‖V(v)‖² (dash-dot line). The input was a translating bar of 20 pixels in a 100-pixel image. The output was taken at the cell corresponding to the right-hand edge of the bar after 15 frames of translation

Figure 8. This one-dimensional image intensity distribution was translated at varying velocities to obtain the image sequences used to test the image velocity measurement algorithm

Figure 9 plots the measured velocity versus the input velocity at several different pixels. The spatial variation of the measured velocity for several different input velocities is shown in Figure 10(a). The points where the velocity estimate is poor correspond to points where g_c(x, t) is small; see Figure 10(b). These points generally correspond to areas with little high spatial frequency content. Thus, g_c(x, t) is an indicator of our confidence in the velocity estimate. This feature is exploited in the next section.


Figure 9. The mapping from input image velocity to measured velocity is nearly linear. This plot shows the mapping for pixels 20, 40, 60, 80 and 100 after 15 frames of translation, for 41 velocities uniformly spaced between −2 and 2 pixels/frame

3.4. Estimating component velocities in two dimensions

To estimate component velocities in two-dimensional images, we first generalize the definition of a space-velocity separable filter to two dimensions. Unfortunately, the simple relationship in Proposition 2 between the motion energy output and the image velocity does not generalize to two dimensions aside from some special cases. However, by appropriate filter design, the motion energy output can be approximated by a similar expression. This allows a straightforward generalization of the previous results. For two-dimensional images, we index the spatial variables and denote spatial frequencies using vector notation, i.e. x = (x, y) and ω = (ω_x, ω_y).

Definition 2. A spatio-temporal filter for two-dimensional images is said to be space-velocity separable in direction n if its frequency response can be expressed as

    \tilde{H}(\boldsymbol{\omega}, \omega_t) = \begin{cases} H_s(\boldsymbol{\omega})\, \tilde{H}_v(\boldsymbol{\omega}, \omega_t) & \text{for } \boldsymbol{\omega} \cdot \mathbf{n} > 0 \\ 0 & \text{for } \boldsymbol{\omega} \cdot \mathbf{n} \le 0 \end{cases}

where H_s(ω) ∈ ℂ and there exists a V: ℝ → ℂ such that

    \tilde{H}_v(\boldsymbol{\omega}, -v(\boldsymbol{\omega} \cdot \mathbf{n})) = V(v)    (8)

for all v ∈ ℝ and all ω ∈ ℝ². The dot notation denotes the vector dot product.

As in the one-dimensional case, the filter's impulse response is complex. Its motion energy output is defined pixel-wise as the squared modulus of the filter output.

The two-dimensional generalization of the architecture in Figure 3 requires that H̃₁(ω, ω_t) satisfy the conditions of Definition 2 for ω·n > 0 and that H_{s1}(ω) and H_{s2}(ω) satisfy H_{s2}(ω) = −j sgn(ω·n) H_{s1}(ω). A space-velocity separable filter for one-dimensional images implemented with the filters G̃₁(ω_x, ω_t), G_{s1}(ω_x) and G_{s2}(ω_x) can be extended to two-dimensional images by taking

    \tilde{H}_1(\boldsymbol{\omega}, \omega_t) = \tilde{G}_1(\boldsymbol{\omega} \cdot \mathbf{n}, \omega_t)

    H_{s1}(\boldsymbol{\omega}) = H_o(\boldsymbol{\omega} \cdot \mathbf{n}^{\perp})\, G_{s1}(\boldsymbol{\omega} \cdot \mathbf{n})

    H_{s2}(\boldsymbol{\omega}) = H_o(\boldsymbol{\omega} \cdot \mathbf{n}^{\perp})\, G_{s2}(\boldsymbol{\omega} \cdot \mathbf{n})    (9)


Figure 10. (a) The measured velocity at each pixel after 15 frames of translation for velocities −1.5, −1.0, −0.5, 0.0, 0.5, 1.0 and 1.5 is nearly constant across the entire image. (b) The estimate deteriorates at the edges of the image due to boundary effects and at locations where g_c(·, ·) is small

where n^⊥ is the unit vector perpendicular to n and H_o is a one-dimensional Gaussian low-pass spatial filter. Essentially, we have rotated the one-dimensional filter so that it is aligned with n and added spatial filtering perpendicular to n.

A space-velocity separable filter in a direction n is tuned to respond to the component of the local image velocity parallel to n. Using the algorithm in Section 3.3, filters tuned to a range of velocities parallel to n can be used to estimate the component of the optical flow in the direction n. Consider an input image undergoing uniform translation, i(x − vt). The velocity of translation can be decomposed into a component parallel to n and a component perpendicular to n: v = v n + v^⊥ n^⊥. The ω spatial frequency component oscillates in time with frequency

    \omega_t = -\mathbf{v} \cdot \boldsymbol{\omega} = -v(\boldsymbol{\omega} \cdot \mathbf{n}) - v^{\perp}(\boldsymbol{\omega} \cdot \mathbf{n}^{\perp})

Substituting into (8), H̃_v(ω, ω_t) = V(v + v^⊥ l), where l = (ω·n^⊥)/(ω·n) is the tangent of the angle between ω and n. If the velocity is parallel to n, then v^⊥ = 0 and the response of H̃_v is independent of ω. In this case,


the motion energy output ‖o(x, t)‖² is the product of a term which depends on the velocity component parallel to n and a term which depends upon the spatial frequency content of the input. Using the notation of Definition 2,

    \|o(\mathbf{x}, t)\|^2 = \|V(v)\|^2\, c(\mathbf{x} - \mathbf{v}t)    (10)

where

    c(\mathbf{x}) = \left\| \frac{1}{4\pi^2} \int H_s(\boldsymbol{\omega}) I(\boldsymbol{\omega})\, e^{j(\boldsymbol{\omega} \cdot \mathbf{x})}\, d\boldsymbol{\omega} \right\|^2

Equation (10) is also valid if the velocity is not parallel to n but the image only contains spatial frequencies parallel to n, i.e. ω·n^⊥ = 0. This is an example of the aperture problem, which states that it is impossible to recover optical flow components parallel to an isobrightness contour. The decomposition in (10) is not valid if the velocity is not parallel to n and the image contains spatial frequencies which are not parallel to n. The spatial frequencies for which ω·n^⊥ ≠ 0 introduce a spatial frequency dependency into the output of H̃_v which increases with |v^⊥| and l. To reduce this spatial frequency dependency, we make the filter H_o in (9) very narrowly tuned. This decreases the effect of spatial frequencies which are not parallel to n on the output. If it is sufficiently narrowly tuned, (10) is approximately satisfied.

The CNN filters designed in Section 3.2 can easily be extended to two dimensions. The one-dimensional CNN array defined in Section 3.2 can be generalized by extending the summations in (5) to two dimensions:

    y(\mathbf{x}, t) = \sum_{\mathbf{m} \in b_r} a(\mathbf{m})\, y(\mathbf{x}+\mathbf{m}, t-1) + \sum_{\mathbf{m} \in b_r} b_1(\mathbf{m})\, u(\mathbf{x}+\mathbf{m}, t-1) + \sum_{\mathbf{m} \in b_r} b_0(\mathbf{m})\, u(\mathbf{x}+\mathbf{m}, t)

where b_r = {(m_x, m_y) | −r ≤ m_x ≤ r, −r ≤ m_y ≤ r}. The spatial filter H_{s1} or H_{s2} is implemented by setting a(m) = b₁(m) = 0 for all m ∈ b_r and b₀(m) equal to its convolution kernel reflected around the origin. The spatio-temporal filter H̃₁ is designed using the same pole-zero placement strategy used previously. For example, for a pole tuned to velocity v̄ + δv_p with bandwidth Δv_p, set

    A(\boldsymbol{\omega}) = a(\Delta v_p \cdot (\boldsymbol{\omega} \cdot \mathbf{n}))\, e^{-j(\bar{v} + \delta v_p)(\boldsymbol{\omega} \cdot \mathbf{n})}

The cloning template coefficients can be found by

    a(\mathbf{m}) = \frac{1}{(2r+1)^2} \sum_{\mathbf{k} \in b_r} A^*\!\left( \frac{2\pi \mathbf{k}}{2r+1} \right) e^{j2\pi(\mathbf{k} \cdot \mathbf{m})/(2r+1)}

We use the same filter parameters δv_p, δv_z, Δv_p and Δv_z used for the one-dimensional filters.
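A sketch of this two-dimensional template computation (the bandwidth argument is again taken as |ω·n|, our assumption, so that a(·) receives a non-negative value on the whole grid):

    import numpy as np

    def pole_amp(dwt):
        c = 2 - np.cos(0.5 * dwt)
        return c - np.sqrt(c**2 - 1)

    def pole_template_2d(r=10, n=(1.0, 0.0), vbar=0.0, dvp=0.375, Dvp=1.5):
        # A(w) = a(Dvp (w.n)) exp(-j (vbar+dvp)(w.n)) sampled on the DFT grid
        k = np.arange(-r, r + 1)
        wx, wy = np.meshgrid(2 * np.pi * k / (2 * r + 1),
                             2 * np.pi * k / (2 * r + 1), indexing='ij')
        wn = wx * n[0] + wy * n[1]
        A = pole_amp(Dvp * np.abs(wn)) * np.exp(-1j * (vbar + dvp) * wn)
        a = np.fft.ifft2(np.fft.ifftshift(A))   # (2r+1)^2-point inverse DFT
        return np.fft.fftshift(a).real          # centred feedback template a(m)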

4. OPTICAL FLOW ESTIMATION

The second stage of the optical flow algorithm uses a CNN with space-varying cloning templates to combine the estimates of component velocities in different directions into an estimate of the optical flow vector at each pixel in the image. Assume that locally the image is translating with velocity v and that, for each of N different directions {η_i | 1 ≤ i ≤ N}, we have computed f_i(x, t) and g_i(x, t) as in equation (7) from a set of space-velocity separable CNN filters in the direction η_i.

Fix x and t. The least-squares estimate of the optical flow vector, v_est(x, t), minimizes the error function

    E_{\mathbf{x},t}(\mathbf{v}_{est}) = \frac{1}{2} \sum_{i=1}^{N} g_i^2(\mathbf{x}, t) \left( \mathbf{v}_{est} \cdot \boldsymbol{\eta}_i - v_i(\mathbf{x}, t) \right)^2    (11)

    = \frac{1}{2} \sum_{i=1}^{N} \left( g_i(\mathbf{x}, t)(\mathbf{v}_{est} \cdot \boldsymbol{\eta}_i) - f_i(\mathbf{x}, t) \right)^2

at each pixel in the image. If the component image velocity estimates v_i(x, t) = f_i(x, t)/g_i(x, t) are perfect, then v_i(x, t) = v·η_i and E_{x,t}(v) = 0. We have used the confidence measure g_i²(x, t) to weight each orientation's


measurement in the cost function. Thus, data from orientations where our confidence is low will affect the estimate less.
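At a single pixel, minimizing (11) is a weighted 2×2 linear least-squares problem; setting the gradient to zero gives the normal equations solved in the sketch below:

    import numpy as np

    def flow_least_squares(f, g, dirs):
        # f[i], g[i]: the quantities of equation (7) for unit direction dirs[i]
        M = np.zeros((2, 2))
        rhs = np.zeros(2)
        for fi, gi, eta in zip(f, g, dirs):
            eta = np.asarray(eta, dtype=float)
            M += gi**2 * np.outer(eta, eta)    # sum of g_i^2 eta_i eta_i^T
            rhs += fi * gi * eta               # sum of f_i g_i eta_i
        return np.linalg.solve(M, rhs)         # v_est minimizing (11)

    v_est = flow_least_squares([0.8, 0.1], [1.0, 0.9], [(1.0, 0.0), (0.0, 1.0)])

The two-layer CNN below performs the same minimization in parallel at every pixel by gradient descent on (11).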

The following two-layer CNN with space-varying templates can perform the minimization. For each cell x and time τ, denote the output of the first layer of the CNN by p(x, τ) and the output of the second layer by q(x, τ). These satisfy

    \frac{dp(\mathbf{x}, \tau)}{d\tau} = \sum_{\mathbf{m} \in b_r} a_{11}(\mathbf{x}, \mathbf{m})\, p(\mathbf{x}+\mathbf{m}, \tau) + \sum_{\mathbf{m} \in b_r} a_{12}(\mathbf{x}, \mathbf{m})\, q(\mathbf{x}+\mathbf{m}, \tau) + I_1(\mathbf{x})

    \frac{dq(\mathbf{x}, \tau)}{d\tau} = \sum_{\mathbf{m} \in b_r} a_{21}(\mathbf{x}, \mathbf{m})\, p(\mathbf{x}+\mathbf{m}, \tau) + \sum_{\mathbf{m} \in b_r} a_{22}(\mathbf{x}, \mathbf{m})\, q(\mathbf{x}+\mathbf{m}, \tau) + I_2(\mathbf{x})

The cloning templates A₁₁(x) = [a₁₁(x, m)], A₂₂(x) = [a₂₂(x, m)], A₁₂(x) = [a₁₂(x, m)] and A₂₁(x) = [a₂₁(x, m)], and the biases I₁(x) and I₂(x), are space-varying since they depend upon the index x. The output of the first layer corresponds to the first component of the optical flow vector; the output of the second layer corresponds to the second component. Define n₁ = [1 0]^T and n₂ = [0 1]^T. With the templates

    A_{11}(\mathbf{x}) = -\left[ \sum_{i=1}^{N} g_i^2(\mathbf{x}, t)(\boldsymbol{\eta}_i \cdot \mathbf{n}_1)^2 \right], \qquad A_{22}(\mathbf{x}) = -\left[ \sum_{i=1}^{N} g_i^2(\mathbf{x}, t)(\boldsymbol{\eta}_i \cdot \mathbf{n}_2)^2 \right]

    A_{12}(\mathbf{x}) = -\left[ \sum_{i=1}^{N} g_i^2(\mathbf{x}, t)(\boldsymbol{\eta}_i \cdot \mathbf{n}_1)(\boldsymbol{\eta}_i \cdot \mathbf{n}_2) \right] = A_{21}(\mathbf{x})    (12)

    I_1(\mathbf{x}) = \sum_{i=1}^{N} f_i(\mathbf{x}, t)\, g_i(\mathbf{x}, t)(\boldsymbol{\eta}_i \cdot \mathbf{n}_1), \qquad I_2(\mathbf{x}) = \sum_{i=1}^{N} f_i(\mathbf{x}, t)\, g_i(\mathbf{x}, t)(\boldsymbol{\eta}_i \cdot \mathbf{n}_2)

whose signs make the dynamics descend the gradient of (11), the CNN settles to a steady state which minimizes (11) at each cell.

the CNN settles to a steady state which minimizes (11) at each cell.This CNN can be modified to include intralayer connections which enforce a smoothness constraint on the

computed optical flow as suggested by Horn and Schunk.21,22 Defining a global error function by summingthe individual error functions (11) over the entire array and adding a term which penalizes variation betweenneighbouring estimates of the optical flow vector:

Et(m)"+

x GEx, t(v(x))#

j2

[(p(x#n1)!p (x))2#(p(x#n

2)!p (x))2#(q(x#n

1)

!q (x))2#(q(x#n2)!q(x))2]H

where v"Mv (x)N"M(p(x), q (x))N is the set of all optical flow vectors in the array at time t. The correspondingCNN has the same template as (12) except

A11"

0 j 0

j A!4j!N+i/1

g2i(x, t) (g

i)n

1)2B j

0 j 0

,

A22"

0 j 0

j A!4j!N+i/1

g2i(x, t) (g

i)n

2)2B j

0 j 0

The larger the value of j, the smoother the resulting optical flow estimate. The desired value of j dependsupon the average values of the confidence function g

i(x, t). If g

i(x, t)@j, the smoothness term dominates. If

gi(x, t)Aj, the velocity constraints dominate. Thus, at locations where our confidence in the filter outputs is


Figure 11. The first and final frames of the translating (a, b) and diverging (c, d) tree sequences used to test the optical flow algorithm. The translating sequence corresponds to the camera translating horizontally with respect to a planar surface. The diverging sequence corresponds to the camera translating towards a planar surface

small, the velocity estimate depends mainly upon the average of the velocity estimates at the neighbouring pixels.
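A discrete-time sketch of these dynamics (forward-Euler integration of the gradient flow; the step size, iteration count and replicated boundary handling are our choices, not the paper's):

    import numpy as np

    def flow_field(F, G, dirs, lam=300.0, dt=1e-4, iters=2000):
        # F, G: arrays of shape (N_dirs, H, W) holding f_i and g_i;
        # dirs: unit 2-vectors eta_i. Returns the flow components (p, q).
        H, W = F.shape[1:]
        p, q = np.zeros((H, W)), np.zeros((H, W))

        def lap(z):
            # 4-neighbour Laplacian with replicated boundaries
            zp = np.pad(z, 1, mode='edge')
            return (zp[:-2, 1:-1] + zp[2:, 1:-1]
                    + zp[1:-1, :-2] + zp[1:-1, 2:] - 4 * z)

        for _ in range(iters):
            dp, dq = lam * lap(p), lam * lap(q)      # smoothness term
            for fi, gi, (nx, ny) in zip(F, G, dirs):
                resid = gi * (p * nx + q * ny) - fi  # g_i (v . eta_i) - f_i
                dp -= gi * nx * resid                # data term of (11)
                dq -= gi * ny * resid
            p += dt * dp
            q += dt * dq
        return p, q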

We tested this algorithm on 15 frames of the translating and diverging tree sequences used in Reference 24; see Figure 11. For each of the four directions corresponding to 0, 45, 90 and 135°, we computed the motion energy outputs of seven two-dimensional space-velocity separable filters tuned to velocities −2.25, −1.5,


Figure 12. The computed optical flow (a, c) and the error (b, d) in the computed optical flow for the last frame of the translating and diverging tree sequences

−0.75, 0, 0.75, 1.5 and 2.25 pixels per frame. For each direction, the spatial filter H_o was a Gaussian with standard deviation 0.044π.

For the translating tree sequence, since the expected velocities lie between 1.73 and 2.3 pixels per frame, we ran the algorithm on level 1 of a Gaussian pyramid.23 For the diverging tree sequence, we ran the algorithm on the original image sequence.


Table I. Accuracy of optical flow estimates

                         Translating tree              Diverging tree
    Technique        Average error  Standard       Average error  Standard
                         (deg)      deviation          (deg)      deviation
                                      (deg)                         (deg)
    CNN (λ = 0)          2.60         2.13             4.47         6.11
    CNN (λ = 300)        1.93         1.52             2.77         4.55
    Heeger               4.79         2.39             4.95         3.09

The computed optical flow fields and the errors between the actual and computed flow fields are shown in Figure 12. The majority of the errors occur along the edges of the image due to boundary effects; because of this, we discard velocity estimates for pixels within 10 pixels of the image edges. For comparison with the results in Reference 24, we computed the angular error measure used there. Table I displays the results and indicates that this algorithm performs better than the algorithm proposed by Heeger,11 which estimates the optical flow field from the motion energy outputs of a set of space-time Gabor filters which are not space-velocity separable.

5. CONCLUSION

We have described a spatio-temporal filtering based algorithm for estimating the optical flow vector at every point in an image. In developing the algorithm, we made two important innovations. First, we proved that the hyperacuity model of Heiligenberg applies to a wider class of sensitivity curves than previously considered. This validated our application of the model to estimate component image velocity from the outputs of the CNN filters, whose velocity sensitivities are not Gaussian. Second, we have defined the new class of space-velocity separable filters and showed that the effects of velocity variations and variations in the spatial frequency content of the input images are decoupled in the motion energy output. This decoupling does not hold for the more commonly used space-time Gabor filter. We believe that this is the primary reason our algorithm seems to outperform other algorithms estimating the optical flow from the motion energy outputs of Gabor filters.

A key advantage of this algorithm is that every step is implementable on a CNN Universal Machine architecture. Several implementations of CNN Universal Machines in analog VLSI have recently been reported.4,5 Since the values of the optical flow vectors are stored locally at each cell of this two-layer CNN, the optical flow information can be further processed using additional CNN operations. For example, additional CNN templates might be applied to detect features such as the locations of discontinuities in the optical flow or regions where the magnitude of the optical flow exceeds a preset threshold value. The optical flow vectors might also be combined with other data extracted from an image or set of images, such as the location of intensity edges or disparity information, to perform a high-level image segmentation.

REFERENCES

1. L. O. Chua and L. Yang, 'Cellular neural networks: theory', IEEE Trans. Circuits Systems, 35, 1257-1272 (1988).
2. L. O. Chua and L. Yang, 'Cellular neural networks: applications', IEEE Trans. Circuits Systems, 35, 1273-1290 (1988).
3. L. O. Chua and T. Roska, 'The CNN paradigm', IEEE Trans. Circuits Systems I: Fundamental Theory Appl., 40, 147-156 (1993).
4. S. Espejo, R. Carmona, R. Dominguez-Castro and A. Rodriguez-Vazquez, 'A CNN universal chip in CMOS technology', Int. J. Circuit Theory Appl., 24, 93-109 (1996).
5. J. M. Cruz, L. O. Chua and T. Roska, 'A fast, complex and efficient test implementation of the CNN universal machine', Proc. 3rd IEEE Int. Workshop on Cellular Neural Networks and their Applications, December 1994, pp. 61-61.
6. B. K. P. Horn, Robot Vision, McGraw-Hill, New York, 1986.
7. T. Roska and L. O. Chua, 'The CNN universal machine: an analogic array computer', IEEE Trans. Circuits Systems II: Analog Digital Signal Process., 40, 163-173 (1993).
8. D. J. Fleet, Measurement of Image Velocity, Kluwer Academic Publishers, Boston, MA, 1992.
9. W. Heiligenberg, 'Central processing of sensory information in electric fish', J. Comparative Physiol. A, 161, 621-631 (1987).
10. W. Heiligenberg and T. Roska, 'On biological sensory information processing principles relevant to cellular neural networks', in T. Roska and J. Vandewalle (eds), Cellular Neural Networks, Wiley, New York, 1993, pp. 201-211.
11. D. J. Heeger, 'Model for the extraction of image flow', J. Opt. Soc. Amer. A, 4, 1455-1471 (1987).
12. P. Baldi and W. Heiligenberg, 'How sensory maps could enhance resolution through ordered arrangements of broadly tuned receivers', Biol. Cybern., 59, 313-318 (1988).
13. R. N. Bracewell, The Fourier Transform and its Applications, McGraw-Hill, New York, 1986.
14. E. H. Adelson and J. R. Bergen, 'Spatiotemporal energy models for the perception of motion', J. Opt. Soc. Amer. A, 2, 284-299 (1985).
15. E. H. Adelson and J. R. Bergen, 'The extraction of spatio-temporal energy in human and machine vision', Proc. IEEE Workshop on Motion: Representation and Analysis, Charleston, SC, May 1986, pp. 151-155.
16. A. B. Watson and A. J. Ahumada, Jr., 'Model of human visual-motion sensing', J. Opt. Soc. Amer. A, 2, 322-342 (1985).
17. N. M. Grzywacz and A. L. Yuille, 'A model for the estimate of local image velocity by cells in the visual cortex', Proc. R. Soc. Lond. B, 239, 129-161 (1990).
18. B. Shi, T. Roska and L. O. Chua, 'Design of cellular neural networks for motion sensitive filtering', IEEE Trans. Circuits Systems II: Analog Digital Signal Process., 40, 320-331 (1993).
19. B. E. Shi, 'Spatio-temporal filtering with cellular neural networks', Ph.D. Thesis, University of California at Berkeley, 1994.
20. D. Gabor, 'Theory of communication', J. IEE London, 93, 429-457 (1946).
21. B. K. P. Horn and B. G. Schunck, 'Determining optical flow', Artificial Intelligence, 17, 185-204 (1981).
22. J. Hutchinson, C. Koch, J. Luo and C. Mead, 'Computing motion using analog and binary resistive networks', Computer, 52-63 (1988).
23. P. J. Burt, 'Fast algorithms for estimating local image properties', Computer Vision, Graphics Image Processing, 21, 368-382 (1983).
24. J. Barron, D. J. Fleet, S. S. Beauchemin and T. A. Burkitt, 'Performance of optical flow techniques', Proc. CVPR, Champaign, IL, IEEE, 1992, pp. 236-242.
25. D. J. Fleet and A. D. Jepson, 'Computation of component image velocity from local phase information', Int. J. Comput. Vision, 5, 77-104 (1990).
