
Position, rotation, scale and orientation invariant multiple object

recognition from cluttered scenes

Peter Bone, Rupert Young1, Chris Chatwin

Laser and Photonic Systems Research Group

Department of Engineering and Design

University of Sussex,

Brighton BN1 9QT

ABSTRACT

A method of detecting target objects in still images despite any kind of geometrical distortion is demonstrated.

Two existing techniques are combined, each one capable of creating invariance to various types of distortion of the target

object. A Maximum Average Correlation Height (MACH) filter is used to create invariance to orientation and gives

good tolerance to background clutter and noise. A log r-θ mapping is employed to give invariance to in-plane rotation

and scale by transforming rotation and scale variations of the target object into vertical and horizontal shifts. The MACH

filter is trained on the log r-θ map of the target for a range of orientations and applied sequentially over regions of

interest in the input image. Areas producing a strong correlation response can then be used to determine the position, in-

plane rotation and scale of the target objects in the scene.

Keywords: logmap, MACH filter, correlation filter, invariant pattern recognition

1. INTRODUCTION

The problem of recognizing objects despite distortions in position, orientation and scale 1-2, and within cluttered

backgrounds is a demanding pattern recognition problem. A system capable of detecting target objects despite any kind

of geometrical distortion has many practical applications since it can detect target objects when the orientation and

position of the target or camera is unknown. The ability to classify objects as in-class or out-of-class in cluttered

backgrounds is also crucial if the system is to be used in a practical application. The first main success at solving the

invariance problem came from the development of the Synthetic Discriminant Function (SDF) 3-5, which included the

expected distortions in the filter design to create invariance to such distortions. More recent attempts have been based on

the Maximum Average Correlation Height (MACH) filter 6, which can be tuned to give maximum performance and is far

more immune to background clutter.

To attempt to solve the full invariance problem, we combine two existing techniques, each capable of achieving

invariance to several of the possible variations of a target object. Out-of-plane rotation invariance is achieved using a

Maximum Average Correlation Height filter (MACH). Aside from creating out-of-plane rotation invariance, this filter is

capable of discriminating the target objects from cluttered or noisy backgrounds. Scale and in-plane rotation invariance

is created with the use of a log r-θ mapping (logmap) of a localised region of the image space. A change in scale or

rotation in the target object results in a horizontal or vertical shift in the logmap, which makes the object detectable by

1 E-mail: [email protected]; Tel: +44(0)1273 678908; Fax: +44(0)1273690814

correlation with the logmap of the reference image. Simulation results are presented that demonstrate the detection of

the target object in a scene under a number of conditions that test the filter’s invariance.

2. IN-PLANE ROTATION AND SCALE INVARIANCE

A method was required to detect target objects in a scene despite their differences in scale or in-plane rotation to

the target reference images. To do this a log r-θ mapping 7-9, or logmap, was used. The logmap uses a variation on the

basic x-y grid sensor used in conventional image processing. The structure of the sensor is based on a Weiman polar

exponential grid 7,10,11 and consists of concentric exponentially spaced rings of pixels, which increase in size from the

centre to the edge. This produces an arrangement similar to that found in the mammalian retina where photoreceptive

cells are small and densely packed in the fovea and increase in size exponentially to create a blurred periphery. Each

sensor pixel on the circular region of the x-y Cartesian space is mapped into a rectangular region in Polar image space r-θ. The sensor's geometry maps concentric circles in the Cartesian space into vertical lines in the polar space and radial

lines in the Cartesian space into horizontal lines in the Polar space. It will be shown mathematically that this

transformation offers scale and rotation invariance about its centre, since rotation or scale changes simply produce

vertical or horizontal shifts in the polar space. Aside from invariance to in-plane rotation and scale, this mapping also

has the advantage of giving a wide visual field whilst maintaining high resolution at the centre where it is needed most.

This means that it uses a minimum amount of memory and computational resources.

The logmap sensor performs a complex logarithmic mapping of the image data from the circular retinal region

into a rectangular region. Fig. 2.1 shows the complex logarithmic mapping performed by the sensor geometry. The

vertical lines in the w-plane map to concentric circles in the z-plane and the horizontal lines in the w-plane map to radial

lines in the z-plane.

Fig. 2.1 Sensor geometry and logarithmic mapping

The complex logarithmic mapping can be written as:

w = log z (2.1)

and, applying complex notation,

z = x + iy (2.2)

where r = √(x² + y²) and θ = arctan(y/x).

Thus,

w = log r + iθ = u + iv (2.3)

where u = log r and v = θ.

Hence, an image in the z-plane with co-ordinates x and y is mapped to the w-plane with co-ordinates u and v.

The mapped image from the Cartesian space (z-plane) in to the Polar space (w-plane) is referred to as the log-polar

mapping or the logmap. This process can be reversed to produce the inverse mapping from Polar space (w-plane) to

Cartesian space (z-plane).

The following invariances are created by the logmap and are useful for image processing applications, in

addition to the wide field of view, reduced pixel count and a central highly focused area inherent in the space-variant

sensor geometry. Firstly, the log-polar mapping offers in-plane rotation invariance of the rotated image. If an image is

rotated by an angle α about the origin, then:

z = r exp(i(θ + α))

Thus,

w = log r + iθ + iα = u + iv + iα (2.4)

In effect, rotating the image by the angle α has resulted in a vertical shift in the mapped image by the rotation angle.

Secondly, the log-polar mapping offers scale invariance. If the image in the z-plane is scaled by a factor β, then:

z = βr exp(iθ)

Thus,

w = log β + log r + iθ = log β + u + iv (2.5)

In effect, scaling the image by the factor β has resulted in a horizontal shift in the mapped image by the scaling factor.

Thirdly, the mapping also offers projection invariance. If we move towards a fixed point in the z-plane then the mapped

image will shift horizontally in the w-plane. The size and shape of the original z-plane image will stay the same. This is

equivalent to progressively scaling the z-plane image, but with a different change in the perspective ratio.

The fact that rotation and scale changes result in horizontal and vertical shifts in the logmap means that the

logmap creates invariance to these transformations since a linear correlator object recognition system is invariant to x-y

translation. It should be emphasised however, that the log-polar mapping is not shift invariant, so the properties

described above hold only if they are with respect to the origin of the Cartesian image space.
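The shift properties above can be checked numerically. The sketch below is an illustrative Python/NumPy check (not the paper's MATLAB code): it maps a point to logmap coordinates (u, v) = (log r, θ) and confirms that scaling by β shifts u by log β while rotating by α shifts v by α, as in equations 2.4 and 2.5.

```python
import numpy as np

def logmap_coords(x, y):
    """Map Cartesian (x, y) to log-polar (u, v) = (log r, theta)."""
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    return np.log(r), theta

# A sample point, a scale factor beta and a rotation angle alpha.
x, y = 3.0, 4.0
beta, alpha = 2.0, 0.25

u0, v0 = logmap_coords(x, y)

# Scaling by beta: u shifts by log(beta), v is unchanged (eq. 2.5).
u_s, v_s = logmap_coords(beta * x, beta * y)
assert np.isclose(u_s - u0, np.log(beta))
assert np.isclose(v_s, v0)

# Rotating by alpha: v shifts by alpha, u is unchanged (eq. 2.4).
c, s = np.cos(alpha), np.sin(alpha)
u_r, v_r = logmap_coords(c * x - s * y, s * x + c * y)
assert np.isclose(v_r - v0, alpha)
assert np.isclose(u_r, u0)
```

Because a linear correlator is invariant to x-y translation, these (u, v) shifts are exactly what the correlation stage absorbs.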

2.2 Polar-exponential sensor geometry

The geometry of the sensor can be varied to produce the optimum use of image space 12. A suitable mapping

would be to have the sensor's angular cell size proportional to its radial cell size. This results in a sensor array where the sensor cells are roughly square and increase exponentially in size with distance from the centre of the mapping.

The sensor geometry is characterised by its parameters (n, R, r_min, dr). The grain of the array n is defined as the number of sensor cells in each of the rings of the array and so is equal to the number of spokes in the sensor array. Thus, n spokes cover a range of 2π radians, and so the angular resolution of each sensor cell from the centre of the array is 2π/n. This is named the granularity of the array g, and determines the sensor's angular resolution:

g = 2π/n (2.6)

The gain of the array k is defined as the ratio of pixel sizes between any two successive rings of pixels in the sensor:

k = exp(2π/n) = exp(g) (2.7)

The above equation also describes the resolution of the sensor, which decreases as we move from the centre to the periphery: smaller gains give a higher overall resolution than larger gains. The position of each ring and spoke in the sensor array can then be computed from the sensor's granularity g:

r(u) = exp(u·g) (2.8)

θ(v) = v·g (2.9)

where u and v are the integer indices of the logmap, varying over u_min ≤ u ≤ u_max and 0 ≤ v ≤ (n − 1).
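The geometry parameters above can be sketched in a few lines. This is an illustrative Python version (the function names are ours) computing the granularity, gain and cell positions for the paper's grain of n = 100:

```python
import math

n = 100                      # grain: cells per ring (the paper's value)
g = 2 * math.pi / n          # granularity (eq. 2.6)
k = math.exp(g)              # gain: ratio of successive ring radii (eq. 2.7)

def ring_radius(u):
    """Radius of ring u (eq. 2.8)."""
    return math.exp(u * g)

def spoke_angle(v):
    """Angle of spoke v (eq. 2.9), for 0 <= v <= n - 1."""
    return v * g

# Successive rings are spaced by the constant gain k.
assert math.isclose(ring_radius(11) / ring_radius(10), k)
# n spokes cover the full 2*pi range.
assert math.isclose(spoke_angle(n), 2 * math.pi)
```

The constant ratio between successive ring radii is what makes uniform scaling of the image a uniform shift in u.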

The next parameter of the sensor is the field width R, defined as the distance from the centre to the edge of the

mapping. By setting R to different values and fixing the focus point of expansion at the centre of the image we affect the

data sampled by the sensor. If we set R to be half of the size of the image, then the circular visual field omits only the

image data at the corners. Increasing R to √2/2 of the image size will include all the image data, but also

unwanted data around the image is included. The last parameter of the sensor to be defined is the blind spot of the

mapping, which affects the overall resolution of the mapping. If we vary the radius of the first ring minr , then we can

omit the image data at the centre of the sensor. This blind spot can be filled with a high-resolution array of uniformly

spaced rectangular cells in a similar way to the mammalian retina, or with an r-θ mapping, or just left empty. The circular area of the blind spot is described by an offset dr into the mapping:

size of the blind spot = r_min + dr (2.10)

The number of sensor cells in the mapping and the range of pixels which they cover into the logmap will be determined

by the field width R, the radius of the first ring from the centre of the mapping minr , and the sensor’s grain n.

2.3 Logmap of an x-y raster image

An x-y raster image must be converted to a logmap image by interpolation. The intensity value of each sensor

cell is computed by averaging the intensity of all the x-y image pixels that fall within it. The most basic, and fastest, way

of doing this is a nearest-neighbour interpolation method where each image pixel contributes to only one sensor cell

depending on what sensor cell its centre is within. However, this method does not account for image pixels that fall

between sensor cell locations and therefore produces a very broken logmap, which is not fully scale and rotation

invariant. The logmap can be improved dramatically by sub-dividing the pixels that fall between sensor locations into

smaller sub-pixels and calculating the proportion of the image pixel that covers the logmap region by computing the

corresponding sensor cell for each sub-pixel. This produces a logmap image that is more rotation and scale invariant but

the computational time increases as the level of sub-division is increased. If the same sensor geometry is being used to

create logmaps of multiple images, then the time taken to create each logmap can be reduced significantly by pre-

computing a lookup table to store the image pixels that contribute to each sensor cell and the proportion they contribute.
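The lookup-table scheme described above can be sketched as follows. This is an illustrative Python/NumPy version using nearest-neighbour assignment only (the paper's sub-pixel subdivision refines this); the function names and parameter defaults are our own, chosen to match the geometry quoted in the text:

```python
import numpy as np

def build_lut(size, n=100, R=50, r_min=1):
    """Precompute, for each logmap cell (u, v), the flat indices of the
    image pixels whose centres fall inside that cell (nearest-neighbour)."""
    g = 2 * np.pi / n
    cy = cx = size / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    r = np.hypot(xs - cx, ys - cy)
    theta = np.mod(np.arctan2(ys - cy, xs - cx), 2 * np.pi)
    u = np.floor(np.log(np.maximum(r, 1e-9) / r_min) / g).astype(int)
    v = np.clip(np.floor(theta / g).astype(int), 0, n - 1)
    u_max = int(np.log(R / r_min) / g)
    lut = {}
    valid = (r >= r_min) & (r < R)   # pixels inside the circular field
    for idx in np.flatnonzero(valid):
        key = (u.flat[idx], v.flat[idx])
        lut.setdefault(key, []).append(idx)
    return lut, u_max, n

def apply_lut(image, lut, u_max, n):
    """Average the contributing pixels of each cell to form the logmap."""
    out = np.zeros((n, u_max + 1))
    for (cell_u, cell_v), idxs in lut.items():
        if 0 <= cell_u <= u_max:
            out[cell_v, cell_u] = image.flat[idxs].mean()
    return out

# Build the table once, then reuse it for every sub-image.
lut, u_max, n = build_lut(100)
logmap = apply_lut(np.ones((100, 100)), lut, u_max, n)
assert logmap.shape == (n, u_max + 1)
```

The expensive geometry calculation happens once in `build_lut`; `apply_lut` then only sums precomputed contributions, which is what makes per-window logmap creation fast.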

Figure 2.2 shows an original image (a), the log-mapped image (b) and the inverse mapping of the logmap (c).

The parameters used in the logmap geometry were n=100, R=50, rmin=1, dr=0.


Fig. 2.2 Original image (a), logmap image (b) and inverse mapping (c)

3. MAXIMUM AVERAGE CORRELATION HEIGHT (MACH) FILTER

The MACH filter 6, like the SDF, is another method of creating invariance to distortions in the target object by

including the expected distortions in the construction of the filter. The MACH filter maximizes the relative height of the

average correlation peak with respect to the expected distortions. Unlike the SDF, the MACH filter can be tuned to

maximize the correlation peak height, peak sharpness and noise suppression while also being tolerant to distortions in the

target object that fall between the distortions given in the training set. However, the peak height of the MACH filter is

unconstrained, making it more difficult to interpret the results of the correlation 13.

The MACH filter is derived by maximizing a performance metric called the average correlation height (ACH).

However, several other performance measures have to be balanced to better suit different application scenarios. These

measures are the Average Correlation Energy (ACE), Average Similarity Measure (ASM) and Output Noise Variance

(ONV) 5,14. Thus, to form an optimal trade-off filter 6,14, the following energy function is formed and minimized with

respect to one criterion, holding the others constant:

E(h) = α(ONV) + β(ACE) + γ(ASM) − δ(ACH)
     = α h⁺C h + β h⁺D_x h + γ h⁺S_x h − δ h⁺m_x (3.1)

where ⁺ denotes the conjugate transpose.

The resulting optimal trade-off (OT) MACH filter (in the frequency domain) is given as:

h = m_x / (αC + βD_x + γS_x) (3.2)

where α, β and γ are non-negative OT parameters, m_x is the average of the training image vectors x_1, x_2, ..., x_N (in the frequency domain), and C is the diagonal power spectral density matrix of additive input noise. It is usually set as the white noise covariance matrix, C = σ²I. D_x is the diagonal average power spectral density of the training images:

D_x = (1/N) Σ_{i=1}^{N} X_i* X_i (3.3)

where X_i is the diagonal matrix of the ith training image. S_x denotes the similarity matrix of the training images:

S_x = (1/N) Σ_{i=1}^{N} (X_i − M_x)* (X_i − M_x) (3.4)

where M_x is the average of X_i.

The different values of α, β and γ control the MACH filter’s behaviour to match different application

requirements 15. If β=γ=0, the resulting filter behaves much like a MVSDF filter 16 with relatively good noise tolerance

but broad peaks. If α=γ=0 then the filter behaves more like a MACE filter 17, which generally exhibits sharp peaks and

good clutter suppression but is very sensitive to distortion of the target object. If α=β=0, the filter gives high tolerance

for distortion but is less discriminating.
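Because all the spectral matrices in equation 3.2 are diagonal, the filter reduces to an element-wise division in the frequency domain. The sketch below is an illustrative Python/NumPy construction on a tiny synthetic training set (shifted Gaussian blobs standing in for the log-mapped reference views); it is a sketch of the OT-MACH formula, not the paper's implementation:

```python
import numpy as np

def otmach_filter(train_images, alpha=0.01, beta=0.3, gamma=0.1, sigma2=1.0):
    """Optimal trade-off MACH filter (eq. 3.2) in the frequency domain.
    The diagonal matrices C, D_x and S_x are stored as arrays."""
    X = np.array([np.fft.fft2(img) for img in train_images])
    m_x = X.mean(axis=0)                       # average training spectrum
    C = sigma2 * np.ones(m_x.shape)            # white-noise PSD, C = sigma^2 I
    D_x = (np.abs(X) ** 2).mean(axis=0)        # average power spectrum (eq. 3.3)
    S_x = (np.abs(X - m_x) ** 2).mean(axis=0)  # similarity term (eq. 3.4)
    return m_x / (alpha * C + beta * D_x + gamma * S_x)

# Synthetic "training set": horizontally shifted blobs.
yy, xx = np.mgrid[0:32, 0:32]
train = [np.exp(-((xx - 15 - d) ** 2 + (yy - 15) ** 2) / 8.0) for d in (0, 1, 2)]
h = otmach_filter(train)

# Correlate a training image with the filter: since the training views
# differ only by horizontal shifts, the peak's vertical shift is zero.
corr = np.abs(np.fft.ifft2(np.fft.fft2(train[0]) * np.conj(h)))
peak = np.unravel_index(np.argmax(corr), corr.shape)
assert peak[0] == 0
```

The α, β, γ values default here to the ones quoted later in the paper (0.01, 0.3, 0.1), but the trade-off is application dependent.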

4. FULLY INVARIANT FILTER

By combining the in-plane rotation invariance of the logmap and the distortion invariance of the MACH it was

possible to create a filter that is invariant to any kind of geometrical distortion of the target object, while maintaining

high performance even in cluttered scenes. Such a filter was constructed and tested for the problem of detecting a

particular car (a model of a Jaguar S-Type) in a scene (Fig 4.1). The system was expected to correctly detect the car

despite variations in its out-of-plane orientation, in-plane rotation, scale, position and with noisy or cluttered

backgrounds. To create invariance to changes in out-of-plane orientation of the car, the MACH filter was created using a

training image set consisting of the expected range of rotation taken at small intervals of viewing angle.

Fig. 4.1 Various distortions of the target object (a model of a Jaguar S-Type)

The problem of combining the logmap with the MACH was solved simply by creating a logmap of each

reference image before synthesis of the filter. The input image then also needed to be log-mapped before being

correlated with the filter. However, the logmap gains in-plane rotation and scale invariance at the expense of position

invariance. This means that to search the entire input image for the target object requires the correlation process to be

repeated for each location in the image. To accomplish this, a moving window system was created to raster scan the

input image and perform the MACH logmap correlation on the sub-image contained in the window. The size of the

window needed to be as small as possible to reduce computational time while being large enough to fit any expected

distortion of the target object.

To classify each sub-image as containing an in-class or out-of-class object it was necessary to compare the

correlation peak height of each sub-image with some threshold - above which the sub-image would be classified as

containing an in-class target. The threshold was calculated by correlating each of the reference images with the MACH

filter and taking an average of their resultant peak intensities (the intensity at the centre of the correlation plane). This

gave a value close to what would be expected after correlation between the filter and an in-class target. The

threshold was then set slightly lower than this value by multiplying by 0.75 so that any in-class objects would produce

peaks slightly above the threshold, allowing for some reduction in amplitude, while still being high enough for all out-of-

class objects to fall below the threshold. The threshold equation was thus calculated as:

Threshold = 0.75 · (1/N) Σ_{i=1}^{N} max_{x,y} |C_i(x,y)| (4.1)

where C_i(x,y) is the correlation plane amplitude of the ith training image.
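The threshold calculation of equation 4.1 is straightforward to express in code. This is an illustrative Python/NumPy sketch (function names ours); the stand-in matched filter used in the usage example below is purely for demonstration:

```python
import numpy as np

def detection_threshold(filter_h, train_images, factor=0.75):
    """Eq. 4.1: 'factor' times the mean peak amplitude obtained by
    correlating each training image with the frequency-domain filter."""
    peaks = []
    for img in train_images:
        corr = np.fft.ifft2(np.fft.fft2(img) * np.conj(filter_h))
        peaks.append(np.abs(corr).max())
    return factor * np.mean(peaks)

# Usage with a trivial stand-in filter (the spectrum of one image).
imgs = [np.eye(8), np.full((8, 8), 0.5)]
h = np.fft.fft2(imgs[0])
t = detection_threshold(h, imgs)
assert t > 0
# The threshold scales linearly with the chosen factor.
assert np.isclose(detection_threshold(h, imgs, factor=1.0) * 0.75, t)
```

Setting the factor below 1 leaves headroom for in-class peaks that are slightly attenuated by distortion, as the text describes.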

Once a target object has been detected, it is then possible to calculate the in-plane rotation and scale of the target

in the image. This is possible since rotation and scale variations in the target object produce horizontal and vertical shifts

in the logmap. The position of the maximum correlation peak will therefore also be shifted from its central position when

target objects are rotated or scaled relative to the set of reference images used to build the filter. By measuring the

horizontal and vertical offset of the peak from the centre of the correlation plane it is possible to calculate the scale ratio

and rotation relative to the reference images. From equations 2.8 and 2.9 it can be seen that the rotation and scale can be

calculated as:

ScaleRatio = exp(u_offset · g) (4.2)

Rotation = v_offset · g (4.3)

where u_offset and v_offset are the horizontal and vertical offsets of the maximum correlation peak from the centre of the correlation plane.

As well as calculating the scale and in-plane rotation of detected targets, it is also possible to calculate the out-

of-plane rotation (orientation) as a post processing operation. This is done by correlating the log-mapped sub-image with

each of the original log-mapped reference images and seeing which one gives the strongest response.

The logmap parameters used for the filter were n=100, R=50, rmin=5 and dr=0. The value of the grain n was

chosen based on the resolution of the reference images so that they would not be over or under sampled. The value of R

was chosen based on the maximum expected dimensions of the target object, since the aim is to make R as small as

possible while being large enough to cover any target object. The width of the moving window used to sample the input

image was set to 80, as this meant that it was fully sampled by the logmap while being as small as possible. rmin was set

to 5 to create a blind spot at the centre of the image. The blind spot was necessary for two reasons. Firstly, the area at

the centre of the sensor is over sampled and so a single pixel in the original image is mapped to many pixels in the

logmap image making the logmap image unnecessarily large. Secondly, the over sampling creates large areas of flat

colour on the left hand side of the logmap (this can be seen in Fig. 2.2 (b) where a blind spot was not used). Edges are

created in-between these areas of colour when the MACH filter is applied, which inhibit the in-plane rotation invariance

of the filter. The area inside the blind spot was left empty but this loss of data did not significantly affect the

performance of the filter since it represents a relatively small proportion of the expected target size.

The normal method of creating the logmap from each sub-image is very slow compared to the other stages of

Fourier transforming the logmap, multiplying it by the filter and Fourier transforming again to create the correlation

plane. This was especially true due to the fact that a subdivision level of 16 was chosen in the interpolation algorithm in

order to create a very smooth logmap (i.e. each image pixel was divided into 4 by 4 smaller pixels to take into account

pixels that fall between logmap regions). However, the logmap geometry used on each sub-image is the same and it is

therefore possible to remove the redundancy of performing the interpolation geometry calculations for each sub-image

by employing a look-up table. The look-up table is an array, each element of which corresponds to a particular logmap

region. Each element contains a list of image pixels that contribute to that particular logmap region and the fraction they

contribute. The intensity of each logmap pixel can then be computed simply by summing the intensity of its contributing

parts without having to calculate what the contributing parts are. Using this system makes the creation of a logmap for

each sub-image much faster. The system was able to process approximately 10 sub-images per second, including

logmap creation, two Fourier transforms, correlation peak location and classification, running on a 3GHz Pentium using

simulation code written and executed in MATLAB.

The parameters used in the MACH filter function were set to α=0.01, β=0.3 and γ=0.1. These values were set

after a long period of testing with different values to correctly balance the discriminating ability against the sharpness of

the correlation peak and the general performance of the filter.

4.1 Performance Criteria

To quantify the performance of the filter accurately it was necessary to calculate some basic measures of the quality of the

correlation plane for detected targets.

The most basic measure of the target correlation peak is the correlation output peak intensity (COPI). This refers to the maximum intensity value of the correlation output plane, i.e. the peak intensity. COPI is defined as 18:

COPI = max_{x,y} {|C(x,y)|²} (4.4)

Where C(x, y) is the output correlation amplitude value at position (x, y). A filter that produces a high COPI value has

good performance and better detectability.

To provide the best detectability and performance it is necessary for a filter to yield a sharp correlation peak as well as a

high COPI value, while keeping side peaks to a minimum. The filter's ability to do this can be measured using the peak-

to-correlation energy (PCE) measure. The basis of the PCE measure is that the correlation peak intensity should be as

high as possible while the overall correlation energy in the plane should be as low as possible. The PCE ratio is defined

as 18:

PCE = COPI / { [1/(N_x N_y − 1)] Σ_{x,y} (|C(x,y)|² − μ)² }^{1/2} (4.5)

where μ = [1/(N_x N_y)] Σ_{x,y} |C(x,y)|² is the mean value of the correlation output plane intensity and N_x, N_y are the dimensions of the correlation plane.

A high PCE value therefore implies that the filter performs well.
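Both metrics can be computed directly from the correlation plane. The following is an illustrative Python/NumPy sketch of equations 4.4 and 4.5 (function names ours); the toy planes below show that side lobes depress PCE even when the peak height is unchanged:

```python
import numpy as np

def copi(corr):
    """Eq. 4.4: maximum correlation plane intensity."""
    return (np.abs(corr) ** 2).max()

def pce(corr):
    """Eq. 4.5: peak intensity over the (sample) standard deviation
    of the plane intensity; sharp, isolated peaks give high values."""
    intensity = np.abs(corr) ** 2
    return copi(corr) / intensity.std(ddof=1)

# A clean single-peak plane versus the same peak with added side lobes.
sharp = np.zeros((16, 16)); sharp[8, 8] = 1.0
lobed = sharp.copy(); lobed[2, 2:12] = 0.9
assert copi(sharp) == 1.0
assert pce(lobed) < pce(sharp)
```

The `ddof=1` sample normalisation matches the 1/(N_x N_y − 1) factor in equation 4.5.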

4.2 Results

4.2.1 Correlation with training image

The first test to be carried out was a basic correlation using one of the training set images as the input image. It

was expected that the filter would perform well in this test since the input image was directly associated with the

construction of the filter. Figure 4.4 (a) shows the correlation plane output of the filter - trained on car images with

viewing angles 0, 10, 20 and 30 degrees and correlated with the 0 degree image. The filter performed well, producing a

tall sharp peak with no out-of-class peaks giving a PCE value of 25.9. It should be noted that the position of the peak in

the correlation plane does not represent the position of the target object in the input image since correlation is performed

in logmap space – the position of the peak therefore represents the rotation angle and log of scale ratio of the detected

target object. The correlation plane shown is where the strongest peak is found during scanning of the image using the

moving window method described above, i.e. centred over the target object. The test was repeated three times by correlating

the filter with the 10, 20 and 30 degree training images. Similar results were observed with high COPI and PCE values.

4.2.2 Intermediate target object tolerance

In a practical implementation of the system the filter would be expected to correctly detect in-class targets even

if they were at an angle that did not exactly match any of the training set images. To test the distortion tolerance of the

filter it was correlated with an image of the car whose angle fell between the training set images. The filter was trained

on car images with viewing angles 0, 5, 10 and 30 degrees and correlated with a 20 degree input image. The results,

shown in Figure 4.4 (b), show that the MACH filter’s response was still large and sharp giving a PCE value of 22. There

was only a slight reduction in COPI and a slight broadening of the base of the peak. The filter still easily maintained its

discriminating ability. The test was repeated several times using different training and intermediate images, which

produced similar results.

4.2.3 In-plane rotation tolerance

To test the filter’s tolerance to in-plane rotation of the target object, the filter was correlated with a car image

that had been rotated, using bicubic interpolation, by an angle of 30 degrees. Any rotation of the image, other than at

multiples of 90 degrees, produces a slightly distorted image since there is not a one-to-one mapping between pixels. This

coupled with the interpolation distortion of the logmap means that in-plane rotation tolerance is a fairly demanding test

on any object recognition system. Several tests were performed using different sets of training images and correlating

with different rotated images. Figure 4.4 (c) shows a typical peak produced after training the filter on car images with

viewing angles 0, 5, 15 and 20 degrees and correlating with the 0 degree image rotated by an angle of 30 degrees of in-

plane rotation. It can be seen that the MACH filter has performed well and still produces a strong peak with a PCE value

of 24.9, which allows it to correctly classify targets as in-class objects despite their rotation. It can also be seen

that the peak has been shifted vertically downwards due to the rotation of the target in the image. The simulation

program was able to correctly calculate the rotation of the target given the offset from the centre using equation 4.3 - this

was accurate to within a few degrees.

To test how the filter responds to in-plane rotation in more detail, an image of the car was rotated by 360

degrees in 1 degree intervals, using bicubic interpolation, log-mapped and then correlated with the MACH filter at each

interval. The maximum correlation peak height was found for each interval and plotted against rotation angle (Fig. 4.2).

The MACH filter was constructed using car images with viewing angles of 0, 5, 10 and 15 degrees. The 15 degree

image was used as the rotated target. It can be seen that the correlation peak height varies slightly due to interpolation

errors in rotating the original image and creating the logmap. The interpolation cannot be perfect due to the pixelation of

the images. The peak height does, however, stay well above the detection threshold, calculated from equation 4.1,

showing that the filter is fully rotation invariant. This is partly due to the wrap-round nature of the Fourier transform.

The high peaks at regular 90 degree intervals are where there is an exact mapping of pixels from the original image to the

rotated image and so interpolation errors do not occur.

Fig. 4.2 Variation of correlation intensity peak height with in-plane rotation variation.

4.2.4 Scale tolerance

To test the filter's tolerance to scale changes in the target object, which is directly related to the proximity of the target object to the camera, the filter was correlated with car images that had been scaled by 120% using bicubic

interpolation. The scaling interpolation and pixilation of the logmap produces a distorted logmap image in the same way

as in the in-plane rotation test. However, with scale variation there is the added factor that any reduction in scale of the

target relative to the training images results in a loss of information, which adds to the difficulty of detection. Figure 4.4

(d) shows a typical peak produced when the filter was trained on car images with viewing angles 5, 10, 15 and 20

degrees and correlated with the 15 degree image scaled by 120%. The MACH filter maintained its discrimination ability

giving typical PCE values of around 22. The peak was been shifted sideways slightly due to the scale change and from

this the simulation program was able to calculate the scale ratio to within a few percent using equation 4.2.
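Equation 4.2 itself is not reproduced in this section, but the relation it encodes follows directly from the logarithmic radial sampling: if column u of the logmap samples radius r_min·base^u, a scale change by factor s shifts the peak horizontally by log_base(s) columns. A minimal sketch, assuming an illustrative 64-column logmap spanning 1 to 32 pixels:

```python
import numpy as np

# Radial axis of a hypothetical N-column logmap spanning r_min..r_max pixels.
# Column u samples radius r(u) = r_min * base**u, with base chosen so that
# column N-1 lands exactly on r_max.
N, r_min, r_max = 64, 1.0, 32.0
base = (r_max / r_min) ** (1.0 / (N - 1))

def shift_from_scale(s):
    """Columns the logmap shifts by when the target is scaled by factor s."""
    return np.log(s) / np.log(base)

def scale_from_shift(du):
    """Recover the scale ratio from a horizontal peak shift of du columns
    (our reading of the role played by equation 4.2)."""
    return base ** du

# A 120% scaling, as in the test above, shifts the peak sideways slightly:
du = shift_from_scale(1.2)
recovered = scale_from_shift(du)
```

For the 120% scaling used in this test the shift is roughly 3.3 columns, and inverting it recovers the scale ratio exactly in the absence of interpolation error; in practice the pixelation of the logmap limits the recovery to within a few percent, as reported.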

Once again, a test was carried out to examine the response to distortion in more detail by varying the distortion gradually and calculating the corresponding peak height at each interval. A car image was scaled from 0% to 300% in 1% intervals using bicubic interpolation, log-mapped, and correlated with the MACH filter at each interval. The maximum correlation intensity peak height was found for each scale value and plotted against scale (Fig. 4.3). The MACH filter was constructed using car images with viewing angles of 5, 10, 15 and 20 degrees; the 15 degree image was used as the scaled target. It can be seen from Fig. 4.3 that the correlation peak height is maximum at 100%, when the target scale matches the scale of the training images used to construct the filter. Either side of this the peak height decreases, as expected. The detection range of the filter is between 63% and 221.5% and is shown as dotted lines on the graph; this range would narrow as other distortions of the target object are included. The peak height drops off more sharply on the < 100% side of the graph, meaning the filter is less tolerant to reductions in scale. This was expected, since information is lost in the case of reduced targets but not in the case of expanded targets. The slight undulation in the graph is due to the pixelation of the logmap and could be reduced by increasing the resolution of the logmap at the expense of increased computational time.

Fig. 4.3 Variation of correlation intensity peak height with scale variation.

4.2.7 Background clutter tolerance

To test the filter's immunity to cluttered and noisy backgrounds, it was tested with a car image that had been superimposed onto a background image (Fig. 4.6 (a)). Figure 4.5 (a) shows a typical peak produced when the filter was trained on car images with viewing angles of 0, 10, 15 and 20 degrees and correlated with the 15 degree image superimposed on a background scene. Again, the filter performed well and produced peaks strong enough to correctly classify target objects, with PCE values around 23. There is an increase in out-of-class peaks compared to targets tested against plain backgrounds, but their amplitude is very low compared to the peak corresponding to the target object. This result was expected, since the MACH filter was designed to be immune to cluttered scenes.
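The PCE figures quoted in these tests can be reproduced in spirit with a simple metric. The definition below, peak intensity over mean plane intensity, is one common normalisation (cf. ref. 18); the paper's exact formula is not given in this section, so the absolute values are illustrative only.

```python
import numpy as np

def pce(corr_plane):
    """Peak-to-correlation-energy of a correlation output: maximum
    intensity divided by the mean intensity of the whole plane. Sharp,
    isolated peaks score high; diffuse responses score near 1."""
    intensity = np.abs(corr_plane) ** 2
    return float(intensity.max() / intensity.mean())

# A single isolated peak in a 32x32 plane versus a completely flat plane.
sharp = np.zeros((32, 32))
sharp[16, 16] = 1.0
flat = np.ones((32, 32))
```

Under this normalisation `pce(flat)` is 1 and `pce(sharp)` equals the number of pixels in the plane, so values in the low tens, as reported, indicate a peak standing well above a noisy but non-trivial background response.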

4.2.8 Worst case scenario

The final test combined all possible distortions to see how the filter would perform in a worst case scenario. The filter was trained on car images with viewing angles of 5, 10, 15 and 20 degrees and correlated with the 15 degree image scaled by 120%, rotated by 70 degrees and superimposed on a background scene (Fig. 4.6 (b)). Figure 4.5 (b) shows that the performance of the filter decreased compared to previous tests. However, considering the severity of the test, it still produces a fairly strong in-class peak that is much larger than the surrounding out-of-class peaks and still greater than the detection threshold.

Fig. 4.4 MACH filter correlated with: one of the training set images (a), an image intermediate to the training set (b), a training set image rotated by 30 degrees (c) and a training set image scaled by 120% (d). The transparent planes show the detection threshold.

Fig. 4.5 MACH filter correlated with: one of the training set images superimposed on a background scene (a), and one of the training set images scaled by 120%, rotated by 70 degrees and superimposed onto a background scene (b).

Fig. 4.6 Car image superimposed onto a background scene (a) and car image scaled by 120%, rotated by 70 degrees and superimposed onto a background scene (b).

5. CONCLUSIONS

By combining two image processing techniques it was possible to create an object recognition system capable of detecting a target object in a background scene despite any kind of geometrical distortion. Such a system is useful in practical applications where the position and orientation of the target object or camera is unknown. The implementation of a MACH filter gave high performance, providing tolerance to background clutter and noise while creating invariance to orientation of the target object. Correlation peaks were classified as in-class or out-of-class by computing the average expected peak intensity, obtained by correlating the filter with each of the reference images. This made object detection simple, since the height of a peak could be compared directly to a predefined threshold.
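The classification scheme summarised above can be sketched as follows. This is a minimal stand-in, not the method of equation 4.1: the averaged-spectrum "filter", the random reference maps, and the `fraction` parameter are all illustrative assumptions.

```python
import numpy as np

def detection_threshold(filt_spectrum, ref_maps, fraction=0.5):
    """Correlate the filter with each reference (training) logmap, average
    the resulting correlation peaks, and scale by 'fraction' to obtain the
    in-class/out-of-class detection threshold. The scaling factor is
    illustrative; the actual factor is defined by equation 4.1."""
    peaks = [np.abs(np.fft.ifft2(np.fft.fft2(m) * np.conj(filt_spectrum))).max()
             for m in ref_maps]
    return fraction * float(np.mean(peaks))

# Toy reference logmaps and a simple averaged-spectrum filter stand-in.
rng = np.random.default_rng(1)
refs = [rng.random((32, 32)) for _ in range(4)]
filt = np.mean([np.fft.fft2(m) for m in refs], axis=0)
threshold = detection_threshold(filt, refs)
```

Any candidate correlation peak can then be classified by a single comparison against `threshold`, which is what keeps the detection step simple at run time.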

Invariance to in-plane rotation and scale of the target was successfully achieved by log r-θ mapping the training set images and the input image prior to filter synthesis and correlation. However, the inclusion of these invariances greatly increased the computational time required to implement the system, since correlation with the filter had to be performed at every location of the input image. Implementing the system in software alone is therefore not a viable option for most practical applications. However, since the filter is based on Fourier techniques, it would be possible to implement the system in optical hardware by employing a Spatial Light Modulator (SLM) to transduce the input image at high speed and an optical correlator to filter the sampled images and produce the correlation output. It may also be possible to implement the system in real time using a specialised Digital Signal Processing (DSP) chip set and associated efficient code.

REFERENCES

1 D. Casasent and D. Psaltis, "Position, rotation, and scale invariant optical correlation", Applied Optics, Vol. 15, No. 7, 1795-1799 (1976).

2 K. Mersereau and G. Morris, "Scale, rotation, and shift invariant image recognition", Applied Optics, Vol. 25, No. 14, 2338-2342 (1986).

3 H. J. Caulfield and W. Maloney, "Improved discrimination in optical character recognition", Applied Optics, Vol. 8, No. 11, 2354-2356 (1969).

4 C. F. Hester and D. Casasent, "Multivariant technique for multiclass pattern recognition", Applied Optics, Vol. 19, 1758-1761 (1980).

5 Z. Bahri and B. V. K. Vijaya Kumar, "Generalized synthetic discriminant functions", Journal of the Optical Society of America A, Vol. 5, No. 4, 562-571 (1988).

6 A. Mahalanobis, B. V. K. Vijaya Kumar, S. Song, S. R. F. Sims and J. F. Epperson, "Unconstrained correlation filters", Applied Optics, Vol. 33, 3751-3759 (1994).

7 C. F. R. Weiman and G. Chaikin, "Logarithmic spiral grids for image processing and display", Computer Graphics and Image Processing, Vol. 11, 197-226 (1979).

8 G. Sandini and P. Dario, "Active vision based on space-variant sensing", in: 5th International Symposium on Robotics Research, 75-83 (1989).

9 E. L. Schwartz, D. Greve and G. Bonmassar, "Space-variant active vision: definition, overview and examples", Neural Networks, Vol. 8, No. 7/8, 1297-1308 (1995).

10 C. F. R. Weiman, "3-D sensing with polar exponential sensor arrays", in: Digital and Optical Shape Representation and Pattern Recognition, Proc. SPIE, Vol. 938, 78-87 (1988).

11 C. F. R. Weiman, "Exponential sensor array geometry and simulation", in: Digital and Optical Shape Representation and Pattern Recognition, Proc. SPIE, Vol. 938, 129-137 (1988).

12 C.-G. Ho, R. C. D. Young and C. R. Chatwin, "Sensor geometry and sampling methods for space-variant image processing", Pattern Analysis and Applications, Vol. 5, 369-384 (2002).

13 I. Kypraios, R. C. D. Young, P. Birch and C. Chatwin, "Object recognition within cluttered scenes employing a Hybrid Optical Neural Network (HONN) filter", Optical Engineering, Special Issue on Trends in Pattern Recognition, Vol. 43, No. 8, 1839-1850 (2004).

14 Ph. Refregier, "Optimal trade-off filters for noise robustness, sharpness of the correlation peak and Horner efficiency", Optics Letters, Vol. 16, No. 11, 829-831 (1991).

15 H. Zhou and T.-H. Chao, "MACH filter synthesising for detecting targets in cluttered environment for gray-scale optical correlator", Proc. SPIE, Vol. 715, 394-398 (1999).

16 B. V. K. Vijaya Kumar, "Minimum variance synthetic discriminant functions", Journal of the Optical Society of America A, Vol. 3, 1579-1584 (1986).

17 A. Mahalanobis, B. V. K. Vijaya Kumar and D. Casasent, "Minimum average correlation energy filters", Applied Optics, Vol. 26, No. 17, 3633-3640 (1987).

18 B. V. K. Vijaya Kumar and L. Hassebrook, "Performance measures for correlation filters", Applied Optics, Vol. 29, 2997-3006 (1990).