Position, rotation, scale and orientation invariant multiple object
recognition from cluttered scenes
Peter Bone, Rupert Young1, Chris Chatwin
Laser and Photonic Systems Research Group
Department of Engineering and Design
University of Sussex,
Brighton BN1 9QT
ABSTRACT
A method of detecting target objects in still images despite any kind of geometrical distortion is demonstrated.
Two existing techniques are combined, each one capable of creating invariance to various types of distortion of the target
object. A Maximum Average Correlation Height (MACH) filter is used to create invariance to orientation and gives
good tolerance to background clutter and noise. A log r-θ mapping is employed to give invariance to in-plane rotation
and scale by transforming rotation and scale variations of the target object into vertical and horizontal shifts. The MACH
filter is trained on the log r-θ map of the target for a range of orientations and applied sequentially over regions of
interest in the input image. Areas producing a strong correlation response can then be used to determine the position, in-
plane rotation and scale of the target objects in the scene.
Keywords: logmap, MACH filter, correlation filter, invariant pattern recognition
1. INTRODUCTION
The problem of recognizing objects despite distortions in position, orientation and scale 1-2, and within cluttered
backgrounds is a demanding pattern recognition problem. A system capable of detecting target objects despite any kind
of geometrical distortion has many practical applications since it can detect target objects when the orientation and
position of the target or camera is unknown. The ability to classify objects as in-class or out-of-class in cluttered
backgrounds is also crucial if the system is to be used in a practical application. The first main success at solving the
invariance problem came from the development of the Synthetic Discriminant Function (SDF) 3-5, which included the
expected distortions in the filter design to create invariance to such distortions. More recent attempts have been based on
the Maximum Average Correlation Height (MACH) filter 6, which can be tuned to give maximum performance and is far
more immune to background clutter.
To attempt to solve the full invariance problem, we combine two existing techniques, each capable of achieving
invariance to several of the possible variations of a target object. Out-of-plane rotation invariance is achieved using a
Maximum Average Correlation Height filter (MACH). Aside from creating out-of-plane rotation invariance, this filter is
capable of discriminating the target objects from cluttered or noisy backgrounds. Scale and in-plane rotation invariance
is created with the use of a log r-θ mapping (logmap) of a localised region of the image space. A change in scale or
rotation in the target object results in a horizontal or vertical shift in the logmap, which makes the object detectable by
correlation with the logmap of the reference image. Simulation results are presented that demonstrate the detection of
the target object in a scene under a number of conditions that test the filter’s invariance.
1 E-mail: [email protected]; Tel: +44 (0)1273 678908; Fax: +44 (0)1273 690814
2. IN-PLANE ROTATION AND SCALE INVARIANCE
A method was required to detect target objects in a scene despite their differences in scale or in-plane rotation to
the target reference images. To do this a log r-θ mapping 7-9, or logmap, was used. The logmap uses a variation on the
basic x-y grid sensor used in conventional image processing. The structure of the sensor is based on a Weiman polar
exponential grid 7,10,11 and consists of concentric exponentially spaced rings of pixels, which increase in size from the
centre to the edge. This produces an arrangement similar to that found in the mammalian retina where photoreceptive
cells are small and densely packed in the fovea and increase in size exponentially to create a blurred periphery. Each
sensor pixel on the circular region of the x-y Cartesian space is mapped into a rectangular region in Polar image space r-θ.
The sensor’s geometry maps concentric circles in the Cartesian space into vertical lines in the polar space and radial
lines in the Cartesian space into horizontal lines in the Polar space. It will be shown mathematically that this
transformation offers scale and rotation invariance about its centre, since rotation or scale changes simply produce
vertical or horizontal shifts in the polar space. Aside from invariance to in-plane rotation and scale, this mapping also
has the advantage of giving a wide visual field whilst maintaining high resolution at the centre where it is needed most.
This means that it uses a minimum amount of memory and computational resources.
The logmap sensor performs a complex logarithmic mapping of the image data from the circular retinal region
into a rectangular region. Fig. 2.1 shows the complex logarithmic mapping performed by the sensor geometry. The
vertical lines in the w-plane map to concentric circles in the z-plane and the horizontal lines in the w-plane map to radial
lines in the z-plane.
Fig. 2.1 Sensor geometry and logarithmic mapping
The complex logarithmic mapping can be written as:

w = log z    (2.1)

and, applying the complex mathematical form notation,

z = x + iy = r e^(iθ)    (2.2)

where r = √(x² + y²) and θ = arctan(y/x). Thus,

w = log(r e^(iθ)) = log r + iθ = u + iv    (2.3)

where u = log r and v = θ.
Hence, an image in the z-plane with co-ordinates x and y is mapped to the w-plane with co-ordinates u and v.
The mapped image from the Cartesian space (z-plane) in to the Polar space (w-plane) is referred to as the log-polar
mapping or the logmap. This process can be reversed to produce the inverse mapping from Polar space (w-plane) to
Cartesian space (z-plane).
The following invariances are created by the logmap and are useful for image processing applications, in
addition to the wide field of view, reduced pixel count and highly focused central area inherent in the space-variant
sensor geometry. Firstly, the log-polar mapping offers in-plane rotation invariance. If an image is
rotated by an angle α about the origin, then:
z = r e^(i(θ + α))

Thus,

w = log r + iθ + iα = u + iv + iα    (2.4)
In effect, rotating the image by the angle α has resulted in a vertical shift in the mapped image by the rotation angle.
Secondly, the log-polar mapping offers scale invariance. If the image in the z-plane is scaled by a factor β, then:
z = βr e^(iθ)

Thus,

w = log β + log r + iθ = log β + u + iv    (2.5)
In effect, scaling the image by the factor β has resulted in a horizontal shift in the mapped image by the scaling factor.
Thirdly, the mapping also offers projection invariance. If we move towards a fixed point in the z-plane, then the mapped
image simply shifts horizontally in the w-plane; the size and shape of the mapped image stay the same. This is
equivalent to progressively scaling the z-plane image, but with a change in the perspective ratio.
The fact that rotation and scale changes result in horizontal and vertical shifts in the logmap means that the
logmap creates invariance to these transformations since a linear correlator object recognition system is invariant to x-y
translation. It should be emphasised however, that the log-polar mapping is not shift invariant, so the properties
described above hold only if they are with respect to the origin of the Cartesian image space.
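The shift properties derived above can be checked numerically. The sketch below (in Python rather than the MATLAB used for the simulations later in the paper; the function name is our own) maps a point to log-polar coordinates and confirms that rotation by α produces a pure vertical (v) shift of α, while scaling by β produces a pure horizontal (u) shift of log β:

```python
import math

def logmap_point(x, y):
    """Map a Cartesian point (z-plane) to log-polar coordinates (w-plane):
    u = log r, v = theta (equations 2.1-2.3)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    return math.log(r), theta

x, y = 3.0, 4.0      # a sample point in the z-plane
alpha = 0.5          # in-plane rotation angle, radians
beta = 2.0           # scale factor

u0, v0 = logmap_point(x, y)

# Rotate the point by alpha about the origin.
xr = x * math.cos(alpha) - y * math.sin(alpha)
yr = x * math.sin(alpha) + y * math.cos(alpha)
ur, vr = logmap_point(xr, yr)

# Scale the point by beta.
us, vs = logmap_point(beta * x, beta * y)

print(ur - u0, vr - v0)  # rotation: u unchanged, v shifted by alpha
print(us - u0, vs - v0)  # scaling: v unchanged, u shifted by log(beta)
```

Note that the rotation check only holds cleanly while θ + α stays within (−π, π]; beyond that the v shift wraps around, which is exactly the wrap-round behaviour exploited by the Fourier-domain correlation later.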
2.2 Polar-exponential sensor geometry
The geometry of the sensor can be varied to produce the optimum use of image space 12. A suitable mapping
is one in which the angular size of each sensor cell is proportional to its radial size. This results in a sensor array where the sensor cells are
roughly square and increase exponentially in size with distance from the centre of the mapping.
The sensor geometry is characterised by its parameters (n, R, rmin, dr). The grain of the array n is defined
as the number of sensor cells in each ring of the array, and so is equal to the number of spokes in the
sensor array. Thus, n spokes cover a range of 2π radians, and so the angular resolution of each sensor cell from the
centre of the array is 2π/n. This is named the granularity of the array g, and determines the sensor’s angular resolution:

g = 2π/n    (2.6)
The gain of the array k is defined as the ratio of pixel sizes between any two successive rings of pixels in the sensor:

k = exp(2π/n) = exp(g)    (2.7)
The above equation also describes the resolution of the sensor, which decreases as we move from the centre to the
periphery: smaller gains give a higher overall resolution than larger gains. The position of each ring and spoke in the
sensor array can then be computed from the sensor’s granularity g:

r(u) = exp(u · g)    (2.8)

θ(v) = v · g    (2.9)

where u and v are the integer indices of the logmap, with umin ≤ u ≤ umax and 0 ≤ v ≤ (n − 1).
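As a concrete illustration, the geometry relations in equations 2.6 to 2.9 amount to only a few lines of code. This is a Python sketch with illustrative function names (the paper's own implementation was in MATLAB):

```python
import math

def granularity(n):
    """Equation 2.6: g = 2*pi / n."""
    return 2 * math.pi / n

def gain(n):
    """Equation 2.7: k = exp(2*pi / n) = exp(g)."""
    return math.exp(granularity(n))

def ring_radius(u, g):
    """Equation 2.8: radial position of ring index u."""
    return math.exp(u * g)

def spoke_angle(v, g):
    """Equation 2.9: angular position of spoke index v."""
    return v * g

n = 100
g = granularity(n)
# Successive rings differ in radius by the constant factor k = exp(g),
# and n spokes of angular width g cover the full 2*pi radians.
ratio = ring_radius(6, g) / ring_radius(5, g)  # equals gain(n)
full_turn = spoke_angle(n, g)                   # equals 2*pi
```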
The next parameter of the sensor is the field width R, defined as the distance from the centre to the edge of the
mapping. By setting R to different values and fixing the focus point of expansion at the centre of the image we affect the
data sampled by the sensor. If we set R to be half of the size of the image, then the circular visual field omits only the
image data at the corners. Increasing R to 1/√2 of the size of the image will include all the image data, but also
includes unwanted data around the image. The last parameter of the sensor to be defined is the blind spot of the
mapping, which affects the overall resolution of the mapping. If we vary the radius of the first ring rmin, then we can
omit the image data at the centre of the sensor. This blind spot can be filled with a high-resolution array of uniformly
spaced rectangular cells, in a similar way to the mammalian retina, with an r-θ mapping, or just left empty. The
circular area of the blind spot is described by an offset dr into the mapping:

size of the blind spot = rmin + dr    (2.10)
The number of sensor cells in the mapping and the range of pixels which they cover into the logmap will be determined
by the field width R, the radius of the first ring from the centre of the mapping minr , and the sensor’s grain n.
2.3 Logmap of an x-y raster image
An x-y raster image must be converted to a logmap image by interpolation. The intensity value of each sensor
cell is computed by averaging the intensity of all the x-y image pixels that fall within it. The most basic, and fastest, way
of doing this is a nearest-neighbour interpolation method where each image pixel contributes to only one sensor cell
depending on what sensor cell its centre is within. However, this method does not account for image pixels that fall
between sensor cell locations and therefore produces a very broken logmap, which is not fully scale and rotation
invariant. The logmap can be improved dramatically by sub-dividing the pixels that fall between sensor locations into
smaller sub-pixels and calculating the proportion of the image pixel that covers the logmap region by computing the
corresponding sensor cell for each sub-pixel. This produces a logmap image that is more rotation and scale invariant but
the computational time increases as the level of sub-division is increased. If the same sensor geometry is being used to
create logmaps of multiple images, then the time taken to create each logmap can be reduced significantly by pre-
computing a lookup table to store the image pixels that contribute to each sensor cell and the proportion they contribute.
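The interpolation and lookup-table scheme described above can be sketched as follows. This is a simplified nearest-neighbour version in Python (the paper's implementation used MATLAB and 4×4 sub-pixel division to refine the contribution fractions); the function names and the uniform test image are illustrative only:

```python
import math
from collections import defaultdict

def build_logmap_lut(size, n, R, r_min):
    """Precompute, for each logmap cell (u, v), the list of image pixels
    that fall within it. Nearest-neighbour version of the interpolation
    described above; sub-pixel division would refine the fractions."""
    g = 2 * math.pi / n
    cx = cy = size / 2.0
    lut = defaultdict(list)
    for py in range(size):
        for px in range(size):
            dx, dy = px - cx, py - cy
            r = math.hypot(dx, dy)
            if r < r_min or r > R:
                continue  # blind spot or outside the field width
            u = int(math.log(r) / g)                            # ring index (eq. 2.8)
            v = int((math.atan2(dy, dx) % (2 * math.pi)) / g) % n  # spoke index (eq. 2.9)
            lut[(u, v)].append((px, py))
    return lut

def apply_logmap(image, lut):
    """Sensor cell intensity = average of its contributing pixels."""
    return {cell: sum(image[y][x] for x, y in pixels) / len(pixels)
            for cell, pixels in lut.items()}

# Illustrative usage on a small uniform image (all intensities 1.0),
# whose logmap must itself be uniform:
size = 64
lut = build_logmap_lut(size, n=32, R=32, r_min=2)
logmap = apply_logmap([[1.0] * size for _ in range(size)], lut)
```

Because the lookup table depends only on the sensor geometry, it is built once and reused for every sub-image, which is the speed-up exploited in Section 4.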
Figure 2.2 shows an original image (a), the log-mapped image (b) and the inverse mapping of the logmap (c).
The parameters used in the logmap geometry were n=100, R=50, rmin=1, dr=0.
(a) (b) (c)
Fig. 2.2 Original image (a), logmap image (b) and inverse mapping (c)
3. MAXIMUM AVERAGE CORRELATION HEIGHT (MACH) FILTER
The MACH filter 6, like the SDF, is another method of creating invariance to distortions in the target object by
including the expected distortions in the construction of the filter. The MACH filter maximizes the relative height of the
average correlation peak with respect to the expected distortions. Unlike the SDF, the MACH filter can be tuned to
maximize the correlation peak height, peak sharpness and noise suppression while also being tolerant to distortions in the
target object that fall between the distortions given in the training set. However, the peak height of the MACH filter is
unconstrained, making it more difficult to interpret the results of the correlation 13.
The MACH filter is derived by maximizing a performance metric called the average correlation height (ACH).
However, several other performance measures have to be balanced to better suit different application scenarios. These
measures are the Average Correlation Energy (ACE), Average Similarity Measure (ASM) and Output Noise Variance
(ONV) 5,14. Thus, to form an optimal trade-off filter 6,14, the following energy function is formed and minimized with
respect to one criterion, holding the others constant:
E(h) = α(ONV) + β(ACE) + γ(ASM) − δ(ACH)
     = α h+Ch + β h+Dxh + γ h+Sxh − δ h+mx    (3.1)

where h+ denotes the conjugate transpose of h.
The resulting optimal trade-off (OT) MACH filter (in the frequency domain) is given as:
h = mx* / (αC + βDx + γSx)    (3.2)

where α, β and γ are non-negative OT parameters, mx is the average of the training image vectors x1, x2, ..., xN (in the
frequency domain), and C is the diagonal power spectral density matrix of additive input noise. It is usually set as the
white noise covariance matrix, C = σ²I. Dx is the diagonal average power spectral density of the training images:
Dx = (1/N) Σi=1..N Xi* Xi    (3.3)

where Xi is the diagonal matrix of the ith training image. Sx denotes the similarity matrix of the training images:
Sx = (1/N) Σi=1..N (Xi − Mx)*(Xi − Mx)    (3.4)

where Mx is the average of the Xi.
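A minimal sketch of the OT-MACH synthesis in equations 3.2 to 3.4, assuming NumPy and representing the diagonal matrices C, Dx and Sx by their diagonals (element-wise arrays). The random training images are stand-ins for the log-mapped reference set, and the function names are ours:

```python
import numpy as np

def otmach_filter(train_images, alpha=0.01, beta=0.3, gamma=0.1, sigma2=1.0):
    """OT-MACH synthesis in the frequency domain (equation 3.2):
    h = conj(m_x) / (alpha*C + beta*D_x + gamma*S_x).
    D_x is the average power spectrum (eq. 3.3), S_x the similarity
    term (eq. 3.4), and C = sigma^2 * I models white input noise."""
    X = np.stack([np.fft.fft2(im) for im in train_images])
    m = X.mean(axis=0)                       # m_x: average training spectrum
    D = (np.abs(X) ** 2).mean(axis=0)        # diagonal of D_x
    S = (np.abs(X - m) ** 2).mean(axis=0)    # diagonal of S_x
    C = sigma2 * np.ones_like(D)             # diagonal of C
    return np.conj(m) / (alpha * C + beta * D + gamma * S)

def correlate(filt, image):
    """Correlation plane via a frequency-domain product."""
    return np.fft.ifft2(np.fft.fft2(image) * filt)

# Illustrative usage with random stand-ins for the training set:
rng = np.random.default_rng(0)
train = [rng.random((32, 32)) for _ in range(4)]
h = otmach_filter(train)
plane = np.abs(correlate(h, train[0]))  # strong response at zero shift
```

Since h already contains conj(m_x), multiplying the input spectrum by h and inverse-transforming yields the cross-correlation with the (whitened) average training image, peaking at zero shift for an in-class input.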
The different values of α, β and γ control the MACH filter’s behaviour to match different application
requirements 15. If β=γ=0, the resulting filter behaves much like a MVSDF filter 16 with relatively good noise tolerance
but broad peaks. If α=γ=0 then the filter behaves more like a MACE filter 17, which generally exhibits sharp peaks and
good clutter suppression but is very sensitive to distortion of the target object. If α=β=0, the filter gives high tolerance
for distortion but is less discriminating.
4. FULLY INVARIANT FILTER
By combining the in-plane rotation invariance of the logmap and the distortion invariance of the MACH it was
possible to create a filter that is invariant to any kind of geometrical distortion of the target object, while maintaining
high performance even in cluttered scenes. Such a filter was constructed and tested for the problem of detecting a
particular car (a model of a Jaguar S-Type) in a scene (Fig 4.1). The system was expected to correctly detect the car
despite variations in its out-of-plane orientation, in-plane rotation, scale, position and with noisy or cluttered
backgrounds. To create invariance to changes in out-of-plane orientation of the car, the MACH filter was created using a
training image set consisting of the expected range of rotation taken at small intervals of viewing angle.
Fig. 4.1 Various distortions of the target object (a model of a Jaguar S-Type)
The problem of combining the logmap with the MACH was solved simply by creating a logmap of each
reference image before synthesis of the filter. The input image then also needed to be log-mapped before being
correlated with the filter. However, the logmap gains in-plane rotation and scale invariance at the expense of position
invariance. This means that to search the entire input image for the target object requires the correlation process to be
repeated for each location in the image. To accomplish this, a moving window system was created to raster scan the
input image and perform the MACH logmap correlation on the sub-image contained in the window. The size of the
window needed to be as small as possible to reduce computational time while being large enough to fit any expected
distortion of the target object.
To classify each sub-image as containing an in-class or out-of-class object it was necessary to compare the
correlation peak height of each sub-image with some threshold - above which the sub-image would be classified as
containing an in-class target. The threshold was calculated by correlating each of the reference images with the MACH
filter and taking an average of their resultant peak intensities (the intensity at the centre of the correlation plane). This
gave a value close to what would be expected after correlation between the filter and an in-class target. The
threshold was then set slightly lower than this value by multiplying by 0.75 so that any in-class objects would produce
peaks slightly above the threshold, allowing for some reduction in amplitude, while still being high enough for all out-of-
class objects to fall below the threshold. The threshold equation was thus calculated as:
Threshold = 0.75 · (1/N) Σi=1..N max x,y {|Ci(x, y)|}    (4.1)

where Ci(x, y) is the correlation plane amplitude of the ith training image.
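The threshold computation of equation 4.1 can be sketched as below (Python/NumPy; the function name and the matched-filter stand-in are illustrative, assuming correlation is done by a frequency-domain product as in the rest of the system):

```python
import numpy as np

def detection_threshold(filt, reference_logmaps, factor=0.75):
    """Equation 4.1: average the peak correlation amplitude of the filter
    against each reference image, then scale by 0.75 to allow for some
    amplitude reduction of in-class targets."""
    peaks = [np.abs(np.fft.ifft2(np.fft.fft2(im) * filt)).max()
             for im in reference_logmaps]
    return factor * float(np.mean(peaks))

# Illustrative usage with random stand-ins for the reference logmaps
# and a conjugate-spectrum (matched-filter) stand-in for the MACH filter:
rng = np.random.default_rng(1)
refs = [rng.random((16, 16)) for _ in range(4)]
filt = np.conj(np.fft.fft2(refs[0]))
threshold = detection_threshold(filt, refs)
```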
Once a target object had been detected it is then possible to calculate the in-plane rotation and scale of the target
in the image. This is possible since rotation and scale variations in the target object produce horizontal and vertical shifts
in the logmap. The position of the maximum correlation peak will therefore also be shifted from its central position when
target objects are rotated or scaled relative to the set of reference images used to build the filter. By measuring the
horizontal and vertical offset of the peak from the centre of the correlation plane it is possible to calculate the scale ratio
and rotation relative to the reference images. From equations 2.8 and 2.9 it can be seen that the rotation and scale can be
calculated as:
ScaleRatio = exp(uoffset · g)    (4.2)

Rotation = voffset · g    (4.3)

where uoffset and voffset are the horizontal and vertical offsets of the maximum correlation peak from the centre of
the correlation plane.
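Equations 4.2 and 4.3 amount to a two-line computation. A sketch with illustrative names, using the granularity g = 2π/n from Section 2.2:

```python
import math

def scale_and_rotation(u_offset, v_offset, n):
    """Equations 4.2-4.3: recover the scale ratio and in-plane rotation
    (radians) from the peak's offset, given the granularity g = 2*pi/n."""
    g = 2 * math.pi / n
    return math.exp(u_offset * g), v_offset * g

# A peak at the centre of the correlation plane (zero offset) means the
# target matches the reference scale and rotation exactly:
scale, rotation = scale_and_rotation(0, 0, n=100)  # -> 1.0, 0.0
```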
As well as calculating the scale and in-plane rotation of detected targets, it is also possible to calculate the out-
of-plane rotation (orientation) as a post processing operation. This is done by correlating the log-mapped sub-image with
each of the original log-mapped reference images and seeing which one gives the strongest response.
The logmap parameters used for the filter were n=100, R=50, rmin=5 and dr=0. The value of the grain n was
chosen based on the resolution of the reference images so that they would not be over or under sampled. The value of R
was chosen based on the maximum expected dimensions of the target object, since the aim is to make R as small as
possible while being large enough to cover any target object. The width of the moving window used to sample the input
image was set to 80, as this meant that it was fully sampled by the logmap while being as small as possible. rmin was set
to 5 to create a blind spot at the centre of the image. The blind spot was necessary for two reasons. Firstly, the area at
the centre of the sensor is over sampled and so a single pixel in the original image is mapped to many pixels in the
logmap image making the logmap image unnecessarily large. Secondly, the over sampling creates large areas of flat
colour on the left hand side of the logmap (this can be seen in Fig. 2.2 (b) where a blind spot was not used). Edges are
created in-between these areas of colour when the MACH filter is applied, which inhibit the in-plane rotation invariance
of the filter. The area inside the blind spot was left empty but this loss of data did not significantly affect the
performance of the filter since it represents a relatively small proportion of the expected target size.
The normal method of creating the logmap from each sub-image is very slow compared to the other stages of
Fourier transforming the logmap, multiplying it by the filter and Fourier transforming again to create the correlation
plane. This was especially true due to the fact that a subdivision level of 16 was chosen in the interpolation algorithm in
order to create a very smooth logmap (i.e. each image pixel was divided into 4 by 4 smaller pixels to take into account
pixels that fall between logmap regions). However, the logmap geometry used on each sub-image is the same and it is
therefore possible to remove the redundancy of performing the interpolation geometry calculations for each sub-image
by employing a look-up table. The look-up table is an array, each element of which corresponds to a particular logmap
region. Each element contains a list of image pixels that contribute to that particular logmap region and the fraction they
contribute. The intensity of each logmap pixel can then be computed simply by summing the intensity of its contributing
parts without having to calculate what the contributing parts are. Using this system makes the creation of a logmap for
each sub-image much faster. The system was able to process approximately 10 sub-images per second, including
logmap creation, two Fourier transforms, correlation peak location and classification, running on a 3GHz Pentium using
simulation code written and executed in MATLAB.
The parameters used in the MACH filter function were set to α=0.01, β=0.3 and γ=0.1. These values were set
after a long period of testing with different values to correctly balance the discriminating ability against the sharpness of
the correlation peak and the general performance of the filter.
4.1 Performance Criteria
To quantify the performance of the filter accurately it was necessary to calculate some basic measures for quality of the
correlation plane for detected targets.
The most basic measure of the target correlation peak is the correlation output peak intensity (COPI). This refers to the
maximum intensity value of the correlation output plane, i.e. the peak intensity. COPI is defined as 18:

COPI = max x,y {|C(x, y)|²}    (4.4)

where C(x, y) is the output correlation amplitude value at position (x, y). A filter that produces a high COPI value has
good performance and better detectability.
To provide the best detectability and performance it is necessary for a filter to yield a sharp correlation peak as well as a
high COPI value, while keeping side peaks to a minimum. The filter’s ability to do this can be measured using the peak-
to-correlation energy (PCE) measure. The basis of the PCE measure is that the correlation peak intensity should be as
high as possible while the overall correlation energy in the plane should be as low as possible. The PCE ratio is defined
as 18:
PCE = [ COPI − m ] / [ Σx,y ( |C(x, y)|² − m )² / (NxNy − 1) ]^(1/2)    (4.5)

where m = (1/NxNy) Σx,y |C(x, y)|² is the mean value of the correlation output plane intensity.
A high PCE value therefore implies that the filter performs well.
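Equations 4.4 and 4.5 can be computed directly from a correlation plane. A minimal NumPy sketch (function name ours), applied to a synthetic single-peak plane:

```python
import numpy as np

def copi_and_pce(plane):
    """Equations 4.4-4.5: COPI is the maximum correlation intensity; PCE
    is the peak's height above the mean intensity divided by the sample
    standard deviation of the intensity over the plane."""
    intensity = np.abs(plane) ** 2
    copi = intensity.max()
    mean = intensity.mean()
    std = intensity.std(ddof=1)  # (Nx*Ny - 1) denominator, as in eq. 4.5
    return float(copi), float((copi - mean) / std)

# A synthetic plane with a single unit peak on a zero background:
plane = np.zeros((8, 8))
plane[3, 3] = 1.0
copi, pce = copi_and_pce(plane)  # copi = 1.0
```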
4.2 Results
4.2.1 Correlation with training image
The first test to be carried out was a basic correlation using one of the training set images as the input image. It
was expected that the filter would perform well in this test since the input image was directly associated with the
construction of the filter. Figure 4.4 (a) shows the correlation plane output of the filter - trained on car images with
viewing angles 0, 10, 20 and 30 degrees and correlated with the 0 degree image. The filter performed well, producing a
tall sharp peak with no out-of-class peaks giving a PCE value of 25.9. It should be noted that the position of the peak in
the correlation plane does not represent the position of the target object in the input image since correlation is performed
in logmap space – the position of the peak therefore represents the rotation angle and log of scale ratio of the detected
target object. The correlation plane shown is where the strongest peak is found during scanning of the image using the
moving window method described above, i.e. centred over the target object. The test was repeated three times by correlating
the filter with the 10, 20 and 30 degree training images. Similar results were observed with high COPI and PCE values.
4.2.2 Intermediate target object tolerance
In a practical implementation of the system the filter would be expected to correctly detect in-class targets even
if they were at an angle that did not exactly match any of the training set images. To test the distortion tolerance of the
filter it was correlated with an image of the car whose angle fell between the training set images. The filter was trained
on car images with viewing angles 0, 5, 10 and 30 degrees and correlated with a 20 degree input image. The results,
shown in Figure 4.4 (b), show that the MACH filter’s response was still large and sharp giving a PCE value of 22. There
was only a slight reduction in COPI and a slight broadening of the base of the peak. The filter still easily maintained its
discriminating ability. The test was repeated several times using different training and intermediate images, which
produced similar results.
4.2.3 In-plane rotation tolerance
To test the filter’s tolerance to in-plane rotation of the target object, the filter was correlated with a car image
that had been rotated, using bicubic interpolation, by an angle of 30 degrees. Any rotation of the image, other than at
multiples of 90 degrees, produces a slightly distorted image since there is not a one-to-one mapping between pixels. This,
coupled with the interpolation distortion of the logmap, means that in-plane rotation tolerance is a fairly demanding test
on any object recognition system. Several tests were performed using different sets of training images and correlating
with different rotated images. Figure 4.4 (c) shows a typical peak produced after training the filter on car images with
viewing angles 0, 5, 15 and 20 degrees and correlating with the 0 degree image rotated in-plane by an angle of 30
degrees. It can be seen that the MACH filter has performed well and still produces a strong peak with a PCE value
of 24.9, which allows it to correctly classify targets as in-class objects despite their rotation. It can also be seen
that the peak has been shifted vertically downwards due to the rotation of the target in the image. The simulation
program was able to correctly calculate the rotation of the target given the offset from the centre using equation 4.3 - this
was accurate to within a few degrees.
To test how the filter responds to in-plane rotation in more detail, an image of the car was rotated by 360
degrees in 1 degree intervals, using bicubic interpolation, log-mapped and then correlated with the MACH filter at each
interval. The maximum correlation peak height was found for each interval and plotted against rotation angle (Fig. 4.2).
The MACH filter was constructed using car images with viewing angles of 0, 5, 10 and 15 degrees. The 15 degree
image was used as the rotated target. It can be seen that the correlation peak height varies slightly due to interpolation
errors in rotating the original image and creating the logmap. The interpolation cannot be perfect due to the pixelation of
the images. The peak height does, however, stay well above the detection threshold, calculated from equation 4.1,
showing that the filter is fully rotation invariant. This is partly due to the wrap-round nature of the Fourier transform.
The high peaks at regular 90 degree intervals are where there is an exact mapping of pixels from the original image to the
rotated image and so interpolation errors do not occur.
Fig. 4.2 Variation of correlation intensity peak height with in-plane rotation variation.
4.2.4 Scale tolerance
To test the filter’s tolerance to scale changes in the target object, which is directly related to the proximity of the
target object to the camera, the filter was correlated with car images that had been scaled by 120% using bicubic
interpolation. The scaling interpolation and pixelation of the logmap produce a distorted logmap image in the same way
as in the in-plane rotation test. However, with scale variation there is the added factor that any reduction in scale of the
target relative to the training images results in a loss of information, which adds to the difficulty of detection. Figure 4.4
(d) shows a typical peak produced when the filter was trained on car images with viewing angles 5, 10, 15 and 20
degrees and correlated with the 15 degree image scaled by 120%. The MACH filter maintained its discrimination ability
giving typical PCE values of around 22. The peak was shifted sideways slightly due to the scale change and from
this the simulation program was able to calculate the scale ratio to within a few percent using equation 4.2.
Once again, a test was carried out to examine the response to distortion in more detail by varying the distortion
gradually and calculating the corresponding peak height at each interval. A car image was scaled from 0% to 300% in
1% intervals, using bicubic interpolation, log-mapped and then correlated with the MACH filter at each interval. The
maximum correlation intensity peak height was found for each scale value and plotted against scale factor (Fig. 4.3).
The MACH filter was constructed using car images with viewing angles of 5, 10, 15 and 20 degrees. The 15 degree
image was used as the scaled target. It can be seen from Fig. 4.3 that the correlation peak height is maximum at 100%
when the target scale matches the scale of the training images used to construct the filter. Either side of this the peak
height decreases as expected. The detection range of the filter is between 63% and 221.5% and is shown as dotted lines
on the graph. However, this range would decrease as other distortions of the target object are included. It can be seen
that the peak height drops off more sharply on the < 100% side of the graph, which therefore means that the filter is less
tolerant to reduction in scale. This was expected due to the fact that information is lost in the case of reduced targets but
not in the case of expanded targets. The slight undulation in the graph is due to the pixelation of the logmap and could be
reduced by increasing the resolution of the logmap at the expense of an increase in computational time.
Fig. 4.3 Variation of correlation intensity peak height with scale variation.
4.2.5 Background clutter tolerance
The filter was tested with a car image that had been superimposed onto a background image (Fig. 4.6 (a)) to test
the filter’s immunity to cluttered and noisy backgrounds. Figure 4.5 (a) shows a typical peak produced when the filter
was trained on car images with viewing angles 0, 10, 15 and 20 degrees and correlated with the 15 degree image
superimposed on a background scene. Again, the filter performed well and produced strong enough peaks to correctly
classify target objects with PCE values around 23. There is an increase in out-of-class peaks compared to targets tested
in plain backgrounds but their amplitude is very low compared to the peak corresponding to the target object. This result
was expected since the MACH filter was designed to be immune to cluttered scenes.
4.2.6 Worst case scenario
The final test combined all possible distortions to see how the filter would perform in a worst case scenario.
The filter was trained on car images with viewing angles 5, 10, 15 and 20 degrees and correlated with the 15 degree
image scaled by 120%, rotated by 70 degrees and superimposed on a background scene (Fig. 4.6 (b)). Figure 4.5 (b)
shows that the performance of the filter has decreased compared to previous tests. However, considering it is such a
severe test, it still produces a fairly strong in-class peak that is much larger than the surrounding out-of-class peaks and
still greater than the detection threshold.
Fig. 4.4 MACH filter correlated with: one of the training set images (a), an image intermediate to the training set (b), a training set
image rotated by 30 degrees (c) and a training set image scaled by 120% (d). The transparent planes show the detection threshold.
Fig. 4.5 MACH filter correlated with: one of the training set images superimposed on a background scene (a), and one of the training
set images scaled by 120%, rotated by 70 degrees and superimposed onto a background scene (b).
Fig. 4.6 Car image superimposed onto a background scene (a) and car image scaled by 120%, rotated by 70 degrees and
superimposed onto a background scene (b).
5. CONCLUSIONS
By combining two image processing techniques it was possible to create an object recognition system capable
of detecting a target object in a background scene despite any kind of geometrical distortion. Such a system is
useful in practical applications where the position and orientation of the target object or camera is unknown. The
MACH filter gave high performance, providing tolerance to background clutter and noise while being invariant to the
orientation of the target object. Correlation peaks were classified as in-class or out-of-class by comparison with the
average expected peak intensity, obtained by correlating the filter with each of the reference images. This made object
detection simple, since the height of a peak could be compared directly to a predefined threshold.
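A minimal frequency-domain sketch of the filter synthesis and correlation steps summarised above is given below. It follows the standard unconstrained-correlation-filter formulation of Mahalanobis et al. (reference 6); the trade-off parameters α, β, γ are illustrative assumptions, as the paper does not state its values:

```python
import numpy as np

def mach_filter(training_images, alpha=0.01, beta=0.1, gamma=0.1):
    """MACH filter from equally sized training images, built in the
    frequency domain as  H = conj(M) / (alpha*C + beta*D + gamma*S),
    where M is the mean training spectrum, C models white noise, D is
    the average power spectral density and S the average similarity
    (spread) spectrum.  Parameter values are assumptions for this
    sketch, not the paper's."""
    X = np.stack([np.fft.fft2(img) for img in training_images])
    M = X.mean(axis=0)                        # mean training spectrum
    D = np.mean(np.abs(X) ** 2, axis=0)       # average power spectrum
    S = np.mean(np.abs(X - M) ** 2, axis=0)   # average similarity term
    C = 1.0                                   # white-noise covariance
    return np.conj(M) / (alpha * C + beta * D + gamma * S)

def correlate(filter_H, image):
    """Correlate an input image with the filter via the Fourier domain;
    the in-class/out-of-class decision then compares the peak of this
    output plane against the predefined threshold."""
    return np.fft.ifft2(np.fft.fft2(image) * filter_H)
```

In the system described here, the training images and the input would both be log r-θ maps rather than raw images, so that rotation and scale appear only as shifts of the correlation peak.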
Invariance to in-plane rotation and scale of the target was successfully achieved by log r-θ mapping the training
set images and the input image prior to synthesis of the filter and correlation. However, including these invariances
greatly increased the computational time required to implement the system, since correlation with the filter had to be
performed at every location of the input image. This means that a software-only implementation is not viable for most
practical applications. However, since the filter is based on Fourier techniques, it would be possible to implement the
system in optical hardware by employing a Spatial Light Modulator (SLM) to transduce the input image at high speed and
an optical correlator to filter the sampled images and produce the correlation output. It may also be possible to
implement the system in real time using a specialized Digital Signal Processing (DSP) chip set and associated efficient
code.
REFERENCES
1 D. Casasent and D. Psaltis, “Position, rotation, and scale invariant optical correlation”, Applied Optics, Vol. 15,
No. 7, 1795-1799 (1976).
2 K. Mersereau and G. Morris, “Scale, rotation, and shift invariant image recognition”, Applied Optics, Vol. 25,
No. 14, 2338-2342 (1986).
3 H. J. Caulfield and W. Maloney, “Improved discrimination in optical character recognition”, Applied Optics,
Vol. 8, No. 11, 2354-2356 (1969).
4 C. F. Hester and D. Casasent, “Multivariant technique for multiclass pattern recognition”, Applied Optics,
Vol. 19, 1758-1761 (1980).
5 Z. Bahri and B. V. K. Kumar, “Generalized synthetic discriminant functions”, Journal of the Optical Society of
America A, Vol. 5, No. 4, 562-571 (1988).
6 A. Mahalanobis, B. V. K. Vijaya Kumar, S. Song, S. R. F. Sims and J. F. Epperson, “Unconstrained correlation
filters”, Applied Optics, Vol. 33, 3751-3759 (1994).
7 C. F. R. Weiman and G. Chaikin, “Logarithmic spiral grids for image processing and display”, Computer
Graphics and Image Processing, Vol. 11, 197-226 (1979).
8 G. Sandini and P. Dario, “Active vision based on space-variant sensing”, in: 5th International Symposium on
Robotics Research, 75-83 (1989).
9 E. L. Schwartz, D. Greve and G. Bonmassar, “Space-variant active vision: definition, overview and
examples”, Neural Networks, Vol. 8, No. 7/8, 1297-1308 (1995).
10 C. F. R. Weiman, “3-D sensing with polar exponential sensor arrays”, in: Digital and Optical Shape
Representation and Pattern Recognition, Proc. SPIE Conf. on Pattern Recognition and Signal Processing, Vol.
938, 78-87 (1988).
11 C. F. R. Weiman, “Exponential sensor array geometry and simulation”, in: Digital and Optical Shape
Representation and Pattern Recognition, Proc. SPIE Conf. on Pattern Recognition and Signal Processing, Vol.
938, 129-137 (1988).
12 C-G. Ho, R. C. D. Young and C. R. Chatwin, “Sensor geometry and sampling methods for space-variant
image processing”, Pattern Analysis and Applications, Vol. 5, 369-384 (2002).
13 I. Kypraios, R. C. D. Young, P. Birch and C. Chatwin, “Object recognition within cluttered scenes employing a
Hybrid Optical Neural Network (HONN) filter”, Optical Engineering, Special Issue on Trends in Pattern
Recognition, Vol. 43, No. 8, 1839-1850 (2004).
14 Ph. Refregier, “Optimal trade-off filters for noise robustness, sharpness of the correlation peak and Horner
efficiency”, Optics Letters, Vol. 16, No. 11, 829-831 (1991).
15 H. Zhou and T.-H. Chao, “MACH filter synthesising for detecting targets in cluttered environment
for gray-scale optical correlator”, Proc. SPIE, Vol. 3715, 394-398 (1999).
16 B. V. K. Kumar, “Minimum variance synthetic discriminant functions”, Journal of the Optical Society of
America A, Vol. 3, 1579-1584 (1986).
17 A. Mahalanobis, B. V. K. Vijaya Kumar and D. Casasent, “Minimum average correlation energy filters”, Applied
Optics, Vol. 26, No. 17, 3633-3640 (1987).
18 B. V. K. Kumar and L. Hassebrook, “Performance measures for correlation filters”, Applied Optics, Vol. 29,
2997-3006 (1990).