1996, kovesi, invariant measures of image features from phase information
TRANSCRIPT
INVARIANT MEASURES OF IMAGE FEATURES FROM
PHASE INFORMATION
This thesis is
presented to the
Department of Psychology
for the degree of
Doctor of Philosophy
of the
University of Western Australia
By
Peter Kovesi
May 1996
© Copyright 1996
by
Peter Kovesi
Abstract
Invariant Measures of Image Features From Phase Information
If reliable and general computer vision techniques are to be developed it is crucial
that we find ways of characterizing low-level image features with invariant quanti-
ties. For example, if edge significance could be measured in a way that was invariant
to image illumination and contrast, higher-level image processing operations could
be conducted with much greater confidence. However, despite their importance,
little attention has been paid to the need for invariant quantities in low-level vision
for tasks such as feature detection or feature matching.
This thesis develops a number of invariant low-level image measures for feature
detection, local symmetry/asymmetry detection, and for signal matching. These
invariant quantities are developed from representations of the image in the frequency
domain. In particular, phase data is used as the fundamental building block for
constructing these measures. Phase congruency is developed as an illumination
and contrast invariant measure of feature significance. This allows edges, lines and
other features to be detected reliably, and fixed thresholds can be applied over wide
classes of images. Points of local symmetry and asymmetry in images give rise
to special arrangements of phase, and these too can be characterized by invariant
measures. Finally, a new approach to signal matching that uses correlation of
local phase and amplitude information is developed. This approach allows reliable
phase-based disparity measurements to be made, overcoming many of the difficulties
associated with scale-space singularities.
Acknowledgements
First of all I would like to thank my supervisors John Ross and James Trevelyan.
With their gentle guidance and encouragement, the odd searching question, and
the occasional nudge, they ensured that progress was always maintained. In each of
them I have also greatly valued their enormous breadth of knowledge that spanned
many disciplines. This helped me keep my thoughts open and wide ranging as I
searched for answers to my problems.
I must also thank my other supervisor, my wife Robyn Owens, for an uncount-
able number of technical discussions, for her proof-reading skills, and for always
being there and making the generation of this thesis far less traumatic than I would
have dared to hope for. I thank Grace, Genevieve, and later in the generation of this
thesis, Gabriel for their tolerance and patience while Daddy did his Pee-Aiche-Dee.
I would also like to acknowledge the many hours of useful discussions I have had
with Ben Robbins, Chris Pudney, Mike Robins, and Adrian Baddeley. Ben Robbins
pointed out the efficiencies that can be made in the Fourier convolution of an image
with a quadrature pair of filters. This must have saved me many hours of waiting
and allowed me to do many more experiments than I would have done otherwise.
Others I must thank include the following: Daniel Reisfeld who introduced me
to the problem of finding local symmetry in images, resulting in many long and
impassioned discussions on the subject; Concetta Morrone for her amazing grasp
of both the psychophysics and computer vision literature, and therefore, always
being able to suggest yet another paper I should read; Carlo Tomasi for his help
in converting an early version of my phase congruency code from C to a MATLAB
script; Olivier Faugeras and his colleagues for their hospitality and the fine working
environment they have developed at INRIA in Sophia Antipolis which I was able
to enjoy during my visit in the first half of 1995.
Finally I thank everyone in The Robotics and Vision Research Group in the
Department of Computer Science at The University of Western Australia for the
enjoyable working environment that they contribute to.
Contents

Abstract
Acknowledgements
1 Introduction
  1.1 The Need for Invariant Quantities in Images
  1.2 The Approach
  1.3 Contributions
  1.4 Thesis Overview
2 Image features
  2.1 Introduction
  2.2 Gradient based feature detection
  2.3 Local energy and phase congruency
    2.3.1 Defining phase congruency
    2.3.2 Local energy
  2.4 Issues in calculating phase congruency
  2.5 Summary
3 Phase congruency from wavelets
  3.1 Introduction
  3.2 Using Wavelets for Local Frequency Analysis
  3.3 Calculating Phase Congruency Via Wavelets
  3.4 Noise
  3.5 Extension to two dimensions
    3.5.1 2D filter design
    3.5.2 Filter orientations
    3.5.3 Noise compensation in two dimensions
    3.5.4 Combining data over several orientations
  3.6 The importance of frequency spread
  3.7 Scale via high-pass filtering
    3.7.1 Difficulties with low-pass filtering
    3.7.2 High-pass filtering
    3.7.3 High-pass filtering and scale-space
  3.8 Experimental Results
  3.9 Summary
4 A second look at phase congruency
  4.1 Introduction
  4.2 Log Gabor wavelets
  4.3 Phase congruency from broad bandwidth filters
  4.4 Another way of defining phase congruency
    4.4.1 Calculation of PC2 via quadrature pairs of filters
  4.5 A third measure of phase congruency
    4.5.1 Calculation of PC3 via quadrature pairs of filters
  4.6 Biological computation of phase congruency
  4.7 Symmetry and Asymmetry: Special patterns of phase
    4.7.1 Introduction
    4.7.2 A frequency approach to symmetry
    4.7.3 Biological computation of symmetry and asymmetry
  4.8 Summary
5 Representation and matching of signals
  5.1 Introduction
  5.2 Spatial Correlation
  5.3 Phase Based Disparity Measurement
  5.4 Matching Using Localized Frequency Data
  5.5 Using Phase to Guide Matching
  5.6 Determining Relative Signal Distortion
  5.7 Conclusion
6 Conclusion
  6.1 Contributions
  6.2 Future Work
Bibliography
A Portfolio of experimental results
  A.1 Introduction
  A.2 Portfolio
    A.2.1 Image acknowledgements
  A.3 Parameter variations
B Noise models and noise compensation
  B.1 Introduction
  B.2 Noise generators
  B.3 Noise spectra measured from images
  B.4 Discussion
C Non-maximal suppression
  C.1 Introduction
  C.2 Non-maximal suppression using feature orientation information
  C.3 Orientation from the feature image
  C.4 Morphological approaches
  C.5 Conclusion
D Implementation details
  D.1 MATLAB Implementation
Chapter 1
Introduction
1.1 The Need for Invariant Quantities in Images
This thesis is concerned with the search for measures of image features that remain
constant over wide ranges of viewing conditions. Such invariant quantities provide
powerful tools for the analysis of images, allowing image processing algorithms to
work more reliably and over wider classes of images. The work presented in this
thesis concentrates on invariant quantities in low-level or early vision.
Some effort has been devoted to investigating invariant measures of higher level
structures in images, for example, Hu [37] developed a series of invariant moments
for recognizing binary objects. More recently there has been considerable interest
in geometric invariance, the study of geometric properties of objects that remain
invariant to imaging transformations. A collection of papers in this area can be
found in the book by Mundy and Zisserman [63]. However, little attention has
been paid to the invariant quantities that might exist in low-level or early vision
for tasks such as feature detection or feature matching. Some limited exceptions
to this include the work of Koenderink and van Doorn [44, 45] who recognized
the importance of differential invariants associated with motion fields, and Florack
et al. [28] who propose differential invariants for characterizing a number of image
contour properties. However, in general, interest in low-level image invariants has
been limited. This is surprising considering the fundamental importance of being
able to obtain reliable results from low level image operations in order to successfully
perform any higher level operations.
There are two main points about an invariant measure: firstly, of course, it must
be dimensionless (that is, have no units attached to it), and secondly it should rep-
resent some meaningful and useful quality. If it does not represent some meaningful
quality one has no idea how to use it. It is easy to construct a dimensionless quan-
tity that is meaningless, for example, the ratio of my height to the width of the
letter o. It is also easy to find measures that are useful but not dimensionless, for
example, the speed of your car. However, it is often hard to define something that
is both dimensionless and useful.
Why do we want to find invariant quantities? Quantities that are useful but
not dimensionless are generally only useful because they are applied in relatively
structured environments and at a specific scale. For example, using the speed of
your car to decide whether you are driving safely only works because most cars are
similar in size, roadways are standardized and gravitational forces are effectively
constant. Images, on the other hand, provide a very dynamic and unstructured
environment in which we struggle to make our algorithms operate. Objects can
appear with arbitrary orientation and spatial magnification along with arbitrary
brightness and contrast. Thus the search for invariant quantities is very important
for computer vision.
It is all too easy to forget that a number inside a computer often has units
associated with it. The fact that a number has units associated with it imposes
some constraints on how it should be used. For example, it does not make sense to
add a quantity representing time to one representing a length. Despite this, it is
quite common to find such nonsensical combinations of quantities in the computer
vision literature. There are many algorithms that involve the minimization of some
energy; often the energy is defined to be the addition of many components, each
having very different units. For example, energy minimizing splines (snakes) are
usually formulated in terms of the minimization of an energy that is made up of
an intensity gradient term and a spline bending term [42]. These two components,
while representing meaningful quantities, are not dimensionless. This means that
for energy minimizing splines to be effective their parameters have to be tuned
carefully for each individual application. The parameters are used to balance the
relative importance of individual components of the overall energy. If, say, the
overall image contrast was halved one would need to double the weighting applied
to the intensity gradient term to retain the same snake behaviour. If one was to
somehow replace the intensity gradient and spline bending terms with dimensionless
quantities that represented, in some way, the closeness of the spline to a feature
and the deformation of the spline, one would be able to use fixed parameters over
wider classes of images.
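The dimensional mismatch described above can be sketched numerically. The function and parameter names below are hypothetical, not taken from any snake implementation in the literature; the point is only that the two energy terms carry different units, so the weights must be re-tuned whenever image contrast changes.

```python
import numpy as np

# Hypothetical 1D "snake" energy: a weighted sum of an image term (gradient
# magnitude at the contour points, units of intensity per pixel) and an
# internal bending term (squared second differences of the contour
# positions, units of pixels squared). The terms are not commensurable.
def snake_energy(image, points, alpha, beta):
    grad = np.abs(np.gradient(image))
    image_term = -np.sum(grad[points])      # rewards high-gradient positions
    bending_term = np.sum(np.diff(points.astype(float), 2) ** 2)
    return alpha * image_term + beta * bending_term

rng = np.random.default_rng(0)
image = rng.random(100)
points = np.array([20, 22, 23, 27, 30, 36, 40])

e_original = snake_energy(image, points, alpha=1.0, beta=0.1)
# Halving the image contrast halves the gradient term, so alpha must be
# doubled to restore the original balance between the two terms.
e_rebalanced = snake_energy(0.5 * image, points, alpha=2.0, beta=0.1)
```

If both terms were dimensionless, such re-tuning would be unnecessary; this is precisely the motivation for the invariant measures developed in this thesis.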
Clearly there is a pressing need for the identification of low level invariant quan-
tities in images. The main form of invariance that will be investigated in this thesis
is invariance to image illumination and contrast. That is, this thesis will be seeking
to construct low level image measures that have a response that is independent of
the image illumination and/or contrast.
1.2 The Approach
In the search for low level invariant quantities in images the approach taken in this
thesis is to make use of data from representations of the image in the frequency
domain. Working directly in the spatial domain is avoided for two reasons. Firstly,
the spatial domain of an image, while convenient and intuitive, almost always forces
one into making use of dimensional measures in the analysis of an image; it is
hard to get away from the use of intensity gradients, contrast levels or equivalent
quantities. Secondly, low level spatial techniques have been extensively researched,
and while one cannot say all possibilities have been exhausted, the opportunities
for the development of significantly new techniques appear limited.
The most logical alternative approach is to consider representations of the image
in the frequency domain; much of the psychophysical literature in visual perception
has been devoted to the development of models in this domain. However, these
psychophysical models have generally not been developed to the point where they
could be implemented as algorithms in a computer vision system.
With an image represented in terms of the variation of amplitude and phase
values with frequency one has a number of new and interesting possibilities in the
analysis of image signals. However, so far in the computer vision literature, very
little work has been done on the use of frequency data to recognize and charac-
terize features in signals. Some notable exceptions to this include the following:
Granlund [30], who proposed a multiscale Fourier transform approach to the analy-
sis of images; Knutsson, Wilson and Granlund [43], who developed these ideas fur-
ther for image coding and the restoration of noisy images; Morrone and Owens [61],
who use phase congruency as a means of finding image features; Fleet and Jep-
son [26], who used phase to determine image velocities; and Langley, Atherton and
Wilson [51], Fleet, Jepson and Jenkin [27] and Calway, Knutsson and Wilson [9],
who have investigated the use of phase information to estimate image disparities.
Jones and Malik [40, 41] have also used local frequency information for determining
disparity, though they do not directly use phase information.
In this thesis considerable effort is devoted to the understanding of the variations
of phase and amplitude over frequency for different image features. In particular,
phase data is used as the fundamental building block for the various low-level in-
variant feature measures that are developed in this thesis. Phase information is
an ideal starting point for the development of invariant measures for two reasons.
Firstly, phase itself is a dimensionless quantity, and secondly phase information has
been shown to be crucial in the perception of images [65]. This is discussed further
in Chapter 2.
1.3 Contributions
This thesis develops a number of invariant frequency based low-level image quan-
tities for feature detection, local symmetry/asymmetry detection, and for signal
matching.
Most of this thesis is devoted to the investigation of the use of congruency of
the local phase over many scales as an illumination and contrast invariant measure
of feature significance at points in images. Phase congruency was first proposed by
Morrone et al. [62] and Morrone and Owens [61] as a computational model of the
perception of low-level features such as step edges, lines, and Mach bands in images.
However, due to practical difficulties in calculating phase congruency they developed
the use of a related quantity, local energy, for feature detection instead. The main
contribution of this thesis is to establish the importance of phase congruency's
invariance to illumination and contrast and to develop a practical implementation of
it. The goal of an illumination and contrast independent feature detector is achieved
and its reliable performance over a wide range of images using fixed thresholds is
demonstrated.
In achieving this goal a number of other contributions are developed. These
include an effective noise compensation technique, something that is often essential
when normalized image measures such as phase congruency are used. This noise
compensation technique makes minimal assumptions about the nature of the image
noise and can be applied to any image processing technique that makes use of
banks of filters over several scales. Another contribution is the recognition of the
importance of the spread of frequencies that are present at each point in a signal
when one is considering phase congruency. For phase congruency to be used as a
measure of feature significance it must be weighted by some measure of the spread
of frequencies present. A method for doing this is presented.
Also presented is an argument that when a frequency based approach is used
in the analysis of images a more logical interpretation of scale is obtained by using
high-pass filtering rather than low-pass or band-pass filtering. This approach results
in feature positions remaining stable over different scales of analysis, something that
is not achieved with low-pass or band-pass filtering.
Another contribution is the recognition that points of local symmetry and asym-
metry in images also give rise to special arrangements of phase, and these can be
readily detected. The new measures of local image symmetry and asymmetry that
are developed are unique in that they are dimensionless and that they do not require
any previous image segmentation to have taken place prior to analysis.
Finally, with the insights obtained from this work in the use of phase for feature
detection a new approach to the matching of signals is developed. This technique
uses correlation of local phase and amplitude information, rather than spatial in-
tensity data, for matching. An advantage of this new method is that it also allows
disparity between points in stereo images to be estimated.
1.4 Thesis Overview
Chapter 2 reviews the major approaches that have been used for low-level edge
detection and discusses their shortcomings. The main problems are that existing
approaches use very simple edge models, and that one cannot know in advance of
applying the edge operator what level of edge response will be significant. That
is, edge thresholds for individual images have to be set interactively by viewing the
output. The local energy and phase congruency model of feature perception is then
introduced and previous work in this area is reviewed. A new geometric interpreta-
tion of phase congruency is provided and it is argued that phase congruency rather
than local energy should be used to identify features in images because it is a di-
mensionless quantity. However, while phase congruency appears to be an attractive
measure to use there are some difficulties in calculating it and the chapter concludes
by identifying these problems:
- Phase congruency can be defined in 1D but it is not clear how it should be calculated in 2D.
- Being a normalized quantity, phase congruency responds strongly to noise in signals.
- Phase congruency is only meaningful if there is a spread of frequency components present in a signal; how should this spread be measured?
- Phase congruency appears to require a different interpretation of scale, suggesting that high-pass filtering rather than low-pass filtering should be used.
Chapter 3 sets out to develop a practical method for calculating phase con-
gruency in images. The first requirement is to identify an appropriate method
of obtaining local frequency information in images. Complex Gabor wavelets are
adopted for this purpose. It is then shown how phase congruency in 1D signals
can be readily calculated from the convolution outputs of a bank of complex Gabor
filters. The problem of noise is then considered and a method of automatically rec-
ognizing, and compensating for, the influence of noise on phase congruency values
in an image is devised. This is followed by a section covering the issues involved in
extending the calculation of phase congruency to 2D images. It is then shown how
the use of wavelets allows us to obtain a measure of the spread of frequencies present
at a point of phase congruency. This helps us determine the degree of significance
of a point of phase congruency and allows us to improve feature localization. Fi-
nally the issue of analysis at different scales is considered in more detail and it is
concluded that high-pass filtering should be used to obtain image information at
different scales instead of the more usually applied low-pass filtering.
Chapter 4 re-examines the work on phase congruency that was developed in the
previous chapter. Firstly the choice of the wavelet function used for the analysis
of images is considered. Of particular concern is the limited maximum bandwidth
that can be obtained using Gabor functions and it is concluded that the log Gabor
function is more appropriate as it allows one to construct filters of arbitrary band-
width. However, when these high bandwidth filters were used to calculate phase
congruency unexpected results were produced. The analysis of these results led to
the development of two new approaches to the calculation of phase congruency, one
of which produced far superior results. This work, in turn, led to a new frequency
based approach to the detection of points of local symmetry and asymmetry in im-
ages. It is shown how symmetry and asymmetry can be thought of as representing
generalizations of delta and step features respectively.
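The log Gabor filter referred to above is Gaussian on a logarithmic frequency axis. A minimal sketch of its transfer function follows; the particular sigma_ratio value (roughly a two-octave bandwidth) and grid are illustrative choices only.

```python
import numpy as np

def log_gabor(omega, omega0, sigma_ratio=0.55):
    """Log Gabor transfer function in the frequency domain. sigma_ratio
    is the ratio sigma/omega0 that fixes the bandwidth; it is held
    constant so the filter shape is the same at every centre frequency."""
    g = np.zeros_like(omega, dtype=float)
    pos = omega > 0                      # the function is only defined for omega > 0
    g[pos] = np.exp(-np.log(omega[pos] / omega0) ** 2
                    / (2 * np.log(sigma_ratio) ** 2))
    return g

omega = np.linspace(0.0, 2.0, 2001)
g = log_gabor(omega, omega0=0.25)
```

Two properties matter here: the response at DC is exactly zero (a Gabor filter of large bandwidth cannot achieve this), and the shape is symmetric about the centre frequency on a log axis, so arbitrarily broad bandwidths can be used.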
Chapter 5 changes subject and considers the matching of signals and the estima-
tion of disparity using local frequency information. Many of the ideas and insights
obtained from the work on phase congruency are employed to great benefit here.
Where this work mainly differs from other work in this area is in its integrated use
of frequency data over many scales. An approach to signal matching via correla-
tion of local phase and amplitude is developed. A by-product of this approach to
signal matching is that an estimate of the spatial shift required in one signal to
match the second is obtained. This allows rapid convergence to the correct match-
ing locations. The chapter concludes with some discussion about the advantages of
matching signals represented in the log frequency domain. In this domain spatial
scale changes in signals manifest themselves as a translation of the local amplitude
spectra along with an amplitude rescaling; however, the shape of the spectra remains
unchanged. This invariance in the log frequency domain offers a number of inter-
esting possibilities. For example it may allow textures to be correctly recognized in
foreshortened views, or provide a new way of identifying surface slant from spatial
scale change in stereopsis or motion.
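The claim that a spatial rescaling becomes a pure translation (plus an amplitude rescaling) of the amplitude spectrum on a log frequency axis can be checked analytically for a Gaussian signal. The grid and dilation factor below are illustrative; the factor is chosen so the translation is an exact number of grid steps.

```python
import numpy as np

# Amplitude spectrum of a unit-height Gaussian of width sigma:
# |F(omega)| is proportional to sigma * exp(-(sigma * omega)**2 / 2).
def amp_spectrum(sigma, omega):
    return sigma * np.exp(-(sigma * omega) ** 2 / 2)

u = np.linspace(-3.0, 3.0, 601)       # log-frequency axis, u = ln(omega)
omega = np.exp(u)
du = u[1] - u[0]                      # one grid step in u
a = np.exp(70 * du)                   # dilation factor = exact 70-step shift in u

s_orig = amp_spectrum(1.0, omega)     # spectrum of the original signal
s_dilated = amp_spectrum(a, omega)    # spectrum of the signal dilated by a

# On the log axis the dilated spectrum equals the original translated by
# ln(a) (70 grid steps) and rescaled in amplitude by a; its shape is unchanged.
shifted = a * s_orig[70:]
```

Because only the position and overall gain of the spectrum change, comparisons of spectral shape on this axis are invariant to spatial scale, which is what makes the foreshortened-texture and surface-slant applications plausible.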
Finally, Chapter 6 concludes this work and discusses the areas that might be
developed further in future work. Four appendices are also included. Appendix A
presents a comprehensive portfolio of experimental results comparing phase congru-
ency to the output of the Canny edge detector over a wide range of images. Phase
symmetry images are also presented for each image in the portfolio. In addition,
phase congruency images are presented for a number of test conditions to illustrate
its behaviour under different parameter settings. Appendix B looks at the sensitiv-
ity of the phase congruency noise compensation technique to different noise models,
showing that the noise model is not critical. Appendix C describes the problems in
performing non-maximum suppression on phase congruency images. The techniques
that were used in generating the final phase congruency edge maps are described,
and it is concluded that much work could be done on the problem of non-maximum
suppression. Finally, Appendix D describes some of the implementational details
for the calculation of phase congruency.
Chapter 2
Image features
2.1 Introduction
The detection of edges and other low-level features in images has long been rec-
ognized as a fundamental operation of great importance. A good line drawing
can provide much of the information that might be contained in a photograph of the
same scene, and in doing so only requires a small fraction of the data used by the
photograph to represent that information. Indeed, line drawings can be easier to
interpret and are often used instead of photographs in technical manuals and 'How
to do it' books. However, one has to be cautious in comparing the interpretability
of line drawings with photographs. Drawings made by humans are almost always
constructed with their semantic content in mind, particularly so for technical man-
uals. Extraneous details are removed, extra details that would not normally be
visible may be added, and shading is also often used.[1] Thus a line drawing that has
been automatically generated with no regard to the image's semantic content may
not provide all the information that one might hope to obtain. Nevertheless the
extraction of a line drawing is an important first step in the automated analysis of
a scene.
[1] If one had a good automated feature detector one would be able to construct line drawings with no regard to their semantic content; this would allow a fair comparison between line drawings and photographs.
In searching for parameters to describe the significance of image features, such
as edges, we should be looking for measures that are invariant with respect to image
contrast and spatial magnification. Such quantities would provide an absolute mea-
sure of the significance of feature points that could be applied universally to any
image irrespective of image contrast and magnification. The human visual system is
able to reliably identify the significance of image features under widely varying con-
ditions. Even if the illumination of a scene is altered by several orders of magnitude
our interpretation of it will remain largely unchanged. Similarly, our interpreta-
tion of images is not greatly affected by changes in apparent spatial magnification,
though not with the same degree of tolerance that we have to illumination changes.
Despite the obvious importance of characterizing low-level image features in some
invariant manner almost no effort seems to have been devoted to this task. One
recent exception is the work of Heeger [35] in his development of a normalized model
of contrast sensitivity that qualitatively matches psychophysical data, though this
work is not directed at computer vision.
This chapter discusses some of the shortcomings of existing feature detectors
and introduces the idea of detecting features on the basis of phase congruency.
2.2 Gradient based feature detection
The majority of work in the detection of low-level image features has been concen-
trated on the identification of step discontinuities in images using gradient based
operators. Gradient based edge detection methods were pioneered by Roberts [77],
Prewitt [71] and Sobel [72, 86]. They were then developed in terms of a computa-
tional model of human perception by Marr and Hildreth [55, 54]. Inspired by the
presence of on-centre and off-centre receptive fields in the retina, Marr and Hildreth
developed a model where edges were detected via the zero-crossings of the image af-
ter convolution with a Laplacian of Gaussian filter. While this model was attractive
it had a number of difficulties: Zero-crossings always form closed contours, often not
realistically modelling the connectivity of image features; staircase intensity profiles
result in false positives being detected; and finally, with the second derivative of
the image being used the results are susceptible to noise. Marr also introduced the
concept of the Primal Sketch, that is, the idea that the brain generates a concise
representation of the scene that contains important image tokens, such as edges
and other basic image features, and that this representation permits further analy-
sis of the scene to be done more efficiently by the brain. This concept has greatly
influenced much of the research done in computer vision.
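The Marr-Hildreth scheme discussed above reduces, in one dimension, to convolving with a second derivative of a Gaussian (the 1D analogue of the Laplacian of Gaussian) and marking zero-crossings of the result. The sketch below is only illustrative; the function name and parameter values are arbitrary choices, not Marr and Hildreth's.

```python
import numpy as np

def marr_hildreth_1d(signal, sigma=2.0):
    """Convolve with a (zero-mean) second derivative of Gaussian and
    return the indices where the response changes sign."""
    x = np.arange(-int(4 * sigma), int(4 * sigma) + 1)
    d2g = (x**2 / sigma**4 - 1 / sigma**2) * np.exp(-x**2 / (2 * sigma**2))
    d2g -= d2g.mean()                 # exactly zero response on flat regions
    resp = np.convolve(signal, d2g, mode='same')
    # A zero-crossing: adjacent responses of strictly opposite sign.
    return np.nonzero(resp[:-1] * resp[1:] < 0)[0]

# A clean step at index 100 yields a zero-crossing at the edge; as noted
# above, staircase profiles can also trigger crossings at non-edge
# locations, and in 2D the crossings always form closed contours.
step = np.r_[np.zeros(100), np.ones(100)]
edges = marr_hildreth_1d(step)
```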
A number of variations of second derivative operators have been devised in
various attempts to overcome their deficiencies. Some examples of this include the
work of Fleck [22], Haralick [32], and Sarkar and Boyer [83]. Fleck and Haralick
used directional second derivatives to reduce the influence of noise, with Fleck
also employing first and third derivative information to eliminate the detection
of false positives. Sarkar and Boyer adopted the optimality criteria proposed by
Canny [11, 12] to develop infinite impulse response filters for the detection of edges
via zero crossings.
Canny [11, 12] formalized the problem of the detection of step edges in terms
of three criteria: good detection; good localization; and uniqueness of the response
to a single feature. Subsequently Spacek [87] and Deriche [16] followed Canny's
approach to develop similar operators; Deriche allowing the operator to have an
infinite impulse response and Spacek modifying the response uniqueness criterion.
An objection to these optimal detectors is that they are only optimal in a very
limited domain, that of one dimensional step edges in the presence of noise. At 2D
features such as corners and junctions, where the intensity gradient becomes poorly
defined, these detectors have difficulties.
Thus, a major problem with gradient based operators is that they use a single
model of an edge, that is, they assume edges are step discontinuities. In an ideal
system a feature detector would mark features wherever a good artist would draw
features when making a sketch of a scene. An artist produces marks in a sketch for
a wide range of feature types, not just step edges. Marks are drawn to indicate line,
roof and step edges along with other features such as shadow boundaries, highlights,
and presumably a range of other (unknown) feature types. Perona and Malik [68]
point out that many image features are represented by some combination of step,
delta, roof and ramp profiles. For example, a very commonly encountered feature
type is the occluding boundary of a convex object, such as a ball. If the ball surface
12 CHAPTER 2. IMAGE FEATURES
Figure 1: Intensity profile observed across a Lambertian sphere against a plain background with overhead illumination. The occlusion boundary is not a simple step edge. (The plot shows measured grey value across the background, the sphere, and the background again.)
is Lambertian and the illumination is aligned with the viewing direction the feature
profile will consist of an intensity profile that starts off brightest at the mid-point
of the ball and then gets darker as our view moves across the ball as a result of
the surface normal becoming perpendicular to our viewing direction, and finally
culminating in a step jump to the grey level of the background (Figure 1).
In this simple, idealized situation we have a feature that is considerably more
complex than a step edge. In practice the situation will be far more awkward;
the ball surface is unlikely to be Lambertian, lighting can be from any direction,
there may be mutual illumination effects between the ball and other objects, and of
course, the background may not be uniform. For this reason the word feature will
be generally used in this thesis rather than the word edge in order to emphasize
the aim of finding all important features that represent points of high information
content, not just step edges. The definition of what a feature is will be deliberately
left vague, though subsequent sections which describe the phase congruency model
of feature perception will offer a possible definition.
Some might argue that an automated feature detector does not need to attempt
to emulate human sketching skills. However, the interest in producing feature
detectors has been primarily inspired by the ability of artists to produce line drawings2.
Artists have shown us that line drawings can provide very compact yet effective de-
scriptions of scenes. Indeed, in the assessment of any automated feature detector
perhaps the best we can do is to compare its output against a line drawing of the
same scene made by an expert reproductive artist. After all, it is artists who are our
best experts in representing scenes via line drawings. It is probably fair to say that
excessive emphasis has been placed on finding optimal step edge detectors and the
original objective, that of finding points of high information content in images, has
been forgotten. Just because a detector is effective in finding and localizing noisy
step edges in a scene does not mean that it will represent the information in the
scene well.
A second problem with gradient based edge detectors is that they typically
characterize edge strength by the magnitude of the intensity gradient. Thus the
perceived strength or significance of an edge is sensitive to illumination and spatial
magnification variations. Intensity gradient has units of lux/radian (pixel coordi-
nates represent viewing direction and hence have angular units)3. Intensity gradi-
ents in images depend on many factors, including scene illumination, blurring and
magnification. For example, doubling the size of an image while leaving its intensity
values unchanged will halve all the gradients in the image. Any gradient based edge
detection process will need to use a threshold modified appropriately. However, in
general, one does not know in advance the level of contrast present in an image or
its magnification. The image gradient values that correspond to significant edges
are usually determined empirically.
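The dependence on magnification can be checked with a toy calculation (a present-day pure-Python sketch, not from the thesis). Doubling the size of a 1D profile by interpolating midpoints, while leaving its intensity values unchanged, halves the maximum gradient exactly:

```python
# A ramp edge sampled at unit spacing.
signal = [0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0]

def max_gradient(s):
    return max(abs(s[i + 1] - s[i]) for i in range(len(s) - 1))

# Double the magnification: insert interpolated midpoints, keeping the
# intensity values themselves unchanged.
doubled = []
for i in range(len(signal) - 1):
    doubled += [signal[i], (signal[i] + signal[i + 1]) / 2]
doubled.append(signal[-1])

print(max_gradient(signal), max_gradient(doubled))   # 0.25 0.125
```

A gradient-based detector thresholded for the original profile would therefore need its threshold halved for the magnified one, which is exactly the kind of image-dependent adjustment the text objects to.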
Here the distinction is made between line drawings, which contain only lines, and sketches, which may also include shading.
Strictly speaking, image grey values should not be called intensity values. Intensity is defined as the luminous flux that is emitted per solid angle and is a property that is associated with a light source. Intensity has units candelas (lumens/steradian). In constructing an image a camera measures the illumination at each point in the image plane that is received from a scene. Thus, image grey values have units lux (lumens/m²). Despite this, the use of the term intensity value for an image grey value appears to be commonly accepted. David Marr used the term in this manner in his book [54].
Little guidance is available for the setting of thresholds; indeed, Faugeras4 can
only offer the following advice:
Thresholding is a plague that occurs in many areas in engineering, but
to our knowledge it is unavoidable and must be tackled with courage.
A limited number of efforts have been made to determine threshold values automat-
ically. In his thesis, Canny [11] sets his thresholds on the basis of local estimates
of image noise obtained via Wiener filtering. However, the details of setting
thresholds on this basis, and the effectiveness of this approach, are not reported. Canny
also introduced the idea of thresholding hysteresis which has proved to be a useful
heuristic for maintaining the continuity of thresholded edges, though one then has
the problem of determining two threshold levels. Sarkar and Boyer [83] also
employed Wiener filtering to estimate the derivative of the noise output in their zero
crossing based detector. Having an estimated slope of the noise response allowed
them to set thresholds appropriately. However, this process required them to take
three more derivatives after the image had been filtered by their edge operator.
This presumably limited the quality of the estimate of the derivative of the noise
output.
Kundu and Pal [50] devised a method of thresholding based on human psy-
chophysical data where contrast sensitivity varies with overall illumination levels.
However, it is hard to provide any concrete guide to the fitting of a model of contrast
sensitivity relative to a digitized grey scale of 0–255. More recently Fleck [24, 23]
suggested setting thresholds at some multiple (typically 3 to 5) of the expected
standard deviation of the operator output when applied to camera noise. This
approach, of course, requires detailed a priori knowledge of the noise characteristics of
any camera used to take an image. Noise is always a concern for gradient based
detectors. The main tool used to reduce the influence of noise is spatial smoothing.
However, smoothing degrades feature localization, and 2D feature positions such as
corners can be severely corrupted (see Perona and Malik [69]). With high degrees
of smoothing feature locations can move significantly, and distinct features may
Olivier Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993, p. 117.
merge. It is very unsatisfactory for the perceived location of a feature to depend on
how much smoothing was required to overcome the influence of noise. This issue
will be considered in more detail in the next chapter.
Bergholm [5] adopts the scale-space model in developing his edge focusing
approach to edge detection, and in doing so addresses a number of problems
associated with gradient based detectors. He observes that to eliminate the influence
of noise on a gradient based detector a heavily smoothed image is required, but
this degrades edge localization. To achieve good localization no smoothing should
be used, but then noise becomes a problem. Bergholm's solution is to start with
an edge map at a heavily smoothed scale. He then proceeds to calculate an edge
map at a slightly finer scale but only at pixels in the image connected to edge pixels
found at the previous scale. The old edge points are discarded, the new ones at
the slightly finer scale retained, and the process is repeated. In this manner edges
are propagated out from their initial, rough locations and focused to their correct
positions at the finest scale. An important point is that the problem of noise is
overcome by starting with edges at a coarse scale and only looking for edges in
adjacent pixels as scale is gradually reduced. Another attractive feature is that
edge thresholding is only required to generate the initial edge map. However, if this
initial map is incorrectly thresholded at too high a level then many features will
never be found. Conversely, if the threshold is too low many noise features will be
found and these will be propagated down to the finest scale.
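Bergholm's coarse-to-fine procedure can be sketched in a few lines. The following 1D illustration (present-day pure Python; the scales, threshold and neighbourhood size are illustrative choices, not Bergholm's) thresholds only once, at the coarsest scale, and then tracks the surviving edges down through finer scales, looking only in the immediate neighbourhood of the edges found at the previous scale:

```python
import math

def gauss_smooth(s, sigma):
    r = int(3 * sigma)
    k = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    z = sum(k)
    n = len(s)
    return [sum(k[j] * s[min(max(i + j - r, 0), n - 1)] for j in range(len(k))) / z
            for i in range(n)]

def edge_maxima(s, sigma, candidates, thresh=0.0):
    # Gradient-magnitude local maxima of the sigma-smoothed signal,
    # restricted to the candidate positions.
    g = gauss_smooth(s, sigma)
    mag = [0.0] + [abs(g[i + 1] - g[i - 1]) / 2 for i in range(1, len(g) - 1)] + [0.0]
    return {i for i in candidates
            if 0 < i < len(s) - 1
            and mag[i] >= mag[i - 1] and mag[i] >= mag[i + 1]
            and mag[i] > thresh}

signal = [0.0] * 40 + [1.0] * 40

# Threshold only at the coarsest scale ...
edges = edge_maxima(signal, 8.0, range(len(signal)), thresh=0.01)
# ... then focus the edges through progressively finer scales, searching
# only next to the edges found at the previous scale.
for sigma in (4.0, 2.0, 1.0):
    candidates = {j for i in edges for j in (i - 1, i, i + 1)}
    edges = edge_maxima(signal, sigma, candidates)
print(sorted(edges))
```

The noise immunity comes from the coarse initial map; the localization comes from the final, fine scale. As the text notes, everything still hinges on that one initial threshold.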
The discussion so far has been directed at gradient based detectors though,
of course, other types of detectors have been developed. For example the weak
membrane approach of Blake and Zisserman [6] involves minimizing a global energy
function over the image in order to solve for a surface function that fits the image
in a manner that is considered to be appropriate. Blake and Zisserman's energy
measure is a weighted combination of terms representing the deviation of the surface
function from the image, the square of the slope of the function, and the contour
length of the function. This can be interpreted as fitting a weak membrane to
the image data in such a way that discontinuities are preserved. An objection to
this approach is that the energy term is not dimensionally consistent with different
types of quantities being added together. This makes the result very sensitive to
the relative weightings of the terms that make up the energy.
Noble [64] devised a number of grey level morphological operations to detect
edges. She develops a dilation-erosion residue operator which is analogous to a
first derivative operator and is used as an edge strength map. A second operator
called the signed maximum dilation-erosion residue (analogous to a second deriva-
tive operator) is used to guide the tracing of edges, and to classify the responses to
the dilation-erosion residue operator. While Noble's approach is morphological, the
steps involved can be interpreted in terms of differential operators. Thus it depends
on using a simple edge model and it does not escape the thresholding problem.
Perona and Malik [69] devised an approach to edge detection using anisotropic
diffusion. They developed an approach to scale space smoothing that is based on
the heat diffusion equation. To detect edges they make the conduction coefficient
a function of the image gradient to impede the flow of heat. Thus step discon-
tinuities in the image form local barriers to the diffusion process. Over repeated
iterations of the diffusion process step edges in the image become sharper and re-
gions between the step discontinuities become smoother. Final extraction of the
edges then becomes straightforward. A very significant attribute of this approach
is that feature positions remain stable over scale. All that changes with scale is
the level of contrast (heat difference) required for a feature to persist. However,
this approach only detects step edges and is very much dependent on local image
contrast.
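A 1D sketch of this diffusion scheme may make the mechanism concrete (present-day pure Python; the conduction function 1/(1 + (g/k)²) is one of the two forms Perona and Malik propose, and all parameter values here are illustrative). Small ripples diffuse away while the large step, where conduction is low, survives:

```python
def diffuse(signal, k, iters, dt=0.2):
    s = list(signal)
    for _ in range(iters):
        # Flux between neighbours, gated by the conduction coefficient
        # c(g) = 1 / (1 + (g/k)^2), which is small across strong edges.
        flux = []
        for i in range(len(s) - 1):
            g = s[i + 1] - s[i]
            flux.append(g / (1.0 + (g / k) ** 2))
        s = [s[i]
             + dt * ((flux[i] if i < len(flux) else 0.0)
                     - (flux[i - 1] if i > 0 else 0.0))
             for i in range(len(s))]
    return s

# Small ripples on two plateaus separated by a step.
noisy_step = ([0.1 * (-1) ** i for i in range(30)]
              + [1.0 + 0.1 * (-1) ** i for i in range(30)])
out = diffuse(noisy_step, k=0.5, iters=50)
```

After the iterations the plateaus are nearly flat, yet the contrast across the step remains: the parameter k plays exactly the role of the local-contrast dependence criticized in the text.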
Another interesting approach that has been developed recently by Smith and
Brady [85] is the SUSAN edge finder. This non-linear technique involves indexing
a circular mask over the image and at each location determining the area of the
mask having similar intensity values to the centre pixel value. This segment of the
mask is denoted the Univalue Segment Assimilating Nucleus (USAN). Locations in
the image where the USAN is locally at a minimum (locally the Smallest USAN,
hence SUSAN) mark the positions of step and line features. The detector performs
well, and its tolerance to noise is a significant attribute. However, the detector is
not invariant to image contrast as it requires the setting of a threshold which is
used to decide whether or not elements of the mask are similar to the centre value
when determining the size of the USAN. This threshold specifies the minimum edge
contrast that can be detected.
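In one dimension the USAN computation reduces to a few lines (a present-day pure-Python sketch, not Smith and Brady's implementation; their similarity test is a smooth function rather than the hard cutoff used here). The parameter t below is the brightness-similarity threshold the text refers to:

```python
def usan_area(signal, i, radius, t):
    # Count mask elements whose brightness is within t of the nucleus
    # (centre) value.
    centre = signal[i]
    return sum(1 for j in range(max(0, i - radius),
                                min(len(signal), i + radius + 1))
               if abs(signal[j] - centre) <= t)

signal = [0.0] * 20 + [1.0] * 20
areas = [usan_area(signal, i, radius=3, t=0.5) for i in range(len(signal))]

# The USAN is locally smallest right at the step.
edge = min(range(3, len(signal) - 3), key=lambda i: areas[i])
print(areas[10], areas[edge])   # 7 4
```

In the plateau interior every mask element matches the nucleus (area 7); at the step nearly half of them fail the similarity test, so the USAN area drops to its minimum there.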
The discussion above represents a generalized overview and sampling of existing
edge detection techniques. Others have conducted far more comprehensive reviews
(for example Noble [64]), and it is not intended to repeat such a review here. The
main purpose of this overview is to point out that almost all existing edge detec-
tors are based on the calculation of intensity gradients or some other measure of
the spatial variation of intensity across the image. These measures are dimensional
quantities and hence depend on image contrast and spatial magnification. Thus
the fundamental problem is that one does not know in advance what level of edge
strength corresponds to a significant feature. As a result, edge thresholds are gener-
ally set by humans viewing the output and adjusting the threshold until the result
is deemed acceptable. This is not automated feature detection.
2.3 Local energy and phase congruency
The local energy model of feature perception is a relatively new model. It is not
based on the use of local intensity gradients for feature detection. Instead it postu-
lates that features are perceived at points in an image where the Fourier components
are maximally in phase5. For example, when one looks at the Fourier series that
makes up a square wave all the Fourier components are sine waves that are exactly
in phase at the point of the step at an angle of 0 or 180 degrees depending on
whether the step is upward or downward. At all other points in the square wave
individual phase values vary, making phase congruency low. Similarly one finds that
phase congruency is a maximum at the peaks of a triangular wave (at an angle of
90 or 270 degrees). A particularly important point about using phase congruency
to mark features of interest is that one is not making any assumption about the
It should be emphasized that when phase is referred to in this thesis it is local phase that is being considered. That is, we are concerned with the local phase of the signal at some position x. This is distinct from the phase values that one might obtain, say, from an FFT of a signal, in which the phase values are the phase offsets of each of the sinusoidal basis functions in the decomposition.
shape of the waveform at all. One is simply looking for points in the image where
there is a high degree of order in the Fourier domain.
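The square-wave claim can be checked directly from its Fourier series (a present-day pure-Python sketch, illustrative only). Congruency is measured here as the length of the amplitude-weighted phasor sum divided by the sum of the amplitudes, which equals 1 exactly when all components share the same local phase:

```python
import math, cmath

def congruency(amps, phases):
    # Length of the amplitude-weighted phasor sum over the sum of
    # amplitudes: 1 when all phases coincide, smaller otherwise.
    return abs(sum(a * cmath.exp(1j * p)
                   for a, p in zip(amps, phases))) / sum(amps)

# Square-wave Fourier series: sum over odd n of sin(nx)/n.
harmonics = (1, 3, 5, 7, 9, 11)
amps = [1.0 / n for n in harmonics]

def phases_at(x):
    # Local phase of each component sin(nx) at position x.
    return [(n * x) % (2 * math.pi) for n in harmonics]

at_step = congruency(amps, phases_at(0.0))    # all components cross together
elsewhere = congruency(amps, phases_at(1.0))
print(at_step, elsewhere)
```

At the step (x = 0) every harmonic is at phase 0, so the measure is exactly 1; away from the step the phases disperse and the measure falls well below 1.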
Figure 2: Construction of square and triangular waveforms from their Fourier series. In both diagrams the first few terms of the respective Fourier series are plotted with broken lines; the sum of these terms is the solid line. Notice how the Fourier components are all in phase at the point of the step in the square wave, and at the peaks and troughs of the triangular wave.
A wide range of feature types give rise to points of high phase congruency.
These include step edges, line and roof edges, and Mach bands. It was, in fact,
investigations into the phenomenon of Mach bands by Morrone et al. [62] that led
to the development of the local energy model. Mach bands are illusory bright and
dark bands that appear on the edges of trapezoidal intensity gradient ramps, for
example, on the edges of shadows. The classical explanation for the perception of
Mach bands has been lateral inhibition (see Ratliff [74]). However, this explanation
fails in that it predicts maximal perception of Mach bands on step edges, where
in fact we see none. In their paper, Morrone et al. show that at the points where
we perceive Mach bands the Fourier components of the signal are maximally in
phase (though not exactly in phase); this led to their hypothesis that we perceive
features in images at points of high phase congruency. Further work by Morrone and
Burr [60] and Ross et al. [80] went on to show that this model successfully explains a
number of other psychophysical effects in human feature perception. Other studies
of the sensitivity of the human visual system to phase information include that by
Burr [8], Field and Nachmias [21] and du Buf [18]. Fleet [25] argues strongly for
the use of phase information in the calculation of image velocities. He shows that
the motion of contours of constant phase in images provides a better measure of
the motion field than contours of constant intensity amplitude in the image. Phase
information is more robust to noise, and shading and contrast variations in the
image.
The classic demonstration of the importance of phase was devised by Oppenheim
and Lim [65]. They took the Fourier transforms of two images and used the phase
information from one image and the magnitude information of the other to construct
a new, synthetic Fourier transform which was then back-transformed to produce a
new image. The features seen in such an image, while somewhat scrambled, clearly
correspond to those in the image from which the phase data was obtained. Little
evidence, if any, from the other image can be perceived. A demonstration of this is
repeated here in Figure 3.
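The experiment is easy to repeat in one dimension with a small DFT (a present-day pure-Python sketch, not Oppenheim and Lim's own procedure). Here one signal contributes only its Fourier magnitudes and the other only its phases, and the feature survives at the position dictated by the phase donor:

```python
import cmath, random

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

random.seed(1)
N = 32
feature = [0.0] * N
feature[10] = 1.0                               # lone feature at n = 10
other = [random.random() for _ in range(N)]     # unrelated signal

F, O = dft(feature), dft(other)
# Magnitude taken from `other`, phase taken from `feature`.
hybrid = idft([abs(o) * cmath.exp(1j * cmath.phase(f))
               for f, o in zip(F, O)])
peak = max(range(N), key=lambda n: abs(hybrid[n]))
print(peak)
```

Whatever magnitudes the other signal supplies, the aligned phases force the reconstruction to peak where the phase-donor's feature was: the structure follows the phase, not the magnitude.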
With phase data demonstrated as being so important in the perception of images
it is natural that one should pursue the development of a feature detector that
operates on the basis of phase information. From their work on Mach bands Morrone
and Owens [61] quickly recognized that the local energy model had applications in
feature detection for computer vision.
2.3.1 Defining phase congruency
We shall first consider one dimensional signals. The phase congruency function is
developed from the Fourier series expansion of a signal, I, at some location, x:

    I(x) = Σ_n A_n cos(nωx + φ_n0)                                 (1)
         = Σ_n A_n cos(φ_n(x)) ,                                   (2)

where A_n represents the amplitude of the nth cosine component, ω is a constant
(usually 2π), and φ_n0 is the phase offset of the nth component (the phase offset
also allows sine terms in the series to be represented). The function φ_n(x)
represents the local phase of the Fourier component at position x.
Morrone and Owens define the phase congruency function as

    PC(x) = max_{φ̄(x) ∈ [0, 2π]}  [ Σ_n A_n cos(φ_n(x) − φ̄(x)) ] / [ Σ_n A_n ] .    (3)
(a) image providing magnitude data (b) image providing phase data
(c) phase and amplitude mixed image
Figure 3: When phase information from one image is combined with magnitude information of another, it is the phase information that prevails.
The value of φ̄(x) that maximizes Equation 3 is the amplitude weighted mean local
phase angle of all the Fourier terms at the point being considered. Taking the cosine
of the difference between the actual phase angle of a frequency component and this
weighted mean, φ̄(x), generates a quantity approximately equal to one minus half
this difference squared (the Taylor expansion of cos(x) ≈ 1 − x²/2 for small x).
Thus finding where phase congruency is a maximum is approximately equivalent to
finding where the weighted variance of local phase angles, relative to the weighted
average local phase, is a minimum (see Figure 4).
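The claim that the maximizing φ̄(x) is the amplitude weighted mean phase can be verified numerically: a brute-force search over φ̄ in Equation 3 agrees with the closed form |Σ_n A_n e^{iφ_n}| / Σ_n A_n, since the maximizing angle is the direction of the amplitude-weighted phasor sum. (A present-day pure-Python sketch; the amplitudes and phases below are illustrative values, not taken from the thesis.)

```python
import math, cmath

A = [1.0, 0.5, 0.4]          # illustrative component amplitudes
phi = [0.30, 0.42, 0.25]     # illustrative local phases at one position x

def pc_grid(amps, phases, steps=200000):
    # Brute-force the maximization over phi_bar in Equation 3.
    best = max(sum(a * math.cos(p - pb) for a, p in zip(amps, phases))
               for pb in (2 * math.pi * t / steps for t in range(steps)))
    return best / sum(amps)

def pc_closed(amps, phases):
    # The maximizing phi_bar is the angle of the amplitude-weighted phasor
    # sum, so PC = |sum A_n e^{i phi_n}| / sum A_n.
    return abs(sum(a * cmath.exp(1j * p)
                   for a, p in zip(amps, phases))) / sum(amps)

g, c = pc_grid(A, phi), pc_closed(A, phi)
print(g, c)
```

Both evaluations agree to within the resolution of the grid search, and both lie between 0 and 1 as a dimensionless congruency measure must.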
Figure 4: Polar diagram of the components of a Fourier series at a point in a signal. The series is represented as a sequence of vectors, each vector having a length A_n and a local phase angle φ_n(x); the direction of the weighted mean of the Fourier components is the amplitude weighted mean phase angle φ̄(x).
2.3.2 Local energy
As it stands phase congruency is a rather awkward quantity to calculate. As an
alternative, Venkatesh and Owens [89] show that points of maximum phase
congruency can be calculated equivalently by searching for peaks in the local energy
function. The local energy function is defined for a one dimensional luminance
profile, I(x), as the modulus of a complex number,

    E(x) = √( I²(x) + H²(x) ) ,                                    (4)

where the real component is represented by I(x) and the imaginary component by
iH(x), where i = √−1 and H(x) is the Hilbert transform of I(x) (a 90 degree phase
shift of I(x)).
Venkatesh and Owens prove that energy is equal to phase congruency scaled by
the sum of the Fourier amplitudes, that is,

    E(x) = PC(x) Σ_n A_n .                                         (5)
Thus the local energy function is directly proportional to the phase congruency
function, so peaks in local energy will correspond to peaks in phase congruency.
Venkatesh and Owens' formal proof is not repeated here, but the relationship
between phase congruency, energy and the sum of the Fourier amplitudes can be
seen geometrically in Figure 5. The local Fourier components are plotted as com-
plex vectors adding head to tail. The sum of these components projected onto
the real axis represents I(x), the original signal, and the projection onto the
imaginary axis represents H(x), the Hilbert transform. The magnitude of the vector
from the origin to the end point is the total energy, E(x). One can see that E(x)
is equal to Σ_n A_n cos(φ_n(x) − φ̄(x)). Recalling that phase congruency is equal to
Σ_n A_n cos(φ_n(x) − φ̄(x)) / Σ_n A_n, we can see that phase congruency is the ratio
of E(x) to the overall path length taken by the local Fourier components in
reaching the end point. Thus, one can clearly see that the degree of phase congruency
is independent of the overall magnitude of the signal. This provides invariance to
variations in image illumination and/or contrast.
Figure 5: Polar diagram showing the Fourier components at a location in the signal plotted head to tail. This arrangement illustrates the construction of energy, the sum of the Fourier amplitudes, and phase congruency from the Fourier components of a signal. (In the diagram the resultant vector from the origin has length E(x) and angle φ̄(x); its projections onto the real and imaginary axes are I(x) and H(x), and each component vector has length A_n and phase angle φ_n(x).)
Rather than compute local energy via the Hilbert transform of the original
luminance profile, one can calculate a measure of local energy by convolving the
signal with a pair of filters in quadrature. The signal is first convolved with a filter
designed to remove the DC component from the image. This result is saved and
the image is then convolved with a second filter that is in quadrature with the first
(the Hilbert transform of the first). This gives us two signals, each being a band
passed version of the original, and one being a 90 degree phase shift of the other.
The results of the two convolutions are then squared and summed to produce a
local energy function. Odd and even-symmetric Gabor functions can be used for
the quadrature pair of filters. Thus local energy is defined by

    E(x) = √( (I(x) ∗ M_e)² + (I(x) ∗ M_o)² ) ,                    (6)

where M_e and M_o denote the even and odd symmetric filters in quadrature. Figure 6
illustrates the calculation of local energy on a synthetic signal containing a variety
of features.
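Equation 6 can be exercised on a synthetic step (a present-day pure-Python sketch; the Gabor parameters below are illustrative choices, not those used in the thesis). The even filter has its DC component removed, as the text requires, so that neither filter responds to uniform regions:

```python
import math

def gabor_pair(sigma, freq, radius):
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2 * sigma * sigma)) for x in xs]
    even = [gv * math.cos(2 * math.pi * freq * x) for gv, x in zip(g, xs)]
    odd = [gv * math.sin(2 * math.pi * freq * x) for gv, x in zip(g, xs)]
    # Remove the DC component from the even filter.
    bias = sum(even) / sum(g)
    even = [e - gv * bias for e, gv in zip(even, g)]
    return even, odd

def conv(s, k):
    r = len(k) // 2
    n = len(s)
    return [sum(k[j] * s[min(max(i + j - r, 0), n - 1)] for j in range(len(k)))
            for i in range(n)]

signal = [0.0] * 32 + [1.0] * 32                  # step at x = 31/32
Me, Mo = gabor_pair(sigma=6.0, freq=1 / 12, radius=18)
e, o = conv(signal, Me), conv(signal, Mo)
energy = [math.sqrt(a * a + b * b) for a, b in zip(e, o)]   # Equation 6
peak = max(range(5, len(signal) - 5), key=lambda i: energy[i])
print(peak)
```

The energy profile peaks at the step: the odd filter carries the response there while the even filter is near zero, and their squared sum gives a single, well-localized maximum.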
The calculation of energy from spatial filters in quadrature pairs has been cen-
tral to many models of human visual perception, for example those proposed by
Heeger [33, 34, 36], Adelson and Bergen [1] and Watson and Ahumada [93] to name
just a few. The significance of Venkatesh and Owens' work is that they provide
another explanation for the perceptual importance of energy: Peaks in the energy
function correspond to points where phase congruency is a maximum.
From this early work by Morrone et al. [62], Morrone and Owens [61] and
Venkatesh and Owens [89] the local energy model was developed further. Owens
et al. [67] investigated the idempotency properties of the local energy feature de-
tector. They argue that when any feature detecting operator is applied to its own
output it should not change the output. That is, the primal sketch of a primal
sketch should be itself. Gradient based detectors fail in this respect because they
attempt to mark edges on each side of any line feature in an image. Local energy,
on the other hand, produces a single response on a line feature, and hence satisfies
the idempotency requirement. Venkatesh and Owens [88] investigated the classifi-
cation of image features via the phase angle at which phase congruency occurs. In
this manner they show how step, line and shadow edges can be distinguished from
each other.
Figure 6: Calculation of local energy via convolution with two filters in quadrature. (Panels: the signal together with the odd- and even-symmetric filters; the convolution with the even filter; the convolution with the odd filter; and the two results squared and summed to give local energy.)
Aw et al. [4], in their work on image compression, make use of the fact that
local energy makes no assumptions about the intensity profiles of features. They
used local energy to detect features across a range of images, collecting information
about commonly occurring intensity profiles of features in images. This catalogue of
feature profiles enabled them to efficiently encode images for compression.
Owens [66] identifies the conditions under which images have no local maxima
in local energy, and hence are feature free. She also investigates image transforma-
tions under which image features are preserved. It is pointed out that some image
operations, such as addition between images, can destroy or create image features.
She proposes two new operators for the interaction between images which do not
corrupt feature structures within images. These operators are analogous to com-
plex multiplication and complex division. Using these operators Owens shows how
it is possible to decompose a signal into its feature component and its feature-free
component.
Other researchers who have studied the use of local energy for feature detection
are Perona and Malik [68], Freeman [29] and Ronse [78]. Perona and Malik's work
on local energy is interesting in that they arrive at a generalization of the model
without using the concept of phase congruency. They point out that image features
are generally composed of combinations of step, delta, roof and ramp structures.
Under these conditions it is shown that linear filters will produce systematic er-
rors in localization. Perona and Malik go on to show that a quadratic filtering
approach results in the correct detection and localization of composite features.
That is, instead of looking for maxima in (I(x) ∗ M) one should look for maxima
in Σ_i (I(x) ∗ M_i)², where the M_i are a series of different filters. The local energy
model, in its use of two filters in quadrature, can be seen to be a specific case of
quadratic filtering. Perona and Malik suggest that there is no special reason to
use filters in quadrature and argue that one might wish to use quite different sets
of filters. However, in the results they presented they chose to use two filters in
quadrature; the second derivative of a Gaussian and its Hilbert transform.
Freeman, in his thesis [29], studied the local energy model with particular
emphasis on multi-orientation analysis and the behaviour of local energy at feature
junctions. He devised an approach to the detection and classification of feature
junctions. The filters he used were generally second and fourth derivatives of Gaus-
sians along with their corresponding Hilbert transforms, depending on the narrow-
ness of the frequency tuning he required. As a tool for his multi-orientation analysis
Freeman developed the concept of steerable filters whereby filter outputs at any ori-
entation can be efficiently computed from a linear combination of the outputs of a
limited number of basis filters. Of relevance to the work presented in this thesis,
Freeman developed a normalized measure of local energy. However, his motivation
for doing this was primarily to allow image information to be represented over a
small dynamic range rather than to specifically seek an invariant measure of feature
significance. Some of his post-processing techniques might also be considered to be
somewhat ad hoc. Despite this he considers a wide range of issues concerning the
use of local energy for feature detection.
Ronse [78] makes a detailed mathematical study of the idempotency properties
of the local energy model and the conditions of image modification over which local
energy remains invariant. An important result, that will be used later, is that
the locations of local energy peaks are invariant to smoothing of the image by a
Gaussian or any other function having zero Fourier phase.
Rosenthaler et al. [79] make a comprehensive study of the behaviour of local
energy at 2D image feature points. They develop a model of 2D feature detection
based on differential geometry, using the first and second derivatives of oriented local
energy to identify what they call keypoints. Robbins and Owens [76] have followed
on from Rosenthaler et al.'s work and developed a simpler model of 2D feature
detection that does not resort to the use of derivatives of the local energy signal.
Instead, they detect 2D features by calculating oriented local energy over the image
and then calculate local energy of this local energy image, but in an orientation
perpendicular to the first. The second application of local energy detects the end
points of any features detected by the first application of local energy. This process
is then repeated over multiple orientations to capture all 2D features.
Wang and Jenkin [92] use complex Gabor filters to detect edges and bars in
images. They recognize that step edges and bars have specific local phase properties
which can be detected using filters in quadrature; however, they do not connect the
significance of high local energy with the concept of phase congruency.
One issue that previous work on local energy has not really addressed is the
problem of how one should integrate data over many scales. If the perceptual
significance of a peak in local energy is due to it also being a maximum in phase
congruency then it is important to consider many scales simultaneously. After all,
it is the occurrence of phase congruency over a range of frequencies that makes it
significant.
While the use of the local energy function to find peaks in phase congruency is
computationally convenient, it does not provide a dimensionless measure of feature
significance as it is weighted by the sum of the Fourier component amplitudes,
which have units lux. Thus, like derivative based feature detectors, local energy
suffers from the problem that we are unable to specify in advance what level of
response corresponds to a significant feature. Despite this, local energy remains a
useful measure in that it responds to a wide range of feature types.
Phase congruency, on the other hand, is a dimensionless quantity. We obtain it
by normalizing the local energy function: dividing energy by the sum of the Fourier
amplitudes. Values of phase congruency vary from a maximum of 1, indicating a
very significant feature, down to 0 indicating no significance. This property offers
the promise of allowing one to specify universal feature thresholds, that is, we could
set thresholds before an image is seen: truly automated feature detection.
2.4 Issues in calculating phase congruency
This section describes an initial attempt at devising a way of calculating phase
congruency. What is highlighted is that there are a number of difficulties that
have to be overcome if a practical method of calculating phase congruency is to be
devised. These problems include the following: How should one extend the idea of
phase congruency to 2D signals? What is the appropriate way of controlling the
scale of analysis? How should information be integrated over many scales, and how
can the influence of noise be overcome?
As mentioned earlier, phase congruency is awkward to calculate. An initial
approach to calculating phase congruency might be to take a signal, remove its
DC component (removed because a 90 degree phase shift of a zero-frequency
component has no meaning), calculate the Hilbert transform (say, by calculating
the Fourier transform, multiplying the result by i and then performing an inverse
Fourier transform), square and sum the Hilbert transform and the AC component
of the signal, and finally normalize the result by dividing by the sum of the Fourier
amplitudes. Results using this method were reported by Kovesi [47] (further work
in which wavelets are used to calculate phase congruency was also presented by
Kovesi [48]). An example of the calculation of phase congruency via the FFT is
shown in Figure 7.
[Figure: four panels over sample positions 0 to 250: "signal", "signal (DC removed)", "Hilbert transform" and "phase congruency", annotated "square, sum and divide by sum of Fourier amplitudes".]
Figure 7: Calculation of phase congruency via the FFT. Notice how phase congruency values range between 0 and 1.
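In outline, the procedure can be sketched with NumPy. This is an illustrative reconstruction rather than the thesis code: the function name is mine, the square root is taken so that the normalized ratio peaks at 1, and the Nyquist bin is ignored for simplicity.

```python
import numpy as np

def phase_congruency_fft(signal):
    """Phase congruency of a 1-D signal via the FFT (whole-signal version).

    Sketch of the procedure in the text: remove the DC component, form the
    Hilbert transform via the FFT, take sqrt(f^2 + h^2) as local energy, and
    normalize by the sum of the Fourier component amplitudes.
    """
    s = np.asarray(signal, dtype=float)
    s = s - s.mean()                       # remove the DC component
    N = len(s)
    S = np.fft.fft(s)
    # Hilbert transform: -90 degree phase shift of the positive frequencies
    mult = np.zeros(N, dtype=complex)
    mult[1:N // 2] = -1j
    mult[N // 2 + 1:] = 1j
    h = np.real(np.fft.ifft(S * mult))
    energy = np.sqrt(s**2 + h**2)          # local energy
    # sum of one-sided Fourier component amplitudes (Nyquist bin ignored)
    amp_sum = 2.0 * np.abs(S[1:N // 2]).sum() / N
    return energy / max(amp_sum, 1e-12)    # guard against an empty spectrum
```

Applied to a pure sine wave this returns 1 everywhere, a degenerate case discussed at the end of this section.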
There are some problems with the calculation of phase congruency via the FFT.
Firstly, it is not clear how one adapts this approach for one-dimensional signals to
two dimensions; the Hilbert transform is only defined in one dimension. A second
difficulty is that the Fourier transform is not good for localizing frequency
information spatially. In the example shown in Figure 7 the Fourier transform
was calculated over the whole signal. Thus phase congruency at each point was
calculated with respect to the whole signal. To control the local scale and spa-
tial extent over which phase congruency is determined we have to use windowing
of the signal. Windowing introduces the problem of having to balance spatial
localization against the range of frequencies we wish to analyze: the window width
controls spatial localization but also constrains the lowest frequency we can
measure. Figure 8 shows the result of calculating phase congruency using a
rectangular windowing function 32 points wide. The computational procedure was as
follows. Over each windowed section of the signal the Fourier transform was
calculated and the Hilbert transform generated. The signal value (minus the DC
value) and the Hilbert transform value at the centre of the window were then
squared and summed; this quantity was then divided by the sum of the Fourier
amplitudes over the current window to produce a phase congruency value at the
centre position of the window. The window was then indexed one point forward in
the signal and the process repeated. Notice how the peaks in phase congruency are
higher and more distinct. By windowing the signal, each feature is considered in
relative isolation from the others and hence ends up being judged very significant.
An important point to note here is that for the calculation of phase congruency the
natural scale parameter to vary is the size of the analysis window over which we
calculate local frequency information. A large window means that the significance
of features are determined in a more global manner, and a small window results
in features being treated individually and locally. This leads to a new concept of
multi-scale analysis which will be discussed in detail in the next chapter.
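The sliding-window procedure above can be sketched in the same way. Again an illustrative reconstruction, not the thesis code (function and parameter names are mine; rectangular window; square root taken so the ratio peaks at 1; Nyquist bin ignored):

```python
import numpy as np

def windowed_phase_congruency(signal, window=32):
    """Sliding rectangular-window phase congruency of a 1-D signal.

    For each window position: remove the local DC value, form the Hilbert
    transform over the window, combine the centre values into local energy,
    and divide by the sum of the Fourier amplitudes over that window.
    """
    s = np.asarray(signal, dtype=float)
    N, half = len(s), window // 2
    # Hilbert-transform multiplier for one window (fixed for all positions)
    mult = np.zeros(window, dtype=complex)
    mult[1:half] = -1j
    mult[half + 1:] = 1j
    pc = np.zeros(N)
    for c in range(half, N - half):
        seg = s[c - half:c + half]
        seg = seg - seg.mean()                     # remove local DC
        S = np.fft.fft(seg)
        h = np.real(np.fft.ifft(S * mult))
        energy = np.hypot(seg[half], h[half])      # value at window centre
        amp_sum = 2.0 * np.abs(S[1:half]).sum() / window
        pc[c] = energy / max(amp_sum, 1e-12)
    return pc
```

Varying `window` is exactly the scale parameter discussed above: a small window judges each feature against its immediate surroundings only.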
If the scale of analysis of phase congruency is controlled by window size we must
consider what might happen when a windowed section of signal contains no features
and only consists of noise. Being a normalized quantity, phase congruency does not
depend on the magnitude of a feature on its own; it depends on the magnitude of
the feature in the context of the local window. Thus, if the signal is purely noise,
each fluctuation will be considered quite significant relative to the surrounding
fluctuations, as they will all be of similar magnitude. Hence, noise poses
[Figure: four panels over sample positions 0 to 250: "signal", "signal (DC removed)", "Hilbert transform" and "phase congruency", annotated "square, sum and divide by sum of Fourier amplitudes".]
Figure 8: Calculation of phase congruency via the FFT using a rectangular windowing function 32 points wide.
a serious difficulty for us in trying to devise a practical way of calculating phase
congruency in images. Figure 9 shows what happens if we introduce a small amount
of noise into our signal. In regions that are distant from features the influence of
noise becomes very noticeable.
A further issue we must consider is that phase congruency as defined in
Equation 3 does not take into account the spread of frequencies that are congruent
at a point. For example, a signal containing only one frequency component, say a
sine wave, will be in perfect congruence with itself and hence have phase congruency
of 1 everywhere (the Hilbert transform of sine is cosine, and sin^2(x) + cos^2(x) is
identically 1, so no point x stands out as a maximum of local energy). To mark all
such points as features would not make sense. Significant feature points are presumably ones
[Figure: four panels over sample positions 0 to 250: "noisy signal", "noisy signal (DC removed)", "Hilbert transform" and "phase congruency", annotated "square, sum and divide by sum of Fourier amplitudes".]
Figure 9: Phase congruency of a noisy signal calculated using a rectangular windowing function 32 points wide.
with high information content; a point of phase congruency indicates a point of
high information content only if we have a wide range of frequencies present. We
do not gain much information from knowing the phase congruency of a signal which
has only one frequency component.
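The single-frequency case can be made explicit. With a DC-free sinusoid as input, the local energy equals the single Fourier amplitude at every point, so phase congruency is 1 everywhere:

```latex
f(x) = A\sin(\omega x), \qquad h(x) = A\cos(\omega x), \\[4pt]
E(x) = \sqrt{f(x)^2 + h(x)^2}
     = A\sqrt{\sin^2(\omega x) + \cos^2(\omega x)} = A, \\[4pt]
PC(x) = \frac{E(x)}{\sum_n A_n} = \frac{A}{A} = 1 \quad \text{for all } x.
```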
2.5 Summary
This chapter has briefly re-examined the aims of feature detection. The objective
should be to find points of high information content in images. This objective is not
necessarily satisfied by finding optimal ways of detecting step edges in the presence
of noise. Ideally, the ability to detect features and assess their significance should
be independent of image contrast and spatial magnification. This implies that we
need to measure feature significance via a dimensionless quantity.
The shortcomings of derivative based feature detectors have been briefly
reviewed. The main problems included an inability to specify in advance what level
of response corresponds to a significant feature, and the fact that they are generally
designed only to detect step edges. The local energy model of feature perception
has been introduced. This model was inspired by psychophysical data and it
detects a wide range of feature types. Local energy can be normalized to produce a
measure of phase congruency, an approximation of the standard deviation of the phase
angles of the Fourier components at a point in the signal. Phase congruency is
a dimensionless quantity and is thus an attractive way of detecting features and
identifying their significance. It provides an absolute measure of the significance of
feature points in an image, and this offers the promise of allowing constant
threshold values to be applied across wide classes of images. Thresholds could then be
specified in advance of processing any image, and not have to be determined by
trial and error after processing.
However, there are a number of issues to be addressed. How should phase
congruency be calculated in 2D images? How should we calculate local frequency
information and control the scale of analysis? How do we deal with the influence of
noise, and how do we identify the range of frequencies present at a point of phase
congruency? These issues, and others, are addressed in the following chapter which
will describe how phase congruency can be calculated using wavelets.
Chapter 3
Phase congruency from wavelets
3.1 Introduction
This chapter describes a new way of calculating phase congruency using wavelets. In
calculating phase congruency it is important to obtain spatially localized frequency
information in images, and wavelets offer perhaps the best way of doing this. The use
of wavelets is also biologically inspired: the interest in calculating phase congruency
is motivated by psychophysical results, and hence it seems natural that one
should try to calculate it using biologically plausible computational machinery. In
this respect geometrically scaled spatial filters in quadrature pairs will be used. In
addition to this it will be seen how the use of wavelets allows one to address the
issues raised at the end of the previous chapter regarding the calculation of phase
congruency.
The material that will be covered in this chapter is organized as follows: First,
it will be shown how local frequency information can be obtained using quadrature
pairs of wavelets, concentrating in particular on the use of Gabor wavelets. From
this it is relatively straightforward to develop the ideas behind the calculation of
phase congruency in one dimensional signals using wavelets. Material is then
presented to address the difficulties regarding the calculation of phase congruency
that were introduced in the previous chapter. The influence of noise on the
calculation of phase congruency is considered first, and an effective method for
identifying and compensating for noise is developed. This is followed by a section covering the
issues involved in extending the calculation of phase congruency to 2D images. It
is then shown how the use of wavelets allows us to obtain a measure of the spread
of frequencies present at a point of phase congruency. This helps one determine
the degree of significance of a point of phase congruency and allows feature
localization to be improved. The issue of analysis at different scales is then considered,
and it is argued that high-pass filtering should be used to obtain image information at
different scales instead of the more usual low-pass filtering. Finally, some
results and conclusions are presented.
3.2 Using Wavelets for Local Frequency Analysis
Recently the Wavelet Transform has become one of the methods of choice for
obtaining local frequency information. Most of the current literature on wavelets
can be traced back to the work of Morlet et al. [59]. Morlet and his co-workers
were interested in obtaining temporally localized frequency data in their analysis
of geophysical signals. The basic idea behind wavelet analysis is that one uses a
bank of filters to analyze the signal. The filters are all created from rescalings of
a single wave shape, each scaling designed to pick out particular frequencies of the
signal being analyzed. An important feature is that the scales of the filters vary
geometrically, giving rise to a logarithmic frequency scale.
However, many of these ideas were developed earlier by Granlund [30]. In this
remarkable paper he developed many of the ideas behind what we would now call
multi-scale wavelet analysis. He also proposed an image feature detector that is
closely related to the local energy model. For some reason Granlund's paper has
remained relatively unnoticed despite its innovative nature, though his work has
been developed by Wilson, Calway and Knutsson (see, for example, Wilson, Calway
and Granlund [96], Knutsson, Wilson and Granlund [43], Calway and Wilson [10]
and Calway, Knutsson and Wilson [9]). From the initial work of Morlet
and his colleagues wavelet theory has been subsequently developed by Grossmann
and Morlet [31], Meyer [58], Daubechies [14], Mallat [53] and many others.
We are interested in calculating local frequency and, in particular, phase infor-
mation in signals. To preserve phase information linear phase filters must be used,
that is, we must use wavelets that are symmetric/anti-symmetric. This constraint
means that the work on orthogonal wavelets (which dominates much of the litera-
ture) is not applicable to us. Chui [13] provides a proof that, with the exception
of the Haar wavelet, one cannot have a wavelet of compact support that is both
symmetric and orthogonal. The Haar wavelet is rectangular in shape and is clearly
not appropriate for our needs.
For this work the approach of Morlet will be followed, that is, using wavelets
based on complex valued Gabor functions: sine and cosine waves, each modulated
by a Gaussian. Using two filters in quadrature enables one to calculate the
amplitude and phase of the signal for a particular frequency at a given spatial location. It
should be noted that these wavelets are not orthogonal; some conditions must apply
in order to achieve reasonable signal reconstruction after decomposition. However,
we only require approximate reconstruction up to a scale factor over a band of
frequencies or wavelet scales.
[Figure: panels showing the odd and even wavelets in the spatial domain.]
Figure 10: Gabor wavelet: a sine and cosine wave modulated by a Gaussian.
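One way to construct such a quadrature pair discretely is sketched below. This is illustrative only: the envelope parameter `k` and the truncation at three standard deviations are my choices, not values from the text.

```python
import numpy as np

def gabor_pair(wavelength, k=3.0):
    """Even (cosine) and odd (sine) Gabor wavelets in quadrature.

    `wavelength` is the period of the modulated wave in samples; `k` sets
    how many wavelengths the Gaussian envelope spans before truncation.
    """
    n = int(k * wavelength)
    x = np.arange(-n, n + 1)                # odd length, centred on zero
    sigma = k * wavelength / 3.0            # envelope truncated at +-3 sigma
    g = np.exp(-x**2 / (2 * sigma**2))
    even = g * np.cos(2 * np.pi * x / wavelength)   # symmetric
    odd = g * np.sin(2 * np.pi * x / wavelength)    # anti-symmetric
    return even, odd
```

The even filter is symmetric and the odd anti-symmetric, so the pair has linear phase, as the preceding discussion requires.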
If the bank of wavelet filters is designed so that the transfer function of each
filter overlaps sufficiently with its neighbours, making the sum of all
the transfer functions a relatively uniform coverage of the spectrum, one can
reconstruct the decomposed signal over a band of frequencies up to a scale factor.
(If the transfer functions are scaled so that when their sum is taken we obtain a
uniform transfer function of magnitude one, the reconstructed signal will have the
original scale.) Therefore, a problem we have is determining the appropriate scaling
factor between successive centre frequencies so that the overlap between transfer
functions results in an even spectral coverage. Granlund [30] suggests that the upper
cutoff frequency of one transfer function (where it falls to half its maximum value)
should coincide with the lower cutoff frequency of the next function. However, in
practice this does not produce particularly even coverage, and a closer spacing is
generally desirable. In the results presented in this chapter the filters used have
had bandwidths of approximately one octave with a scaling between successive
centre frequencies of 1.5. This arrangement was arrived at by experimentation; the
values are not critical, and a wide range of parameters produces satisfactory results.
Referring to Figure 11 one can see that, in this example, the sum of the spectra of
the five wavelets produces a relatively ideal band-pass filter, especially when viewed
on the log frequency scale. Design of the wavelet bank ends up being a compromise
between forming a smooth sum of spectra and, at the same time, minimizing
the number of filters used so as to limit the computational requirements.
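The spacing rule can be illustrated by constructing the bank's transfer functions directly. A minimal sketch assuming Gaussian transfer functions on a log-frequency axis (one common choice; the text does not prescribe this particular shape), with one-octave bandwidths and a scaling of 1.5 between centre frequencies:

```python
import numpy as np

def bank_spectra(n_filters=5, f0=0.05, scaling=1.5, n_freq=1024):
    """Transfer functions of a wavelet filter bank on a log-frequency axis.

    Each filter is a Gaussian in log frequency with half-power points one
    octave apart; centre frequencies grow geometrically by `scaling`.
    """
    freqs = np.linspace(1e-5, 0.5, n_freq)
    # one-octave bandwidth: half power at fc * 2^(+-1/2), so on the log
    # axis the Gaussian falls to 0.5 at a distance of 0.5*ln(2)
    sigma = (0.5 * np.log(2)) / np.sqrt(2 * np.log(2))
    centres = f0 * scaling ** np.arange(n_filters)
    bank = np.array([np.exp(-np.log(freqs / fc) ** 2 / (2 * sigma ** 2))
                     for fc in centres])
    return freqs, centres, bank
```

Summing the rows of `bank` shows the spectral coverage; with this spacing the sum is relatively flat between the lowest and highest centre frequencies, as in Figure 11.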
Analysis of a signal is done by convolving the signal with each of the wavelets.
If we let $I$ denote the signal and $M_n^e$ and $M_n^o$ denote the even and odd wavelets at
a scale $n$, the amplitude of the transform at a given wavelet scale is given by

$$A_n(x) = \sqrt{(I(x) * M_n^e)^2 + (I(x) * M_n^o)^2} \qquad (7)$$

and the phase is given by

$$\phi_n(x) = \mathrm{atan2}(I(x) * M_n^e,\ I(x) * M_n^o). \qquad (8)$$
Note that from now on n will be used to refer to wavelet scale (previously n has
denoted frequency in the Fourier series of a signal).
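Equations (7) and (8) translate directly into code. A sketch (the function name is mine; the quadrature pair is passed in, e.g. a Gabor pair, and the atan2 argument order of Equation (8) is kept as written):

```python
import numpy as np

def wavelet_amplitude_phase(signal, even, odd):
    """Amplitude A_n(x) and phase phi_n(x) at one wavelet scale (Eqs 7, 8)."""
    e = np.convolve(signal, even, mode='same')   # I(x) * M_n^e
    o = np.convolve(signal, odd, mode='same')    # I(x) * M_n^o
    return np.sqrt(e**2 + o**2), np.arctan2(e, o)
```

For a sinusoid at the filter's centre frequency the amplitude is constant away from the signal ends, while the phase rotates through one cycle per wavelength.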
The results of convolving a signal with a bank of wavelets can be displayed
graphically via a scalogram (Figure 12). Each row of the scalogram is the result of
convolving the signal with a quadrature pair of wavelets at a certain scale. Phase
is plotted by mapping 0 to 360 degrees onto the grey levels 0 to 255 (note, therefore,
that the black/white discontinuities in the scalogram correspond to the wrap-around in
[Figure: spectra of the even-symmetric and odd-symmetric wavelets, and their sums, plotted against frequency (0 to 0.5) and against log frequency.]
Figure 11: Five wavelets and their respective Fourier transforms indicating which sections of the spectrum each wavelet responds to. Collectively the wavelets provide a wide coverage of the spectrum, though with some overlap. Note that on a logarithmic frequency scale the spectra are identical.
phase). The vertical axis of the scalogram is a logarithmic frequency scale, with the
lowest frequency at the bottom. Each column of the scalogram can be considered to
be a local Fourier spectrum for each point. Note that to achieve a dense scalogram
such as the one shown here, the scaling factor between successive filter centre
frequencies will be only slightly greater than 1.
The phase plot of the scalogram is of particular interest because it enables one
to actually see the points of high phase congruency. At locations in the signal where
there are large step changes one can see a vertical line of constant grey value in the
phase diagram indicating a constant phase angle over all frequencies at that point
in the signal.
[Figure: three panels: "Signal to be Analyzed", "Magnitude of Scalogram" and "Phase of Scalogram", with asterisks marking the step transitions.]
Figure 12: A one dimensional signal and its amplitude and phase scalograms. The horizontal axes of the scalograms correspond directly with the signal's horizontal axis. The vertical axes of the scalograms correspond to a logarithmic frequency scale with low frequencies at the bottom. The asterisks mark vertical lines of constant phase that occur at the step transitions in the signal. These are points of phase congruency. (Note: the phase scalogram is presented by mapping 0 to 360 degrees to the grey levels 0 to 255.)
3.3 Calculating Phase Congruency Via Wavelets
To calculate phase congruency we need to construct the fo