1996, kovesi, invariant measures of image features from phase information
TRANSCRIPT
INVARIANT MEASURES OF IMAGE FEATURES FROM
PHASE INFORMATION
This thesis is
presented to the
Department of Psychology
for the degree of
Doctor of Philosophy
of the
University of Western Australia
By
Peter Kovesi
May 1996
© Copyright 1996
by
Peter Kovesi
Abstract
Invariant Measures of Image Features From Phase Information
If reliable and general computer vision techniques are to be developed it is crucial
that we find ways of characterizing low-level image features with invariant quanti-
ties. For example, if edge significance could be measured in a way that was invariant
to image illumination and contrast, higher-level image processing operations could
be conducted with much greater confidence. However, despite their importance,
little attention has been paid to the need for invariant quantities in low-level vision
for tasks such as feature detection or feature matching.
This thesis develops a number of invariant low-level image measures for feature
detection, local symmetry/asymmetry detection, and for signal matching. These
invariant quantities are developed from representations of the image in the frequency
domain. In particular, phase data is used as the fundamental building block for
constructing these measures. Phase congruency is developed as an illumination
and contrast invariant measure of feature significance. This allows edges, lines and
other features to be detected reliably, and fixed thresholds can be applied over wide
classes of images. Points of local symmetry and asymmetry in images give rise
to special arrangements of phase, and these too can be characterized by invariant
measures. Finally, a new approach to signal matching that uses correlation of
local phase and amplitude information is developed. This approach allows reliable
phase-based disparity measurements to be made, overcoming many of the difficulties
associated with scale-space singularities.
Acknowledgements
First of all I would like to thank my supervisors John Ross and James Trevelyan.
With their gentle guidance and encouragement, the odd searching question, and
the occasional nudge, they ensured that progress was always maintained. In each of
them I have also greatly valued their enormous breadth of knowledge that spanned
many disciplines. This helped me keep my thoughts open and wide ranging as I
searched for answers to my problems.
I must also thank my other supervisor, my wife Robyn Owens, for an uncount-
able number of technical discussions, for her proof-reading skills, and for always
being there and making the generation of this thesis far less traumatic than I would
have dared to hope for. I thank Grace, Genevieve, and later in the generation of this
thesis, Gabriel for their tolerance and patience while Daddy did his Pee-Aiche-Dee.
I would also like to acknowledge the many hours of useful discussions I have had
with Ben Robbins, Chris Pudney, Mike Robins, and Adrian Baddeley. Ben Robbins
pointed out the efficiencies that can be made in the Fourier convolution of an image
with a quadrature pair of filters. This must have saved me many hours of waiting
and allowed me to do many more experiments than I would have done otherwise.
Others I must thank include the following: Daniel Reisfeld who introduced me
to the problem of finding local symmetry in images, resulting in many long and
impassioned discussions on the subject; Concetta Morrone for her amazing grasp
of both the psychophysics and computer vision literature, and therefore, always
being able to suggest yet another paper I should read; Carlo Tomasi for his help
in converting an early version of my phase congruency code from C to a MATLAB
script; Olivier Faugeras and his colleagues for their hospitality and the fine working
environment they have developed at INRIA in Sophia Antipolis which I was able
to enjoy during my visit in the first half of 1995.
Finally I thank everyone in The Robotics and Vision Research Group in the
Department of Computer Science at The University of Western Australia for the
enjoyable working environment that they contribute to.
Contents

Abstract
Acknowledgements
1 Introduction
  1.1 The Need for Invariant Quantities in Images
  1.2 The Approach
  1.3 Contributions
  1.4 Thesis Overview
2 Image features
  2.1 Introduction
  2.2 Gradient based feature detection
  2.3 Local energy and phase congruency
    2.3.1 Defining phase congruency
    2.3.2 Local energy
  2.4 Issues in calculating phase congruency
  2.5 Summary
3 Phase congruency from wavelets
  3.1 Introduction
  3.2 Using Wavelets for Local Frequency Analysis
  3.3 Calculating Phase Congruency Via Wavelets
  3.4 Noise
  3.5 Extension to two dimensions
    3.5.1 2D filter design
    3.5.2 Filter orientations
    3.5.3 Noise compensation in two dimensions
    3.5.4 Combining data over several orientations
  3.6 The importance of frequency spread
  3.7 Scale via high-pass filtering
    3.7.1 Difficulties with low-pass filtering
    3.7.2 High-pass filtering
    3.7.3 High-pass filtering and scale-space
  3.8 Experimental Results
  3.9 Summary
4 A second look at phase congruency
  4.1 Introduction
  4.2 Log Gabor wavelets
  4.3 Phase congruency from broad bandwidth filters
  4.4 Another way of defining phase congruency
    4.4.1 Calculation of PC2 via quadrature pairs of filters
  4.5 A third measure of phase congruency
    4.5.1 Calculation of PC3 via quadrature pairs of filters
  4.6 Biological computation of phase congruency
  4.7 Symmetry and Asymmetry: Special patterns of phase
    4.7.1 Introduction
    4.7.2 A frequency approach to symmetry
    4.7.3 Biological computation of symmetry and asymmetry
  4.8 Summary
5 Representation and matching of signals
  5.1 Introduction
  5.2 Spatial Correlation
  5.3 Phase Based Disparity Measurement
  5.4 Matching Using Localized Frequency Data
  5.5 Using Phase to Guide Matching
  5.6 Determining Relative Signal Distortion
  5.7 Conclusion
6 Conclusion
  6.1 Contributions
  6.2 Future Work
Bibliography
A Portfolio of experimental results
  A.1 Introduction
  A.2 Portfolio
    A.2.1 Image acknowledgements
  A.3 Parameter variations
B Noise models and noise compensation
  B.1 Introduction
  B.2 Noise generators
  B.3 Noise spectra measured from images
  B.4 Discussion
C Non-maximal suppression
  C.1 Introduction
  C.2 Non-maximal suppression using feature orientation information
  C.3 Orientation from the feature image
  C.4 Morphological approaches
  C.5 Conclusion
D Implementation details
  D.1 MATLAB Implementation
Chapter 1
Introduction
1.1 The Need for Invariant Quantities in Images
This thesis is concerned with the search for measures of image features that remain
constant over wide ranges of viewing conditions. Such invariant quantities provide
powerful tools for the analysis of images, allowing image processing algorithms to
work more reliably and over wider classes of images. The work presented in this
thesis concentrates on invariant quantities in low-level or early vision.
Some effort has been devoted to investigating invariant measures of higher level
structures in images, for example, Hu [37] developed a series of invariant moments
for recognizing binary objects. More recently there has been considerable interest
in geometric invariance, the study of geometric properties of objects that remain
invariant to imaging transformations. A collection of papers in this area can be
found in the book by Mundy and Zisserman [63]. However, little attention has
been paid to the invariant quantities that might exist in low-level or early vision
for tasks such as feature detection or feature matching. Some limited exceptions
to this include the work of Koenderink and van Doorn [44, 45] who recognized
the importance of differential invariants associated with motion fields, and Florack
et al. [28] who propose differential invariants for characterizing a number of image
contour properties. However, in general, interest in low-level image invariants has
been limited. This is surprising considering the fundamental importance of being
able to obtain reliable results from low level image operations in order to successfully
perform any higher level operations.
There are two main points about an invariant measure: firstly, of course, it must
be dimensionless (that is, have no units attached to it), and secondly it should rep-
resent some meaningful and useful quality. If it does not represent some meaningful
quality one has no idea how to use it. It is easy to construct a dimensionless quan-
tity that is meaningless, for example, the ratio of my height to the width of the
letter o. It is also easy to find measures that are useful but not dimensionless, for
example, the speed of your car. However, it is often hard to define something that
is both dimensionless and useful.
Why do we want to find invariant quantities? Quantities that are useful but
not dimensionless are generally only useful because they are applied in relatively
structured environments and at a specific scale. For example, using the speed of
your car to decide whether you are driving safely only works because most cars are
similar in size, roadways are standardized and gravitational forces are effectively
constant. Images, on the other hand, provide a very dynamic and unstructured
environment in which we struggle to make our algorithms operate. Objects can
appear with arbitrary orientation and spatial magnification along with arbitrary
brightness and contrast. Thus the search for invariant quantities is very important
for computer vision.
It is all too easy to forget that a number inside a computer often has units
associated with it. The fact that a number has units associated with it imposes
some constraints on how it should be used. For example, it does not make sense to
add a quantity representing time to one representing a length. Despite this, it is
quite common to find such nonsensical combinations of quantities in the computer
vision literature. There are many algorithms that involve the minimization of some
energy; often the energy is defined to be the addition of many components, each
having very different units. For example, energy minimizing splines (snakes) are
usually formulated in terms of the minimization of an energy that is made up of
an intensity gradient term and a spline bending term [42]. These two components,
while representing meaningful quantities, are not dimensionless. This means that
for energy minimizing splines to be effective their parameters have to be tuned
carefully for each individual application. The parameters are used to balance the
relative importance of individual components of the overall energy. If, say, the
overall image contrast was halved one would need to double the weighting applied
to the intensity gradient term to retain the same snake behaviour. If one was to
somehow replace the intensity gradient and spline bending terms with dimensionless
quantities that represented, in some way, the closeness of the spline to a feature
and the deformation of the spline, one would be able to use fixed parameters over
wider classes of images.
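The dimensional mismatch described above can be sketched numerically. The function and parameter names below are hypothetical, not taken from any snake implementation in the literature; the point is only that the two energy terms carry different units, so the weights must be re-tuned whenever image contrast changes.

```python
import numpy as np

# Hypothetical 1D "snake" energy: a weighted sum of an image term (gradient
# magnitude at the contour points, units of intensity per pixel) and an
# internal bending term (squared second differences of the contour
# positions, units of pixels squared). The terms are not commensurable.
def snake_energy(image, points, alpha, beta):
    grad = np.abs(np.gradient(image))
    image_term = -np.sum(grad[points])      # rewards high-gradient positions
    bending_term = np.sum(np.diff(points.astype(float), 2) ** 2)
    return alpha * image_term + beta * bending_term

rng = np.random.default_rng(0)
image = rng.random(100)
points = np.array([20, 22, 23, 27, 30, 36, 40])

e_original = snake_energy(image, points, alpha=1.0, beta=0.1)
# Halving the image contrast halves the gradient term, so alpha must be
# doubled to restore the original balance between the two terms.
e_rebalanced = snake_energy(0.5 * image, points, alpha=2.0, beta=0.1)
```

If both terms were dimensionless, such re-tuning would be unnecessary; this is precisely the motivation for the invariant measures developed in this thesis.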
Clearly there is a pressing need for the identification of low level invariant quan-
tities in images. The main form of invariance that will be investigated in this thesis
is invariance to image illumination and contrast. That is, this thesis will be seeking
to construct low level image measures that have a response that is independent of
the image illumination and/or contrast.
1.2 The Approach
In the search for low level invariant quantities in images the approach taken in this
thesis is to make use of data from representations of the image in the frequency
domain. Working directly in the spatial domain is avoided for two reasons. Firstly,
the spatial domain of an image, while convenient and intuitive, almost always forces
one into making use of dimensional measures in the analysis of an image; it is
hard to get away from the use of intensity gradients, contrast levels or equivalent
quantities. Secondly, low level spatial techniques have been extensively researched,
and while one cannot say all possibilities have been exhausted, the opportunities
for the development of significantly new techniques appear limited.
The most logical alternative approach is to consider representations of the image
in the frequency domain; much of the psychophysical literature in visual perception
has been devoted to the development of models in this domain. However, these
psychophysical models have generally not been developed to the point where they
could be implemented as algorithms in a computer vision system.
With an image represented in terms of the variation of amplitude and phase
values with frequency one has a number of new and interesting possibilities in the
analysis of image signals. However, so far in the computer vision literature, very
little work has been done on the use of frequency data to recognize and charac-
terize features in signals. Some notable exceptions to this include the following:
Granlund [30], who proposed a multiscale Fourier transform approach to the analy-
sis of images; Knutsson, Wilson and Granlund [43], who developed these ideas fur-
ther for image coding and the restoration of noisy images; Morrone and Owens [61],
who use phase congruency as a means of finding image features; Fleet and Jep-
son [26], who used phase to determine image velocities; and Langley, Atherton and
Wilson [51], Fleet, Jepson and Jenkin [27] and Calway, Knutsson and Wilson [9],
who have investigated the use of phase information to estimate image disparities.
Jones and Malik [40, 41] have also used local frequency information for determining
disparity, though they do not directly use phase information.
In this thesis considerable effort is devoted to the understanding of the variations
of phase and amplitude over frequency for different image features. In particular,
phase data is used as the fundamental building block for the various low-level in-
variant feature measures that are developed in this thesis. Phase information is
an ideal starting point for the development of invariant measures for two reasons.
Firstly, phase itself is a dimensionless quantity, and secondly phase information has
been shown to be crucial in the perception of images [65]. This is discussed further
in Chapter 2.
1.3 Contributions
This thesis develops a number of invariant frequency based low-level image quan-
tities for feature detection, local symmetry/asymmetry detection, and for signal
matching.
Most of this thesis is devoted to the investigation of the use of congruency of
the local phase over many scales as an illumination and contrast invariant measure
of feature significance at points in images. Phase congruency was first proposed by
Morrone et al. [62] and Morrone and Owens [61] as a computational model of the
perception of low-level features such as step edges, lines, and Mach bands in images.
However, due to practical difficulties in calculating phase congruency they developed
the use of a related quantity, local energy, for feature detection instead. The main
contribution of this thesis is to establish the importance of phase congruency's
invariance to illumination and contrast and to develop a practical implementation of
it. The goal of an illumination and contrast independent feature detector is achieved
and its reliable performance over a wide range of images using fixed thresholds is
demonstrated.
In achieving this goal a number of other contributions are developed. These
include an effective noise compensation technique, something that is often essential
when normalized image measures such as phase congruency are used. This noise
compensation technique makes minimal assumptions about the nature of the image
noise and can be applied to any image processing technique that makes use of
banks of filters over several scales. Another contribution is the recognition of the
importance of the spread of frequencies that are present at each point in a signal
when one is considering phase congruency. For phase congruency to be used as a
measure of feature significance it must be weighted by some measure of the spread
of frequencies present. A method for doing this is presented.
Also presented is an argument that when a frequency based approach is used
in the analysis of images a more logical interpretation of scale is obtained by using
high-pass filtering rather than low-pass or band-pass filtering. This approach results
in feature positions remaining stable over different scales of analysis, something that
is not achieved with low-pass or band-pass filtering.
Another contribution is the recognition that points of local symmetry and asym-
metry in images also give rise to special arrangements of phase, and these can be
readily detected. The new measures of local image symmetry and asymmetry that
are developed are unique in that they are dimensionless and that they do not require
any previous image segmentation to have taken place prior to analysis.
Finally, with the insights obtained from this work in the use of phase for feature
detection a new approach to the matching of signals is developed. This technique
uses correlation of local phase and amplitude information, rather than spatial in-
tensity data, for matching. An advantage of this new method is that it also allows
disparity between points in stereo images to be estimated.
1.4 Thesis Overview
Chapter 2 reviews the major approaches that have been used for low-level edge
detection and discusses their shortcomings. The main problems are that existing
approaches use very simple edge models, and that one cannot know in advance of
applying the edge operator what level of edge response will be significant. That
is, edge thresholds for individual images have to be set interactively by viewing the
output. The local energy and phase congruency model of feature perception is then
introduced and previous work in this area is reviewed. A new geometric interpreta-
tion of phase congruency is provided and it is argued that phase congruency rather
than local energy should be used to identify features in images because it is a di-
mensionless quantity. However, while phase congruency appears to be an attractive
measure to use there are some difficulties in calculating it and the chapter concludes
by identifying these problems:
- Phase congruency can be defined in 1D but it is not clear how it should be calculated in 2D.
- Being a normalized quantity, phase congruency responds strongly to noise in signals.
- Phase congruency is only meaningful if there is a spread of frequency components present in a signal; how should this spread be measured?
- Phase congruency appears to require a different interpretation of scale, suggesting that high-pass filtering rather than low-pass filtering should be used.
Chapter 3 sets out to develop a practical method for calculating phase con-
gruency in images. The first requirement is to identify an appropriate method
of obtaining local frequency information in images. Complex Gabor wavelets are
adopted for this purpose. It is then shown how phase congruency in 1D signals
can be readily calculated from the convolution outputs of a bank of complex Gabor
filters. The problem of noise is then considered and a method of automatically rec-
ognizing, and compensating for, the influence of noise on phase congruency values
in an image is devised. This is followed by a section covering the issues involved in
extending the calculation of phase congruency to 2D images. It is then shown how
the use of wavelets allows us to obtain a measure of the spread of frequencies present
at a point of phase congruency. This helps us determine the degree of significance
of a point of phase congruency and allows us to improve feature localization. Fi-
nally the issue of analysis at different scales is considered in more detail and it is
concluded that high-pass filtering should be used to obtain image information at
different scales instead of the more usually applied low-pass filtering.
Chapter 4 re-examines the work on phase congruency that was developed in the
previous chapter. Firstly the choice of the wavelet function used for the analysis
of images is considered. Of particular concern is the limited maximum bandwidth
that can be obtained using Gabor functions and it is concluded that the log Gabor
function is more appropriate as it allows one to construct filters of arbitrary band-
width. However, when these high bandwidth filters were used to calculate phase
congruency unexpected results were produced. The analysis of these results led to
the development of two new approaches to the calculation of phase congruency, one
of which produced far superior results. This work, in turn, led to a new frequency
based approach to the detection of points of local symmetry and asymmetry in im-
ages. It is shown how symmetry and asymmetry can be thought of as representing
generalizations of delta and step features respectively.
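The log Gabor filter referred to above is Gaussian on a logarithmic frequency axis. A minimal sketch of its transfer function follows; the particular sigma_ratio value (roughly a two-octave bandwidth) and grid are illustrative choices only.

```python
import numpy as np

def log_gabor(omega, omega0, sigma_ratio=0.55):
    """Log Gabor transfer function in the frequency domain. sigma_ratio
    is the ratio sigma/omega0 that fixes the bandwidth; it is held
    constant so the filter shape is the same at every centre frequency."""
    g = np.zeros_like(omega, dtype=float)
    pos = omega > 0                      # the function is only defined for omega > 0
    g[pos] = np.exp(-np.log(omega[pos] / omega0) ** 2
                    / (2 * np.log(sigma_ratio) ** 2))
    return g

omega = np.linspace(0.0, 2.0, 2001)
g = log_gabor(omega, omega0=0.25)
```

Two properties matter here: the response at DC is exactly zero (a Gabor filter of large bandwidth cannot achieve this), and the shape is symmetric about the centre frequency on a log axis, so arbitrarily broad bandwidths can be used.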
Chapter 5 changes subject and considers the matching of signals and the estima-
tion of disparity using local frequency information. Many of the ideas and insights
obtained from the work on phase congruency are employed to great benefit here.
Where this work mainly differs from other work in this area is in its integrated use
of frequency data over many scales. An approach to signal matching via correla-
tion of local phase and amplitude is developed. A by-product of this approach to
signal matching is that an estimate of the spatial shift required in one signal to
match the second is obtained. This allows rapid convergence to the correct match-
ing locations. The chapter concludes with some discussion about the advantages of
matching signals represented in the log frequency domain. In this domain spatial
scale changes in signals manifest themselves as a translation of the local amplitude
spectra along with an amplitude rescaling; however, the shape of the spectra remains
unchanged. This invariance in the log frequency domain offers a number of inter-
esting possibilities. For example it may allow textures to be correctly recognized in
foreshortened views, or provide a new way of identifying surface slant from spatial
scale change in stereopsis or motion.
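The claim that a spatial rescaling becomes a pure translation (plus an amplitude rescaling) of the amplitude spectrum on a log frequency axis can be checked analytically for a Gaussian signal. The grid and dilation factor below are illustrative; the factor is chosen so the translation is an exact number of grid steps.

```python
import numpy as np

# Amplitude spectrum of a unit-height Gaussian of width sigma:
# |F(omega)| is proportional to sigma * exp(-(sigma * omega)**2 / 2).
def amp_spectrum(sigma, omega):
    return sigma * np.exp(-(sigma * omega) ** 2 / 2)

u = np.linspace(-3.0, 3.0, 601)       # log-frequency axis, u = ln(omega)
omega = np.exp(u)
du = u[1] - u[0]                      # one grid step in u
a = np.exp(70 * du)                   # dilation factor = exact 70-step shift in u

s_orig = amp_spectrum(1.0, omega)     # spectrum of the original signal
s_dilated = amp_spectrum(a, omega)    # spectrum of the signal dilated by a

# On the log axis the dilated spectrum equals the original translated by
# ln(a) (70 grid steps) and rescaled in amplitude by a; its shape is unchanged.
shifted = a * s_orig[70:]
```

Because only the position and overall gain of the spectrum change, comparisons of spectral shape on this axis are invariant to spatial scale, which is what makes the foreshortened-texture and surface-slant applications plausible.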
Finally, Chapter 6 concludes this work and discusses the areas that might be
developed further in future work. Four appendices are also included. Appendix A
presents a comprehensive portfolio of experimental results comparing phase congru-
ency to the output of the Canny edge detector over a wide range of images. Phase
symmetry images are also presented for each image in the portfolio. In addition,
phase congruency images are presented for a number of test conditions to illustrate
its behaviour under different parameter settings. Appendix B looks at the sensitiv-
ity of the phase congruency noise compensation technique to different noise models,
showing that the noise model is not critical. Appendix C describes the problems in
performing non-maximum suppression on phase congruency images. The techniques
that were used in generating the final phase congruency edge maps are described,
and it is concluded that much work could be done on the problem of non-maximum
suppression. Finally, Appendix D describes some of the implementational details
for the calculation of phase congruency.
Chapter 2
Image features
2.1 Introduction
The detection of edges and other low-level features in images has long been rec-
ognized as a fundamental operation of great importance. A good line drawing
can provide much of the information that might be contained in a photograph of the
same scene, and in doing so only requires a small fraction of the data used by the
photograph to represent that information. Indeed, line drawings can be easier to
interpret and are often used instead of photographs in technical manuals and 'How
to do it' books. However, one has to be cautious in comparing the interpretability
of line drawings with photographs. Drawings made by humans are almost always
constructed with their semantic content in mind, particularly so for technical man-
uals. Extraneous details are removed, extra details that would not normally be
visible may be added, and shading is also often used.[1] Thus a line drawing that has
been automatically generated with no regard to the image's semantic content may
not provide all the information that one might hope to obtain. Nevertheless the
extraction of a line drawing is an important first step in the automated analysis of
a scene.
[1] If one had a good automated feature detector one would be able to construct line drawings with no regard to their semantic content; this would allow a fair comparison between line drawings and photographs.
In searching for parameters to describe the significance of image features, such
as edges, we should be looking for measures that are invariant with respect to image
contrast and spatial magnification. Such quantities would provide an absolute mea-
sure of the significance of feature points that could be applied universally to any
image irrespective of image contrast and magnification. The human visual system is
able to reliably identify the significance of image features under widely varying con-
ditions. Even if the illumination of a scene is altered by several orders of magnitude
our interpretation of it will remain largely unchanged. Similarly, our interpreta-
tion of images is not greatly affected by changes in apparent spatial magnification,
though not with the same degree of tolerance that we have to illumination changes.
Despite the obvious importance of characterizing low-level image features in some
invariant manner almost no effort seems to have been devoted to this task. One
recent exception is the work of Heeger [35] in his development of a normalized model
of contrast sensitivity that qualitatively matches psychophysical data, though this
work is not directed at computer vision.
This chapter discusses some of the shortcomings of existing feature detectors
and introduces the idea of detecting features on the basis of phase congruency.
2.2 Gradient based feature detection
The majority of work in the detection of low-level image features has been concen-
trated on the identification of step discontinuities in images using gradient based
operators. Gradient based edge detection methods were pioneered by Roberts [77],
Prewitt [71] and Sobel [72, 86]. They were then developed in terms of a computa-
tional model of human perception by Marr and Hildreth [55, 54]. Inspired by the
presence of on-centre and off-centre receptive fields in the retina, Marr and Hildreth
developed a model where edges were detected via the zero-crossings of the image af-
ter convolution with a Laplacian of Gaussian filter. While this model was attractive
it had a number of difficulties: Zero-crossings always form closed contours, often not
realistically modelling the connectivity of image features; staircase intensity profiles
result in false positives being detected; and finally, with the second derivative of
the image being used the results are susceptible to noise. Marr also introduced the
concept of the Primal Sketch, that is, the idea that the brain generates a concise
representation of the scene that contains important image tokens, such as edges
and other basic image features, and that this representation permits further analy-
sis of the scene to be done more efficiently by the brain. This concept has greatly
influenced much of the research done in computer vision.
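The Marr-Hildreth scheme discussed above reduces, in one dimension, to convolving with a second derivative of a Gaussian (the 1D analogue of the Laplacian of Gaussian) and marking zero-crossings of the result. The sketch below is only illustrative; the function name and parameter values are arbitrary choices, not Marr and Hildreth's.

```python
import numpy as np

def marr_hildreth_1d(signal, sigma=2.0):
    """Convolve with a (zero-mean) second derivative of Gaussian and
    return the indices where the response changes sign."""
    x = np.arange(-int(4 * sigma), int(4 * sigma) + 1)
    d2g = (x**2 / sigma**4 - 1 / sigma**2) * np.exp(-x**2 / (2 * sigma**2))
    d2g -= d2g.mean()                 # exactly zero response on flat regions
    resp = np.convolve(signal, d2g, mode='same')
    # A zero-crossing: adjacent responses of strictly opposite sign.
    return np.nonzero(resp[:-1] * resp[1:] < 0)[0]

# A clean step at index 100 yields a zero-crossing at the edge; as noted
# above, staircase profiles can also trigger crossings at non-edge
# locations, and in 2D the crossings always form closed contours.
step = np.r_[np.zeros(100), np.ones(100)]
edges = marr_hildreth_1d(step)
```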
A number of variations of second derivative operators have been devised in
various attempts to overcome their deficiencies. Some examples of this include the
work of Fleck [22], Haralick [32], and Sarkar and Boyer [83]. Fleck and Haralick
used directional second derivatives to reduce the influence of noise, with Fleck
also employing first and third derivative information to eliminate the detection
of false positives. Sarkar and Boyer adopted the optimality criteria proposed by
Canny [11, 12] to develop infinite impulse response filters for the detection of edges
via zero crossings.
Canny [11, 12] formalized the problem of the detection of step edges in terms
of three criteria: good detection; good localization; and uniqueness of the response
to a single feature. Subsequently Spacek [87] and Deriche [16] followed Canny's
approach to develop similar operators; Deriche allowing the operator to have an
infinite impulse response and Spacek modifying the response uniqueness criterion.
An objection to these optimal detectors is that they are only optimal in a very
limited domain, that of one dimensional step edges in the presence of noise. At 2D
features such as corners and junctions, where the intensity gradient becomes poorly
defined, these detectors have difficulties.
Thus, a major problem with gradient based operators is that they use a single
model of an edge, that is, they assume edges are step discontinuities. In an ideal
system a feature detector would mark features wherever a good artist would draw
features when making a sketch of a scene. An artist produces marks in a sketch for
a wide range of feature types, not just step edges. Marks are drawn to indicate line,
roof and step edges along with other features such as shadow boundaries, highlights,
and presumably a range of other (unknown) feature types. Perona and Malik [68]
point out that many image features are represented by some combination of step,
delta, roof and ramp profiles. For example, a very commonly encountered feature
type is the occluding boundary of a convex object, such as a ball. If the ball surface
12 CHAPTER 2. IMAGE FEATURES
Figure 1: Intensity profile observed across a Lambertian sphere against a plain background with overhead illumination. The occlusion boundary is not a simple step edge. (The plot shows measured grey value across the background, the sphere, and the background again.)
is Lambertian and the illumination is aligned with the viewing direction the feature
profile will consist of an intensity profile that starts off brightest at the mid-point
of the ball and then gets darker as our view moves across the ball as a result of
the surface normal becoming perpendicular to our viewing direction, and finally
culminating in a step jump to the grey level of the background (Figure 1).
In this simple, idealized situation we have a feature that is considerably more
complex than a step edge. In practice the situation will be far more awkward;
the ball surface is unlikely to be Lambertian, lighting can be from any direction,
there may be mutual illumination effects between the ball and other objects, and of
course, the background may not be uniform. For this reason the word feature will
be generally used in this thesis rather than the word edge in order to emphasize
the aim of finding all important features that represent points of high information
content, not just step edges. The definition of what a feature is will be deliberately
left vague, though subsequent sections which describe the phase congruency model
of feature perception will offer a possible definition.
Some might argue that an automated feature detector does not need to attempt
to emulate human sketching skills. However, the interest in producing feature
detectors has been primarily inspired by the ability of artists to produce line drawings2.
Artists have shown us that line drawings can provide very compact yet effective de-
scriptions of scenes. Indeed, in the assessment of any automated feature detector
perhaps the best we can do is to compare its output against a line drawing of the
same scene made by an expert reproductive artist. After all, it is artists who are our
best experts in representing scenes via line drawings. It is probably fair to say that
excessive emphasis has been placed on finding optimal step edge detectors and the
original objective, that of finding points of high information content in images, has
been forgotten. Just because a detector is effective in finding and localizing noisy
step edges in a scene does not mean that it will represent the information in the
scene well.
A second problem with gradient based edge detectors is that they typically
characterize edge strength by the magnitude of the intensity gradient. Thus the
perceived strength or significance of an edge is sensitive to illumination and spatial
magnification variations. Intensity gradient has units of lux/radian (pixel coordi-
nates represent viewing direction and hence have angular units)3. Intensity gradi-
ents in images depend on many factors, including scene illumination, blurring and
magnification. For example, doubling the size of an image while leaving its intensity
values unchanged will halve all the gradients in the image. Any gradient based edge
detection process will need to use a threshold modified appropriately. However, in
general, one does not know in advance the level of contrast present in an image or
its magnification. The image gradient values that correspond to significant edges
are usually determined empirically.
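The dependence on magnification can be checked with a toy calculation (a present-day pure-Python sketch, not from the thesis). Doubling the size of a 1D profile by interpolating midpoints, while leaving its intensity values unchanged, halves the maximum gradient exactly:

```python
# A ramp edge sampled at unit spacing.
signal = [0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0]

def max_gradient(s):
    return max(abs(s[i + 1] - s[i]) for i in range(len(s) - 1))

# Double the magnification: insert interpolated midpoints, keeping the
# intensity values themselves unchanged.
doubled = []
for i in range(len(signal) - 1):
    doubled += [signal[i], (signal[i] + signal[i + 1]) / 2]
doubled.append(signal[-1])

print(max_gradient(signal), max_gradient(doubled))   # 0.25 0.125
```

A gradient-based detector thresholded for the original profile would therefore need its threshold halved for the magnified one, which is exactly the kind of image-dependent adjustment the text objects to.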
Here the distinction is made between line drawings, which contain only lines, and sketches, which may also include shading.
Strictly speaking, image grey values should not be called intensity values. Intensity is defined as the luminous flux that is emitted per solid angle and is a property that is associated with a light source. Intensity has units candelas (lumens/steradian). In constructing an image a camera measures the illumination at each point in the image plane that is received from a scene. Thus, image grey values have units lux (lumens/m²). Despite this, the use of the term intensity value for an image grey value appears to be commonly accepted. David Marr used the term in this manner in his book [54].
Little guidance is available for the setting of thresholds; indeed, Faugeras4 can
only offer the following advice:
Thresholding is a plague that occurs in many areas in engineering, but
to our knowledge it is unavoidable and must be tackled with courage.
A limited number of efforts have been made to determine threshold values automat-
ically. In his thesis, Canny [11] sets his thresholds on the basis of local estimates
of image noise obtained via Wiener filtering. However, the details of setting
thresholds on this basis, and the effectiveness of this approach, are not reported. Canny
also introduced the idea of thresholding hysteresis which has proved to be a useful
heuristic for maintaining the continuity of thresholded edges, though one then has
the problem of determining two threshold levels. Sarkar and Boyer [83] also
employed Wiener filtering to estimate the derivative of the noise output in their zero
crossing based detector. Having an estimated slope of the noise response allowed
them to set thresholds appropriately. However, this process required them to take
three more derivatives after the image had been filtered by their edge operator.
This presumably limited the quality of the estimate of the derivative of the noise
output.
Kundu and Pal [50] devised a method of thresholding based on human psy-
chophysical data where contrast sensitivity varies with overall illumination levels.
However, it is hard to provide any concrete guide to the fitting of a model of contrast
sensitivity relative to a digitized grey scale of 0–255. More recently Fleck [24, 23]
suggested setting thresholds at some multiple (typically 3 to 5) of the expected
standard deviation of the operator output when applied to camera noise. This
approach, of course, requires detailed a priori knowledge of the noise characteristics of
any camera used to take an image. Noise is always a concern for gradient based
detectors. The main tool used to reduce the influence of noise is spatial smoothing.
However, smoothing degrades feature localization, and 2D feature positions such as
corners can be severely corrupted (see Perona and Malik [69]). With high degrees
of smoothing feature locations can move significantly, and distinct features may
Olivier Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993, p. 117.
merge. It is very unsatisfactory for the perceived location of a feature to depend on
how much smoothing was required to overcome the influence of noise. This issue
will be considered in more detail in the next chapter.
Bergholm [5] adopts the scale-space model in developing his edge focusing
approach to edge detection, and in doing so addresses a number of problems
associated with gradient based detectors. He observes that to eliminate the influence
of noise on a gradient based detector a heavily smoothed image is required, but
this degrades edge localization. To achieve good localization no smoothing should
be used, but then noise becomes a problem. Bergholm's solution is to start with
an edge map at a heavily smoothed scale. He then proceeds to calculate an edge
map at a slightly finer scale but only at pixels in the image connected to edge pixels
found at the previous scale. The old edge points are discarded, the new ones at
the slightly finer scale retained, and the process is repeated. In this manner edges
are propagated out from their initial, rough locations and focused to their correct
positions at the finest scale. An important point is that the problem of noise is
overcome by starting with edges at a coarse scale and only looking for edges in
adjacent pixels as scale is gradually reduced. Another attractive feature is that
edge thresholding is only required to generate the initial edge map. However, if this
initial map is incorrectly thresholded at too high a level then many features will
never be found. Conversely, if the threshold is too low many noise features will be
found and these will be propagated down to the finest scale.
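Bergholm's coarse-to-fine procedure can be sketched in a few lines. The following 1D illustration (present-day pure Python; the scales, threshold and neighbourhood size are illustrative choices, not Bergholm's) thresholds only once, at the coarsest scale, and then tracks the surviving edges down through finer scales, looking only in the immediate neighbourhood of the edges found at the previous scale:

```python
import math

def gauss_smooth(s, sigma):
    r = int(3 * sigma)
    k = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    z = sum(k)
    n = len(s)
    return [sum(k[j] * s[min(max(i + j - r, 0), n - 1)] for j in range(len(k))) / z
            for i in range(n)]

def edge_maxima(s, sigma, candidates, thresh=0.0):
    # Gradient-magnitude local maxima of the sigma-smoothed signal,
    # restricted to the candidate positions.
    g = gauss_smooth(s, sigma)
    mag = [0.0] + [abs(g[i + 1] - g[i - 1]) / 2 for i in range(1, len(g) - 1)] + [0.0]
    return {i for i in candidates
            if 0 < i < len(s) - 1
            and mag[i] >= mag[i - 1] and mag[i] >= mag[i + 1]
            and mag[i] > thresh}

signal = [0.0] * 40 + [1.0] * 40

# Threshold only at the coarsest scale ...
edges = edge_maxima(signal, 8.0, range(len(signal)), thresh=0.01)
# ... then focus the edges through progressively finer scales, searching
# only next to the edges found at the previous scale.
for sigma in (4.0, 2.0, 1.0):
    candidates = {j for i in edges for j in (i - 1, i, i + 1)}
    edges = edge_maxima(signal, sigma, candidates)
print(sorted(edges))
```

The noise immunity comes from the coarse initial map; the localization comes from the final, fine scale. As the text notes, everything still hinges on that one initial threshold.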
The discussion so far has been directed at gradient based detectors though,
of course, other types of detectors have been developed. For example the weak
membrane approach of Blake and Zisserman [6] involves minimizing a global energy
function over the image in order to solve for a surface function that fits the image
in a manner that is considered to be appropriate. Blake and Zisserman's energy
measure is a weighted combination of terms representing the deviation of the surface
function from the image, the square of the slope of the function, and the contour
length of the function. This can be interpreted as fitting a weak membrane to
the image data in such a way that discontinuities are preserved. An objection to
this approach is that the energy term is not dimensionally consistent with different
types of quantities being added together. This makes the result very sensitive to
the relative weightings of the terms that make up the energy.
Noble [64] devised a number of grey level morphological operations to detect
edges. She develops a dilation-erosion residue operator which is analogous to a
first derivative operator and is used as an edge strength map. A second operator
called the signed maximum dilation-erosion residue (analogous to a second deriva-
tive operator) is used to guide the tracing of edges, and to classify the responses to
the dilation-erosion residue operator. While Noble's approach is morphological, the
steps involved can be interpreted in terms of differential operators. Thus it depends
on using a simple edge model and it does not escape the thresholding problem.
Perona and Malik [69] devised an approach to edge detection using anisotropic
diffusion. They developed an approach to scale space smoothing that is based on
the heat diffusion equation. To detect edges they make the conduction coefficient
a function of the image gradient to impede the flow of heat. Thus step discon-
tinuities in the image form local barriers to the diffusion process. Over repeated
iterations of the diffusion process step edges in the image become sharper and re-
gions between the step discontinuities become smoother. Final extraction of the
edges then becomes straightforward. A very significant attribute of this approach
is that feature positions remain stable over scale. All that changes with scale is
the level of contrast (heat difference) required for a feature to persist. However,
this approach only detects step edges and is very much dependent on local image
contrast.
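A 1D sketch of this diffusion scheme may make the mechanism concrete (present-day pure Python; the conduction function 1/(1 + (g/k)²) is one of the two forms Perona and Malik propose, and all parameter values here are illustrative). Small ripples diffuse away while the large step, where conduction is low, survives:

```python
def diffuse(signal, k, iters, dt=0.2):
    s = list(signal)
    for _ in range(iters):
        # Flux between neighbours, gated by the conduction coefficient
        # c(g) = 1 / (1 + (g/k)^2), which is small across strong edges.
        flux = []
        for i in range(len(s) - 1):
            g = s[i + 1] - s[i]
            flux.append(g / (1.0 + (g / k) ** 2))
        s = [s[i]
             + dt * ((flux[i] if i < len(flux) else 0.0)
                     - (flux[i - 1] if i > 0 else 0.0))
             for i in range(len(s))]
    return s

# Small ripples on two plateaus separated by a step.
noisy_step = ([0.1 * (-1) ** i for i in range(30)]
              + [1.0 + 0.1 * (-1) ** i for i in range(30)])
out = diffuse(noisy_step, k=0.5, iters=50)
```

After the iterations the plateaus are nearly flat, yet the contrast across the step remains: the parameter k plays exactly the role of the local-contrast dependence criticized in the text.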
Another interesting approach that has been developed recently by Smith and
Brady [85] is the SUSAN edge finder. This non-linear technique involves indexing
a circular mask over the image and at each location determining the area of the
mask having similar intensity values to the centre pixel value. This segment of the
mask is denoted the Univalue Segment Assimilating Nucleus (USAN). Locations in
the image where the USAN is locally at a minimum (locally the Smallest USAN,
hence SUSAN) mark the positions of step and line features. The detector performs
well, and its tolerance to noise is a significant attribute. However, the detector is
not invariant to image contrast as it requires the setting of a threshold which is
used to decide whether or not elements of the mask are similar to the centre value
when determining the size of the USAN. This threshold specifies the minimum edge
contrast that can be detected.
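In one dimension the USAN computation reduces to a few lines (a present-day pure-Python sketch, not Smith and Brady's implementation; their similarity test is a smooth function rather than the hard cutoff used here). The parameter t below is the brightness-similarity threshold the text refers to:

```python
def usan_area(signal, i, radius, t):
    # Count mask elements whose brightness is within t of the nucleus
    # (centre) value.
    centre = signal[i]
    return sum(1 for j in range(max(0, i - radius),
                                min(len(signal), i + radius + 1))
               if abs(signal[j] - centre) <= t)

signal = [0.0] * 20 + [1.0] * 20
areas = [usan_area(signal, i, radius=3, t=0.5) for i in range(len(signal))]

# The USAN is locally smallest right at the step.
edge = min(range(3, len(signal) - 3), key=lambda i: areas[i])
print(areas[10], areas[edge])   # 7 4
```

In the plateau interior every mask element matches the nucleus (area 7); at the step nearly half of them fail the similarity test, so the USAN area drops to its minimum there.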
The discussion above represents a generalized overview and sampling of existing
edge detection techniques. Others have conducted far more comprehensive reviews
(for example Noble [64]), and it is not intended to repeat such a review here. The
main purpose of this overview is to point out that almost all existing edge detec-
tors are based on the calculation of intensity gradients or some other measure of
the spatial variation of intensity across the image. These measures are dimensional
quantities and hence depend on image contrast and spatial magnification. Thus
the fundamental problem is that one does not know in advance what level of edge
strength corresponds to a significant feature. As a result, edge thresholds are gener-
ally set by humans viewing the output and adjusting the threshold until the result
is deemed acceptable. This is not automated feature detection.
2.3 Local energy and phase congruency
The local energy model of feature perception is a relatively new model. It is not
based on the use of local intensity gradients for feature detection. Instead it postu-
lates that features are perceived at points in an image where the Fourier components
are maximally in phase5. For example, when one looks at the Fourier series that
makes up a square wave all the Fourier components are sine waves that are exactly
in phase at the point of the step at an angle of 0 or 180 degrees depending on
whether the step is upward or downward. At all other points in the square wave
individual phase values vary, making phase congruency low. Similarly one finds that
phase congruency is a maximum at the peaks of a triangular wave (at an angle of
90 or 270 degrees). A particularly important point about using phase congruency
to mark features of interest is that one is not making any assumption about the
It should be emphasized that when phase is referred to in this thesis it is local phase that is being considered. That is, we are concerned with the local phase of the signal at some position x. This is distinct from the phase values that one might obtain, say, from an FFT of a signal, in which the phase values are the phase offsets of each of the sinusoidal basis functions in the decomposition.
shape of the waveform at all. One is simply looking for points in the image where
there is a high degree of order in the Fourier domain.
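The square-wave claim can be checked directly from its Fourier series (a present-day pure-Python sketch, illustrative only). Congruency is measured here as the length of the amplitude-weighted phasor sum divided by the sum of the amplitudes, which equals 1 exactly when all components share the same local phase:

```python
import math, cmath

def congruency(amps, phases):
    # Length of the amplitude-weighted phasor sum over the sum of
    # amplitudes: 1 when all phases coincide, smaller otherwise.
    return abs(sum(a * cmath.exp(1j * p)
                   for a, p in zip(amps, phases))) / sum(amps)

# Square-wave Fourier series: sum over odd n of sin(nx)/n.
harmonics = (1, 3, 5, 7, 9, 11)
amps = [1.0 / n for n in harmonics]

def phases_at(x):
    # Local phase of each component sin(nx) at position x.
    return [(n * x) % (2 * math.pi) for n in harmonics]

at_step = congruency(amps, phases_at(0.0))    # all components cross together
elsewhere = congruency(amps, phases_at(1.0))
print(at_step, elsewhere)
```

At the step (x = 0) every harmonic is at phase 0, so the measure is exactly 1; away from the step the phases disperse and the measure falls well below 1.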
Figure 2: Construction of square and triangular waveforms from their Fourier series. In both diagrams the first few terms of the respective Fourier series are plotted with broken lines; the sum of these terms is the solid line. Notice how the Fourier components are all in phase at the point of the step in the square wave, and at the peaks and troughs of the triangular wave.
A wide range of feature types give rise to points of high phase congruency.
These include step edges, line and roof edges, and Mach bands. It was, in fact,
investigations into the phenomenon of Mach bands by Morrone et al. [62] that led
to the development of the local energy model. Mach bands are illusory bright and
dark bands that appear on the edges of trapezoidal intensity gradient ramps, for
example, on the edges of shadows. The classical explanation for the perception of
Mach bands has been lateral inhibition (see Ratliff [74]). However, this explanation
fails in that it predicts maximal perception of Mach bands on step edges, where
in fact we see none. In their paper, Morrone et al. show that at the points where
we perceive Mach bands the Fourier components of the signal are maximally in
phase (though not exactly in phase); this led to their hypothesis that we perceive
features in images at points of high phase congruency. Further work by Morrone and
Burr [60] and Ross et al. [80] went on to show that this model successfully explains a
number of other psychophysical effects in human feature perception. Other studies
of the sensitivity of the human visual system to phase information include that by
Burr [8], Field and Nachmias [21] and du Buf [18]. Fleet [25] argues strongly for
the use of phase information in the calculation of image velocities. He shows that
the motion of contours of constant phase in images provides a better measure of
the motion field than contours of constant intensity amplitude in the image. Phase
information is more robust to noise, and shading and contrast variations in the
image.
The classic demonstration of the importance of phase was devised by Oppenheim
and Lim [65]. They took the Fourier transforms of two images and used the phase
information from one image and the magnitude information of the other to construct
a new, synthetic Fourier transform which was then back-transformed to produce a
new image. The features seen in such an image, while somewhat scrambled, clearly
correspond to those in the image from which the phase data was obtained. Little
evidence, if any, from the other image can be perceived. A demonstration of this is
repeated here in Figure 3.
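The experiment is easy to repeat in one dimension with a small DFT (a present-day pure-Python sketch, not Oppenheim and Lim's own procedure). Here one signal contributes only its Fourier magnitudes and the other only its phases, and the feature survives at the position dictated by the phase donor:

```python
import cmath, random

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

random.seed(1)
N = 32
feature = [0.0] * N
feature[10] = 1.0                               # lone feature at n = 10
other = [random.random() for _ in range(N)]     # unrelated signal

F, O = dft(feature), dft(other)
# Magnitude taken from `other`, phase taken from `feature`.
hybrid = idft([abs(o) * cmath.exp(1j * cmath.phase(f))
               for f, o in zip(F, O)])
peak = max(range(N), key=lambda n: abs(hybrid[n]))
print(peak)
```

Whatever magnitudes the other signal supplies, the aligned phases force the reconstruction to peak where the phase-donor's feature was: the structure follows the phase, not the magnitude.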
With phase data demonstrated as being so important in the perception of images
it is natural that one should pursue the development of a feature detector that
operates on the basis of phase information. From their work on Mach bands Morrone
and Owens [61] quickly recognized that the local energy model had applications in
feature detection for computer vision.
2.3.1 Defining phase congruency
We shall first consider one dimensional signals. The phase congruency function is
developed from the Fourier series expansion of a signal, I, at some location, x:

    I(x) = Σ_n A_n cos(nωx + φ_n0)                                 (1)
         = Σ_n A_n cos(φ_n(x)) ,                                   (2)

where A_n represents the amplitude of the nth cosine component, ω is a constant
(usually 2π), and φ_n0 is the phase offset of the nth component (the phase offset
also allows sine terms in the series to be represented). The function φ_n(x)
represents the local phase of the Fourier component at position x.
Morrone and Owens define the phase congruency function as

    PC(x) = max_{φ̄(x) ∈ [0, 2π]}  [ Σ_n A_n cos(φ_n(x) − φ̄(x)) ] / [ Σ_n A_n ] .    (3)
(a) image providing magnitude data (b) image providing phase data
(c) phase and amplitude mixed image
Figure 3: When phase information from one image is combined with magnitude information of another, it is the phase information that prevails.
The value of φ̄(x) that maximizes Equation 3 is the amplitude weighted mean local
phase angle of all the Fourier terms at the point being considered. Taking the cosine
of the difference between the actual phase angle of a frequency component and this
weighted mean, φ̄(x), generates a quantity approximately equal to one minus half
this difference squared (the Taylor expansion of cos(x) ≈ 1 − x²/2 for small x).
Thus finding where phase congruency is a maximum is approximately equivalent to
finding where the weighted variance of local phase angles, relative to the weighted
average local phase, is a minimum (see Figure 4).
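The claim that the maximizing φ̄(x) is the amplitude weighted mean phase can be verified numerically: a brute-force search over φ̄ in Equation 3 agrees with the closed form |Σ_n A_n e^{iφ_n}| / Σ_n A_n, since the maximizing angle is the direction of the amplitude-weighted phasor sum. (A present-day pure-Python sketch; the amplitudes and phases below are illustrative values, not taken from the thesis.)

```python
import math, cmath

A = [1.0, 0.5, 0.4]          # illustrative component amplitudes
phi = [0.30, 0.42, 0.25]     # illustrative local phases at one position x

def pc_grid(amps, phases, steps=200000):
    # Brute-force the maximization over phi_bar in Equation 3.
    best = max(sum(a * math.cos(p - pb) for a, p in zip(amps, phases))
               for pb in (2 * math.pi * t / steps for t in range(steps)))
    return best / sum(amps)

def pc_closed(amps, phases):
    # The maximizing phi_bar is the angle of the amplitude-weighted phasor
    # sum, so PC = |sum A_n e^{i phi_n}| / sum A_n.
    return abs(sum(a * cmath.exp(1j * p)
                   for a, p in zip(amps, phases))) / sum(amps)

g, c = pc_grid(A, phi), pc_closed(A, phi)
print(g, c)
```

Both evaluations agree to within the resolution of the grid search, and both lie between 0 and 1 as a dimensionless congruency measure must.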
Figure 4: Polar diagram of the components of a Fourier series at a point in a signal. The series is represented as a sequence of vectors, each vector having a length A_n and a local phase angle φ_n(x); the direction of the weighted mean of the Fourier components is the amplitude weighted mean phase angle φ̄(x).
2.3.2 Local energy
As it stands phase congruency is a rather awkward quantity to calculate. As an
alternative, Venkatesh and Owens [89] show that points of maximum phase
congruency can be calculated equivalently by searching for peaks in the local energy
function. The local energy function is defined for a one dimensional luminance
profile, I(x), as the modulus of a complex number,

    E(x) = √( I²(x) + H²(x) ) ,                                    (4)

where the real component is represented by I(x) and the imaginary component by
iH(x), where i = √−1 and H(x) is the Hilbert transform of I(x) (a 90 degree phase
shift of I(x)).
Venkatesh and Owens prove that energy is equal to phase congruency scaled by
the sum of the Fourier amplitudes, that is,

    E(x) = PC(x) Σ_n A_n .                                         (5)
Thus the local energy function is directly proportional to the phase congruency
function, so peaks in local energy will correspond to peaks in phase congruency.
Venkatesh and Owens' formal proof is not repeated here, but the relationship
between phase congruency, energy and the sum of the Fourier amplitudes can be
seen geometrically in Figure 5. The local Fourier components are plotted as com-
plex vectors adding head to tail. The sum of these components projected onto
the real axis represents I(x), the original signal, and the projection onto the
imaginary axis represents H(x), the Hilbert transform. The magnitude of the vector
from the origin to the end point is the total energy, E(x). One can see that E(x)
is equal to Σ_n A_n cos(φ_n(x) − φ̄(x)). Recalling that phase congruency is equal to
Σ_n A_n cos(φ_n(x) − φ̄(x)) / Σ_n A_n, we can see that phase congruency is the ratio
of E(x) to the overall path length taken by the local Fourier components in
reaching the end point. Thus, one can clearly see that the degree of phase congruency
is independent of the overall magnitude of the signal. This provides invariance to
variations in image illumination and/or contrast.
Figure 5: Polar diagram showing the Fourier components at a location in the signal plotted head to tail. This arrangement illustrates the construction of energy, the sum of the Fourier amplitudes, and phase congruency from the Fourier components of a signal. (In the diagram the resultant vector from the origin has length E(x) and angle φ̄(x); its projections onto the real and imaginary axes are I(x) and H(x), and each component vector has length A_n and phase angle φ_n(x).)
Rather than compute local energy via the Hilbert transform of the original
luminance profile, one can calculate a measure of local energy by convolving the
signal with a pair of filters in quadrature. The signal is first convolved with a filter
designed to remove the DC component from the image. This result is saved and
the image is then convolved with a second filter that is in quadrature with the first
(the Hilbert transform of the first). This gives us two signals, each being a band
passed version of the original, and one being a 90 degree phase shift of the other.
The results of the two convolutions are then squared and summed to produce a
local energy function. Odd and even-symmetric Gabor functions can be used for
the quadrature pair of filters. Thus local energy is defined by

    E(x) = √( (I(x) ∗ M_e)² + (I(x) ∗ M_o)² ) ,                    (6)

where M_e and M_o denote the even and odd symmetric filters in quadrature. Figure 6
illustrates the calculation of local energy on a synthetic signal containing a variety
of features.
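Equation 6 can be exercised on a synthetic step (a present-day pure-Python sketch; the Gabor parameters below are illustrative choices, not those used in the thesis). The even filter has its DC component removed, as the text requires, so that neither filter responds to uniform regions:

```python
import math

def gabor_pair(sigma, freq, radius):
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2 * sigma * sigma)) for x in xs]
    even = [gv * math.cos(2 * math.pi * freq * x) for gv, x in zip(g, xs)]
    odd = [gv * math.sin(2 * math.pi * freq * x) for gv, x in zip(g, xs)]
    # Remove the DC component from the even filter.
    bias = sum(even) / sum(g)
    even = [e - gv * bias for e, gv in zip(even, g)]
    return even, odd

def conv(s, k):
    r = len(k) // 2
    n = len(s)
    return [sum(k[j] * s[min(max(i + j - r, 0), n - 1)] for j in range(len(k)))
            for i in range(n)]

signal = [0.0] * 32 + [1.0] * 32                  # step at x = 31/32
Me, Mo = gabor_pair(sigma=6.0, freq=1 / 12, radius=18)
e, o = conv(signal, Me), conv(signal, Mo)
energy = [math.sqrt(a * a + b * b) for a, b in zip(e, o)]   # Equation 6
peak = max(range(5, len(signal) - 5), key=lambda i: energy[i])
print(peak)
```

The energy profile peaks at the step: the odd filter carries the response there while the even filter is near zero, and their squared sum gives a single, well-localized maximum.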
The calculation of energy from spatial filters in quadrature pairs has been cen-
tral to many models of human visual perception, for example those proposed by
Heeger [33, 34, 36], Adelson and Bergen [1] and Watson and Ahumada [93] to name
just a few. The significance of Venkatesh and Owens' work is that they provide
another explanation for the perceptual importance of energy: Peaks in the energy
function correspond to points where phase congruency is a maximum.
From this early work by Morrone et al. [62], Morrone and Owens [61] and
Venkatesh and Owens [89] the local energy model was developed further. Owens
et al. [67] investigated the idempotency properties of the local energy feature de-
tector. They argue that when any feature detecting operator is applied to its own
output it should not change the output. That is, the primal sketch of a primal
sketch should be itself. Gradient based detectors fail in this respect because they
attempt to mark edges on each side of any line feature in an image. Local energy,
on the other hand, produces a single response on a line feature, and hence satisfies
the idempotency requirement. Venkatesh and Owens [88] investigated the classifi-
cation of image features via the phase angle at which phase congruency occurs. In
this manner they show how step, line and shadow edges can be distinguished from
each other.
Figure 6: Calculation of local energy via convolution with two filters in quadrature. (Panels: the signal together with the odd- and even-symmetric filters; the convolution with the even filter; the convolution with the odd filter; and the two results squared and summed to give local energy.)
Aw et al. [4], in their work on image compression, make use of the fact that
local energy makes no assumptions about the intensity profiles of features. They
used local energy to detect features across a range of images, collecting information
about commonly occurring intensity profiles of features in images. This catalogue of
feature profiles enabled them to efficiently encode images for compression.
Owens [66] identifies the conditions under which images have no local maxima
in local energy, and hence are feature free. She also investigates image transforma-
tions under which image features are preserved. It is pointed out that some image
operations, such as addition between images, can destroy or create image features.
She proposes two new operators for the interaction between images which do not
corrupt feature structures within images. These operators are analogous to com-
plex multiplication and complex division. Using these operators Owens shows how
it is possible to decompose a signal into its feature component and its feature-free
component.
Other researchers who have studied the use of local energy for feature detection
are Perona and Malik [68], Freeman [29] and Ronse [78]. Perona and Malik's work
on local energy is interesting in that they arrive at a generalization of the model
without using the concept of phase congruency. They point out that image features
are generally composed of combinations of step, delta, roof and ramp structures.
Under these conditions it is shown that linear filters will produce systematic er-
rors in localization. Perona and Malik go on to show that a quadratic filtering
approach results in the correct detection and localization of composite features.
That is, instead of looking for maxima in (I(x) ∗ M) one should look for maxima
in Σ_i (I(x) ∗ M_i)², where the M_i are a series of different filters. The local energy
model, in its use of two filters in quadrature, can be seen to be a specific case of
quadratic filtering. Perona and Malik suggest that there is no special reason to
use filters in quadrature and argue that one might wish to use quite different sets
of filters. However, in the results they presented they chose to use two filters in
quadrature; the second derivative of a Gaussian and its Hilbert transform.
Freeman, in his thesis [29], studied the local energy model with particular
emphasis on multi-orientation analysis and the behaviour of local energy at feature
junctions. He devised an approach to the detection and classification of feature
junctions. The filters he used were generally second and fourth derivatives of Gaus-
sians along with their corresponding Hilbert transforms, depending on the narrow-
ness of the frequency tuning he required. As a tool for his multi-orientation analysis
Freeman developed the concept of steerable filters whereby filter outputs at any ori-
entation can be efficiently computed from a linear combination of the outputs of a
limited number of basis filters. Of relevance to the work presented in this thesis,
Freeman developed a normalized measure of local energy. However, his motivation
for doing this was primarily to allow image information to be represented over a
small dynamic range rather than to specifically seek an invariant measure of feature
significance. Some of his post-processing techniques might also be considered to be
somewhat ad hoc. Despite this he considers a wide range of issues concerning the
use of local energy for feature detection.
Ronse [78] makes a detailed mathematical study of the idempotency properties
of the local energy model and the conditions of image modification over which local
energy remains invariant. An important result, that will be used later, is that
the locations of local energy peaks are invariant to smoothing of the image by a
Gaussian or any other function having zero Fourier phase.
Rosenthaler et al. [79] make a comprehensive study of the behaviour of local
energy at 2D image feature points. They develop a model of 2D feature detection
based on differential geometry, using the first and second derivatives of oriented local
energy to identify what they call keypoints. Robbins and Owens [76] have followed
on from Rosenthaler et al.'s work and developed a simpler model of 2D feature
detection that does not resort to the use of derivatives of the local energy signal.
Instead, they detect 2D features by calculating oriented local energy over the image
and then calculate local energy of this local energy image, but in an orientation
perpendicular to the first. The second application of local energy detects the end
points of any features detected by the first application of local energy. This process
is then repeated over multiple orientations to capture all 2D features.
Wang and Jenkin [92] use complex Gabor filters to detect edges and bars in
images. They recognize that step edges and bars have specific local phase properties
which can be detected using filters in quadrature; however, they do not connect the
significance of high local energy with the concept of phase congruency.
One issue that previous work on local energy has not really addressed is the
problem of how one should integrate data over many scales. If the perceptual
significance of a peak in local energy is due to it also being a maximum in phase
congruency then it is important to consider many scales simultaneously. After all,
it is the occurrence of phase congruency over a range of frequencies that makes it
significant.
While the use of the local energy function to find peaks in phase congruency is
computationally convenient, it does not provide a dimensionless measure of feature
significance as it is weighted by the sum of the Fourier component amplitudes,
which have units lux. Thus, like derivative based feature detectors, local energy
suffers from the problem that we are unable to specify in advance what level of
response corresponds to a significant feature. Despite this, local energy remains a
useful measure in that it responds to a wide range of feature types.
Phase congruency, on the other hand, is a dimensionless quantity. We obtain it
by normalizing the local energy function: dividing energy by the sum of the Fourier
amplitudes. Values of phase congruency vary from a maximum of 1, indicating a
very significant feature, down to 0 indicating no significance. This property offers
the promise of allowing one to specify universal feature thresholds, that is, we could
set thresholds before an image is seen: truly automated feature detection.
2.4 Issues in calculating phase congruency
This section describes an initial attempt at devising a way of calculating phase
congruency. What is highlighted is that there are a number of difficulties that
have to be overcome if a practical method of calculating phase congruency is to be
devised. These problems include the following: How should one extend the idea of
phase congruency to 2D signals? What is the appropriate way of controlling the
scale of analysis? How should information be integrated over many scales, and how
can the influence of noise be overcome?
As mentioned earlier, phase congruency is awkward to calculate. An initial
approach to calculating phase congruency might be to take a signal, remove its
DC component (removed because a 90 degree phase shift of a zero-frequency
component has no meaning), calculate the Hilbert transform (say, by calculating
the Fourier transform, multiplying the result by i and then performing an inverse
Fourier transform), square and sum the Hilbert transform and the AC component
of the signal, and finally normalize the result by dividing by the sum of the Fourier
amplitudes. Results using this method were reported by Kovesi [47] (further work
in which wavelets are used to calculate phase congruency was also presented by
Kovesi [48]). An example of the calculation of phase congruency via the FFT is
shown in Figure 7.
[Figure: four panels over sample positions 0 to 250: "signal", "signal (DC removed)", "Hilbert transform" and "phase congruency", annotated "square, sum and divide by sum of Fourier amplitudes".]
Figure 7: Calculation of phase congruency via the FFT. Notice how phase congruency values range between 0 and 1.
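In outline, the procedure can be sketched with NumPy. This is an illustrative reconstruction rather than the thesis code: the function name is mine, the square root is taken so that the normalized ratio peaks at 1, and the Nyquist bin is ignored for simplicity.

```python
import numpy as np

def phase_congruency_fft(signal):
    """Phase congruency of a 1-D signal via the FFT (whole-signal version).

    Sketch of the procedure in the text: remove the DC component, form the
    Hilbert transform via the FFT, take sqrt(f^2 + h^2) as local energy, and
    normalize by the sum of the Fourier component amplitudes.
    """
    s = np.asarray(signal, dtype=float)
    s = s - s.mean()                       # remove the DC component
    N = len(s)
    S = np.fft.fft(s)
    # Hilbert transform: -90 degree phase shift of the positive frequencies
    mult = np.zeros(N, dtype=complex)
    mult[1:N // 2] = -1j
    mult[N // 2 + 1:] = 1j
    h = np.real(np.fft.ifft(S * mult))
    energy = np.sqrt(s**2 + h**2)          # local energy
    # sum of one-sided Fourier component amplitudes (Nyquist bin ignored)
    amp_sum = 2.0 * np.abs(S[1:N // 2]).sum() / N
    return energy / max(amp_sum, 1e-12)    # guard against an empty spectrum
```

Applied to a pure sine wave this returns 1 everywhere, a degenerate case discussed at the end of this section.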
There are some problems with the calculation of phase congruency via the FFT.
Firstly, it is not clear how one adapts this approach for one-dimensional signals to
two dimensions; the Hilbert transform is only defined in one dimension. A second
difficulty is that the Fourier transform is not good for localizing frequency
information spatially. In the example shown in Figure 7 the Fourier transform
was calculated over the whole signal. Thus phase congruency at each point was
calculated with respect to the whole signal. To control the local scale and spa-
tial extent over which phase congruency is determined we have to use windowing
of the signal. Windowing introduces the problem of having to balance spatial
localization against the range of frequencies we wish to analyze: the window width
controls spatial localization but also constrains the lowest frequency we can
measure. Figure 8 shows the result of calculating phase congruency using a
rectangular windowing function 32 points wide. The computational procedure was as
follows. Over each windowed section of the signal the Fourier transform was
calculated and the Hilbert transform generated. The signal value (minus the DC
value) and the Hilbert transform value at the centre of the window were then
squared and summed; this quantity was then divided by the sum of the Fourier
amplitudes over the current window to produce a phase congruency value at the
centre position of the window. The window was then indexed one point forward in
the signal and the process repeated. Notice how the peaks in phase congruency are
higher and more distinct. By windowing the signal, each feature is considered in
relative isolation from the others and hence ends up being judged very significant.
An important point to note here is that for the calculation of phase congruency the
natural scale parameter to vary is the size of the analysis window over which we
calculate local frequency information. A large window means that the significance
of features are determined in a more global manner, and a small window results
in features being treated individually and locally. This leads to a new concept of
multi-scale analysis which will be discussed in detail in the next chapter.
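The sliding-window procedure above can be sketched in the same way. Again an illustrative reconstruction, not the thesis code (function and parameter names are mine; rectangular window; square root taken so the ratio peaks at 1; Nyquist bin ignored):

```python
import numpy as np

def windowed_phase_congruency(signal, window=32):
    """Sliding rectangular-window phase congruency of a 1-D signal.

    For each window position: remove the local DC value, form the Hilbert
    transform over the window, combine the centre values into local energy,
    and divide by the sum of the Fourier amplitudes over that window.
    """
    s = np.asarray(signal, dtype=float)
    N, half = len(s), window // 2
    # Hilbert-transform multiplier for one window (fixed for all positions)
    mult = np.zeros(window, dtype=complex)
    mult[1:half] = -1j
    mult[half + 1:] = 1j
    pc = np.zeros(N)
    for c in range(half, N - half):
        seg = s[c - half:c + half]
        seg = seg - seg.mean()                     # remove local DC
        S = np.fft.fft(seg)
        h = np.real(np.fft.ifft(S * mult))
        energy = np.hypot(seg[half], h[half])      # value at window centre
        amp_sum = 2.0 * np.abs(S[1:half]).sum() / window
        pc[c] = energy / max(amp_sum, 1e-12)
    return pc
```

Varying `window` is exactly the scale parameter discussed above: a small window judges each feature against its immediate surroundings only.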
If the scale of analysis of phase congruency is controlled by window size we must
consider what might happen when a windowed section of signal contains no features
and only consists of noise. Being a normalized quantity, phase congruency does not
depend on the magnitude of a feature on its own; it depends on the magnitude of
the feature in the context of the local window. Thus, if the signal is purely noise,
each fluctuation will be considered quite significant relative to the surrounding
fluctuations, as they will all be of similar magnitude. Hence, noise poses
[Figure: four panels over sample positions 0 to 250: "signal", "signal (DC removed)", "Hilbert transform" and "phase congruency", annotated "square, sum and divide by sum of Fourier amplitudes".]
Figure 8: Calculation of phase congruency via the FFT using a rectangular windowing function 32 points wide.
a serious difficulty for us in trying to devise a practical way of calculating phase
congruency in images. Figure 9 shows what happens if we introduce a small amount
of noise into our signal. In regions that are distant from features the influence of
noise becomes very noticeable.
A further issue we must consider is that phase congruency as defined in
Equation 3 does not take into account the spread of frequencies that are congruent
at a point. For example, a signal containing only one frequency component, say a
sine wave, will be in perfect congruence with itself and hence have phase congruency
of 1 everywhere (the Hilbert transform of sine is cosine, and sin^2(x) + cos^2(x) is
identically 1, so no point x stands out as a maximum of local energy). To mark all
such points as features would not make sense. Significant feature points are presumably ones
[Figure: four panels over sample positions 0 to 250: "noisy signal", "noisy signal (DC removed)", "Hilbert transform" and "phase congruency", annotated "square, sum and divide by sum of Fourier amplitudes".]
Figure 9: Phase congruency of a noisy signal calculated using a rectangular windowing function 32 points wide.
with high information content; a point of phase congruency indicates a point of
high information content only if we have a wide range of frequencies present. We
do not gain much information from knowing the phase congruency of a signal which
has only one frequency component.
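The single-frequency case can be made explicit. With a DC-free sinusoid as input, the local energy equals the single Fourier amplitude at every point, so phase congruency is 1 everywhere:

```latex
f(x) = A\sin(\omega x), \qquad h(x) = A\cos(\omega x), \\[4pt]
E(x) = \sqrt{f(x)^2 + h(x)^2}
     = A\sqrt{\sin^2(\omega x) + \cos^2(\omega x)} = A, \\[4pt]
PC(x) = \frac{E(x)}{\sum_n A_n} = \frac{A}{A} = 1 \quad \text{for all } x.
```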
2.5 Summary
This chapter has briefly re-examined the aims of feature detection. The objective
should be to find points of high information content in images. This objective is not
necessarily satisfied by finding optimal ways of detecting step edges in the presence
of noise. Ideally, the ability to detect features and assess their significance should
be independent of image contrast and spatial magnification. This implies that we
need to measure feature significance via a dimensionless quantity.
The shortcomings of derivative based feature detectors have been briefly
reviewed. The main problems included an inability to specify in advance what level
of response corresponds to a significant feature, and the fact that they are generally
designed only to detect step edges. The local energy model of feature perception
has been introduced. This model was inspired by psychophysical data and it
detects a wide range of feature types. Local energy can be normalized to produce a
measure of phase congruency, an approximation of the standard deviation of the phase
angles of the Fourier components at a point in the signal. Phase congruency is
a dimensionless quantity and is thus an attractive way of detecting features and
identifying their significance. It provides an absolute measure of the significance of
feature points in an image, and this offers the promise of allowing constant
threshold values to be applied across wide classes of images. Thresholds could then be
specified in advance of processing any image, and not have to be determined by
trial and error after processing.
However, there are a number of issues to be addressed. How should phase
congruency be calculated in 2D images? How should we calculate local frequency
information and control the scale of analysis? How do we deal with the influence of
noise, and how do we identify the range of frequencies present at a point of phase
congruency? These issues, and others, are addressed in the following chapter which
will describe how phase congruency can be calculated using wavelets.
Chapter 3
Phase congruency from wavelets
3.1 Introduction
This chapter describes a new way of calculating phase congruency using wavelets. In
calculating phase congruency it is important to obtain spatially localized frequency
information in images, and wavelets offer perhaps the best way of doing this. The use
of wavelets is also biologically inspired: the interest in calculating phase congruency
is motivated by psychophysical results, and hence it seems natural that one
should try to calculate it using biologically plausible computational machinery. In
this respect geometrically scaled spatial filters in quadrature pairs will be used. In
addition to this it will be seen how the use of wavelets allows one to address the
issues raised at the end of the previous chapter regarding the calculation of phase
congruency.
The material that will be covered in this chapter is organized as follows: First,
it will be shown how local frequency information can be obtained using quadrature
pairs of wavelets, concentrating in particular on the use of Gabor wavelets. From
this it is relatively straightforward to develop the ideas behind the calculation of
phase congruency in one dimensional signals using wavelets. Material is then
presented to address the difficulties regarding the calculation of phase congruency
that were introduced in the previous chapter. The influence of noise on the
calculation of phase congruency is considered first, and an effective method for
identifying and compensating for noise is developed. This is followed by a section covering the
issues involved in extending the calculation of phase congruency to 2D images. It
is then shown how the use of wavelets allows us to obtain a measure of the spread
of frequencies present at a point of phase congruency. This helps one determine
the degree of significance of a point of phase congruency and allows feature
localization to be improved. The issue of analysis at different scales is then considered,
and it is argued that high-pass filtering should be used to obtain image information at
different scales instead of the more usual low-pass filtering. Finally, some
results and conclusions are presented.
3.2 Using Wavelets for Local Frequency Analysis
Recently the Wavelet Transform has become one of the methods of choice for
obtaining local frequency information. Most of the current literature on wavelets
can be traced back to the work of Morlet et al. [59]. Morlet and his co-workers
were interested in obtaining temporally localized frequency data in their analysis
of geophysical signals. The basic idea behind wavelet analysis is that one uses a
bank of filters to analyze the signal. The filters are all created from rescalings of
a single wave shape, each scaling designed to pick out particular frequencies of the
signal being analyzed. An important feature is that the scales of the filters vary
geometrically, giving rise to a logarithmic frequency scale.
However, many of these ideas were developed earlier by Granlund [30]. In this
remarkable paper he developed many of the ideas behind what we would now call
multi-scale wavelet analysis. He also proposed an image feature detector that is
closely related to the local energy model. For some reason Granlund's paper has
remained relatively unnoticed despite its innovative nature, though his work has
been developed by Wilson, Calway and Knutsson (see, for example, Wilson, Calway
and Granlund [96], Knutsson, Wilson and Granlund [43], Calway and Wilson [10]
and Calway, Knutsson and Wilson [9]). From the initial work of Morlet
and his colleagues wavelet theory has been subsequently developed by Grossmann
and Morlet [31], Meyer [58], Daubechies [14], Mallat [53] and many others.
We are interested in calculating local frequency and, in particular, phase infor-
mation in signals. To preserve phase information linear phase filters must be used,
that is, we must use wavelets that are symmetric/anti-symmetric. This constraint
means that the work on orthogonal wavelets (which dominates much of the litera-
ture) is not applicable to us. Chui [13] provides a proof that, with the exception
of the Haar wavelet, one cannot have a wavelet of compact support that is both
symmetric and orthogonal. The Haar wavelet is rectangular in shape and is clearly
not appropriate for our needs.
For this work the approach of Morlet will be followed, that is, using wavelets
based on complex valued Gabor functions: sine and cosine waves, each modulated
by a Gaussian. Using two filters in quadrature enables one to calculate the
amplitude and phase of the signal for a particular frequency at a given spatial location. It
should be noted that these wavelets are not orthogonal; some conditions must apply
in order to achieve reasonable signal reconstruction after decomposition. However,
we only require approximate reconstruction up to a scale factor over a band of
frequencies or wavelet scales.
[Figure: panels showing the odd and even wavelets in the spatial domain.]
Figure 10: Gabor wavelet: a sine and cosine wave modulated by a Gaussian.
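One way to construct such a quadrature pair discretely is sketched below. This is illustrative only: the envelope parameter `k` and the truncation at three standard deviations are my choices, not values from the text.

```python
import numpy as np

def gabor_pair(wavelength, k=3.0):
    """Even (cosine) and odd (sine) Gabor wavelets in quadrature.

    `wavelength` is the period of the modulated wave in samples; `k` sets
    how many wavelengths the Gaussian envelope spans before truncation.
    """
    n = int(k * wavelength)
    x = np.arange(-n, n + 1)                # odd length, centred on zero
    sigma = k * wavelength / 3.0            # envelope truncated at +-3 sigma
    g = np.exp(-x**2 / (2 * sigma**2))
    even = g * np.cos(2 * np.pi * x / wavelength)   # symmetric
    odd = g * np.sin(2 * np.pi * x / wavelength)    # anti-symmetric
    return even, odd
```

The even filter is symmetric and the odd anti-symmetric, so the pair has linear phase, as the preceding discussion requires.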
If the bank of wavelet filters is designed so that the transfer function of each
filter overlaps sufficiently with its neighbours, making the sum of all
the transfer functions a relatively uniform coverage of the spectrum, one can
reconstruct the decomposed signal over a band of frequencies up to a scale factor.
(If the transfer functions are scaled so that when their sum is taken we obtain a
uniform transfer function of magnitude one, the reconstructed signal will have the
original scale.) Therefore, a problem we have is determining the appropriate scaling
factor between successive centre frequencies so that the overlap between transfer
functions results in an even spectral coverage. Granlund [30] suggests that the upper
cutoff frequency of one transfer function (where it falls to half its maximum value)
should coincide with the lower cutoff frequency of the next function. However, in
practice this does not produce particularly even coverage, and a closer spacing is
generally desirable. In the results presented in this chapter the filters used have
had bandwidths of approximately one octave with a scaling between successive
centre frequencies of 1.5. This arrangement was arrived at by experimentation; the
values are not critical, and a wide range of parameters produces satisfactory results.
Referring to Figure 11 one can see that, in this example, the sum of the spectra of
the five wavelets produces a relatively ideal band-pass filter, especially when viewed
on the log frequency scale. Design of the wavelet bank ends up being a compromise
between forming a smooth sum of spectra and, at the same time, minimizing
the number of filters used so as to limit the computational requirements.
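The spacing rule can be illustrated by constructing the bank's transfer functions directly. A minimal sketch assuming Gaussian transfer functions on a log-frequency axis (one common choice; the text does not prescribe this particular shape), with one-octave bandwidths and a scaling of 1.5 between centre frequencies:

```python
import numpy as np

def bank_spectra(n_filters=5, f0=0.05, scaling=1.5, n_freq=1024):
    """Transfer functions of a wavelet filter bank on a log-frequency axis.

    Each filter is a Gaussian in log frequency with half-power points one
    octave apart; centre frequencies grow geometrically by `scaling`.
    """
    freqs = np.linspace(1e-5, 0.5, n_freq)
    # one-octave bandwidth: half power at fc * 2^(+-1/2), so on the log
    # axis the Gaussian falls to 0.5 at a distance of 0.5*ln(2)
    sigma = (0.5 * np.log(2)) / np.sqrt(2 * np.log(2))
    centres = f0 * scaling ** np.arange(n_filters)
    bank = np.array([np.exp(-np.log(freqs / fc) ** 2 / (2 * sigma ** 2))
                     for fc in centres])
    return freqs, centres, bank
```

Summing the rows of `bank` shows the spectral coverage; with this spacing the sum is relatively flat between the lowest and highest centre frequencies, as in Figure 11.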
Analysis of a signal is done by convolving the signal with each of the wavelets.
If we let $I$ denote the signal and $M_n^e$ and $M_n^o$ denote the even and odd wavelets at
a scale $n$, the amplitude of the transform at a given wavelet scale is given by

$$A_n(x) = \sqrt{(I(x) * M_n^e)^2 + (I(x) * M_n^o)^2} \qquad (7)$$

and the phase is given by

$$\phi_n(x) = \mathrm{atan2}(I(x) * M_n^e,\ I(x) * M_n^o). \qquad (8)$$
Note that from now on n will be used to refer to wavelet scale (previously n has
denoted frequency in the Fourier series of a signal).
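Equations (7) and (8) translate directly into code. A sketch (the function name is mine; the quadrature pair is passed in, e.g. a Gabor pair, and the atan2 argument order of Equation (8) is kept as written):

```python
import numpy as np

def wavelet_amplitude_phase(signal, even, odd):
    """Amplitude A_n(x) and phase phi_n(x) at one wavelet scale (Eqs 7, 8)."""
    e = np.convolve(signal, even, mode='same')   # I(x) * M_n^e
    o = np.convolve(signal, odd, mode='same')    # I(x) * M_n^o
    return np.sqrt(e**2 + o**2), np.arctan2(e, o)
```

For a sinusoid at the filter's centre frequency the amplitude is constant away from the signal ends, while the phase rotates through one cycle per wavelength.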
The results of convolving a signal with a bank of wavelets can be displayed
graphically via a scalogram (Figure 12). Each row of the scalogram is the result of
convolving the signal with a quadrature pair of wavelets at a certain scale. Phase
is plotted by mapping 0 to 360 degrees onto the grey levels 0 to 255 (note, therefore,
that the black/white discontinuities in the scalogram correspond to the wrap-around in
[Figure: spectra of the even-symmetric and odd-symmetric wavelets, and their sums, plotted against frequency (0 to 0.5) and against log frequency.]
Figure 11: Five wavelets and their respective Fourier transforms indicating which sections of the spectrum each wavelet responds to. Collectively the wavelets provide a wide coverage of the spectrum, though with some overlap. Note that on a logarithmic frequency scale the spectra are identical.
phase). The vertical axis of the scalogram is a logarithmic frequency scale, with the
lowest frequency at the bottom. Each column of the scalogram can be considered to
be a local Fourier spectrum for each point. Note that to achieve a dense scalogram
such as the one shown here, the scaling factor between successive filter centre
frequencies will be only slightly greater than 1.
The phase plot of the scalogram is of particular interest because it enables one
to actually see the points of high phase congruency. At locations in the signal where
there are large step changes one can see a vertical line of constant grey value in the
phase diagram indicating a constant phase angle over all frequencies at that point
in the signal.
[Figure: three panels: "Signal to be Analyzed", "Magnitude of Scalogram" and "Phase of Scalogram", with asterisks marking the step transitions.]
Figure 12: A one dimensional signal and its amplitude and phase scalograms. The horizontal axes of the scalograms correspond directly with the signal's horizontal axis. The vertical axes of the scalograms correspond to a logarithmic frequency scale with low frequencies at the bottom. The asterisks mark vertical lines of constant phase that occur at the step transitions in the signal. These are points of phase congruency. (Note: the phase scalogram is presented by mapping 0 to 360 degrees to the grey levels 0 to 255.)
3.3 Calculating Phase Congruency Via Wavelets
To calculate phase congruency we need to construct the fo