1996, kovesi, invariant measures of image features from phase information


INVARIANT MEASURES OF IMAGE FEATURES FROM PHASE INFORMATION

    This thesis is

    presented to the

    Department of Psychology

    for the degree of

    Doctor of Philosophy

    of the

    University of Western Australia

    By

    Peter Kovesi

    May 1996

© Copyright 1996

    by

    Peter Kovesi


Abstract

    Invariant Measures of Image Features From Phase Information

    If reliable and general computer vision techniques are to be developed it is crucial

    that we find ways of characterizing low-level image features with invariant quanti-

    ties. For example, if edge significance could be measured in a way that was invariant

    to image illumination and contrast, higher-level image processing operations could

    be conducted with much greater confidence. However, despite their importance,

    little attention has been paid to the need for invariant quantities in low-level vision

    for tasks such as feature detection or feature matching.

    This thesis develops a number of invariant low-level image measures for feature

    detection, local symmetry/asymmetry detection, and for signal matching. These

    invariant quantities are developed from representations of the image in the frequency

    domain. In particular, phase data is used as the fundamental building block for

    constructing these measures. Phase congruency is developed as an illumination

    and contrast invariant measure of feature significance. This allows edges, lines and

    other features to be detected reliably, and fixed thresholds can be applied over wide

    classes of images. Points of local symmetry and asymmetry in images give rise

    to special arrangements of phase, and these too can be characterized by invariant

    measures. Finally, a new approach to signal matching that uses correlation of

    local phase and amplitude information is developed. This approach allows reliable

    phase based disparity measurements to be made, overcoming many of the difficulties

    associated with scale-space singularities.


Acknowledgements

    First of all I would like to thank my supervisors John Ross and James Trevelyan.

    With their gentle guidance and encouragement, the odd searching question, and

    the occasional nudge, they ensured that progress was always maintained. In each of

    them I have also greatly valued their enormous breadth of knowledge that spanned

    many disciplines. This helped me keep my thoughts open and wide ranging as I

    searched for answers to my problems.

    I must also thank my other supervisor, my wife Robyn Owens, for an uncount-

    able number of technical discussions, for her proof-reading skills, and for always

    being there and making the generation of this thesis far less traumatic than I would

    have dared to hope for. I thank Grace, Genevieve, and later in the generation of this

    thesis, Gabriel for their tolerance and patience while Daddy did his Pee-Aiche-Dee.

    I would also like to acknowledge the many hours of useful discussions I have had

    with Ben Robbins, Chris Pudney, Mike Robins, and Adrian Baddeley. Ben Robbins

    pointed out the efficiencies that can be made in the Fourier convolution of an image

    with a quadrature pair of filters. This must have saved me many hours of waiting

    and allowed me to do many more experiments than I would have done otherwise.

    Others I must thank include the following: Daniel Reisfeld who introduced me

    to the problem of finding local symmetry in images, resulting in many long and

    impassioned discussions on the subject; Concetta Morrone for her amazing grasp

    of both the psychophysics and computer vision literature, and therefore, always

    being able to suggest yet another paper I should read; Carlo Tomasi for his help

    in converting an early version of my phase congruency code from C to a MATLAB

    script; Olivier Faugeras and his colleagues for their hospitality and the fine working

    environment they have developed at INRIA in Sophia Antipolis which I was able

    to enjoy during my visit there in the first half of 1995.

    Finally I thank everyone in The Robotics and Vision Research Group in the

    Department of Computer Science at The University of Western Australia for the

    enjoyable working environment that they contribute to.


Contents

    Abstract iii

    Acknowledgements v

    1 Introduction 1

    1.1 The Need for Invariant Quantities in Images . . . . . . . . . . . . . 1

    1.2 The Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.4 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    2 Image features 9

    2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    2.2 Gradient based feature detection . . . . . . . . . . . . . . . . . . . 10

    2.3 Local energy and phase congruency . . . . . . . . . . . . . . . . . . 17

    2.3.1 Defining phase congruency . . . . . . . . . . . . . . . . . . . 19

    2.3.2 Local energy . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    2.4 Issues in calculating phase congruency . . . . . . . . . . . . . . . . 27

    2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    3 Phase congruency from wavelets 33

    3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.2 Using Wavelets for Local Frequency Analysis . . . . . . . . . . . . . 34

    3.3 Calculating Phase Congruency Via Wavelets . . . . . . . . . . . . . 39

    3.4 Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    3.5 Extension to two dimensions . . . . . . . . . . . . . . . . . . . . . . 49

    3.5.1 2D filter design . . . . . . . . . . . . . . . . . . . . . 49

    3.5.2 Filter orientations . . . . . . . . . . . . . . . . . . . . . . . . 50

    3.5.3 Noise compensation in two dimensions . . . . . . . . . . . . 52

    3.5.4 Combining data over several orientations . . . . . . . . . . . 53

    3.6 The importance of frequency spread . . . . . . . . . . . . . . . . . . 54

    3.7 Scale via high-pass filtering . . . . . . . . . . . . . . . . . . . . . . 61

    3.7.1 Difficulties with low-pass filtering . . . . . . . . . . . . . . . 61

    3.7.2 High-pass filtering . . . . . . . . . . . . . . . . . . . . . . . 63

    3.7.3 High-pass filtering and scale-space . . . . . . . . . . . . . . . 67

    3.8 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 69

    3.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

    4 A second look at phase congruency 77

    4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

    4.2 Log Gabor wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

    4.3 Phase congruency from broad bandwidth filters . . . . . . . . . . . 84

    4.4 Another way of defining phase congruency . . . . . . . . . . . . . . 85

    4.4.1 Calculation of PC2 via quadrature pairs of filters . . . . . . 87

    4.5 A third measure of phase congruency . . . . . . . . . . . . . . . . . 93

    4.5.1 Calculation of PC3 via quadrature pairs of filters . . . . . . 93

    4.6 Biological computation of phase congruency . . . . . . . . . . . . . 95

    4.7 Symmetry and Asymmetry: Special patterns of phase . . . . . . . . 97

    4.7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

    4.7.2 A frequency approach to symmetry . . . . . . . . . . . . . . 98

    4.7.3 Biological computation of symmetry and asymmetry . . . . 103

    4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

    5 Representation and matching of signals 105

    5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

    5.2 Spatial Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

    5.3 Phase Based Disparity Measurement . . . . . . . . . . . . . . . . . 107

    5.4 Matching Using Localized Frequency Data . . . . . . . . . . . . . . 110

    5.5 Using Phase to Guide Matching . . . . . . . . . . . . . . . . . . . . 111

    5.6 Determining Relative Signal Distortion . . . . . . . . . . . . . . . . 113

    5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

    6 Conclusion 119

    6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

    6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

    Bibliography 123

    A Portfolio of experimental results 133

    A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

    A.2 Portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

    A.2.1 Image acknowledgements . . . . . . . . . . . . . . . . . . . . 136

    A.3 Parameter variations . . . . . . . . . . . . . . . . . . . . . . . . . . 153

    B Noise models and noise compensation 161

    B.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

    B.2 Noise generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

    B.3 Noise spectra measured from images . . . . . . . . . . . . . . . . . 164

    B.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

    C Non-maximal suppression 167

    C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

    C.2 Non-maximal suppression using feature orientation information . . . 168

    C.3 Orientation from the feature image . . . . . . . . . . . . . . . . . . 169

    C.4 Morphological approaches . . . . . . . . . . . . . . . . . . . . . . . 171

    C.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

    D Implementation details 173

    D.1 MATLAB Implementation . . . . . . . . . . . . . . . . . . . . . . . 175


Chapter 1

    Introduction

    1.1 The Need for Invariant Quantities in Images

    This thesis is concerned with the search for measures of image features that remain

    constant over wide ranges of viewing conditions. Such invariant quantities provide

    powerful tools for the analysis of images, allowing image processing algorithms to

    work more reliably and over wider classes of images. The work presented in this

    thesis concentrates on invariant quantities in low-level or early vision.

    Some effort has been devoted to investigating invariant measures of higher level

    structures in images, for example, Hu [37] developed a series of invariant moments

    for recognizing binary objects. More recently there has been considerable interest

    in geometric invariance, the study of geometric properties of objects that remain

    invariant to imaging transformations. A collection of papers in this area can be

    found in the book by Mundy and Zisserman [63]. However, little attention has

    been paid to the invariant quantities that might exist in low-level or early vision

    for tasks such as feature detection or feature matching. Some limited exceptions

    to this include the work of Koenderink and van Doorn [44, 45] who recognized

    the importance of differential invariants associated with motion fields, and Florack

    et al. [28] who propose differential invariants for characterizing a number of image

    contour properties. However, in general, interest in low-level image invariants has

    been limited. This is surprising considering the fundamental importance of being

    able to obtain reliable results from low level image operations in order to successfully


    perform any higher level operations.

    There are two main points about an invariant measure: firstly, of course, it must

    be dimensionless (that is, have no units attached to it), and secondly it should rep-

    resent some meaningful and useful quality. If it does not represent some meaningful

    quality one has no idea how to use it. It is easy to construct a dimensionless quan-

    tity that is meaningless, for example, the ratio of my height to the width of the

    letter o. It is also easy to find measures that are useful but not dimensionless, for

    example, the speed of your car. However, it is often hard to define something that

    is both dimensionless and useful.

    Why do we want to find invariant quantities? Quantities that are useful but

    not dimensionless are generally only useful because they are applied in relatively

    structured environments and at a specific scale. For example, using the speed of

    your car to decide whether you are driving safely only works because most cars are

    similar in size, roadways are standardized and gravitational forces are effectively

    constant. Images, on the other hand, provide a very dynamic and unstructured

    environment in which we struggle to make our algorithms operate. Objects can

    appear with arbitrary orientation and spatial magnification along with arbitrary

    brightness and contrast. Thus the search for invariant quantities is very important

    for computer vision.

    It is all too easy to forget that a number inside a computer often has units

    associated with it. The fact that a number has units associated with it imposes

    some constraints on how it should be used. For example, it does not make sense to

    add a quantity representing time to one representing a length. Despite this, it is

    quite common to find such nonsensical combinations of quantities in the computer

    vision literature. There are many algorithms that involve the minimization of some

    energy; often the energy is defined to be the addition of many components, each

    having very different units. For example, energy minimizing splines (snakes) are

    usually formulated in terms of the minimization of an energy that is made up of

    an intensity gradient term and a spline bending term [42]. These two components,

    while representing meaningful quantities, are not dimensionless. This means that

    for energy minimizing splines to be effective their parameters have to be tuned


    carefully for each individual application. The parameters are used to balance the

    relative importance of individual components of the overall energy. If, say, the

    overall image contrast was halved one would need to double the weighting applied

    to the intensity gradient term to retain the same snake behaviour. If one was to

    somehow replace the intensity gradient and spline bending terms with dimensionless

    quantities that represented, in some way, the closeness of the spline to a feature

    and the deformation of the spline, one would be able to use fixed parameters over

    wider classes of images.
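The weighting problem just described can be made concrete with a toy calculation. The following sketch is illustrative only; the function and every value in it are invented for this example and are not taken from the thesis.

```python
# Illustrative toy only (nothing here is from the thesis): a "snake" energy
# formed as a weighted sum of two terms that carry different units.

def snake_energy(gradient_term, bending_term, alpha=1.0, beta=1.0):
    return alpha * gradient_term + beta * bending_term

# Hypothetical values at some spline position: a gradient term of 40
# (units: intensity per pixel) and a bending term of 10 (units: 1/pixel^2).
print(snake_energy(40.0, 10.0))               # 50.0

# Halving the image contrast halves only the gradient term; the spline
# geometry is unchanged, so the balance between the terms shifts.
print(snake_energy(20.0, 10.0))               # 30.0

# To restore the original balance the gradient weight must be doubled:
print(snake_energy(20.0, 10.0, alpha=2.0))    # 50.0
```

With dimensionless terms in place of the gradient term, the same weights would serve across contrast changes, which is exactly the property sought here.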

    Clearly there is a pressing need for the identification of low level invariant quan-

    tities in images. The main form of invariance that will be investigated in this thesis

    is invariance to image illumination and contrast. That is, this thesis will be seeking

    to construct low level image measures that have a response that is independent of

    the image illumination and/or contrast.

    1.2 The Approach

    In the search for low level invariant quantities in images the approach taken in this

    thesis is to make use of data from representations of the image in the frequency

    domain. Working directly in the spatial domain is avoided for two reasons. Firstly,

    the spatial domain of an image, while convenient and intuitive, almost always forces

    one into making use of dimensional measures in the analysis of an image; it is

    hard to get away from the use of intensity gradients, contrast levels or equivalent

    quantities. Secondly, low level spatial techniques have been extensively researched,

    and while one cannot say all possibilities have been exhausted, the opportunities

    for the development of significantly new techniques appear limited.

    The most logical alternative approach is to consider representations of the image

    in the frequency domain; much of the psychophysical literature in visual perception

    has been devoted to the development of models in this domain. However, these

    psychophysical models have generally not been developed to the point where they

    could be implemented as algorithms in a computer vision system.

    With an image represented in terms of the variation of amplitude and phase


    values with frequency one has a number of new and interesting possibilities in the

    analysis of image signals. However, so far in the computer vision literature, very

    little work has been done on the use of frequency data to recognize and charac-

    terize features in signals. Some notable exceptions to this include the following:

    Granlund [30], who proposed a multiscale Fourier transform approach to the analy-

    sis of images; Knutsson, Wilson and Granlund [43], who developed these ideas fur-

    ther for image coding and the restoration of noisy images; Morrone and Owens [61],

    who use phase congruency as a means of finding image features; Fleet and Jep-

    son [26], who used phase to determine image velocities; and Langley, Atherton and

    Wilson [51], Fleet, Jepson and Jenkin [27] and Calway, Knutsson and Wilson [9],

    who have investigated the use of phase information to estimate image disparities.

    Jones and Malik [40, 41] have also used local frequency information for determining

    disparity, though they do not directly use phase information.

    In this thesis considerable effort is devoted to the understanding of the variations

    of phase and amplitude over frequency for different image features. In particular,

    phase data is used as the fundamental building block for the various low-level in-

    variant feature measures that are developed in this thesis. Phase information is

    an ideal starting point for the development of invariant measures for two reasons.

    Firstly, phase itself is a dimensionless quantity, and secondly phase information has

    been shown to be crucial in the perception of images [65]. This is discussed further

    in Chapter 2.

    1.3 Contributions

    This thesis develops a number of invariant frequency based low-level image quan-

    tities for feature detection, local symmetry/asymmetry detection, and for signal

    matching.

    Most of this thesis is devoted to the investigation of the use of congruency of

    the local phase over many scales as an illumination and contrast invariant measure

    of feature significance at points in images. Phase congruency was first proposed by

    Morrone et al. [62] and Morrone and Owens [61] as a computational model of the


    perception of low-level features such as step edges, lines, and Mach bands in images.

    However, due to practical difficulties in calculating phase congruency they developed

    the use of a related quantity, local energy, for feature detection instead. The main

    contribution of this thesis is to establish the importance of phase congruency's

    invariance to illumination and contrast and to develop a practical implementation of

    it. The goal of an illumination and contrast independent feature detector is achieved

    and its reliable performance over a wide range of images using fixed thresholds is

    demonstrated.
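The contrast invariance claimed for phase congruency can be seen in a minimal numerical sketch. This is not the thesis implementation; it simply models the local frequency components at one point as complex numbers A_n e^{i phi_n} and normalizes the magnitude of their sum by the sum of their amplitudes.

```python
import math

# Minimal 1-D sketch (not the thesis implementation): phase congruency at
# one location is the magnitude of the sum of the local complex components
# divided by the sum of their amplitudes, so PC = 1 when all phases agree
# and the value is unchanged by any overall contrast scaling.

def phase_congruency(amplitudes, phases):
    re = sum(a * math.cos(p) for a, p in zip(amplitudes, phases))
    im = sum(a * math.sin(p) for a, p in zip(amplitudes, phases))
    total = sum(amplitudes)
    return math.hypot(re, im) / total if total else 0.0

amps = [1.0, 0.5, 0.25]

# All components in phase, as at an ideal feature: maximal congruency.
print(phase_congruency(amps, [0.0, 0.0, 0.0]))                    # 1.0

# Scrambled phases: congruency well below 1.
print(phase_congruency(amps, [0.0, 2.0, 4.0]) < 0.5)              # True

# Contrast invariance: scaling every amplitude leaves PC unchanged.
print(phase_congruency([10 * a for a in amps], [0.0, 0.0, 0.0]))  # 1.0
```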

    In achieving this goal a number of other contributions are developed. These

    include an effective noise compensation technique, something that is often essential

    when normalized image measures such as phase congruency are used. This noise

    compensation technique makes minimal assumptions about the nature of the image

    noise and can be applied to any image processing technique that makes use of

    banks of filters over several scales. Another contribution is the recognition of the

    importance of the spread of frequencies that are present at each point in a signal

    when one is considering phase congruency. For phase congruency to be used as a

    measure of feature significance it must be weighted by some measure of the spread

    of frequencies present. A method for doing this is presented.
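One simple form such a spread weighting can take is sketched below; the weighting function actually used in the thesis may differ in form and constants.

```python
# Hedged sketch of a frequency-spread measure (illustrative, not the
# thesis's exact weighting): given filter amplitudes A_n at a point, the
# ratio of their sum to the largest amplitude, normalized by the number of
# filters, is near 1 when energy is spread over all filters and near 1/N
# when a single filter dominates.

def frequency_spread(amplitudes, eps=1e-4):
    n = len(amplitudes)
    return sum(amplitudes) / (n * (max(amplitudes) + eps))

print(round(frequency_spread([1.0, 1.0, 1.0, 1.0]), 3))  # wide spread: ~1.0
print(round(frequency_spread([1.0, 0.0, 0.0, 0.0]), 3))  # one scale: ~0.25
```

A phase congruency value would then be weighted by this quantity so that points with a narrow frequency spread are suppressed.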

    Also presented is an argument that when a frequency based approach is used

    in the analysis of images a more logical interpretation of scale is obtained by using

    high-pass filtering rather than low-pass or band-pass filtering. This approach results

    in feature positions remaining stable over different scales of analysis, something that

    is not achieved with low-pass or band-pass filtering.

    Another contribution is the recognition that points of local symmetry and asym-

    metry in images also give rise to special arrangements of phase, and these can be

    readily detected. The new measures of local image symmetry and asymmetry that

    are developed are unique in that they are dimensionless and that they do not require

    any previous image segmentation to have taken place prior to analysis.
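The idea can be sketched as follows, assuming even-symmetric filter responses e_n and odd-symmetric responses o_n at a point are already available; the exact measure in the thesis differs in details such as noise compensation and weighting.

```python
import math

# Hedged sketch (details differ from the thesis): at a point of local
# symmetry the even-symmetric filter responses dominate the odd-symmetric
# ones across scales, and vice versa at a point of asymmetry, suggesting
# the normalized measure
#   Sym = sum(|e_n| - |o_n|) / sum(A_n),  with A_n = sqrt(e_n^2 + o_n^2).

def symmetry_measure(even_responses, odd_responses):
    num = sum(abs(e) - abs(o) for e, o in zip(even_responses, odd_responses))
    den = sum(math.hypot(e, o) for e, o in zip(even_responses, odd_responses))
    return num / den if den else 0.0

# Delta-like (symmetric) point: all energy in the even responses.
print(symmetry_measure([1.0, 0.5, 0.25], [0.0, 0.0, 0.0]))   # 1.0

# Step-like (asymmetric) point: all energy in the odd responses.
print(symmetry_measure([0.0, 0.0, 0.0], [1.0, 0.5, 0.25]))   # -1.0
```

Being a ratio of filter responses, the measure is dimensionless and requires no prior segmentation of the image.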

    Finally, with the insights obtained from this work in the use of phase for feature

    detection a new approach to the matching of signals is developed. This technique


    uses correlation of local phase and amplitude information, rather than spatial in-

    tensity data, for matching. An advantage of this new method is that it also allows

    disparity between points in stereo images to be estimated.

    1.4 Thesis Overview

    Chapter 2 reviews the major approaches that have been used for low-level edge

    detection and discusses their shortcomings. The main problems are that existing

    approaches use very simple edge models, and that one cannot know in advance of

    applying the edge operator what level of edge response will be significant. That

    is, edge thresholds for individual images have to be set interactively by viewing the

    output. The local energy and phase congruency model of feature perception is then

    introduced and previous work in this area is reviewed. A new geometric interpreta-

    tion of phase congruency is provided and it is argued that phase congruency rather

    than local energy should be used to identify features in images because it is a di-

    mensionless quantity. However, while phase congruency appears to be an attractive

    measure to use there are some difficulties in calculating it and the chapter concludes

    by identifying these problems:

    Phase congruency can be defined in 1D but it is not clear how it should be calculated in 2D.

    Being a normalized quantity, phase congruency responds strongly to noise in signals.

    Phase congruency is only meaningful if there is a spread of frequency components present in a signal; how should this spread be measured?

    Phase congruency appears to require a different interpretation of scale, suggesting that high-pass filtering rather than low-pass filtering should be used.

    Chapter 3 sets out to develop a practical method for calculating phase con-

    gruency in images. The first requirement is to identify an appropriate method

    of obtaining local frequency information in images. Complex Gabor wavelets are


    adopted for this purpose. It is then shown how phase congruency in 1D signals

    can be readily calculated from the convolution outputs of a bank of complex Gabor

    filters. The problem of noise is then considered and a method of automatically rec-

    ognizing, and compensating for, the influence of noise on phase congruency values

    in an image is devised. This is followed by a section covering the issues involved in

    extending the calculation of phase congruency to 2D images. It is then shown how

    the use of wavelets allows us to obtain a measure of the spread of frequencies present

    at a point of phase congruency. This helps us determine the degree of significance

    of a point of phase congruency and allows us to improve feature localization. Fi-

    nally the issue of analysis at different scales is considered in more detail and it is

    concluded that high-pass filtering should be used to obtain image information at

    different scales instead of the more usually applied low-pass filtering.
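The quadrature-pair step described above can be sketched in one dimension. The parameters below (filter length, wavelength, sigma) are illustrative, not the values used in the thesis.

```python
import math

# Sketch of the quadrature-pair step (illustrative parameters): an
# even-symmetric (cosine) Gabor and its odd-symmetric (sine) partner are
# convolved with the signal; treating the two outputs as the real and
# imaginary parts of a complex number gives local amplitude and local
# phase at each point.

def gabor_pair(n_taps=21, wavelength=8.0, sigma=3.0):
    half = n_taps // 2
    even, odd = [], []
    for x in range(-half, half + 1):
        g = math.exp(-(x * x) / (2.0 * sigma * sigma))
        even.append(g * math.cos(2.0 * math.pi * x / wavelength))
        odd.append(g * math.sin(2.0 * math.pi * x / wavelength))
    return even, odd

def convolve_valid(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [math.cos(2.0 * math.pi * i / 8.0) for i in range(64)]
even_k, odd_k = gabor_pair()
e = convolve_valid(signal, even_k)
o = convolve_valid(signal, odd_k)
amplitude = [math.hypot(a, b) for a, b in zip(e, o)]
phase = [math.atan2(b, a) for a, b in zip(e, o)]  # local phase values

# For a pure cosine at the filter's centre frequency the recovered local
# amplitude is near-constant along the signal:
print(max(amplitude) - min(amplitude) < 0.01 * max(amplitude))  # True
```

Repeating this over a bank of filter scales gives the per-scale amplitudes and phases from which phase congruency is computed.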

    Chapter 4 re-examines the work on phase congruency that was developed in the

    previous chapter. Firstly the choice of the wavelet function used for the analysis

    of images is considered. Of particular concern is the limited maximum bandwidth

    that can be obtained using Gabor functions and it is concluded that the log Gabor

    function is more appropriate as it allows one to construct filters of arbitrary band-

    width. However, when these high bandwidth filters were used to calculate phase

    congruency unexpected results were produced. The analysis of these results led to

    the development of two new approaches to the calculation of phase congruency, one

    of which produced markedly superior results. This work, in turn, led to a new frequency

    based approach to the detection of points of local symmetry and asymmetry in im-

    ages. It is shown how symmetry and asymmetry can be thought of as representing

    generalizations of delta and step features respectively.
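The log Gabor radial transfer function mentioned above takes the following commonly used form; this is a sketch, and the exact constants used in the thesis may differ. Here w0 is the centre frequency and the ratio k = sigma/w0 sets the bandwidth.

```python
import math

# Sketch of the log Gabor radial transfer function (a common form; exact
# constants in the thesis may differ):
#   G(w) = exp(-(log(w/w0))^2 / (2 (log k)^2)),  with G(0) = 0,
# where w0 is the centre frequency and smaller k means wider bandwidth.

def log_gabor(w, w0, k=0.55):
    if w <= 0.0:
        return 0.0            # zero DC response by construction
    return math.exp(-(math.log(w / w0) ** 2) / (2.0 * math.log(k) ** 2))

print(log_gabor(0.25, 0.25))   # 1.0 : unit response at the centre frequency
print(log_gabor(0.0, 0.25))    # 0.0 : no DC component

# Unlike a Gabor, the bandwidth can be made arbitrarily large (small k)
# while the DC response remains exactly zero:
print(log_gabor(0.5, 0.25, k=0.1) > log_gabor(0.5, 0.25, k=0.55))  # True
```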

    Chapter 5 changes subject and considers the matching of signals and the estima-

    tion of disparity using local frequency information. Many of the ideas and insights

    obtained from the work on phase congruency are employed to great benefit here.

    Where this work mainly differs from other work in this area is in its integrated use

    of frequency data over many scales. An approach to signal matching via correla-

    tion of local phase and amplitude is developed. A by-product of this approach to

    signal matching is that an estimate of the spatial shift required in one signal to


    match the second is obtained. This allows rapid convergence to the correct match-

    ing locations. The chapter concludes with some discussion about the advantages of

    matching signals represented in the log frequency domain. In this domain spatial

    scale changes in signals manifest themselves as a translation of the local amplitude

    spectra along with an amplitude rescaling; however, the shape of the spectra remains

    unchanged. This invariance in the log frequency domain offers a number of inter-

    esting possibilities. For example it may allow textures to be correctly recognized in

    foreshortened views, or provide a new way of identifying surface slant from spatial

    scale change in stereopsis or motion.
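The scale-as-translation property described here is easy to verify numerically. In the sketch below the amplitude spectrum A is an arbitrary example function, not one from the thesis; scaling a signal by a maps its spectrum to (1/a)A(w/a), which on a log frequency axis is a translation by log(a) plus a uniform amplitude rescaling.

```python
import math

# Numerical check of the claim above (a sketch; the spectrum A is an
# arbitrary example function).  If s_a(x) = s(a*x) then its amplitude
# spectrum is (1/a) * A(w/a): on a log frequency axis this is a pure
# translation by log(a) plus an amplitude rescaling; the shape of the
# spectrum is unchanged.

def A(w):
    # Example amplitude spectrum: a smooth bump centred at w = 4.
    return math.exp(-(math.log(w) - math.log(4.0)) ** 2)

def A_scaled(w, a):
    return A(w / a) / a

a = 2.0
log_freqs = [0.1 * i for i in range(1, 40)]                  # u = log(w)
orig = [A(math.exp(u)) for u in log_freqs]
shifted = [A_scaled(math.exp(u + math.log(a)), a) for u in log_freqs]

# After translating by log(a) and rescaling amplitudes by a, the two
# spectra coincide:
print(all(abs(a * s - o) < 1e-12 for o, s in zip(orig, shifted)))  # True
```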

    Finally, Chapter 6 concludes this work and discusses the areas that might be

    developed further in future work. Four appendices are also included. Appendix A

    presents a comprehensive portfolio of experimental results comparing phase congru-

    ency to the output of the Canny edge detector over a wide range of images. Phase

    symmetry images are also presented for each image in the portfolio. In addition,

    phase congruency images are presented for a number of test conditions to illustrate

    its behaviour under different parameter settings. Appendix B looks at the sensitiv-

    ity of the phase congruency noise compensation technique to different noise models,

    showing that the noise model is not critical. Appendix C describes the problems in

    performing non-maximum suppression on phase congruency images. The techniques

    that were used in generating the final phase congruency edge maps are described,

    and it is concluded that much work could be done on the problem of non-maximum

    suppression. Finally, Appendix D describes some of the implementational details

    for the calculation of phase congruency.

Chapter 2

    Image features

    2.1 Introduction

    The detection of edges and other low-level features in images has long been rec-

    ognized as a fundamental operation of great importance. A good line drawing

    can provide much of the information that might be contained in a photograph of the

    same scene, and in doing so only requires a small fraction of the data used by the

    photograph to represent that information. Indeed, line drawings can be easier to

    interpret and are often used instead of photographs in technical manuals and "How to do it" books. However, one has to be cautious in comparing the interpretability

    of line drawings with photographs. Drawings made by humans are almost always

    constructed with their semantic content in mind, particularly so for technical man-

    uals. Extraneous details are removed and extra details that would not normally be

    visible may be added, and shading is also often used¹. Thus a line drawing that has

    been automatically generated with no regard to the image's semantic content may

    not provide all the information that one might hope to obtain. Nevertheless the

    extraction of a line drawing is an important first step in the automated analysis of

    a scene.

    In searching for parameters to describe the significance of image features, such

    as edges, we should be looking for measures that are invariant with respect to image

    ¹If one had a good automated feature detector one would be able to construct line drawings with no regard to their semantic content; this would allow a fair comparison between line drawings and photographs.


    contrast and spatial magnification. Such quantities would provide an absolute mea-

    sure of the significance of feature points that could be applied universally to any

    image irrespective of image contrast and magnification. The human visual system is

    able to reliably identify the significance of image features under widely varying con-

    ditions. Even if the illumination of a scene is altered by several orders of magnitude

    our interpretation of it will remain largely unchanged. Similarly, our interpreta-

    tion of images is not greatly affected by changes in apparent spatial magnification,

    though not with the same degree of tolerance that we have to illumination changes.

    Despite the obvious importance of characterizing low-level image features in some

    invariant manner almost no effort seems to have been devoted to this task. One

    recent exception is the work of Heeger [35] in his development of a normalized model

    of contrast sensitivity that qualitatively matches psychophysical data, though this

    work is not directed at computer vision.

    This chapter discusses some of the shortcomings of existing feature detectors

    and introduces the idea of detecting features on the basis of phase congruency.

    2.2 Gradient based feature detection

    The majority of work in the detection of low-level image features has been concen-

    trated on the identification of step discontinuities in images using gradient based

    operators. Gradient based edge detection methods were pioneered by Roberts [77],

    Prewitt [71] and Sobel [72, 86]. They were then developed in terms of a computa-

    tional model of human perception by Marr and Hildreth [55, 54]. Inspired by the

    presence of on-centre and off-centre receptive fields in the retina, Marr and Hildreth

    developed a model where edges were detected via the zero-crossings of the image af-

    ter convolution with a Laplacian of Gaussian filter. While this model was attractive

    it had a number of difficulties: Zero-crossings always form closed contours, often not

    realistically modelling the connectivity of image features; staircase intensity profiles

    result in false positives being detected; and finally, with the second derivative of

    the image being used the results are susceptible to noise. Marr also introduced the

    concept of the Primal Sketch, that is, the idea that the brain generates a concise


    representation of the scene that contains important image tokens, such as edges

    and other basic image features, and that this representation permits further analy-

    sis of the scene to be done more efficiently by the brain. This concept has greatly

    influenced much of the research done in computer vision.

    A number of variations of second derivative operators have been devised in

    various attempts to overcome their deficiencies. Some examples of this include the

    work of Fleck [22], Haralick [32], and Sarkar and Boyer [83]. Fleck and Haralick

    used directional second derivatives to reduce the influence of noise, with Fleck

    also employing first and third derivative information to eliminate the detection

    of false positives. Sarkar and Boyer adopted the optimality criteria proposed by

    Canny [11, 12] to develop infinite impulse response filters for the detection of edges

    via zero crossings.

    Canny [11, 12] formalized the problem of the detection of step edges in terms

    of three criteria: good detection; good localization; and uniqueness of the response

    to a single feature. Subsequently Spacek [87] and Deriche [16] followed Canny's

    approach to develop similar operators; Deriche allowing the operator to have an

    infinite impulse response and Spacek modifying the response uniqueness criterion.

    An objection to these optimal detectors is that they are only optimal in a very

    limited domain, that of one dimensional step edges in the presence of noise. At 2D

    features such as corners and junctions where the intensity gradient becomes poorly

    defined these detectors have difficulties.

    Thus, a major problem with gradient based operators is that they use a single

    model of an edge, that is, they assume edges are step discontinuities. In an ideal

    system a feature detector would mark features wherever a good artist would draw

    features when making a sketch of a scene. An artist produces marks in a sketch for

    a wide range of feature types, not just step edges. Marks are drawn to indicate line,

    roof and step edges along with other features such as shadow boundaries, highlights,

    and presumably a range of other (unknown) feature types. Perona and Malik [68]

    point out that many image features are represented by some combination of step,

    delta, roof and ramp profiles. For example, a very commonly encountered feature

    type is the occluding boundary of a convex object, such as a ball. If the ball surface


    Figure 1: Intensity profile observed across a lambertian sphere against a plain background with overhead illumination. The occlusion boundary is not a simple step edge. [Plot labels: measured grey value, background, lambertian sphere, illumination.]

    is lambertian and the illumination is aligned with the viewing direction the feature

    profile will consist of an intensity profile that starts off brightest at the mid-point

    of the ball and then gets darker as our view moves across the ball as a result of

    the surface normal becoming perpendicular to our viewing direction, and finally

    culminating in a step jump to the grey level of the background (Figure 1).

    In this simple, idealized situation we have a feature that is considerably more

    complex than a step edge. In practice the situation will be far more awkward;

    the ball surface is unlikely to be lambertian, lighting can be from any direction,

    there may be mutual illumination effects between the ball and other objects, and of

    course, the background may not be uniform. For this reason the word feature will

    be generally used in this thesis rather than the word edge in order to emphasize

    the aim of finding all important features that represent points of high information

    content, not just step edges. The definition of what a feature is will be deliberately

    left vague, though subsequent sections which describe the phase congruency model

    of feature perception will offer a possible definition.

    Some might argue that an automated feature detector does not need to attempt


    to emulate human sketching skills. However, the interest in producing feature detec-

    tors has been primarily inspired by the ability of artists to produce line drawings2.

    Artists have shown us that line drawings can provide very compact yet effective de-

    scriptions of scenes. Indeed, in the assessment of any automated feature detector

    perhaps the best we can do is to compare its output against a line drawing of the

    same scene made by an expert reproductive artist. After all it is artists who are our

    best experts in representing scenes via line drawings. It is probably fair to say that

    excessive emphasis has been placed on finding optimal step edge detectors and the

    original objective, that of finding points of high information content in images, has

    been forgotten. Just because a detector is effective in finding and localizing noisy

    step edges in a scene does not mean that it will represent the information in the

    scene well.

    A second problem with gradient based edge detectors is that they typically

    characterize edge strength by the magnitude of the intensity gradient. Thus the

    perceived strength or significance of an edge is sensitive to illumination and spatial

    magnification variations. Intensity gradient has units of lux/radian (pixel coordi-

    nates represent viewing direction and hence have angular units)3. Intensity gradi-

    ents in images depend on many factors, including scene illumination, blurring and

    magnification. For example, doubling the size of an image while leaving its intensity

    values unchanged will halve all the gradients in the image. Any gradient based edge

    detection process will need to use a threshold modified appropriately. However, in

    general, one does not know in advance the level of contrast present in an image or

    its magnification. The image gradient values that correspond to significant edges

    are usually determined empirically.
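The dependence of gradient magnitude on spatial magnification is easy to demonstrate numerically. The following sketch (an illustrative example, not from the thesis; it assumes only numpy) shows that doubling the size of a ramp edge while leaving its grey values unchanged halves the measured gradient:

```python
import numpy as np

# A 1D ramp edge: grey values rising linearly from 0 to 1 over 10 steps.
ramp = np.linspace(0.0, 1.0, 11)

# The same edge at twice the spatial magnification: identical grey
# values spread over twice as many samples.
ramp_2x = np.linspace(0.0, 1.0, 21)

# Maximum gradient magnitude (central differences) of each profile.
g = np.abs(np.gradient(ramp)).max()
g_2x = np.abs(np.gradient(ramp_2x)).max()

# Doubling the magnification halves the gradient, so a gradient
# threshold tuned for the original image would now miss this edge.
print(g / g_2x)  # ratio of 2
```

Any fixed gradient threshold therefore selects different features once the magnification (or, by the same argument, the contrast) of the image changes.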

    Here the distinction is made between line drawings, which contain only lines, and sketches, which may also include shading.

    Strictly speaking, image grey values should not be called intensity values. Intensity is defined as the luminous flux that is emitted per solid angle and is a property that is associated with a light source. Intensity has units candelas (lumens/steradian). In constructing an image a camera measures the illumination at each point in the image plane that is received from a scene. Thus, image grey values have units lux (lumens/m²). Despite this, the use of the term intensity value for an image grey value appears to be commonly accepted. David Marr used the term in this manner in his book [54].


    Little guidance is available for the setting of thresholds; indeed Faugeras4 can

    only offer the following advice:

    Thresholding is a plague that occurs in many areas in engineering, but

    to our knowledge it is unavoidable and must be tackled with courage.

    A limited number of efforts have been made to determine threshold values automat-

    ically. In his thesis, Canny [11] sets his thresholds on the basis of local estimates

    of image noise obtained via Wiener filtering. However, the details of setting thresholds on this basis, and the effectiveness of this approach are not reported. Canny

    also introduced the idea of thresholding hysteresis which has proved to be a useful

    heuristic for maintaining the continuity of thresholded edges, though one then has

    the problem of determining two threshold levels. Sarkar and Boyer [83] also em-

    ployed Weiner filtering to estimate the derivative of the noise output in their zero

    crossing based detector. Having an estimated slope of the noise response allowed

    them to set thresholds appropriately. However, this process required them to take

    three more derivatives after the image had been filtered by their edge operator.

    This presumably limited the quality of the estimate of the derivative of the noise

    output.

    Kundu and Pal [50] devised a method of thresholding based on human psy-

    chophysical data where contrast sensitivity varies with overall illumination levels.

    However, it is hard to provide any concrete guide to the fitting of a model of contrast

    sensitivity relative to a digitized grey scale of 0 to 255. More recently Fleck [24, 23]

    suggested setting thresholds at some multiple (typically 3 to 5) of the expected

    standard deviation of the operator output when applied to camera noise. This ap-

    proach of course, requires detailed a priori knowledge of the noise characteristics of

    any camera used to take an image. Noise is always a concern for gradient based

    detectors. The main tool used to reduce the influence of noise is spatial smoothing.

    However, smoothing degrades feature localization, and 2D feature positions such as

    corners can be severely corrupted (see Perona and Malik [69]). With high degrees

    of smoothing feature locations can move significantly, and distinct features may

    Olivier Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, 1993, p. 117.


    merge. It is very unsatisfactory for the perceived location of a feature to depend on

    how much smoothing was required to overcome the influence of noise. This issue

    will be considered in more detail in the next chapter.

    Bergholm [5] adopts the scale-space model in developing his edge focusing approach to edge detection, and in doing so addresses a number of problems associated with gradient based detectors. He observes that to eliminate the influence

    of noise on a gradient based detector a heavily smoothed image is required, but

    this degrades edge localization. To achieve good localization no smoothing should

    be used but then noise becomes a problem. Bergholm's solution is to start with

    an edge map at a heavily smoothed scale. He then proceeds to calculate an edge

    map at a slightly finer scale but only at pixels in the image connected to edge pixels

    found at the previous scale. The old edge points are discarded, the new ones at

    the slightly finer scale retained, and the process is repeated. In this manner edges

    are propagated out from their initial, rough locations and focused to their correct

    positions at the finest scale. An important point is that the problem of noise is

    overcome by starting with edges at a coarse scale and only looking for edges in

    adjacent pixels as scale is gradually reduced. Another attractive feature is that

    edge thresholding is only required to generate the initial edge map. However if this

    initial map is incorrectly thresholded at too high a level then many features will

    never be found. Conversely, if the threshold is too low many noise features will be

    found and these will be propagated down to the finest scale.

    The discussion so far has been directed at gradient based detectors though,

    of course, other types of detectors have been developed. For example the weak

    membrane approach of Blake and Zisserman [6] involves minimizing a global energy

    function over the image in order to solve for a surface function that fits the image

    in a manner that is considered to be appropriate. Blake and Zisserman's energy

    measure is a weighted combination of terms representing the deviation of the surface

    function from the image, the square of the slope of the function, and the contour

    length of the function. This can be interpreted as fitting a weak membrane to

    the image data in such a way that discontinuities are preserved. An objection to

    this approach is that the energy term is not dimensionally consistent with different


    types of quantities being added together. This makes the result very sensitive to

    the relative weightings of the terms that make up the energy.

    Noble [64] devised a number of grey level morphological operations to detect

    edges. She develops a dilation-erosion residue operator which is analogous to a

    first derivative operator and is used as an edge strength map. A second operator

    called the signed maximum dilation-erosion residue (analogous to a second deriva-

    tive operator) is used to guide the tracing of edges, and to classify the responses to

    the dilation-erosion residue operator. While Noble's approach is morphological, the

    steps involved can be interpreted in terms of differential operators. Thus it depends

    on using a simple edge model and it does not escape the thresholding problem.

    Perona and Malik [69] devised an approach to edge detection using anisotropic

    diffusion. They developed an approach to scale space smoothing that is based on

    the heat diffusion equation. To detect edges they make the conduction coefficient

    a function of the image gradient to impede the flow of heat. Thus step discon-

    tinuities in the image form local barriers to the diffusion process. Over repeated

    iterations of the diffusion process step edges in the image become sharper and re-

    gions between the step discontinuities become smoother. Final extraction of the

    edges then becomes straightforward. A very significant attribute of this approach

    is that feature positions remain stable over scale. All that changes with scale is

    the level of contrast (heat difference) required for a feature to persist. However,

    this approach only detects step edges and is very much dependent on local image

    contrast.

    Another interesting approach that has been developed recently by Smith and

    Brady [85] is the SUSAN edge finder. This non-linear technique involves indexing

    a circular mask over the image and at each location determining the area of the

    mask having similar intensity values to the centre pixel value. This segment of the

    mask is denoted the Univalue Segment Assimilating Nucleus (USAN). Locations in

    the image where the USAN is locally at a minimum (locally the Smallest USAN,

    hence SUSAN) mark the positions of step and line features. The detector performs

    well, and its tolerance to noise is a significant attribute. However, the detector is

    not invariant to image contrast as it requires the setting of a threshold which is


    used to decide whether or not elements of the mask are similar to the centre value

    when determining the size of the USAN. This threshold specifies the minimum edge

    contrast that can be detected.

    The discussion above represents a generalized overview and sampling of existing

    edge detection techniques. Others have conducted far more comprehensive reviews

    (for example Noble [64]), and it is not intended to repeat such a review here. The

    main purpose of this overview is to point out that almost all existing edge detec-

    tors are based on the calculation of intensity gradients or some other measure of

    the spatial variation of intensity across the image. These measures are dimensional

    quantities and hence depend on image contrast and spatial magnification. Thus

    the fundamental problem is that one does not know in advance what level of edge

    strength corresponds to a significant feature. As a result, edge thresholds are gener-

    ally set by humans viewing the output and adjusting the threshold until the result

    is deemed acceptable. This is not automated feature detection.

    2.3 Local energy and phase congruency

    The local energy model of feature perception is a relatively new model. It is not

    based on the use of local intensity gradients for feature detection. Instead it postu-

    lates that features are perceived at points in an image where the Fourier components

    are maximally in phase5. For example, when one looks at the Fourier series that

    makes up a square wave all the Fourier components are sine waves that are exactly

    in phase at the point of the step at an angle of 0 or 180 degrees depending on

    whether the step is upward or downward. At all other points in the square wave

    individual phase values vary, making phase congruency low. Similarly one finds that

    phase congruency is a maximum at the peaks of a triangular wave (at an angle of

    90 or 270 degrees). A particularly important point about using phase congruency

    to mark features of interest is that one is not making any assumption about the

    It should be emphasized that when phase is referred to in this thesis it is local phase that is being considered. That is, we are concerned with the local phase of the signal at some position x. This is distinct from phase values that one might obtain, say, from an FFT of a signal, in which phase values will be the phase offsets of each of the sinusoidal basis functions in the decomposition.


    shape of the waveform at all. One is simply looking for points in the image where

    there is a high degree of order in the Fourier domain.

    Figure 2: Construction of square and triangular waveforms from their Fourier series. In both diagrams the first few terms of the respective Fourier series are plotted with broken lines; the sum of these terms is the solid line. Notice how the Fourier components are all in phase at the point of the step in the square wave, and at the peaks and troughs of the triangular wave.
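This behaviour of the square wave can be verified numerically. The sketch below (an illustrative check, assuming numpy and scipy are available) builds the first few Fourier terms of a square wave, recovers each term's amplitude and local phase from its analytic signal, and measures phase congruency as the magnitude of the vector sum of the components divided by the sum of their amplitudes; congruency is maximal exactly at the steps:

```python
import numpy as np
from scipy.signal import hilbert

N = 256
x = np.arange(N)
w = 2 * np.pi / N

# First few Fourier terms of a square wave: upward step at x = 0,
# downward step at x = N/2 (the signal wraps periodically).
terms = np.array([np.sin(k * w * x) / k for k in (1, 3, 5, 7, 9)])

# The analytic signal of each term carries its amplitude and local phase.
analytic = np.array([hilbert(t) for t in terms])

# Phase congruency: |vector sum| / (sum of amplitudes).
# It equals 1 where all local phases coincide.
pc = np.abs(analytic.sum(axis=0)) / np.abs(analytic).sum(axis=0)

# Congruency is maximal at the steps and low elsewhere.
print(int(np.argmax(pc)), round(float(pc[N // 4]), 3))
```

At the two steps every sine term passes through zero in the same direction, so all local phases coincide and congruency reaches 1; a quarter period away the phases disagree and congruency drops well below 1.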

    A wide range of feature types give rise to points of high phase congruency.

    These include step edges, line and roof edges, and Mach bands. It was, in fact,

    investigations into the phenomenon of Mach bands by Morrone et al. [62] that led

    to the development of the local energy model. Mach bands are illusory bright and

    dark bands that appear on the edges of trapezoidal intensity gradient ramps, for

    example, on the edges of shadows. The classical explanation for the perception of

    Mach bands has been lateral inhibition (see Ratliff [74]). However, this explanation

    fails in that it predicts maximal perception of Mach bands on step edges, where

    in fact we see none. In their paper, Morrone et al. show that at the points where

    we perceive Mach bands the Fourier components of the signal are maximally in

    phase (though not exactly in phase); this led to their hypothesis that we perceive

    features in images at points of high phase congruency. Further work by Morrone and

    Burr [60] and Ross et al. [80] went on to show that this model successfully explains a

    number of other psychophysical effects in human feature perception. Other studies

    of the sensitivity of the human visual system to phase information include that by

    Burr [8], Field and Nachmias [21] and du Buf [18]. Fleet [25] argues strongly for

    the use of phase information in the calculation of image velocities. He shows that

    the motion of contours of constant phase in images provides a better measure of


    the motion field than contours of constant intensity amplitude in the image. Phase

    information is more robust to noise and to shading and contrast variations in the

    image.

    The classic demonstration of the importance of phase was devised by Oppenheim

    and Lim [65]. They took the Fourier transforms of two images and used the phase

    information from one image and the magnitude information of the other to construct

    a new, synthetic Fourier transform which was then back-transformed to produce a

    new image. The features seen in such an image, while somewhat scrambled, clearly

    correspond to those in the image from which the phase data was obtained. Little

    evidence, if any, from the other image can be perceived. A demonstration of this is

    repeated here in Figure 3.
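The Oppenheim and Lim construction is straightforward to reproduce with an FFT. In the sketch below two random arrays stand in for the photographs of Figure 3 (the published demonstration uses natural images); even so, the reconstruction correlates far more strongly with the image that donated its phase than with the one that donated its magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)
img_mag = rng.random((64, 64))    # donates its magnitude spectrum
img_phase = rng.random((64, 64))  # donates its phase spectrum

# Synthetic transform: magnitude from one image, phase from the other,
# then back-transform to produce the mixed image.
F = np.abs(np.fft.fft2(img_mag)) * np.exp(1j * np.angle(np.fft.fft2(img_phase)))
mixed = np.real(np.fft.ifft2(F))

def ncc(a, b):
    """Normalized cross-correlation of two arrays."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

# Phase information prevails: the mixed image resembles the phase donor.
print(ncc(mixed, img_phase), ncc(mixed, img_mag))
```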

    With phase data demonstrated as being so important in the perception of images

    it is natural that one should pursue the development of a feature detector that

    operates on the basis of phase information. From their work on Mach bands Morrone

    and Owens [61] quickly recognized that the local energy model had applications in

    feature detection for computer vision.

    2.3.1 Defining phase congruency

    We shall first consider one dimensional signals. The phase congruency function is developed from the Fourier series expansion of a signal, I, at some location, x,

        I(x) = Σ_n A_n cos(nωx + φ_n0)     (1)
             = Σ_n A_n cos(φ_n(x)) ,       (2)

    where A_n represents the amplitude of the nth cosine component, ω is a constant (usually 2π), and φ_n0 is the phase offset of the nth component (the phase offset also allows sine terms in the series to be represented). The function φ_n(x) represents the local phase of the nth Fourier component at position x.

    Morrone and Owens define the phase congruency function as

        PC(x) = max_{φ̄(x) ∈ [0, 2π]}  ( Σ_n A_n cos(φ_n(x) − φ̄(x)) ) / ( Σ_n A_n ) .     (3)


    (a) image providing magnitude data; (b) image providing phase data; (c) phase and amplitude mixed image.

    Figure 3: When phase information from one image is combined with magnitude information of another it is phase information that prevails.

    The value of φ̄(x) that maximizes Equation 3 is the amplitude weighted mean local phase angle of all the Fourier terms at the point being considered. Taking the cosine of the difference between the actual phase angle of a frequency component and this weighted mean, φ̄(x), generates a quantity approximately equal to one minus half this difference squared (the Taylor expansion cos(x) ≈ 1 − x²/2 for small x). Thus finding where phase congruency is a maximum is approximately equivalent to finding where the weighted variance of local phase angles, relative to the weighted average local phase, is a minimum (see Figure 4).
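The maximization over φ̄(x) in Equation 3 need not be carried out by search: since Σ_n A_n cos(φ_n − φ̄) = Re(e^{−iφ̄} Σ_n A_n e^{iφ_n}), the maximum is attained when φ̄ is the angle of the vector sum, and the numerator collapses to |Σ_n A_n e^{iφ_n}|. A numerical check of this identity, with arbitrary made-up amplitudes and phases:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random(6)                     # arbitrary amplitudes A_n
phi = rng.uniform(0, 2 * np.pi, 6)   # arbitrary local phases phi_n(x)

# Brute-force Equation 3: scan phibar over [0, 2*pi).
phibar = np.linspace(0, 2 * np.pi, 100_000, endpoint=False)
pc_search = np.max(A @ np.cos(phi[:, None] - phibar[None, :])) / A.sum()

# Closed form: the optimal phibar is the angle of the vector sum, so
# the numerator is just the magnitude of sum_n A_n * exp(i * phi_n).
pc_closed = np.abs(np.sum(A * np.exp(1j * phi))) / A.sum()

print(abs(pc_search - pc_closed) < 1e-6)  # True
```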


    Figure 4: Polar diagram of the components of a Fourier series at a point in a signal. The series is represented as a sequence of vectors, each vector having a length A_n and local phase angle φ_n(x). [Also marked: the weighted mean of the Fourier components, φ̄(x).]

    2.3.2 Local energy

    As it stands phase congruency is a rather awkward quantity to calculate. As an

    alternative to this Venkatesh and Owens [89] show that points of maximum phase

    congruency can be calculated equivalently by searching for peaks in the local energy

    function. The local energy function is defined for a one dimensional luminance

    profile, I(x), as the modulus of a complex number,

        E(x) = √( I²(x) + H²(x) ) ,     (4)

    where the real component is represented by I(x) and the imaginary component by iH(x), where i = √−1 and H(x) is the Hilbert transform of I(x) (a 90 degree phase shift of I(x)).
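For a sampled, zero-mean profile Equation 4 can be evaluated directly: scipy's hilbert function returns the analytic signal I(x) + iH(x), so local energy is simply its modulus. A sketch on a step profile (an illustrative signal, not one from the thesis):

```python
import numpy as np
from scipy.signal import hilbert

# A step profile; the DC component is removed first, since Equation 4
# assumes a zero-mean signal.
N = 256
I = np.where(np.arange(N) < N // 2, 0.0, 1.0)
I -= I.mean()

# E(x) = sqrt(I(x)^2 + H(x)^2) = |analytic signal|.
E = np.abs(hilbert(I))

# Energy peaks at the discontinuities: the step at N/2 and, because the
# FFT treats the signal as periodic, the wrap-around step at 0/N-1.
print(int(np.argmax(E)))
```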

    Venkatesh and Owens prove that energy is equal to phase congruency scaled by

    the sum of the Fourier amplitudes, that is

        E(x) = PC(x) Σ_n A_n .     (5)

    Thus the local energy function is directly proportional to the phase congruency

    function, so peaks in local energy will correspond to peaks in phase congruency.


    Venkatesh and Owens' formal proof is not repeated here but the relationship between phase congruency, energy and the sum of the Fourier amplitudes can be

    seen geometrically in Figure 5. The local Fourier components are plotted as com-

    plex vectors adding head to tail. The sum of these components projected onto

    the real axis represent I(x), the original signal, and the projection onto the imag-

    inary axis represents H(x), the Hilbert transform. The magnitude of the vector

    from the origin to the end point is the total energy, E(x). One can see that E(x) is equal to Σ_n A_n cos(φ_n(x) − φ̄(x)). Recalling that phase congruency is equal to Σ_n A_n cos(φ_n(x) − φ̄(x)) / Σ_n A_n, we can see that phase congruency is the ratio of E(x) to the overall path length taken by the local Fourier components in reaching the end point. Thus, one can clearly see that the degree of phase congruency is independent of the overall magnitude of the signal. This provides invariance to variations in image illumination and/or contrast.
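This invariance is simple to confirm numerically: scaling every amplitude by a constant, as a global contrast change would, scales E(x) and the path length equally and leaves their ratio untouched. A small sketch with made-up components:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random(8)                     # amplitudes A_n
phi = rng.uniform(0, 2 * np.pi, 8)   # local phases phi_n(x)

vectors = A * np.exp(1j * phi)       # components, plotted head to tail
E = np.abs(vectors.sum())            # origin-to-end-point distance
pc = E / A.sum()                     # ratio to the total path length

# Double every amplitude (e.g. double the image contrast):
E_scaled = np.abs((2 * vectors).sum())
pc_scaled = E_scaled / (2 * A).sum()

# Energy doubles; phase congruency is unchanged.
print(E_scaled / E, pc_scaled - pc)
```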

    Figure 5: Polar diagram showing the Fourier components at a location in the signal plotted head to tail. This arrangement illustrates the construction of energy, the sum of the Fourier amplitudes and phase congruency from the Fourier components of a signal. [Quantities marked: real axis, imaginary axis, I(x), H(x), E(x), φ̄(x), Σ_n A_n cos(φ_n(x) − φ̄(x)).]

    Rather than compute local energy via the Hilbert transform of the original

    luminance profile one can calculate a measure of local energy by convolving the


    signal with a pair of filters in quadrature. The signal is first convolved with a filter

    designed to remove the DC component from the image. This result is saved and

    the image is then convolved with a second filter that is in quadrature with the first

    (the Hilbert transform of the first). This gives us two signals, each being a band

    passed version of the original, and one being a 90 degree phase shift of the other.

    The results of the two convolutions are then squared and summed to produce a

    local energy function. Odd and even-symmetric Gabor functions can be used for

    the quadrature pair of filters. Thus local energy is defined by

        E(x) = √( (I(x) * M_e)² + (I(x) * M_o)² ) ,     (6)

    where M_e and M_o denote the even and odd symmetric filters in quadrature, and * denotes convolution. Figure 6

    illustrates the calculation of local energy on a synthetic signal containing a variety

    of features.
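A minimal version of this computation is sketched below. The Gabor parameters and the step-edge test signal are illustrative choices, not those used in the thesis; the even filter has its small residual DC removed, as the text requires:

```python
import numpy as np

def gabor_pair(sigma=8.0, f=1.0 / 16):
    """Even (cosine) and odd (sine) Gabor filters in quadrature."""
    x = np.arange(-4 * int(sigma), 4 * int(sigma) + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    even = g * np.cos(2 * np.pi * f * x)
    even -= even.mean()                  # remove residual DC component
    odd = g * np.sin(2 * np.pi * f * x)
    return even, odd

# Test signal: a step edge between samples 255 and 256.
N = 512
I = np.where(np.arange(N) < N // 2, 0.0, 1.0)

Me, Mo = gabor_pair()
e = np.convolve(I, Me, mode="same")      # I(x) * M_e
o = np.convolve(I, Mo, mode="same")      # I(x) * M_o
E = np.sqrt(e**2 + o**2)                 # Equation 6

# Local energy peaks at the step (array boundaries excluded, since the
# zero padding used by the convolution creates spurious edges there).
peak = 64 + int(np.argmax(E[64:-64]))
print(peak)
```

The odd filter responds maximally at the step while the even filter passes through zero there; squaring and summing the two outputs gives a single smooth energy peak centred on the edge.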

    The calculation of energy from spatial filters in quadrature pairs has been cen-

    tral to many models of human visual perception, for example those proposed by

    Heeger [33, 34, 36], Adelson and Bergen [1] and Watson and Ahumada [93] to name

    just a few. The significance of Venkatesh and Owens' work is that they provide

    another explanation for the perceptual importance of energy: Peaks in the energy

    function correspond to points where phase congruency is a maximum.

    From this early work by Morrone et al. [62], Morrone and Owens [61] and

    Venkatesh and Owens [89] the local energy model was developed further. Owens

    et al. [67] investigated the idempotency properties of the local energy feature de-

    tector. They argue that when any feature detecting operator is applied to its own

    output it should not change the output. That is, the primal sketch of a primal

    sketch should be itself. Gradient based detectors fail in this respect because they

    attempt to mark edges on each side of any line feature in an image. Local energy,

    on the other hand, produces a single response on a line feature, and hence satisfies

    the idempotency requirement. Venkatesh and Owens [88] investigated the classifi-

    cation of image features via the phase angle at which phase congruency occurs. In

    this manner they show how step, line and shadow edges can be distinguished from

    each other.


    [Figure panels: the signal overlaid with the odd-symmetric and even-symmetric filters; convolution with even filter; convolution with odd filter; local energy (square and sum).]

    Figure 6: Calculation of local energy via convolution with two filters in quadrature.

    Aw et al. [4] in their work on image compression make use of the fact that

    local energy makes no assumptions about the intensity profiles of features. They

    used local energy to detect features across a range of images, collecting information

    about commonly occurring intensity profiles of features in images. This catalogue of

    feature profiles enabled them to efficiently encode images for compression.


    Owens [66] identifies the conditions under which images have no local maxima

    in local energy, and hence are feature free. She also investigates image transforma-

    tions under which image features are preserved. It is pointed out that some image

    operations, such as addition between images, can destroy or create image features.

    She proposes two new operators for the interaction between images which do not

    corrupt feature structures within images. These operators are analogous to com-

    plex multiplication and complex division. Using these operators Owens shows how

    it is possible to decompose a signal into its feature component and its feature-free

    component.

    Other researchers who have studied the use of local energy for feature detection

    are Perona and Malik [68], Freeman [29] and Ronse [78]. Perona and Malik's work

    on local energy is interesting in that they arrive at a generalization of the model

    without using the concept of phase congruency. They point out that image features

    are generally composed of combinations of step, delta, roof and ramp structures.

    Under these conditions it is shown that linear filters will produce systematic er-

    rors in localization. Perona and Malik go on to show that a quadratic filtering

    approach results in the correct detection and localization of composite features.

    That is, instead of looking for maxima in (I(x) * M) one should look for maxima in Σ_i (I(x) * M_i)², where the M_i are a series of different filters. The local energy model, in its use of two filters in quadrature, can be seen to be a specific case of

    quadratic filtering. Perona and Malik suggest that there is no special reason to

    use filters in quadrature and argue that one might wish to use quite different sets

    of filters. However, in the results they presented they chose to use two filters in

    quadrature; the second derivative of a Gaussian and its Hilbert transform.

    Freeman, in his thesis [29] studied the local energy model with particular em-

    phasis on multi-orientation analysis and the behaviour of local energy at feature

    junctions. He devised an approach to the detection and classification of feature

    junctions. The filters he used were generally second and fourth derivatives of Gaus-

    sians along with their corresponding Hilbert transforms, depending on the narrow-

    ness of the frequency tuning he required. As a tool for his multi-orientation analysis

  • 26 CHAPTER 2. IMAGE FEATURES

Freeman developed the concept of steerable filters whereby filter outputs at any orientation can be efficiently computed from a linear combination of the outputs of a limited number of basis filters. Of relevance to the work presented in this thesis, Freeman developed a normalized measure of local energy. However, his motivation for doing this was primarily to allow image information to be represented over a small dynamic range rather than to specifically seek an invariant measure of feature significance. Some of his post-processing techniques might also be considered to be somewhat ad hoc. Despite this, he considers a wide range of issues concerning the use of local energy for feature detection.

    Ronse [78] makes a detailed mathematical study of the idempotency properties

    of the local energy model and the conditions of image modification over which local

energy remains invariant. An important result, which will be used later, is that

    the locations of local energy peaks are invariant to smoothing of the image by a

    Gaussian or any other function having zero Fourier phase.

    Rosenthaler et al. [79] make a comprehensive study of the behaviour of local

    energy at 2D image feature points. They develop a model of 2D feature detection

    based on differential geometry, using the first and second derivatives of oriented local

    energy to identify what they call keypoints. Robbins and Owens [76] have followed

on from Rosenthaler et al.'s work and developed a simpler model of 2D feature

    detection that does not resort to the use of derivatives of the local energy signal.

    Instead, they detect 2D features by calculating oriented local energy over the image

and then calculating the local energy of this local energy image, but in an orientation

    perpendicular to the first. The second application of local energy detects the end

    points of any features detected by the first application of local energy. This process

    is then repeated over multiple orientations to capture all 2D features.

    Wang and Jenkin [92] use complex Gabor filters to detect edges and bars in

    images. They recognize that step edges and bars have specific local phase properties

which can be detected using filters in quadrature; however, they do not connect the

    significance of high local energy with the concept of phase congruency.

    One issue that previous work on local energy has not really addressed is the

    problem of how one should integrate data over many scales. If the perceptual


    significance of a peak in local energy is due to it also being a maximum in phase

    congruency then it is important to consider many scales simultaneously. After all,

it is the occurrence of phase congruency over a range of frequencies that makes it

    significant.

    While the use of the local energy function to find peaks in phase congruency is

    computationally convenient it does not provide a dimensionless measure of feature

    significance as it is weighted by the sum of the Fourier component amplitudes,

    which have units lux. Thus, like derivative based feature detectors, local energy

    suffers from the problem that we are unable to specify in advance what level of

    response corresponds to a significant feature. Despite this, local energy remains a

    useful measure in that it responds to a wide range of feature types.

    Phase congruency, on the other hand, is a dimensionless quantity. We obtain it

by normalizing the local energy function: dividing energy by the sum of the Fourier

    amplitudes. Values of phase congruency vary from a maximum of 1, indicating a

    very significant feature, down to 0 indicating no significance. This property offers

    the promise of allowing one to specify universal feature thresholds, that is, we could

    set thresholds before an image is seen - truly automated feature detection.

    2.4 Issues in calculating phase congruency

    This section describes an initial attempt at devising a way of calculating phase

    congruency. What is highlighted is that there are a number of difficulties that

    have to be overcome if a practical method of calculating phase congruency is to be

    devised. These problems include the following: How should one extend the idea of

    phase congruency to 2D signals? What is the appropriate way of controlling the

    scale of analysis? How should information be integrated over many scales, and how

    can the influence of noise be overcome?

As mentioned earlier, phase congruency is awkward to calculate. An initial approach to calculating phase congruency might be to take a signal, remove its DC component (it is removed because a 90 degree phase shift of a zero frequency component has no meaning), calculate the Hilbert transform (say, by calculating the Fourier transform, multiplying the positive frequency components by -i and the negative frequency components by i, and then performing an inverse Fourier transform), square and sum the Hilbert transform and the AC component of the signal, take the square root, and finally normalize the result by dividing by the sum of the Fourier amplitudes. Results using this method were reported by Kovesi [47] (further work in which wavelets are used to calculate phase congruency was also presented by Kovesi [48]). An example of the calculation of phase congruency via the FFT is shown in Figure 7.

[Figure 7 panels: the signal; the signal with DC removed; its Hilbert transform; and the resulting phase congruency.]

Figure 7: Calculation of phase congruency via the FFT. Notice how phase congruency values range between 0 and 1.
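The procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration of the text's recipe, not the thesis's own code; the function name and the small epsilon guard against division by zero are my own choices.

```python
import numpy as np

def phase_congruency_fft(signal):
    """Whole-signal phase congruency via the FFT, a sketch of the
    procedure described in the text (function name is illustrative)."""
    s = np.asarray(signal, dtype=float)
    n = len(s)
    ac = s - s.mean()   # remove DC: a 90 degree shift of zero frequency is meaningless
    F = np.fft.fft(ac)
    # Hilbert transform: a 90 degree phase shift of every component
    # (-i on positive frequencies, +i on negative frequencies).
    h = np.fft.ifft(F * -1j * np.sign(np.fft.fftfreq(n))).real
    energy = np.hypot(ac, h)                    # local energy
    amps = 2.0 * np.abs(F[1:n // 2 + 1]) / n    # one-sided Fourier amplitudes
    return energy / (amps.sum() + 1e-8)         # phase congruency, in [0, 1]
```

Note that a pure sinusoid comes out with phase congruency of 1 everywhere, which anticipates the frequency-spread issue raised later in this section.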

There are some problems with the calculation of phase congruency via the FFT. Firstly, it is not clear how one adapts this approach for one-dimensional signals to two dimensions; the Hilbert transform is only defined in one dimension. A second difficulty is that the Fourier transform is not good for localizing frequency


information spatially. In the example shown in Figure 7 the Fourier transform was calculated over the whole signal. Thus phase congruency at each point was calculated with respect to the whole signal. To control the local scale and spatial extent over which phase congruency is determined we have to window the signal. Windowing introduces the problem of having to balance spatial localization against the range of frequencies we wish to analyze; the window width controls spatial localization but also constrains the lowest frequency we can measure. Figure 8 shows the result of calculating phase congruency using a rectangular windowing function 32 points wide. The computational procedure was as follows: over each windowed section of the signal the Fourier transform was calculated, and the Hilbert transform generated. The signal value (minus the DC value) and the Hilbert transform value at the centre of the window were then squared and summed; this quantity was then divided by the sum of the Fourier amplitudes over the current window to produce a phase congruency value at the centre position of the window. The window was then indexed one point forward in the signal and the process repeated. Notice how the peaks in phase congruency are higher and more distinct. By windowing the signal, each feature is considered in relative isolation from the others and hence ends up being considered very significant. An important point to note here is that for the calculation of phase congruency the natural scale parameter to vary is the size of the analysis window over which we calculate local frequency information. A large window means that the significance of features is determined in a more global manner, while a small window results in features being treated individually and locally. This leads to a new concept of multi-scale analysis which will be discussed in detail in the next chapter.
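The windowed procedure can be sketched as follows. This is a minimal NumPy version of the steps described above, assuming a rectangular window and evaluating phase congruency only at each window's centre sample; the function name is my own.

```python
import numpy as np

def windowed_phase_congruency(signal, width=32):
    """Phase congruency via the FFT of a sliding rectangular window,
    a sketch of the procedure described in the text."""
    s = np.asarray(signal, dtype=float)
    n, half = len(s), width // 2
    pc = np.zeros(n)
    for c in range(half, n - half):
        w = s[c - half:c + half]        # windowed section of the signal
        ac = w - w.mean()               # subtract the local DC value
        F = np.fft.fft(ac)
        # Hilbert transform of the windowed section.
        h = np.fft.ifft(F * -1j * np.sign(np.fft.fftfreq(width))).real
        energy = np.hypot(ac[half], h[half])          # energy at window centre
        amps = 2.0 * np.abs(F[1:width // 2 + 1]) / width
        pc[c] = energy / (amps.sum() + 1e-8)          # value at centre position
    return pc
```

On a clean step the response peaks sharply at the step, while flat regions give zero; on a noisy featureless stretch every fluctuation scores highly relative to its neighbours, which is exactly the noise problem discussed below.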

    If the scale of analysis of phase congruency is controlled by window size we must

    consider what might happen when a windowed section of signal contains no features

    and only consists of noise. Being a normalized quantity, phase congruency does not

depend on the magnitude of a feature on its own; it depends on the magnitude of

    the feature in the context of the local window. Thus, if the signal is purely noise

    each fluctuation in the signal will be considered quite significant relative to the

    surrounding features as they will all be of similar magnitude. Hence, noise poses


[Figure 8 panels: the signal; the signal with DC removed; its Hilbert transform; and the resulting phase congruency.]

Figure 8: Calculation of phase congruency via the FFT using a rectangular windowing function 32 points wide.

    a serious difficulty for us in trying to devise a practical way of calculating phase

    congruency in images. Figure 9 shows what happens if we introduce a small amount

    of noise into our signal. In regions that are distant from features the influence of

    noise becomes very noticeable.

A further issue we must consider is that phase congruency as defined in Equation 3 does not take into account the spread of frequencies that are congruent at a point. For example, a signal containing only one frequency component, say a sine wave, will be in perfect congruence with itself and hence have phase congruency of 1 everywhere (the Hilbert transform of sine is cosine, and sin^2(x) + cos^2(x) is identically 1, so no point x has maximal local energy). To mark all such points as features would not make sense. Significant feature points are presumably ones


[Figure 9 panels: the noisy signal; the noisy signal with DC removed; its Hilbert transform; and the resulting phase congruency.]

Figure 9: Phase congruency of a noisy signal calculated using a rectangular windowing function 32 points wide.

    with high information content; a point of phase congruency indicates a point of

    high information content only if we have a wide range of frequencies present. We

    do not gain much information from knowing the phase congruency of a signal which

    has only one frequency component.

    2.5 Summary

    This chapter has briefly re-examined the aims of feature detection. The objective

    should be to find points of high information content in images. This objective is not

    necessarily satisfied by finding optimal ways of detecting step edges in the presence

    of noise. Ideally, the ability to detect features and assess their significance should


    be independent of image contrast and spatial magnification. This implies that we

    need to measure feature significance via a dimensionless quantity.

The shortcomings of derivative based feature detectors have been briefly reviewed. The main problems included an inability to specify in advance what level of response corresponds to a significant feature, and the fact that they are generally only designed to detect step edges. The local energy model of feature perception has been introduced. This model has been inspired by psychophysical data and it detects a wide range of feature types. Local energy can be normalized to produce a measure of phase congruency: an approximation of the standard deviation in phase angles of the Fourier components at a point in the signal. Phase congruency is

a dimensionless quantity and is thus an attractive way of detecting features and identifying their significance. It provides an absolute measure of the significance of feature points in an image, and this offers the promise of allowing constant threshold values to be applied across wide classes of images. Thresholds could then be specified in advance of processing any image, rather than being determined by trial and error after processing.

    However, there are a number of issues to be addressed. How should phase

    congruency be calculated in 2D images? How should we calculate local frequency

    information and control the scale of analysis? How do we deal with the influence of

    noise, and how do we identify the range of frequencies present at a point of phase

    congruency? These issues, and others, are addressed in the following chapter which

    will describe how phase congruency can be calculated using wavelets.

  • Chapter 3

    Phase congruency from wavelets

    3.1 Introduction

This chapter describes a new way of calculating phase congruency using wavelets. In calculating phase congruency it is important to obtain spatially localized frequency information in images; wavelets offer perhaps the best way of doing this. The use of wavelets is also biologically inspired: the interest in calculating phase congruency is motivated by psychophysical results, so it would seem natural that one should try to calculate it using biologically plausible computational machinery. In this respect, geometrically scaled spatial filters in quadrature pairs will be used. In addition, it will be seen how the use of wavelets allows one to address the issues raised at the end of the previous chapter regarding the calculation of phase congruency.

The material covered in this chapter is organized as follows. First, it will be shown how local frequency information can be obtained using quadrature pairs of wavelets, concentrating in particular on the use of Gabor wavelets. From this it is relatively straightforward to develop the ideas behind the calculation of phase congruency in one dimensional signals using wavelets. Material is then presented to address the difficulties regarding the calculation of phase congruency that were introduced in the previous chapter. First, the influence of noise in the calculation of phase congruency is considered and an effective method for identifying and compensating for noise is developed. This is followed by a section covering the issues involved in extending the calculation of phase congruency to 2D images. It is then shown how the use of wavelets allows us to obtain a measure of the spread of frequencies present at a point of phase congruency. This helps one determine the degree of significance of a point of phase congruency and allows one to improve feature localization. The issue of analysis at different scales is then considered, and it is argued that high-pass filtering should be used to obtain image information at different scales instead of the more usually applied low-pass filtering. Finally, some results and conclusions are presented.

    3.2 Using Wavelets for Local Frequency Analysis

Recently the Wavelet Transform has become one of the methods of choice for obtaining local frequency information. Most of the current literature on wavelets can be traced back to the work of Morlet et al. [59]. Morlet and his co-workers were interested in obtaining temporally localized frequency data in their analysis of geophysical signals. The basic idea behind wavelet analysis is that one uses a bank of filters to analyze the signal. The filters are all created from rescalings of the one wave shape, each scaling designed to pick out particular frequencies of the signal being analyzed. An important feature is that the scales of the filters vary geometrically, giving rise to a logarithmic frequency scale.

However, many of these ideas were developed earlier by Granlund [30]. In this remarkable paper he developed many of the ideas behind what we would now call multi-scale wavelet analysis. He also proposed an image feature detector that is closely related to the local energy model. For some reason Granlund's paper has remained relatively unnoticed despite its innovative nature, though his work has been developed by Wilson, Calway and Knutsson (see, for example, Wilson, Calway and Granlund [96], Knutsson, Wilson and Granlund [43], Calway and Wilson [10] and Calway, Knutsson and Wilson [9]). From the initial work of Morlet and his colleagues wavelet theory has been subsequently developed by Grossmann and Morlet [31], Meyer [58], Daubechies [14], Mallat [53] and many others.


We are interested in calculating local frequency and, in particular, phase information in signals. To preserve phase information linear phase filters must be used; that is, we must use wavelets that are symmetric or anti-symmetric. This constraint means that the work on orthogonal wavelets (which dominates much of the literature) is not applicable to us. Chui [13] provides a proof that, with the exception of the Haar wavelet, one cannot have a wavelet of compact support that is both symmetric and orthogonal. The Haar wavelet is rectangular in shape and is clearly not appropriate for our needs.

For this work the approach of Morlet will be followed, that is, using wavelets based on complex valued Gabor functions: sine and cosine waves, each modulated by a Gaussian. Using two filters in quadrature enables one to calculate the amplitude and phase of the signal for a particular frequency at a given spatial location. It should be noted that these wavelets are not orthogonal; some conditions must apply in order to achieve reasonable signal reconstruction after decomposition. However, we only require approximate reconstruction up to a scale factor over a band of frequencies or wavelet scales.

[Figure 10 panels: the even and odd wavelets.]

Figure 10: Gabor wavelet: a sine and cosine wave modulated by a Gaussian.

If the bank of wavelet filters is designed so that the transfer function of each filter overlaps sufficiently with its neighbours, in such a way that the sum of all the transfer functions forms a relatively uniform coverage of the spectrum, one can reconstruct the decomposed signal over a band of frequencies up to a scale factor. (If the transfer functions are scaled so that their sum is a uniform transfer function of magnitude one, the reconstructed signal will have the original scale.) Therefore, a problem we have is determining the appropriate scaling factor between successive centre frequencies so that the overlap between transfer functions results in an even spectral coverage. Granlund [30] suggests that the upper cutoff frequency of one transfer function (where it falls to half its maximum value) should coincide with the lower cutoff frequency of the next function. However, in practice this does not produce particularly even coverage, and a closer spacing is generally desirable. In the results presented in this chapter the filters used have had bandwidths of approximately one octave, with a scaling between successive centre frequencies of 1.5. This arrangement was arrived at by experimentation; the values are not critical and a wide range of parameters produce satisfactory results. Referring to Figure 11 one can see that, in this example, the sum of the spectra of the five wavelets produces a relatively ideal band-pass filter, especially when viewed on the log frequency scale. Design of the wavelet bank ends up being a compromise between wishing to form a smooth sum of spectra while at the same time minimizing the number of filters used so as to minimize the computation requirements.
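The spacing trade-off can be checked numerically. The sketch below is my own construction, not the thesis code: it builds Gaussian transfer functions at centre frequencies spaced by a factor of 1.5, with widths chosen for roughly one-octave bandwidth, and measures how even the summed coverage is.

```python
import numpy as np

f = np.linspace(0.01, 0.5, 500)        # frequency axis, cycles per sample
f0, mult, n_filters = 0.05, 1.5, 5     # scaling of 1.5 between centre frequencies

def sigma_for(fc):
    # Gaussian standard deviation giving half-maximum points one octave
    # apart (f_high / f_low = 2), i.e. at fc * (1 +/- 1/3).
    return fc / (3.0 * np.sqrt(2.0 * np.log(2.0)))

centres = f0 * mult ** np.arange(n_filters)
bank = [np.exp(-(f - fc) ** 2 / (2 * sigma_for(fc) ** 2)) for fc in centres]
total = np.sum(bank, axis=0)           # summed transfer function of the bank

# Over the covered band the sum stays fairly flat: the max/min ratio is
# modest, so reconstruction up to a scale factor is reasonable.
band = (f >= centres[0]) & (f <= centres[-1])
ratio = total[band].max() / total[band].min()
```

With a larger scaling factor between centre frequencies the dips between adjacent filters deepen and the ratio grows, which is the uneven coverage the text warns about.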

Analysis of a signal is done by convolving the signal with each of the wavelets. If we let I denote the signal and M_n^e and M_n^o denote the even and odd wavelets at a scale n, the amplitude of the transform at a given wavelet scale is given by

    A_n(x) = sqrt((I(x) * M_n^e)^2 + (I(x) * M_n^o)^2)    (7)

and the phase is given by

    phi_n(x) = atan2(I(x) * M_n^e, I(x) * M_n^o).    (8)

Note that from now on n will be used to refer to wavelet scale (previously n denoted frequency in the Fourier series of a signal).
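Equations 7 and 8 can be sketched directly in NumPy. This is a minimal illustration; the Gabor construction and the width parameter k are my own choices, not the thesis's exact filters.

```python
import numpy as np

def gabor_pair(wavelength, k=0.65):
    """Even (cosine) and odd (sine) Gabor wavelets at one scale; k sets
    the Gaussian width relative to the wavelength (illustrative value)."""
    sigma = k * wavelength
    x = np.arange(-int(3 * sigma), int(3 * sigma) + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    even = g * np.cos(2 * np.pi * x / wavelength)
    odd = g * np.sin(2 * np.pi * x / wavelength)
    return even - even.mean(), odd     # zero-mean even filter (no DC response)

def amplitude_phase(signal, wavelength):
    """A_n(x) and phi_n(x) of Equations 7 and 8 at a single wavelet scale."""
    Me, Mo = gabor_pair(wavelength)
    e = np.convolve(signal, Me, mode='same')    # I(x) * M_n^e
    o = np.convolve(signal, Mo, mode='same')    # I(x) * M_n^o
    return np.hypot(e, o), np.arctan2(e, o)
```

For a sinusoid whose wavelength matches the filter, the amplitude A_n(x) is essentially constant away from the signal boundaries, while the phase winds through a full cycle per period.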

The results of convolving a signal with a bank of wavelets can be displayed graphically via a scalogram (Figure 12). Each row of the scalogram is the result of convolving the signal with a quadrature pair of wavelets at a certain scale. Phase is plotted by mapping 0–360 degrees to 0–255 grey levels (note, therefore, that the black/white discontinuities in the scalogram correspond to the wrap-around in


[Figure 11 panels: even-symmetric and odd-symmetric wavelets; their spectra on linear and log frequency axes; and the sum of the spectra on linear and log frequency axes.]

Figure 11: Five wavelets and their respective Fourier Transforms indicating which sections of the spectrum each wavelet responds to. Collectively the wavelets provide a wide coverage of the spectrum, though with some overlap. Note that on a logarithmic frequency scale the spectra are identical.

    phase). The vertical axis of the scalogram is a logarithmic frequency scale, with the

    lowest frequency at the bottom. Each column of the scalogram can be considered to

    be a local Fourier spectrum for each point. Note that to achieve a dense scalogram

such as shown here the scaling factor between successive filter centre frequencies

    will be only slightly greater than 1.

    The phase plot of the scalogram is of particular interest because it enables one

    to actually see the points of high phase congruency. At locations in the signal where

    there are large step changes one can see a vertical line of constant grey value in the

    phase diagram indicating a constant phase angle over all frequencies at that point

    in the signal.


[Figure 12 panels: the signal to be analyzed; the magnitude of the scalogram; and the phase of the scalogram, with asterisks marking lines of constant phase.]

Figure 12: A one dimensional signal and its amplitude and phase scalograms. The horizontal axes of the scalograms correspond directly with the signal's horizontal axis. The vertical axes of the scalograms correspond to a logarithmic frequency scale with low frequencies at the bottom. The asterisks mark vertical lines of constant phase that occur at the step transitions in the signal. These are points of phase congruency. (Note: the phase scalogram is presented by mapping 0–360 degrees to 0–255 grey levels.)


    3.3 Calculating Phase Congruency Via Wavelets

    To calculate phase congruency we need to construct the fo