

1534 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 7, NO. 11, NOVEMBER 1998

Quantitative Comparison of the Performance of SAR Segmentation Algorithms

Ronald Caves, Shaun Quegan, Member, IEEE, and Richard White

Abstract—Methods to evaluate the performance of segmentation algorithms for synthetic aperture radar (SAR) images are developed, based on known properties of coherent speckle and a scene model in which areas of constant backscatter coefficient are separated by abrupt edges. Local and global measures of segmentation homogeneity are derived and applied to the outputs of two segmentation algorithms developed for SAR data, one based on iterative edge detection and segment growing, the other based on global maximum a posteriori (MAP) estimation using simulated annealing. The quantitative, statistically based measures appear consistent with visual impressions of the relative quality of the segmentations produced by the two algorithms. On simulated data meeting algorithm assumptions, both algorithms performed well, but MAP methods appeared visually and measurably better. On real data, MAP estimation was markedly the better method and retained performance comparable to that on simulated data, while the performance of the other algorithm deteriorated sharply. Improvements in the performance measures will require a more realistic scene model and techniques to recognize oversegmentation.

I. INTRODUCTION

WHEN microwaves scatter from the Earth’s surface, they suffer changes in amplitude and phase caused by the local properties of the surface. A single pixel in a synthetic aperture radar (SAR) complex image provides a measurement of these changes, averaged over an area determined by the point spread function of the instrument. When distributed targets are being imaged, this weighted averaging typically contains contributions from many scatterers with effectively random phase, giving rise to interference and the well-known speckle phenomenon. As a consequence, single-channel SAR phase measurements from distributed targets display a uniform distribution and thus convey no information. Since distributed scatterers are the objects of prime interest in many applications, it is therefore common to provide SAR images simply in amplitude form, and to regard the backscatter coefficient σ⁰ as the only important measurement. The backscatter coefficient is proportional to the measured power, which is given by the square of the amplitude data.

Speckle in an amplitude or intensity image manifests itself as a noiselike multiplicative modulation of σ⁰, so that

Manuscript received February 25, 1996; revised January 27, 1998. This work was supported in part by SERC under Research Grant GR/H 90636. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Andrew F. Laine.

R. Caves and S. Quegan are with the Sheffield Centre for Earth Observation Science, University of Sheffield, Sheffield S3 7RH, U.K. (e-mail: [email protected]).

R. White is with the Defence Research Agency, Great Malvern, Worcs. WR14 3PS, U.K. (e-mail: [email protected]).

Publisher Item Identifier S 1057-7149(98)07748-3.

individual pixels provide very inaccurate measurements of the true σ⁰. In order to counter its damaging effects, various approaches are possible. The simplest is to average over several resolution cells. When L independent samples are averaged, this is known as multilooking, and gives rise to L-look data. This procedure is often carried out inside the SAR processor, so that the single-look data are never formed.

Image analysis approaches to the problem start from the single or multilook data and attempt to estimate the underlying σ⁰, based on models for the multiplicative speckle and the structure of σ⁰. The elementary properties of L-look speckle are well-known [1], but there are difficulties associated with the correlation properties induced by the sampling of the image (see Section V). Simple models for σ⁰ treat it as locally uniform or gamma distributed, with or without a spatial correlation structure, and form the basis for several adaptive filtering schemes [2]–[4]. More general adaptive models test for structure in the form of pointlike objects, thin lines, and edges before defining the filtering window [5].

Filtering techniques are by their nature local. A global approach is to seek the underlying σ⁰ for the whole scene which optimizes some criterion. For example, methods have been developed that attempt to find the maximum a posteriori (MAP) solution for σ⁰ which is locally smooth except where abrupt edges occur [6], [7]. One such method is discussed in this paper.

Both filtering and global optimization are reconstruction techniques, attempting to approximate σ⁰ everywhere in the image. For many purposes, it is then necessary to impose structure on the image by the recognition of regions corresponding to fields, water bodies, ice floes, etc. A different approach is to start from the need to identify structure within the image, leading to image segmentation. The associated “world model” is that the scene consists entirely of regions that are in some sense homogeneous. Adjacent regions are separated by boundaries corresponding to changes in some local statistic, such as mean brightness or a texture variable. The simplest model of this type is that σ⁰ is uniform within segments, with edges marking an abrupt change in σ⁰.

Segmentation is perhaps the lowest level description on which image understanding can be based. Because its output contains an explicit description of image structure (unlike the reconstruction algorithms), it supports high level concepts such as shape, area, adjacency, etc., and the criteria by which it is judged are consequently more severe. As well as meeting objective conditions, such as region homogeneity and measurable differences between adjacent segments, a successful

1057–7149/98$10.00 © 1998 IEEE


segmentation may be required to conform reasonably to human perception of an image. This condition is necessary, but cannot be made too rigorous, if only because segmentation of an image by hand is not unique. Different human operators make different choices, and the same operator may make different choices at different times. In contrast, any given algorithm will force a particular solution (and indeed must do so to be useful).

In this paper, we are primarily concerned with developing a strategy by which to assess segmentation algorithms, on the basis of models for the speckle and the underlying scene described in Section II. Properties of the ratio image formed by dividing the original image by its segmentation are developed in Section III, and are shown to provide both local and global measures of segmentation performance. An exhaustive assessment of all available segmentation algorithms would be out of place, and to illustrate the value of the measures we restrict our attention to just two algorithms that have been developed specifically for SAR images. They are described briefly in Section IV, before comparing their performances in Section V. The implications of these comparisons are discussed in Section VI, both as regards the performance measures (which are generally applicable) and the particular algorithms they have been used to assess.

II. IMAGE MODEL

For an L-look SAR image, coherent speckle causes the observed intensity I in a uniform extended target to have a probability density function (pdf) which is gamma distributed [1]:

P(I) = L^L I^(L−1) exp(−LI/σ) / (σ^L Γ(L))    (1)

The mean intensity σ is proportional to σ⁰, with a constant of proportionality which will be assumed to be unity. This distribution has variance σ²/L and squared coefficient of variation

CV² = (E[I²] − E[I]²) / E[I]² = 1/L    (2)

where E[·] denotes expectation. Another important moment is the normalized log intensity

E[ln(I/σ)] = ψ(L) − ln L    (3)

in which ψ(·) is the digamma function [8]. For analysis purposes, speckle values will be treated as statistically independent at different pixels. The consequences of the failure of this condition will be noted in Section V.
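These moments can be checked numerically. The sketch below (with illustrative values L = 5 and σ = 2, not taken from the paper) simulates L-look intensity speckle and confirms that the squared coefficient of variation is close to 1/L and that the mean normalized log intensity is close to ψ(L) − ln L:

```python
import numpy as np
from scipy.special import psi  # digamma function

L = 5          # number of looks (illustrative value)
sigma = 2.0    # underlying mean intensity (illustrative value)
rng = np.random.default_rng(0)

# L-look intensity speckle: gamma distributed with shape L and mean sigma
I = rng.gamma(shape=L, scale=sigma / L, size=200_000)

cv2 = I.var() / I.mean() ** 2     # squared coefficient of variation, ~ 1/L
nl = np.log(I / sigma).mean()     # normalized log intensity, ~ psi(L) - ln L
```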

We assume the simplest possible structure for the scene, in which regions of constant σ⁰ are separated by abrupt changes in intensity. This will be referred to as the “cartoon” model. The aim of segmentation is to recover these regions from the image data. More specifically, a segmentation splits the image into disjoint connected subsets. The number of pixels in segment i is denoted N_i, so that N = Σ_i N_i is the total number of pixels in the image. Ideally, the recovered segments will correspond to true scene elements. Note that the cartoon model does not take account of gradual changes or fluctuations in σ⁰ (texture) within regions.

In the following, the notation ⟨·⟩ indicates the average value of a quantity over the whole image, while ⟨·⟩_i denotes the average value within segment i, i.e.,

⟨I⟩_i = (1/N_i) Σ I    (4)

where the summation is over all pixel positions in segment i. We will assume that a given segmentation assigns the average intensity ⟨I⟩_i to each pixel in segment i.

III. PERFORMANCE MEASURES

Objective assessment of an algorithm’s performance requires testing the consistency between the segmentations it produces and the cartoon model upon which it is based. This means that segments must be homogeneous and statistically distinct from their neighbors. The latter condition can be forced by segment merging (see Section IV), so that segments need only be tested for homogeneity. This section describes ways of carrying out such tests for an arbitrary segmentation.

An important quantity in discussing homogeneity is the ratio image formed by dividing the original image by its segmentation. If every pixel were divided by its true σ⁰, the ratio image would consist of pure speckle with unit mean [i.e., its distribution would be given by (1) with σ = 1]. Residual structure in the ratio image indicates regions where segmentation has failed, which can occur in two ways: either segments lie across true edges in the scene, or true homogeneous areas in the scene have been split into several segments. In the latter case, each subarea may be assigned a different σ⁰ as a result of estimating σ⁰ from a finite number of pixels. Even when no structure is obvious in the ratio image, it still provides important indicators of the success of the segmentation, as we show.

A. Intensity Coefficient of Variation

By construction, the average ratio over a segment is unity, regardless of its size or whether it is truly homogeneous. However, a test for homogeneity is provided by the sample variance of the ratio within the segment. This is identical to the estimated CV² within the segment, since

(5)

Although CV² for the population takes the value 1/L, its estimate within a finite sample has a negative bias, since

(6)

and a variance given by

(7)

These results are derived in the Appendix.
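As a concrete illustration of the ratio-image test, here is a minimal sketch (function names are ours, not from the paper) that forms the ratio image from an intensity image and an integer label map, and returns the sample variance of the ratio per segment, i.e. the estimated CV²:

```python
import numpy as np

def ratio_image(I, labels):
    """Divide each pixel by the mean intensity of its segment."""
    means = np.zeros(labels.max() + 1)
    for s in np.unique(labels):
        means[s] = I[labels == s].mean()
    return I / means[labels]

def segment_cv2(I, labels):
    """Sample variance of the ratio within each segment.  Since the
    ratio has unit mean by construction, this equals the estimated
    squared coefficient of variation of the segment."""
    r = ratio_image(I, labels)
    return {int(s): float(r[labels == s].var()) for s in np.unique(labels)}
```

For truly homogeneous segments of L-look data the values should scatter around 1/L, with the negative bias noted above becoming important for small segments.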


A global measure of homogeneity is provided by the variance of the full ratio image. This is equivalent to a weighted average of the estimated CV² within segments, since

(8)

Expressions for the mean value and variance of the global measure follow immediately from (6) and (7) since, if X = Σ_i a_i X_i, where the a_i are constants and the X_i are independent random variables, then E[X] = Σ_i a_i E[X_i] and var(X) = Σ_i a_i² var(X_i).

B. Likelihood and Normalized Log Intensity

The measure can be used to assess the homogeneity of a single segmentation. Alternatively, segmentations can be compared by estimating the likelihood of the observed image, under the assumption that the underlying σ⁰ is given by the average intensity within segments. We would expect a more accurate segmentation to generate a more accurate mean image, and hence a higher likelihood. An important constraint in applying the likelihood principle is that the average intensities of adjacent segments must be significantly different. Without this, the maximum likelihood solution is simply the original image.

The log-likelihood per pixel of the intensity image, given a specified mean image, is defined as

(9)

where the pdf of the intensity, given the mean, is that of the speckle model. Using (1), we find that

(10)

where the specified mean image is the one whose value at each position is the average intensity of the segment containing that position. For any segmentation, correct or not, it is easy to show that

and

so that

(11)

In this expression, the number of looks L is a constant and the fourth term is the average log intensity of the original image. Hence, only the summation in the last term is dependent on the segmentation. This summation can be written as

(12)

with

(13)

It is simply related to the ratio image, since

(14)

(where the sum is over all pixels in segment i), and

(15)

The quantities defined in (13) and (15) will be referred to as normalized log (NL) measures. They cannot be positive, and are zero only when all pixels have the same intensity. The likelihood is therefore maximized when the global NL measure is minimized. Heterogeneity is indicated when either measure takes unusually large values. To identify large values requires the variance, which for a homogeneous segment is given by

(16)

where

(17)

(see the Appendix). The estimate is positively biased [compare (3)], since ψ(x) < ln x for all positive x [8, Eq. (6.3.21)]. If all segments are homogeneous, then the mean and variance of the global measure are easily derived from (12), (16), and (17).
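The segment-level NL measure can be computed directly from the ratio image. The sketch below (our naming, assuming an integer label map as before) returns the mean log ratio within each segment, which is never positive and is zero only for a constant segment:

```python
import numpy as np

def nl_measure(I, labels):
    """Mean log of the ratio image within each segment.  By Jensen's
    inequality mean(log(x)) <= log(mean(x)), so each value is <= 0,
    with equality only when the segment is constant."""
    out = {}
    for s in np.unique(labels):
        seg = I[labels == s]
        out[int(s)] = float(np.mean(np.log(seg / seg.mean())))
    return out
```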

C. Comments on the Homogeneity Measures

A key feature of the homogeneity measures described in Sections III-A and III-B is that their means and variances depend solely on the number of looks L (which is an invariant of the imaging system) and the set of segment sizes [see (6), (7), (16), and (17)]. This allows us to test whether a segmentation of any image, whether real or simulated, is consistent with the segments being truly homogeneous. However, since only the means and variances of the measures are available and we have little knowledge of their sampling distributions, we are currently unable to define confidence intervals. Hence, although the tests are generally applicable, they are fairly crude, and probably their most useful aspect is in comparing segmentations.

Note that segmentations can only be fairly compared if they each contain approximately the same number of segments. This is because the mean homogeneity measure can always be reduced, and the mean likelihood increased, by splitting segments. For the comparisons in Section V, this condition is therefore imposed by suitable choices of the parameters controlling the segmentations.


Fig. 1. Flow diagram of the RWSEG algorithm; σ denotes standard deviation.

IV. SEGMENTATION ALGORITHMS

In this section we describe two segmentation algorithms, RWSEG and ANNEAL, which will be compared using the performance measures described in the last section. Both algorithms were developed specifically for SAR data and have been used for image analysis in a number of application studies (see, for example, [9]). We also describe a segment merging procedure that is applied to produce the final segmentation.

A. RWSEG

This algorithm is illustrated by the flow diagram in Fig. 1. It involves an iterative process in which detected edges are used to limit segment growing. The resulting segmentation is then used to generate more accurate edge detections and, in turn, an improved segmentation. After each iteration the global average defined by (8) is measured, and iteration continues while this decreases. The steps in this process are described briefly in the following subsections; a fuller description may be found in [10] and [11].

1) Edge Detection: Edges are detected by multiscale gradient operators with thresholds adapted to window size and the local standard deviation. The choice of edge detection strategy is based on four criteria:

• it must be able to “learn” from the previous segmentation;
• the probability of false edge detection in a homogeneous segment (the false alarm rate) must be independent of segment mean intensity;
• the probability of false alarm (PFA) must be selectable;
• it must be possible to detect both low and high contrast edges without causing edge thickening.

At each point in the image, the gradient across a rectangular window centered at that point can be estimated by splitting the window into two portions (which for the moment are each assumed to contain N pixels), estimating the mean value within each subwindow by averaging the pixels within it, and then taking the difference of these two averages. The output has mean μ₁ − μ₂ and standard deviation

√((σ₁² + σ₂²)/N)    (18)

where μ₁ and σ₁ are the mean and standard deviation in one subwindow, and μ₂ and σ₂ are the corresponding values in the other. In order to produce a constant false alarm rate over homogeneous regions, the output of the gradient operator must be normalized by its standard deviation. From (18), this requires an estimate of the variance in each subwindow, which can be calculated from the previous segmentation by taking the average variance of segments overlapping the subwindow, weighted by the area of the overlap. In this manner, both the original image and the previous segmentation are fed into the edge detection stage. For the first iteration the whole image is treated as a single segment.

Edges are then detected by applying a threshold to the absolute value of the normalized gradient. For very large windows, the central limit theorem implies that the normalized gradient values will have a Gaussian distribution with zero mean and unit variance. A threshold can then be selected to give a desired PFA. The same threshold is used for small windows, since simulations on 5-look data have shown that even for a 3 × 3 window, the distribution of the normalized gradient is close to Gaussian. (Thresholds which for Gaussian output would have yielded PFA’s of 0.01, 0.001, and 0.0001 instead produced false edge detections with frequencies 0.00994, 0.00091, and 0.00006, respectively.)
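A one-dimensional sketch of this constant-false-alarm-rate gradient test may help (our simplification, not the paper's full multiscale operator: a single image row, equal-size n-pixel subwindows, and a per-pixel variance estimate supplied by the caller):

```python
import numpy as np
from scipy.stats import norm

def row_gradient(row, var_row, n=8):
    """Difference of the means of two adjacent n-pixel subwindows,
    normalized by its standard deviation sqrt((v_left + v_right) / n)
    so that a fixed threshold gives a constant false alarm rate."""
    g = np.zeros(row.size)
    for c in range(n, row.size - n):
        left, right = row[c - n:c], row[c:c + n]
        vl, vr = var_row[c - n:c].mean(), var_row[c:c + n].mean()
        g[c] = (right.mean() - left.mean()) / np.sqrt((vl + vr) / n)
    return g

def detect_edges(row, var_row, pfa=0.001, n=8):
    """Threshold the normalized gradient at the two-sided standard
    normal quantile corresponding to the desired PFA."""
    t = norm.isf(pfa / 2.0)
    return np.abs(row_gradient(row, var_row, n)) > t
```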

To detect edges of different strengths, the algorithm uses rectangular windows applied in order of increasing size. Where an edge has already been detected, the window is reduced in size so as to exclude the edge; if this causes a reduction exceeding 75%, no edge is set. This limits edge thickening and means that the larger windows are used mainly to detect low contrast edges.

2) Segment Growing: Detected edges are used to limit the segment growing stage. Segments are grown by fitting discs inside regions where no edges have been detected. Discs of diameter 64, 32, 16, 8, 4, 2, and 1 pixels are fitted in order of decreasing size. All discs of a given size which overlap or abut are merged to form a single segment. When a segment defined in terms of discs all of the same size overlaps or abuts segments defined in terms of larger discs, it is reduced by discarding the pixels in the overlap, then merged with the neighboring segment with the closest mean value. (See [11] for an illustration of this process.)

3) Halting the Segmentation Process: Each segment growing stage is followed by calculation of the average defined by (8). As the segmentation improves, this would be expected to decrease to a limiting value dependent only on the number of looks. Iteration halts when it shows an increase from one iteration to the next, which typically corresponds to a fluctuation about the limiting value. The previous segmentation is then output, corresponding to the first minimum of the average.


Fig. 2. Twelve permitted connections between a central pixel (x) and two of its neighbors.

B. ANNEAL

ANNEAL uses simulated annealing [7] to generate a maximum a posteriori (MAP) reconstruction of the underlying σ⁰. The a posteriori distribution of the σ⁰ image, given the observed image, is proportional to the product of the likelihood and the prior. For spatially independent speckle, the likelihood is simply a product of gamma distributed pdf’s, one for each pixel in the image. The prior distribution defines the spatial relations within the image. If σ⁰ values are considered to be independent, then the a posteriori distribution is maximized by the original image. To generate a more useful reconstruction, some spatial correlation must be introduced into the model. Here we use the model described in [6] and [12], in which the σ⁰ of a given pixel is related to just two of its neighbors. Twelve 3-pixel connectivity patterns are allowed, as shown in Fig. 2. This simple connectivity permits smoothing to take place without the burden of too much computational complexity.

For each connectivity pattern that is unaffected by edges, the σ⁰ of the central pixel is estimated by assuming that the values of all three pixels are drawn from a gamma distribution whose mean is that of the central pixel. The values for the neighboring pixels are taken from the current state of the system. The likelihood of the mean at the central pixel, given the original image, is therefore

(19)

which is maximized when

(20)

The quantity appearing in (19) controls the amount of smoothing; it is set by the simulated annealing process. As it is increased, the solution is dominated less by the original pixel value and more by the surrounding values.

This process is modified once edges have been detected. For each connectivity pattern in which one of the outer pixels is separated from the central pixel by an edge, the outer pixel is ignored. Maximizing the likelihood over the remaining two pixels yields

(21)

If the central pixel is separated from both outer pixels by edges, it is treated in isolation, with the MAP solution

(22)

An edge map is generated after each iteration, by setting an edge between pixels whose values satisfy

(23)

where the threshold is fixed. Closed boundaries in the resulting edge map yield a segmentation.

Because of the difficulty in simultaneously calculating the MAP solution for all pixels, a simulated annealing approach is adopted. This iterative procedure uses the original image as its initial estimate of σ⁰. During each iterative cycle, all pixels are visited in a random order, and at each pixel the likelihood of the twelve possible configurations is calculated. One configuration is then randomly selected, weighted by a probability determined by the likelihood values. The MAP solution corresponding to the chosen configuration is used to update the image, until the new image is complete, whereupon a new edge map is generated. This process is repeated for a fixed number of iterations with fixed parameter values. If the standard deviation of the image has decreased during this iterative cycle, a new cycle of reconstructions is undertaken, with the smoothing parameter increased and the edge threshold decreased. If the standard deviation has increased, the previous image is output.

C. Dynamic Segment Merging

After either form of segmentation, neighboring segments which are not statistically distinct are merged. To do this, we calculate the probability that each pair of adjacent segments has the same mean intensity (so that their separating edge arose simply from speckle). This calculation uses the normalized intensity ratio, defined as

(24)

for segments i and j. When two homogeneous segments have a given normalized ratio, the cumulative distribution is given by (25), shown at the bottom of the page, where

(25)


Fig. 3. (a) ERS-1 PRI image of Feltwell, U.K. (b) Simulated σ⁰. (c) Edge map corresponding to (b). (d) Simulated 5-look image. The images are 128 × 128 pixels in size and are displayed in amplitude.

is the beta function and is the incomplete beta function [13], [14]. The probability that speckle alone produces a value of the intensity ratio at least as large as that observed is then given by

Pairs of segments for which this probability exceeds a given threshold are merged, but this must be carried out dynamically in order to yield good results [15]. The edges in the image are first ordered from highest to lowest probability of being purely due to speckle. If the probability of the first edge in the list exceeds the threshold, the edge is deleted. This modifies some of the probabilities and leads to a new ordering. The process is repeated until the first element in the list no longer exceeds the threshold.
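The statistical test behind the merge decision can be sketched as follows. Under the null hypothesis of a common mean, the average of N_i pixels of L-look speckle is gamma distributed with shape L·N_i, and the ratio of two such independent averages follows an F(2LN_i, 2LN_j) distribution. The function below (our naming; a sketch of the idea rather than the paper's incomplete-beta formulation in (25)) returns a two-sided probability that speckle alone explains the observed ratio:

```python
import numpy as np
from scipy.stats import f

def same_mean_prob(seg_i, seg_j, looks):
    """Two-sided p-value for the hypothesis that two segments share the
    same underlying mean intensity.  Under the null, the ratio of their
    mean intensities follows F(2*looks*Ni, 2*looks*Nj)."""
    a = 2 * looks * seg_i.size
    b = 2 * looks * seg_j.size
    p = f.cdf(seg_i.mean() / seg_j.mean(), a, b)
    return 2.0 * min(p, 1.0 - p)
```

In the dynamic merging loop, adjacent pairs would be ordered by this probability, the most speckle-like edge deleted, and the ordering recomputed until the largest remaining probability falls below the threshold.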

V. COMPARISONS OF SEGMENTATION PERFORMANCE

In this section, the performance of the two segmentation algorithms described in Section IV is assessed visually and using the measures defined in Section III. The tests are based on a 256 × 256 extract from an ERS-1 PRI three-look image of an agricultural scene near Feltwell, U.K., taken on 6 June 1992. The oversampling inherent in forming PRI images causes adjacent pixels to be highly correlated (in excess of 0.5 in both range and azimuth). This greatly degrades the performance of the algorithms, whose theoretical formulation is based on independent pixels (similar remarks apply to many of the image processing techniques derived for SAR data). Hence, 2 × 2 block averaging was applied to the image prior to segmentation, in order to reduce the spatial correlation. This also alters the effective number of looks from three to approximately five [14]. The resulting 128 × 128 image is shown in Fig. 3(a). Bright and dark fields correspond roughly to cereal and root crops, respectively; pointlike responses are probably due to strong backscatter from buildings.
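The 2 × 2 block averaging used here is straightforward; a minimal sketch:

```python
import numpy as np

def block_average(img, k=2):
    """k x k block averaging, trimming the image to a multiple of k.
    Averaging k*k independent pixels would multiply the effective
    number of looks by k*k; with correlated PRI pixels the gain is
    smaller (here, roughly three looks to five)."""
    h = (img.shape[0] // k) * k
    w = (img.shape[1] // k) * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))
```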

Although algorithm performance on real data is the important issue, tests on simulated data are also instructive, since we can then guarantee that the image being tested meets the conditions assumed by the algorithms. Nonetheless, the simulations should as far as possible be representative of images met in practice, with similar structural features and brightness variations. To this end, the image in Fig. 3(a) was


Fig. 4. Average area of segments in the real and ten simulated images.

segmented using RWSEG with andThe resulting mean image and its associated edge map

are shown as Fig. 3(b) and (c), and will be referred to as the control. To simulate a SAR image meeting exactly the conditions assumed by the algorithms, each pixel in the control is multiplied by a sample from a unit mean gamma distribution with order parameter equal to the effective number of looks; samples at different pixels are independent. Ten simulated images were generated in this way, each with a different speckle realization. One of these simulated images is shown as Fig. 3(d).
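The speckle simulation itself is a one-liner in NumPy; a sketch, with the order parameter passed in as the effective number of looks:

```python
import numpy as np

def simulate_speckle(control, looks, seed=None):
    """Multiply each pixel of the speckle-free control image by an
    independent, unit-mean gamma sample with order parameter `looks`
    (shape = looks, scale = 1/looks gives mean 1, variance 1/looks)."""
    rng = np.random.default_rng(seed)
    return control * rng.gamma(shape=looks, scale=1.0 / looks,
                               size=control.shape)
```

Each call with a different seed yields a different speckle realization of the same underlying scene.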

A. Segmentation Results

In order to compare algorithms, the edge detection threshold for RWSEG was chosen so that both ANNEAL and RWSEG gave approximately the same average segment area after segment merging. Since the required threshold is much higher than the value used to create Fig. 3(b), we would expect all edges in the simulated images to be detected. Segment merging was applied with the same parameter values as in edge detection and as in creating Fig. 3(b). The average area of the segments generated by the two algorithms for the real and simulated images is plotted in Fig. 4, where the average segment area in the control is also marked. This confirms that the two algorithms give segments of similar area, with no systematic bias apparent in the simulations. For the higher merging threshold the average area is well below that in the control, while for the lower it tends slightly to exceed it.

Very noticeable is that segments detected in the real image are about half the area of those in the simulated images. This may be due to structure present in the real image but not in the control. However, an important contributing factor is residual correlation in the real image after the preaveraging. This increases the probability of false edge detection for a fixed threshold, so that more segments are formed. (For example, introducing a correlation coefficient of only 0.071 between adjacent pixels in segments of area 150 pixels is equivalent to doubling the false edge detection probability at its chosen setting.)

1) Visual Comparison: Before using quantitative methods, some comments based on visual inspection are pertinent. Fig. 5 shows the mean, edge, and ratio images produced from the real image by RWSEG and ANNEAL when p_m = 2 × 10^-2.

Most of the extended features are readily discernible in both mean images [Fig. 5(a) and (b)], although in some areas ANNEAL appears to preserve structure better than RWSEG. For example, the dark regions directly below the large triangular region in the top right corner are better defined by ANNEAL than by RWSEG [compare Fig. 3(a)]. The RWSEG edge image appears more cluttered than that produced by ANNEAL [Fig. 5(c) and (d)], despite both containing the same number of segments. This is because RWSEG produces significantly more small segments, and boundaries that tend to be more jagged. Both algorithms (but particularly RWSEG) appear to oversegment the image. An obvious example is the large triangular region in the top right corner, which is split into numerous segments. Finally, more structure is apparent in the ratio image produced by RWSEG than by ANNEAL [Fig. 5(e) and (f)]. Most of the obvious features in the RWSEG ratio image are linear (this is much clearer on a computer screen than in the reproduced image), suggesting that a primary cause is slight offsets in segment boundaries.

Fig. 6 shows the corresponding results for a simulated image. There is less oversegmentation and much less difference between the edge maps generated by the two algorithms. As with the real image, the segment boundaries found by ANNEAL are less jagged than those found by RWSEG. Both ratio images appear homogeneous. In addition, most segments in the control can be identified with single segments in the RWSEG and ANNEAL segmentations, with minor differences in shape; this is most easily observed by comparing the edge maps.

The results of merging with the other threshold value are not shown. For the real image there is less oversegmentation, but the ratio images exhibit more features. The only apparent change in the simulated image is that some small segments are merged.

B. Segment Homogeneity

The visual inspection described above suggests that ANNEAL produces better results than RWSEG on real data, but on images known to meet the data model there is little to separate them. We now assess whether quantitative measures are consistent with these impressions.

For each segmentation, the CV² and NL measures were calculated for all segments and were corrected for bias using (6) and (17). The results for the real image are displayed as log–log scatterplots against segment area in Fig. 7. Here the horizontal lines indicate the expected values for the measures, calculated from (2) and (3); intervals of one standard deviation on either side of the expected values are marked by solid lines.
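For reference, the per-segment statistics can be computed as below. The exact estimator conventions (the 1/N normalization, the sign of the NL measure) and the bias corrections of (6) and (17) are not reproduced here, so this is only an illustrative sketch:

```python
import numpy as np

def segment_measures(intensities):
    """Biased per-segment homogeneity measures for one segment:
    the squared coefficient of variation and the normalized log
    intensity of the ratio values r_i = I_i / mean(I)."""
    x = np.asarray(intensities, dtype=float)
    r = x / x.mean()
    cv2 = ((r - 1.0) ** 2).mean()     # squared coefficient of variation
    nl = np.log(r).mean()             # normalized log intensity (negative)
    return cv2, nl
```

For a large homogeneous five-look segment, cv2 approaches 1/L = 0.2 and |nl| approaches ψ(5) − ln 5 ≈ 0.103, consistent with the expected values used in this section.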

Observed values below the expected value appear to indicate segments which are unusually smooth, but occur simply because of estimation error, which decreases as segment area increases. Values greater than expected occur as a result of



Fig. 5. Segmentation results for the real image with p_m = 2 × 10^-2. The left-hand column of figures is for RWSEG and shows (a) the mean intensity within segments, (c) the edge map, and (e) the ratio image formed by dividing the original image Fig. 3(a) by the segmented image Fig. 5(a). The right-hand column shows the corresponding results for ANNEAL.

both estimation error and heterogeneity. Only the first of these reduces as segment area increases, so that there is no consistent trend down to the expected value from above. For each measure, the values exceeding the mean produced by RWSEG show more scatter and larger deviations than for ANNEAL [compare Fig. 7(a) with 7(c) and 7(b) with 7(d)].



Fig. 6. As for Fig. 5, but using a simulated image.

Similar behavior is observed for the simulated images (not shown). In the simulations, noticeably more scatter is observed from the segmentations than occurs when the control is used to impose the true segment boundaries.

C. Global Measures of Segment Homogeneity

Although indicating significant amounts of heterogeneity in the segments produced by both algorithms, the scatterplots do not provide any clear guide as to how to answer two



Fig. 7. Log–log scatterplots of segment area versus unbiased segment homogeneity measures for the real image with p_m = 2 × 10^-2: (a) RWSEG: CV². (b) RWSEG: NL. (c) ANNEAL: CV². (d) ANNEAL: NL. For each measure, the solid lines mark the mean value and the one standard deviation intervals on either side of the mean.

key questions.

• Which algorithm performs better?
• Which test provides the better measure of performance?

The situation becomes clearer when global measures of segmentation performance are derived, based on (8) and (15). Results are summarized in Tables I and II. For the real image, only a single value of each measure is available, while for the ten simulated images, the mean and standard deviation of the observed measures are listed. The third column gives the difference between the ideal value and the observed mean value, normalized by the standard deviation of the observations. Also included are the values calculated for the simulated images when a perfect segmentation is imposed using the control.
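The normalized difference used in the tables is simply the following (illustrated with made-up numbers, not values from the paper):

```python
def normalized_difference(ideal, observed_mean, observed_std):
    """Deviation of the observed mean from the ideal value, expressed
    in units of the observed standard deviation (third column of
    Tables I and II)."""
    return (ideal - observed_mean) / observed_std
```

For example, an observed mean of 0.2188 against an ideal of 0.1988 with standard deviation 0.004 (hypothetical numbers) gives -5.0: the observed mean sits five standard deviations above the ideal.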

1) Squared Coefficient of Variation: From (6)–(8), a perfect segmentation of the simulated images should yield an expected value of the global average CV² of 0.1988, with a standard deviation of 0.0024. These values are very close to those observed for the control segmentations. In contrast, the values produced by both segmentation algorithms have an average and standard deviation significantly greater than theory predicts. This indicates that segments cut across true boundaries in the underlying image. It can also be seen that for each value of the merging threshold, the average value produced by RWSEG is further from the ideal than that produced by ANNEAL, as measured by the normalized difference, and that the average moves further from the ideal as the threshold decreases. ANNEAL therefore appears to produce more homogeneous segmentations than RWSEG, but its performance is demonstrably not perfect. An important feature revealed by the table is that ANNEAL with the lower merging threshold performs better than RWSEG with the higher, even though it generates segments which are approximately 30% larger (see Fig. 4). Hence, ANNEAL appears better able to compensate for oversegmentation.

The difference between the two algorithms is greatly magnified when they are applied to the real image, with a marked deterioration in the performance of RWSEG. For ANNEAL, the observed values of the measure are comparable with those obtained on the simulations, but they increase by approximately 30% for RWSEG. These quantitative results confirm the visual impressions discussed in Section V-A1, where the superior performance of ANNEAL was much clearer on real than simulated data.

2) Normalized Log Intensity: Many of the patterns observed for the average CV² are repeated with the NL measure (Table II). For the control segmentations, the mean


TABLE I
GLOBAL AVERAGE CV² MEASURE

TABLE II
GLOBAL AVERAGE NL MEASURE, |D|

and standard deviation of the average value of |D| are very close to their expected values of 0.1028 and 0.00114 (see Section II-B). For both values of the merging threshold, ANNEAL gives an average value of |D| significantly greater than for a perfect segmentation. RWSEG performs markedly worse than ANNEAL as measured in terms of the normalized difference, even when the merging threshold was set at different values for the two algorithms. Differences between the two algorithms again become more marked when real data are considered. For RWSEG the average NL measure is always considerably greater than for a perfect segmentation, while for ANNEAL the average value of |D| is lower than the ideal case at one merging threshold but comparable to it at the other.

The behavior of ANNEAL is a result of the competition between homogeneity and bias. In particular, small segments contribute a significant positive bias to D (and, hence, a negative bias to |D|); for example, in five-look data, the bias from a segment of area two pixels is 0.05. Segments tend to be smaller at the higher merging threshold than at the lower, leading to a larger bias which outweighs effects due to heterogeneity.
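The 0.05 figure can be reproduced from the appendix result E[D] = ln N + ψ(L) − ψ(NL) (as reconstructed there). A sketch using the standard recurrence and asymptotic series for the digamma function ψ:

```python
import math

def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x + 1) - 1/x and the
    standard asymptotic series (adequate for this illustration)."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

def nl_expected(n_pixels, looks):
    """E[D] over a homogeneous segment: ln N + psi(L) - psi(N*L)."""
    return math.log(n_pixels) + digamma(looks) - digamma(n_pixels * looks)

# Five-look data: a two-pixel segment has |E[D]| ~ 0.052, against ~0.103
# in the large-segment limit -- a (negative) bias of about 0.05 on |D|.
bias = abs(nl_expected(10**6, 5)) - abs(nl_expected(2, 5))
```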

The tables permit some comment on the relative sensitivity of the two tests. Comparing the last columns in Tables I and II, we can see that for RWSEG the deviation from the ideal value is greater for one measure than for the other. For ANNEAL, there is no consistent picture; one measure shows slightly more deviation from the mean at one threshold and slightly less at the other. Hence, there is no clear evidence to prefer either of the two measures as a global homogeneity test.

VI. DISCUSSION

In the results above, there are two intertwined strands, each important in its own right. The first is the development of measures based on known properties of speckle, which can be used to compare the quality of segmentations and to recognize significant deviation from segment homogeneity. Oversegmentation is invisible to the measures, so that the approach to assessing segmentation schemes described here is not complete. However, an obvious criterion by which to compare segmentations with similar homogeneity is to prefer the one in which the average segment area is larger.

The principles on which the homogeneity measures are based are independent of the algorithms they are used to test, but have been illustrated using two algorithms developed to operate on SAR data. This provides the second strand in the paper. The algorithms were first constrained to produce the same average area (which in practice simply involved choosing an edge detection threshold in RWSEG). Under these conditions, the ANNEAL approach proved markedly superior. As expected, homogeneity tended to degrade as segments were allowed to become bigger by relaxing the merging threshold. However, ANNEAL produces more homogeneous segments than RWSEG even under conditions where its segments are on average 30% larger. Visually, ANNEAL also tends to produce smoother boundaries in both the real and simulated test data, which is consistent with human expectations for this type of agricultural area. No quantitative measure has been used to capture this property, but measures of image complexity are likely to confirm this impression.

Despite the measurable superiority of ANNEAL, both algorithms performed creditably when applied to simulated data known to meet the algorithm assumptions. When applied to real data, ANNEAL proved more robust than RWSEG, with marked deterioration in the measured performance of the latter, in line with visual impressions. It produced segments of comparable area to RWSEG while retaining homogeneity performance comparable to that achieved on simulated data. The clear indication is that ANNEAL is the better algorithm, especially as it can be optimized to run significantly faster than RWSEG (see comparisons in [9]).

Even though we attempted to produce representative simulations, it is clear from the differences between the results from real and simulated images that some essential properties of the real data are not captured by the simulations. This is a stimulus both to understanding the underlying structure of that data better and to using that understanding in order to improve the algorithms. A likely source of the problem (which affects both measures and segmentation methods) is the cartoon model imposed on the backscatter coefficient. By measuring the quality of a segmentation in terms of homogeneity, segments which are textured or contain gentle gradients will lead to apparently poor performance, despite their common occurrence in real scenes and their importance in any visual interpretation of SAR images. Note that neither of the two segmentation methods discussed above explicitly forces splitting of segments containing gradients (unlike, for example, split and merge techniques), so that the measures impose stronger conditions than are intrinsic to the algorithms. In practice, however, the operators employed by RWSEG will tend to detect edges in regions with smooth intensity gradients, causing artificial splitting of segments. Texture is also likely to cause problems for RWSEG because of the strong dependence of its edge detection phase on expected speckle statistics. Texture will increase false edge detections and hence cause more segments to be found. ANNEAL is aware of texture through its use of a gamma distributed model for the cross section, but even here the edge detection step will be affected.


The merging step applied after the segmentations also needs further examination. In RWSEG, the segments being merged are explicitly treated as uniform, but no such condition is imposed in ANNEAL. The merging forces a homogeneous model for the segments onto the algorithm. However, the initial segmentation produced by the form of ANNEAL used in this paper contains far too many segments to be of value in structuring the image, so that some form of merging appears essential. This highlights the need to strike a balance between the competing requirements of finding all the structure in the image and avoiding oversegmentation. There may be no effective resolution of this issue except in the context of the application in which the segmented images are to be used.

These criticisms of the work presented above should not obscure the fact that significant progress has been made with a difficult and poorly understood problem. It is encouraging that firmly based but purely statistical tests seem to corroborate visual impressions of the relative quality of different types of segmentation. This suggests that the methodology developed here can be used for meaningful comparison of different algorithms, and we are currently employing it to carry out a comprehensive review of segmentation schemes. By clearly identifying the best algorithms, this will serve as a guide to possible users of such schemes, who are currently faced with a bewildering range of possibilities. By extending the range of measures, we also hope to provide a more comprehensive approach to quantifying segmentation performance, to use this to identify weaknesses in current algorithms, and hence to motivate further algorithm development.

APPENDIX

STATISTICS OF THE RATIO IMAGE

We will first prove that for a homogeneous segment of N pixels, the joint pdf of two ratio values r_i and r_j within the segment is given by

f(r_i, r_j) = (r_i r_j)^{L-1} (N - r_i - r_j)^{(N-2)L-1} / [N^{NL-1} B(L, L, (N-2)L)]    (26)

where 0 ≤ r_i + r_j ≤ N and

B(a_1, a_2, a_3) = Γ(a_1) Γ(a_2) Γ(a_3) / Γ(a_1 + a_2 + a_3)    (27)

is the multivariate form of the beta function [8]. Without loss of generality, (26) will be proved for i = 1 and j = 2.

Let I_1, ..., I_N be a set of independent gamma distributed intensity values with mean μ and order parameter L. Their arithmetic mean, Ī, is also gamma distributed with mean μ and order parameter NL. Let T = I_3 + ··· + I_N; T is gamma distributed with mean (N - 2)μ and order parameter (N - 2)L. Then I_1, I_2, and T are independent, with joint pdf

f(I_1, I_2, T) = (L/μ)^{NL} I_1^{L-1} I_2^{L-1} T^{(N-2)L-1} exp[-L(I_1 + I_2 + T)/μ] / [Γ(L)² Γ((N-2)L)]    (28)

where I_1, I_2, T ≥ 0. Transforming variables to r_1 = I_1/Ī, r_2 = I_2/Ī, and Ī (a transformation with Jacobian NĪ²) leads to the joint pdf of r_1, r_2, and Ī

f(r_1, r_2, Ī) = N (L/μ)^{NL} Ī^{NL-1} (r_1 r_2)^{L-1} (N - r_1 - r_2)^{(N-2)L-1} exp(-NLĪ/μ) / [Γ(L)² Γ((N-2)L)].    (29)

Integrating over Ī gives (26).

A further integration over one of the ratio values shows that the marginal pdf of a single ratio value within the segment is a beta distribution

f(r) = r^{L-1} (N - r)^{(N-1)L-1} / [N^{NL-1} B(L, (N-1)L)]    (30)

where 0 ≤ r ≤ N. This arises, instead of a gamma distribution, because of the error in estimating the average intensity. As N → ∞ the error tends to zero and the ratio values become gamma distributed and independent, with unit mean and order parameter L.

The joint moments of two ratio values within the segment are given by

E[r_1^n r_2^m] = N^{n+m} Γ(NL) Γ(L + n) Γ(L + m) / [Γ(L)² Γ(NL + n + m)]    (31)

from which it is easily shown that the ratio values have unit mean, variance

var(r) = (N - 1)/(NL + 1)    (32)

and that two ratio values within the same segment have correlation coefficient

ρ = -1/(N - 1).    (33)
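Since (32) and (33) are reconstructions, a quick Monte Carlo check is worthwhile: drawing many independent N-pixel gamma segments and forming the ratio values should reproduce the variance (N - 1)/(NL + 1) and correlation -1/(N - 1).

```python
import numpy as np

N, L = 8, 5                                    # segment size and looks
rng = np.random.default_rng(0)
I = rng.gamma(shape=L, scale=1.0 / L, size=(200_000, N))
r = I / I.mean(axis=1, keepdims=True)          # ratio values per segment

var_mc = r[:, 0].var()                         # ~ (N-1)/(N*L+1) = 7/41
corr_mc = np.corrcoef(r[:, 0], r[:, 1])[0, 1]  # ~ -1/(N-1) = -1/7
```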

A. Mean and Variance of the Squared Coefficient of Variation

The squared coefficient of variation over a homogeneous segment may be expressed in terms of the ratio values as

V̂ = (1/N) Σ_{i=1}^{N} (r_i - 1)².    (34)

From (32), the mean is then given by

E[V̂] = (N - 1)/(NL + 1).    (35)

The variance of V̂ is given by

var(V̂) = (1/N) E[(r_1 - 1)^4] + ((N - 1)/N) E[(r_1 - 1)² (r_2 - 1)²] - (E[V̂])²    (36)

which, from (31), gives

var(V̂) = (1/N) [E[r_1^4] - 4E[r_1^3] + 6E[r_1^2] - 3] + ((N - 1)/N) [E[r_1^2 r_2^2] - 4E[r_1^2 r_2] + 2E[r_1^2] + 4E[r_1 r_2] - 3] - ((N - 1)/(NL + 1))²    (37)

with each moment obtained by substituting the appropriate n and m into (31).


B. Mean and Variance of Normalized Log Intensity

The normalized log intensity over a segment may be expressed in terms of the ratio values as

D̂ = (1/N) Σ_{i=1}^{N} ln r_i    (38)

so that, over a homogeneous segment, its mean is

E[D̂] = E[ln r_1]    (39)

and its variance is given by

var(D̂) = (1/N) E[(ln r_1)²] + ((N - 1)/N) E[ln r_1 ln r_2] - (E[ln r_1])².    (40)

To calculate E[(ln r_1)²] and E[ln r_1 ln r_2] it is convenient to use the joint characteristic function of ln r_1 and ln r_2, defined as

Φ(ω_1, ω_2) = E[exp(iω_1 ln r_1 + iω_2 ln r_2)] = E[r_1^{iω_1} r_2^{iω_2}].    (41)

From (31) this is given by

Φ(ω_1, ω_2) = N^{i(ω_1 + ω_2)} Γ(NL) Γ(L + iω_1) Γ(L + iω_2) / [Γ(L)² Γ(NL + i(ω_1 + ω_2))].    (42)

Differentiating the characteristic function yields the moments

E[ln r_1] = ln N + ψ(L) - ψ(NL)    (43)

E[(ln r_1)²] = [ln N + ψ(L) - ψ(NL)]² + ψ'(L) - ψ'(NL)    (44)

E[ln r_1 ln r_2] = [ln N + ψ(L) - ψ(NL)]² - ψ'(NL)    (45)

where ψ(·) is the digamma function and ψ'(·) the trigamma function. Hence, the normalized log intensity has mean

E[D̂] = ln N + ψ(L) - ψ(NL)    (46)

and variance

var(D̂) = ψ'(L)/N - ψ'(NL).    (47)
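Again, since (46) and (47) are reconstructions, a Monte Carlo check (with digamma and trigamma evaluated from their standard recurrences and asymptotic series) is a useful safeguard:

```python
import math
import numpy as np

def psi(x):      # digamma: recurrence plus asymptotic series
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

def psi1(x):     # trigamma: recurrence plus asymptotic series
    r = 0.0
    while x < 6.0:
        r += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    return r + 1.0 / x + 0.5 * f + (f / x) * (1/6 - f * (1/30 - f / 42))

N, L = 8, 5
mean_th = math.log(N) + psi(L) - psi(N * L)    # theoretical mean, (46)
var_th = psi1(L) / N - psi1(N * L)             # theoretical variance, (47)

rng = np.random.default_rng(1)
I = rng.gamma(shape=L, scale=1.0 / L, size=(200_000, N))
D = np.log(I / I.mean(axis=1, keepdims=True)).mean(axis=1)
# D.mean() and D.var() should agree closely with mean_th and var_th.
```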

ACKNOWLEDGMENT

The authors wish to thank ESA for supplying the data used in this study.

REFERENCES

[1] F. T. Ulaby, R. K. Moore, and A. K. Fung, Microwave Remote Sensing, vol. 3. Boston, MA: Artech House, 1986.

[2] V. S. Frost, J. A. Stiles, K. S. Shanmugan, and J. C. Holtzmann, “A model for radar images and its application to adaptive filtering of multiplicative noise,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-4, pp. 157–166, 1982.

[3] J. S. Lee, “Speckle analysis and smoothing of synthetic aperture radar images,” Comput. Graph. Image Process., vol. 17, pp. 24–32, 1981.

[4] D. T. Kuan, A. A. Sawchuk, T. C. Strand, and P. Chavel, “Adaptive restoration of images with speckle,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 373–383, 1987.

[5] A. Lopes, E. Nezry, R. Touzi, and H. Laur, “Structure detection and adaptive speckle filtering in SAR images,” Int. J. Remote Sens., vol. 14, pp. 1735–1758, 1993.

[6] R. G. White, “A simulated annealing algorithm for radar cross-section estimation and segmentation,” in Proc. Applications of Artificial Neural Networks, Orlando, FL, 1994.

[7] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721–741, 1984.

[8] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions. New York: Dover, 1965.

[9] K. D. Grover, S. Quegan, and C. C. F. Yanasse, “Quantitative estimation of tropical forest cover by SAR,” IEEE Trans. Geosci. Remote Sensing, to be published.

[10] R. G. White, “Low level segmentation of noisy imagery,” Tech. Rep. 3900, Roy. Signals Radar Estab., Malvern, U.K., 1986.

[11] ——, “Change detection in SAR imagery,” Int. J. Remote Sensing, vol. 12, pp. 339–360, 1991.

[12] ——, “Cross-section estimation by simulated annealing,” in Proc. IGARSS’94, Pasadena, CA, 1994, pp. 2188–2190.

[13] R. Touzi, A. Lopes, and P. Bousquet, “A statistical and geometrical edge detector for SAR images,” IEEE Trans. Geosci. Remote Sensing, vol. 26, pp. 764–773, 1988.

[14] R. G. Caves, “Automatic matching of features in synthetic aperture radar data to digital map data,” Ph.D. dissertation, Univ. Sheffield, Sheffield, U.K., 1993.

[15] S. Quegan, R. G. Caves, K. D. Grover, and R. G. White, “Segmentation and change detection in ERS-1 images over East Anglia,” in Proc. 1st ERS-1 Symp.: Early Results, Cannes, France, 1993, pp. 617–622.

Ronald Caves received the M.A. degree in philosophy and mathematics and the M.Sc. degree in remote sensing and image processing technology from the University of Edinburgh, U.K., in 1987 and 1988, respectively, and the Ph.D. degree from the University of Sheffield, U.K., in 1993.

After working on raster mapping research at the Ordnance Survey, Southampton, U.K., for one year, he joined the Department of Applied and Computational Mathematics, University of Sheffield, in 1989, as a Research Assistant working on SAR image analysis. His main interests have been in segmentation, feature detection, and clutter statistics.

Dr. Caves was awarded the Remote Sensing Society Student Award for the best Ph.D. thesis of 1993.

Shaun Quegan (M’90) received the B.A. and M.Sc. degrees in mathematics from the University of Warwick, U.K., in 1970 and 1972, respectively, and the Ph.D. degree from the University of Sheffield, U.K., in 1982.

He taught mathematics and physics for several years before joining Marconi Research Ltd. in 1982, becoming Group Chief of the Remote Sensing Applications Group in 1984. While there, he worked on a range of SAR topics, including image quality, land use applications, feature extraction, scattering theory, propagation effects, and image processing. In 1986, he joined the University of Sheffield, gaining a Professorship in 1993, and is the current Director of the Sheffield Centre for Earth Observation Science. His major interests are in polarimetric SAR, SAR image understanding, and land use applications of airborne and spaceborne SAR.

Richard White received the B.Sc. degree in physics from Oxford University, Oxford, U.K., in 1981, and the Ph.D. degree from the University of Loughborough, Loughborough, U.K., in 1984.

He joined the SAR section at the Defence Research Agency (DERA, then known as RSRE), Great Malvern, U.K., in 1984. Since that time he has worked on a range of battlefield surveillance topics including SAR autofocus, SAR segmentation and target detection, and moving target detection and indication (MTI). At present, he leads the DERA SAR and MTI group at Malvern, looking at both airborne and spaceborne radar systems.