Extracting meaningful curves from images

HAL Id: inria-00071517
https://hal.inria.fr/inria-00071517

Submitted on 23 May 2006

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Extracting meaningful curves from images
Frédéric Cao, Pablo Musé, Frédéric Sur

To cite this version:
Frédéric Cao, Pablo Musé, Frédéric Sur. Extracting meaningful curves from images. [Research Report] RR-5067, INRIA. 2003. inria-00071517


https://hal.archives-ouvertes.fr

ISSN 0249-6399
ISRN INRIA/RR--5067--FR+ENG

Research report

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

Frédéric Cao*, Pablo Musé†, and Frédéric Sur†

No. 5067

December 2003

Theme 3: Human-computer interaction, images, data, knowledge

Vista Project

Research report no. 5067, December 2003, 38 pages

Abstract: Since its beginning, Mathematical Morphology has proposed to extract shapes from images as connected components of level sets. These methods have proved very efficient in shape recognition and shape analysis. In this paper, we present an improved method to select the most meaningful level lines (boundaries of level sets) from an image. This extraction can be based on statistical arguments, leading to a parameter-free algorithm. It roughly extracts all pieces of level lines of an image that coincide with pieces of edges. With this method, the number of encoded level lines is reduced by a factor of 100, without any loss of shape content. In contrast to edge detection algorithms or snake methods, such a level line selection method delivers accurate shape elements without user parameters: no smoothing is involved, and the selection parameters can be computed by the Helmholtz Principle.

Key-words: Edge detection, Mathematical Morphology, Topographic map, level lines, Helmholtz Principle, Gestalt theory


* IRISA/INRIA, Campus universitaire de Beaulieu, 35042 Rennes Cedex, France, [email protected]

† École Normale Supérieure de Cachan, 61 avenue du Président Wilson, 94235 Cachan Cedex, France, {sur},{muse}@cmla.ens-cachan.fr

INRIA Rennes research unit
IRISA, Campus universitaire de Beaulieu, 35042 Rennes Cedex (France)

Telephone: 02 99 84 71 00 - International: +33 2 99 84 71 00
Fax: 02 99 84 71 71 - International: +33 2 99 84 71 71

Extracting meaningful curves from images

French abstract: From its inception, mathematical morphology has proposed to extract shapes from images by means of the connected components of level sets. The efficiency of these methods has been demonstrated in shape recognition and shape analysis. In this article, we present a method for selecting the most meaningful level lines (the boundaries of level sets) in an image. This extraction is based on statistical arguments, leading to a parameter-free detection. It detects all the pieces of level lines that coincide with the edges of the image, while reducing the number of level lines to encode by a factor of 100, without reducing the perceptual content of the shapes. Unlike traditional edge detection or active contour methods, this level line selection yields accurate shape elements, with no parameter to be tuned by the user. Moreover, no smoothing is needed and all other parameters are fixed by the Helmholtz principle.

Keywords: Edge detection, Mathematical Morphology, topographic map, level lines, Helmholtz principle, Gestalt theory


Contents

1 Introduction

2 Meaningful boundaries
  2.1 Helmholtz Principle
  2.2 Contrasted boundaries
  2.3 Maximal boundaries
  2.4 Discussion on the definition of meaningful contrasted boundaries
    2.4.1 Interpretation of the number of false alarms
    2.4.2 Length distribution and meaningfulness
    2.4.3 Cleaning-up meaningful boundaries
    2.4.4 Geometrical information reduction
    2.4.5 Meaningful boundaries vs. Haralick's detector
    2.4.6 Color edges

3 Image quantization and gradient norm distribution

4 Local boundary detection
  4.1 Algorithm
  4.2 Experiments on locally contrasted boundaries

5 Meaningful boundaries or snakes?
  5.1 Definition of local regularity
  5.2 Meaningful contrasted and smooth boundary
  5.3 Comparison with active contours
  5.4 Experiments on smooth meaningful boundaries

6 Meaningful edges
  6.1 Edges as pieces of level lines
  6.2 Experiments on edges

7 Conclusion

A Appendix: Numerical estimation of the Hausdorff dimension of a curve

1 Introduction

Natural images are very complex and, despite the progress of modern computers, we cannot handle the huge amount of information they contain. Thus, the idea of Marr and Hildreth [29] that edges provide a good summary of images is still vivid. Since their seminal works, most efforts have focused on local methods. Marr defined edges as zero-crossings of the Laplacian [28], and Haralick [19] proposed a more correct definition, which is equivalent to the zero-crossings of $D^2u(Du, Du)$, where $Du$ and $D^2u$ are respectively the gradient and the second derivative of the image. In his famous paper [2], Canny gives a filter that tries to optimize the edge localization, but which is equivalent to Haralick's. Although they are technically sound, local methods also have an immediate drawback: while edges are usually thought of as curves, these methods detect sets of points with an orientation (edgels) that have to be connected afterward. Moreover, they require different thresholds, since contrast has no absolute meaning. In addition, they are sensitive to noise (since they use derivatives of the image) and can only be considered through a multiscale process. The choice of these thresholds depends on the observed image and is not easy. It is also known that an edge is not a completely local concept and that it does not rely entirely on contrast. Indeed, following Gestalt Theory [20, 42], shapes (and thus edges) result from the collaboration of a small set of perceptual laws (called "partial gestalts" by Desolneux, Moisan and Morel [14]), and contrast is only one of them. Among others, we can cite alignments, symmetry, convexity, closedness, good continuation, etc.

Other theories related to edge detection explicitly use good continuation, which means in this case regularity of curves. The most famous is certainly the theory of active contours (or snakes) [21], where optimal boundaries result from a compromise between their intrinsic regularity and the extrinsic value of the image contrast along the active contours. The main weaknesses of this theory are the number of parameters and the sensitivity to an initial guess. More recent methods propose to initiate the detection with many contours, most of which will hopefully disappear [9]. But again, there is no measure of the certainty of the remaining detected contours.

The Mathematical Morphology school proposed an alternative to the local approaches above. Following morphologists, the image information is completely contained in a family of binary images that are obtained by thresholding the image at given values [30, 39]. This is equivalent to considering level sets; the level set of $u$ at the value $\lambda$ is

$$\chi_\lambda(u) = \{x \in \mathbb{R}^2,\ u(x) \geq \lambda\}. \tag{1}$$

Obviously, if we only consider a coarsely quantized set of different gray levels, information is lost, especially in textures. Nevertheless, it is worth noting how large shapes are already present with as few as 5 or 6 levels. As Serra remarked early on [39], no information is lost at all when the whole family of level sets is kept, since we can reconstruct an image from it by

$$u(x) = \sup\{\lambda \in \mathbb{R},\ x \in \chi_\lambda(u)\}.$$

Thus, the level sets do not only give a convenient way to extract information, they provide a complete representation of images. Alternative complete representations are, for instance, Fourier or wavelet decompositions [27]. But while the latter are very adequate for image compression (they are used in the JPEG 2000 standard), they are not very well adapted to shape analysis, since their basic elements have no immediate perceptual interpretation. On the contrary, morphologists soon remarked that boundaries of level sets fit parts of object boundaries very well. They call level lines the topological boundaries of connected components of level sets, and topographic map of an image the collection of all its level lines. The topographic map also gives a complete representation of an image and enjoys several important advantages [6]:



• It is invariant with respect to contrast change. It is not invariant to illumination change since, in this case, the image is really different, although it represents the same scene. However, many level lines remain locally the same.

• It is not as local as sets of edges, since level lines are Jordan curves that are either closed or meet the image borders. (This property requires that the image has bounded variation [16].)

• It is a hierarchical representation: since level sets are ordered by the inclusion relation (and so are their connected components), the topographic map may be embedded in a tree structure.

• Most important for the main subject of this paper, object contours locally coincide with level lines very well. Basically, level lines are everywhere normal to the gradient, as edges are. Moreover, level lines are accurate at occlusions: whereas edge detectors usually fail near T-junctions (and additional treatments are necessary), there are several level lines at a junction. The order of the multiple junction coincides with the number of level lines [4]. We shall come back to this in Sect. 2.4.5.
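As a small illustration of the level set representation discussed before this list, the following sketch thresholds a discrete image into upper level sets and rebuilds it with the sup formula. It is only a toy under stated assumptions: the image is a random integer array, the function names `upper_level_sets` and `reconstruct` are hypothetical, and nothing here is taken from the authors' implementation.

```python
import numpy as np

def upper_level_sets(u, levels):
    """Return {lam: boolean mask of {x : u(x) >= lam}} for each level lam."""
    return {lam: u >= lam for lam in levels}

def reconstruct(level_sets, fill_value=0):
    """Rebuild u(x) = sup{lam : x belongs to the level set at lam}."""
    lams = sorted(level_sets)
    out = np.full(level_sets[lams[0]].shape, fill_value, dtype=float)
    for lam in lams:              # increasing levels: higher ones overwrite
        out[level_sets[lam]] = lam
    return out

rng = np.random.default_rng(0)
u = rng.integers(0, 256, size=(16, 16))      # toy 8-bit image (assumption)

# With all 256 levels the reconstruction is exact.
u_full = reconstruct(upper_level_sets(u, range(256)))

# With only 4 levels the result is a coarse quantization of u.
u_coarse = reconstruct(upper_level_sets(u, [0, 64, 128, 192]))
```

With the full family of levels the image is recovered exactly, while a handful of levels yields a quantized version, which matches the remark that large shapes survive coarse quantization but textures do not.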

Figure 1: Level lines and T-junction. Depending on the grey level configuration between shapes and background, level lines may or may not follow the object boundaries (in the figure, they do not). In any case, junctions appear where two level lines separate. Here, there are two kinds of level lines: the occluded circle and the shape composed of the union of the circle and the square. The square itself may be retrieved by difference.

The level set representation has recently been used, with success, for image simplification and segmentation. In particular, it was shown that it makes it possible to define multiscale representations of images [31, 37, 38], while avoiding the main drawback of linear scale space theory [23, 43], namely an oversmoothing of contours.

We are convinced that level lines may directly give usable curves for any shape recognition algorithm. The main drawback of the topographic map representation is its lack of compactness. First, since it is complete, it contains all the texture information. The level lines in textures are usually very complicated, and are not always useful for blind shape recognition. (The opposite may be true, for instance for very accurate image registration.) Moreover, because of noise and interpolation, many level lines may follow a single contour. Thus, it is useful, for practical computational reasons, to select only the most meaningful level lines.

Recently, Desolneux et al. proposed a parameterless algorithm to detect contrasted level lines (called meaningful boundaries) in grey level images [13]. Their method, which needs no parameter tuning, relies on a perceptual principle called the Helmholtz Principle. Experimentally, meaningful boundaries are often very close to minimizers of any reasonable snake energy [15]. This agreement between meaningful boundaries and snakes is a bit paradoxical since, unlike snakes, no local regularity is imposed on meaningful boundaries.

However, the algorithm of Desolneux et al. raises several questions and objections. First, because of image quantization, some edges are missing (many of them in some low-contrast images). Second, it uses global information on contrast (the histogram). This yields an overdetection in regions with strong contrast and an underdetection in low-contrast regions (the so-called blue sky effect). Third, regularity of edges is not used for the detection.

In this paper, we discuss and answer these three objections, with a significant improvement. Our conclusions are the following. Quantization noise can be removed with a very slight smoothing (for instance a Gaussian with a standard deviation equal to 0.5). We cannot talk of multiscale edges as in Marr-Hildreth theory since, in practice, this smoothing is invisible, and this value does not depend on the experiment. We also propose a method considering contrast in a more local way. If we use more local contrast information, we can remove edges in texture. Whether this is desirable or not depends on the application: for very accurate registration, texture edges can be useful, while they are mostly useless for shape recognition. (For texture recognition, harmonic analysis methods are certainly more efficient.) Last, we introduce a local and stable measure of regularity of a curve and use it for smooth edge detection. As already noticed in [3], regularity is often sufficient to detect some very meaningful edges. Nevertheless, the general belief is that both regularity and contrast are useful for edge detection. We experimentally check that contrast and regularity are often very redundant. This redundancy is used to make the detection even more robust, but does not change the results of contrast-based detection alone. We are also able to tune automatically the relative weight of regularity and contrast, which is a recurrent question in active contours theory.

The plan is as follows. In Sect. 2, we recall the basis of the Helmholtz Principle and the definition of meaningful boundaries of Desolneux, Moisan and Morel. We justify and discuss this definition, which was not done explicitly in [13]. Errors due to the quantization of the contrast are corrected in Sect. 3. In Sect. 4, we describe a procedure that automatically handles local contrast variations. In Sect. 5, we explain how both contrast and regularity criteria can naturally be mixed in a probabilistic setting, by introducing a measure of regularity on random level lines in Sect. 5.1. In Sect. 6, we describe a procedure to cut boundaries into their most meaningful parts, before concluding in Sect. 7.



2 Meaningful boundaries

2.1 Helmholtz Principle

The Helmholtz Principle is a perceptual principle asserting that conspicuous structures may be viewed as exceptions to randomness. The unexpected configurations we are interested in are given by the perceptual laws of Gestalt Theory [20, 42], such as alignments, closedness of sets, parallelism, etc. Given a class of geometrical events, we shall say that it is $\varepsilon$-meaningful if, on average, fewer than $\varepsilon$ of these events are observed in an image of white noise. Thus, features are detected a contrario, since we consider them as meaningful if they have a very low probability of occurring by chance in an image of noise of the same size. In a well sampled white noise image, two objects that are at a distance lower than the Nyquist distance (that is to say 2 pixels) cannot be considered as independent. In what follows, we use the expression "independent points" to qualify points at Nyquist distance (and thus independent in white noise). It can be shown that the Helmholtz Principle makes it possible to detect some elementary and important geometrical structures (see [15] and references therein). All the parameters can be reduced to the choice of $\varepsilon$, which is the expected number of detections in a white noise image (which we call "false detections"). In practice, $\varepsilon$ can always be taken equal to 1, since the number of objects we want to detect is quite large. Moreover, meaningful perceptual objects correspond to very small values of $\varepsilon$ (about ��). As a consequence, the detection is in practice parameter free. We refer the reader to [13, 14] for further details.

2.2 Contrasted boundaries

In order to illustrate the Helmholtz principle, we recall here the definition of meaningful boundaries given in [13]. It will also be useful since we discuss this definition in the next sections. Let $u$ be a grey level image. We search for the level lines of $u$ along which contrast assumes unexpectedly high values. We compare these values with the empirical histogram of contrast. By translation invariance, we assume that the contrast values are identically distributed following the law of the random variable defined by

$$\forall \mu > 0,\quad P(|Du| \geq \mu) = \frac{\#\{x,\ |Du(x)| \geq \mu\}}{\#\{x,\ |Du(x)| > 0\}}. \tag{2}$$

We denote by $H_c$ this empirical probability. In [13], Desolneux, Moisan and Morel proposed the following definition.

Definition 1 ([13]) Let $N_{ll}$ be the number of level lines of $u$. A level line $C$ is an $\varepsilon$-meaningful boundary if

$$NFA(C) \equiv N_{ll}\, H_c\Big(\min_{x \in C} |Du(x)|\Big)^{l/2} < \varepsilon, \tag{3}$$

where $l$ is the length of $C$. This number is called the number of false alarms (NFA) of $C$.



In (3), the NFA is the product of the number of level lines and the probability that a random curve containing $l/2$ independent samples has its contrast larger than $\min_{x \in C} |Du(x)|$ everywhere, when we assume that the contrast values on curve samples are mutually independent. If the NFA is very small, this assumption is certainly not valid, leading to an a contrario detection. Notice also that meaningful boundaries are invariant with respect to affine contrast changes. In what follows, this detection model will be referred to as the MB model (meaningful boundaries model).
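The NFA of Definition 1 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the gradient norm image is replaced by a synthetic array, the curve length $l$ is taken to be the number of samples along the curve (i.e. unit spacing is assumed), and the names `empirical_tail` and `nfa` are hypothetical.

```python
import numpy as np

def empirical_tail(grad_norm):
    """Return H(mu) = #{|Du| >= mu} / #{|Du| > 0} as a callable."""
    values = np.sort(grad_norm[grad_norm > 0].ravel())
    n = values.size

    def H(mu):
        # count of samples >= mu, found by binary search in the sorted values
        return (n - np.searchsorted(values, mu, side="left")) / n

    return H

def nfa(curve_grad, n_level_lines, H):
    """NFA = N_ll * H(min |Du|)^(l/2), with l = number of curve samples."""
    mu = np.min(curve_grad)
    return n_level_lines * H(mu) ** (len(curve_grad) / 2.0)

rng = np.random.default_rng(1)
grad = np.abs(rng.normal(size=(64, 64)))       # stand-in for |Du| (assumption)
H = empirical_tail(grad)

noise_curve = rng.choice(grad.ravel(), size=40)        # samples drawn at random
edge_curve = np.full(40, np.quantile(grad, 0.9))       # uniformly high contrast

print(nfa(noise_curve, 1000, H), nfa(edge_curve, 1000, H))
```

In this toy setting the uniformly contrasted curve gets an NFA far below 1, while the curve drawn from the background distribution does not, because its minimal gradient is almost surely small.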

2.3 Maximal boundaries

Since level lines are nested, meaningful boundaries can also be embedded in a tree structure. As remarked by Desolneux, Moisan and Morel [12], meaningful boundaries usually appear in parallel groups. This is due to the fact that images are interpolated and that edges are thick (at least two or three pixels) when images are suitably sampled with respect to Shannon-Nyquist theory. These boundaries are redundant and, in applications, it may be useful to eliminate some of them to save time and memory. The previous authors use the notion of maximal monotone section in a level line tree, as introduced by Monasse [32]. They consider parts of branches of the tree of meaningful boundaries such that any node has only one son and the grey level is monotone in this part. Such a branch part is called a monotone section. It is maximal if it is not strictly contained in another monotone section.

Definition 2 ([13]) We say that a meaningful boundary is maximal meaningful if it has a minimal NFA in a maximal monotone section.
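The selection rule of Definition 2 can be sketched on a simplified case. The code below assumes a single branch of the tree given as a list of nested meaningful boundaries (so every node has exactly one son), splits it into maximal runs of strictly monotone grey level, and keeps the boundary of minimal NFA in each run. The function names and the toy values are hypothetical, and a real implementation would work on the full tree of Monasse [32].

```python
def monotone_runs(levels):
    """Split indices 0..n-1 into maximal runs with strictly monotone grey level."""
    runs, start, direction = [], 0, 0
    for i in range(1, len(levels)):
        step = (levels[i] > levels[i - 1]) - (levels[i] < levels[i - 1])
        if step == 0 or (direction != 0 and step != direction):
            runs.append(list(range(start, i)))   # monotonicity breaks here
            start, direction = i, 0
        else:
            direction = step
    if len(levels) > 0:
        runs.append(list(range(start, len(levels))))
    return runs

def maximal_meaningful(levels, nfas):
    """Index of the minimal-NFA boundary in each monotone run of the branch."""
    return [min(run, key=lambda i: nfas[i]) for run in monotone_runs(levels)]

# Toy branch: grey levels and NFAs of six nested meaningful boundaries.
levels = [10, 20, 30, 25, 15, 40]
nfas = [1e-3, 1e-8, 1e-5, 1e-2, 1e-4, 0.5]
print(maximal_meaningful(levels, nfas))
```

On this toy branch, the runs are the index groups (0, 1, 2), (3, 4) and (5), and one representative is kept per run.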

Figure 2 illustrates that the loss of information of maximal meaningful boundaries is negligible compared to the gain in compactness.

Figure 2: Maximal meaningful boundaries. 1. Original image, 83,759 level lines. 2. All meaningful boundaries: 11,505 detections. 3. Maximal meaningful boundaries. Only 883 boundaries remain, while the visual loss is very weak.

Since meaningful boundaries inherit the tree structure of the topographic map, they can be used to reconstruct an image, thus defining an image operator, see Fig. 3. It is a connected operator as defined by Salembier and Serra [38] (but it is not a filter by reconstruction). It is neither a contrast invariant operator, since it explicitly uses the gradient value (it only commutes with affine global contrast change), nor an idempotent operator.



As remarked by Salembier [37], an operator pruning the topographic map preserves edges very well. Contrary to local operators such as the grain filter [41], the meaningful boundary reconstruction does not simply remove leaves of the tree but also inner nodes corresponding to possibly large (but low-contrast) level lines.

Figure 3: Original image on the left (99,829 level lines). Right: reconstruction from the 429 maximal meaningful boundaries. The gray level may not be really significant since, on edges, the maximal meaningful level line has an intermediate level between both sides of the edge. It would be more perceptually adequate to set the gray level to that of the brighter or darker meaningful level line. Nevertheless, for contrast independent shape recognition purposes, we do not use the grey level value, but only the geometry of level lines. The most important point is that the main shapes are preserved, while the textures are removed.

2.4 Discussion on the definition of meaningful contrasted boundaries

2.4.1 Interpretation of the number of false alarms

In this section, we give a precise interpretation of Def. 1, which was not explicit in [13]. Let us first recall the following classical lemma.

Lemma 1 Let $X$ be a random variable and $H(\mu) = P(X \geq \mu)$. Then for all $t \in (0, 1)$, $P(H(X) < t) \leq t$.

Assume that $X$ is a random variable described by the inverse repartition function $H(\mu) = P(X \geq \mu)$. Assume that $u$ is a random image such that the values $|Du(x)|$ are independent with the same law as $X$. Let now $\Sigma$ be a set of random curves $(C_i)$ in $u$ such that $\#\Sigma$ (the cardinality of $\Sigma$) is independent from each $C_i$. For each $i$, we note $\mu_i = \min_{x \in C_i} |Du(x)|$ and $L_i$ the (random) length of $C_i$. We assume that $L_i$ is independent from the pixels crossed by $C_i$.

We say that $C_i$ is $\varepsilon$-meaningful if

$$NFA(C_i) = \#\Sigma \cdot H(\mu_i)^{L_i} < \varepsilon.$$

Proposition 1 The expected number of $\varepsilon$-meaningful curves in a random set $\Sigma$ of random curves is smaller than $\varepsilon$.

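Proof. Let us also denote by $Z_i$ the binary random variable equal to 1 if $C_i$ is meaningful and to 0 else. Let also $N = \#\Sigma$ and let $R = \sum_{i=1}^{N} Z_i$ be the number of meaningful curves. Then

$$E(R) = E\big(E(R \mid N)\big).$$

We have assumed that $N$ is independent from the curves. Thus, conditionally to $N = n$, the law of $\sum_{i=1}^{N} Z_i$ is the law of $\sum_{i=1}^{n} Y_i$, where $Y_i$ is a binary variable equal to 1 if $n H(\mu_i)^{L_i} < \varepsilon$ and to 0 else. By linearity of expectation,

$$E(R \mid N = n) = E\Big(\sum_{i=1}^{n} Y_i\Big) = \sum_{i=1}^{n} E(Y_i).$$

Since $Y_i$ is a Bernoulli variable,

$$E(Y_i) = P(Y_i = 1) = P\big(n H(\mu_i)^{L_i} < \varepsilon\big) = \sum_{l \geq 0} P\big(n H(\mu_i)^{L_i} < \varepsilon \mid L_i = l\big)\, P(L_i = l).$$

Again, we have assumed that $L_i$ is independent of the gradient distribution in the image. Thus, conditionally to $L_i = l$, the law of $n H(\mu_i)^{L_i}$ is the law of $n H(\mu_i)^{l}$. Let us finally denote by $(\alpha_1, \ldots, \alpha_l)$ the $l$ (independent) values of $|Du|$ along $C_i$. We have

$$\begin{aligned}
P\big(n H(\mu_i)^{l} < \varepsilon\big)
&= P\Big(H\big(\min_{1 \leq k \leq l} \alpha_k\big) < (\varepsilon/n)^{1/l}\Big) \\
&= P\Big(\max_{1 \leq k \leq l} H(\alpha_k) < (\varepsilon/n)^{1/l}\Big) && \text{since } H \text{ is nonincreasing} \\
&= \prod_{k=1}^{l} P\Big(H(\alpha_k) < (\varepsilon/n)^{1/l}\Big) && \text{by independence} \\
&\leq \frac{\varepsilon}{n} && \text{from Lemma 1.}
\end{aligned}$$

This term does not depend upon $l$, thus

$$\sum_{l \geq 0} P\big(n H(\mu_i)^{L_i} < \varepsilon \mid L_i = l\big)\, P(L_i = l) \leq \frac{\varepsilon}{n} \sum_{l \geq 0} P(L_i = l) = \frac{\varepsilon}{n}.$$

Hence,

$$E(R \mid N = n) \leq \varepsilon.$$

This finally implies $E(R) \leq \varepsilon$, which exactly means that the expected number of meaningful curves is less than $\varepsilon$. $\square$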
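Proposition 1 can be checked numerically on a toy noise model. The sketch below assumes gradient values uniform on $(0, 1)$, so that $H(\mu) = 1 - \mu$, and random curve lengths drawn independently of the gradient; the function name `expected_meaningful` and all parameter values are illustrative assumptions, not part of the original report.

```python
import numpy as np

def expected_meaningful(n_curves=50, eps=1.0, trials=2000, max_len=30, seed=0):
    """Monte Carlo estimate of the expected number of eps-meaningful curves."""
    rng = np.random.default_rng(seed)
    # Random lengths L_i in {1, ..., max_len}, independent of the gradient.
    lengths = rng.integers(1, max_len + 1, size=(trials, n_curves))
    # Independent gradient values along each curve, X ~ U(0, 1).
    grads = rng.random((trials, n_curves, max_len))
    valid = np.arange(max_len) < lengths[..., None]     # keep first L_i samples
    mu = np.where(valid, grads, np.inf).min(axis=-1)    # minimal gradient
    H = 1.0 - mu                                        # H(mu) = P(X >= mu)
    nfa = n_curves * H ** lengths
    return (nfa < eps).sum(axis=1).mean()

print(expected_meaningful())
```

For a continuous law such as this one, $H(\mu_i)^{L_i}$ is itself uniform on $(0, 1)$, so the bound of Proposition 1 is essentially attained and the estimate should be close to $\varepsilon$.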


2.4.2 Length distribution and meaningfulness

Let us consider an $\varepsilon$-meaningful boundary $C$ containing $l$ points, and $\mu = \min_{x \in C} |Du(x)|$. The computation of the NFA of $C$ involves the probability that, on a curve contained in noise, the gradient is everywhere greater than $\mu$, knowing that the length of the curve is $l$. The a contrario model gives the law of $\min |Du|$ on a curve, conditionally to its length. Precisely, the independence assumption yields

$$P\Big(\min_{x \in C} |Du(x)| \geq \mu \,\Big|\, L = l\Big) = H(\mu)^{l}.$$

The MB model implicitly considers that curves are equally probable, independently of their length. Since all the curves we consider are level lines, this is obviously not true in white noise. Indeed, there are many more short level lines than long ones. Thus, the fact that we observe a very long level line is, in itself, a reason to detect it. This is already suggested in the definition of the NFA since, as soon as $H(\mu) < 1$, a curve becomes meaningful if it is long enough. Experiments confirm this fact: more than � �� of the level lines with a length greater than 1000 are meaningful. The minimal value of the gradient along very long meaningful boundaries is generally very small: less than 1. Are these detections false detections? As far as we observed in real images, the fact that a level line is very long is never casual. The cause of this rare event may be that the curve is an edge, or that a smooth gradient is generated by a light source. Level lines due to light sources are not edges properly speaking: they do not follow object boundaries, their contrast is weak, and they are not as smooth as most contours (they have no shape in the usual meaning). But, in terms of image analysis, they obviously have an interpretation. From this viewpoint, the proportion of long meaningful level lines is certainly still underestimated. Indeed, if we define meaningful boundaries as large deviations from curves generated by noise, the probability of interest should rather be given by the joint law of the length and the minimal gradient, that is to say

$$P\Big(L \geq l,\ \min_{x \in C} |Du(x)| \geq \mu\Big).$$

If we could estimate this probability, we would be able to define NFAs taking values much smaller than the current ones. In particular, all very long curves would certainly be meaningful. The interpretation is that, in a white noise image of the same size as the real image, we do not observe level lines of such a length. However, the fact that these lines have a perceptual interpretation does not imply that they correspond to the usual concept of shape (e.g. illumination gradient). In addition, the joint law of the minimal gradient and the length of the curves cannot be accurately estimated.
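The remark that any curve with $H(\mu) < 1$ becomes meaningful once it is long enough can be made quantitative from Definition 1. The short derivation below is only a worked illustration; the numerical values ($N_{ll} = 10^5$, $\varepsilon = 1$, $H(\mu) = 0.9$) are assumptions chosen for the example, not figures from the report.

```latex
\begin{align*}
N_{ll}\, H(\mu)^{l/2} < \varepsilon
  &\iff \frac{l}{2}\,\log H(\mu) < \log\frac{\varepsilon}{N_{ll}} \\
  &\iff l > \frac{2\,\log(N_{ll}/\varepsilon)}{\log\big(1/H(\mu)\big)}
  \qquad (\text{since } \log H(\mu) < 0). \\[4pt]
\text{Example: } N_{ll} = 10^5,\ \varepsilon = 1,\ H(\mu) = 0.9
  &\implies l > \frac{2 \times 11.51}{0.1054} \approx 218.5 .
\end{align*}
```

The threshold grows only logarithmically with the number of level lines, which is why very long level lines are detected even when their minimal gradient is tiny.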

2.4.3 Cleaning-up meaningful boundaries

Proposition 1 asserts that if a curve is a meaningful boundary, then it cannot be entirely generated in white noise. On the other hand, can we guarantee that no part of a meaningful boundary is contained in noise? Or, for a given meaningful boundary, can we give an upper bound on the size of the part of the boundary that is likely to be contained in noise (i.e. a non-edge region)? To answer this question, we use the a posteriori length distribution

$$P\Big(L \geq l \,\Big|\, \min_{x \in C} |Du(x)| \geq \mu\Big). \tag{4}$$

Contrary to the probability appearing in Def. 1, this one penalizes long curves not only through the gradient value. To compute it, we need the a priori distribution $P(L \geq l)$ that a level line in noise has a length greater than $l$. This law can be correctly estimated for $l \leq$ � �4� � (to give an order of magnitude), see Fig. 4. For higher values, there are too few level lines. By using Bayes' rule, we derive

$$P\Big(L \geq l \,\Big|\, \min_{x \in C} |Du(x)| \geq \mu\Big) = \frac{\sum_{k \geq l} P\big(\min_{x \in C} |Du(x)| \geq \mu \mid L = k\big)\, P(L = k)}{\sum_{k \geq 0} P\big(\min_{x \in C} |Du(x)| \geq \mu \mid L = k\big)\, P(L = k)}.$$

(The denominator is nothing but $P(\min_{x \in C} |Du(x)| \geq \mu)$.)

Figure 4: Log10 of the inverse repartition function of the length of level lines in a white noise image. The average length is about 3.5, meaning that most level sets enclose a single pixel.

By the a contrario assumption (independence of the gradient along curves), we can still write

$$P_\mu(l) \equiv P\Big(L \geq l \,\Big|\, \min_{x \in C} |Du(x)| \geq \mu\Big) = \frac{\sum_{k \geq l} H(\mu)^{k}\, P(L = k)}{\sum_{k \geq 0} H(\mu)^{k}\, P(L = k)}. \tag{5}$$
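Formula (5) is straightforward to evaluate once a prior on lengths is available. The sketch below is a hedged illustration: the prior is a truncated geometric law chosen purely for convenience (the report estimates it empirically from white noise, see Fig. 4), and the name `posterior_length_tail` is hypothetical.

```python
import numpy as np

def posterior_length_tail(prior_pmf, h_mu):
    """Return the array l -> P(L >= l | min |Du| >= mu), as in (5).

    prior_pmf[k] is P(L = k) for level lines in noise; h_mu is H(mu).
    """
    prior_pmf = np.asarray(prior_pmf, dtype=float)
    k = np.arange(prior_pmf.size)
    weights = h_mu ** k * prior_pmf            # H(mu)^k P(L = k)
    tail = np.cumsum(weights[::-1])[::-1]      # sum over k >= l
    return tail / weights.sum()

# Toy prior (assumption): P(L = k) proportional to q^(k-1) for k >= 1.
q, h_mu = 0.7, 0.5
k = np.arange(200)
prior = np.where(k >= 1, (1 - q) * q ** (k - 1.0), 0.0)

tail = posterior_length_tail(prior, h_mu)
print(tail[:5])
```

For this geometric toy prior, the posterior tail decays like $(q\,H(\mu))^{l-1}$, so a larger gradient threshold (smaller $H(\mu)$) makes long pieces in noise rapidly less likely.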

Let us now consider an image $u$ with $N_{ll}$ level lines. We also denote by $N_p$ the number of all possible subcurves of these level lines. ($N_p$ is the sum of the squared lengths of the lines if they are closed.) Assume that $C$ is a piece of length $L$, contained in a non-edge part, described by the noise model. We want to estimate the probability that $L$ is greater than $l > 0$, knowing that $|Du| \geq \mu$. This is exactly $P_\mu(l)$, the probability defined in (5). Then, as in Prop. 1, the number $N_p\, P_\mu(l)$ is an upper bound of the expected number of pieces of lines of length greater than $l$ with gradient larger than $\mu$ (see Sect. 6 of this paper and [13]). For a fixed $\mu$, let $l$ be such that $N_p\, P_\mu(l) \leq \varepsilon$. Then, we know that, on average, we cannot observe more than $\varepsilon$ pieces of level line with a length larger than $l$ and a gradient everywhere greater than $\mu$. We make the assumption that a point with a gradient less than $\mu$ is located in noise. Let us remove any piece of length $l$ containing such a point. Then all remaining points belong to a piece of curve with length greater than $l$ and gradient larger than $\mu$, which cannot be due to chance.

This yields a clean-up algorithm for boundary detection.

1. Detect meaningful boundaries.

2. For a fixed $\mu > 0$, let $l(\mu) = \min\{l,\ N_p\, P_\mu(l) < \varepsilon\}$.

3. For any meaningful boundary, remove all subcurves of length $l(\mu)$ containing a point where $|Du(x)| \leq \mu$.

This introduces a parameter, $\mu$. When $\mu$ gets larger, $l(\mu)$ decreases, so that the clean-up removes more numerous but smaller pieces of curves. The choice of $\mu$ can be determined by applicative considerations. Detected edges may be used for different purposes, for instance shape recognition or image matching. A value of $|Du|$ less than 1 means that we cannot locate edges with an accuracy better than one pixel. Thus choosing $\mu = 1$ for all images is not restrictive. We also check that for $\mu$ about 1, we obtain values of $l(\mu)$ less than a few hundred, which is compatible with the empirical estimation of the a priori length distribution.
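Steps 2 and 3 of the clean-up can be sketched as follows. This is a minimal illustration under explicit assumptions: the curve is open and sampled at unit spacing, the posterior tail $l \mapsto P_\mu(l)$ is supplied as a precomputed nonincreasing array, and the names `window_length` and `clean_up` are hypothetical rather than taken from the report.

```python
import numpy as np

def window_length(posterior_tail, n_pieces, eps=1.0):
    """l(mu) = min{l : N_p * P_mu(l) < eps}; None if no l in the table works."""
    ok = np.flatnonzero(n_pieces * np.asarray(posterior_tail) < eps)
    return int(ok[0]) if ok.size else None

def clean_up(grad_along_curve, mu, length):
    """Mask of samples kept after removing every window of `length`
    consecutive samples that contains a point with |Du| <= mu."""
    g = np.asarray(grad_along_curve, dtype=float)
    keep = np.ones(g.size, dtype=bool)
    for j in np.flatnonzero(g <= mu):
        # every window of `length` samples containing j lies in this range
        keep[max(0, j - length + 1): j + length] = False
    return keep

# Toy posterior tail (assumption): P_mu(0) = 1 and P_mu(l) = 0.35^(l-1), l >= 1.
tail = np.concatenate(([1.0], 0.35 ** np.arange(60)))
l_mu = window_length(tail, n_pieces=1e6)

# Toy boundary: high contrast everywhere except one weak sample.
grad = np.array([5.0] * 10 + [0.2] + [5.0] * 10)
print(l_mu, clean_up(grad, mu=1.0, length=3).astype(int))
```

The loop is quadratic in the worst case but keeps the logic transparent; a production version could instead dilate the mask of low-gradient samples.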

Figure 5: Meaningful boundary clean-up. On the left, the original image. In the middle, the meaningful boundaries with local histograms, see Sect. 4. Boundaries are found in the sky. They are detected since the gradient in the sky is regular because of the smoothly changing illumination. The gradient value is about 0.2. Even though they are not smooth at small scale (they cannot be well located, because the gradient is too small), they are nearly parallel at large scales, which can be explained a posteriori. Now, these boundaries may not be very useful for shape recognition purposes, because of their bad localization. On the right, the result after the clean-up procedure with a gradient threshold equal to 1.

2.4.4 Geometrical information reduction

Caselles et al. claim that pieces of level lines are the basic objects of image analysis [4]. Then, suitably encoded (that is to say in a stable and invariant way), pieces of level lines could directly be used to feed a shape recognition algorithm. There is no theoretical obstruction to encoding all level lines. Lisani et al. [26] describe an encoding method for shape matching. Intrinsic frames for pieces of level lines are defined from inflexion points or bitangent points. These normalized curves are taken as shape elements (we call them codes); their number is basically proportional to the number of inflexion points. Thus, long and oscillating curves are very costly and, at the time being, the method is not applicable in reasonable time if the whole topographic map is encoded. The MB model is used to compute a raw primal sketch [28] that contains most of the information of shape contours. Experiments in [13] and in this article show that the MB model gives sufficient information: it is sufficiently complete, and we now discuss the geometrical compression it provides.

A natural geometrical measure of image complexity is the total variation. If $u$ is an image in $L^1(\mathbb{R}^2)$, its total variation is

$$TV(u) = \sup\Big\{\int_{\mathbb{R}^2} u\, \mathrm{div}\,\varphi\ dx,\ \varphi \in C^1_c(\mathbb{R}^2, \mathbb{R}^2),\ |\varphi| \leq 1\Big\}. \tag{6}$$

When $u$ is $C^1$, $TV(u) = \int_{\mathbb{R}^2} |Du(x)|\,dx$. The total variation also has the classical geometrical interpretation given by the coarea formula [16],

$$TV(u) = \int_{\mathbb{R}} \mathcal{H}^1\big(\partial \chi_\lambda(u)\big)\, d\lambda, \tag{7}$$

where $\mathcal{H}^1$ is the one-dimensional Hausdorff measure. When $u$ only assumes a finite number of values, the total variation is nothing but the sum of the perimeters of the level sets. Obviously, approximating $\int |Du|$ by mere finite differences is much faster than summing the perimeters of all level lines. Both estimates give very close results (relative errors about 0.3%). The ratio between the total variation and the sum of the perimeters of meaningful boundaries measures the geometrical compression. This ratio is experimentally about 15. Similar results are obtained with the total curvature

$$TC(u) = \int_{\mathbb{R}} \int_{\partial \chi_\lambda(u)} |\kappa|\, d\mathcal{H}^1\, d\lambda = \int_{\mathbb{R}^2} |Du| \left| \mathrm{div}\left(\frac{Du}{|Du|}\right) \right| dx. \tag{8}$$

(The second equality is a consequence of the coarea formula.)

If we now encode shape elements by Lisani’s algorithm, the gain is often much better than that, especially for textured or low-contrast images. We think the reason is the following. Recent studies [18, 24] proved that, under scale invariance hypotheses, images do not have a bounded total variation. This blow-up is mainly due to small objects (noise and microtexture). The corresponding level lines are usually too short to be meaningful boundaries, and the observed decrease in total variation is due to their elimination. On the other hand, these lines are too simple to contain an inflexion point, a flat point or a bitangent. Thus, they are not encoded either, even if we keep all level lines. By contrast, in noise there are only very few long level lines. These lines have a very complicated fractal geometry, and numerically contain many inflexion points. The distance between two inflexion points in white noise is approximately 3. (This, in turn, is also the mean length of level lines in white noise.) As a result, a single long line in white noise may produce several thousand shape elements, which is also the approximate number of codes obtained empirically for all the meaningful boundaries of a whole image. Such lines may also appear in real images, in low-contrast areas with an illumination gradient as in Fig. 5. Their contribution to the total variation is negligible (the gradient is low), while it is disproportionate for shape coding. Thus, for practical shape recognition, we must ensure, as in Sect. 2.4.3 above, that no pieces of level lines due to noise are detected.
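The coarea interpretation of the total variation can be checked numerically on a small quantized image. The following Python sketch is only an illustration under assumptions that are not in the text: it uses the anisotropic discretization of the total variation (sum of absolute horizontal and vertical differences), it counts the perimeter of a level set as the number of pixel edges separating the set from its complement inside the image, and the function names are hypothetical. With these conventions, the two quantities coincide exactly for integer-valued images.

```python
def anisotropic_tv(img):
    """Total variation by finite differences: sum of |horizontal| and |vertical| jumps."""
    rows, cols = len(img), len(img[0])
    tv = 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:
                tv += abs(img[i][j + 1] - img[i][j])
            if i + 1 < rows:
                tv += abs(img[i + 1][j] - img[i][j])
    return tv


def level_set_perimeter(img, level):
    """Number of pixel edges between {u >= level} and its complement (image border ignored)."""
    rows, cols = len(img), len(img[0])
    inside = [[img[i][j] >= level for j in range(cols)] for i in range(rows)]
    perimeter = 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols and inside[i][j] != inside[i][j + 1]:
                perimeter += 1
            if i + 1 < rows and inside[i][j] != inside[i + 1][j]:
                perimeter += 1
    return perimeter


def coarea_sum(img):
    """Sum of the level set perimeters over all integer levels between min and max."""
    lowest = min(min(row) for row in img)
    highest = max(max(row) for row in img)
    return sum(level_set_perimeter(img, lam) for lam in range(lowest + 1, highest + 1))


# Hypothetical 4 x 4 integer-valued image.
image = [
    [0, 0, 1, 3],
    [0, 2, 2, 3],
    [1, 2, 5, 3],
    [0, 0, 1, 1],
]
print(anisotropic_tv(image), coarea_sum(image))
```

The equality holds because, for two neighbouring pixels with integer values a and b, exactly |a - b| integer levels separate them. With the Euclidean gradient norm the two estimates are only close, as reported above.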

2.4.5 Meaningful boundaries vs. Haralick’s detector

In this section, we comment on the main differences between the meaningful boundary model and the classical edge detector introduced by Haralick. The meaningful boundaries are based on the topographic map of grey level images, which gives a complete topological representation of grey level images. Caselles, Coll and Morel [4, 6] detail all the properties of this representation. A first advantage of this representation is its stability: even with a significant amount of noise, many level lines do not change much. A second advantage is its invariance with respect to global contrast change. But the main property is the structure of this representation: it is a set of nested curves that are either closed or meet the image boundary. As a consequence, level lines have two of the main properties usually expected in edge detection or image segmentation: they are curves (and not sets of points), and they are embedded in a hierarchical structure [17, 33, 40]. Moreover, away from critical points, level lines coincide with isophotes. As a consequence, for almost any level, the gradient is almost everywhere normal to level lines, which makes level lines good candidates for edges.

Following Haralick [19], edges are the maxima of the gradient norm in the direction of the gradient, such that the gradient is larger than a given threshold. Thus, for a grey-level image $u$, they are the zero-crossings of $D^2u(Du, Du)$. It is well known that this quantity is numerically sensitive to noise and that smoothing is necessary. Thus in practice, $u$ is first convolved with a gaussian with standard deviation $\sigma$ (we denote this gaussian by $G_\sigma$ and $u_\sigma = G_\sigma * u$), and the points where $D^2u_\sigma(Du_\sigma, Du_\sigma)$ changes sign and $|Du_\sigma|$ exceeds the threshold are edge points. Although there have been some attempts to automatically determine the scale parameter $\sigma$ [25], edge detection largely remains multiscale, as predicted by Marr [28]. On the contrary, meaningful boundaries do not need smoothing for suitably sampled images (see next section) and can be run with no image-dependent threshold. Haralick’s detector provides a set of points or curves that are a few pixels long. The way they should be connected is far from obvious and may lead to a very high computational complexity; this is one more problem structurally handled by level lines. Last but not least, Haralick’s operator is inefficient for corners and junctions. Indeed, at those points, the gradient direction is very badly estimated and edges may be severely cut. At a junction, level lines bifurcate [5]. Figure 6 shows the meaningful boundaries and Canny’s filter near two junctions.
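To make the role of the two parameters of Haralick's detector concrete, here is a one-dimensional Python sketch. It is only an illustration, not the detector used in the experiments: in one dimension the zero-crossings of the second directional derivative reduce to sign changes of the second derivative, the gaussian is truncated at three standard deviations, and the function names, the test signal and the parameter values are all assumptions.

```python
import math


def gaussian_kernel(sigma):
    """Normalized gaussian weights, truncated at 3 sigma (the truncation is an assumption)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with the gaussian, replicating the border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - radius, 0), last)] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def haralick_edges_1d(signal, sigma, threshold):
    """Indices i where u'' changes sign between i and i + 1 while |u'| exceeds the threshold."""
    u = smooth(signal, sigma)
    n = len(u)
    d1 = [0.0] + [(u[i + 1] - u[i - 1]) / 2 for i in range(1, n - 1)] + [0.0]
    d2 = [0.0] + [u[i + 1] - 2 * u[i] + u[i - 1] for i in range(1, n - 1)] + [0.0]
    return [
        i
        for i in range(n - 1)
        if d2[i] * d2[i + 1] < 0 and max(abs(d1[i]), abs(d1[i + 1])) > threshold
    ]


# Hypothetical step edge located between samples 9 and 10.
step = [0.0] * 10 + [100.0] * 10
print(haralick_edges_1d(step, sigma=1.0, threshold=5.0))
print(haralick_edges_1d(step, sigma=1.0, threshold=1000.0))
```

Both the scale `sigma` and the gradient `threshold` have to be chosen by the user, which is the dependence the meaningful boundary model avoids.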

2.4.6 Color edges

Let $u : \mathbb{R}^2 \to \mathbb{R}^3$ be a color image given by its three components in the RGB space. Since topographic maps are based on the total order relation on gray levels, they cannot be immediately generalized to multi-channel images. In [7], Caselles, Coll and Morel proposed to define topographic maps based on the luminance and showed that it contains most of the image information. The luminance is defined by

� � � � ��

Figure 6: Junction and level lines. On the left, the original image. Middle: Haralick’s detector implemented with Canny’s filter on the area designated on the left image. Note how the contour is broken at the junction, due to the bad estimate of the gradient direction, and the high number of edge pieces. Right: detailed view of meaningful boundaries on the region. There are two level lines, each corresponding to an edge part.

From the topographic map of the luminance, we can generalize the MB model by using a color gradient. At each point, let us denote by $n$ the direction of the gradient of the luminance, that is, this gradient divided by its norm. We then denote by $D_n u$ the gradient of $u$ in the direction of $n$, that is to say the vector

$$D_n u = (Du_1 \cdot n,\ Du_2 \cdot n,\ Du_3 \cdot n), \quad (9)$$

where $u_1$, $u_2$, $u_3$ are the three color components. We then apply the MB algorithm by replacing the norm of the single-channel image gradient by $\|D_n u\|$. As can be expected from [7], the results are almost the same as for grey-level images. Of course, the results are slightly dependent on the choice of luminance and of the gradient norm. It is well known that different combinations of colors may lead to the same luminance. This is experimentally quite rare, and we did not observe any real gain or loss from using color.

3 Image quantization and gradient norm distribution

A common belief about edges is that the value of the gradient should be very large. Subjective contours (see Kanizsa [20], for instance) are seen as exceptions to this rule. On the contrary, we believe that many edges are nearly subjective (this is nearly always the case in paintings after the Renaissance). In any case, numerical evidence shows that many edges have a contrast that is not larger than 4 or 5, which is quite comparable to texture and noise. This, of course, makes gradient thresholds quite difficult to tune in classical edge detection. This also has an unexpected consequence for meaningful boundaries: the gradient distribution does not really need to be accurate for large values, but it does for small values. This means that grey levels have to be suitably quantized in smooth areas, while quantization has no real importance around very contrasted edges. Image quantization is performed after image sampling, and creates high-frequency variations. As a consequence, coarsely
quantized images are not well sampled, in the sense that their spectrum does not vanish close to the critical frequency. This also implies a coarse quantization of the norm of the gradient, especially for small values. This was known to be a problem for the gradient orientation [11], but not for the norm.

Let us use the simple gradient estimate on a $2 \times 2$ mask, defined by

$$u_x(i,j) = \tfrac{1}{2}\big(u(i+1,j) + u(i+1,j+1) - u(i,j) - u(i,j+1)\big), \quad (10)$$

$$u_y(i,j) = \tfrac{1}{2}\big(u(i,j+1) + u(i+1,j+1) - u(i,j) - u(i+1,j)\big). \quad (11)$$

In this case, $Du(i,j) = 0$ if and only if $u(i+1,j+1) = u(i,j)$ and $u(i+1,j) = u(i,j+1)$. Thus $u$ does not need to be constant on the four points $(i,j)$, $(i+1,j)$, $(i,j+1)$ and $(i+1,j+1)$. It may even assume very different values. For instance, this numerical gradient is everywhere equal to zero on a perfect chessboard. However, such an image is not well sampled and the gradient computation is not correct. In [11], for alignment detection purposes, the proposed dequantization was a $(1/2, 1/2)$ translation of the image using Shannon interpolation. It is well known that this creates Gibbs effects along edges. Since the Gibbs effect leaves the gradient orientation modulo $\pi$ unchanged, this is not a problem for alignment detection, while it is not suitable for the norm of the gradient.

In order to dequantize the norm of the gradient, we simply propose to apply a slight gaussian smoothing to the image. Note that this is not the same as using a different mask to compute the gradient, since level lines are also extracted on the smoothed image. Numerically, a standard deviation of 0.5 is nearly the smallest possible value. We emphasize that this convolution only aims at removing quantization noise, not at removing small-scale elements of the image. In practice, there is hardly any visual difference before and after smoothing. In Fig. 7, we display the meaningful boundaries without and with smoothing. The convolution dramatically changes the gradient distribution close to 0. It also removes many critical and saddle points due to quantization. While the image seems nearly unchanged, the detected lines change considerably. Of course, one can argue that smoothing creates some structures in the image, and that this is why we obtain more detections. This argument does not hold. Indeed, the Helmholtz principle asserts that there are fewer than $\varepsilon$ detections in a well-sampled image of noise. Smoothing the image is equivalent to decreasing the Nyquist rate. Thus, we have to apply the same algorithm as above after a suitable sampling of level lines. For a very slight smoothing of standard deviation 0.5, we did not observe any change.
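The chessboard remark about the gradient estimate on a 2 x 2 mask is easy to verify. The Python sketch below is an illustration under stated assumptions: it takes the usual 2 x 2 finite-difference scheme (average of the two differences along each axis), stores the image as a list of rows, and uses hypothetical function names.

```python
import math


def gradient_2x2(u, i, j):
    """Gradient estimate on the 2 x 2 mask whose top-left pixel is (row i, column j)."""
    ux = 0.5 * (u[i][j + 1] + u[i + 1][j + 1] - u[i][j] - u[i + 1][j])
    uy = 0.5 * (u[i + 1][j] + u[i + 1][j + 1] - u[i][j] - u[i][j + 1])
    return ux, uy


def gradient_norms(u):
    """Gradient norm at every position where the 2 x 2 mask fits in the image."""
    return [
        [math.hypot(*gradient_2x2(u, i, j)) for j in range(len(u[0]) - 1)]
        for i in range(len(u) - 1)
    ]


# A perfect chessboard (values 0 and 255) and a horizontal ramp of slope 1.
chessboard = [[255 * ((i + j) % 2) for j in range(6)] for i in range(6)]
ramp = [[j for j in range(6)] for i in range(6)]

print(max(max(row) for row in gradient_norms(chessboard)))
print(min(min(row) for row in gradient_norms(ramp)))
```

The estimate vanishes whenever the two diagonals of the mask carry equal values, which is exactly the chessboard situation, while it recovers the slope of the ramp.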

Remark 1 In order to deal with unstable values of the gradient, we could think of relaxing the definition of a meaningful boundary. As for alignments [12], we could require that a large proportion of the points of the boundary have a large gradient. That is to say, the tested events would be of the kind “at least $k$ points among $l$ have a gradient larger than $\mu$”. Under the a contrario assumption of independence, the probability of such an event is given by the tail of the binomial law. This definition gives many more detections, and avoids all the accidental misdetections due to bad gradient computation. On the other hand, many level lines pass from one object to another. In this case, the cleaning procedure becomes necessary, since this a contrario model accepts that a long part of a boundary may not coincide with an edge.
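The binomial tail mentioned in Remark 1 can be written down directly. In the Python sketch below, `p` stands for the a contrario probability that a single point has a gradient larger than the threshold; the function name and the numerical values are assumptions chosen for illustration only.

```python
from math import comb


def binomial_tail(l, k, p):
    """Probability that at least k of l independent points exceed the threshold."""
    return sum(comb(l, i) * p**i * (1 - p) ** (l - i) for i in range(k, l + 1))


# Hypothetical values: 20 sample points, each exceeding the threshold with probability 0.3.
strict = binomial_tail(20, 20, 0.3)   # all points must exceed the threshold
relaxed = binomial_tail(20, 15, 0.3)  # at least 15 points among 20
print(strict, relaxed)
```

Requiring all the points corresponds to k = l, where the tail reduces to p to the power l. The relaxed event is more probable under the background model, so a curve must be longer or more contrasted to reach the same number of false alarms, but a few badly estimated gradient values no longer prevent the detection.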

[Figure 7(a): two plots of the inverse repartition function of the gradient norm; horizontal axis from 0 to 5, vertical axis from 0.5 to 1.]

(a) Left: inverse repartition function of the gradient of the image below with no gradient norm dequantization (detailed view near 0), that is to say $P(|Du| > \mu)$ as a function of $\mu$. Right: the same image after convolution with a gaussian with standard deviation equal to 0.5. For a given value of $\mu$, the probability drops from 0.92 to 0.56.

(b) On the left, the original image is heavily quantized since it has a very low contrast. This leads to bad gradient estimation and many missing detections (middle). Gradient dequantization leads to more correct detections (right).

Figure 7: Influence of image quantization on meaningful boundaries.

4 Local boundary detection

In the model above, the values of the gradient are random variables whose distribution is empirically estimated. It is simply the histogram of the gradient in the image. One can argue that this distribution is too global. This also yields what we call the “blue sky effect”. Consider an image containing two parts: a contrasted or textured one (e.g. ground) and a smooth one (e.g. sky). Then, we can observe an overdetection in the ground, and an underdetection in the sky. Indeed, the sky only contributes small values to the histogram. Thus we tend to detect anything which is more contrasted than the sky, and nearly anything is detected in the ground. On the contrary, the contrasted ground makes the detection more difficult for regions with a small contrast. This is not in agreement with human vision, since we locally adapt our perception of contrast. Objects are masked in contrasted regions, while our accuracy is improved in low-contrast regions (up to some physiological thresholds).

In this section, we address this local adaptivity to contrast. It does not use new concepts and is an adaptation of the meaningful boundary model. We first describe the algorithm, then show experiments.
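The “blue sky effect” can be reproduced with a toy computation. The Python sketch below compares the empirical probability that the gradient norm exceeds a value, estimated either on the whole image or on one region only. The gradient values, the two thresholds and the function name are all hypothetical; only the mechanism matters.

```python
def tail_probability(gradient_norms, mu):
    """Empirical probability that the gradient norm is at least mu."""
    return sum(1 for g in gradient_norms if g >= mu) / len(gradient_norms)


# Hypothetical gradient norms sampled in a smooth region and in a textured region.
sky = [0.1, 0.2, 0.2, 0.3, 0.2, 0.1, 0.3, 0.2]
ground = [3.0, 5.0, 8.0, 4.0, 6.0, 7.0, 9.0, 5.0]
whole_image = sky + ground

# A faint edge in the sky (minimal contrast 0.3).
global_sky = tail_probability(whole_image, 0.3)
local_sky = tail_probability(sky, 0.3)

# A texture line in the ground (minimal contrast 5.0).
global_ground = tail_probability(whole_image, 5.0)
local_ground = tail_probability(ground, 5.0)

print(global_sky, local_sky)
print(global_ground, local_ground)
```

With the local histogram, the probability attached to the sky edge decreases (the edge becomes easier to detect), while the probability attached to the ground texture increases (the texture is masked).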

4.1 Algorithm

Assume that we have detected a closed boundary. It divides the image into two connected components: the interior and the exterior of the curve. We can then compute the empirical contrast distribution in the interior on the one hand and in the exterior on the other hand, and independently detect new meaningful boundaries in each connected component. We then apply this procedure recursively. Since the size of the level line tree is finite, it is clear that the detection ends in a finite number of steps.

The situation is actually a bit more complicated. First, this method depends on the order we use to describe the image boundaries. We simply choose to start with the most meaningful boundaries. Second, boundaries are not always closed. In this case, their endpoints belong to the image border. They still cut the image into two connected components. Unfortunately, there is no clear notion of interior and exterior. An algorithmic choice is made, but it is purely algorithmic and arbitrary from a perceptual point of view [32]. Thus, we cannot rely on this choice of interior, which conflicts with closed boundaries. However, we can first apply the detection to open boundaries, then to the closed ones. (Open boundaries contain all the closed ones, since level lines are nested.) More precisely, we proceed as follows.

Let us call $R_0$ the root boundary, that is the (non-meaningful) boundary containing the whole image. If $C$ is a boundary, we denote by $\mathrm{Int}\,C$ its interior.

1. Set $R = R_0$. (Local root.)

2. Set $\mathcal{B}$, the set of meaningful boundaries already stored in $R$. Initially, $\mathcal{B}$ is empty.

3. Let $\Omega = \mathrm{Int}\,R \setminus \bigcup_{C \in \mathcal{B}} \mathrm{Int}\,C$.

4. Compute the histogram of $|Du|$ in $\Omega$.

5. Use this histogram and detect the maximal meaningful boundaries included in $\Omega$. Let $\mathcal{M}$ be the set of maximal meaningful boundaries defined by $C \in \mathcal{M}$ if and only if, for every meaningful boundary $C'$ included in $\Omega$,

$$\mathrm{Int}\,C' \subset \mathrm{Int}\,C \Rightarrow NFA(C) < NFA(C'), \qquad \mathrm{Int}\,C \subset \mathrm{Int}\,C' \Rightarrow NFA(C) \le NFA(C'). \quad (12)$$

In other words, the boundaries in $\mathcal{M}$ have an optimal NFA. Note that this is stronger than the maximality defined in Sect. 2.3 since we go across monotone sections. We call the boundaries in $\mathcal{M}$ the total maximal boundaries. The subtree with root equal to $R$ that remains by keeping only the boundaries in $\mathcal{M}$ has only two levels: the local root $R$, and $\mathcal{M}$. Since the interior of open boundaries is arbitrary, we do not mix the detection of open and closed boundaries. In practice, this means that if we detect an open meaningful boundary $C$, we apply the definition of total maximal boundary (12) only to open boundaries containing $C$ or contained in $C$.

6. If $\mathcal{M} \neq \emptyset$, then we have detected new boundaries in the complement of the already detected ones. Then,

(a) set $\mathcal{B} = \mathcal{B} \cup \mathcal{M}$. By construction, all the closed boundaries in $\mathcal{B}$ have disjoint interiors.

(b) return to step 3.

7. If $\mathcal{M} = \emptyset$, there are no new boundaries in the local root and in the complement of the currently detected boundaries. We then continue the search at lower levels of the tree. For any boundary $C \in \mathcal{B}$,

(a) store $C$.

(b) set $R = C$, and $\mathcal{B} = \emptyset$.

(c) return to step 3.
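The selection rule of step 5 can be sketched in a few lines of Python. This is a simplified illustration and not the full recursive procedure: each boundary is described only by the set of pixels of its interior and by its NFA, open and closed boundaries are not distinguished, and the class name, the pixel identifiers and the NFA values are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class Boundary(NamedTuple):
    name: str
    interior: FrozenSet[int]  # hypothetical pixel identifiers enclosed by the curve
    nfa: float


def is_total_maximal(c: Boundary, candidates: List[Boundary]) -> bool:
    """A boundary must beat every boundary it contains and be no worse than any containing it."""
    for d in candidates:
        if d is c:
            continue
        if d.interior < c.interior and not c.nfa < d.nfa:
            return False  # a boundary strictly inside c is at least as meaningful
        if c.interior < d.interior and not c.nfa <= d.nfa:
            return False  # a boundary strictly containing c is more meaningful
    return True


def total_maximal(candidates: List[Boundary]) -> List[Boundary]:
    return [c for c in candidates if is_total_maximal(c, candidates)]


# Three nested boundaries A > B > C and a separate boundary D.
candidates = [
    Boundary("A", frozenset(range(0, 10)), 1e-3),
    Boundary("B", frozenset(range(2, 6)), 1e-8),
    Boundary("C", frozenset({3, 4}), 1e-5),
    Boundary("D", frozenset({20, 21}), 1e-2),
]
selected = total_maximal(candidates)
print([c.name for c in selected])
```

In this example only one boundary survives on the nested branch, so the selected boundaries have disjoint interiors, as noted in step 6(a).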

Remark 2 Each boundary may be tested more than once. Thus, the number of false alarms has to be multiplied by the maximal number of visits of a boundary, which is bounded above by the depth of the level line tree. In fact, each detected boundary often lies in the middle of the local root, and this divides the tree depth by 2. Thus in practice, the maximal number of visits of a boundary is of the order of the logarithm of the initial tree depth. In practice, it is always much smaller than 100.

4.2 Experiments on locally contrasted boundaries

In Fig. 9, we show the difference between the detection with a global contrast histogram and the updated local histogram. To give an idea of the magnitude of the number of false alarms, the boundary delimiting sky and the foreground has a NFA equal to

�� . This means, that, in average, we need

to look� � ��

images to find such a contrasted curve by chance! The smaller boundaries around the opening on the top of the tower have NFAs about

� � ��.

Very interestingly, using local contrast removes boundaries in texture. This is logical since the local

[Figure 8, panels (a) to (e), with regions labeled R1, R2, R3 and R4.]

Figure 8: Example of local search for meaningful boundaries. (a) The initial boundaries. They are oriented such that the tangent and the interior normal form a direct frame. We compute the NFA of each boundary. In solid line, we draw the total maximal ones. Two are open, one is closed. Note that the interiors are disjoint, because of total maximality. While we detect some open curves, we ignore the closed ones. (b) While we detect new meaningful boundaries, we compute the contrast histogram in the complement of the interior of the detected open boundaries and resume the search in this part of the image. In this region, the exterior of the detected open boundaries, we detect a total maximal boundary. Note that this boundary may have been already detected but rejected because of open boundaries. We assume here that no new open boundaries are meaningful. Thus, we keep this closed boundary. (c) We resume the search (with a recomputed histogram) in the exterior (white part) of the detected boundaries, until we cannot find new ones. When this is over, we then compute the local contrast histogram in each region $R_1$, $R_2$, $R_3$ and look for boundaries inside them. (d) A boundary $R_4$ has been detected in one of these regions. Compute the local histogram in this region minus $R_4$ and detect boundaries. (e) Finally, we scan for boundaries in $R_4$ with a new local contrast histogram.

Figure 9: Influence of local contrast. From left to right: original image, maximal meaningful boundaries, local maximal meaningful boundaries. Regularity is not taken into account. There are 280,000 boundaries in the initial image, 652 in the second one and 193 in the last one. Texture is removed since local contrast (for instance on the church tower) is much more demanding than the global histogram. As the texture is uniform, no level line is a large deviation from the empirical local contrast, yielding no detection. This is very good for shape analysis, where we often want to distinguish texture from real shapes.

contrast in textured regions (as on the tower) assumes larger values than in the rest of the image. Thus, this increases the NFA of boundaries and most of them simply disappear in textured regions. This is a masking phenomenon.

Let us explain why this is useful for shape recognition. In general, a shape recognition algorithm can be divided into four steps:

1. shapes extraction

2. (invariant) encoding

3. comparison: compute some distance between encoded shapes

4. decision: accept or reject pairs of matching shapes

Present and future applications need to compare images in huge databases, where we have no a priori knowledge that two images, or two shapes, should match. Since every step in the above methodology is very costly, it is worthwhile to limit the number of encoded shapes and to try to keep the “most meaningful” ones.

At present, there is no general model of shapes [44]. Nevertheless, for shape recognition algorithms, we can give empirical observations of what a “good shape” is. First, it should be stable in terms of extraction. This is generally expressed in terms of contrast and regularity, and the method we describe in this paper gives quantitative arguments. (Regularity is the object of the next section.) For encoding, a good shape should not be too simple, especially if we are interested in an invariant recognition. For instance, most convex shapes are very alike in affine invariant shape recognition. Assume that we have chosen an affine invariant distance between shapes. If we want to be sure that two convex shapes match, the distance between them has to be very small. Indeed, two convex shapes can be close to each other by chance, while the probability that this occurs for more complex shapes is very small (this means that recognition is relative to the database and to the query [35]). On the other hand, a shape should not be too complex, since complexity usually makes the encoding longer and more difficult. Because of occlusions, we usually try to match pieces of shapes. Very complex shapes will be divided into numerous pieces, making computations longer.

Now, it is well known that textures are strongly damaged by compression. Thus level lines in texture may not be reliable when two images come from different sources (with different quality, compression rate, etc.). Moreover, they are very complex, and yield many encoded pieces of curves. If these curves match for two different images, then those images are certainly exactly the same. Now, the computational cost may be too high for some applications, where we may want to detect a particular shape (a logo for instance) in a database. Thus it may be useful to automatically remove contrasted regions corresponding to texture. This is what the local contrast detection does in practice.

The argument above is reversed for stereo image registration. In this case, we have the strong a priori that the images are nearly the same, and the goal is to register them as well as possible. In this application, textures can also give some useful information. (See Fig. 10.)

The elimination of textures is the first aspect of the “blue sky” effect. On the contrary, local contrast should make curves in low-contrast areas more detectable. This is also what we empirically observe: we detect illumination gradients (see Sect. 2.4.2, Fig. 5 and 11). They can be due to the

Figure 10: Image registration. (a) and (b) are two images from a movie during a rightward traveling. (c) and (d) are the meaningful boundaries in the previous images. (e) and (f) are the pieces of level lines of (c) and (d) that match with a number of false alarms less than

� � � �. We use the algorithm

developed by Lisani, Musé and Sur [26, 34, 35], which uses an a contrario definition of shape matching.

vicinity of the light source, or to the variation of the orientation of the surface of a three-dimensional object with respect to the light source. Such lines do not correspond to the usual notion of shapes (objects). Nevertheless, it is logical to detect them as remarkable structures.

Figure 11: Illumination, local contrast and regularity. Left: original image. Middle: meaningful contrasted boundary. Right: meaningful contrasted and smooth boundary with local contrast. With contrast only, a single boundary appears on the right with the contrast due to illumination. If contrast is localized, then more boundaries are detected. If we also add a regularity constraint (see Sect. 5.1 below), there are still more detections. These boundaries are very different from texture since they are nearly convex and parallel. They are eliminated by the cleaning procedure described in Sect. 2.4.3.

5 Meaningful boundaries or snakes?

In [15], Desolneux, Moisan and Morel compared the MB model with variational snake theory. This may seem a bit odd since the MB model only uses contrast observations along a curve, while snakes are also required to be smooth. In fact, the explanation for natural images is that contrasted boundaries often locally coincide with objects. Thus, they are also incidentally smooth. Whereas smoothness seems to be optional for the detection, it may give a better localization of the contour. In this section, we propose a method to incorporate smoothness into the detection. There are only a few additional detections, while the position of the maximal meaningful boundaries may change a little. The NFA also decreases significantly. The small number of new detections and the fact that each partial detector can detect most image edges prove a contrario that contrast and regularity are not independent in natural images.

An a contrario model of regularity has been proposed in [3]. It assumes that the variation of the orientation of the tangent between two samples is a random value uniformly distributed in $[-\pi, \pi)$. Thus, the implicit a contrario model is random walks with isotropic and independent increments. This model is not really adapted for the following reason. All the curves we detect are level lines, thus boundaries of compact sets. As a consequence, they do not self-intersect. While the local influence is not clearly visible, this implies that long level lines are much more regular than random walks. This logically leads to an overdetection of long level lines because the independence assumption is strongly violated at very long range. The solution we propose is to stick to the Helmholtz principle: “no

detection in white noise”. Thus we have to learn the regularity of level lines in white noise, and use this as the a priori distribution.

5.1 Definition of local regularity

Let $l_0 > 0$ be a fixed positive value. Let $C$ be a rectifiable planar curve, parameterized by its length. Let $x = C(s_0) \in C$. With no loss of generality, we assume that $s_0 = 0$.

Definition 3 We call regularity of $C$ at $x$ (at scale $l_0$) the quantity

$$R_{l_0}(x) = \frac{\max\big(|x - C(-l_0)|,\ |x - C(l_0)|\big)}{l_0}. \quad (13)$$

[Figure 12; labels: $x = C(0)$, $C(-l_0)$, $C(l_0)$, $l_0$, $l_0 R_{l_0}(x)$.]

Figure 12: Regularity definition. The regularity at $x$ is obtained by comparing the radius of the circle with $l_0$. The radius is equal to $l_0$ if and only if the curve is a straight line. If the curve has a large curvature, the radius will be small compared to $l_0$.

Of course, this definition really makes sense only if the length of $C$ is larger than $2 l_0$. This definition of regularity (see Fig. 12) is related to the Hausdorff dimension of $C$ around $x$. First, $R_{l_0}(x) \le 1$, with equality if and only if either $C((-l_0, 0))$ or $C((0, l_0))$ is a line segment. On the contrary, if $R_{l_0}(x)$ is small, then the curve is highly curved around $x$.

We can also interpret $R_{l_0}(x)$ as a function of the local curvature. Indeed, if $C$ is a circle with large enough radius $r$, then

$$R_{l_0}(x) = \mathrm{sinc}\!\left( \frac{l_0}{2r} \right), \quad \text{where } \mathrm{sinc}\, t = \frac{\sin t}{t}. \quad (14)$$

This approximation is valid when $l_0$ is small compared to $r$. In this case, the regularity is a nonincreasing function of the curvature.
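The regularity of Definition 3 and the circle formula (14) can be checked numerically. The following Python sketch assumes that the curve is available as a function of its arc-length parameter; the function names and the numerical values (scale 10, radii 50 and 20) are hypothetical.

```python
import math


def regularity(curve, s, l0):
    """Largest distance from curve(s) to curve(s - l0) and curve(s + l0), divided by l0."""
    x = curve(s)
    backward = math.dist(x, curve(s - l0))
    forward = math.dist(x, curve(s + l0))
    return max(backward, forward) / l0


def line(s):
    """Straight line parameterized by arc length."""
    return (s, 0.0)


def circle(radius):
    """Circle of given radius parameterized by arc length."""
    return lambda s: (radius * math.cos(s / radius), radius * math.sin(s / radius))


def sinc(t):
    return math.sin(t) / t


l0 = 10.0
print(regularity(line, 0.0, l0))
print(regularity(circle(50.0), 0.0, l0), sinc(l0 / (2 * 50.0)))
print(regularity(circle(20.0), 0.0, l0), sinc(l0 / (2 * 20.0)))
```

On the straight line the regularity equals 1, and on the circles it matches the sinc expression and decreases when the radius decreases, that is, when the curvature increases.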

This definition is not purely local, but it is also less sensitive to noise than differential measures such as the curvature. Let

$$H_s(r) = P\big(C \text{ is a white noise level line and } R_{l_0}(x) > r\big). \quad (15)$$

This distribution only depends on $l_0$ and can be empirically estimated. Of course, we learn it on level lines whose length is much larger than $l_0$ in order to avoid quantization effects.

Remark 3 As expected, the distribution of $R_{l_0}$ is very different in white noise and natural images. In natural images, the histogram of $R_{l_0}$ has a peak at 1, corresponding to real object boundaries (which often contain alignments). In some textured images, such as paintings, most edges are not real but subjective, and this is clearly visible on the histogram of $R_{l_0}$. See Fig. 13. The distribution also clearly depends on $l_0$. When $l_0$ grows, the histogram mode moves to lower values. However, we obtain the same qualitative behavior as above. In Appendix A, we use these distributions to compute the Hausdorff dimension of white noise level lines. We then quantitatively check that they are much smoother than (self-intersecting) isotropic random walks.

Remark 4 In [18], Gousseau and Morel prove that natural images are not smooth. In particular, their total variation seems to be unbounded. This does not contradict the experiment we displayed. Indeed, images are not smooth because they contain many small objects due to texture. Our regularity computation is not completely local, so we remove small objects before computing the histograms.

Again, the choice of $l_0$ is a natural question. Of course $l_0$ should be larger than the Nyquist distance. It should not be too large either. In experiments we have chosen a single fixed value of $l_0$. But, since NFAs are additive, we may also choose several reasonable values of $l_0$ (10 and 20, for instance) and multiply the NFAs by the number of values of $l_0$. In practice, changing $l_0$ influences the number of samples, and the best NFAs are attained for small $l_0$.

5.2 Meaningful contrasted and smooth boundary

Now that we have a background model of regularity, we use it to detect regular curves a contrario. It is natural to assume, in the background model, that contrast and regularity are independent. Thus

$$P(C \text{ is contrasted and smooth}) = P(C \text{ is smooth}) \times P(C \text{ is contrasted}).$$

Definition 4 Let $C$ be a level line. Let

$$\mu = \min_{x \in C} |Du(x)|, \quad (16)$$

$$\rho = \min_{x \in C} R_{l_0}(x) \quad (17)$$

Figure 13: Regularity histograms. Upper row: a white noise image, a scanned photograph and a scanned photograph of a painting. Bottom row: the three corresponding regularity histograms. Since its histogram vanishes near 1, white noise does not contain any alignments or smooth curves, as expected. Nearly all natural images (containing true edges) have a regularity histogram like the second one. The third image contains mostly subjective edges, as it is composed of painted strokes. As a consequence, its regularity histogram is much less concentrated around 1 than for “natural” images. If we now unzoom the three images (with an adequate smoothing before downsampling), then the first histogram remains unchanged (scale invariance), while the other two have regularity histograms like the second one. Indeed, after unzooming, most textures and small-scale features disappear, and small gaps get filled.

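be respectively the minimal quantized contrast and regularity along $C$. Let

$$NFA(C) = N_{ll} \left[ P(|Du| \ge \mu) \, P(R_{l_0} \ge \rho) \right]^{n}, \quad (18)$$

where $N_{ll}$ is the number of level lines and $n$ the number of independent samples along $C$. We say that $C$ is an $\varepsilon$-meaningful smooth boundary if $NFA(C) < \varepsilon$.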

The number of false alarms is the product of the number of level lines and the probability that the contrast and the regularity are simultaneously larger than the observed values along a curve with prescribed length taken in the background model. The probability is computed in the a contrario model where contrast and regularity are independent and local observations are mutually independent.

As above, this search can be performed recursively by computing local histograms of the gradient. In experiments, detection results are qualitatively equivalent with or without regularity. On the other hand, the NFA may decrease a lot for smooth boundaries. Even though the detection is not changed in one single image, it is still interesting to decrease the NFA as much as possible. Indeed, we may want to detect boundaries not in a single image but in a database (for instance in shape recognition applications). We can estimate the size of a “universal database” (containing all the images ever seen by humankind) at

�� . Thus, curves with a NFA lower than

� � �� in a single image can also be

considered as universally meaningful.
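The effect of the regularity term on the number of false alarms can be illustrated with a short Python sketch. It assumes, following the verbal description above, that the NFA is the number of level lines times the product of the contrast and regularity probabilities raised to the number of independent samples; the computation is done in base-10 logarithms to avoid underflow. The function names and all numerical values are hypothetical.

```python
import math


def log10_nfa(n_level_lines, p_contrast, p_regularity, n_samples):
    """log10 of N_ll * (p_contrast * p_regularity) ** n_samples."""
    return math.log10(n_level_lines) + n_samples * (
        math.log10(p_contrast) + math.log10(p_regularity)
    )


def is_meaningful(log_nfa, epsilon=1.0):
    """A curve is epsilon-meaningful when its NFA is smaller than epsilon."""
    return log_nfa < math.log10(epsilon)


# Hypothetical curve: 10 independent samples in an image with 100,000 level lines.
contrast_only = log10_nfa(1e5, 0.1, 1.0, 10)
contrast_and_regularity = log10_nfa(1e5, 0.1, 0.5, 10)
print(contrast_only, contrast_and_regularity)
print(is_meaningful(contrast_only), is_meaningful(contrast_and_regularity))
```

In this toy example the curve is detected in both cases, but the regularity term lowers the NFA by about three orders of magnitude, which matters when the detection threshold has to account for a whole database rather than a single image.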

5.3 Comparison with active contours

Active contours are one of the most popular techniques of boundary detection. The first works of Kass, Witkin and Terzopoulos [21] have been improved and generalized by many authors. Recent models are more intrinsic, can be expressed implicitly (which eases the possible topological changes of the active contours) and can use image statistics [8, 36]. In this section, we do not focus on any particular active contour model, but try to compare a generic model with meaningful boundaries. Such a comparison has already been made by Desolneux, Moisan and Morel [15] for meaningful boundaries. Even though these boundaries are only contrast-based, they show that they are very close to active contours in general and particularly to the model of Kimmel and Bruckstein [22]. Since in this paper we have also introduced a regularity criterion, comparison is even more adequate.

Let us briefly give a generic active contour model: it is a curve that fits shape contours (hence contrast should be large along the contour) and which is also as smooth as possible. The problem usually assumes a variational formulation. An optimal curve minimizes an energy of the type

$$E(C) = \int \Bigl[\, g\bigl(|Du(C(t))|\bigr) + h\bigl(\kappa(C(t))\bigr) \Bigr]\, dt \qquad (19)$$

where $Du$ is the gradient of a given grey-level image, $g$ is a nonincreasing function, $\kappa(C(t))$ is the curvature of $C$ at point $C(t)$, $h$ is a nondecreasing function and $t$ is the arc-length. The optimal curve is a trade-off between the external energy depending on the image gradient, and the internal energy depending on the curve itself only. Such a model can accurately give the position of the contour. However, it has several drawbacks:

• The model assumes that there is a contour: it cannot be used as a detection algorithm. This also explains why active contours are also introduced in Bayesian models, where the real question is: knowing that one object is present, what is the best candidate?

• The initialization is crucial.

• The optimal balance parameter $\lambda$ (which, for homogeneity reasons, can also be viewed as a scale parameter) is unknown and depends on the image. It has a strong influence on the result.
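To make the generic energy concrete, here is a minimal Python sketch that evaluates a discretized version on a closed polygon. Everything specific in it is an assumption made for illustration: the function names, the choices g(x) = 1/(1 + x²) and h(κ) = κ², the explicit balance weight `lam` in front of the curvature term, and the use of the turning angle divided by the local arc-length element as a discrete curvature.

```python
import math


def g(grad_norm):
    """Assumed nonincreasing edge function (a common choice, not taken from the text)."""
    return 1.0 / (1.0 + grad_norm ** 2)


def h(curvature):
    """Assumed nondecreasing smoothness penalty."""
    return curvature ** 2


def snake_energy(points, grad_norm_at, lam=1.0):
    """Discrete energy sum_i [g(|Du|(p_i)) + lam * h(kappa_i)] * ds_i on a closed polygon.

    points       : list of (x, y) vertices of a closed curve.
    grad_norm_at : callable (x, y) -> gradient magnitude of the image.
    lam          : balance weight between the two terms (hypothetical parameter).
    """
    n = len(points)
    energy = 0.0
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[(i + 1) % n]
        a = math.hypot(x1 - x0, y1 - y0)   # incoming edge length
        b = math.hypot(x2 - x1, y2 - y1)   # outgoing edge length
        ds = 0.5 * (a + b)                 # arc-length element attached to p_i
        # Turning angle between the two edges, wrapped to (-pi, pi].
        turn = math.atan2(y2 - y1, x2 - x1) - math.atan2(y1 - y0, x1 - x0)
        turn = math.atan2(math.sin(turn), math.cos(turn))
        kappa = abs(turn) / ds             # discrete curvature estimate
        energy += (g(grad_norm_at(x1, y1)) + lam * h(kappa)) * ds
    return energy


# Regular 64-gon approximating the unit circle, on an image with zero gradient.
circle = [(math.cos(2 * math.pi * k / 64), math.sin(2 * math.pi * k / 64)) for k in range(64)]
e_flat = snake_energy(circle, lambda x, y: 0.0, lam=1.0)
e_edge = snake_energy(circle, lambda x, y: 10.0, lam=1.0)
```

On the unit circle with zero gradient both terms integrate to about 2π, so `e_flat` is close to 4π. Changing `lam` changes which curve is optimal, which is the dependence on the balance parameter criticized above.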

If we now only consider the homogeneity of the different energy terms, we have to minimize a potential of the form � � �� ! � , $l$ being the length of the curve. Let us now consider the meaningful smooth boundary model. A meaningful curve has a small probability to occur in the a contrario model. Our regularity measure is a nonincreasing function of the curvature (see (14)). Thus, for a meaningful curve, the quantity

$$H_c(\mu)^{l/2}\, H_s(\rho)^{l/(2s)}$$

is small. Let us now take the logarithm of this expression. We obtain an expression of the type

$$l\,\bigl[\tilde g(\mu) + \tilde h(\rho)\bigr]$$

where $\tilde g(\mu)$ is a nonincreasing function of $|Du|$, and $\tilde h(\rho)$ is a nondecreasing function of the curvature. The model is qualitatively similar to a snake model. Nevertheless, there are three major differences:

1. There is a quantitative criterion to decide if the curve has to be detected. Contrary to snake algorithms, meaningful boundaries detection is not a minimization algorithm. It is well known in active contour models that the value of the energy of the minimizer has no interpretation. All that we can say is that a candidate is better than another one. Our model gives a meaning to the energy-like term. Thus, there is no need for a minimization since we can give thresholds under which a candidate has to be detected.

2. Meaningful boundaries are level lines. Thus, no initialization by hand is needed.

3. We do not have to fix the weight functions $g$ and $h$ as well as the scale parameter $\lambda$.
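The energy-like reading of the NFA can be made explicit with a short worked equation. This is a sketch under an assumption: $n_c$ and $n_s$ are hypothetical names for the numbers of independent contrast and regularity observations along the curve, $H_c(\mu)$ and $H_s(\rho)$ denote the a contrario probabilities that contrast and regularity exceed their minimal observed values $\mu$ and $\rho$, and $N_{ll}$ is the number of level lines.

```latex
\begin{aligned}
\mathrm{NFA}_{cs}(C) &= N_{ll}\, H_c(\mu)^{n_c}\, H_s(\rho)^{n_s},\\
\log \mathrm{NFA}_{cs}(C) &= \log N_{ll} + n_c \log H_c(\mu) + n_s \log H_s(\rho),\\
\mathrm{NFA}_{cs}(C) < \varepsilon
  &\iff n_c\,\bigl(-\log H_c(\mu)\bigr) + n_s\,\bigl(-\log H_s(\rho)\bigr) > \log N_{ll} - \log \varepsilon .
\end{aligned}
```

Both $n_c$ and $n_s$ grow linearly with the length of the curve, so the left-hand side behaves like an energy accumulated along the curve, while the right-hand side is a fixed threshold. Detection then amounts to comparing this quantity with the threshold instead of minimizing it.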

5.4 Experiments on smooth meaningful boundaries

In general, adding a regularity criterion does not qualitatively change the result. This is consistent with the observation of Desolneux et al. in [15]. Remark also that the criterion we give does not eliminate irregular level lines. Indeed,

$$\mathrm{NFA}_{cs}(C) \le N_{ll}\, H_c(\mu)^{l/2}$$

(with the same notations as in Def. 4) since $H_s(\rho) \le 1$. We can only detect more lines. Of course,

smooth boundaries NFA decrease a lot (about� � ��

), and this can modify maximal meaningful boundaries. As was already observed in [3], contrast and regularity are often very redundant, and this explains why the same curves are detected.

Fig. 14 (INRIA desk) is very geometrical and shows the redundancy between contrast and regularity. We can also define an NFA for smooth boundaries, regardless of contrast, as

$$\mathrm{NFA}_{s}(C) = N_{ll}\, H_s(\rho)^{l/(2s)}. \qquad (20)$$

We retrieve most edges in the desk image with this definition.

Figure 14: Regularity detectability. The original is the leftmost. In the middle, we display the � �� detected contrasted smooth boundaries as defined in Def. 4. On the right, the �� smooth boundaries, with no contrast information, defined in (20). All the main boundaries are already present. Of course, contrast may be the main cause of small NFA, since regularity acts at larger scales. For instance, the window panes have NFA about

� � �� with contrast and

� �� with regularity only (which still

make them detectable in any image database). The desk on the bottom right has a NFA equal to�� with contrast and

�� with regularity, which is already very small.

6 Meaningful edges

6.1 Edges as pieces of level lines

A very long curve may become meaningful even though it is not very contrasted and smooth (see Sect. 2.4.2). This phenomenon did not appear in the work of Desolneux et al. [13], because of quantization effects. Indeed, the gradient computation for a quantized image implies that there are many critical points (zero gradient). If a level line is very long and does not fit an edge, then the probability that it does not cross a critical point is close to zero. Thus (too) many long curves were eliminated.

Because of occlusions, meaningful boundaries may not coincide with a whole object boundary. But each of their parts is part of an object boundary. Thus, it is natural to consider that the basic objects for image analysis are pieces of level lines. This was also proposed in [4, 13]. The drawback is that pieces of level lines are not embedded in a tree structure and we can no longer apply the notion of (monotone) maximality (Sect. 2.3). The following definition of edges incorporates smoothness information to the edges of [13].

Consider a set of digital curves $C_1, \ldots, C_k$, with length (in independent points) equal to $l_1, \ldots, l_k$. Then the number of all possible sampled connected subcurves is bounded by $N_p = \sum_i l_i^2$.

Definition 5 Let $E$ be a (connected) subcurve of one of the $C_i$. Let $\mu$ and $\rho$ be the minimal quantized contrast and regularity in $E$ (see (16) and (17)). Let $l$ be the length of $E$. We define the number of false alarms of $E$ by

$$\mathrm{NFA}(E) = N_p\, H_c(\mu)^{l/2}\, H_s(\rho)^{l/(2s)}. \qquad (21)$$

We say that $E$ is an $\varepsilon$-meaningful edge if $\mathrm{NFA}(E) < \varepsilon$. We say that $E$ is a maximal $\varepsilon$-meaningful edge if and only if

1. for all $E'$, $E' \supsetneq E \Rightarrow \mathrm{NFA}(E') > \mathrm{NFA}(E)$.

2. for all $E'$, $E' \subset E \Rightarrow \mathrm{NFA}(E') \ge \mathrm{NFA}(E)$.
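The definition can be explored with a brute-force Python sketch on a single open curve. It is illustrative only: the function names are hypothetical, the per-sample a contrario tail probabilities are assumed to be precomputed, each sample is treated as one independent observation, the number of pieces of a curve with m samples is bounded by m², and the maximality convention used (no strictly larger piece with a smaller or equal NFA, no sub-piece with a strictly smaller NFA) is one reading of the definition. The cost grows quickly with the number of samples, so this is meant for small examples.

```python
import math


def log_nfa_piece(log_tail_c, log_tail_s, i, j, log_n_pieces):
    """log NFA of the piece made of samples i..j-1.

    Uses the least significant sample of the piece for each criterion
    (largest tail probability, i.e. smallest observed value), raised to
    the number of samples of the piece.
    """
    n = j - i
    worst_c = max(log_tail_c[i:j])
    worst_s = max(log_tail_s[i:j])
    return log_n_pieces + n * (worst_c + worst_s)


def maximal_meaningful_pieces(tail_c, tail_s, epsilon=1.0):
    """Return [(i, j, log_nfa)] for the maximal epsilon-meaningful pieces [i, j)."""
    m = len(tail_c)
    log_c = [math.log(p) for p in tail_c]
    log_s = [math.log(p) for p in tail_s]
    log_n = math.log(m * m)  # upper bound on the number of pieces of the curve
    nfa = {(i, j): log_nfa_piece(log_c, log_s, i, j, log_n)
           for i in range(m) for j in range(i + 1, m + 1)}
    result = []
    for (i, j), value in nfa.items():
        if value >= math.log(epsilon):
            continue  # not meaningful
        # No sub-piece may be strictly more meaningful.
        sub_ok = all(nfa[(a, b)] >= value
                     for a in range(i, j) for b in range(a + 1, j + 1))
        # Every strictly larger piece must be strictly less meaningful.
        sup_ok = all(nfa[(a, b)] > value
                     for a in range(0, i + 1) for b in range(j, m + 1)
                     if (a, b) != (i, j))
        if sub_ok and sup_ok:
            result.append((i, j, value))
    return result


# Hypothetical tail probabilities: a contrasted, regular run in the middle.
tail_c = [0.9, 0.9, 0.05, 0.05, 0.05, 0.05, 0.05, 0.9, 0.9]
tail_s = [0.8, 0.8, 0.30, 0.30, 0.30, 0.30, 0.30, 0.8, 0.8]
pieces = maximal_meaningful_pieces(tail_c, tail_s)
```

On this toy input the only maximal meaningful piece is the middle run of five samples: shorter sub-runs are meaningful but less so, and any extension picks up a weak sample that raises the NFA above the threshold.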

This detection destroys the tree structure of the topographic map and we lose the notion of monotone maximality introduced in Sect. 2.3. We can apply the edge detection in two different ways:

• apply it to all the image level lines

• apply it only to meaningful boundaries.

In the first case, we automatically get all the meaningful pieces of all level lines. We may get thick sets of edges concentrated along true edges. The tree structure can no longer be used. (However, it is possible to use another notion of maximality that makes those wide strokes thinner [1, 10].) In the second case, we already know that the a contrario model is false on the data curves (else they would not be meaningful boundaries). Thus, it is not possible to apply the same detection thresholds on these boundaries. However, it is possible to perform a variational search: we can select the most meaningful pieces in each meaningful boundary. In practice, the meaningful edge algorithm needs no parameter at all, and leads to results equivalent to those of the cleaning-up method of Sect. 2.4.3. However, edge maximality may force us to prefer edges that are too short if contrast is very high on a small portion of a line, which is not the case for the cleaning procedure, which removes low-contrasted parts.

6.2 Experiments on edges

We first display an image with smooth edges alone, that is to say we ignore contrast and only use the regularity criterion: on the Valbonne church image (Fig. 15), only the main structures are recovered. They correspond to very strong edges of man-made objects. On the contrary, textures are ignored. See for instance the edge of the tree on the right-hand side of the image. Finally we display an experiment showing edges within meaningful boundaries (Fig. 16). We first compute the maximal meaningful boundaries, then only keep their most meaningful parts. This may remove pieces of a boundary that are detectable when compared to noise, but which are much less meaningful than other parts of the boundary.

7 Conclusion

The soundness of Marr and Hildreth’s edge detection doctrine [29] has been widely proved in the literature. The results in the previous section show that, moreover, the output of edge detection can directly be sets of curves (and not points to be connected as in the classical Canny detector [2]). Even more important, this can be done with no parameters. We also justify, a posteriori, all the algorithms (such as active contours and segmentation) that combine both contrast and regularity. However, we can add that for large scale objects (“large” means in practice more than 20 pixels), regularity and

Figure 15: Valbonne church. The original image is the same as in Fig. 9. Meaningful edges with only regularity criterion ( 9 � � � ). Only the very regular edges are detected. The other ones are mainly due to texture and appear with contrast. This experiment shows that regularity alone gives results comparable with classical edge detectors. It also suffers from the same drawbacks. At junctions (and corners) regularity edges are broken. They do not provide good shape elements by themselves.

Figure 16: Edges. Left: original image (Composition 8 by Kandinsky). Middle: local maximal meaningful contrasted and smooth boundaries. Right: edges within the boundaries. Some boundaries that do not correspond to shapes are detected in the middle. This is due to the fact that the background of the painting is not uniform at all and that shapes are surrounded by a halo, whose level lines are detected.

contrast usually express the same information, that is to say, they allow us to detect the same things. This redundancy is compatible with the usual notion of shapes, and we prove that we can detect both properties with no a priori model but a qualitative a contrario model. Regularity information makes detection even surer but is usually not necessary. Besides, incorporating several detection criteria is computationally heavier, and a compromise between accuracy and speed has to be found. In our shape recognition experiments, we usually prefer to ignore regularity.

Another conclusion of our experiments is that no detector allows us to detect all the shapes in an image and only them. This “failure” is completely coherent with Gestalt Theory: shape (and contour) is not a local notion. First, many contours are subjective and configurations of the type of Kanizsa’s triangle often appear to a lesser degree. For such contours, all local methods are doomed to fail. Moreover, some curves may appear as contours and good shapes out of their context. If we crop a small part of an image, a boundary may appear contrasted, smooth and fairly complex, as a good shape. However, these small shapes may be masked by much larger ones which impose the natural scale of the image. Thus, the notion of edge is only relative to a context or to a precise application.

We think that the different models we exposed cover most applications. For instance, in image registration, we may want to use any information for a better accuracy, e.g. small variations in texture. Digital elevation model generation uses texture for registration. In this case, we have the very strong a priori knowledge that registration makes sense. If we now want to compare two images in a large database, it is certainly better to keep the most meaningful curves that correspond to shapes as well as possible, and to eliminate all other lines. For practical shape matching by shape elements comparison [26, 35, 34], the MB model with local contrast and cleaning-up automatically eliminates most edges due to texture or small illumination gradient. It gives the most complete and compact shape elements of natural images.

Acknowledgements. We thank Jean-Michel Morel and Agnès Desolneux for all discussions and advice.

References

[1] A. Almansa. Échantillonnage, interpolation et détection. Applications en imagerie satellitaire. PhD thesis, ENS Cachan, 2002.

[2] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[3] F. Cao. Good continuation in digital images. In Proceedings of ICCV 03, Nice, volume 1, pages 440–447, 2003.

[4] V. Caselles, B. Coll, and J.M. Morel. A Kanizsa program. In Progress in Nonlinear Differential Equations and their Applications, volume 25, pages 35–55, 1996.

[5] V. Caselles, T. Coll, and J.M. Morel. Junction detection and filtering. In Cucker Felipe, editor, Foundations of computational mathematics, pages 23–42. Springer, 1997.

[6] V. Caselles, T. Coll, and J.M. Morel. Topographic maps and local contrast changes in natural images. International Journal of Computer Vision, 33(1):5–27, 1999.

[7] V. Caselles, T. Coll, and J.M. Morel. Geometry and color in natural images. JMIV, 16(2):89–105, 2002.

[8] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. International Journal of Computer Vision, 22(1):61–79, 1997.

[9] T. Chan and L. Vese. Active contours without edges. IEEE Transactions on Image Processing, 10(2):266–277, 2001.

[10] E. d’Angelo. Recherches de formes géométriques significatives. Application au traitement d’otolithes marins. Technical report, IRISA, 2003.

[11] A. Desolneux, S. Ladjal, L. Moisan, and J.M. Morel. Dequantizing image orientation. IEEE Transactions on Image Processing, 11(10):1129–1140, 2002.

[12] A. Desolneux, L. Moisan, and J.M. Morel. Meaningful alignments. International Journal of Computer Vision, 40(1):7–23, 2000.

[13] A. Desolneux, L. Moisan, and J.M. Morel. Edge detection by Helmholtz principle. Journal of Mathematical Imaging and Vision, 14(3):271–284, 2001.

[14] A. Desolneux, L. Moisan, and J.M. Morel. A grouping principle and four applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(4):508–513, 2003.

[15] A. Desolneux, L. Moisan, and J.M. Morel. Variational snake theory. In S. Osher and N. Paragios, editors, Geometric Level Set Methods in Imaging, Vision, and Graphics. Springer Verlag, 2003.

[16] L.C. Evans and R. Gariepy. Measure Theory and Fine Properties of Functions. CRC Press, Ann Arbor, 1992.

[17] P. Felzenszwalb and D. Huttenlocher. Image segmentation using local variation. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pages 98–104, 1998.

[18] Y. Gousseau and J.M. Morel. Are natural images of bounded variation? SIAM J. of Math. Anal., 33(3):634–648, 2001.

[19] R. Haralick. Digital step edges from zero crossing of second directional derivatives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:58–68, 1984.

[20] G. Kanizsa. La Grammaire du Voir. Diderot, 1996. Original title: Grammatica del vedere. French translation from Italian.

[21] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, 1:321–331, 1987.

[22] R. Kimmel and A.M. Bruckstein. Regularized Laplacian zero crossings as optimal edge integrators. In Image and Vision Computing, IVCNZ01, New Zealand, 2001.

[23] J.J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.

[24] A.B. Lee, D. Mumford, and J. Huang. Occlusion models for natural images: A statistical study of a scale-invariant dead leaves model. International Journal of Computer Vision, 41(1-2):35–59, 2001.

[25] T. Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):79–116, 1994.

[26] J.L. Lisani, L. Moisan, P. Monasse, and J.M. Morel. On the theory of planar shape. SIAM Multiscale Modeling and Simulation, 1(1):1–24, 2003.

[27] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 2nd edition, 1999.

[28] D. Marr. Vision. W.H. Freeman and Co., New York, 1982.

[29] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of the Royal Society of London, 207:187–207, 1980.

[30] G. Matheron. Random Sets and Integral Geometry. John Wiley, N.Y., 1975.

[31] F. Meyer and P. Maragos. Nonlinear scale-space representation with morphological levelings. J. of Visual Comm. and Image Representation, 11:245–265, 2000.

[32] P. Monasse. Morphological Representation of Digital Images and Application to Registration. PhD thesis, Université Paris IX Dauphine, 2000.

[33] D. Mumford and J. Shah. Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, XLII(4), 1989.

[34] P. Musé, F. Sur, F. Cao, and Y. Gousseau. Unsupervised thresholds for shape matching. In IEEE Int. Conf. on Image Processing, ICIP, 2003.

[35] P. Musé, F. Sur, and J.M. Morel. Sur les seuils de reconnaissance de formes. Traitement du signal, 19(5/6), 2003.

[36] N. Paragios and R. Deriche. Geodesic active regions and level set methods for supervised texture segmentation. International Journal of Computer Vision, 46(3):223–247, 2002.

[37] P. Salembier and L. Garrido. Binary partition tree as an efficient representation for image processing, segmentation, and information retrieval. IEEE Transactions on Image Processing, 9(4):561–576, 2000.

[38] P. Salembier and J. Serra. Flat zones filtering, connected operators, and filters by reconstruction. IEEE Transactions on Image Processing, 4(8):1153–1160, 1995.

[39] J. Serra. Image Analysis and Mathematical Morphology. Academic Press, 1982.

[40] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

[41] L. Vincent. Grayscale area openings and closings, their efficient implementation and applications. In J. Serra and P. Salembier, editors, Proceedings of the 1st Workshop on Mathematical Morphology and its Applications to Signal Processing, pages 22–27, Barcelona, Spain, 1993.

[42] M. Wertheimer. Untersuchungen zur Lehre der Gestalt, II. Psychologische Forschung, 4:301–350, 1923.

[43] A.P. Witkin. Scale space filtering. In Proc. of IJCAI, Karlsruhe, pages 1019–1021, 1983.

[44] S.C. Zhu. Embedding Gestalt laws in Markov random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(11):1170–1187, 1999.

A Appendix: Numerical estimation of the Hausdorff dimension of a curve

In order to compute the Hausdorff dimension of identically distributed random curves from the histogram of regularity, we proceed as follows. Let $C$ be a curve.

Definition 6 The Hausdorff measure of dimension $d$ is defined by

$$\mathcal{H}^d(C) = \lim_{\delta \to 0}\ \inf_{\delta\text{-covering}}\ \sum_i |A_i|^d$$

where the $A_i$ form a covering of $C$ and $|A_i|$ is the diameter of $A_i$. The family $(A_i)$ is a $\delta$-covering of $C$ if $C \subset \bigcup_i A_i$ and for all $i$, $|A_i| < \delta$.

The difficulty in estimating this quantity is that it makes no sense to let $\delta \to 0$ for digital curves. Indeed,

even for white noise, the precision is bounded from below by Nyquist distance. We assume that thecurve is self-similar. This allows to examine it at larger and larger scales, instead of letting � go to 0.Let us cut a curve with length � �4�#9 in � chunks of length �49 . We measure the regularity �%� � �at the middle point � of each piece. The balls with radius � ��9 nearly form a covering of ! . It isnot a covering because the endpoint of the curve chunk may not be the most remote point from thecenter (see (13)). Nevertheless, we approximate the measure of ! by

� � �&! � �� 29 � � �� (9��

where � � is the mean regularity along ! . Let us now consider the curve� ! with

� � �. We can

make the same procedure as above with chunks whose length is equal to � � 9 . Thus we evaluate themeasure of

� ! by � � � � ! � � � � �� 9��

RR n˚5067


But, if we now use pieces of curves of length �49 , we also obtain

� � � � ! � � � � � � � (9 � �� Thus � � � � � � � � � � � yielding

� �� # �� ) � �� (22)

We can evaluate � by examining the histograms of � � as a function of 9 .For random walks with independent increments, we find � � � � � � , whereas the true dimension is 2.For level lines in white noise, we find � � � ��

. As expected, the level lines of a white noise image are more regular than random walks.
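The scale-comparison idea can be illustrated by a small Python sketch. It does not reproduce formula (22): it relies on the simpler assumption that the covering measure n(s) r(s)^d takes the same value at two chunk sizes, where n(s) is the number of chunks of 2s steps (proportional to 1/s) and r(s) is the mean radius of the ball centred at the middle of a chunk and reaching its farther endpoint. All names are hypothetical, and the curve is assumed to be sampled at unit arc-length steps.

```python
import math


def mean_chunk_radius(points, s):
    """Mean distance from the middle sample of each chunk of 2*s steps to its farther endpoint."""
    radii = []
    for start in range(0, len(points) - 2 * s, 2 * s):
        a, mid, b = points[start], points[start + s], points[start + 2 * s]
        radii.append(max(math.dist(mid, a), math.dist(mid, b)))
    return sum(radii) / len(radii)


def dimension_estimate(points, s1, s2):
    """Solve n(s1) * r(s1)**d == n(s2) * r(s2)**d for d, with n(s) proportional to 1/s."""
    r1 = mean_chunk_radius(points, s1)
    r2 = mean_chunk_radius(points, s2)
    return math.log(s2 / s1) / math.log(r2 / r1)


# Sanity check on a straight segment, whose dimension is 1.
segment = [(float(k), 0.0) for k in range(257)]
d_line = dimension_estimate(segment, 4, 16)
```

On a straight segment the radius equals s, so the estimate is exactly 1. For less regular curves the radius grows more slowly than s and the estimate increases accordingly.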

Unité de recherche INRIA Lorraine, Technopôle de Nancy-Brabois, Campus scientifique, 615 rue du Jardin Botanique, BP 101, 54600 VILLERS LÈS NANCY

Unité de recherche INRIA Rennes, Irisa, Campus universitaire de Beaulieu, 35042 RENNES Cedex

Unité de recherche INRIA Rhône-Alpes, 655, avenue de l’Europe, 38330 MONTBONNOT ST MARTIN

Unité de recherche INRIA Rocquencourt, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex

Unité de recherche INRIA Sophia-Antipolis, 2004 route des Lucioles, BP 93, 06902 SOPHIA-ANTIPOLIS Cedex

Publisher: INRIA, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex (France)

http://www.inria.fr

ISSN 0249-6399
