
International Journal of Computer Vision 37(2), 151–172, 2000. © 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Evaluation of Interest Point Detectors

CORDELIA SCHMID, ROGER MOHR AND CHRISTIAN BAUCKHAGE
INRIA Rhône-Alpes, 655 av. de l’Europe, 38330 Montbonnot, France

[email protected]

Abstract. Many different low-level feature detectors exist and it is widely agreed that the evaluation of detectors is important. In this paper we introduce two evaluation criteria for interest points: repeatability rate and information content. Repeatability rate evaluates the geometric stability under different transformations. Information content measures the distinctiveness of features. Different interest point detectors are compared using these two criteria. We determine which detector gives the best results and show that it satisfies the criteria well.

Keywords: interest points, quantitative evaluation, comparison of detectors, repeatability, information content

1. Introduction

Many computer vision tasks rely on low-level features. A wide variety of feature detectors exist, and results can vary enormously depending on the detector used. It is widely agreed that evaluation of feature detectors is important (Phillips and Bowyer, 1999). Existing evaluation methods use ground-truth verification (Bowyer et al., 1999), visual inspection (Heath et al., 1997; López et al., 1999), localization accuracy (Baker and Nayar, 1999; Brand and Mohr, 1994; Coelho et al., 1991; Heyden and Rohr, 1996), theoretical analysis (Demigny and Kamlé, 1997; Deriche and Giraudon, 1993; Rohr, 1994) or specific tasks (Shin et al., 1998, 1999).

In this paper we introduce two novel criteria for evaluating interest points: repeatability and information content. These two criteria directly measure the quality of the feature for tasks like image matching, object recognition and 3D reconstruction. They apply to any type of scene, and they do not rely on any specific feature model or high-level interpretation of the feature. Our criteria are more general than most existing evaluation methods (cf. Section 1.1). They are complementary to localization accuracy, which is relevant for tasks like camera calibration and 3D reconstruction of specific scene points. This criterion has previously been evaluated for interest point detectors (Baker and Nayar, 1999; Brand and Mohr, 1994; Coelho et al., 1991; Heyden and Rohr, 1996).

Repeatability explicitly compares the geometrical stability of the detected interest points between different images of a given scene taken under varying viewing conditions. Previous methods have evaluated detectors for individual images only. An interest point is “repeated” if the 3D scene point detected in the first image is also accurately detected in the second one. The repeatability rate is the percentage of the total observed points that are detected in both images. Note that repeatability and localization are conflicting criteria: smoothing improves repeatability but degrades localization (Canny, 1986).

Information content is a measure of the distinctiveness of an interest point. Distinctiveness is based on the likelihood of a local greyvalue descriptor computed at the point within the population of all observed interest point descriptors. Descriptors characterize the local shape of the image at the interest points. The entropy of these descriptors measures the information content of a set of interest points.

In this paper several detectors are compared using these two evaluation criteria. The best detector satisfies both of these criteria well, which explains its success for tasks such as image matching based on interest points and correlation (Zhang et al., 1995). In this context at least a subset of the points have to be repeated in order to allow feature correspondence. Furthermore, if image-based measures (e.g. correlation) are used to compare points, interest points should have distinctive patterns.

1.1. Related Work on the Evaluation of Feature Detectors

The evaluation of feature detectors has concentrated on edges. Only a few authors have evaluated interest point detectors. In the following we give a few examples of existing edge evaluation methods and a survey of previous work on evaluating interest points. Existing methods can be categorized into methods based on: ground-truth verification, visual inspection, localization accuracy, theoretical analysis and specific tasks.

1.1.1. Ground-Truth Verification. Methods based on ground-truth verification determine the undetected features and the false positives. Ground-truth is in general created by a human. It relies on the human's symbolic interpretation of the image and is therefore subjective. Furthermore, human interpretation limits the complexity of the images used for evaluation.

For example, Bowyer et al. (1999) use human-marked ground-truth to evaluate edge detectors. Their evaluation criterion is the number of false positives with respect to the number of unmatched edges, measured for varying input parameters. They used structured outdoor scenes, such as airports and buildings.

1.1.2. Visual Inspection. Methods using visual inspection are even more subjective as they are directly dependent on the human evaluating the results. López et al. (1999) define a set of visual criteria to evaluate the quality of detection. They visually compare a set of ridge and valley detectors in the context of medical images. Heath et al. (1997) evaluate detectors using a visual rating score which indicates the perceived quality of the edges for identifying an object. This score is measured by a group of people. Different edge detectors are evaluated on real images of complex scenes.

1.1.3. Localization Accuracy. Localization accuracy is the criterion most often used to evaluate interest points. It measures whether an interest point is accurately located at a specific 2D location. This criterion is significant for tasks like camera calibration and the 3D reconstruction of specific scene points. Evaluation requires the knowledge of precise 3D properties, which restricts the evaluation to simple scenes.

Localization accuracy is often measured by verifying that a set of 2D image points is coherent with the known set of corresponding 3D scene points. For example, Coelho et al. (1991) compare the localization accuracy of interest point detectors using different planar projective invariants for which reference values are computed using scene measurements. The scene contains simple black polygons and is imaged from different viewing angles. A similar evaluation scheme is used by Heyden and Rohr (1996). They extract sets of points from images of polyhedral objects and use projective invariants to compute a manifold of constraints on the points.

Brand and Mohr (1994) measure the localization accuracy of a model-based L-corner detector. They use four different criteria: alignment of the extracted points, accuracy of the 3D reconstruction, accuracy of the epipolar geometry and stability of the cross-ratio. Their scene is again very simple: it contains black squares on a white background.

To evaluate the accuracy of edge point detectors, Baker and Nayar (1999) propose four global criteria: collinearity, intersection at a single point, parallelism and localization on an ellipse. Each of the criteria corresponds to a particular (very simple) scene and is measured using the extracted edgels. Their experiments are conducted under widely varying image conditions (illumination change and 3D rotation).

As mentioned by Heyden and Rohr, methods based on projective invariants don’t require the knowledge of the exact position of the features in the image. This is an advantage of such methods, as the location of features in an image depends both on the intrinsic parameters of the camera and the relative position and orientation of the object with respect to the camera, and is therefore difficult to determine. However, the disadvantage of such methods is the lack of comparison to true data, which may introduce a systematic bias.

1.1.4. Theoretical Analysis. Methods based on a theoretical analysis examine the behavior of the detectors for theoretical feature models. Such methods are limited, as they only apply to very specific features. For example, Deriche and Giraudon (1993) study analytically the behavior of three interest point detectors using an L-corner model. Their study allows them to correct the localization bias. Rohr (1994) performs a similar analysis for L-corners with aperture angles in the range of 0 to 180 degrees. His analysis evaluates 10 different detectors.

Demigny and Kamlé (1997) use three criteria to theoretically evaluate four step edge detectors. Their criteria are based on Canny’s criteria, which are adapted to the discrete domain using signal processing theory. Canny’s criteria are good detection, good localization and low multiplicity of responses.

1.1.5. A Specific Task. A few methods have evaluated detectors through specific tasks. They consider that feature detection is not a final result by itself, but merely an input for further processing. Therefore, the true performance criterion is how well it prepares the input for the next algorithm. This is no doubt true to some extent. However, evaluations based on a specific task and system are hard to generalize and hence rather limited.

Shin et al. (1999) compare edge detectors using an object recognition algorithm. Test images are cars imaged under different lighting conditions and in front of varying backgrounds. In Shin et al. (1998) the performance of an edge-based structure from motion algorithm is used for evaluation. Results are given for two simple 3D scenes which are imaged under varying 3D rotations.

1.2. Overview of the Paper

Section 2 presents a state of the art on interest point detectors as well as implementation details for the detectors used in our comparison. Section 3 defines the repeatability criterion, explains how to determine it experimentally and presents the results of a comparison under different transformations. Section 4 describes the information content criterion and evaluates results for different detectors. In Section 5 we select the detector which gives the best results according to the two criteria, show that the quality of its results is very high and discuss possible extensions.

2. Interest Point Detectors

By “interest point” we simply mean any point in the image for which the signal changes two-dimensionally. Conventional “corners” such as L-corners, T-junctions and Y-junctions satisfy this, but so do black dots on white backgrounds, the endings of branches and any location with significant 2D texture. We will use the general term “interest point” unless a more specific type of point is referred to. Figure 1 shows an example of general interest points detected on Van Gogh’s sower painting.

Figure 1. Interest points detected on Van Gogh’s sower painting. The detector is an improved version of the Harris detector. There are 317 points detected.

2.1. State of the Art

A wide variety of interest point and corner detectors exist in the literature. They can be divided into three categories: contour based, intensity based and parametric model based methods. Contour based methods first extract contours and then search for maximal curvature or inflexion points along the contour chains, or do some polygonal approximation and then search for intersection points. Intensity based methods compute a measure that indicates the presence of an interest point directly from the greyvalues. Parametric model methods fit a parametric intensity model to the signal. They often provide sub-pixel accuracy, but are limited to specific types of interest points, for example to L-corners. In the following we briefly present detection methods for each of the three categories.

2.1.1. Contour Based Methods. Contour based methods have existed for a long time; some of the more recent ones are presented here. Asada and Brady (1986) extract interest points for 2D objects from planar curves. They observe that these curves have special characteristics: the changes in curvature. These changes are classified in several categories: junctions, endings etc. To achieve robust detection, their algorithm is integrated in a multi-scale framework. A similar approach has been developed by Mokhtarian and Mackworth (1986). They use inflexion points of a planar curve.

Medioni and Yasumoto (1987) use B-splines to approximate the contours. Interest points are maxima of curvature which are computed from the coefficients of these B-splines.

Horaud et al. (1990) extract line segments from the image contours. These segments are grouped and intersections of grouped line segments are used as interest points.

Shilat et al. (1997) first detect ridges and troughs in the images. Interest points are high curvature points along ridges or troughs, or intersection points. They argue that such points are more appropriate for tracking, as they are less likely to lie on the occluding contours of an object.

Mokhtarian and Suomela (1998) describe an interest point detector based on two sets of interest points. One set are T-junctions extracted from edge intersections. A second set is obtained using a multi-scale framework: interest points are curvature maxima of contours at a coarse level and are tracked locally up to the finest level. The two sets are compared and close interest points are merged.

The algorithm of Pikaz and Dinstein (1994) is based on a decomposition of noisy digital curves into a minimal number of convex and concave sections. The location of each separation point is optimized, yielding the minimal possible distance between the smoothed approximation and the original curve. The detection of the interest points is based on properties of pairs of sections that are determined in an adaptive manner, rather than on properties of single points that are based on a fixed-size neighborhood.

2.1.2. Intensity Based Methods. Moravec (1977) developed one of the first signal based interest point detectors. His detector is based on the auto-correlation function of the signal. It measures the greyvalue differences between a window and windows shifted in several directions. Four discrete shifts in directions parallel to the rows and columns of the image are used. If the minimum of these four differences is above a threshold, an interest point is detected.
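As a concrete illustration, here is a minimal sketch of Moravec's measure in Python/NumPy. It assumes a greyscale image stored as a 2D array; the window size and threshold are illustrative values, not taken from the paper or from Moravec's implementation.

```python
import numpy as np

def moravec_measure(img, x, y, win=2):
    """Minimum over four shifts of the summed squared greyvalue
    difference between the window at (x, y) and the shifted window.
    Shifts are parallel to the image rows and columns, as in the text."""
    w = img[y - win:y + win + 1, x - win:x + win + 1].astype(float)
    best = np.inf
    for dy, dx in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
        ws = img[y + dy - win:y + dy + win + 1,
                 x + dx - win:x + dx + win + 1].astype(float)
        best = min(best, np.sum((w - ws) ** 2))
    return best

def moravec_points(img, win=2, thresh=500.0):
    """Interest points where the minimum difference exceeds a threshold
    (the threshold value here is illustrative)."""
    pts = []
    for y in range(win + 1, img.shape[0] - win - 1):
        for x in range(win + 1, img.shape[1] - win - 1):
            if moravec_measure(img, x, y, win) > thresh:
                pts.append((x, y))
    return pts
```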

The detector of Beaudet (1978) uses the second derivatives of the signal for computing the measure

\mathrm{DET} = I_{xx} I_{yy} - I_{xy}^2

where I(x, y) is the intensity surface of the image. DET is the determinant of the Hessian matrix and is related to the Gaussian curvature of the signal. This measure is invariant to rotation. Points where this measure is maximal are interest points.

Kitchen and Rosenfeld (1982) present an interest point detector which uses the curvature of planar curves. They look for curvature maxima on isophotes of the signal. However, due to image noise an isophote can have an important curvature without corresponding to an interest point, for example on a region with almost uniform greyvalues. Therefore, the curvature is multiplied by the gradient magnitude of the image, where non-maximum suppression is applied to the gradient magnitude before multiplication. Their measure is

K = \frac{I_{xx} I_y^2 + I_{yy} I_x^2 - 2 I_{xy} I_x I_y}{I_x^2 + I_y^2}

Dreschler and Nagel (1982) first determine locations of local extrema of the determinant of the Hessian, DET. A location of maximum positive DET can be matched with a location of extreme negative DET if the directions of the principal curvatures which have opposite sign are approximately aligned. The interest point is located between these two points, at the zero crossing of DET. Nagel (1983) shows that the Dreschler-Nagel approach and the Kitchen-Rosenfeld approach are identical.
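The two measures above, DET and K, can be sketched as follows, computing the derivatives by convolution with Gaussian derivative filters. The smoothing scale sigma is an assumed parameter; the original detectors differ in such implementation details.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def second_order_measures(img, sigma=1.0):
    """DET (Beaudet) and K (Kitchen-Rosenfeld) at every pixel, from
    Gaussian derivatives at an assumed scale sigma."""
    I = img.astype(float)
    # axis 0 is y (rows), axis 1 is x (columns)
    Ix  = gaussian_filter(I, sigma, order=(0, 1))
    Iy  = gaussian_filter(I, sigma, order=(1, 0))
    Ixx = gaussian_filter(I, sigma, order=(0, 2))
    Iyy = gaussian_filter(I, sigma, order=(2, 0))
    Ixy = gaussian_filter(I, sigma, order=(1, 1))
    det = Ixx * Iyy - Ixy ** 2            # determinant of the Hessian
    eps = 1e-12                           # avoid division by zero in flat regions
    k = (Ixx * Iy**2 + Iyy * Ix**2 - 2 * Ixy * Ix * Iy) / (Ix**2 + Iy**2 + eps)
    return det, k
```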

Several interest point detectors (Förstner, 1994; Förstner and Gülch, 1987; Harris and Stephens, 1988; Tomasi and Kanade, 1991) are based on a matrix related to the auto-correlation function. This matrix A averages derivatives of the signal in a window W around a point (x, y):

A(x, y) = \begin{bmatrix} \sum_W (I_x(x_k, y_k))^2 & \sum_W I_x(x_k, y_k)\, I_y(x_k, y_k) \\ \sum_W I_x(x_k, y_k)\, I_y(x_k, y_k) & \sum_W (I_y(x_k, y_k))^2 \end{bmatrix} \qquad (1)

where I(x, y) is the image function and (x_k, y_k) are the points in the window W around (x, y).

This matrix captures the structure of the neighborhood. If this matrix is of rank two, that is both of its eigenvalues are large, an interest point is detected. A matrix of rank one indicates an edge and a matrix of rank zero a homogeneous region. The relation between this matrix and the auto-correlation function is given in Appendix A.
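A sketch of this eigenvalue analysis, with an assumed Gaussian window standing in for the plain sum over W and Sobel masks as simple derivative estimates:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def eigenvalues_of_A(img, sigma=2.0):
    """Per-pixel eigenvalues of the matrix A of Eq. (1). Two large
    eigenvalues indicate an interest point, one a contour (edge),
    none a homogeneous region."""
    I = img.astype(float)
    Ix = sobel(I, axis=1)  # derivative along x (columns)
    Iy = sobel(I, axis=0)  # derivative along y (rows)
    Axx = gaussian_filter(Ix * Ix, sigma)
    Ayy = gaussian_filter(Iy * Iy, sigma)
    Axy = gaussian_filter(Ix * Iy, sigma)
    # closed-form eigenvalues of the symmetric 2x2 matrix [[Axx, Axy], [Axy, Ayy]]
    mean = 0.5 * (Axx + Ayy)
    root = np.sqrt((0.5 * (Axx - Ayy)) ** 2 + Axy ** 2)
    return mean + root, mean - root  # lambda1 >= lambda2, per pixel
```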

Harris and Stephens (1988) improve the approach of Moravec by using the auto-correlation matrix A. The use of discrete directions and discrete shifts is thus avoided. Instead of using a simple sum, a Gaussian is used to weight the derivatives inside the window. Interest points are detected if the auto-correlation matrix A has two significant eigenvalues.

Förstner and Gülch (1987) propose a two-step procedure for localizing interest points. First points are detected by searching for optimal windows using the auto-correlation matrix A. This detection yields systematic localization errors, for example in the case of L-corners. A second step based on a differential edge intersection approach improves the localization accuracy.

Förstner (1994) uses the auto-correlation matrix A to classify image pixels into categories: region, contour and interest point. Interest points are further classified into junctions or circular features by analyzing the local gradient field. This analysis is also used to determine the interest point location. Local statistics allow a blind estimate of signal-dependent noise variance for automatic selection of thresholds and image restoration.

Tomasi and Kanade (1991) motivate their approach in the context of tracking. A good feature is defined as one that can be tracked well. They show that such a feature is present if the eigenvalues of matrix A are significant.

Heitger et al. (1992) develop an approach inspired by experiments on the biological visual system. They extract 1D directional characteristics by convolving the image with orientation-selective Gabor-like filters. In order to obtain 2D characteristics, they compute the first and second derivatives of the 1D characteristics.

Cooper et al. (1993) first measure the contour direction locally and then compute image differences along the contour direction. Knowledge of the noise characteristics is used to determine whether the image differences along the contour direction are sufficient to indicate an interest point. Early jump-out tests allow a fast computation of the image differences.

The detector of Reisfeld et al. (1995) uses the concept of symmetry. They compute a symmetry map which shows a “symmetry strength” for each pixel. This symmetry is computed locally by looking at the magnitude and the direction of the derivatives of neighboring points. Points with high symmetry are selected as interest points.

Smith and Brady (1997) compare the brightness of each pixel in a circular mask to the center pixel to define an area that has a similar brightness to the center. Two-dimensional features can be detected from the size, centroid and second moment of this area.

The approach proposed by Laganière (1998) is based on a variant of the morphological closing operator which successively applies dilation/erosion with different structuring elements. Two closing operators and four structuring elements are used. The first closing operator is sensitive to vertical/horizontal L-corners and the second to diagonal L-corners.

2.1.3. Parametric Model Based Methods. The parametric model used by Rohr (1992) is an analytic junction model convolved with a Gaussian. The parameters of the model are adjusted by a minimization method, such that the template is closest to the observed signal. In the case of an L-corner the parameters of the model are the angle of the L-corner, the angle between the symmetry axis of the L-corner and the x-axis, the greyvalues, the position of the point and the amount of blur. Positions obtained by this method are very precise. However, the quality of the approximation depends on the initial position estimation. Rohr uses an interest point detector which maximizes det(A) (cf. Eq. (1)) as well as the intersection of line segments to determine the initial values for the model parameters.

Deriche and Blaszka (1993) develop an acceleration of Rohr’s method. They substitute an exponential for the Gaussian smoothing function. They also show that to assure convergence the image region has to be quite large. In cluttered images the region is likely to contain several signals, which makes convergence difficult.

Baker et al. (1998) propose an algorithm that automatically constructs a detector for an arbitrary parametric feature. Each feature is represented as a densely sampled parametric manifold in a low dimensional subspace. A feature is detected if the projection of the surrounding intensity values into the subspace lies sufficiently close to the feature manifold. Furthermore, during detection the parameters of detected features are recovered using the closest point on the feature manifold.

Parida et al. (1998) describe a method for general junction detection. A deformable template is used to detect radial partitions. The minimum description length principle determines the optimal number of partitions that best describes the signal.

2.2. Implementation Details

This section presents implementation details for the detectors included in our comparison. The detectors are Harris (Harris and Stephens, 1988), ImpHarris (an improved version of Harris), Cottier (Cottier, 1994), Horaud (Horaud et al., 1990), Heitger (Heitger et al., 1992) and Förstner (Förstner, 1994). Except in the case of the improved version of Harris, we have used the implementations of the original authors, with the standard default parameter values recommended by the authors for general purpose feature detection. These values are seldom optimal for any given image, but they do well on average on collections of different images. Our goal is to evaluate detectors for such collections.

The Harris detector (Harris and Stephens, 1988) computes the derivatives of the matrix A (cf. Eq. (1)) by convolution with the mask [−2 −1 0 1 2]. A Gaussian (σ = 2) is used to weight the derivatives summed over the window. To avoid extracting the eigenvalues of the matrix A, the strength of an interest point is measured by det(A) − α trace(A)². The second term is used to eliminate contour points with one strong eigenvalue; α is set to 0.06. Non-maximum suppression using a 3 × 3 mask is then applied to the interest point strength and a threshold is used to select interest points. The threshold is set to 1% of the maximum observed interest point strength.
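A condensed sketch of this pipeline with the parameters quoted above; the function name is ours, and border handling and derivative normalization are simplified.

```python
import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter, maximum_filter

def harris_points(img, alpha=0.06, sigma=2.0, rel_thresh=0.01):
    """Harris detector with the parameters given in the text:
    derivative mask [-2 -1 0 1 2], Gaussian (sigma = 2) window
    weighting, alpha = 0.06, 3x3 non-maximum suppression and a
    threshold at 1% of the maximum strength."""
    I = img.astype(float)
    mask = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    Ix = convolve1d(I, mask, axis=1)   # derivative along x (columns)
    Iy = convolve1d(I, mask, axis=0)   # derivative along y (rows)
    # Gaussian-weighted sums of derivative products: the entries of A
    Axx = gaussian_filter(Ix * Ix, sigma)
    Ayy = gaussian_filter(Iy * Iy, sigma)
    Axy = gaussian_filter(Ix * Iy, sigma)
    # interest point strength: det(A) - alpha * trace(A)^2
    strength = (Axx * Ayy - Axy ** 2) - alpha * (Axx + Ayy) ** 2
    # 3x3 non-maximum suppression and relative threshold
    local_max = (strength == maximum_filter(strength, size=3))
    keep = local_max & (strength > rel_thresh * strength.max())
    ys, xs = np.nonzero(keep)
    return list(zip(xs, ys))
```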

In the improved version of Harris (ImpHarris), derivatives are computed more precisely by replacing the [−2 −1 0 1 2] mask with derivatives of a Gaussian (σ = 1). A recursive implementation of the Gaussian filters (Deriche, 1993) guarantees fast detection.

Cottier (1994) applies the Harris detector only to contour points in the image. Derivatives for contour extraction as well as for the Harris detector are computed by convolution with the Canny/Deriche operator (Deriche, 1987) (α = 2, ω = 0.001). Local maxima detection with hysteresis thresholding is used to extract contours. High and low thresholds are determined from the gradient magnitude (high = average gradient magnitude, low = 0.1 ∗ high). For the Harris detector derivatives are averaged over two different window sizes in order to increase localization accuracy. Points are first detected using a 5 × 5 window. The exact location is then determined by using a 3 × 3 window and searching the maximum in the neighborhood of the detected point.

Horaud (Horaud et al., 1990) first extracts contour chains using his implementation of the Canny edge detector. Tangent discontinuities in the chain are located using a worm, and a line fit between the discontinuities is estimated using orthogonal regression. Lines are then grouped and intersections between neighboring lines are used as interest points.

Heitger (Heitger et al., 1992) convolves the image with even and odd symmetrical orientation-selective filters. These Gabor-like filters are parameterized by the width of the Gaussian envelope (σ = 5), the sweep which increases the relative weight of the negative side-lobes of even filters, and the orientation selectivity which defines the sharpness of the orientation tuning. Even and odd filters are computed for 6 orientations. For each orientation an energy map is computed by combining even and odd filter outputs. 2D signal variations are then determined by differentiating each energy map along the respective orientation using “end-stopped operators”. Non-maximum suppression (3 × 3 mask) is applied to the combined end-stopped operator activity and a relative threshold (0.1) is used to select interest points.

The Förstner detector (Förstner, 1994) computes the derivatives on the smoothed image (σ = 0.7). The derivatives are then summed over a Gaussian window (σ = 2) to obtain the auto-correlation matrix A. The trace of this matrix is used to classify pixels into region or non-region. For homogeneous regions the trace approximately follows a χ²-distribution. This allows the classification threshold to be determined automatically using a significance level (α = 0.95) and the estimated noise variance. Pixels are further classified into contour or interest point using the ratio of the eigenvalues and a fixed threshold (0.3). Interest point locations are then determined by minimizing a function of the local gradient field. The parameter of this function is the size of the Gaussian which is used to compute a weighted sum over the local gradient measures (σ = 4).

3. Repeatability

3.1. Repeatability Criterion

Repeatability signifies that detection is independent of changes in the imaging conditions, i.e. the parameters of the camera, its position relative to the scene, and the illumination conditions. 3D points detected in one image should also be detected at approximately corresponding positions in subsequent ones (cf. Fig. 2). Given a 3D point X and two projection matrices P_1 and P_i, the projections of X into images I_1 and I_i are x_1 = P_1 X and x_i = P_i X. A point x_1 detected in image I_1 is repeated in image I_i if the corresponding point x_i is detected in image I_i. To measure the repeatability, a unique relation between x_1 and x_i has to be established. This is difficult for general 3D scenes, but in the case of a planar scene this relation is defined by a homography (Semple and Kneebone, 1952):

x_i = H_{1i}\, x_1 \quad \text{where} \quad H_{1i} = P_i P_1^{-1}

Figure 2. The points x_1 and x_i are the projections of 3D point X into images I_1 and I_i: x_1 = P_1 X and x_i = P_i X, where P_1 and P_i are the projection matrices. A detected point x_1 is repeated if x_i is detected. It is ε-repeated if a point is detected in the ε-neighborhood of x_i. In the case of planar scenes the points x_1 and x_i are related by the homography H_{1i}.

P_1^{-1} is an abuse of notation representing the back-projection of image I_1. In the case of a planar scene this back-projection exists.

The repeatability rate is defined as the number of points repeated between two images with respect to the total number of detected points. To measure the number of repeated points, we have to take into account that the observed scene parts differ in the presence of changed imaging conditions, such as image rotation or scale change. Interest points which cannot be observed in both images corrupt the repeatability measure. Therefore only points which lie in the common scene part are used to compute the repeatability. This common scene part is determined by the homography. Points x_1 and x_i which lie in the common part of images I_1 and I_i are defined by:

\{x_1\} = \{x_1 \mid H_{1i} x_1 \in I_i\} \quad \text{and} \quad \{x_i\} = \{x_i \mid H_{i1} x_i \in I_1\}

where \{x_1\} and \{x_i\} are the points detected in images I_1 and I_i respectively. H_{ij} is the homography between images I_i and I_j.

Furthermore, the repeatability measure has to take into account the uncertainty of detection. A repeated point is in general not detected exactly at position x_i, but rather in some neighborhood of x_i. The size of this neighborhood is denoted by ε (cf. Fig. 2) and repeatability within this neighborhood is called ε-repeatability. The set of point pairs (x_1, x_i) which correspond within an ε-neighborhood is defined by:

R_i(\varepsilon) = \{(x_1, x_i) \mid \mathrm{dist}(H_{1i} x_1, x_i) < \varepsilon\}

The number of detected points may be different for the two images. In the case of a scale change, for example, more interest points are detected on the high resolution image. Only the minimum number of interest points (the number of interest points of the coarse image) can be repeated. The repeatability rate r_i(ε) for image I_i is thus defined by:

r_i(\varepsilon) = \frac{|R_i(\varepsilon)|}{\min(n_1, n_i)}

where n_1 = |\{x_1\}| and n_i = |\{x_i\}| are the number of points detected in the common part of images I_1 and I_i respectively. We can easily verify that 0 ≤ r_i(ε) ≤ 1.

The repeatability criterion, as defined above, is only valid for planar scenes. Only for such scenes is the geometric relation between two images completely defined. However, the restriction to planar scenes is not a limitation, as additional interest points detected on 3D scenes are due to occlusions and shadows. These points are due to real changes of the observed scene and the repeatability criterion should not take into account such unreliable points (Shi and Tomasi, 1994; Shilat et al., 1997).
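As a sketch, the repeatability rate can be computed from two sets of detected points (Nx2 NumPy arrays) and the homography H_{1i}; the function names are ours, and the nearest-neighbor matching is a simplification.

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 points through a 3x3 homography (homogeneous coordinates)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    proj = pts_h @ H.T
    return proj[:, :2] / proj[:, 2:3]

def repeatability(pts1, ptsi, H_1i, shape1, shapei, eps=1.5):
    """r_i(eps): fraction of points repeated within an eps-neighborhood,
    counted only over the common scene part of both images.
    shape1/shapei are (height, width) of images I1 and Ii."""
    H_i1 = np.linalg.inv(H_1i)

    def in_image(p, shape):
        return (p[:, 0] >= 0) & (p[:, 0] < shape[1]) & \
               (p[:, 1] >= 0) & (p[:, 1] < shape[0])

    # keep only points whose mapping falls inside the other image
    x1 = pts1[in_image(apply_homography(H_1i, pts1), shapei)]
    xi = ptsi[in_image(apply_homography(H_i1, ptsi), shape1)]
    if min(len(x1), len(xi)) == 0:
        return 0.0
    # count points x1 with dist(H_1i x1, xi) < eps for some detected xi
    mapped = apply_homography(H_1i, x1)
    d = np.linalg.norm(mapped[:, None, :] - xi[None, :, :], axis=2)
    repeated = int(np.sum(d.min(axis=1) < eps))
    return repeated / min(len(x1), len(xi))
```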

3.2. Experimental Conditions

Sequences: The repeatability rates of several interest point detectors are compared under different transformations: image rotation, scale change, illumination variation and viewpoint change. We consider both uniform and complex illumination variation. Stability to image noise has also been tested. The scene is always static and we move either the camera or the light source.

We will illustrate results for two planar scenes: “Van Gogh” and “Asterix”. The “Van Gogh” scene is the sower painting shown in Fig. 1. The “Asterix” scene can be seen in Fig. 3. The two scenes are very different: the “Van Gogh” scene is highly textured whereas the “Asterix” scene is mostly line drawings.

Estimating the homography: To ensure an accurate repeatability rate, the computation of the homography has to be precise and independent of the detected points. An independent, sub-pixel localization is required. We therefore take a second image for each position of the camera, with black dots projected onto the scene (cf. Fig. 3). The dots are extracted very precisely by fitting a template, and their centers are used to compute the homography. A least median square method makes the computation robust.

Figure 3. Two images of the “Asterix” scene. On the left the image of the scene and on the right the image with black dots projected.

Figure 4. Projection mechanism. An overhead projector casts black dots on the scene.
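A sketch of this estimation step, assuming the dot centers have already been extracted as corresponding Nx2 arrays (the template fitting is not shown); OpenCV's least-median-of-squares estimator (cv2.LMEDS) is used here as a stand-in for the authors' least median square computation.

```python
import numpy as np
import cv2  # OpenCV

def estimate_homography(centers1, centers2):
    """Robust homography between two camera positions from matched
    dot centers (Nx2 arrays), using least median of squares."""
    H, inliers = cv2.findHomography(
        centers1.astype(np.float32), centers2.astype(np.float32),
        method=cv2.LMEDS)
    return H
```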

While recording the sequence, the scene and the projection mechanism (overhead projector) remain fixed. Only the camera or the light source move. The projection mechanism is displayed in Fig. 4.

Figure 5. Comparison of Harris and ImpHarris. On the left the repeatability rate for an image rotation and on the right the rate for a scale change. ε = 1.5.

3.3. Results for Repeatability

We first compare the two versions of Harris (Section 3.3.1). The one with better results is then included in the comparison of the detectors (Sections 3.3.2–3.3.6). Comparisons are presented for image rotation, scale change, illumination variation, change of viewpoint and camera noise. Results are presented for the “Van Gogh” scene; results for the “Asterix” scene are given in Appendix B. For the images used in our experiments, we detect between 200 and 1200 interest points depending on the image and the detector used. The mean distance between a point and its closest neighbor is around 10 pixels. Measuring the repeatability rate with ε = 1.5 or less, the probability that two points are accidentally within the error distance is very low.

3.3.1. Comparison of the Two Harris Versions. Figure 5 compares the two different versions of the Harris detector in the presence of image rotation (graph on the left) and scale change (graph on the right). The repeatability of the improved version of Harris (ImpHarris) is better in both cases. The results of the standard version vary with image rotation, the worst results being obtained for an angle of 45◦. This is due to the fact that the standard version uses non-isotropic discrete derivative filters. A stable implementation of the derivatives significantly improves the repeatability of the Harris detector. The improved version (ImpHarris) is included in the comparison of the detectors in the following sections.

Figure 6. Image rotation sequence. The left image is the reference image. The rotation angle for the image in the middle is 38◦ and for the image on the right 116◦.

Figure 7. Interest points detected on the images of Fig. 6 using the improved version of Harris. There are 610, 595 and 586 points detected in the left, middle and right images, respectively. For a localization error ε of 1.5 pixels the repeatability rate between the left and middle images is 91% and between the left and right images 89%.

Figure 8. Repeatability rate for image rotation. ε = 0.5 for the left graph and ε = 1.5 for the right graph.

3.3.2. Image Rotation. In this section, we compare all detectors described in Section 2.2 for an image rotation. Image rotations are obtained by rotating the camera around its optical axis using a special mechanism. Figure 6 shows three images of the rotation sequence. The left image is the reference image. The rotation angle for the image in the middle is 38◦ and for the image on the right 116◦. The interest points detected for these three images using the improved version of Harris are displayed in Fig. 7.

The repeatability rate for this rotation sequence is displayed in Fig. 8. The rotation angles vary between 0◦ and 180◦. The graph on the left displays the repeatability rate for a localization error ε of 0.5 pixels. The graph on the right shows the results for an error of 1.5 pixels, that is the detected point lies in the pixel neighborhood of the predicted point.

Figure 9. Repeatability rate as a function of the localization error ε. The rotation angle is 89◦.

For both localization errors the improved version of Harris gives the best results; results are not dependent on image rotation. At ε = 1.5 the repeatability rate of ImpHarris is almost 100%. Computing Harris only on the image contours (Cottier) makes the results worse. Results of the Heitger detector depend on the rotation angle, as it uses derivatives computed in several fixed directions. Its results are worse for rotation angles between 40◦ and 140◦. The detector of Förstner gives bad results for rotations of 45◦, probably owing to the use of anisotropic derivative filters. The worst results are obtained by the method based on the intersection of line segments (Horaud).

Figure 10. Scale change sequence. The left image is the reference image. The scale change for the middle one is 1.5 and for the right one 4.1.

Figure 11. Interest points detected on the images of Fig. 10 using the improved version of Harris. There are 317, 399 and 300 points detected in the left, middle and right images, respectively. The repeatability rate between the left and middle images is 54% and between the left and right images 1% for a localization error ε of 1.5 pixels.

Figure 9 shows the repeatability rate as a function of the localization error ε for a constant rotation angle of 89 degrees. The localization error varies between 0.5 pixels and 5 pixels. When increasing the localization error, the results improve for all the detectors. However, the improved version of the Harris detector is always best and increases most rapidly. For this detector, good results are obtained above a 1 pixel localization error.

3.3.3. Scale Change. Scale change is investigated by varying the focal length of the camera. Figure 10 shows three images of the scale change sequence. The left image is the reference image. The scale change for the middle one is 1.5 and for the right one 4.1. The scale factors have been determined by the ratios of the focal lengths. The interest points detected for these three images using the improved version of Harris are displayed in Fig. 11.

Figure 12 shows the repeatability rate for scale changes. The left graph shows the repeatability rate for an ε of 0.5 and the right one for an ε of 1.5 pixels. Evidently the detectors are very sensitive to scale changes. For an ε of 0.5 (1.5), repeatability is very poor for a scale factor above 1.5 (2). The improved version of Harris and the Cottier detector give the best results. The results of the others are hardly usable. Above a scale factor of about 2.5, the results are mainly due to artifacts. At larger scales many more points are found in the textured regions of the scene, so accidental correspondences are more likely.

Figure 12. Repeatability rate for scale change. ε = 0.5 for the left graph and ε = 1.5 for the right graph.

Figure 13. Repeatability rate as a function of the localization error ε. The scale change is 1.5.

Figure 14. Uniform illumination variation: from left to right images with relative greyvalue 0.6, 1 and 1.7.

Figure 13 shows the repeatability rate as a function of the localization error ε for a constant scale change of 1.5. Results improve in the presence of larger localization errors; the repeatability rates of ImpHarris and Cottier increase more rapidly than those of the other detectors.

3.3.4. Variation of Illumination. Illumination can vary in many different ways. In the following we consider both a uniform variation of illumination and a more complex variation due to a change of light source position.

Uniform variation of illumination: Uniform illumination variation is obtained by changing the camera aperture. The change is quantified by the “relative greyvalue”, the ratio of the mean greyvalue of an image to that of the reference image, which has medium illumination. Figure 14 shows three images of the sequence: a dark one with a relative greyvalue of 0.6, the reference image and a bright one with a relative greyvalue of 1.7.

Figure 15 displays the results for a uniform illumination variation. Even for a relative greyvalue of 1 there is not 100% repeatability, due to image noise (two images with a relative greyvalue of 1 have been taken: one reference image and one test image). In both graphs, the repeatability decreases smoothly in proportion to the relative greyvalue. The improved version of Harris and Heitger obtain better results than the other detectors.

Figure 15. Repeatability rate for uniform illumination variation. ε = 0.5 for the left graph and ε = 1.5 for the right graph.

Complex variation of illumination: A non-uniform illumination variation is obtained by moving the light source in an arc from approximately −45◦ to 45◦. Figure 16 shows three images of the sequence. The light source is furthest right for the left image (image # 0). This image is the reference image for our evaluation. For the image in the middle the light source is in front of the scene (image # 6). Part of this image is saturated. The light source is furthest left for the right image (image # 11).

Figure 16. Complex illumination variation. The reference image is on the left (image # 0); the light source is furthest right for this image. For the image in the middle the light source is in front of the scene (image # 6). For the one on the right the light source is furthest left (image # 11).

Figure 17. Repeatability rate for complex illumination variation. ε = 0.5 for the left graph and ε = 1.5 for the right graph.

Figure 17 displays the repeatability results. The improved version of Harris obtains better results than the other detectors. For an ε of 0.5, results slightly decrease as the light position changes. For an ε of 1.5 results are not modified by a complex illumination variation. Illumination direction has little effect on the results as the interest point measures are computed locally.

3.3.5. Viewpoint Change. To measure repeatability in the presence of viewpoint changes, the position of the camera is moved in an arc around the scene. The angle varies from approximately −50◦ to 50◦. The different viewpoints are approximately regularly spaced. Figure 18 shows three images of the sequence. The left image is taken at the rightmost position of the camera (image # 0). For the image in the middle the camera is in front of the painting (image # 7). This image is the reference image for our evaluation. For the right image the camera is at its leftmost position (image # 15).

Figure 19 displays the results for a viewpoint change. The improved version of Harris gives results superior to those of the other detectors. The results degrade rapidly for ε = 0.5, but significantly more slowly for ε = 1.5. For this ε the repeatability of ImpHarris is always above 60% except for image 0. The ImpHarris detector shows a good repeatability in the presence of perspective deformations.

Figure 18. Viewpoint change sequence. The left image is taken at the rightmost position of the camera (image # 0). For the middle image the camera is in front of the painting (image # 7). This image is the reference image for our evaluation. For the right image the camera is at its leftmost position (image # 15).

Figure 19. Repeatability rate for the viewpoint change. ε = 0.5 for the left graph and ε = 1.5 for the right graph.

3.3.6. Camera Noise. To study repeatability in the presence of image noise, a static scene has been recorded several times. The results of this experiment are displayed in Fig. 20. We can see that all detectors give good results except the Horaud one. The improved version of Harris gives the best results, followed closely by Heitger. For ε = 1.5 these two detectors obtain a rate of nearly 100%.

Figure 20. Repeatability rate for the camera noise. ε = 0.5 for the left graph and ε = 1.5 for the right graph.

3.4. Conclusion for Repeatability

Repeatability of various detectors has been evaluated in the presence of different imaging conditions: image rotation, scale change, variation of illumination, viewpoint change and noise of the imaging system. Two different scenes have been used: “Van Gogh” and “Asterix”. The results for these two sequences are very similar; the “Asterix” sequence (cf. Appendix B) confirms the results presented above.

Results of the previous section show that a stable implementation of the derivatives improves the results of the standard Harris detector. The improved version of Harris (ImpHarris) gives significantly better results than the other detectors in the presence of image rotation. This is due to the rotation invariance of its image measures. The detector of Heitger combines computations in several directions and is not invariant to rotations. This is confirmed by Perona (1995), who has noticed that the computation in several directions is less stable to an image rotation. ImpHarris and Cottier give the best results in the presence of scale changes. Moreover, the computation of these detectors is based on Gaussian filters and can easily be adapted to scale changes. In the case of illumination variations and camera noise, ImpHarris and Heitger obtain the best results. For a viewpoint change ImpHarris shows results which are superior to the other detectors.

In all cases the results of the improved version of the Harris detector are better than or equivalent to those of the other detectors. For this detector, interest points are largely independent of the imaging conditions; points are geometrically stable.

4. Information Content

4.1. Information Content Criterion

Information content is a measure of the distinctiveness of an interest point. Distinctiveness is based on the likelihood of a local greyvalue descriptor computed at the point within the population of all observed interest point descriptors. Given one or several images, a descriptor is computed for each of the detected interest points. Information content measures the distribution of these descriptors. If all descriptors lie close together, they don’t convey any information, that is the information content is low. Matching for example fails, as any point can be matched to any other. On the other hand if the descriptors are spread out, information content is high and matching is likely to succeed.

Information content of the descriptors is measured using entropy. The more spread out the descriptors are, the higher is the entropy. Section 4.2 presents a short introduction to entropy and shows that entropy measures the average information content. In Section 4.3 we introduce the descriptors used for our evaluation, which characterize local greyvalue patterns. Section 4.4 describes how to partition the set of descriptors. Partitioning of the descriptors is necessary to compute the entropy, as will be explained in Section 4.2. The information content criterion of different detectors is compared in Section 4.5.

4.2. Entropy

Entropy measures the randomness of a variable. The more random a variable is, the bigger the entropy. In the following we are not going to deal with continuous variables, but with partitions (Papoulis, 1991). The entropy of a partition A = \{A_i\} is:

H(A) = -\sum_i p_i \log(p_i) \qquad (2)

where p_i is the probability of A_i. Note that the size of the partition influences the results. If B is a new partition formed by subdivisions of the sets of A then H(B) ≥ H(A).

Entropy measures average information content. In information theory the information content I of a message i is defined as

I_i = \log\left(\frac{1}{p_i}\right) = -\log(p_i)

The information content of a message is inversely related to its probability. If p_i = 1 the event always occurs and no information is attributed to it: I = 0. The average information content per message of a set of messages is then defined by -\sum_i p_i \log(p_i), which is its entropy.

In the case of interest points we would like to know how much average information content an interest point “transmits”, as measured by its descriptor. The more distinctive the descriptors are, the larger is the average information content.
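A minimal sketch of Eq. (2), computing the entropy from the number of descriptors falling in each cell of the partition; the natural logarithm is assumed, consistent with the collision-probability figure in Section 4.5.

```python
import numpy as np

def partition_entropy(counts):
    """Entropy H(A) = -sum_i p_i log(p_i) of a partition, given the
    number of descriptors in each cell (Eq. (2))."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # empty cells contribute nothing
    return float(-np.sum(p * np.log(p)))
```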

4.3. Descriptors Characterizing Local Shape

To measure the distribution of local greyvalue patterns at interest points, we have to define a measure which describes such patterns. Collecting unordered pixel values at an interest point does not represent the shape of the signal around the pixel. Collecting ordered pixel values (e.g. from left to right and from top to bottom) respects the shape but is not invariant to rotation. We have therefore chosen to use local rotation invariants.

The rotation invariants used are combinations of greyvalue derivatives. Greyvalue derivatives are computed stably by convolution with Gaussian derivatives. This set of derivatives is called the “local jet” (Koenderink and van Doorn, 1987). Note that derivatives up to Nth order describe the intensity function locally up to that order. The “local jet” of order N at a point x = (x, y) for image I and scale σ is defined by:

J^N[I](\mathbf{x}, \sigma) = \{ L_{i_1 \ldots i_n}(\mathbf{x}, \sigma) \mid (\mathbf{x}, \sigma) \in I \times \mathbb{R}^{+};\; n = 0, \ldots, N \}

where L_{i_1 \ldots i_n}(\mathbf{x}, \sigma) is the convolution of image I with the Gaussian derivatives G_{i_1 \ldots i_n}(\mathbf{x}, \sigma) and i_k \in \{x, y\}.

To obtain invariance under the group SO(2) (2D image rotations), Koenderink (Koenderink and van Doorn, 1987) and Romeny (Romeny et al., 1994) compute differential invariants from the local jet. In our work invariants up to second order are used:

\vec{V}[0..3] = \begin{bmatrix} L_x L_x + L_y L_y \\ L_{xx} L_x L_x + 2 L_{xy} L_x L_y + L_{yy} L_y L_y \\ L_{xx} + L_{yy} \\ L_{xx} L_{xx} + 2 L_{xy} L_{xy} + L_{yy} L_{yy} \end{bmatrix} \qquad (3)

The average luminance does not characterize the shape and is therefore not included. Note that the first component of \vec{V} is the square of the gradient magnitude and the third is the Laplacian.
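A sketch computing the four invariants of Eq. (3) at every pixel with Gaussian derivative filters (the evaluation in Section 4.5 uses σ = 3); the function name is ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_jet_invariants(img, sigma=3.0):
    """The four rotation invariants of Eq. (3), from the local jet at
    scale sigma. Returns an H x W x 4 descriptor field."""
    I = img.astype(float)
    Lx  = gaussian_filter(I, sigma, order=(0, 1))
    Ly  = gaussian_filter(I, sigma, order=(1, 0))
    Lxx = gaussian_filter(I, sigma, order=(0, 2))
    Lyy = gaussian_filter(I, sigma, order=(2, 0))
    Lxy = gaussian_filter(I, sigma, order=(1, 1))
    V0 = Lx * Lx + Ly * Ly                                 # squared gradient magnitude
    V1 = Lxx * Lx * Lx + 2 * Lxy * Lx * Ly + Lyy * Ly * Ly
    V2 = Lxx + Lyy                                         # Laplacian
    V3 = Lxx * Lxx + 2 * Lxy * Lxy + Lyy * Lyy
    return np.stack([V0, V1, V2, V3], axis=-1)
```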

4.4. Partitioning a Set of Descriptors

The computation of entropy requires the partitioning of the descriptors \vec{V}. Partitioning is dependent on the distance measure between descriptors. The distance between two descriptors \vec{V}_1 and \vec{V}_2 is given by the Mahalanobis distance:

d_M(\vec{V}_2, \vec{V}_1) = \sqrt{(\vec{V}_2 - \vec{V}_1)^T \Lambda^{-1} (\vec{V}_2 - \vec{V}_1)}

The covariance matrix \Lambda takes into account the variability of the descriptors \vec{V}, i.e. their uncertainty due to noise. This matrix \Lambda is symmetric positive definite. Its inverse can be decomposed into \Lambda^{-1} = P^T D P where D is diagonal and P an orthogonal matrix representing a change of reference frame. We can then define the square root of \Lambda^{-1} as \Lambda^{-1/2} = D^{1/2} P where D^{1/2} is a diagonal matrix whose coefficients are the square roots of the coefficients of D. The Mahalanobis distance can then be rewritten as:

d_M(\vec{V}_2, \vec{V}_1) = \| D^{1/2} P (\vec{V}_2 - \vec{V}_1) \|

The distance d_M is the norm of the difference of the normalized vectors:

\vec{V}_{\mathrm{norm}} = D^{1/2} P \vec{V} \qquad (4)

Normalization allows us to use equally sized cells in all dimensions. This is important since the entropy is directly dependent on the partition used. The probability of each cell of this partition is used to compute the entropy of a set of vectors \vec{V}.
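A sketch of this normalization, assuming the 4x4 covariance matrix Λ of the descriptors is given; the eigendecomposition of Λ^{-1} supplies P and D.

```python
import numpy as np

def normalize_descriptors(V, Lam):
    """Whiten descriptors so that Euclidean distance equals the
    Mahalanobis distance: V_norm = D^{1/2} P V, with
    Lam^{-1} = P^T D P (Eq. (4)). V is Nx4, Lam the 4x4 covariance."""
    eigvals, eigvecs = np.linalg.eigh(np.linalg.inv(Lam))
    D_sqrt = np.diag(np.sqrt(eigvals))
    P = eigvecs.T                      # rows are eigenvectors
    return V @ (D_sqrt @ P).T          # apply D^{1/2} P to each descriptor
```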

4.5. Results for Information Content

In this section, we compute the information content of the detectors which are included in our comparison. To obtain a statistically significant measure, a large number of points has to be considered. We use a set of 1000 images of different types: aerial images, images of paintings and images of toy objects. The information content of a detector is then computed as follows:

1. Extract interest points for the set of images.
2. Compute descriptors (cf. Eq. (3)) for all extracted interest points (σ = 3).
3. Normalize each descriptor (cf. Eq. (4)). The covariance matrix takes into account the variability of the descriptors.
4. Partition the set of normalized descriptors. The cell size is the same in all dimensions; it is set to 20.
5. Determine the probability of each cell and compute the entropy with Eq. (2) (see the sketch after this list).
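A minimal sketch of steps 4 and 5, assuming the normalized descriptors are stacked in an (N × 4) array and that the entropy of Eq. (2) uses the natural logarithm (consistent with the use of e in the collision estimate below):

```python
import numpy as np

def entropy_of_descriptors(V_norm, cell_size=20.0):
    """Partition normalized descriptors into equally sized cells in all
    dimensions and compute the entropy of the cell probabilities."""
    cells = np.floor(V_norm / cell_size).astype(np.int64)     # integer cell index per dimension
    _, counts = np.unique(cells, axis=0, return_counts=True)  # occupancy of each non-empty cell
    p = counts / counts.sum()                                 # cell probabilities
    return float(-np.sum(p * np.log(p)))                      # entropy in nats
```

Applied to the descriptors pooled over the image set, this yields one entropy value per detector, as in Table 1.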

The results are presented in Table 1. It shows that the improved version of Harris produces the highest entropy, and hence the most distinctive points. The results obtained for Heitger are almost as good. The two detectors based on line extraction obtain worse results. This can be explained by their limitation to contour lines, which reduces the distinctiveness of their greyvalue descriptors and thus their entropy.

Random points are included in our comparison: for each image we compute the mean number m of interest points extracted by the different detectors. We then select m random points over the image using a spatially uniform distribution. Entropy is computed as specified above using this random point detector. The result for this detector (random) is given in Table 1. Unsurprisingly, the results obtained for all of the interest point detectors are significantly better than those for random points.

Table 1. The information content for different detectors.

Detector     Information content
ImpHarris    6.049526
Heitger      5.940877
Horaud       5.433776
Cottier      4.846409
Forstner     4.523368
Random       3.300863

The probability of producing a collision is e^{−(3.3−6.05)} ≈ 15.6 times higher for Random than for ImpHarris.
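To spell out the arithmetic (a hedged reading, assuming the entropies are measured in nats and that the collision probability of a partitioned descriptor population scales roughly as e^{-H}, which holds exactly for uniform cell probabilities):

\frac{p_{\text{coll}}(\text{Random})}{p_{\text{coll}}(\text{ImpHarris})} \approx \frac{e^{-3.30}}{e^{-6.05}} = e^{2.75} \approx 15.6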

5. Conclusion

In this paper we have introduced two novel evaluation criteria: repeatability and information content. These two criteria present several advantages over existing ones. First of all, they are significant for a large number of computer vision tasks. Repeatability compares interest points detected on images taken under varying viewing conditions and is therefore significant for any interest point based algorithm which uses two or more images of a given scene. Examples are image matching, geometric hashing, computation of the epipolar geometry, etc.

Information content is relevant for algorithms which use greyvalue information. Examples are image matching based on correlation and object recognition based on local feature vectors. Furthermore, repeatability as well as information content are independent of human intervention and apply to real scenes.

The two criteria have been used to evaluate and compare several interest point detectors. Repeatability was evaluated under various imaging conditions. In all cases the improved version of Harris is better than or equivalent to the other detectors. Except for large scale changes, its points are geometrically stable under all tested image variations. The results for information content again show that the improved version of Harris obtains the best results, although the Heitger detector is a close second. All of the detectors have significantly higher information content than randomly selected points, so they do manage to select "interesting" points.

The criteria defined in this paper allow the quantitative evaluation of new interest point detectors. One possible extension is to adapt these criteria to other low-level features. Another extension would be to design an improved interest point detector with respect to the two evaluation criteria. Concerning repeatability, we have seen that detectors show rapid degradation in the presence of scale change. To solve this problem, the detectors could be included in a multi-scale framework. Another solution might be to estimate the scale at which the best results are obtained. Concerning information content, we think that studying which kinds of greyvalue descriptors occur frequently and which ones are rare will help us to design a detector with even higher information content.


Appendix A: Derivation of the Auto-Correlation Matrix

The local auto-correlation function measures the local changes of the signal. This measure is obtained by correlating a patch with its neighbouring patches, that is with patches shifted by a small amount in different directions. In the case of an interest point, the auto-correlation function is high for all shift directions.

Given a shift (\Delta x, \Delta y) and a point (x, y), the auto-correlation function is defined as:

f(x, y) = \sum_{W} \left( I(x_k, y_k) - I(x_k + \Delta x, y_k + \Delta y) \right)^{2} \qquad (5)

where (x_k, y_k) are the points in the window W centered on (x, y) and I the image function.

If we want to use this function to detect interest points we have to integrate over all shift directions. Integration over discrete shift directions can be avoided by using the auto-correlation matrix. This matrix is derived using a first-order approximation based on the Taylor expansion:

I(x_k + \Delta x, y_k + \Delta y) \approx I(x_k, y_k) + \begin{pmatrix} I_x(x_k, y_k) & I_y(x_k, y_k) \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} \qquad (6)

Substituting the above approximation (6) into Eq. (5), we obtain:

f(x, y) = \sum_{W} \left( \begin{pmatrix} I_x(x_k, y_k) & I_y(x_k, y_k) \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} \right)^{2}

= \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} \begin{pmatrix} \sum_{W} (I_x(x_k, y_k))^2 & \sum_{W} I_x(x_k, y_k)\, I_y(x_k, y_k) \\ \sum_{W} I_x(x_k, y_k)\, I_y(x_k, y_k) & \sum_{W} (I_y(x_k, y_k))^2 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}

= \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} A(x, y) \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} \qquad (7)

The above Eq. (7) shows that the auto-correlation function can be approximated by the matrix A(x, y). This matrix A captures the structure of the local neighborhood.
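As an illustrative sketch (not the detector implementation used in the experiments), A(x, y) can be accumulated per pixel from windowed sums of derivative products; the harris_response helper below then combines its determinant and trace into the cornerness measure of Harris and Stephens (1988). The window size, σ and k = 0.04 are conventional illustrative choices, not values taken from the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def autocorrelation_matrix(image, sigma=1.0, window=5):
    """Entries of A(x, y) from Eq. (7): windowed sums of Ix^2, Ix*Iy, Iy^2."""
    I = np.asarray(image, dtype=np.float64)
    Ix = gaussian_filter(I, sigma, order=(0, 1))  # first derivatives computed
    Iy = gaussian_filter(I, sigma, order=(1, 0))  # stably with Gaussian kernels
    def sum_w(f):                                 # sum over the window W around each pixel
        return uniform_filter(f, size=window) * window ** 2
    return sum_w(Ix * Ix), sum_w(Ix * Iy), sum_w(Iy * Iy)

def harris_response(image, k=0.04):
    """det(A) - k * trace(A)^2; large values indicate two large
    eigenvalues of A, i.e. signal change in all shift directions."""
    Axx, Axy, Ayy = autocorrelation_matrix(image)
    return Axx * Ayy - Axy ** 2 - k * (Axx + Ayy) ** 2
```

Interest points are then typically taken as local maxima of this response; the block is only meant to make the role of A(x, y) concrete.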

Appendix B: Repeatability Results for the “Asterix” Scene

In this appendix the repeatability results for the “Asterix” scene are presented. Experimental conditions are the same as described in Section 3.

B.1. Comparison of the Two Harris Versions

Figure 21 compares the two different versions of the Harris detector in the presence of image rotation (graph on the left) and scale change (graph on the right). The repeatability of the improved version of Harris is better in both cases. Results are comparable to those obtained for the “Van Gogh” scene (cf. Section 3.3.1).

Figure 21. Comparison of Harris and ImpHarris. On the left, the repeatability rate for an image rotation; on the right, the rate for a scale change. ε = 1.5.


Figure 22. Image rotation sequence. On the left, the reference image for the rotation sequence. On the right, an image with a rotation angle of 154°.

Figure 23. Repeatability rate for the image rotation sequence. ε = 0.5 for the left graph and ε = 1.5 for the right graph.

B.2. Image Rotation

Figure 22 shows two images of the rotation sequence. The repeatability rate for the rotation sequence is displayed in Fig. 23. The improved version of Harris gives the best results, as in the case of the “Van Gogh” scene (cf. Section 3.3.2). Figure 24 shows the repeatability rate as a function of the localization error ε for a constant rotation angle of 93°.

B.3. Scale Change

Figure 25 shows two images of the scale change sequence. The scale factor between the two images is 4.1. The repeatability rate for the scale change sequence is displayed in Fig. 26. The improved version of Harris and the Cottier detector give the best results, as in the case of the “Van Gogh” scene (cf. Section 3.3.3).

Figure 24. Repeatability rate as a function of the localization error ε. The rotation angle is 93°.


Figure 25. Scale change sequence. On the left, the reference image for the scale change sequence. On the right, an image with a scale change of a factor 4.1.

Figure 26. Repeatability rate for the scale change sequence. ε = 0.5 for the left graph and ε = 1.5 for the right graph.

Figure 27 shows the repeatability rate as a function of the localization error ε for a constant scale change of 1.5.

Figure 27. Repeatability rate as a function of the localization error ε. The scale change is 1.5.

B.4. Uniform Variation of Illumination

Figure 28 shows two images of the uniform variation of illumination sequence, a dark one with a relative greyvalue of 0.6 and a bright one with a relative greyvalue of 1.5. The repeatability rate for a uniform illumination variation is displayed in Fig. 29. ImpHarris and Heitger give the best results, as in the case of the “Van Gogh” scene (cf. Section 3.3.4).

B.5. Camera Noise

The repeatability rate for camera noise is displayed in Fig. 30. ImpHarris and Heitger give the best results.



Figure 28. Uniform variation of illumination sequence. On the left, an image with a relative greyvalue of 0.6. On the right, an image with a relative greyvalue of 1.5.

Figure 29. Repeatability rate for the uniform variation of illumination sequence. ε = 0.5 for the left graph and ε = 1.5 for the right graph.

Figure 30. Repeatability rate for the camera noise sequence. ε = 0.5 for the graph on the left and ε = 1.5 for the graph on the right.


Acknowledgments

We are very grateful to Yves Dufournaud, Radu Horaud and Andrew Zisserman for discussions.

References

Asada, H. and Brady, M. 1986. The curvature primal sketch. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1):2–14.

Baker, S. and Nayar, S.K. 1999. Global measures of coherence for edge detector evaluation. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Fort Collins, Colorado, USA, pp. 373–379.

Baker, S., Nayar, S.K., and Murase, H. 1998. Parametric feature detection. International Journal of Computer Vision, 27(1):27–50.

Beaudet, P.R. 1978. Rotationally invariant image operators. In Proceedings of the 4th International Joint Conference on Pattern Recognition, Tokyo, pp. 579–583.

Bowyer, K.W., Kranenburg, C., and Dougherty, S. 1999. Edge detector evaluation using empirical ROC curves. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Fort Collins, Colorado, USA, pp. 354–359.

Brand, P. and Mohr, R. 1994. Accuracy in image measure. In Proceedings of the SPIE Conference on Videometrics III, Boston, Massachusetts, USA, Vol. 2350, pp. 218–228.

Canny, J. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698.

Coelho, C., Heller, A., Mundy, J.L., Forsyth, D., and Zisserman, A. 1991. An experimental evaluation of projective invariants. In Proceedings of the DARPA–ESPRIT Workshop on Applications of Invariants in Computer Vision, Reykjavik, Iceland, pp. 273–293.

Cooper, J., Venkatesh, S., and Kitchen, L. 1993. Early jump-out corner detectors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(8):823–833.

Cottier, J.C. 1994. Extraction et appariements robustes des points d'intérêt de deux images non étalonnées [Extraction and robust matching of interest points from two uncalibrated images]. Technical Report, LIFIA–IMAG–INRIA Rhône-Alpes.

Demigny, D. and Kamlé, T. 1997. A discrete expression of Canny's criteria for step edge detector performances evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(11):1199–1211.

Deriche, R. 1987. Using Canny's criteria to derive a recursively implemented optimal edge detector. International Journal of Computer Vision, 1(2):167–187.

Deriche, R. 1993. Recursively implementing the Gaussian and its derivatives. Technical Report, INRIA.

Deriche, R. and Blaszka, T. 1993. Recovering and characterizing image features using an efficient model based approach. In Proceedings of the Conference on Computer Vision and Pattern Recognition, New York, USA, pp. 530–535.

Deriche, R. and Giraudon, G. 1993. A computational approach for corner and vertex detection. International Journal of Computer Vision, 10(2):101–124.

Dreschler, L. and Nagel, H.H. 1982. Volumetric model and 3D trajectory of a moving car derived from monocular TV frame sequences of a street scene. Computer Graphics and Image Processing, 20:199–228.

Förstner, W. 1994. A framework for low level feature extraction. In Proceedings of the 3rd European Conference on Computer Vision, Stockholm, Sweden, pp. 383–394.

Förstner, W. and Gülch, E. 1987. A fast operator for detection and precise location of distinct points, corners and centres of circular features. In Intercommission Conference on Fast Processing of Photogrammetric Data, Interlaken, Switzerland, pp. 281–305.

Harris, C. and Stephens, M. 1988. A combined corner and edge detector. In Alvey Vision Conference, pp. 147–151.

Heath, M.D., Sarkar, S., Sanocki, T., and Bowyer, K.W. 1997. A robust visual method for assessing the relative performance of edge-detection algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(12):1338–1359.

Heitger, F., Rosenthaler, L., von der Heydt, R., Peterhans, E., and Kuebler, O. 1992. Simulation of neural contour mechanism: From simple to end-stopped cells. Vision Research, 32(5):963–981.

Heyden, A. and Rohr, K. 1996. Evaluation of corner extraction schemes using invariance methods. In Proceedings of the 13th International Conference on Pattern Recognition, Vienna, Austria, Vol. I, pp. 895–899.

Horaud, R., Skordas, T., and Veillon, F. 1990. Finding geometric and relational structures in an image. In Proceedings of the 1st European Conference on Computer Vision, Antibes, France, pp. 374–384.

Kitchen, L. and Rosenfeld, A. 1982. Gray-level corner detection. Pattern Recognition Letters, 1:95–102.

Koenderink, J.J. and van Doorn, A.J. 1987. Representation of local geometry in the visual system. Biological Cybernetics, 55:367–375.

Laganière, R. 1998. Morphological corner detection. In Proceedings of the 6th International Conference on Computer Vision, Bombay, India, pp. 280–285.

López, A.M., Lumbreras, F., Serrat, J., and Villanueva, J.J. 1999. Evaluation of methods for ridge and valley detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):327–335.

Medioni, G. and Yasumoto, Y. 1987. Corner detection and curve representation using cubic B-splines. Computer Vision, Graphics and Image Processing, 39:267–278.

Mokhtarian, F. and Mackworth, A. 1986. Scale-based description and recognition of planar curves and two-dimensional shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1):34–43.

Mokhtarian, F. and Suomela, R. 1998. Robust image corner detection through curvature scale space. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1376–1381.

Moravec, H.P. 1977. Towards automatic visual obstacle avoidance. In Proceedings of the 5th International Joint Conference on Artificial Intelligence, Cambridge, Massachusetts, USA, p. 584.

Nagel, H.H. 1983. Displacement vectors derived from second order intensity variations in image sequences. Computer Vision, Graphics and Image Processing, 21:85–117.

Papoulis, A. 1991. Probability, Random Variables, and Stochastic Processes. McGraw Hill.

Parida, L., Geiger, D., and Hummel, R. 1998. Junctions: Detection, classification, and reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(7):687–698.

Perona, P. 1995. Deformable kernels for early vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):488–499.


Phillips, P.J. and Bowyer, K.W. 1999. Introduction to the special section on empirical evaluation of computer vision algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):289–290.

Pikaz, A. and Dinstein, I. 1994. Using simple decomposition for smoothing and feature point detection of noisy digital curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(8):808–813.

Reisfeld, D., Wolfson, H., and Yeshurun, Y. 1995. Context-free attentional operators: The generalized symmetry transform. International Journal of Computer Vision, 14(2):119–130.

Rohr, K. 1992. Recognizing corners by fitting parametric models. International Journal of Computer Vision, 9(3):213–230.

Rohr, K. 1994. Localization properties of direct corner detectors. Journal of Mathematical Imaging and Vision, 4(2):139–150.

Romeny, B.M., Florack, L.M.J., Salden, A.H., and Viergever, M.A. 1994. Higher order differential structure of images. Image and Vision Computing, 12(6):317–325.

Semple, J.G. and Kneebone, G.T. 1952. Algebraic Projective Geometry. Oxford Science Publications.

Shi, J. and Tomasi, C. 1994. Good features to track. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Seattle, Washington, USA, pp. 593–600.

Shilat, E., Werman, M., and Gdalyahu, Y. 1997. Ridge's corner detection and correspondence. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Puerto Rico, USA, pp. 976–981.

Shin, M.C., Goldgof, D., and Bowyer, K.W. 1998. An objective comparison methodology of edge detection algorithms using a structure from motion task. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Santa Barbara, California, USA, pp. 190–195.

Shin, M.C., Goldgof, D., and Bowyer, K.W. 1999. Comparison of edge detectors using an object recognition task. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Fort Collins, Colorado, USA, pp. 360–365.

Smith, S.M. and Brady, J.M. 1997. SUSAN—A new approach to low level image processing. International Journal of Computer Vision, 23(1):45–78.

Tomasi, C. and Kanade, T. 1991. Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University.

Zhang, Z., Deriche, R., Faugeras, O., and Luong, Q.T. 1995. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence, 78:87–119.