

www.elsevier.com/locate/patrec

Pattern Recognition Letters 28 (2007) 1222–1231

A feature-based matching scheme: MPCD and robust matching strategy

Wenbo Zhang a,*, Xinting Gao a, Eric Sung a, Farook Sattar a, Ronda Venkateswarlu b

a School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
b Institute for Infocomm Research, Singapore 119613, Singapore

Received 6 June 2005; received in revised form 25 May 2006
Available online 20 February 2007

Communicated by H.H.S. Ip

Abstract

This paper presents a scheme that matches interest point features detected on two images taken from different points of view. To accomplish this objective, we jointly consider the corner detection and matching problems. Firstly, a new multi-scale Plessey corner detector (MPCD) is used to detect the interest points. Secondly, the geometric constraint between two images is exploited, based on which we propose a new energy function that can approximate the 2D affine transformation in a more efficient way. Only a small set of corners with the highest accuracy and robustness is considered in the first stage; consequently, a small data space is provided to the robust algorithm in the second stage, reducing the computation time. Therefore, more information can be incorporated into our scheme in both the corner detection and matching phases. We compare our method using the proposed MPCD with two standard corner detectors on the basis of image matching. We also evaluate our proposed matching strategy against Zhengyou's method [Deriche, R., Zhang, Z., Luong, Q.-T., Faugeras, O., 1994. Robust recovery of the epipolar geometry for an uncalibrated stereo rig. In: European Conference on Computer Vision, Stockholm, Sweden, pp. 567–576]. Our scheme provides a new viewpoint and better results for the traditional feature matching problem.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Corner detection; Stereo matching; Epipolar geometry; Structure from motion

1. Introduction

A traditional problem in the computer vision community is to find the correspondence between images, whether the two images are acquired simultaneously by a binocular stereo rig or sequentially by a moving camera (Hartley and Zisserman, 2000). The established correspondence can then pave the way for 3D reconstruction, tracking, recognition and other applications. The correspondence problem can be decomposed into sparse feature matching and dense area matching. In this paper, we focus on feature-based matching between two images. Although many methods have been proposed, we examine this problem from a different viewpoint: we consider feature detection and feature matching as a whole. Contributions are made at both stages, and the results of feature detection facilitate the feature matching algorithm. Our method gives a new point of view on this traditional problem.

0167-8655/$ - see front matter © 2007 Elsevier B.V. All rights reserved.

doi:10.1016/j.patrec.2007.02.004

* Corresponding author. Fax: +65 67905868. E-mail address: [email protected] (W. Zhang).

The relationship between two images of the same object from different viewpoints can be described by a projective transformation. Many previous works assume an affine camera and hence describe this relationship by an affine transformation. For an approximately planar surface, i.e. when the object depth variation is far less than its distance to the camera, the affine transformation can be an effective approximation (Hartley and Zisserman, 2000). However, in practice most objects are not planar. To match the real-world situation, we can assume that the object surface is continuous and piecewise planar. Then, the local 2D affine transformation between neighboring areas of corresponding features in the two images can be exploited to match


point features, while the global transformation remains projective. This basic assumption is widely used in the computer vision community, as in (Hartley and Zisserman, 2000; Jung and Lacroix, 2001; Deriche et al., 1994).

Various methods for corner detection and feature matching have been proposed. On one hand, corner detection may be categorized into three types: template-based corner detection, edge-detection-based corner detection and direct corner detection. Template-based corner detection, such as in (Rangarajan et al., 1989; Singh and Shneier, 1990), first sets up mathematical models of the corner structure, then uses correlations between the models and the image to detect the corners. As the models cannot cover all types of corners, which have different orientations and subtended angles, the performance is not satisfactory in practical applications. Edge-detection-based corner detection takes corner points as the high-curvature points on the edges. This approach detects the edges first, then detects the corner points on the edge or contour using various methods. Lee et al. (1995) and Quddus and Fahmy (1999) use wavelets to decompose the contours and find the corner points on them. Cooper et al. (1993) have introduced two corner detectors: one uses the dissimilarity along the contour direction that emerges at a corner, and the other estimates the curvature along the contour direction. In (Mokhtarian and Suomela, 1998), Mokhtarian et al. apply a scale-space technique to the contour to find the corners. Beyond the particular corner detection technique, the performance of this type of detector depends mainly on the success of the edge detection applied in the algorithm.

The third type is direct corner detection, which detects corner points directly from calculations based on the first or second derivatives of the image. In (Beaudet, 1978), Beaudet derives a corner measurement from the Hessian matrix, which requires calculating the second derivatives of the image. Alison Noble (1988) characterizes 2D surface features (including corner points) by the differential geometry of a facet model. In (Kitchen and Rosenfeld, 1982), Kitchen and Rosenfeld multiply the rate of change of the gradient direction by the gradient magnitude to detect corner points. In (Harris and Stephens, 1988), Harris and Stephens develop Moravec's idea (Moravec, 1977) into the famous Plessey corner detector. This method is based on first-derivative quantities. In (Smith and Brady, 1997), Smith and Brady apply a circular mask to detect corners, the so-called SUSAN detector.

On the other hand, there exists a large body of literature on matching point features between two or more images (Okutomi and Kanade, 1993; Pilu, 1997; Faugeras et al., 1993). The latest survey of non-feature-based matching is the taxonomy by Scharstein and Szeliski (2002). However, in this paper we focus only on feature-based matching, especially point corners. In (Trajkovic and Hedley, 1996), the SUSAN corner detector is applied to extract corner points. To compute the similarity between corners, they use standard cross-correlation on small patches centered over the corners.

To improve the performance, they also employ a temporal constraint, namely that only features with constant disparity are kept, as well as the geometric relationship between features. A variety of methods make use of the Plessey corner detector. In (Jung and Lacroix, 2001), an affine transformation is used to approximate the relationship between two small homologous image patches of the same continuous surface patch of the object. They explicitly compute the 2D affine transformation by an exhaustive search in the neighboring space of a pair of corners, then estimate the affine transformation and compute the repeatability. Consequently, the computational complexity is very high. Also, the affine approximation may not hold for the nearest five corners in the neighborhood. Finally, the test images used in (Jung and Lacroix, 2001) are either aerial images or images of almost planar objects, which suit the affine assumption.

In the landmark work of Deriche et al. (1994), ZNCC (zero-mean normalized cross correlation) (Faugeras et al., 1993) is used to compute matching candidates; then a relaxation technique is used to obtain initial correspondences for estimating the fundamental matrix. The final correspondences are found with the help of the fundamental matrix. The method is made robust by the use of the least median of squares (LMedS) technique. However, the nonlinear procedure of the robust algorithm leads to an unexpectedly long computation time for the true solution, especially when the data space is large and noisy. In addition, the key to their algorithm, the energy function, does not take into account the rotational effect of the object.

On closer examination, a feature-based matching method is composed of three steps: feature detection, computation of the epipolar geometry, and feature matching based on the epipolar geometry. The final result, i.e. the matched features, depends on the cooperation of the algorithms at the three stages and their respective performance. Furthermore, although finding the epipolar geometry is not explicit in feature matching, it is actually the most important step in this paper because the final results are based on it. Once features have been detected and candidate matches have been initialized, a general robust algorithm such as RANSAC or LMedS can be applied to obtain the epipolar geometry from the noisy data space (the high percentage of false matches among the matching candidates). The nonlinear procedure to distinguish false matches is computationally very expensive for a feature-based matching method, especially when the data space is large and very noisy, e.g. up to 50% false matches. Meanwhile, in practice we found that even a one-pixel displacement of the point features may result in a large change of the epipolar geometry. Hence, accurate features and a 'clean' matching candidate space of small dimension are demanded by an efficient feature-based matching method. In fact, this leads to our idea of taking both the feature detection and the matching strategy into account simultaneously to improve the overall computational efficiency. Firstly, only the most accurate corners are detected, in a multi-scale manner, to reduce the corner


delocalization problem of the Plessey corner detector and to provide a small data set for initializing matching candidates, i.e. a small matching candidate space is provided. Secondly, because the robust algorithm is efficient on a small data space, a new energy function is designed to integrate more information into the computation of the fundamental matrix: it approximates the affine transformation in a more effective way, in which the translation, rotation and uniform-scale transformations are taken into account. Finally, more matches are found by the same energy function based on the epipolar geometry. A price to pay in our first stage is that fewer corners are detected; however, this is acceptable because as few as seven point matches are enough to calculate the fundamental matrix (Hartley and Zisserman, 2000). Given the assumption it relies on, the methodology based on the above idea is applicable to the reconstruction of a great variety of objects. One such object is the human face, which is the application in this paper.

The organization of this paper is as follows. We briefly revisit the Plessey corner detector in Section 2, followed by the multi-scale Plessey corner detector. The matching strategy in (Deriche et al., 1994) is analyzed and our energy function is proposed in Section 3. The results from different corner detectors and matching strategies are illustrated and compared in Section 4. Section 5 gives the conclusion and future work.

2. The multi-scale Plessey corner detector

The origin of the Plessey corner detector can be traced to the Moravec detector (Moravec, 1977), which employs the auto-correlation of an image to detect corners. The basic idea rests on the following observation: the difference between adjacent pixels along an edge or on a uniform part of the image is small, whereas for corner points the difference is high in all directions.

The Plessey corner detector is isotropic in its response, owing to the analytic expansion about the shift origin, and robust to noise, owing to the circular Gaussian filter. It is an excellent general corner detector and the details can be found in (Harris and Stephens, 1988). However, in a feature-based matching context the corners need to be the most robust ones and as accurate as possible. The Plessey detector finds corners at a single scale, which has to be determined experimentally to produce the best results. There is a tradeoff between localization accuracy and robustness. A large scale can incorporate more local image information and produce more robust results, but with large localization displacement. A small scale can produce accurately localized results, but the robustness will be poor. Because of this tradeoff, a large percentage of corners will be detected at a large scale and hence with poor localization accuracy. In addition, the other two parameters, the size of the non-maximum-suppression window and the threshold, also have to be decided by trial and error.

To address the above problems, we propose a multi-scale analysis scheme for corner detection using the Plessey method, as follows. We set a range of scales from small to large, so that the image is transformed into the scale-space domain. At small scales, details of the structure in the image are accurately captured; at large scales, the general feature of the structure is obtained (Babaud et al., 1986). Consequently, the proposed algorithm detects the corner points over a range of scales to obtain the corners belonging to different scales. Note that since the corners are detected at different scales, we can set the size of the non-maximum-suppression window according to the size of the Gaussian kernel used in our scale-space transformation.

Bearing in mind the above observation, we replace the threshold at each scale in (Harris and Stephens, 1988) with the number of corners, n0, that we expect to detect at each scale. n0 is determined by the image characteristics and chosen to facilitate the subsequent matching algorithm in our system. We set n0 to 150 to satisfy the minimum requirement of the feature-based matching algorithm. In other words, we are assuming that an image which does not contain more than 150 corners is lacking in texture and not suitable for feature-based matching. In this way, we do not need to adjust the threshold at each scale manually. This parameter setting is consistent with both the definition of corners and the characteristics of the human vision system (HVS).

Another advantage of MPCD is that we no longer need to trade off between corner localization and the scale at which the detector operates. Instead, the corners are detected at several scales and only the corner at the smallest scale is accepted at each location; corners detected at larger scales at the same location are removed. In this way, the tradeoff between localization accuracy and robustness is made adaptively for each corner instead of for the whole image, as the Plessey detector does. This results in improvements in both localization accuracy and robustness, which are very important for feature-based matching.

The proposed corner detection algorithm has the follow-ing steps.

Step 1. Compute the ''cornerness'' measurement at different scales by varying the standard deviation of the Gaussian window as \sigma_i = 0.5:5, where the step size for \sigma_i is 0.5, i.e. i = 1:10.

L_x^2(\sigma_i) = G(\sigma_i) * (I_x)^2, \qquad (1)

L_y^2(\sigma_i) = G(\sigma_i) * (I_y)^2, \qquad (2)

L_{xy}(\sigma_i) = G(\sigma_i) * (I_x \cdot I_y), \qquad (3)

M(\cdot, \sigma_i) = \begin{bmatrix} L_x^2(\sigma_i) & L_{xy}(\sigma_i) \\ L_{xy}(\sigma_i) & L_y^2(\sigma_i) \end{bmatrix}, \qquad (4)

where I_x, I_y are the first derivatives of the image along the x and y directions, G(\sigma_i) is the Gaussian function, and M(\cdot, \sigma_i) is the second moment matrix at scale \sigma_i. The Plessey ''cornerness'' measurement R(\cdot, \sigma_i) at each scale is then calculated by the following expression:


R(\cdot, \sigma_i) = \frac{L_x^2(\sigma_i)\, L_y^2(\sigma_i) - (L_{xy}(\sigma_i))^2}{L_x^2(\sigma_i) + L_y^2(\sigma_i)}. \qquad (5)

Eq. (5) should be interpreted in the limiting sense when thedenominator approaches zero.

Step 2. For each scale, non-maximum suppression is applied to suppress multiple responses. The size of the non-maximum-suppression window is related to the size of the Gaussian window. The values of R(\cdot, \sigma_i) are then arranged in descending order and the first n0 values are chosen as corner candidates for each scale.

Step 3. From the second scale to the largest scale, check whether each corner candidate at the current scale appears at the previous scale. If not, keep it; otherwise, remove it. This procedure guarantees that each corner is detected at the smallest possible scale, so we obtain the best localization precision for each corner point.

Step 4. The final corner detection result is obtained by combining the results from Step 3 over all scales.
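As an illustration, the four steps above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the scale range and n0 follow the text, the non-maximum-suppression window tied to 3\sigma is our assumption, and the cross-scale comparison in Step 3 is simplified to exact pixel coincidence rather than a neighborhood test.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def mpcd(image, sigmas=np.arange(0.5, 5.5, 0.5), n0=150):
    """Sketch of the multi-scale Plessey corner detector (Steps 1-4)."""
    image = image.astype(np.float64)
    Ix = np.gradient(image, axis=1)               # first derivatives
    Iy = np.gradient(image, axis=0)
    kept = {}                                     # (row, col) -> smallest detecting scale
    for sigma in sigmas:
        # Step 1, Eqs. (1)-(4): smoothed second-moment entries
        Lx2 = gaussian_filter(Ix * Ix, sigma)
        Ly2 = gaussian_filter(Iy * Iy, sigma)
        Lxy = gaussian_filter(Ix * Iy, sigma)
        # Eq. (5): det(M)/trace(M), guarded where the trace vanishes
        trace = Lx2 + Ly2
        R = np.where(trace > 1e-12,
                     (Lx2 * Ly2 - Lxy ** 2) / np.maximum(trace, 1e-12), 0.0)
        # Step 2: non-maximum suppression in a window tied to the kernel size,
        # then keep the n0 strongest responses at this scale
        win = 2 * int(np.ceil(3 * sigma)) + 1
        peaks = (R == maximum_filter(R, size=win)) & (R > 0)
        ys, xs = np.nonzero(peaks)
        order = np.argsort(R[ys, xs])[::-1][:n0]
        # Steps 3-4: a location is credited to the smallest scale that found it
        for y, x in zip(ys[order], xs[order]):
            kept.setdefault((y, x), sigma)
    return sorted(kept)
```

Applied to a synthetic image of a bright square, the sketch returns corner coordinates concentrated near the square's boundary corners, each attributed to the smallest scale that responded there.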

Although the computational load of the modified Plessey detector is higher than that of the original Plessey detector, it can trade off localization and robustness adaptively. If we take into account the trial-and-error procedure required by the original Plessey detector, the proposed algorithm may need comparable overall computation time while providing better results. The results are shown in Section 4.

3. The matching strategy

In this section, the ZNCC is introduced first. Then, the basic assumption of our matching strategy is described in detail. Finally, the proposed energy function, which is the core of the matching procedure, is given.

3.1. Cross correlation

Cross correlation is one of the most commonly used standard methods to compute the similarity between two image patches. In this paper, we use the zero-mean normalized cross correlation (ZNCC):

Z(c_1, c_2) = \frac{\sum_{i=-m}^{m} \sum_{j=-n}^{n} [I_1(i,j) - \bar{I}_1] \cdot [I_2(i,j) - \bar{I}_2]}{(2m+1)(2n+1)\,\sigma_1 \sigma_2}, \qquad (6)

where c_1 and c_2 are corners in the first and second images, I_1(i,j) and I_2(i,j) are the (i,j) pixel values of the local windows in the first and second images, and '\cdot' denotes multiplication. Also,

\bar{I}_k = \frac{\sum_{i=-m}^{m} \sum_{j=-n}^{n} I_k(i,j)}{(2m+1)(2n+1)}, \quad k = 1, 2 \qquad (7)

is the average intensity of the window centered at c_k. The intensity standard deviation \sigma_k over the window is defined as

\sigma_k = \sqrt{\frac{1}{(2m+1)(2n+1)} \sum_{i=-m}^{m} \sum_{j=-n}^{n} [I_k(i,j) - \bar{I}_k]^2}, \quad k = 1, 2. \qquad (8)

Corner pairs whose ZNCC value is higher than a given threshold are accepted as matching candidates. However, one corner in the left image may be matched to several corners in the right one, and vice versa. Thus, various constraints are exploited to find the correct matches after correlation.
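As a concrete illustration, Eq. (6), with the mean of Eq. (7) and the standard deviation of Eq. (8), reduces to a few lines of Python; the function name and the flat-patch guard are our own additions.

```python
import numpy as np

def zncc(patch1, patch2):
    """Zero-mean normalized cross correlation, Eq. (6), between two
    equally sized image patches (the windows centered at c1 and c2)."""
    p1 = patch1.astype(np.float64)
    p2 = patch2.astype(np.float64)
    # (2m+1)(2n+1) * sigma1 * sigma2, with sigma as in Eq. (8)
    denom = p1.size * p1.std() * p2.std()
    if denom == 0.0:            # flat patch: similarity is undefined
        return 0.0
    return float(((p1 - p1.mean()) * (p2 - p2.mean())).sum() / denom)
```

A pair of corners is kept as a matching candidate when this score exceeds the threshold; identical patches score 1 and contrast-reversed patches score -1, independent of brightness offset or gain.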

3.2. The matching procedure

We follow the general idea of first finding a small group of robust point matches among the matching candidates and then computing the epipolar geometry between the two images. Ideally, this set of point matches should be evenly distributed over the respective images. Finally, we use the computed epipolar geometry as a guide to find a large group of matches. To find the first set of robust matches, we use the same updating strategy and robust computing method as in (Deriche et al., 1994), but with a different energy function to estimate the epipolar geometry. The proposed energy function, which is the core of the matching procedure, takes the 2D affine transformation into account and is minimized iteratively.

We assume the object is non-planar but locally continuous. Then, a small patch on the object surface can be assumed planar. This is a fair assumption for a large group of objects, including faces. Based on the above statements, a pair of images of the same object at different poses can be related by a projective transformation; for a pair of matching patches that is assumed locally planar, this transformation becomes a homography and can be approximated by an affine transformation. Our new energy function takes advantage of this affine transformation: we use scale, rotation and translation changes, without considering shear effects, to approximate the affine transformation. This approximation is valid for a binocular stereo camera or for a structure-from-motion situation with small pose changes. This image transformation assumption has been used by many proposed matching methods; however, we integrate it into a robust matching strategy, which is itself part of the point matching scheme. The following gives the implementation details of this assumption.

Suppose there are M corners in the first image and N corners in the second image from MPCD. Also suppose there are G matching candidates after ZNCC. Then, for the gth matching candidate (m_i, n_j), where g \in [1,G], i \in [1,M] and j \in [1,N], we define R(m_i) and R(n_j) as the neighborhoods of m_i and n_j, respectively. We then define the hth matching candidate (m_p, n_q), h \in [1,G] but h \neq g, as a neighbor candidate of (m_i, n_j), where m_p \in R(m_i), p \in [1,M] and n_q \in R(n_j), q \in [1,N]. From the above definitions, we can obtain the vectors u_{ip} and v_{jq} as


u_{ip} = m_p - m_i, \qquad (9)

v_{jq} = n_q - n_j. \qquad (10)

Then the 2D affine transformation A between u_{ip} and v_{jq} can be written as

v_{jq} = A u_{ip} = S_c S_h R(\alpha) u_{ip} + t, \qquad (11)

where S_c represents the scale transformation, S_h represents the shear transformation and R(\alpha) describes the rotation. t is the translation from m_i to n_j, which can be derived by subtracting their coordinates. Because we only consider uniform scale, rotation and translation transformations, Eq. (11) can be changed to:

v_{jq} \approx s R(\alpha) u_{ip} + t, \qquad (12)

where s is a uniform scale instead of S_c. Then the transformation between u_{ip} and v_{jq} can be approximated by the scale s and rotation angle \alpha, where

s(u_{ip}, v_{jq}) = \frac{\|v_{jq}\|_2}{\|u_{ip}\|_2}, \qquad (13)

\alpha(u_{ip}, v_{jq}) = \arctan\!\left( \frac{\|u_{ip} \times v_{jq}\|_2}{u_{ip} \cdot v_{jq}} \right). \qquad (14)

Then based on the scale and rotation changes derived here,we can establish an energy function in the following way.
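Under the assumptions of Section 3.2, the scale and rotation changes of Eqs. (13) and (14) for a pair of displacement vectors can be computed as below. This is our own sketch: it uses the two-argument arctangent of the 2D cross and dot products, which recovers a signed angle over the full range rather than the bare arctangent of Eq. (14).

```python
import numpy as np

def scale_rotation(u, v):
    """Scale s (Eq. (13)) and rotation angle alpha (Eq. (14)) relating
    the displacement vectors u = u_ip and v = v_jq."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    s = np.linalg.norm(v) / np.linalg.norm(u)   # ||v||_2 / ||u||_2
    cross = u[0] * v[1] - u[1] * v[0]           # 2D cross product (scalar)
    alpha = np.arctan2(cross, float(u @ v))     # signed rotation angle
    return s, alpha
```

For example, a displacement that doubles in length and turns a quarter turn yields s = 2 and alpha = pi/2.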

3.3. A new energy function

Our energy function is based on the idea of the traditional relaxation technique, which can recognize the true data in a noisy data set. In our situation, the relaxation idea can be interpreted as follows. The true matches will satisfy the same projective transformation globally and the same affine transformation locally, so neighboring correct matches will be consistent with these transformations. On the contrary, since neighboring false matches will not be consistent with any local or global transformation, they can be treated as Gaussian noise. Bearing this observation in mind, we can assign to each match a measure of strength derived from its neighbors, which indicates the correctness of the match. A correct match then gains strong support from its consistent relationship with its neighbors, while a bad match can obtain only weak support from its neighbors because they are not consistent with any transformation.

Following the above analysis, the energy function is simply the sum of the strengths of the matches and is minimized in a gradient descent manner: in each iteration the weakest group of matches is eliminated, so the associated match strength is deducted from the energy function. The eliminated matches' weak but positive support for strong matches is also removed. Because all support is positive, this removal decreases the strong matches' strength. As a result, the energy function decreases monotonically over the iterations, and reaches its minimum when no further matches can be removed by the same mechanism as in (Deriche et al., 1994). The minimum may be a local one and the remaining matches may still contain false ones. However, the percentage of false matches has been greatly reduced and the remainder can be detected easily by a robust algorithm. The details are provided later.

Our strength of match is defined based on the idea of explicitly calculating the local 2D affine transformation. As mentioned in Section 3.2, if there are K neighbor matching candidates for a given matching candidate (m_i, n_j), their corresponding scale and rotation transformations can be calculated and collected as follows:

B = \begin{bmatrix} s_1 & s_2 & \ldots & s_K \\ \alpha_1 & \alpha_2 & \ldots & \alpha_K \end{bmatrix}, \qquad (15)

where s_k, \alpha_k, k \in [1,K], are the scale and rotation changes corresponding to the kth neighbor matching candidate. Since not all of the matching candidates are correct matches, the affine transformations computed from them will not be homogeneous. In other words, the scale and rotation changes based on wrong matches will be outliers, and should be discarded or assigned a low weight. The problem is then how to detect the outliers and discard them.

Our idea is that we do not make a crisp distinction between inliers and outliers; instead, we weight each candidate's membership according to its distance from the correct transformation. The critical question then becomes: what is the correct local transformation?

Assuming that the total number of outliers is not more than 50%, the medians of the scale and angle changes are taken as the correct changes. Then the difference between each scale and angle change and the corresponding median is computed as a distance. When the affine approximation is valid, we obtain the heuristic condition that the smaller the difference, the stronger the support of the neighbor for the given matching candidate. Furthermore, to compensate for the difference in magnitude between scale and angle changes, the squared differences of both scale and angle are normalized to unity, and different weights are provided.

We assumed that the small patch around a corner is continuous and that its different views can be related by an affine transformation. However, this assumption is only valid when the patch is small enough; that is to say, only a sufficiently small patch can be assumed locally planar, so that the transformation between a pair of patches is affine. As a result, the approximated transformation performs better when the neighbor matching candidate is close than when it is far away. Based on this analysis, we assign greater weight to closer neighbor candidates than to distant ones.

Then, for the gth matching candidate (m_i, n_j), which has K neighbor matching candidates, the strength of match SM_g(m_i, n_j) is defined as

SM_g(m_i, n_j) = \frac{1}{K} \sum_{k=1}^{K} \left[ 10 \cdot 2^{-0.1 D_k} \cdot e^{-[\gamma\, Es_k / Es_K + (1-\gamma)\, Ea_k / Ea_K]} \right]. \qquad (16)

Fig. 1. The illustration of affine approximation.

Fig. 2. Illustration for energy function.


In Fig. 2, three neighbor candidates are illustrated for the current matching candidate (m_i, n_j). Referring to this figure, the details of Eq. (16) are as follows. Let k \in [1,K] index the kth neighboring matching candidate of (m_i, n_j); D_k is the average 2-norm of the two corresponding vectors (see Fig. 1):

D_k = \frac{\|u_{ip}\|_2 + \|v_{jq}\|_2}{2} \qquad (17)

This average distance is integrated into the function as 2 raised to the power -0.1 D_k, which is a strictly monotonically decreasing function of D_k. D_k describes the distance from the current matching candidate (m_i, n_j) to its neighbor candidate. Larger weights are assigned to smaller D_k because the local object surface can be approximated as planar more accurately when D_k is small.

The factor of 10 in Eq. (16) is introduced in order to prevent large machine error in case the matching strength value is too small. The normalizing terms Ea_K and Es_K are the 2-norms of Ea_k and Es_k over all k \in [1,K], respectively:

Ea_K = \left[ \sum_{k=1}^{K} Ea_k^2 \right]^{1/2}, \qquad (18)

Es_K = \left[ \sum_{k=1}^{K} Es_k^2 \right]^{1/2}, \qquad (19)

where Ea_k is the squared difference of a_k from a_median:

Ea_k = (a_k - a_{median})^2   (20)

and Es_k is the squared difference of s_k from s_median:

Es_k = (s_k - s_{median})^2.   (21)

Here a_median and s_median are the medians of a_k and s_k over k ∈ [1, K]. After normalization we have Es_k/Es_K ∈ [0, 1] and Ea_k/Ea_K ∈ [0, 1], so that neither term can dominate their sum except through the assigned weight c, which balances the scale transformation against the rotation transformation. In practice, we set c = 0.4 to emphasize rotation slightly.
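Putting Eqs. (16)–(21) together, the strength of match for one candidate can be sketched as below. This is our own minimal reading of the formulas, not the authors' implementation; the argument names (per-neighbor distances `d`, scale changes `s`, rotation angles `a`) are assumptions:

```python
import numpy as np

def matching_strength(d, s, a, c=0.4):
    """Strength of match SM_g for one matching candidate (Eq. (16)).

    d, s, a: length-K arrays holding, for each neighbor candidate,
    the average distance D_k (Eq. (17)), the local scale change s_k
    and the local rotation angle a_k. c weights scale vs. rotation.
    """
    es = (s - np.median(s)) ** 2            # Eq. (21)
    ea = (a - np.median(a)) ** 2            # Eq. (20)
    es_k = np.linalg.norm(es) or 1.0        # Eq. (19); guard all-zero case
    ea_k = np.linalg.norm(ea) or 1.0        # Eq. (18)
    w = 10.0 * 2.0 ** (-0.1 * d)            # distance weight with factor 10
    return np.mean(w * np.exp(-(c * es / es_k + (1 - c) * ea / ea_k)))
```

With perfectly consistent, coincident neighbors (all scale and angle changes equal, D_k = 0) the exponential is 1 and the strength reaches its maximum of 10; distant or inconsistent neighbors pull it down, which is exactly the behavior the energy function is designed to reward.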

An implicit factor in Eq. (16) is the definition of the neighborhood area. Because we already assign high weights to near neighbor matching candidates and low weights to far ones, we define our neighborhood as follows, taking the first image as reference:

R(m) = \{\, u : \|u\|_2 \in [r, R] \,\},   (22)

where R is determined by the validity of the affine assumption: the depth variation over the area of radius R must be much less than the distance from the corresponding object surface to the camera. Unfortunately, the depth variation is unknown in this context. Therefore, R is set to 1/10 of the image width by experience for a normal lens, i.e. neither wide-angle nor macro. The inner radius r is necessary to prevent large error: suppose u_ip is free of noise but v_jq has an absolute error Δv; then Δs/s increases as both ||u_ip||_2 and ||v_jq||_2 decrease. In other words, when the neighbor matching candidate gets close to the current matching candidate, the relative error of the scale and angle changes becomes large. Therefore, we set r to 5 pixels on the assumption that ||Δv||_2 is not more than 0.5 pixel.
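The annulus of Eq. (22) is straightforward to realize; the following sketch (our own, with the image-width default taken from the text above) selects the neighbor corners of a given corner:

```python
import numpy as np

def annulus_neighbors(center, corners, r=5.0, R=None, image_width=640):
    """Select neighbor corners u with r <= ||u - center||_2 <= R (Eq. (22)).

    R defaults to 1/10 of the image width, as suggested in the text;
    r excludes too-close neighbors whose scale/angle estimates are noisy.
    """
    if R is None:
        R = image_width / 10.0
    dist = np.linalg.norm(corners - center, axis=1)
    return corners[(dist >= r) & (dist <= R)]
```

For a 640-pixel-wide image this keeps neighbors between 5 and 64 pixels away, discarding both the unreliable near candidates and the far ones where the affine approximation breaks down.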

By defining the squared difference from the median of the local transformations, our energy function integrates the affine approximation assumption effectively: the smaller the squared difference, the stronger the contribution of the neighboring matching candidate to the current one. Translation, scale and rotation transformations are all considered.

4. Results and analysis

In this section, we perform a controlled experiment to compare the performance of our method with others. Three corner detectors (Plessey, SUSAN and MPCD) and two matching strategies (Zhengyou's method and ours) are tested. The controlled experiment combines each corner detector with one of the two matching strategies, so that the only difference between methods is the target of our comparison while all other experimental conditions remain the same. We then compute the epipolar geometry between images and the epipolar lines. To measure the performance quantitatively, we define the residual error as in (Hartley and Zisserman, 2000):

\mathrm{Residual} = \frac{1}{N} \sum_{i=1}^{N} (d_{1i} + d_{2i}),   (23)

Fig. 4. Three corner detectors with our matching strategy (matching residuals in pixels against sample index for Plessey, SUSAN and MPCD, each combined with our matching strategy).

Fig. 5. Plessey corner detector with two matching strategies (matching residuals in pixels against sample index for Plessey with Zhengyou's matching and with our matching).


where d_{1i} and d_{2i} are the distances of the ith matching candidate to its corresponding epipolar line in each image, and N is the number of matches. The error is thus the average of the summed distances. Finally, a brief analysis of computational complexity compared with the other algorithms is given.
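Eq. (23) is simple to compute once the epipolar lines are known. The sketch below (our own helper names; the lines are assumed in homogeneous form ax + by + c = 0) shows the symmetric residual:

```python
import numpy as np

def point_line_distance(line, point):
    """Perpendicular distance from a 2D point to a line ax + by + c = 0."""
    a, b, c = line
    x, y = point
    return abs(a * x + b * y + c) / np.hypot(a, b)

def residual(lines1, points1, lines2, points2):
    """Average symmetric distance of N matches to their epipolar
    lines (Eq. (23)): sum_i (d1_i + d2_i) / N."""
    d1 = [point_line_distance(l, p) for l, p in zip(lines1, points1)]
    d2 = [point_line_distance(l, p) for l, p in zip(lines2, points2)]
    return (np.sum(d1) + np.sum(d2)) / len(points1)
```

Note that the sum of the two per-image distances is divided by N once, so a single match lying 2 pixels off its line in one image and 4 pixels off in the other contributes 6 pixels to the average.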

4.1. Comparison 1: Different corner detectors with the same matching strategy

To compare the results of the different algorithms fairly, we carried out a controlled experiment that uses the same matching strategy with different corner detectors on the same set of images, so that only the corner detector differs between sets of results. The algorithms were run on eighteen pairs of images from the wide-baseline database of ALOI (Geusebroek et al., 2005), where we use the different corner detectors to detect point features and the same matching strategy to find the final matches between images. Fig. 3 shows the results of combining the three corner detectors with Zhengyou's matching strategy, while Fig. 4 shows the results of combining them with our matching strategy. In Fig. 3, we can see that MPCD achieves the best results on 66.7% (12) of the samples, whereas on the other six samples its performance is almost indistinguishable from the best. In Fig. 4, MPCD performs best on 94.4% (17) of the samples.

4.2. Comparison 2: Different matching strategies with the same corner detector

To demonstrate the performance of our matching strategy, another comparison is made on the same data sets as the previous one, with the results shown in Figs. 5–7. In this study, the two matching strategies are compared using the same corner detector, i.e. Plessey, SUSAN or the

Fig. 3. Three corner detectors with Zhengyou's matching strategy (matching residuals in pixels against sample index for Plessey, SUSAN and MPCD, each combined with Zhengyou's matching strategy).

Fig. 6. SUSAN corner detector with two matching strategies (matching residuals in pixels against sample index for SUSAN with Zhengyou's matching and with our matching).

Fig. 7. Multi-scale Plessey corner detector with two matching strategies (matching residuals in pixels against sample index for MPCD with Zhengyou's matching and with our matching).


multi-scale Plessey corner detector. For the Plessey corner detector, shown in Fig. 5, our strategy performs better 83.3% of the time (15 out of 18 samples), and on the other three samples both methods perform similarly. For the SUSAN corner detector in Fig. 6, our strategy produces better results 72.2% of the time (13 out of 18 samples), with almost indistinguishable error differences on the other five samples. The same remarks can be made for Fig. 7, where our strategy is better on 88.9% (16) of the samples and gives almost the same results on the other two.

Fig. 8. Epipolar lines on the left image: row 1 by Plessey, row 2 by SUSAN, row 3 by MPCD. Column 1 by Zhengyou's matching strategy, column 2 by our matching strategy.

4.3. Comparison 3: Performance with real-world images

In this comparison, the six combinations of the three corner detectors and two matching strategies are run on real-world images. The epipolar geometry results are drawn on the left image, as shown in Fig. 8. We first manually select 12 pairs of point matches between the images, then calculate and draw the corresponding epipolar lines in black. We also draw the epipolar lines calculated by our proposed method in white. Finally, the black lines (the epipolar lines derived from the manually selected matches, which are taken as the ground truth) and the white lines (the epipolar lines from our method) are drawn in the same image for comparison. From Fig. 8, the epipolar lines from our scheme are the closest to the black lines.

Fig. 9. Matching result: row 1 by Plessey, row 2 by SUSAN, row 3 by MPCD. Columns 1 and 2 by Zhengyou's matching strategy, columns 3 and 4 by our matching strategy.

The final matching results, computed from the epipolar geometry found by the matching strategies, are illustrated in Fig. 9. Each match is drawn as a short line whose extreme left point is in the coordinates of the left image and whose extreme right point is in the coordinates of the right image, no matter which image it is drawn on. Then, for the corresponding short lines in both images, the extreme left point in the left image is matched to the extreme right point in the right image. The residual errors of the final matches with respect to the epipolar lines derived from automatically calculated matches and from manually selected matches are shown in Table 1. Although we take the epipolar geometry derived from the manually selected matches as the ground truth, it is itself only an estimate of the truth. Nevertheless, from Table 1 we can see that, first, our matching strategy based on MPCD produces the smallest error of 0.60 pixel; second, taking Zhengyou's strategy as reference, MPCD gives a (0.81 - 0.73)/0.81 = 9.9% improvement over Plessey; third, taking the Plessey corner detector as reference, our matching strategy gives a (0.81 - 0.74)/0.81 = 8.6% improvement; and finally, our scheme gives a (0.81 - 0.60)/0.81 = 25.9% improvement over Zhengyou's matching strategy based on the Plessey corner detector.

Table 1
Residual errors of different methods (pixels)

             Zhengyou's strategy        Our strategy
             Auto        Manual         Auto        Manual
             epi-line    epi-line       epi-line    epi-line
Plessey      0.81        0.88           0.74        0.77
SUSAN        1.36        0.93           0.90        0.87
MPCD         0.73        0.84           0.60        0.69

4.4. Comparison 4: Computational complexity

The exact overall computational complexity of the whole system is difficult to obtain because different stages have different worst and best cases. However, we can decompose the system into four parts of significant computational cost and give a part-by-part comparison with the method in (Deriche et al., 1994) based on O(n) analysis:

• Corner detection: Plessey is O(l1·WH) while MPCD is O(k1·WH), where W and H are the image width and height. In MPCD, k1 = 10 is the number of scales searched; in Plessey, l1 is the number of trials, which we assume to be 3–5.

• Correlation: our system is O(PQ), the same as in (Deriche et al., 1994), where P is the number of corners and Q is the average number of corners in the local search window.

• Relaxation: our energy function is O(k2·M·K·k3), whereas Zhengyou's method is O(l2·M·K·l3), where M is the number of matching candidates, K is the average number of neighbor candidates, and k2 and l2 are the respective iteration counts. In addition, k3 and l3 are the respective numbers of prime operations per strength of match. Since one more transformation, the rotation, is incorporated, we take k3 ≈ 1.5·l3 to avoid the worst or best case.

• LMedS for the fundamental matrix: the computational complexity is O(n) if solving the linear equations is taken as the prime operation. If n is the actual number of times the linear equations are solved, then


n = \frac{\log(1-p)}{\log\left[1-(1-e)^{8}\right]},   (24)

where p is the confidence level and e is the fraction of false matches remaining after the relaxation process. If we suppose e is 15% in our case and 45% in the normal case, the actual computational load of our method versus the normal one is about 1:38.
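The 1:38 figure can be checked directly from Eq. (24). The sketch below (our own helper name, assuming the customary confidence level p = 0.99) evaluates the number of 8-point subsamples needed:

```python
import math

def lmeds_samples(p=0.99, eps=0.15):
    """Number of 8-point subsamples needed so that, with confidence p,
    at least one subsample is outlier-free (Eq. (24))."""
    return math.log(1 - p) / math.log(1 - (1 - eps) ** 8)

# With 15% false matches about 15 subsamples suffice; with 45% roughly
# 548 are needed, a ratio of about 1:38 as stated in the text.
ratio = lmeds_samples(eps=0.45) / lmeds_samples(eps=0.15)
```

This makes the point of the comparison concrete: reducing the outlier rate before the robust stage shrinks the number of linear systems to solve by more than an order of magnitude.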

In summary, our scheme performs an automatic search in the corner detection phase and uses more information in the relaxation phase, which requires more computation. On the other hand, we successfully reduce the actual computational cost in the robust-algorithm stage, where the prime operation is solving the linear equations; this reduction is achieved by carefully selecting the corners and initial matches. In practice, the overall computation times are similar, while our method needs less user interaction and achieves better results.

5. Conclusion

In this paper, we present a feature-based matching methodology. We address the problem by jointly performing point-feature detection and point-feature matching: we propose a new energy function that takes more information into account than that of Zhengyou's work (Deriche et al., 1994). In addition, a multi-scale Plessey corner detector, which can detect corners at an adaptive scale at each pixel rather than at a single scale that must be traded off over the whole image as in Plessey, is used to extract feature points. An initial set of matching points between the two images is obtained by minimizing the new energy function; the epipolar geometry is then estimated from these initial matches. The results of the three corner detectors (Plessey, SUSAN and MPCD) combined with the two matching strategies (Zhengyou's and the proposed one) are compared, showing that our method produces more precise and robust results. Our method also views the traditional feature matching problem from a new perspective. Future work could include reducing the computational complexity of each phase of the scheme.

References

Alison Noble, J., 1988. Finding corners. Comput. Vision Graphics Image Process. 6 (2), 121–128.

Babaud, J., Witkin, A.P., Baudin, M., Duda, R.O., 1986. Uniqueness of the Gaussian kernel for scale-space filtering. IEEE Trans. Pattern Anal. Machine Intell. PAMI-8 (January), 26–33.

Beaudet, P.R., 1978. Rotational invariant image operators. In: Fourth International Conference on Pattern Recognition, pp. 579–583.

Cooper, J., Venkatesh, S., Kitchen, L., 1993. Early jump-out corner detectors. IEEE Trans. Pattern Anal. Machine Intell. 15 (August), 823–828.

Deriche, R., Zhang, Z., Luong, Q.-T., Faugeras, O., 1994. Robust recovery of the epipolar geometry for an uncalibrated stereo rig. In: European Conference on Computer Vision, Stockholm, Sweden, pp. 567–576.

Faugeras, O., Hotz, B., Mathieu, H., Vieville, T., Zhang, Z., Fua, P., Theron, E., Moll, L., et al., 1993. Real time correlation-based stereo: Algorithm, implementations and applications. Technical Report RR-2013, INRIA.

Geusebroek, Jan-Mark, Burghouts, Gertjan J., Smeulders, Arnold W.M., 2005. The Amsterdam library of object images. Internat. J. Comput. Vision 61 (1), 103–112.

Harris, Chris, Stephens, Mike, 1988. A combined corner and edge detector. In: Proceedings of the Fourth Alvey Vision Conference, Manchester, pp. 147–151.

Hartley, R., Zisserman, A., 2000. Multiple View Geometry in Computer Vision. Cambridge University Press.

Jung, I.-K., Lacroix, S., 2001. A robust interest point matching algorithm. In: International Conference on Computer Vision, Vancouver, Canada, pp. 538–543.

Kitchen, L., Rosenfeld, A., 1982. Grey level corner detection. Pattern Recognition Lett., 95–102.

Lee, J.-S., Sun, Y.-N., Chen, C.-H., 1995. Multiscale corner detection by using wavelet transform. IEEE Trans. Image Process. 4, 100–104.

Mokhtarian, F., Suomela, R., 1998. Robust corner detection through curvature scale space. IEEE Trans. Pattern Anal. Machine Intell. 20, 1376–1381.

Moravec, H., 1977. Towards automatic visual obstacle avoidance. In: Proceedings of the 5th International Joint Conference on Artificial Intelligence, August, p. 584.

Okutomi, M., Kanade, T., 1993. A multiple-baseline stereo. IEEE Trans. Pattern Anal. Machine Intell. 15 (4), 353–363.

Pilu, M., 1997. Uncalibrated stereo correspondence by singular value decomposition. In: Computer Vision and Pattern Recognition.

Quddus, A., Fahmy, M., 1999. Fast wavelet-based corner detection technique. Electron. Lett. 35 (4), 287–288.

Rangarajan, Krishnan, Shah, Mubarak, Van Brackle, David, 1989. Optimal corner detector. Comput. Vision Graphics Image Process. 48, 230–245.

Scharstein, D., Szeliski, R., 2002. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Internat. J. Comput. Vision 47 (1/2/3), 7–42.

Singh, A., Shneier, M., 1990. Gray level corner detection: A generalization and a robust real time implementation. Comput. Vision Graphics Image Process. 51, 54–69.

Smith, S.M., Brady, J.M., 1997. SUSAN – a new approach to low level image processing. Internat. J. Comput. Vision 23 (1), 45–78.

Trajkovic, M., Hedley, M., 1996. Fast feature detection and matching for machine vision. In: Proceedings of the 7th British Machine Vision Conference, University of Edinburgh, Scotland, September 1996, pp. 93–102.