
Matching Maximally Stable Extremal Regions Using Edge Information and the

Chamfer Distance Function

Pantelis Elinas

Australian Centre for Field Robotics

The University of Sydney

Sydney, Australia

[email protected]

Abstract

We consider the problem of image recognition using lo-

cal features. We present a method for matching Maximally

Stable Extremal Regions [21] using edge information and

the chamfer distance function [2]. We represent MSERs us-

ing the Canny edges of their binary image representation in

an affine normalized coordinate frame and find correspon-

dences using chamfer matching. We evaluate the perfor-

mance of our approach on a large number of data sets com-

monly used in the computer vision literature and we show

that it is useful for matching images under large affine and

viewpoint transformations as well as blurring, illumination

changes and JPEG compression artifacts.

1 Introduction

In this paper, we consider the problem of matching im-

ages of planar scenes using local features. Specifically, we

present a new method for matching Maximally Stable Ex-

tremal Regions (MSERs) [21] using the bidirectional cham-

fer distance function [28] computed over the MSERs’ affine

normalized connected component edges.

Image matching using local features has received much

attention in recent years because several proposed methods

have been shown to perform robustly under image noise,

changes in illumination, scene clutter and affine transfor-

mations. Local features are most commonly either corner

points, e.g., Harris corners [16], or connected components,

e.g., image regions that have similar intensity values. Here,

we focus on methods that work on gray scale images even

though many popular approaches have extensions that can

handle color images. Our work is concerned with match-

ing Maximally Stable Extremal Regions that are connected

component regions stable over a range of gray-scale thresh-

old values.

In order to match the regions extracted from two dif-

ferent images of the same scene, a descriptor is computed

that holds information about the signal structure over the

region’s extent. Given the descriptors and a distance func-

tion, we can compute correspondences among the features

extracted from two images. In general, the type of descrip-

tors that have been shown to perform best are based on his-

tograms of gradient orientations even though for MSERs

descriptors based on moment invariants were the first con-

sidered [15, 21].

We propose the use of the bidirectional chamfer distance

function for matching MSERs using edge information. We

first normalize a detected region using information from its

covariance matrix. Second, we compute the Canny edges of

the region’s connected component in the normalized frame.

Third, we evaluate the distance transform over the normal-

ized image patch and, finally, we estimate the similarity be-

tween any two MSERs by computing their bidirectional (or

symmetric) chamfer distance. We show that our method

works well for a number of challenging data sets exhibiting

robustness under affine warping, viewpoint changes, illumi-

nation variation, and image compression artifacts.

The rest of this paper is structured as follows. In Sec-

tion 2 we review the most relevant work on image match-

ing using local features. In Section 3 we explain MSER

normalization and edge extraction. In the same Section we

also define the bidirectional chamfer distance function and

explain how we use it for MSER matching. In Section 4 we

experimentally evaluate our method using a large number

of data sets common in the computer vision literature. We

conclude and discuss future work in Section 5.

2 Previous work

Image matching using local features has received much

attention since the original work of Schmid et al. [27]. The

basic idea is to represent an image using a collection of


local features, i.e., keypoints or corners, identified using a

descriptor capturing the signal signature over a small rect-

angular support region around the feature’s image center.

Most often, the keypoints selected are Harris corners [16]

made scale invariant using Lindeberg’s scale space the-

ory [19]. Some notable descriptors that work with varying

levels of success include oriented filter banks, shape con-

text, and gradient orientation histograms [1, 4, 5, 20]. For

robustness to affine image transformations, a keypoint’s sup-

port region can be normalized using one of several methods

including edge information [15], interest point groups [6],

or shape adaptation [3, 22].

An alternative method for feature selection involves de-

tecting image blobs using the Laplacian, salient regions or

the segmentation of an image into regions of similar inten-

sity. A robust segmentation method recently proposed is

the Maximally Stable Extremal Region (MSER) detector of

Matas et al. [21]; MSERs are based on an extension of the

Watershed segmentation algorithm [29] and were originally

developed for image matching in wide-baseline stereo. A

recent study comparing different region detectors showed

that MSERs work best for many scenes when measuring

detection repeatability under a large number of image trans-

formations [24]. MSERs were also recently extended to

a multi-scale representation and to work with color im-

ages [11].

MSERs can be robustly detected in images that have un-

dergone a wide variety of transformations. Researchers have

proposed several methods for matching MSERs. Matas et

al. matched MSERs using invariants based on complex mo-

ments. Forssen et al. [12] used a shape descriptor based

on the Scale Invariant Feature Transform (SIFT) [20] com-

puted over the normalized binary image of the connected

components; they enhanced these descriptors by consider-

ing pairs of nearby features and also by considering tex-

ture information computed over the original gray-scale im-

age. Most importantly, they suggested that the SIFT

vector be estimated over the affine normalized binary con-

nected component instead of the original gray scale image

values.

We build upon this idea to match MSERs using the

edge information of the connected component. We do not

use complex moments following Matas et al. [21] or a

histogram-based descriptor like the one used by Forssen

et al. [12]; instead, we match MSERs using their edge

pixels and the chamfer distance function which has been

successfully used for edge-based, object class recogni-

tion [8, 25, 26].

The chamfer distance function has been utilized exten-

sively in detecting objects from their contour outline and

for matching Pictorial Structures [10, 18]. Gavrila [14] de-

veloped a Bayesian approach for detecting pedestrians in

images using a hierarchy of contour templates.

Figure 1. Examples of detected MSERs for two different images; the features are shown by their covariance ellipses. For clarity we only show (a) 70 and (b) 25 regions.

In Pictorial Structures, objects are represented as a set of edge seg-

ments forming a codebook, along with a model of their spa-

tial distribution. Each segment can be localized in an edge

map using the chamfer distance function that we describe in

more detail in Section 3.2. Each localized segment votes for

the location of the object’s bounding box using the known

spatial distribution model. Shotton et al. [28] propose ori-

ented chamfer matching which extends the basic algorithm

to be orientation invariant; in the same work, they propose

a method for matching at multiple scales along with an ap-

proximation technique for faster computation. They show

experimentally that their method is excellent for object class

recognition using edge information.

3 Proposed method for MSER matching

We propose matching MSERs using their edge pixels

and the chamfer distance function. In Section 3.1, we de-

scribe the extraction and affine normalization of MSERs

and their edge-based representation. In Section 3.2, we de-

scribe the bi-directional chamfer distance function and its

use for matching MSERs.

3.1 Extracting and normalizing MSERs

Given an image, we extract MSERs using the method

originally described by Matas et al. [21]. Figure 1 shows

examples of detected MSERs. For each region, we compute

a covariance matrix C of the distribution of the component’s

pixels. We can use this matrix to compute a normalizing

transform as follows [12],

$$x' = sAx + t \qquad (1)$$

where $A = 2VD^{1/2}$ comes from the eigenvalue decomposition of the covariance matrix, $C = VDV^T$. The matrix $A$ maps points $x$ from the normalized coordinate frame to points $x'$ in the original image. In Equation 1, $t$ is a position offset and the constant $s$ is a scale factor which we set to the value 1.75 for all our experiments; this scale factor guarantees that the connected component pixels are all included in the normalized image patch.

Figure 2. Normalized MSER binary components (left column), the detected Canny edges (middle column), and the distance transform (right column). The first 2 rows show regions from the image in part (a) of Figure 1 and the last 2 rows show examples from the image shown in part (b) of the same Figure.

The above normalization is correct up to a rotation.

We make the regions rotation invariant using the common

method of evaluating a histogram of gradient orientations

over the normalized image patch and then selecting the ori-

entation at the largest histogram bin. In addition, more than

one orientations can be considered if there are several strong

peaks in the histogram [12, 20]. Figure 2 shows a few ex-

amples of normalized MSERs taken from the images shown

in Figure 1.
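As a concrete illustration of this normalization, the following is a minimal Python sketch (our own, not the authors' implementation), assuming OpenCV and NumPy. The function name normalize_mser, the Canny thresholds, and the patch-grid convention for the normalized frame are our choices; the dominant-orientation step is omitted for brevity.

```python
import cv2
import numpy as np

def normalize_mser(shape, pixels, patch_size=71, s=1.75):
    """Affine-normalize one MSER connected component (Section 3.1, Eq. 1).

    shape:  (rows, cols) of the source image
    pixels: (N, 2) integer array of (x, y) coordinates of the component,
            e.g. one region from cv2.MSER_create().detectRegions(gray)[0]
    """
    # Binary image of the connected component.
    mask = np.zeros(shape, np.uint8)
    mask[pixels[:, 1], pixels[:, 0]] = 255

    pts = pixels.astype(np.float64)
    t = pts.mean(axis=0)                    # position offset t in Eq. (1)
    C = np.cov(pts, rowvar=False)           # 2x2 covariance of the pixels
    D, V = np.linalg.eigh(C)                # C = V D V^T
    A = 2.0 * V @ np.diag(np.sqrt(D))       # A = 2 V D^(1/2)

    # Eq. (1): x_img = s*A*x_norm + t. We take the normalized frame to be
    # the patch grid, centred and rescaled to [-0.5, 0.5] (our convention).
    M = s * A / patch_size
    offset = t - M @ np.array([patch_size / 2.0, patch_size / 2.0])
    M_full = np.hstack([M, offset[:, None]])

    # With WARP_INVERSE_MAP, cv2.warpAffine samples the source image at
    # M_full * dst, i.e. exactly the patch -> image map built above.
    patch = cv2.warpAffine(mask, M_full, (patch_size, patch_size),
                           flags=cv2.INTER_NEAREST | cv2.WARP_INVERSE_MAP)
    edges = cv2.Canny(patch, 100, 200)      # edge map of the binary patch
    return patch, edges
```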

3.2 Chamfer matching

Chamfer matching [2] has been used extensively for

contour-based object class recognition [28]. The chamfer

distance function is defined as follows,

$$d^{(E_m,E_q)}(x) = \frac{1}{N} \sum_{x_t \in E_m} \min_{x_e \in E_q} \left\| (x_t + x) - x_e \right\|_2 \qquad (2)$$

where $E_m$ and $E_q$ are the edge maps for the model and query regions respectively; most commonly these are computed using the Canny edge detector [7]. $N$ is the size of $E_m$, and $\|\cdot\|_2$ is the $l_2$ norm. The model edge map $E_m$ con-

sists of the Canny edges of the normalized connected com-

ponent as shown in Figure 2. The chamfer distance function

can be evaluated efficiently by first computing the Distance

Transform of the query edge map $E_q$, given by

$$dt_{E_q}(x) = \min_{x_e \in E_q} \left\| x - x_e \right\|_2 \qquad (3)$$

which computes the minimum distance of every pixel to an

edge pixel in $E_q$. The rightmost column of Figure 2 shows a few examples of the distance transform for MSERs taken from the images in Figure 1. Given $dt_{E_q}(x)$, we can evalu-

ate the chamfer distance between two edge maps using the

following equation,

$$d^{(E_m,E_q)}(x) = \frac{1}{N} \sum_{x_t \in E_m} dt_{E_q}(x_t + x) \qquad (4)$$

noting that $d^{(E_m,E_q)}(x) \neq d^{(E_q,E_m)}(x)$. The distance transform can be computed efficiently in time linear in the size of the edge map [9]. When this dis-

tance function is used in object class recognition, it has to

be evaluated over a large number of orientations and scales

of the model because it is not rotation and scale invariant.

In our case, we work in an affine normalized space as de-

scribed in Section 3.1 and so we need not perform such an

expensive search procedure. For the same reason, in Equa-

tion 4 we need not search over the value of $x$ that minimizes it and can safely set it to 0, taking advantage

of both the affine normalization and the fact that the chamfer

distance function is robust to small affine distortions result-

ing from errors in MSER extraction.
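A minimal sketch of Equations 3 and 4 evaluated at $x = 0$, assuming NumPy and SciPy (scipy.ndimage.distance_transform_edt computes the exact Euclidean distance transform); the function name chamfer is ours:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer(edges_m, edges_q):
    """One-directional chamfer distance d^(Em,Eq)(0), Eqs. (2)-(4).

    edges_m, edges_q: binary (0 / nonzero) edge maps of two normalized
    MSER patches, e.g. the cv2.Canny output from the earlier sketch.
    """
    # Eq. (3): distance of every pixel to the nearest edge pixel of E_q.
    # Edge pixels are the zeros of the input array, so they get distance 0.
    dt_q = distance_transform_edt(edges_q == 0)
    ys, xs = np.nonzero(edges_m)            # the N model edge pixels
    if xs.size == 0:
        return np.inf                       # guard: empty edge map
    # Eq. (4) at x = 0: average the distance transform over model edges.
    return float(dt_q[ys, xs].mean())
```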

For additional robustness, we compute the bidirectional

or symmetric chamfer distance between two MSERs. Given edge maps $E_m$ and $E_q$ for the model and query features

respectively, we evaluate the matching score using the fol-

lowing equation (shown in its general form but as explained

above, $x$ is always 0 in our case),

$$d_b^{(E_m,E_q)}(x) = d^{(E_m,E_q)}(x) + d^{(E_q,E_m)}(x) \qquad (5)$$

The bidirectional metric compensates for the fact that com-

plex edge maps tend to match well with all models because

their distance transform has small values over the entire im-

age.
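Continuing the sketch above, Equation 5 then reduces to a single line (again our illustration, reusing the hypothetical chamfer function):

```python
def chamfer_symmetric(edges_m, edges_q):
    """Bidirectional (symmetric) chamfer distance of Eq. (5) at x = 0."""
    return chamfer(edges_m, edges_q) + chamfer(edges_q, edges_m)
```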

4 Experimental evaluation

We evaluate our method using two sets of images. One

set is that used by Mikolajczyk et al. [23] for evaluating the performance of a large number of descriptors, and for which ground truth is available.

Figure 3. The reference images from the data sets used for the experimental evaluation.

The second set of images we collected

ourselves to provide additional evidence of our method’s

robustness.

We performed experiments using the first set of data

with the available ground truth, aiming to determine how

our matching method behaves as the images undergo dif-

ferent transformations. We used images from 4 different

scenes with the images undergoing rotation plus scaling,

viewpoint, blurring, and compression transformations; re-

spectively, these are the boat, graf, bikes, and ubc image

sets from [23]. Figure 3 shows the reference images for

each of the sets which consist of 6 images each and the ho-

mographies relating the reference images and all others are

known.

In Figure 4, we show the number of correct matches be-

tween the reference image and every other image for each

data set. For each MSER in the reference image, we find its

closest match in the query image as the one that minimizes

Equation 5. Among these, given the known homogra-

phy relating the two images, we consider two MSERs as

matching if the overlap error between the re-

gions’ covariance ellipses [12, 23] is less than or equal to

50%. We see that the performance of our method decreases

as the amount of warping increases regardless of its type.

We were able to find a good number of correct matches for

all pairs of images except for the 6th image in the graf se-

quence (corresponding to a 60 degree change in viewpoint).

We also notice that large changes in scale greatly reduce

matching performance.
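A sketch of this nearest-match step, reusing the hypothetical chamfer_symmetric from Section 3.2 (the ground-truth overlap-error verification is not shown):

```python
import numpy as np

def closest_matches(edges_ref, edges_qry):
    """For each reference MSER, pick the query MSER minimizing Eq. (5).

    edges_ref, edges_qry: lists of normalized edge maps, one per region.
    Returns (i, j, score) tuples; the score is kept for later thresholding.
    """
    matches = []
    for i, em in enumerate(edges_ref):
        scores = [chamfer_symmetric(em, eq) for eq in edges_qry]
        j = int(np.argmin(scores))
        matches.append((i, j, scores[j]))
    return matches
```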

One parameter that we have not specified yet is the spa-

tial resolution of the normalized image patches. Shown in

Figure 4 are the results for 6 different resolutions, from

41 × 41 to 91 × 91 at 10-pixel intervals.

[Figure 4: four plots of the number of correct matches against viewpoint angle (a) or image number (b)-(d), with one curve per patch resolution from 41×41 to 91×91.]

Figure 4. The number of correct matches for different spatial resolutions and under different kinds of image transformations: (a) viewpoint, (b) rotation and scaling, (c) blur, and (d) JPEG image compression.


[Figure 5: four ROC plots of true positive rate against false positive rate, one curve per query image (images 2-6; for graf, viewpoint changes of 20-50 degrees).]

Figure 5. ROC curves for the (a) boat, (b) graf, (c) bikes, and (d) ubc image sets. The parameter we vary is the chamfer distance threshold value.

We can see that in most cases a region size larger than or equal to 51 × 51 pix-

els generates good results, but at a resolution of 71 × 71 we

obtain the most consistent results including large changes

in viewpoint. The latter resolution provides a good compro-

mise between getting a large number of correct matches and

having low computational overhead (the lower the spatial

resolution, the fewer calculations necessary for computing

the Distance Transform).

When given images to match, we do not have the ground

truth homography and we need to estimate it from the com-

puted region correspondences using RANSAC [17]. In this

case, we need to determine what the correct matches are

such that a large number of true positives and a small num-

ber of false positives are included. To determine what con-

stitutes a good match, we look at the chamfer distances be-

tween regions and accept those below a given threshold as

correct. In order to determine the threshold value, we per-

formed an ROC-based evaluation and we show the results

in Figure 5. From these, we were able to determine that we

can set the threshold to the value 2.0, giving a good mix of true and false positives and allowing us to compute, using RANSAC, the homography relating a pair of images.
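A sketch of this final step, assuming OpenCV's cv2.findHomography; representing each region correspondence by the pair of region centres, and the 3.0-pixel reprojection threshold, are our simplifications, not details from the paper:

```python
import cv2
import numpy as np

def estimate_homography(matches, centers_ref, centers_qry, threshold=2.0):
    """Keep matches below the chamfer threshold and fit H with RANSAC.

    matches:     (i, j, score) tuples from the matching step above
    centers_*:   (N, 2) arrays of region centres in each image
    """
    good = [(i, j) for i, j, s in matches if s <= threshold]
    if len(good) < 4:                       # findHomography needs >= 4 pairs
        return None, None
    src = np.float32([centers_ref[i] for i, _ in good])
    dst = np.float32([centers_qry[j] for _, j in good])
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H, inliers
```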

Using the determined values for the spatial resolution

of the normalized MSERs and chamfer distance threshold,

we compared the performance of MSER chamfer matching

to state-of-the-art gradient orientation histogram descriptors

(SIFT [20] and GLOH [24]) and moment invariants [15].

We show the results in Figure 6. We used the programs

made available by [24] to calculate the number of correct

matches.

[Figure 6: two plots of the number of correct matches (out of 667 for graf, 214 for boat) for Chamfer, SIFT, GLOH, and moment invariants, against viewpoint angle (a) and image number (b).]

Figure 6. Comparing the performance of chamfer matching versus other popular descriptors on the (a) graf and (b) boat image sequences.

For this evaluation we only compare the performance of the different methods using the graf and boat

data. We observe that on the graf data chamfer matching

performed poorly compared to the other approaches. How-

ever, on the boat data MSER chamfer matching performed

comparably to the other methods, in some cases outperforming

the moment invariants approach. Considering that chamfer

matching is easy to implement, further study of its perfor-

mance especially for scenes seen from different viewpoints

is needed; in addition, chamfer matching could be useful in

complementing the histogram-based methods but this needs

to be investigated in more detail.

In general, we were able to correctly compute the ho-

mography for all pairs of images in our data set except for

the largest change in viewpoint in the graf sequence and the

largest changes in blurring and image compression in the

bikes and ubc sequences. In parts (a) and (b) of Figure 7, we

show representative recognition results for the image pairs

from the boat and graf data sets; specifically, we show the

results for matching the reference images with the 5th image

in each set (in the case of the graf image this corresponds to

a viewpoint change of 50 degrees). For the examples shown

in Figure 7, the numbers of matched regions out of the total number of regions in the model were (a) 27 out of 214 and (b) 32 out of 667.

Figure 7. Matching results for (a) rotation plus scaling and (b) viewpoint change (50 degrees). The left column shows the correct matches and the right column the estimated homography relating the two images (the model image bounding rectangle projected to the query image using the estimated homography).

Figure 8 shows another example of successful recog-

nition with changes in viewpoint including a considerable

amount of image clutter. In the 4 cases shown, the book was

successfully recognized and localized in each query image.

In the last one, there is a significant difference in illumina-

tion between the model and query images because the latter

was taken without the flash that was used for the other 3 images. The numbers of correctly matched regions are (a) 46,

(b) 52, (c) 48, and (d) 52 out of a total 87 MSERs in the

model.

5 Conclusions and future work

In this paper, we presented a new method for matching

MSERs using edge information and the bidirectional cham-

fer distance function. We showed experimentally that our

approach is effective for matching images of pla-

nar scenes under large viewpoint, illumination, and affine

changes as well as JPEG compression artifacts.

In future work we plan to perform a more detailed com-

parison of the proposed method with state-of-the-art meth-

ods utilizing descriptors based on gradient orientation his-

tograms. In addition, we wish to extend our method to work

with color images and address two remaining issues. The

first issue is improving our method when it comes to match-

ing images with large changes in scale. We believe that an

improvement can be achieved with the introduction of the

multi-resolution MSERs from [12]. Second, matching at

the moment requires that Equation 5 be evaluated $M \times N$ times, where $M$ and $N$ are the numbers of MSERs in the

model and query images respectively. Since these opera-

tions for each pair of regions are independent of each other

and involve only additions and a single division, we should

be able to obtain a large speed-up with an optimized imple-

mentation on a Graphics Processing Unit (GPU) [13].

6 Acknowledgements

This work is supported by the Rio Tinto Centre for Mine

Automation and the ARC Centre of Excellence programme

funded by the Australian Research Council (ARC) and the

New South Wales State Government. We would also like to thank the reviewers for their many helpful suggestions and comments.

Figure 8. Additional recognition results for a book seen from different viewpoints. The left column shows all the MSERs matched such that the chamfer distance is less than or equal to 2.0; the middle column shows the matched regions that are correctly matched after the computation of the homography relating the model and query images; and the right column shows the model image bounding rectangle projected to the query image using the estimated homography.

References

[1] A. P. Ashbrook, N. A. Thacker, P. I. Rockett, and C. I. Brown. Robust recognition of scaled shapes using pairwise geometric histograms. In BMVC '95: Proceedings of the 6th British Conference on Machine Vision (Vol. 2), pages 503–512, Surrey, UK, 1995. BMVA Press.
[2] H. G. Barrow, J. M. Tenenbaum, R. C. Bolles, and H. C. Wolf. Parametric correspondence and chamfer matching: two new techniques for image matching. In IJCAI '77: Proceedings of the 5th International Joint Conference on Artificial Intelligence, pages 659–663, San Francisco, CA, USA, 1977. Morgan Kaufmann Publishers Inc.
[3] A. Baumberg. Reliable feature matching across widely separated views. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 774–781, 2000.
[4] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359, 2008.
[5] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell., 24(4):509–522, 2002.
[6] M. Brown and D. Lowe. Invariant features from interest point groups. In Proceedings of the 13th British Machine Vision Conference, pages 253–262, Cardiff, 2002.
[7] J. Canny. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell., 8(6):679–698, 1986.
[8] O. Carmichael and M. Hebert. Shape-based recognition of wiry objects. IEEE Trans. Pattern Anal. Mach. Intell., 26(12):1537–1552, 2004.
[9] P. F. Felzenszwalb and D. P. Huttenlocher. Distance transforms of sampled functions. Technical report, Cornell Computing and Information Science, September 2004.
[10] P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. Int. J. Comput. Vision, 61(1):55–79, 2005.
[11] P.-E. Forssen. Maximally stable colour regions for recognition and matching. In IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, USA, June 2007. IEEE Computer Society.
[12] P.-E. Forssen and D. Lowe. Shape descriptors for maximally stable extremal regions. In IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, October 2007. IEEE Computer Society.
[13] J. Fung and S. Mann. Using graphics devices in reverse: GPU-based image processing and computer vision. In IEEE International Conference on Multimedia and Expo, pages 9–12, April 2008.
[14] D. Gavrila. A Bayesian, exemplar-based approach to hierarchical shape matching. IEEE Trans. Pattern Anal. Mach. Intell., 29(8):1408–1421, August 2007.
[15] L. J. V. Gool, T. Moons, and D. Ungureanu. Affine/photometric invariants for planar intensity patterns. In ECCV '96: Proceedings of the 4th European Conference on Computer Vision, Volume I, pages 642–651, London, UK, 1996. Springer-Verlag.
[16] C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of the Fourth Alvey Vision Conference, pages 147–151, 1988.
[17] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004.
[18] P. M. Kumar, P. Torr, and A. Zisserman. Extending pictorial structures for object recognition. In British Machine Vision Conference, 2004.
[19] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Norwell, MA, USA, 1994.
[20] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, 2004.
[21] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10):761–767, 2004.
[22] K. Mikolajczyk and C. Schmid. Scale & affine invariant interest point detectors. Int. J. Comput. Vision, 60(1):63–86, 2004.
[23] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 27(10):1615–1630, 2005.
[24] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. V. Gool. A comparison of affine region detectors. Int. J. Comput. Vision, 65(1-2):43–72, 2005.
[25] K. Mikolajczyk, A. Zisserman, and C. Schmid. Shape recognition with edge-based features. In Proceedings of the British Machine Vision Conference, Norwich, UK, 2003.
[26] A. Opelt, A. Pinz, and A. Zisserman. A boundary-fragment-model for object detection. In ECCV '06: Proceedings of the 9th European Conference on Computer Vision, Volume II, pages 575–588, Graz, Austria, 2006. Springer-Verlag.
[27] C. Schmid and R. Mohr. Local greyvalue invariants for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell., 19(5):530–535, 1997.
[28] J. Shotton, A. Blake, and R. Cipolla. Multiscale categorical object recognition using contour fragments. IEEE Trans. Pattern Anal. Mach. Intell., 30(7):1270–1281, July 2008.
[29] L. Vincent and P. Soille. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell., 13(6):583–598, 1991.
