Scale-Dependent/Invariant Local 3D Geometric Features and Shape Descriptors
A Thesis
Submitted to the Faculty
of
Drexel University
by
John Novatnack
in partial fulfillment of the
requirements for the degree
of
Master of Science in Computer Science
2008
© Copyright 2008 John Novatnack. All Rights Reserved.
Acknowledgements
I would like to thank my adviser Dr. Ko Nishino for his guidance and encouragement without
which this work would not have been possible. I am extremely grateful for the lessons Dr. Nishino
taught me about properly conducting research and communicating my results in papers and presen-
tations. These lessons will be invaluable in all my future endeavors. I would also like to acknowledge
and thank my defense committee: Dr. Fernand S. Cohen, Dr. Dario Salvucci and Dr. Ali Shokoufandeh.
Finally I would like to acknowledge Dr. Ali Shokoufandeh for his contributions to this work and for
his mentorship throughout my career at Drexel University.
I would like to acknowledge the following for the use of 3D mesh models or range images: the
Stanford Graphics Lab, Signal Analysis and Machine Perception Laboratory at Ohio State Univer-
sity, INRIA, Visual Computing Lab at ISTI-CNR and the Aim@Shape repository.
This material is based in part upon work supported by the National Science Foundation under
CAREER award IIS-0746717. Any opinions, findings, and conclusions or recommendations expressed
in this material are those of the author(s) and do not necessarily reflect the views of the National
Science Foundation.
Table of Contents
List of Figures
Abstract
1. Introduction
  1.1 Background and Related Work
    1.1.1 Scale-Space Analysis of Image Structures
    1.1.2 Analysis of Scale in 3D Geometric Data
  1.2 Thesis Overview
2. Geometric Scale-Space of a 3D Mesh Model and Range Image
  2.1 2D Representation of the Surface Geometry of a 3D Model
    2.1.1 Normal and Distortion Maps
    2.1.2 Multiple Normal Maps
    2.1.3 Geodesic Distance Function
  2.2 2D Representation of the Surface of a Range Image
    2.2.1 Normal Map
    2.2.2 Geodesic Distance Function
  2.3 Geometric Scale-Space
    2.3.1 Geometric Scale-Space Operator
3. Feature Detection in the Geometric Scale-Space
  3.1 Derivatives of the Normal Map
  3.2 Corners
  3.3 Edges
  3.4 Automatic Scale-Selection
  3.5 Experiments
    3.5.1 Corner and Edge Detection on 3D Mesh Models
    3.5.2 Corner Detection on Range Images
    3.5.3 Noisy Surface Normals
    3.5.4 Varying Sampling Densities
4. Scale-Dependent/Invariant Local 3D Shape Descriptors
  4.1 Exponential Map
  4.2 Scale-Dependent Local 3D Shape Descriptors
  4.3 Scale-Invariant Local 3D Shape Descriptors
  4.4 Pairwise Matching of Scale-Dependent/Invariant Local 3D Shape Descriptors
    4.4.1 Similarity of Scale-Dependent and Scale-Invariant Descriptors
    4.4.2 Pairwise Matching of Scale-Dependent Descriptors
    4.4.3 Pairwise Matching of Scale-Invariant Descriptors
  4.5 Experiments
    4.5.1 Registration of Range Images
    4.5.2 Registration of Range Images with Consistent Global Scale
    4.5.3 Registration of Range Images with Inconsistent Global Scale
5. Conclusions
  5.1 Summary
  5.2 Contributions
Bibliography
List of Figures
2.1 2D normal and distortion map of a 3D model. (a) shows the original model. (b) illustrates the dense 2D normal map. Observe that geometric features such as the creases on the palm are clearly visible. (c) shows the distortion maps corresponding to the normal maps. Darker regions have been shrunk relative to the brighter regions. Iso-contour lines illustrate the various levels of distortion induced by the embedding.
2.2 Representing a 3D model using multiple normal maps. The heat map (a) illustrates the density of 3D vertices mapped to each point in the normal map shown in Figure 2.1(b). Figure (b) illustrates the 5 clustered components, each visualized with a different color, automatically computed from the density map in Figure (a). Each component is represented with a separate complementary normal map. Figures (b) and (c) show two supplementary normal maps corresponding to two fingers of the model shown in Figure 2.1.
2.3 2D normal maps of two adjacent range images. Figure (a) shows the depth maps of two range images of the Buddha model, separated by 24 degrees. Figure (b) shows the two dense normal maps built by triangulating the range images and computing a surface normal for each point in the range image.
2.4 The geometric scale-space representation of the 3D mesh model shown in Figure 2.1 and Figure 2.2(b,c). As the standard deviation increases, fine model details are smoothed away, leaving only coarse geometric structures. For example the fingernail is quickly smoothed away, while the prominent creases on the palm remain visible even at the coarsest scale. Although the sizes of the fingernails in the two supplementary normal maps are different, the rate of smoothing is consistent due to the use of the geodesic Gaussian kernel that accounts for the distortion induced by the embedding.
3.1 Corners detected on the 2D normal map. (a) illustrates the 20 strongest corners on the 2D representation of the hand model at scale σ = 3. Observe that the corner points on the palm are primarily located where two creases converge, or where there is an acute bend in one crease. (b) shows the strongest corner on two of the finger normal maps at scale σ = 7. At this coarse scale the corners are detected on the tip of the finger.
3.2 Edges detected at one scale level (σ = 1). The edges are detected accurately on surface points with locally maximum curvature, namely 3D ridges and valleys. Here the creases of the palm form the edges.
3.3 Scale-dependent geometric corners (a) and edges (b) detected on the hand model. The corners are represented with 3D spheres that are colored and sized according to their respective scale (blue and red correspond to the finest and coarsest scales, respectively). The corners accurately represent the geometric scale variability of the model, for instance with fine corners on the palm creases and coarse corners at the tips of the fingers. The edges also encode the geometric scale variability, tracing edge segments that arise at different scales.
3.4 Scale-dependent geometric corner and edge detection results on a disc topology model (Caesar) and two genus zero models (Buddha and armadillo). The corners and edges are accurately detected across different scales. The resulting scale-dependent geometric feature set encodes the scale variability of the underlying surface geometry, resulting in a unique representation of each model.
3.5 Scale-dependent geometric corner detection on two pairs of range images, separated by 24 degrees. Despite the noise inherent to range image data, the corners are distributed across scales and reflect the relative scale of the underlying geometric structures. Additionally, the corners display a high degree of repeatability, both in location and scale, between two adjacent range images.
3.6 Scale-dependent geometric corner detection in the presence of noisy surface normals (a), and with varying surface sampling densities (b). When compared with the corners shown in Figure 3.4, the results demonstrate that the scale-dependent corners detected with our framework are largely invariant to significant input noise and variations in the sampling density.
4.1 Scale-dependent features and local shape descriptors in the geometric scale-space of a range image. Features are colored according to the scale at which they were detected, with red being the coarsest and blue the finest. A subset of the scale-dependent local shape descriptors is illustrated; together they form a hierarchical representation of the underlying scale variability in the surface of a range image and enable correspondences to be determined robustly between pairs of range images.
4.2 Matching two range images with consistent global scale, represented as a set of scale-dependent local shape descriptors. On the left we show the 67 point correspondences found with our matching algorithm and on the right the resulting rigid transformation. The hierarchy of the scale-dependent local 3D shape descriptors enables a coarse-to-fine sampling strategy that results in a large number of correspondences that accurately approximate the pairwise rigid transformation.
4.3 Matching two range images with inconsistent global scales, represented with a set of scale-invariant local shape descriptors. On the left we show the 24 point correspondences found with our matching algorithm and on the right the resulting similarity transformation. Even though the range images differ by a global scale factor of approximately 2.4, the scale-invariant local descriptors enable us to recover the similarity transformation accurately.
4.4 Fully automatic approximate registration of 15 views of the Buddha model, 12 views of the armadillo model and 18 views of the angel model with scale-dependent local descriptors. In the first column we show the initial set of range images. In the second we show the approximate registration obtained with our framework, which is further refined with ICP in the third column. Finally a watertight model is built using a surface reconstruction algorithm [18]. Observe that the initial approximation obtained with our framework is quite accurate.
4.5 Automatic registration of 42 range images: 15 views of the Buddha model, 12 views of the armadillo and 15 views of the dragon model. The accuracy of the approximate registration found with our framework enables us to automatically discover that the range images correspond to three disjoint models. Note that these registrations have not been post-processed with a global registration algorithm.
4.6 Fully automatic approximate registration of 15 views each of the Buddha and dragon models, with a random global scaling from 1 to 4. For each model we visualize the initial set of range images and the initial alignment obtained by our framework. In column three we show the results after applying ICP and in column four we show the results of the surface reconstruction. Even with the substantial variations in the global scale, our scale-invariant representation of the underlying geometry enables us to accurately approximate the registration without any assumptions about the initial poses.
4.7 Automatic registration of 42 randomly scaled range images: 15 views of the Buddha model, 12 views of the armadillo and 15 views of the dragon model. Each range image was randomly scaled by a factor between 1 and 4. Again, despite the significant scale variations, our scale-invariant representation of the underlying local geometric structures enables us to fully automatically construct initial pose estimates for all three models simultaneously.
Abstract
Scale-Dependent/Invariant Local 3D Geometric Features and Shape Descriptors
John Novatnack
Advisor: Ko Nishino
The quality and abundance of three-dimensional geometric data is rapidly increasing as 3D
acquisition hardware becomes cheaper and more effective. In fact, three-dimensional geometric
data already plays a central role in many computer vision and computer graphics applications
such as autonomous vehicle navigation, 3D object recognition and the computer-based preservation
of cultural artifacts. Despite the increasing relevance and importance of geometric data, current
techniques for processing the data have neglected to explicitly model and exploit a significant
source of information in the data: the scale variability of the local geometric structures.
In this thesis we overcome the limitations of past techniques with a comprehensive framework
for modeling the scale variations in local geometric structures, effectively adding an additional
dimension to geometric data. To accomplish this we derive the geometric scale-space, a representation
of local geometric structures at various degrees of scale. This representation enables us to define
scale-dependent geometric feature detectors, such as corners and edges, that determine not only the
location of salient geometric features, but also their relative scales. The augmentation of a geometric
feature with its intrinsic scale enables us to define scale-dependent/invariant local shape descrip-
tors that together form both a hierarchical and scale-invariant representation of the local geometric
structures of a 3D shape. We derive and present the theory of these methods and also demonstrate
their effectiveness for the purposes of robust 3D feature detection and fully automatic range image
registration.
1. Introduction
Three-dimensional geometric data plays a fundamental role in many modern systems and ap-
plications. For example in autonomous vehicle navigation systems, 3D geometric data is acquired
through laser range scanners or other sensors and is used to make navigation decisions and to detect
threats in the environment. 3D geometric data is also beginning to play a central role in
computer-based cultural preservation. Archaeologists have begun to bring 3D range scanners to dig sites so
that 3D computer models can be built as the artifacts are excavated. The overall increase in the
quality, ease of deployability and affordability of 3D acquisition systems has led to a drastic increase
in the importance of 3D geometric data in applications such as these and many more. This increase
in importance has created a need for effective algorithms to represent, acquire and process this 3D
geometric data.
When acquired with modern 3D acquisition hardware, 3D geometric data is often incredibly
complex, consisting of many millions of points or more. Data of such immense size and complexity
makes building efficient methods of processing 3D geometric data a challenging task. A fundamental
question being asked by computer vision and computer graphics researchers is how to construct
rich but concise representations of the local structures in 3D geometric data that alleviate some of
its complexity while enabling efficient and robust geometric processing algorithms.
In order to deal with the enormity of geometric data, it is common to detect local salient geometric
features that together form a more concise representation of the underlying geometry. For example,
in the case of a human bust model we may represent the underlying data by a set of points on
the nose, skin pores, hair and corners of the mouth. Local geometric features such as these play a
central role in a large number of computer vision and computer graphics applications and therefore
developing rich 3D geometric features is a fundamental problem in computer vision and computer
graphics research.
Despite the importance of detecting a rich and concise set of geometric features, most current
methods have completely ignored a significant source of information about local geometric
structures: their relative scale. Instead the scale of local structures is incorporated in an ad hoc manner,
such as with tunable parameters, which can lead to ineffective and incomplete representations of the
underlying geometric structures. The topic of this thesis is to develop a comprehensive framework
for detecting scale-dependent geometric features that may then be encoded in rich shape descriptors.
For example, consider again the case of a human bust model. Rather than simply detecting salient
geometric points on the nose, skin pores and corners of the mouth, our framework can augment each
feature with a notion of its relative scale, i.e. the nose occurs at a coarser scale than the mouth
corners, which in turn occur at a coarser scale than the skin pores. Augmenting local geometric
features with a notion of scale is a powerful paradigm that effectively adds a fourth dimension to
our representation of 3D geometric data. In fact, the central role local geometric features play
in computer vision and computer graphics makes such a technique widely applicable to any
method or application that has a significant geometric processing component.
1.1 Background and Related Work
Representing complex data with local features is not an approach specific to 3D geometry, but is
also commonly used to represent local image structures for the purposes of image processing or 2D
computer vision. In fact, the problem of incorporating notions of the scale of local image structures
was recognized early on in computer vision research and has led to a large number of effective
algorithms with strong theoretical bases.
1.1.1 Scale-Space Analysis of Image Structures
In computer vision, early feature detectors such as corners [13] and edges [27] proved to be
powerful representations of underlying local image structures. However, they suffer from the fact that
they are tied to a specific scale, often determined by the size of the local window used to detect the
feature in the image. In order to overcome this limitation researchers developed methods of analyzing
local image structures at various scales simultaneously, without the use of a tunable parameter [42,
41, 24, 19, 23]. This is achieved by constructing an image scale-space, or a representation of an image
at different scales, effectively adding a third dimension to a two-dimensional image. The scale-space
of an image is constructed by convolving the image with Gaussian kernels of increasing standard
deviation. The standard deviation of the Gaussian serves as the third dimension of the scale-space
that controls the amount of smoothing, or blurring, of the local image structures.
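The construction just described can be sketched in a few lines. This is a generic 2D image scale-space, not code from this thesis; the function name `build_scale_space` is illustrative, and SciPy's `gaussian_filter` performs the Gaussian convolution:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_scale_space(image, sigmas):
    """Stack progressively smoothed copies of a 2D image.

    Each level is the input convolved with a Gaussian of standard
    deviation sigma; sigma acts as the third (scale) dimension.
    """
    return np.stack([gaussian_filter(image, sigma) for sigma in sigmas])

# Example: a noisy 64x64 "image" smoothed at four scales.  Coarser
# levels have less variance because fine detail is smoothed away.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
sigmas = [1.0, 2.0, 4.0, 8.0]
space = build_scale_space(image, sigmas)
```

Stacking the smoothed levels along a new leading axis makes the scale dimension explicit: indexing `space[k]` selects the image at scale `sigmas[k]`.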
The scale-space representation of a 2D image is powerful in that it enables features to be detected
at all scales simultaneously. Consider, for example, an image of a house. At the base of the
scale-space is the input image, from which countless fine-scale corners, such as on the texture
of the bricks, may be detected. However, as the standard deviation of the Gaussian is increased,
the fine scale features are smoothed away leaving only coarse, or large scale features, such as on
the boundary contours of the house. The scale-space of an image allows us to detect both of these
local feature types without the need to manually tune a feature detector for the “magical number”
that will result in a feature set that represents all the underlying image structures accurately. In
fact, the inherent multi-scale nature of image structures in complex scenes may make such a number
impossible to find. An additional strength of the scale-space representation of an image is that it
is not tied to a specific feature detector. Instead a large number of feature detectors have been
derived, such as corners, edges, blobs, ridges and more [23].
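As one concrete example of a detector that operates on such smoothed images, a standard Harris-style corner response (a textbook formulation, not the operator developed later in this thesis) can be written as:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, sigma=1.0, k=0.05):
    """Harris corner response: large where the gradient varies in two directions."""
    ix = sobel(image, axis=1)  # horizontal derivative
    iy = sobel(image, axis=0)  # vertical derivative
    # Average the structure-tensor entries over a Gaussian window.
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - k * trace ** 2

# A bright quadrant produces a single corner at pixel (16, 16): the
# response there is positive, while flat regions respond with zero.
image = np.zeros((32, 32))
image[16:, 16:] = 1.0
r = harris_response(image)
```

Note how `sigma` here plays exactly the role discussed above: it fixes the size of the local window, which is why a single-scale detector of this kind is tied to one scale.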
The 2D scale-space allows us to detect not only the location of a local feature in an image, but
also its intrinsic scale, through the use of an automatic scale-selection technique [24]. Automatic
scale-selection determines the intrinsic scale of a feature by searching for local maxima in the feature
response across image scales. Augmenting local image features with a notion of their intrinsic scale
serves as an added piece of information that enhances the representational power of the feature set.
Additionally, the ability to determine the intrinsic scale of a feature enables scale-invariant local
image feature detectors [28, 25], that is, local features that are invariant to the scale of the object
in the image. Scale-invariant image features enable objects in two different images to be matched
even if they occur at drastically different scales. For example, consider the case where one image
contains a close-up of a toy and another image contains a crowded scene with a number of toys,
including the one in the first image. Scale-invariant image features enable us to match these two
instances of the same toy, despite their drastic scale differences in the two images. Scale-invariant
feature descriptors have proven to be robust to changes in lighting, noise and even slight rotations
in depth and have been shown to significantly outperform local image features that do not account
for the variations in the scale of local image structures [29].
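The scale-selection idea can be sketched for the scale-normalized Laplacian-of-Gaussian in the spirit of Lindeberg [24]. This is a generic illustration with an assumed function name (`intrinsic_scale`), not the geometric scale selection developed later in the thesis:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def intrinsic_scale(image, sigmas):
    """Per-pixel intrinsic scale from scale-normalized Laplacian responses.

    For every pixel, the scale whose normalized response magnitude is
    largest is selected as the intrinsic scale.
    """
    responses = np.stack([
        (sigma ** 2) * np.abs(gaussian_laplace(image, sigma))
        for sigma in sigmas
    ])
    best = responses.argmax(axis=0)           # winning scale index per pixel
    return np.take(np.asarray(sigmas), best)  # map index back to sigma

# A Gaussian blob of standard deviation 3 should select sigma = 3 at
# its center, since the normalized response peaks at the blob's scale.
y, x = np.mgrid[0:64, 0:64]
blob = np.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / (2.0 * 3.0 ** 2))
scale_map = intrinsic_scale(blob, [1.0, 2.0, 3.0, 4.0, 6.0])
```

The factor sigma² is the scale normalization: without it, the Laplacian response of a smoothed structure decays with sigma and the maximum would always fall at the finest scale.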
1.1.2 Analysis of Scale in 3D Geometric Data
In order to model and exploit the scale-variations in 3D geometric data the first issue is to
construct a representation of the data at different scales. One approach to this problem is to adapt
techniques of constructing a multi-resolution representation of a mesh [14, 5], where the complexity
of the mesh may be changed interactively. These methods are efficient and well supported in current
3D programming libraries; however, they suffer from the fact that they are sensitive to the sampling
of the original model. Instead, most approaches of modeling a 3D shape at different scales have
attempted to apply smoothing operators similar to 2D scale-space theory to 3D geometric data by
smoothing the 3D points, for instance with Gaussian kernels [30] or mean curvature flow [35, 4].
Although these approaches seem straightforward, they are problematic as they lead to alterations
in the global topology of the geometric data, in particular through fragmentation of the original
model [38], which leads to an erroneous representation of the geometry at different scales.
A second limitation of past approaches is that the Euclidean distance between 3D points was
used as the distance metric for modeling the geometry and detecting local features at different
scales [10, 22, 34, 20, 17]. This can lead to the creation of erroneous features when portions of the
surface are close in 3D space but have a large geodesic distance. For instance,
consider the case where we wish to detect features at the tip of each finger on a 3D human hand
model. The corner features on the tip of the finger naturally exist at a relatively coarse scale, and
therefore if the Euclidean distance is used as the distance measure the corner detector may include
both finger tips simultaneously. However, if the geodesic distance is used as the base distance
metric then the finger tips will correctly have a large distance. Using an accurate distance metric is
especially important for detecting 3D geometric features as feature repeatability and reliability are
critical when matching two 3D feature sets.
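The fingertip example can be made concrete with a toy sketch that approximates geodesic distances by running Dijkstra's algorithm over the mesh edge graph. This is only a coarse stand-in for proper geodesic computation (exact or fast-marching methods are used in practice), and the names are illustrative:

```python
import heapq
import numpy as np

def geodesic_distances(vertices, edges, source):
    """Dijkstra over the mesh edge graph; edge weight = Euclidean length."""
    adjacency = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        w = float(np.linalg.norm(vertices[a] - vertices[b]))
        adjacency[a].append((b, w))
        adjacency[b].append((a, w))
    dist = [float("inf")] * len(vertices)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adjacency[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Two "fingertips": the endpoints of a U-shaped vertex chain are only
# 1 unit apart in space, but 21 units apart along the surface.
vertices = np.array([(0.0, float(i), 0.0) for i in range(11)] +
                    [(1.0, float(10 - i), 0.0) for i in range(11)])
edges = [(i, i + 1) for i in range(21)]
dist = geodesic_distances(vertices, edges, source=0)
```

Here the Euclidean distance between the two endpoints is 1, while the along-surface distance is 21: a detector using the surface distance would never lump the two "fingertips" into one neighborhood.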
A final limitation of past methods is that they are tailored to detect only a single type of feature,
such as corners [10] or edges [33], and do not extend easily to other feature types. This is in stark
contrast to the 2D scale-space which is a general representation of the image at different scales that
has enabled a large number of feature detectors to be derived. This flexibility is important as it
enables the framework to be extended to a large number of applications.
1.2 Thesis Overview
In this thesis we present a comprehensive framework for modeling the scale variability of local
geometric structures and representing that variability in scale-dependent/invariant geometric fea-
tures and shape descriptors. We begin by presenting our technique for constructing the geometric
scale-space, a canonical representation of the scale-variations in 3D geometric data, in Chapter 2.
Similarly to the scale-space of an image, the geometric scale-space effectively adds a fourth
dimension to three-dimensional data. And like the image scale-space, it is a general representation
that may be leveraged by any number of feature detectors or other types of algorithms.
The key insight behind our approach is that we should construct a geometric scale-space represen-
tation of the surface geometry of the 3D model at hand, since we are interested in the scale-dependent
geometric features lying on the surface. This is opposed to constructing the scale-space directly on
the 3D coordinates of the points representing a 3D model. For this reason, we consider the normal
field on the 2D surface of the 3D object as the base representation of the geometry. While
higher-order geometric entities such as the mean curvature [21] can be used instead, surface normals
are a better choice: they are first-order partial derivatives of the raw geometric data, and are
therefore less affected by noise than higher-order derivatives.
Although our framework can be adapted to work with any representation of three-dimensional
geometric data, we focus in this work particularly on 3D mesh models and range images. A 3D mesh
model is a discrete representation of a 3D polyhedral object that consists of a set of 3D points, or
vertices, edges that connect the points and faces that are groups of edges that form 2D polygons,
such as a triangle in the case of a triangular mesh. As a second representation of geometric data
we also consider range images, which are dense and regular 2D images of ranges, or distances, to an
object. A range image is a common representation of a single view acquired by a laser range
finder; an object is typically captured as a large number of range images that are then combined
to construct a final 3D model.
In order to construct the geometric scale-space of a 3D mesh model or range image we first
represent the input geometry as a dense and regular “image” of the surface normals of the object.
Representing the 2D surface of a 3D object in a dense and regular 2D plane enables us to leverage
techniques from scale-space analysis of intensity images to the scale-space analysis of 3D geometry.
In the case of a 3D mesh model we construct this 2D representation by parameterizing, or mapping,
the surface of the mesh model onto a 2D plane. We then interpolate over the surface normals at
each 2D-embedded vertex of the mesh model to obtain a dense and regular 2D representation (vector
field) of the original surface, which we refer to as the normal map. In the case of a range image, the
key idea behind constructing the normal map is that a range image readily provides a 2D projection
of the original surface onto a 2D plane. The normal map can be constructed by triangulating the
range image and approximating surface normals at each vertex. With both 3D mesh models and
range images, the normal map enables us to represent a piece of 3D geometry in a manner that is
independent of the underlying sampling density of the original object.
We compute the geometric scale-space of the normal map by convolving the vector field with
Gaussian kernels of increasing standard deviation [31, 32] similarly to 2D scale-space. However, the
Gaussian kernel is modified so that distances are defined not by the Euclidean distance, but rather
the original surface geodesics. The geodesic distances are approximated efficiently in the case of a
3D mesh model with a distortion map, a novel representation of the relative distortion induced by
the parameterization at each point in the normal map. In the case of a range image the surface
geodesics are approximated directly from the range data itself. Defining the geometric scale-space
operator in terms of the surface geodesics ensures that the construction and analysis of the geometric
scale-space is equivalent to analyzing the scale-space of the normal field on the surface of the 3D
object, but in an algorithmically much simpler way, since we compute it in a regular and dense 2D
domain. Additionally, because our base representation of the geometric data is a dense and regular
2D plane, we can leverage techniques developed for 2D scale-space in developing novel methods for
the geometric scale-space.
A rich set of scale-dependent features can be extracted from the resulting geometric scale-space
representation. In particular, in Chapter 3 we derive detectors to extract geometric corners and edges
at different scales. In order to establish these detectors we derive the first- and second-order partial
derivatives of the normal map. Finally, we derive an automatic scale selection method analogous to
that of 2D scale-space theory to identify the natural scale of each feature and to unify all features
into a single set. The result is a set of scale-dependent 3D geometric features that provide a rich
and unique basis for representing the 3D geometry of the original object. The effectiveness of the
proposed method is evaluated on several range images and 3D mesh models of different topology
and its robustness to noise and mesh sampling density variation is demonstrated.
In Chapter 4 we show that we can encode the spatial extent of each local geometric feature in
a scale-dependent local 3D shape descriptor; together these descriptors form a hierarchical representation of the
local geometric structures captured in a 3D mesh model or range image. Additionally, we show how
we may define a scale-invariant local 3D shape descriptor that is invariant to the inherent local scale
of the geometry and that can be used to match a pair of 3D objects of unknown or inconsistent
global scale. We demonstrate the effectiveness of our framework of encoding the scale-variability
of 3D mesh models and range images in scale-dependent/invariant local 3D shape descriptors by
deriving a fully automatic technique for approximately registering a set of range images. We further
demonstrate the effectiveness of our framework by fully automatically registering a set of range
images corresponding to multiple 3D models simultaneously.
The framework presented in this thesis enables us to directly model the scale-variations of ge-
ometric structures in local scale-dependent features. Due to the fundamental role local geometric
features play in constructing rich and concise representation of geometric data in computer vision
and computer graphics applications, this framework has wide applicability. We conclude the thesis
with a discussion of directions for future work in this area and a number of potential applications.
In fact, we imagine that the framework can be applied in any
geometric processing method that requires a concise and rich representation of the underlying local
geometric structures.
2. Geometric Scale-Space of a 3D Mesh Model and Range Image
Geometric features that represent a 3D mesh model or range image reside on the model’s surface.
For this reason, we must construct a geometric scale-space that faithfully encodes the scale variability
of the surface geometry. We represent the geometry of a 3D mesh model or range image in a dense
and regular 2D representation. The construction of this 2D representation depends on whether the
input geometry is a 3D mesh model, which resides in three dimensions and must be mapped to a 2D
domain, or a range image, which is inherently defined on a dense and regular 2D domain. From
the 2D representation of the input geometry we construct the geometric scale-space by deriving and
applying a geometric scale-space operator that correctly accounts for the geodesic distances on the
surface.
2.1 2D Representation of the Surface Geometry of a 3D Model
We construct a 2D representation of a 3D mesh model by first unwrapping the surface of the
model onto the 2D plane. The result of this mapping is a sparse encoding of the mesh vertices in the
2D plane. We build a dense and regular 2D representation of the surface geometry by interpolating
over a geometric entity, specifically the surface normal, associated with each model vertex. Ideally,
this mapping would be isometric; in general, however, an isometric embedding cannot be achieved,
and shrinkage and expansion of mesh edges will occur. We compensate for these distortions by
representing a 3D mesh model with a set of normal maps, each associated with different components
of the mesh model. Lastly, by constructing a dense map of the distortion induced by the mapping
at each point in the 2D domain we can accurately reconstruct the geodesic distances of the original
3D mesh model. This enables us to accurately build the geometric scale-space of the original surface
geometry with our 2D representation.
2.1.1 Normal and Distortion Maps
Given a 3D mesh model M and the planar domain D we seek a bijective parameterization
φ : D → M from a discrete set of planar points to the mesh vertex set. Since we later accurately
account for the introduced distortion in the distance metric, any embedding algorithm may be
used; readers are referred to [7] for a recent survey on various R3-to-R2 parameterization algorithms.
For this work, we compute an initial parameterization based on the estimation of the
Figure 2.1: 2D normal and distortion map of a 3D model. (a) shows the original model. (b) illustrates the dense 2D normal map; observe that geometric features such as the creases on the palm are clearly visible. (c) shows the distortion map corresponding to the normal map. Darker regions have been shrunk relative to the brighter regions; iso-contour lines illustrate the various levels of distortion induced by the embedding.
harmonic map [5], and iteratively refine it using a method proposed by Yoshizawa et al. [43] which
minimizes the distortion in the surface area of the triangulation.
The result of the above embedding is a 2D sparse “image” of the 3D mesh vertices. In order
to construct a regular and dense representation of the original surface, we interpolate a geometric
entity associated with each of the vertex points in the 2D domain. Surface normals are a natural
choice for this entity because they are less affected by noise than higher-order derivative quantities
such as curvature. Furthermore, they convey directional information about
the surface geometry as opposed to scalar quantities such as mean or Gaussian curvature. Note
that 3D coordinates cannot be used since they form the extrinsic geometry of the surface. Given
a parameterization φ we construct a dense normal map N : R2 → S2, where N maps points in D
to the corresponding 3D normal vector. Figure 2.1 (b) shows an example of such a normal map.
The resulting normal map provides a dense and regular 2D representation of the original 3D mesh
model. Most importantly, the density of the normal map is independent of the mesh resolution of
the given 3D model, therefore subsequent feature detection can be achieved at arbitrary precision
on the surface and is robust to changes in the sampling density on the original 3D model.
In order to accurately construct the geometric scale-space representation of the original surface
geometry of the 3D mesh model, we require the relative geodesic distance between any two points
on the normal map. (Note that there is also a global scaling between the 3D mesh model and its
parameterized 2D image.) This geodesic distance can be computed by accounting for the distortion
introduced by the embedding. Given a point u = (s, t) ∈ D that maps to a 3D mesh vertex φ(u)
Figure 2.2: Representing a 3D model using multiple normal maps. The heat map (a) illustrates the density of 3D vertices mapped to each point in the normal map shown in Figure 2.1(b). Figure (b) illustrates the 5 clustered components, each visualized with a different color, automatically computed from the density map in Figure (a). Each component is represented with a separate supplementary normal map. Figures (c) and (d) show two supplementary normal maps corresponding to two fingers of the model shown in Figure 2.1.
we may define its distortion ε(u) as the average change in edge lengths connected to each vertex:

\varepsilon(\mathbf{u}) = \frac{1}{|A(\mathbf{u})|} \sum_{\mathbf{v} \in A(\mathbf{u})} \frac{\|\mathbf{u} - \mathbf{v}\|}{\|\phi(\mathbf{u}) - \phi(\mathbf{v})\|} , \qquad (2.1)

where A(u) is the set of vertices adjacent to u. We construct a dense distortion map by again
interpolating over the values defined at each vertex. Figure 2.1(c) depicts the distortion map of the
hand model in Figure 2.1(a).
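The per-vertex distortion of Equation 2.1 is simply an average of edge-length ratios between the embedded and original meshes. A minimal sketch in Python (the function name and the flat list-of-neighbors interface are hypothetical conveniences, not part of our implementation):

```python
import numpy as np

def distortion(u, phi_u, neighbors_2d, neighbors_3d):
    """Distortion eps(u) of Eq. 2.1: the average ratio of the 2D edge length
    to the corresponding 3D edge length over the vertices adjacent to u."""
    ratios = [np.linalg.norm(np.subtract(u, v)) / np.linalg.norm(np.subtract(phi_u, pv))
              for v, pv in zip(neighbors_2d, neighbors_3d)]
    return sum(ratios) / len(ratios)
```

A value below 1 indicates that the embedding shrank the local neighborhood; the dense distortion map is then obtained by interpolating these per-vertex values over D.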
2.1.2 Multiple Normal Maps
Mesh parameterization algorithms often require 3D models to contain a natural boundary loop
that is mapped to the boundary of the planar domain. For 3D models that do not contain such
boundaries, we need to introduce cuts that map to the boundaries of the 2D domains, for instance
one boundary cut for a genus-0 model, and compute multiple normal maps corresponding to different
surface regions. At the same time, in general, surface regions mapped close to the boundary of the
2D domain are significantly distorted. In such areas the local neighborhood of a 3D vertex is mapped
into a highly skewed region, which can introduce errors in the subsequent filtering.
We introduce boundary cuts such that they avoid surface regions that are most likely to contain
discriminative features. Specifically, for each 3D boundary cut, we manually select end points and
automatically trace vertices with low curvature; this is the opposite of methods that try to minimize
the overall distortion in the parameterization [12]. Furthermore, we construct a complementary
parameterization where portions of the model that were mapped to the perimeter of the original
embedding are mapped to the central region of D. By using this complementary parameterization
together with the original embedding, we can ensure that every surface region is mapped to a region
in the normal map where the local structure is well preserved.
There can also be a considerable loss of information due to the finite resolution of the planar
domain D. Often when a model has large appendages the discretization of the normal map results in
2D points which correspond to multiple mesh vertices. To compensate for this many-to-one mapping,
we first compute a density map of vertices as shown in Figure 2.2(a). By clustering the points of
this histogram into disjoint sets and parameterizing the corresponding portions of the mesh in each
cluster, we construct a comprehensive set of supplementary normal maps. Figure 2.2(b) shows the
5 clusters, each visualized with a different color, automatically computed from the density map.
Figures 2.2(c) and (d) show two such supplementary normal maps corresponding to two fingers of
the model shown in Figure 2.1. By representing the 3D model using multiple normal maps, we
ensure that all portions of the surface are covered by the representation.
2.1.3 Geodesic Distance Function
In order to construct a geometric scale-space of a 3D mesh model that accurately encodes the
scale-variability in the underlying surface geometry we define all operators in terms of the geodesic
distance, rather than the Euclidean 3D or 2D distances. The geodesic distance between two 3D
points φ(v) and φ(u) on a 3D mesh model is defined as the minimum length line integral between
v and u in the distortion map. We approximate this by computing the discretized line integral
d_M(v, u),

d_M(\mathbf{v}, \mathbf{u}) \approx \sum_{\mathbf{v}_i \in P(\mathbf{v},\mathbf{u}),\ \mathbf{v}_i \neq \mathbf{u}} \frac{\varepsilon(\mathbf{v}_i)^{-1} + \varepsilon(\mathbf{v}_{i+1})^{-1}}{2} \, \|\mathbf{v}_i - \mathbf{v}_{i+1}\| , \qquad (2.2)

where P(v, u) = [v, v_1, v_2, ..., v_n, u] is a list of points sampled on the line between v and u. The
quality of the approximation is determined by the sampling density of this line. Note also that the
actual geodesic is the line integral along the minimum-length path, which we approximate here with
the straight line between the two points. Although this assumption does not hold in general, we
found this geodesic distance approximation to be sufficient in practice.
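Equation 2.2 amounts to a weighted sum of segment lengths along the sampled straight line. A sketch of the computation, assuming the interpolated distortion map is available as a callable `eps` (a hypothetical interface standing in for a lookup into the dense distortion map):

```python
import numpy as np

def geodesic_mesh(p, q, eps, n_samples=100):
    """Approximate d_M(p, q) of Eq. 2.2: sample the straight 2D line from p
    to q and weight each segment length by the average inverse distortion
    at its two endpoints."""
    pts = np.linspace(np.asarray(p, float), np.asarray(q, float), n_samples)
    d = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        w = 0.5 * (1.0 / eps(a) + 1.0 / eps(b))  # average inverse distortion
        d += w * np.linalg.norm(b - a)
    return d
```

With a distortion of 1 everywhere (an isometric embedding) the approximation reduces to the Euclidean distance in the plane.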
Figure 2.3: 2D normal maps of two adjacent range images. (a) shows the depth maps of two range images of the Buddha model, separated by 24 degrees. (b) shows the two dense normal maps built by triangulating the range images and computing a surface normal for each point in the range image.
2.2 2D Representation of the Surface of a Range Image
The key insight underlying the 2D representation of the surface geometry captured in a range
image is that each range image is already a dense and regular projection of one view of the surface
of a 3D model. This means that we can avoid the embedding step necessary when constructing the
2D representation of a 3D mesh model. Furthermore, since the range image readily provides the
depth of each point, the distortion map is unnecessary, as the geodesics can be approximated directly
from the range data itself.
2.2.1 Normal Map
Given a range image R : D → R3, where D is a 2D domain in R2, we build the base normal
map by triangulating the range image and then computing a surface normal for each vertex.
The density and regularity of the range image itself ensures the density and regularity of the
resulting normal map. Figure 2.3 illustrates the normal maps computed from two Buddha range
images, separated by 24 degrees. Observe that the normal maps contain a significant amount of
information about the geometry of the 3D model.
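As a sketch of this step, per-pixel normals can be approximated directly from a range image stored as an (H, W, 3) array of 3D points; central differences of the 3D positions stand in for the triangulation here (a simplification of the triangulation-based method described above):

```python
import numpy as np

def normal_map_from_range(R):
    """Approximate per-pixel surface normals of a range image R of shape
    (H, W, 3): cross products of the two image-axis tangents, renormalized.
    Border pixels, where no central difference exists, are left as zero."""
    du = np.zeros_like(R)
    dv = np.zeros_like(R)
    du[1:-1, :] = R[2:, :] - R[:-2, :]   # tangent along the vertical axis
    dv[:, 1:-1] = R[:, 2:] - R[:, :-2]   # tangent along the horizontal axis
    n = np.cross(du, dv)
    norm = np.linalg.norm(n, axis=2, keepdims=True)
    norm[norm == 0] = 1.0                # avoid division by zero on borders
    return n / norm
```

For a planar patch the recovered normals are constant, as expected; in practice the triangulation handles unsampled pixels, which this sketch ignores.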
2.2.2 Geodesic Distance Function
In order to construct a geometric scale-space of a range image that accurately encodes the
scale-variability in the underlying surface geometry we define all operators in terms of the geodesic
distance, rather than the Euclidean 3D or 2D distances. The geodesic distance between any two
points in a range image may be approximated by summing the change in 3D distance along the path
connecting those two points. Specifically, given two points u,v ∈ D we approximate the geodesic
distance dR(u,v) of the range image as
d_R(\mathbf{u}, \mathbf{v}) \approx \sum_{\mathbf{u}_i \in P(\mathbf{u},\mathbf{v}),\ \mathbf{u}_i \neq \mathbf{v}} \|R(\mathbf{u}_i) - R(\mathbf{u}_{i+1})\| , \qquad (2.3)
where P is a list of points on the path between u and v. If the path between u and v crosses
an unsampled point in the range image then we define the geodesic distance as infinity. When
approximating geodesics from a point of interest outwards, this sum can be computed efficiently
by storing the geodesic distances of all points along the current frontier and reusing these when
considering a set of points further away.
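A minimal sketch of Equation 2.3 for a single sampled path (the pixel-path representation and `valid` mask are hypothetical details; the efficient frontier-based computation described above is omitted):

```python
import numpy as np

def geodesic_range(R, path, valid=None):
    """Approximate d_R of Eq. 2.3: sum the 3D distances between consecutive
    path samples of a range image R of shape (H, W, 3). A path crossing an
    unsampled pixel (valid[i, j] == False) has infinite geodesic distance."""
    d = 0.0
    for (i0, j0), (i1, j1) in zip(path[:-1], path[1:]):
        if valid is not None and not (valid[i0, j0] and valid[i1, j1]):
            return float('inf')
        d += np.linalg.norm(R[i1, j1] - R[i0, j0])
    return d
```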
2.3 Geometric Scale-Space
The 2D representation and geodesic distance functions computed from either a 3D mesh model
or range image serve as the basis from which a discrete geometric scale-space is constructed. The
geometric scale-space is constructed by successively convolving the base normal map with a Gaussian
kernel of increasing standard deviation, where the kernel is defined in terms of the geodesic distance
of either the 3D mesh model or range image. The resulting geometric scale-space directly represents
the inherent scale-variability of local geometric structures captured in a 3D mesh model or range
image and serves as a rich basis for further processing.
2.3.1 Geometric Scale-Space Operator
To construct a (discrete) geometric scale-space we convolve the normal map with a Gaussian
kernel and renormalize the normals at each level. As in 2D scale-space theory [23], the standard
deviation of the Gaussian is monotonically increased from fine to coarse scale levels. We use the
geodesic distance as the distance metric to accurately construct a geometric scale-space that encodes
the surface geometry. Given a 2D isotropic Gaussian centered at a point u ∈ D, we define the value
(a) Scale σ = 1 (b) Scale σ = 3 (c) Scale σ = 5 (d) Scale σ = 7
Figure 2.4: The geometric scale-space representation of the 3D mesh model shown in Figure 2.1 and Figure 2.2(c,d). As the standard deviation increases, fine model details are smoothed away, leaving only coarse geometric structures. For example, the finger nail is quickly smoothed away, while the prominent creases on the palm remain visible even at the coarsest scale. Although the sizes of the finger nails in the two supplementary normal maps are different, the rate of smoothing is consistent due to the use of the geodesic Gaussian kernel that accounts for the distortion induced by the embedding.
of the geodesic Gaussian kernel at a point v as
g(\mathbf{v}; \mathbf{u}, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left[ \frac{-d(\mathbf{v}, \mathbf{u})^2}{2\sigma^2} \right] , \qquad (2.4)
where d is the geodesic distance function defined as either dM or dR depending on whether the
input geometry is a 3D mesh model or range image, respectively. Other than this discrepancy, the
remainder of the theory is consistent regardless of the input type.
Using this geodesic Gaussian kernel, we compute the normal at point u for scale level σ as
N^{\sigma}(\mathbf{u}) = \frac{\sum_{\mathbf{v} \in W} N(\mathbf{v}) \, g(\mathbf{v}; \mathbf{u}, \sigma)}{\left\| \sum_{\mathbf{v} \in W} N(\mathbf{v}) \, g(\mathbf{v}; \mathbf{u}, \sigma) \right\|} , \qquad (2.5)
where W is a set of points in a window centered at u. The window size is also defined in terms
of the geodesic distance and is set proportional to σ at each scale level. In our implementation,
we grow the window from the center point while evaluating each point’s geodesic distance from
the center to correctly account for the points falling inside the window. Figure 2.4 shows the
normal map of the hand model and two supplementary normal maps at four scale levels. As the
standard deviation of the Gaussian increases fine model details are smoothed away, leaving only
coarse geometric structures.
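The smoothing of Equations 2.4 and 2.5 at a single point can be sketched as follows; note that under the renormalization the constant factor 1/(2πσ²) of the Gaussian cancels, so it is omitted (the dictionary-based normal map and the `dist` callable are hypothetical interfaces standing in for the normal map and the geodesic distance d_M or d_R):

```python
import numpy as np

def smooth_normal(N, u, window, sigma, dist):
    """One smoothed normal of Eq. 2.5: weight the normals in the window by a
    Gaussian of their geodesic distance to u (Eq. 2.4), sum, renormalize."""
    acc = np.zeros(3)
    for v in window:
        w = np.exp(-dist(v, u) ** 2 / (2.0 * sigma ** 2))
        acc += w * np.asarray(N[v], dtype=float)
    return acc / np.linalg.norm(acc)
```

The renormalization keeps the output on the unit sphere, so the result is again a valid normal map level.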
3. Feature Detection in the Geometric Scale-Space
In order to detect salient local features in the geometric scale-space of a 3D mesh model or
range image, we first derive the first- and second-order partial derivatives of the normal map Nσ.
Novel corner and edge detectors are then derived using these partial derivatives. An automatic
scale-selection algorithm is introduced to unify the features detected at each scale into a single set of
scale-dependent geometric features. Lastly, the effectiveness and robustness of the proposed method
for computing scale-dependent geometric features is demonstrated on several 3D models of varying
topology as well as a set of range images.
3.1 Derivatives of the Normal Map
We first derive the first-order partial derivatives of the 2D normal map in the horizontal (s) and
vertical (t) directions. In the following we describe them only for the horizontal (s) direction. The
partial derivatives in the vertical direction (t) may be derived by simply replacing s with t. Note
that because we represent the geometric scale-space of both 3D mesh models and range images with
a set of discrete normal maps, the following sections are independent of the actual input data type
with the exception of the particular geodesic distance function.
At any point in the normal map the horizontal direction corresponds to a unique direction on
the tangential plane at the corresponding 3D point. The first-order derivative is thus the directional
derivative of the normal along this specific direction in the tangential plane, known as the normal
curvature. In the discrete domain D the normal curvature in the horizontal (Cs) direction at a point
u = (s, t) may be computed by numerical central angular differentiation:
N_s(\mathbf{u}) = \frac{\partial N(\mathbf{u})}{\partial s} = C_s(\mathbf{u}) \approx \frac{\sin\left( \frac{1}{2}\theta(\mathbf{u}_{-1}, \mathbf{u}_{+1}) \right)}{L(\mathbf{u}_{-1}, \mathbf{u}_{+1})} , \qquad (3.1)
where u_{±1} = (s ± 1, t), θ(u_{−1}, u_{+1}) is the angle between the normal vectors N(u_{−1}) and N(u_{+1}), and
L(u_{−1}, u_{+1}) is the chord length between the 3D points φ(u_{−1}) and φ(u_{+1}). Because the normal
curvature is a function of adjacent points in the 2D domain D the chord length L is simply the
geodesic distance between these points. After applying the discrete geodesic distance in Equation 2.2
we obtain
N_s(\mathbf{u}) \approx \frac{\sin\left( \frac{1}{2}\theta(\mathbf{u}_{-1}, \mathbf{u}_{+1}) \right)}{d(\mathbf{u}_{-1}, \mathbf{u}_{+1})} , \qquad (3.2)
where again d may refer to the geodesic distance particular to a 3D mesh model or range image.
Note that because the angle between the two normal vectors is in the range [0, π], the first-order
derivative is nonnegative at both convex and concave surface points – it is unsigned.
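Equation 3.2 requires only the two neighboring normals and their geodesic distance; a sketch of the computation at a single point (the flat argument interface is a hypothetical convenience):

```python
import numpy as np

def normal_curvature(n_prev, n_next, d):
    """First-order derivative of Eq. 3.2: sine of half the angle between the
    two neighboring normals over their geodesic distance d. The result is
    nonnegative (unsigned) at both convex and concave points."""
    cos_theta = np.clip(np.dot(n_prev, n_next), -1.0, 1.0)  # clamp numerical noise
    theta = np.arccos(cos_theta)
    return np.sin(0.5 * theta) / d
```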
The second-order derivative of the normal map can be derived as
N_{ss}(\mathbf{u}) = \frac{\partial^2 N(\mathbf{u})}{\partial s^2} = \frac{\partial C_s(\mathbf{u})}{\partial s} . \qquad (3.3)
After applying the chain rule to Equation 3.1 we obtain
N_{ss}(\mathbf{u}) \approx \frac{\partial \theta(\mathbf{u}_{-1},\mathbf{u}_{+1})}{\partial s} \frac{\cos\left( \frac{1}{2}\theta(\mathbf{u}_{-1},\mathbf{u}_{+1}) \right)}{2 L(\mathbf{u}_{-1},\mathbf{u}_{+1})} - \frac{\partial L(\mathbf{u}_{-1},\mathbf{u}_{+1})}{\partial s} \frac{\sin\left( \frac{1}{2}\theta(\mathbf{u}_{-1},\mathbf{u}_{+1}) \right)}{L(\mathbf{u}_{-1},\mathbf{u}_{+1})^2} . \qquad (3.4)
In the case of a 3D mesh model, we can safely assume that the parameterization induces a uniform
distortion between adjacent points in D; the derivative of the chord length L is then zero and
the second term vanishes. In this case we may apply numerical central differentiation to θ and,
using the half-angle formula, the second-order derivative reduces to
N_{ss}(\mathbf{u}) \approx \frac{\theta(\mathbf{u}_{-2},\mathbf{u}) - \theta(\mathbf{u}_{+2},\mathbf{u})}{d(\mathbf{u}_{-1},\mathbf{u}_{+1})} \cdot \frac{\sqrt{\frac{1}{2}\left( 1 + N(\mathbf{u}_{-1}) \cdot N(\mathbf{u}_{+1}) \right)}}{d(\mathbf{u}_{-1},\mathbf{u}_{+1})} . \qquad (3.5)
This form is particularly attractive as it enables us to compute the second-order derivative in terms of
the original normal vectors, and the change in the local angle. The noise associated with higher-order
derivatives is reduced as we have avoided an additional numerical differentiation of the first-order
derivatives.
3.2 Corners
Consider the hand model shown in Figure 2.1. We wish to detect geometrically meaningful
corners such as the finger tips as well as the points on the sharp bends of the palm prints. In other
words, we are interested in detecting two different types of geometric corners, namely points that
have high curvature isotropically or in at least two distinct tangential directions. The rich geometric
information encoded in the normal maps enables us to accurately detect these two types of 3D corners
using a two-phase geometric corner detector.
We begin by computing the Gram matrix M of first-order partial derivatives of the normal map
(a) σ = 3 (b) σ = 7
Figure 3.1: Corners detected on the 2D normal map. (a) illustrates the 20 strongest corners on the 2D representation of the hand model at scale σ = 3. Observe that the corner points on the palm are primarily located where two creases converge, or where there is an acute bend in one crease. (b) shows the strongest corner on two of the finger normal maps at scale σ = 7. At this coarse scale the corners are detected on the tip of the finger.
N^σ at each point. The Gram matrix at a point u is defined as

M(\mathbf{u}; \sigma, \tau) = \sum_{\mathbf{v} \in W} \begin{bmatrix} N^{\sigma}_s(\mathbf{v})^2 & N^{\sigma}_s(\mathbf{v}) N^{\sigma}_t(\mathbf{v}) \\ N^{\sigma}_s(\mathbf{v}) N^{\sigma}_t(\mathbf{v}) & N^{\sigma}_t(\mathbf{v})^2 \end{bmatrix} g(\mathbf{v}; \mathbf{u}, \tau) , \qquad (3.6)

where W is the local window around the point u. M has two parameters: one that determines
the particular scale in the scale-space representation (σ), and one that determines the weighting of
each point in the Gram matrix (τ); in our experiments we set τ = σ/2 empirically. The corner
response at a point u is defined as the maximum eigenvalue of M. However, due to the unsigned
first-order derivative, the resulting corner set will contain not only the aforementioned two desired
types of geometric corners, but will also contain
points lying on 3D edges.
The second-order derivatives of the normal map can be used to prune the corners lying along
the 3D edges. We first prune the corner points that are not centered on zero crossings in both
the horizontal and vertical directions. Next we keep only those points where the variance of the
second-order partial derivatives around the point u are within a constant factor of each other. The
closer this constant factor is to 1, the greater the geometric variance of the selected corner points
in both tangential directions. Figure 3.1 illustrates corners detected on the hand model shown in
Figure 2.1 at one scale level. Once the corners are detected in 2D they can be mapped back to the
3D model. Because the 2D normal map is dense, the corresponding locations of the corners in 3D
Figure 3.2: Edges detected at one scale level (σ = 1). The edges are detected accurately on surface points with locally maximum curvature, namely 3D ridges and valleys. Here the creases of the palm form the edges.
are independent of the input model’s triangulation.
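The first phase of the detector, the corner response of Equation 3.6, can be sketched as follows, with the windowed derivatives and Gaussian weights passed in as flat arrays (a hypothetical interface; the zero-crossing and variance pruning of the second phase is omitted):

```python
import numpy as np

def corner_response(Ns, Nt, weights):
    """Corner response of Section 3.2: the maximum eigenvalue of the Gram
    matrix of Eq. 3.6, built from the first-order derivatives Ns, Nt over a
    window, each weighted by the Gaussian g(v; u, tau)."""
    M = np.array([[np.sum(weights * Ns * Ns), np.sum(weights * Ns * Nt)],
                  [np.sum(weights * Ns * Nt), np.sum(weights * Nt * Nt)]])
    return np.linalg.eigvalsh(M).max()  # symmetric 2x2 eigenproblem
```

Because the first-order derivatives are unsigned, this response is also large along 3D edges, which is exactly why the second pruning phase is needed.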
3.3 Edges
In order to find edges at each scale level in the geometric scale-space we use the second-order
derivatives of the normal map. Although the first-order derivative is unsigned, locating edges using
the zero crossing of the second-order derivative is sufficient, as the sign only affects the profile of the
derivative values around the zero crossing, and not the actual location of the zero crossing.
Similar to the classic work of Marr and Hildreth [27] for 2D images, given a normal map, we
begin by computing the Laplacian, defined as
\nabla^2 N^{\sigma} = N^{\sigma}_{ss} + N^{\sigma}_{tt} . \qquad (3.7)
Next we construct a binary image that contains the zero-crossing of the Laplacian. This set of zero
crossings contains points centered on curvature maxima, as well as spurious edge points arising from
uniform or slow changing curvature regions. We remove the spurious edge points by thresholding
the magnitude of the first-order derivative, and the variance of the second-order derivative. This
ensures that edges are detected in high curvature regions, and lie along portions of the surface with
a significant variation in the surface geometry.
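The zero-crossing extraction can be sketched as a simple sign test between neighboring pixels of the Laplacian image (the subsequent thresholding on the first-order magnitude and second-order variance described above is omitted):

```python
import numpy as np

def zero_crossings(L):
    """Binary mask of zero crossings of a Laplacian image L (Eq. 3.7): a
    pixel is marked where its sign differs from its right or lower neighbor."""
    z = np.zeros(L.shape, dtype=bool)
    z[:, :-1] |= np.sign(L[:, :-1]) != np.sign(L[:, 1:])
    z[:-1, :] |= np.sign(L[:-1, :]) != np.sign(L[1:, :])
    return z
```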
Figure 3.2 shows an example result of edge detection on the hand model at one scale level (σ = 1).
Again, these edges are localized on the surface of the 3D model independent of the mesh resolution.
Additional post-processing may also be applied to the edges once they are mapped onto the 3D
Figure 3.3: Scale-dependent geometric corners (a) and edges (b) detected on the hand model. The corners are represented with 3D spheres which are colored and sized according to their respective scale (blue and red correspond to the finest and coarsest scales, respectively). The corners accurately represent the geometric scale variability of the model, for instance with fine corners on the palm creases and coarse corners at the tips of the fingers. The edges also encode the geometric scale variability, tracing edge segments that arise at different scales.
model. In later experiments, we first compute a minimum spanning tree of the 3D edge points,
where the magnitude of the edge response in 2D determines the weight of each 3D point, similar
to [33]. Then we decompose the tree into a set of disjoint edge paths via caterpillar decomposition
and fit NURBS curves to each of these paths to obtain smooth parametric 3D edges.
3.4 Automatic Scale-Selection
Once features are detected in each of the normal maps in the geometric scale-space they can
be unified into a single feature set. Although a feature may have a response at multiple scales,
it intrinsically exists at the scale where the response of the feature detector is maximized. By
determining this intrinsic scale for each feature we obtain a comprehensive scale-dependent 3D
geometric feature set.
In order to find the intrinsic scale of a feature we search for local maxima of the normalized feature
response across a set of discrete scales, analogous to the 2D automatic scale selection method [24].
The derivatives are normalized to account for a decrease in the derivative magnitude as the normal
maps are increasingly blurred. We define the normalized first-order derivatives \bar{N}^{\sigma}_s and \bar{N}^{\sigma}_t as

\bar{N}^{\sigma}_s = \sigma^{\gamma} N^{\sigma}_s \quad \text{and} \quad \bar{N}^{\sigma}_t = \sigma^{\gamma} N^{\sigma}_t , \qquad (3.8)
where γ is a free parameter that is empirically set for each particular feature detector. The corre-
sponding normalized second-order derivatives are defined as
\bar{N}^{\sigma}_{ss} = \sigma^{2\gamma} N^{\sigma}_{ss} \quad \text{and} \quad \bar{N}^{\sigma}_{tt} = \sigma^{2\gamma} N^{\sigma}_{tt} . \qquad (3.9)
Normalized feature responses are computed by substituting the normalized derivatives into the
corner and edge detectors presented in the previous two sections. The final scale-dependent geometric
feature set is constructed by identifying the points in the geometric scale-space where the normalized
feature response is maximized along the scale axis and locally in a spatial window. Figure 3.3
illustrates the scale-dependent geometric corners and edges of the hand model. The scale-dependent
geometric features accurately encode the geometric scale-variability and can clearly be used as a
unique representation of the underlying geometry.
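For a single feature with a raw detector response at each level of the discrete scale-space, the selection reduces to an arg-max over the normalized responses. A sketch, assuming a response that scales like a first-order derivative so that σ^γ is the appropriate normalization factor (the helper and its interface are hypothetical):

```python
import numpy as np

def intrinsic_scale(responses, sigmas, gamma=1.0):
    """Automatic scale selection of Section 3.4: normalize the raw responses
    by sigma**gamma and return the sigma at which the normalized response
    is maximized along the scale axis."""
    normalized = np.asarray(sigmas, float) ** gamma * np.asarray(responses, float)
    return sigmas[int(np.argmax(normalized))]
```

In the full method the maximum must also be a local spatial maximum at that level before the feature is admitted to the unified set.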
3.5 Experiments
We evaluated the effectiveness and robustness of the proposed method for computing scale-
dependent geometric features on several 3D mesh models and range images. The method was
applied to 3 different 3D mesh models, in addition to the hand model shown in Figure 3.3. One
of these models, the Julius Caesar, is of disk topology and two, the armadillo and Buddha, have
a genus of zero. In addition, we applied the method to two pairs of range scans, separated by 24
degrees, of two unique objects.
3.5.1 Corner and Edge Detection on 3D Mesh Models
Figure 3.4 illustrates the corners and edges detected on the three models. The armadillo model
has appendages that were significantly distorted in the 2D representation and therefore multiple
normal maps were used. Additionally the large distortion at the boundaries of the armadillo was
accounted for using a complementary parameterization. The set of scales used to detect the corners
and edges depends on the geometry of the model and was set empirically. Observe that the set
of corners is distributed across scales, and that the scale of a particular corner reflects the scale of
the underlying surface geometry. For instance the tip of Caesar’s nose is detected at the coarsest
scale, while the corners of the mouth are detected at a relatively finer scale. The edges are detected
along ridges and valleys of the 3D models existing at different scales, for example the edges on the
prominent creases of the Buddha’s robe, as well as edges along the finer details of the base.
Figure 3.4: Scale-dependent geometric corner and edge detection results on a disc topology model (Caesar) and two genus-zero models (Buddha and armadillo). The corners and edges are accurately detected across different scales. The resulting scale-dependent geometric feature set encodes the scale variability of the underlying surface geometry, resulting in a unique representation of each model.
3.5.2 Corner Detection on Range Images
Figure 3.5 illustrates corners detected on two pairs of range scans of two unique models, the Bud-
dha and dragon model, taken 24 degrees apart. Observe that again the set of corners is distributed
across all scales. The features display a high degree of repeatability, both in the location and scale.
For instance note the large number of coarse (red) corners repeated on the chest and stomach of the
Buddha model, as well as the large number of finer (blue and teal) corners repeated on the robe and
necklace. Also observe the consistency of the scale and localization of the corners detected along
the body of the dragon model. Note that the corner detection was conducted on the raw range
data; no preprocessing was applied.
Figure 3.5: Scale-dependent geometric corner detection on two pairs of range images, separated by 24 degrees. Despite the noise inherent to range image data, the corners are distributed across scales and reflect the relative scale of the underlying geometric structures. Additionally, the corners display a high degree of repeatability, both in location and scale, between two adjacent range images.
3.5.3 Noisy Surface Normals
We tested the resilience of our framework to noisy input data by applying Gaussian random
noise with standard deviation 0.05, 0.075, and 0.1 to the surface normals of the Julius Caesar 3D
mesh model. The features were detected with the identical parameters used to detect the original
set of corners on the Julius Caesar model. Figure 3.6(a) illustrates the results. Although fine-scale
corners can arise from the input noise, the detected scale-dependent geometric corner sets are highly
consistent with those detected on the original model and are localized accurately compared to
the original results shown in Figure 3.4.
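The noise model used in this experiment is straightforward to reproduce. The sketch below (a hypothetical helper, not code from the thesis; it assumes the normals are stored as an (N, 3) array) perturbs unit surface normals with per-component Gaussian noise and renormalizes:

```python
import numpy as np

def perturb_normals(normals, sigma, seed=0):
    """Add per-component Gaussian noise to unit surface normals and
    renormalize, mirroring the noise levels used in this experiment.

    normals : (N, 3) array of unit vectors
    sigma   : standard deviation of the noise (here 0.05, 0.075, or 0.1)
    """
    rng = np.random.default_rng(seed)
    noisy = normals + rng.normal(0.0, sigma, normals.shape)
    # Renormalize so each row is a valid unit normal again.
    return noisy / np.linalg.norm(noisy, axis=1, keepdims=True)
```

The renormalization step matters: without it the perturbed vectors are no longer valid inputs to the normal-map representation.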
3.5.4 Varying Sampling Densities
We demonstrate the independence of our framework from surface sampling density by computing
scale-dependent geometric corners on three simplified Julius Caesar mesh models. Specifically, we
applied a surface simplification algorithm [9] to construct Julius Caesar models with 30,000, 20,000
and 10,000 faces from the original model with 50,000 triangle faces. Corners were detected at each
sampling density using the parameters from the original experiment (Figure 3.4). Figure 3.6(b)
illustrates the results. Although the number of faces changes substantially, the location and scale of
the corners remain largely constant. This demonstrates that the density of the 2D representation of
(a) Models with input noise of 0.05, 0.075 and 0.1
(b) Models with 30,000, 20,000 and 10,000 faces
Figure 3.6: Scale-dependent geometric corner detection with the presence of noisy surface normals (a), and with varying surface sampling densities (b). When compared with the corners shown in Figure 3.4, the results demonstrate that the scale-dependent corners detected with our framework are largely invariant to significant input noise and variations in the sampling density.
the surface geometry ensures that the framework is independent of the surface sampling.
4. Scale-Dependent/Invariant Local 3D Shape Descriptors
Once we detect scale-dependent features on a 3D mesh model or range image via geometric
scale-space analysis we may define novel local shape descriptors that naturally encode the scale
of the underlying geometric structures. In particular we derive both scale-dependent and scale-
invariant local 3D shape descriptors, which retain the geometric scale variability as a hierarchical
representation or achieve scale invariance, respectively. We demonstrate the effectiveness of
encoding the scale variability of geometric structures in these descriptors by automatically
registering a number of models of varying geometric complexity, each represented by a set of
range images, and then by fully automatically registering a set of range images corresponding
to multiple 3D models simultaneously.
4.1 Exponential Map
A wide variety of 2D shape descriptors have previously been proposed [16, 8, 3, 37, 36] for the
purposes of 3D object recognition, registration, navigation and more. Many of these approaches
suffer from limitations such as sensitivity to the sampling density of the underlying geometry or
to slight perturbations in the localization of the surface descriptor. To overcome these limitations
we chose to encode each scale-dependent feature and its spatial extent in a dense and regular 2D
shape descriptor. However, unlike past approaches [44], we did not want to rely on 2D embedding
techniques that are sensitive to the local surface patch being encoded. Additionally, we require a
2D shape descriptor that is repeatable, so that surface correspondences between pairs of 3D models
can be determined accurately.
We construct both our scale-dependent and scale-invariant local surface descriptors by mapping
the local neighborhood of a feature to a 2D domain using the exponential map. The exponential
map is a mapping from the tangent space of a surface point to the surface itself [2]. Specifically,
given a unit vector w lying on the tangent plane at a point u there is a unique geodesic Γ on the
surface such that Γ(0) = u and Γ′(0) = w. The exponential map takes a vector w on the tangent
plane and maps it to the point on the geodesic curve at a distance of 1 from u, or exp(w) = Γ(1).
Following this, any point v in the local neighborhood of u can be mapped to u’s tangent plane by
determining the unique geodesic between u and v and computing the geodesic distance and polar
angle of the tangent to the geodesic at u relative to a fixed basis {e1, e2} on u’s tangent plane.
The exponential map has a number of properties that are attractive for constructing a 2D shape
descriptor. First, it is known that for each point on a 3D surface the exponential map is defined
and differentiable for some neighborhood [2]. Although fold-overs may occur if this neighborhood
is too large, the local nature of the scale-dependent and scale-invariant descriptors implies this
will rarely happen. In practice we have witnessed fold-overs at an extremely small number of
features, mostly near points of depth discontinuity. Although the exponential map is not, in general,
isometric, the geodesic distances of radial lines from the feature point are preserved [11]. This ensures
that corresponding features detected in the geometric scale-space will have consistent 2D shape
descriptors. Additionally, because the exponential map is defined relative to the interest point, it
has an inherent robustness to the boundary of the neighborhood being encoded.
4.2 Scale-Dependent Local 3D Shape Descriptors
We construct a 2D scale-dependent local shape descriptor for a feature detected at u and at
scale σ by mapping each point v in the neighborhood of u to a 2D domain using the geodesic polar
coordinates G defined as
G(u,v) = (d(u,v), θT (u,v)), (4.1)
where again d(u,v) is the geodesic distance between u and v and θT (u,v) is the polar angle of the
tangent of the geodesic between u and v, defined relative to a fixed basis {e1, e2}. In practice we
approximate this angle by orthographically projecting v onto the tangent plane of u and measuring
the polar angle of the intersection point.
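The orthographic approximation of the polar angle can be sketched as follows (a hypothetical helper, not the thesis implementation; it assumes the geodesic distance d(u, v) has already been computed on the mesh and is passed in):

```python
import numpy as np

def geodesic_polar_coords(u, v, normal_u, e1, geodesic_dist):
    """Approximate G(u, v) = (d(u, v), theta_T(u, v)) for a neighbor v of u.

    As described in the text, the polar angle is approximated by
    orthographically projecting v onto the tangent plane at u and
    measuring the angle of the projection relative to the basis vector e1.

    u, v         : 3D points
    normal_u     : unit surface normal at u
    e1           : unit tangent-plane basis vector at u
    geodesic_dist: precomputed geodesic distance d(u, v)
    """
    # Orthographic projection of (v - u) onto the tangent plane at u.
    d = v - u
    proj = d - np.dot(d, normal_u) * normal_u
    # e2 completes the right-handed tangent-plane frame {e1, e2}.
    e2 = np.cross(normal_u, e1)
    theta = np.arctan2(np.dot(proj, e2), np.dot(proj, e1))
    return geodesic_dist, theta
```

For example, with u at the origin, normal (0, 0, 1), e1 = (1, 0, 0) and a neighbor at (0, 1, 0.3), the projection lands on the e2 axis and the recovered angle is π/2.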
After mapping each point in the local neighborhood of u to its tangent plane we are left with
a sparse 2D representation of the local geometry around u. In order to construct a dense 2D
descriptor we interpolate over a geometric entity encoded at each vertex to construct a dense and
regular representation of the neighborhood of u at scale σ. We choose to encode the surface normals
from the base normal map, rotated such that the normal at the center point u points in the positive
z direction. The resulting dense 2D descriptor is invariant up to a single rotation. We resolve
this by aligning the principal curvature directions at u to the horizontal axis in the geodesic polar
coordinates, resulting in a rotation invariant shape descriptor. Once this local basis has been fixed
we re-express each point in terms of the normal coordinates, with the feature point u at the center
Figure 4.1: Scale-dependent features and local shape descriptors in the geometric scale-space of a range image. Features are colored according to the scale at which they were detected, with red being the coarsest and blue being the finest. A subset of the scale-dependent local shape descriptors is illustrated; these form a hierarchical representation of the underlying scale variability in the surface of a range image and enable correspondences to be determined robustly between pairs of range images.
of the descriptor.
The radius of the descriptor is set proportional to the scale σ to encode the inherent scale of
different features. We refer to this dense 2D scale-dependent descriptor as Gσu for a feature at u
and at scale σ. Figure 4.1 shows an example of a set of scale-dependent local 3D shape descriptors
detected in the geometric scale-space of two Buddha range images. The local scale-dependent 3D
shape descriptors form a hierarchical representation of the underlying surface geometry.
4.3 Scale-Invariant Local 3D Shape Descriptors
The scale-dependent local 3D shape descriptors described in the previous section are appropriate
only when the global scales between a pair of 3D mesh models or range images are the same or are
known. This happens, for instance, when we know that the range images are captured with the same
range finder. In order to enable comparison between 3D mesh models or range images that do not
have the same global scale we also derive a scale-invariant local 3D shape descriptor Ḡσu.
The key insight behind our scale-invariant local 3D shape descriptor is that the scale of local
geometric structures relative to the global scale of a mesh model or range image remains constant
as the global scale is altered. This enables us to construct a set of scale-invariant local 3D shape
descriptors by first building a set of scale-dependent local shape descriptors
and then normalizing each descriptor to a constant radius. Such a scale-invariant representation of
the underlying geometric structures enables us to establish correspondences between a pair of range
images even when the global scale is different and unknown.
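Since each descriptor is a dense 2D image, normalizing to a constant radius amounts to resampling every descriptor to a common grid size. A minimal sketch (hypothetical helper; nearest-neighbor resampling for brevity, where a real implementation would interpolate):

```python
import numpy as np

def normalize_descriptor(desc, out_size=32):
    """Resample a dense 2D descriptor image to a fixed size, making its
    spatial extent (and hence the feature's detection scale) constant.

    desc     : (H, W, 3) image of surface normals (a scale-dependent descriptor)
    out_size : side length of the normalized, scale-invariant descriptor
    """
    h, w = desc.shape[:2]
    # Nearest-neighbor resampling onto an out_size x out_size grid.
    rows = np.clip((np.arange(out_size) * h) // out_size, 0, h - 1)
    cols = np.clip((np.arange(out_size) * w) // out_size, 0, w - 1)
    return desc[np.ix_(rows, cols)]
```

After this step, descriptors built at different detection scales (and hence with different radii) become directly comparable.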
4.4 Pairwise Matching of Scale-Dependent/Invariant Local 3D Shape Descriptors
Scale-dependent and scale-invariant shape descriptors contain a wealth of information about the
scale of local geometric structures that can be exploited in robust algorithms to find the pairwise
transformation between two 3D mesh models or range images. In particular, we show how the scale-
dependent shape descriptors form a hierarchical representation of the geometric structures that can
be leveraged in a coarse-to-fine matching algorithm between a pair of mesh models or range images
with the same global scale. We also demonstrate how the scale-invariant local shape descriptors
can be used to effectively establish correspondences between a pair of range images with completely
different global scales.
4.4.1 Similarity of Scale-Dependent and Scale-Invariant Descriptors
We first need a measure of the similarity between two scale-dependent or two scale-invariant de-
scriptors. Since each descriptor is a dense 2D image of the surface normals in the local neighborhood
we may define the similarity as a function of the average angle between two corresponding normals,
S(Gσu1, Gσu2) = π/2 − (1/|A ∩ B|) ∑_{v ∈ A∩B} arccos(Gσu1(v) · Gσu2(v)), (4.2)
where A and B are the sets of points in the domains of Gσu1 and Gσu2, respectively. Although this
similarity measure is defined in terms of the scale-dependent descriptors, the definition for the
scale-invariant descriptors is equivalent, with Ḡ substituted for G.
4.4.2 Pairwise Matching of Scale-Dependent Descriptors
The natural hierarchical representation of the scale-dependent local 3D shape descriptors can be
used in a robust algorithm for matching a pair of 3D objects (R1,R2), such as 3D mesh models or
Figure 4.2: Matching two range images with consistent global scale, represented as sets of scale-dependent local shape descriptors. On the left we show the 67 point correspondences found with our matching algorithm and on the right the resulting rigid transformation. The hierarchy of the local scale-dependent 3D shape descriptors enables a coarse-to-fine sampling strategy that results in a large number of correspondences that accurately approximate the pairwise rigid transformation.
range images, with the same global scale. Note that if we know that the range images are captured
with the same range scanner, or if we know the units of the 3D coordinates, e.g. centimeters, we
can safely assume that they have, or can convert them to, the same global scale.
Once we have a set of scale-dependent local 3D shape descriptors for each 3D object, we construct
a set of possible correspondences by matching each descriptor to the n most similar, where in our
experiments n is set in the range of 5 to 10. The consistency of the global scale allows us to consider
only those correspondences at the same global scale, greatly decreasing the number of correspondences
that must later be sampled. We find the best pairwise rigid transformation between the two 3D objects
by randomly sampling this set of potential correspondences and determining the one that maximizes
the area of overlap between the two 3D objects, similar to RANSAC [6]. However, rather than sampling
the correspondences at all scales simultaneously, we instead sample in a coarse-to-fine fashion,
beginning with the descriptors detected at the coarsest scale in the geometric scale-space and ending
with descriptors detected at the finest. This enables us to quickly determine a rough alignment
between two 3D objects, as there are,
in general, far fewer features at coarser scales.
For each scale σi we randomly construct N · σi sets of 3 correspondences, where each corre-
spondence has a scale between σ1 and σi. For each correspondence set C we approximate a rigid
transformation T , using the method proposed by Umeyama [40], and then add to C all those corre-
spondences (uj ,vj , σj) where ‖ T ·R1(uj) −R2(vj) ‖≤ α and σj ≤ σi. Throughout the sampling
process we keep track of the transformation and correspondence set that yield the maximum area
of overlap. Once we begin sampling the next finer scale σi+1 we first test whether the correspondences
at that scale improve the area of overlap of the best rigid transformation. This allows us
to add a large number of correspondences at the finer scales efficiently, without an excessive
number of samples being drawn.
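The coarse-to-fine sampling loop can be sketched compactly as follows. This is a hypothetical simplification, not the thesis implementation: the transformation estimator (Umeyama's method [40] in the text) and the area-of-overlap score are passed in as callables, and the inlier-growing step is omitted for brevity.

```python
import random

def coarse_to_fine_match(corrs, scales, estimate_rigid, overlap_area, N=100):
    """Coarse-to-fine RANSAC-style sampling over scale-dependent
    correspondences.

    corrs          : list of (u, v, sigma) candidate correspondences
    scales         : discrete detection scales, coarsest first
    estimate_rigid : maps 3 sampled correspondences to a transformation
    overlap_area   : scores a transformation by its area of overlap
    """
    best_T, best_score = None, float('-inf')
    for i, sigma_i in enumerate(scales):
        # Only correspondences at scales between the coarsest and sigma_i.
        pool = [c for c in corrs if c[2] in scales[:i + 1]]
        if len(pool) < 3:
            continue
        # N * sigma_i sample sets per scale, as in the text.
        for _ in range(max(1, int(N * sigma_i))):
            T = estimate_rigid(random.sample(pool, 3))
            score = overlap_area(T)
            if score > best_score:
                best_T, best_score = T, score
    return best_T, best_score
```

Because coarse scales contribute few candidates, the early iterations quickly lock in a rough alignment that the finer scales then refine.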
Figure 4.2 shows the results of applying our pairwise matching algorithm to two range images
taken of the Buddha model. The number of correspondences is quite large and the correspondences
are distributed across all scales. Although this is an initial approximate alignment, the large corre-
spondence set enables the pairwise rigid transformation to be accurately approximated without an
excessive number of samples.
4.4.3 Pairwise Matching of Scale-Invariant Descriptors
The scale-invariant local 3D shape descriptors detected in the geometric scale-space form a fully
scale-invariant representation of the underlying local surface geometry that enables us to match a
pair of 3D objects (R1,R2) with different global scales. In this case we may register the 3D objects
while estimating the global scale differences.
Our algorithm for matching a pair of 3D mesh models or range images (R1,R2) with completely
different global scales is similar to the one proposed in the previous section for scale-dependent
local descriptors. However, since we no longer know a priori their relative global scales we must
consider the possibility that a feature detected in the geometric scale-space of a 3D mesh model
or range image may be detected at any scale in the second 3D mesh model or range image. Our
algorithm proceeds by first constructing a potential correspondence set that contains, for each scale-
invariant local 3D descriptor detected in the first 3D object R1, the n most similar in the second
object R2. We find the best pairwise similarity transformation by applying RANSAC to randomly
sample this correspondence set. For each iteration the algorithm approximates the 3D similarity
transformation [40] and computes the area of overlap. The transformation which results in the
maximum area of overlap is considered the best.
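The similarity transformation inside each iteration can be estimated in closed form with Umeyama's method [40]. A minimal sketch (assumptions: non-degenerate point sets, numpy's SVD conventions; a robust implementation would also guard against rank-deficient configurations):

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    with dst ≈ s * R @ src + t, following Umeyama's closed-form method.

    src, dst : (N, 3) arrays of corresponding 3D points
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)            # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                      # guard against reflections
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)    # variance of the source points
    scale = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - scale * R @ mu_s
    return scale, R, t
```

Applied to correspondences between the two Buddha views of Figure 4.3, such an estimator would recover both the rotation and the relative global scale factor in a single step.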
Figure 4.3: Matching two range images with inconsistent global scales, represented with sets of scale-invariant local shape descriptors. On the left we show the 24 point correspondences found with our matching algorithm and on the right the resulting similarity transformation. Even though the range images differ by a global scale factor of approximately 2.4, the scale-invariant local descriptors enable us to recover the similarity transformation accurately.
Figure 4.3 shows the result of applying our algorithm to two views of the Buddha model with a
relative global scale difference of approximately 2.4. Even though there is a considerable difference
in the relative global scales, our scale-invariant representation, composed of scale-invariant local
descriptors, enables us to recover the similarity transformation quite accurately without any initial
assumption about the models or their global scales.
4.5 Experiments
In this section we demonstrate the effectiveness of our framework for constructing and matching
scale-dependent/invariant local 3D shape descriptors by deriving a fully automatic range image
registration algorithm. We show that the scale-dependent and scale-invariant descriptors can be used
to register a set of range images both with and without global scale variations. The effectiveness of
our algorithm is shown by registering a number of models of varying geometric complexity. In fact
we show that we can register a mixed set of range images corresponding to multiple 3D models
simultaneously and fully automatically.
Figure 4.4: Fully automatic approximate registration of 15 views of the Buddha model, 12 views of the armadillo model and 18 views of the angel model with scale-dependent local descriptors. In the first column we show the initial set of range images. In the second we show the approximate registration obtained with our framework, which is further refined with ICP in the third column. Finally, a watertight model is built using a surface reconstruction algorithm [18]. Observe that the initial approximation obtained with our framework is quite accurate.
4.5.1 Registration of Range Images
A 3D computer model is built from an object using 3D acquisition hardware, such as a laser
scanner, by taking a number of scans, represented as range images, of different views of the object.
Combining these range images is not straightforward, as each resides in its own coordinate system.
In order to construct the final 3D model, each range image must be placed in a consistent coordinate
system, a problem referred to as range image registration. In the past range image registration
techniques have relied on human input to give a rough alignment between the range images by
manually clicking on corresponding points in the set of range images. After the initial manual
alignment, refinement algorithms, such as iterative closest point [1, 39] (ICP), are then applied to
create the final set of registered range images.
Although final refinement algorithms, such as ICP, have proved to be robust and effective in
creating a final object from an initial alignment, requiring a person to manually determine corre-
spondences between potentially hundreds of range images for this initial alignment has led to a
need for fully automatic range image registration algorithms. One approach to this problem is to
represent each range image with a global descriptor and perform the automatic registration between
all range images simultaneously [26]. Other approaches have differentiated between the local reg-
istration phase, where local features are used to determine correspondences between pairs of range
images, and a global registration phase, where the pairs of matching range images are combined
for a single approximate registration [15, 22]. Although such approaches have proved promising,
automatically registering models accurately remains a difficult and open problem.
Given a set of range images {R1, ...,Rn} our fully automatic approximate range image registra-
tion algorithm first constructs the geometric scale-space of each range image. Scale-dependent fea-
tures are detected at discrete scales and then combined into a single comprehensive scale-dependent
feature set, where the support size of each feature follows naturally from the scale in which it was
detected. Each feature is encoded in either a scale-dependent or scale-invariant local shape descrip-
tor, depending on whether the input range images have a consistent global scale. We then apply
the appropriate matching algorithm, presented in the previous sections, to each pair of range images
in the input set to recover the pairwise transformations. We augment each transformation with
the area of overlap resulting from the transformation. Next we construct a model graph [15], where
each range image is represented with a vertex and each pairwise transformation and area of overlap
is encoded in a weighted edge. We prune edges with an area of overlap less than ε. In order to
construct the final set of meshes {M1, ...,Mm} we compute the maximum spanning tree of the
model graph and build a single mesh from each connected component.
The alignment obtained by our algorithm is only an initial registration that can be further refined
by applying a global registration algorithm, such as ICP [1], to form a fully automatic range image
registration algorithm.
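The model-graph step above reduces to pruning weak edges and extracting a maximum spanning forest, whose connected components become the output meshes. A minimal sketch (hypothetical interface; Kruskal's algorithm on edges sorted by decreasing overlap):

```python
def registration_forest(num_images, pairwise, eps):
    """Build the model graph and extract its maximum spanning forest.

    num_images : number of input range images (graph vertices)
    pairwise   : dict mapping (i, j) to the area of overlap of the best
                 pairwise transformation between range images i and j
    eps        : overlap threshold below which edges are pruned
    """
    # Prune weak edges, then run Kruskal on the survivors in decreasing
    # order of overlap (maximum spanning forest).
    edges = sorted(((w, i, j) for (i, j), w in pairwise.items() if w >= eps),
                   reverse=True)
    parent = list(range(num_images))

    def find(a):
        # Union-find with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))
    return tree
```

Each connected component of the returned forest corresponds to one reconstructed mesh, which is how range images belonging to different objects are automatically separated.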
4.5.2 Registration of Range Images with Consistent Global Scale
Figure 4.4 illustrates the results of applying our framework independently to 15 views of the
Buddha model, 12 views of the armadillo model, and 18 views of the angel model, with con-
Figure 4.5: Automatic registration of 42 range images: 15 views of the Buddha model, 12 views of the armadillo and 15 views of the dragon model. The accuracy of the approximate registration found with our framework enables us to automatically discover that the range images correspond to three disjoint models. Note that these registrations have not been post-processed with a global registration algorithm.
sistent global scales. Scale-dependent local shape descriptors were detected at 5 discrete scales,
σ = {0.5, 1, 1.5, 2, 2.5}, in the geometric scale-space. In the first column we show the initial input.
Each range image is colored differently to visualize the quality of the registration. The second
column illustrates the approximate alignment obtained by our framework. In the third
column we show the resulting registration after applying ICP. We then measured the average
distance between each vertex in the initial pose estimated by our algorithm and the final
pose computed by ICP. We found the resulting average distances for the armadillo, Buddha and
angel models, relative to the diameter of the models, to be 0.169%, 0.294% and 1.16%,
respectively. This shows that although we are not making any assumptions about the initial pose we
are able to obtain an accurate initial alignment. In the final column we show the watertight model
obtained after applying a surface reconstruction algorithm [18].
In our next experiment we demonstrate the ability of our framework to register range images
corresponding to multiple 3D models simultaneously. In order to register the models simultaneously
we prune the edges on the model graph that correspond to transformations with an area of overlap
less than some threshold. In practice we found this threshold easy to set as our framework results
in approximate alignments that are quite accurate. Figure 4.5 summarizes the results. In the first
column we illustrate the initial set of range images; 15 views of the Buddha model, 12 views of
the armadillo, and 15 views of the dragon model. The remainder of the figure illustrates the three
disjoint approximate registrations obtained with our framework. Note that no global registration
Figure 4.6: Fully automatic approximate registration of 15 views each of the Buddha and dragon models, with a random global scaling from 1 to 4. For each model we visualize the initial set of range images and the initial alignment obtained by our framework. In column three we show the results after applying ICP and in column four we show the results of the surface reconstruction. Even with the substantial variations in the global scale, our scale-invariant representation of the underlying geometry enables us to accurately approximate the registration without any assumptions about the initial poses.
algorithm has been applied to these results.
4.5.3 Registration of Range Images with Inconsistent Global Scale
Next we demonstrate the effectiveness of our framework for fully automatically registering a
number of range images with unknown global scales. Figure 4.6 illustrates the results of applying
our framework to 15 views of the Buddha and dragon models. Each range image was globally scaled
by a random factor between 1 and 4. For each model we illustrate the initial set of range images on
the left and the approximate pose obtained with our framework. Each pair of range images for the
Buddha and dragon had an average scale difference of 1.46 and 1.23, respectively. For each pair of
adjacent range images we found the mean difference between the ground truth scale and that found
by our algorithm to be 0.016 for the dragon and 0.004 for the Buddha model. This demonstrates
that even with substantial variations in the global scale, our scale-invariant representation of the
underlying geometry enables us to accurately approximate the registration without any assumptions
about the initial poses or global scales.
Lastly, Figure 4.7 illustrates the results of applying our framework to 42 range images corre-
sponding to three different models that have been randomly scaled by a factor between 1 and 4.
Again, despite the significant scale variations, our scale-invariant representation of the underlying
Figure 4.7: Automatic registration of 42 randomly scaled range images: 15 views of the Buddha model, 12 views of the armadillo and 15 views of the dragon model. Each range image was randomly scaled by a factor between 1 and 4. Again, despite the significant scale variations, our scale-invariant representation of the underlying local geometric structures enables us to fully automatically construct initial pose estimates for all three models simultaneously.
local geometric structures enables us to fully automatically construct initial pose estimates for all
three models simultaneously.
5. Conclusions
We close this thesis with a summary of our comprehensive framework for representing the local
scale-variations of a 3D model with scale-dependent/invariant geometric features, encoded with novel
shape descriptors. First we summarize our work and the significant contributions of the thesis. We
conclude with a discussion of a few applications for which we believe our framework is well suited.
5.1 Summary
In this thesis we present a comprehensive framework for modeling the scale variability of local ge-
ometric structures and representing that variability in scale-dependent/invariant geometric features
and shape descriptors. This was accomplished by first representing the input geometry with a
normal map - a dense and regular image of the surface normals of the original model’s surface. This
novel 2D representation allowed us to leverage 2D scale-space theory in order to develop an effective
scale operator that builds the geometric scale-space of a 3D shape. The geometric scale-space is a
four dimensional representation of three-dimensional data, where the fourth dimension is a measure
of the smoothing of the model’s surface. Next, we derived novel first- and second-order partial
derivatives on the normal map which were used to detect scale-dependent 3D features, in particular
corners and edges. The effectiveness of the scale-dependent feature set to represent the scale and
geometric properties of the underlying local surface structures was demonstrated on a number of
range images and 3D models of varying topology. Furthermore we demonstrated the robustness of
these features to noisy input data and changes to the sampling density of the surface.
Next, we demonstrated how augmenting local 3D geometric features with their natural scale
enables us to automatically determine their spatial extent to encode in local shape descriptors. In
particular we derived novel scale-dependent and scale-invariant local shape descriptors. Both the
scale-dependent and scale-invariant local shape descriptors were built by leveraging the exponential
map, which enables us to represent the spatial extent of a local geometric feature in a manner that is
independent of the surface sampling density and is invariant to the pose of the model. Furthermore
we demonstrated how the scale-dependent local shape descriptors form a hierarchical representation
of the underlying local geometric structures that can be leveraged in an efficient top-down pairwise
matching algorithm. We also demonstrated how the scale-invariant local shape descriptors form a
fully scale-invariant representation of the local geometric structures that can match two 3D models
with significantly different global scales. Finally, the scale-dependent and scale-invariant local shape
descriptors were used in a fully automatic range image registration algorithm that was able to register
sets of range images corresponding to multiple models simultaneously.
5.2 Contributions
In this thesis we have made the following contributions.
• A comprehensive framework for modeling the scale-variability of local structures in 3D geo-
metric data. The key is the construction of the geometric scale-space, a general representation
of the surface geometry of local geometric structures at different scales. The scale operator
is defined on the geodesic distance function, which avoids potential pitfalls of the Euclidean
distance and ensures that the geometric scale-space is an accurate representation of the surface
geometry at different scales.
• Scale-dependent features detected in the geometric scale-space of a 3D model. The geometric
scale-space of a 3D model serves as a general representation from which many feature types
can potentially be derived. We presented a derivation of the first- and second-order partial
derivatives that were used specifically in novel corner and edge detectors. Additionally we
presented an automatic scale-selection technique that enables the intrinsic scale of each feature
in the geometric scale-space to be determined automatically.
• Novel scale-dependent local shape descriptors that form a hierarchical representation of the
underlying geometric structures. The key is that by leveraging our framework for constructing
the geometric scale-space we are able to automatically determine the spatial extent of a feature
to encode in the local scale-dependent shape descriptor. The shape descriptor itself is an
improvement over past methods in that it is independent of the sampling density of the original
object and is also invariant to pose changes to the model. An effective hierarchical algorithm
for matching two 3D models represented with a set of scale-dependent local shape descriptors
was presented.
• Novel scale-invariant local shape descriptors that together form a scale-invariant representation
of the underlying geometric structures. By leveraging the geometric scale-space we normalize
the spatial extent of each feature to form a scale-invariant representation of the underlying
geometry. This representation allows us to match two pieces of geometry even if they have
drastically different global scales.
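The normalization of spatial extent can be sketched as expressing a feature's neighbourhood relative to the feature point and rescaling by its intrinsic scale, so that the same structure at different global scales yields the same normalized support region. The factor k and the hard radius cutoff below are illustrative choices, not the thesis's exact parameters:

```python
import numpy as np

def normalized_support(points, center, sigma, k=3.0):
    """Normalize a feature's support region by its intrinsic scale.

    points : (n, 3) candidate neighbour positions
    center : (3,) position of the feature point
    sigma  : intrinsic scale of the feature
    k      : support radius in units of sigma
    """
    local = (points - center) / (k * sigma)      # scale-normalized coordinates
    keep = np.linalg.norm(local, axis=1) <= 1.0  # keep points inside the support
    return local[keep]
```

Descriptors built from the normalized neighbourhood are directly comparable across models with different global scales.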
• A fully automatic registration algorithm for multiple range images with or without a consistent global scale. The algorithm was shown to accurately register sets of range images corresponding to multiple objects simultaneously.
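A core building block of such a pipeline is a closed-form similarity-transform estimate from feature correspondences, e.g. Umeyama's least-squares method, which recovers scale alongside rotation and translation; the full registration would wrap an estimate like this in robust hypothesis testing and multi-view consistency checks. This sketch is an illustrative building block, not the thesis implementation:

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares similarity transform (scale s, rotation R,
    translation t) mapping src onto dst, following Umeyama's method.

    src, dst : (n, 3) corresponding point sets
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                    # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src        # optimal uniform scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Fixing s to 1 recovers the rigid-only case used when the range images share a consistent global scale.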