Scale-Dependent/Invariant Local 3D Geometric Features and Shape Descriptors
A Thesis
Submitted to the Faculty
of
Drexel University
by
John Novatnack
in partial fulfillment of the
requirements for the degree
of
Master of Science in Computer Science
2008
© Copyright 2008 John Novatnack. All Rights Reserved.
Acknowledgements
I would like to thank my adviser Dr. Ko Nishino for his guidance and encouragement without
which this work would not have been possible. I am extremely grateful for the lessons Dr. Nishino
taught me about properly conducting research and communicating my results in papers and presen-
tations. These lessons will be invaluable in all my future endeavors. I would also like to acknowledge
and thank my defense committee: Dr. Fernand S. Cohen, Dr. Dario Salvucci and Dr. Ali Shokoufandeh.
Finally I would like to acknowledge Dr. Ali Shokoufandeh for his contributions to this work and for
his mentorship throughout my career at Drexel University.
I would like to acknowledge the following for the use of 3D mesh models or range images: the
Stanford Graphics Lab, Signal Analysis and Machine Perception Laboratory at Ohio State Univer-
sity, INRIA, Visual Computing Lab at ISTI-CNR and the Aim@Shape repository.
This material is based in part upon work supported by the National Science Foundation under
CAREER award IIS-0746717. Any opinions, findings, and conclusions or recommendations expressed
in this material are those of the author(s) and do not necessarily reflect the views of the National
Science Foundation.
Table of Contents
List of Figures
Abstract
1. Introduction
  1.1 Background and Related Work
    1.1.1 Scale-Space Analysis of Image Structures
    1.1.2 Analysis of Scale in 3D Geometric Data
  1.2 Thesis Overview
2. Geometric Scale-Space of a 3D Mesh Model and Range Image
  2.1 2D Representation of the Surface Geometry of a 3D Model
    2.1.1 Normal and Distortion Maps
    2.1.2 Multiple Normal Maps
    2.1.3 Geodesic Distance Function
  2.2 2D Representation of the Surface of a Range Image
    2.2.1 Normal Map
    2.2.2 Geodesic Distance Function
  2.3 Geometric Scale-Space
    2.3.1 Geometric Scale-Space Operator
3. Feature Detection in the Geometric Scale-Space
  3.1 Derivatives of the Normal Map
  3.2 Corners
  3.3 Edges
  3.4 Automatic Scale-Selection
  3.5 Experiments
    3.5.1 Corner and Edge Detection on 3D Mesh Models
    3.5.2 Corner Detection on Range Images
    3.5.3 Noisy Surface Normals
    3.5.4 Varying Sampling Densities
4. Scale-Dependent/Invariant Local 3D Shape Descriptors
  4.1 Exponential Map
  4.2 Scale-Dependent Local 3D Shape Descriptors
  4.3 Scale-Invariant Local 3D Shape Descriptors
  4.4 Pairwise Matching of Scale-Dependent/Invariant Local 3D Shape Descriptors
    4.4.1 Similarity of Scale-Dependent and Scale-Invariant Descriptors
    4.4.2 Pairwise Matching of Scale-Dependent Descriptors
    4.4.3 Pairwise Matching of Scale-Invariant Descriptors
  4.5 Experiments
    4.5.1 Registration of Range Images
    4.5.2 Registration of Range Images with Consistent Global Scale
    4.5.3 Registration of Range Images with Inconsistent Global Scale
5. Conclusions
  5.1 Summary
  5.2 Contributions
Bibliography
List of Figures
2.1 2D normal and distortion map of a 3D model. (a) shows the original model. (b) illustrates the dense 2D normal map. Observe that geometric features such as the creases on the palm are clearly visible. (c) shows the distortion maps corresponding to the normal maps. Darker regions have been shrunk relative to the brighter regions. Iso-contour lines illustrate the various levels of distortion induced by the embedding.
2.2 Representing a 3D model using multiple normal maps. The heat map (a) illustrates the density of 3D vertices mapped to each point in the normal map shown in Figure 2.1(b). Figure (b) illustrates the 5 clustered components, each visualized with a different color, automatically computed from the density map in Figure (a). Each component is represented with a separate complementary normal map. Figures (b) and (c) show two supplementary normal maps corresponding to two fingers of the model shown in Figure 2.1.
2.3 2D normal maps of two adjacent range images. Figure (a) shows the depth maps of two range images of the Buddha model, separated by 24 degrees. Figure (b) shows the two dense normal maps built by triangulating the range images and computing a surface normal for each point in the range image.
2.4 The geometric scale-space representation of the 3D mesh model shown in Figure 2.1 and Figure 2.2(b,c). As the standard deviation increases, fine model details are smoothed away, leaving only coarse geometric structures. For example the fingernail is quickly smoothed away, while the prominent creases on the palm remain visible even at the coarsest scale. Although the sizes of the fingernails in the two supplementary normal maps are different, the rate of smoothing is consistent due to the use of the geodesic Gaussian kernel that accounts for the distortion induced by the embedding.
3.1 Corners detected on the 2D normal map. (a) illustrates the 20 strongest corners on the 2D representation of the hand model at scale σ = 3. Observe that the corner points on the palm are primarily located where two creases converge, or where there is an acute bend in one crease. (b) shows the strongest corner on two of the finger normal maps at scale σ = 7. At this coarse scale the corners are detected on the tip of the finger.
3.2 Edges detected at one scale level (σ = 1). The edges are detected accurately on surface points with locally maximum curvature, namely 3D ridges and valleys. Here the creases of the palm form the edges.
3.3 Scale-dependent geometric corners (a) and edges (b) detected on the hand model. The corners are represented with 3D spheres that are colored and sized according to their respective scale (blue and red correspond to the finest and coarsest scales, respectively). The corners accurately represent the geometric scale variability of the model, for instance with fine corners on the palm creases and coarse corners at the tips of the fingers. The edges also encode the geometric scale variability, tracing edge segments that arise at different scales.
3.4 Scale-dependent geometric corner and edge detection results on a disc topology model (Caesar) and two genus zero models (Buddha and armadillo). The corners and edges are accurately detected across different scales. The resulting scale-dependent geometric feature set encodes the scale variability of the underlying surface geometry, resulting in a unique representation of each model.
3.5 Scale-dependent geometric corner detection on two pairs of range images, separated by 24 degrees. Despite the noise inherent to range image data, the corners are distributed across scales and reflect the relative scale of the underlying geometric structures. Additionally, the corners display a high degree of repeatability, both in location and scale, between two adjacent range images.
3.6 Scale-dependent geometric corner detection in the presence of noisy surface normals (a), and with varying surface sampling densities (b). When compared with the corners shown in Figure 3.4, the results demonstrate that the scale-dependent corners detected with our framework are largely invariant to significant input noise and variations in the sampling density.
4.1 Scale-dependent features and local shape descriptors in the geometric scale-space of a range image. Features are colored according to the scale at which they were detected, with red being the coarsest and blue the finest. A subset of the scale-dependent local shape descriptors is illustrated; together they form a hierarchical representation of the underlying scale variability in the surface of a range image and enable correspondences to be determined robustly between pairs of range images.
4.2 Matching two range images with consistent global scale, represented as a set of scale-dependent local shape descriptors. On the left we show the 67 point correspondences found with our matching algorithm and on the right the resulting rigid transformation. The hierarchy of the scale-dependent local 3D shape descriptors enables a coarse-to-fine sampling strategy that results in a large number of correspondences that accurately approximate the pairwise rigid transformation.
4.3 Matching two range images with inconsistent global scales, represented with a set of scale-invariant local shape descriptors. On the left we show the 24 point correspondences found with our matching algorithm and on the right the resulting similarity transformation. Even though the range images differ by a global scale factor of approximately 2.4, the scale-invariant local descriptors enable us to recover the similarity transformation accurately.
4.4 Fully automatic approximate registration of 15 views of the Buddha model, 12 views of the armadillo model and 18 views of the angel model with scale-dependent local descriptors. In the first column we show the initial set of range images. In the second we show the approximate registration obtained with our framework, which is further refined with ICP in the third column. Finally a watertight model is built using a surface reconstruction algorithm [18]. Observe that the initial approximation obtained with our framework is quite accurate.
4.5 Automatic registration of 42 range images: 15 views of the Buddha model, 12 views of the armadillo and 15 views of the dragon model. The accuracy of the approximate registration found with our framework enables us to automatically discover that the range images correspond to three disjoint models. Note that these registrations have not been post-processed with a global registration algorithm.
4.6 Fully automatic approximate registration of 15 views each of the Buddha and dragon models, with a random global scaling from 1 to 4. For each model we visualize the initial set of range images and the initial alignment obtained by our framework. In column three we show the results after applying ICP and in column four we show the results of the surface reconstruction. Even with the substantial variations in the global scale, our scale-invariant representation of the underlying geometry enables us to accurately approximate the registration without any assumptions about the initial poses.
4.7 Automatic registration of 42 randomly scaled range images: 15 views of the Buddha model, 12 views of the armadillo and 15 views of the dragon model. Each range image was randomly scaled by a factor between 1 and 4. Again, despite the significant scale variations, our scale-invariant representation of the underlying local geometric structures enables us to fully automatically construct initial pose estimates for all three models simultaneously.
Abstract
Scale-Dependent/Invariant Local 3D Geometric Features and Shape Descriptors
John Novatnack
Advisor: Ko Nishino
The quality and abundance of three-dimensional geometric data is rapidly increasing as 3D
acquisition hardware becomes cheaper and more effective. In fact, three-dimensional geometric
data already plays a central role in many computer vision and computer graphics applications
such as autonomous vehicle navigation, 3D object recognition and the computer-based preservation
of cultural artifacts. Despite the increasing relevance and importance of geometric data, current
techniques for processing the data have neglected to explicitly model and exploit a significant
source of information in the data: the scale variability of the local geometric structures.
In this thesis we overcome the limitations of past techniques with a comprehensive framework
for modeling the scale variations in local geometric structures, effectively adding an additional
dimension to geometric data. To accomplish this we derive the geometric scale-space, a representation
of local geometric structures at various degrees of scale. This representation enables us to define
scale-dependent geometric feature detectors, such as corners and edges, that determine not only the
location of salient geometric features, but also their relative scales. The augmentation of a geometric
feature with its intrinsic scale enables us to define scale-dependent/invariant local shape descrip-
tors that together form both a hierarchical and scale-invariant representation of the local geometric
structures of a 3D shape. We derive and present the theory of these methods and also demonstrate
their effectiveness for the purposes of robust 3D feature detection and fully automatic range image
registration.
1. Introduction
Three-dimensional geometric data plays a fundamental role in many modern systems and ap-
plications. For example in autonomous vehicle navigation systems, 3D geometric data is acquired
through laser range scanners or other sensors and is used to make navigation decisions and to detect
threats in the environment. 3D geometric data is also beginning to play a central role in
computer-based cultural preservation. Archaeologists have begun to bring 3D range scanners to dig sites so
that 3D computer models can be built as the artifacts are excavated. The overall increase in the
quality, ease of deployability and affordability of 3D acquisition systems has led to a drastic increase
in the importance of 3D geometric data in applications such as these and many more. This increase
in importance has created a need for effective algorithms to represent, acquire and process this 3D
geometric data.
When acquired with modern 3D acquisition hardware, 3D geometric data is often incredibly
complex, consisting of many millions of points or more. Data of such immense size and complexity
makes building efficient methods of processing 3D geometric data a challenging task. A fundamental
question being asked by computer vision and computer graphics researchers is how to construct
rich but concise representations of the local structures in 3D geometric data that alleviate some of
its complexity while enabling efficient and robust geometric processing algorithms.
In order to deal with the enormity of geometric data, it is common to detect local salient geometric
features that together form a more concise representation of the underlying geometry. For example,
in the case of a human bust model we may represent the underlying data by a set of points on
the nose, skin pores, hair and corners of the mouth. Local geometric features such as these play a
central role in a large number of computer vision and computer graphics applications and therefore
developing rich 3D geometric features is a fundamental problem in computer vision and computer
graphics research.
Despite the importance of detecting a rich and concise set of geometric features, most current
methods have completely ignored a significant source of information about local geometric
structures: their relative scale. Instead the scale of local structures is incorporated in an ad hoc manner,
such as with tunable parameters, which can lead to ineffective and incomplete representations of the
underlying geometric structures. The topic of this thesis is to develop a comprehensive framework
for detecting scale-dependent geometric features that may then be encoded in rich shape descriptors.
For example, consider again the case of a human bust model. Rather than simply detecting salient
geometric points on the nose, skin pores and corners of the mouth, our framework can augment each
feature with a notion of its relative scale, i.e. the nose occurs at a coarser scale than the mouth
corners, which in turn occur at a coarser scale than the skin pores. Augmenting local geometric
features with a notion of scale is a powerful paradigm that effectively adds a fourth dimension to
our representation of 3D geometric data. In fact, the central role local geometric features play
in computer vision and computer graphics makes such a technique widely applicable to any
method or application that has a significant geometric processing component.
1.1 Background and Related Work
Representing complex data with local features is not an approach specific to 3D geometry, but is
also commonly used to represent local image structures for the purposes of image processing or 2D
computer vision. In fact, the problem of incorporating notions of the scale of local image structures
was recognized early on in computer vision research and has led to a large number of effective
algorithms with strong theoretical bases.
1.1.1 Scale-Space Analysis of Image Structures
In computer vision, early feature detectors such as corners [13] and edges [27] proved to be
powerful representations of underlying local image structures. However, they suffer from the fact that
they are tied to a specific scale, often determined by the size of the local window used to detect the
feature in the image. In order to overcome this limitation researchers developed methods of analyzing
local image structures at various scales simultaneously, without the use of a tunable parameter [42,
41, 24, 19, 23]. This is achieved by constructing an image scale-space, or a representation of an image
at different scales, effectively adding a third dimension to a two-dimensional image. The scale-space
of an image is constructed by convolving the image with Gaussian kernels of increasing standard
deviation. The standard deviation of the Gaussian serves as the third dimension of the scale-space
that controls the amount of smoothing, or blurring, of the local image structures.
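The construction just described can be sketched in a few lines. This is a generic 2D image scale-space, not code from this thesis; the function name `build_scale_space` is illustrative, and SciPy's `gaussian_filter` performs the Gaussian convolution:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_scale_space(image, sigmas):
    """Stack progressively smoothed copies of a 2D image.

    Each level is the input convolved with a Gaussian of standard
    deviation sigma; sigma acts as the third (scale) dimension.
    """
    return np.stack([gaussian_filter(image, sigma) for sigma in sigmas])

# Example: a noisy 64x64 "image" smoothed at four scales.  Coarser
# levels have less variance because fine detail is smoothed away.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
sigmas = [1.0, 2.0, 4.0, 8.0]
space = build_scale_space(image, sigmas)
```

Stacking the smoothed levels along a new leading axis makes the scale dimension explicit: indexing `space[k]` selects the image at scale `sigmas[k]`.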
The scale-space representation of a 2D image is powerful in that it enables features to be detected
at all scales simultaneously. Consider, for example, an image of a house. At the base of the
scale-space is the input image, from which countless fine-scale corners, such as on the texture
of the bricks, may be detected. However, as the standard deviation of the Gaussian is increased,
the fine scale features are smoothed away leaving only coarse, or large scale features, such as on
the boundary contours of the house. The scale-space of an image allows us to detect both of these
local feature types without the need to manually tune a feature detector for the “magical number”
that will result in a feature set that represents all the underlying image structures accurately. In
fact, the inherent multi-scale nature of image structures in complex scenes may make such a number
impossible to find. An additional strength of the scale-space representation of an image is that it
is not tied to a specific feature detector. Instead a large number of feature detectors have been
derived, such as corners, edges, blobs, ridges and more [23].
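As one concrete example of a detector that operates on such smoothed images, a standard Harris-style corner response (a textbook formulation, not the operator developed later in this thesis) can be written as:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, sigma=1.0, k=0.05):
    """Harris corner response: large where the gradient varies in two directions."""
    ix = sobel(image, axis=1)  # horizontal derivative
    iy = sobel(image, axis=0)  # vertical derivative
    # Average the structure-tensor entries over a Gaussian window.
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - k * trace ** 2

# A bright quadrant produces a single corner at pixel (16, 16): the
# response there is positive, while flat regions respond with zero.
image = np.zeros((32, 32))
image[16:, 16:] = 1.0
r = harris_response(image)
```

Note how `sigma` here plays exactly the role discussed above: it fixes the size of the local window, which is why a single-scale detector of this kind is tied to one scale.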
The 2D scale-space allows us to detect not only the location of a local feature in an image, but
also its intrinsic scale, through the use of an automatic scale-selection technique [24]. Automatic
scale-selection determines the intrinsic scale of a feature by searching for local maxima in the feature
response across image scales. Augmenting local image features with a notion of their intrinsic scale
serves as an added piece of information that enhances the representational power of the feature set.
Additionally, the ability to determine the intrinsic scale of a feature enables scale-invariant local
image feature detectors [28, 25], that is, local features that are invariant to the scale of the object
in the image. Scale-invariant image features enable objects in two different images to be matched
even if they occur at drastically different scales. For example, consider the case where one image
contains a close-up of a toy and another image contains a crowded scene with a number of toys,
including the one in the first image. Scale-invariant image features enable us to match these two
instances of the same toy, despite their drastic scale differences in the two images. Scale-invariant
feature descriptors have proven to be robust to changes in lighting, noise and even slight rotations
in depth and have been shown to significantly outperform local image features that do not account
for the variations in the scale of local image structures [29].
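The scale-selection idea can be sketched for the scale-normalized Laplacian-of-Gaussian in the spirit of Lindeberg [24]. This is a generic illustration with an assumed function name (`intrinsic_scale`), not the geometric scale selection developed later in the thesis:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def intrinsic_scale(image, sigmas):
    """Per-pixel intrinsic scale from scale-normalized Laplacian responses.

    For every pixel, the scale whose normalized response magnitude is
    largest is selected as the intrinsic scale.
    """
    responses = np.stack([
        (sigma ** 2) * np.abs(gaussian_laplace(image, sigma))
        for sigma in sigmas
    ])
    best = responses.argmax(axis=0)           # winning scale index per pixel
    return np.take(np.asarray(sigmas), best)  # map index back to sigma

# A Gaussian blob of standard deviation 3 should select sigma = 3 at
# its center, since the normalized response peaks at the blob's scale.
y, x = np.mgrid[0:64, 0:64]
blob = np.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / (2.0 * 3.0 ** 2))
scale_map = intrinsic_scale(blob, [1.0, 2.0, 3.0, 4.0, 6.0])
```

The factor sigma² is the scale normalization: without it, the Laplacian response of a smoothed structure decays with sigma and the maximum would always fall at the finest scale.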
1.1.2 Analysis of Scale in 3D Geometric Data
In order to model and exploit the scale-variations in 3D geometric data the first issue is to
construct a representation of the data at different scales. One approach to this problem is to adapt
techniques of constructing a multi-resolution representation of a mesh [14, 5], where the complexity
of the mesh may be changed interactively. These methods are efficient and well supported in current
3D programming libraries; however, they suffer from the fact that they are sensitive to the sampling
of the original model. Instead, most approaches of modeling a 3D shape at different scales have
attempted to apply smoothing operators similar to 2D scale-space theory to 3D geometric data by
smoothing the 3D points, for instance with Gaussian kernels [30] or mean curvature flow [35, 4].
Although these approaches seem straightforward, they are problematic as they lead to alterations
in the global topology of the geometric data, in particular through fragmentation of the original
model [38], which leads to an erroneous representation of the geometry at different scales.
A second limitation of past approaches is that the Euclidean distance between 3D points was
used as the distance metric for modeling the geometry and detecting local features at different
scales [10, 22, 34, 20, 17]. This can lead to the creation of erroneous features when portions of the
surface are close in 3D space but have a large geodesic distance. For instance,
consider the case where we wish to detect features at the tip of each finger on a 3D human hand
model. The corner features on the tip of the finger naturally exist at a relatively coarse scale, and
therefore if the Euclidean distance is used as the distance measure the corner detector may include
both finger tips simultaneously. However, if the geodesic distance is used as the base distance
metric then the finger tips will correctly have a large distance. Using an accurate distance metric is
especially important for detecting 3D geometric features as feature repeatability and reliability are
critical when matching two 3D feature sets.
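The fingertip example can be made concrete with a toy sketch that approximates geodesic distances by running Dijkstra's algorithm over the mesh edge graph. This is only a coarse stand-in for proper geodesic computation (exact or fast-marching methods are used in practice), and the names are illustrative:

```python
import heapq
import numpy as np

def geodesic_distances(vertices, edges, source):
    """Dijkstra over the mesh edge graph; edge weight = Euclidean length."""
    adjacency = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        w = float(np.linalg.norm(vertices[a] - vertices[b]))
        adjacency[a].append((b, w))
        adjacency[b].append((a, w))
    dist = [float("inf")] * len(vertices)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adjacency[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Two "fingertips": the endpoints of a U-shaped vertex chain are only
# 1 unit apart in space, but 21 units apart along the surface.
vertices = np.array([(0.0, float(i), 0.0) for i in range(11)] +
                    [(1.0, float(10 - i), 0.0) for i in range(11)])
edges = [(i, i + 1) for i in range(21)]
dist = geodesic_distances(vertices, edges, source=0)
```

Here the Euclidean distance between the two endpoints is 1, while the along-surface distance is 21: a detector using the surface distance would never lump the two "fingertips" into one neighborhood.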
A final limitation of past methods is that they are tailored to detect only a single type of feature,
such as corners [10] or edges [33], and do not extend easily to other feature types. This is in stark
contrast to the 2D scale-space which is a general representation of the image at different scales that
has enabled a large number of feature detectors to be derived. This flexibility is important as it
enables the framework to be extended to a large number of applications.
1.2 Thesis Overview
In this thesis we present a comprehensive framework for modeling the scale variability of local
geometric structures and representing that variability in scale-dependent/invariant geometric fea-
tures and shape descriptors. We begin by presenting our technique for constructing the geometric
scale-space, a canonical representation of the scale-variations in 3D geometric data, in Chapter 2.
Similarly to the scale-space of an image, the geometric scale-space effectively adds a fourth
dimension to three-dimensional data. And like the image scale-space, it is a general representation
that may be leveraged by any number of feature detectors or other types of algorithms.
The key insight behind our approach is that we should construct a geometric scale-space represen-
tation of the surface geometry of the 3D model at hand, since we are interested in the scale-dependent
geometric features lying on the surface. This is opposed to constructing the scale-space directly on
the 3D coordinates of the points representing a 3D model. For this reason, we consider the normal
field on the 2D surface of the 3D object as the base representation of the geometry. While
higher-order geometric entities such as the mean curvature [21] can be used instead, surface normals
are a better choice: they are first-order partial derivatives of the raw geometric data, and are
therefore less affected by noise than higher-order derivatives.
Although our framework can be adapted to work with any representation of three-dimensional
geometric data, we focus in this work particularly on 3D mesh models and range images. A 3D mesh
model is a discrete representation of a 3D polyhedral object that consists of a set of 3D points, or
vertices, edges that connect the points and faces that are groups of edges that form 2D polygons,
such as a triangle in the case of a triangular mesh. As a second representation of geometric data
we also consider range images, which are dense and regular 2D images of ranges, or distances, to an
object. A range image is a common representation of a single view acquired by a laser range
finder; an object is typically captured as a large number of range images that are then combined
to construct a final 3D model.
In order to construct the geometric scale-space of a 3D mesh model or range image we first
represent the input geometry as a dense and regular “image” of the surface normals of the object.
Representing the 2D surface of a 3D object in a dense and regular 2D plane enables us to leverage
techniques from scale-space analysis of intensity images to the scale-space analysis of 3D geometry.
In the case of a 3D mesh model we construct this 2D representation by parameterizing, or mapping,
the surface of the mesh model onto a 2D plane. We then interpolate over the surface normals at
each 2D-embedded vertex of the mesh model to obtain a dense and regular 2D representation (vector
field) of the original surface, which we refer to as the normal map. In the case of a range image, the
key idea behind constructing the normal map is that a range image readily provides a 2D projection
of the original surface onto a 2D plane. The normal map can be constructed by triangulating the
range image and approximating surface normals at each vertex. With both 3D mesh models and
range images, the normal map enables us to represent a piece of 3D geometry in a manner that is
independent of the underlying sampling density of the original object.
We compute the geometric scale-space of the normal map by convolving the vector field with
Gaussian kernels of increasing standard deviation [31, 32] similarly to 2D scale-space. However, the
Gaussian kernel is modified so that distances are defined not by the Euclidean distance, but rather
the original surface geodesics. The geodesic distances are approximated efficiently in the case of a
3D mesh model with a distortion map, a novel representation of the relative distortion induced by
the parameterization at each point in the normal map. In the case of a range image the surface
geodesics are approximated directly from the range data itself. Defining the geometric scale-space
operator in terms of the surface geodesics ensures that the construction and analysis of the geometric
scale-space is equivalent to analyzing the scale-space of the normal field on the surface of the 3D
object, but in an algorithmically much simpler way, since we compute it in a regular and dense 2D
domain. Additionally, because our base representation of the geometric data is a dense and regular
2D plane, we can leverage techniques developed for 2D scale-space in developing novel methods for
the geometric scale-space.
A rich set of scale-dependent features can be extracted from the resulting geometric scale-space
representation. In particular, in Chapter 3 we derive detectors to extract geometric corners and edges
at different scales. In order to establish these detectors we derive the first- and second-order partial
derivatives of the normal map. Finally, we derive an automatic scale selection method analogous to
that of 2D scale-space theory to identify the natural scale of each feature and to unify all features
into a single set. The result is a set of scale-dependent 3D geometric features that provide a rich
and unique basis for representing the 3D geometry of the original object. The effectiveness of the
proposed method is evaluated on several range images and 3D mesh models of different topology
and its robustness to noise and mesh sampling density variation is demonstrated.
In Chapter 4 we show that we can encode the spatial extent of each local geometric feature in
a scale-dependent local 3D shape descriptor; together these descriptors form a hierarchical representation of the
local geometric structures captured in a 3D mesh model or range image. Additionally, we show how
we may define a scale-invariant local 3D shape descriptor that is invariant to the inherent local scale
of the geometry and that can be used to match a pair of 3D objects of unknown or inconsistent
global scale. We demonstrate the effectiveness of our framework of encoding the scale-variability
of 3D mesh models and range images in scale-dependent/invariant local 3D shape descriptors by
deriving a fully automatic technique for approximately registering a set of range images. We further
demonstrate the effectiveness of our framework by fully automatically registering a set of range
images corresponding to multiple 3D models simultaneously.
The framework presented in this thesis enables us to directly model the scale-variations of ge-
ometric structures in local scale-dependent features. Due to the fundamental role local geometric
features play in constructing rich and concise representation of geometric data in computer vision
and computer graphics applications, this framework has wide applicability. We conclude the thesis
with a discussion of directions for future work in this area and a number of potential applications.
In fact, we imagine that the framework can be applied in any
geometric processing method that requires a concise and rich representation of the underlying local
geometric structures.
2. Geometric Scale-Space of a 3D Mesh Model and Range Image
Geometric features that represent a 3D mesh model or range image reside on the model’s surface.
For this reason, we must construct a geometric scale-space that faithfully encodes the scale variability
of the surface geometry. We represent the geometry of a 3D mesh model or range image in a dense
and regular 2D representation. The construction of this 2D representation depends on whether the
input geometry is a 3D mesh model, which resides in three dimensions and must be mapped to a 2D
domain, or a range image, which is inherently defined on a dense and regular 2D domain. From
the 2D representation of the input geometry we construct the geometric scale-space by deriving and
applying a geometric scale-space operator that correctly accounts for the geodesic distances on the
surface.
2.1 2D Representation of the Surface Geometry of a 3D Model
We construct a 2D representation of a 3D mesh model by first unwrapping the surface of the
model onto the 2D plane. The result of this mapping is a sparse encoding of the mesh vertices in the
2D plane. We build a dense and regular 2D representation of the surface geometry by interpolating
over a geometric entity, specifically the surface normal, associated with each model vertex. Ideally,
this mapping would be isometric; in general, however, an isometric embedding cannot be achieved,
and shrinkage and expansion of mesh edges will occur. We compensate for these distortions by
representing a 3D mesh model with a set of normal maps, each associated with different components
of the mesh model. Lastly, by constructing a dense map of the distortion induced by the mapping
at each point in the 2D domain we can accurately reconstruct the geodesic distances of the original
3D mesh model. This enables us to accurately build the geometric scale-space of the original surface
geometry with our 2D representation.
2.1.1 Normal and Distortion Maps
Given a 3D mesh model M and the planar domain D we seek a bijective parameterization
φ : D → M from a discrete set of planar points to the mesh vertex set. Since we later accurately
account for the introduced distortion in the distance metric, any embedding algorithm may be
used; readers are referred to [7] for a recent survey on various R3-to-R2 parameterization algorithms.
For this work, we compute an initial parameterization based on the estimation of the
Figure 2.1: 2D normal and distortion map of a 3D model. (a) shows the original model. (b) illustrates the dense 2D normal map; observe that geometric features such as the creases on the palm are clearly visible. (c) shows the distortion map corresponding to the normal map. Darker regions have been shrunk relative to the brighter regions; iso-contour lines illustrate the various levels of distortion induced by the embedding.
harmonic map [5], and iteratively refine it using a method proposed by Yoshizawa et al. [43] which
minimizes the distortion in the surface area of the triangulation.
The result of the above embedding is a 2D sparse “image” of the 3D mesh vertices. In order
to construct a regular and dense representation of the original surface, we interpolate a geometric
entity associated with each of the vertex points in the 2D domain. Surface normals are a natural
choice for this entity because they are less affected by noise than higher-order derivative quantities
such as curvature. Furthermore, they convey directional information about
the surface geometry as opposed to scalar quantities such as mean or Gaussian curvature. Note
that 3D coordinates cannot be used since they form the extrinsic geometry of the surface. Given
a parameterization φ we construct a dense normal map N : R2 → S2, where N maps points in D
to the corresponding 3D normal vector. Figure 2.1 (b) shows an example of such a normal map.
The resulting normal map provides a dense and regular 2D representation of the original 3D mesh
model. Most importantly, the density of the normal map is independent of the mesh resolution of
the given 3D model, therefore subsequent feature detection can be achieved at arbitrary precision
on the surface and is robust to changes in the sampling density on the original 3D model.
In order to accurately construct the geometric scale-space representation of the original surface
geometry of the 3D mesh model, we require the relative geodesic distance between any two points
on the normal map. (Note that there is also a global scaling between the 3D mesh model and its
parameterized 2D image.) This geodesic distance can be computed by accounting for the distortion
introduced by the embedding. Given a point u = (s, t) ∈ D that maps to a 3D mesh vertex φ(u)
Figure 2.2: Representing a 3D model using multiple normal maps. The heat map (a) illustrates the density of 3D vertices mapped to each point in the normal map shown in Figure 2.1(b). Figure (b) illustrates the 5 clustered components, each visualized with a different color, automatically computed from the density map in Figure (a). Each component is represented with a separate supplementary normal map. Figures (c) and (d) show two supplementary normal maps corresponding to two fingers of the model shown in Figure 2.1.
we may define its distortion ε(u) as the average change in edge lengths connected to each vertex:

\varepsilon(\mathbf{u}) = \frac{1}{|A(\mathbf{u})|} \sum_{\mathbf{v} \in A(\mathbf{u})} \frac{\|\mathbf{u} - \mathbf{v}\|}{\|\phi(\mathbf{u}) - \phi(\mathbf{v})\|} , \qquad (2.1)

where A(u) is the set of vertices adjacent to u. We construct a dense distortion map by again
interpolating over the values defined at each vertex. Figure 2.1(c) depicts the distortion map of the
hand model in Figure 2.1(a).
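The per-vertex distortion of Equation 2.1 is simply an average of edge-length ratios between the embedded and original meshes. A minimal sketch in Python (the function name and the flat list-of-neighbors interface are hypothetical conveniences, not part of our implementation):

```python
import numpy as np

def distortion(u, phi_u, neighbors_2d, neighbors_3d):
    """Distortion eps(u) of Eq. 2.1: the average ratio of the 2D edge length
    to the corresponding 3D edge length over the vertices adjacent to u."""
    ratios = [np.linalg.norm(np.subtract(u, v)) / np.linalg.norm(np.subtract(phi_u, pv))
              for v, pv in zip(neighbors_2d, neighbors_3d)]
    return sum(ratios) / len(ratios)
```

A value below 1 indicates that the embedding shrank the local neighborhood; the dense distortion map is then obtained by interpolating these per-vertex values over D.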
2.1.2 Multiple Normal Maps
Mesh parameterization algorithms often require 3D models to contain a natural boundary loop
that is mapped to the boundary of the planar domain. For 3D models that do not contain such
boundaries, we need to introduce cuts that map to the boundaries of the 2D domains, for instance
one boundary cut for a genus-0 model, and compute multiple normal maps corresponding to different
surface regions. At the same time, in general, surface regions mapped close to the boundary of the
2D domain are significantly distorted. In such areas the local neighborhood of a 3D vertex is mapped
into a highly skewed region, which can introduce errors in the subsequent filtering.
We introduce boundary cuts such that they avoid surface regions that are most likely to contain
discriminative features. Specifically, for each 3D boundary cut, we manually select end points and
automatically trace vertices with low curvature; this is the opposite of methods that try to minimize
the overall distortion in the parameterization [12]. Furthermore, we construct a complementary
parameterization where portions of the model that were mapped to the perimeter of the original
embedding are mapped to the central region of D. By using this complementary parameterization
together with the original embedding, we can ensure that every surface region is mapped to a region
in the normal map where the local structure is well preserved.
There can also be a considerable loss of information due to the finite resolution of the planar
domain D. Often when a model has large appendages the discretization of the normal map results in
2D points which correspond to multiple mesh vertices. To compensate for this many-to-one mapping,
we first compute a density map of vertices as shown in Figure 2.2(a). By clustering the points of
this histogram into disjoint sets and parameterizing the corresponding portions of the mesh in each
cluster, we construct a comprehensive set of supplementary normal maps. Figure 2.2(b) shows the
5 clusters, each visualized with a different color, automatically computed from the density map.
Figures 2.2(c) and (d) show two such supplementary normal maps corresponding to two fingers of
the model shown in Figure 2.1. By representing the 3D model using multiple normal maps, we
ensure that all portions of the surface are covered by the representation.
2.1.3 Geodesic Distance Function
In order to construct a geometric scale-space of a 3D mesh model that accurately encodes the
scale-variability in the underlying surface geometry we define all operators in terms of the geodesic
distance, rather than the Euclidean 3D or 2D distances. The geodesic distance between two 3D
points φ(v) and φ(u) on a 3D mesh model is defined as the minimum length line integral between
v and u in the distortion map. We approximate this by computing the discretized line integral
d_M(v, u),

d_M(\mathbf{v}, \mathbf{u}) \approx \sum_{\mathbf{v}_i \in P(\mathbf{v},\mathbf{u}),\ \mathbf{v}_i \neq \mathbf{u}} \frac{\varepsilon(\mathbf{v}_i)^{-1} + \varepsilon(\mathbf{v}_{i+1})^{-1}}{2} \, \|\mathbf{v}_i - \mathbf{v}_{i+1}\| , \qquad (2.2)

where P(v, u) = [v, v_1, v_2, ..., v_n, u] is a list of points sampled on the line between v and u. The
quality of the approximation is determined by the sampling density of this line. Note also that the
actual geodesic is the line integral along the minimum-length path, which we approximate here with
the straight line between the two points. Although this assumption does not hold in general, we
found this geodesic distance approximation to be sufficient in practice.
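Equation 2.2 amounts to a weighted sum of segment lengths along the sampled straight line. A sketch of the computation, assuming the interpolated distortion map is available as a callable `eps` (a hypothetical interface standing in for a lookup into the dense distortion map):

```python
import numpy as np

def geodesic_mesh(p, q, eps, n_samples=100):
    """Approximate d_M(p, q) of Eq. 2.2: sample the straight 2D line from p
    to q and weight each segment length by the average inverse distortion
    at its two endpoints."""
    pts = np.linspace(np.asarray(p, float), np.asarray(q, float), n_samples)
    d = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        w = 0.5 * (1.0 / eps(a) + 1.0 / eps(b))  # average inverse distortion
        d += w * np.linalg.norm(b - a)
    return d
```

With a distortion of 1 everywhere (an isometric embedding) the approximation reduces to the Euclidean distance in the plane.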
Figure 2.3: 2D normal maps of two adjacent range images. (a) shows the depth maps of two range images of the Buddha model, separated by 24 degrees. (b) shows the two dense normal maps built by triangulating the range images and computing a surface normal for each point in the range image.
2.2 2D Representation of the Surface of a Range Image
The key insight underlying the 2D representation of the surface geometry captured in a range
image is that each range image is already a dense and regular projection of one view of the surface
of a 3D model. This means that we can avoid the embedding step necessary when constructing the
2D representation of a 3D mesh model. Furthermore, since the range image readily provides the
depth of each point, the distortion map is unnecessary, as the geodesics can be approximated directly
from the range data itself.
2.2.1 Normal Map
Given a range image R : D → R3, where D is a 2D domain in R2, we build the base normal
map by triangulating the range image and then computing a surface normal for each vertex.
The density and regularity of the range image itself ensures the density and regularity of the
resulting normal map. Figure 2.3 illustrates the normal maps computed from two Buddha range
images, separated by 24 degrees. Observe that the normal maps contain a significant amount of
information about the geometry of the 3D model.
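As a sketch of this step, per-pixel normals can be approximated directly from a range image stored as an (H, W, 3) array of 3D points; central differences of the 3D positions stand in for the triangulation here (a simplification of the triangulation-based method described above):

```python
import numpy as np

def normal_map_from_range(R):
    """Approximate per-pixel surface normals of a range image R of shape
    (H, W, 3): cross products of the two image-axis tangents, renormalized.
    Border pixels, where no central difference exists, are left as zero."""
    du = np.zeros_like(R)
    dv = np.zeros_like(R)
    du[1:-1, :] = R[2:, :] - R[:-2, :]   # tangent along the vertical axis
    dv[:, 1:-1] = R[:, 2:] - R[:, :-2]   # tangent along the horizontal axis
    n = np.cross(du, dv)
    norm = np.linalg.norm(n, axis=2, keepdims=True)
    norm[norm == 0] = 1.0                # avoid division by zero on borders
    return n / norm
```

For a planar patch the recovered normals are constant, as expected; in practice the triangulation handles unsampled pixels, which this sketch ignores.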
2.2.2 Geodesic Distance Function
In order to construct a geometric scale-space of a range image that accurately encodes the
scale-variability in the underlying surface geometry we define all operators in terms of the geodesic
distance, rather than the Euclidean 3D or 2D distances. The geodesic distance between any two
points in a range image may be approximated by summing the change in 3D distance along the path
connecting those two points. Specifically, given two points u,v ∈ D we approximate the geodesic
distance dR(u,v) of the range image as
d_R(\mathbf{u}, \mathbf{v}) \approx \sum_{\mathbf{u}_i \in P(\mathbf{u},\mathbf{v}),\ \mathbf{u}_i \neq \mathbf{v}} \|R(\mathbf{u}_i) - R(\mathbf{u}_{i+1})\| , \qquad (2.3)
where P is a list of points on the path between u and v. If the path between u and v crosses
an unsampled point in the range image then we define the geodesic distance as infinity. When
approximating geodesics from a point of interest outwards, this sum can be computed efficiently
by storing the geodesic distances of all points along the current frontier and reusing these when
considering a set of points further away.
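A minimal sketch of Equation 2.3 for a single sampled path (the pixel-path representation and `valid` mask are hypothetical details; the efficient frontier-based computation described above is omitted):

```python
import numpy as np

def geodesic_range(R, path, valid=None):
    """Approximate d_R of Eq. 2.3: sum the 3D distances between consecutive
    path samples of a range image R of shape (H, W, 3). A path crossing an
    unsampled pixel (valid[i, j] == False) has infinite geodesic distance."""
    d = 0.0
    for (i0, j0), (i1, j1) in zip(path[:-1], path[1:]):
        if valid is not None and not (valid[i0, j0] and valid[i1, j1]):
            return float('inf')
        d += np.linalg.norm(R[i1, j1] - R[i0, j0])
    return d
```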
2.3 Geometric Scale-Space
The 2D representation and geodesic distance functions computed from either a 3D mesh model
or range image serve as the basis from which a discrete geometric scale-space is constructed. The
geometric scale-space is constructed by successively convolving the base normal map with a Gaussian
kernel of increasing standard deviation, where the kernel is defined in terms of the geodesic distance
of either the 3D mesh model or range image. The resulting geometric scale-space directly represents
the inherent scale-variability of local geometric structures captured in a 3D mesh model or range
image and serves as a rich basis for further processing.
2.3.1 Geometric Scale-Space Operator
To construct a (discrete) geometric scale-space we convolve the normal map with a Gaussian
kernel and renormalize the normals at each level. As in 2D scale-space theory [23], the standard
deviation of the Gaussian is monotonically increased from fine to coarse scale levels. We use the
geodesic distance as the distance metric to accurately construct a geometric scale-space that encodes
the surface geometry. Given a 2D isotropic Gaussian centered at a point u ∈ D, we define the value
(a) Scale σ = 1 (b) Scale σ = 3 (c) Scale σ = 5 (d) Scale σ = 7
Figure 2.4: The geometric scale-space representation of the 3D mesh model shown in Figure 2.1 and Figure 2.2(c,d). As the standard deviation increases, fine model details are smoothed away, leaving only coarse geometric structures. For example, the finger nail is quickly smoothed away, while the prominent creases on the palm remain visible even at the coarsest scale. Although the sizes of the finger nails in the two supplementary normal maps are different, the rate of smoothing is consistent due to the use of the geodesic Gaussian kernel that accounts for the distortion induced by the embedding.
of the geodesic Gaussian kernel at a point v as
g(\mathbf{v}; \mathbf{u}, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left[ \frac{-d(\mathbf{v}, \mathbf{u})^2}{2\sigma^2} \right] , \qquad (2.4)
where d is the geodesic distance function defined as either dM or dR depending on whether the
input geometry is a 3D mesh model or range image, respectively. Other than this discrepancy, the
remainder of the theory is consistent regardless of the input type.
Using this geodesic Gaussian kernel, we compute the normal at point u for scale level σ as
N^{\sigma}(\mathbf{u}) = \frac{\sum_{\mathbf{v} \in W} N(\mathbf{v}) \, g(\mathbf{v}; \mathbf{u}, \sigma)}{\left\| \sum_{\mathbf{v} \in W} N(\mathbf{v}) \, g(\mathbf{v}; \mathbf{u}, \sigma) \right\|} , \qquad (2.5)
where W is a set of points in a window centered at u. The window size is also defined in terms
of the geodesic distance and is set proportional to σ at each scale level. In our implementation,
we grow the window from the center point while evaluating each point’s geodesic distance from
the center to correctly account for the points falling inside the window. Figure 2.4 shows the
normal map of the hand model and two supplementary normal maps at four scale levels. As the
standard deviation of the Gaussian increases fine model details are smoothed away, leaving only
coarse geometric structures.
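The smoothing of Equations 2.4 and 2.5 at a single point can be sketched as follows; note that under the renormalization the constant factor 1/(2πσ²) of the Gaussian cancels, so it is omitted (the dictionary-based normal map and the `dist` callable are hypothetical interfaces standing in for the normal map and the geodesic distance d_M or d_R):

```python
import numpy as np

def smooth_normal(N, u, window, sigma, dist):
    """One smoothed normal of Eq. 2.5: weight the normals in the window by a
    Gaussian of their geodesic distance to u (Eq. 2.4), sum, renormalize."""
    acc = np.zeros(3)
    for v in window:
        w = np.exp(-dist(v, u) ** 2 / (2.0 * sigma ** 2))
        acc += w * np.asarray(N[v], dtype=float)
    return acc / np.linalg.norm(acc)
```

The renormalization keeps the output on the unit sphere, so the result is again a valid normal map level.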
3. Feature Detection in the Geometric Scale-Space
In order to detect salient local features in the geometric scale-space of a 3D mesh model or
range image, we first derive the first- and second-order partial derivatives of the normal map Nσ.
Novel corner and edge detectors are then derived using these partial derivatives. An automatic
scale-selection algorithm is introduced to unify the features detected at each scale into a single set of
scale-dependent geometric features. Lastly, the effectiveness and robustness of the proposed method
for computing scale-dependent geometric features is demonstrated on several 3D models of varying
topology as well as a set of range images.
3.1 Derivatives of the Normal Map
We first derive the first-order partial derivatives of the 2D normal map in the horizontal (s) and
vertical (t) directions. In the following we describe them only for the horizontal (s) direction. The
partial derivatives in the vertical direction (t) may be derived by simply replacing s with t. Note
that because we represent the geometric scale-space of both 3D mesh models and range images with
a set of discrete normal maps, the following sections are independent of the actual input data type
with the exception of the particular geodesic distance function.
At any point in the normal map the horizontal direction corresponds to a unique direction on
the tangential plane at the corresponding 3D point. The first-order derivative is thus the directional
derivative of the normal along this specific direction in the tangential plane, known as the normal
curvature. In the discrete domain D the normal curvature in the horizontal (Cs) direction at a point
u = (s, t) may be computed by numerical central angular differentiation:
N_s(\mathbf{u}) = \frac{\partial N(\mathbf{u})}{\partial s} = C_s(\mathbf{u}) \approx \frac{\sin\left( \frac{1}{2}\theta(\mathbf{u}_{-1}, \mathbf{u}_{+1}) \right)}{L(\mathbf{u}_{-1}, \mathbf{u}_{+1})} , \qquad (3.1)
where u_{±1} = (s ± 1, t), θ(u_{−1}, u_{+1}) is the angle between the normal vectors N(u_{−1}) and N(u_{+1}), and
L(u_{−1}, u_{+1}) is the chord length between the 3D points φ(u_{−1}) and φ(u_{+1}). Because the normal
curvature is a function of adjacent points in the 2D domain D the chord length L is simply the
geodesic distance between these points. After applying the discrete geodesic distance in Equation 2.2
we obtain
N_s(\mathbf{u}) \approx \frac{\sin\left( \frac{1}{2}\theta(\mathbf{u}_{-1}, \mathbf{u}_{+1}) \right)}{d(\mathbf{u}_{-1}, \mathbf{u}_{+1})} , \qquad (3.2)
where again d may refer to the geodesic distance particular to a 3D mesh model or range image.
Note that because the angle between the two normal vectors is in the range [0, π], the first-order
derivative is nonnegative at both convex and concave surface points – it is unsigned.
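Equation 3.2 requires only the two neighboring normals and their geodesic distance; a sketch of the computation at a single point (the flat argument interface is a hypothetical convenience):

```python
import numpy as np

def normal_curvature(n_prev, n_next, d):
    """First-order derivative of Eq. 3.2: sine of half the angle between the
    two neighboring normals over their geodesic distance d. The result is
    nonnegative (unsigned) at both convex and concave points."""
    cos_theta = np.clip(np.dot(n_prev, n_next), -1.0, 1.0)  # clamp numerical noise
    theta = np.arccos(cos_theta)
    return np.sin(0.5 * theta) / d
```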
The second-order derivative of the normal map can be derived as
N_{ss}(\mathbf{u}) = \frac{\partial^2 N(\mathbf{u})}{\partial s^2} = \frac{\partial C_s(\mathbf{u})}{\partial s} . \qquad (3.3)
After applying the chain rule to Equation 3.1 we obtain
N_{ss}(\mathbf{u}) \approx \frac{\partial \theta(\mathbf{u}_{-1},\mathbf{u}_{+1})}{\partial s} \frac{\cos\left( \frac{1}{2}\theta(\mathbf{u}_{-1},\mathbf{u}_{+1}) \right)}{2 L(\mathbf{u}_{-1},\mathbf{u}_{+1})} - \frac{\partial L(\mathbf{u}_{-1},\mathbf{u}_{+1})}{\partial s} \frac{\sin\left( \frac{1}{2}\theta(\mathbf{u}_{-1},\mathbf{u}_{+1}) \right)}{L(\mathbf{u}_{-1},\mathbf{u}_{+1})^2} . \qquad (3.4)
In the case of a 3D mesh model, we can safely assume that the parameterization induces a uniform
distortion between adjacent points in D; the derivative of the chord length L is then zero and
the second term vanishes. In this case we may apply numerical central differentiation to θ and,
using the half-angle formula, the second-order derivative reduces to
N_{ss}(\mathbf{u}) \approx \frac{\theta(\mathbf{u}_{-2},\mathbf{u}) - \theta(\mathbf{u}_{+2},\mathbf{u})}{d(\mathbf{u}_{-1},\mathbf{u}_{+1})} \cdot \frac{\sqrt{\frac{1}{2}\left( 1 + N(\mathbf{u}_{-1}) \cdot N(\mathbf{u}_{+1}) \right)}}{d(\mathbf{u}_{-1},\mathbf{u}_{+1})} . \qquad (3.5)
This form is particularly attractive as it enables us to compute the second-order derivative in terms of
the original normal vectors, and the change in the local angle. The noise associated with higher-order
derivatives is reduced as we have avoided an additional numerical differentiation of the first-order
derivatives.
3.2 Corners
Consider the hand model shown in Figure 2.1. We wish to detect geometrically meaningful
corners such as the finger tips as well as the points on the sharp bends of the palm prints. In other
words, we are interested in detecting two different types of geometric corners, namely points that
have high curvature isotropically or in at least two distinct tangential directions. The rich geometric
information encoded in the normal maps enables us to accurately detect these two types of 3D corners
using a two-phase geometric corner detector.
We begin by computing the Gram matrix M of first-order partial derivatives of the normal map
(a) σ = 3 (b) σ = 7
Figure 3.1: Corners detected on the 2D normal map. (a) illustrates the 20 strongest corners on the 2D representation of the hand model at scale σ = 3. Observe that the corner points on the palm are primarily located where two creases converge, or where there is an acute bend in one crease. (b) shows the strongest corner on two of the finger normal maps at scale σ = 7. At this coarse scale the corners are detected on the tip of the finger.
N^σ at each point. The Gram matrix at a point u is defined as

M(\mathbf{u}; \sigma, \tau) = \sum_{\mathbf{v} \in W} \begin{bmatrix} N^{\sigma}_s(\mathbf{v})^2 & N^{\sigma}_s(\mathbf{v}) N^{\sigma}_t(\mathbf{v}) \\ N^{\sigma}_s(\mathbf{v}) N^{\sigma}_t(\mathbf{v}) & N^{\sigma}_t(\mathbf{v})^2 \end{bmatrix} g(\mathbf{v}; \mathbf{u}, \tau) , \qquad (3.6)

where W is the local window around the point u. M has two parameters: one that determines
the particular scale in the scale-space representation (σ), and one that determines the weighting of
each point in the Gram matrix (τ); in our experiments we set τ = σ/2 empirically. The corner
response at a point u is defined as the maximum eigenvalue of M. However, due to the unsigned
first-order derivative, the resulting corner set will contain not only the aforementioned two desired
types of geometric corners, but will also contain
points lying on 3D edges.
The second-order derivatives of the normal map can be used to prune the corners lying along
the 3D edges. We first prune the corner points that are not centered on zero crossings in both
the horizontal and vertical directions. Next we keep only those points where the variance of the
second-order partial derivatives around the point u are within a constant factor of each other. The
closer this constant factor is to 1, the greater the geometric variance of the selected corner points
in both tangential directions. Figure 3.1 illustrates corners detected on the hand model shown in
Figure 2.1 at one scale level. Once the corners are detected in 2D they can be mapped back to the
3D model. Because the 2D normal map is dense, the corresponding locations of the corners in 3D
Figure 3.2: Edges detected at one scale level (σ = 1). The edges are detected accurately on surface points with locally maximum curvature, namely 3D ridges and valleys. Here the creases of the palm form the edges.
are independent of the input model’s triangulation.
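The first phase of the detector, the corner response of Equation 3.6, can be sketched as follows, with the windowed derivatives and Gaussian weights passed in as flat arrays (a hypothetical interface; the zero-crossing and variance pruning of the second phase is omitted):

```python
import numpy as np

def corner_response(Ns, Nt, weights):
    """Corner response of Section 3.2: the maximum eigenvalue of the Gram
    matrix of Eq. 3.6, built from the first-order derivatives Ns, Nt over a
    window, each weighted by the Gaussian g(v; u, tau)."""
    M = np.array([[np.sum(weights * Ns * Ns), np.sum(weights * Ns * Nt)],
                  [np.sum(weights * Ns * Nt), np.sum(weights * Nt * Nt)]])
    return np.linalg.eigvalsh(M).max()  # symmetric 2x2 eigenproblem
```

Because the first-order derivatives are unsigned, this response is also large along 3D edges, which is exactly why the second pruning phase is needed.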
3.3 Edges
In order to find edges at each scale level in the geometric scale-space we use the second-order
derivatives of the normal map. Although the first-order derivative is unsigned, locating edges using
the zero crossing of the second-order derivative is sufficient, as the sign only affects the profile of the
derivative values around the zero crossing, and not the actual location of the zero crossing.
Similar to the classic work of Marr and Hildreth [27] for 2D images, given a normal map, we
begin by computing the Laplacian, defined as
\nabla^2 N^{\sigma} = N^{\sigma}_{ss} + N^{\sigma}_{tt} . \qquad (3.7)
Next we construct a binary image that contains the zero-crossing of the Laplacian. This set of zero
crossings contains points centered on curvature maxima, as well as spurious edge points arising from
uniform or slow changing curvature regions. We remove the spurious edge points by thresholding
the magnitude of the first-order derivative, and the variance of the second-order derivative. This
ensures that edges are detected in high curvature regions, and lie along portions of the surface with
a significant variation in the surface geometry.
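The zero-crossing extraction can be sketched as a simple sign test between neighboring pixels of the Laplacian image (the subsequent thresholding on the first-order magnitude and second-order variance described above is omitted):

```python
import numpy as np

def zero_crossings(L):
    """Binary mask of zero crossings of a Laplacian image L (Eq. 3.7): a
    pixel is marked where its sign differs from its right or lower neighbor."""
    z = np.zeros(L.shape, dtype=bool)
    z[:, :-1] |= np.sign(L[:, :-1]) != np.sign(L[:, 1:])
    z[:-1, :] |= np.sign(L[:-1, :]) != np.sign(L[1:, :])
    return z
```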
Figure 3.2 shows an example result of edge detection on the hand model at one scale level (σ = 1).
Again, these edges are localized on the surface of the 3D model independent of the mesh resolution.
Additional post-processing may also be applied to the edges once they are mapped onto the 3D
Figure 3.3: Scale-dependent geometric corners (a) and edges (b) detected on the hand model. The corners are represented with 3D spheres which are colored and sized according to their respective scale (blue and red correspond to the finest and coarsest scales, respectively). The corners accurately represent the geometric scale variability of the model, for instance with fine corners on the palm creases and coarse corners at the tips of the fingers. The edges also encode the geometric scale variability, tracing edge segments that arise at different scales.
model. In later experiments, we first compute a minimum spanning tree of the 3D edge points,
where the magnitude of the edge response in 2D determines the weight of each 3D point, similar
to [33]. Then we decompose the tree into a set of disjoint edge paths via caterpillar decomposition
and fit NURBS curves to each of these paths to obtain smooth parametric 3D edges.
3.4 Automatic Scale-Selection
Once features are detected in each of the normal maps in the geometric scale-space they can
be unified into a single feature set. Although a feature may have a response at multiple scales,
it intrinsically exists at the scale where the response of the feature detector is maximized. By
determining this intrinsic scale for each feature we obtain a comprehensive scale-dependent 3D
geometric feature set.
In order to find the intrinsic scale of a feature we search for local maxima of the normalized feature
response across a set of discrete scales, analogous to the 2D automatic scale selection method [24].
The derivatives are normalized to account for a decrease in the derivative magnitude as the normal
maps are increasingly blurred. We define the normalized first-order derivatives \bar{N}^{\sigma}_s and \bar{N}^{\sigma}_t as

\bar{N}^{\sigma}_s = \sigma^{\gamma} N^{\sigma}_s \quad \text{and} \quad \bar{N}^{\sigma}_t = \sigma^{\gamma} N^{\sigma}_t , \qquad (3.8)
where γ is a free parameter that is empirically set for each particular feature detector. The corre-
sponding normalized second-order derivatives are defined as
\bar{N}^{\sigma}_{ss} = \sigma^{2\gamma} N^{\sigma}_{ss} \quad \text{and} \quad \bar{N}^{\sigma}_{tt} = \sigma^{2\gamma} N^{\sigma}_{tt} . \qquad (3.9)
Normalized feature responses are computed by substituting the normalized derivatives into the
corner and edge detectors presented in the previous two sections. The final scale-dependent geometric
feature set is constructed by identifying the points in the geometric scale-space where the normalized
feature response is maximized along the scale axis and locally in a spatial window. Figure 3.3
illustrates the scale-dependent geometric corners and edges of the hand model. The scale-dependent
geometric features accurately encode the geometric scale-variability and can clearly be used as a
unique representation of the underlying geometry.
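For a single feature with a raw detector response at each level of the discrete scale-space, the selection reduces to an arg-max over the normalized responses. A sketch, assuming a response that scales like a first-order derivative so that σ^γ is the appropriate normalization factor (the helper and its interface are hypothetical):

```python
import numpy as np

def intrinsic_scale(responses, sigmas, gamma=1.0):
    """Automatic scale selection of Section 3.4: normalize the raw responses
    by sigma**gamma and return the sigma at which the normalized response
    is maximized along the scale axis."""
    normalized = np.asarray(sigmas, float) ** gamma * np.asarray(responses, float)
    return sigmas[int(np.argmax(normalized))]
```

In the full method the maximum must also be a local spatial maximum at that level before the feature is admitted to the unified set.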
3.5 Experiments
We evaluated the effectiveness and robustness of the proposed method for computing scale-
dependent geometric features on several 3D mesh models and range images. The method was
applied to 3 different 3D mesh models, in addition to the hand model shown in Figure 3.3. One
of these models, the Julius Caesar, is of disk topology and two, the armadillo and Buddha, have
a genus of zero. In addition, we applied the method to two pairs of range scans, separated by 24
degrees, of two unique objects.
3.5.1 Corner and Edge Detection on 3D Mesh Models
Figure 3.4 illustrates the corners and edges detected on the three models. The armadillo model
has appendages that were significantly distorted in the 2D representation and therefore multiple
normal maps were used. Additionally the large distortion at the boundaries of the armadillo was
accounted for using a complementary parameterization. The set of scales used to detect the corners
and edges depends on the geometry of the model and was set empirically. Observe that the set
of corners is distributed across scales, and that the scale of a particular corner reflects the scale of
the underlying surface geometry. For instance the tip of Caesar’s nose is detected at the coarsest
scale, while the corners of the mouth are detected at a relatively finer scale. The edges are detected
along ridges and valleys of the 3D models existing at different scales, for example the edges on the
prominent creases of the Buddha’s robe, as well as edges along the finer details of the base.
Figure 3.4: Scale-dependent geometric corner and edge detection results on a disc topology model (Caesar) and two genus-zero models (Buddha and armadillo). The corners and edges are accurately detected across different scales. The resulting scale-dependent geometric feature set encodes the scale variability of the underlying surface geometry, resulting in a unique representation of each model.
3.5.2 Corner Detection on Range Images
Figure 3.5 illustrates corners detected on two pairs of range scans of two unique models, the Bud-
dha and dragon model, taken 24 degrees apart. Observe that again the set of corners is distributed
across all scales. The features display a high degree of repeatability, both in the location and scale.
For instance note the large number of coarse (red) corners repeated on the chest and stomach of the
Buddha model, as well as the large number of finer (blue and teal) corners repeated on the robe and
necklace. Also observe the consistency of the scale and localization of the corners detected along
the body of the dragon model. Note that the corner detection was conducted on the raw range
data; no preprocessing was applied.
Figure 3.5: Scale-dependent geometric corner detection on two pairs of range images, separated by 24 degrees. Despite the noise inherent to range image data, the corners are distributed across scales and reflect the relative scale of the underlying geometric structures. Additionally, the corners display a high degree of repeatability, both in location and scale, between two adjacent range images.
3.5.3 Noisy Surface Normals
We tested the resilience of our framework to noisy input data by applying Gaussian random
noise with standard deviation 0.05, 0.075, and 0.1 to the surface normals of the Julius Caesar 3D
mesh model. The features were detected with the identical parameters used to detect the original
set of corners on the Julius Caesar model. Figure 3.6(a) illustrates the results. Although fine-scale
corners can arise from the input noise, the detected scale-dependent geometric corner sets are highly
consistent with those detected on the original model and are localized accurately compared to
the original results shown in Figure 3.4.
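The noise model used in this experiment is straightforward to reproduce. The sketch below (a hypothetical helper, not code from the thesis; it assumes the normals are stored as an (N, 3) array) perturbs unit surface normals with per-component Gaussian noise and renormalizes:

```python
import numpy as np

def perturb_normals(normals, sigma, seed=0):
    """Add per-component Gaussian noise to unit surface normals and
    renormalize, mirroring the noise levels used in this experiment.

    normals : (N, 3) array of unit vectors
    sigma   : standard deviation of the noise (here 0.05, 0.075, or 0.1)
    """
    rng = np.random.default_rng(seed)
    noisy = normals + rng.normal(0.0, sigma, normals.shape)
    # Renormalize so each row is a valid unit normal again.
    return noisy / np.linalg.norm(noisy, axis=1, keepdims=True)
```

The renormalization step matters: without it the perturbed vectors are no longer valid inputs to the normal-map representation.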
3.5.4 Varying Sampling Densities
We demonstrate the independence of our framework from surface sampling density by computing
scale-dependent geometric corners on three simplified Julius Caesar mesh models. Specifically, we
applied a surface simplification algorithm [9] to construct Julius Caesar models with 30,000, 20,000
and 10,000 faces from the original model with 50,000 triangle faces. Corners were detected at each
sampling density using the parameters from the original experiment (Figure 3.4). Figure 3.6(b)
illustrates the results. Although the number of faces changes substantially, the location and scale of
the corners remain largely constant. This demonstrates that the density of the 2D representation of
(a) Models with input noise of 0.05, 0.075 and 0.1
(b) Models with 30,000, 20,000 and 10,000 faces
Figure 3.6: Scale-dependent geometric corner detection with the presence of noisy surface normals (a), and with varying surface sampling densities (b). When compared with the corners shown in Figure 3.4, the results demonstrate that the scale-dependent corners detected with our framework are largely invariant to significant input noise and variations in the sampling density.
the surface geometry ensures that the framework is independent of the surface sampling.
4. Scale-Dependent/Invariant Local 3D Shape Descriptors
Once we detect scale-dependent features on a 3D mesh model or range image via geometric
scale-space analysis we may define novel local shape descriptors that naturally encode the scale
of the underlying geometric structures. In particular we derive both scale-dependent and scale-
invariant local 3D shape descriptors, which retain the geometric scale variability as a hierarchical
representation or achieve scale invariance, respectively. We demonstrate the effectiveness of
encoding the scale variability of geometric structures in these descriptors by automatically
registering a number of models of varying geometric complexity, each represented by a set of
range images, and then by fully automatically registering a set of range images corresponding
to multiple 3D models simultaneously.
4.1 Exponential Map
A wide variety of 2D shape descriptors have previously been proposed [16, 8, 3, 37, 36] for the
purposes of 3D object recognition, registration, navigation and more. Many of these approaches
suffer from limitations such as sensitivity to the sampling density of the underlying geometry or
to slight perturbations in the localization of the surface descriptor. To overcome these limitations
we chose to encode each scale-dependent feature and its spatial extent in a dense and regular 2D
shape descriptor. However, unlike past approaches [44], we did not want to rely on 2D embedding
techniques that are sensitive to the local surface patch being encoded. Additionally, we require a
2D shape descriptor that is repeatable, so that surface correspondences between pairs of 3D models
can be determined accurately.
We construct both our scale-dependent and scale-invariant local surface descriptors by mapping
the local neighborhood of a feature to a 2D domain using the exponential map. The exponential
map is a mapping from the tangent space of a surface point to the surface itself [2]. Specifically,
given a unit vector w lying on the tangent plane at a point u there is a unique geodesic Γ on the
surface such that Γ(0) = u and Γ′(0) = w. The exponential map takes a vector w on the tangent
plane and maps it to the point on the geodesic curve at a distance of 1 from u, or exp(w) = Γ(1).
Following this, any point v in the local neighborhood of u can be mapped to u’s tangent plane by
determining the unique geodesic between u and v and computing the geodesic distance and polar
angle of the tangent to the geodesic at u relative to a fixed basis {e1, e2} on u’s tangent plane.
The exponential map has a number of properties that are attractive for constructing a 2D shape
descriptor. First, it is known that for each point on a 3D surface the exponential map is defined
and differentiable for some neighborhood [2]. Although fold-overs may occur if this neighborhood
is too large, the local nature of the scale-dependent and scale-invariant descriptors implies this
will rarely happen. In practice we have witnessed fold-overs at an extremely small number of
features, mostly near points of depth discontinuity. Although the exponential map is not, in general,
isometric, the geodesic distances of radial lines from the feature point are preserved [11]. This ensures
that corresponding features detected in the geometric scale-space will have consistent 2D shape
descriptors. Additionally, because the exponential map is defined relative to the interest point, it
has an inherent robustness to the boundary of the neighborhood being encoded.
4.2 Scale-Dependent Local 3D Shape Descriptors
We construct a 2D scale-dependent local shape descriptor for a feature detected at u and at
scale σ by mapping each point v in the neighborhood of u to a 2D domain using the geodesic polar
coordinates G defined as
G(u,v) = (d(u,v), θT (u,v)), (4.1)
where again d(u,v) is the geodesic distance between u and v and θT (u,v) is the polar angle of the
tangent of the geodesic between u and v, defined relative to a fixed basis {e1, e2}. In practice we
approximate this angle by orthographically projecting v onto the tangent plane of u and measuring
the polar angle of the intersection point.
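The orthographic approximation of the polar angle can be sketched as follows (a hypothetical helper, not the thesis implementation; it assumes the geodesic distance d(u, v) has already been computed on the mesh and is passed in):

```python
import numpy as np

def geodesic_polar_coords(u, v, normal_u, e1, geodesic_dist):
    """Approximate G(u, v) = (d(u, v), theta_T(u, v)) for a neighbor v of u.

    As described in the text, the polar angle is approximated by
    orthographically projecting v onto the tangent plane at u and
    measuring the angle of the projection relative to the basis vector e1.

    u, v         : 3D points
    normal_u     : unit surface normal at u
    e1           : unit tangent-plane basis vector at u
    geodesic_dist: precomputed geodesic distance d(u, v)
    """
    # Orthographic projection of (v - u) onto the tangent plane at u.
    d = v - u
    proj = d - np.dot(d, normal_u) * normal_u
    # e2 completes the right-handed tangent-plane frame {e1, e2}.
    e2 = np.cross(normal_u, e1)
    theta = np.arctan2(np.dot(proj, e2), np.dot(proj, e1))
    return geodesic_dist, theta
```

For example, with u at the origin, normal (0, 0, 1), e1 = (1, 0, 0) and a neighbor at (0, 1, 0.3), the projection lands on the e2 axis and the recovered angle is π/2.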
After mapping each point in the local neighborhood of u to its tangent plane we are left with
a sparse 2D representation of the local geometry around u. In order to construct a dense 2D
descriptor we interpolate over a geometric entity encoded at each vertex to construct a dense and
regular representation of the neighborhood of u at scale σ. We choose to encode the surface normals
from the base normal map, rotated such that the normal at the center point u points in the positive
z direction. The resulting dense 2D descriptor is invariant up to a single rotation. We resolve
this by aligning the principal curvature directions at u to the horizontal axis in the geodesic polar
coordinates, resulting in a rotation invariant shape descriptor. Once this local basis has been fixed
we re-express each point in terms of the normal coordinates, with the feature point u at the center
Figure 4.1: Scale-dependent features and local shape descriptors in the geometric scale-space of a range image. Features are colored according to the scale at which they were detected, with red being the coarsest and blue being the finest. A subset of the scale-dependent local shape descriptors is illustrated; these form a hierarchical representation of the underlying scale variability in the surface of a range image and enable correspondences to be determined robustly between pairs of range images.
of the descriptor.
The radius of the descriptor is set proportional to the scale σ to encode the inherent scale of
different features. We refer to this dense 2D scale-dependent descriptor as Gσu for a feature at u
and at scale σ. Figure 4.1 shows an example of a set of scale-dependent local 3D shape descriptors
detected in the geometric scale-space of two Buddha range images. The local scale-dependent 3D
shape descriptors form a hierarchical representation of the underlying surface geometry.
4.3 Scale-Invariant Local 3D Shape Descriptors
The scale-dependent local 3D shape descriptors described in the previous section are appropriate
only when the global scales between a pair of 3D mesh models or range images are the same or are
known. This happens, for instance, when we know that the range images are captured with the same
range finder. In order to enable comparison between 3D mesh models or range images that do not
have the same global scale we also derive a scale-invariant local 3D shape descriptor Ḡσu.
The key insight behind our scale-invariant local 3D shape descriptor is that the scale of local
geometric structures relative to the global scale of a mesh model or range image remains constant
as the global scale is altered. This enables us to construct a set of scale-invariant local 3D shape
descriptors by first building a set of scale-dependent local shape descriptors
and then normalizing each descriptor to a constant radius. Such a scale-invariant representation of
the underlying geometric structures enables us to establish correspondences between a pair of range
images even when the global scale is different and unknown.
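Since each descriptor is a dense 2D image, normalizing to a constant radius amounts to resampling every descriptor to a common grid size. A minimal sketch (hypothetical helper; nearest-neighbor resampling for brevity, where a real implementation would interpolate):

```python
import numpy as np

def normalize_descriptor(desc, out_size=32):
    """Resample a dense 2D descriptor image to a fixed size, making its
    spatial extent (and hence the feature's detection scale) constant.

    desc     : (H, W, 3) image of surface normals (a scale-dependent descriptor)
    out_size : side length of the normalized, scale-invariant descriptor
    """
    h, w = desc.shape[:2]
    # Nearest-neighbor resampling onto an out_size x out_size grid.
    rows = np.clip((np.arange(out_size) * h) // out_size, 0, h - 1)
    cols = np.clip((np.arange(out_size) * w) // out_size, 0, w - 1)
    return desc[np.ix_(rows, cols)]
```

After this step, descriptors built at different detection scales (and hence with different radii) become directly comparable.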
4.4 Pairwise Matching of Scale-Dependent/Invariant Local 3D Shape Descriptors
Scale-dependent and scale-invariant shape descriptors contain a wealth of information about the
scale of local geometric structures that can be exploited in robust algorithms to find the pairwise
transformation between two 3D mesh models or range images. In particular, we show how the scale-
dependent shape descriptors form a hierarchical representation of the geometric structures that can
be leveraged in a coarse-to-fine matching algorithm between a pair of mesh models or range images
with the same global scale. We also demonstrate how the scale-invariant local shape descriptors
can be used to effectively establish correspondences between a pair of range images with completely
different global scales.
4.4.1 Similarity of Scale-Dependent and Scale-Invariant Descriptors
We first need a measure of the similarity between two scale-dependent or two scale-invariant de-
scriptors. Since each descriptor is a dense 2D image of the surface normals in the local neighborhood
we may define the similarity as a function of the average angle between two corresponding normals,
S(Gσu1, Gσu2) = π/2 − (1/|A ∩ B|) ∑_{v ∈ A∩B} arccos(Gσu1(v) · Gσu2(v)), (4.2)
where A and B are the sets of points in the domains of Gσu1 and Gσu2, respectively. Although this
similarity measure is defined in terms of the scale-dependent descriptors, the definition for the
scale-invariant descriptors is equivalent, with Ḡ substituted for G.
4.4.2 Pairwise Matching of Scale-Dependent Descriptors
The natural hierarchical representation of the scale-dependent local 3D shape descriptors can be
used in a robust algorithm for matching a pair of 3D objects (R1,R2), such as 3D mesh models or
Figure 4.2: Matching two range images with consistent global scale, represented as sets of scale-dependent local shape descriptors. On the left we show the 67 point correspondences found with our matching algorithm and on the right the resulting rigid transformation. The hierarchy of the local scale-dependent 3D shape descriptors enables a coarse-to-fine sampling strategy that results in a large number of correspondences that accurately approximate the pairwise rigid transformation.
range images, with the same global scale. Note that if we know that the range images are captured
with the same range scanner, or if we know the units of the 3D coordinates, e.g. centimeters, we
can safely assume that they have, or can convert them to, the same global scale.
Once we have a set of scale-dependent local 3D shape descriptors for each 3D object, we construct
a set of possible correspondences by matching each descriptor to the n most similar, where in our
experiments n is set in the range of 5 to 10. The consistency of the global scale allows us to consider
only those correspondences at the same global scale, greatly decreasing the number of correspondences
that must later be sampled. We find the best pairwise rigid transformation between the two 3D objects
by randomly sampling this set of potential correspondences and determining the one that maximizes
the area of overlap between the two 3D objects, similar to RANSAC [6]. However, rather than sampling
the correspondences at all scales simultaneously, we instead sample in a coarse-to-fine fashion,
beginning with the descriptors detected at the coarsest scale in the geometric scale-space and ending
with descriptors detected at the finest. This enables us to quickly determine a rough alignment
between two 3D objects, as there are,
in general, far fewer features at coarser scales.
For each scale σi we randomly construct N · σi sets of 3 correspondences, where each corre-
spondence has a scale between σ1 and σi. For each correspondence set C we approximate a rigid
transformation T , using the method proposed by Umeyama [40], and then add to C all those corre-
spondences (uj ,vj , σj) where ‖ T ·R1(uj) −R2(vj) ‖≤ α and σj ≤ σi. Throughout the sampling
process we keep track of the transformation and correspondence set that yield the maximum area
of overlap. Once we begin sampling the next finer scale σi+1 we first test whether the correspondences
at that scale improve the area of overlap of the best rigid transformation. This allows us
to add a large number of correspondences at the finer scales efficiently, without an excessive
number of samples being drawn.
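The coarse-to-fine sampling loop can be sketched compactly as follows. This is a hypothetical simplification, not the thesis implementation: the transformation estimator (Umeyama's method [40] in the text) and the area-of-overlap score are passed in as callables, and the inlier-growing step is omitted for brevity.

```python
import random

def coarse_to_fine_match(corrs, scales, estimate_rigid, overlap_area, N=100):
    """Coarse-to-fine RANSAC-style sampling over scale-dependent
    correspondences.

    corrs          : list of (u, v, sigma) candidate correspondences
    scales         : discrete detection scales, coarsest first
    estimate_rigid : maps 3 sampled correspondences to a transformation
    overlap_area   : scores a transformation by its area of overlap
    """
    best_T, best_score = None, float('-inf')
    for i, sigma_i in enumerate(scales):
        # Only correspondences at scales between the coarsest and sigma_i.
        pool = [c for c in corrs if c[2] in scales[:i + 1]]
        if len(pool) < 3:
            continue
        # N * sigma_i sample sets per scale, as in the text.
        for _ in range(max(1, int(N * sigma_i))):
            T = estimate_rigid(random.sample(pool, 3))
            score = overlap_area(T)
            if score > best_score:
                best_T, best_score = T, score
    return best_T, best_score
```

Because coarse scales contribute few candidates, the early iterations quickly lock in a rough alignment that the finer scales then refine.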
Figure 4.2 shows the results of applying our pairwise matching algorithm to two range images
taken of the Buddha model. The number of correspondences is quite large and the correspondences
are distributed across all scales. Although this is an initial approximate alignment, the large corre-
spondence set enables the pairwise rigid transformation to be accurately approximated without an
excessive number of samples.
4.4.3 Pairwise Matching of Scale-Invariant Descriptors
The scale-invariant local 3D shape descriptors detected in the geometric scale-space form a fully
scale-invariant representation of the underlying local surface geometry that enables us to match a
pair of 3D objects (R1,R2) with different global scales. In this case we may register the 3D objects
while estimating the global scale differences.
Our algorithm for matching a pair of 3D mesh models or range images (R1,R2) with completely
different global scales is similar to the one proposed in the previous section for scale-dependent
local descriptors. However, since we no longer know a priori their relative global scales we must
consider the possibility that a feature detected in the geometric scale-space of a 3D mesh model
or range image may be detected at any scale in the second 3D mesh model or range image. Our
algorithm proceeds by first constructing a potential correspondence set that contains, for each scale-
invariant local 3D descriptor detected in the first 3D object R1, the n most similar in the second
object R2. We find the best pairwise similarity transformation by applying RANSAC to randomly
sample this correspondence set. For each iteration the algorithm approximates the 3D similarity
transformation [40] and computes the area of overlap. The transformation which results in the
maximum area of overlap is considered the best.
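The similarity transformation inside each iteration can be estimated in closed form with Umeyama's method [40]. A minimal sketch (assumptions: non-degenerate point sets, numpy's SVD conventions; a robust implementation would also guard against rank-deficient configurations):

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    with dst ≈ s * R @ src + t, following Umeyama's closed-form method.

    src, dst : (N, 3) arrays of corresponding 3D points
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)            # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                      # guard against reflections
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)    # variance of the source points
    scale = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - scale * R @ mu_s
    return scale, R, t
```

Applied to correspondences between the two Buddha views of Figure 4.3, such an estimator would recover both the rotation and the relative global scale factor in a single step.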
Figure 4.3: Matching two range images with inconsistent global scales, represented with sets of scale-invariant local shape descriptors. On the left we show the 24 point correspondences found with our matching algorithm and on the right the resulting similarity transformation. Even though the range images differ by a global scale factor of approximately 2.4, the scale-invariant local descriptors enable us to recover the similarity transformation accurately.
Figure 4.3 shows the result of applying our algorithm to two views of the Buddha model with a
relative global scale difference of approximately 2.4. Even though there is a considerable difference
in the relative global scales, our scale-invariant representation, composed of scale-invariant local
descriptors, enables us to recover the similarity transformation quite accurately without any initial
assumption about the models or their global scales.
4.5 Experiments
In this section we demonstrate the effectiveness of our framework for constructing and matching
scale-dependent/invariant local 3D shape descriptors by deriving a fully automatic range image
registration algorithm. We show that the scale-dependent and scale-invariant descriptors can be used
to register a set of range images both with and without global scale variations. The effectiveness of
our algorithm is shown by registering a number of models of varying geometric complexity. In fact
we show that we can register a mixed set of range images corresponding to multiple 3D models
simultaneously and fully automatically.
Figure 4.4: Fully automatic approximate registration of 15 views of the Buddha model, 12 views of the armadillo model and 18 views of the angel model with scale-dependent local descriptors. In the first column we show the initial set of range images. In the second we show the approximate registration obtained with our framework, which is further refined with ICP in the third column. Finally, a watertight model is built using a surface reconstruction algorithm [18]. Observe that the initial approximation obtained with our framework is quite accurate.
4.5.1 Registration of Range Images
A 3D computer model is built from an object using 3D acquisition hardware, such as a laser
scanner, by taking a number of scans, represented as range images, of different views of the object.
Combining these range images is not straightforward, as each resides in its own coordinate system.
In order to construct the final 3D model, each range image must be placed in a consistent coordinate
system, a problem referred to as range image registration. In the past range image registration
techniques have relied on human input to give a rough alignment between the range images by
manually clicking on corresponding points in the set of range images. After the initial manual
alignment, refinement algorithms, such as iterative closest point [1, 39] (ICP), are then applied to
create the final set of registered range images.
Although final refinement algorithms, such as ICP, have proved to be robust and effective in
creating a final object from an initial alignment, requiring a person to manually determine corre-
spondences between potentially hundreds of range images for this initial alignment has led to a
need for fully automatic range image registration algorithms. One approach to this problem is to
represent each range image with a global descriptor and perform the automatic registration between
all range images simultaneously [26]. Other approaches have differentiated between the local reg-
istration phase, where local features are used to determine correspondences between pairs of range
images, and a global registration phase, where the pairs of matching range images are combined
for a single approximate registration [15, 22]. Although such approaches have proved promising,
automatically registering models accurately remains a difficult and open problem.
Given a set of range images {R1, ...,Rn} our fully automatic approximate range image registra-
tion algorithm first constructs the geometric scale-space of each range image. Scale-dependent fea-
tures are detected at discrete scales and then combined into a single comprehensive scale-dependent
feature set, where the support size of each feature follows naturally from the scale in which it was
detected. Each feature is encoded in either a scale-dependent or scale-invariant local shape descrip-
tor, depending on whether the input range images have a consistent global scale. We then apply
the appropriate matching algorithm, presented in the previous sections, to each pair of range images
in the input set to recover the pairwise transformations. We augment each transformation with
the area of overlap resulting from the transformation. Next we construct a model graph [15], where
each range image is represented with a vertex and each pairwise transformation and area of overlap
is encoded in a weighted edge. We prune edges with an area of overlap less than ε. In order to
construct the final set of meshes {M1, ...,Mm} we compute the maximum spanning tree of the
model graph and build a single mesh from each connected component.
The alignment obtained by our algorithm is only an initial registration that can be further refined
by applying a global registration algorithm, such as ICP [1], to form a fully automatic range image
registration algorithm.
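The model-graph step above reduces to pruning weak edges and extracting a maximum spanning forest, whose connected components become the output meshes. A minimal sketch (hypothetical interface; Kruskal's algorithm on edges sorted by decreasing overlap):

```python
def registration_forest(num_images, pairwise, eps):
    """Build the model graph and extract its maximum spanning forest.

    num_images : number of input range images (graph vertices)
    pairwise   : dict mapping (i, j) to the area of overlap of the best
                 pairwise transformation between range images i and j
    eps        : overlap threshold below which edges are pruned
    """
    # Prune weak edges, then run Kruskal on the survivors in decreasing
    # order of overlap (maximum spanning forest).
    edges = sorted(((w, i, j) for (i, j), w in pairwise.items() if w >= eps),
                   reverse=True)
    parent = list(range(num_images))

    def find(a):
        # Union-find with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))
    return tree
```

Each connected component of the returned forest corresponds to one reconstructed mesh, which is how range images belonging to different objects are automatically separated.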
4.5.2 Registration of Range Images with Consistent Global Scale
Figure 4.4 illustrates the results of applying our framework independently to 15 views of the
Buddha model, 12 views of the armadillo model, and 18 views of the angel model, with con-
Figure 4.5: Automatic registration of 42 range images: 15 views of the Buddha model, 12 views of the armadillo and 15 views of the dragon model. The accuracy of the approximate registration found with our framework enables us to automatically discover that the range images correspond to three disjoint models. Note that these registrations have not been post-processed with a global registration algorithm.
sistent global scales. Scale-dependent local shape descriptors were detected at 5 discrete scales,
σ = {0.5, 1, 1.5, 2, 2.5}, in the geometric scale-space. In the first column we show the initial input.
Each range image is colored differently to visualize the quality of the registration. The second
column illustrates the approximate alignment obtained by our framework. In the third
column we show the resulting registration after applying ICP. We then measured the average
distance between each vertex in the initial pose estimated by our algorithm and the final
pose computed by ICP. We found the resulting average distances for the armadillo, Buddha and
angel models, relative to the diameter of the models, to be 0.169%, 0.294% and 1.16%,
respectively. This shows that although we are not making any assumptions about the initial pose we
are able to obtain an accurate initial alignment. In the final column we show the watertight model
obtained after applying a surface reconstruction algorithm [18].
In our next experiment we demonstrate the ability of our framework to register range images
corresponding to multiple 3D models simultaneously. In order to register the models simultaneously
we prune the edges on the model graph that correspond to transformations with an area of overlap
less than some threshold. In practice we found this threshold easy to set as our framework results
in approximate alignments that are quite accurate. Figure 4.5 summarizes the results. In the first
column we illustrate the initial set of range images; 15 views of the Buddha model, 12 views of
the armadillo, and 15 views of the dragon model. The remainder of the figure illustrates the three
disjoint approximate registrations obtained with our framework. Note that no global registration
Figure 4.6: Fully automatic approximate registration of 15 views each of the Buddha and dragon models, with a random global scaling from 1 to 4. For each model we visualize the initial set of range images and the initial alignment obtained by our framework. In column three we show the results after applying ICP and in column four we show the results of the surface reconstruction. Even with the substantial variations in the global scale, our scale-invariant representation of the underlying geometry enables us to accurately approximate the registration without any assumptions about the initial poses.
algorithm has been applied to these results.
4.5.3 Registration of Range Images with Inconsistent Global Scale
Next we demonstrate the effectiveness of our framework for fully automatically registering a
number of range images with unknown global scales. Figure 4.6 illustrates the results of applying
our framework to 15 views of the Buddha and dragon models. Each range image was globally scaled
by a random factor between 1 and 4. For each model we illustrate the initial set of range images on
the left and the approximate pose obtained with our framework. Each pair of range images for the
Buddha and dragon had an average scale difference of 1.46 and 1.23, respectively. For each pair of
adjacent range images we found the mean difference between the ground truth scale and that found
by our algorithm to be 0.016 for the dragon and 0.004 for the Buddha model. This demonstrates
that even with substantial variations in the global scale, our scale-invariant representation of the
underlying geometry enables us to accurately approximate the registration without any assumptions
about the initial poses or global scales.
Lastly, Figure 4.7 illustrates the results of applying our framework to 42 range images corre-
sponding to three different models that have been randomly scaled by a factor between 1 and 4.
Again, despite the significant scale variations, our scale-invariant representation of the underlying
Figure 4.7: Automatic registration of 42 randomly scaled range images: 15 views of the Buddha model, 12 views of the armadillo and 15 views of the dragon model. Each range image was randomly scaled by a factor between 1 and 4. Again, despite the significant scale variations, our scale-invariant representation of the underlying local geometric structures enables us to fully automatically construct initial pose estimates for all three models simultaneously.
local geometric structures enables us to fully automatically construct initial pose estimates for all
three models simultaneously.
5. Conclusions
We close this thesis with a summary of our comprehensive framework for representing the local
scale-variations of a 3D model with scale-dependent/invariant geometric features, encoded with novel
shape descriptors. First we summarize our work and the significant contributions of the thesis. We
conclude with a discussion of a few applications for which we believe our framework is well suited.
5.1 Summary
In this thesis we present a comprehensive framework for modeling the scale variability of local ge-
ometric structures and representing that variability in scale-dependent/invariant geometric features
and shape descriptors. This was accomplished by first representing the input geometry with a
normal map - a dense and regular image of the surface normals of the original model’s surface. This
novel 2D representation allowed us to leverage 2D scale-space theory in order to develop an effective
scale operator that builds the geometric scale-space of a 3D shape. The geometric scale-space is a
four dimensional representation of three-dimensional data, where the fourth dimension is a measure
of the smoothing of the model’s surface. Next, we derived novel first- and second-order partial
derivatives on the normal map which were used to detect scale-dependent 3D features, in particular
corners and edges. The effectiveness of the scale-dependent feature set to represent the scale and
geometric properties of the underlying local surface structures was demonstrated on a number of
range images and 3D models of varying topology. Furthermore we demonstrated the robustness of
these features to noisy input data and changes to the sampling density of the surface.
Next, we demonstrated how augmenting local 3D geometric features with their natural scale
enables us to automatically determine their spatial extent to encode in local shape descriptors. In
particular we derived novel scale-dependent and scale-invariant local shape descriptors. Both the
scale-dependent and scale-invariant local shape descriptors were built by leveraging the exponential
map, which enables us to represent the spatial extent of a local geometric feature in a manner that is
independent of the surface sampling density and is invariant to the pose of the model. Furthermore
we demonstrated how the scale-dependent local shape descriptors form a hierarchical representation
of the underlying local geometric structures that can be leveraged in an efficient top-down pairwise
matching algorithm. We also demonstrated how the scale-invariant local shape descriptors form a
fully scale-invariant representation of the local geometric structures that can match two 3D models
with significantly different global scales. Finally, the scale-dependent and scale-invariant local shape
descriptors were used in a fully automatic range image registration algorithm that was able to register
sets of range images corresponding to multiple models simultaneously.
5.2 Contributions
In this thesis we have made the following contributions.
• A comprehensive framework for modeling the scale-variability of local structures in 3D geo-
metric data. The key is the construction of the geometric scale-space, a general representation
of the surface geometry of local geometric structures at different scales. The scale operator
is defined on the geodesic distance function, which avoids potential pitfalls of the Euclidean
distance and ensures that the geometric scale-space is an accurate representation of the surface
geometry at different scales.
• Scale-dependent features detected in the geometric scale-space of a 3D model. The geometric
scale-space of a 3D model serves as a general representation from which many feature types
can potentially be derived. We presented a derivation of the first- and second-order partial
derivatives that were used specifically in novel corner and edge detectors. Additionally we
presented an automatic scale-selection technique that enables the intrinsic scale of each feature
in the geometric scale-space to be determined automatically.
• Novel scale-dependent local shape descriptors that form a hierarchical representation of the
underlying geometric structures. The key is that by leveraging our framework for constructing
the geometric scale-space we are able to automatically determine the spatial extent of a feature
to encode in the local scale-dependent shape descriptor. The shape descriptor itself is an
improvement over past methods in that it is independent of the sampling density of the original
object and is also invariant to pose changes to the model. An effective hierarchical algorithm
for matching two 3D models represented with a set of scale-dependent local shape descriptors
was presented.
• Novel scale-invariant local shape descriptors that together form a scale-invariant representation
of the underlying geometric structures. By leveraging the geometric scale-space we normalize
the spatial extent of each feature to form a scale-invariant representation of the underlying
geometry. This representation allows us to match two pieces of geometry even if they have
drastically different global scales.
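The normalization of spatial extent can be sketched as expressing a feature's neighbourhood relative to the feature point and rescaling by its intrinsic scale, so that the same structure at different global scales yields the same normalized support region. The factor k and the hard radius cutoff below are illustrative choices, not the thesis's exact parameters:

```python
import numpy as np

def normalized_support(points, center, sigma, k=3.0):
    """Normalize a feature's support region by its intrinsic scale.

    points : (n, 3) candidate neighbour positions
    center : (3,) position of the feature point
    sigma  : intrinsic scale of the feature
    k      : support radius in units of sigma
    """
    local = (points - center) / (k * sigma)      # scale-normalized coordinates
    keep = np.linalg.norm(local, axis=1) <= 1.0  # keep points inside the support
    return local[keep]
```

Descriptors built from the normalized neighbourhood are directly comparable across models with different global scales.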
• A fully automatic registration algorithm for multiple range images with or without a consistent global scale. The algorithm was shown to accurately register sets of range images corresponding to multiple objects simultaneously.
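A core building block of such a pipeline is a closed-form similarity-transform estimate from feature correspondences, e.g. Umeyama's least-squares method, which recovers scale alongside rotation and translation; the full registration would wrap an estimate like this in robust hypothesis testing and multi-view consistency checks. This sketch is an illustrative building block, not the thesis implementation:

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares similarity transform (scale s, rotation R,
    translation t) mapping src onto dst, following Umeyama's method.

    src, dst : (n, 3) corresponding point sets
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                    # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src        # optimal uniform scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Fixing s to 1 recovers the rigid-only case used when the range images share a consistent global scale.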