
International Journal of Computer Vision 38(1), 75–91, 2000. © 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Morphable Surface Models

CHRISTIAN R. SHELTON

Center for Biological and Computational Learning, Artificial Intelligence Laboratory, M.I.T., Cambridge, MA, USA

Abstract. We describe a novel automatic technique for finding a dense correspondence between a pair of n-dimensional surfaces with arbitrary topologies. This method employs a different formulation than previous correspondence algorithms (such as optical flow) and includes images as a special case. We use this correspondence algorithm to build Morphable Surface Models (an extension of Morphable Models) from examples. We present a method for matching the model to new surfaces and demonstrate their use for analysis, synthesis, and clustering.

Keywords: computer vision, learning, correspondence, morphable models, surface matching

1. Introduction

The goal of this paper is to describe a general method for learning models of surface classes without user intervention. The technique works on surfaces of any dimension embedded in a Euclidean space and is not specific to any particular modality. The Morphable Surface Models of this paper are a generalization of Morphable Models (described below).

The key problem in building such models is finding correspondences between surfaces. We define a correspondence to be a relation from points on one object to their matching points on the other object. In images such relations are often called flow fields and have found a wide variety of uses. Here we extend the notion from images to surfaces in general¹ and employ the correspondence algorithm to build Morphable Surface Models.

The remainder of this introduction describes previous work in Morphable Models and surface matching. Section 2 describes the correspondence algorithm by first describing the minimization problem, then the algorithm for performing the minimization, and finally the role of surface simplification. In Section 3, we describe how to use the correspondence algorithm to build models. To do so, we introduce a method for matching the models to surfaces, an inner product on the space of correspondences, and how to apply bootstrapping. Section 4 describes experimental results obtained by building models of various types of surfaces and gives an illustration of the steps of the correspondence algorithm. Lastly, Section 5 gives some conclusions and possible extensions.

1.1. Morphable Models

It is known that linear models in the image space are generally poor models for classes of images. Such a model describes members of the class as alpha-blendings of the examples. This in general produces poor results. Unless the images align pixel-for-pixel, the alpha-blending of two images will not produce a third image in the same class of images: instead the synthesized image will be two “ghosts” of the combined images.

The difficulty lies in the fact that every pixel of the new image is a linear combination of the pixels at the same location in the other images. These other pixels may not be related to each other and thus taking a linear combination of them will produce nothing of worth. However, if we can match “corresponding” pixels across the set of example images, then linear combinations of these corresponding pixels will work better as a model (Beymer and Poggio, 1996).

For this paper we define a correspondence to be a relation between two objects that maps each of the points on one object to its “corresponding” point on the other object (where we appeal to intuition for the definition of corresponding). For images, a correspondence is simply a flow field from one image to another. Morphable Models are constructed from a set of example images by finding flow fields from an arbitrary base example image to all of the other example images. As a generative model, the output is the base image warped by a linear combination of the flow fields. The parameters of the model are the weights of the linear combination. The pixel values aligned by the correspondences may also be combined linearly to produce an image that is not only a combination of the “shapes” of the examples (the flow fields) but also the “texture” of the examples (the pixel values).

In this paper, we consider a more general morphable model. Instead of dealing only with images, we allow the objects to be surfaces of arbitrary dimension embedded in Euclidean space. Images are a particular example of such a surface (a gray-scale image is a two-dimensional manifold in three-dimensional space: one dimension for each image axis and one dimension for intensity). Thus we will build a general correspondence technique for any pair of surfaces (replacing the flow field algorithm for images) and, from that, a general technique for building morphable models of arbitrary surfaces. At the end of this paper, we will show some simple examples of the uses of Morphable Surface Models.

1.2. Related Work

Morphable Models have been applied mainly to images. Jones and Poggio (1998) give a good description of their image method and related image-based techniques. We refer the reader to their introduction as the best description of the Morphable Model literature. Briefly, the result that new views of a 3D object can be synthesized linearly from three sample views (Ullman and Basri, 1991; Shashua, 1992) spurred the creation of models of image classes as linear combinations of examples (Poggio and Vetter, 1992; Vetter and Poggio, 1995; Beymer et al., 1993; Beymer and Poggio, 1996).

Since Jones and Poggio (1998), Vetter et al. (1997) developed a bootstrapping technique (used and redescribed in this paper) for better automatic construction of Morphable Models. Recently, Blanz and Vetter (1999) extended Morphable Models to 3D shapes by describing the shape as its projection onto a 2D surface (which can then be unrolled and treated as an image). From there they were able to use Morphable Models to reconstruct 3D shape from a single view. Kang and Jones also have used such a projection technique to constrain reconstruction.

Active Appearance Models (Cootes et al., 1998) also try to match images but require user-specified correspondences. Deformable Intensity Surfaces (Nastar et al., 1996) treat images as surfaces and use an automatic matching technique not dissimilar to the one described here. However, the topology of the surface is essentially limited to that of a grid. Active Contour Models (or Snakes) (Kass et al., 1988; McInerney and Terzopoulos, 1995) are also similar to the work in this paper. They operate on surfaces of arbitrary dimensions and topologies. However, they are used to match a surface to gradients in volumetric data instead of to other surfaces.

In this paper we have used the term “n-dimensional surface” to mean a surface with n orthogonal tangent vectors at every point. In Active Contour Models this phrase is taken to imply that the surface lies in an n-dimensional space (and the surface itself has a dimensionality less than n). We have used the phrase “dimensionality of the embedded space” to denote such a quantity.

The difficult part of building a morphable model and many other modeling techniques is finding correspondences. We believe this is the first paper to propose an algorithm for automatically finding surface correspondences between surfaces with arbitrary topologies directly. The energy minimization method here is based on that described in Shelton (1998), has certain similarities to elastic networks (Durbin and Willshaw, 1987), and has energy terms related to the surface reconstruction work of Hoppe and others (Hoppe et al., 1992, 1993).

2. Matching Surfaces

To build a Morphable Surface Model, we must first be able to find the correspondences between the example surfaces. This will serve as the foundation for the model construction algorithm. We define an energy function over all possible correspondence relations for which smaller values indicate better correspondences. We will then give an algorithm for finding a minimum of this energy function which will hopefully correspond to a good match according to the metric encoded in the energy function. Crucial to the practical success of this minimization is the mesh simplification described in Section 2.4.


2.1. Energy Function

Let $C$ be a function which maps points on one surface, $\mathcal{A}$, to arbitrary points in space. In order for $C$ to be a good correspondence from $\mathcal{A}$ to $\mathcal{B}$ we propose it must satisfy three properties:

1. Similarity: For every point $a$ on the surface $\mathcal{A}$, $C(a)$ should be near or on the surface $\mathcal{B}$.

2. Structure: $C$ should distort the surface $\mathcal{A}$ as little as possible. Put differently, $C(\mathcal{A})$ should be as structurally similar to $\mathcal{A}$ as possible.

3. Prior Information: $C$ should represent a plausible deformation of the surface.

The first property states that $C$ should be a correspondence and actually match points on $\mathcal{A}$ to points on $\mathcal{B}$. The second says that such a matching should not be arbitrary, but rather should attempt to keep the structure of the first surface. This will hopefully force matching of similar substructures from $\mathcal{A}$ to those of $\mathcal{B}$ and preserve our intuitive notion of correspondence. The last term serves to enforce prior knowledge about valid shapes (for example, we may wish to penalize surfaces which cross a lot and are very unsmooth as being unlikely results from any correspondence).

Our energy function will therefore have the following form:

$$E(C) = E_{\mathrm{sim}}(C) + \alpha E_{\mathrm{str}}(C) + \beta E_{\mathrm{pri}}(C) \tag{1}$$

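$E_{\mathrm{sim}}$ measures how closely $C$ matches points on $\mathcal{A}$ to points on $\mathcal{B}$. To ease notation, let us define $P_{\mathcal{X}}(q)$ to be the point on the surface $\mathcal{X}$ closest to the point $q$. We will then define $E_{\mathrm{sim}}$ as

$$E_{\mathrm{sim}}(C) = \int_{C(\mathcal{A})} \left\| c - P_{\mathcal{B}}(c) \right\|^2 dc + \int_{\mathcal{B}} \left\| b - P_{C(\mathcal{A})}(b) \right\|^2 db \tag{2}$$

This is the sum of the integrals over each surface of the squared distance from points on that surface to the other surface.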

The definitions of the remaining two terms of the energy function will depend on prior information we have about surfaces. More sophisticated methods of measuring the distortion of the surface could be developed and used. However, we have found connecting springs between points on the surface to be sufficient. In particular we will use “directional” springs that prefer their original orientation (not just their original length). There are two reasons for this choice: first, they tend to minimize buckling of the surface more than “regular” springs (which actually encourage it) and second, they lead to a nice form for the minimization algorithm since they are quadratic. For a directional spring connecting the two points $p$ and $q$, the energy of that spring under the correspondence $C$ is

$$E_{\mathrm{ds}}(p, q, C) = \frac{\left\| (p - q) - (C(p) - C(q)) \right\|^2}{\left\| p - q \right\|} \tag{3}$$

To create a tight surface and an energy function which does not depend on the parameterization of the surface, we let

$$E_{\mathrm{str}}(C) = \int_{\mathcal{A}} E_{\mathrm{ds}}(a, a + da, C) \tag{4}$$

This corresponds to placing directional springs continuously over the entire surface.

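If we let $E_{\mathrm{s}}(p, q, C)$ be similarly defined for a normal spring of rest length 0,

$$E_{\mathrm{s}}(p, q, C) = \frac{\left\| C(p) - C(q) \right\|^2}{\left\| p - q \right\|} \tag{5}$$

then, in order to penalize discontinuous surfaces, we define $E_{\mathrm{pri}}$ to be

$$E_{\mathrm{pri}}(C) = \int_{\mathcal{A}} E_{\mathrm{s}}(a, a + da, C) \tag{6}$$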

The three terms of $E(C)$ do not play equal roles. The first, $E_{\mathrm{sim}}$, must be zero for $C$ to be a true correspondence. The second, $E_{\mathrm{str}}$, exists to guide and select among $C$ functions which set the first term to zero. The last term, $E_{\mathrm{pri}}$, serves to smooth out any noise which may arise due to inaccurate models or an inability to find the true correspondence; as a prior over surfaces, it models the fact that very unlikely correspondences should be discarded in favor of more likely ones which perhaps do not satisfy the other requirements as well.

Thus, the weighting of the various terms in $E(C)$ will be set as follows: $\alpha$ will be initially large to enforce a good correspondence. Over time, it will be reduced to allow the algorithm to drive $E_{\mathrm{sim}}$ to zero. $\beta$ will be set to a small constant value which is inversely proportional to our confidence in our model and ability to find the correct correspondence.

2.2. Practical Instantiation of the Energy Function

The energy function as described in the previous section is not practical to use. The integrals, for most surfaces, are intractable to compute. In this paper, we consider only piece-wise linear surfaces and can therefore make a number of useful simplifications. For a $d$-dimensional surface (meaning that there are $d$ orthogonal tangent vectors at every point on the surface), the surface can be defined as an ordered collection of vectors (the positions of the vertices of the surface) and a set of linear patches connecting $d+1$ vectors from the collection. Hoppe et al. (1993) give a more mathematically concrete definition of such a structure (a tuple of the vertex positions and a simplicial complex). A triangulated mesh is an example of such a 2-dimensional surface.

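The integrals in $E_{\mathrm{sim}}$ are easily and well approximated by a uniform stochastic sampling of their respective surfaces. Thus, we have

$$E_{\mathrm{sim}} = \frac{1}{n} \sum_{c \in S_n(C(\mathcal{A}))} \left\| c - P_{\mathcal{B}}(c) \right\|^2 + \frac{1}{n} \sum_{b \in S_n(\mathcal{B})} \left\| b - P_{C(\mathcal{A})}(b) \right\|^2 \tag{7}$$

where $S_n(\mathcal{X})$ is a set of $n$ points sampled from the surface $\mathcal{X}$.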
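As an illustration of Eq. (7), the following minimal sketch evaluates the sampled similarity term for the simplest case of 1-dimensional surfaces (polylines) embedded in the plane. The function names (`project`, `sample`, `e_sim`) and the choice of arc-length-weighted sampling are assumptions made for this example and are not taken from the paper's implementation.

```python
import math
import random

def closest_point_on_segment(q, p0, p1):
    """Closest point to q on the segment p0-p1 (2-D tuples, p0 != p1)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    t = ((q[0] - p0[0]) * dx + (q[1] - p0[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp so the point stays on the segment
    return (p0[0] + t * dx, p0[1] + t * dy)

def project(q, polyline):
    """P_X(q): the closest point on the piece-wise linear curve to q."""
    candidates = [closest_point_on_segment(q, a, b)
                  for a, b in zip(polyline, polyline[1:])]
    return min(candidates, key=lambda c: math.dist(c, q))

def sample(polyline, n, rng):
    """S_n(X): n points drawn uniformly with respect to arc length."""
    segments = list(zip(polyline, polyline[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    points = []
    for _ in range(n):
        a, b = rng.choices(segments, weights=lengths)[0]
        t = rng.random()
        points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return points

def e_sim(warped_a, b, n=200, seed=0):
    """Sampled similarity energy of Eq. (7) between C(A) and B."""
    rng = random.Random(seed)
    term_a = sum(math.dist(c, project(c, b)) ** 2
                 for c in sample(warped_a, n, rng)) / n
    term_b = sum(math.dist(p, project(p, warped_a)) ** 2
                 for p in sample(b, n, rng)) / n
    return term_a + term_b

# Two parallel unit-separated lines: every sample is at distance 1.
line = [(0.0, 0.0), (2.0, 0.0)]
shifted = [(0.0, 1.0), (2.0, 1.0)]
energy_same = e_sim(line, line)
energy_shifted = e_sim(line, shifted)
```

Because the samples are drawn from the surfaces rather than from the vertex list, re-tessellating either polyline leaves the expected value of this estimate unchanged.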

$E_{\mathrm{sim}}$ is therefore independent of the parameterization of the surface. In other words, given two different tessellations of the same surface, $E_{\mathrm{sim}}$ will not change. We have not yet found similarly parameter-independent approximations of $E_{\mathrm{str}}$ and $E_{\mathrm{pri}}$ which yield simple algorithms. Instead we approximate both by connecting adjacent vertices in the surface description with the appropriate springs. Since the springs defined above are normalized by length, a given straight length of spring will be independent of how it is cut (i.e. how many sections are used to describe it). Yet, past 1-dimensional surfaces, the energy terms as a whole will not be independent of the tessellation used. However, we have found it to work well in practice.

$$E_{\mathrm{str}} = \frac{1}{|\mathrm{Ad}(\mathcal{A})|} \sum_{(p,q) \in \mathrm{Ad}(\mathcal{A})} E_{\mathrm{ds}}(p, q, C) \tag{8}$$

$$E_{\mathrm{pri}} = \frac{1}{|\mathrm{Ad}(\mathcal{A})|} \sum_{(p,q) \in \mathrm{Ad}(\mathcal{A})} E_{\mathrm{s}}(p, q, C) \tag{9}$$

where $\mathrm{Ad}(\mathcal{A})$ is the set of all pairs of adjacent vertices in the surface tessellation $\mathcal{A}$.
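The discrete spring terms of Eqs. (8) and (9) can be sketched as follows. This is an illustrative example only: the data layout (a vertex list, a list of warped vertex positions, and a list of index pairs standing in for $\mathrm{Ad}(\mathcal{A})$) and the helper names are assumptions, not the paper's implementation.

```python
import math

def sub(u, v):
    """Component-wise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of u."""
    return math.sqrt(sum(a * a for a in u))

def e_ds(p, q, cp, cq):
    """Directional spring, Eq. (3): penalizes any change of the edge vector."""
    return norm(sub(sub(p, q), sub(cp, cq))) ** 2 / norm(sub(p, q))

def e_s(p, q, cp, cq):
    """Zero-rest-length spring, Eq. (5): penalizes long warped edges."""
    return norm(sub(cp, cq)) ** 2 / norm(sub(p, q))

def structure_and_prior(vertices, warped, edges):
    """Eqs. (8) and (9): average both spring energies over adjacent pairs."""
    e_str = sum(e_ds(vertices[i], vertices[j], warped[i], warped[j])
                for i, j in edges) / len(edges)
    e_pri = sum(e_s(vertices[i], vertices[j], warped[i], warped[j])
                for i, j in edges) / len(edges)
    return e_str, e_pri

# A single triangle, first translated and then rotated by 90 degrees.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
edges = [(0, 1), (1, 2), (2, 0)]
translated = [(x + 5.0, y + 5.0) for x, y in vertices]
rotated = [(-y, x) for x, y in vertices]

str_translated, pri_translated = structure_and_prior(vertices, translated, edges)
str_rotated, pri_rotated = structure_and_prior(vertices, rotated, edges)
```

A pure translation leaves the directional springs at rest, whereas the rotation is penalized even though every edge keeps its length. This is the sense in which the directional springs prefer their original orientation and not just their original length.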

2.3. Energy Minimization

Just as we limited the surface to be piece-wise linear, we will limit the correspondence function to be piece-wise linear. In particular, we will allow $C$ to take arbitrary values at the vertices of $\mathcal{A}$ and require it to be a linear interpolation of the vertex values along the faces of $\mathcal{A}$. This means that applying $C$ to $\mathcal{A}$ simply involves applying the elements of $C$ to their respective vertices of $\mathcal{A}$: no structural or topological changes need be made to $\mathcal{A}$.

With this parameterization of $C$, $E_{\mathrm{str}}$ and $E_{\mathrm{pri}}$ are clearly quadratic in the elements of $C$. However, $E_{\mathrm{sim}}$ is not so well behaved due to the unsmooth nature of the function $P$ (whose first derivative is discontinuous). We therefore turn the algorithm into a dual optimization. First, we compute $P_{\mathcal{B}}(c)$ and $P_{C(\mathcal{A})}(b)$ for all of the sampled points. We then fix them, at which point $E_{\mathrm{sim}}$ becomes quadratic (and thus $E(C)$ is quadratic) and we can solve this minimization for $C$ in closed form. This gives a new set of positions for the vertices from which we can repeat the sampling, calculation of the closest points, and values of $C$.

To be more concrete about the algorithm, let us first note that any point on the surface of $\mathcal{X}$ can be described as a convex combination of the positions of the vertices of $\mathcal{X}$. For vertices (such as $p$ and $q$ in $E_{\mathrm{str}}$ and $E_{\mathrm{pri}}$), the combination coefficients are all 0 except for a single 1. For points internal to a face (such as $c$ or $P_{C(\mathcal{A})}(b)$ in $E_{\mathrm{sim}}$), the coefficients are all zero except for $d+1$ non-zero elements (which sum to 1 due to convexity) corresponding to the $d+1$ vertices of the face on which the point is located. These coefficients are sometimes called the barycentric coordinates of the point (Hoppe et al., 1993). The barycentric coordinates are invariant to the transformations we are allowing for $C$. That is, the barycentric coordinates of $a$ in $\mathcal{A}$ are the same as the barycentric coordinates of $C(a)$ in $C(\mathcal{A})$.
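The invariance of barycentric coordinates under a piece-wise linear $C$ can be checked numerically. The sketch below does so for a single triangle in the plane; the function names and the particular vertex positions are chosen purely for illustration.

```python
def barycentric(p, tri):
    """Barycentric coordinates of the 2-D point p in the triangle tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    w1 = ((p[0] - x0) * (y2 - y0) - (x2 - x0) * (p[1] - y0)) / det
    w2 = ((x1 - x0) * (p[1] - y0) - (p[0] - x0) * (y1 - y0)) / det
    return (1.0 - w1 - w2, w1, w2)

def combine(weights, tri):
    """Convex combination of the triangle's vertices with the given weights."""
    return tuple(sum(w * v[k] for w, v in zip(weights, tri)) for k in range(2))

# One face of A and the positions C assigns to its three vertices.
face = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
warped_face = [(1.0, 1.0), (5.0, 2.0), (0.0, 6.0)]

a = (1.0, 1.0)
a_bar = barycentric(a, face)            # coordinates of a in A
c_of_a = combine(a_bar, warped_face)    # C(a): interpolate the vertex values
c_bar = barycentric(c_of_a, warped_face)  # coordinates of C(a) in C(A)
```

Here `a_bar` and `c_bar` agree up to rounding, which is what allows the sampled points to be stored once as coefficient vectors and reused after the vertices move.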

We will now add the notation that any variable with an overscore is not a Euclidean vector in the embedded space of the surface, but a barycentric coordinate vector (a vector of the coefficients of the previous paragraph) with respect to the appropriate surface. Furthermore, we will let the matrix $A$ be the matrix whose columns are the vectors of the positions of the vertices of $\mathcal{A}$. $B$ will be similarly defined for $\mathcal{B}$ and $C$ will be the matrix whose columns are the positions of the vertices of $\mathcal{A}$ under the correspondence. Thus if $\mathcal{A}$ has $v_A$ vertices, $\mathcal{B}$ has $v_B$ vertices, and they are embedded in a $D$-dimensional space, $A$ and $C$ are $D \times v_A$ and $B$ is $D \times v_B$.

By expanding $E(C)$ using the relationships $b = B\bar{b}$, $a = A\bar{a}$, and $C(a) = C\bar{a}$, we can then write the whole energy function as

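$$
\begin{aligned}
E(C) ={}& \frac{1}{n} \sum_{\bar{c} \in \bar{S}_n(C(\mathcal{A}))} \left\| C\bar{c} - B\bar{P}_{\mathcal{B}}(C\bar{c}) \right\|^2 \\
&+ \frac{1}{n} \sum_{\bar{b} \in \bar{S}_n(\mathcal{B})} \left\| B\bar{b} - C\bar{P}_{C(\mathcal{A})}(B\bar{b}) \right\|^2 \\
&+ \frac{\alpha}{|\mathrm{Ad}(\mathcal{A})|} \sum_{(\bar{p},\bar{q}) \in \mathrm{Ad}(\mathcal{A})} \frac{\left\| (C - A)(\bar{p} - \bar{q}) \right\|^2}{\left\| A(\bar{p} - \bar{q}) \right\|} \\
&+ \frac{\beta}{|\mathrm{Ad}(\mathcal{A})|} \sum_{(\bar{p},\bar{q}) \in \mathrm{Ad}(\mathcal{A})} \frac{\left\| C(\bar{p} - \bar{q}) \right\|^2}{\left\| A(\bar{p} - \bar{q}) \right\|}
\end{aligned}
\tag{10}
$$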

where, to carry the overscore notation further, $\bar{S}$ is a set of barycentric coordinate vectors (of points sampled uniformly over the surface as before) and $\bar{P}$ is the barycentric coordinate vector of the closest point on the surface. Note that if we fix a sampling over each surface and we fix the values of $\bar{P}$ (they actually depend on $C$, but we are fixing them), then every term of $E$ is of the form $w_i \| C\bar{c}_i - d_i \|^2$ ($w_i$ is a scalar, $C$ is a matrix of vertex positions, $\bar{c}_i$ is the barycentric coordinate vector for the constraint, and $d_i$ is the target position).

This optimization can be solved by converting it to a sparse linear least-squares problem separately for each dimension (Hoppe et al., 1993). However, we have found that an improvement can be made at this stage before conversion to a linear system. At each step of the minimization, points are sampled from each surface and for each point, the closest point on the other surface is found. Then, $E$ is minimized assuming that the goal is to minimize the distance from each sampled point to the found closest point. However, a better match might be found if, instead of insisting that the point match the closest point, we relax and allow the point to match any position on the closest face. This is closer to our goal of allowing the point to match anywhere on the surface. Since the closest face is bounded, we have to approximate this by minimizing the distance from the point to the plane of the closest face.

Thus, instead of minimizing $w_i \| C\bar{a}_i - B\bar{P}_i \|^2$ for points sampled from $\mathcal{A}$ or $w_i \| C\bar{P}_i - B\bar{b}_i \|^2$ for points sampled from $\mathcal{B}$, we are minimizing $w_i \| T_i C\bar{a}_i - T_i B\bar{P}_i \|^2$ and $w_i \| T_i C\bar{P}_i - T_i B\bar{b}_i \|^2$ where $T_i$ is a matrix which projects out the tangent directions for the face on $\mathcal{B}$ involved in the $i$th constraint. Depending on whether the sampled or projected point on $\mathcal{B}$ falls on a vertex, edge, or face, there could be from 0 to $d$ orthogonal tangent vectors at that point on $\mathcal{B}$. If we enumerate them as the unit length vectors $t_{i,1}, \ldots, t_{i,m}$,

$$T_i = I - \sum_m t_{i,m} t_{i,m}^T \tag{11}$$

(If instead of allowing the point to be anywhere on the plane of the face, we wish to penalize moving from the closest point slightly, we can place a coefficient between 0 and 1 in front of the sum in the above equation thus resulting in a slightly different distance metric.)
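To make Eq. (11) concrete, the following small sketch builds $T_i$ from a list of orthonormal tangent vectors and applies it to a residual. The function names are hypothetical, and the optional `softness` argument is one possible reading of the coefficient mentioned in the parenthetical remark above.

```python
def tangent_projector(tangents, dim, softness=1.0):
    """T = I - softness * sum_m t_m t_m^T for orthonormal tangents t_m.

    softness = 1 removes the tangential component entirely (Eq. 11);
    a value between 0 and 1 only down-weights it.
    """
    T = [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    for t in tangents:
        for r in range(dim):
            for c in range(dim):
                T[r][c] -= softness * t[r] * t[c]
    return T

def apply(T, v):
    """Matrix-vector product T v."""
    return [sum(T[r][c] * v[c] for c in range(len(v))) for r in range(len(T))]

residual = [3.0, 4.0, 5.0]

# Closest point in the interior of a face lying in the xy-plane:
# two tangents, so only the out-of-plane component is penalized.
T_face = tangent_projector([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], 3)
# Closest point on an edge along x: one tangent direction.
T_edge = tangent_projector([(1.0, 0.0, 0.0)], 3)
# Closest point at a vertex: no tangents, the full distance is penalized.
T_vertex = tangent_projector([], 3)

on_face = apply(T_face, residual)      # [0.0, 0.0, 5.0]
on_edge = apply(T_edge, residual)      # [0.0, 4.0, 5.0]
on_vertex = apply(T_vertex, residual)  # [3.0, 4.0, 5.0]
```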

Since the constraints now have terms of the form $(T_i C\bar{a}_i)^T (T_i C\bar{a}_i)$, they cannot be converted into a sparse linear least-squares system independently for each dimension. Instead, we must solve for all dimensions simultaneously (unless all $T_i$ matrices are diagonal). However, with a bit of careful manipulation, we can derive a different sparse system to solve.

Our energy function now has the form

$$E = \sum_i w_i \left( T_i C\bar{a}_i - q_i \right)^2 \tag{12}$$

where $w_i$, $T_i$, $\bar{a}_i$, and $q_i$ are different depending on the constraint: for the first $n$ constraints, $w_i = \frac{1}{n}$, $\bar{a}_i$ is the barycentric coordinate vector of a point sampled from $\mathcal{A}$, $q_i$ is the closest point on $\mathcal{B}$ to this sampled point, and $T_i$ is the tangent matrix at the point $q_i$. For the second $n$ constraints, $w_i = \frac{1}{n}$, $q_i$ is a sampled point on $\mathcal{B}$, $T_i$ is the tangent matrix to this point, and $\bar{a}_i$ is the barycentric coordinate vector of the point on $\mathcal{A}$ closest to $q_i$. For the next $|\mathrm{Ad}(\mathcal{A})|$ constraints, $w_i = \frac{\alpha}{|\mathrm{Ad}(\mathcal{A})| \, \| A(\bar{p} - \bar{q}) \|}$, $T_i = I$, $q_i$ is the difference between the positions of the two vertices in $\mathcal{A}$, and $\bar{a}_i$ is the difference between the barycentric coordinate vectors of the two vertices (a vector with one 1 and one −1). Finally, the last $|\mathrm{Ad}(\mathcal{A})|$ constraints are exactly the same as the previous $|\mathrm{Ad}(\mathcal{A})|$ except the $\alpha$ is replaced by $\beta$ in $w_i$ and $q_i = 0$.

For the proper definition of the matrix $R_i$ (a function of $T_i$ and $\bar{a}_i$) and the vector $c$ (a function of $C$) as derived below, this can be converted into

$$E = \sum_i \left( R_i c - q_i \right)^2 \tag{13}$$

which is a classic least-squares problem with the solution

$$c = \left( \sum_i R_i^T R_i \right)^{-1} \sum_i R_i^T q_i \tag{14}$$


First we will define $c$ as shown pictorially below to be the vector of the components of the matrix $C$:

$$c^T = \left[ C_{1,1} \cdots C_{D,1} \;\; C_{1,2} \cdots C_{D,2} \;\; \cdots \;\; C_{1,N} \cdots C_{D,N} \right] \tag{15}$$

where $N = v_A$ is the number of columns of $C$, that is, the number of vertices of $\mathcal{A}$. We now define $R_i$ as:

$$(R_i)_{j,\,D(k-1)+l} = (T_i)_{j,l} \, (\bar{a}_i)_k \tag{16}$$

which means that $R_i$ has the block form:

$$R_i = \left[ (\bar{a}_i)_1 T_i \;\; (\bar{a}_i)_2 T_i \;\; \cdots \;\; (\bar{a}_i)_N T_i \right] \tag{17}$$

(remember that $(\bar{a}_i)_k$ is the $k$th element of the vector $\bar{a}_i$ and thus a scalar). It can easily be shown from here that $R_i c = T_i C\bar{a}_i$. We now have a simple linear least-squares problem with the solution shown in Eq. (14).
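The identity $R_i c = T_i C\bar{a}_i$ can be verified on a small instance. The sketch below uses plain nested lists for matrices; the helper names and the numerical values are illustrative assumptions and carry no meaning beyond the check itself.

```python
def flatten_columns(C):
    """Eq. (15): stack the columns of the D x N matrix C into one vector c."""
    D, N = len(C), len(C[0])
    return [C[l][k] for k in range(N) for l in range(D)]

def build_R(T, a_bar):
    """Eq. (17): R = [a_1 T, a_2 T, ..., a_N T], a D x (D*N) matrix."""
    D = len(T)
    return [[a_bar[k] * T[j][l] for k in range(len(a_bar)) for l in range(D)]
            for j in range(D)]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# D = 2 embedding dimensions, N = 3 vertices.
C = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
# Projector that removes the tangent direction (1, -1) / sqrt(2).
T = [[0.5, 0.5],
     [0.5, 0.5]]
a_bar = [0.2, 0.3, 0.5]

lhs = matvec(build_R(T, a_bar), flatten_columns(C))  # R_i c
rhs = matvec(T, matvec(C, a_bar))                    # T_i C a_i
```

Column $D(k-1)+l$ of $R_i$ (1-based) multiplies the entry $C_{l,k}$ of the stacked vector, so both sides sum the same products $(T_i)_{j,l}\,(\bar{a}_i)_k\,C_{l,k}$.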

A nice property of this formulation is that $R_i$ is sparse. Only up to $d+1$ elements of $\bar{a}_i$ may be non-zero and thus only a maximum of $d+1$ of the blocks of $R_i$ are non-zero. Many of the $T_i$ matrices are the identity matrix (for all of the spring constraints) and thus many more of the elements of $R_i$ will be zero for these constraints. So, while $R_i^T R_i$ is a $DN \times DN$ matrix, only a maximum of $D^2(d+1)^2$ elements of it will be non-zero. Furthermore, these $D \times D$ blocks that make up $R_i^T R_i$ can only be non-zero for blocks that correspond to adjacent vertices in $\mathcal{A}$. Thus the matrix $\sum_i R_i^T R_i$ that needs to be inverted (or at least for which an LU-decomposition needs to be found) in Eq. (14) is very sparse. It is symmetric and on every row, a maximum of $Dg$ elements are non-zero (where $g$ is the maximum outgoing degree of any vertex in $\mathcal{A}$). The solution therefore can be computed efficiently using sparse inversion techniques such as conjugate gradient descent (Press et al., 1992).
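As a rough illustration of why sparsity helps, here is a textbook conjugate gradient iteration that touches the system matrix only through matrix-vector products, so only the non-zero entries are ever visited. The dictionary-of-rows storage and the tiny 2 × 2 system are assumptions for the example; this is not the routine from Press et al. (1992).

```python
def conjugate_gradient(matvec, b, iterations=50, tol=1e-24):
    """Solve M x = b for a symmetric positive definite M given only x -> M x."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - M x for the initial guess x = 0
    p = list(r)            # first search direction
    rr = sum(v * v for v in r)
    for _ in range(iterations):
        if rr < tol:
            break
        Mp = matvec(p)
        alpha = rr / sum(pi * mi for pi, mi in zip(p, Mp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mi for ri, mi in zip(r, Mp)]
        rr_new = sum(v * v for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Sparse storage: one {column: value} dictionary per row, zeros omitted.
rows = [{0: 4.0, 1: 1.0},
        {0: 1.0, 1: 3.0}]

def sparse_matvec(v):
    return [sum(value * v[col] for col, value in row.items()) for row in rows]

solution = conjugate_gradient(sparse_matvec, [1.0, 2.0])
```

For the normal equations of Eq. (14), `rows` would hold $\sum_i R_i^T R_i$ and the right-hand side would be $\sum_i R_i^T q_i$; each product then costs time proportional to the number of non-zero entries.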

2.4. Surface Simplification

A vital component in our minimization algorithm is the simplification of the surface descriptions. Attempting to minimize the energy function as described in the previous section directly on the original models $\mathcal{A}$ and $\mathcal{B}$ can result in severely suboptimal local minima and long running times. To combat both of these problems, we first perform a sequence of surface simplifications. For $\mathcal{A}$ we produce the sequence of surfaces $\{\mathcal{A}^0, \mathcal{A}^1, \ldots, \mathcal{A}^{l_A}\}$ and for $\mathcal{B}$ we produce the sequence $\{\mathcal{B}^0, \mathcal{B}^1, \ldots, \mathcal{B}^{l_B}\}$ where $\mathcal{A}^0$ and $\mathcal{B}^0$ are the original input surfaces and $\mathcal{X}^{i+1}$ is a mesh approximating $\mathcal{X}$ with half the complexity of $\mathcal{X}^i$. Complexity can be any measure; in this case we use the number of vertices. $l_A$ and $l_B$ are determined by setting a threshold on the maximum distance from the simplified surface to the original surface. A single threshold works well for all similar models.

The computer graphics literature has a number of algorithms that can be used to produce simpler meshes approximating an input mesh. Heckbert and Garland (1997) give a good review of the major algorithms of the field. We have implemented and used progressive meshes (Hoppe, 1996) and quadric error metrics (Garland and Heckbert, 1997) and both provide qualitatively the same performance when used in our algorithm on the types of surfaces described in the experimental section of this paper. However, our implementation of Garland and Heckbert's quadric error metrics runs faster and so the results reported in this paper use that algorithm.

Once we have the sequence of simplified surfaces, we begin with the two simplest surfaces ($\mathcal{A}^{l_A}$ and $\mathcal{B}^{l_B}$) and use the energy minimization technique from the previous section to find a correspondence between the two surfaces. We then use the found correspondence to initialize the starting position for finding the correspondence between the next two surfaces in each sequence (i.e. $\mathcal{A}^{l_A - 1}$ and $\mathcal{B}^{l_B - 1}$). We continue in this manner until we finally compute the correspondence between $\mathcal{A}^0$ and $\mathcal{B}^0$ (the original input surfaces). If $l_A \neq l_B$, then at some point one of the two sequences will run out of more complex surfaces. At that point, we just continue to use the most complex surface for the shorter sequence until the algorithm has matched all of the surfaces in the other sequence and finished.

At the base level, the correspondence is initialized to be the vertex positions of $\mathcal{A}^{l_A}$: no movement of $\mathcal{A}$. At each new step when a more complex model is introduced for $\mathcal{A}$, we must find a position of the vertices of $\mathcal{A}^i$ based on the correspondence found for $\mathcal{A}^{i+1}$. Because $\mathcal{A}^i$ and $\mathcal{A}^{i+1}$ could have arbitrarily different geometries or even topologies,² for every vertex $a$ of the surface $\mathcal{A}^i$, we let $C_{\mathcal{A}^i}(a) = a + C_{\mathcal{A}^{i+1}}(P_{\mathcal{A}^{i+1}}(a)) - P_{\mathcal{A}^{i+1}}(a)$. More informally, we take every vertex of the new, more complex surface, and project it to the simpler surface. We compute the corresponding point to that projection point based on the old coarse correspondence. We then take the implied motion of that point (the difference between the projected point and the correspondence at the projected point) and add that to the vertex to give the initial correspondence at the vertex of the more complex surface.
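The initialization rule for a finer level can be sketched for polylines in the plane as follows. The function names and the toy geometry are assumptions for illustration; the projection here is a brute-force search over segments and assumes no segment has zero length.

```python
def project_with_coords(q, polyline):
    """Closest point on the polyline to q, with its segment index and parameter."""
    best = None
    for i, (p0, p1) in enumerate(zip(polyline, polyline[1:])):
        dx, dy = p1[0] - p0[0], p1[1] - p0[1]
        t = ((q[0] - p0[0]) * dx + (q[1] - p0[1]) * dy) / (dx * dx + dy * dy)
        t = max(0.0, min(1.0, t))
        point = (p0[0] + t * dx, p0[1] + t * dy)
        dist_sq = (q[0] - point[0]) ** 2 + (q[1] - point[1]) ** 2
        if best is None or dist_sq < best[0]:
            best = (dist_sq, i, t, point)
    return best[1], best[2], best[3]

def initialize_fine(fine_vertices, coarse_vertices, coarse_warped):
    """C_fine(a) = a + C_coarse(P(a)) - P(a) for every fine vertex a."""
    result = []
    for a in fine_vertices:
        i, t, proj = project_with_coords(a, coarse_vertices)
        w0, w1 = coarse_warped[i], coarse_warped[i + 1]
        # The coarse correspondence at the projected point, by linear
        # interpolation of the warped endpoints (same barycentric weights).
        moved = (w0[0] + t * (w1[0] - w0[0]), w0[1] + t * (w1[1] - w0[1]))
        result.append((a[0] + moved[0] - proj[0], a[1] + moved[1] - proj[1]))
    return result

coarse = [(0.0, 0.0), (2.0, 0.0)]
coarse_warped = [(0.0, 1.0), (2.0, 1.0)]         # coarse result: shift up by 1
fine = [(0.0, 0.0), (1.0, 0.25), (2.0, 0.0)]     # finer level has a small bump

fine_init = initialize_fine(fine, coarse, coarse_warped)
```

The bump at the middle vertex survives the initialization: only the motion found at the coarse level is transferred, not the coarse geometry itself.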

2.5. Polygon Reduction versus Image Pyramids

In some optical flow and other image matching algorithms, a similar simplification is used on the images to produce a pyramid of images from which the correspondence can be computed in a coarse-to-fine manner. Gaussian or Laplacian pyramids are usually the method of choice where each simpler layer in the set is a blurred down-sampled version of the previous (or possibly the derivative thereof). We argue that the polygon reduction described above is fundamentally different in that the surface is simplified in a data-dependent fashion. Gaussian and Laplacian pyramids are constructed by applying the same linear filters to the image regardless of the image. Polygon reduction produces simpler surfaces by careful consideration of the shape of the surface. This ensures a more faithful representation of the surface, especially at the simplest levels, which means that the computations done at the coarsest levels have much more bearing on the final output, thereby leading to fewer local minima and faster computation because less work needs to be undone.

As pointed out in Hoppe (1996), mesh simplification can be seen as a (potentially lossless) compression technique. It is not surprising that compression can be used to obtain a good representation for matching. If we assume that both surfaces are collections of features drawn from a random (but structured) distribution, we would expect a compression algorithm to be able to find common substructures and reduce them in the same way thus allowing for easier matching. Finding the best method for compression and the best features for learning are often similar problems. We can view the polygon reduction stages of the algorithm as implicit feature detectors.

3. Morphable Surface Models

Now that we have an algorithm for finding correspondences between surfaces, we turn to building Morphable Surface Models. We are given $m$ surfaces, $\{\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_m\}$ and we fix one of them as the base surface (without loss of generality, assume this is the one labeled $\mathcal{A}_1$). We now find the correspondences between $\mathcal{A}_1$ and all of the other surfaces. This produces a set of correspondences $\{C_1, C_2, \ldots, C_m\}$ (where $C_1$ is trivially computed and the rest describe how to warp $\mathcal{A}_1$ to each of the other surfaces in the set).

These correspondences can be combined in a convex combination to produce a new warping of the base surface. If $\Xi = \{\xi_1, \xi_2, \ldots, \xi_m\}$ are the convex combination coefficients ($\sum_i \xi_i = 1$), then we define this new warping in the obvious way,

$$W_\Xi(a) = \sum_i \xi_i C_i(a) \tag{18}$$

where $W_\Xi(a)$ is the new position of the point $a$ from the base mesh in the model whose parameters have been set to $\Xi$. This restricts the new warping to lie in the $(m-1)$-dimensional space of these $m$ warpings. Clearly there is a slight bias in this model relative to the arbitrary choice of the base surface (since correspondence calculations as described above are not invariant to the ordering of the two surfaces). However, for classes of similar objects with the same or similar topologies, this bias is small.

To remove the condition that the sum of the $\xi_i$ parameters must be 1, we will assume that the model parameters have been centered (we will translate the space to a “center point” making any scaling of a warping a valid warping). By this we mean that an origin model has been set to the warping obtained when all $\xi_i = \frac{1}{m}$ and the correspondence vectors have been changed from the absolute positions of the vertices to the relative difference between the vertex position for the correspondence and the vertex position of this origin model. Mathematically, we define an origin as $O(a) = \sum_i \frac{1}{m} C_i(a)$ and then define new “displacement” correspondences $C'_i(a) = C_i(a) - O(a)$ and associated parameters $\xi'_i$ which do not now need to sum to 1. This means that $W$ is now

$$W'_\Xi(a) = O(a) + \sum_i \xi'_i C'_i(a) \tag{19}$$

From now on, we will drop the prime notation and assume that all models have been described in this fashion to eliminate the convex constraint on the $\xi$'s.
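The centering step and the resulting generative model of Eq. (19) can be sketched as below. The representation (each correspondence stored as a list of warped vertex positions on the base mesh) and the function names are assumptions made for this example.

```python
def center_model(correspondences):
    """Return the origin O and the displacement fields C'_i = C_i - O.

    correspondences[i][v] is the position C_i assigns to base vertex v.
    """
    m = len(correspondences)
    n_vertices = len(correspondences[0])
    dim = len(correspondences[0][0])
    origin = [tuple(sum(C[v][k] for C in correspondences) / m
                    for k in range(dim))
              for v in range(n_vertices)]
    displacements = [[tuple(C[v][k] - origin[v][k] for k in range(dim))
                      for v in range(n_vertices)]
                     for C in correspondences]
    return origin, displacements

def warp(origin, displacements, xi):
    """Eq. (19): W(a) = O(a) + sum_i xi_i C'_i(a), evaluated at every vertex."""
    dim = len(origin[0])
    return [tuple(origin[v][k] + sum(x * D[v][k]
                                     for x, D in zip(xi, displacements))
                  for k in range(dim))
            for v in range(len(origin))]

# Two example correspondences on a base mesh with a single vertex.
C1 = [(0.0, 0.0)]
C2 = [(2.0, 4.0)]
origin, displacements = center_model([C1, C2])

at_origin = warp(origin, displacements, (0.0, 0.0))   # the mean shape
back_to_c2 = warp(origin, displacements, (0.0, 1.0))  # reproduces C2
exaggerated = warp(origin, displacements, (0.0, 2.0)) # parameters need not sum to 1
```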

3.1. Matching Models to Surfaces

We can now formulate an algorithm for finding the parameters of a morphable model so that the shape of the model best matches a given surface, $\mathcal{B}$ (this surface is not necessarily one of the ones used to construct the model). We again pose this as an energy minimization problem with the following energy term:

$$E_m(\Xi) = \int_{\mathcal{A}_1} \left\| W_\Xi(a) - P_{\mathcal{B}}(W_\Xi(a)) \right\|^2 da + \int_{\mathcal{B}} \left\| b - P_{W_\Xi(\mathcal{A}_1)}(b) \right\|^2 db + \lambda \sum_i \xi_i^2 \tag{20}$$

This minimizes the difference between points from the model's surface and their closest points on the target surface and vice versa. We have also added a term penalizing high coefficient values. This reflects our intuition that the model will not be a good one far away from the surfaces used to construct it. If our model has been whitened as described in Section 3.3, then this penalty is analogous to assuming a Gaussian prior in correspondence space.

Again, the minimization of $E_m$ is difficult due to the integrals, so we will approximate them with sums of sampled points yielding

$$E_m(\Xi) = \frac{1}{n} \sum_{a \in S_n(O(\mathcal{A}_1))} \left\| W_\Xi(a) - P_{\mathcal{B}}(W_\Xi(a)) \right\|^2 + \frac{1}{n} \sum_{b \in S_n(\mathcal{B})} \left\| P_{W_\Xi(\mathcal{A}_1)}(b) - b \right\|^2 + \lambda \sum_i \xi_i^2 \tag{21}$$

Just as before, we can minimize this function by alternating between finding the closest points to sampled points from the opposite surface and fixing these points to find $\Xi$ by solving a linear least-squares problem. In this case, we do not minimize the distance to the plane, but just the distance to the projected point for simplicity. By taking derivatives of $E_m$ with respect to $\Xi$ while keeping the values of $P_{\mathcal{B}}(W(a))$ constant (pretending they don't depend on $\Xi$) and the values of $P_{W(\mathcal{A})}(b)$ as linear functions of $\Xi$ (pretending their barycentric coordinates are fixed), the second half of the minimization has the solution:

$$\Xi = \left( A + 2n\lambda I \right)^{-1} q \tag{22}$$

where

$$A_{ij} = \sum_k C_i(a_k)^T C_j(a_k) \tag{23}$$

$$q_i = \sum_k C_i(a_k)^T b_k \tag{24}$$

for $k$ ranging over the terms of both sums of Eq. (21). $a_k$ is the point on the model's surface (sampled for the first $n$ values of $k$ and projected from a point on $\mathcal{B}$ for the second $n$) and $b_k$ is the associated point on $\mathcal{B}$ (projected from a point on the model's surface for the first $n$ values of $k$ and sampled for the second $n$).
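The linear solve of Eqs. (22)–(24) can be sketched with a small dense solver. This is only an illustration under stated assumptions: the function names are hypothetical, the regularizer is applied exactly as printed in Eq. (22), the targets $b_k$ are taken as given, and a plain Gaussian elimination stands in for whatever solver was actually used.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_parameters(fields_at_samples, targets, n, lam):
    """Xi = (A + 2 n lam I)^-1 q with A and q as in Eqs. (23) and (24).

    fields_at_samples[i][k] is C_i(a_k); targets[k] is b_k.
    """
    m = len(fields_at_samples)
    ks = range(len(targets))
    A = [[sum(dot(fields_at_samples[i][k], fields_at_samples[j][k]) for k in ks)
          for j in range(m)] for i in range(m)]
    q = [sum(dot(fields_at_samples[i][k], targets[k]) for k in ks)
         for i in range(m)]
    for i in range(m):
        A[i][i] += 2.0 * n * lam
    return solve_linear(A, q)

# Two orthogonal correspondence fields evaluated at two sample points.
fields = [[(1.0, 0.0), (1.0, 0.0)],
          [(0.0, 1.0), (0.0, 1.0)]]
targets = [(2.0, 3.0), (2.0, 3.0)]

xi_unregularized = fit_parameters(fields, targets, n=1, lam=0.0)
xi_regularized = fit_parameters(fields, targets, n=1, lam=0.5)
```

With $\lambda = 0$ the fit recovers the coefficients exactly; a positive $\lambda$ shrinks them toward zero, which is the effect of the penalty on high coefficient values.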

3.2. Inner Products of Correspondence

In order to use bootstrapping (Vetter et al., 1997; Jones and Poggio, 1998) for building models in a more robust manner (Section 3.3), we need to define an inner product over the space of correspondences. It might seem that if we simply stack all of the components of the correspondence vectors at each vertex into a large vector, we could use the dot product of these vectors as the inner product of the correspondence space. However, this definition is sensitive to the parameterization of the base surface. As an example, if we keep the base surface the same and the correspondences the same but subdivide some of the faces of the base surface (so we have the same surface described in a different way), we end up with a different inner product: we have lengthened the vector of vertex positions (although the new components are dependent on the old ones) and therefore magnified the importance of the correspondence at the faces we subdivided.

To solve this problem, we define the inner product between two correspondences in a parameter-independent fashion,

$$\langle C_i, C_j \rangle = \int_{A_1} C_i(a)^T C_j(a)\, da \tag{25}$$

We would now like to find a tractable method for computing this integral for piecewise linear surfaces. For concreteness, we will take the example of 2-dimensional surfaces, which are therefore composed of triangles. The analysis and resulting answer generalize to surfaces of arbitrary dimension.

The above integral, for the case of 2-dimensional piecewise linear surfaces, becomes the sum of integrals of $C_i(a)^T C_j(a)$ over triangles. We know that $C_i(a)$ is a linear function of $a$ over the triangle patch (and similarly for $C_j(a)$). Therefore, the dot product of these two vectors is a quadratic function of position, and each term is the integral of a quadratic taken over the triangle. The following equality holds and can be verified easily.

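$$\int_{\triangle x_0 x_1 x_2} C_i(x)^T C_j(x)\, dx = \frac{1}{12}\,\mathrm{area}(\triangle x_0 x_1 x_2) \times \left[ \sum_{k=0}^{2} C_i(x_k)^T C_j(x_k) + \sum_{k=0}^{2} \sum_{l=0}^{2} C_i(x_k)^T C_j(x_l) \right] \tag{26}$$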
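One way to check Eq. (26) numerically is to compare its right-hand side with the edge-midpoint quadrature rule, which is exact for quadratic integrands on a triangle. The sketch below does this for one random triangle; the function names and test data are illustrative assumptions, not part of the method described here.

```python
import numpy as np

def triangle_area(x0, x1, x2):
    # Half the norm of the cross product (triangle embedded in 3D)
    return 0.5 * np.linalg.norm(np.cross(x1 - x0, x2 - x0))

def closed_form(area, Ci, Cj):
    # Right-hand side of Eq. (26); Ci and Cj have one row per vertex
    diag = np.sum(Ci * Cj)                    # sum_k Ci(x_k)^T Cj(x_k)
    cross = Ci.sum(axis=0) @ Cj.sum(axis=0)   # sum_k sum_l Ci(x_k)^T Cj(x_l)
    return area / 12.0 * (diag + cross)

def midpoint_quadrature(area, Ci, Cj):
    # Edge-midpoint rule: (area / 3) * sum of the integrand at the three
    # edge midpoints. A linear field at a midpoint is the mean of the
    # values at the edge's two endpoints.
    total = 0.0
    for a, b in [(0, 1), (1, 2), (2, 0)]:
        total += (0.5 * (Ci[a] + Ci[b])) @ (0.5 * (Cj[a] + Cj[b]))
    return area / 3.0 * total

rng = np.random.default_rng(0)
x0, x1, x2 = rng.standard_normal((3, 3))
Ci = rng.standard_normal((3, 6))   # vertex values in a 6-dimensional space
Cj = rng.standard_normal((3, 6))
area = triangle_area(x0, x1, x2)
lhs = midpoint_quadrature(area, Ci, Cj)
rhs = closed_form(area, Ci, Cj)
```

The two values `lhs` and `rhs` agree to floating-point precision, since both evaluate the same quadratic integral exactly.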


To allow us to use standard vector packages, we derive a new vector representation for a correspondence whose dot product in the usual vector sense is exactly this inner product. Let $V_A$ be the set of all vertices of $A$, $F_A$ be the set of all faces of $A$, $\{f_0, f_1, f_2\}$ be the vertices of the face $f$, and $F(v)$ be the set of faces adjacent to the vertex $v$.

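$$
\begin{aligned}
\langle C_i, C_j \rangle
&= \sum_{f \in F_A} \int_f C_i(x)^T C_j(x)\, dx \\
&= \sum_{f \in F_A} \left[ \frac{1}{12}\,\mathrm{area}(f) \left[ \sum_{k=0}^{2} C_i(f_k)^T C_j(f_k) + \sum_{k=0}^{2} \sum_{l=0}^{2} C_i(f_k)^T C_j(f_l) \right] \right] \\
&= \sum_{v \in V_A} \sum_{f \in F(v)} \frac{1}{12}\,\mathrm{area}(f)\, C_i(v)^T C_j(v)
 + \sum_{f \in F_A} \frac{1}{12}\,\mathrm{area}(f) \left[ \sum_{k=0}^{2} C_i(f_k) \right]^T \left[ \sum_{k=0}^{2} C_j(f_k) \right] \\
&= \sum_{v \in V_A} \left[ \sqrt{\frac{\sum_{f \in F(v)} \mathrm{area}(f)}{12}}\; C_i(v) \right]^T \times \left[ \sqrt{\frac{\sum_{f \in F(v)} \mathrm{area}(f)}{12}}\; C_j(v) \right] \\
&\quad + \sum_{f \in F_A} \left[ \sqrt{\frac{\mathrm{area}(f)}{12}} \sum_{k=0}^{2} C_i(f_k) \right]^T \times \left[ \sqrt{\frac{\mathrm{area}(f)}{12}} \sum_{k=0}^{2} C_j(f_k) \right]
\end{aligned}
\tag{27}
$$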

which implies that, if $D$ is the dimensionality of the embedding space, the inner product can be represented as a dot product between two vectors. The first $D \times |V_A|$ components of each vector are the correspondence values at the vertices of the surface, each multiplied by the square root of one twelfth of the summed areas of the adjacent faces. The last $D \times |F_A|$ components are the sums of the correspondence values at the vertices of each of the $|F_A|$ faces, each multiplied by the square root of one twelfth of the area of the face.
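The vector representation of Eq. (27) can be sketched as follows for a triangle mesh. The names `face_areas`, `embed`, and `inner_product_direct` are hypothetical, the mesh is assumed to be given as a vertex array and an integer face array, and areas are computed from 3D spatial coordinates only; none of this is prescribed by the text.

```python
import numpy as np

def face_areas(verts, faces):
    # Triangle areas from 3D vertex positions
    e1 = verts[faces[:, 1]] - verts[faces[:, 0]]
    e2 = verts[faces[:, 2]] - verts[faces[:, 0]]
    return 0.5 * np.linalg.norm(np.cross(e1, e2), axis=1)

def embed(C, faces, areas):
    """Vector whose ordinary dot product reproduces Eq. (27).

    C     : (V, D) correspondence values at the vertices
    faces : (F, 3) integer vertex indices
    areas : (F,) face areas
    """
    # Sum of adjacent face areas for every vertex
    vert_area = np.zeros(len(C))
    np.add.at(vert_area, faces.ravel(), np.repeat(areas, 3))
    vert_part = np.sqrt(vert_area / 12.0)[:, None] * C
    face_part = np.sqrt(areas / 12.0)[:, None] * C[faces].sum(axis=1)
    return np.concatenate([vert_part.ravel(), face_part.ravel()])

def inner_product_direct(Ci, Cj, faces, areas):
    # Face-by-face sum of the closed form of Eq. (26)
    total = 0.0
    for f, area in zip(faces, areas):
        a, b = Ci[f], Cj[f]
        total += area / 12.0 * (np.sum(a * b) + a.sum(axis=0) @ b.sum(axis=0))
    return total

# Tetrahedron with random 6-dimensional correspondence values
verts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
areas = face_areas(verts, faces)
rng = np.random.default_rng(0)
Ci, Cj = rng.standard_normal((2, 4, 6))
via_vectors = embed(Ci, faces, areas) @ embed(Cj, faces, areas)
via_faces = inner_product_direct(Ci, Cj, faces, areas)
```

Here `via_vectors` and `via_faces` agree. Subdividing a face while interpolating the correspondence linearly leaves the value unchanged, which is the parameterization independence that motivated the construction.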

3.3. Building a Model

Now that we have an inner product for correspondences, we can use the bootstrapping algorithm described in Vetter et al. (1997) and Jones and Poggio (1998) to build the model robustly. Usually the method described above (where we take each of the surfaces and compute its correspondence to the base surface) will work fine. However, since the correspondence algorithm is not perfect, occasionally some parts of the correspondence will be off. If we have a large number of surfaces to be incorporated, we can correct for this problem in many cases. The bootstrapping algorithm is:

1. Let M be the model. Initialize it as just the base surface (thus only one correspondence).

2. Repeat the following until the model has reached the desired level of flexibility:

(a) For each input surface:

i) Match the current model to the surface (as in Section 3.1), yielding an approximation to the input surface: $W(a)$.

ii) Find the correspondence from the base surface to the approximation, $W(a)$ (as in Section 2): $C(a)$.

iii) Let $C_i(a) = W(C(a))$.

(b) Center the correspondences found in the previous step (as described in the beginning of Section 3) and create the data matrix M whose columns are the centered correspondences (in the vector representation of Section 3.2).

(c) Perform singular value decomposition on M, yielding $U D V^T$.

(d) Retain only those columns of $U$ with the largest singular values, increasing the number of parameters of the model (each column of $U$ is an axis of correspondence space). Scale each column by its singular value.

In this algorithm, we successively build a more and more flexible model. Each time, we use the old model to match the input surfaces and give an easier starting point for the correspondence routines. We then find and keep, via singular value decomposition, the axes of our space with the largest variance, since they are likely to be "real" surface changes and not due to noise in our correspondence algorithm.

Since we are scaling the axes by the standard deviation of the correspondences (in step (d)), we have whitened the data. This supports the $\sum_i \xi_i$ log-prior term in the matching energy function of Section 3.1.
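Steps (b) through (d) of the bootstrapping loop can be sketched with NumPy's SVD. This is a minimal illustration under stated assumptions: `update_axes` is a hypothetical name, centering is taken to mean subtracting the mean correspondence vector, and the matching and correspondence steps (a) are assumed to have already produced the rows of `corr_vectors` in the vector representation of Section 3.2.

```python
import numpy as np

def update_axes(corr_vectors, num_axes):
    """Center, decompose, and keep the leading axes (steps (b)-(d)).

    corr_vectors : (N, P) array, one embedded correspondence per row
    num_axes     : number of columns of U to retain
    Returns the mean vector (P,) and the scaled axes (P, num_axes).
    """
    mean = corr_vectors.mean(axis=0)
    M = (corr_vectors - mean).T          # columns are centered correspondences
    # Singular values are returned in descending order
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = min(num_axes, len(s))
    return mean, U[:, :k] * s[:k]        # each column scaled by its singular value

# Made-up data: seven correspondences spread along a single direction
direction = np.array([3.0, 4.0, 0.0]) / 5.0
t = np.linspace(-1.0, 1.0, 7)
data = 10.0 + t[:, None] * direction
mean, axes = update_axes(data, num_axes=1)
```

In the outer loop, `num_axes` would be increased by one per round, so that each round matches the inputs with a slightly more flexible model.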


4. Experimental Results

We report results of running the model building algorithm on three collections of surfaces. Figure 1 shows a few examples from a collection of hand-drawn "smiley faces." These surfaces are (one-dimensional) lines embedded in the (two-dimensional) plane. They were created using a tablet input device which sampled the pen position over time. The example faces have between 66 and 407 line segments each, with most surfaces having about 100 segments. All faces were composed of four separate open curves and thus had the same topology; however, this topological equivalence was not used in the matching.

These types of line drawings are very difficult for conventional optical flow algorithms: if the surfaces were converted to images, the spatial derivatives would be zero almost everywhere. A simple matching of closest points also fails dramatically because of the variation in the position of the eyes and mouth. It results in matching portions of some eyes and mouths to the large head circle. The structure term plays a crucial role.

We ran the bootstrapping algorithm on 57 faces for 23 iterations, producing a model with 23 parameters. Figure 2 shows the major axes of variation captured by the model (the principal components of the last iteration of the bootstrapping algorithm).

Figure 1. Six of the 57 example line-drawings used to construct the face model.

We then asked two people to rank each face on four attributes (friendly vs. sinister, ugly vs. cute, left-looking vs. right-looking, and down-looking vs. up-looking). We created a radial-basis function (RBF) regressor (Bishop, 1995) mapping those four attributes to the dimensions of the face model. From that mapping, we automatically generated the results shown in Fig. 3. Similarly, we used an RBF to create the reverse mapping (from parameters to attributes). Figure 4 shows the results of matching eight faces not used in creating the model and then estimating the attributes from the resulting model parameters.
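For readers unfamiliar with RBF regression, the following is a generic sketch of a Gaussian RBF regressor with one center per training example. The kernel choice, the `width` and `ridge` values, and the random stand-in data are all assumptions made for illustration; the text does not specify how the regressor used here was configured.

```python
import numpy as np

def fit_rbf(X, Y, width, ridge=1e-8):
    """Gaussian RBF regressor with one center per training point.

    X : (N, d_in) inputs (e.g. attribute ratings)
    Y : (N, d_out) targets (e.g. model parameters)
    Returns a function mapping (Q, d_in) queries to (Q, d_out) outputs.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2.0 * width ** 2))
    # The small ridge term keeps the system solvable when inputs nearly coincide
    weights = np.linalg.solve(G + ridge * np.eye(len(X)), Y)

    def predict(Xq):
        d2q = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2q / (2.0 * width ** 2)) @ weights

    return predict

# Random stand-in data: 10 examples, 4 attributes, 23 model parameters
rng = np.random.default_rng(0)
attributes = rng.random((10, 4))
parameters = rng.standard_normal((10, 23))
to_parameters = fit_rbf(attributes, parameters, width=0.3)
generated = to_parameters(np.array([[0.2, 0.8, 0.5, 0.5]]))
```

Swapping the roles of the two arrays gives the reverse mapping from model parameters to attributes.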

Secondly, we constructed a surface model of cars from 15 computer graphics surfaces (6 of which are shown in Fig. 5). These models have different topologies. All of the cars have five closed surfaces representing the body and tires. Some additionally have either open or closed surfaces representing the bumpers and side mirrors. Between 1270 and 10568 triangles define each surface. Color is represented by three extra dimensions (red, green, and blue). Thus, these shapes are 2D surfaces embedded in six-dimensional space. This 6D representation allows the algorithm to trade off matching color versus matching shape. In previous Morphable Model work with images, the variations in shape had different model parameters than those of texture (or color). This would be analogous to separating every correspondence in the model into two correspondences (one that had only non-zero spatial components and one that had only non-zero color components). We have chosen not to do so here and instead require that the color and surface changes be dependent. Thus a given correspondence represents a change in both the surface shape and the surface color.

Figure 2. The first 6 eigenvectors of the face model scaled by ±2 standard deviations, shown applied to the average face.

Figure 3. Results of fitting a radial-basis function network to the mapping from four attributes (sinisterness, cuteness, left-right orientation, up-down orientation) to morphable model parameters. Four example outputs from settings of the attributes are shown.

We ran the bootstrapping algorithm to completion (15 rounds, adding one additional parameter per round). The resulting principal components are shown in Fig. 6. Given that only 15 example surfaces were used and the high dimensionality of the problem (over 6000 numbers to represent one surface), we feel these capture the natural variations in car bodies well. The bounding boxes on the cars were normalized, so the first eigenvector well captures the difference between small and large car shapes. The second could be characterized as a "sporty" versus "boxy" axis. The third and fourth make distinctions in the tail-lights and rear windows.

Figure 4. Analysis results for faces. The first row is the input faces. The second row is the model after matching. The third row of numbers is the output from the regressor (sinisterness, cuteness, left-right orientation, up-down orientation).

Figure 5. Six of the fifteen 3D surfaces used to construct the car morphable model.

The results from the matching procedure from Section 3.1 are shown in Fig. 7. The first line is the input car to be matched. The second line is the model matched to the input. The third line is the model with the input car removed matched to the input. The difference between the second and third lines is mainly due to the limited size of the model. As there are only 14 other cars, they do not completely span the space of all cars and thus were not able to exactly match the input. There were only two other "sporty" two-seater cars and no other car had such large side or rear windows. Thus it was impossible for the model to represent the input shape as the learned examples did not cover that portion of "car shapes."

Figure 6. The first four eigenvectors of the car model. Each eigenvector is shown applied to the average model scaled by ±2 standard deviations. The other eigenvectors have similar forms. We show the cars from this view since the back of the car tends to have the most variation.

The wheels in most of the car correspondences are a problem. Because they are round, the matching algorithm tends to add a bit of random rotation to the correspondences. Rotation correspondence fields do not add linearly and thus the wheel representations are not as crisp as the rest of the vehicle (the wheels tend to collapse). This is essentially a mismatch between the polygonal representation and the round surface.
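The collapse of the wheels can be illustrated with a small 2D example. Averaging the displacement fields of two opposite rotations by $\theta$ does not yield the identity: it shrinks a unit circle to radius $\cos\theta$. The setup below (a 16-point circle and $\theta = \pi/6$) is an illustrative assumption, not data from the experiments.

```python
import numpy as np

def rotation_field(points, theta):
    """Displacement field that rotates 2D points by theta about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T - points

angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
wheel = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # unit circle

theta = np.pi / 6
# Linear blend of two correspondences that differ only in rotation direction
blended = wheel + 0.5 * (rotation_field(wheel, theta) + rotation_field(wheel, -theta))
radii = np.linalg.norm(blended, axis=1)   # every entry equals cos(theta)
```

Larger random rotations in the example correspondences therefore produce a smaller averaged wheel.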

To visually demonstrate the correspondence algorithm, Fig. 8 shows the results of each step of the algorithm run on two car surfaces. We think this clearly shows the iterative refining of the solution from coarse to fine resolution. Notice that most of the gross changes are done at the coarse level, where the number of free variables (positions of vertices) is minimal. This helps to prevent overfitting.

Finally, to demonstrate the use of these correspondences as a metric space, we took the animal surfaces of Fig. 9 and built a morphable model. Using the distance metric implied by the vector representation of Section 3.2, we clustered the examples using the k-means algorithm (Bishop, 1995). This produced one cluster of the cat surfaces and one of the dog surfaces without supervision. The projection of the points and cluster centers onto the first two principal components is shown in Fig. 10. The color and spatial positioning differences among the surfaces were eliminated by projecting out the correspondence directions of pure translation (in both spatial and color coordinates).
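The clustering step can be sketched as follows: remove the components along given translation directions, then run plain k-means (Lloyd's algorithm) on the resulting vectors. The function names, the random initialization, and the toy data are assumptions for illustration; the rows of `X` stand in for correspondences in the vector representation of Section 3.2.

```python
import numpy as np

def project_out(X, directions):
    """Remove from each row of X its component in span(directions).

    X          : (N, P) data vectors
    directions : (m, P) directions to eliminate (need not be orthonormal)
    """
    Q, _ = np.linalg.qr(directions.T)    # orthonormal basis as columns
    return X - (X @ Q) @ Q.T

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data: two groups separated along e, with large nuisance shifts along d
d = np.ones((1, 4)) / 2.0
e = np.array([1.0, -1.0, 0.0, 0.0])
group = np.array([0, 0, 0, 1, 1, 1])
shift = np.array([5.0, -7.0, 3.0, -4.0, 8.0, -2.0])
X = 10.0 * group[:, None] * e + shift[:, None] * d
labels, centers = kmeans(project_out(X, d), k=2)
```

After the nuisance direction is projected out, the two groups of three are separated cleanly.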

Figure 7. An example of matching a surface to a model. The top line is two renderings of the same input surface. The second line is the result of matching the top surface to a model of all 15 cars. The last line is the result of matching the surface to the model of only 14 cars (where the input car has been removed). Note that in this final case, the 14 cars used to construct the model do not span the space of all cars sufficiently to reconstruct the input car. The poor match in this case is mainly a result of insufficient flexibility in the model. For instance, in our set of car surfaces, there were no other cars with a small rear window and a similarly-shaped trunk.

The goal in all of these experiments is to show that the correspondence technique and model building algorithms produce models which are robust and good at capturing the class of surfaces either for analysis or synthesis. The true test of a correspondence (especially in a domain where there isn't a ground truth) is in its use. We feel that these figures demonstrate the usefulness of the models built.

5. Extensions and Conclusions

No user intervention was required to build the models in this paper. The surfaces were input to the model building algorithm, which ran automatically. The input surfaces were all roughly aligned (centered, scaled, and rotated approximately the same). However, such rough alignment could easily have been done by comparing the first and second moments of the surfaces. Additionally, a few parameters (see Note 3) needed to be set (though not as many as in Shelton (1998), where gradient descent was used for minimization). We found that running a few test correspondences to find the correct order of magnitude for the parameters was all that was necessary.
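A moment-based rough alignment of the kind mentioned above might look like the following sketch. It is not the procedure used for the experiments (the surfaces there were already aligned). It treats the mesh vertices as an unweighted point sample, which is an assumption, and the function name `normalize_by_moments` is hypothetical.

```python
import numpy as np

def normalize_by_moments(points):
    """Center, rotate to principal axes, and rescale a point set.

    points : (N, D) array of surface samples (here: vertices, unweighted)
    The first moment fixes translation; the second central moment fixes
    rotation (up to axis sign) and overall scale.
    """
    centroid = points.mean(axis=0)                  # first moment
    centered = points - centroid
    cov = centered.T @ centered / len(points)       # second central moment
    evals, evecs = np.linalg.eigh(cov)              # ascending eigenvalues
    scale = np.sqrt(evals.sum())
    return centered @ evecs / scale

# Made-up anisotropic cloud and a rotated, scaled, shifted copy of it
rng = np.random.default_rng(2)
cloud = rng.standard_normal((200, 3)) * np.array([3.0, 2.0, 1.0])
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
moved = 2.5 * cloud @ Q.T + np.array([4.0, -1.0, 7.0])
a, b = normalize_by_moments(cloud), normalize_by_moments(moved)
```

Both normalized clouds have zero mean and the same diagonal covariance with unit trace. The remaining sign ambiguity of each principal axis would still need to be resolved, for example by comparing higher moments.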

For its generality, we feel that the energy function described produces good results. However, in domains where prior knowledge about surface deformations is available, better results could certainly be obtained by modifying the $E_{str}$ and $E_{pri}$ terms. Preliminary results with using this algorithm as a replacement for optical flow are promising (Shelton, 1998): the polygon reduction produces large triangles in textureless areas, leading to easy matching where traditionally optical flow algorithms have had problems. Yet, this algorithm makes no assumptions about the images matched.

We would like to replace the hard threshold of the maximum operation in the $P$ function ($P$ takes a point and finds its closest point on another surface) with a soft-max function, as is done in elastic networks (Durbin and Willshaw, 1987), since this leads to a smoother energy landscape (Durbin et al., 1989) and fewer local minima. However, so far we have not found a tractable method for doing so.
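To make the distinction concrete, the sketch below contrasts a hard closest-point choice with a soft-max weighted one over a finite set of candidate points. This only illustrates the idea on discrete candidates, which is an assumption made here; it does not address the continuous surface case, for which no tractable method is claimed. The names and the `temperature` parameter are hypothetical.

```python
import numpy as np

def hard_closest(p, candidates):
    """Return the single candidate nearest to p."""
    d2 = ((candidates - p) ** 2).sum(axis=1)
    return candidates[d2.argmin()]

def soft_closest(p, candidates, temperature):
    """Soft-max weighted average of candidates, weighted by closeness to p."""
    d2 = ((candidates - p) ** 2).sum(axis=1)
    logits = -d2 / temperature
    w = np.exp(logits - logits.max())    # subtract the max for numerical stability
    w /= w.sum()
    return w @ candidates

candidates = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
p = np.array([0.9, 0.1])
nearest = hard_closest(p, candidates)
nearly_hard = soft_closest(p, candidates, temperature=1e-3)
smooth = soft_closest(p, candidates, temperature=1.0)
```

As the temperature goes to zero the soft version approaches the hard choice, while larger temperatures vary smoothly with `p`.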

Figure 8. An example of the process of finding correspondence. The upper right surface is the input surface $A$. The lower right surface is the input surface $B$. The top line is the sequence of surfaces $A_5$ through $A_0$ (left to right). Similarly, the bottom line is the sequence of surfaces $B_5$ through $B_0$. These surfaces are produced from the input boxed surfaces as shown by the arrows at the top and bottom. Chronologically, the correspondence continues from the upper left corner and proceeds down each column and then across from left to right: First $A_5$ is matched to $B_5$ by iteratively solving the dual-minimization. The successive resulting correspondences, as applied to $A_5$, are shown top-to-bottom in the first column (produced in the order shown by arrows). The final correspondence is then converted to a correspondence on $A_4$ (the top surface in the column to the left) and the process is repeated in the next column (now matching to $B_4$). Finally, the rightmost column shows the algorithm's steps on the last coarse-to-fine level with the figure outlined in bold being the final correspondence applied to $A$.

Although not shown here, user-defined correspondences can be used to improve the match by adding additional spring terms connecting the points on one surface to their matched points on the other surface. We have found such extra springs to be useful in matching shapes which differ greatly (e.g., in building a more general model of animals which included a rhinoceros, a camel, a giraffe, and a buffalo) or in applications where certain fairly undistinguished features are highly important for appearance to human observers (e.g., the outline of the lips can be difficult to match because of limited contrast in some images).

We would like to add the ability for the topology of the surface to change. Currently, example surfaces of different topologies can be used to create a model. However, all of the surfaces produced by that model will have the topology of the base surface. McInerney and Terzopoulos (1995) describe a method for allowing the topology of snakes to change during the matching process. Combining this idea with the mesh reduction algorithm of Popović and Hoppe (1997), which allows topology changes, might provide for a more flexible surface model.


Figure 9. Surfaces used for constructing an animal model (this model has also been used for psychophysics experiments (Riesenhuber and Poggio, 1999; Riesenhuber and Poggio, 2000)). Clustering using the k-means algorithm (for k = 2) produced the top row as one cluster and the bottom row as another cluster. This nicely corresponds to cats and dogs.

Figure 10. Plotting the six surfaces in correspondence space (projected onto the two axes of largest variance). The × markers are the cat surfaces and the circles are the dogs. The stars represent the cluster centers found.

Acknowledgments

This paper describes research done within the Center for Biological and Computational Learning in the Department of Brain and Cognitive Sciences and at the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. This research is sponsored by grants from the National Science Foundation, ONR and DARPA. Additional support is provided by Eastman Kodak Company, Daimler-Chrysler, Siemens, ATR, AT&T, Compaq, Honda R&D Co., Ltd., Merrill-Lynch, NTT and Central Research Institute of Electric Power Industry.

Notes

1. A gray scale image can be described as a surface or height-field. Color images can similarly be described (now in 5-dimensional space instead of 3-dimensional space).

2. For the two algorithms we tested, progressive meshes and quadric error metrics, and many other mesh simplification algorithms, the simplification is done by collapsing edges and thus the vertices of $A_i$ can be mapped to the vertices of $A_{i+1}$. However, after removing half of the vertices such a mapping can be misleading and we would like this algorithm to be independent of the method used to obtain the simplified surfaces.

3. Specifically: α and β (the weights of the terms of the energy function), n (number of sampled points), t (number of iterations of the minimization), γ (the ratio of color to spatial coordinates), λ (the prior's weight in the model matching algorithm), and the annealing schedule for α. None of these was sensitive, and all were very easy to set, except for the annealing schedule, which we had to tune differently for each set of surfaces.

References

Beymer, D. and Poggio, T. 1996. Image representations for visual learning. Science, 272:1905–1909.

Beymer, D., Shashua, A., and Poggio, T. 1993. Example based image analysis and synthesis. A.I. Memo 1431, Artificial Intelligence Laboratory, Massachusetts Institute of Technology.

Bishop, C.M. 1995. Neural Networks for Pattern Recognition. Clarendon Press: Oxford.

Blanz, V. and Vetter, T. 1999. A morphable model for the synthesis of 3D faces. In Computer Graphics Proceedings, SIGGRAPH’99, pp. 187–194.

Cootes, T.F., Edwards, G.J., and Taylor, C.J. 1998. Active appearance models. In Proceedings of the European Conference on Computer Vision, Vol. 2, pp. 484–498.

Durbin, R., Szeliski, R., and Yuille, A. 1989. An analysis of the elastic net approach to the traveling salesman problem. Neural Computation, 1:348–358.

Durbin, R. and Willshaw, D. 1987. An analogue approach to the travelling salesman problem using an elastic net method. Nature, 326(16):689–691.

Garland, M. and Heckbert, P.S. 1997. Surface simplification using quadric error metrics. In Computer Graphics Proceedings, SIGGRAPH’97, pp. 209–216.

Heckbert, P.S. and Garland, M. 1997. Survey of polygonal surface simplification algorithms. Technical Report, Carnegie Mellon University. To appear.

Hoppe, H. 1996. Progressive meshes. In Computer Graphics Proceedings, SIGGRAPH’96, pp. 99–108.

Hoppe, H., DeRose, T., Duchamp, T., McDonald, J., and Stuetzle, W. 1992. Surface reconstruction from unorganized points. In Computer Graphics Proceedings, SIGGRAPH’92, pp. 71–78.

Hoppe, H., DeRose, T., Duchamp, T., McDonald, J., and Stuetzle, W. 1993. Mesh optimization. In Computer Graphics Proceedings, SIGGRAPH’93, pp. 19–26.

Jones, M.J. and Poggio, T. 1998. Multidimensional morphable models: A framework for representing and matching object classes. International Journal of Computer Vision, 29(2):107–131.

Kang, S. and Jones, M. Personal communication.

Kass, M., Witkin, A., and Terzopoulos, D. 1988. Snakes: Active contour models. International Journal of Computer Vision, 1(4):321–331.

McInerney, T. and Terzopoulos, D. 1995. Topologically adaptable snakes. In Proceedings of the Fifth International Conference on Computer Vision (ICCV ’95), pp. 840–845.

Nastar, C., Moghaddam, B., and Pentland, A. 1996. Generalized image matching: Statistical learning of physically-based deformations. In Proceedings of Fourth European Conference on Computer Vision.

Poggio, T. and Vetter, T. 1992. Recognition and structure from one 2D model view. A.I. Memo 1347, Artificial Intelligence Laboratory, Massachusetts Institute of Technology.

Popović, J. and Hoppe, H. 1997. Progressive simplicial complexes. In Computer Graphics Proceedings, SIGGRAPH’97, pp. 217–224.

Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. 1992. Numerical Recipes in C. 2nd ed., Cambridge University Press.

Riesenhuber, M. and Poggio, T. 1999. A note on object class representation and categorical perception. Technical Report AI Memo 1679, CBCL Paper 183, MIT AI Lab and CBCL, Cambridge, MA.

Riesenhuber, M. and Poggio, T. 2000. CBF: A new framework for object categorization in cortex. In Proceedings of BMCV2000, M.-H. Yoo (Ed.). New York. To appear.

Shashua, A. 1992. Projective structure from two uncalibrated images: Structure from motion and recognition. A.I. Memo 1363, Artificial Intelligence Laboratory, Massachusetts Institute of Technology.

Shelton, C.R. 1998. Three-dimensional correspondence. Master’s Thesis, Massachusetts Institute of Technology. Also available as AITR-1650.

Ullman, S. and Basri, R. 1991. Recognition by linear combinations of models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13:992–1006.

Vetter, T., Jones, M.J., and Poggio, T. 1997. A bootstrapping algorithm for learning linear models of object classes. A.I. Memo 1600, Artificial Intelligence Laboratory, Massachusetts Institute of Technology.

Vetter, T. and Poggio, T. 1995. Linear object classes and image synthesis from a single example image. A.I. Memo 1531, Artificial Intelligence Laboratory, Massachusetts Institute of Technology.
