em, mcmc, andchainflipping for structure from motion with

30
EM, MCMC, and Chain Flipping for Structure from Motion with Unknown Correspondence Frank Dellaert College of Computing, Georgia Institute of Technology, Atlanta GA Steven M. Seitz Department of Computer Science and Engineering, University of Washington, Seattle WA Charles E. Thorpe and Sebastian Thrun School of Computer Science, Carnegie Mellon University, Pittsburgh PA Abstract. Learning spatial models from sensor data raises the challenging data association problem of relating model parameters to individual measurements. This paper proposes an EM-based algorithm, which solves the model learning and the data association problem in parallel. The algorithm is developed in the context of the the structure from motion problem, which is the problem of estimating a 3D scene model from a collection of image data. To accommodate the spatial constraints in this domain, we compute virtual measurements as suf- ficient statistics to be used in the M-step. We develop an efficient Markov chain Monte Carlo sampling method called chain flipping, to calculate these statistics in the E-step. Experimental results show that we can solve hard data association problems when learning models of 3D scenes, and that we can do so efficiently. We conjecture that this approach can be applied to a broad range of model learning problems from sensor data, such as the robot mapping problem. Keywords: Expectation-Maximization, Markov chain Monte Carlo, Data Association, Struc- ture from Motion, Correspondence Problem, Efficient Sampling, Computer Vision 1. Introduction This paper addresses the problem of data association when learning models from data. The data association problem, also known as the correspondence problem, is the problem of relating sensor measurements to parameters in the model that is being learned. This problem arises in a range of disciplines. In clustering, it is the problem of determining which data point belongs to which cluster (McLachlan & Basford, 1988). In mobile robotics, learning a map of the environment creates the problem of determining the correspondence This work was performed when all authors were at Carnegie Mellon University. It was partially supported by grants from Intel Corporation, Siebel Systems, SAIC, the National Science Foundation under grants IIS-9876136, IIS-9877033, and IIS-9984672, by DARPA- ATO via TACOM (contract number DAAE07-98-C-L032), and by a subcontract from the DARPA MARS program through SAIC, which is gratefully acknowledged. The views and conclusions contained in this document are those of the author and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of the United States Government or any of the sponsoring institutions. c 2002 Kluwer Academic Publishers. Printed in the Netherlands. ml.tex; 19/01/2002; 13:46; p.1

Upload: others

Post on 04-Nov-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlippingfor StructurefromMotionwith Unknown Correspondence

�FrankDellaertCollege of Computing, Georgia Instituteof Technology, AtlantaGA

StevenM. SeitzDepartmentof ComputerScienceandEngineering, University of Washington,SeattleWA

CharlesE. ThorpeandSebastianThrunSchool of ComputerScience, Carnegie Mellon University, PittsburghPA

Abstract. Learningspatialmodelsfrom sensordataraisesthe challengingdataassociationproblemof relatingmodel parametersto individual measurements.This paperproposesanEM-basedalgorithm,which solves the model learningand the dataassociationprobleminparallel.Thealgorithmis developedin thecontext of thethestructurefrom motionproblem,which is the problemof estimatinga 3D scenemodel from a collection of imagedata.Toaccommodatethespatialconstraintsin thisdomain,wecomputevirtual measurementsassuf-ficient statisticsto beusedin theM-step.We developanefficient Markov chainMonteCarlosamplingmethodcalledchainflipping, to calculatethesestatisticsin theE-step.Experimentalresultsshow that we cansolve harddataassociationproblemswhenlearningmodelsof 3Dscenes,andthatwe cando soefficiently. Weconjecturethatthis approachcanbeappliedto abroadrangeof modellearningproblemsfrom sensordata,suchastherobotmappingproblem.

Keywords: Expectation-Maximization,Markov chainMonteCarlo,DataAssociation,Struc-turefrom Motion, CorrespondenceProblem,EfficientSampling,ComputerVision

1. Intr oduction

This paperaddressestheproblemof dataassociationwhenlearningmodelsfrom data.Thedataassociationproblem, alsoknown asthecorrespondenceproblem, is theproblemof relatingsensormeasurementsto parametersin themodelthat is beinglearned.This problemarisesin a rangeof disciplines.Inclustering,it is theproblemof determiningwhichdatapointbelongsto whichcluster(McLachlan& Basford,1988). In mobile robotics,learninga mapof the environmentcreatesthe problemof determiningthe correspondence�

This work wasperformedwhenall authorswereat Carnegie Mellon University. It waspartially supportedby grantsfrom Intel Corporation,SiebelSystems,SAIC, the NationalScienceFoundationundergrantsIIS-9876136,IIS-9877033,andIIS-9984672,by DARPA-ATO via TACOM (contractnumberDAAE07-98-C-L032),and by a subcontractfrom theDARPA MARS programthroughSAIC, which is gratefully acknowledged.The views andconclusionscontainedin this documentarethoseof theauthorandshouldnot be interpretedasnecessarilyrepresentingofficial policiesor endorsements,eitherexpressedor implied, oftheUnitedStatesGovernmentor any of thesponsoringinstitutions.

c�

2002Kluwer AcademicPublishers. Printedin theNetherlands.

ml.tex; 19/01/2002; 13:46; p.1

Page 2: EM, MCMC, andChainFlipping for Structure from Motion with

2 Dellaert,Seitz,Thorpe& Thrun

betweenindividual measurements(e.g.,therobotseesadoor),andthecorre-spondingfeaturesin theworld (e.g.,doornumber17) (Leonardet al., 1992;Shatkay, 1998;Thrunet al., 1998a).A similarproblemcanbefoundin com-putervision, whereit is known asstructure frommotion(SFM). SFM seeksto learna3D modelfrom acollectionof images,which raisestheproblemofdeterminingthecorrespondencebetweenfeaturesin thesceneandmeasure-mentsin imagespace.In all of theseproblems,learninga modelrequiresarobustsolutionto thedataassociationproblemwhich, in thegeneralcase,ishardto obtain.Becausetheproblemis hard,many existing algorithmsmakehighly restrictiveassumptions,suchastheavailability of uniquelandmarksinrobotics(Borensteinetal., 1996),or theexistenceof reliablefeaturetrackingmechanismsin computervision (Tomasi& Kanade,1992;Hartley, 1994).

From a statisticalpoint of view, the dataassociationcan be phrasedasan incompletedataproblem(Tanner, 1996), for which a rangeof methodsexists. Onepopularapproachis expectationmaximization(EM) (Dempsteretal.,1977),whichhasbeenappliedwith greatsuccesstoclusteringproblemsanda rangeof otherestimationproblemswith incompletedata(McLachlan& Krishnan,1997).The EM algorithmiteratestwo estimationsteps,calledexpectation(E-step)andmaximization(M-step).TheE-stepestimatesa dis-tribution over the incompletedata using a fixed model. The M-step thencalculatesthe model that maximizesthe expectedlog-likelihood computedin the E-step.It hasbeenshown that iterating thesebasicstepsleadsto amodelthatlocally maximizesthelikelihood(Dempsteretal., 1977).

Applying EM to learningspatialmodelsis notstraightforward,aseachdo-maincomeswith asetof constraintsthatareoftendifficult to incorporate.Anexampleis thework on learninga mapof theenvironmentfor mobilerobotsin (Shatkay& Kaelbling,1997;Shatkay, 1998),andin (Burgardet al., 1999;Thrunetal.,1998b,1998a).Bothteamshaveproposedextensionsof EM thattake into accountthe geometricconstraintsof robot environments,and theresultingmappingalgorithmshave shown to scaleup to largeenvironments.

This paperproposesan algorithmthat appliesEM to a new domain:thestructurefrom motion problemin computervision. In SFM the model thatis being learnedis the location of all 3D features,along with the cameraposeswith 6 DOF. In this paper, we make the commonlymadeassumptionthatall 3D featuresareseenin all images(Tomasi& Kanade,1992;Hartley,1994).However, we will discussat theendof this paperhow to extendourmethodto imaging situationswith occlusionsand spuriousmeasurements.More importantly, we do not assumeany prior knowledgeon the camerapositionsor on the correspondencebetweenimagemeasurementsand 3Dfeaturesthefeatureidentities,giving riseto aharddataassociationproblem.

Themajority of literatureon SFM considersspecialsituationswherethedataassociationproblemcanbe solved easily. Someapproachessimply as-sumethat datacorrespondenceis known a priori (Ullman, 1979;Longuet-

ml.tex; 19/01/2002; 13:46; p.2

Page 3: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 3

Higgins,1981;Tsai& Huang,1984;Hartley, 1994;Morris & Kanade,1998).Other approachesconsidersituationswhere imagesare recordedin a se-quence,sothatfeaturescanbetrackedfrom frameto frame(Broida& Chel-lappa,1991;Tomasi& Kanade,1992;Szeliski& Kang, 1993;Poelman&Kanade,1997). Several authorsconsideredthe specialcaseof correctbutincompletecorrespondence,by interpolatingoccludedfeatures(Tomasi&Kanade,1992;Jacobs,1997;Basrietal., 1998),or expandingaminimalcor-respondenceinto acompletecorrespondence(Seitz& Dyer, 1995).However,theseapproachesrequirethatanon-degeneratesetof correctcorrespondencesbeprovidedapriori. Finally, methodsbasedontherobustrecoveryof epipolargeometry, e.g.usingRANSAC (Beardsley et al., 1996;Torr et al., 1998)cancopewith largerinter-framedisplacementsandcanbevery effective in prac-tice.However, RANSACdependscruciallyontheability to identify areliablesetof initial correspondences,andthisbecomesmoreandmoredifficult withincreasinginter-framemotion.

In themostgeneralcase,however, imagesaretakenfrom widely separatedviewpoints. This problemhas largely beenignored in the SFM literature,due to the difficulty the dataassociationproblem,which hasbeenreferredto asthemostdifficult partof structurerecovery (Torr etal., 1998).Notethatthis is particularlychallengingin 3D: traditionalapproachesfor establishingcorrespondencebetweensetsof 2D features(Scott& Longuet-Higgins,1991;Shapiro& Brady, 1992;Gold et al., 1998)areof limited usein this domain,astheprojected3D structurecanlook very differentin eachimage.

Froma statisticalestimationpoint of view, theSFM problemcomeswithauniquesetof properties,whichmakestheapplicationof EM non-trivial:

1. Geometricconsistency. The laws of optical projectionconstrainthespaceof valid estimates(models,dataassociations)in anon-trivial way.

2. Mutual exclusiveness.Eachfeaturein the real world occursat mostoncein eachindividual cameraimage—thisis animportantassumptionthatseverelyconstrainsthedataassociation.

3. Large parameterspaces.The numberof featuresin computervisiondomainsis usuallylarge,giving raiseto a largenumberof localminima.

This paperdevelopsanalgorithmbasedon EM thataddressesthesechal-lenges.The correspondence(dataassociation)is encodedby an assignmentvectorthatassignsindividualmeasurementsto specificfeaturesin themodel.Thebasicstepsof EM aremodifiedto suit thespecificsof SFM:

The E-step calculatesa posteriorover the spaceof all possibleassign-ments.Unfortunately, theconstraintslistedabovemakeit impossibletocalcu-latetheposteriorin closedform. Thestandardapproachfor posteriorestima-tion in suchsituationsis Markov chainMonteCarlo(MCMC) (Doucetetal.,2001; Gilks et al., 1996; Neal, 1993). In particular, our approachusesthepopularMetropolis-Hastingsalgorithm(Hastings,1970;Smith & Gelfand,1992),for approximatingthedesiredposteriorsummaries.However, thede-

ml.tex; 19/01/2002; 13:46; p.3

Page 4: EM, MCMC, andChainFlipping for Structure from Motion with

4 Dellaert,Seitz,Thorpe& Thrun

signof efficientMetropolis-Hastingsalgorithmscanbeverydifficult in high-dimensionalspaces(Gilks et al., 1996). In this paper, we proposea novel,efficient proposalstrategy calledchain flipping, which canquickly jump be-tween globally different assignments.Experimentalresultsshow that thisapproachis much more efficient than approachesthat consideronly localchangesin theMCMC samplingprocess.

The M-step calculatesthelocationof thefeaturesin thescene,alongwiththecamerapositions.Aspointedout,theSFMliteraturehasdevelopedanum-berof excellentalgorithmsfor solvingthisproblemundertheassumptionthatthe dataassociationproblemis solved. However, the E-stepgeneratesonlyprobabilisticdataassociations.To bridge this gap,we introducethe notionof virtual measurements. Virtual measurementsaregeneratedin the E-step,andhave two pleasingproperties:first, they make it possibleto apply off-the-shelfSFM algorithmsfor learningthe modelandthe camerapositions,andsecond,they aresufficient statisticsof the posteriorwith respectto theproblemof learningthe model;hencethe M-step is mathematicallysound.Independentlyfrom us,theconceptof virtual measurementshadalreadybeenusedin thetrackingliterature(Avitzour, 1992;Streit& Luginbuhl, 1994).

From a machinelearningpoint of view, our approachextendsEM to animportantdomainwith asetof characteristicsfor whichwepreviously lackeda soundstatisticalestimator. From a SFM point of view, our approachaddsa methodfor dataassociationthat is statisticallysound.Our approachis or-thogonalto the vast majority of work on SFM in that it can be combinedwith virtually any algorithmthatassumesknown dataassociation.Thus,ourapproachaddsthebenefitof solvingthedataassociationproblemfor a largebodyof literaturethatpreviously operatedundermorenarrow assumptions.

2. EM for Structur e fr om Motion without Correspondence

Below we introducethe structurefrom motion problem and the assump-tions we make, and discussmethodsto find a maximum-likelihood modelfor knowncorrespondence.Wethenshow how theEM algorithmcanbeusedto learnthemodelparametersfor thecaseof unknowncorrespondence.

2.1. PROBLEM STATEMENT, NOTATION, AND ASSUMPTIONS

TheSFM problemis this: givena setof imagesof a scene,learna modelofthe 3D sceneandrecover the cameraposes.Several flavors of this problemexist, dependingon (a) whetherthe algorithmworks with raw pixel values,or whethera setof discretemeasurementsis first extracted,(b) whethertheimagesweretakenin a continuoussequenceor from arbitraryseparateloca-tions,or (c) whetherthecamera’s intrinsic parametersarevaryingor not. Inthispaperwe make thefollowing assumptions:

ml.tex; 19/01/2002; 13:46; p.4

Page 5: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 5

x1

x2

x3

z11

j11

=3

z12

j12

=1z13

j13

=2

z21

j21

=2

z22

j22

=3

z23

j23

=1

m1

m2

Figure1. An examplewith 3 featuresseenin 2 images.The6 measurements��� � areassignedto theindividual features�� by meansof theassignmentvariables�� � .1. We adopt a feature-basedapproach,i.e., we assumethat the input to

the algorithm is a set of discreteimage measurements�� ���������� �������� �! #" � ����� $ ��% , where � is the imageindex. It is assumedthat the����� correspondto theprojectionof a setof realworld, 3D features &' �)(#*�� +,� ����� - % , corruptedby additive noise.

2. It is not requiredthat the correspondencebetweenmeasurementsin thedifferentimagesis known. This is exactly thedataassociationproblem.To model the correspondencebetweenmeasurements����� and 3D fea-tures(#* weintroduceanassignmentvector. : for eachmeasurement�����the vector . containsan indicator variable +/��� , indicating that ����� is ameasurementassignedto the +/��� -th feature(#* � � . Notethatthisadditionaldatais unknown or hidden.

3. We allow imagesto betaken from a setof arbitrarycamera poses01 ��2 � � �3� ����� � % . This makes the dataassociationproblemharder:mostexisting approachesrely on the temporalcontinuity of an imagestreamto track featuresovertime(Deriche& Faugeras,1990;Tomasi& Kanade,1992;Zhang& Faugeras,1992;Cox, 1993),or otherwiseconstrainthedataassociationproblem(Beardsley etal., 1996).

4. In this paper, we adoptthecommonlyusedassumptionthatall features(#* areseenin all images(Tomasi& Kanade,1992;Hartley, 1994),i.e.thereareno spuriousmeasurementsandthereis no occlusion.This is astrongassumption:we discussat theendof thispaperhow to extendourmethodto more generalimagingsituations.Note that this implies thatthereareexactly

-measurementsin eachimage,i.e.

$ � - for all � .Thevariousvariablesintroducedabove areillustratedin Figure1.

ml.tex; 19/01/2002; 13:46; p.5

Page 6: EM, MCMC, andChainFlipping for Structure from Motion with

6 Dellaert,Seitz,Thorpe& Thrun

2.2. SFM WITH KNOWN CORRESPONDENCE

In the casethat the assignmentvector 4 is known, i.e., the dataassociationis known, mostexistingapproachesto SFM canbeviewedasmaximumlike-lihood (ML) methods.The model parameters5 consistof the 3D featurelocations6 andthe cameraposes7 , i.e., 598;: 6!<=7?> , the structure andthe motion. The dataconsistsof the 2D imagemeasurements@ , and theassignmentvector 4 thatassignsmeasurementsA�B�C to 3D featuresD#EGF H . Themaximumlikelihoodestimate5JI giventhedata @ and4 is thengivenby5 I 8 argmaxK L�MONQPR:S5UTV@W<X4Q> (1)

wherethe likelihood PR:S5UTV@W<X4Q> is proportionalto YZ:S@W<X4\[�5U> , the condi-tional densityof the datagiven the model.To evaluatethe likelihood, weassumethateachmeasurementA�B�C is generatedby applyingthemeasurementfunction ] to themodel,thencorruptedby additive noise ^ :A�B�C_8`]\:Sa B <XD#EGF H=>cbd^A measurementA�B�C dependsonly on the parametersa B for the imageinwhich it wasobserved,andon the3D featureD#EGF H to which it is assigned.

Without lossof generality, let usconsiderthecasein whichthefeaturesD#Eare3D pointsandthemeasurementsA�B�C arepointsin the2D image.In thiscase] canbewrittenasa3D rigid displacementfollowedby a projection:]\:Sa B <XD#Ee>f8?g B=h ijB : D#Elknm B >po (2)

where ijB and m B arethe rotationmatrix andtranslationof the q -th camera,respectively, and g BRrtsvu?wxszy is a projectionoperatorwhich projectsa3D point to the 2D imageplane.Variouscameramodelscanbe definedbyspecifyingthe actionof this projectionoperatoron a point D{8|: }~<��c<���>��(Morris et al., 1999).For example,theprojectionoperatorsfor orthographyandcalibratedperspective aredefinedas:g��B h D~o�8 � } ��� <�g��B h D~o�8 � }~����c���J�

Finally, we needto assumea distribution for thenoise ^ . In thecasethat^ is i.i.d. zero-meanGaussiannoisewith standarddeviation � , thenegativelog-likelihoodis simplyasumof squaredreprojectionerrors:

L�MON�PR:S5UTV@W<X4Q>\8�k��� � yd�� B �\����C/�\��� A�B�C_k�]\:Sa B <XD#EGF He> � y (3)

The morerealisticmodel for automaticfeaturedetectors,whereeachmea-surementcanhave its own individual covariancematrix i B�C , canbeaccom-modatedwith obviousmodifications.

ml.tex; 19/01/2002; 13:46; p.6

Page 7: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 7

2.3. EXISTING METHODS FOR STRUCTURE FROM MOTION

Thestructurefrom motionproblemhasbeenstudiedextensively in thecom-puter vision literatureover the pastthreedecades.A good survey of tech-niquescanbefoundin (Hartley & Zisserman,2000).

The earliestwork focusedon reconstructionfrom two imagesonly (Ull-man,1979;Longuet-Higgins,1981).Later new methodsweredevelopedtohandlemultiple images,andthey canall viewedasminimizing anobjectivefunctionsuchas(3), underavarietyof differentassumptions:

In the caseof orthographicprojection the maximum likelihood model�J�canbefoundefficiently usingusinga factorizationapproach(Tomasi&

Kanade,1992).Heresingularvaluedecomposition(SVD) is first appliedtothedata � in orderto obtainaffinestructure�W� andmotion ��� . Euclideanstructureandmotion is thenobtainedafter an additionalstepthat imposesmetric constraintson ��� . The factorizationmethodis fastanddoesnot re-quireagoodinitial estimateto converge.It hasbeenappliedto morecomplexcameramodels,i.e.,weak-andpara-perspective models(Poelman& Kanade,1997),andeven to fully perspective cameras(Triggs, 1996).The readerisreferredto (Tomasi& Kanade,1992;Poelman& Kanade,1997;Morris &Kanade,1998)for detailsandadditionalreferences.

In thecaseof full perspectivecamerasthemeasurementfunction  \¡S¢W£)¤X¥#¦e§is non-linear, and one needsto resort to non-linearoptimization to mini-mizethere-projectionerror(3). Thisprocedureis known in photogrammetryandcomputervision asbundleadjustment(Spetsakis& Aloimonos,1991;Szeliski& Kang, 1993;Hartley, 1994;Triggs et al., 1999).The advantagewith respectto factorizationis that it givestheexactML estimate,if it con-verges.However, it can easily get stuck in local minima, and thus a goodinitial estimateof thesolutionneedsto beavailable.To obtainthis, recursiveestimationtechniquescanbeusedtoprocesstheimagesasthey arrive(Broida& Chellappa,1991).

2.4. SFM WITHOUT CORRESPONDENCES

In the casethat the correspondencesareunknown we cannotdirectly applythemethodsdiscussedin Section2.3.Althoughwecanstill framethiscaseasaproblemof maximumlikelihoodestimation,solvingit directly is intractabledue to the combinatorialnatureof the dataassociationproblem.By totalprobability the maximumlikelihoodestimate

�J�n¨ ¡ � � ¤=� � § of structureandmotiongivenonly themeasurements� is givenby� � ¨

argmax© ª�«O¬Q­R¡ �U® �,§ ¨ argmax© ª�«O¬Q¯�°±­R¡ �U® �W¤X²Q§ (4)

a sumof likelihoodtermsof the form (1), with oneterm for every possibleassignmentvector² . Now, for any realisticnumberof features³ andnumber

ml.tex; 19/01/2002; 13:46; p.7

Page 8: EM, MCMC, andChainFlipping for Structure from Motion with

8 Dellaert,Seitz,Thorpe& Thrun

of images , the numberof assignmentsexplodescombinatorially. Thereare µQ¶ possibleassignmentvectors·c¸ in eachimage,yielding a total of µQ¶ ¹assignments.In summary, ºR»S¼U½V¾,¿ is in generalhardto obtainexplicitly, asit involvessummingoveracombinatorialnumberof possibleassignments.

2.5. THE EXPECTATION MAXIMIZATION ALGORITHM

A key insight is that we can use the well-known EM algorithm (Hartley,1958; Dempsteret al., 1977; McLachlan& Krishnan,1997) to attack thedataassociationproblemthatarisesin thecontext of structurefrom motion.While a directapproachto computingthetotal likelihood ºR»S¼U½V¾,¿ in (4) isgenerallyintractable,EM providesapracticalmethodfor finding its maxima.TheEM algorithmstartsfrom an initial guess¼RÀ for structureandmotion,andtheniteratesover thefollowing steps:

1. E-step: Calculatetheexpectedlog likelihoodfunction ÁÃÂÄ»S¼U¿ :Á  »S¼U¿ÆÅ�Ç�È�É Â » ·Q¿�Ê�ËOÌ~ºR»S¼U½V¾WÍX·Q¿ (5)

wherethe expectationis taken with respectto the posteriordistributionÉ�ÂÄ» ·Q¿ÃÎÅÐÏZ» ·\Ñ�¾WÍ=¼JÂG¿ over all possibleassignments· given thedata ¾andacurrentguess¼J for structureandmotion.

2. M-step: Find theML estimate¼JÂ�Ò\Ó for structureandmotion,by maxi-mizing ÁÃÂÄ»S¼U¿ : ¼ Â�Ò\Ó Å argmaxÔ Á  »S¼U¿

It is importantto notethat ÁÃÂe»S¼U¿ is calculatedin the E-stepby evaluatingÉ�ÂÄ» ·Q¿ usingthecurrentguess¼J for structureandmotion(hencethesuper-script Õ ), whereasin theM-stepwe areoptimizing ÁÃÂÄ»S¼U¿ with respectto thefreevariable ¼ to obtainthe new estimate¼JÂ�Ò\Ó . It canbe proven that theEM algorithmconvergesto a local maximumof ºR»S¼U½V¾,¿ (Dempsteret al.,1977;McLachlan& Krishnan,1997).

3. The M-step and Virtual Measurements

In this sectionwe show that the M-step for structurefrom motion can beimplementedin a simpleandintuitive way. We show that theexpectedlog-likelihoodcanberewritten such that theM-stepamountsto solvinga struc-ture from motion problemof the samesizeas before, but using as input anewly synthesizedsetof “virtual measurements”, createdin theE-step.Theconceptof using syntheticmeasurementsis not new. It is also usedin the

ml.tex; 19/01/2002; 13:46; p.8

Page 9: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 9

trackingliterature,whereEM is usedto performtracksmoothing(Avitzour,1992;Streit& Luginbuhl, 1994).

We first rewrite the expectedlog-likelihood ÖÃ×ÄØSÙUÚ in termsof sum ofsquarederrors,which we can do under the assumptionof i.i.d. Gaussiannoise.By substitutingthe expressionfor the log likelihood Û�ÜOÝQÞRØSÙUßVàWáXâQÚfrom (3) in equation(5), we obtain:

Ö × ØSÙUÚÆã�ä�åæÄçvèêé�ë�ì × Ø âQÚ,íé î ï\ðòñéó ï\ð�ô�õ î ó ä�ö\ØS÷ î áXø#ùGú ûeÚ ô è (6)

The key to the efficiency of EM lies in the fact that the expressionabovecontainsmany repeatedterms.,andcanberewrittenas

Ö × ØSÙUÚfã�ä åæÄçvè íé î ï\ðüñéù ï\ðüñéó ï\ð ì ×î ù ó ô�õ î ó ä�ö\ØS÷ î áXø#ùeÚ ô è (7)

whereì ×î ù ó is themarginal posteriorprobability ýZØ þ î ó ãÿþ���àWá=ÙJ×pÚ . Notethatthis doesnot dependon theassumptionof Gaussiannoise,but ratheron theconditionalindependenceof imagemeasurements.The marginal probabili-ties ì ×î ù ó canbecalculatedby summingì ×ÄØ âQÚ over all possibleassignmentsâwhereþ î ó ãÿþ (with �vØ���á�� Ú theKronecker deltafunction):ì ×î ù ó��ãÿýZØ þ î ó ãÿþ���àWá=Ù × Úfã é�ë �vØ þ î ó á�þ Ú ì × Ø âQÚ (8)

The main point to be madein this sectionis this: it can be shown bysimplealgebraicmanipulationthat(7) canbewrittenasthesumof aconstantthatdoesnot dependon Ù , anda new re-projectionerrorof � featuresin images Ö × ØSÙUÚfã�ÿä åæÄçvè íé î ï\ðüñéù ï\ð ô � ×î ù ä�ö\ØS÷ î áXø#ùeÚ ô è (9)

wherethevirtual measurements� ×î ù aredefinedsimply asweightedaveragesof theoriginalmeasurementsõ î ó :

� ×î ù �ã ñéó ï\ð ì ×î ù ó õ î ó (10)

The importantpoint is that the M-stepobjective function (9) above, arrivedat by assumingunknowncorrespondence,is of exactly thesameform astheobjective function(3) for theSFM problemwith knowncorrespondence.Asa consequence,any of the existing SFM methodscan be usedto implementtheM-step.Thisprovidesanintuitive interpretationfor theoverallalgorithm:

1. E-step:Calculatetheweightsì ×î ù ó from thedistributionoverassignments.Then,in eachof the imagescalculate� virtual measurements� ×î ù .

ml.tex; 19/01/2002; 13:46; p.9

Page 10: EM, MCMC, andChainFlipping for Structure from Motion with

10 Dellaert,Seitz,Thorpe& Thrun

2. M-step: Solve a conventionalSFM problemusingthe virtual measure-mentsasinput.

In otherwords,theE-stepsynthesizesnew measurementdata,andtheM-stepis implementedusingconventionalSFM methods.

4. Mark ov Chain Monte Carlo and the E-step

Theprevioussectionshowedthat,whengiventhevirtual measurements,theM-step can be implementedusing any known SFM approach.As a con-sequence,we needonly concernourselves with the implementationof theE-step.In particular, we needto calculatethe marginal probabilities������������� � ��� � ������� � �"! neededto calculatethevirtual measurements#$���� .

Unfortunately, dueto themutualexclusionconstraintananalyticexpres-sion for the sufficient statistics�%������ is hardto obtain.Assumingconditionalindependenceof theassignments& � in eachimage,wecanfactor��� � &'! as:

� � � &'!)(� ��� & ����� � � ! � *+� ,)- ��� & � ��� � � � � !where

� � arethemeasurementsin image. . Applying Bayeslaw, we have��� & � ��� � � � � !0/ ��� & � ��� � !�1�243 5�6879;:=<?>@�A,)-�BDC ��� 6FE �HG �� ��I �� ! B <;J (11)

Thesecondpartof thisexpressionis simpleenough.However, theprior prob-ability

��� & � ��� �K! of anassignment& � encodestheknowledgewe have aboutthe structurefrom motion domain:if a measurementC ��� hasbeenassigned� ��� � �

, thennoothermeasurementin thesameimageshouldbeassignedthesamefeaturepoint

I � . While it is easyto evaluatethe posteriorprobability���� � & � ! for any givenassignment& � through(11), a closedform expressionfor �������� thatincorporatesthismutualexclusionconstraintis notavailable.

4.1. SAMPLING THE DISTRIBUTION OVER ASSIGNMENTS & �The solutionwe employ is to insteadsamplefrom the posteriorprobabilitydistribution �%�� � & � ! over valid assignmentsvectors& � . Formally this canbejustified in the context of a MonteCarlo EM or MCEM, a versionof EMwheretheE-stepis executedby aMonte-Carloprocess(Tanner, 1996).

To samplefrom ���� � & � ! we usea Markov chain Monte Carlo (MCMC)samplingmethod(Neal,1993;Gilks etal.,1996;Doucetetal.,2001).MCMCmethodscanbeusedto obtainapproximatevaluesfor expectationsover dis-tributionsthatdefy easyanalyticalsolutions.All MCMC methodswork the

ml.tex; 19/01/2002; 13:46; p.10

Page 11: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 11

sameway: they generatea sequenceof states, in our casethe assignmentsvectorsLNM in image O , with the propertythat the collectionof generatedas-signmentsLNPM approximatesa samplefrom a target distribution, in our casethe posteriordistribution Q�RMTS LNMVU . To accomplishthis, a Markov chain is de-finedover thespaceof assignmentsLNM , i.e. a transitionprobabilitymatrix isspecifiedthatgivestheprobabilityof transitioningfrom any givenassignmentLNM to any other. Thetransitionprobabilitiesaresetup in a very specificway,however, suchthat thestationarydistribution of theMarkov chainis exactlythe target distribution Q%RM S LNMVU . This guaranteesthat, if we run thechainfor asufficiently long time andthenstartrecordingstates,thesestatesconstitutea(correlated)samplefrom thetargetdistribution.

The Metropolis-Hastings(MH) algorithm (Hastings,1970) is one wayto simulatea Markov chainwith the correctstationarydistribution, withoutexplicitly building the full transitionprobabilitymatrix (which would be in-tractable).In ourcase,weuseit to generateasequenceof W samplesLNPM fromthe posteriorQ�RM S LNMVU . The pseudo-codefor the MH algorithm is as follows(adaptedfrom (Gilks et al., 1996)):

1. Startwith avalid initial assignmentL=XM .

2. Proposeanew assignmentLNYM usingtheproposaldensityZ S LNYMV[ LNPM U3. Calculatetheacceptanceratio\^] Q�RM S LNYM UQ RM S L PM U Z S LNPM;[ LNYM UZ S L YM [ L PM U (12)

4. If \�_`]ba thenacceptLNYM , i.e.we setL P�c)dM e LNYM .Otherwise, acceptLNYM with probability \ . If theproposalis rejected,thenwe keeptheprevioussample,i.e. wesetL P�c)dM ] LNPM .

Intuitively, step2 proposes“moves” in statespace,generatedaccordingto aprobability distribution Z S LNYM�[ LNPM U which is fixed in time but candependonthe currentstateLNPM . The calculationof \ andthe acceptancemechanisminstepsf and g have theeffect of modifying the transitionprobabilitiesof thechainsuchthatits stationarydistribution is exactly Q�RM .

TheMH algorithmeasilyallows incorporatingthemutualexclusioncon-straint: if an an assignmentLNYM is proposedthat violatesthe constraint,theacceptanceratio is simply h , andthemove is notaccepted.Alternatively, andthis is moreefficient,onecouldtake carenever to proposesuchamove.

To computethe virtual measurementsin (10), we needto computethemarginal probabilitiesQ�RM�i�j from the sample k LNPMml . Fortunately, this can bedonewithout explicitly storingthesamples,by keepingrunningcounts n M�i�j

ml.tex; 19/01/2002; 13:46; p.11

Page 12: EM, MCMC, andChainFlipping for Structure from Motion with

12 Dellaert,Seitz,Thorpe& Thrun

of how many timeseachmeasuremento�p�q is assignedto featurer , ass�tp�u�qwvyxz|{$p�u�q~}� xz������)���=� r �p�q�� rm� (13)

is easilyseento betheMonteCarloapproximationto (8).Finally, in orderto implementthesampler, we needto know how to pro-

posenew assignments�N�p , i.e. the proposaldensity � � � �p;� �N�p � , and how tocomputetheratio � . Bothelementsarediscussedin detailin Section7.

5. The Algorithm in Practice

Thepseudo-codefor thefinal algorithmis asfollows:

1. Generateaninitial structureandmotionestimate�w� .

2. Given � tandthedata � , run theMetropolis-Hastingssamplerin each

imageto obtainapproximatevaluesfor theweightss tp�u�q (equation13).

3. Calculatethevirtual measurements� tp�u usingequation(10).

4. Find the new estimate� t�� �for structureandmotion using the virtual

measurements� tp�u as data.This can be done using any SFM methoddiscussedin Section2.3.

5. If notconverged,returnto step2.

Onesignificantdisadvantageof EM is thatis only guaranteedto converge toa local maximumof thelikelihoodfunction,not to aglobalmaximum.This isespeciallyproblematicin thecurrentapplication,wherebadinitial estimatesfor structureandmotioncanbelocked in by incorrectcorrespondences,andviceversa.In orderto avoid this,weemploy threedifferentstrategies:

1. Grossto fine structurevia annealing. A well known techniqueto avoidlocal mimima is annealing:herewe increasethe noiseparameter� inearly iterations,graduallydecreasingit to its correctvalue.This hastwobeneficialconsequences.First, the posteriordistribution

s tp � � p � is lesspeaked when � is high, allowing the MCMC samplerto explore thespaceof assignments� p moreeasily. Second,theexpectedlog-likelihood� t � ��� is smootherandhasfewer local maximafor highervaluesof � .

2. Minimizing the influenceof local mismatchesvia robust optimization.A typical failuremodeof thealgorithmin thefinal stagesis dueto localmismatches,wheremeasurementsgeneratedby two featuresarecorrectly

ml.tex; 19/01/2002; 13:46; p.12

Page 13: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 13

Figure 2. Threeout of 11 cube images.Although the imageswere originally taken as asequencein time, theorderingof theimagesis irrelevantto our method.

assignedin most of the images,but switchedin some.Becauseof thequadraticerrorfunctionthis canseverly biasthemotionrecovery, whichin turn locks the incorrectcorrespondenceinto place.Fortunately, wefound that this can be overcomeby employing a robust optimizationalgorithmin the M-step,e.g. the robust factorizationmethoddecsribedin (Kurataet al., 1999).Note that the EM mechanismis still crucial torecovering thegrossstructureandrefining thesolution:the robust opti-mizationonly helpsin discardinglocalmismatchesin thefinal stages.Atthatpoint, thedistributionscomputedin theE-stepbecomereally sharp,andcangetlockedinto local minimamoreeasily.

3. Randomrestarts.It is easyto detectwhena local minimum is reachedbasedon theexpectedvalueof theresidual.If this occurs,thealgorithmis restartedwith differentinitial conditions,until eventuallysuccesful.

The combinationof thesestrategies leadsto goodresultsin many cases.Amoredetailedanalysisis presentedin thesectionbelow.

6. Resultsfor SFM without Correspondence

Below we show resultson two setsof imagesfor which the SFM problemis non-trivial in the absenceof correspondence.Many more examplescanbe found in (Dellaert,2001).The input to the algorithmis alwaysa setofmanuallyobtainedimagemeasurements.To initialize, the3D points��� weregeneratedrandomly in a normally distributed cloud arounda depth of 1,whereasthe cameras��� wereall initialized at the origin. In eachcase,weran the EM algorithm for 100 iterations,with the annealingparameter�decreasinglinearly from 40 pixels to 1 pixel. For eachEM iteration,we ranthesamplerin eachimagefor 1000stepsperpoint.An entirerun (of 100EMiterations)takeson theorderof aminuteof CPUtimeon astandardPC.

In practice,the algorithmconvergesconsistentlyandquickly to an esti-matefor thestructureandmotion wherethe correctassignmentis the mostprobableone,andwhereall assignmentsin the different imagesagreewitheachother. Weillustratethisusingtheimagesetshown in Figure2,whichwas

ml.tex; 19/01/2002; 13:46; p.13

Page 14: EM, MCMC, andChainFlipping for Structure from Motion with

14 Dellaert,Seitz,Thorpe& Thrun

t=0 σ=0.0 t=1 σ=25.1 t=3 σ=23.5 t=10 σ=18.7

t=20 σ=13.5 t=100 σ=1.0

Figure 3. Thestructureestimateasinitialized andat successive iterations� of thealgorithm.

Figure 4. 4 outof 5 perspective imagesof a house.

taken underorthographicprojection.The typical evolution of the algorithmis illustratedin Figure3, wherewe have shown a wire-framemodelof therecoveredstructureat successive instantsof time. Thereare two importantpointsto note:(a) thegrossstructure is recovered in theveryfirst iteration,startingfromrandominitial structure, and(b) finerdetailsof thestructurearegraduallyresolvedastheparameter� is decreased.Theestimatefor thestruc-ture after convergenceis almostidentical to the onefound by factorizationwhengiventhecorrectassignment.

To illustratetheEM iterations,considerthesetof imagesin Figure4 takenunderperspective projection.In theperspective case,we implementthe M-stepaspara-perspective factorizationfollowedby bundleadjustment.In thisexamplewe do not show the recoveredstructure(which is good),but show

ml.tex; 19/01/2002; 13:46; p.14

Page 15: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 15

(a) it 1 (b) it 10 (c) it 100

Figure 5. Themarginal probabilities ¢¡£ ¤�¥ at threedifferentiterations,respectively 1, 10, and100.Eachrow correspondsto ameasurement¦ £ ¥ , groupedaccordingto imageindex, whereasthe columnsrepresentthe § features ¤ . In this example §^©FªA« and ¬­©Fª . Black corre-spondsto a marginal probability of 1. Note that in the final iteration,the correspondenceisnear-perfect.

the marginal probabilities®�¯°�±�² at threedifferent timesduring the courseofthealgorithm,in Figure5. In early iterations,³ is high andthereis still a lotof ambiguity. Towardstheend,thedistribution focusesin on oneconsistentassignment.In the last iterationthe marginal probabilitiesareall consistentwith the ground-truthassignment,andthe featuresin thefigure areorderedsuchthat this correspondsto a set of 5 stacked identity matrices.The off-diagonalmarginalsarenotexactly zero,simplyvery closeto zero.

Finally, in order to investigatethe behavior of the algorithmin termsoflocalmaxima,wehaveconductedaseriesof experimentswith syntheticdata.For differentsettingsof ´ and µ , 5 scenesweregeneratedrandomly. The µpointswererandomlygeneratedon a 2D squarewith side ¶¸·º¹ , thenweredisplacedfrom the planeaccordingto a normal distribution with standarddeviation of ³%»b·8¼¾½ ¿ , yielding a random’plane plus parallax scene.cameraswherethenplacedrandomlyon a spheresegmentspanningan arcÀ ·ÂÁ)Ã;Ä , atadistanceof ÅÆ·bÇ . Themeasurementmodelwasorthographic,with measurementnoisedrawn form a Gaussianwith ³b·È¼¾½ ¼¢¼¢Ç . For each

ml.tex; 19/01/2002; 13:46; p.15

Page 16: EM, MCMC, andChainFlipping for Structure from Motion with

16 Dellaert,Seitz,Thorpe& Thrun

scene: A B C D E

m=10n=20 4 5 5 4 5

m=15n=20 4 5 5 5 5

m=20n=20 4 5 4 5 3

m=25n=20 5 4 4 5 5

m=30n=20 5 4 5 5 5

scene: A B C D E

m=10n=20 4 5 5 4 5

m=10n=40 3 5 4 4 5

m=10n=60 4 4 (0.7) 4 3

m=10n=80 (0.3) 4 3 (1.3) 2

Figure 6. EM was run 5 times for eachof 5 randomlygeneratedexamplesA-E for eachdifferentsettingof É andÊ . Thetablesshow thenumberof timesEM convergedto theglobalmaximum,outof 5 trials.If noneof thetrialsconverged,thepercentageof incorrectlyassignedmeasurementsis shown in brackets.

settingof Ë and Ì , 5 sceneswere generatedin this manner, and the EMalgorithmwasrun5 timesfor eachscene.thenumberof timesEM convergedis summarizedin Figure6. For ÌÂÍÏ΢Рthemajority of trials converged,forall valuesof Ë . For morepointsthepossibility of confusingmeasurementsgrows, and in threecases(out of 20) noneof the 5 trials converged to theglobalmaximum.However, evenin thesecasesthepercentageof incorrectlyassignedmeasurementswas small, indicating that somelocal mismatchesremainedthat were unresolved by the robust factorizationscheme.This isto beexpectedasmoreandmorepointsareintroducedin thescene.

7. An Efficient Sampler

TheEM approachfor structureandmotionwithout correspondenceoutlinedin the previous sectionsis a statisticallysoundway to dealwith a difficultdataassociationproblem.However, in orderfor it to scaleup to largerprob-lems,it is imperative that it is alsoefficient. In this sectionwe show that theMetropolis-Hastingsmethodcan be madeto very effectively samplefromweightedassignments,yielding anefficient E-stepimplementation.

ml.tex; 19/01/2002; 13:46; p.16

Page 17: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 17

Theconvergenceof theMetropolis-Hastingsalgorithmdependscruciallyon theproposaldensityÑ . Weneedaproposalstrategy thatleadsto arapidlymixing Markov chain,i.e. onethatconvergesquickly to thestationarydistri-bution. Below we discussthreedifferentproposalstrategies,eachof whichinducesaMarkov chainwith increasinglybetterconvergenceproperties.

7.1. PRELIMINARIES

It is convenientat this timeto look at thesamplingin eachimagein isolation,andthink of it in termsof weightedbipartite graph matching. Considerthebipartitegraph ÒÔÓÖÕ�×ÙØ�Ú$ØVÛÝÜ in imageÞ wherethevertices× correspondto

theimagemeasurements,i.e. ß%à�áÓ­â�ã�à , andtheverticesÚ areidentifiedwiththe projectedfeatures,given thecurrentguessäæå for structureandmotion,

i.e. ç¢èéáÓëê)ÕHì�åã Ø�í�åè Ü . Both î and ï rangefrom ð to ñ , i.e. òó×�òôÓõòóÚ~òöÓñ . Finally, the graphis fully connectedÛ÷Óy×ÖøéÚ , andwe associatethefollowing edge weightwith eachedgeù�ÓÏÕ ß%à�ØVç¢è�Ü :ú Õ ß%à�ØVç¢è�Ü)áÓ ðû;ü=ý$þ ß%àôÿ�ç¢è þ ý Ó ðû;ü=ý$þ â�ã�àôÿFê)ÕHì åã Ø�í åè Ü þ ýA matching is definedasasubset� of theedgesÛ , suchthateachvertex isincidentto atmostoneedge.An assignmentis definedasaperfectmatching:asetof ñ edgessuchthatevery vertex is incidentto exactlyoneedge.

Given thesedefinitions,it is easilyseenthat every assignmentvector � ãcorrespondsto an assignmentin the bipartitegraph Ò , so we usethe samesymbol to denoteboth entities.Furthermore,we usethe notation � ã Õ ßNÜ todenotethe matchof a vertex ß , i.e. � ã Õ ß%àmÜ|Ó ç¢è if f ïAã�à­Ó ï . Recallingequation(11), it is easily seenthat for valid assignments� ã , the posteriorprobability ��åã Õ � ã Ü canbeexpressedin termsof theedgeweightsasfollows:� åã Õ � ã Ü������ � ÿ ðû;ü ý ��à���� þ â�ã�à ÿFê)ÕHì åã Ø�í åè Ü þ ý�� �¸ù������ ��� � (14)

wheretheweight ú Õ � ã Ü of anassignmentis definedasú Õ � ã ÜTÓ���à���� ú Õ ß%à�Ø � ã Õ ß%à Ü"ÜExpression(14) hastheform of a Gibbsdistribution, whereú Õ � ã Ü playstherole of an energy term: assignmentswith higher weight (energy) are lesslikely, assignmentswith lower weight(energy) aremorelikely.

Thus,theproblemof samplingfromtheassignmentvectors � ã in thestruc-tureandmotionproblemisequivalentto samplingfromweightedassignmentsin thebipartite graph Ò , where the target distribution is givenby theGibbs

ml.tex; 19/01/2002; 13:46; p.17

Page 18: EM, MCMC, andChainFlipping for Structure from Motion with

18 Dellaert,Seitz,Thorpe& Thrun

(a) (b) (c)

(d) (e) (f)

Figure 7. An ambiguousassignmentproblemwith !#"%$ . The regular arrangementof theverticesyields two optimalassignments,(a) and(f), whereas(b-e)aremuchlesslikely. Thefigure illustratesa majorproblemwith “flip proposals”:thereis no way to move from (a) to(e) via flip proposalswithoutpassingthroughoneof theunlikely states(b-e).

distribution (14). Below wedroptheimageindex & , andthink solelyin termsof theweightedassignmentproblem.

7.2. FLIP PROPOSALS

Thesimplestway to proposeanew assignment')( from acurrentassignment' is simply to swaptheassignmentof two randomlychosenvertices* :

1. Pick two matchededges+ *�,�-/.),10 and + *)2-/.3230 at random.

2. Swaptheirassignments,i.e. set ')(4+ *�,1065 .32 and ')(7+ *)28065 .),To calculatethe ratio 9 , notethat the proposalratio :�;=<3> <�? @:�;=< ? > <1@BA C . Thus,theacceptanceratio 9 is equalto theprobabilityratio,givenby9 AED + ')(F0D + 'G0 AIH�JKML N + *�,�-/.),10GO N + *)2-/.3230�P N + *�,�-/.3230�P N + *)2-/.),107Q

Even thoughthis “flip proposal”strategy is attractive from a computa-tionalpointof view, it hastheseveredisadvantageof leadingto slowly mixingchainsin many instances.To seethis, considerthearrangementwith R ATSin Figure8. Thereis no way to move from themostlikely configurations(a)to (f) via flip proposalswithout passingthroughoneof theunlikely states(b-e). An MCMC samplerthatproposesonly suchmovescanstaystuckin themodes(a) or (f) for a long time.

ml.tex; 19/01/2002; 13:46; p.18

Page 19: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 19

(a) (b) (c)

Figure 8. Augmentingpaths.(a) Original, partial matching.(b) An augmentingpath,alter-nating betweenfree and matchededges.(c) The resulting matchingafter augmentingthematchingin (a)with thepathin (b) .

7.3. AUGMENTING PATHS AND ALTERNATING CYCLES

In orderto improve theconvergencepropertiesof thechain,we usetheideaof randomlygeneratingan augmentingpath, a constructthat plays a cen-tral role in deterministicalgorithmsto find theoptimalweightedassignment(Bertsekas,1991;Cook et al., 1998;Papadimitriou& Steiglitz, 1982).Theintuition behindanaugmentingpathis simple:it is away to resolve conflictswhenproposinganew assignmentfor somerandomvertex in U . Whensam-pling,anideafor aproposaldensityis to randomlypick avertex V andchangeits assignment,but asthis canleadto a conflict, we proposeto usea similarmechanismto resolve theconflict recursively.

We now explain augmentingpathsfollowing (Kozen,1991).Assumewehave a partial matchingW . An exampleis given in Figure9 (a). Now pickanunmatchedvertex V , andproposeto matchit up with X . We indicatethisby traversingthefreeedge Y V�Z/X�[ . If X is free,wecansimply addthisedgetothematchingW . However, if X is not freewe cancelits currentassignmentby traversingthematchededgeY XGZ/V]\=[ . Wethenrecurse,until a freevertex in^

is reached,tracingout the augmentingpath _ . Onesucha pathis shownin Figure9 (b). Now thematchingcanbeaugmentedto W`\ by swappingthematchedandthe free edgesin _ . This augmentationoperationis written asW`\)a�Wcbd_ , where b is thesymmetricdifferenceoperatoronsetse bdfEagY eih fB[kjlY eim fB[nagY e jdfB[ h Y fEj e [For theexample,theresultingmatchingis shown in Figure9 (c). Algorithmsto find optimalmatchingsstartwith anemptymatching,andthenperformaseriesof augmentationsuntil amaximalmatchingis obtained.

For samplingpurposesalternatingcyclesareof interest,becausethey im-plementk-swaps.An exampleis shown for opaIq in Figure10.In contrasttotheoptimalalgorithms,whensamplingwe startout with a perfectmatching(an assignment),and want to proposea move to a different -also perfect-matching.We cando this by proposingthe matching r)\satrubwv , where

ml.tex; 19/01/2002; 13:46; p.19

Page 20: EM, MCMC, andChainFlipping for Structure from Motion with

20 Dellaert,Seitz,Thorpe& Thrun

(a) (b) (c)

Figure 9. (a)Original assignment.(b) An alternatingcycle implementinga k-swap,with k=3in this example.(c) Newly obtainedassignment.x

is an alternatingcycle, which hasthe effect of permutinga subsetof theassignments.Suchpermutationsthat leave no elementuntouchedare alsocalledderangements.

7.4. PROPOSING MOVES BY “ CHAIN FLIPPING”

Recall that the goal is to samplefrom assignmentsy usingthe Metropolis-Hastingsalgorithm.We now advancea new strategy to generateproposedmoves, throughan algorithmthat we call “chain flipping” (CF). The algo-rithm is basedon randomlygeneratinganalternatingcycle accordingto thefollowing algorithm:

1. Pick a randomvertex z in {2. Chooseamatch| in } by traversingtheedge~��g� z��/|�� accordingto the

transitionprobabilities� � z��/|����� ���� � ����� z��/|��7���� ���� � ����� z��/|��7� (15)

whichaccordshigherprobabilityto edges~��g� z��/|�� with lowerweight.

3. Traversethematchededge � |G�/z]�=� to undotheformermatch.

4. Continuewith 2 until acycle is formed.

5. Erasethetransientpartto getanalternatingcyclex

.

This algorithmsimulatesa Markov chain � xdefinedon thebipartitegraph�

andterminatesthesimulationwhenacycle is detected.Theresultingalter-natingcycle

xis usedto proposeanew assignment�)�)����� x

, i.e.we“flip”theassignmentson thealternatingcycleor “chain” of alternatingedges.

Wealsoneedto calculatetheacceptanceratio � . As it happens,we have���G���E� � �)�F�� � �G��� � �G�3�)�F�� � � � �3�G� �I� (16)

ml.tex; 19/01/2002; 13:46; p.20

Page 21: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 21

To prove this,notethatby (14) and(15) theprobabilityratio is givenby�d� �) F¡�d� �G¡£¢E¤�¥�¦�§=¨�© ª¤ ¥�¦�§=¨1ª ¢¬«­�®°¯�± � ²�³��) 7� ²´¡7¡± � ²�³���� ²´¡7¡ (17)

Theproposaldensityµ � �) F¶3�G¡is equalto theprobabilityof proposingacycle·

thatyields�) 

from�

, which is givenby:µ � �   ¶3�G¡ ¢t¸¹º«§ ­�» ¼ ª ®°½ ± � ²�³��   � ²´¡7¡/¾¿ÁÀ7Â`�kà ¯ � ÄÅ¡(18)

wherethe sum is over all transientpathsÄ

that end on the cycle·

, and�kà ¯ � ÄÅ¡is the probability of onesuchtransient.The probability µ � �G¶3�) F¡

of proposing�

startingfrom�) 

is similarly obtained,andsubstitutingbothtogetherwith (17) into (16)yieldsthesurprisingresultÆÇ¢IÈ .

A distinctadvantageof theCF algorithmis that,aswith theGibbssam-pler (Gilks et al., 1996),every proposedmove is alwaysaccepted.The ÉGÊtransitionprobabilities± � ²�³/Ë�¡

arealsofixedandcanbeeasilypre-computed.A major disadvantage,however, is that many of the generatedpathsdo notactuallychangethecurrentassignment,makingthechainslower thanit couldbe.This is becausein step Ì thereis nothingthatpreventsus from choosinga matchededge,leadingto a trivial cycle,andin steadystatematchededgesareexactly thosewith high transitionprobabilities.

7.5. ” SMART CHAIN FLIPPING”

An obvious modification to the CF algorithm, and one that leadsto veryeffective sampling,is to make it impossibleto traversethrougha matchededgewhengeneratingthe proposalpaths.This ensuresthat every proposedmovedoesindeedchangetheassignment,if it is accepted.However, now theratio Æ canbelessthan È , causingsomemovesto berejected.

Forcing the chosenedgesto be free canbe accomplishedby modifyingthetransitionprobabilities± � ²�³/Ë�¡

. Wedenotethenew transitionprobabilitiesas ±8Í � ²�³/Ë�¡

, asthey dependon thecurrentassignment�

, anddefinethemasfollows:

± Í � ²�³/Ë�¡�΢ÐÏÑ Ò Ó ÔÖÕ §=¥�¦�§ ­�» ¼ ªFª×%ØÖÙÚ�Û°Ü Ý�Þ Ó ÔÖÕ §=¥�¦�§ ­�» ¼ ªFª ifËàߢ ��� ²´¡á

ifË ¢ ��� ²´¡

i.e.wedisallow thetransitionthroughamatchededge.Wecanrewrite this intermsof thetransitionprobabilities± � ²�³/Ë�¡

definedearlierin (15),asfollows

± Í � ²�³/Ë�¡ ¢ãâ ä § ­�» ¼ ªå ¥ ä § ­�» ¨3§ ­ ªFª ifËÁߢ ��� ²´¡á

ifË ¢ ��� ²´¡

ml.tex; 19/01/2002; 13:46; p.21

Page 22: EM, MCMC, andChainFlipping for Structure from Motion with

22 Dellaert,Seitz,Thorpe& Thrun

1: a= 0.58 (A) 2: a= 0.58 (A) 3: a= 2.99 (A) 4: a= 0.58 (R) 5: a= 1.00 (A)

6: a= 1.00 (A) 7: a= 0.58 (R) 8: a= 1.00 (A) 9: a= 1.00 (A) 10: a= 1.00 (A)

11: a= 0.58 (R) 12: a= 1.00 (A) 13: a= 1.00 (A) 14: a= 1.00 (A) 15: a= 0.58 (A)

16: a= 1.73 (A) 17: a= 0.58 (A) 18: a= 1.00 (A) 19: a= 1.73 (A) 20: a= 1.00 (A)

Figure 10. 20 iterationsof anMCMC samplerwith the“smartchainflipping” proposals.Foreachiterationweshow æ andwhetherthemove wasaccepted(A) or rejected(R).

Note that thesedependon thecurrent assignmentç , but in an implementa-tion their explicit calculationcanbeavoidedby appropriatelymodifying thecumulative distribution functionof è at run-time.

This proposalstrategy, which we call “smart chainflipping” (SMART),generatesmoreexploratorymovesthantheCF algorithm,but at theexpenseof rejectingsomeof themoves.It canbeeasilyverifiedthatwe now haveé�ê�ëíì]î]ïdð¬ñò�ó°ôöõÅ÷ è´ø ù�ú�ç�ø ù´û7ûõs÷ è´ø ù�ú�ç)ü7ø ù´û7ûIn Figure11 we have shown 20 iterationsof a Metropolis-HastingssamplerusingtheSMART proposals,andalsoshow the valueof é andwhetherthemovewasaccepted(A) or rejected(R).

8. Resultsfor Efficient Sampling

Experimentalresultssupporttheintuition that“smartchainflipping” leadstomorerapidlymixing chains.In orderto assesstherelativeperformanceof the

ml.tex; 19/01/2002; 13:46; p.22

Page 23: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 23

100

101

102

103

104

10−4

10−3

10−2

10−1

100

random flipschain flippingsmart chain flipping

(a)Low Temperature

100

101

102

103

104

10−4

10−3

10−2

10−1

100

random flipschain flippingsmart chain flipping

(b) High Temperature

Figure11. Log-logplot comparingthemeanabsoluteerror(y-axis)versusnumberof samples(x-axis)for the3 differentproposaldistributions:randomflips, chainflipping, andsmartchainflipping. (a) For a ’sharp’distribution with low annealingparameterýBþ ÿ � � , and(b) for ahigh valueof ý þ#ÿ � � .threedifferentsamplerswe have discussedabove, we have generated1000syntheticexampleswith ����� , andran eachsamplerfor 10000iterationson eachexample.Therewasno needto wait until thestationarydistributionwasreached,astheinitial assignmentwasdrawn from theexactdistributionto startwith, which is possiblefor exampleswith � low. Wethengeneratedalog-logplot of theaverageabsoluteerror(averagedoverall examples)for oneof themarginalstatistics(13)ascomparedto thetruevalue(8).Thiswasdonefor two differentvaluesof theannealingparameters , whichdeterminesthesmoothnessof thedistribution.

As can be seenin Figure 8, the “smart chain flipping” proposalis anorderof magnitudebetterthanthetwo othersamplers,i.e. it reachesthesamelevel of accuracy in far fewer iterations.For lower temperatures,i.e. sharperdistributions, the differenceis more pronounced.For higher temperatures,the errorsare larger on average(as the samplerneedsto explore a largertypical set),andthedifferenceis lesspronounced.It canalsobeseenthatthedifferencebetweentherandomflip and(non-smart)chainflipping proposalsis negligible.

9. RelatedWork

In recentyears,EM hasbecomea popularalgorithmfor estimatingmodelsfrom incompletedata.As outlinedin theintroduction,theissueof incompletedataandthedataassociationproblemarecloselyrelated,thoughnotidentical.

ml.tex; 19/01/2002; 13:46; p.23

Page 24: EM, MCMC, andChainFlipping for Structure from Motion with

24 Dellaert,Seitz,Thorpe& Thrun

The structurefrom motion problemhasbeenstudiedextensively in thecomputervision literatureover the pastdecades,as we have discussedindetail in Section2.3. In the introductionwe discussedthe shortcomingsoftheexistingmethodsfor data-associationin theSFMliterature.

Also in computervision, EM hasbeenusedto determinemembershipof pixels to discreteimagelayers,to computeso-called“layered represen-tations” of the scene(see(Torr et al., 1999)andthe referencestherein).Inthe latterpaper, integratingout thedisparityof thepixels in any given layerfinessesthecorrespondenceproblemin aninterestingway. Thus,it couldbeviewedasanimage-basedversionof what is attemptedin this paper. On theotherhand,themotionparametersareassumedtobeknown by othermeansintheir approach,whereasmotionestimationis anintegral partof ourmethod.

The SFM problemis similar, and in somecasesequivalent, to the maplearningproblemin robotics.Here a mobile robot is given a sequenceofsensormeasurements(e.g.,rangemeasurements)alongwith odometryread-ings,andseeksto constructa mapof its environment.In thecaseof bearingmeasurementsondiscretefeatures,thisconcurrentmappingandlocalization(CML) (Leonard& Durrant-Whyte,1992), is mathematicallyidentical to aSFM problem.Oneof the dominantfamiliesof algorithmsrelied on recur-sive estimationof model featuresand robot posesby a variabledimensionKalmanfilter (Castellanosetal., 1999;Castellanos& Tardos,2000;Leonard& Durrant-Whyte,1992;Leonardet al., 1992).

Theclassicaltarget trackingliteratureprovidesa numberof methodsfordata-association(Bar-Shalom& Fortmann,1988;Popoli& Blackman,1999)that are usedin computervision (Cox, 1993) and CML (Cox & Leonard,1994;Federetal.,1999),suchasthetracksplittingfilter (Zhang& Faugeras,1992),theJointProbabilisticDataAssociationFilter (JPDAF) (Rasmussen&Hager, 1998),andthe multiple hypothesistracker (MHT) (Reid,1979;Cox& Leonard,1994;Cox & Hingorani,1994).Unfortunatelythe latter, morepowerful methodshave exponentialcomplexity so suboptimalapproxima-tionsareusedin practice.However, thestrategiesfor hypothesispruningarebasedon assumptionssuchas motion continuity that are often violated inpractice(Seitz& Dyer, 1995).Thus,they arenot directly applicableto theSFM or CML problemwhenthemeasurementsdo not arrive in a temporallycontinuousfashion.

Thus,both vision andmaplearningapproachesassumethat the dataas-sociationproblemis solved,eitherthroughuniquely identifiablefeaturesinthe environmentof a robot, or throughsensorstreamsthat make it possi-ble to track individual features.Of particulardifficulty, thus,is theproblemof mappingcyclic environments(Gutmann& Konolige,2000),wherefea-turescannotbe tracked and the dataassociationproblemarisesnaturally.Recently, analternative classof algorithmshasbeenproposedthataddressesthe dataassociationproblem(Burgard et al., 1999; Shatkay& Kaelbling,

ml.tex; 19/01/2002; 13:46; p.24

Page 25: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 25

1997;Shatkay, 1998;Thrunetal.,1998b,1998a).Likeours,thesealgorithmsarebasedon EM, andthey have beendemonstratedto accommodateambi-guitiesandlarge odometricerrors.Thesealgorithmsaresimilar in spirit tothe oneproposedherein that they formulatethe mappingproblemasesti-mationproblemfrom incompletedata,andusetheE-stepof EM to estimateexpectationsover thosemissingdata.Thereareessentialdifferences,though.In particular, thesealgorithmsconsiderthecamerapositionsasmissingdata,whereasoursregardthecameraposesasmodelparameters,andinsteadthecorrespondencematrix is beingestimatedin theE-step.

Recently, EM hasalsobeenproposedin the target-trackingliteraturetoperformsmoothingof tracks,leadingto the ProbabilisticMulti-HypothesisTracker (PMHT) (Avitzour, 1992;Streit & Luginbuhl, 1994;Gauvrit et al.,1997). In the PMHT, the sameconditional independenceassumptionsaremadeasin this paper, andidenticalexpressionsareobtainedfor the virtualmeasurementsin the M-step.However, the PMHT makesthe samemotioncontinuityassumptionsastheclassicalJPDAF andMHT algorithms,whichwe do not assume.Moreover, in our work we optimizefor structure(targets)and motion, a considerablymoredifficult problem.Most importantly, how-ever, the PMHT altogetherabandonsthe mutualexclusionconstraintin theinterestof computationalefficiency. In contrast,in our work we have shownthat thecorrectdistribution in theE-stepcanbeefficiently approximatedbyMarkov chain Monte Carlo sampling.Nevertheless,the PMHT is a veryelegant algorithm, and we conjecturethat combining the PMHT with ourefficient samplerin theE-stepcould leadto a novel, approximatelyoptimaltracker and/orsmootherof interestto thetargettrackingcommunity.

Thenew proposalstrategieswe proposefor efficient samplingof assign-mentsbearan interestingrelation to researchin the field of computationalcomplexity theory. The“chainflipping” proposalis relatedin termsof mech-anism,if not description,to the Broder chain,an MCMC type methodtogenerate(unweighted)assignmentsat random(Broder, 1986).However, ourmethodis specificallygearedtowardssamplingfrom weightedassignments,andusestheweightsto biasproposalstowardsmorelikely assignments.

10. Discussion

In thispaperwehavepresentedanovel tool,whichenablesusto learnmodelsfrom datain thepresenceof non-trivial dataassociationproblems.We haveappliedit successfullyto the structurefrom motion problemwith unknowncorrespondence, significantlyextendingtheapplicabilityof thesemethodstonew imagingsituations.In particular, ourmethodcancopewith imagesgivenin arbitraryorderandtaken from widely separateviewpoints,obviating thetemporalcontinuityassumptionneededto trackfeaturesover time.

ml.tex; 19/01/2002; 13:46; p.25

Page 26: EM, MCMC, andChainFlipping for Structure from Motion with

26 Dellaert,Seitz,Thorpe& Thrun

The final algorithmis simpleandeasyto implement.As summarizedinSection5, at eachiteration one only needsto obtain a sampleof proba-ble assignments,computethe virtual measurements,and solve a syntheticSFM problemusingknown methods.In addition,we have developedanovelsamplingstrategy, called “smart chain flipping”, to calculatethesevirtualmeasurementsefficiently usingtheMetropolis-Hastingsalgorithm.

Thereis plenty of opportunityfor future work. In this paper, we makethe commonlymadeassumptionthat all 3D featuresareseenin all images(Tomasi& Kanade,1992;Hartley, 1994;McLauchlan& Murray, 1995).Ourapproachdoesnot dependon this assumption,however. In (Dellaert,2001)the approachwasextendedto dealwith spuriousmeasurements,by the in-troduction of a NULL feature(as in (Gold et al., 1998)), and occlusion,throughthe developmentof a more sophisticatedprior on assignments.Inaddition,(Dellaert,2001)alsoshows how appearancemeasurementscanbeeasilyintegratedwithin theEM framework.

Allowing occlusionintroducesthethorny issueof modelselection,whichwehaveasyetnotaddressed.In thepresenceof occlusionit is notcommonlyknown a priori how many featuresactuallyexist in theworld. This problemof modelselectionhasbeenaddressedsuccessfullybeforein thecontext ofvision (Ayer & Sawhney, 1995;Torr, 1997),andit is hopedthat the lessonslearnedtherecanequallyapplyin thecurrentcontext.

As arguedin the introductionto this paper, thedataassociationproblemarisesin many problemsof learningmodelsfrom data.While the currentwork hasbeenphrasedin thecontext of thestructurefrom motionproblemincomputervision,we conjecturethatthegeneralapproachis morewidely ap-plicable.For example,asdiscussedabove, therobotmappingproblemsharesa similar setof constraints,making the chainflipping proposaldistributiondirectly applicable.Thus, just aswe employed off-the-shelftechniquesforsolving the SFM problemwith knowncorrespondences,EM and our newMCMC techniquescanbestipulatedto therich literatureonconcurrentmap-pingandlocalization(CML) with known correspondences.Suchanapproachwould ”bootstrap”thesetechniquesto caseswith unknowncorrespondence,which hasgreatpracticalimportance,particularlyin theareaof multi-robotmapping.As a secondexample,we suspectthat our MCMC chainflippingapproachis also applicableto visual object identificationfrom distributedsensors,whereothers(Pasula,Russell,Ostland,& Ritov, 1999)have alreadysuccessfullyappliedEM andMCMC to solve thedataassociationproblem.Data associationproblemsoccur in a wide rangeof learningmodelsfromdata.The applicationof our approachto otherdataassociationproblemsissubjectof futureresearch.

ml.tex; 19/01/2002; 13:46; p.26

Page 27: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 27

References

Avitzour, D. (1992). A maximumlikelihoodapproachto dataassociation. IEEETrans.on AerospaceandElectronic Systems, 28(2), 560–566.

Ayer, S.,& Sawhney, H. (1995).Layeredrepresentationof motionvideousingrobustmaximum-likelihoodestimationof mixture modelsandMDL encoding. InInt. Conf. on ComputerVision(ICCV), pp.777–784.

Bar-Shalom,Y., & Fortmann,T. (1988). Tracking anddataassociation. AcademicPress,New York.

Basri, R., Grove, A., & Jacobs,D. (1998). Efficient determinationof shapefrommultiple imagescontainingpartial information. PatternRecognition, 31(11),1691–1703.

Beardsley, P., Torr, P., & Zisserman,A. (1996).3D modelacquisitionfrom extendedimagesequences.In Eur. Conf. onComputerVision(ECCV), pp. II:683–695.

Bertsekas,D. (1991). Linear NetworkOptimization:Algorithmsand Codes. TheMIT press,Cambridge,MA.

Borenstein,J.,Everett,B., & Feng,L. (1996). NavigatingMobile Robots:SystemsandTechniques. A. K. Peters,Ltd., Wellesley, MA.

Broder, A. Z. (1986). How hardis to marry at random?(On the approximationofthe permanent).In Proceedingsof the EighteenthAnnualACM SymposiumonTheoryof Computing, pp.50–58Berkeley, California.

Broida, T., & Chellappa,R. (1991). Estimatingthe kinematicsandstructureof arigid objectfrom a sequenceof monocularimages. IEEE Trans.on PatternAnalysisandMachineIntelligence, 13(6), 497–513.

Burgard,W., Fox, D., Jans,H., Matenar, C., & Thrun,S. (1999). Sonar-basedmap-ping of large-scalemobile robotenvironmentsusingEM. In ProceedingsoftheInternationalConferenceon MachineLearningBled,Slovenia.

Castellanos,J.,Montiel,J.,Neira,J.,& Tardos,J.(1999).TheSPmap:A probabilisticframework for simultaneouslocalizationandmapbuilding. IEEE Trans.onRoboticsandAutomation, 15(5), 948–953.

Castellanos,J.,& Tardos,J. (2000). Mobile RobotLocalizationandMap Building:A MultisensorFusionApproach. Kluwer AcademicPublishers,Boston,MA.

Cook,W., Cunningham,W., Pulleyblank,W., & Schrijver, A. (1998).CombinatorialOptimization. JohnWiley & Sons,New York, NY.

Cox, I. (1993). A review of statistical data associationtechniquesfor motioncorrespondence.Int. J. of ComputerVision, 10(1), 53–66.

Cox,I., & Hingorani,S.(1994).An efficientimplementationandevaluationof Reid’smultiple hypothesistrackingalgorithmfor visual tracking. In Int. Conf. onPatternRecognition (ICPR), Vol. 1, pp.437–442Jerusalem,Israel.

Cox, I., & Leonard,J. (1994). Modelinga dynamicenvironmentusinga Bayesianmultiplehypothesisapproach.Artificial Intelligence, 66(2), 311–344.

ml.tex; 19/01/2002; 13:46; p.27

Page 28: EM, MCMC, andChainFlipping for Structure from Motion with

28 Dellaert,Seitz,Thorpe& Thrun

Dellaert, F. (2001). Monte Carlo EM for Data Associationand its Applicationsin ComputerVision. Ph.D. thesis,Schoolof ComputerScience,CarnegieMellon. Also availableasTechnicalReportCMU-CS-01-153.

Dempster, A., Laird, N., & Rubin,D. (1977).Maximumlikelihoodfrom incompletedatavia theEM algorithm. Journal of theRoyalStatisticalSociety, SeriesB,39(1), 1–38.

Deriche,R., & Faugeras,O. (1990). Tracking line segments. Image and VisionComputing, 8, 261–270.

Doucet,A., de Freitas,N., & Gordon,N. (Eds.).(2001). SequentialMonteCarloMethodsIn Practice. Springer-Verlag,New York.

Feder, H. J.S.,Leonard,J.J.,& Smith,C. M. (1999).Adaptivemobilerobotnaviga-tion andmapping.InternationalJournal of RoboticsResearch, SpecialIssueonField andServiceRobotics, 18(7), 650–668.

Gauvrit,H., Le Cadre,J.,& Jauffret,C.(1997).A formulationof multitargettrackingasan incompletedataproblem. IEEE Trans.on Aerospaceand ElectronicSystems, 33(4), 1242–1257.

Gilks, W., Richardson,S.,& Spiegelhalter, D. (Eds.).(1996). Markov chain MonteCarlo in practice. ChapmanandHall.

Gold,S.,Rangarajan,A., Lu, C.,Pappu,S.,& Mjolsness,E. (1998).New algorithmsfor 2D and3D point matching.PatternRecognition, 31(8), 1019–1031.

Gutmann,J.-S., & Konolige, K. (2000). Incrementalmapping of large cyclicenvironments. In Proceedingsof the IEEE International SymposiumonComputationalIntelligencein RoboticsandAutomation(CIRA).

Hartley, H. (1958). Maximumlikelihoodestimationfrom incompletedata.Biomet-rics, 14, 174–194.

Hartley, R. (1994).Euclideanreconstructionfrom uncalibratedviews.In Applicationof Invariancein ComputerVision, pp.237–256.

Hartley, R., & Zisserman,A. (2000). Multiple View Geometryin ComputerVision.CambridgeUniversityPress.

Hastings,W. (1970). Montecarlosamplingmethodsusingmarkov chainsandtheirapplications.Biometrika, 57, 97–109.

Jacobs,D. (1997). Linearfitting with missingdata:Applicationsto structurefrommotion andto characterizingintensityimages. In IEEE Conf. on ComputerVisionandPatternRecognition (CVPR), pp.206–212.

Kozen,D. C. (1991).Thedesignandanalysisof algorithms. Springer-Verlag.

Kurata,T., Fujiki, J.,Kourogi,M., & Sakaue,K. (1999). A robustrecursive factor-izationmethodfor recoveringstructureandmotionfrom livevideoframes.In1999ICCV Workshopon FrameRateprocessing, Corfu,Greece.

Leonard,J.,Cox, I., & Durrant-Whyte,H. (1992). Dynamicmmapbuilding for anautonomousmobilerobot. Int. J. RoboticsResearch, 11(4), 286–289.

Leonard,J.,& Durrant-Whyte,H. (1992).DirectedSonarSensingfor MobileRobotNavigation. Kluwer Academic,Boston.

ml.tex; 19/01/2002; 13:46; p.28

Page 29: EM, MCMC, andChainFlipping for Structure from Motion with

EM, MCMC, andChainFlipping for Structurefrom Motion 29

Longuet-Higgins,H. (1981). A computeralgorithmfor reconstructinga scenefromtwo projections.Nature, 293, 133–135.

McLachlan,G., & Basford,K. (1988). Mixture Models:InferenceandApplicationsto Clustering. MarcelDekker, New York.

McLachlan,G., & Krishnan,T. (1997). TheEM algorithm and extensions. Wileyseriesin probabilityandstatistics.JohnWiley & Sons.

McLauchlan,P., & Murray, D. (1995). A unifying framework for structureandmotion recovery from imagesequences.In Int. Conf. on ComputerVision(ICCV), pp.314–320.

Morris, D., & Kanade,T. (1998). A unified factorizationalgorithmfor points,linesegmentsand planeswith uncertaintymodels. In Int. Conf. on ComputerVision(ICCV), pp.696–702.

Morris, D., Kanatani,K., & Kanade,T. (1999). Uncertaintymodelingfor optimalstructurefrom motion. In ICCV Workshopon VisionAlgorithms:TheoryandPractice.

Neal,R. (1993). ProbabilisticinferenceusingMarkov chainMonteCarlomethods.Tech.rep.CRG-TR-93-1,Dept.of ComputerScience,Universityof Toronto.

Papadimitriou,C., & Steiglitz,K. (1982). CombinatorialOptimization:AlgorithmsandComplexity. Prentice-Hall.

Pasula,H., Russell,S.,Ostland,M., & Ritov, Y. (1999).Trackingmany objectswithmany sensors.In Int. Joint Conf. onArtificial Intelligence(IJCAI) Stockholm.

Poelman,C., & Kanade,T. (1997). A paraperspective factorizationmethodforshapeandmotion recovery. IEEE Trans.on Pattern Analysisand MachineIntelligence, 19(3), 206–218.

Popoli, R., & Blackman,S. S. (1999). Designand Analysisof ModernTrackingSystems. ArtechHouseRadarLibrary.

Rasmussen,C., & Hager, G. (1998). Jointprobabilistictechniquesfor trackingob-jectsusingmultiplevisionclues.In IEEE/RSJInt. Conf. onIntelligentRobotsandSystems(IROS), pp.191–196.

Reid, D. (1979). An algorithm for tracking multiple targets. IEEE Trans. onAutomationandControl, AC-24(6), 84–90.

Scott,G., & Longuet-Higgins,H. (1991). An algorithmfor associatingthefeaturesof two images.Proceedingsof RoyalSocietyof London, B-244, 21–26.

Seitz,S.,& Dyer, C. (1995). Completestructurefrom four point correspondences.In Int. Conf. on ComputerVision(ICCV), pp.330–337.

Shapiro,L., & Brady, J. (1992). Feature-basedcorrespondence:An eigenvectorapproach.ImageandVisionComputing, 10(5), 283–288.

Shatkay, H. (1998).LearningModelsfor RobotNavigation. Ph.D.thesis,ComputerScienceDepartment,Brown University, Providence,RI.

Shatkay, H., & Kaelbling, L. (1997). Learningtopologicalmapswith weak localodometricinformation. In Proceedingsof IJCAI-97. IJCAI, Inc.

ml.tex; 19/01/2002; 13:46; p.29

Page 30: EM, MCMC, andChainFlipping for Structure from Motion with

30 Dellaert,Seitz,Thorpe& Thrun

Smith, A., & Gelfand,A. (1992). Bayesianstatisticswithout tears:A sampling-resamplingperspective. AmericanStatistician, 46(2), 84–88.

Spetsakis,M., & Aloimonos,Y. (1991). A multi-frameapproachto visual motionperception.Int. J. of ComputerVision, 6(3), 245–255.

Streit, R., & Luginbuhl, T. (1994). Maximum likelihoodmethodfor probabilisticmulti-hypothesistracking. In Proc.SPIE,Vol. 2335, pp.394–405.

Szeliski, R., & Kang, S. (1993). Recovering 3D shapeand motion from imagestreamsusingnon-linearleastsquares.Tech.rep.CRL 93/3,DECCambridgeResearchLab.

Tanner, M. (1996).Toolsfor StatisticalInference. SpringerVerlag,New York. ThirdEdition.

Thrun,S.,Fox, D., & Burgard,W. (1998a).A probabilisticapproachto concurrentmappingandlocalizationfor mobile robots. Machine Learning, 31, 29–53.alsoappearedin AutonomousRobots5, 253–271.

Thrun,S.,Fox,D., & Burgard,W. (1998b).Probabilisticmappingof anenvironmentby a mobile robot. In Proceedingsof the IEEE InternationalConferenceonRoboticsandAutomation(ICRA).

Tomasi,C., & Kanade,T. (1992). Shapeand motion from imagestreamsunderorthography:a factorizationmethod. Int. J. of ComputerVision, 9(2), 137–154.

Torr, P., Fitzgibbon,A., & Zisserman,A. (1998).Maintainingmultiplemotionmodelhypothesesover many views to recovermatchingandstructure.In Int. Conf.onComputerVision (ICCV), pp.485–491.

Torr, P. (1997).An assessmentof informationcriteriafor motionmodelselection.InIEEEConf. onComputerVisionandPatternRecognition (CVPR), pp.47–53.

Torr, P., Szeliski,R.,& Anandan,P. (1999).An integratedbayesianapproachto layerextractionfrom imagesequences.In Int. Conf. on ComputerVision (ICCV),pp.983–990.

Triggs, B. (1996). Factorizationmethodsfor projective structureandmotion. InIEEE Conf. on ComputerVision and Pattern Recognition (CVPR), pp. 845–851.

Triggs,B., McLauchlan,P., Hartley, R.,& Fitzgibbon,A. (1999).Bundleadjustment– amodernsynthesis.In VisionAlgorithms99 Corfu,Greece.

Tsai,R., & Huang,T. (1984). Uniquenessandestimationof three-dimensionalmo-tion parametersof rigid objectswith curvedsurfaces.IEEETrans.onPatternAnalysisandMachineIntelligence, 6(1), 13–27.

Ullman,S.(1979).Theinterpretationof visualmotion. MIT Press,Cambridge,MA.

Zhang,Z., & Faugeras,O. (1992). Three-dimensionalmotioncomputationandob-ject segmentationin a long sequenceof stereoframes. Int. J. of ComputerVision, 7(3), 211–241.

ml.tex; 19/01/2002; 13:46; p.30