
Published in
PRESENCE: Teleoperators and Virtual Environments
Special Issue on Augmented Reality
Vol. 6, No. 4, August 1997, pp. 433-451

Confluence of Computer Vision and Interactive Graphics for Augmented Reality

Gudrun J. Klinker, Klaus H. Ahlers, David E. Breen, Pierre-Yves Chevalier, Chris Crampton, Douglas S. Greer, Dieter Koller, Andre Kramer, Eric Rose, Mihran Tuceryan, Ross T. Whitaker

European Computer-Industry Research Centre (ECRC)
Arabellastraße 17, 81925 Munich, Germany

Abstract

Augmented reality (AR) is a technology in which a user's view of the real world is enhanced or augmented with additional information generated from a computer model. Using AR technology, users can interact with a combination of real and virtual objects in a natural way. This paradigm constitutes the core of a very promising new technology for many applications. However, before it can be applied successfully, AR has to fulfill very strong requirements including precise calibration, registration and tracking of sensors and objects in the scene, as well as a detailed overall understanding of the scene.

We see computer vision and image processing technology play an increasing role in acquiring appropriate sensor and scene models. To balance robustness with automation, we integrate automatic image analysis with both interactive user assistance and input from magnetic trackers and CAD models. Also, in order to meet the requirements of the emerging global information society, future human-computer interaction will be highly collaborative and distributed. We thus conduct research pertaining to distributed and collaborative use of AR technology. We have demonstrated our work in several prototype applications, such as collaborative interior design and collaborative mechanical repair. This paper describes our approach to AR with examples from applications, as well as the underlying technology.

    1. Introduction

Augmented reality (AR) is a technology in which a user's view of the real world is enhanced or augmented with additional information generated from a computer model. The enhancement may consist of virtual artifacts to be fitted into the environment, or a display of non-geometric information about existing real objects. AR allows a user to work with and examine real 3D objects while receiving additional information about those objects or the task at hand. By exploiting people's visual and spatial skills, AR brings information into the user's real world rather than pulling the user into the computer's virtual world. Using AR technology, users can thus interact with a mixed virtual and real world in a natural way. This paradigm for user interaction and information visualization constitutes the core of a very promising new technology for many applications. However, real applications impose very strong demands on AR technology that cannot yet be met. Some of these demands are listed below.

In order to combine real and virtual worlds seamlessly so that the virtual objects align well with the real ones, we need very precise models of the user's environment and how it is sensed. It is essential to determine the location and the optical properties of the viewer (or camera) and the display, i.e., we need to calibrate all devices, register them and all objects in a global coordinate system, and track them over time when the user moves and interacts with the scene.

Realistic merging of virtual objects with a real scene requires that objects behave in physically plausible manners when they are manipulated, i.e., they occlude or are occluded by real objects, they are not able to move through other objects, and they are shadowed or indirectly illuminated by other objects while also casting shadows themselves. To enforce such physical interaction constraints between real and virtual objects, the AR system needs to have a very detailed description of the physical scene.

In order to create the illusion of an AR interface it is required to present the virtual objects with a high degree of realism, and to build user interfaces with a high degree of immersion. Convincing interaction and information visualization techniques are still very much a research issue. On top of that, for multi-user applications in the context of AR it is necessary to address the distribution and sharing of virtual environments, the support for user collaboration and awareness, and the connection between local and remote AR installations.

We see computer vision and image processing technology, although still relatively brittle and slow, play an increasing role in acquiring appropriate sensor and scene models. Rather than using the video signal merely as a backdrop on which virtual objects are shown, we explore the use of image understanding techniques to calibrate, register and track cameras and objects and to extract the three-dimensional structure of the scene. To balance robustness with automation, we integrate automatic image analysis with interactive user assistance and with input from magnetic trackers and CAD models.

In our approach to AR we combine computer-generated graphics with a live video signal from a camera to produce an enhanced view of a real scene, which is then displayed on a standard video monitor. We track user motion and provide basic pointing capabilities in the form of a 3D pointing device with an attached magnetic tracker, as shown in Figure 6. This suffices in our application scenarios to demonstrate how AR can be used to query information about objects in the real world. For the manipulation of virtual objects, we use mouse-based interaction in several related 2D views of the scene on the screen.

We conduct research pertaining to distributed and collaborative use of AR technology. Considering the growing global information society, we expect an increasing demand for collaborative use of highly interactive computer technology over networks. Our emphasis lies on providing interaction concepts and distribution technology for users who collaboratively explore augmented realities, both locally immersed and remotely in the form of a telepresence.

We have demonstrated our work in several prototype applications, such as collaborative interior design and collaborative mechanical repair. This paper describes our approach to AR with examples from applications, as well as the underlying technology.

2. Previous Work

Research in augmented reality is a recent but expanding area of research. We briefly summarize the research conducted to date. Baudel and Beaudouin-Lafon have looked at the problem of controlling certain objects (e.g., cursors on a presentation screen) through the use of free-hand gestures (Baudel & Beaudouin-Lafon, 1993). Feiner et al. have used augmented reality in a laser printer maintenance task. In this example, the augmented reality system aids the user in the steps required to open the printer and replace various parts (Feiner, MacIntyre & Seligmann, 1993). Wellner has demonstrated an augmented reality system for office work in the form of a virtual desktop on a physical desk (Wellner, 1993). He interacts on this physical desk both with real and virtual documents. Bajura et al. have used augmented reality in medical applications in which the ultrasound imagery of a patient is superimposed on the patient's video image (Bajura, Fuchs & Ohbuchi, 1992). Lorensen et al. use an augmented reality system in surgical planning applications (Lorensen, Cline, Nafis, Kikinis, Altobelli & Gleason, 1993). Milgram and Drascic et al. use augmented reality with computer-generated stereo graphics to perform telerobotics tasks (Milgram, Zhai, Drascic & Grodski, 1993; Drascic, Grodski, Milgram, Ruffo, Wong & Zhai, 1993). Caudell and Mizell describe the application of augmented reality to manual manufacturing processes (Caudell & Mizell, 1992). Fournier has posed the problems associated with illumination in combining synthetic images with images of real scenes (Fournier, 1994).

The utilization of computer vision in AR has depended upon the requirements of particular applications. Deering has explored the methods required to produce accurate high resolution head-tracked stereo display in order to achieve sub-centimeter virtual to physical registration (Deering, 1992). Azuma and Bishop, and Janin et al. describe techniques for calibrating a see-through head-mounted display (Azuma & Bishop, 1994; Janin, Mizell & Caudell, 1993). Gottschalk and Hughes present a method for auto-calibrating tracking equipment used in AR and VR (Gottschalk & Hughes, 1993). Gleicher and Witkin state that their through-the-lens controls may be used to register 3D models with objects in images (Gleicher & Witkin, 1992). More recently, Bajura and Neumann have addressed the issue of dynamic calibration and registration in augmented reality systems (Bajura & Neumann, 1995). They use a closed-loop system which measures the registration error in the combined images and tries to correct the 3D pose errors. Grimson et al. have explored vision techniques to automate the process of registering medical data to a patient's head using segmented CT or MRI data and range data (Grimson, Lozano-Perez, Wells, Ettinger, White & Kikinis, 1994; Grimson, Ettinger, White, Gleason, Lozano-Perez, Wells & Kikinis, 1995). In a related project, Mellor recently developed a real-time object and camera calibration algorithm that calculates the relationship between the coordinate systems of an object, a geometric model, and the image plane of a camera (Mellor, 1995). Uenohara and Kanade have developed techniques for tracking 2D image features, such as fiducial marks on a patient's leg, in real time using special hardware to correlate affine projections of small image areas between images (Uenohara & Kanade, 1995). Peria et al. use specialized optical tracking devices (calibrated plates with LEDs attached to medical equipment) to track an ultrasound probe and register it with SPECT data (Peria, Chevalier, François-Joubert, Caravel, Dalsoglio, Lavallee & Cinquin, 1995). Betting et al. as well as Henri et al. use stereo data to align a patient's head with MRI or CT data (Betting, Feldmar, Ayache & Devernay, 1995; Henri, Colchester, Zhao, Hawkes, Hill & Evans, 1995).

Some researchers have studied the calibration issues relevant to head-mounted displays (Bajura, Fuchs & Ohbuchi, 1992; Caudell & Mizell, 1992; Azuma & Bishop, 1994; Holloway, 1994; Kancherla, Rolland, Wright & Burdea, 1995). Others have focused on monitor-based approaches (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995; Betting, Feldmar, Ayache & Devernay, 1995; Grimson, Ettinger, White, Gleason, Lozano-Perez, Wells & Kikinis, 1995; Henri, Colchester, Zhao, Hawkes, Hill & Evans, 1995; Mellor, 1995; Peria, Chevalier, François-Joubert, Caravel, Dalsoglio, Lavallee & Cinquin, 1995; Uenohara & Kanade, 1995). Both approaches can be suitable depending on the demands of the particular application.

3. Application Scenarios

We have developed a comprehensive system, GRASP, which we have used as the basis for our application demonstrations. This section discusses two examples. The next sections describe in detail the GRASP system and the research issues that we focus on.

3.1 Collaborative Interior Design

Figure 1. Augmented room showing a real table with a real telephone and a virtual lamp, surrounded by two virtual chairs. Note that the chairs are partially occluded by the real table while the virtual lamp occludes the table.

The scenario for the interior design application assumes an office manager who is working with an interior designer on the layout of a room (Ahlers, Kramer, Breen, Chevalier, Crampton, Rose, Tuceryan, Whitaker & Greer, 1995). The office manager intends to order furniture for the room. On a computer monitor they both see a picture of the real room from the viewpoint of the camera. By interacting with various manufacturers over a network, they select furniture by querying databases using a graphical paradigm. The system provides descriptions and pictures of furniture that is available from the various manufacturers who have made models available in their databases. Pieces or groups of furniture that meet certain requirements, such as color, manufacturer, or price, may be requested. The users choose pieces from this "electronic catalogue" and 3D renderings of this furniture appear on the monitor along with the view of the room. The furniture is positioned using a 3D mouse. Furniture can be deleted, added, and rearranged until the users are satisfied with the result; they view these pieces on the monitor as they would appear in the actual room. As they move the camera they can see the furnished room from different points of view.

The users can consult with colleagues at remote sites who are running the same system. Users at remote sites manipulate the same set of furniture using a static picture of the room that is being designed. Changes by one user are seen instantaneously by all of the others, and a distributed locking mechanism ensures that a piece of furniture is moved by only one user at a time. In this way groups of users at different sites can work together on the layout of the room (see Figure 1). The group can record a list of furniture and the layout of that furniture in the room for future reference.

3.2 Collaborative Mechanical Repair

Figure 2. Augmented engine.

In the mechanical maintenance and repair scenario, a mechanic is assisted by an AR system while examining and repairing a complex engine (Kramer & Chevalier, 1996). The system presents a variety of information to the mechanic, as shown in Figure 2. Annotations identify the name of parts, describe their function, or present other important information like maintenance or manufacturing records. The user interacts with the real object in its natural setting with a pointing device monitored by the computer. As the mechanic points to a specific part of the engine, the AR system displays computer-generated lines and text (annotations) that describe the visible components or give the user hints about the object. Queries with the pointing device on the real-world object may be used to add and delete annotation tags. Since we also track the engine, the annotations move with the engine as its orientation changes. The lines attaching the annotation tags to the engine follow the appropriate visible components, allowing the user to easily identify the different parts as the view of the engine changes. The mechanic can also benefit from the assistance of a remote expert who can control what information is displayed on the mechanic's AR system.

4. System Infrastructure

Figure 3. The GRASP system hardware configuration.

Figure 4. The GRASP system software configuration.

The GRASP system forms the central core of our efforts to keep the graphics and visual scene in alignment and to provide an interactive three-dimensional interface (Ahlers, Crampton, Greer, Rose & Tuceryan, 1994). Figure 3 shows a schematic of the GRASP hardware configuration. The workstation hardware generates the graphical image and displays it on a high resolution monitor. A scan converter transforms the graphics displayed on the monitor into a standard video resolution and format. The scan converter also mixes this generated video signal with the video signal input from the camera via luminance keying. A 6DOF magnetic tracker, which is capable of sensing the three translational and the three rotational degrees of freedom, provides the workstation with continually updated values for the position and orientation of the tracked objects, including the video camera and the pointing device. A frame grabber digitizes video images for processing within the computer during certain operations. The software has been implemented using the C++ programming language. A schematic diagram of the software architecture is shown in Figure 4.

5. Specification and Alignment of Coordinate Spaces

In order to align the virtual and real objects seamlessly, we need very precise models of the user's environment and how it is sensed. It is essential to calibrate sensors and display devices (i.e., to determine their locations and optical properties), to register all objects and interaction devices in a global coordinate system, and to track them while the user operates in the scene.

5.1 Calibration of Sensors and Video Equipment

During the initial setup, the camera characteristics, the location of the 6D tracker and the effects of scan conversion and video mixing must be determined. These procedures are referred to as the image, camera, and tracking calibration (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995). We now describe several such techniques that mix computer vision algorithms with varying amounts of model-based information and interactive input from the user.

5.1.1 Image Calibration

One of the essential steps of our AR system is the mixing of live video input with synthetically generated geometric data. While the live input is captured as an analog video signal by the camera system, the synthetic data is rendered digitally and then scan converted into a video signal. In order to align the two signals, we need to determine the horizontal and vertical positioning of the rendered, scan converted image with respect to the camera image, as well as the relationship between the two aspect ratios.

We use a synthetic test image that has two markers in known positions to compute four distortion parameters (2D translation and scaling). The test image is scan converted into a video signal. For image calibration purposes, we redigitize it and determine the location of the markers in the grabbed image. The discrepancy between the original location of the markers and their position in the grabbed image determines the translational and scaling distortions induced by the scan converter. This interactive image calibration method asks the user to identify the two markers in the grabbed image.

The GRASP system also provides an alternative, automatic routine to compute the distortion parameters. Algorithmically, it is easier to find a large, homogeneously colored area in an image than the thin lines of a marker. Accordingly, the automatic algorithm uses a different test image which contains one black square. It finds the dark area, fits four lines to its boundaries and thus determines the corners of the square. Two of the corners suffice to determine the distortion parameters of the scan converter.

The comparison of the two approaches illustrates an important distinction between interactive and automatic algorithms: while humans work best with sharp line patterns to provide precise interactive input, automatic algorithms need to accommodate imprecision due to noise and digitization effects and thus work better on thicker patterns. On the other hand, automatic algorithms can determine geometric properties of extended areas, such as the center, an edge or a corner of an area, more precisely than humans. In conclusion, it is essential to the design of a system and to its use in an application that visual calibration aids be chosen according to their intended use. This is a recurring theme in our work.
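To make the arithmetic concrete, the following minimal sketch (Python/NumPy rather than the GRASP C++ implementation; the marker coordinates are made up for the example) recovers the four distortion parameters from two marker correspondences, assuming the simple model u' = s_u*u + t_u, v' = s_v*v + t_v implied by "2D translation and scaling".

```python
import numpy as np

def image_calibration(ref_markers, grabbed_markers):
    """Estimate the 2D scale and translation introduced by scan conversion
    and re-digitization from two marker correspondences.

    ref_markers, grabbed_markers: (2, 2) arrays holding the two marker
    positions (u, v) in the synthetic test image and in the grabbed video
    image, respectively."""
    (u1, v1), (u2, v2) = np.asarray(ref_markers, dtype=float)
    (U1, V1), (U2, V2) = np.asarray(grabbed_markers, dtype=float)

    s_u = (U2 - U1) / (u2 - u1)   # horizontal scale
    s_v = (V2 - V1) / (v2 - v1)   # vertical scale
    t_u = U1 - s_u * u1           # horizontal offset
    t_v = V1 - s_v * v1           # vertical offset
    return s_u, s_v, t_u, t_v

# Example: markers placed at (100, 100) and (500, 400) in the test image,
# found at (92, 108) and (488, 404) after scan conversion and grabbing.
print(image_calibration([(100, 100), (500, 400)], [(92, 108), (488, 404)]))
```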

5.1.2 Camera Calibration

Figure 5. The camera calibration grid.

Camera calibration is the process which calculates the extrinsic (position and orientation) and intrinsic parameters (focal length, image center, and pixel size) of the camera. We assume that the intrinsic parameters of the camera remain fixed during the augmented reality session. The camera's extrinsic parameters may be tracked and updated.

To compute the camera's intrinsic and extrinsic parameters, we point the camera at a known object in the scene, the calibration grid shown in Figure 5. The position of the grid and, in particular, the position of the centers of the butterfly markers on the grid are known within the 3D world coordinate system. We use the mapping from these 3D object features to 2D image features to calculate the current vantage point of the camera and its intrinsic image distortion properties. In principle, each mapping from a 3D point to 2D image coordinates determines a ray in the scene that aligns the object point with the focal point of the camera. According to the pinhole camera model, several such rays from different object points intersect at the focal point and thus uniquely determine the pose of the camera, as well as its imaging properties. Accordingly, we can define a system of equations to compute the intrinsic and extrinsic camera parameters using a mapping of object points to image points and minimizing measurement errors. The details are described in (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995).

The GRASP system provides an interactive camera calibration routine: A user indicates the center of all butterfly patterns with a mouse and labels them by typing the appropriate code name on the keyboard.

We also use an automatic, computer vision based camera calibration algorithm. In this approach, we use a calibration board that shows an arrangement of 42 black squares on a white background. Processing the image at a coarse scale, we quickly determine the positions and extents of black blobs in the image. By fitting rectangles to the blob outlines at finer scales and matching them left to right and top to bottom to the squares of the calibration board, we determine the calibration parameters of the camera.
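For readers who want to experiment with this kind of grid-based calibration, a common textbook way to realize the system of equations described above is the direct linear transform (DLT), sketched below in Python/NumPy under the assumption of at least six known 3D-2D correspondences; the exact formulation used in GRASP is given in (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995). The intrinsic and extrinsic parameters can subsequently be recovered by decomposing the estimated projection matrix.

```python
import numpy as np

def calibrate_camera_dlt(points_3d, points_2d):
    """Estimate a 3x4 projection matrix from >= 6 correspondences between
    3D calibration-grid points and their 2D image positions (DLT).
    points_3d: (N, 3) world coordinates; points_2d: (N, 2) pixel coordinates."""
    rows = []
    for (X, Y, Z), (u, v) in zip(np.asarray(points_3d, float),
                                 np.asarray(points_2d, float)):
        P = [X, Y, Z, 1.0]
        rows.append([*P, 0, 0, 0, 0, *(-u * np.array(P))])
        rows.append([0, 0, 0, 0, *P, *(-v * np.array(P))])
    A = np.asarray(rows)
    # The projection matrix is the right singular vector of A belonging to
    # the smallest singular value (the least-squares null vector).
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

def project(P, point_3d):
    """Apply the estimated projection to a 3D point, returning pixel (u, v)."""
    ph = P @ np.append(point_3d, 1.0)
    return ph[:2] / ph[2]
```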

5.1.3 Magnetic Tracker Calibration

Although we emphasize in this paper the use of computer vision techniques for AR, we do not rely exclusively on optical information. Complementarily, we also exploit magnetic tracking technology, as well as other interactive or model-based input. The tracking system consists of a transmitter and several receivers (trackers) that can be attached to objects, cameras and pointers in the scene. The tracking system automatically relates the 3D position and orientation of each tracker to a tracking coordinate system in the transmitter box. It is the task of the tracker calibration procedure to determine where the tracking coordinate system resides with respect to the world coordinate system of the AR application. This is a critical issue that usually does not arise in VR applications since such systems only need to track relative motion. Yet, the absolute positioning and tracking of objects and devices within a real world coordinate frame is of greatest importance in AR scenarios where reality is augmented with virtual information.

At the beginning of each session, we calibrate the magnetic tracking system, relating its local coordinate system to the world coordinate system. This process is currently performed interactively, using the same calibration grid as for camera calibration. We do this by determining the location of at least three points on the calibration grid with magnetic trackers. Since these points are also known in the world coordinate system, we can establish a system of linear equations, relating the tracked coordinates to the world coordinates and thus determining the unknown position and orientation parameters of the tracker (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995).
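One standard way to solve this kind of point-correspondence problem is the SVD-based absolute-orientation (Kabsch) method, sketched below in Python/NumPy; the paper's own linear-equation formulation may differ in detail, but the inputs and outputs are the same: at least three grid points measured in tracker coordinates and known in world coordinates, yielding the rotation and translation of the tracking coordinate system.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= R @ src + t,
    from >= 3 non-collinear point correspondences.
    src: points measured in tracker coordinates, dst: the same points in
    world coordinates (both (N, 3) arrays)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                                  # proper rotation
    t = c_dst - R @ c_src
    return R, t
```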

5.2 Registration of Interaction Devices and Real Objects

In addition to the sensing devices that were calibrated in the previous section, scenes also contain physical objects that the user wants to interact with using 3D interaction devices. Such objects and gadgets need to be registered with respect to the world coordinate system.

5.2.1 Pointer Registration

Figure 6. 3D pointing device.

Currently, we use the magnetic tracking system to register and track the position of a 3D pointer in our system (see Figure 6).

For the pointer registration, we need to determine the position (offset) of the tip of a pointer in relationship to an attached magnetic tracker. Our procedure requires the user to point to the same point in 3D space several times, using a different orientation each time for a pointer that has been attached to one of the trackers. For each pick, the position and the orientation of the tracker mark within the tracker coordinate system are recorded. The result of this procedure is a set of points and directions with the common property that the points are all the same distance from the single, picked point in 3D space and all of the directions associated with the points are oriented toward the picked point. From this information, we can compute six parameters defining the position and orientation of the pointing device, using a least-squares approach to solve an overdetermined system of linear equations.
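The sketch below shows one common way to set up this overdetermined linear system (often called pivot calibration): the six unknowns are the tip offset in the tracker frame and the picked point in the world frame. The variable names and the exact parameterization are illustrative, not taken from the GRASP code.

```python
import numpy as np

def pivot_calibration(rotations, positions):
    """Estimate the pointer-tip offset from pivot motion about a fixed point.

    rotations: list of 3x3 tracker orientation matrices R_i,
    positions: list of tracker positions p_i (world coordinates),
    all recorded while the tip rests on the same 3D point.
    Solves  R_i @ d - q = -p_i  for the tip offset d (tracker frame) and
    the pivot point q (world frame) in the least-squares sense."""
    A, b = [], []
    for R, p in zip(rotations, positions):
        A.append(np.hstack([np.asarray(R, float), -np.eye(3)]))  # 3 equations per pick
        b.append(-np.asarray(p, float))
    x, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
    d, q = x[:3], x[3:]
    return d, q
```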

5.2.2 Object Registration

Object registration is the process of finding the six parameters that define the 3D position and orientation, i.e. pose, of an object relative to some known coordinate system. This step is necessary, even when tracking objects magnetically, in order to establish the 3D relationship between a magnetic receiver and the object to which it is fastened.

We have studied two strategies for determining the 3D pose of an object (Whitaker, Crampton, Breen, Tuceryan & Rose, 1995). The first is a camera-based approach, which relies on a calibrated camera to match 3D landmarks ("calibration points") on the object to their projection in the image plane. The second method uses the 3D coordinates of the calibration points, as indicated manually using the 3D pointer with magnetic tracking, in order to infer the 3D pose of the object.

There has been extensive research on pose determination in the computer vision literature (Lowe, 1985; Grimson, 1990), but most of these techniques apply to only limited classes of models and scenes. The focus of the computer vision research is typically automation and recognition, features that are interesting, but not essential to augmented vision. In our work, the locations of landmark points in the image are found manually by a user with a mouse. We assume that the points are mapped from known locations in 3-space to the image via a rigid 3D transformation and a projection.

We represent the orientation of the object as a 3x3 rotation matrix R, which creates a linear system with 12 unknowns. Each point gives 2 equations, and 6 points are necessary for a unique solution. In practice we assume noise in the input data and use an overdetermined system with a least-squares solution in order to get reliable results. However, because we use a 3x3 rotation matrix and treat each element as an independent parameter, this linear system does not guarantee an orthonormal solution for this matrix, and it can produce "non-rigid" rotation matrices. Such non-rigidities can produce undesirable artifacts when these transformations are combined with others in the graphics system.

Orthonormality is enforced by adding a penalty term to the least-squares objective that measures the deviation of R from an orthonormal matrix, ||R R^T - I||^2. This creates a nonlinear optimization problem which we solve through gradient descent. The gradient descent is initialized with the unconstrained (linear) solution, and constrained solutions are typically found in 10-15 iterations.

Figure 7. Calibration and tracking of an engine model: A wireframe engine model registered to a real model engine using an image-based calibration (a), but when the model is turned and its movements tracked (b), the graphics show the misalignment in the camera's z direction.

Despite good pointwise alignment in the image plane, the image-based calibration can produce significant error in the depth term which is not seen in the reprojected solutions. For instance, in the case of the engine model shown in Figure 7(a), the image-based approach can produce a rigid transformation which matches landmark points in the image to within about 2 pixels. Yet the error in the z direction (distance from the camera) can be as much as 2-3 centimeters. This error becomes evident as the object is turned as in Figure 7(b). We attribute this error primarily to error in the camera calibration, and better camera models and calibration procedures are a topic of ongoing research. Because of such error we have developed the procedure described in the next section for calibrating objects with a 3D pointing device.

The problem here is to compute the rigid transformation between a set of 3D point pairs. Using the 3D pointer and several keystrokes, the user indicates the world coordinates (or those of some other known coordinate system) of landmark points on the object. This also gives rise to a linear system with 12 unknowns. For a unique solution 4 points are needed, but in most cases we use more than 4 points and solve for the least-squares error. As with the image-based object calibration, error in the measurements can produce solutions that represent non-rigid transformations. Thus, the same nonlinear penalty term can be introduced in order to produce constrained solutions.
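The following sketch illustrates the pointer-based formulation just described: the 12-unknown linear system is solved by least squares and then refined with the orthonormality penalty. For brevity it uses SciPy's general-purpose quasi-Newton minimizer instead of the hand-written gradient descent mentioned above, and the weight on the penalty term is an illustrative choice, not a value from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def register_object_3d(model_pts, world_pts, weight=1.0):
    """Estimate the pose (R, t) mapping object-model landmark points to their
    world positions measured with the 3D pointer.

    Each correspondence gives three linear equations in the 12 unknowns of
    [R | t]; the linear least-squares solution is then refined with a
    penalty on the deviation of R from an orthonormal matrix."""
    M = np.asarray(model_pts, float)
    W = np.asarray(world_pts, float)

    # Linear system A x = b with x holding [R | t] row by row.
    A = np.zeros((3 * len(M), 12))
    b = W.reshape(-1)
    for i, p in enumerate(M):
        for k in range(3):
            A[3 * i + k, 4 * k:4 * k + 3] = p      # row k of R dotted with p
            A[3 * i + k, 4 * k + 3] = 1.0          # component k of t
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)     # unconstrained solution

    def cost(x):
        R = x.reshape(3, 4)[:, :3]
        data = np.sum((A @ x - b) ** 2)                        # fit to measurements
        rigidity = np.sum((R @ R.T - np.eye(3)) ** 2)          # orthonormality penalty
        return data + weight * rigidity

    T = minimize(cost, x0).x.reshape(3, 4)         # quasi-Newton refinement
    return T[:, :3], T[:, 3]                       # R, t
```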

5.3 Tracking of Objects and Sensors

Calibration and registration refer to stationary aspects of a scene. In a general AR scenario, however, we have to deal with dynamic scene changes. With tracking we denote the ability of our system to cope with those dynamic scene changes. Thus, while the calculation of the external camera parameters and of the pose of an object are the results of calibration and registration, tracking can be regarded as a continuous update of those parameters. We are currently exploring and using two approaches to tracking: magnetic tracking and optical tracking.

5.3.1 Magnetic Tracking

As a magnetic tracking device we use the 6D tracker "Flock of Birds" from Ascension Technology Corporation. Receivers are attached to the camera and each potentially moving object. These receivers sense the six degrees of freedom (three translational and three rotational) with respect to a transmitter, whose location is kept fixed in world coordinates.

Initially, we have relied exclusively on this magnetic technology since the trackers provide positional and orientational updates at nearly real-time speeds and operate well in a laboratory setup. However, magnetic tracking is not practicable in large scale, realistic setups, because the tracking data can easily be corrupted by ferromagnetic materials in the vicinity of the receiver and because the trackers operate only in a limited range. Another drawback is the limited accuracy of the sensor readings.

5.3.2 Optical Tracking

Optical tracking methods are based on detecting and tracking certain features in the image. These can be lines, corners or any other salient features which are easy and reliable to detect in the image and can uniquely be associated with features of the 3D world. Our tracking approach currently uses the corners of squares attached to objects or walls (see Figure 8) to track a moving camera. Once the camera parameters are recovered, the scene can be augmented with virtual objects, such as shelves and chairs (see Figure 9).

Figure 8. Our optical tracking approach currently tracks the corners of squares. The left figure shows a corner of a room with eight squares. The right figure shows the detected squares only.

Figure 9. Augmented scene with a virtual chair and shelf that were rendered using the automatically tracked camera parameters.

This scenario is relevant to many AR applications where a user moves in the scene and thus continuously changes his (the camera's) viewpoint. We use a fixed world coordinate system, thus recomputing the camera parameters relative to the world frame in each step. Conversely, we could also recompute the position of the world system relative to the camera frame, thus using an egocentric frame of reference. The advantage of the former approach is that we can thus exploit certain motion invariants which make the tracking problem much simpler.

We assume that a model of the scene exists and that we are able to add "fiducial marks", such as black squares, to the scene to aid the tracking process. The squares are registered in the 3D scene model. Thus, in principle, the same camera calibration techniques described in section 5.1.2 can be used to determine, at any point in time, the position of the camera in the scene. Yet, during the tracking phase, we need to pay particular attention to the speed and robustness of the algorithms. To our advantage, we can exploit the time coherence of user actions: users move in continuous motions. We can benefit from processing results of previous images and from an adaptive model of the user motion to predict where the tracked features will appear in the next frame. We thus do not need to perform the full camera calibration procedure on every new incoming image.

It is well known that reasoning about three-dimensional information from two-dimensional images is error prone and sensitive to noise, a fact which has to be taken into account in any image processing method using real video data. In order to cope with this noise sensitivity we exploit physical constraints on moving objects. Since we do not have any a priori knowledge about forces changing the motion of the camera or the objects, we assume no forces (accelerations) and hence a constant velocity. In this case a general motion can be decomposed into a constant translational velocity of the center of mass of the object, and a rotation with constant angular velocity around an axis through the center of mass (e.g., Goldstein, 1980). This constitutes our so-called motion model (see Figure 10). So we do not only measure (estimate) the position and orientation of the camera and moving objects, as in the case of magnetic tracking, but also their change in time with respect to a stationary world frame, i.e., their translational and angular velocity. This is also referred to as motion estimation.

Figure 10. Each 3D motion can be decomposed into a translation t and a rotation. We choose a rotation about an axis through the center of mass of the objects, which is constant in the absence of any forces. The figure also indicates the world coordinate frame and the camera coordinate frame.

The motion parameters (translational and angular velocity according to the motion model) are estimated using time-recursive filtering based on Kalman filter techniques (e.g., Bar-Shalom & Fortmann, 1988; Gelb, 1974), where the unknown accelerations are modeled as so-called process noise in order to allow for changes of the velocities. The time-recursive filtering process enables smooth motion even in the presence of noisy image measurements, and provides a prediction and measurement-update step for each video frame. The prediction allows a reduction of the search space for features in the next video image and hence speeds up the process.
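A minimal constant-velocity Kalman filter for a single tracked position is sketched below; the GRASP tracker additionally estimates orientation and angular velocity and derives its measurements from the detected square corners, which this illustration omits, and the noise parameters here are placeholders.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal linear Kalman filter with a constant-velocity motion model.
    State: [position (3), velocity (3)]; measurement: position only.
    Unknown accelerations are absorbed by the process noise q."""

    def __init__(self, dt, q=1e-2, r=1e-1):
        self.x = np.zeros(6)                               # state estimate
        self.P = np.eye(6)                                 # state covariance
        self.F = np.eye(6)                                 # state transition
        self.F[:3, 3:] = dt * np.eye(3)
        self.Q = q * np.eye(6)                             # process noise
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # measurement model
        self.R = r * np.eye(3)                             # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]     # predicted position: restricts the feature search

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
```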

A typical drawback of optical methods stems from the fact that we want to reason about three-dimensional information from two-dimensional image measurements, which can lead to numerical instabilities if not performed carefully. On the other hand, there is the advantage that the image of a real object is almost perfectly aligned with its rendered counterpart, since the alignment error can be minimized in the image. Optical tracking approaches can hence be very accurate. Another advantage of optical tracking is that it is a non-intrusive approach, since it operates just on visual information, and it is basically not limited to any spatial range. It is furthermore quite natural, since it is the way most humans track objects and navigate within an environment.

6. Object Interaction

Realistic immersion of virtual objects into a real scene requires that the virtual objects behave in physically plausible manners when they are manipulated, i.e., they occlude or are occluded by real objects, they are not able to move through other objects, and they are shadowed or indirectly illuminated by other objects while also casting shadows themselves. To enforce such physical interaction constraints between real and virtual objects, the Augmented Reality system needs to have a very detailed description of the physical scene.

6.1 Acquisition of 3D Scene Descriptions

Figure 11. Modified engine. The fact that the user has removed the air cleaner is not yet detected by the AR system. The virtual model thus does not align with its real position.

The most straightforward approach to acquiring scene descriptions would suggest the use of geometric models, e.g., CAD data. Given such models, the AR system needs to align them with their physical counterparts in the real scene, as described in section 5.2.2. The advantage of using such models is that they can easily serve as starting points for accessing high-level, semantic information about the objects, as is demonstrated in the mechanical repair application.

However, there are some problems with this approach. First, geometric models are not available in all cases. For example, interior restoration of old buildings typically needs to operate without CAD data. Second, available models are not complete. Since models are abstractions of reality, real physical objects typically show more detail than is represented in the models. In particular, generic scene models cannot fully anticipate the occurrence of new objects, such as coffee mugs on tables, cars or cranes on construction sites, users' hands, or human collaborators. Furthermore, the system needs to account for the changing appearances of existing objects, such as buildings under construction or engines that are partially disassembled (see Figure 11). When users see such new or changed objects in the scene, they expect the virtual objects to interact with these as they do with the rest of the (modeled) scene.

Computer vision techniques can be used to acquire additional information from the particular scene under inspection. Although such information generally lacks semantic descriptions of the scene and thus cannot be used directly to augment reality with higher-level information, such as the electric wiring within a wall, it provides the essential environmental context for the realistic immersion of virtual objects into the scene. Thus, we expect future AR systems to use hybrid solutions, using model data to provide the necessary high-level understanding of the objects that are most relevant to the tasks performed, and enriching the models with automatically acquired further information about the scene.

We are investigating how state-of-the-art image understanding techniques can be used in AR applications. One particular paradigm in computer vision, shape extraction, determines depth information as so-called 2½-D sketches from images. These are not full 3D descriptions of the scene but rather provide distance (depth) estimates, with respect to the camera, for some or all pixels in an image. Ongoing research develops techniques to determine object shape from stereo images, from motion sequences, from object shading, from shadow casting, from highlights and gloss, and more. It is important to consider whether and how such algorithms can be used continuously, i.e., while the user is working in the scene. Alternatively, the algorithms could be used during the initial setup phase, gathering 3D scene information once and compiling a rough sketch of the scene that then needs to be updated with other techniques during the AR session. Yet other options involve the use of other sensing modalities besides cameras, such as laser range scanners or sonar sensors.

This section discusses two approaches we are investigating.

6.1.1 Dense Shape Estimates from Stereo Data

Stereo is a classical method of building three-dimensional shape from visual cues. It uses two calibrated cameras with two images of the scene from different vantage points. Using stereo triangulation, the 3D location of dominant object features that are seen in both images can be determined: if the same point on an object is seen in both images, rays cast from the focal points of both cameras through the feature positions in the images intersect in 3D space, determining the distance of the object point from the cameras.
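Assuming both cameras have been calibrated to 3x4 projection matrices as in section 5.1.2, the ray-intersection step can be written as a small linear triangulation, sketched below; this is a generic textbook formulation, not the specific implementation used with the stereo matcher described next.

```python
import numpy as np

def triangulate(P_left, P_right, uv_left, uv_right):
    """Recover the 3D point seen at pixel uv_left in the left image and
    uv_right in the right image, given the two 3x4 camera projection
    matrices. The two rays intersect only approximately in practice, so the
    result is a least-squares estimate (homogeneous DLT triangulation)."""
    (ul, vl), (ur, vr) = uv_left, uv_right
    A = np.vstack([
        ul * P_left[2] - P_left[0],
        vl * P_left[2] - P_left[1],
        ur * P_right[2] - P_right[0],
        vr * P_right[2] - P_right[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # from homogeneous to Euclidean coordinates
```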

Shape from stereo has been studied extensively in the computer vision literature. The choice of image feature detection algorithms and of feature matching algorithms between images is of critical importance. Depending on the type of methods and algorithms one uses, shape from stereo may result in sparse depth maps or dense depth maps. For our research, the goal is to use the computed 3D shape information in the AR applications. In most if not all such scenarios, dense depth maps are needed. Therefore, we have taken an existing algorithm (Weng, Huang & Ahuja, 1989) to compute a dense depth map which is used in the AR context. The camera geometry is obtained by calibrating both cameras independently using one of the camera calibration methods described in section 5.1.

The details of the stereo algorithm are given in the paper (Weng, Huang & Ahuja, 1989). In summary, the heart of the algorithm lies in the computation of the disparity map (du, dv) which describes the distance between matched points in both images. This is accomplished by computing matches between four kinds of image features derived from the original images: smoothed intensity images, edge magnitudes, positive corners, and negative corners. The positive and negative corners separate the contrast direction at a corner. Distinguishing between these four feature types improves the matching results by preventing incompatible image features, such as positive and negative corners, from being matched between the images.

The overall algorithm iteratively determines the (locally) best match between the image features that have been computed in both images. Starting with an initial hypothetical match, the matches are iteratively changed and improved, minimizing an energy function which integrates over the entire image the influence of several error terms related to the quality of the edge matches between the left and right image, as well as a smoothness term which ensures that the recovered surface is not exceedingly rough and noisy.

Figure 12 shows a pair of stereo images. The disparity maps computed from these images are shown in Figure 13 and the depth map is shown in Figure 14(a). Finally, Figure 14(b) shows how the computed depth map is used to occlude three virtual floating cubes.

Figure 12. An example pair of stereo images: (a) Left image and (b) Right image.

Figure 13. The disparities computed on the stereo pair in Figure 12: (a) disparities in rows (du) and (b) disparities in columns (dv). The brighter points have larger disparities.

Figure 14. (a) The computed depth map from the pair of images in Figure 12. The brighter points are farther away from the camera. (b) The computed depth map in (a) is used to occlude the virtual object (in this case a cube) which has been added to the scene.

6.1.2 Shape from Shading

Complementary to geometric shape extraction methods, some approaches exploit the photometric reflection properties of objects. An image of a smooth object with uniform surface reflectance properties exhibits smooth variations in the intensity of the reflected light, referred to as shading. This information is used by human and other natural vision systems to determine the shape of the object. The goal in shape from shading is to replicate this process to the point of being able to design an algorithm that will automatically determine the shape of a smooth object from its image (Horn & Brooks, 1989).

This shape information can be used in a number of application areas where knowledge of the spatial characteristics of a scene is important. In particular, shape from shading information can fill the gaps in sparse depth maps that are left by geometry-based shape extraction methods. Geometric extraction works best on highly textured objects where many features can be matched between images. Shape from shading, on the other hand, can propagate shape information into homogeneous areas.

We are investigating how the second derivative, or Hessian, of a smooth object surface can be determined directly from shading information. The method of characteristic strips, which is often used for calculating shape from shading (Horn, 1986), is set in the framework of modern differential geometry. We extend this method to compute the second derivative of the object's surface, independently from the standard surface orientation calculation. This independently derived information can be used to help classify critical points, verify assumptions about the reflectance function and identify effectively impossible images (Greer & Tuceryan, 1995).

6.2 Mixing of Real and Virtual Worlds

Once appropriate scene descriptions have been obtained interactively or automatically, they form the basis for mixing real and virtual worlds. Since the mixing must be performed at interactive rates, great emphasis has to be placed on efficiency. Depending on the representation of the scene descriptions, different options can be pursued.

If the scene description is available as a geometric model, we can hand the combined list of real and virtual models to the geometric renderer which will then compute the interactions between real and virtual objects for us. By rendering models of real objects in black, we can use the luminance keying feature of the video mixer to substitute the respective area with live video data. As a result, the user sees a picture on the monitor that blends virtual objects with live video, while respecting 3D occlusion relationships between real and virtual objects.
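In GRASP this keying is done by the video mixer hardware; a software analogue of the per-pixel decision, with an illustrative luminance threshold and made-up array names, looks roughly as follows.

```python
import numpy as np

def luminance_key(graphics_rgb, video_rgb, threshold=10):
    """Software analogue of the scan converter's luminance keying:
    wherever the rendered graphics frame is (near) black, show the live
    video pixel; elsewhere show the graphics. Both frames are HxWx3 uint8."""
    luminance = graphics_rgb.astype(float) @ np.array([0.299, 0.587, 0.114])
    mask = luminance < threshold          # "key" on dark pixels
    out = graphics_rgb.copy()
    out[mask] = video_rgb[mask]
    return out
```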

This is a straightforward approach in applications where geometric, polygonal scene descriptions are available. If the descriptions are computed as depth maps, as described in section 6.1, the depth maps still need to be converted into a geometric representation, by tessellating and decimating the data (Schroeder, Zarge & Lorensen, 1992; Turk, 1992).

Alternatively, we can sidestep the tessellation and re-rendering phases for real objects by initializing the Z-buffer of the graphics hardware with the depth map (Wloka & Anderson, 1995). Occlusion of the virtual objects is then performed automatically. When the virtual object is rendered, pixels that are further away from the camera than the Z values in the depth map are not drawn. By setting the background color to black, the real objects present in the original video are displayed in these unmodified pixels. Figure 14(b) presents three virtual cubes occluded by a wooden stand with an engine and occluding the other objects in a real room, using the depth-based approach.
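The per-pixel depth test that this approach relies on can be illustrated in software as follows (array names are hypothetical); in the actual system the comparison happens in the graphics hardware after the depth map has been loaded into the Z-buffer.

```python
import numpy as np

def composite_with_depth(video_rgb, virtual_rgb, virtual_depth, real_depth):
    """Per-pixel occlusion test between virtual and real scene: draw a
    virtual pixel only where it is closer to the camera than the
    reconstructed depth of the real scene. Pixels where nothing virtual is
    rendered should carry virtual_depth = np.inf. All arrays share H x W."""
    visible = virtual_depth < real_depth
    out = video_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out
```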

These approaches have advantages and disadvantages, depending on the application. Full 3D geometric models are best for real-time movement of cameras. Polygonal approximations to depth maps can be used over a certain range of camera positions since the synthesized scene model is re-rendered when the camera moves. Copying the depth maps directly into the Z-buffer is the hardest approach: the map needs to be recomputed after each camera motion because the new projective transformation "shifts" all depth values in the depth map. Thus, this approach only works with stationary cameras or with shape extraction algorithms that perform at interactive speeds.

On the other hand, the geometric modeling approach suffers from an inherent dependence on scene complexity. If the scene needs to be represented by a very large polygonal model, the rendering technology may not be able to process it in real time. In contrast, the size of a depth map does not depend on scene complexity. Which approach to use in an application depends on the overall requirements and the system design.

7. Collaborative Use of AR

So far we have discussed techniques and solutions that make AR "work" for the single user. Object modeling, object interaction, realistic display and immersive interfaces all serve to present the user with a consistent and coherent world of real and virtual objects.

When we consider the application scenarios described above, we are reminded of the fact that in any virtual or real environment it appears natural to encounter other persons and to interact with them. Virtual environments are a promising platform for research in the CSCW area, and distributed multi-user interfaces are a challenge for many VE systems (e.g., the efforts related to the VRML proposal (Bell, Parisi & Pesce, 1995)). In the context of the GRASP system, we are interested in the problem and the paradigms of distributed AR. We are investigating solutions in the area of distributed computing and experimenting with system architectures for collaborative interfaces to shared virtual worlds.

7.1 Architecture for Shared AR

Each system supporting multi-user virtual environments can be characterized by the degree or type of concurrency, distribution, and replication in the system architecture (Dewan, 1995). Sharing between users has to be based on separability in the user interface: we call the database of shared logical objects the "model", and create "views" as a specific interpretation of the model in each interface. The need for rapid feedback in the user interface makes a replicated architecture attractive for AR. This in turn leads to object-level sharing where each user can view and manipulate objects independently. It is necessary to manage the shared information so that simultaneous and conflicting updates do not lead to inconsistent interfaces. This is guaranteed by the distribution component in our applications.

The model replication and distribution support allow the user interfaces of one application to execute as different processes on different host computers. GRASP interfaces are not multi-threaded, so the degree of distribution corresponds to the degree of concurrency in the system. The resulting architecture was implemented and successfully used in the interior design demonstration.

7.2 Providing Distribution

The replicated architecture is directly supported by the Forecast library of the GRASP system. Based on a message bus abstraction, Forecast provides an easy, reliable, and dynamic approach to constructing distributed AR applications.

Central to this support is a one-to-many reliable communication facility which can be described as a distributed extension of a hardware system bus. Components, situated on different machines, can dynamically connect to the same distributed bus and send and receive messages over it. This analogy has been used before for group communication or broadcast systems, and its messaging and selection capability are common to systems such as Linda and Sun's ToolTalk (Sunsoft, 1991).

The Forecast message bus implements a one-to-many FIFO (first in, first out) multicast transport protocol. A special sequencer process is used to impose a unique global ordering on messages. In the simpler form of the protocol, nodes that wish to broadcast send their message to the sequencer which then uses the one-to-many reliable protocol to disseminate the message. A unique global order is imposed on the message streams since all messages pass through the sequencer. Nodes can detect how their messages were scheduled by listening to the global message stream. The protocol is similar to the Amoeba reliable multicast protocol (Kaashoek & Tanenbaum, 1992), except that it uses reliable buffered transmission between nodes and the sequencer node at the expense of extra acknowledgments.
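The sequencer idea can be illustrated with a small in-process sketch: every message passes through a sequencer thread that stamps a global sequence number and forwards it to all subscribers, so all replicas apply updates in the same order. The in-memory queues stand in for Forecast's reliable one-to-many transport, and the payload strings are made up.

```python
import itertools
import queue
import threading

class Sequencer:
    """Toy model of sequencer-based total ordering: every message goes
    through the sequencer, which stamps a global sequence number and
    forwards it to all subscribed nodes."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.subscribers = []
        self.counter = itertools.count()
        threading.Thread(target=self._run, daemon=True).start()

    def subscribe(self):
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def broadcast(self, sender, payload):
        self.inbox.put((sender, payload))

    def _run(self):
        while True:
            sender, payload = self.inbox.get()
            msg = (next(self.counter), sender, payload)   # global order stamp
            for q in self.subscribers:                    # disseminate to all
                q.put(msg)

# Usage: two replicas of a shared model receive identical message streams.
bus = Sequencer()
site_a, site_b = bus.subscribe(), bus.subscribe()
bus.broadcast("A", "move chair-1 to (2.0, 0.0, 1.5)")
bus.broadcast("B", "lock chair-1")
print(site_a.get(), site_a.get())   # same order as seen by site_b
```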

We chose the message bus abstraction because it provides location, invocation and replication transparency for applications (Architecture Projects Management, 1989), which makes the programming of these applications easier. GRASP programmers are familiar with the concept of multiple local views and events, both of which we have extended to our distributed setting.

The Forecast message bus is used within our two collaborative AR demonstrators to implement model replication, direct interaction between components (e.g., to send pointer tracking information to remote participants), as well as generic functions like floor control and locking, state transfer, shared manipulators, video transmission (based on the MBONE audio and video library (Macedonia & Brutzman, 1994)), and synchronization between video and tracking events (using RTP-style timestamps).

8. Discussion

Using Augmented Reality in realistic applications requires the computer to be very well informed about the 3D world in which users perform their tasks. To this effect, AR systems use various different approaches to obtain, register and track object and scene models. Of particular importance are different sensing devices, such as cameras or magnetic trackers. They provide the essential real-time link between the computer's internal, "virtual" understanding of the world and reality. All such sensors need to be calibrated carefully so that the incoming information is in alignment with the physical world.

Sensor input is not used to its full potential in current AR systems due to real-time constraints, as well as due to the lack of algorithms that interpret signals or combine information from several sensors. Research fields such as computer vision, signal processing, pattern recognition, speech processing, etc. have investigated such topics for some time. Some algorithms are maturing so that, considering the projected annual increases in computer speed, it should soon become feasible to consider their use in AR applications. In particular, many applications operate under simplified (engineered) conditions so that scene understanding becomes an easier task than the general Computer Vision Problem (see, for example, (Marr, 1980)).

We operate at this borderline between computer vision and AR, injecting as much automation into the process as feasible while using an engineering approach towards simplifying the tasks of the algorithms. In this respect, we emphasize the hybrid use of various different techniques, including interactive user input where convenient, as well as other sensing modalities (magnetic trackers). This paper has shown how we have developed and explored different techniques to address some of the important AR issues. Our pragmatic approach has allowed us to build several realistic demonstrations. Conversely, these applications influence our research focus, indicating clearly the discrepancy between the state of the art and what is needed. Trade-offs between automation and assistance need to be further explored. User interaction should be reserved as much as possible for the high-level control of the scene and its augmentation with synthetic information from multimedia databases. More sensing modalities need to be explored which will allow the user to interact with the computer via more channels, such as gesture and sound. Experimentation with head-mounted, see-through displays is crucial as well, especially in regard to the question of whether and how the AR system can obtain optical input similar to what the user sees so that computer vision techniques can still be used. The foremost concern, however, remains the provision of fast, real-time interaction capabilities with real and virtual objects integrated seamlessly in an augmented world. To this end, the accurate modeling, tracking and prediction of user or camera motion is essential.

A related research direction leads us to investigate the collaborative use of Augmented Reality. As reported in this paper, we have developed a distributed infrastructure so that all our demonstrations can operate in a collaborative setting. We consider the collaborative use of AR technology to be a key interaction paradigm in the emerging global information society. The highly interactive, visual nature of AR imposes hard requirements on the distributed infrastructure, and demands the development of appropriate collaboration styles.

Augmented Reality, especially in a collaborative setting, has the potential to provide much easier and more efficient use of human and computer skills by merging the best capabilities of both. Considering the rapid research progress in this field, we expect futuristic scenarios like collaborative interior design, or joint maintenance and repair of complex mechanical devices, to soon become reality for the professional user.

    Acknowledgments

This work was financially supported by Bull SA, ICL PLC, and Siemens AG. We would like to thank the director of ECRC, Alessandro Giacalone, for many stimulating discussions regarding potential application scenarios for distributed, collaborative Augmented Reality. Many colleagues at ECRC, especially Stéphane Bressan and Philippe Bonnet, contributed significantly to the successful implementation and presentation of the Interior Design and Mechanical Repair demonstrations, providing other key pieces of technology (database access) that were not discussed in this paper.

    References

Ahlers, K.H., Crampton, C., Greer, D., Rose, E., & Tuceryan, M. (1994). Augmented vision: A technical introduction to the GRASP 1.2 system. Technical Report ECRC-94-14, http://www.ecrc.de.

Ahlers, K.H., Kramer, A., Breen, D.E., Chevalier, P.Y., Crampton, C., Rose, E., Tuceryan, M., Whitaker, R.T., & Greer, D. (1995). Distributed augmented reality for collaborative design applications. Proc. Eurographics 95.

Architecture Projects Management. (1989). ANSA: An Engineer's Introduction to the Architecture. APM Limited, Poseidon House, Cambridge CB3 0RD, United Kingdom, Nov.

Azuma, R., & Bishop, G. (1994). Improving static and dynamic registration in an optical see-through display. Computer Graphics, July, 194-204.

Bajura, M., Fuchs, H., & Ohbuchi, R. (1992). Merging virtual objects with the real world: Seeing ultrasound imagery within the patient. Computer Graphics, July, 203-210.

Bajura, M., & Neumann, U. (1995). Dynamic registration correction in augmented reality systems. Proc. of the Virtual Reality Annual International Symposium (VRAIS 95), 189-196.

Bar-Shalom, Y., & Fortmann, T.E. (1988). Tracking and Data Association. Academic Press, New York.

Baudel, M., & Beaudouin-Lafon, M. (1993). Charade: Remote control of objects using free-hand gestures. Communications of the ACM, 37(7), 28-35.

Bell, G., Parisi, A., & Pesce, M. (1995). The virtual reality modeling language, version 1.0 specification. http://vrml.wired.com/vrml.tech/

Betting, F., Feldmar, J., Ayache, N., & Devernay, F. (1995). A framework for fusing stereo images with volumetric medical images. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 30-39.

Caudell, T., & Mizell, D. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes. Proc. of the Hawaii International Conference on System Sciences, 659-669.

Deering, M. (1992). High resolution virtual reality. Computer Graphics, 26(2), 195-202.

Dewan, P. (1995). Multiuser architectures. Proc. EHCI 95.

Drascic, D., Grodski, J.J., Milgram, P., Ruffo, K., Wong, P., & Zhai, S. (1993). Argos: A display system for augmenting reality. Formal video program and proc. of the Conference on Human Factors in Computing Systems (INTERCHI 93), 521.

Feiner, S., MacIntyre, B., & Seligmann, D. (1993). Knowledge-based augmented reality. Communications of the ACM, 36(7), 53-62.

Fournier, A. (1994). Illumination problems in computer augmented reality. Journée INRIA, Analyse/Synthèse d'Images, Jan., 1-21.

Gelb, A. (ed.) (1974). Applied Optimal Estimation. MIT Press, Cambridge, MA.

Gleicher, M., & Witkin, A. (1992). Through-the-lens camera control. Computer Graphics, July, 331-340.

Goldstein, H. (1980). Classical Mechanics. Addison-Wesley, Reading, MA.

Gottschalk, S., & Hughes, J. (1993). Autocalibration for virtual environments tracking hardware. Computer Graphics, Aug., 65-72.

Greer, D.S., & Tuceryan, M. (1995). Computing the Hessian of object shape from shading. Technical Report ECRC-95-30, http://www.ecrc.de.

Grimson, W.E.L., Ettinger, G.J., White, S.J., Gleason, P.L., Lozano-Perez, T., Wells, W.M. III, & Kikinis, R. (1995). Evaluating and validating an automated registration system for enhanced reality visualization in surgery. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 3-12.

Grimson, W.E.L., Lozano-Perez, T., Wells, W.M. III, Ettinger, G.J., White, S.J., & Kikinis, R. (1995). An automatic registration method for frameless stereotaxy, image guided surgery, and enhanced reality visualization. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 430-436.

Grimson, W.E.L. (1990). Object Recognition by Computer. MIT Press, Cambridge, MA.

Henri, C.J., Colchester, A.C.F., Zhao, J., Hawkes, D.J., Hill, D.L.G., & Evans, R.L. (1995). Registration of 3D surface data for intra-operative guidance and visualization in frameless stereotactic neurosurgery. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 47-58.

Holloway, R. (1994). An Analysis of Registration Errors in a See-Through Head-Mounted Display System for Craniofacial Surgery Planning. Ph.D. thesis, University of North Carolina at Chapel Hill.

Horn, B.K.P. (1986). Robot Vision. MIT Press, Cambridge, MA.

Horn, B.K.P., & Brooks, M.J. (1989). Shape from Shading. MIT Press, Cambridge, MA.

Janin, A., Mizell, D., & Caudell, T. (1993). Calibration of head-mounted displays for augmented reality applications. Proc. of the Virtual Reality Annual International Symposium (VRAIS 93), 246-255.

Kaashoek, M.F., & Tanenbaum, A.S. (1992). Fault tolerance using group communication. Operating Systems Review.

Kancherla, A.R., Rolland, J.P., Wright, D.L., & Burdea, G. (1995). A novel virtual reality tool for teaching dynamic 3D anatomy. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 163-169.

Kramer, A., & Chevalier, P.Y. (1996). Distributing augmented reality. Submitted to the Virtual Reality Annual International Symposium (VRAIS 96).

Lorensen, W., Cline, H., Nafis, C., Kikinis, R., Altobelli, D., & Gleason, L. (1993). Enhancing reality in the operating room. Proc. of the IEEE Conference on Visualization, 410-415.

Lowe, D. (1985). Perceptual Organization and Visual Recognition. Kluwer Academic, Norwell, MA.

Macedonia, M.R., & Brutzman, D.P. (1994). MBONE provides audio and video across the internet. IEEE Computer, April.

Marr, D. (1980). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, San Francisco.

Mellor, J.P. (1995). Realtime camera calibration for enhanced reality visualizations. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 471-475.

Milgram, P., Zhai, S., Drascic, D., & Grodski, J.J. (1993). Applications of augmented reality for human-robot communication. Proc. of the International Conference on Intelligent Robots and Systems (IROS 93), 1467-1472.

Peria, O., Chevalier, L., François-Joubert, A., Caravel, J.P., Dalsoglio, S., Lavallee, S., & Cinquin, P. (1995). Using a 3D position sensor for registration of SPECT and US images of the kidney. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 23-29.

Schroeder, W., Zarge, J., & Lorensen, W. (1992). Decimation of triangle meshes. Computer Graphics, 26(2), 65-70.

SunSoft (1991). The ToolTalk Service. Technical report, SunSoft, June.

Tuceryan, M., Greer, D., Whitaker, R., Breen, D., Crampton, C., Rose, E., & Ahlers, K. (1995). Calibration requirements and procedures for a monitor-based augmented reality system. IEEE Transactions on Visualization and Computer Graphics, 1, 255-273.

Turk, G. (1992). Re-tiling polygonal surfaces. Computer Graphics, 26(2), 55-64.

Uenohara, M., & Kanade, T. (1995). Vision-based object registration for real-time image overlay. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed 95), 13-22.

Wellner, P. (1993). Interacting with paper on the DigitalDesk. Communications of the ACM, 36(7), 87-96.

Weng, J., Huang, T.S., & Ahuja, N. (1989). Motion and structure from two perspective views: Algorithms, error analysis, and error estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(5), 451-476.

Whitaker, R., Crampton, C., Breen, D., Tuceryan, M., & Rose, E. (1995). Object calibration for augmented reality. Proc. Eurographics 95.

Wloka, M., & Anderson, B. (1995). Resolving occlusion in augmented reality. Proc. of the ACM Symposium on Interactive 3D Graphics, 5-12.

Table of Contents

1. Introduction
2. Previous Work
3. Application Scenarios
3.1 Collaborative Interior Design
3.2 Collaborative Mechanical Repair
4. System Infrastructure
5. Specification and Alignment of Coordinate Spaces
5.1 Calibration of Sensors and Video Equipment
5.1.1 Image Calibration
5.1.2 Camera Calibration
5.1.3 Magnetic Tracker Calibration
5.2 Registration of Interaction Devices and Real Objects
5.2.1 Pointer Registration
5.2.2 Object Registration
5.3 Tracking of Objects and Sensors
5.3.1 Magnetic Tracking
5.3.2 Optical Tracking
6. Object Interaction
6.1 Acquisition of 3D Scene Descriptions
6.1.1 Dense Shape Estimates from Stereo Data
6.1.2 Shape from Shading
6.2 Mixing of Real and Virtual Worlds
7. Collaborative Use of AR
7.1 Architecture for Shared AR
7.2 Providing Distribution
8. Discussion

List of Figures

Figure 1. Augmented room showing a real table with a real telephone and a virtual lamp, surrounded by two virtual chairs. Note that the chairs are partially occluded by the real table while the virtual lamp occludes the table.

Figure 2. Augmented engine.

Figure 3. The GRASP system hardware configuration.

Figure 4. The GRASP system software configuration.

Figure 5. The camera calibration grid.

Figure 6. 3D pointing device.

Figure 7. Calibration and tracking of an engine model: A wireframe engine model registered to a real model engine using an image-based calibration (a), but when the model is turned and its movements tracked (b), the graphics show the misalignment in the camera's z direction.

Figure 8. Our optical tracking approach currently tracks the corners of squares. The left figure shows a corner of a room with eight squares. The right figure shows the detected squares only.

Figure 9. Augmented scene with a virtual chair and shelf that were rendered using the automatically tracked camera parameters.

Figure 10. Each 3D motion can be decomposed into a translation t and a rotation. We choose a rotation about an axis through the center of mass of the objects, which is constant in the absence of any forces. The figure also indicates the world coordinate frame and the camera coordinate frame.

Figure 11. Modified engine. The fact that the user has removed the air cleaner is not yet detected by the AR system. The virtual model thus does not align with its real position.

Figure 12. An example pair of stereo images: (a) Left image and (b) Right image.

Figure 13. The disparities computed on the stereo pair in Figure 12: (a) disparities in rows (du) and (b) disparities in columns (dv). The brighter points have larger disparities.

Figure 14. (a) The computed depth map from the pair of images in Figure 12. The brighter points are farther away from the camera. (b) The computed depth map in (a) is used to occlude the virtual object (in this case a cube) which has been added to the scene.